nlp_primitives.CountString

class nlp_primitives.CountString(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)

Determines how many times a given string shows up in a text field.

Parameters:
  • string (str) – The string to count. Defaults to “the”.

  • ignore_case (bool) – If True, the search is case-insensitive. Defaults to True.

  • ignore_non_alphanumeric (bool) – If True, non-alphanumeric characters in the text are ignored during the search, so that “Th*/e” counts as a match for “the”. Defaults to False.

  • is_regex (bool) – If True, the string argument is interpreted as a regular expression. Defaults to False.

  • match_whole_words_only (bool) – If True, only whole-word matches are counted. For example, searching for “the” in the words “then”, “the”, and “there” matches only “the”. Defaults to False.
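
How these flags interact can be illustrated with a small standalone sketch built on Python's re module. It is not the library's implementation: the function name count_occurrences is hypothetical, and both the order in which the flags are applied and the exact set of characters treated as non-alphanumeric are assumptions, chosen so that the sketch reproduces the outputs listed under Examples.

```python
import re


def count_occurrences(text, string="the", ignore_case=True,
                      ignore_non_alphanumeric=False, is_regex=False,
                      match_whole_words_only=False):
    """Hypothetical helper mirroring the CountString parameters."""
    # Treat the search string literally unless it is declared a regex.
    pattern = string if is_regex else re.escape(string)

    # Whole-word matching: require a word boundary on both sides.
    if match_whole_words_only:
        pattern = r"\b(?:" + pattern + r")\b"

    # Assumption: "ignoring" non-alphanumeric characters means removing
    # everything except letters, digits, and whitespace before searching.
    if ignore_non_alphanumeric:
        text = re.sub(r"[^0-9a-zA-Z\s]", "", text)

    flags = re.IGNORECASE if ignore_case else 0

    # Count non-overlapping matches; return a float like the primitive does.
    return float(sum(1 for _ in re.finditer(pattern, text, flags)))


texts = ["The problem was difficult.",
         "He was there.",
         "The girl went to the store."]
print([count_occurrences(t) for t in texts])
print([count_occurrences(t, match_whole_words_only=True) for t in texts])
```

Matches are counted without overlap, which is how re.finditer scans a string; whether the primitive itself counts overlapping matches is not stated on this page.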

Examples

>>> from nlp_primitives import CountString
>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
...               "He was there.",
...               "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Make the search case-sensitive
>>> count_string_ignore_case = CountString(string="the", ignore_case=False)
>>> count_string_ignore_case(["The problem was difficult.",
...                           "He was there.",
...                           "The girl went to the store."]).tolist()
[0.0, 1.0, 1.0]
>>> # Ignore non-alphanumeric characters in the search
>>> count_string_ignore_non_alphanumeric = CountString(string="the",
...                                                    ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
...                                       "He was there.",
...                                       "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Specify the string as a regex
>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
...                        "He was there.",
...                        "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Match whole words only
>>> count_string_match_whole_words_only = CountString(string="the",
...                                                   match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
...                                      "He was there.",
...                                      "The girl went to the store."]).tolist()
[1.0, 0.0, 2.0]

__init__(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)

Methods

__init__([string, ignore_case, ...])

flatten_nested_input_types(input_types)

Flattens nested column schema inputs into a single list.

generate_name(base_feature_names)

generate_names(base_feature_names)

get_args_string()

get_arguments()

get_description(input_column_descriptions[, ...])

get_filepath(filename)

get_function()

process_text(text)

Attributes

base_of

base_of_exclude

commutative

compatibility

Additional compatible libraries

default_value

Default value this feature returns if no data found.

description_template

input_types

woodwork.ColumnSchema types of inputs

max_stack_depth

name

Name of the primitive

number_output_features

Number of columns in feature matrix associated with this feature

return_type

ColumnSchema type of return

uses_calc_time

uses_full_dataframe