nlp_primitives.CountString#

class nlp_primitives.CountString(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)[source]#

Counts how many times a given string appears in a text field.

Parameters:

string (str) – The string to count. Defaults to the word “the”.
ignore_case (bool) – Whether to ignore case when matching, so that “The” and “the” both count. Defaults to True.
ignore_non_alphanumeric (bool) – Whether to disregard non-alphanumeric characters in the text during the search, so that “Th*/e” counts as a match for “the”. Defaults to False.
is_regex (bool) – Whether the string argument is treated as a regular expression. Defaults to False.
match_whole_words_only (bool) – Whether to count only whole-word matches. For example, with this set to True, searching for “the” in “then, the, there” matches only the standalone word “the”. Defaults to False.
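
The way these flags interact can be approximated in plain Python with the standard `re` module. This is a minimal sketch for illustration only, not the library's implementation: the helper name `count_occurrences` is hypothetical, and details such as which characters count as alphanumeric are assumptions.

```python
import re


def count_occurrences(text, string="the", ignore_case=True,
                      ignore_non_alphanumeric=False, is_regex=False,
                      match_whole_words_only=False):
    """Hypothetical stand-in that mimics the documented flags for one string."""
    if ignore_non_alphanumeric:
        # Assumption: keep only ASCII letters, digits, and whitespace.
        text = re.sub(r"[^0-9a-zA-Z\s]", "", text)
    # Treat the search string literally unless it is flagged as a regex.
    pattern = string if is_regex else re.escape(string)
    if match_whole_words_only:
        # Wrap in a non-capturing group so alternations stay inside the \b pair.
        pattern = r"\b(?:" + pattern + r")\b"
    flags = re.IGNORECASE if ignore_case else 0
    # Count non-overlapping matches; return a float to mirror the examples.
    return float(len(re.findall(pattern, text, flags)))


texts = ["The problem was difficult.",
         "He was there.",
         "The girl went to the store."]
print([count_occurrences(t) for t in texts])
print([count_occurrences(t, match_whole_words_only=True) for t in texts])
```

Under these assumptions the sketch gives the same counts as the documented examples for these three sentences; it is not guaranteed to agree with the primitive on other inputs.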

Examples

>>> from nlp_primitives import CountString
>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
...               "He was there.",
...               "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Match the case of the string exactly (case-sensitive search)
>>> count_string_match_case = CountString(string="the", ignore_case=False)
>>> count_string_match_case(["The problem was difficult.",
...                          "He was there.",
...                          "The girl went to the store."]).tolist()
[0.0, 1.0, 1.0]
>>> # Ignore non-alphanumeric characters in the search
>>> count_string_ignore_non_alphanumeric = CountString(string="the",
...                                                    ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
...                                       "He was there.",
...                                       "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Specify the string as a regex
>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
...                        "He was there.",
...                        "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Match whole words only
>>> count_string_match_whole_words_only = CountString(string="the",
...                                                   match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
...                                      "He was there.",
...                                      "The girl went to the store."]).tolist()
[1.0, 0.0, 2.0]

__init__(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)[source]#

Methods

`__init__`([string, ignore_case, ...])
`flatten_nested_input_types`(input_types)	Flattens nested column schema inputs into a single list.
`generate_name`(base_feature_names)
`generate_names`(base_feature_names)
`get_args_string`()
`get_arguments`()
`get_description`(input_column_descriptions[, ...])
`get_filepath`(filename)
`get_function`()
`process_text`(text)

Attributes

`base_of`
`base_of_exclude`
`commutative`
`compatibility`	Additional compatible libraries
`default_value`	Default value this feature returns if no data found.
`description_template`
`input_types`	woodwork.ColumnSchema types of inputs
`max_stack_depth`
`name`	Name of the primitive
`number_output_features`	Number of columns in feature matrix associated with this feature
`return_type`	ColumnSchema type of return
`uses_calc_time`
`uses_full_dataframe`
