featuretools.primitives.CountString
- class featuretools.primitives.CountString(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)
Determines how many times a given string shows up in a text field.
- Parameters:
string (str) – The string to count. Defaults to "the".
ignore_case (bool) – If True, the search is case-insensitive; if False, the case of the string must match. Defaults to True.
ignore_non_alphanumeric (bool) – If True, non-alphanumeric characters are ignored during the search. Defaults to False.
is_regex (bool) – If True, the string argument is treated as a regular expression. Defaults to False.
match_whole_words_only (bool) – If True, only whole-word matches are counted. For example, searching for "the" in "then, the, there" counts only the standalone word "the". Defaults to False.
Examples
>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
...               "He was there.",
...               "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Match case of string
>>> count_string_ignore_case = CountString(string="the", ignore_case=False)
>>> count_string_ignore_case(["The problem was difficult.",
...                           "He was there.",
...                           "The girl went to the store."]).tolist()
[0.0, 1.0, 1.0]

>>> # Ignore non-alphanumeric characters in the search
>>> count_string_ignore_non_alphanumeric = CountString(string="the",
...                                                    ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
...                                       "He was there.",
...                                       "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Specify the string as a regex
>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
...                        "He was there.",
...                        "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Match whole words only
>>> count_string_match_whole_words_only = CountString(string="the",
...                                                   match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
...                                      "He was there.",
...                                      "The girl went to the store."]).tolist()
[1.0, 0.0, 2.0]
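The way the five arguments interact can be sketched in plain Python with the standard `re` module. This is an illustrative approximation and not the library's implementation: the helper name `count_occurrences` is hypothetical, and the order in which the flags are applied (escape the string unless it is a regex, optionally add word boundaries, optionally strip non-alphanumeric characters from the text, then count non-overlapping matches) is an assumption chosen to agree with the documented outputs.

```python
import re


def count_occurrences(text, string="the", ignore_case=True,
                      ignore_non_alphanumeric=False, is_regex=False,
                      match_whole_words_only=False):
    """Hypothetical stand-in that mimics the documented CountString flags."""
    # Treat the search string literally unless it is declared to be a regex.
    pattern = string if is_regex else re.escape(string)

    # Whole-word matching: wrap the pattern in word boundaries.
    if match_whole_words_only:
        pattern = r"\b(?:" + pattern + r")\b"

    # Assumption: "ignoring" non-alphanumeric characters means deleting
    # everything except letters, digits, and spaces before searching.
    if ignore_non_alphanumeric:
        text = re.sub(r"[^0-9a-zA-Z ]+", "", text)

    flags = re.IGNORECASE if ignore_case else 0

    # Count non-overlapping matches.
    return sum(1 for _ in re.finditer(pattern, text, flags))


texts = ["The problem was difficult.",
         "He was there.",
         "The girl went to the store."]

print([count_occurrences(t) for t in texts])
print([count_occurrences(t, ignore_case=False) for t in texts])
print([count_occurrences(t, match_whole_words_only=True) for t in texts])
```

Unlike the primitive, which returns a series of floating-point counts, this sketch returns plain integers for one string at a time and does not handle missing values.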
- __init__(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)
Methods
__init__([string, ignore_case, ...])
flatten_nested_input_types(input_types) – Flattens nested column schema inputs into a single list.
generate_name(base_feature_names)
generate_names(base_feature_names)
get_args_string()
get_arguments()
get_description(input_column_descriptions[, ...])
get_filepath(filename)
get_function()
process_text(text)
Attributes
base_of
base_of_exclude
commutative
compatibility – Additional compatible libraries
default_value – Default value this feature returns if no data found.
description_template
input_types – woodwork.ColumnSchema types of inputs
max_stack_depth
name – Name of the primitive
number_output_features – Number of columns in feature matrix associated with this feature
return_type – ColumnSchema type of return
series_library
stack_on
stack_on_exclude
stack_on_self
uses_calc_time
uses_full_dataframe