featuretools.primitives.CountString
- class featuretools.primitives.CountString(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)
Determines how many times a given string shows up in a text field.
- Parameters:
string (str) – The string to count. Defaults to “the”.
ignore_case (bool) – Whether to ignore case when matching the string. Defaults to True.
ignore_non_alphanumeric (bool) – Whether to ignore non-alphanumeric characters in the search. Defaults to False.
is_regex (bool) – Whether the string argument is a regular expression. Defaults to False.
match_whole_words_only (bool) – Whether to match whole words only. For example, with this argument set to True, searching for the word “the” in “then, the, there” matches only “the”. Defaults to False.
Examples
>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
...               "He was there.",
...               "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Match case of string
>>> count_string_ignore_case = CountString(string="the", ignore_case=False)
>>> count_string_ignore_case(["The problem was difficult.",
...                           "He was there.",
...                           "The girl went to the store."]).tolist()
[0.0, 1.0, 1.0]

>>> # Ignore non-alphanumeric characters in the search
>>> count_string_ignore_non_alphanumeric = CountString(string="the",
...                                                    ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
...                                       "He was there.",
...                                       "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Specify the string as a regex
>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
...                        "He was there.",
...                        "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Match whole words only
>>> count_string_match_whole_words_only = CountString(string="the",
...                                                   match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
...                                      "He was there.",
...                                      "The girl went to the store."]).tolist()
[1.0, 0.0, 2.0]
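To make the interaction between the parameters easier to follow, here is a minimal sketch of the counting behavior using only Python's standard `re` module. It is not the Featuretools implementation: the function name `count_string_sketch` is hypothetical, and the exact character class used to strip non-alphanumeric characters is an assumption. It reproduces the outputs listed in the examples above.

```python
import re


def count_string_sketch(text, string="the", ignore_case=True,
                        ignore_non_alphanumeric=False, is_regex=False,
                        match_whole_words_only=False):
    """Hypothetical stand-in for CountString, applied to a single string."""
    if ignore_non_alphanumeric:
        # Assumption: drop everything except ASCII letters, digits and spaces.
        text = re.sub(r"[^0-9a-zA-Z ]+", "", text)

    # Treat the search string literally unless it is declared a regex.
    pattern = string if is_regex else re.escape(string)

    if match_whole_words_only:
        # Require a word boundary on both sides of the match.
        pattern = r"\b(?:" + pattern + r")\b"

    flags = re.IGNORECASE if ignore_case else 0
    # re.findall returns non-overlapping matches; return a float to mirror
    # the float counts shown in the documented examples.
    return float(len(re.findall(pattern, text, flags)))


sample = "then, the, there"
print(count_string_sketch(sample))                               # 3.0
print(count_string_sketch(sample, match_whole_words_only=True))  # 1.0
```

With the default settings every substring occurrence counts, so “then” and “there” both contribute; with `match_whole_words_only=True` only the standalone word is counted.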
- __init__(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)
Methods
__init__([string, ignore_case, ...])
flatten_nested_input_types(input_types) – Flattens nested column schema inputs into a single list.
generate_name(base_feature_names)
generate_names(base_feature_names)
get_args_string()
get_arguments()
get_description(input_column_descriptions[, ...])
get_filepath(filename)
get_function()
process_text(text)
Attributes
base_of
base_of_exclude
commutative
default_value – Default value this feature returns if no data found.
description_template
input_types – woodwork.ColumnSchema types of inputs
max_stack_depth
name – Name of the primitive
number_output_features – Number of columns in feature matrix associated with this feature
return_type – ColumnSchema type of return
stack_on
stack_on_exclude
stack_on_self
uses_calc_time
uses_full_dataframe