nlp_primitives.DiversityScore
Scores a text by the ratio of unique words to the total number of words used in the text.
Given a list of strings, calculates for each string the number of unique words divided by the total number of words, giving the text a score from 0 to 1 that indicates how unique its words are. This primitive evaluates only the ‘clean’ version of each string, so it ignores case, punctuation, and stopwords.
If a string is missing, returns NaN.
Examples
>>> from nlp_primitives import DiversityScore
>>> diversity_score = DiversityScore()
>>> diversity_score(["hi hi hi", "hello its me", "hey what hey what", "a dog ate a basket"]).tolist()
[0.3333333333333333, 1.0, 0.5, 1.0]
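To make the calculation concrete, here is a minimal plain-Python sketch of the unique-to-total word ratio described above. It is not the library's implementation: the function name `diversity_score_sketch`, the small `STOPWORDS` set, the punctuation handling, and the choice to return NaN when no words remain after cleaning are all assumptions made for illustration. The primitive's actual stopword list and cleaning rules may differ.

```python
import math
import string

# Hypothetical, deliberately tiny stopword set for illustration only;
# the primitive's real stopword list is not shown on this page.
STOPWORDS = {"a", "an", "the", "its", "me", "what", "is", "of", "and"}

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def diversity_score_sketch(texts):
    """Return unique-word / total-word ratios for the cleaned strings."""
    scores = []
    for text in texts:
        if text is None:
            # Missing string: score is NaN, as the description states.
            scores.append(math.nan)
            continue
        # "Clean" the string: lowercase, drop punctuation, drop stopwords.
        cleaned = text.lower().translate(_STRIP_PUNCT)
        words = [w for w in cleaned.split() if w not in STOPWORDS]
        if not words:
            # Assumption: nothing left to score, so return NaN.
            scores.append(math.nan)
            continue
        scores.append(len(set(words)) / len(words))
    return scores


print(diversity_score_sketch(
    ["hi hi hi", "hello its me", "hey what hey what", "a dog ate a basket", None]
))
# prints [0.3333333333333333, 1.0, 0.5, 1.0, nan]
```

With this stopword set the sketch reproduces the scores in the example above. The last documented input shows why stopword removal matters: "a dog ate a basket" scores 1.0 only because "a" is discarded before counting; counting it would give 4/5.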
__init__
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(): Initialize self.
generate_name(base_feature_names)
generate_names(base_feature_names)
get_args_string()
get_arguments()
get_description(input_column_descriptions[, …])
get_filepath(filename)
get_function()
Attributes
base_of
base_of_exclude
commutative
compatibility
default_value
description_template
input_types
max_stack_depth
name
number_output_features
uses_calc_time
uses_full_entity