nlp_primitives.DiversityScore
- class nlp_primitives.DiversityScore
- Calculates the lexical diversity of a text based on the number of unique words relative to the total number of words used in it.
- Description:
Given a list of strings, calculates the number of unique words divided by the total number of words in each string, giving each text a score from 0 to 1 that indicates how unique its words are. This primitive evaluates only the ‘clean’ version of each string, so it ignores case, punctuation, and stopwords.
If a string is missing, NaN is returned.
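The same rule can be written as a formula. The notation here is introduced only for illustration: suppose cleaning a string leaves the tokens w_1, ..., w_n with n ≥ 1. The score D is then the number of distinct tokens divided by n. The second line works this through for "hi hi hi", which cleans to three copies of "hi".

```latex
D = \frac{\lvert \{ w_1, \dots, w_n \} \rvert}{n}, \qquad \frac{1}{n} \le D \le 1
\text{``hi hi hi''}: \quad D = \frac{\lvert \{ \text{hi} \} \rvert}{3} = \frac{1}{3} \approx 0.333
```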
Examples
>>> diversity_score = DiversityScore()
>>> diversity_score(["hi hi hi", "hello its me", "hey what hey what", "a dog ate a basket"]).tolist()
[0.3333333333333333, 1.0, 0.5, 1.0]
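Below is a minimal pure-Python sketch of the computation described above. It assumes a simple cleaning step: lowercase the text, strip ASCII punctuation using `string.punctuation`, and drop stopwords. The `STOPWORDS` set and the `clean_tokens` and `diversity_score` functions are hypothetical illustrations and are not part of nlp_primitives. The real primitive's tokenizer and stopword list may differ. Returning NaN when every word is a stopword is also an assumption, because the page above does not say what happens in that case.

```python
import math
import string
from typing import Iterable, List, Optional

# Hypothetical, deliberately tiny stopword set for illustration only; the real
# primitive's stopword list is not given on this page and is likely much larger.
STOPWORDS = {"a", "an", "the", "its", "me", "what", "is", "it", "i"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tokens(text: str) -> List[str]:
    """Lowercase, strip punctuation, split on whitespace, and drop stopwords."""
    words = text.lower().translate(_PUNCT_TABLE).split()
    return [w for w in words if w not in STOPWORDS]


def diversity_score(texts: Iterable[Optional[str]]) -> List[float]:
    """Return unique cleaned words / total cleaned words for each string."""
    scores: List[float] = []
    for text in texts:
        # A missing string (None or a float NaN) scores NaN, as documented.
        if text is None or (isinstance(text, float) and math.isnan(text)):
            scores.append(float("nan"))
            continue
        tokens = clean_tokens(text)
        if not tokens:
            # Assumption: nothing survives cleaning -> NaN instead of dividing by zero.
            scores.append(float("nan"))
            continue
        scores.append(len(set(tokens)) / len(tokens))
    return scores
```

With this toy stopword set, the sketch reproduces the values from the example above. For instance, `diversity_score(["a dog ate a basket"])` returns `[1.0]` because both occurrences of "a" are dropped before counting, which leaves three distinct words. Counting unique words with a `set` keeps each score to a single pass over the cleaned tokens.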
- __init__()
Methods
__init__()
generate_name(base_feature_names)
generate_names(base_feature_names)
get_args_string()
get_arguments()
get_description(input_column_descriptions[, ...])
get_filepath(filename)
get_function()
Attributes
base_of
base_of_exclude
commutative
compatibility: Additional compatible libraries
default_value: Default value this feature returns if no data is found.
description_template
input_types: woodwork.ColumnSchema types of inputs
max_stack_depth
name: Name of the primitive
number_output_features: Number of columns in the feature matrix associated with this feature
return_type: ColumnSchema type of the return value
uses_calc_time
uses_full_dataframe