nlp_primitives.DiversityScore

class nlp_primitives.DiversityScore
Calculates the overall complexity of the text based on the total number of words used in the text.

Description:

Given a list of strings, calculates the number of unique words divided by the total number of words for each string, giving the text a score from 0 to 1 that indicates how unique its words are. This primitive evaluates only the ‘clean’ version of each string, so it ignores case, punctuation, and stopwords.

If a string is missing, returns NaN.
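
The calculation described above can be sketched in plain Python. This is an illustrative approximation only, not the primitive's actual implementation: the function name `diversity_score_sketch` and the small `STOPWORDS` set are hypothetical stand-ins, and returning NaN when no words remain after cleaning is an assumption of this sketch.

```python
import math
import string

# Hypothetical, deliberately small stopword list used only for illustration;
# the primitive's real stopword list is not reproduced here.
STOPWORDS = {"a", "an", "the", "is", "it", "its", "of", "and"}


def diversity_score_sketch(text):
    """Approximate the unique-words / total-words ratio for one string."""
    # Missing input (None or a float NaN) yields NaN, as the description states.
    if text is None or (isinstance(text, float) and math.isnan(text)):
        return math.nan

    # "Clean" the string: lowercase it and strip ASCII punctuation.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))

    # Drop stopwords before counting.
    words = [w for w in cleaned.split() if w not in STOPWORDS]

    # Assumption of this sketch: no remaining words also yields NaN.
    if not words:
        return math.nan

    return len(set(words)) / len(words)


scores = [diversity_score_sketch(s) for s in ["hi hi hi", "hey what hey what", "a dog ate a basket"]]
```

In the last string, the stopword "a" is dropped before counting, which is why all three remaining words count as unique.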

Examples

>>> diversity_score = DiversityScore()
>>> diversity_score(["hi hi hi", "hello its me", "hey what hey what", "a dog ate a basket"]).tolist()
[0.3333333333333333, 1.0, 0.5, 1.0]
__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__()

Initialize self.

generate_name(base_feature_names)

generate_names(base_feature_names)

get_args_string()

get_arguments()

get_filepath(filename)

get_function()

Attributes

base_of

base_of_exclude

commutative

dask_compatible

default_value

input_types

max_stack_depth

name

number_output_features

uses_calc_time

uses_full_entity