nlp_primitives.DiversityScore

class nlp_primitives.DiversityScore

Calculates the overall complexity of the text based on the ratio of unique words to the total number of words used in the text.

Description:

Given a list of strings, calculates the number of unique words divided by the total number of words in each string, giving each string a score from 0 to 1 that indicates how varied its vocabulary is. This primitive evaluates only the ‘clean’ version of each string, so case, punctuation, and stopwords are ignored.

If a string is missing, returns NaN.

Examples

>>> diversity_score = DiversityScore()
>>> diversity_score(["hi hi hi", "hello its me", "hey what hey what", "a dog ate a basket"]).tolist()
[0.3333333333333333, 1.0, 0.5, 1.0]
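
The calculation described above can be approximated in plain Python. This is a minimal sketch under stated assumptions, not the primitive's actual implementation: the names `STOPWORDS`, `clean_tokens`, and `diversity_score` are hypothetical, the stopword set is a tiny illustrative stand-in for whatever list the primitive really uses, and returning NaN when no words survive cleaning is an assumption.

```python
import math
import string

# Hypothetical, deliberately small stopword set for illustration only;
# the list the real primitive uses is not shown on this page.
STOPWORDS = {"a", "an", "the", "its", "me", "what", "is", "of", "and"}


def clean_tokens(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOPWORDS]


def diversity_score(text):
    """Return unique words / total words, or NaN for missing input."""
    if text is None or (isinstance(text, float) and math.isnan(text)):
        return float("nan")
    tokens = clean_tokens(text)
    if not tokens:
        # Assumption: text with no words left after cleaning yields NaN.
        return float("nan")
    return len(set(tokens)) / len(tokens)


scores = [
    diversity_score(text)
    for text in ["hi hi hi", "hello its me", "hey what hey what", "a dog ate a basket"]
]
print(scores)  # [0.3333333333333333, 1.0, 0.5, 1.0]
```

With this stand-in stopword set the sketch gives the same scores as the doctest above: "hello its me" reduces to the single word "hello", and "hey what hey what" reduces to "hey hey", one unique word out of two.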

__init__(): Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`()	Initialize self.
`generate_name`(base_feature_names)
`generate_names`(base_feature_names)
`get_args_string`()
`get_arguments`()
`get_filepath`(filename)
`get_function`()

Attributes

`base_of`
`base_of_exclude`
`commutative`
`dask_compatible`
`default_value`
`input_types`
`max_stack_depth`
`name`
`number_output_features`
`uses_calc_time`
`uses_full_entity`