nlp_primitives.DiversityScore

class nlp_primitives.DiversityScore

Calculates the overall complexity of the text based on the total number of words used in the text.

Description:

Given a list of strings, divides the number of unique words in each string by its total number of words, giving each string a score from 0 to 1 that indicates how varied its vocabulary is. This primitive evaluates only the ‘clean’ version of each string: case, punctuation, and stopwords are ignored.

If a string is missing, returns NaN.

Examples

>>> diversity_score = DiversityScore()
>>> diversity_score(["hi hi hi", "hello its me", "hey what hey what", "a dog ate a basket"]).tolist()
[0.3333333333333333, 1.0, 0.5, 1.0]

__init__()

Methods

`__init__`()
`flatten_nested_input_types`(input_types)	Flattens nested column schema inputs into a single list.
`generate_name`(base_feature_names)
`generate_names`(base_feature_names)
`get_args_string`()
`get_arguments`()
`get_description`(input_column_descriptions[, ...])
`get_filepath`(filename)
`get_function`()

Attributes

`base_of`
`base_of_exclude`
`commutative`
`compatibility`	Additional compatible libraries
`default_value`	Default value this feature returns if no data found.
`description_template`
`input_types`	woodwork.ColumnSchema types of inputs
`max_stack_depth`
`name`	Name of the primitive
`number_output_features`	Number of columns in feature matrix associated with this feature
`return_type`	ColumnSchema type of return
`series_library`
`stack_on`
`stack_on_exclude`
`stack_on_self`
`uses_calc_time`
`uses_full_dataframe`
