nlp_primitives.MedianWordLength

class nlp_primitives.MedianWordLength(delimiters_regex='[- \\[\\].,!\\?;\\n]')

Determines the median word length.

Description: Given a list of strings, determine the median word length of each string. A word is defined as a sequence of characters not interrupted by a delimiter. If a string is empty or NaN, the result for that string is NaN.
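The behavior described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `median_word_length` is hypothetical, and it assumes that empty tokens produced by adjacent or trailing delimiters are discarded rather than counted as zero-length words.

```python
import math
import re
import statistics

# Default delimiter class copied from the documented signature:
# hyphen, space, [ ], period, comma, !, ?, ;, and newline.
DEFAULT_DELIMITERS = r"[- \[\].,!\?;\n]"


def median_word_length(texts, delimiters_regex=DEFAULT_DELIMITERS):
    """Return the median word length of each string in ``texts``.

    Hypothetical stand-in for MedianWordLength, not the library's code.
    Non-string inputs (None, NaN) and strings with no words yield NaN.
    """
    pattern = re.compile(delimiters_regex)
    results = []
    for text in texts:
        if not isinstance(text, str):
            results.append(math.nan)
            continue
        # Assumption: empty strings left by consecutive or trailing
        # delimiters are dropped so they are not counted as words.
        words = [w for w in pattern.split(text) if w]
        if not words:
            results.append(math.nan)
            continue
        # statistics.median averages the two middle values when the
        # number of words is even.
        results.append(float(statistics.median(len(w) for w in words)))
    return results
```

Called on the strings from the example below, this sketch yields `4.0`, `4.0`, `3.5`, and NaN, matching the documented output.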

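Parameters: delimiters_regex (str) – Delimiters as a regex string for splitting text into words. The default delimiters are the hyphen, space, square brackets, period, comma, exclamation mark, question mark, semicolon, and newline (“- [].,!?;\n”).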
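The doubled backslashes in the signature are Python string escaping, so the default is the character class `[- \[\].,!\?;\n]`. The sketch below shows how that class tokenizes text using only `re.split`; the helper `split_words` and the space-only pattern `SPACES_ONLY` are hypothetical, chosen to illustrate how a different `delimiters_regex` changes what counts as a word. Note that `$` is not a delimiter, so `$1` is a two-character word.

```python
import re

# Same pattern as the documented default; the raw string avoids the
# doubled backslashes needed in a normal string literal.
DEFAULT_DELIMITERS = r"[- \[\].,!\?;\n]"
SPACES_ONLY = r"[ ]"  # hypothetical custom value for illustration


def split_words(text, delimiters_regex):
    """Split text on the delimiter class and drop empty tokens."""
    return [w for w in re.split(delimiters_regex, text) if w]


sample = "third line $1,000"
print(split_words(sample, DEFAULT_DELIMITERS))  # ['third', 'line', '$1', '000']
print(split_words(sample, SPACES_ONLY))         # ['third', 'line', '$1,000']
```

With the space-only pattern the comma no longer splits `$1,000`, so the word lengths, and therefore the median, change. The documented parameter accepts such a pattern directly, e.g. `MedianWordLength(delimiters_regex=r"[ ]")`.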

Examples

>>> from nlp_primitives import MedianWordLength
>>> x = ['This is a test file', 'This is second line', 'third line $1,000', None]
>>> median_word_length = MedianWordLength()
>>> median_word_length(x).tolist()
[4.0, 4.0, 3.5, nan]
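The values in this example can be checked by hand. The word lengths below are counted from the example strings after splitting on the default delimiters (the comma splits `$1,000` into `$1` and `000`); an odd number of words gives the middle value, while an even number gives the mean of the two middle values, which is where `3.5` comes from.

```python
import statistics

# Word lengths of the first three example strings after splitting on the
# default delimiters.
word_lengths = [
    [4, 2, 1, 4, 4],  # 'This is a test file'  -> sorted 1, 2, 4, 4, 4
    [4, 2, 6, 4],     # 'This is second line'  -> sorted 2, 4, 4, 6
    [5, 4, 2, 3],     # 'third line $1,000'    -> sorted 2, 3, 4, 5
]

# Odd count: the middle value. Even count: mean of the two middle values.
medians = [float(statistics.median(lengths)) for lengths in word_lengths]
print(medians)  # [4.0, 4.0, 3.5]
```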

__init__(delimiters_regex='[- \\[\\].,!\\?;\\n]')

Methods

`__init__`([delimiters_regex])
`flatten_nested_input_types`(input_types)	Flattens nested column schema inputs into a single list.
`generate_name`(base_feature_names)
`generate_names`(base_feature_names)
`get_args_string`()
`get_arguments`()
`get_description`(input_column_descriptions[, ...])
`get_filepath`(filename)
`get_function`()

Attributes

`base_of`
`base_of_exclude`
`commutative`
`compatibility`	Additional compatible libraries
`default_value`	Default value this feature returns if no data is found.
`description_template`
`input_types`	woodwork.ColumnSchema types of inputs
`max_stack_depth`
`name`	Name of the primitive
`number_output_features`	Number of columns in feature matrix associated with this feature
`return_type`	ColumnSchema type of return
`series_library`
`uses_calc_time`
`uses_full_dataframe`