nlp_primitives.LSA
- class nlp_primitives.LSA
Calculates the Latent Semantic Analysis values of natural language input.
- Description:
Given a list of strings, transforms those strings using tf-idf and singular value decomposition (SVD) to go from a sparse matrix to a compact matrix with two values for each string. These values represent the Latent Semantic Analysis of each string, that is, its context with respect to NLTK's Gutenberg corpus (https://www.nltk.org/book/ch02.html#gutenberg-corpus).
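A minimal NumPy sketch of the tf-idf + truncated SVD pipeline described above, assuming a tiny hypothetical corpus in place of the Gutenberg corpus. It does not reproduce the library's internals: `fit_lsa` and `tokenize` are hypothetical helpers and the tokenizer is simplified. The idf smoothing and L2 row normalization follow scikit-learn's `TfidfVectorizer` defaults, and the projection is what `TruncatedSVD.transform` computes (`X @ components_.T`).

```python
import re

import numpy as np


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())


def fit_lsa(corpus, n_components=2):
    """Hypothetical helper: fit tf-idf + truncated SVD on `corpus`.

    Returns a function mapping a list of strings to an array of shape
    (len(strings), n_components).
    """
    docs = [tokenize(d) for d in corpus]
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}

    # Smoothed idf, as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1
    df = np.zeros(len(vocab))
    for d in docs:
        for w in set(d):
            df[index[w]] += 1
    idf = np.log((1 + len(docs)) / (1 + df)) + 1

    def tfidf(texts):
        m = np.zeros((len(texts), len(vocab)))
        for row, text in enumerate(texts):
            for w in tokenize(text):
                if w in index:  # words outside the corpus vocabulary are ignored
                    m[row, index[w]] += 1
        m *= idf
        # L2-normalize each non-empty row; rows with no known words stay zero.
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        np.divide(m, norms, out=m, where=norms > 0)
        return m

    # Truncated SVD: keep only the top n_components right singular vectors.
    _, _, vt = np.linalg.svd(tfidf(corpus), full_matrices=False)
    components = vt[:n_components]

    def transform(texts):
        # Project each tf-idf row onto the retained components.
        return tfidf(texts) @ components.T

    return transform


# Tiny hypothetical corpus standing in for NLTK's Gutenberg corpus.
corpus = [
    "the earth orbits the sun",
    "the moon orbits the earth",
    "she walked to the market to buy food",
    "he cooked food and ate dinner",
]
lsa = fit_lsa(corpus)
out = lsa(["the earth is round", "me me me eat food", ".,/"])
print(out.shape)  # (3, 2)
```

Each input string gets two values, one per retained component. A string with no words from the corpus vocabulary (such as ".,/") maps to all zeros.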
If a string is missing, NaN is returned for it.
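The missing-value rule can be sketched as a wrapper that transforms only the present strings and fills NaN for the rest. `transform_with_missing`, `is_missing`, and `toy_transform` are hypothetical names; `toy_transform` is a stand-in that returns two numbers per string and is not an LSA model. An empty string is treated as present rather than missing, which is consistent with the empty-string example below producing numbers instead of NaN.

```python
import math

import numpy as np


def is_missing(value):
    # Treat None and float NaN as missing; an empty string is NOT missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def transform_with_missing(transform, texts, n_components=2):
    """Hypothetical wrapper: apply `transform` to present strings only.

    Missing entries get NaN in every output column.
    """
    out = np.full((len(texts), n_components), np.nan)
    present = [i for i, t in enumerate(texts) if not is_missing(t)]
    if present:
        out[present] = transform([texts[i] for i in present])
    return out


def toy_transform(texts):
    # Hypothetical stand-in for a fitted LSA model (NOT LSA):
    # returns two numbers per string (word count, character count).
    return np.array([[len(t.split()), len(t)] for t in texts], dtype=float)


res = transform_with_missing(toy_transform, ["the earth is round", "", np.nan, ".,/"])
print(res[2])  # [nan nan]
```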
Examples
>>> from nlp_primitives import LSA
>>> lsa = LSA()
>>> x = ["he helped her walk,", "me me me eat food", "the sentence doth long"]
>>> res = lsa(x).tolist()
>>> for i in range(len(res)): res[i] = [abs(round(v, 2)) for v in res[i]]
>>> res
[[0.01, 0.01, 0.01], [0.0, 0.0, 0.01]]
Changing the corpus to one that more closely resembles the input text produces a different, more discriminating output for the same input. The primitive also handles NaN values and strings that contain no words:
>>> import numpy as np
>>> lsa = LSA()
>>> x = ["the earth is round", "", np.nan, ".,/"]
>>> res = lsa(x).tolist()
>>> for i in range(len(res)): res[i] = [abs(round(v, 2)) for v in res[i]]
>>> res
[[0.02, 0.0, nan, 0.0], [0.02, 0.0, nan, 0.0]]
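The examples pass each value through `abs(round(x, 2))`. A likely reason for the `abs` (an inference, not stated in the source) is that singular vectors are only determined up to sign, so an SVD solver may return either orientation. This sketch uses a random matrix as a stand-in for a tf-idf matrix and shows that flipping the sign of one singular pair still reconstructs the matrix exactly, while the projected coordinates change only in sign.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 6))  # stand-in for a small, dense tf-idf matrix

u, s, vt = np.linalg.svd(a, full_matrices=False)

# Flip the sign of the first singular pair (first column of u, first row of vt).
flip = np.ones_like(s)
flip[0] = -1.0
u_flipped = u * flip
vt_flipped = vt * flip[:, None]

# Both factorizations reconstruct `a` (up to floating-point error).
print(np.allclose((u * s) @ vt, a), np.allclose((u_flipped * s) @ vt_flipped, a))  # True True

# Projections onto the components differ only in sign...
proj = a @ vt.T
proj_flipped = a @ vt_flipped.T
# ...so taking abs() yields the same values for either orientation.
print(np.allclose(np.abs(proj), np.abs(proj_flipped)))  # True
```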
Methods
- __init__()
- generate_name(base_feature_names)
- generate_names(base_feature_names)
- get_args_string()
- get_arguments()
- get_description(input_column_descriptions[, ...])
- get_filepath(filename)
- get_function()

Attributes
- base_of
- base_of_exclude
- commutative
- compatibility: Additional compatible libraries
- default_value: Default value this feature returns if no data found.
- description_template
- input_types: woodwork.ColumnSchema types of inputs
- max_stack_depth
- name: Name of the primitive
- number_output_features: Number of columns in feature matrix associated with this feature
- return_type: ColumnSchema type of return
- uses_calc_time
- uses_full_dataframe