nlp_primitives.LSA
Calculates the Latent Semantic Analysis values of NaturalLanguage input.
Given a list of strings, transforms them using tf-idf and singular value decomposition (SVD), reducing a sparse matrix to a compact one with two values for each string. These two values are the Latent Semantic Analysis representation of each string and describe its context relative to NLTK's Gutenberg corpus (https://www.nltk.org/book/ch02.html#gutenberg-corpus).
If a string is missing, return NaN.
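To make the tf-idf + SVD pipeline described above concrete, here is a minimal NumPy-only sketch. It is not the primitive's implementation: the function name `lsa_sketch`, the regex tokenizer, the smoothed idf formula and the tiny reference corpus are all hypothetical stand-ins (the real primitive uses NLTK's Gutenberg corpus, and its exact preprocessing is not documented here). The output layout, one row per component and one column per input string, mirrors the shape of the doctest results below.

```python
import re

import numpy as np


def _tokenize(text):
    # Lowercase alphabetic tokens; punctuation-only strings yield no tokens.
    return re.findall(r"[a-z]+", text.lower())


def lsa_sketch(texts, corpus, n_components=2):
    """Hypothetical LSA sketch: tf-idf fitted on `corpus`, then SVD.

    Returns an array of shape (n_components, len(texts)); columns for
    missing (non-string) inputs are NaN.
    """
    docs = [_tokenize(d) for d in corpus]
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}

    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    df = np.zeros(len(vocab))
    for d in docs:
        for w in set(d):
            df[index[w]] += 1
    idf = np.log((1 + len(docs)) / (1 + df)) + 1

    def tfidf(tokens):
        # Term counts (out-of-vocabulary words ignored), idf-weighted, L2-normalized.
        row = np.zeros(len(vocab))
        for w in tokens:
            if w in index:
                row[index[w]] += 1
        row *= idf
        norm = np.linalg.norm(row)
        return row / norm if norm > 0 else row

    # Dense tf-idf matrix of the reference corpus (documents x terms);
    # real implementations usually keep this matrix sparse.
    X = np.vstack([tfidf(d) for d in docs])
    # Rows of vt are right singular vectors; keep the top n_components.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    components = vt[:n_components]

    out = np.full((n_components, len(texts)), np.nan)
    for j, text in enumerate(texts):
        if isinstance(text, str):
            # Project the input's tf-idf vector onto the kept components.
            out[:, j] = components @ tfidf(_tokenize(text))
    return out


# Hypothetical stand-in for the reference corpus.
corpus = [
    "the earth is round and the sun is hot",
    "the moon orbits the earth",
    "a cat sat on the mat",
    "dogs chase cats",
]
res = lsa_sketch(["the earth is round", "", np.nan, ".,/"], corpus)
```

With this toy corpus, `res` has shape (2, 4): the column for the NaN input is NaN, and the empty and punctuation-only strings project to zero. scikit-learn's `TfidfVectorizer` and `TruncatedSVD` provide the same two steps off the shelf; whether the primitive uses them is not stated on this page.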
Examples
>>> lsa = LSA()
>>> x = ["he helped her walk,", "me me me eat food", "the sentence doth long"]
>>> res = lsa(x).tolist()
>>> for i in range(len(res)):
...     res[i] = [abs(round(x, 2)) for x in res[i]]
>>> res
[[0.01, 0.01, 0.01], [0.0, 0.0, 0.01]]
If the input strings more closely resemble the text of the reference corpus, the output is different and more discerning. The example below also shows that NaN values and strings without words are handled: the NaN entry yields NaN, while the empty string and the punctuation-only string yield 0.0.
>>> import numpy as np
>>> lsa = LSA()
>>> x = ["the earth is round", "", np.nan, ".,/"]
>>> res = lsa(x).tolist()
>>> for i in range(len(res)):
...     res[i] = [abs(round(x, 2)) for x in res[i]]
>>> res
[[0.02, 0.0, nan, 0.0], [0.02, 0.0, nan, 0.0]]
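The doctests above compare `abs(round(x, 2))` rather than the raw values. A plausible reason, which is an assumption and not stated in the source, is that singular vectors are only defined up to sign, so an SVD solver may return a given component flipped. The short NumPy sketch below uses a hypothetical toy term-document matrix to show that flipping one singular pair still gives a valid factorization and only flips the sign of the corresponding projection.

```python
import numpy as np

# Hypothetical small term-document matrix (rows: documents, columns: terms).
X = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [2.0, 0.0, 1.0, 1.0],
])

u, s, vt = np.linalg.svd(X, full_matrices=False)

# Flip the sign of the first singular pair; this is an equally valid SVD.
u_flip, vt_flip = u.copy(), vt.copy()
u_flip[:, 0] *= -1
vt_flip[0] *= -1

# Both factorizations reconstruct X exactly (up to floating-point error).
assert np.allclose(u @ np.diag(s) @ vt, X)
assert np.allclose(u_flip @ np.diag(s) @ vt_flip, X)

# Projections of a query onto the top two components differ only in sign
# for the flipped component, so their absolute values agree.
q = np.array([1.0, 1.0, 0.0, 0.0])
proj = vt[:2] @ q
proj_flip = vt_flip[:2] @ q
```

Taking absolute values therefore makes an output comparison independent of this sign choice.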
__init__
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(): Initialize self.
generate_name(base_feature_names)
generate_names(base_feature_names)
get_args_string()
get_arguments()
get_description(input_column_descriptions[, …])
get_filepath(filename)
get_function()
Attributes
base_of
base_of_exclude
commutative
compatibility
default_value
description_template
input_types
max_stack_depth
name
number_output_features
uses_calc_time
uses_full_entity