nlp_primitives.LSA
- class nlp_primitives.LSA(random_seed=0)
- Calculates the Latent Semantic Analysis values of NaturalLanguage input.
- Description:
- Given a list of strings, transforms those strings using tf-idf and singular value decomposition to go from a sparse matrix to a compact matrix with two values for each string. These values represent the Latent Semantic Analysis of each string and reflect its context with respect to NLTK's Gutenberg corpus (https://www.nltk.org/book/ch02.html#gutenberg-corpus).
- If a string is missing, return NaN.
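The transformation described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the primitive's actual source: the tiny `corpus` list is a hypothetical stand-in for the Gutenberg corpus, and the helper names `fit_lsa` and `lsa_values` are invented for this example.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in corpus (assumption); the primitive is described as
# using NLTK's Gutenberg corpus as its reference text.
corpus = [
    "the whale swam through the cold sea",
    "she walked along the garden path",
    "the ship sailed across the sea at dawn",
    "he helped her walk in the garden",
    "a long sentence about ships and whales",
]


def fit_lsa(corpus, random_seed=0):
    """Fit tf-idf followed by a 2-component truncated SVD on the corpus."""
    pipeline = make_pipeline(
        TfidfVectorizer(),
        TruncatedSVD(n_components=2, random_state=random_seed),
    )
    pipeline.fit(corpus)
    return pipeline


def lsa_values(pipeline, strings):
    """Return an (n, 2) array of LSA values; non-string entries become NaN."""
    items = list(strings)
    result = np.full((len(items), 2), np.nan)
    # Only real strings are passed to the pipeline; missing values stay NaN.
    valid = [i for i, s in enumerate(items) if isinstance(s, str)]
    if valid:
        result[valid] = pipeline.transform([items[i] for i in valid])
    return result


model = fit_lsa(corpus)
values = lsa_values(model, ["he helped her walk,", None, ""])
```

Each row of `values` holds the two components for one input string. An empty string has an all-zero tf-idf vector, so it maps to zeros rather than NaN, while a missing entry is left as NaN. Transposing the array gives one sequence per output feature, the layout the documented examples print.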
- Examples
  >>> lsa = LSA()
  >>> x = ["he helped her walk,", "me me me eat food", "the sentence doth long"]
  >>> res = lsa(x).tolist()
  >>> for i in range(len(res)):
  ...     res[i] = [abs(round(x, 2)) for x in res[i]]
  >>> res
  [[0.01, 0.01, 0.01], [0.0, 0.0, 0.01]]
- Now, if we change the values of the input corpus to something that better resembles the given text, the same input text will result in a different, more discerning output. NaN values are also handled, as are strings without words.
  >>> lsa = LSA()
  >>> x = ["the earth is round", "", np.NaN, ".,/"]
  >>> res = lsa(x).tolist()
  >>> for i in range(len(res)):
  ...     res[i] = [abs(round(x, 2)) for x in res[i]]
  >>> res
  [[0.02, 0.0, nan, 0.0], [0.02, 0.0, nan, 0.0]]
- Methods
  - __init__([random_seed])
  - generate_name(base_feature_names)
  - generate_names(base_feature_names)
  - get_args_string()
  - get_arguments()
  - get_description(input_column_descriptions[, ...])
  - get_filepath(filename)
  - get_function()
- Attributes
  - base_of
  - base_of_exclude
  - commutative
  - compatibility: Additional compatible libraries
  - default_value: Default value this feature returns if no data found.
  - description_template
  - input_types: woodwork.ColumnSchema types of inputs
  - max_stack_depth
  - name: Name of the primitive
  - number_output_features: Number of columns in feature matrix associated with this feature
  - return_type: ColumnSchema type of return
  - uses_calc_time
  - uses_full_dataframe