featuretools.get_recommended_primitives
- featuretools.get_recommended_primitives(entityset: EntitySet, include_time_series_primitives: bool = False, excluded_primitives: List[str] = ['count_string', 'distance_to_holiday', 'is_in_geobox', 'not_equal_scalar', 'equal_scalar', 'time_since', 'isin', 'multiply_boolean', 'numeric_lag', 'cum_count', 'cumulative_time_since_last_false', 'cumulative_time_since_last_true', 'diff', 'diff_datetime', 'is_first_occurrence', 'is_last_occurrence', 'time_since_previous', 'not', 'and', 'or', 'equal', 'not_equal']) → List[str]
Get a list of recommended primitives given an entity set.
- Description:
This function works by first getting a list of valid transform primitives that could be applied to a single-table EntitySet, withholding any primitives specified in excluded_primitives. Next, engineered features are created for the non-numeric fields and checked for non-uniqueness; if a feature is non-unique, the primitive that produced it is added to the recommendation list. Then, numeric fields are checked for skewness: depending on how skewed a column is, square_root or natural_logarithm is recommended. Lastly, if include_time_series_primitives is True, Lag is always recommended, along with all Rolling and Expanding primitives if numeric columns are present.
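The skewness step can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the featuretools implementation: the skewness estimator (adjusted Fisher-Pearson), the 0.5 and 1.0 cutoffs, the pairing of moderate skew with square_root and strong skew with natural_logarithm, and the restriction to strictly positive columns are all assumptions. `sample_skewness` and `recommend_skew_primitive` are hypothetical helpers.

```python
import math
from statistics import fmean

# Hypothetical thresholds chosen for illustration; they are not taken
# from featuretools.
MODERATE_SKEW = 0.5
HIGH_SKEW = 1.0


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # a constant column has no skew
    g1 = m3 / m2 ** 1.5
    # Small-sample correction applied to the moment-based estimate.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def recommend_skew_primitive(values):
    """Return 'square_root', 'natural_logarithm', or None for one numeric column."""
    # Assumption: only strictly positive columns are considered, since the
    # natural logarithm is undefined at or below zero.
    if len(values) < 3 or min(values) <= 0:
        return None
    skew = sample_skewness(values)
    if skew > HIGH_SKEW:
        return "natural_logarithm"  # strong right skew: stronger transform
    if skew > MODERATE_SKEW:
        return "square_root"  # moderate right skew: milder transform
    return None


print(recommend_skew_primitive([1, 2, 3, 4, 5]))  # None (symmetric column)
print(recommend_skew_primitive([1] * 9 + [100]))  # natural_logarithm
```

Mapping a single skewness number to a single primitive keeps the rule quantifiable, which matches the rationale given in the Note for recommending only skew-correcting numeric primitives.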
- Parameters:
entityset (EntitySet) – EntitySet that only contains one dataframe.
include_time_series_primitives (bool) – Whether or not time-series primitives should be considered. Defaults to False.
excluded_primitives (List[str]) – List of transform primitives to exclude from recommendations. Defaults to DEFAULT_EXCLUDED_PRIMITIVES.
Note
The main objective of this function is to recommend primitives that could provide important features to the modeling process. Non-numeric primitives mainly serve to extract information from original features that may be essentially meaningless on their own (e.g., NaturalLanguage, Datetime, LatLong), which is why they are the main focus of this function. The usefulness of numeric transform primitives depends heavily on the individual case, so it is hard to quantify mathematically which ones should be recommended. Therefore, only transform primitives that address skewed numeric columns are included, as correcting skew is a standard and quantifiable transformation step. The only exception is time-series problems: because so few primitives apply only to time series, all of them are included in the recommended primitives list.
Note
This function currently works only for single-table EntitySets and recommends only transform primitives.