NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
featuretools.encode_features
Encode categorical features.
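Encoding a categorical feature here means one-hot encoding it: each category becomes its own 0/1 column. The pure-Python sketch below shows the idea on a plain list. The helper `one_hot` is hypothetical, and ordering categories by frequency is an assumption made for illustration, not a description of Featuretools' internals.

```python
from collections import Counter


def one_hot(name, values):
    """Hypothetical helper: expand one categorical column into 0/1 indicator columns.

    Each column is named "<name> = <category>", like the encoded feature
    names in the example; categories are ordered by frequency (an assumption).
    """
    # most_common() sorts by count; equal counts keep first-seen order
    categories = [category for category, _ in Counter(values).most_common()]
    return {
        f"{name} = {category}": [int(value == category) for value in values]
        for category in categories
    }


encoded = one_hot("product_id", ["coke zero", "car", "coke zero", "toothpaste"])
for column, indicators in encoded.items():
    print(column, indicators)
```

Each input row ends up with a 1 in exactly one indicator column.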
Parameters:
feature_matrix (pd.DataFrame) – DataFrame of features.
features (list[PrimitiveBase]) – Feature definitions in feature_matrix.
top_n (int or dict[string -> int]) – Number of top values to include. If a dict[string -> int] is used, its keys are feature names and its values are the number of top values to include for each feature. If a feature’s name is not in the dictionary, a default value of 10 is used.
include_unknown (bool) – Add a feature encoding an unknown class (values not among the top values). Defaults to True.
to_encode (list[str]) – List of feature names to encode. Features not in this list are left unencoded in the output matrix. Defaults to encoding all necessary features.
inplace (bool) – Encode feature_matrix in place. Defaults to False.
drop_first (bool) – Whether to get k-1 dummies out of k categorical levels by removing the first level. Defaults to False.
verbose (bool) – Print progress info.
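The `top_n` and `include_unknown` parameters described above interact as follows for a single column. This pure-Python sketch uses hypothetical helpers `resolve_top_n` and `encode_column` to show the int-or-dict lookup of `top_n` (falling back to 10) and the extra unknown column. `drop_first` is deliberately left out, and keeping the most frequent values with first-seen tie-breaking is an assumption, not Featuretools' documented behaviour.

```python
from collections import Counter

DEFAULT_TOP_N = 10  # default stated above for names missing from a top_n dict


def resolve_top_n(top_n, feature_name):
    """Hypothetical helper: number of top values to keep for one feature."""
    if isinstance(top_n, dict):
        return top_n.get(feature_name, DEFAULT_TOP_N)
    return top_n


def encode_column(name, values, top_n=DEFAULT_TOP_N, include_unknown=True):
    """Hypothetical top-n one-hot encoder for one categorical column."""
    n = resolve_top_n(top_n, name)
    # Keep the n most frequent values (assumed ordering; ties keep first-seen order)
    top = [value for value, _ in Counter(values).most_common(n)]
    columns = {f"{name} = {v}": [int(x == v) for x in values] for v in top}
    if include_unknown:
        # One extra column flags every value that fell outside the top n
        top_set = set(top)
        columns[f"{name} is unknown"] = [int(x not in top_set) for x in values]
    return columns


products = ["coke zero", "car", "coke zero", "toothpaste", "car", "coke zero"]
print(list(encode_column("product_id", products, top_n={"product_id": 2})))
print(list(encode_column("product_id", products, top_n=2, include_unknown=False)))
```

With `top_n=2`, "toothpaste" (the least frequent value here) is no longer a column of its own and is only captured by the unknown column.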
Returns:
encoded feature_matrix, encoded features
Return type:
(pd.DataFrame, list)
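The function returns a pair: the encoded matrix and the list of encoded features. The sketch below mimics that contract under stated assumptions. A plain dict of columns stands in for the pandas DataFrame, column-name strings stand in for feature objects, and the hypothetical `categorical` argument replaces the type information Featuretools takes from its own feature definitions. It is one way to reproduce the `to_encode=['purchased']` case from the example, where every column comes back unencoded.

```python
def encode_matrix(matrix, categorical, to_encode=None):
    """Hypothetical sketch: return (encoded matrix, encoded column names).

    matrix: dict mapping column name to a list of values (stands in for a DataFrame)
    categorical: column names treated as categorical (assumed known up front)
    to_encode: names to encode; None means every categorical column
    The input dict is never modified, mirroring inplace=False.
    """
    requested = set(categorical if to_encode is None else to_encode)
    # Only categorical columns are ever expanded into indicator columns
    targets = requested & set(categorical)
    encoded = {}
    for name, values in matrix.items():
        if name in targets:
            # dict.fromkeys keeps distinct values in first-seen order
            for category in dict.fromkeys(values):
                encoded[f"{name} = {category}"] = [int(v == category) for v in values]
        else:
            # Columns that are not encoded pass through unchanged
            encoded[name] = list(values)
    return encoded, list(encoded)


feature_matrix = {
    "product_id": ["coke zero", "car", "coke zero"],
    "purchased": [True, False, True],
    "value": [0.0, 5.0, 10.0],
}
fm_encoded, f_encoded = encode_matrix(feature_matrix, categorical=["product_id"])
print(f_encoded)
fm_encoded, f_encoded = encode_matrix(
    feature_matrix, categorical=["product_id"], to_encode=["purchased"]
)
print(f_encoded)
```

Returning the names alongside the matrix lets later steps select or reuse exactly the encoded columns without re-deriving them.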
Example
In [1]: f1 = ft.Feature(es["log"]["product_id"])

In [2]: f2 = ft.Feature(es["log"]["purchased"])

In [3]: f3 = ft.Feature(es["log"]["value"])

In [4]: features = [f1, f2, f3]

In [5]: ids = [0, 1, 2, 3, 4, 5]

In [6]: feature_matrix = ft.calculate_feature_matrix(features, es,
   ...:                                              instance_ids=ids)

In [7]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features)

In [8]: f_encoded
Out[8]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [9]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features, top_n=2)

In [10]: f_encoded
Out[10]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [11]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            include_unknown=False)

In [12]: f_encoded
Out[12]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: purchased>,
 <Feature: value>]

In [13]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            to_encode=['purchased'])

In [14]: f_encoded
Out[14]: [<Feature: product_id>, <Feature: purchased>, <Feature: value>]

In [15]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            drop_first=True)

In [16]: f_encoded
Out[16]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]