NOTICE

The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:

pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip

For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.


featuretools.encode_features

featuretools.encode_features(feature_matrix, features, top_n=10, include_unknown=True, to_encode=None, inplace=False, drop_first=False, verbose=False)

Encode categorical features

Parameters
  • feature_matrix (pd.DataFrame) – Dataframe of features.

  • features (list[PrimitiveBase]) – Feature definitions in feature_matrix.

  • top_n (int or dict[string -> int]) – Number of top values to include. If a dict[string -> int] is used, each key is a feature name and each value is the number of top values to include for that feature. If a feature’s name is not in the dictionary, a default value of 10 is used.

  • include_unknown (bool) – Add a feature encoding an unknown class. Defaults to True.

  • to_encode (list[str]) – List of feature names to encode. Features not in this list are left unencoded in the output matrix. Defaults to encoding all necessary features.

  • inplace (bool) – Encode feature_matrix in place. Defaults to False.

  • drop_first (bool) – Whether to get k-1 dummies out of k categorical levels by removing the first level. Defaults to False.

  • verbose (bool) – Print progress info. Defaults to False.

Returns

encoded feature_matrix, encoded features

Return type

(pd.DataFrame, list)
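
The following is a minimal pure-Python sketch of the idea behind top_n and include_unknown as described in the parameters above. It is not the Featuretools implementation: the helper encode_top_n, the list-of-dicts input, and the sample rows are all hypothetical and exist only for illustration. It keeps the top_n most frequent values of one column as indicator columns, optionally adds an "is unknown" column for every other value, and returns the encoded rows together with the new column names.

```python
from collections import Counter

DEFAULT_TOP_N = 10  # default stated in the top_n parameter description


def encode_top_n(rows, column, top_n=DEFAULT_TOP_N, include_unknown=True):
    """Hypothetical helper: one-hot encode `column` in a list of dict rows.

    Returns (encoded_rows, new_column_names).
    """
    # top_n may be a single int or a dict mapping column name -> int.
    if isinstance(top_n, dict):
        n = top_n.get(column, DEFAULT_TOP_N)
    else:
        n = top_n

    # Pick the n most frequent values of the column.
    counts = Counter(row[column] for row in rows)
    top_values = [value for value, _ in counts.most_common(n)]

    new_columns = [f"{column} = {value}" for value in top_values]
    if include_unknown:
        new_columns.append(f"{column} is unknown")

    encoded_rows = []
    for row in rows:
        # Copy every column except the one being encoded.
        encoded = {k: v for k, v in row.items() if k != column}
        for value in top_values:
            encoded[f"{column} = {value}"] = row[column] == value
        if include_unknown:
            # Any value outside the top n falls into the unknown class.
            encoded[f"{column} is unknown"] = row[column] not in top_values
        encoded_rows.append(encoded)
    return encoded_rows, new_columns


# Made-up sample data, chosen so that no two values have the same count.
rows = [
    {"product_id": "coke zero", "value": 0.0},
    {"product_id": "coke zero", "value": 5.0},
    {"product_id": "coke zero", "value": 10.0},
    {"product_id": "car", "value": 15.0},
    {"product_id": "car", "value": 20.0},
    {"product_id": "toothpaste", "value": 0.0},
]

encoded, names = encode_top_n(rows, "product_id", top_n=2)
print(names)
# ['product_id = coke zero', 'product_id = car', 'product_id is unknown']
```

With top_n=2, the least frequent value ("toothpaste") is not given its own column and is instead flagged by the unknown column. Passing include_unknown=False in this sketch simply omits that column.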

Example

In [1]: f1 = ft.Feature(es["log"]["product_id"])

In [2]: f2 = ft.Feature(es["log"]["purchased"])

In [3]: f3 = ft.Feature(es["log"]["value"])

In [4]: features = [f1, f2, f3]

In [5]: ids = [0, 1, 2, 3, 4, 5]

In [6]: feature_matrix = ft.calculate_feature_matrix(features, es,
   ...:                                              instance_ids=ids)
   ...: 

In [7]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features)
   ...: 

In [8]: f_encoded
Out[8]: 
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [9]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features, top_n=2)
   ...: 

In [10]: f_encoded
Out[10]: 
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [11]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            include_unknown=False)
   ....: 

In [12]: f_encoded
Out[12]: 
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: purchased>,
 <Feature: value>]

In [13]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            to_encode=['purchased'])
   ....: 

In [14]: f_encoded
Out[14]: [<Feature: product_id>, <Feature: purchased>, <Feature: value>]

In [15]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            drop_first=True)
   ....: 

In [16]: f_encoded
Out[16]: 
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]