featuretools.encode_features
- featuretools.encode_features(feature_matrix, features, top_n=10, include_unknown=True, to_encode=None, inplace=False, drop_first=False, verbose=False)
- Encode categorical features.
- Parameters
- feature_matrix (pd.DataFrame) – Dataframe of features. 
- features (list[PrimitiveBase]) – Feature definitions in feature_matrix. 
- top_n (int or dict[string -> int]) – Number of top values to include. If a dict[string -> int] is used, each key is a feature name and each value is the number of top values to include for that feature. If a feature's name is not in the dictionary, a default value of 10 is used. 
- include_unknown (bool) – Add a feature encoding an unknown class. Defaults to True. 
- to_encode (list[str]) – List of feature names to encode. Features not in this list are left unencoded in the output matrix. Defaults to encoding all necessary features. 
- inplace (bool) – Encode feature_matrix in place. Defaults to False. 
- drop_first (bool) – Whether to get k-1 dummies out of k categorical levels by removing the first level. Defaults to False. 
- verbose (bool) – Print progress info. Defaults to False. 
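
The effect of top_n and include_unknown described above can be illustrated with a small pure-Python sketch. This is not the featuretools implementation: the helpers `resolve_top_n` and `one_hot_top_n` are hypothetical names introduced here, and the sketch only assumes that "top values" means the most frequent values and that the unknown column flags every value outside the kept set.

```python
from collections import Counter

DEFAULT_TOP_N = 10  # documented fallback when a feature is missing from the dict


def resolve_top_n(top_n, feature_name):
    """Return how many top values to keep for one feature (hypothetical helper)."""
    if isinstance(top_n, dict):
        return top_n.get(feature_name, DEFAULT_TOP_N)
    return top_n


def one_hot_top_n(values, feature_name, top_n=10, include_unknown=True):
    """Encode a list of categorical values as a dict of 0/1 columns (illustrative only)."""
    n = resolve_top_n(top_n, feature_name)
    # Keep the n most frequent values, most frequent first.
    kept = [value for value, _ in Counter(values).most_common(n)]
    columns = {
        f"{feature_name} = {value}": [int(v == value) for v in values]
        for value in kept
    }
    if include_unknown:
        # Assumption: "unknown" marks every value that is not among the kept ones.
        columns[f"{feature_name} is unknown"] = [int(v not in kept) for v in values]
    return columns


values = ["coke zero"] * 3 + ["car"] * 2 + ["toothpaste"]
columns = one_hot_top_n(values, "product_id", top_n={"product_id": 2})
print(list(columns))
# ['product_id = coke zero', 'product_id = car', 'product_id is unknown']
```

With `top_n={"product_id": 2}` only the two most frequent values get their own column, and "toothpaste" falls into the unknown column. A feature that is absent from the dictionary falls back to the default of 10.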
 
- Returns
- encoded feature_matrix, encoded features 
- Return type
- (pd.DataFrame, list) 
- Example
    # Assumes `import featuretools as ft` and an EntitySet `es` that has a "log" dataframe.
    In [1]: f1 = ft.Feature(es["log"].ww["product_id"])

    In [2]: f2 = ft.Feature(es["log"].ww["purchased"])

    In [3]: f3 = ft.Feature(es["log"].ww["value"])

    In [4]: features = [f1, f2, f3]

    In [5]: ids = [0, 1, 2, 3, 4, 5]

    In [6]: feature_matrix = ft.calculate_feature_matrix(features, es,
       ...:                                              instance_ids=ids)

    In [7]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
       ...:                                            features)

    In [8]: f_encoded
    Out[8]:
    [<Feature: product_id = coke zero>,
     <Feature: product_id = car>,
     <Feature: product_id = toothpaste>,
     <Feature: product_id is unknown>,
     <Feature: purchased>,
     <Feature: value>]

    In [9]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
       ...:                                            features, top_n=2)

    In [10]: f_encoded
    Out[10]:
    [<Feature: product_id = coke zero>,
     <Feature: product_id = car>,
     <Feature: product_id is unknown>,
     <Feature: purchased>,
     <Feature: value>]

    In [11]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
       ....:                                            include_unknown=False)

    In [12]: f_encoded
    Out[12]:
    [<Feature: product_id = coke zero>,
     <Feature: product_id = car>,
     <Feature: product_id = toothpaste>,
     <Feature: purchased>,
     <Feature: value>]

    In [13]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
       ....:                                            to_encode=['purchased'])

    In [14]: f_encoded
    Out[14]: [<Feature: product_id>, <Feature: purchased>, <Feature: value>]

    In [15]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
       ....:                                            drop_first=True)

    In [16]: f_encoded
    Out[16]:
    [<Feature: product_id = coke zero>,
     <Feature: product_id = car>,
     <Feature: product_id is unknown>,
     <Feature: purchased>,
     <Feature: value>]