featuretools.wrappers.DFSTransformer#

class featuretools.wrappers.DFSTransformer(target_dataframe_name=None, agg_primitives=None, trans_primitives=None, allowed_paths=None, max_depth=2, ignore_dataframes=None, ignore_columns=None, seed_features=None, drop_contains=None, drop_exact=None, where_primitives=None, max_features=-1, verbose=False)[source]#

Transformer that exposes Deep Feature Synthesis (DFS) through the scikit-learn interface, for use in a Pipeline.

__init__(target_dataframe_name=None, agg_primitives=None, trans_primitives=None, allowed_paths=None, max_depth=2, ignore_dataframes=None, ignore_columns=None, seed_features=None, drop_contains=None, drop_exact=None, where_primitives=None, max_features=-1, verbose=False)[source]#

Creates Transformer

Parameters:
  • target_dataframe_name (str) – Name of dataframe on which to make predictions.

  • agg_primitives (list[str or AggregationPrimitive], optional) –

    List of Aggregation Feature types to apply.

    Default: ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]

  • trans_primitives (list[str or TransformPrimitive], optional) –

    List of Transform Feature functions to apply.

    Default: ["day", "year", "month", "weekday", "haversine", "num_words", "num_characters"]

  • allowed_paths (list[list[str]]) – Allowed dataframe paths on which to make features.

  • max_depth (int) – Maximum allowed depth of features.

  • ignore_dataframes (list[str], optional) – List of dataframes to blacklist when creating features.

  • ignore_columns (dict[str -> list[str]], optional) – Mapping from dataframe name to the list of columns in that dataframe to blacklist when creating features.

  • seed_features (list[FeatureBase]) – List of manually defined features to use.

  • drop_contains (list[str], optional) – Drop features whose names contain any of these strings.

  • drop_exact (list[str], optional) – Drop features whose names exactly match any of these strings.

  • where_primitives (list[str or PrimitiveBase], optional) –

    List of primitive names (or types) to apply with where clauses.

    Default: ["count"]

  • max_features (int, optional) – Cap the number of generated features to this number. If -1, no limit.
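
The name-based filters and the feature cap are easy to confuse, so here is a minimal sketch of how they could interact. It uses a hypothetical helper, filter_feature_names, which is not part of featuretools, and the feature names are illustrative. The matching rules are assumed from the parameter descriptions above, and keeping the first max_features names is an assumption made only for this sketch.

```python
def filter_feature_names(names, drop_contains=None, drop_exact=None, max_features=-1):
    """Hypothetical helper (not a featuretools function).

    Mimics the documented meaning of drop_contains, drop_exact and
    max_features on a plain list of feature names.
    """
    drop_contains = drop_contains or []
    drop_exact = set(drop_exact or [])

    kept = []
    for name in names:
        # drop_exact: the whole name must match one of the strings
        if name in drop_exact:
            continue
        # drop_contains: any of the strings appearing inside the name is enough
        if any(fragment in name for fragment in drop_contains):
            continue
        kept.append(name)

    # max_features == -1 means "no limit"; otherwise cap the list.
    # Assumption: the sketch simply keeps the first max_features names.
    if max_features != -1:
        kept = kept[:max_features]
    return kept


# Illustrative feature names only
names = [
    "zip_code",
    "COUNT(sessions)",
    "SUM(transactions.amount)",
    "MEAN(transactions.amount)",
]

kept = filter_feature_names(
    names,
    drop_contains=["MEAN("],
    drop_exact=["zip_code"],
    max_features=2,
)
print(kept)
```

In this sketch, "zip_code" is removed by the exact match, "MEAN(transactions.amount)" is removed by the substring match, and the cap of 2 leaves the remaining two names unchanged.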

Example

In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: from featuretools.wrappers import DFSTransformer

In [4]: from sklearn.pipeline import Pipeline

In [5]: from sklearn.ensemble import ExtraTreesClassifier

# Get example data
In [6]: train_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=3)

In [7]: test_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=2)

In [8]: y = [True, False, True]

# Build pipeline
In [9]: pipeline = Pipeline(steps=[
   ...:     ('ft', DFSTransformer(target_dataframe_name="customers",
   ...:                           max_features=2)),
   ...:     ('et', ExtraTreesClassifier(n_estimators=100))
   ...: ])
   ...: 

# Fit and predict
In [10]: pipeline.fit(X=train_es, y=y) # fit on customers in training entityset
Out[10]: 
Pipeline(steps=[('ft',
                 <featuretools_sklearn_transformer.transformer.DFSTransformer object at 0x7f56823306d0>),
                ('et', ExtraTreesClassifier())])

In [11]: pipeline.predict_proba(test_es) # predict probability of each class on test entityset
Out[11]: 
array([[0., 1.],
       [0., 1.]])

In [12]: pipeline.predict(test_es) # predict on test entityset
Out[12]: array([ True,  True])

# Same as above, but using cutoff times
In [13]: train_ct = pd.DataFrame()

In [14]: train_ct['customer_id'] = [1, 2, 3]

In [15]: train_ct['time'] = pd.to_datetime(['2014-1-1 04:00',
   ....:                                    '2014-1-2 17:20',
   ....:                                    '2014-1-4 09:53'])
   ....: 

In [16]: pipeline.fit(X=(train_es, train_ct), y=y)
Out[16]: 
Pipeline(steps=[('ft',
                 <featuretools_sklearn_transformer.transformer.DFSTransformer object at 0x7f56823306d0>),
                ('et', ExtraTreesClassifier())])

In [17]: test_ct = pd.DataFrame()

In [18]: test_ct['customer_id'] = [1, 2]

In [19]: test_ct['time'] = pd.to_datetime(['2014-1-4 13:48',
   ....:                                   '2014-1-5 15:32'])
   ....: 

In [20]: pipeline.predict_proba((test_es, test_ct))
Out[20]: 
array([[1., 0.],
       [1., 0.]])

In [21]: pipeline.predict((test_es, test_ct))
Out[21]: array([False, False])
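
The example passes X in two shapes: a bare EntitySet, or an (EntitySet, cutoff times) tuple. Below is a minimal sketch of how a wrapper could tell the two apart. It assumes a hypothetical helper, split_pipeline_input, which is not part of featuretools, and it uses plain Python objects as stand-ins for the EntitySet and the cutoff-time DataFrame.

```python
def split_pipeline_input(X):
    """Hypothetical helper (not a featuretools function).

    Returns (entityset, cutoff_time), where cutoff_time is None
    when X is a bare EntitySet.
    """
    if isinstance(X, tuple):
        if len(X) != 2:
            raise ValueError("expected a tuple of (entityset, cutoff_time)")
        entityset, cutoff_time = X
        return entityset, cutoff_time
    # Anything that is not a tuple is treated as the EntitySet itself
    return X, None


# Stand-ins for illustration only; real code would pass an EntitySet
# and a pandas DataFrame of cutoff times.
fake_entityset = object()
fake_cutoffs = [
    {"customer_id": 1, "time": "2014-01-04 13:48"},
    {"customer_id": 2, "time": "2014-01-05 15:32"},
]

es_only = split_pipeline_input(fake_entityset)
es_and_cutoffs = split_pipeline_input((fake_entityset, fake_cutoffs))
print(es_only[1] is None, es_and_cutoffs[1] is fake_cutoffs)
```

Packing both objects into one tuple keeps the call compatible with scikit-learn's single-X convention for fit, predict and transform.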

Methods

__init__([target_dataframe_name, ...])

Creates Transformer

fit(X[, y])

Wrapper for DFS

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

set_output(*[, transform])

Set output container.

transform(X)

Wrapper for calculate_feature_matrix