featuretools.wrappers.DFSTransformer
Transformer that exposes Deep Feature Synthesis (DFS) through the Scikit-Learn interface, so it can be used as a step in a Pipeline.
__init__
Creates the transformer.
target_entity (str) – Entity id of entity on which to make predictions.
agg_primitives (list[str or AggregationPrimitive], optional) –
List of Aggregation Feature types to apply.
Default: ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
trans_primitives (list[str or TransformPrimitive], optional) –
List of Transform Feature functions to apply.
Default: ["day", "year", "month", "weekday", "haversine", "num_words", "num_characters"]
allowed_paths (list[list[str]]) – Allowed entity paths on which to make features.
max_depth (int) – Maximum allowed depth of features.
ignore_entities (list[str], optional) – List of entities to blacklist when creating features.
ignore_variables (dict[str -> list[str]], optional) – List of specific variables within each entity to blacklist when creating features.
seed_features (list[FeatureBase]) – List of manually defined features to use.
drop_contains (list[str], optional) – Drop features whose names contain any of these strings.
drop_exact (list[str], optional) – Drop features that exactly match these strings in name.
where_primitives (list[str or PrimitiveBase], optional) –
List of primitive names (or types) to apply with where clauses.
Default: ["count"]
max_features (int, optional) – Cap the number of generated features to this number. If -1, no limit.
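The name-based options above (drop_contains, drop_exact, max_features) can be pictured as a filter over generated feature names. The following is a minimal sketch in plain Python, not the library's implementation: the helper filter_feature_names and the sample names are hypothetical, and applying the cap after the drops is an assumption.

```python
# Illustrative only: mimics the documented meaning of drop_contains,
# drop_exact and max_features on a plain list of feature names.
# filter_feature_names is a hypothetical helper, not a featuretools function.
def filter_feature_names(names, drop_contains=None, drop_exact=None, max_features=-1):
    drop_contains = drop_contains or []
    drop_exact = set(drop_exact or [])
    kept = [
        name
        for name in names
        if name not in drop_exact
        and not any(fragment in name for fragment in drop_contains)
    ]
    # -1 means "no limit", as in the parameter description.
    # Assumption: the cap is applied after the name-based drops.
    if max_features != -1:
        kept = kept[:max_features]
    return kept


names = [
    "COUNT(sessions)",
    "MODE(sessions.device)",
    "SUM(transactions.amount)",
    "zip_code",
]
print(filter_feature_names(names, drop_contains=["MODE"], drop_exact=["zip_code"], max_features=1))
```

Here drop_contains matches on substrings while drop_exact requires the whole name to match, which is the distinction the two parameter descriptions draw.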
Example
In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: from featuretools.wrappers import DFSTransformer

In [4]: from sklearn.pipeline import Pipeline

In [5]: from sklearn.ensemble import ExtraTreesClassifier

# Get example data
In [6]: train_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=3)

In [7]: test_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=2)

In [8]: y = [True, False, True]

# Build pipeline
In [9]: pipeline = Pipeline(steps=[
   ...:     ('ft', DFSTransformer(target_entity="customers",
   ...:                           max_features=2)),
   ...:     ('et', ExtraTreesClassifier(n_estimators=100))
   ...: ])
   ...:

# Fit and predict
In [10]: pipeline.fit(X=train_es, y=y)  # fit on customers in training entityset
Out[10]:
Pipeline(steps=[('ft',
                 <featuretools_sklearn_transformer.transformer.DFSTransformer object at 0x7f31aa88e190>),
                ('et', ExtraTreesClassifier())])

In [11]: pipeline.predict_proba(test_es)  # predict probability of each class on test entityset
Out[11]:
array([[0., 1.],
       [0., 1.]])

In [12]: pipeline.predict(test_es)  # predict on test entityset
Out[12]: array([ True,  True])

# Same as above, but using cutoff times
In [13]: train_ct = pd.DataFrame()

In [14]: train_ct['customer_id'] = [1, 2, 3]

In [15]: train_ct['time'] = pd.to_datetime(['2014-1-1 04:00',
   ....:                                    '2014-1-2 17:20',
   ....:                                    '2014-1-4 09:53'])
   ....:

In [16]: pipeline.fit(X=(train_es, train_ct), y=y)
Out[16]:
Pipeline(steps=[('ft',
                 <featuretools_sklearn_transformer.transformer.DFSTransformer object at 0x7f31aa88e190>),
                ('et', ExtraTreesClassifier())])

In [17]: test_ct = pd.DataFrame()

In [18]: test_ct['customer_id'] = [1, 2]

In [19]: test_ct['time'] = pd.to_datetime(['2014-1-4 13:48',
   ....:                                   '2014-1-5 15:32'])
   ....:

In [20]: pipeline.predict_proba((test_es, test_ct))
Out[20]:
array([[1., 0.],
       [1., 0.]])

In [21]: pipeline.predict((test_es, test_ct))
Out[21]: array([False, False])
Methods
__init__([target_entity, agg_primitives, …]) – Creates Transformer
fit(X[, y]) – Wrapper for DFS
fit_transform(X[, y]) – Fit to data, then transform it.
get_params([deep])
transform(X) – Wrapper for calculate_feature_matrix
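The split between fit (feature definition) and transform (feature calculation), and the two accepted forms of X seen in the example (an entityset alone, or an (entityset, cutoff_time) tuple), can be sketched with stand-in callables. This is a hedged illustration of the pattern only: SketchTransformer, unpack_input, and the toy define/compute functions are hypothetical names and do not reproduce DFSTransformer's code; in particular, this sketch ignores cutoff times during fit, which may differ from the real behavior.

```python
def unpack_input(X):
    """Split X into (entityset, cutoff_time); cutoff_time is None if absent."""
    if isinstance(X, tuple):
        entityset, cutoff_time = X
        return entityset, cutoff_time
    return X, None


class SketchTransformer:
    """Hypothetical stand-in showing the fit/transform split."""

    def __init__(self, define_features, compute_matrix):
        self.define_features = define_features  # plays the role of DFS
        self.compute_matrix = compute_matrix    # plays the role of calculate_feature_matrix
        self.feature_defs = None

    def fit(self, X, y=None):
        # Assumption: only the entityset is needed to define features here.
        entityset, _cutoff_time = unpack_input(X)
        self.feature_defs = self.define_features(entityset)
        return self

    def transform(self, X):
        entityset, cutoff_time = unpack_input(X)
        return self.compute_matrix(self.feature_defs, entityset, cutoff_time)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Toy stand-ins: an "entityset" is a dict of column names and rows.
def define(entityset):
    return sorted(entityset["columns"])


def compute(feature_defs, entityset, cutoff_time):
    return [[row[name] for name in feature_defs] for row in entityset["rows"]]


train = {"columns": ["b", "a"], "rows": [{"a": 1, "b": 2}]}
test = {"columns": ["a", "b", "c"], "rows": [{"a": 5, "b": 6, "c": 7}]}

transformer = SketchTransformer(define, compute).fit(train)
print(transformer.transform((test, None)))
```

The point of the pattern is that the feature definitions are fixed at fit time, so the test data is described by the same columns as the training data even when it contains extra ones.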