featuretools.wrappers.DFSTransformer
class featuretools.wrappers.DFSTransformer(entities=None, relationships=None, entityset=None, target_entity=None, agg_primitives=None, trans_primitives=None, allowed_paths=None, max_depth=2, ignore_entities=None, ignore_variables=None, seed_features=None, drop_contains=None, drop_exact=None, where_primitives=None, max_features=-1, verbose=False)
Transformer that exposes the scikit-learn interface, for use in a Pipeline.
__init__(entities=None, relationships=None, entityset=None, target_entity=None, agg_primitives=None, trans_primitives=None, allowed_paths=None, max_depth=2, ignore_entities=None, ignore_variables=None, seed_features=None, drop_contains=None, drop_exact=None, where_primitives=None, max_features=-1, verbose=False)
Creates the transformer.
Parameters
entities (dict[str -> tuple(pd.DataFrame, str, str)]) – Dictionary of entities. Entries take the format {entity id -> (dataframe, id column, (time_column))}.
relationships (list[(str, str, str, str)]) – List of relationships between entities. List items are a tuple with the format (parent entity id, parent variable, child entity id, child variable).
entityset (EntitySet) – An already initialized entityset. Required if entities and relationships are not defined.
target_entity (str) – Entity id of entity on which to make predictions.
agg_primitives (list[str or AggregationPrimitive], optional) – List of Aggregation Feature types to apply. Default: ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
trans_primitives (list[str or TransformPrimitive], optional) – List of Transform Feature functions to apply. Default: ["day", "year", "month", "weekday", "haversine", "num_words", "num_characters"]
allowed_paths (list[list[str]]) – Allowed entity paths on which to make features.
max_depth (int) – Maximum allowed depth of features.
ignore_entities (list[str], optional) – List of entities to blacklist when creating features.
ignore_variables (dict[str -> list[str]], optional) – List of specific variables within each entity to blacklist when creating features.
seed_features (list[FeatureBase]) – List of manually defined features to use.
drop_contains (list[str], optional) – Drop features that contain these strings in name.
drop_exact (list[str], optional) – Drop features that exactly match these strings in name.
where_primitives (list[str or PrimitiveBase], optional) – List of primitive names (or types) to apply with where clauses. Default: ["count"]
max_features (int, optional) – Cap the number of generated features to this number. If -1, no limit.
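The difference between drop_contains (substring match), drop_exact (whole-name match) and the max_features cap can be illustrated with a small pure-Python sketch. The helper `filter_feature_names` and the feature names below are hypothetical and only mimic the documented semantics; this is not the featuretools implementation, and which features survive the cap in the real library is not specified here (the sketch simply keeps the first ones).

```python
def filter_feature_names(names, drop_contains=None, drop_exact=None, max_features=-1):
    """Hypothetical helper mimicking the documented filtering semantics."""
    drop_contains = drop_contains or []
    drop_exact = set(drop_exact or [])
    kept = []
    for name in names:
        # drop_exact: the whole feature name must match
        if name in drop_exact:
            continue
        # drop_contains: any listed string appearing inside the name
        if any(fragment in name for fragment in drop_contains):
            continue
        kept.append(name)
    # max_features: -1 means no limit; otherwise cap the list
    # (assumption: truncation keeps the first features in order)
    if max_features != -1:
        kept = kept[:max_features]
    return kept


# Made-up feature names, for illustration only
names = [
    "SUM(transactions.amount)",
    "MEAN(transactions.amount)",
    "zip_code",
    "COUNT(sessions)",
    "MODE(sessions.device)",
]

without_sessions = filter_feature_names(
    names, drop_contains=["sessions"], drop_exact=["zip_code"]
)
capped = filter_feature_names(names, max_features=3)
print(without_sessions)
print(capped)
```

Note that drop_exact=["sessions"] would remove nothing in this sketch, because no feature is named exactly "sessions", whereas drop_contains=["sessions"] removes both session-derived names.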
Example
In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: from featuretools.wrappers import DFSTransformer

In [4]: from sklearn.pipeline import Pipeline

In [5]: from sklearn.ensemble import ExtraTreesClassifier

# Get example data
In [6]: n_customers = 3

In [7]: es = ft.demo.load_mock_customer(return_entityset=True, n_customers=5)

In [8]: y = [True, False, True]

# Build dataset
In [9]: pipeline = Pipeline(steps=[
   ...:     ('ft', DFSTransformer(entityset=es,
   ...:                           target_entity="customers",
   ...:                           max_features=3)),
   ...:     ('et', ExtraTreesClassifier(n_estimators=100))
   ...: ])
   ...:

# Fit and predict
In [10]: pipeline.fit([1, 2, 3], y=y) # fit on first 3 customers
Out[10]:
Pipeline(memory=None,
         steps=[('ft', <featuretools_sklearn_transformer.transformer.DFSTransformer object at 0x7f9d98b51c88>),
                ('et', ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                                            criterion='gini', max_depth=None, max_features='auto',
                                            max_leaf_nodes=None, max_samples=None,
                                            min_impurity_decrease=0.0, min_impurity_split=None,
                                            min_samples_leaf=1, min_samples_split=2,
                                            min_weight_fraction_leaf=0.0, n_estimators=100,
                                            n_jobs=None, oob_score=False, random_state=None,
                                            verbose=0, warm_start=False))],
         verbose=False)

In [11]: pipeline.predict_proba([4,5]) # predict probability of each class on last 2
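The example above passes a ready-made entityset. As an alternative, the entities and relationships parameters take plain Python structures in the formats described under Parameters. The following is a minimal sketch of those structures using pandas only; the table contents, column names, and the `check_relationships` helper are made up for illustration and are not part of featuretools.

```python
import pandas as pd

# Hypothetical parent table: one row per customer
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "join_date": pd.to_datetime(["2020-01-01", "2020-02-15", "2020-03-10"]),
})

# Hypothetical child table: several sessions per customer
sessions = pd.DataFrame({
    "session_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
    "session_start": pd.to_datetime(
        ["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-04"]
    ),
})

# Format: {entity id -> (dataframe, id column, (time_column))}
entities = {
    "customers": (customers, "customer_id", "join_date"),
    "sessions": (sessions, "session_id", "session_start"),
}

# Format: (parent entity id, parent variable, child entity id, child variable)
relationships = [("customers", "customer_id", "sessions", "customer_id")]


def check_relationships(entities, relationships):
    """Hypothetical sanity check: every child key must exist in its parent."""
    for parent_id, parent_var, child_id, child_var in relationships:
        parent_df = entities[parent_id][0]
        child_df = entities[child_id][0]
        missing = set(child_df[child_var]) - set(parent_df[parent_var])
        if missing:
            raise ValueError(f"{child_id}.{child_var} has unknown keys: {missing}")
    return True


relationships_ok = check_relationships(entities, relationships)
print(relationships_ok)

# Per the signature above, these would then be passed as:
# DFSTransformer(entities=entities, relationships=relationships,
#                target_entity="customers")
```

Checking the keys before building the transformer is a design choice of this sketch, intended to surface mismatched ids early; it is not something the documented API requires.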