NOTICE

The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:

pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip

For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.

featuretools.EntitySet.add_dataframe

EntitySet.add_dataframe(dataframe, dataframe_name=None, index=None, logical_types=None, semantic_tags=None, make_index=False, time_index=None, secondary_time_index=None, already_sorted=False)

Add a DataFrame to the EntitySet with Woodwork typing information.

Parameters

dataframe (pandas.DataFrame) – Dataframe containing the data.
dataframe_name (str, optional) – Unique name to associate with this dataframe. Must be provided if Woodwork is not initialized on the input DataFrame.
index (str, optional) – Name of the column used to index the dataframe. Must be unique. If None, take the first column.
logical_types (dict[str -> Woodwork.LogicalTypes/str], optional) – Keys are column names and values are logical types. Logical types are inferred for any column not specified.
semantic_tags (dict[str -> str/set], optional) – Keys are column names and values are semantic tags.
make_index (bool, optional) – If True, assume index does not exist as a column in dataframe, and create a new column of that name using integers. Otherwise, assume index exists.
time_index (str, optional) – Name of the column containing time data. Type must be numeric or datetime in nature.
secondary_time_index (dict[str -> list[str]], optional) – Maps the name of a column containing time data, to be used as a secondary time index, to a list of the columns in the dataframe associated with that secondary time index.
already_sorted (bool, optional) – If True, assumes that input dataframe is already sorted by time. Defaults to False.
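
The dict-valued parameters above can be easier to read with a concrete shape in front of you. The following is a minimal sketch in plain Python, not part of the Featuretools API: the column names, the semantic tags, and the helper referenced_columns are all hypothetical and chosen only for illustration. It builds keyword arguments matching the parameter descriptions and checks that every column they mention exists.

```python
# Hypothetical columns of a transactions dataframe (names are illustrative only).
columns = ["id", "session_id", "amount", "transaction_time", "settled_time", "fraud"]

# Keyword arguments shaped after the parameter descriptions above.
add_dataframe_kwargs = {
    "dataframe_name": "transactions",
    "index": "id",
    "time_index": "transaction_time",
    # column name -> logical type, given here by name as a string
    "logical_types": {"amount": "Double", "fraud": "Boolean"},
    # column name -> one tag (str) or several tags (set); these tags are made up
    "semantic_tags": {"amount": "currency", "session_id": {"session", "grouping"}},
    # secondary time column -> list of columns associated with it
    "secondary_time_index": {"settled_time": ["fraud"]},
}


def referenced_columns(kwargs):
    """Collect every column name that the keyword arguments refer to."""
    names = {kwargs["index"], kwargs["time_index"]}
    names.update(kwargs["logical_types"])   # dict keys are column names
    names.update(kwargs["semantic_tags"])   # dict keys are column names
    for time_column, associated in kwargs["secondary_time_index"].items():
        names.add(time_column)
        names.update(associated)
    return names


# Columns mentioned in the arguments but absent from the dataframe.
missing = referenced_columns(add_dataframe_kwargs) - set(columns)
print(sorted(missing))  # prints []
```

Given a pandas DataFrame df with these columns, the arguments could then be passed as es.add_dataframe(dataframe=df, **add_dataframe_kwargs).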

Notes

Logical types that are not specified in logical_types are inferred from the data.

Example

In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: transactions_df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
   ...:                                 "session_id": [1, 2, 1, 3, 4, 5],
   ...:                                 "amount": [100.40, 20.63, 33.32, 13.12, 67.22, 1.00],
   ...:                                 "transaction_time": pd.date_range(start="10:00", periods=6, freq="10s"),
   ...:                                 "fraud": [True, False, True, False, True, True]})
   ...: 

In [4]: es = ft.EntitySet("example")

In [5]: es.add_dataframe(dataframe_name="transactions",
   ...:                  index="id",
   ...:                  time_index="transaction_time",
   ...:                  dataframe=transactions_df)
   ...: 
Out[5]: 
Entityset: example
  DataFrames:
    transactions [Rows: 6, Columns: 5]
  Relationships:
    No relationships

In [6]: es["transactions"]
Out[6]: 
   id  session_id  amount    transaction_time  fraud
1   1           1  100.40 2021-09-17 10:00:00   True
2   2           2   20.63 2021-09-17 10:00:10  False
3   3           1   33.32 2021-09-17 10:00:20   True
4   4           3   13.12 2021-09-17 10:00:30  False
5   5           4   67.22 2021-09-17 10:00:40   True
6   6           5    1.00 2021-09-17 10:00:50   True
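
The example above relies on an existing id column. When a dataframe has no suitable index column, make_index=True asks Featuretools to create one from integers. A minimal sketch of that call, assuming a hypothetical wrapper add_with_generated_index and a hypothetical index name row_id; the up-front check is this sketch's own guard and is not documented library behavior.

```python
def add_with_generated_index(es, dataframe, dataframe_name, index_name="row_id"):
    """Add `dataframe` to the EntitySet `es` with a newly created integer index.

    Hypothetical helper: `index_name` must not already be a column, because
    make_index=True assumes the index does not yet exist in the dataframe.
    """
    if index_name in dataframe.columns:
        raise ValueError(
            f"column {index_name!r} already exists; use it with make_index=False"
        )
    # Only parameters listed in the signature above are used here.
    return es.add_dataframe(
        dataframe=dataframe,
        dataframe_name=dataframe_name,
        index=index_name,
        make_index=True,
    )
```

With the objects from the example, a call such as add_with_generated_index(es, transactions_df.drop(columns="id"), "transactions_no_id") would add a second dataframe whose index column is created by Featuretools rather than taken from the data.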
