NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
EntitySet.entity_from_dataframe
Load the data for a specified entity from a Pandas DataFrame.
entity_id (str) – Unique id to associate with this entity.
dataframe (pandas.DataFrame) – Dataframe containing the data.
index (str, optional) – Name of the variable used to index the entity. If None, take the first column.
variable_types (dict[str -> Variable/str], optional) – Keys are variable ids and values are variable types or type_strings. Used to initialize an entity’s store.
make_index (bool, optional) – If True, assume index does not exist as a column in dataframe, and create a new column of that name using integers. Otherwise, assume index exists.
time_index (str, optional) – Name of the variable containing time data. Type must be in variables.DateTime or be castable to datetime (e.g. str, float, or numeric).
secondary_time_index (dict[str -> Variable]) – Name of variable containing time data to use as a second time index for the entity.
already_sorted (bool, optional) – If True, assumes that input dataframe is already sorted by time. Defaults to False.
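The effect described for index, make_index, time_index, and already_sorted can be approximated with plain pandas. The sketch below is an illustration under stated assumptions, not Featuretools' implementation: prepare_frame and the sessions_df data are hypothetical, and starting the generated index at 0 is an assumption.

```python
import pandas as pd

# Hypothetical input: no index column, times stored as strings, rows out of order.
sessions_df = pd.DataFrame({
    "device": ["mobile", "desktop", "tablet"],
    "session_start": ["2021-09-02 10:05:00",
                      "2021-09-02 10:00:00",
                      "2021-09-02 10:02:30"],
})


def prepare_frame(df, index=None, time_index=None,
                  make_index=False, already_sorted=False):
    """Pandas-only approximation of the documented parameters (hypothetical helper)."""
    out = df.copy()
    if index is None:
        # index=None: fall back to the first column.
        index = out.columns[0]
    if make_index:
        # make_index=True: the index column is assumed absent and is
        # created from integers (starting at 0 is an assumption here).
        if index in out.columns:
            raise ValueError(f"column {index!r} already exists")
        out.insert(0, index, range(len(out)))
    elif index not in out.columns:
        raise KeyError(f"index column {index!r} not found")
    if time_index is not None:
        # time_index: cast values such as strings to datetime.
        out[time_index] = pd.to_datetime(out[time_index])
        if not already_sorted:
            # already_sorted=False: sort the rows by time.
            out = out.sort_values(time_index, kind="stable")
    return out


prepared = prepare_frame(sessions_df, index="session_id",
                         time_index="session_start", make_index=True)
print(prepared)
```

In this sketch the integer index is assigned in the original row order before sorting, so the sorted frame no longer lists the index values in ascending order.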
Notes
Variable types are inferred from the Pandas dtypes.
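To give a feel for dtype-based inference, the following sketch maps pandas dtypes to type strings. The mapping and the guess_type_string helper are hypothetical and deliberately simplified; the actual inference rules in Featuretools may differ and cover more types.

```python
import pandas as pd
from pandas.api import types as ptypes


def guess_type_string(series):
    """Hypothetical dtype-to-type-string mapping, for illustration only."""
    dtype = series.dtype
    # Check bool before numeric: pandas counts bool as a numeric dtype.
    if ptypes.is_bool_dtype(dtype):
        return "boolean"
    if ptypes.is_datetime64_any_dtype(dtype):
        return "datetime"
    if ptypes.is_numeric_dtype(dtype):
        return "numeric"
    # Anything else is left undecided in this sketch.
    return "unknown"


df = pd.DataFrame({
    "amount": [100.40, 20.63],
    "transaction_time": pd.to_datetime(["2021-09-02 10:00:00",
                                        "2021-09-02 10:00:10"]),
    "fraud": [True, False],
    "note": ["a", "b"],
})

guessed = {col: guess_type_string(df[col]) for col in df.columns}
print(guessed)
```

Columns whose dtype is ambiguous, such as object columns holding strings, are the cases where passing variable_types explicitly is most likely to matter.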
Example
In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: transactions_df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
   ...:                                 "session_id": [1, 2, 1, 3, 4, 5],
   ...:                                 "amount": [100.40, 20.63, 33.32, 13.12, 67.22, 1.00],
   ...:                                 "transaction_time": pd.date_range(start="10:00", periods=6, freq="10s"),
   ...:                                 "fraud": [True, False, True, False, True, True]})
   ...:

In [4]: es = ft.EntitySet("example")

In [5]: es.entity_from_dataframe(entity_id="transactions",
   ...:                          index="id",
   ...:                          time_index="transaction_time",
   ...:                          dataframe=transactions_df)
   ...:
Out[5]:
Entityset: example
  Entities:
    transactions [Rows: 6, Columns: 5]
  Relationships:
    No relationships

In [6]: es["transactions"]
Out[6]:
Entity: transactions
  Variables:
    id (dtype: index)
    session_id (dtype: numeric)
    amount (dtype: numeric)
    transaction_time (dtype: datetime_time_index)
    fraud (dtype: boolean)
  Shape:
    (Rows: 6, Columns: 5)

In [7]: es["transactions"].df
Out[7]:
   id  session_id  amount    transaction_time  fraud
1   1           1  100.40 2021-09-02 10:00:00   True
2   2           2   20.63 2021-09-02 10:00:10  False
3   3           1   33.32 2021-09-02 10:00:20   True
4   4           3   13.12 2021-09-02 10:00:30  False
5   5           4   67.22 2021-09-02 10:00:40   True
6   6           5    1.00 2021-09-02 10:00:50   True