NOTICE

The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:

pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip

For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by posting a message in the Alteryx Open Source Slack.


featuretools.EntitySet.entity_from_dataframe

EntitySet.entity_from_dataframe(entity_id, dataframe, index=None, variable_types=None, make_index=False, time_index=None, secondary_time_index=None, already_sorted=False)

Load the data for a specified entity from a Pandas DataFrame.

Parameters
  • entity_id (str) – Unique id to associate with this entity.

  • dataframe (pandas.DataFrame) – Dataframe containing the data.

  • index (str, optional) – Name of the variable used to index the entity. If None, the first column is used.

  • variable_types (dict[str -> Variable/str], optional) – Keys are variable ids and values are variable types or type_strings. Used to initialize an entity’s store.

  • make_index (bool, optional) – If True, assume index does not exist as a column in dataframe, and create a new column of that name using integers. Otherwise, assume index exists.

  • time_index (str, optional) – Name of the variable containing time data. Its type must be variables.DateTime or castable to datetime (e.g. str, float, or numeric).

  • secondary_time_index (dict[str -> Variable]) – Name of the variable containing time data to use as a second time index for the entity.

  • already_sorted (bool, optional) – If True, assumes that input dataframe is already sorted by time. Defaults to False.
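
To make the make_index, time_index, and already_sorted descriptions above more concrete, here is a rough pandas-only sketch of the kind of preparation they imply. It is an illustration under stated assumptions, not Featuretools' actual implementation: the visits_df data, the visit_id column name, the zero-based numbering, and the choice of a stable sort are all hypothetical.

```python
import pandas as pd

# Hypothetical input: no id column, times stored as strings, rows out of order.
visits_df = pd.DataFrame({
    "page": ["home", "cart", "checkout"],
    "visit_time": ["2021-09-02 10:00:20",
                   "2021-09-02 10:00:00",
                   "2021-09-02 10:00:10"],
})

# Roughly what make_index=True with index="visit_id" implies:
# create a new integer column under the index name (numbering is assumed).
visits_df.insert(0, "visit_id", range(len(visits_df)))

# A time_index column must be datetime or castable to datetime;
# strings in this format can be cast.
visits_df["visit_time"] = pd.to_datetime(visits_df["visit_time"])

# With already_sorted=False the rows are not assumed to be in time order,
# so this sketch sorts them; a stable sort keeps ties in their original order.
visits_df = visits_df.sort_values("visit_time", kind="mergesort")

print(visits_df)
```

If the data is already in time order, passing already_sorted=True tells the method to assume so.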

Notes

Variable types are inferred from the pandas dtype of each column.
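
As a rough illustration of dtype-based inference, the sketch below maps pandas dtypes to coarse type labels. The guess_variable_type helper and its labels are hypothetical and are not part of Featuretools; the library's own inference rules may differ (for example, in how it treats categorical or text columns).

```python
import pandas as pd
from pandas.api import types as ptypes


def guess_variable_type(series):
    """Hypothetical helper: map a column's pandas dtype to a coarse label."""
    # Test bool before numeric, because pandas counts bool as numeric.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    return "unknown"


df = pd.DataFrame({
    "amount": [100.40, 20.63],
    "transaction_time": pd.to_datetime(["2021-09-02 10:00:00",
                                        "2021-09-02 10:00:10"]),
    "fraud": [True, False],
})

inferred = {name: guess_variable_type(df[name]) for name in df.columns}
print(inferred)
```

When the inferred type is not the one wanted, the variable_types parameter can be used to specify it explicitly.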

Example

In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: transactions_df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
   ...:                                 "session_id": [1, 2, 1, 3, 4, 5],
   ...:                                 "amount": [100.40, 20.63, 33.32, 13.12, 67.22, 1.00],
   ...:                                 "transaction_time": pd.date_range(start="10:00", periods=6, freq="10s"),
   ...:                                 "fraud": [True, False, True, False, True, True]})
   ...: 

In [4]: es = ft.EntitySet("example")

In [5]: es.entity_from_dataframe(entity_id="transactions",
   ...:                          index="id",
   ...:                          time_index="transaction_time",
   ...:                          dataframe=transactions_df)
   ...: 
Out[5]: 
Entityset: example
  Entities:
    transactions [Rows: 6, Columns: 5]
  Relationships:
    No relationships

In [6]: es["transactions"]
Out[6]: 
Entity: transactions
  Variables:
    id (dtype: index)
    session_id (dtype: numeric)
    amount (dtype: numeric)
    transaction_time (dtype: datetime_time_index)
    fraud (dtype: boolean)
  Shape:
    (Rows: 6, Columns: 5)

In [7]: es["transactions"].df
Out[7]: 
   id  session_id  amount    transaction_time  fraud
1   1           1  100.40 2021-09-02 10:00:00   True
2   2           2   20.63 2021-09-02 10:00:10  False
3   3           1   33.32 2021-09-02 10:00:20   True
4   4           3   13.12 2021-09-02 10:00:30  False
5   5           4   67.22 2021-09-02 10:00:40   True
6   6           5    1.00 2021-09-02 10:00:50   True