featuretools.EntitySet.entity_from_dataframe

EntitySet.entity_from_dataframe(entity_id, dataframe, index=None, variable_types=None, make_index=False, time_index=None, secondary_time_index=None, already_sorted=False)

Load the data for a specified entity from a Pandas DataFrame.
Parameters
entity_id (str) – Unique id to associate with this entity.
dataframe (pandas.DataFrame) – Dataframe containing the data.
index (str, optional) – Name of the variable used to index the entity. If None, take the first column.
variable_types (dict[str -> Variable/str], optional) – Keys are variable ids and values are variable types or type_strings. Used to initialize an entity’s store.
make_index (bool, optional) – If True, assume index does not exist as a column in dataframe, and create a new column of that name using integers. Otherwise, assume index exists.
time_index (str, optional) – Name of the variable containing time data. Type must be variables.DateTime or be castable to datetime (e.g. str, float, or numeric).
secondary_time_index (dict[str -> Variable], optional) – Name of the variable containing time data to use as a secondary time index for the entity.
already_sorted (bool, optional) – If True, assumes that input dataframe is already sorted by time. Defaults to False.
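The semantics of make_index, time_index, and already_sorted described above can be pictured with plain pandas. The sketch below is only an illustration of what those parameters ask of the input dataframe, not the featuretools implementation: the helper name prepare_for_entity is hypothetical, and starting the generated index at 0 is an assumption of this sketch.

```python
import pandas as pd


def prepare_for_entity(df, index, time_index=None, make_index=False):
    """Hypothetical helper that mimics the documented parameter semantics."""
    out = df.copy()
    if make_index:
        # make_index=True: the index column is assumed not to exist yet,
        # so create it from integers (starting at 0 is this sketch's choice).
        if index in out.columns:
            raise ValueError(f"column {index!r} already exists")
        out.insert(0, index, range(len(out)))
    elif index not in out.columns:
        # make_index=False: the index column is assumed to exist.
        raise KeyError(f"index column {index!r} not found")
    if time_index is not None:
        # The time index must be datetime or castable to datetime.
        out[time_index] = pd.to_datetime(out[time_index])
        # With already_sorted=False the data is not assumed to be in time order,
        # so sort it here.
        out = out.sort_values(time_index, kind="stable").reset_index(drop=True)
    return out


raw = pd.DataFrame({
    "amount": [33.32, 100.40, 20.63],
    "transaction_time": ["2020-07-02 10:00:20",
                         "2020-07-02 10:00:00",
                         "2020-07-02 10:00:10"],
})
prepared = prepare_for_entity(raw, index="id",
                              time_index="transaction_time", make_index=True)
print(prepared)
```

Here the string timestamps are cast to datetime and the rows end up in time order, which is the state already_sorted=True would take for granted.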
Notes
Variable types are inferred from the pandas dtype of each column.
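Because inference starts from the pandas dtypes, it can help to inspect them before loading a dataframe. The following is a minimal pandas-only sketch; it shows the dtypes themselves and makes no claim about exactly how featuretools maps each one to a variable type.

```python
import pandas as pd

transactions_df = pd.DataFrame({
    "id": [1, 2, 3],
    "amount": [100.40, 20.63, 33.32],
    "transaction_time": pd.date_range(start="2020-07-02 10:00",
                                      periods=3, freq="10s"),
    "fraud": [True, False, True],
})

# NumPy kind codes: "i" integer, "f" float, "M" datetime, "b" boolean.
kinds = {col: transactions_df[col].dtype.kind for col in transactions_df.columns}
print(transactions_df.dtypes)
print(kinds)
```

A column whose dtype is object (for example, timestamps stored as strings) is a candidate for an explicit entry in variable_types or for conversion before loading.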
Example
In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: transactions_df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
   ...:                                 "session_id": [1, 2, 1, 3, 4, 5],
   ...:                                 "amount": [100.40, 20.63, 33.32, 13.12, 67.22, 1.00],
   ...:                                 "transaction_time": pd.date_range(start="10:00", periods=6, freq="10s"),
   ...:                                 "fraud": [True, False, True, False, True, True]})
   ...:

In [4]: es = ft.EntitySet("example")

In [5]: es.entity_from_dataframe(entity_id="transactions",
   ...:                          index="id",
   ...:                          time_index="transaction_time",
   ...:                          dataframe=transactions_df)
   ...:
Out[5]:
Entityset: example
  Entities:
    transactions [Rows: 6, Columns: 5]
  Relationships:
    No relationships

In [6]: es["transactions"]
Out[6]:
Entity: transactions
  Variables:
    id (dtype: index)
    session_id (dtype: numeric)
    amount (dtype: numeric)
    transaction_time (dtype: datetime_time_index)
    fraud (dtype: boolean)
  Shape:
    (Rows: 6, Columns: 5)

In [7]: es["transactions"].df
Out[7]:
   id  session_id  amount    transaction_time  fraud
1   1           1  100.40 2020-07-02 10:00:00   True
2   2           2   20.63 2020-07-02 10:00:10  False
3   3           1   33.32 2020-07-02 10:00:20   True
4   4           3   13.12 2020-07-02 10:00:30  False
5   5           4   67.22 2020-07-02 10:00:40   True
6   6           5    1.00 2020-07-02 10:00:50   True