Frequently Asked Questions#
Here we answer some commonly asked questions that appear on GitHub and Stack Overflow.
[1]:
import pandas as pd
import woodwork as ww
import featuretools as ft
EntitySet#
How do I get a list of column names and types in an EntitySet?#
After you create your EntitySet, you may wish to view the column names. An EntitySet contains multiple DataFrames, one for each table in the EntitySet.
[2]:
es = ft.demo.load_mock_customer(return_entityset=True)
es
[2]:
Entityset: transactions
DataFrames:
transactions [Rows: 500, Columns: 6]
products [Rows: 5, Columns: 3]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 5]
Relationships:
transactions.product_id -> products.product_id
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
If you want to view the underlying DataFrame, you can do the following:
[3]:
es["transactions"].head()
[3]:
(index) | transaction_id | session_id | transaction_time | product_id | amount | _ft_last_time
---|---|---|---|---|---|---
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2014-01-01 00:00:00 |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2014-01-01 00:01:05 |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2014-01-01 00:02:10 |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2014-01-01 00:03:15 |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2014-01-01 00:04:20 |
If you want to view the columns and types for the “transactions” DataFrame, you can do the following:
[4]:
es["transactions"].ww
[4]:
Column | Physical Type | Logical Type | Semantic Tag(s)
---|---|---|---
transaction_id | int64 | Integer | ['index'] |
session_id | int64 | Integer | ['foreign_key', 'numeric'] |
transaction_time | datetime64[ns] | Datetime | ['time_index'] |
product_id | category | Categorical | ['category', 'foreign_key'] |
amount | float64 | Double | ['numeric'] |
_ft_last_time | datetime64[ns] | Datetime | ['last_time_index'] |
What is the difference between copy_columns and additional_columns?#
The function normalize_dataframe creates a new DataFrame and a relationship from unique values of an existing DataFrame. It takes two similar arguments:
- additional_columns removes columns from the base DataFrame and moves them to the new DataFrame.
- copy_columns keeps the given columns in the base DataFrame, but also copies them to the new DataFrame.
[5]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
products_df = data["products"]
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
dataframe_name="transactions",
dataframe=transactions_df,
index="transaction_id",
time_index="transaction_time",
)
es = es.add_dataframe(
dataframe_name="products", dataframe=products_df, index="product_id"
)
es = es.add_relationship("products", "product_id", "transactions", "product_id")
Before we normalize to create a new DataFrame, let’s look at the base DataFrame.
[6]:
es["transactions"].head()
[6]:
(index) | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | join_date | birthday
---|---|---|---|---|---|---|---|---|---|---|---
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
Notice the columns session_id, session_start, join_date, device, customer_id, and zip_code.
[7]:
es = es.normalize_dataframe(
base_dataframe_name="transactions",
new_dataframe_name="sessions",
index="session_id",
make_time_index="session_start",
additional_columns=["join_date"],
copy_columns=["device", "customer_id", "zip_code", "session_start"],
)
Above, we normalized the columns to create a new DataFrame.
- For additional_columns, the column ['join_date'] will be removed from the transactions DataFrame and moved to the new sessions DataFrame.
- For copy_columns, the columns ['device', 'customer_id', 'zip_code', 'session_start'] will be copied from the transactions DataFrame to the new sessions DataFrame.
Let’s see this in the actual EntitySet.
[8]:
es["transactions"].head()
[8]:
(index) | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | birthday
---|---|---|---|---|---|---|---|---|---|---
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
Notice above how ['device', 'customer_id', 'zip_code', 'session_start'] are still in the transactions DataFrame, while ['join_date'] is not. All of these columns now also appear in the sessions DataFrame, as seen below.
[9]:
es["sessions"].head()
[9]:
(index) | session_id | join_date | device | customer_id | zip_code | session_start
---|---|---|---|---|---|---
1 | 1 | 2012-04-15 23:31:04 | desktop | 2 | 13244 | 2014-01-01 00:00:00 |
2 | 2 | 2010-07-17 05:27:50 | mobile | 5 | 60091 | 2014-01-01 00:17:20 |
3 | 3 | 2011-04-08 20:08:14 | mobile | 4 | 60091 | 2014-01-01 00:28:10 |
4 | 4 | 2011-04-17 10:48:33 | mobile | 1 | 60091 | 2014-01-01 00:44:25 |
5 | 5 | 2011-04-08 20:08:14 | mobile | 4 | 60091 | 2014-01-01 01:11:30 |
How do I update a column’s description or metadata?#
You can directly update the description or metadata attributes of the column schema. However, you must specifically use the column schema returned by DataFrame.ww.columns['col_name'], not DataFrame.ww['col_name'].ww.schema. The column schema from DataFrame.ww.columns['col_name'] is still associated with the EntitySet and propagates any attribute updates, whereas the other does not. As an example, this is how you can update a column’s description or metadata:
column_schema = df.ww.columns['col_name']
column_schema.description = 'my description'
column_schema.metadata.update(key='value')
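For instance, here is a minimal sketch using the EntitySet built above; the description and metadata values are purely illustrative:
column_schema = es["transactions"].ww.columns["amount"]
column_schema.description = "transaction amount in USD"
column_schema.metadata.update(source="demo data")
# reading the schema back through the EntitySet shows the update was kept
es["transactions"].ww.columns["amount"].description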
How do I combine two or more interesting values?#
You might want to create features that are conditioned on multiple values before they are calculated. This requires the use of interesting_values. However, since we are trying to create a feature with multiple conditions, we will need to modify the DataFrame before we create the EntitySet.
Let’s look at how you might accomplish this.
First, let’s create our Dataframes.
[12]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
products_df = data["products"]
[13]:
transactions_df.head()
[13]:
(index) | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | join_date | birthday
---|---|---|---|---|---|---|---|---|---|---|---
0 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
1 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
2 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
3 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
4 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
[14]:
products_df.head()
[14]:
(index) | product_id | brand
---|---|---
0 | 1 | B |
1 | 2 | B |
2 | 3 | B |
3 | 4 | B |
4 | 5 | A |
Now, let’s modify our transactions DataFrame to create the additional column that represents multiple conditions for our feature.
[15]:
transactions_df["product_id_device"] = (
transactions_df["product_id"].astype(str) + " and " + transactions_df["device"]
)
Here, we created a new column called product_id_device, which just combines the product_id column and the device column.
Now let’s create our EntitySet.
[16]:
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
dataframe_name="transactions",
dataframe=transactions_df,
index="transaction_id",
time_index="transaction_time",
logical_types={
"product_id": ww.logical_types.Categorical,
"product_id_device": ww.logical_types.Categorical,
"zip_code": ww.logical_types.PostalCode,
},
)
es = es.add_dataframe(
dataframe_name="products", dataframe=products_df, index="product_id"
)
es = es.normalize_dataframe(
base_dataframe_name="transactions",
new_dataframe_name="sessions",
index="session_id",
additional_columns=["device", "product_id_device", "customer_id"],
)
es = es.normalize_dataframe(
base_dataframe_name="sessions", new_dataframe_name="customers", index="customer_id"
)
es
[16]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 9]
products [Rows: 5, Columns: 2]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 2]
Relationships:
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
Now, we are ready to add our interesting values.
First, let’s view our options for what the interesting values could be.
[17]:
interesting_values = transactions_df["product_id_device"].unique().tolist()
interesting_values
[17]:
['5 and desktop',
'2 and desktop',
'3 and desktop',
'4 and desktop',
'1 and desktop',
'4 and mobile',
'5 and mobile',
'1 and mobile',
'3 and mobile',
'2 and mobile',
'4 and tablet',
'3 and tablet',
'2 and tablet',
'1 and tablet',
'5 and tablet']
If you wanted to, you could pick a subset of these, and the where features created would only use those conditions. In our example, we will use all the possible interesting values.
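For illustration, restricting the where features to desktop sessions only might look like the sketch below (this subset is not used in the rest of the example; the filter is just one possible way to pick it):
desktop_values = [v for v in interesting_values if v.endswith("desktop")]
es.add_interesting_values(
    dataframe_name="sessions", values={"product_id_device": desktop_values}
)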
Here, we set all of these values as our interesting values for this specific DataFrame and column. If we wanted to, we could make interesting values in the same way for more than one column, but we will just stick with this one for this example.
[18]:
values = {"product_id_device": interesting_values}
es.add_interesting_values(dataframe_name="sessions", values=values)
Now we can run DFS.
[19]:
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
agg_primitives=["count"],
where_primitives=["count"],
trans_primitives=[],
)
feature_matrix.head()
[19]:
customer_id | COUNT(sessions) | COUNT(transactions) | COUNT(sessions WHERE product_id_device = 1 and tablet) | COUNT(sessions WHERE product_id_device = 2 and desktop) | COUNT(sessions WHERE product_id_device = 4 and desktop) | COUNT(sessions WHERE product_id_device = 1 and mobile) | COUNT(sessions WHERE product_id_device = 4 and tablet) | COUNT(sessions WHERE product_id_device = 5 and mobile) | COUNT(sessions WHERE product_id_device = 2 and mobile) | COUNT(sessions WHERE product_id_device = 1 and desktop) | ... | COUNT(transactions WHERE sessions.product_id_device = 3 and desktop) | COUNT(transactions WHERE sessions.product_id_device = 5 and desktop) | COUNT(transactions WHERE sessions.product_id_device = 5 and mobile) | COUNT(transactions WHERE sessions.product_id_device = 3 and tablet) | COUNT(transactions WHERE sessions.product_id_device = 2 and desktop) | COUNT(transactions WHERE sessions.product_id_device = 2 and tablet) | COUNT(transactions WHERE sessions.product_id_device = 4 and mobile) | COUNT(transactions WHERE sessions.product_id_device = 4 and tablet) | COUNT(transactions WHERE sessions.product_id_device = 5 and tablet) | COUNT(transactions WHERE sessions.product_id_device = 4 and desktop)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2 | 7 | 93 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 0 | 16 | 0 | 0 | 0 | 0 | 18 | 0 | 13 | 10 |
5 | 6 | 79 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 15 | 0 | 0 | 0 | 0 | 10 | 14 | 0 | 14 |
4 | 8 | 109 | 0 | 1 | 1 | 1 | 0 | 0 | 2 | 0 | ... | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 18 | 18 |
1 | 8 | 126 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | ... | 0 | 12 | 0 | 16 | 15 | 0 | 56 | 27 | 0 | 0 |
3 | 6 | 93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | ... | 0 | 29 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 0 |
5 rows × 32 columns
To better understand the where clause features, let’s examine one of them. The feature COUNT(sessions WHERE product_id_device = 5 and tablet) tells us in how many sessions the customer purchased product_id 5 while on a tablet. Notice how the feature depends on multiple conditions (product_id = 5 & device = tablet).
[20]:
feature_matrix[["COUNT(sessions WHERE product_id_device = 5 and tablet)"]]
[20]:
customer_id | COUNT(sessions WHERE product_id_device = 5 and tablet)
---|---
2 | 1 |
5 | 0 |
4 | 1 |
1 | 0 |
3 | 0 |
DFS#
Why is DFS not creating aggregation features?#
You may have created your EntitySet and then applied DFS to create features, only to be puzzled as to why no aggregation features were created. This is most likely because you have a single DataFrame in your EntitySet, and DFS cannot create aggregation features with fewer than two DataFrames. Featuretools looks for a relationship and aggregates based on that relationship.
Let’s look at a simple example.
[21]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
dataframe_name="transactions", dataframe=transactions_df, index="transaction_id"
)
es
[21]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 11]
Relationships:
No relationships
Notice how we only have one DataFrame in our EntitySet. If we try to create aggregation features on this EntitySet, it will not be possible, because DFS needs two DataFrames to generate aggregation features.
[22]:
feature_matrix, feature_defs = ft.dfs(
entityset=es, target_dataframe_name="transactions"
)
feature_defs
UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
[22]:
[<Feature: session_id>,
<Feature: product_id>,
<Feature: amount>,
<Feature: customer_id>,
<Feature: device>,
<Feature: zip_code>,
<Feature: DAY(birthday)>,
<Feature: DAY(join_date)>,
<Feature: DAY(session_start)>,
<Feature: DAY(transaction_time)>,
<Feature: MONTH(birthday)>,
<Feature: MONTH(join_date)>,
<Feature: MONTH(session_start)>,
<Feature: MONTH(transaction_time)>,
<Feature: WEEKDAY(birthday)>,
<Feature: WEEKDAY(join_date)>,
<Feature: WEEKDAY(session_start)>,
<Feature: WEEKDAY(transaction_time)>,
<Feature: YEAR(birthday)>,
<Feature: YEAR(join_date)>,
<Feature: YEAR(session_start)>,
<Feature: YEAR(transaction_time)>]
None of the above features are aggregation features. To fix this issue, you can add another DataFrame to your EntitySet.
Solution #1 - You can add a new DataFrame if you have additional data.
[23]:
products_df = data["products"]
es = es.add_dataframe(
dataframe_name="products", dataframe=products_df, index="product_id"
)
es
[23]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 11]
products [Rows: 5, Columns: 2]
Relationships:
No relationships
Notice how we now have an additional DataFrame in our EntitySet, called products.
Solution #2 - You can normalize an existing DataFrame.
[24]:
es = es.normalize_dataframe(
base_dataframe_name="transactions",
new_dataframe_name="sessions",
index="session_id",
make_time_index="session_start",
additional_columns=["device", "customer_id", "zip_code", "join_date"],
copy_columns=["session_start"],
)
es
[24]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 7]
products [Rows: 5, Columns: 2]
sessions [Rows: 35, Columns: 6]
Relationships:
transactions.session_id -> sessions.session_id
Notice how we now have an additional DataFrame in our EntitySet, called sessions. Here, the normalization created a relationship between transactions and sessions. However, we could also have specified a relationship between transactions and products if we had only used Solution #1.
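For reference, that relationship could have been added explicitly with the same call used earlier on this page:
es = es.add_relationship("products", "product_id", "transactions", "product_id")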
Now, we can generate aggregation features.
[25]:
feature_matrix, feature_defs = ft.dfs(
entityset=es, target_dataframe_name="transactions"
)
feature_defs[:-10]
[25]:
[<Feature: session_id>,
<Feature: product_id>,
<Feature: amount>,
<Feature: DAY(birthday)>,
<Feature: DAY(session_start)>,
<Feature: DAY(transaction_time)>,
<Feature: MONTH(birthday)>,
<Feature: MONTH(session_start)>,
<Feature: MONTH(transaction_time)>,
<Feature: WEEKDAY(birthday)>,
<Feature: WEEKDAY(session_start)>,
<Feature: WEEKDAY(transaction_time)>,
<Feature: YEAR(birthday)>,
<Feature: YEAR(session_start)>,
<Feature: YEAR(transaction_time)>,
<Feature: sessions.device>,
<Feature: sessions.customer_id>,
<Feature: sessions.zip_code>,
<Feature: sessions.COUNT(transactions)>,
<Feature: sessions.MAX(transactions.amount)>,
<Feature: sessions.MEAN(transactions.amount)>,
<Feature: sessions.MIN(transactions.amount)>,
<Feature: sessions.MODE(transactions.product_id)>,
<Feature: sessions.NUM_UNIQUE(transactions.product_id)>,
<Feature: sessions.SKEW(transactions.amount)>]
A few of the aggregation features are:
<Feature: sessions.MAX(transactions.amount)>
<Feature: sessions.SKEW(transactions.amount)>
<Feature: sessions.MIN(transactions.amount)>
<Feature: sessions.MEAN(transactions.amount)>
<Feature: sessions.COUNT(transactions)>
How do I speed up the runtime of DFS?#
One issue you may encounter while running ft.dfs is slow performance. While Featuretools has sensible default settings for calculating features, you may want to speed up performance when you are calculating a large number of features.
One quick way to speed up performance is by adjusting the n_jobs setting of ft.dfs or ft.calculate_feature_matrix.
# setting n_jobs to -1 will use all cores
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_dataframe_name="customers",
                                      n_jobs=-1)
feature_matrix = ft.calculate_feature_matrix(entityset=es,
                                             features=feature_defs,
                                             n_jobs=-1)
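Depending on your data, it may also help to adjust chunk_size, which controls how many rows of the feature matrix are calculated at a time (a float between 0 and 1 is treated as a fraction of all rows); the value below is purely illustrative:
feature_matrix = ft.calculate_feature_matrix(entityset=es,
                                             features=feature_defs,
                                             chunk_size=0.25)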
For more ways to speed up performance, see the guide on improving computational performance in the Featuretools documentation.
How do I include only certain features when running DFS?#
When using DFS to generate features, you may wish to include only certain features. There are multiple ways to do this:
- Use ignore_columns to specify columns in a DataFrame that should not be used to create features. It is a dictionary mapping DataFrame names to a list of column names to ignore.
- Use drop_contains to drop features that contain any of the strings listed in this parameter.
- Use drop_exact to drop features that exactly match any of the strings listed in this parameter.
Here is an example of using all three parameters:
[26]:
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
ignore_columns={
"transactions": ["amount"],
"customers": ["age", "gender", "birthday"],
}, # ignore these columns
drop_contains=["customers.SUM("], # drop features that contain these strings
drop_exact=["STD(transactions.quanity)"],
) # drop features that exactly match
How do I specify primitives on a per column or per DataFrame basis?#
When using DFS to generate features, you may wish to use only certain features or DataFrames for specific primitives. This can be done through the primitive_options parameter. The primitive_options parameter is a dictionary that maps a primitive or a tuple of primitives to a dictionary containing options for the primitive(s). A primitive or tuple of primitives can also be mapped to a list of option dictionaries if the primitive(s) take multiple inputs. The primitive keys can be the string names of the primitive, the primitive class, or specific instances of the primitive. Each dictionary supplies options for its respective input column. There are multiple ways to control how primitives get applied through these options:
- Use ignore_dataframes to specify DataFrames that should not be used to create features for that primitive. It is a list of DataFrame names to ignore.
- Use include_dataframes to specify the only DataFrames to be used to create features for that primitive. It is a list of DataFrame names to include.
- Use ignore_columns to specify columns in a DataFrame that should not be used to create features for that primitive. It is a dictionary mapping a DataFrame name to a list of column names to ignore.
- Use include_columns to specify the only columns in a DataFrame that should be used to create features for that primitive. It is a dictionary mapping a DataFrame name to a list of column names to include.
You can also use primitive_options to specify which DataFrames or columns you wish to use as groupbys for groupby transformation primitives:
- Use ignore_groupby_dataframes to specify DataFrames that should not be used to get groupbys for that primitive. It is a list of DataFrame names to ignore.
- Use include_groupby_dataframes to specify the only DataFrames that should be used to get groupbys for that primitive. It is a list of DataFrame names to include.
- Use ignore_groupby_columns to specify columns in a DataFrame that should not be used as groupbys for that primitive. It is a dictionary mapping a DataFrame name to a list of column names to ignore.
- Use include_groupby_columns to specify the only columns in a DataFrame that should be used as groupbys for that primitive. It is a dictionary mapping a DataFrame name to a list of column names to include.
Here is an example of using some of these options:
[27]:
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
primitive_options={
"mode": {
"ignore_dataframes": ["sessions"],
"ignore_columns": {"products": ["brand"], "transactions": ["product_id"]},
},
        # For mode, ignore the "sessions" DataFrame entirely, and ignore the "brand"
        # column in the "products" DataFrame and the "product_id" column in "transactions"
("count", "mean"): {"include_dataframes": ["sessions", "transactions"]},
# For count and mean, only include the dataframes "sessions" and "transactions"
},
)
Note that if options are given for a specific instance of a primitive and for the primitive generally (either by string name or class), the instances with their own options will not use the generic options. For example, in this case:
from featuretools.primitives import Mean
special_mean = Mean()
options = {
    special_mean: {'include_dataframes': ['customers']},
    'mean': {'include_dataframes': ['sessions']}
}
the primitive special_mean will not use the DataFrame sessions because its options only include customers. Every other instance of the Mean primitive will use the 'mean' options.
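As a sketch of how these options might then be passed to DFS (reusing the demo EntitySet from the example above), both the special_mean instance and the "mean" string can be listed as aggregation primitives so that each picks up its own options:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[special_mean, "mean"],
    primitive_options=options,
)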
For more examples of specifying options for DFS, see the guide on specifying primitive options in the Featuretools documentation.
If I didn’t specify the cutoff_time, what date will be used for the feature calculations?#
The cutoff time will be set to the current time using cutoff_time = datetime.now().
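If you want the features calculated as of a specific point in time instead, you can pass cutoff_time explicitly. A minimal sketch using the demo EntitySet (the timestamp is illustrative):
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=pd.Timestamp("2014-01-01 12:00:00"),
)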
How do I select a certain amount of past data when calculating features?#
You may encounter a situation where you wish to make predictions using only a certain amount of historical data. You can accomplish this using the training_window parameter in ft.dfs. When you use training_window, Featuretools will use the historical data between the cutoff_time and cutoff_time - training_window.
In order to make the calculation, Featuretools will check the time in the time_index column of the target_dataframe.
[28]:
es = ft.demo.load_mock_customer(return_entityset=True)
es["customers"].ww.time_index
[28]:
'join_date'
Our target_dataframe has a time_index, which is needed for the training_window calculation. Here, we are creating a cutoff time DataFrame so that we can have a unique training window for each customer.
[29]:
cutoff_times = pd.DataFrame()
cutoff_times["customer_id"] = [1, 2, 3, 1]
cutoff_times["time"] = pd.to_datetime(
["2014-1-1 04:00", "2014-1-1 05:00", "2014-1-1 06:00", "2014-1-1 08:00"]
)
cutoff_times["label"] = [True, True, False, True]
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
cutoff_time=cutoff_times,
cutoff_time_in_index=True,
training_window="1 hour",
)
feature_matrix.head()
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
[29]:
zip_code | COUNT(sessions) | MODE(sessions.device) | NUM_UNIQUE(sessions.device) | COUNT(transactions) | MAX(transactions.amount) | MEAN(transactions.amount) | MIN(transactions.amount) | MODE(transactions.product_id) | NUM_UNIQUE(transactions.product_id) | ... | STD(sessions.SUM(transactions.amount)) | SUM(sessions.MAX(transactions.amount)) | SUM(sessions.MEAN(transactions.amount)) | SUM(sessions.MIN(transactions.amount)) | SUM(sessions.NUM_UNIQUE(transactions.product_id)) | SUM(sessions.SKEW(transactions.amount)) | SUM(sessions.STD(transactions.amount)) | MODE(transactions.sessions.device) | NUM_UNIQUE(transactions.sessions.device) | label | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
customer_id | time | |||||||||||||||||||||
1 | 2014-01-01 04:00:00 | 60091 | 1 | tablet | 1 | 12 | 139.09 | 85.469167 | 6.78 | 4 | 5 | ... | NaN | 139.09 | 85.469167 | 6.78 | 5.0 | -0.830975 | 39.825249 | tablet | 1 | True |
2 | 2014-01-01 05:00:00 | 13244 | 1 | tablet | 1 | 13 | 118.85 | 77.304615 | 21.82 | 1 | 5 | ... | NaN | 118.85 | 77.304615 | 21.82 | 5.0 | -0.314918 | 33.725036 | tablet | 1 | True |
3 | 2014-01-01 06:00:00 | 13244 | 2 | desktop | 1 | 12 | 128.26 | 81.747500 | 20.06 | 3 | 5 | ... | 563.882303 | 220.02 | 172.597273 | 111.82 | 6.0 | -0.289466 | 35.704680 | desktop | 1 | False |
1 | 2014-01-01 08:00:00 | 60091 | 1 | mobile | 1 | 16 | 126.11 | 88.755625 | 11.62 | 4 | 5 | ... | NaN | 126.11 | 88.755625 | 11.62 | 5.0 | -1.038434 | 32.324534 | mobile | 1 | True |
4 rows × 76 columns
Above, we ran DFS with a training_window argument of 1 hour to create features that only used customer data collected in the hour before each cutoff time we provided.
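For reference, here is a minimal sketch of how such a call might look, assuming an entityset es and a cutoff-time DataFrame cutoff_times like the ones used earlier in this notebook (the exact call above may differ):
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
    # Only use data from the hour before each row's cutoff time
    training_window="1 hour",
)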
Can I run DFS on a single table?#
Although possible, running DFS on a single table doesn’t make full use of DFS’s capabilities. For one, DFS will not be able to use any aggregation primitives, which require at least two tables; you will only be able to use transform primitives. This limits the complexity of the features that DFS can generate through feature stacking. Additionally, in certain situations, running single-table DFS on data with time columns could risk label leakage. With data split across multiple tables, Featuretools can filter data based on the cutoff time instead of assuming the data was flattened appropriately, but it cannot do this with only a single table.
If you only have a single table of data, DFS can certainly still be of use. There are two main ways to pass in a single table to DFS.
The first is to simply create an EntitySet with one table.
For example:
[30]:
transactions_df = ft.demo.load_mock_customer(return_single_table=True)
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
dataframe_name="transactions",
dataframe=transactions_df,
index="transaction_id",
time_index="transaction_time",
)
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="transactions",
trans_primitives=[
"time_since",
"day",
"is_weekend",
"cum_min",
"minute",
"weekday",
"percentile",
"year",
"week",
"cum_mean",
],
)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/synthesis/deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
warnings.warn(
The second way is to insert the DataFrame into a dictionary that maps its name to a tuple containing specific DataFrame information, and then pass that dictionary to the dataframes argument in DFS.
In this scenario, the value in our dictionary is a tuple containing the DataFrame, its index column, and its time index. More information about the possible parameters can be found in the DFS documentation.
For example:
[31]:
transactions_df = ft.demo.load_mock_customer(return_single_table=True)
dataframes = {"transactions": (transactions_df, "transaction_id", "transaction_time")}
feature_matrix, feature_defs = ft.dfs(
dataframes=dataframes,
target_dataframe_name="transactions",
trans_primitives=[
"time_since",
"day",
"is_weekend",
"cum_min",
"minute",
"weekday",
"percentile",
"year",
"week",
"cum_mean",
],
)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/synthesis/deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
warnings.warn(
Before we examine the output, let’s look at our original single table.
[32]:
transactions_df.head()
[32]:
transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | join_date | birthday | brand | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | A |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
Now we can look at the transformations that Featuretools was able to apply to this single DataFrame to create a feature matrix.
[33]:
feature_matrix.head()
[33]:
session_id | product_id | amount | customer_id | device | zip_code | brand | CUM_MEAN(amount) | CUM_MEAN(customer_id) | CUM_MEAN(session_id) | ... | WEEK(session_start) | WEEK(transaction_time) | WEEKDAY(birthday) | WEEKDAY(join_date) | WEEKDAY(session_start) | WEEKDAY(transaction_time) | YEAR(birthday) | YEAR(join_date) | YEAR(session_start) | YEAR(transaction_time) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
transaction_id | |||||||||||||||||||||
10 | 1 | 5 | 127.64 | 2 | desktop | 13244 | A | 127.640000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
2 | 1 | 2 | 109.48 | 2 | desktop | 13244 | B | 118.560000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
438 | 1 | 3 | 95.06 | 2 | desktop | 13244 | B | 110.726667 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
192 | 1 | 4 | 78.92 | 2 | desktop | 13244 | B | 102.775000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
271 | 1 | 3 | 31.54 | 2 | desktop | 13244 | B | 88.528000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
5 rows × 44 columns
How do I prevent label leakage with DFS?#
One concern you might have when using DFS is label leakage: you want to make sure that the labels in your data aren’t used incorrectly to create features or the feature matrix.
Featuretools is particularly focused on helping users avoid label leakage.
There are two ways to prevent label leakage, depending on whether or not your data has timestamps.
1. Data without timestamps#
In the case where you do not have timestamps, you can create one EntitySet using only the training data and then run ft.dfs. This will create a feature matrix using only the training data and also return a list of feature definitions. Next, you can create an EntitySet using the test data and recalculate the same features by calling ft.calculate_feature_matrix with the list of feature definitions from before.
Here is what that flow would look like:
First, let’s create our training data.
[34]:
train_data = pd.DataFrame(
{
"customer_id": [1, 2, 3, 4, 5],
"age": [40, 50, 10, 20, 30],
"gender": ["m", "f", "m", "f", "f"],
"signup_date": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
"labels": [True, False, True, False, True],
}
)
train_data.head()
[34]:
customer_id | age | gender | signup_date | labels | |
---|---|---|---|---|---|
0 | 1 | 40 | m | 2014-01-01 01:41:50 | True |
1 | 2 | 50 | f | 2014-01-01 02:06:50 | False |
2 | 3 | 10 | m | 2014-01-01 02:31:50 | True |
3 | 4 | 20 | f | 2014-01-01 02:56:50 | False |
4 | 5 | 30 | f | 2014-01-01 03:21:50 | True |
Now, we can create an EntitySet for our training data.
[35]:
es_train_data = ft.EntitySet(id="customer_train_data")
es_train_data = es_train_data.add_dataframe(
dataframe_name="customers", dataframe=train_data, index="customer_id"
)
es_train_data
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
[35]:
Entityset: customer_train_data
DataFrames:
customers [Rows: 5, Columns: 5]
Relationships:
No relationships
Next, we are ready to create our features and feature matrix for the training data. We don’t want Featuretools to use the labels column to build new features, so we will use the ignore_columns option to exclude it. This would also remove the labels column from the feature matrix, so we will tell DFS to include it as a seed feature.
[36]:
labels_feature = ft.Feature(es_train_data["customers"].ww["labels"])
feature_matrix_train, feature_defs = ft.dfs(
entityset=es_train_data,
target_dataframe_name="customers",
ignore_columns={"customers": ["labels"]},
seed_features=[labels_feature],
)
feature_matrix_train
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/synthesis/deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
warnings.warn(
[36]:
age | labels | DAY(signup_date) | MONTH(signup_date) | WEEKDAY(signup_date) | YEAR(signup_date) | |
---|---|---|---|---|---|---|
customer_id | ||||||
1 | 40 | True | 1 | 1 | 2 | 2014 |
2 | 50 | False | 1 | 1 | 2 | 2014 |
3 | 10 | True | 1 | 1 | 2 | 2014 |
4 | 20 | False | 1 | 1 | 2 | 2014 |
5 | 30 | True | 1 | 1 | 2 | 2014 |
We will also encode our feature matrix to produce features that are compatible with machine learning.
[37]:
feature_matrix_train_enc, features_enc = ft.encode_features(
feature_matrix_train, feature_defs
)
feature_matrix_train_enc.head()
[37]:
age | labels | DAY(signup_date) = 1 | DAY(signup_date) is unknown | MONTH(signup_date) = 1 | MONTH(signup_date) is unknown | WEEKDAY(signup_date) = 2 | WEEKDAY(signup_date) is unknown | YEAR(signup_date) = 2014 | YEAR(signup_date) is unknown | |
---|---|---|---|---|---|---|---|---|---|---|
customer_id | ||||||||||
1 | 40 | True | True | False | True | False | True | False | True | False |
2 | 50 | False | True | False | True | False | True | False | True | False |
3 | 10 | True | True | False | True | False | True | False | True | False |
4 | 20 | False | True | False | True | False | True | False | True | False |
5 | 30 | True | True | False | True | False | True | False | True | False |
Notice how the whole feature matrix now includes only numeric and boolean values.
Now we can use the feature definitions to calculate our feature matrix for the test data and avoid label leakage.
[38]:
test_train = pd.DataFrame(
{
"customer_id": [6, 7, 8, 9, 10],
"age": [20, 25, 55, 22, 35],
"gender": ["f", "m", "m", "m", "m"],
"signup_date": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
"labels": [True, False, False, True, True],
}
)
es_test_data = ft.EntitySet(id="customer_test_data")
es_test_data = es_test_data.add_dataframe(
dataframe_name="customers",
dataframe=test_train,
index="customer_id",
time_index="signup_date",
)
# Use the feature definitions from earlier
feature_matrix_enc_test = ft.calculate_feature_matrix(
features=features_enc, entityset=es_test_data
)
feature_matrix_enc_test.head()
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
[38]:
age | labels | DAY(signup_date) = 1 | DAY(signup_date) is unknown | MONTH(signup_date) = 1 | MONTH(signup_date) is unknown | WEEKDAY(signup_date) = 2 | WEEKDAY(signup_date) is unknown | YEAR(signup_date) = 2014 | YEAR(signup_date) is unknown | |
---|---|---|---|---|---|---|---|---|---|---|
customer_id | ||||||||||
6 | 20 | True | True | False | True | False | True | False | True | False |
7 | 25 | False | True | False | True | False | True | False | True | False |
8 | 55 | False | True | False | True | False | True | False | True | False |
9 | 22 | True | True | False | True | False | True | False | True | False |
10 | 35 | True | True | False | True | False | True | False | True | False |
Check out the Modeling section for an example of using the encoded matrix with sklearn.
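As a quick illustration of what that might look like, here is a minimal sketch, assuming scikit-learn is installed (the Modeling section covers this in more detail):
from sklearn.ensemble import RandomForestClassifier

# Separate the label column from the encoded training features
X_train = feature_matrix_train_enc.drop(columns=["labels"])
y_train = feature_matrix_train_enc["labels"]

# Fit a simple model, then predict on the encoded test matrix
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(feature_matrix_enc_test.drop(columns=["labels"]))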
2. Data with timestamps#
If your data has timestamps, the best way to prevent label leakage is to use a list of cutoff times, which specify the last point in time that data is allowed to be used for each row in the resulting feature matrix. To use cutoff times, you need to set a time index for each time-sensitive DataFrame in your entity set.
Tip: Even if your data doesn’t have timestamps, you could add a column of dummy timestamps that Featuretools can use as the time index.
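A minimal sketch of that tip, using a hypothetical customers_df table that has no timestamp column (the dataframe and column names here are illustrative):
# customers_df is a hypothetical table with no timestamp column
customers_df["dummy_time"] = pd.Timestamp("2014-01-01")  # constant placeholder timestamp

es_no_time = ft.EntitySet(id="customers_no_time")
es_no_time = es_no_time.add_dataframe(
    dataframe_name="customers",
    dataframe=customers_df,
    index="customer_id",
    time_index="dummy_time",
)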
When you call ft.dfs
, you can provide a DataFrame of cutoff times like this:
[39]:
cutoff_times = pd.DataFrame(
{
"customer_id": [1, 2, 3, 4, 5],
"time": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
}
)
cutoff_times.head()
[39]:
customer_id | time | |
---|---|---|
0 | 1 | 2014-01-01 01:41:50 |
1 | 2 | 2014-01-01 02:06:50 |
2 | 3 | 2014-01-01 02:31:50 |
3 | 4 | 2014-01-01 02:56:50 |
4 | 5 | 2014-01-01 03:21:50 |
[40]:
train_test_data = pd.DataFrame(
{
"customer_id": [1, 2, 3, 4, 5],
"age": [20, 25, 55, 22, 35],
"gender": ["f", "m", "m", "m", "m"],
"signup_date": pd.date_range("2010-01-01 01:41:50", periods=5, freq="25min"),
}
)
es_train_test_data = ft.EntitySet(id="customer_train_test_data")
es_train_test_data = es_train_test_data.add_dataframe(
dataframe_name="customers",
dataframe=train_test_data,
index="customer_id",
time_index="signup_date",
)
feature_matrix_train_test, features = ft.dfs(
entityset=es_train_test_data,
target_dataframe_name="customers",
cutoff_time=cutoff_times,
cutoff_time_in_index=True,
)
feature_matrix_train_test.head()
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/synthesis/deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
warnings.warn(
[40]:
age | DAY(signup_date) | MONTH(signup_date) | WEEKDAY(signup_date) | YEAR(signup_date) | ||
---|---|---|---|---|---|---|
customer_id | time | |||||
1 | 2014-01-01 01:41:50 | 20 | 1 | 1 | 4 | 2010 |
2 | 2014-01-01 02:06:50 | 25 | 1 | 1 | 4 | 2010 |
3 | 2014-01-01 02:31:50 | 55 | 1 | 1 | 4 | 2010 |
4 | 2014-01-01 02:56:50 | 22 | 1 | 1 | 4 | 2010 |
5 | 2014-01-01 03:21:50 | 35 | 1 | 1 | 4 | 2010 |
Above, we have created a feature matrix that uses cutoff times to avoid label leakage. We could also encode this feature matrix using ft.encode_features.
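For example, a minimal sketch of encoding the matrix we just built, reusing the feature_matrix_train_test and features objects from above:
feature_matrix_train_test_enc, features_enc = ft.encode_features(
    feature_matrix_train_test, features
)
feature_matrix_train_test_enc.head()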
What is the difference between passing a primitive object versus a string to DFS?#
There are 2 ways to pass primitives to DFS: the primitive object, or a string of the primitive name.
We will use the Transform primitive called TimeSincePrevious to illustrate the differences.
First, let’s use the string of the primitive name.
[41]:
es = ft.demo.load_mock_customer(return_entityset=True)
[42]:
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
agg_primitives=[],
trans_primitives=["time_since_previous"],
)
feature_matrix
[42]:
zip_code | TIME_SINCE_PREVIOUS(join_date) | |
---|---|---|
customer_id | ||
5 | 60091 | NaN |
4 | 60091 | 22948824.0 |
1 | 60091 | 744019.0 |
3 | 13244 | 10212841.0 |
2 | 13244 | 21282510.0 |
Now, let’s use the primitive object.
[43]:
from featuretools.primitives import TimeSincePrevious
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
agg_primitives=[],
trans_primitives=[TimeSincePrevious],
)
feature_matrix
[43]:
zip_code | TIME_SINCE_PREVIOUS(join_date) | |
---|---|---|
customer_id | ||
5 | 60091 | NaN |
4 | 60091 | 22948824.0 |
1 | 60091 | 744019.0 |
3 | 13244 | 10212841.0 |
2 | 13244 | 21282510.0 |
As we can see above, the feature matrix is the same.
However, if we need to modify controllable parameters of the primitive, we should use the primitive object. For instance, let’s make TimeSincePrevious return its result in hours (the default is seconds).
[44]:
from featuretools.primitives import TimeSincePrevious
time_since_previous_in_hours = TimeSincePrevious(unit="hours")
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
agg_primitives=[],
trans_primitives=[time_since_previous_in_hours],
)
feature_matrix
[44]:
zip_code | TIME_SINCE_PREVIOUS(join_date, unit=hours) | |
---|---|---|
customer_id | ||
5 | 60091 | NaN |
4 | 60091 | 6374.673333 |
1 | 60091 | 206.671944 |
3 | 13244 | 2836.900278 |
2 | 13244 | 5911.808333 |
Features#
How can I select features based on some attributes (a specific string, an explicit primitive type, a return type, a given depth)?#
You may wish to select a subset of your features based on some attributes.
Let’s say you wanted to select features that have the string amount in their names. You can check for this by using the get_name function on the feature definitions.
[45]:
es = ft.demo.load_mock_customer(return_entityset=True)
feature_defs = ft.dfs(
entityset=es, target_dataframe_name="customers", features_only=True
)
features_with_amount = []
for x in feature_defs:
if "amount" in x.get_name():
features_with_amount.append(x)
features_with_amount[0:5]
[45]:
[<Feature: MAX(transactions.amount)>,
<Feature: MEAN(transactions.amount)>,
<Feature: MIN(transactions.amount)>,
<Feature: SKEW(transactions.amount)>,
<Feature: STD(transactions.amount)>]
You might also want to only select features that are aggregation features.
[46]:
from featuretools import AggregationFeature
features_only_aggregations = []
for x in feature_defs:
if type(x) == AggregationFeature:
features_only_aggregations.append(x)
features_only_aggregations[0:5]
[46]:
[<Feature: COUNT(sessions)>,
<Feature: MODE(sessions.device)>,
<Feature: NUM_UNIQUE(sessions.device)>,
<Feature: COUNT(transactions)>,
<Feature: MAX(transactions.amount)>]
Also, you might only want to select features that are calculated at a certain depth. You can do this by using the get_depth function.
[47]:
features_only_depth_2 = []
for x in feature_defs:
if x.get_depth() == 2:
features_only_depth_2.append(x)
features_only_depth_2[0:5]
[47]:
[<Feature: MAX(sessions.COUNT(transactions))>,
<Feature: MAX(sessions.MEAN(transactions.amount))>,
<Feature: MAX(sessions.MIN(transactions.amount))>,
<Feature: MAX(sessions.NUM_UNIQUE(transactions.product_id))>,
<Feature: MAX(sessions.SKEW(transactions.amount))>]
Finally, you might only want features that return a certain type. You can do this by using the column_schema attribute. For more information on working with column schemas, take a look at Transitioning from Variables to Woodwork.
[48]:
features_only_numeric = []
for x in feature_defs:
if "numeric" in x.column_schema.semantic_tags:
features_only_numeric.append(x)
features_only_numeric[0:5]
[48]:
[<Feature: COUNT(sessions)>,
<Feature: NUM_UNIQUE(sessions.device)>,
<Feature: COUNT(transactions)>,
<Feature: MAX(transactions.amount)>,
<Feature: MEAN(transactions.amount)>]
Once you have your specific feature list, you can use ft.calculate_feature_matrix to generate a feature matrix for only those features. For our example, let’s use the features with the string amount in their names.
[49]:
feature_matrix = ft.calculate_feature_matrix(
entityset=es, features=features_with_amount
) # change to your specific feature list
feature_matrix.head()
[49]:
MAX(transactions.amount) | MEAN(transactions.amount) | MIN(transactions.amount) | SKEW(transactions.amount) | STD(transactions.amount) | SUM(transactions.amount) | MAX(sessions.MEAN(transactions.amount)) | MAX(sessions.MIN(transactions.amount)) | MAX(sessions.SKEW(transactions.amount)) | MAX(sessions.STD(transactions.amount)) | ... | STD(sessions.MAX(transactions.amount)) | STD(sessions.MEAN(transactions.amount)) | STD(sessions.MIN(transactions.amount)) | STD(sessions.SKEW(transactions.amount)) | STD(sessions.SUM(transactions.amount)) | SUM(sessions.MAX(transactions.amount)) | SUM(sessions.MEAN(transactions.amount)) | SUM(sessions.MIN(transactions.amount)) | SUM(sessions.SKEW(transactions.amount)) | SUM(sessions.STD(transactions.amount)) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
customer_id | |||||||||||||||||||||
5 | 149.02 | 80.375443 | 7.55 | -0.025941 | 44.095630 | 6349.66 | 94.481667 | 20.65 | 0.602209 | 51.149250 | ... | 7.928001 | 11.007471 | 4.961414 | 0.415426 | 402.775486 | 839.76 | 472.231119 | 86.49 | 0.014384 | 259.873954 |
4 | 149.95 | 80.070459 | 5.73 | -0.036348 | 45.068765 | 8727.68 | 110.450000 | 54.83 | 0.382868 | 54.293903 | ... | 3.514421 | 13.027258 | 16.960575 | 0.387884 | 235.992478 | 1157.99 | 649.657515 | 131.51 | 0.002764 | 356.125829 |
1 | 139.43 | 71.631905 | 5.81 | 0.019698 | 40.442059 | 9025.62 | 88.755625 | 26.36 | 0.640252 | 46.905665 | ... | 7.322191 | 13.759314 | 6.954507 | 0.589386 | 279.510713 | 1057.97 | 582.193117 | 78.59 | -0.476122 | 312.745952 |
3 | 149.15 | 67.060430 | 5.89 | 0.418230 | 43.683296 | 6236.62 | 82.109444 | 20.06 | 0.854976 | 50.110120 | ... | 10.724241 | 11.174282 | 5.424407 | 0.429374 | 219.021420 | 847.63 | 405.237462 | 66.21 | 2.286086 | 257.299895 |
2 | 146.81 | 77.422366 | 8.73 | 0.098259 | 37.705178 | 7200.28 | 96.581000 | 56.46 | 0.755711 | 47.935920 | ... | 17.221593 | 11.477071 | 15.874374 | 0.509798 | 251.609234 | 931.63 | 548.905851 | 154.60 | -0.277640 | 258.700528 |
5 rows × 37 columns
Above, notice how all the column names for our feature matrix contain the string amount.
How do I create where features?#
Sometimes, you might want to create features that are conditioned on a second value before they are calculated. This extra filter is called a “where clause”. You can create these features using the interesting_values of a column.
If you have categorical columns in your EntitySet, you can use add_interesting_values. This function will find interesting values for your categorical columns, which can then be used to generate “where” clauses.
First, let’s create our EntitySet.
[50]:
es = ft.demo.load_mock_customer(return_entityset=True)
es
[50]:
Entityset: transactions
DataFrames:
transactions [Rows: 500, Columns: 6]
products [Rows: 5, Columns: 3]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 5]
Relationships:
transactions.product_id -> products.product_id
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
Now we can add the interesting values for the categorical columns.
[51]:
es.add_interesting_values()
Now we can run DFS with the where_primitives argument to define which primitives to apply with where clauses. In this case, let’s use the primitive count. For this to work, the primitive count must be present in both agg_primitives and where_primitives.
[52]:
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
agg_primitives=["count"],
where_primitives=["count"],
trans_primitives=[],
)
feature_matrix.head()
[52]:
zip_code | COUNT(sessions) | COUNT(transactions) | COUNT(sessions WHERE device = desktop) | COUNT(sessions WHERE device = mobile) | COUNT(sessions WHERE device = tablet) | COUNT(sessions WHERE customers.zip_code = 60091) | COUNT(sessions WHERE customers.zip_code = 13244) | COUNT(transactions WHERE sessions.device = tablet) | COUNT(transactions WHERE sessions.device = mobile) | COUNT(transactions WHERE sessions.device = desktop) | |
---|---|---|---|---|---|---|---|---|---|---|---|
customer_id | |||||||||||
5 | 60091 | 6 | 79 | 2 | 3 | 1 | 6 | 0 | 14 | 36 | 29 |
4 | 60091 | 8 | 109 | 3 | 4 | 1 | 8 | 0 | 18 | 53 | 38 |
1 | 60091 | 8 | 126 | 2 | 3 | 3 | 8 | 0 | 43 | 56 | 27 |
3 | 13244 | 6 | 93 | 4 | 1 | 1 | 0 | 6 | 15 | 16 | 62 |
2 | 13244 | 7 | 93 | 3 | 2 | 2 | 0 | 7 | 28 | 31 | 34 |
We have now created some useful features. One example is COUNT(sessions WHERE device = tablet), which tells us how many sessions a customer completed on a tablet.
[53]:
feature_matrix[["COUNT(sessions WHERE device = tablet)"]]
[53]:
COUNT(sessions WHERE device = tablet) | |
---|---|
customer_id | |
5 | 1 |
4 | 1 |
1 | 3 |
3 | 1 |
2 | 2 |
Primitives#
What is the difference between the primitive types (Transform, GroupBy Transform, & Aggregation)?#
You might be curious to know the difference between the primitive groups. Let’s review the differences between transform, groupby transform, and aggregation primitives.
First, let’s create a simple EntitySet.
[54]:
import pandas as pd
import featuretools as ft
df = pd.DataFrame(
{
"id": [1, 2, 3, 4, 5, 6],
"time_index": pd.date_range("1/1/2019", periods=6, freq="D"),
"group": ["a", "a", "a", "a", "a", "a"],
"val": [5, 1, 10, 20, 6, 23],
}
)
es = ft.EntitySet()
es = es.add_dataframe(
dataframe_name="observations", dataframe=df, index="id", time_index="time_index"
)
es = es.normalize_dataframe(
base_dataframe_name="observations", new_dataframe_name="groups", index="group"
)
es.plot()
[54]:
After calling normalize_dataframe, the column “group” has the semantic tag “foreign_key” because it identifies another DataFrame. Alternatively, it could be set using the semantic_tags parameter when we first call es.add_dataframe().
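For reference, a minimal sketch of that alternative, tagging the column ourselves when adding the DataFrame (the normalize_dataframe call would still be needed afterwards to create the groups DataFrame and the relationship):
es = ft.EntitySet()
es = es.add_dataframe(
    dataframe_name="observations",
    dataframe=df,
    index="id",
    time_index="time_index",
    semantic_tags={"group": "foreign_key"},
)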
Transform Primitive#
The cum_sum primitive calculates the running sum in a list of numbers.
[55]:
from featuretools.primitives import CumSum
cum_sum = CumSum()
cum_sum([1, 2, 3, 4, 5]).tolist()
[55]:
[1, 3, 6, 10, 15]
If we apply it using the trans_primitives argument, it will calculate the cumulative sum over the entire observations DataFrame, like this:
[56]:
feature_matrix, feature_defs = ft.dfs(
target_dataframe_name="observations",
entityset=es,
agg_primitives=[],
trans_primitives=["cum_sum"],
groupby_trans_primitives=[],
)
feature_matrix
[56]:
group | val | CUM_SUM(val) | |
---|---|---|---|
id | |||
1 | a | 5 | 5.0 |
2 | a | 1 | 6.0 |
3 | a | 10 | 16.0 |
4 | a | 20 | 36.0 |
5 | a | 6 | 42.0 |
6 | a | 23 | 65.0 |
Groupby Transform Primitive#
If we apply it using groupby_trans_primitives, then DFS will first group by any foreign key columns before applying the transform primitive. As a result, we get the cumulative sum by group.
[57]:
feature_matrix, feature_defs = ft.dfs(
target_dataframe_name="observations",
entityset=es,
agg_primitives=[],
trans_primitives=[],
groupby_trans_primitives=["cum_sum"],
)
feature_matrix
[57]:
group | val | CUM_SUM(val) by group | |
---|---|---|---|
id | |||
1 | a | 5 | 5.0 |
2 | a | 1 | 6.0 |
3 | a | 10 | 16.0 |
4 | a | 20 | 36.0 |
5 | a | 6 | 42.0 |
6 | a | 23 | 65.0 |
Aggregation Primitive#
Finally, there is also the aggregation primitive “sum”. If we use sum, it will calculate the sum for the group at the cutoff time of each row. Because we didn’t specify a cutoff time, it will use all the data for each group for every row.
[58]:
feature_matrix, feature_defs = ft.dfs(
target_dataframe_name="observations",
entityset=es,
agg_primitives=["sum"],
trans_primitives=[],
cutoff_time_in_index=True,
groupby_trans_primitives=[],
)
feature_matrix
[58]:
group | val | groups.SUM(observations.val) | ||
---|---|---|---|---|
id | time | |||
1 | 2024-06-21 13:49:41.435775 | a | 5 | 65.0 |
2 | 2024-06-21 13:49:41.435775 | a | 1 | 65.0 |
3 | 2024-06-21 13:49:41.435775 | a | 10 | 65.0 |
4 | 2024-06-21 13:49:41.435775 | a | 20 | 65.0 |
5 | 2024-06-21 13:49:41.435775 | a | 6 | 65.0 |
6 | 2024-06-21 13:49:41.435775 | a | 23 | 65.0 |
If we set the cutoff time of each row to be its time index and then use sum as an aggregation primitive, the result is the same as cum_sum (though the rows appear in a different order in the displayed DataFrame).
[59]:
cutoff_time = df[["id", "time_index"]]
cutoff_time
[59]:
id | time_index | |
---|---|---|
1 | 1 | 2019-01-01 |
2 | 2 | 2019-01-02 |
3 | 3 | 2019-01-03 |
4 | 4 | 2019-01-04 |
5 | 5 | 2019-01-05 |
6 | 6 | 2019-01-06 |
[60]:
feature_matrix, feature_defs = ft.dfs(
target_dataframe_name="observations",
entityset=es,
agg_primitives=["sum"],
trans_primitives=[],
groupby_trans_primitives=[],
cutoff_time_in_index=True,
cutoff_time=cutoff_time,
)
feature_matrix
[60]:
group | val | groups.SUM(observations.val) | ||
---|---|---|---|---|
id | time | |||
1 | 2019-01-01 | a | 5 | 5.0 |
2 | 2019-01-02 | a | 1 | 6.0 |
3 | 2019-01-03 | a | 10 | 16.0 |
4 | 2019-01-04 | a | 20 | 36.0 |
5 | 2019-01-05 | a | 6 | 42.0 |
6 | 2019-01-06 | a | 23 | 65.0 |
How do I get a list of all Aggregation and Transform primitives?#
You can call featuretools.list_primitives() to get all the primitives in Featuretools. It will return a DataFrame with the name, type, and description of each primitive.
[61]:
df_primitives = ft.list_primitives()
df_primitives.head()
[61]:
name | type | description | valid_inputs | return_type | |
---|---|---|---|---|---|
0 | n_most_common | aggregation | Determines the `n` most common elements. | <ColumnSchema (Semantic Tags = ['category'])> | None |
1 | time_since_first | aggregation | Calculates the time elapsed since the first da... | <ColumnSchema (Logical Type = Datetime) (Seman... | <ColumnSchema (Logical Type = Double) (Semanti... |
2 | num_consecutive_greater_mean | aggregation | Determines the length of the longest subsequen... | <ColumnSchema (Semantic Tags = ['numeric'])> | <ColumnSchema (Logical Type = IntegerNullable)... |
3 | n_unique_months | aggregation | Determines the number of unique months. | <ColumnSchema (Logical Type = Datetime)> | <ColumnSchema (Logical Type = Integer) (Semant... |
4 | n_unique_days_of_calendar_year | aggregation | Determines the number of unique calendar days. | <ColumnSchema (Logical Type = Datetime)> | <ColumnSchema (Logical Type = Integer) (Semant... |
[62]:
df_primitives.tail()
[62]:
name | type | description | valid_inputs | return_type | |
---|---|---|---|---|---|
198 | expanding_trend | transform | Computes the expanding trend for events over a... | <ColumnSchema (Logical Type = Datetime) (Seman... | <ColumnSchema (Logical Type = Double) (Semanti... |
199 | date_to_time_zone | transform | Determines the timezone of a datetime. | <ColumnSchema (Logical Type = Datetime)> | <ColumnSchema (Logical Type = Categorical) (Se... |
200 | is_quarter_start | transform | Determines the is_quarter_start attribute of a... | <ColumnSchema (Logical Type = Datetime)> | <ColumnSchema (Logical Type = BooleanNullable)> |
201 | num_characters | transform | Calculates the number of characters in a given... | <ColumnSchema (Logical Type = NaturalLanguage)> | <ColumnSchema (Logical Type = IntegerNullable)... |
202 | less_than_scalar | transform | Determines if values are less than a given sca... | <ColumnSchema (Semantic Tags = ['numeric'])> | <ColumnSchema (Logical Type = BooleanNullable)> |
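Since the result is a regular DataFrame, you can filter on the type column to list only the aggregation or transform primitives:
# Split the primitives DataFrame by primitive type
agg_primitives = df_primitives[df_primitives["type"] == "aggregation"]
trans_primitives = df_primitives[df_primitives["type"] == "transform"]
agg_primitives["name"].head()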
How do I change the units for a TimeSince primitive?#
There are a few primitives in Featuretools that make time-based calculations. These include TimeSince, TimeSincePrevious, TimeSinceLast, and TimeSinceFirst.
You can change the units from the default of seconds to any valid time unit by doing the following:
[63]:
from featuretools.primitives import (
TimeSince,
TimeSinceFirst,
TimeSinceLast,
TimeSincePrevious,
)
time_since = TimeSince(unit="minutes")
time_since_previous = TimeSincePrevious(unit="hours")
time_since_last = TimeSinceLast(unit="days")
time_since_first = TimeSinceFirst(unit="years")
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
agg_primitives=[time_since_last, time_since_first],
trans_primitives=[time_since, time_since_previous],
)
Above, we changed the units to the following:
- minutes for TimeSince
- hours for TimeSincePrevious
- days for TimeSinceLast
- years for TimeSinceFirst
Now we can see that our feature matrix contains multiple features where the units for the TimeSince primitives are changed.
[64]:
feature_matrix.head()
[64]:
zip_code | TIME_SINCE_FIRST(sessions.session_start, unit=years) | TIME_SINCE_LAST(sessions.session_start, unit=days) | TIME_SINCE_FIRST(transactions.transaction_time, unit=years) | TIME_SINCE_LAST(transactions.transaction_time, unit=days) | TIME_SINCE(birthday, unit=minutes) | TIME_SINCE(join_date, unit=minutes) | TIME_SINCE_PREVIOUS(join_date, unit=hours) | TIME_SINCE_FIRST(transactions.sessions.session_start, unit=years) | TIME_SINCE_LAST(transactions.sessions.session_start, unit=days) | |
---|---|---|---|---|---|---|---|---|---|---|
customer_id | ||||||||||
5 | 60091 | 10.476929 | 3824.241397 | 10.476929 | 3824.236131 | 2.098595e+07 | 7.327222e+06 | NaN | 10.476929 | 3824.241397 |
4 | 60091 | 10.476908 | 3824.352740 | 10.476908 | 3824.345969 | 9.389630e+06 | 6.944741e+06 | 6374.673333 | 10.476908 | 3824.352740 |
1 | 60091 | 10.476878 | 3824.277509 | 10.476878 | 3824.266224 | 1.574147e+07 | 6.932341e+06 | 206.671944 | 10.476878 | 3824.277509 |
3 | 13244 | 10.476772 | 3824.212057 | 10.476772 | 3824.200772 | 1.082675e+07 | 6.762127e+06 | 2836.900278 | 10.476772 | 3824.212057 |
2 | 13244 | 10.476962 | 3824.235379 | 10.476962 | 3824.226351 | 1.990451e+07 | 6.407419e+06 | 5911.808333 | 10.476962 | 3824.235379 |
There are now features where the time unit is different from the default of seconds, such as TIME_SINCE_LAST(sessions.session_start, unit=days) and TIME_SINCE_FIRST(sessions.session_start, unit=years).
Modeling#
How does my train & test data work with Featuretools and sklearn’s train_test_split?#
You might be wondering how to properly use your train & test data with Featuretools, and sklearn’s train_test_split. There are a few things you must do to ensure accuracy with this workflow.
Let’s imagine we have a DataFrame for our train data, with labels.
[65]:
train_data = pd.DataFrame(
{
"customer_id": [1, 2, 3, 4, 5],
"age": [20, 25, 55, 22, 35],
"gender": ["f", "m", "m", "m", "m"],
"signup_date": pd.date_range("2010-01-01 01:41:50", periods=5, freq="25min"),
"labels": [False, True, True, False, False],
}
)
train_data.head()
[65]:
customer_id | age | gender | signup_date | labels | |
---|---|---|---|---|---|
0 | 1 | 20 | f | 2010-01-01 01:41:50 | False |
1 | 2 | 25 | m | 2010-01-01 02:06:50 | True |
2 | 3 | 55 | m | 2010-01-01 02:31:50 | True |
3 | 4 | 22 | m | 2010-01-01 02:56:50 | False |
4 | 5 | 35 | m | 2010-01-01 03:21:50 | False |
Now we can create our EntitySet for the train data, and create our features. To prevent label leakage, we will use cutoff times (see earlier question).
[66]:
es_train_data = ft.EntitySet(id="customer_data")
es_train_data = es_train_data.add_dataframe(
dataframe_name="customers", dataframe=train_data, index="customer_id"
)
cutoff_times = pd.DataFrame(
{
"customer_id": [1, 2, 3, 4, 5],
"time": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
}
)
feature_matrix_train, features = ft.dfs(
entityset=es_train_data,
target_dataframe_name="customers",
cutoff_time=cutoff_times,
cutoff_time_in_index=True,
)
feature_matrix_train.head()
[66]:
age | labels | DAY(signup_date) | MONTH(signup_date) | WEEKDAY(signup_date) | YEAR(signup_date) | ||
---|---|---|---|---|---|---|---|
customer_id | time | ||||||
1 | 2014-01-01 01:41:50 | 20 | False | 1 | 1 | 4 | 2010 |
2 | 2014-01-01 02:06:50 | 25 | True | 1 | 1 | 4 | 2010 |
3 | 2014-01-01 02:31:50 | 55 | True | 1 | 1 | 4 | 2010 |
4 | 2014-01-01 02:56:50 | 22 | False | 1 | 1 | 4 | 2010 |
5 | 2014-01-01 03:21:50 | 35 | False | 1 | 1 | 4 | 2010 |
We will also encode our feature matrix so it is compatible with machine learning algorithms.
[67]:
feature_matrix_train_enc, feature_enc = ft.encode_features(
feature_matrix_train, features
)
feature_matrix_train_enc.head()
[67]:
age | labels | DAY(signup_date) = 1 | DAY(signup_date) is unknown | MONTH(signup_date) = 1 | MONTH(signup_date) is unknown | WEEKDAY(signup_date) = 4 | WEEKDAY(signup_date) is unknown | YEAR(signup_date) = 2010 | YEAR(signup_date) is unknown | ||
---|---|---|---|---|---|---|---|---|---|---|---|
customer_id | time | ||||||||||
1 | 2014-01-01 01:41:50 | 20 | False | True | False | True | False | True | False | True | False |
2 | 2014-01-01 02:06:50 | 25 | True | True | False | True | False | True | False | True | False |
3 | 2014-01-01 02:31:50 | 55 | True | True | False | True | False | True | False | True | False |
4 | 2014-01-01 02:56:50 | 22 | False | True | False | True | False | True | False | True | False |
5 | 2014-01-01 03:21:50 | 35 | False | True | False | True | False | True | False | True | False |
[68]:
from sklearn.model_selection import train_test_split
X = feature_matrix_train_enc.drop(["labels"], axis=1)
y = feature_matrix_train_enc["labels"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
Now you can use the encoded feature matrix with sklearn’s train_test_split. This will allow you to train your model and tune your parameters.
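For instance, a minimal sketch of fitting a model on the split data; the choice of RandomForestClassifier is only an illustration, not part of the original example:
from sklearn.ensemble import RandomForestClassifier

# Train on the encoded training split and score on the held-out split
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)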
How are categorical columns encoded when splitting training and testing data?#
You might be wondering what happens when categorical columns are encoded with your training and testing data. In particular, you might be curious to know what happens if the train data has a categorical value that is not present in the testing data.
Let’s explore a simple example to see what happens during the encoding process.
[69]:
train_data = pd.DataFrame(
{
"customer_id": [1, 2, 3, 4, 5],
"product_purchased": ["coke zero", "car", "toothpaste", "coke zero", "car"],
}
)
es_train = ft.EntitySet(id="customer_data")
es_train = es_train.add_dataframe(
dataframe_name="customers",
dataframe=train_data,
index="customer_id",
logical_types={"product_purchased": ww.logical_types.Categorical},
)
feature_matrix_train, features = ft.dfs(
entityset=es_train, target_dataframe_name="customers"
)
feature_matrix_train
[69]:
product_purchased | |
---|---|
customer_id | |
1 | coke zero |
2 | car |
3 | toothpaste |
4 | coke zero |
5 | car |
We will use ft.encode_features to properly encode the product_purchased column.
[70]:
feature_matrix_train_encoded, features_encoded = ft.encode_features(
feature_matrix_train, features
)
feature_matrix_train_encoded.head()
[70]:
product_purchased = coke zero | product_purchased = car | product_purchased = toothpaste | product_purchased is unknown | |
---|---|---|---|---|
customer_id | ||||
1 | True | False | False | False |
2 | False | True | False | False |
3 | False | False | True | False |
4 | True | False | False | False |
5 | False | True | False | False |
Now let’s imagine we have some test data that doesn’t have one of the categorical values (toothpaste). Also, the test data has a value that wasn’t present in the train data (water).
[71]:
test_data = pd.DataFrame(
{
"customer_id": [6, 7, 8, 9, 10],
"product_purchased": ["coke zero", "car", "coke zero", "coke zero", "water"],
}
)
es_test = ft.EntitySet(id="customer_data")
es_test = es_test.add_dataframe(
dataframe_name="customers", dataframe=test_data, index="customer_id"
)
feature_matrix_test = ft.calculate_feature_matrix(
entityset=es_test, features=features_encoded
)
feature_matrix_test.head()
[71]:
product_purchased = coke zero | product_purchased = car | product_purchased = toothpaste | product_purchased is unknown | |
---|---|---|---|---|
customer_id | ||||
6 | True | False | False | False |
7 | False | True | False | False |
8 | True | False | False | False |
9 | True | False | False | False |
10 | False | False | False | True |
As seen above, we were able to successfully handle the encoding and deal with the following complications:
- toothpaste was present in the training data but not present in the testing data
- water was present in the test data but not present in the training data
Errors & Warnings#
Why am I getting this error ‘Index is not unique on dataframe’?#
You may be trying to create your EntitySet and run into this error:
IndexError: Index column must be unique
This is because each dataframe in your EntitySet needs a unique index.
Let’s look at a simple example.
[72]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
product_df
[72]:
id | rating | |
---|---|---|
0 | 1 | 3.5 |
1 | 2 | 4.0 |
2 | 3 | 4.5 |
3 | 4 | 1.5 |
4 | 4 | 5.0 |
Notice how the id column has a duplicate value of 4. If you try to add this dataframe to the EntitySet, you will run into the following error.
es = ft.EntitySet(id="product_data")
es = es.add_dataframe(dataframe_name="products",
dataframe=product_df,
index="id")
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-78-854fbaf207f8> in <module>
1 es = ft.EntitySet(id="product_data")
----> 2 es = es.add_dataframe(dataframe_name="products",
3 dataframe=product_df,
4 index="id")
~/Code/featuretools/featuretools/entityset/entityset.py in add_dataframe(self, dataframe, dataframe_name, index, logical_types, semantic_tags, make_index, time_index, secondary_time_index, already_sorted)
625 index_was_created, index, dataframe = _get_or_create_index(index, make_index, dataframe)
626
--> 627 dataframe.ww.init(name=dataframe_name,
628 index=index,
629 time_index=time_index,
/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in init(self, index, time_index, logical_types, already_sorted, schema, validate, use_standard_tags, **kwargs)
94 """
95 if validate:
---> 96 _validate_accessor_params(self._dataframe, index, time_index, logical_types, schema, use_standard_tags)
97 if schema is not None:
98 self._schema = schema
/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in _validate_accessor_params(dataframe, index, time_index, logical_types, schema, use_standard_tags)
877 # We ignore these parameters if a schema is passed
878 if index is not None:
--> 879 _check_index(dataframe, index)
880 if logical_types:
881 _check_logical_types(dataframe.columns, logical_types)
/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in _check_index(dataframe, index)
903 # User specifies an index that is in the dataframe but not unique
--> 904 raise IndexError('Index column must be unique')
905
906
IndexError: Index column must be unique
To fix the above error, you can apply one of the following solutions:
Solution #1 - You can create a unique index on your DataFrame.
[73]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
product_df
[73]:
id | rating | |
---|---|---|
0 | 1 | 3.5 |
1 | 2 | 4.0 |
2 | 3 | 4.5 |
3 | 4 | 1.5 |
4 | 5 | 5.0 |
Notice how we now have a unique index column called id.
[74]:
es = es.add_dataframe(dataframe_name="products", dataframe=product_df, index="id")
es
[74]:
Entityset: transactions
DataFrames:
transactions [Rows: 500, Columns: 6]
products [Rows: 5, Columns: 2]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 5]
Relationships:
transactions.product_id -> products.product_id
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
As seen above, we can now add the DataFrame to our EntitySet without an error, because we created a unique index in our DataFrame.
Solution #2 - Set make_index to True in your call to add_dataframe to create a new index on that data. The make_index argument creates a unique index for each row based on that row’s position relative to all the other rows.
[75]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
es = ft.EntitySet(id="product_data")
es = es.add_dataframe(
dataframe_name="products", dataframe=product_df, index="product_id", make_index=True
)
es["products"]
[75]:
product_id | id | rating | |
---|---|---|---|
0 | 0 | 1 | 3.5 |
1 | 1 | 2 | 4.0 |
2 | 2 | 3 | 4.5 |
3 | 3 | 4 | 1.5 |
4 | 4 | 4 | 5.0 |
As seen above, we added our DataFrame to our EntitySet without an error by using the make_index argument.
Why am I getting the following warning ‘Using training_window but last_time_index is not set’?#
If you are using a training window and you haven’t set a last_time_index for your dataframe, you will get this warning. The training window in Featuretools limits the amount of past data that can be used while calculating a particular feature vector.
You can add the last_time_index to all dataframes automatically by calling your_entityset.add_last_time_indexes() after you create your EntitySet. This will remove the warning.
[76]:
es = ft.demo.load_mock_customer(return_entityset=True)
es.add_last_time_indexes()
Now we can run DFS without getting the warning.
[77]:
cutoff_times = pd.DataFrame()
cutoff_times["customer_id"] = [1, 2, 3, 1]
cutoff_times["time"] = pd.to_datetime(
["2014-1-1 04:00", "2014-1-1 05:00", "2014-1-1 06:00", "2014-1-1 08:00"]
)
cutoff_times["label"] = [True, True, False, True]
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_dataframe_name="customers",
cutoff_time=cutoff_times,
cutoff_time_in_index=True,
training_window="1 hour",
)
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f5439781040> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f5439781940> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f5439781820> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f543977dee0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f543977d8b0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
last_time_index vs. time_index#
The time_index is when the instance was first known. The last_time_index is when the instance appears in the data for the last time. For example, a customer's session has multiple transactions that can happen at different points in time. If we are trying to count the number of sessions a user has in a given time period, we often want to count all the sessions that had any transaction during the training window. To accomplish this, we need to know not only when a session starts (time_index), but also when it ends (last_time_index). The last time that an instance appears in the data is stored as the last_time_index of a dataframe. Once the last_time_index has been set, Featuretools checks whether the last_time_index falls after the start of the training window. That, combined with the cutoff time, allows DFS to determine which data is relevant for a given training window.
Why am I getting errors with Featuretools on Google Colab?#
Google Colab has Featuretools 0.4.1 installed by default. You may run into issues following our newest guides or latest documentation while using an older version of Featuretools, so we suggest upgrading to the latest version by running the following in your Colab notebook:
!pip install -U featuretools
You may need to restart the runtime by selecting Runtime -> Restart Runtime. You can check the installed Featuretools version with the following:
import featuretools as ft
print(ft.__version__)
You should see a version greater than 0.4.1.