What is Featuretools?#

Featuretools

Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

5 Minute Quick Start#

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

[1]:
import featuretools as ft

Load Mock Data#

[2]:
data = ft.demo.load_mock_customer()

Prepare data#

In this toy dataset, there are 3 DataFrames.

  • customers: unique customers who had sessions

  • sessions: unique sessions and associated attributes

  • transactions: list of events in this session

[3]:
customers_df = data["customers"]
customers_df
[3]:
customer_id zip_code join_date birthday
0 1 60091 2011-04-17 10:48:33 1994-07-18
1 2 13244 2012-04-15 23:31:04 1986-08-18
2 3 13244 2011-08-13 15:42:34 2003-11-21
3 4 60091 2011-04-08 20:08:14 2006-08-15
4 5 60091 2010-07-17 05:27:50 1984-07-28
[4]:
sessions_df = data["sessions"]
sessions_df.sample(5)
[4]:
session_id customer_id device session_start
13 14 1 tablet 2014-01-01 03:28:00
6 7 3 tablet 2014-01-01 01:39:40
1 2 5 mobile 2014-01-01 00:17:20
28 29 1 mobile 2014-01-01 07:10:05
24 25 3 desktop 2014-01-01 05:59:40
[5]:
transactions_df = data["transactions"]
transactions_df.sample(5)
[5]:
transaction_id session_id transaction_time product_id amount
74 417 5 2014-01-01 01:20:10 1 139.20
231 229 17 2014-01-01 04:10:15 2 90.79
434 127 31 2014-01-01 07:50:10 3 62.35
420 359 30 2014-01-01 07:35:00 3 72.70
54 249 4 2014-01-01 00:58:30 4 43.59

First, we specify a dictionary with all the DataFrames in our dataset. The DataFrames are passed in with their index column and time index column if one exists for the DataFrame.

[6]:
dataframes = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}

Second, we specify how the DataFrames are related. When two DataFrames have a one-to-many relationship, we call the “one” DataFrame, the “parent DataFrame”. A relationship between a parent and child is defined like this:

(parent_dataframe, parent_column, child_dataframe, child_column)

In this dataset we have two relationships

[7]:
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]

Note

To manage setting up DataFrames and relationships, we recommend using the EntitySet class which offers convenient APIs for managing data like this. See Representing Data with EntitySets for more information.

Run Deep Feature Synthesis#

A minimal input to DFS is a dictionary of DataFrames, a list of relationships, and the name of the target DataFrame whose features we want to calculate. The ouput of DFS is a feature matrix and the corresponding list of feature definitions.

Let’s first create a feature matrix for each customer in the data

[8]:
feature_matrix_customers, features_defs = ft.dfs(
    dataframes=dataframes,
    relationships=relationships,
    target_dataframe_name="customers",
)
feature_matrix_customers
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f0742dbbf70> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f0742dbbf70> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f0742dbbf70> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f0742dbbf70> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f0742dbbf70> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
[8]:
zip_code COUNT(sessions) MODE(sessions.device) NUM_UNIQUE(sessions.device) COUNT(transactions) MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) MODE(transactions.product_id) NUM_UNIQUE(transactions.product_id) ... STD(sessions.SKEW(transactions.amount)) STD(sessions.SUM(transactions.amount)) SUM(sessions.MAX(transactions.amount)) SUM(sessions.MEAN(transactions.amount)) SUM(sessions.MIN(transactions.amount)) SUM(sessions.NUM_UNIQUE(transactions.product_id)) SUM(sessions.SKEW(transactions.amount)) SUM(sessions.STD(transactions.amount)) MODE(transactions.sessions.device) NUM_UNIQUE(transactions.sessions.device)
customer_id
1 60091 8 mobile 3 126 139.43 71.631905 5.81 4 5 ... 0.589386 279.510713 1057.97 582.193117 78.59 40.0 -0.476122 312.745952 mobile 3
2 13244 7 desktop 3 93 146.81 77.422366 8.73 4 5 ... 0.509798 251.609234 931.63 548.905851 154.60 35.0 -0.277640 258.700528 desktop 3
3 13244 6 desktop 3 93 149.15 67.060430 5.89 1 5 ... 0.429374 219.021420 847.63 405.237462 66.21 29.0 2.286086 257.299895 desktop 3
4 60091 8 mobile 3 109 149.95 80.070459 5.73 2 5 ... 0.387884 235.992478 1157.99 649.657515 131.51 37.0 0.002764 356.125829 mobile 3
5 60091 6 mobile 3 79 149.02 80.375443 7.55 5 5 ... 0.415426 402.775486 839.76 472.231119 86.49 30.0 0.014384 259.873954 mobile 3

5 rows × 75 columns

We now have dozens of new features to describe a customer’s behavior.

Change target DataFrame#

One of the reasons DFS is so powerful is that it can create a feature matrix for any DataFrame in our EntitySet. For example, if we wanted to build features for sessions.

[10]:
feature_matrix_sessions, features_defs = ft.dfs(
    dataframes=dataframes, relationships=relationships, target_dataframe_name="sessions"
)
feature_matrix_sessions.head(5)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f0742dbbf70> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f0742dbfee0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f0742d44040> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f0742dbf5e0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f0742dbbf70> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f0742dbf700> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  to_merge = base_frame.groupby(
[10]:
customer_id device COUNT(transactions) MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) MODE(transactions.product_id) NUM_UNIQUE(transactions.product_id) SKEW(transactions.amount) STD(transactions.amount) ... customers.STD(transactions.amount) customers.SUM(transactions.amount) customers.DAY(birthday) customers.DAY(join_date) customers.MONTH(birthday) customers.MONTH(join_date) customers.WEEKDAY(birthday) customers.WEEKDAY(join_date) customers.YEAR(birthday) customers.YEAR(join_date)
session_id
1 2 desktop 16 141.66 76.813125 20.91 3 5 0.295458 41.600976 ... 37.705178 7200.28 18 15 8 4 0 6 1986 2012
2 5 mobile 10 135.25 74.696000 9.32 5 5 -0.160550 45.893591 ... 44.095630 6349.66 28 17 7 7 5 5 1984 2010
3 4 mobile 15 147.73 88.600000 8.70 1 5 -0.324012 46.240016 ... 45.068765 8727.68 15 8 8 4 1 4 2006 2011
4 1 mobile 25 129.00 64.557200 6.29 5 5 0.234349 40.187205 ... 40.442059 9025.62 18 17 7 4 0 6 1994 2011
5 4 mobile 11 139.20 70.638182 7.43 5 5 0.336381 48.918663 ... 45.068765 8727.68 15 8 8 4 1 4 2006 2011

5 rows × 44 columns

Understanding Feature Output#

In general, Featuretools references generated features through the feature name. In order to make features easier to understand, Featuretools offers two additional tools, featuretools.graph_feature() and featuretools.describe_feature(), to help explain what a feature is and the steps Featuretools took to generate it. Let’s look at this example feature:

[11]:
feature = features_defs[18]
feature
[11]:
<Feature: MODE(transactions.WEEKDAY(transaction_time))>
Feature lineage graphs#

Feature lineage graphs visually walk through feature generation. Starting from the base data, they show step by step the primitives applied and intermediate features generated to create the final feature.

[12]:
ft.graph_feature(feature)
[12]:
_images/index_22_0.svg
digraph "MODE(transactions.WEEKDAY(transaction_time))" {
	graph [bb="0,0,1456,156",
		rankdir=LR
	];
	node [label="\N",
		shape=box
	];
	edge [arrowhead=none,
		dir=forward,
		style=dotted
	];
	{
		graph [rank=min];
		"1_WEEKDAY(transaction_time)_weekday"	[height=0.94444,
			label=<<FONT POINT-SIZE="12"><B>Step 1:</B>   Transform<BR></BR></FONT>WEEKDAY>,
			pos="140,41",
			shape=diamond,
			width=3.8889];
	}
	sessions	[height=1.1389,
		label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="10">
    <TR>
        <TD colspan="1" bgcolor="#A9A9A9"><B>★ sessions (target)</B></TD>
    </TR>
    <TR>
        <TD ALIGN="LEFT" port="MODE(transactions.WEEKDAY(transaction_time))" BGCOLOR="#D9EAD3">MODE(transactions.WEEKDAY(transaction_time))</TD>
    </TR>
</TABLE>>,
		pos="1258,79",
		shape=plaintext,
		width=5.5];
	transactions	[height=2.1667,
		label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="10">
    <TR>
        <TD colspan="1" bgcolor="#A9A9A9"><B>transactions</B></TD>
    </TR><TR><TD ALIGN="LEFT" port="session_id">session_id</TD></TR>
<TR><TD ALIGN="LEFT" port="transaction_time">transaction_time</TD></TR>
<TR><TD ALIGN="LEFT" port="WEEKDAY(transaction_time)">WEEKDAY(transaction_time)</TD></TR>
</TABLE>>,
		pos="438.5,78",
		shape=plaintext,
		width=3.4028];
	transactions:transaction_time -> "1_WEEKDAY(transaction_time)_weekday"	[arrowhead="",
		pos="e,229.94,53.305 323.5,59 296.4,59 267.14,57.01 240.16,54.353",
		style=solid];
	"MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id"	[height=0.52778,
		label="group by
session_id",
		pos="641.5,60",
		width=1.2361];
	transactions:"WEEKDAY(transaction_time)" -> "MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id"	[arrowhead="",
		pos="e,612.79,40.777 554.5,22 571.67,22 589.24,28.414 604,35.971",
		style=solid];
	transactions:session_id -> "MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id"	[pos="554.5,97 574.82,97 595.8,88.179 611.95,79.15"];
	"0_MODE(transactions.WEEKDAY(transaction_time))_mode"	[height=0.94444,
		label=<<FONT POINT-SIZE="12"><B>Step 2:</B>   Aggregation<BR></BR></FONT>MODE>,
		pos="873,60",
		shape=diamond,
		width=4.1944];
	"0_MODE(transactions.WEEKDAY(transaction_time))_mode" -> sessions:"MODE(transactions.WEEKDAY(transaction_time))"	[arrowhead="",
		pos="e,1067,60 1024.1,60 1035.1,60 1046.1,60 1056.8,60",
		style=solid];
	"1_WEEKDAY(transaction_time)_weekday" -> transactions:"WEEKDAY(transaction_time)"	[arrowhead="",
		pos="e,323.5,22 227.61,28.274 254.68,25.178 284.94,22.601 313.48,22.091",
		style=solid];
	"MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id" -> "0_MODE(transactions.WEEKDAY(transaction_time))_mode"	[arrowhead="",
		pos="e,721.54,60 686.21,60 693.96,60 702.42,60 711.32,60",
		style=solid];
}
Feature descriptions#

Featuretools can also automatically generate English sentence descriptions of features. Feature descriptions help to explain what a feature is, and can be further improved by including manually defined custom definitions. See Generating Feature Descriptions for more details on how to customize automatically generated feature descriptions.

[13]:
ft.describe_feature(feature)
[13]:
'The most frequently occurring value of the day of the week of the "transaction_time" of all instances of "transactions" for each "session_id" in "sessions".'

What’s next?#

Table of contents#

Resources and References