What is Featuretools?

Featuretools

Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

In [1]: import featuretools as ft

Load Mock Data

In [2]: data = ft.demo.load_mock_customer()

Prepare data

In this toy dataset, there are 3 tables. Each table is called an entity in Featuretools.

  • customers: unique customers who had sessions

  • sessions: unique sessions and associated attributes

  • transactions: list of events in this session

In [3]: customers_df = data["customers"]

In [4]: customers_df
Out[4]: 
   customer_id zip_code           join_date date_of_birth
0            1    60091 2011-04-17 10:48:33    1994-07-18
1            2    13244 2012-04-15 23:31:04    1986-08-18
2            3    13244 2011-08-13 15:42:34    2003-11-21
3            4    60091 2011-04-08 20:08:14    2006-08-15
4            5    60091 2010-07-17 05:27:50    1984-07-28

In [5]: sessions_df = data["sessions"]

In [6]: sessions_df.sample(5)
Out[6]: 
    session_id  customer_id   device       session_start
13          14            1   tablet 2014-01-01 03:28:00
6            7            3   tablet 2014-01-01 01:39:40
1            2            5   mobile 2014-01-01 00:17:20
28          29            1   mobile 2014-01-01 07:10:05
24          25            3  desktop 2014-01-01 05:59:40

In [7]: transactions_df = data["transactions"]

In [8]: transactions_df.sample(5)
Out[8]: 
     transaction_id  session_id    transaction_time product_id  amount
74              232           5 2014-01-01 01:20:10          1  139.20
231              27          17 2014-01-01 04:10:15          2   90.79
434              36          31 2014-01-01 07:50:10          3   62.35
420              56          30 2014-01-01 07:35:00          3   72.70
54              444           4 2014-01-01 00:58:30          4   43.59

First, we specify a dictionary with all the entities in our dataset.

In [9]: entities = {
   ...:    "customers" : (customers_df, "customer_id"),
   ...:    "sessions" : (sessions_df, "session_id", "session_start"),
   ...:    "transactions" : (transactions_df, "transaction_id", "transaction_time")
   ...: }
   ...: 

Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the “one” enitity, the “parent entity”. A relationship between a parent and child is defined like this:

(parent_entity, parent_variable, child_entity, child_variable)

In this dataset we have two relationships

In [10]: relationships = [("sessions", "session_id", "transactions", "session_id"),
   ....:                  ("customers", "customer_id", "sessions", "customer_id")]
   ....: 

Note

To manage setting up entities and relationships, we recommend using the EntitySet class which offers convenient APIs for managing data like this. See Representing Data with EntitySets for more information.

Run Deep Feature Synthesis

A minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The ouput of DFS is a feature matrix and the corresponding list of feature definitions.

Let’s first create a feature matrix for each customer in the data

In [11]: feature_matrix_customers, features_defs = ft.dfs(entities=entities,
   ....:                                                  relationships=relationships,
   ....:                                                  target_entity="customers")
   ....: 

In [12]: feature_matrix_customers
Out[12]: 
            zip_code  COUNT(sessions) MODE(sessions.device)  NUM_UNIQUE(sessions.device)  COUNT(transactions)  MAX(transactions.amount)  MEAN(transactions.amount)  MIN(transactions.amount)  MODE(transactions.product_id)  NUM_UNIQUE(transactions.product_id)  SKEW(transactions.amount)  STD(transactions.amount)  SUM(transactions.amount)  DAY(date_of_birth)  DAY(join_date)  MONTH(date_of_birth)  MONTH(join_date)  WEEKDAY(date_of_birth)  WEEKDAY(join_date)  YEAR(date_of_birth)  YEAR(join_date)  MAX(sessions.COUNT(transactions))  MAX(sessions.MEAN(transactions.amount))  MAX(sessions.MIN(transactions.amount))  MAX(sessions.NUM_UNIQUE(transactions.product_id))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.STD(transactions.amount))  MAX(sessions.SUM(transactions.amount))  MEAN(sessions.COUNT(transactions))  MEAN(sessions.MAX(transactions.amount))  MEAN(sessions.MEAN(transactions.amount))  MEAN(sessions.MIN(transactions.amount))  MEAN(sessions.NUM_UNIQUE(transactions.product_id))  MEAN(sessions.SKEW(transactions.amount))  MEAN(sessions.STD(transactions.amount))  MEAN(sessions.SUM(transactions.amount))  MIN(sessions.COUNT(transactions))  MIN(sessions.MAX(transactions.amount))  MIN(sessions.MEAN(transactions.amount))  MIN(sessions.NUM_UNIQUE(transactions.product_id))  MIN(sessions.SKEW(transactions.amount))  MIN(sessions.STD(transactions.amount))  MIN(sessions.SUM(transactions.amount))  MODE(sessions.DAY(session_start))  MODE(sessions.MODE(transactions.product_id))  MODE(sessions.MONTH(session_start))  MODE(sessions.WEEKDAY(session_start))  MODE(sessions.YEAR(session_start))  NUM_UNIQUE(sessions.DAY(session_start))  NUM_UNIQUE(sessions.MODE(transactions.product_id))  NUM_UNIQUE(sessions.MONTH(session_start))  NUM_UNIQUE(sessions.WEEKDAY(session_start))  NUM_UNIQUE(sessions.YEAR(session_start))  SKEW(sessions.COUNT(transactions))  SKEW(sessions.MAX(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  SKEW(sessions.MIN(transactions.amount))  SKEW(sessions.NUM_UNIQUE(transactions.product_id))  SKEW(sessions.STD(transactions.amount))  SKEW(sessions.SUM(transactions.amount))  STD(sessions.COUNT(transactions))  STD(sessions.MAX(transactions.amount))  STD(sessions.MEAN(transactions.amount))  STD(sessions.MIN(transactions.amount))  STD(sessions.NUM_UNIQUE(transactions.product_id))  STD(sessions.SKEW(transactions.amount))  STD(sessions.SUM(transactions.amount))  SUM(sessions.MAX(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  SUM(sessions.MIN(transactions.amount))  SUM(sessions.NUM_UNIQUE(transactions.product_id))  SUM(sessions.SKEW(transactions.amount))  SUM(sessions.STD(transactions.amount))  MODE(transactions.sessions.customer_id) MODE(transactions.sessions.device)  NUM_UNIQUE(transactions.sessions.customer_id)  NUM_UNIQUE(transactions.sessions.device)
customer_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
1              60091                8                mobile                            3                  126                    139.43                  71.631905                      5.81                              4                                    5                   0.019698                 40.442059                   9025.62                  18              17                     7                 4                       0                   6                 1994             2011                                 25                                88.755625                                   26.36                                                  5                                 0.640252                               46.905665                                 1613.93                           15.750000                               132.246250                                 72.774140                                 9.823750                                           5.000000                                  -0.059515                                39.093244                              1128.202500                                 12                                  118.90                                50.623125                                                  5                                -1.038434                               30.450261                                  809.97                                  1                                             4                                    1                                      2                                2014                                        1                                                  4                                           1                                            1                                         1                            1.946018                                -0.780493                                 -0.424949                                 2.440005                                           0.000000                                 -0.312355                                 0.778170                           4.062019                                7.322191                                13.759314                                6.954507                                           0.000000                                 0.589386                              279.510713                                 1057.97                               582.193117                                   78.59                                                 40                                -0.476122                              312.745952                                        1                             mobile                                              1                                         3
2              13244                7               desktop                            3                   93                    146.81                  77.422366                      8.73                              4                                    5                   0.098259                 37.705178                   7200.28                  18              15                     8                 4                       0                   6                 1986             2012                                 18                                96.581000                                   56.46                                                  5                                 0.755711                               47.935920                                 1320.64                           13.285714                               133.090000                                 78.415122                                22.085714                                           5.000000                                  -0.039663                                36.957218                              1028.611429                                  8                                  100.04                                61.910000                                                  5                                -0.763603                               27.839228                                  634.84                                  1                                             3                                    1                                      2                                2014                                        1                                                  4                                           1                                            1                                         1                           -0.303276                                -1.539467                                  0.235296                                 2.154929                                           0.000000                                  0.013087                                -0.440929                           3.450328                               17.221593                                11.477071                               15.874374                                           0.000000                                 0.509798                              251.609234                                  931.63                               548.905851                                  154.60                                                 35                                -0.277640                              258.700528                                        2                            desktop                                              1                                         3
3              13244                6               desktop                            3                   93                    149.15                  67.060430                      5.89                              1                                    5                   0.418230                 43.683296                   6236.62                  21              13                    11                 8                       4                   5                 2003             2011                                 18                                82.109444                                   20.06                                                  5                                 0.854976                               50.110120                                 1477.97                           15.500000                               141.271667                                 67.539577                                11.035000                                           4.833333                                   0.381014                                42.883316                              1039.436667                                 11                                  126.74                                55.579412                                                  4                                -0.289466                               35.704680                                  889.21                                  1                                             1                                    1                                      2                                2014                                        1                                                  4                                           1                                            1                                         1                           -1.507217                                -0.941078                                  0.678544                                 1.000771                                          -2.449490                                 -0.245703                                 2.246479                           2.428992                               10.724241                                11.174282                                5.424407                                           0.408248                                 0.429374                              219.021420                                  847.63                               405.237462                                   66.21                                                 29                                 2.286086                              257.299895                                        3                            desktop                                              1                                         3
4              60091                8                mobile                            3                  109                    149.95                  80.070459                      5.73                              2                                    5                  -0.036348                 45.068765                   8727.68                  15               8                     8                 4                       1                   4                 2006             2011                                 18                               110.450000                                   54.83                                                  5                                 0.382868                               54.293903                                 1351.46                           13.625000                               144.748750                                 81.207189                                16.438750                                           4.625000                                   0.000346                                44.515729                              1090.960000                                 10                                  139.20                                70.638182                                                  4                                -0.711744                               29.026424                                  771.68                                  1                                             1                                    1                                      2                                2014                                        1                                                  5                                           1                                            1                                         1                            0.282488                                 0.027256                                  1.980948                                 2.103510                                          -0.644061                                 -1.065663                                -0.391805                           3.335416                                3.514421                                13.027258                               16.960575                                           0.517549                                 0.387884                              235.992478                                 1157.99                               649.657515                                  131.51                                                 37                                 0.002764                              356.125829                                        4                             mobile                                              1                                         3
5              60091                6                mobile                            3                   79                    149.02                  80.375443                      7.55                              5                                    5                  -0.025941                 44.095630                   6349.66                  28              17                     7                 7                       5                   5                 1984             2010                                 18                                94.481667                                   20.65                                                  5                                 0.602209                               51.149250                                 1700.67                           13.166667                               139.960000                                 78.705187                                14.415000                                           5.000000                                   0.002397                                43.312326                              1058.276667                                  8                                  128.51                                66.666667                                                  5                                -0.539060                               36.734681                                  543.18                                  1                                             3                                    1                                      2                                2014                                        1                                                  5                                           1                                            1                                         1                           -0.317685                                -0.333796                                  0.335175                                -0.470410                                           0.000000                                  0.204548                                 0.472342                           3.600926                                7.928001                                11.007471                                4.961414                                           0.000000                                 0.415426                              402.775486                                  839.76                               472.231119                                   86.49                                                 30                                 0.014384                              259.873954                                        5                             mobile                                              1                                         3

We now have dozens of new features to describe a customer’s behavior.

Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, if we wanted to build features for sessions.

In [13]: feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
   ....:                                                 relationships=relationships,
   ....:                                                 target_entity="sessions")
   ....: 

In [14]: feature_matrix_sessions.head(5)
Out[14]: 
            customer_id   device  COUNT(transactions)  MAX(transactions.amount)  MEAN(transactions.amount)  MIN(transactions.amount)  MODE(transactions.product_id)  NUM_UNIQUE(transactions.product_id)  SKEW(transactions.amount)  STD(transactions.amount)  SUM(transactions.amount)  DAY(session_start)  MONTH(session_start)  WEEKDAY(session_start)  YEAR(session_start) customers.zip_code  MODE(transactions.DAY(transaction_time))  MODE(transactions.MONTH(transaction_time))  MODE(transactions.WEEKDAY(transaction_time))  MODE(transactions.YEAR(transaction_time))  NUM_UNIQUE(transactions.DAY(transaction_time))  NUM_UNIQUE(transactions.MONTH(transaction_time))  NUM_UNIQUE(transactions.WEEKDAY(transaction_time))  NUM_UNIQUE(transactions.YEAR(transaction_time))  customers.COUNT(sessions) customers.MODE(sessions.device)  customers.NUM_UNIQUE(sessions.device)  customers.COUNT(transactions)  customers.MAX(transactions.amount)  customers.MEAN(transactions.amount)  customers.MIN(transactions.amount)  customers.MODE(transactions.product_id)  customers.NUM_UNIQUE(transactions.product_id)  customers.SKEW(transactions.amount)  customers.STD(transactions.amount)  customers.SUM(transactions.amount)  customers.DAY(date_of_birth)  customers.DAY(join_date)  customers.MONTH(date_of_birth)  customers.MONTH(join_date)  customers.WEEKDAY(date_of_birth)  customers.WEEKDAY(join_date)  customers.YEAR(date_of_birth)  customers.YEAR(join_date)
session_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
1                     2  desktop                   16                    141.66                  76.813125                     20.91                              3                                    5                   0.295458                 41.600976                   1229.01                   1                     1                       2                 2014              13244                                         1                                           1                                             2                                       2014                                               1                                                 1                                                  1                                                 1                          7                         desktop                                      3                             93                              146.81                            77.422366                                8.73                                        4                                              5                             0.098259                           37.705178                             7200.28                            18                        15                               8                           4                                 0                             6                           1986                       2012
2                     5   mobile                   10                    135.25                  74.696000                      9.32                              5                                    5                  -0.160550                 45.893591                    746.96                   1                     1                       2                 2014              60091                                         1                                           1                                             2                                       2014                                               1                                                 1                                                  1                                                 1                          6                          mobile                                      3                             79                              149.02                            80.375443                                7.55                                        5                                              5                            -0.025941                           44.095630                             6349.66                            28                        17                               7                           7                                 5                             5                           1984                       2010
3                     4   mobile                   15                    147.73                  88.600000                      8.70                              1                                    5                  -0.324012                 46.240016                   1329.00                   1                     1                       2                 2014              60091                                         1                                           1                                             2                                       2014                                               1                                                 1                                                  1                                                 1                          8                          mobile                                      3                            109                              149.95                            80.070459                                5.73                                        2                                              5                            -0.036348                           45.068765                             8727.68                            15                         8                               8                           4                                 1                             4                           2006                       2011
4                     1   mobile                   25                    129.00                  64.557200                      6.29                              5                                    5                   0.234349                 40.187205                   1613.93                   1                     1                       2                 2014              60091                                         1                                           1                                             2                                       2014                                               1                                                 1                                                  1                                                 1                          8                          mobile                                      3                            126                              139.43                            71.631905                                5.81                                        4                                              5                             0.019698                           40.442059                             9025.62                            18                        17                               7                           4                                 0                             6                           1994                       2011
5                     4   mobile                   11                    139.20                  70.638182                      7.43                              5                                    5                   0.336381                 48.918663                    777.02                   1                     1                       2                 2014              60091                                         1                                           1                                             2                                       2014                                               1                                                 1                                                  1                                                 1                          8                          mobile                                      3                            109                              149.95                            80.070459                                5.73                                        2                                              5                            -0.036348                           45.068765                             8727.68                            15                         8                               8                           4                                 1                             4                           2006                       2011

Understanding Feature Output

In general, Featuretools references generated features through the feature name. In order to make features easier to understand, Featuretools offers two additional tools, featuretools.graph_feature() and featuretools.describe_feature(), to help explain what a feature is and the steps Featuretools took to generate it. [let’s look at this example feature]

In [15]: feature = features_defs[18]

In [16]: feature
Out[16]: <Feature: MODE(transactions.WEEKDAY(transaction_time))>

Feature lineage graphs

Feature lineage graphs visually walk through feature generation. Starting from the base data, they show step by step the primitives applied and intermediate features generated to create the final feature.

In [17]: ft.graph_feature(feature)
Out[17]: <graphviz.dot.Digraph at 0x7f02995a0e10>
digraph "MODE(transactions.WEEKDAY(transaction_time))" {
	graph [bb="0,0,1213,156",
		rankdir=LR
	];
	node [label="\N",
		shape=box
	];
	edge [arrowhead=none,
		dir=forward,
		style=dotted
	];
	{
		graph [rank=min];
		"1_WEEKDAY(transaction_time)_weekday"		 [height=0.94444,
			label=<<FONT POINT-SIZE="12"><B>Step 1:</B>   Transform<BR></BR></FONT>WEEKDAY>,
			pos="111,60",
			shape=diamond,
			width=3.0833];
	}
	sessions	 [height=1.1389,
		label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="10">
    <TR>
        <TD colspan="1" bgcolor="#A9A9A9"><B>★ sessions (target)</B></TD>
    </TR>
    <TR>
        <TD ALIGN="LEFT" port="MODE(transactions.WEEKDAY(transaction_time))" BGCOLOR="#D9EAD3">MODE(transactions.WEEKDAY(transaction_time))</TD>
    </TR>
</TABLE>>,
		pos="1050.5,60",
		shape=plaintext,
		width=4.5139];
	transactions	 [height=2.1667,
		label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="10">
    <TR>
        <TD colspan="1" bgcolor="#A9A9A9"><B>transactions</B></TD>
    </TR><TR><TD ALIGN="LEFT" port="transaction_time">transaction_time</TD></TR>
<TR><TD ALIGN="LEFT" port="session_id">session_id</TD></TR>
<TR><TD ALIGN="LEFT" port="WEEKDAY(transaction_time)">WEEKDAY(transaction_time)</TD></TR>
</TABLE>>,
		pos="361.5,78",
		shape=plaintext,
		width=2.875];
	transactions:transaction_time -> "1_WEEKDAY(transaction_time)_weekday"	 [arrowhead="",
		pos="e,162.11,78.5 266.5,97 234.73,97 200.27,89.598 171.88,81.416",
		style=solid];
	"MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id"	 [height=0.52778,
		label="group by
session_id",
		pos="537.5,41",
		width=1.0139];
	transactions:"WEEKDAY(transaction_time)" -> "MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id"	 [arrowhead="",
		pos="e,500.89,28.539 457.5,22 468.54,22 480.21,23.734 491.12,26.153",
		style=solid];
	transactions:session_id -> "MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id"	 [pos="457.5,59 471.88,59 487.36,56.198 500.84,52.805"];
	"0_MODE(transactions.WEEKDAY(transaction_time))_mode"	 [height=0.94444,
		label=<<FONT POINT-SIZE="12"><B>Step 2:</B>   Aggregation<BR></BR></FONT>MODE>,
		pos="731,41",
		shape=diamond,
		width=3.3611];
	"0_MODE(transactions.WEEKDAY(transaction_time))_mode" -> sessions:"MODE(transactions.WEEKDAY(transaction_time))"	 [arrowhead="",
		pos="e,896.5,41 852.12,41 863.56,41 875.08,41 886.29,41",
		style=solid];
	"1_WEEKDAY(transaction_time)_weekday" -> transactions:"WEEKDAY(transaction_time)"	 [arrowhead="",
		pos="e,266.5,22 161.37,41.223 188.63,32.547 223.38,23.809 256.34,22.246",
		style=solid];
	"MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id" -> "0_MODE(transactions.WEEKDAY(transaction_time))_mode"	 [arrowhead="",
		pos="e,609.66,41 574.35,41 582,41 590.47,41 599.4,41",
		style=solid];
}

Feature descriptions

Featuretools can also automatically generate English sentence descriptions of features. Feature descriptions help to explain what a feature is, and can be further improved by including manually defined custom definitions. See Generating Feature Descriptions for more details on how to customize automatically generated feature descriptions.

In [18]: ft.describe_feature(feature)
Out[18]: 'The most frequently occurring value of the day of the week of the "transaction_time" of all instances of "transactions" for each "session_id" in "sessions".'

What’s next?