What is Featuretools?


Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

In [1]: import featuretools as ft

Load Mock Data

In [2]: data = ft.demo.load_mock_customer()

Prepare data

In this toy dataset, there are 3 tables. Each table is called an entity in Featuretools.

  • customers: unique customers who had sessions

  • sessions: unique sessions and associated attributes

  • transactions: list of events in each session

In [3]: customers_df = data["customers"]

In [4]: customers_df
Out[4]: 
   customer_id zip_code           join_date date_of_birth
0            1    60091 2011-04-17 10:48:33    1994-07-18
1            2    13244 2012-04-15 23:31:04    1986-08-18
2            3    13244 2011-08-13 15:42:34    2003-11-21
3            4    60091 2011-04-08 20:08:14    2006-08-15
4            5    60091 2010-07-17 05:27:50    1984-07-28

In [5]: sessions_df = data["sessions"]

In [6]: sessions_df.sample(5)
Out[6]: 
    session_id  customer_id   device       session_start
13          14            1   tablet 2014-01-01 03:28:00
6            7            3   tablet 2014-01-01 01:39:40
1            2            5   mobile 2014-01-01 00:17:20
28          29            1   mobile 2014-01-01 07:10:05
24          25            3  desktop 2014-01-01 05:59:40

In [7]: transactions_df = data["transactions"]

In [8]: transactions_df.sample(5)
Out[8]: 
     transaction_id  session_id    transaction_time product_id  amount
74              232           5 2014-01-01 01:20:10          1  139.20
231              27          17 2014-01-01 04:10:15          2   90.79
434              36          31 2014-01-01 07:50:10          3   62.35
420              56          30 2014-01-01 07:35:00          3   72.70
54              444           4 2014-01-01 00:58:30          4   43.59

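First, we specify a dictionary with all the entities in our dataset. Each entry maps an entity name to a tuple holding its dataframe, the name of its index column, and, optionally, the name of its time index column.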

In [9]: entities = {
   ...:    "customers" : (customers_df, "customer_id"),
   ...:    "sessions" : (sessions_df, "session_id", "session_start"),
   ...:    "transactions" : (transactions_df, "transaction_id", "transaction_time")
   ...: }
   ...: 
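Each tuple above names a table's index column, and for sessions and transactions a time column as well. An index is only useful if it identifies every row uniquely, so a quick check before running DFS can catch problems early. The sketch below does this in plain Python on hypothetical rows; `duplicated_index_values` is an illustrative helper, not part of Featuretools.

```python
from collections import Counter

# Hypothetical toy rows with a deliberately repeated session_id.
sessions = [
    {"session_id": 1, "session_start": "2014-01-01 00:00:00"},
    {"session_id": 2, "session_start": "2014-01-01 00:17:20"},
    {"session_id": 2, "session_start": "2014-01-01 00:28:10"},
]


def duplicated_index_values(rows, index):
    """Return the index values that occur on more than one row, sorted."""
    counts = Counter(row[index] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicated_index_values(sessions, "session_id"))  # prints [2]
```

An empty list means the column is safe to use as an index.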

Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the “one” entity the “parent entity” and the “many” entity the “child entity”. A relationship between a parent and child is defined like this:

(parent_entity, parent_variable, child_entity, child_variable)

In this dataset, we have two relationships:

In [10]: relationships = [("sessions", "session_id", "transactions", "session_id"),
   ....:                  ("customers", "customer_id", "sessions", "customer_id")]
   ....: 
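A relationship tuple only makes sense if every value in the child variable also appears in the parent variable. The following is a minimal sketch of that check in plain Python, independent of Featuretools; the rows and the helper name `orphaned_children` are hypothetical and exist only for illustration.

```python
# Hypothetical toy rows standing in for the sessions and transactions tables.
sessions = [
    {"session_id": 1, "customer_id": 2},
    {"session_id": 2, "customer_id": 5},
]
transactions = [
    {"transaction_id": 10, "session_id": 1, "amount": 20.0},
    {"transaction_id": 11, "session_id": 1, "amount": 35.5},
    {"transaction_id": 12, "session_id": 2, "amount": 12.25},
]
tables = {"sessions": sessions, "transactions": transactions}

# Same layout as in the text:
# (parent_entity, parent_variable, child_entity, child_variable)
relationship = ("sessions", "session_id", "transactions", "session_id")


def orphaned_children(tables, relationship):
    """Return child rows whose key has no matching row in the parent table."""
    parent, parent_var, child, child_var = relationship
    parent_keys = {row[parent_var] for row in tables[parent]}
    return [row for row in tables[child] if row[child_var] not in parent_keys]


print(orphaned_children(tables, relationship))  # prints []
```

An empty result means every transaction points at an existing session, which is what a one-to-many relationship requires.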

Note

To set up entities and relationships, we recommend using the EntitySet class, which offers convenient APIs for managing data like this. See Representing Data with EntitySets for more information.

Run Deep Feature Synthesis

A minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let’s first create a feature matrix for each customer in the data:

In [11]: feature_matrix_customers, features_defs = ft.dfs(entities=entities,
   ....:                                                  relationships=relationships,
   ....:                                                  target_entity="customers")
   ....: 

In [12]: feature_matrix_customers
Out[12]: 
            zip_code  COUNT(sessions)  NUM_UNIQUE(sessions.device) MODE(sessions.device)  SUM(transactions.amount)  STD(transactions.amount)  MAX(transactions.amount)  SKEW(transactions.amount)  MIN(transactions.amount)  MEAN(transactions.amount)  COUNT(transactions)  NUM_UNIQUE(transactions.product_id)  MODE(transactions.product_id)  DAY(date_of_birth)  DAY(join_date)  YEAR(date_of_birth)  YEAR(join_date)  MONTH(date_of_birth)  MONTH(join_date)  WEEKDAY(date_of_birth)  WEEKDAY(join_date)  SUM(sessions.MIN(transactions.amount))  SUM(sessions.SKEW(transactions.amount))  SUM(sessions.STD(transactions.amount))  SUM(sessions.NUM_UNIQUE(transactions.product_id))  SUM(sessions.MAX(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.COUNT(transactions))  STD(sessions.MIN(transactions.amount))  STD(sessions.SKEW(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.NUM_UNIQUE(transactions.product_id))  STD(sessions.MAX(transactions.amount))  STD(sessions.MEAN(transactions.amount))  MAX(sessions.COUNT(transactions))  MAX(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.SUM(transactions.amount))  MAX(sessions.STD(transactions.amount))  MAX(sessions.NUM_UNIQUE(transactions.product_id))  MAX(sessions.MEAN(transactions.amount))  SKEW(sessions.COUNT(transactions))  SKEW(sessions.MIN(transactions.amount))  SKEW(sessions.SUM(transactions.amount))  SKEW(sessions.STD(transactions.amount))  SKEW(sessions.NUM_UNIQUE(transactions.product_id))  SKEW(sessions.MAX(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  MIN(sessions.COUNT(transactions))  MIN(sessions.SKEW(transactions.amount))  MIN(sessions.SUM(transactions.amount))  MIN(sessions.STD(transactions.amount))  MIN(sessions.NUM_UNIQUE(transactions.product_id))  MIN(sessions.MAX(transactions.amount))  MIN(sessions.MEAN(transactions.amount))  MEAN(sessions.COUNT(transactions))  MEAN(sessions.MIN(transactions.amount))  
MEAN(sessions.SKEW(transactions.amount))  MEAN(sessions.SUM(transactions.amount))  MEAN(sessions.STD(transactions.amount))  MEAN(sessions.NUM_UNIQUE(transactions.product_id))  MEAN(sessions.MAX(transactions.amount))  MEAN(sessions.MEAN(transactions.amount))  NUM_UNIQUE(sessions.WEEKDAY(session_start))  NUM_UNIQUE(sessions.MODE(transactions.product_id))  NUM_UNIQUE(sessions.DAY(session_start))  NUM_UNIQUE(sessions.MONTH(session_start))  NUM_UNIQUE(sessions.YEAR(session_start))  MODE(sessions.WEEKDAY(session_start))  MODE(sessions.MODE(transactions.product_id))  MODE(sessions.DAY(session_start))  MODE(sessions.MONTH(session_start))  MODE(sessions.YEAR(session_start))  NUM_UNIQUE(transactions.sessions.customer_id)  NUM_UNIQUE(transactions.sessions.device)  MODE(transactions.sessions.customer_id) MODE(transactions.sessions.device)
customer_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
1              60091                8                            3                mobile                   9025.62                 40.442059                    139.43                   0.019698                      5.81                  71.631905                  126                                    5                              4                  18              17                 1994             2011                     7                 4                       0                   6                                   78.59                                -0.476122                              312.745952                                                 40                                 1057.97                               582.193117                           4.062019                                6.954507                                 0.589386                              279.510713                                           0.000000                                7.322191                                13.759314                                 25                                   26.36                                 0.640252                                 1613.93                               46.905665                                                  5                                88.755625                            1.946018                                 2.440005                                 0.778170                                -0.312355                                           0.000000                                 -0.780493                                 -0.424949                                 12                                -1.038434                                  809.97                               30.450261                                                  5                                  118.90                                50.623125                           15.750000                                 9.823750                   
              -0.059515                              1128.202500                                39.093244                                           5.000000                                132.246250                                 72.774140                                            1                                                  4                                         1                                          1                                         1                                      2                                             4                                  1                                    1                                2014                                              1                                         3                                        1                             mobile
2              13244                7                            3               desktop                   7200.28                 37.705178                    146.81                   0.098259                      8.73                  77.422366                   93                                    5                              4                  18              15                 1986             2012                     8                 4                       0                   6                                  154.60                                -0.277640                              258.700528                                                 35                                  931.63                               548.905851                           3.450328                               15.874374                                 0.509798                              251.609234                                           0.000000                               17.221593                                11.477071                                 18                                   56.46                                 0.755711                                 1320.64                               47.935920                                                  5                                96.581000                           -0.303276                                 2.154929                                -0.440929                                 0.013087                                           0.000000                                 -1.539467                                  0.235296                                  8                                -0.763603                                  634.84                               27.839228                                                  5                                  100.04                                61.910000                           13.285714                                22.085714                   
              -0.039663                              1028.611429                                36.957218                                           5.000000                                133.090000                                 78.415122                                            1                                                  4                                         1                                          1                                         1                                      2                                             3                                  1                                    1                                2014                                              1                                         3                                        2                            desktop
3              13244                6                            3               desktop                   6236.62                 43.683296                    149.15                   0.418230                      5.89                  67.060430                   93                                    5                              1                  21              13                 2003             2011                    11                 8                       4                   5                                   66.21                                 2.286086                              257.299895                                                 29                                  847.63                               405.237462                           2.428992                                5.424407                                 0.429374                              219.021420                                           0.408248                               10.724241                                11.174282                                 18                                   20.06                                 0.854976                                 1477.97                               50.110120                                                  5                                82.109444                           -1.507217                                 1.000771                                 2.246479                                -0.245703                                          -2.449490                                 -0.941078                                  0.678544                                 11                                -0.289466                                  889.21                               35.704680                                                  4                                  126.74                                55.579412                           15.500000                                11.035000                   
               0.381014                              1039.436667                                42.883316                                           4.833333                                141.271667                                 67.539577                                            1                                                  4                                         1                                          1                                         1                                      2                                             1                                  1                                    1                                2014                                              1                                         3                                        3                            desktop
4              60091                8                            3                mobile                   8727.68                 45.068765                    149.95                  -0.036348                      5.73                  80.070459                  109                                    5                              2                  15               8                 2006             2011                     8                 4                       1                   4                                  131.51                                 0.002764                              356.125829                                                 37                                 1157.99                               649.657515                           3.335416                               16.960575                                 0.387884                              235.992478                                           0.517549                                3.514421                                13.027258                                 18                                   54.83                                 0.382868                                 1351.46                               54.293903                                                  5                               110.450000                            0.282488                                 2.103510                                -0.391805                                -1.065663                                          -0.644061                                  0.027256                                  1.980948                                 10                                -0.711744                                  771.68                               29.026424                                                  4                                  139.20                                70.638182                           13.625000                                16.438750                   
               0.000346                              1090.960000                                44.515729                                           4.625000                                144.748750                                 81.207189                                            1                                                  5                                         1                                          1                                         1                                      2                                             1                                  1                                    1                                2014                                              1                                         3                                        4                             mobile
5              60091                6                            3                mobile                   6349.66                 44.095630                    149.02                  -0.025941                      7.55                  80.375443                   79                                    5                              5                  28              17                 1984             2010                     7                 7                       5                   5                                   86.49                                 0.014384                              259.873954                                                 30                                  839.76                               472.231119                           3.600926                                4.961414                                 0.415426                              402.775486                                           0.000000                                7.928001                                11.007471                                 18                                   20.65                                 0.602209                                 1700.67                               51.149250                                                  5                                94.481667                           -0.317685                                -0.470410                                 0.472342                                 0.204548                                           0.000000                                 -0.333796                                  0.335175                                  8                                -0.539060                                  543.18                               36.734681                                                  5                                  128.51                                66.666667                           13.166667                                14.415000                   
               0.002397                              1058.276667                                43.312326                                           5.000000                                139.960000                                 78.705187                                            1                                                  5                                         1                                          1                                         1                                      2                                             3                                  1                                    1                                2014                                              1                                         3                                        5                             mobile

We now have dozens of new features to describe a customer’s behavior.
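Feature names such as MEAN(sessions.SUM(transactions.amount)) describe stacked aggregations: an inner aggregation per session, then an outer aggregation per customer. The sketch below reproduces that idea in plain Python on hypothetical toy rows. It is not how Featuretools computes features internally, and the helper name `aggregate_by` is made up for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows; values are not taken from the mock dataset.
sessions = [
    {"session_id": 1, "customer_id": 1},
    {"session_id": 2, "customer_id": 1},
    {"session_id": 3, "customer_id": 2},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 30.0},
    {"session_id": 2, "amount": 20.0},
    {"session_id": 3, "amount": 5.0},
    {"session_id": 3, "amount": 15.0},
]


def aggregate_by(rows, key, value, func):
    """Group `rows` by `key` and apply `func` to each group's `value` list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: func(v) for k, v in groups.items()}


# Depth 1: SUM(transactions.amount) for each session.
session_sum = aggregate_by(transactions, "session_id", "amount", sum)

# Attach the per-session result to the session rows so it can be aggregated again.
session_rows = [
    {"customer_id": s["customer_id"], "sum_amount": session_sum[s["session_id"]]}
    for s in sessions
]

# Depth 2: MEAN(sessions.SUM(transactions.amount)) for each customer.
customer_mean_of_sums = aggregate_by(session_rows, "customer_id", "sum_amount", mean)

print(customer_mean_of_sums)  # prints {1: 30.0, 2: 20.0}
```

The same relationship is visible in the feature matrix above: for customer 1, MEAN(sessions.SUM(transactions.amount)) is 1128.2025, which equals SUM(transactions.amount) of 9025.62 divided by COUNT(sessions) of 8.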

Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, to build features for sessions, we only need to change the target entity:

In [13]: feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
   ....:                                                 relationships=relationships,
   ....:                                                 target_entity="sessions")
   ....: 

In [14]: feature_matrix_sessions.head(5)
Out[14]: 
            customer_id   device  SUM(transactions.amount)  STD(transactions.amount)  MAX(transactions.amount)  SKEW(transactions.amount)  MIN(transactions.amount)  MEAN(transactions.amount)  COUNT(transactions)  NUM_UNIQUE(transactions.product_id)  MODE(transactions.product_id)  DAY(session_start)  YEAR(session_start)  MONTH(session_start)  WEEKDAY(session_start) customers.zip_code  NUM_UNIQUE(transactions.MONTH(transaction_time))  NUM_UNIQUE(transactions.WEEKDAY(transaction_time))  NUM_UNIQUE(transactions.YEAR(transaction_time))  NUM_UNIQUE(transactions.DAY(transaction_time))  MODE(transactions.MONTH(transaction_time))  MODE(transactions.WEEKDAY(transaction_time))  MODE(transactions.YEAR(transaction_time))  MODE(transactions.DAY(transaction_time))  customers.COUNT(sessions)  customers.NUM_UNIQUE(sessions.device) customers.MODE(sessions.device)  customers.SUM(transactions.amount)  customers.STD(transactions.amount)  customers.MAX(transactions.amount)  customers.SKEW(transactions.amount)  customers.MIN(transactions.amount)  customers.MEAN(transactions.amount)  customers.COUNT(transactions)  customers.NUM_UNIQUE(transactions.product_id)  customers.MODE(transactions.product_id)  customers.DAY(date_of_birth)  customers.DAY(join_date)  customers.YEAR(date_of_birth)  customers.YEAR(join_date)  customers.MONTH(date_of_birth)  customers.MONTH(join_date)  customers.WEEKDAY(date_of_birth)  customers.WEEKDAY(join_date)
session_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
1                     2  desktop                   1229.01                 41.600976                    141.66                   0.295458                     20.91                  76.813125                   16                                    5                              3                   1                 2014                     1                       2              13244                                                 1                                                  1                                                 1                                               1                                           1                                             2                                       2014                                         1                          7                                      3                         desktop                             7200.28                           37.705178                              146.81                             0.098259                                8.73                            77.422366                             93                                              5                                        4                            18                        15                           1986                       2012                               8                           4                                 0                             6
2                     5   mobile                    746.96                 45.893591                    135.25                  -0.160550                      9.32                  74.696000                   10                                    5                              5                   1                 2014                     1                       2              60091                                                 1                                                  1                                                 1                                               1                                           1                                             2                                       2014                                         1                          6                                      3                          mobile                             6349.66                           44.095630                              149.02                            -0.025941                                7.55                            80.375443                             79                                              5                                        5                            28                        17                           1984                       2010                               7                           7                                 5                             5
3                     4   mobile                   1329.00                 46.240016                    147.73                  -0.324012                      8.70                  88.600000                   15                                    5                              1                   1                 2014                     1                       2              60091                                                 1                                                  1                                                 1                                               1                                           1                                             2                                       2014                                         1                          8                                      3                          mobile                             8727.68                           45.068765                              149.95                            -0.036348                                5.73                            80.070459                            109                                              5                                        2                            15                         8                           2006                       2011                               8                           4                                 1                             4
4                     1   mobile                   1613.93                 40.187205                    129.00                   0.234349                      6.29                  64.557200                   25                                    5                              5                   1                 2014                     1                       2              60091                                                 1                                                  1                                                 1                                               1                                           1                                             2                                       2014                                         1                          8                                      3                          mobile                             9025.62                           40.442059                              139.43                             0.019698                                5.81                            71.631905                            126                                              5                                        4                            18                        17                           1994                       2011                               7                           4                                 0                             6
5                     4   mobile                    777.02                 48.918663                    139.20                   0.336381                      7.43                  70.638182                   11                                    5                              5                   1                 2014                     1                       2              60091                                                 1                                                  1                                                 1                                               1                                           1                                             2                                       2014                                         1                          8                                      3                          mobile                             8727.68                           45.068765                              149.95                            -0.036348                                5.73                            80.070459                            109                                              5                                        2                            15                         8                           2006                       2011                               8                           4                                 1                             4
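Columns prefixed with `customers.` in the session feature matrix, such as customers.zip_code, carry values from each session's parent customer. The following is a minimal sketch of that lookup in plain Python on hypothetical toy rows; it illustrates the idea only and makes no claim about Featuretools internals.

```python
# Hypothetical toy rows; values are illustrative only.
customers = [
    {"customer_id": 1, "zip_code": "60091"},
    {"customer_id": 2, "zip_code": "13244"},
]
sessions = [
    {"session_id": 1, "customer_id": 2, "device": "desktop"},
    {"session_id": 2, "customer_id": 1, "device": "mobile"},
    {"session_id": 3, "customer_id": 2, "device": "tablet"},
]

# Index the parent table by its key for quick lookup.
customer_by_id = {c["customer_id"]: c for c in customers}

# Build one row per session and copy the parent's zip_code onto it,
# mirroring a column named "customers.zip_code".
feature_rows = {
    s["session_id"]: {
        "device": s["device"],
        "customers.zip_code": customer_by_id[s["customer_id"]]["zip_code"],
    }
    for s in sessions
}

print(feature_rows[1]["customers.zip_code"])  # prints 13244
```

Because many sessions share one customer, the parent's value simply repeats on every child row, as it does for sessions 3 and 5 in the output above.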

What’s next?