What is Featuretools?

Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.
5 Minute Quick Start
Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.
In [1]: import featuretools as ft
Load Mock Data
In [2]: data = ft.demo.load_mock_customer()
Prepare data
In this toy dataset, there are 3 tables. Each table is called an entity in Featuretools.
customers: unique customers who had sessions
sessions: unique sessions and associated attributes
transactions: a list of events in each session
In [3]: customers_df = data["customers"]
In [4]: customers_df
Out[4]:
   customer_id  zip_code            join_date  date_of_birth
0            1     60091  2011-04-17 10:48:33     1994-07-18
1            2     13244  2012-04-15 23:31:04     1986-08-18
2            3     13244  2011-08-13 15:42:34     2003-11-21
3            4     60091  2011-04-08 20:08:14     2006-08-15
4            5     60091  2010-07-17 05:27:50     1984-07-28
In [5]: sessions_df = data["sessions"]
In [6]: sessions_df.sample(5)
Out[6]:
    session_id  customer_id   device        session_start
13          14            1   tablet  2014-01-01 03:28:00
6            7            3   tablet  2014-01-01 01:39:40
1            2            5   mobile  2014-01-01 00:17:20
28          29            1   mobile  2014-01-01 07:10:05
24          25            3  desktop  2014-01-01 05:59:40
In [7]: transactions_df = data["transactions"]
In [8]: transactions_df.sample(5)
Out[8]:
     transaction_id  session_id     transaction_time  product_id  amount
74              232           5  2014-01-01 01:20:10           1  139.20
231              27          17  2014-01-01 04:10:15           2   90.79
434              36          31  2014-01-01 07:50:10           3   62.35
420              56          30  2014-01-01 07:35:00           3   72.70
54              444           4  2014-01-01 00:58:30           4   43.59
First, we specify a dictionary with all the entities in our dataset.
In [9]: entities = {
...: "customers" : (customers_df, "customer_id"),
...: "sessions" : (sessions_df, "session_id", "session_start"),
...: "transactions" : (transactions_df, "transaction_id", "transaction_time")
...: }
...:
Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the “one” entity the “parent entity”. A relationship between a parent and child is defined like this:
(parent_entity, parent_variable, child_entity, child_variable)
In this dataset, we have two relationships:
In [10]: relationships = [("sessions", "session_id", "transactions", "session_id"),
....: ("customers", "customer_id", "sessions", "customer_id")]
....:
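A one-to-many relationship implies that the parent variable is unique and that every value of the child variable matches a parent row. The following is a minimal plain-Python sketch of that check, using made-up rows rather than the demo DataFrames; `check_relationship` is a hypothetical helper written for illustration and is not part of Featuretools.

```python
# Hypothetical toy rows; the real data lives in the DataFrames loaded above.
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "sessions": [
        {"session_id": 10, "customer_id": 1},
        {"session_id": 11, "customer_id": 1},
        {"session_id": 12, "customer_id": 2},
    ],
}


def check_relationship(tables, relationship):
    """Return the sorted child key values that have no matching parent row.

    `relationship` uses the tuple layout from the text:
    (parent_entity, parent_variable, child_entity, child_variable).
    """
    parent_entity, parent_variable, child_entity, child_variable = relationship
    parent_keys = [row[parent_variable] for row in tables[parent_entity]]
    # The "one" side must be unique for the relationship to be one-to-many.
    if len(parent_keys) != len(set(parent_keys)):
        raise ValueError(f"{parent_entity}.{parent_variable} is not unique")
    child_keys = {row[child_variable] for row in tables[child_entity]}
    return sorted(child_keys - set(parent_keys))


orphans = check_relationship(
    tables, ("customers", "customer_id", "sessions", "customer_id")
)
print(orphans)  # an empty list means every session points at a known customer
```

Running a check like this before defining relationships can surface child rows that would otherwise have no parent to aggregate into.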
Note
To manage setting up entities and relationships, we recommend using the EntitySet class, which offers convenient APIs for managing data like this. See Representing Data with EntitySets for more information.
Run Deep Feature Synthesis
The minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.
Let’s first create a feature matrix for each customer in the data:
In [11]: feature_matrix_customers, features_defs = ft.dfs(entities=entities,
....: relationships=relationships,
....: target_entity="customers")
....:
In [12]: feature_matrix_customers
Out[12]:
zip_code COUNT(sessions) NUM_UNIQUE(sessions.device) MODE(sessions.device) SUM(transactions.amount) STD(transactions.amount) MAX(transactions.amount) SKEW(transactions.amount) MIN(transactions.amount) MEAN(transactions.amount) COUNT(transactions) NUM_UNIQUE(transactions.product_id) MODE(transactions.product_id) DAY(date_of_birth) DAY(join_date) YEAR(date_of_birth) YEAR(join_date) MONTH(date_of_birth) MONTH(join_date) WEEKDAY(date_of_birth) WEEKDAY(join_date) SUM(sessions.MAX(transactions.amount)) SUM(sessions.SKEW(transactions.amount)) SUM(sessions.MIN(transactions.amount)) SUM(sessions.NUM_UNIQUE(transactions.product_id)) SUM(sessions.MEAN(transactions.amount)) SUM(sessions.STD(transactions.amount)) STD(sessions.COUNT(transactions)) STD(sessions.MAX(transactions.amount)) STD(sessions.SKEW(transactions.amount)) STD(sessions.MIN(transactions.amount)) STD(sessions.SUM(transactions.amount)) STD(sessions.NUM_UNIQUE(transactions.product_id)) STD(sessions.MEAN(transactions.amount)) MAX(sessions.COUNT(transactions)) MAX(sessions.SKEW(transactions.amount)) MAX(sessions.MIN(transactions.amount)) MAX(sessions.SUM(transactions.amount)) MAX(sessions.NUM_UNIQUE(transactions.product_id)) MAX(sessions.MEAN(transactions.amount)) MAX(sessions.STD(transactions.amount)) SKEW(sessions.COUNT(transactions)) SKEW(sessions.MAX(transactions.amount)) SKEW(sessions.MIN(transactions.amount)) SKEW(sessions.SUM(transactions.amount)) SKEW(sessions.NUM_UNIQUE(transactions.product_id)) SKEW(sessions.MEAN(transactions.amount)) SKEW(sessions.STD(transactions.amount)) MIN(sessions.COUNT(transactions)) MIN(sessions.MAX(transactions.amount)) MIN(sessions.SKEW(transactions.amount)) MIN(sessions.SUM(transactions.amount)) MIN(sessions.NUM_UNIQUE(transactions.product_id)) MIN(sessions.MEAN(transactions.amount)) MIN(sessions.STD(transactions.amount)) MEAN(sessions.COUNT(transactions)) MEAN(sessions.MAX(transactions.amount)) MEAN(sessions.SKEW(transactions.amount)) MEAN(sessions.MIN(transactions.amount)) 
MEAN(sessions.SUM(transactions.amount)) MEAN(sessions.NUM_UNIQUE(transactions.product_id)) MEAN(sessions.MEAN(transactions.amount)) MEAN(sessions.STD(transactions.amount)) NUM_UNIQUE(sessions.MODE(transactions.product_id)) NUM_UNIQUE(sessions.MONTH(session_start)) NUM_UNIQUE(sessions.YEAR(session_start)) NUM_UNIQUE(sessions.WEEKDAY(session_start)) NUM_UNIQUE(sessions.DAY(session_start)) MODE(sessions.MODE(transactions.product_id)) MODE(sessions.MONTH(session_start)) MODE(sessions.YEAR(session_start)) MODE(sessions.WEEKDAY(session_start)) MODE(sessions.DAY(session_start)) NUM_UNIQUE(transactions.sessions.customer_id) NUM_UNIQUE(transactions.sessions.device) MODE(transactions.sessions.customer_id) MODE(transactions.sessions.device)
customer_id
1 60091 8 3 mobile 9025.62 40.442059 139.43 0.019698 5.81 71.631905 126 5 4 18 17 1994 2011 7 4 0 6 1057.97 -0.476122 78.59 40 582.193117 312.745952 4.062019 7.322191 0.589386 6.954507 279.510713 0.000000 13.759314 25 0.640252 26.36 1613.93 5 88.755625 46.905665 1.946018 -0.780493 2.440005 0.778170 0.000000 -0.424949 -0.312355 12 118.90 -1.038434 809.97 5 50.623125 30.450261 15.750000 132.246250 -0.059515 9.823750 1128.202500 5.000000 72.774140 39.093244 4 1 1 1 1 4 1 2014 2 1 1 3 1 mobile
2 13244 7 3 desktop 7200.28 37.705178 146.81 0.098259 8.73 77.422366 93 5 4 18 15 1986 2012 8 4 0 6 931.63 -0.277640 154.60 35 548.905851 258.700528 3.450328 17.221593 0.509798 15.874374 251.609234 0.000000 11.477071 18 0.755711 56.46 1320.64 5 96.581000 47.935920 -0.303276 -1.539467 2.154929 -0.440929 0.000000 0.235296 0.013087 8 100.04 -0.763603 634.84 5 61.910000 27.839228 13.285714 133.090000 -0.039663 22.085714 1028.611429 5.000000 78.415122 36.957218 4 1 1 1 1 3 1 2014 2 1 1 3 2 desktop
3 13244 6 3 desktop 6236.62 43.683296 149.15 0.418230 5.89 67.060430 93 5 1 21 13 2003 2011 11 8 4 5 847.63 2.286086 66.21 29 405.237462 257.299895 2.428992 10.724241 0.429374 5.424407 219.021420 0.408248 11.174282 18 0.854976 20.06 1477.97 5 82.109444 50.110120 -1.507217 -0.941078 1.000771 2.246479 -2.449490 0.678544 -0.245703 11 126.74 -0.289466 889.21 4 55.579412 35.704680 15.500000 141.271667 0.381014 11.035000 1039.436667 4.833333 67.539577 42.883316 4 1 1 1 1 1 1 2014 2 1 1 3 3 desktop
4 60091 8 3 mobile 8727.68 45.068765 149.95 -0.036348 5.73 80.070459 109 5 2 15 8 2006 2011 8 4 1 4 1157.99 0.002764 131.51 37 649.657515 356.125829 3.335416 3.514421 0.387884 16.960575 235.992478 0.517549 13.027258 18 0.382868 54.83 1351.46 5 110.450000 54.293903 0.282488 0.027256 2.103510 -0.391805 -0.644061 1.980948 -1.065663 10 139.20 -0.711744 771.68 4 70.638182 29.026424 13.625000 144.748750 0.000346 16.438750 1090.960000 4.625000 81.207189 44.515729 5 1 1 1 1 1 1 2014 2 1 1 3 4 mobile
5 60091 6 3 mobile 6349.66 44.095630 149.02 -0.025941 7.55 80.375443 79 5 5 28 17 1984 2010 7 7 5 5 839.76 0.014384 86.49 30 472.231119 259.873954 3.600926 7.928001 0.415426 4.961414 402.775486 0.000000 11.007471 18 0.602209 20.65 1700.67 5 94.481667 51.149250 -0.317685 -0.333796 -0.470410 0.472342 0.000000 0.335175 0.204548 8 128.51 -0.539060 543.18 5 66.666667 36.734681 13.166667 139.960000 0.002397 14.415000 1058.276667 5.000000 78.705187 43.312326 5 1 1 1 1 3 1 2014 2 1 1 3 5 mobile
We now have dozens of new features to describe a customer’s behavior.
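The feature names in the matrix describe how each value is computed. A name such as SUM(sessions.MAX(transactions.amount)) is read from the inside out: take the maximum transaction amount per session, then sum those maxima per customer. The sketch below reproduces three such features by hand in plain Python on made-up data; the rows and the helper `amounts_for_session` are assumptions for illustration and do not reflect how Featuretools computes features internally.

```python
# Hypothetical toy data: one customer with two sessions.
sessions = [
    {"session_id": 1, "customer_id": 1},
    {"session_id": 2, "customer_id": 1},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 30.0},
    {"session_id": 2, "amount": 5.0},
]


def amounts_for_session(session_id):
    """Collect the transaction amounts that belong to one session."""
    return [t["amount"] for t in transactions if t["session_id"] == session_id]


customer_sessions = [s for s in sessions if s["customer_id"] == 1]

# COUNT(sessions): number of child sessions of the customer.
count_sessions = len(customer_sessions)

# SUM(transactions.amount): aggregate over all of the customer's
# transactions, reached through the sessions table.
sum_amount = sum(
    amount
    for s in customer_sessions
    for amount in amounts_for_session(s["session_id"])
)

# SUM(sessions.MAX(transactions.amount)): a stacked feature.
# First MAX per session, then SUM of those maxima per customer.
sum_of_session_max = sum(
    max(amounts_for_session(s["session_id"])) for s in customer_sessions
)

print(count_sessions, sum_amount, sum_of_session_max)  # 2 45.0 35.0
```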
Change target entity
One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, we can build features for sessions instead:
In [13]: feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
....: relationships=relationships,
....: target_entity="sessions")
....:
In [14]: feature_matrix_sessions.head(5)
Out[14]:
customer_id device SUM(transactions.amount) STD(transactions.amount) MAX(transactions.amount) SKEW(transactions.amount) MIN(transactions.amount) MEAN(transactions.amount) COUNT(transactions) NUM_UNIQUE(transactions.product_id) MODE(transactions.product_id) DAY(session_start) YEAR(session_start) MONTH(session_start) WEEKDAY(session_start) customers.zip_code NUM_UNIQUE(transactions.YEAR(transaction_time)) NUM_UNIQUE(transactions.DAY(transaction_time)) NUM_UNIQUE(transactions.MONTH(transaction_time)) NUM_UNIQUE(transactions.WEEKDAY(transaction_time)) MODE(transactions.YEAR(transaction_time)) MODE(transactions.DAY(transaction_time)) MODE(transactions.MONTH(transaction_time)) MODE(transactions.WEEKDAY(transaction_time)) customers.COUNT(sessions) customers.NUM_UNIQUE(sessions.device) customers.MODE(sessions.device) customers.SUM(transactions.amount) customers.STD(transactions.amount) customers.MAX(transactions.amount) customers.SKEW(transactions.amount) customers.MIN(transactions.amount) customers.MEAN(transactions.amount) customers.COUNT(transactions) customers.NUM_UNIQUE(transactions.product_id) customers.MODE(transactions.product_id) customers.DAY(date_of_birth) customers.DAY(join_date) customers.YEAR(date_of_birth) customers.YEAR(join_date) customers.MONTH(date_of_birth) customers.MONTH(join_date) customers.WEEKDAY(date_of_birth) customers.WEEKDAY(join_date)
session_id
1 2 desktop 1229.01 41.600976 141.66 0.295458 20.91 76.813125 16 5 3 1 2014 1 2 13244 1 1 1 1 2014 1 1 2 7 3 desktop 7200.28 37.705178 146.81 0.098259 8.73 77.422366 93 5 4 18 15 1986 2012 8 4 0 6
2 5 mobile 746.96 45.893591 135.25 -0.160550 9.32 74.696000 10 5 5 1 2014 1 2 60091 1 1 1 1 2014 1 1 2 6 3 mobile 6349.66 44.095630 149.02 -0.025941 7.55 80.375443 79 5 5 28 17 1984 2010 7 7 5 5
3 4 mobile 1329.00 46.240016 147.73 -0.324012 8.70 88.600000 15 5 1 1 2014 1 2 60091 1 1 1 1 2014 1 1 2 8 3 mobile 8727.68 45.068765 149.95 -0.036348 5.73 80.070459 109 5 2 15 8 2006 2011 8 4 1 4
4 1 mobile 1613.93 40.187205 129.00 0.234349 6.29 64.557200 25 5 5 1 2014 1 2 60091 1 1 1 1 2014 1 1 2 8 3 mobile 9025.62 40.442059 139.43 0.019698 5.81 71.631905 126 5 4 18 17 1994 2011 7 4 0 6
5 4 mobile 777.02 48.918663 139.20 0.336381 7.43 70.638182 11 5 5 1 2014 1 2 60091 1 1 1 1 2014 1 1 2 8 3 mobile 8727.68 45.068765 149.95 -0.036348 5.73 80.070459 109 5 2 15 8 2006 2011 8 4 1 4
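Columns prefixed with customers. in this matrix are features of the parent customer copied onto each of that customer's sessions, which is why sessions 3 and 5 (both belonging to customer 4) share the same values. A minimal plain-Python sketch of that lookup follows, using a few values taken from the output above; the dictionary layout is an assumption chosen for illustration, not the Featuretools data model.

```python
# A subset of customer-level features, copied from the matrices above.
customer_features = {
    2: {"zip_code": 13244, "COUNT(sessions)": 7},
    4: {"zip_code": 60091, "COUNT(sessions)": 8},
    5: {"zip_code": 60091, "COUNT(sessions)": 6},
}

# session_id -> customer_id, as shown in the sessions feature matrix.
session_to_customer = {1: 2, 2: 5, 3: 4, 5: 4}

# Copy each parent's features onto its child sessions,
# prefixing the names with the parent entity.
session_rows = {}
for session_id, customer_id in session_to_customer.items():
    session_rows[session_id] = {
        f"customers.{name}": value
        for name, value in customer_features[customer_id].items()
    }

for session_id, row in session_rows.items():
    print(session_id, row)
```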
What’s next?
Learn about Representing Data with EntitySets
Apply automated feature engineering with Deep Feature Synthesis
Explore runnable demos based on real-world use cases
Can’t find what you’re looking for? Ask for Help