Save Intermediate Feature Matrix Results
In this tutorial, we will go over how to save intermediate results when computing the feature matrix.
[1]:
import featuretools as ft
In this example, we will use a retail dataset covering customers of a UK website from December 2010 to December 2011.
[2]:
es = ft.demo.load_retail(nrows=10000)
Let’s use a simple feature for this example.
[3]:
region = ft.Feature(es["customers"]["Country"])
We can supply “cutoff times” to specify that we want to calculate features one year after a customer’s first invoice.
[4]:
import pandas as pd
cutoff_times = es["customers"].df[["CustomerID", "first_invoices_time"]].rename(
    columns={"CustomerID": "instance_id", "first_invoices_time": "time"})
cutoff_times["time"] = cutoff_times["time"] + pd.Timedelta("365 days")
Here is what some of the cutoff times look like.
[5]:
cutoff_times.head(10)
[5]:
CustomerID | instance_id | time |
---|---|---|
17850.0 | 17850.0 | 2011-12-01 08:26:00 |
13047.0 | 13047.0 | 2011-12-01 08:34:00 |
12583.0 | 12583.0 | 2011-12-01 08:45:00 |
13748.0 | 13748.0 | 2011-12-01 09:00:00 |
15100.0 | 15100.0 | 2011-12-01 09:09:00 |
15291.0 | 15291.0 | 2011-12-01 09:32:00 |
14688.0 | 14688.0 | 2011-12-01 09:37:00 |
14527.0 | 14527.0 | 2011-12-01 09:41:00 |
15311.0 | 15311.0 | 2011-12-01 09:41:00 |
17809.0 | 17809.0 | 2011-12-01 09:41:00 |
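To make the cutoff-time construction above easier to follow, here is a minimal sketch that repeats the same rename-and-shift steps on a toy table using pandas alone. The `customers` DataFrame and its values are hypothetical stand-ins for `es["customers"].df`, not data from the retail dataset.

```python
import pandas as pd

# Hypothetical stand-in for es["customers"].df; the values are made up.
customers = pd.DataFrame({
    "CustomerID": [1.0, 2.0],
    "first_invoices_time": pd.to_datetime(["2010-12-01 08:26:00",
                                           "2010-12-01 08:34:00"]),
})

# Rename to the column names used for the cutoff-time table in this tutorial.
toy_cutoff_times = customers.rename(
    columns={"CustomerID": "instance_id", "first_invoices_time": "time"})

# Shift each cutoff to 365 days after the customer's first invoice.
toy_cutoff_times["time"] = toy_cutoff_times["time"] + pd.Timedelta("365 days")

print(toy_cutoff_times)
```

Each row then pairs an instance with the time at which its features should be calculated.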
If you want to save intermediate computations as CSVs, pass the path of a directory where they should be saved. For example, if you pass a directory called “ft_temp”, CSV files will be written to that directory, each named according to the timestamp that it represents.
[6]:
import os
save_progress = os.path.join(os.getcwd(), 'ft_temp')
# Create the output directory if it does not exist yet.
if not os.path.exists(save_progress):
    os.makedirs(save_progress)
[7]:
fm_save = ft.calculate_feature_matrix([region],
                                      entityset=es,
                                      cutoff_time=cutoff_times.sample(10),
                                      save_progress=save_progress)
As seen below, there are now files in the directory, named by timestamp.
[8]:
% ls ft_temp/
ft_2011_12_01_03-08-00-000000.csv ft_2011_12_02_05-03-00-000000.csv
ft_2011_12_01_09-00-00-000000.csv ft_2011_12_02_05-19-00-000000.csv
ft_2011_12_01_12-43-00-000000.csv ft_2011_12_02_12-07-00-000000.csv
ft_2011_12_01_12-51-00-000000.csv ft_2011_12_02_12-18-00-000000.csv
ft_2011_12_02_03-19-00-000000.csv ft_2011_12_03_12-57-00-000000.csv
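Once files like these exist, one way to inspect them is to read them back with pandas. The sketch below is not part of Featuretools: `load_saved_chunks` is a hypothetical helper, the `ft_*.csv` pattern is inferred from the listing above, and the column layout of the demo files is assumed purely for illustration. The demo writes its own throwaway files so that it runs without Featuretools.

```python
import glob
import os
import tempfile

import pandas as pd


def load_saved_chunks(directory):
    """Read every ft_*.csv file in `directory` into one DataFrame.

    Hypothetical helper; files are read in filename order, which is
    not guaranteed to be chronological.
    """
    paths = sorted(glob.glob(os.path.join(directory, "ft_*.csv")))
    if not paths:
        return pd.DataFrame()
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)


# Demo on throwaway files with made-up contents (assumed column layout).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"CustomerID": [1.0], "Country": ["United Kingdom"]}).to_csv(
        os.path.join(tmp, "ft_2011_12_01_09-00-00-000000.csv"), index=False)
    pd.DataFrame({"CustomerID": [2.0], "Country": ["France"]}).to_csv(
        os.path.join(tmp, "ft_2011_12_02_05-03-00-000000.csv"), index=False)
    combined = load_saved_chunks(tmp)

print(combined)
```

To try it on real output, you could call `load_saved_chunks(save_progress)` before the directory is removed.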
[9]:
import shutil
shutil.rmtree(save_progress)