NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
featuretools.demo.load_flight
Download, clean, and filter flight data from 2017. The original dataset can be found here.
Parameters
month_filter (list[int]) – Only use data from these months (example is [1, 2]). To skip, set to None.
categorical_filter (dict[str->list[str]]) – Keep only rows that match the specified categorical values. A row is kept if it matches any of the listed columns, so {'dest_city': ['Boston, MA'], 'origin_city': ['Boston, MA']} returns all flights in OR out of Boston. To skip, set to None.
nrows (int) – Passed to nrows in pd.read_csv. Applied before filtering, so the filtered result may contain fewer rows than nrows.
demo (bool) – Use only two months of data. If False, use the whole year.
return_single_table (bool) – Exit the function early and return a single dataframe instead of an EntitySet.
verbose (bool) – Show a progress bar while loading the data.
Examples
In [1]: import featuretools as ft

In [2]: es = ft.demo.load_flight(verbose=True,
   ...:                          month_filter=[1],
   ...:                          categorical_filter={'origin_city':['Boston, MA']})
   ...:
100%|xxxxxxxxxxxxxxxxxxxxxxxxx| 100/100 [01:16<00:00, 1.31it/s]

In [3]: es
Out[3]:
Entityset: Flight Data
  Entities:
    airports [Rows: 55, Columns: 3]
    flights [Rows: 613, Columns: 9]
    trip_logs [Rows: 9456, Columns: 22]
    airlines [Rows: 10, Columns: 1]
  Relationships:
    trip_logs.flight_id -> flights.flight_id
    flights.carrier -> airlines.carrier
    flights.dest -> airports.dest
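The filter semantics described for month_filter and categorical_filter can be illustrated without downloading any data. The following is a minimal sketch in plain Python, not the library's implementation: filter_rows is a hypothetical helper, the scheduled_dep_time column name is an assumption, and combining the month filter with the categorical filter by AND is also an assumption. Only the OR across categorical columns is stated in the parameter description.

```python
from datetime import datetime


def filter_rows(rows, month_filter=None, categorical_filter=None):
    """Hypothetical helper that mimics the documented filter semantics."""
    if month_filter is not None:
        # Keep rows whose departure month is listed (column name is assumed).
        rows = [r for r in rows if r["scheduled_dep_time"].month in month_filter]
    if categorical_filter is not None:
        # Keep a row if ANY listed column holds one of its allowed values,
        # which is what makes the Boston example "in OR out of Boston".
        rows = [
            r for r in rows
            if any(r[col] in allowed for col, allowed in categorical_filter.items())
        ]
    return rows


# Tiny made-up sample; these are illustrative rows, not real flight records.
rows = [
    {"scheduled_dep_time": datetime(2017, 1, 5),
     "origin_city": "Boston, MA", "dest_city": "Chicago, IL"},
    {"scheduled_dep_time": datetime(2017, 1, 9),
     "origin_city": "Denver, CO", "dest_city": "Boston, MA"},
    {"scheduled_dep_time": datetime(2017, 1, 12),
     "origin_city": "Denver, CO", "dest_city": "Chicago, IL"},
    {"scheduled_dep_time": datetime(2017, 3, 2),
     "origin_city": "Boston, MA", "dest_city": "Denver, CO"},
]

kept = filter_rows(
    rows,
    month_filter=[1, 2],
    categorical_filter={"dest_city": ["Boston, MA"], "origin_city": ["Boston, MA"]},
)
for r in kept:
    print(r["origin_city"], "->", r["dest_city"])
```

In this sample the third row is dropped because it neither departs from nor arrives in Boston, and the fourth is dropped because it falls outside the selected months. Passing None for both filters returns the rows unchanged.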