featuretools.demo.load_flight
Download, clean, and filter flight data from 2017. The original dataset can be found here.
month_filter (list[int]) – Only use data from these months (example is [1, 2]). To skip, set to None.
categorical_filter (dict[str->list[str]]) – Use only the specified categorical values. For example, {'dest_city': ['Boston, MA'], 'origin_city': ['Boston, MA']} returns all flights into or out of Boston. To skip, set to None.
nrows (int) – Passed to nrows in pd.read_csv. Applied before filtering, so the filtered result may contain fewer rows than this.
demo (bool) – Use only two months of data. If False, use the whole year.
return_single_table (bool) – Exit the function early and return a single dataframe instead of an EntitySet.
verbose (bool) – Show a progress bar while loading the data.
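The filter semantics described above can be sketched as follows. This is a minimal illustration in plain Python, not the library's implementation: the helper `filter_flights` and the `flight_date` field are hypothetical, and the sketch assumes that the month filter and the categorical filter are applied one after the other, with a row kept by the categorical filter if it matches any of the listed columns.

```python
from datetime import date


def filter_flights(rows, month_filter=None, categorical_filter=None):
    """Hypothetical helper illustrating the filter semantics only."""
    if month_filter is not None:
        # Keep rows whose month number is in the list, e.g. [1, 2].
        rows = [r for r in rows if r["flight_date"].month in month_filter]
    if categorical_filter is not None:
        # Keep a row if ANY listed column holds one of its allowed values.
        rows = [
            r
            for r in rows
            if any(r[col] in allowed for col, allowed in categorical_filter.items())
        ]
    return rows


# Toy records; "flight_date" is an assumed field name for this sketch.
flights = [
    {"flight_date": date(2017, 1, 5), "origin_city": "Boston, MA", "dest_city": "Chicago, IL"},
    {"flight_date": date(2017, 1, 9), "origin_city": "Denver, CO", "dest_city": "Boston, MA"},
    {"flight_date": date(2017, 2, 2), "origin_city": "Denver, CO", "dest_city": "Chicago, IL"},
    {"flight_date": date(2017, 3, 1), "origin_city": "Boston, MA", "dest_city": "Denver, CO"},
]

selected = filter_flights(
    flights,
    month_filter=[1, 2],
    categorical_filter={"dest_city": ["Boston, MA"], "origin_city": ["Boston, MA"]},
)
print(len(selected))
```

Under these assumptions, the two January flights that touch Boston are kept; the February flight is dropped because it matches neither city column, and the March flight is dropped by the month filter.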
Examples
In [1]: import featuretools as ft

In [2]: es = ft.demo.load_flight(verbose=True,
   ...:                          month_filter=[1],
   ...:                          categorical_filter={'origin_city':['Boston, MA']})
   ...:
100%|xxxxxxxxxxxxxxxxxxxxxxxxx| 100/100 [01:16<00:00, 1.31it/s]

In [3]: es
Out[3]:
Entityset: Flight Data
  DataFrames:
    airports [Rows: 55, Columns: 3]
    flights [Rows: 613, Columns: 9]
    trip_logs [Rows: 9456, Columns: 22]
    airlines [Rows: 10, Columns: 1]
  Relationships:
    trip_logs.flight_id -> flights.flight_id
    flights.carrier -> airlines.carrier
    flights.dest -> airports.dest