Feature Selection#

Featuretools provides users with the ability to remove features that are unlikely to be useful in building an effective machine learning model. Reducing the number of features in the feature matrix can both produce better results in the model as well as reduce the computational cost involved in prediction.

Featuretools enables users to perform feature selection on the results of Deep Feature Synthesis with three functions:

ft.selection.remove_highly_null_features
ft.selection.remove_single_value_features
ft.selection.remove_highly_correlated_features

We will describe each of these functions in depth, but first we must create an entity set with which we can run ft.dfs.

[1]:

import pandas as pd

import featuretools as ft
from featuretools.demo.flight import load_flight
from featuretools.selection import (
    remove_highly_correlated_features,
    remove_highly_null_features,
    remove_single_value_features,
)

es = load_flight(nrows=50)
es

Downloading data ...

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/demo/flight.py:291: PerformanceWarning: Adding/subtracting object-dtype array to TimedeltaArray not vectorized.
  clean_data.loc[:, "dep_time"] = clean_data["scheduled_dep_time"] + pd.to_timedelta(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/demo/flight.py:296: PerformanceWarning: Adding/subtracting object-dtype array to TimedeltaArray not vectorized.
  clean_data.loc[:, "arr_time"] = clean_data["dep_time"] + pd.to_timedelta(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/demo/flight.py:302: PerformanceWarning: Adding/subtracting object-dtype array to TimedeltaArray not vectorized.
  clean_data["scheduled_dep_time"] + clean_data["scheduled_elapsed_time"]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  series = series.replace(ww.config.get_option("nan_values"), np.nan)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  series = series.replace(ww.config.get_option("nan_values"), np.nan)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  series = series.replace(ww.config.get_option("nan_values"), np.nan)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  series = series.replace(ww.config.get_option("nan_values"), np.nan)

[1]:

Entityset: Flight Data
  DataFrames:
    trip_logs [Rows: 50, Columns: 21]
    flights [Rows: 6, Columns: 9]
    airlines [Rows: 1, Columns: 1]
    airports [Rows: 4, Columns: 3]
  Relationships:
    trip_logs.flight_id -> flights.flight_id
    flights.carrier -> airlines.carrier
    flights.dest -> airports.dest

Remove Highly Null Features#

We might have a dataset with columns that have many null values. Deep Feature Synthesis might build features off of those null columns, creating even more highly null features. In this case, we might want to remove any features whose null values pass a certain threshold. Below is our feature matrix with such a case:

[2]:

fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    cutoff_time=pd.DataFrame(
        {
            "trip_log_id": [30, 1, 2, 3, 4],
            "time": pd.to_datetime(["2016-09-22 00:00:00"] * 5),
        }
    ),
    trans_primitives=[],
    agg_primitives=[],
    max_depth=2,
)
fm

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/entityset/entityset.py:1455: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.
df.loc[mask, columns] = np.nan
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/entityset/entityset.py:1455: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.
df.loc[mask, columns] = np.nan
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:143: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, default_df], sort=True)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
series = series.replace(ww.config.get_option("nan_values"), np.nan)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
series = series.replace(ww.config.get_option("nan_values"), np.nan)

[2]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	NaN	NaN	NaN	NaN	<NA>	NaN	600.0	NaN	NaN	NaN	NaN	NaN	<NA>	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

We look at the above feature matrix and decide to remove the highly null features

[3]:

ft.selection.remove_highly_null_features(fm)

[3]:

	flight_id	distance	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	600.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Notice that calling remove_highly_null_features didn’t remove every feature that contains a null value. By default, we only remove features where the percentage of null values in the calculated feature matrix is above 95%. If we want to lower that threshold, we can set the pct_null_threshold paramter ourselves.

[4]:

remove_highly_null_features(fm, pct_null_threshold=0.2)

[4]:


trip_log_id
30
1
2
3
4

Remove Single Value Features#

Another situation we might run into is one where our calculated features don’t have any variance. In those cases, we are likely to want to remove the uninteresting features. For that, we use remove_single_value_features.

Let’s see what happens when we remove the single value features of the feature matrix below.

[5]:

fm

[5]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	NaN	NaN	NaN	NaN	<NA>	NaN	600.0	NaN	NaN	NaN	NaN	NaN	<NA>	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Note

A list of feature definitions such as those created by dfs can be provided to the feature selection functions. Doing this will change the outputs to include an updated list of feature definitions.

[6]:

new_fm, new_features = remove_single_value_features(fm, features=features)
new_fm

[6]:

	flight_id	distance	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	600.0	RSW	Fort Myers, FL	FL	CLT	3	Charlotte, NC	NC
1	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Now that we have the features definitions for the updated feature matrix, we can see that the features that were removed are:

[7]:

set(features) - set(new_features)

[7]:

{<Feature: air_time>,
 <Feature: arr_delay>,
 <Feature: canceled>,
 <Feature: carrier_delay>,
 <Feature: dep_delay>,
 <Feature: diverted>,
 <Feature: flights.carrier>,
 <Feature: flights.flight_num>,
 <Feature: late_aircraft_delay>,
 <Feature: national_airspace_delay>,
 <Feature: security_delay>,
 <Feature: taxi_in>,
 <Feature: taxi_out>,
 <Feature: weather_delay>}

With the function used as it is above, null values are not considered when counting a feature’s unique values. If we’d like to consider NaN its own value, we can set count_nan_as_value to True and we’ll see flights.carrier and flights.flight_num back in the matrix.

[8]:

new_fm, new_features = remove_single_value_features(
    fm, features=features, count_nan_as_value=True
)
new_fm

[8]:

	flight_id	distance	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	600.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

The features that were removed are:

[9]:

set(features) - set(new_features)

[9]:

{<Feature: air_time>,
 <Feature: arr_delay>,
 <Feature: canceled>,
 <Feature: carrier_delay>,
 <Feature: dep_delay>,
 <Feature: diverted>,
 <Feature: late_aircraft_delay>,
 <Feature: national_airspace_delay>,
 <Feature: security_delay>,
 <Feature: taxi_in>,
 <Feature: taxi_out>,
 <Feature: weather_delay>}

Remove Highly Correlated Features#

The last feature selection function we have allows us to remove features that would likely be redundant to the model we’re attempting to build by considering the correlation between pairs of calculated features.

When two features are determined to be highly correlated, we remove the more complex of the two. For example, say we have two features: col and -(col).

We can see that -(col) is just the negation of col, and so we can guess those features are going to be highly correlated. -(col) has has the Negate primitive applied to it, so it is more complex than the identity feature col. Therefore, if we only want one of col and -(col), we should keep the identity feature. For features that don’t have an obvious difference in complexity, we discard the feature that comes later in the feature matrix.

Let’s try this out on our data:

[10]:

fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    trans_primitives=["negate"],
    agg_primitives=[],
    max_depth=3,
)
fm.head()

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/entityset/entityset.py:1455: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.
  df.loc[mask, columns] = np.nan
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/entityset/entityset.py:1455: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.
  df.loc[mask, columns] = np.nan
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/entityset/entityset.py:1455: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.
  df.loc[mask, columns] = np.nan
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/featuretools/entityset/entityset.py:1455: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.
  df.loc[mask, columns] = np.nan

[10]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(air_time)	-(arr_delay)	-(carrier_delay)	-(dep_delay)	-(distance)	-(late_aircraft_delay)	-(national_airspace_delay)	-(security_delay)	-(taxi_in)	-(taxi_out)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-88.0	12.0	-0.0	11.0	-600.0	-0.0	-0.0	-0.0	-10.0	-12.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-224.0	-1.0	-0.0	6.0	-1587.0	-0.0	-0.0	-0.0	-5.0	-28.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	226.0	0.0	0.0	0.0	0.0	0.0	False	-50.0	3.0	-0.0	2.0	-226.0	-0.0	-0.0	-0.0	-8.0	-18.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-87.0	3.0	-0.0	-0.0	-600.0	-0.0	-0.0	-0.0	-10.0	-11.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-235.0	-10.0	-0.0	4.0	-1587.0	-0.0	-0.0	-0.0	-3.0	-26.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

Note that we have some pretty clear correlations here between all the features and their negations.

Now, using remove_highly_correlated_features, our default threshold for correlation is 95% correlated, and we get all of the obviously correlated features removed, leaving just the less complex features.

[11]:

new_fm, new_features = remove_highly_correlated_features(fm, features=features)
new_fm.head()

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]

[11]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(security_delay)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[12]:

set(features) - set(new_features)

[12]:

{<Feature: -(distance)>,
 <Feature: -(taxi_out)>,
 <Feature: -(dep_delay)>,
 <Feature: -(late_aircraft_delay)>,
 <Feature: distance>,
 <Feature: -(air_time)>,
 <Feature: -(taxi_in)>,
 <Feature: -(arr_delay)>,
 <Feature: -(national_airspace_delay)>,
 <Feature: -(carrier_delay)>}

Change the correlation threshold#

We can lower the threshold at which to remove correlated features if we’d like to be more restrictive by using the pct_corr_threshold parameter.

[13]:

new_fm, new_features = remove_highly_correlated_features(
    fm, features=features, pct_corr_threshold=0.9
)
new_fm.head()

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]

[13]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	carrier_delay	weather_delay	security_delay	late_aircraft_delay	canceled	-(security_delay)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[14]:

set(features) - set(new_features)

[14]:

{<Feature: -(distance)>,
 <Feature: -(taxi_out)>,
 <Feature: -(dep_delay)>,
 <Feature: -(late_aircraft_delay)>,
 <Feature: national_airspace_delay>,
 <Feature: distance>,
 <Feature: -(air_time)>,
 <Feature: -(taxi_in)>,
 <Feature: -(arr_delay)>,
 <Feature: -(national_airspace_delay)>,
 <Feature: -(carrier_delay)>}

Check a Subset of Features#

If we only want to check a subset of features, we can set features_to_check to the list of features whose correlation we’d like to check, and no features outside of that list will be removed.

[15]:

new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_check=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()

[15]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(air_time)	-(arr_delay)	-(carrier_delay)	-(dep_delay)	-(distance)	-(late_aircraft_delay)	-(national_airspace_delay)	-(security_delay)	-(taxi_in)	-(taxi_out)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	0.0	0.0	0.0	0.0	0.0	False	-88.0	12.0	-0.0	11.0	-600.0	-0.0	-0.0	-0.0	-10.0	-12.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	0.0	0.0	0.0	0.0	0.0	False	-224.0	-1.0	-0.0	6.0	-1587.0	-0.0	-0.0	-0.0	-5.0	-28.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	0.0	0.0	0.0	0.0	0.0	False	-50.0	3.0	-0.0	2.0	-226.0	-0.0	-0.0	-0.0	-8.0	-18.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	0.0	0.0	0.0	0.0	0.0	False	-87.0	3.0	-0.0	-0.0	-600.0	-0.0	-0.0	-0.0	-10.0	-11.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	0.0	0.0	0.0	0.0	0.0	False	-235.0	-10.0	-0.0	4.0	-1587.0	-0.0	-0.0	-0.0	-3.0	-26.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[16]:

set(features) - set(new_features)

[16]:

{<Feature: distance>}

Protect Features from Removal#

To protect specific features from being removed from the feature matrix, we can include a list of features_to_keep, and these features will not be removed

[17]:

new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_keep=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/latest/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]

[17]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(security_delay)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	226.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[18]:

set(features) - set(new_features)

[18]:

{<Feature: -(distance)>,
 <Feature: -(taxi_out)>,
 <Feature: -(dep_delay)>,
 <Feature: -(late_aircraft_delay)>,
 <Feature: -(air_time)>,
 <Feature: -(taxi_in)>,
 <Feature: -(arr_delay)>,
 <Feature: -(national_airspace_delay)>,
 <Feature: -(carrier_delay)>}

Table of Contents

Previous topic

Next topic

This Page

Feature Selection#

Remove Highly Null Features#

Remove Single Value Features#

Remove Highly Correlated Features#

Change the correlation threshold#

Check a Subset of Features#

Protect Features from Removal#

Table of Contents

Previous topic

Next topic

This Page

Quick search

Feature Selection#

Remove Highly Null Features#

Remove Single Value Features#

Remove Highly Correlated Features#

Change the correlation threshold#

Check a Subset of Features#

Protect Features from Removal#