NOTICE

The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:

pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip

For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.


featuretools.EntitySet.normalize_dataframe

EntitySet.normalize_dataframe(base_dataframe_name, new_dataframe_name, index, additional_columns=None, copy_columns=None, make_time_index=None, make_secondary_time_index=None, new_dataframe_time_index=None, new_dataframe_secondary_time_index=None)

Create a new dataframe and relationship from unique values of an existing column.

Parameters
  • base_dataframe_name (str) – Name of the dataframe from which to split.

  • new_dataframe_name (str) – Name of the new dataframe.

  • index (str) – Column in the base dataframe that will become the index of the new dataframe. The relationship will be created across this column.

  • additional_columns (list[str]) – List of column names to remove from base_dataframe and move to new dataframe.

  • copy_columns (list[str]) – List of column names to copy from the base dataframe to the new dataframe. The columns remain in the base dataframe.

  • make_time_index (bool or str, optional) – Create time index for new dataframe based on time index in base_dataframe, optionally specifying which column in base_dataframe to use for time_index. If specified as True without a specific column name, uses the primary time index. Defaults to True if base dataframe has a time index.

  • make_secondary_time_index (dict[str -> list[str]], optional) – Create a secondary time index from key. Values of dictionary are the columns to associate with a secondary time index. Only one secondary time index is allowed. If None, only associate the time index.

  • new_dataframe_time_index (str, optional) – Rename new dataframe time index.

  • new_dataframe_secondary_time_index (str, optional) – Rename new dataframe secondary time index.