NOTICE

The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:

pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip

For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.


featuretools.EntitySet

class featuretools.EntitySet(id=None, dataframes=None, relationships=None)[source]

Stores all actual data and typing information for an entityset

Attributes:

id
dataframe_dict
relationships
time_type

Properties:

metadata

__init__(id=None, dataframes=None, relationships=None)[source]

Creates EntitySet

Parameters
  • id (str) – Unique identifier to associate with this instance

  • dataframes (dict[str -> tuple(DataFrame, str, str, dict[str -> str/Woodwork.LogicalType], dict[str->str/set], boolean)]) – Dictionary of DataFrames. Entries take the format {dataframe name -> (dataframe, index column, time_index, logical_types, semantic_tags, make_index)}. Note that only the dataframe is required. If a Woodwork DataFrame is supplied, any other parameters will be ignored.

  • relationships (list[(str, str, str, str)]) – List of relationships between dataframes. List items are a tuple with the format (parent dataframe name, parent column, child dataframe name, child column).

Example

dataframes = {
    "cards" : (card_df, "id"),
    "transactions" : (transactions_df, "id", "transaction_time")
}

relationships = [("cards", "id", "transactions", "card_id")]

ft.EntitySet("my-entity-set", dataframes, relationships)

Methods

__init__([id, dataframes, relationships])

Creates EntitySet

add_dataframe(dataframe[, dataframe_name, …])

Add a DataFrame to the EntitySet with Woodwork typing information.

add_interesting_values([max_values, …])

Find or set interesting values for categorical columns, to be used to generate “where” clauses

add_last_time_indexes([updated_dataframes])

Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).

add_relationship([parent_dataframe_name, …])

Add a new relationship between dataframes in the entityset.

add_relationships(relationships)

Add multiple new relationships to an entityset.

concat(other[, inplace])

Combine this entityset with another to create a new entityset containing the combined data of both.

find_backward_paths(start_dataframe_name, …)

Generator which yields all backward paths between a start and goal dataframe.

find_forward_paths(start_dataframe_name, …)

Generator which yields all forward paths between a start and goal dataframe.

get_backward_dataframes(dataframe_name[, deep])

Get dataframes that are in a backward relationship with dataframe

get_backward_relationships(dataframe_name)

Get relationships where dataframe “dataframe_name” is the parent.

get_forward_dataframes(dataframe_name[, deep])

Get dataframes that are in a forward relationship with dataframe

get_forward_relationships(dataframe_name)

Get relationships where dataframe “dataframe_name” is the child

has_unique_forward_path(…)

Is the forward path from start to end unique?

normalize_dataframe(base_dataframe_name, …)

Create a new dataframe and relationship from unique values of an existing column.

plot([to_file])

Create a UML diagram-ish graph of the EntitySet.

query_by_values(dataframe_name, instance_vals)

Query instances that have a column with the given value.

replace_dataframe(dataframe_name, df[, …])

Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.

reset_data_description()

set_secondary_time_index(dataframe_name, …)

Set the secondary time index for a dataframe in the EntitySet using its dataframe name.

to_csv(path[, sep, encoding, engine, …])

Write entityset to disk in CSV format, location specified by path.

to_dictionary()

to_parquet(path[, engine, compression, …])

Write entityset to disk in the parquet format, location specified by path.

to_pickle(path[, compression, profile_name])

Write entityset in the pickle format, location specified by path.

Attributes

dataframe_type

String specifying the library used for the dataframes.

dataframes

metadata

Returns the metadata for this EntitySet.