featuretools.EntitySet

class featuretools.EntitySet(id=None, dataframes=None, relationships=None)

Stores all actual data and typing information for an entityset.

id
dataframe_dict
relationships
time_type
Properties:

metadata

__init__(id=None, dataframes=None, relationships=None)

Creates an EntitySet.

Parameters:
  • id (str) – Unique identifier to associate with this instance

  • dataframes (dict[str -> tuple(DataFrame, str, str, dict[str -> str/Woodwork.LogicalType], dict[str->str/set], boolean)]) – Dictionary of DataFrames. Entries take the format {dataframe name -> (dataframe, index column, time_index, logical_types, semantic_tags, make_index)}. Note that only the dataframe is required. If a Woodwork DataFrame is supplied, any other parameters will be ignored.

  • relationships (list[(str, str, str, str)]) – List of relationships between dataframes. List items are a tuple with the format (parent dataframe name, parent column, child dataframe name, child column).

Example

import featuretools as ft

# card_df and transactions_df are assumed to be pandas DataFrames loaded
# elsewhere, with the referenced index and time index columns present.
dataframes = {
    "cards": (card_df, "id"),
    "transactions": (transactions_df, "id", "transaction_time"),
}

relationships = [("cards", "id", "transactions", "card_id")]

ft.EntitySet("my-entity-set", dataframes, relationships)
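The same entityset can also be built up incrementally with add_dataframe and add_relationship. The following is a minimal sketch assuming featuretools 1.x with pandas; the sample data and column names are illustrative, not part of the API.

import pandas as pd
import featuretools as ft

cards_df = pd.DataFrame({
    "id": [1, 2],
    "signup_date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
})
transactions_df = pd.DataFrame({
    "id": [10, 11, 12],
    "card_id": [1, 1, 2],
    "amount": [25.0, 40.0, 7.5],
    "transaction_time": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
})

# Start with an empty EntitySet, register each dataframe with its index
# (and time index where relevant), then link parent to child.
es = ft.EntitySet(id="my-entity-set")
es.add_dataframe(dataframe_name="cards", dataframe=cards_df, index="id")
es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="id",
    time_index="transaction_time",
)
es.add_relationship("cards", "id", "transactions", "card_id")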

Methods

__init__([id, dataframes, relationships])

Creates an EntitySet.

add_dataframe(dataframe[, dataframe_name, ...])

Add a DataFrame to the EntitySet with Woodwork typing information.

add_interesting_values([max_values, ...])

Find or set interesting values for categorical columns, to be used to generate "where" clauses.

add_last_time_indexes([updated_dataframes])

Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).

add_relationship([parent_dataframe_name, ...])

Add a new relationship between dataframes in the entityset.

add_relationships(relationships)

Add multiple new relationships to an entityset.

concat(other[, inplace])

Combine this entityset with another to create a new entityset containing the data of both.

find_backward_paths(start_dataframe_name, ...)

Generator which yields all backward paths between a start and goal dataframe.

find_forward_paths(start_dataframe_name, ...)

Generator which yields all forward paths between a start and goal dataframe.

get_backward_dataframes(dataframe_name[, deep])

Get dataframes that are in a backward relationship with the given dataframe.

get_backward_relationships(dataframe_name)

Get relationships where dataframe "dataframe_name" is the parent.

get_forward_dataframes(dataframe_name[, deep])

Get dataframes that are in a forward relationship with the given dataframe.

get_forward_relationships(dataframe_name)

Get relationships where dataframe "dataframe_name" is the child.

has_unique_forward_path(...)

Is the forward path from start to end unique?

normalize_dataframe(base_dataframe_name, ...)

Create a new dataframe and relationship from unique values of an existing column.

plot([to_file])

Create a UML-diagram-like graph of the EntitySet.

query_by_values(dataframe_name, instance_vals)

Query instances that have a column with the given value.

replace_dataframe(dataframe_name, df[, ...])

Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.

reset_data_description()

set_secondary_time_index(dataframe_name, ...)

Set the secondary time index for a dataframe in the EntitySet using its dataframe name.

to_csv(path[, sep, encoding, engine, ...])

Write entityset to disk in the csv format, location specified by path.

to_dictionary()

to_parquet(path[, engine, compression, ...])

Write entityset to disk in the parquet format, location specified by path.

to_pickle(path[, compression, profile_name])

Write entityset in the pickle format, location specified by path.
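
As a rough usage sketch for several of the methods above (featuretools 1.x with pandas assumed; dataframe names, column names, and sample data are illustrative only):

import pandas as pd
import featuretools as ft

transactions_df = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "card_id": [1, 1, 2, 2],
    "amount": [25.0, 40.0, 7.5, 64.0],
    "transaction_time": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04"]
    ),
})

es = ft.EntitySet(id="demo")
es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="id",
    time_index="transaction_time",
)

# Derive a "cards" dataframe, and the relationship to it, from the unique
# values of transactions.card_id.
es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="cards",
    index="card_id",
)

# Record the last time each card (or any of its transactions) was observed.
es.add_last_time_indexes()

# Fetch all transactions belonging to card 1.
rows = es.query_by_values("transactions", instance_vals=[1], column_name="card_id")

# Persist to disk; es.plot() can also render the schema if graphviz is installed.
es.to_parquet("demo_entityset")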

Attributes

dataframe_type

String specifying the library used for the dataframes.

dataframes

metadata

Returns the metadata for this EntitySet.
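
Continuing the hypothetical `es` from the sketch above, these attributes can be read directly:

print(es.dataframe_type)        # library backing the dataframes, e.g. "pandas"
print(list(es.dataframe_dict))  # names of the dataframes registered in the set
meta = es.metadata              # as documented above: the metadata for this EntitySet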