NOTICE

The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release:

pip install featuretools==1.0.0rc1

For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.


featuretools.EntitySet

class featuretools.EntitySet(id=None, entities=None, relationships=None)

Stores all actual data for an entityset

Attributes:

id
entity_dict
relationships
time_type

Properties:

metadata

__init__(id=None, entities=None, relationships=None)

Creates EntitySet

Parameters
  • id (str) – Unique identifier to associate with this instance

  • entities (dict[str -> tuple(pd.DataFrame, str, str, dict[str -> Variable])]) – Dictionary of entities. Each entry maps an entity id to a tuple of the form (dataframe, id column, (time_index), (variable_types), (make_index)). The parenthesized items time_index, variable_types and make_index are optional.

  • relationships (list[(str, str, str, str)]) – List of relationships between entities. List items are a tuple with the format (parent entity id, parent variable, child entity id, child variable).

Example

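import featuretools as ft

# card_df and transactions_df are existing pandas DataFrames.
entities = {
    "cards": (card_df, "id"),
    "transactions": (transactions_df, "id", "transaction_time"),
}

# (parent entity id, parent variable, child entity id, child variable)
relationships = [("cards", "id", "transactions", "card_id")]

es = ft.EntitySet("my-entity-set", entities, relationships)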
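
The shapes of the entities and relationships arguments described above can be checked before an EntitySet is constructed. The following is a minimal sketch in plain Python, not part of Featuretools: the helper name check_entityset_spec is hypothetical, placeholder objects stand in for the pandas DataFrames, and only the tuple lengths and entity ids are inspected.

```python
def check_entityset_spec(entities, relationships):
    """Return a list of problems found in the constructor arguments.

    Hypothetical helper, not part of Featuretools. It only checks the
    shapes described in the parameter list: entity tuples of 2 to 5
    items and relationship 4-tuples that refer to known entity ids.
    """
    problems = []
    for entity_id, spec in entities.items():
        # (dataframe, id column) are required; time_index,
        # variable_types and make_index are optional.
        if not 2 <= len(spec) <= 5:
            problems.append(f"{entity_id}: expected 2 to 5 items, got {len(spec)}")
    for rel in relationships:
        if len(rel) != 4:
            problems.append(f"{rel!r}: expected 4 items")
            continue
        parent_id, _parent_var, child_id, _child_var = rel
        for eid in (parent_id, child_id):
            if eid not in entities:
                problems.append(f"{rel!r}: unknown entity {eid!r}")
    return problems


# Placeholder objects stand in for the pandas DataFrames.
card_df, transactions_df = object(), object()
entities = {
    "cards": (card_df, "id"),
    "transactions": (transactions_df, "id", "transaction_time"),
}
good = [("cards", "id", "transactions", "card_id")]
# "customers" is not a key of entities, so this relationship is reported.
bad = [("customers", "id", "transactions", "customer_id")]

print(check_entityset_spec(entities, good))
print(check_entityset_spec(entities, bad))
```

A check like this reports a misspelled entity id as a readable message before any data is loaded. It does not verify that the named columns exist in the dataframes.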

Methods

__init__([id, entities, relationships])

Creates EntitySet

add_interesting_values([max_values, verbose])

Find interesting values for categorical variables, to be used to generate “where” clauses

add_last_time_indexes([updated_entities])

Calculates the last time index values for each entity (the last time an instance or children of that instance were observed). Used when calculating features using training windows. Parameter updated_entities (list[str]): list of entity ids to update last_time_index for (will update all parents of those entities as well).

add_relationship(relationship)

Add a new relationship between entities in the entityset

add_relationships(relationships)

Add multiple new relationships to an entityset

concat(other[, inplace])

Combine entityset with another to create a new entityset with the combined data of both entitysets.

entity_from_dataframe(entity_id, dataframe)

Load the data for a specified entity from a Pandas DataFrame.

find_backward_paths(start_entity_id, …)

Generator which yields all backward paths between a start and goal entity.

find_forward_paths(start_entity_id, …)

Generator which yields all forward paths between a start and goal entity.

get_backward_entities(entity_id[, deep])

Get entities that are in a backward relationship with entity

get_backward_relationships(entity_id)

Get relationships where entity “entity_id” is the parent

get_forward_entities(entity_id[, deep])

Get entities that are in a forward relationship with entity

get_forward_relationships(entity_id)

Get relationships where entity “entity_id” is the child

has_unique_forward_path(start_entity_id, …)

Is the forward path from start to end unique?

normalize_entity(base_entity_id, …[, …])

Create a new entity and relationship from unique values of an existing variable.

plot([to_file])

Create a UML diagram-ish graph of the EntitySet.

query_by_values(entity_id, instance_vals[, …])

Query instances that have a variable with the given value

reset_data_description()

to_csv(path[, sep, encoding, engine, …])

Write entityset to disk in the csv format, location specified by path.

to_dictionary()

to_parquet(path[, engine, compression, …])

Write entityset to disk in the parquet format, location specified by path.

to_pickle(path[, compression, profile_name])

Write entityset in the pickle format, location specified by path.
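
The forward and backward terminology used by the methods above can be illustrated with relationship tuples alone. The sketch below is plain Python and is not the Featuretools implementation: it assumes the (parent entity id, parent variable, child entity id, child variable) tuple format from the constructor, treats a forward step as child to parent, and uses hypothetical entity names. Yielding an empty path when start and goal are the same entity is also an assumption of this sketch.

```python
# Each tuple: (parent entity id, parent variable, child entity id, child variable).
# The entity and variable names here are hypothetical.
relationships = [
    ("customers", "id", "sessions", "customer_id"),
    ("sessions", "id", "transactions", "session_id"),
    ("products", "id", "transactions", "product_id"),
]


def forward_relationships(relationships, entity_id):
    """Relationships where entity_id is the child."""
    return [r for r in relationships if r[2] == entity_id]


def backward_relationships(relationships, entity_id):
    """Relationships where entity_id is the parent."""
    return [r for r in relationships if r[0] == entity_id]


def find_forward_paths(relationships, start, goal, _seen=None):
    """Yield each child-to-parent path from start to goal as a list of tuples."""
    seen = (_seen or frozenset()) | {start}
    if start == goal:
        yield []
        return
    for rel in forward_relationships(relationships, start):
        parent = rel[0]
        if parent in seen:
            # Skip entities already on the current path to avoid cycles.
            continue
        for rest in find_forward_paths(relationships, parent, goal, seen):
            yield [rel] + rest


paths = list(find_forward_paths(relationships, "transactions", "customers"))
# In this sketch, a forward path is unique when exactly one path is found.
has_unique_path = len(paths) == 1
print(paths, has_unique_path)
```

Under these assumptions a backward path is a forward path read in the opposite direction, so the same traversal can be reused by swapping the roles of parent and child.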

Attributes

dataframe_type

String specifying the library used for the entity dataframes.

entities

metadata

Returns the metadata for this EntitySet.