featuretools.EntitySet

class featuretools.EntitySet(id=None, entities=None, relationships=None)

Stores all actual data for an entityset.

Attributes:

id
entity_dict
relationships
time_type

Properties:

metadata

__init__(id=None, entities=None, relationships=None)

Creates EntitySet

Parameters
  • id (str) – Unique identifier to associate with this instance

  • entities (dict[str -> tuple(pd.DataFrame, str, str, dict[str -> Variable])]) – Dictionary of entities. Entries take the format {entity id -> (dataframe, id column, (time_index), (variable_types), (make_index))}. Note that time_index, variable_types and make_index are optional.

  • relationships (list[(str, str, str, str)]) – List of relationships between entities. List items are a tuple with the format (parent entity id, parent variable, child entity id, child variable).

Example

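import featuretools as ft

# card_df and transactions_df are pandas DataFrames defined elsewhere.
# Each entry: entity id -> (dataframe, id column[, time_index, ...])
entities = {
    "cards": (card_df, "id"),
    "transactions": (transactions_df, "id", "transaction_time"),
}

# (parent entity id, parent variable, child entity id, child variable)
relationships = [("cards", "id", "transactions", "card_id")]

es = ft.EntitySet("my-entity-set", entities, relationships)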

Methods

__init__([id, entities, relationships])

Creates EntitySet

add_interesting_values([max_values, verbose])

Find interesting values for categorical variables, to be used to generate “where” clauses

add_last_time_indexes([updated_entities])

Calculates the last time index values for each entity (the last time an instance or children of that instance were observed). Used when calculating features using training windows. Parameter: updated_entities (list[str]) – List of entity ids to update last_time_index for (will update all parents of those entities as well).
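The idea of a last time index can be illustrated without the library. The following is a minimal pure-Python sketch, not the featuretools implementation: the helper name last_time_indexes and the sample cards/transactions data are assumptions made for illustration. For each parent instance it takes the later of the parent's own time and the times of its children.

```python
from datetime import datetime


def last_time_indexes(parent_times, child_rows):
    """Return {parent_id: last time the parent or any of its children was observed}.

    parent_times: {parent_id: time of the parent instance itself}
    child_rows:   iterable of (parent_id, child_time) pairs
    (Hypothetical helper for illustration only.)
    """
    last = dict(parent_times)
    for parent_id, child_time in child_rows:
        # Children that point at an unknown parent are ignored in this sketch.
        if parent_id in last and child_time > last[parent_id]:
            last[parent_id] = child_time
    return last


# Sample data (assumed): two cards and three transactions.
cards = {1: datetime(2020, 1, 1), 2: datetime(2020, 1, 5)}
transactions = [
    (1, datetime(2020, 1, 3)),
    (1, datetime(2020, 2, 10)),
    (2, datetime(2020, 1, 2)),
]
result = last_time_indexes(cards, transactions)
```

Card 1 takes the time of its latest transaction, while card 2 keeps its own time because its only transaction is older.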

add_relationship(relationship)

Add a new relationship between entities in the entityset

add_relationships(relationships)

Add multiple new relationships to an entityset

concat(other[, inplace])

Combine entityset with another to create a new entityset with the combined data of both entitysets.

entity_from_dataframe(entity_id, dataframe)

Load the data for a specified entity from a Pandas DataFrame.

find_backward_paths(start_entity_id, …)

Generator which yields all backward paths between a start and goal entity.

find_forward_paths(start_entity_id, …)

Generator which yields all forward paths between a start and goal entity.

get_backward_entities(entity_id[, deep])

Get entities that are in a backward relationship with entity

get_backward_relationships(entity_id)

Get relationships where entity “entity_id” is the parent

get_forward_entities(entity_id[, deep])

Get entities that are in a forward relationship with entity

get_forward_relationships(entity_id)

Get relationships where entity “entity_id” is the child

has_unique_forward_path(start_entity_id, …)

Is the forward path from start to end unique?
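Forward paths and their uniqueness can be sketched over plain relationship tuples in the (parent entity id, parent variable, child entity id, child variable) format used by the constructor. This is an illustrative sketch, not the library's code: forward_paths and forward_path_is_unique are hypothetical helpers, the sample relationships are made up, and "forward" is taken to mean child to parent, as in get_forward_relationships.

```python
def forward_paths(relationships, start, goal, _visited=frozenset()):
    """Yield each child-to-parent path from start to goal as a list of relationships.

    Hypothetical helper; relationships are
    (parent entity id, parent variable, child entity id, child variable) tuples.
    """
    visited = _visited | {start}
    if start == goal:
        yield []
        return
    for rel in relationships:
        parent, _parent_var, child, _child_var = rel
        # Follow the relationship forward (child -> parent), avoiding cycles.
        if child == start and parent not in visited:
            for rest in forward_paths(relationships, parent, goal, visited):
                yield [rel] + rest


def forward_path_is_unique(relationships, start, goal):
    """Return True when exactly one forward path exists from start to goal."""
    paths = forward_paths(relationships, start, goal)
    first = next(paths, None)
    second = next(paths, None)
    return first is not None and second is None


# Sample relationships (assumed): transactions reach customers two ways.
rels = [
    ("customers", "id", "cards", "customer_id"),
    ("cards", "id", "transactions", "card_id"),
    ("customers", "id", "transactions", "customer_id"),
]
paths = list(forward_paths(rels, "transactions", "customers"))
```

Here transactions can reach customers either through cards or directly, so the forward path is not unique, whereas the path from cards to customers is.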

normalize_entity(base_entity_id, …[, …])

Create a new entity and relationship from unique values of an existing variable.
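What normalization produces can be shown with plain Python data. The sketch below is not the featuretools implementation: normalize_rows, its parameters, and the sample transactions are assumptions for illustration. It collects the unique values of one variable into a new entity and returns the relationship in which the new entity is the parent.

```python
def normalize_rows(rows, base_entity_id, new_entity_id, variable):
    """Build a new entity from the unique values of `variable` in `rows`.

    Hypothetical helper. Returns (new_rows, relationship), where relationship is
    (parent entity id, parent variable, child entity id, child variable).
    """
    # dict.fromkeys keeps the first occurrence of each value, in order.
    unique_values = list(dict.fromkeys(row[variable] for row in rows))
    new_rows = [{variable: value} for value in unique_values]
    relationship = (new_entity_id, variable, base_entity_id, variable)
    return new_rows, relationship


# Sample data (assumed): three transactions across two merchants.
transactions = [
    {"id": 1, "merchant": "acme", "amount": 10.0},
    {"id": 2, "merchant": "globex", "amount": 4.5},
    {"id": 3, "merchant": "acme", "amount": 7.25},
]
merchants, relationship = normalize_rows(
    transactions, "transactions", "merchants", "merchant"
)
```

The new merchants entity has one row per distinct merchant, and the returned tuple links it as the parent of transactions.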

plot([to_file])

Create a UML diagram-ish graph of the EntitySet.

query_by_values(entity_id, instance_vals[, …])

Query instances that have a variable with the given value

reset_data_description()

to_csv(path[, sep, encoding, engine, …])

Write entityset to disk in the csv format, location specified by path.

to_dictionary()

to_parquet(path[, engine, compression, …])

Write entityset to disk in the parquet format, location specified by path.

to_pickle(path[, compression, profile_name])

Write entityset in the pickle format, location specified by path.

Attributes

dataframe_type

String specifying the library used for the entity dataframes.

entities

metadata

Returns the metadata for this EntitySet.