NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
featuretools.EntitySet
Stores all actual data for an entityset
id
entity_dict
relationships
time_type
metadata
__init__
Creates EntitySet
id (str) – Unique identifier to associate with this instance
entities (dict[str -> tuple(pd.DataFrame, str, str, dict[str -> Variable])]) – dictionary of entities. Entries take the format {entity id -> (dataframe, id column, (time_index), (variable_types), (make_index))}. Note that time_index, variable_types and make_index are optional.
relationships (list[(str, str, str, str)]) – List of relationships between entities. List items are a tuple with the format (parent entity id, parent variable, child entity id, child variable).
Example
entities = {
    "cards": (card_df, "id"),
    "transactions": (transactions_df, "id", "transaction_time"),
}
relationships = [("cards", "id", "transactions", "card_id")]
ft.EntitySet("my-entity-set", entities, relationships)
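The relationship tuple order (parent entity id, parent variable, child entity id, child variable) is easy to get backwards. As a rough illustration of that format, the sketch below checks a list of relationship tuples against a plain mapping of entity id to column names. The helper `check_relationships` and the `columns_by_entity` mapping are hypothetical and not part of Featuretools; the column names are taken from the example above.

```python
# Hypothetical helper (not a Featuretools API): validates relationship tuples
# of the form (parent entity id, parent variable, child entity id, child variable)
# against a mapping of entity id -> list of column names.
def check_relationships(columns_by_entity, relationships):
    problems = []
    for parent_id, parent_var, child_id, child_var in relationships:
        for entity_id, variable in ((parent_id, parent_var), (child_id, child_var)):
            if entity_id not in columns_by_entity:
                problems.append(f"unknown entity: {entity_id}")
            elif variable not in columns_by_entity[entity_id]:
                problems.append(f"unknown variable: {entity_id}.{variable}")
    return problems


# Assumed column layout for the two dataframes in the example.
columns_by_entity = {
    "cards": ["id"],
    "transactions": ["id", "transaction_time", "card_id"],
}
relationships = [("cards", "id", "transactions", "card_id")]
problems = check_relationships(columns_by_entity, relationships)
```

An empty `problems` list means every tuple names an existing entity and variable on both the parent and the child side.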
Methods
__init__([id, entities, relationships])
add_interesting_values([max_values, verbose])
Find interesting values for categorical variables, to be used to generate “where” clauses
add_last_time_indexes([updated_entities])
Calculates the last time index values for each entity (the last time an instance or children of that instance were observed). Used when calculating features using training windows.
updated_entities (list[str]) – List of entity ids to update last_time_index for (will update all parents of those entities as well).
add_relationship(relationship)
Add a new relationship between entities in the entityset
add_relationships(relationships)
Add multiple new relationships to an entityset
concat(other[, inplace])
Combine entityset with another to create a new entityset with the combined data of both entitysets.
entity_from_dataframe(entity_id, dataframe)
Load the data for a specified entity from a Pandas DataFrame.
find_backward_paths(start_entity_id, …)
Generator which yields all backward paths between a start and goal entity.
find_forward_paths(start_entity_id, …)
Generator which yields all forward paths between a start and goal entity.
get_backward_entities(entity_id[, deep])
Get entities that are in a backward relationship with entity
get_backward_relationships(entity_id)
Get relationships where entity “entity_id” is the parent.
get_forward_entities(entity_id[, deep])
Get entities that are in a forward relationship with entity
get_forward_relationships(entity_id)
Get relationships where entity “entity_id” is the child
has_unique_forward_path(start_entity_id, …)
Is the forward path from start to end unique?
normalize_entity(base_entity_id, …[, …])
Create a new entity and relationship from unique values of an existing variable.
plot([to_file])
Create a UML diagram-ish graph of the EntitySet.
query_by_values(entity_id, instance_vals[, …])
Query instances that have a variable with a given value
reset_data_description()
to_csv(path[, sep, encoding, engine, …])
Write entityset to disk in the csv format, location specified by path.
to_dictionary()
to_parquet(path[, engine, compression, …])
Write entityset to disk in the parquet format, location specified by path.
to_pickle(path[, compression, profile_name])
Write entityset in the pickle format, location specified by path.
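Several of the methods above deal with forward paths, where a forward relationship runs from a child entity to its parent. As a minimal sketch of that idea, the generator below walks child-to-parent relationship tuples and yields every path between two entities. It is not the Featuretools implementation: the function name `forward_paths`, the plain-tuple representation, and the example entities are all assumptions made for illustration.

```python
# Illustrative sketch only (not Featuretools code). Relationships are tuples of
# (parent entity id, parent variable, child entity id, child variable).
def forward_paths(relationships, start, goal, seen=None):
    """Yield each forward (child -> parent) path from start to goal
    as a list of relationship tuples."""
    seen = (seen or frozenset()) | {start}
    if start == goal:
        yield []
        return
    for rel in relationships:
        parent_id, _parent_var, child_id, _child_var = rel
        # Follow the relationship only from child to parent, and skip
        # entities already on the current path to avoid cycles.
        if child_id == start and parent_id not in seen:
            for rest in forward_paths(relationships, parent_id, goal, seen):
                yield [rel] + rest


# Hypothetical entity layout: transactions -> sessions -> customers,
# plus transactions -> products.
rels = [
    ("customers", "id", "sessions", "customer_id"),
    ("sessions", "id", "transactions", "session_id"),
    ("products", "id", "transactions", "product_id"),
]
paths = list(forward_paths(rels, "transactions", "customers"))
is_unique = len(paths) == 1
```

Under this sketch, asking whether a forward path is unique reduces to checking that exactly one path is yielded. A backward search would follow the same tuples from parent to child instead.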
Attributes
dataframe_type
String specifying the library used for the entity dataframes.
entities
metadata
Returns the metadata for this EntitySet.
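The description of add_last_time_indexes under Methods defines the last time index as the last time an instance or children of that instance were observed. The sketch below illustrates that definition for a single parent/child pair using plain dictionaries. It is a simplification under stated assumptions, not the library's code: the function `last_time_index`, the data layout, and the sample dates are all hypothetical, and it covers only one level of the parent/child hierarchy.

```python
from datetime import date


# Illustrative sketch only (not Featuretools code).
# parent_times: {parent instance id: time the parent itself was observed}
# child_rows:   iterable of (parent instance id, time the child was observed)
def last_time_index(parent_times, child_rows):
    last = dict(parent_times)
    for parent_id, when in child_rows:
        # Keep the latest observation among the parent and its children.
        if parent_id in last and when > last[parent_id]:
            last[parent_id] = when
    return last


# Hypothetical data: two cards and three transactions.
cards = {1: date(2020, 1, 1), 2: date(2020, 2, 1)}
transactions = [
    (1, date(2020, 3, 5)),
    (1, date(2020, 1, 15)),
    (2, date(2020, 1, 20)),
]
result = last_time_index(cards, transactions)
```

Card 1 takes the date of its latest transaction, while card 2 keeps its own date because its only transaction is earlier.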