NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
featuretools.EntitySet
Stores all actual data and typing information for an entityset.
id
dataframe_dict
relationships
time_type
metadata
__init__
Creates EntitySet
id (str) – Unique identifier to associate with this instance
dataframes (dict[str -> tuple(DataFrame, str, str, dict[str -> str/Woodwork.LogicalType], dict[str->str/set], boolean)]) – Dictionary of DataFrames. Entries take the format {dataframe name -> (dataframe, index column, time_index, logical_types, semantic_tags, make_index)}. Note that only the dataframe is required. If a Woodwork DataFrame is supplied, any other parameters will be ignored.
relationships (list[(str, str, str, str)]) – List of relationships between dataframes. List items are a tuple with the format (parent dataframe name, parent column, child dataframe name, child column).
Example
    dataframes = {
        "cards": (card_df, "id"),
        "transactions": (transactions_df, "id", "transaction_time"),
    }
    relationships = [("cards", "id", "transactions", "card_id")]
    ft.EntitySet("my-entity-set", dataframes, relationships)
Methods
__init__([id, dataframes, relationships])
add_dataframe(dataframe[, dataframe_name, …])
add_dataframe
Add a DataFrame to the EntitySet with Woodwork typing information.
add_interesting_values([max_values, …])
add_interesting_values
Find or set interesting values for categorical columns, to be used to generate “where” clauses.
add_last_time_indexes([updated_dataframes])
add_last_time_indexes
Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).
add_relationship([parent_dataframe_name, …])
add_relationship
Add a new relationship between dataframes in the entityset.
add_relationships(relationships)
add_relationships
Add multiple new relationships to an entityset.
concat(other[, inplace])
concat
Combine entityset with another to create a new entityset with the combined data of both entitysets.
find_backward_paths(start_dataframe_name, …)
find_backward_paths
Generator which yields all backward paths between a start and goal dataframe.
find_forward_paths(start_dataframe_name, …)
find_forward_paths
Generator which yields all forward paths between a start and goal dataframe.
get_backward_dataframes(dataframe_name[, deep])
get_backward_dataframes
Get dataframes that are in a backward relationship with the given dataframe.
get_backward_relationships(dataframe_name)
get_backward_relationships
Get relationships where dataframe “dataframe_name” is the parent.
get_forward_dataframes(dataframe_name[, deep])
get_forward_dataframes
Get dataframes that are in a forward relationship with the given dataframe.
get_forward_relationships(dataframe_name)
get_forward_relationships
Get relationships where dataframe “dataframe_name” is the child.
has_unique_forward_path(…)
has_unique_forward_path
Is the forward path from start to end unique?
normalize_dataframe(base_dataframe_name, …)
normalize_dataframe
Create a new dataframe and relationship from unique values of an existing column.
plot([to_file])
plot
Create a UML-style diagram of the EntitySet.
query_by_values(dataframe_name, instance_vals)
query_by_values
Query instances whose column contains the given values.
replace_dataframe(dataframe_name, df[, …])
replace_dataframe
Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.
reset_data_description()
reset_data_description
Reset the cached data description of the EntitySet.
set_secondary_time_index(dataframe_name, …)
set_secondary_time_index
Set the secondary time index for a dataframe in the EntitySet using its dataframe name.
to_csv(path[, sep, encoding, engine, …])
to_csv
Write entityset to disk in the csv format, location specified by path.
to_dictionary()
to_dictionary
Serialize the EntitySet to a data description dictionary.
to_parquet(path[, engine, compression, …])
to_parquet
Write entityset to disk in the parquet format, location specified by path.
to_pickle(path[, compression, profile_name])
to_pickle
Write entityset in the pickle format, location specified by path.
Attributes
dataframe_type
String specifying the library used for the dataframes.
dataframes
Returns a list of the dataframes in the EntitySet.
metadata
Returns the metadata for this EntitySet.