NOTICE

The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:

pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip

For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.


featuretools.Entity

class featuretools.Entity(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, already_sorted=False, make_index=False, verbose=False)

Represents an entity in an EntitySet and stores its data and relevant metadata.

An Entity is analogous to a table in a relational database.

See also

Relationship, Variable, EntitySet

__init__(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, already_sorted=False, make_index=False, verbose=False)

Create Entity

Parameters
  • id (str) – Id of Entity.

  • df (pd.DataFrame) – Dataframe providing the data for the entity.

  • entityset (EntitySet) – Entityset for this Entity.

  • variable_types (dict[str -> type/str/dict[str -> type]]) – Maps string variable ids to a type (Variable), a type string (str), or a (type, kwargs) pair whose keyword arguments are passed to the Variable.

  • index (str) – Name of id column in the dataframe.

  • time_index (str) – Name of time column in the dataframe.

  • secondary_time_index (dict[str -> str]) – Dictionary mapping columns in the dataframe to the time index column they are associated with.

  • last_time_index (pd.Series) – Time index of the last event for each instance across all child entities.

  • make_index (bool, optional) – If True, assume index does not exist as a column in the dataframe, and create a new column of that name using integers in the range (0, len(dataframe)). Otherwise, assume index exists in the dataframe.
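
The make_index behaviour described above can be sketched with pandas alone. This is a minimal illustration, not the Featuretools implementation: it assumes the generated index is a run of consecutive integers starting at 0, and the helper name ensure_index and the sample columns are hypothetical.

```python
import pandas as pd


def ensure_index(df, index, make_index=False):
    """Hypothetical helper mimicking the documented make_index semantics."""
    if make_index:
        # The index column is assumed not to exist yet; refuse to overwrite it.
        if index in df.columns:
            raise ValueError(f"column {index!r} already exists")
        df = df.copy()
        # Create the new index column with the integers 0 .. len(df) - 1.
        df.insert(0, index, list(range(len(df))))
    elif index not in df.columns:
        # Without make_index, the index column is expected to be present.
        raise KeyError(f"column {index!r} not found in dataframe")
    return df


# Hypothetical data with no id column of its own.
sessions = pd.DataFrame({"device": ["phone", "laptop", "phone"]})
indexed = ensure_index(sessions, "session_id", make_index=True)
```

The sketch copies the dataframe before adding the column so the caller's data is left untouched; whether Featuretools does the same is not stated in this reference.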

Methods

__init__(id, df, entityset[, …])

Create Entity

add_interesting_values([max_values, verbose])

Find interesting values for categorical variables, to be used to generate "where" clauses

convert_variable_type(variable_id, new_type)

Convert variable in dataframe to different type

delete_variables(variable_ids)

Remove variables from entity’s dataframe and from self.variables

set_index(variable_id[, unique])

Parameter variable_id: name of an existing variable to set as index.

set_secondary_time_index(secondary_time_index)

set_time_index(variable_id[, already_sorted])

update_data(df[, already_sorted, …])

Update entity’s internal dataframe, optionally making sure data is sorted, reference indexes to other entities are consistent, and last_time_indexes are consistent.

Attributes

df

Dataframe providing the data for the entity.

last_time_index

Time index of the last event for each instance across all child entities.

shape

Shape of the entity’s dataframe

variable_types

Dictionary mapping variable ids to variable types
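
To make the last_time_index attribute more concrete, the following pandas-only sketch computes the time of the last event per parent instance from a single child table. The tables, column names, and values are hypothetical, and Featuretools' own computation may cover cases this sketch ignores (several child entities, or the entity's own time index).

```python
import pandas as pd

# Hypothetical parent ("customers") and child ("transactions") tables.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "time": pd.to_datetime(["2021-01-01", "2021-03-15", "2021-02-10"]),
})

# Latest child event for each parent instance that has child rows.
last_event = transactions.groupby("customer_id")["time"].max()

# Align to the parent's ids; parents without child events get NaT here.
last_time_index = last_event.reindex(customers["customer_id"])
```

The result is a pd.Series keyed by the parent's index values, which matches the type documented for the last_time_index parameter.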