NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
load_retail([id, nrows, return_single_table])
Returns the retail entityset example.
load_mock_customer([n_customers, …])
Returns dataframes of mock customer data.
load_flight([month_filter, …])
Download, clean, and filter flight data from 2017.
dfs([entities, relationships, entityset, …])
Calculates a feature matrix and features given a dictionary of entities and a list of relationships.
get_valid_primitives(entityset, target_entity)
Returns two lists of primitives (transform and aggregation) containing primitives that can be applied to the specific target entity to create features.
Timedelta(value[, unit, delta_obj])
Represents differences in time.
make_temporal_cutoffs(instance_ids, cutoffs)
Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
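A minimal stdlib sketch of the idea behind `make_temporal_cutoffs` (the helper name and parameters below are illustrative, not the library function itself):

```python
from datetime import datetime, timedelta

def equally_spaced_cutoffs(instance_id, final_cutoff, num_windows, window_size):
    """Illustrative helper: build `num_windows` cutoff rows for one instance,
    spaced `window_size` apart and ending at `final_cutoff`."""
    return [
        (instance_id, final_cutoff - i * window_size)
        for i in reversed(range(num_windows))
    ]

# Four daily cutoffs ending at 2021-01-04 for instance id 1.
rows = equally_spaced_cutoffs(1, datetime(2021, 1, 4), 4, timedelta(days=1))
```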
A list of all Featuretools primitives can be obtained by visiting primitives.featurelabs.com.
TransformPrimitive()
Feature for an entity that is based on one or more other features in that entity.
AggregationPrimitive()
make_agg_primitive(function, input_types, …)
Returns a new aggregation primitive class.
make_trans_primitive(function, input_types, …)
Returns a new transform primitive class.
Count()
Determines the total number of values, excluding NaN.
Mean([skipna])
Computes the average for a list of values.
Sum()
Calculates the sum of values, ignoring NaN.
Min()
Calculates the smallest value, ignoring NaN values.
Max()
Calculates the highest value, ignoring NaN values.
Std()
Computes the dispersion relative to the mean value, ignoring NaN.
Median()
Determines the middlemost number in a list of values.
Mode()
Determines the most commonly repeated value.
AvgTimeBetween([unit])
Computes the average number of seconds between consecutive events.
TimeSinceLast([unit])
Calculates the time elapsed since the last datetime (default in seconds).
TimeSinceFirst([unit])
Calculates the time elapsed since the first datetime (in seconds).
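A stdlib sketch of what the three time-based aggregations above compute for a list of event datetimes and a cutoff time (illustrative arithmetic, not the primitives themselves):

```python
from datetime import datetime

events = [datetime(2021, 1, 1, 9, 0),
          datetime(2021, 1, 1, 9, 30),
          datetime(2021, 1, 1, 10, 30)]
cutoff = datetime(2021, 1, 1, 11, 0)

# TimeSinceLast: seconds from the most recent event to the cutoff.
time_since_last = (cutoff - events[-1]).total_seconds()
# TimeSinceFirst: seconds from the earliest event to the cutoff.
time_since_first = (cutoff - events[0]).total_seconds()
# AvgTimeBetween: total span divided by the number of consecutive gaps.
avg_time_between = (events[-1] - events[0]).total_seconds() / (len(events) - 1)
```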
NumUnique()
Determines the number of distinct values, ignoring NaN values.
PercentTrue()
Determines the percent of True values.
All()
Calculates if all values are ‘True’ in a list.
Any()
Determines if any value is ‘True’ in a list.
First()
Determines the first value in a list.
Last()
Determines the last value in a list.
Skew()
Computes the extent to which a distribution differs from a normal distribution.
Trend()
Calculates the trend of a variable over time.
Entropy([dropna, base])
Calculates the entropy for a categorical variable.
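A stdlib sketch of the Shannon entropy computation behind the `Entropy` primitive (illustrative, not the library implementation; natural log by default, as with `scipy.stats.entropy`):

```python
from collections import Counter
from math import e, log

def entropy(values, base=None):
    """Illustrative sketch: Shannon entropy of a list of category labels."""
    counts = Counter(values)
    total = len(values)
    b = base if base is not None else e
    # Sum -p * log_b(p) over the probability of each distinct category.
    return -sum((c / total) * log(c / total, b) for c in counts.values())

# Two equally likely categories carry one bit of entropy (base 2).
h = entropy(["a", "b", "a", "b"], base=2)
```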
IsIn([list_of_outputs])
Determines whether a value is present in a provided list.
And()
Element-wise logical AND of two lists.
Or()
Element-wise logical OR of two lists.
Not()
Negates a boolean value.
Absolute()
Computes the absolute value of a number.
Percentile()
Determines the percentile rank for each value in a list.
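A stdlib sketch of percentile rank, computed as average rank divided by the number of values so that ties share a rank (the same convention as pandas' `rank(pct=True)`; this is an illustration, not the primitive itself):

```python
def percentile_ranks(values):
    """Illustrative sketch: percentile rank of each value in a list."""
    n = len(values)
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        # Average the 1-based sorted positions of all occurrences of v.
        positions = [i + 1 for i, s in enumerate(sorted_vals) if s == v]
        ranks.append(sum(positions) / len(positions) / n)
    return ranks

pcts = percentile_ranks([10, 20, 30, 40])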
TimeSince([unit])
Calculates time from a value to a specified cutoff datetime.
Second()
Determines the seconds value of a datetime.
Minute()
Determines the minutes value of a datetime.
Weekday()
Determines the day of the week from a datetime.
IsWeekend()
Determines if a date falls on a weekend.
Hour()
Determines the hour value of a datetime.
Day()
Determines the day of the month from a datetime.
Week()
Determines the week of the year from a datetime.
Month()
Determines the month value of a datetime.
Year()
Determines the year value of a datetime.
Diff()
Computes the difference between the value in a list and the previous value in that list.
TimeSincePrevious([unit])
Computes the time since the previous entry in a list.
CumCount()
Calculates the cumulative count.
CumSum()
Calculates the cumulative sum.
CumMean()
Calculates the cumulative mean.
CumMin()
Calculates the cumulative minimum.
CumMax()
Calculates the cumulative maximum.
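A stdlib sketch of what the cumulative primitives above compute over a list of values (illustrative; the primitives themselves operate on feature columns):

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

# CumCount: running count of values seen so far.
cum_count = list(range(1, len(values) + 1))
# CumSum: running total.
cum_sum = list(accumulate(values))
# CumMean: running total divided by running count.
cum_mean = [s / n for s, n in zip(cum_sum, cum_count)]
# CumMin / CumMax: running minimum and maximum.
cum_min = list(accumulate(values, min))
cum_max = list(accumulate(values, max))
```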
NumCharacters()
Calculates the number of characters in a string.
NumWords()
Determines the number of words in a string by counting the spaces.
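Since `NumWords` counts spaces, a string with n spaces yields n + 1 words. A stdlib sketch of that rule (illustrative; note that repeated spaces would overcount under this scheme):

```python
def num_words(text):
    """Illustrative sketch: count words as (number of spaces) + 1."""
    return text.count(" ") + 1

n = num_words("this is a string")
```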
Latitude()
Returns the first tuple value in a list of LatLong tuples.
Longitude()
Returns the second tuple value in a list of LatLong tuples.
Haversine([unit])
Calculates the approximate haversine distance between two LatLong variable types.
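A stdlib sketch of the haversine formula over two (latitude, longitude) tuples (illustrative; the earth-radius constant and mile units here are assumptions, not the primitive's internals):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_miles(latlong1, latlong2):
    """Illustrative sketch: great-circle distance between two
    (latitude, longitude) tuples via the haversine formula."""
    lat1, lon1 = (radians(d) for d in latlong1)
    lat2, lon2 = (radians(d) for d in latlong2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    earth_radius_miles = 3958.76
    return 2 * earth_radius_miles * asin(sqrt(a))

# Roughly 2,450 miles between Los Angeles and New York City.
d = haversine_miles((34.05, -118.24), (40.71, -74.01))
```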
FeatureBase.rename(name)
Renames the feature and returns a copy.
FeatureBase.get_depth([stop_at])
Returns the depth of the feature.
calculate_feature_matrix(features[, …])
Calculates a matrix for a given set of instance ids and calculation times.
describe_feature(feature[, …])
Generates an English language description of a feature.
graph_feature(feature[, to_file, description])
Generates a feature lineage graph for the given feature.
encode_features(feature_matrix, features[, …])
Encodes categorical features.
remove_low_information_features(feature_matrix)
Selects features that have at least 2 unique values and that are not all null.
remove_highly_correlated_features(feature_matrix)
Removes columns in a feature matrix that are highly correlated with another column.
remove_highly_null_features(feature_matrix)
Removes columns from a feature matrix that have higher than a set threshold of null values.
remove_single_value_features(feature_matrix)
Removes columns in a feature matrix where all the values are the same.
replace_inf_values(feature_matrix[, …])
Replaces all np.inf values in a feature matrix with the specified replacement value.
save_features(features[, location, profile_name])
Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string.
load_features(features[, profile_name])
Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string.
EntitySet([id, entities, relationships])
Stores all actual data for an entityset.
Entity(id, df, entityset[, variable_types, …])
Represents an entity in an EntitySet, and stores relevant metadata and data.
Relationship(parent_variable, child_variable)
Class to represent a relationship between entities.
EntitySet.entity_from_dataframe(entity_id, …)
Load the data for a specified entity from a Pandas DataFrame.
EntitySet.add_relationship(relationship)
Add a new relationship between entities in the entityset.
EntitySet.normalize_entity(base_entity_id, …)
Create a new entity and relationship from unique values of an existing variable.
EntitySet.add_interesting_values([…])
Find interesting values for categorical variables, to be used to generate “where” clauses.
read_entityset(path[, profile_name])
Read entityset from disk, S3 path, or URL.
EntitySet.to_csv(path[, sep, encoding, …])
Write entityset to disk in the csv format, location specified by path.
EntitySet.to_pickle(path[, compression, …])
Write entityset in the pickle format, location specified by path.
EntitySet.to_parquet(path[, engine, …])
Write entityset to disk in the parquet format, location specified by path.
EntitySet.__getitem__(entity_id)
Get entity instance from entityset.
EntitySet.find_backward_paths(…)
Generator which yields all backward paths between a start and goal entity.
EntitySet.find_forward_paths(…)
Generator which yields all forward paths between a start and goal entity.
EntitySet.get_forward_entities(entity_id[, deep])
Get entities that are in a forward relationship with entity.
EntitySet.get_backward_entities(entity_id[, …])
Get entities that are in a backward relationship with entity.
EntitySet.plot([to_file])
Create a UML diagram-ish graph of the EntitySet.
Entity.convert_variable_type(variable_id, …)
Convert a variable in the dataframe to a different type.
Entity.add_interesting_values([max_values, …])
Find interesting values for categorical variables, to be used to generate “where” clauses.
Relationship.parent_variable
Instance of the variable in the parent entity.
Relationship.child_variable
Instance of the variable in the child entity.
Relationship.parent_entity
Parent entity object.
Relationship.child_entity
Child entity object.
Index(id, entity[, name, description])
Represents variables that uniquely identify an instance of an entity.
Id(id, entity[, name, categories])
Represents variables that identify another entity.
TimeIndex(id, entity[, name, description])
Represents the time index of an entity.
DatetimeTimeIndex(id, entity[, name, format])
Represents a time index of an entity that is a datetime.
NumericTimeIndex(id, entity[, name, range, …])
Represents a time index of an entity that is numeric.
Datetime(id, entity[, name, format])
Represents variables that are points in time.
Numeric(id, entity[, name, range, …])
Represents variables that contain numeric values.
Categorical(id, entity[, name, categories])
Represents variables that can take unordered discrete values.
Ordinal(id, entity[, name])
Represents variables that take on an ordered discrete value.
Boolean(id, entity[, name, true_values, …])
Represents variables that take on one of two values.
NaturalLanguage(id, entity[, name, description])
Represents variables that are arbitrary strings.
LatLong(id, entity[, name, description])
Represents an ordered pair (Latitude, Longitude). To make a latlong in a dataframe: data['latlong'] = data[['latitude', 'longitude']].apply(tuple, axis=1)
ZIPCode(id, entity[, name, categories])
Represents a postal address in the United States.
IPAddress(id, entity[, name, description])
Represents a computer network address.
FullName(id, entity[, name, description])
Represents a person’s full name.
EmailAddress(id, entity[, name, description])
Represents an email box to which email messages are sent.
URL(id, entity[, name, description])
Represents a valid web URL (with or without http/www).
PhoneNumber(id, entity[, name, description])
Represents any valid phone number.
DateOfBirth(id, entity[, name, format])
Represents a date of birth as a datetime.
CountryCode(id, entity[, name, categories])
Represents an ISO-3166 standard country code.
SubRegionCode(id, entity[, name, categories])
Represents an ISO-3166 standard sub-region code.
FilePath(id, entity[, name, description])
Represents a valid filepath, absolute or relative.
find_variable_types()
Retrieves all Variable types as a dictionary, where the key is the variable's type_string.
list_variable_types()
Retrieves all Variable types as a dataframe.
graph_variable_types([to_file])
Create a UML diagram-ish graph of all the Variables.