NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
load_retail([id, nrows, return_single_table])
Returns the retail entityset example.
load_mock_customer([n_customers, …])
Returns dataframes of mock customer data.
load_flight([month_filter, …])
Download, clean, and filter flight data from 2017.
dfs([dataframes, relationships, entityset, …])
Calculates a feature matrix and features given a dictionary of dataframes and a list of relationships.
get_valid_primitives(entityset, …[, …])
Returns two lists of primitives (transform and aggregation) containing primitives that can be applied to the specific target dataframe to create features.
Timedelta(value[, unit, delta_obj])
Represents differences in time.
make_temporal_cutoffs(instance_ids, cutoffs)
Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
A list of all Featuretools primitives can be obtained by visiting primitives.featurelabs.com.
TransformPrimitive()
Feature for a dataframe that is based on one or more other features in that dataframe.
AggregationPrimitive()
make_agg_primitive(function, input_types, …)
Returns a new aggregation primitive class.
make_trans_primitive(function, input_types, …)
Returns a new transform primitive class.
Count()
Determines the total number of values, excluding NaN.
Mean([skipna])
Computes the average for a list of values.
Sum()
Calculates the total sum, ignoring NaN.
Min()
Calculates the smallest value, ignoring NaN values.
Max()
Calculates the highest value, ignoring NaN values.
Std()
Computes the dispersion relative to the mean value, ignoring NaN.
Median()
Determines the middlemost number in a list of values.
Mode()
Determines the most commonly repeated value.
AvgTimeBetween([unit])
Computes the average number of seconds between consecutive events.
TimeSinceLast([unit])
Calculates the time elapsed since the last datetime (default in seconds).
TimeSinceFirst([unit])
Calculates the time elapsed since the first datetime (in seconds).
NumUnique()
Determines the number of distinct values, ignoring NaN values.
PercentTrue()
Determines the percent of True values.
All()
Calculates if all values are ‘True’ in a list.
Any()
Determines if any value is ‘True’ in a list.
First()
Determines the first value in a list.
Last()
Determines the last value in a list.
Skew()
Computes the extent to which a distribution differs from a normal distribution.
Trend()
Calculates the trend of a column over time.
Entropy([dropna, base])
Calculates the entropy for a categorical column.
IsIn([list_of_outputs])
Determines whether a value is present in a provided list.
And()
Element-wise logical AND of two lists.
Or()
Element-wise logical OR of two lists.
Not()
Negates a boolean value.
Absolute()
Computes the absolute value of a number.
Percentile()
Determines the percentile rank for each value in a list.
TimeSince([unit])
Calculates time from a value to a specified cutoff datetime.
Second()
Determines the seconds value of a datetime.
Minute()
Determines the minutes value of a datetime.
Weekday()
Determines the day of the week from a datetime.
IsWeekend()
Determines if a date falls on a weekend.
Hour()
Determines the hour value of a datetime.
Day()
Determines the day of the month from a datetime.
Week()
Determines the week of the year from a datetime.
Month()
Determines the month value of a datetime.
Year()
Determines the year value of a datetime.
Diff()
Computes the difference between a value in a list and the previous value in that list.
TimeSincePrevious([unit])
Computes the time since the previous entry in a list.
CumCount()
Calculates the cumulative count.
CumSum()
Calculates the cumulative sum.
CumMean()
Calculates the cumulative mean.
CumMin()
Calculates the cumulative minimum.
CumMax()
Calculates the cumulative maximum.
NumCharacters()
Calculates the number of characters in a string.
NumWords()
Determines the number of words in a string by counting the spaces.
Latitude()
Returns the first tuple value in a list of LatLong tuples.
Longitude()
Returns the second tuple value in a list of LatLong tuples.
Haversine([unit])
Calculates the approximate haversine distance between two LatLong columns.
FeatureBase.rename(name)
Rename the feature, returning a copy.
FeatureBase.get_depth([stop_at])
Returns the depth of the feature.
calculate_feature_matrix(features[, …])
Calculates a matrix for a given set of instance ids and calculation times.
describe_feature(feature[, …])
Generates an English language description of a feature.
graph_feature(feature[, to_file, description])
Generates a feature lineage graph for the given feature.
encode_features(feature_matrix, features[, …])
Encodes categorical features.
remove_low_information_features(feature_matrix)
Selects features that have at least 2 unique values and that are not all null.
remove_highly_correlated_features(feature_matrix)
Removes columns in feature matrix that are highly correlated with another column.
remove_highly_null_features(feature_matrix)
Removes columns from a feature matrix that have higher than a set threshold of null values.
remove_single_value_features(feature_matrix)
Removes columns in feature matrix where all the values are the same.
replace_inf_values(feature_matrix[, …])
Replaces all np.inf values in a feature matrix with the specified replacement value.
save_features(features[, location, profile_name])
Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string.
load_features(features[, profile_name])
Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string.
EntitySet([id, dataframes, relationships])
Stores all actual data and typing information for an entityset.
Relationship(entityset, …)
Class to represent a relationship between dataframes.
EntitySet.add_dataframe(dataframe[, …])
Add a DataFrame to the EntitySet with Woodwork typing information.
EntitySet.add_interesting_values([…])
Find or set interesting values for categorical columns, to be used to generate “where” clauses.
EntitySet.add_last_time_indexes([…])
Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).
EntitySet.add_relationship([…])
Add a new relationship between dataframes in the entityset.
EntitySet.add_relationships(relationships)
Adds multiple new relationships to an entityset.
EntitySet.concat(other[, inplace])
Combine entityset with another to create a new entityset with the combined data of both entitysets.
EntitySet.normalize_dataframe(…[, …])
Create a new dataframe and relationship from unique values of an existing column.
EntitySet.set_secondary_time_index(…)
Set the secondary time index for a dataframe in the EntitySet using its dataframe name.
EntitySet.replace_dataframe(dataframe_name, df)
Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.
read_entityset(path[, profile_name])
Read entityset from disk, S3 path, or URL.
EntitySet.to_csv(path[, sep, encoding, …])
Write entityset to disk in the csv format, location specified by path.
EntitySet.to_pickle(path[, compression, …])
Write entityset in the pickle format, location specified by path.
EntitySet.to_parquet(path[, engine, …])
Write entityset to disk in the parquet format, location specified by path.
EntitySet.__getitem__(dataframe_name)
Get a dataframe instance from the entityset.
EntitySet.find_backward_paths(…)
Generator which yields all backward paths between a start and goal dataframe.
EntitySet.find_forward_paths(…)
Generator which yields all forward paths between a start and goal dataframe.
EntitySet.get_forward_dataframes(dataframe_name)
Get dataframes that are in a forward relationship with the given dataframe.
EntitySet.get_backward_dataframes(dataframe_name)
Get dataframes that are in a backward relationship with the given dataframe.
EntitySet.query_by_values(dataframe_name, …)
Query instances that have a column with the given value.
EntitySet.plot([to_file])
Create a UML diagram-ish graph of the EntitySet.
Relationship.parent_column
The column in the parent dataframe.
Relationship.child_column
The column in the child dataframe.
Relationship.parent_dataframe
The parent dataframe object.
Relationship.child_dataframe
The child dataframe object.
list_logical_types()
Returns a dataframe describing all of the available Logical Types.
list_semantic_tags()
Returns a dataframe describing all of the common semantic tags.