NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
load_retail([id, nrows, return_single_table])
Returns the retail entityset example.
load_mock_customer([n_customers, …])
Returns dataframes of mock customer data.
load_flight([month_filter, …])
Download, clean, and filter flight data from 2017.
dfs([dataframes, relationships, entityset, …])
Calculates a feature matrix and features given a dictionary of dataframes and a list of relationships.
get_valid_primitives(entityset, …[, …])
Returns two lists of primitives (transform and aggregation) containing primitives that can be applied to the specific target dataframe to create features.
Timedelta(value[, unit, delta_obj])
Represents differences in time.
make_temporal_cutoffs(instance_ids, cutoffs)
Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
A list of all Featuretools primitives can be obtained by visiting primitives.featurelabs.com.
TransformPrimitive()
Feature for a dataframe that is based on one or more other features in that dataframe.
AggregationPrimitive()
make_agg_primitive(function, input_types, …)
Returns a new aggregation primitive class.
make_trans_primitive(function, input_types, …)
Returns a new transform primitive class.
Count()
Determines the total number of values, excluding NaN.
Mean([skipna])
Computes the average for a list of values.
Sum()
Calculates the total sum, ignoring NaN.
Min()
Calculates the smallest value, ignoring NaN values.
Max()
Calculates the highest value, ignoring NaN values.
Std()
Computes the dispersion relative to the mean value, ignoring NaN.
Median()
Determines the middlemost number in a list of values.
Mode()
Determines the most commonly repeated value.
AvgTimeBetween([unit])
Computes the average number of seconds between consecutive events.
TimeSinceLast([unit])
Calculates the time elapsed since the last datetime (default in seconds).
TimeSinceFirst([unit])
Calculates the time elapsed since the first datetime (in seconds).
NumUnique()
Determines the number of distinct values, ignoring NaN values.
PercentTrue()
Determines the percent of True values.
All()
Calculates if all values are ‘True’ in a list.
Any()
Determines if any value is ‘True’ in a list.
First()
Determines the first value in a list.
Last()
Determines the last value in a list.
Skew()
Computes the extent to which a distribution differs from a normal distribution.
Trend()
Calculates the trend of a column over time.
Entropy([dropna, base])
Calculates the entropy for a categorical column.
IsIn([list_of_outputs])
Determines whether a value is present in a provided list.
And()
Element-wise logical AND of two lists.
Or()
Element-wise logical OR of two lists.
Not()
Negates a boolean value.
Absolute()
Computes the absolute value of a number.
Percentile()
Determines the percentile rank for each value in a list.
TimeSince([unit])
Calculates time from a value to a specified cutoff datetime.
Second()
Determines the seconds value of a datetime.
Minute()
Determines the minutes value of a datetime.
Weekday()
Determines the day of the week from a datetime.
IsWeekend()
Determines if a date falls on a weekend.
Hour()
Determines the hour value of a datetime.
Day()
Determines the day of the month from a datetime.
Week()
Determines the week of the year from a datetime.
Month()
Determines the month value of a datetime.
Year()
Determines the year value of a datetime.
Diff()
Computes the difference between a value in a list and the previous value in that list.
TimeSincePrevious([unit])
Computes the time since the previous entry in a list.
CumCount()
Calculates the cumulative count.
CumSum()
Calculates the cumulative sum.
CumMean()
Calculates the cumulative mean.
CumMin()
Calculates the cumulative minimum.
CumMax()
Calculates the cumulative maximum.
NumCharacters()
Calculates the number of characters in a string.
NumWords()
Determines the number of words in a string by counting the spaces.
Latitude()
Returns the first tuple value in a list of LatLong tuples.
Longitude()
Returns the second tuple value in a list of LatLong tuples.
Haversine([unit])
Calculates the approximate haversine distance between two LatLong columns.
FeatureBase.rename(name)
Rename the feature, returning a copy.
FeatureBase.get_depth([stop_at])
Returns the depth of the feature.
calculate_feature_matrix(features[, …])
Calculates a matrix for a given set of instance ids and calculation times.
describe_feature(feature[, …])
Generates an English language description of a feature.
graph_feature(feature[, to_file, description])
Generates a feature lineage graph for the given feature.
encode_features(feature_matrix, features[, …])
Encodes categorical features.
remove_low_information_features(feature_matrix)
Selects features that have at least 2 unique values and that are not all null.
remove_highly_correlated_features(feature_matrix)
Removes columns in feature matrix that are highly correlated with another column.
remove_highly_null_features(feature_matrix)
Removes columns from a feature matrix that have higher than a set threshold of null values.
remove_single_value_features(feature_matrix)
Removes columns in feature matrix where all the values are the same.
replace_inf_values(feature_matrix[, …])
Replaces all np.inf values in a feature matrix with the specified replacement value.
save_features(features[, location, profile_name])
Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string.
load_features(features[, profile_name])
Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string.
EntitySet([id, dataframes, relationships])
Stores all actual data and typing information for an entityset.
Relationship(entityset, …)
Class to represent a relationship between dataframes.
EntitySet.add_dataframe(dataframe[, …])
Add a DataFrame to the EntitySet with Woodwork typing information.
EntitySet.add_interesting_values([…])
Find or set interesting values for categorical columns, to be used to generate “where” clauses.
EntitySet.add_last_time_indexes([…])
Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).
EntitySet.add_relationship([…])
Add a new relationship between dataframes in the entityset.
EntitySet.add_relationships(relationships)
Adds multiple new relationships to an entityset.
EntitySet.concat(other[, inplace])
Combine entityset with another to create a new entityset with the combined data of both entitysets.
EntitySet.normalize_dataframe(…[, …])
Create a new dataframe and relationship from unique values of an existing column.
EntitySet.set_secondary_time_index(…)
Set the secondary time index for a dataframe in the EntitySet using its dataframe name.
EntitySet.replace_dataframe(dataframe_name, df)
Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.
read_entityset(path[, profile_name])
Read entityset from disk, S3 path, or URL.
EntitySet.to_csv(path[, sep, encoding, …])
Write entityset to disk in the csv format, location specified by path.
EntitySet.to_pickle(path[, compression, …])
Write entityset in the pickle format, location specified by path.
EntitySet.to_parquet(path[, engine, …])
Write entityset to disk in the parquet format, location specified by path.
EntitySet.__getitem__(dataframe_name)
Get a dataframe instance from the entityset.
EntitySet.find_backward_paths(…)
Generator which yields all backward paths between a start and goal dataframe.
EntitySet.find_forward_paths(…)
Generator which yields all forward paths between a start and goal dataframe.
EntitySet.get_forward_dataframes(dataframe_name)
Get dataframes that are in a forward relationship with the given dataframe.
EntitySet.get_backward_dataframes(dataframe_name)
Get dataframes that are in a backward relationship with the given dataframe.
EntitySet.query_by_values(dataframe_name, …)
Query instances that have a column with the given value.
EntitySet.plot([to_file])
Create a UML diagram-ish graph of the EntitySet.
Relationship.parent_column
The column in the parent dataframe.
Relationship.child_column
The column in the child dataframe.
Relationship.parent_dataframe
The parent dataframe object.
Relationship.child_dataframe
The child dataframe object.
list_logical_types()
Returns a dataframe describing all of the available Logical Types.
list_semantic_tags()
Returns a dataframe describing all of the common semantic tags.