API Reference¶

Demo Datasets¶

`load_retail`([id, nrows, return_single_table])	Returns the retail entityset example.
`load_mock_customer`([n_customers, …])	Return dataframes of mock customer data
`load_flight`([month_filter, …])	Download, clean, and filter flight data from 2017.

Deep Feature Synthesis¶

dfs([entities, relationships, entityset, …])

Calculates a feature matrix and features given a dictionary of entities and a list of relationships.

Wrappers¶

Scikit-learn (BETA)¶

wrappers.DFSTransformer([entities, …])

Transformer that exposes the Scikit-Learn interface for use in a Pipeline.

Timedelta¶

Timedelta(value[, unit, delta_obj])

Represents differences in time.

Time utils¶

make_temporal_cutoffs(instance_ids, cutoffs)

Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
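
To illustrate what "equally spaced cutoff times prior to an input cutoff" means, here is a minimal standard-library sketch. The helper `spaced_cutoffs` and its parameters `step` and `count` are hypothetical names chosen for this example. It is not the Featuretools implementation and does not mirror the signature of `make_temporal_cutoffs`.

```python
from datetime import datetime, timedelta


def spaced_cutoffs(instance_id, cutoff, step, count):
    """Return `count` (instance_id, time) pairs ending at `cutoff`, spaced `step` apart.

    Hypothetical helper for illustration only.
    """
    # Walk backwards from the cutoff, then reverse so times are ascending.
    times = [cutoff - i * step for i in range(count)]
    return [(instance_id, t) for t in reversed(times)]


pairs = spaced_cutoffs(1, datetime(2017, 1, 10), timedelta(days=1), 3)
for instance_id, time in pairs:
    print(instance_id, time.date())
```

Each instance id ends up with several cutoff times, so a feature matrix computed at those times would contain one row per (instance, time) pair.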

Feature Primitives¶

A list of all Featuretools primitives can be obtained by visiting primitives.featurelabs.com.

Primitive Types¶

`TransformPrimitive`()	Feature for an entity that is based on one or more other features in that entity.
`AggregationPrimitive`()

Primitive Creation Functions¶

`make_agg_primitive`(function, input_types, …)	Returns a new aggregation primitive class.
`make_trans_primitive`(function, input_types, …)	Returns a new transform primitive class

Aggregation Primitives¶

`Count`()	Determines the total number of values, excluding NaN.
`Mean`([skipna])	Computes the average for a list of values.
`Sum`()	Calculates the total addition, ignoring NaN.
`Min`()	Calculates the smallest value, ignoring NaN values.
`Max`()	Calculates the highest value, ignoring NaN values.
`Std`()	Computes the dispersion relative to the mean value, ignoring NaN.
`Median`()	Determines the middlemost number in a list of values.
`Mode`()	Determines the most commonly repeated value.
`AvgTimeBetween`([unit])	Computes the average number of seconds between consecutive events.
`TimeSinceLast`([unit])	Calculates the time elapsed since the last datetime (default in seconds).
`TimeSinceFirst`([unit])	Calculates the time elapsed since the first datetime (in seconds).
`NumUnique`()	Determines the number of distinct values, ignoring NaN values.
`PercentTrue`()	Determines the percent of True values.
`All`()	Determines whether all values in a list are ‘True’.
`Any`()	Determines if any value is ‘True’ in a list.
`First`()	Determines the first value in a list.
`Last`()	Determines the last value in a list.
`Skew`()	Computes the extent to which a distribution differs from a normal distribution.
`Trend`()	Calculates the trend of a variable over time.
`Entropy`([dropna, base])	Calculates the entropy for a categorical variable
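
The descriptions above can be made concrete with a small standard-library sketch of four of these aggregations. The function names are hypothetical and this is not how Featuretools implements them. In particular, returning a fraction between 0 and 1 for `percent_true` and using the natural logarithm for `entropy` are assumptions of this example.

```python
import math
from collections import Counter
from datetime import datetime


def is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def count(values):
    """Number of values, excluding NaN."""
    return sum(1 for v in values if not is_nan(v))


def num_unique(values):
    """Number of distinct values, ignoring NaN."""
    return len({v for v in values if not is_nan(v)})


def percent_true(values):
    """Share of values that are True, as a fraction (assumption)."""
    return sum(1 for v in values if v is True) / len(values)


def entropy(values):
    """Shannon entropy of the value frequencies, natural log (assumption)."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def avg_time_between(timestamps):
    """Mean gap in seconds between consecutive timestamps."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


amounts = [1.0, float("nan"), 2.0, 2.0]
events = [
    datetime(2017, 1, 1, 0, 0, 0),
    datetime(2017, 1, 1, 0, 0, 10),
    datetime(2017, 1, 1, 0, 0, 40),
]
print(count(amounts), num_unique(amounts), avg_time_between(events))
```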

Transform Primitives¶

Combine features¶

`IsIn`([list_of_outputs])	Determines whether a value is present in a provided list.
`And`()	Element-wise logical AND of two lists.
`Or`()	Element-wise logical OR of two lists.
`Not`()	Negates a boolean value.
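
As a rough illustration of the element-wise behaviour described above, the following sketch applies the four operations to plain Python lists. The function names are hypothetical and the code is not taken from Featuretools.

```python
def is_in(values, list_of_outputs):
    """True where the value appears in the provided list."""
    return [v in list_of_outputs for v in values]


def logical_and(left, right):
    """Element-wise AND of two equally long boolean lists."""
    return [bool(a and b) for a, b in zip(left, right)]


def logical_or(left, right):
    """Element-wise OR of two equally long boolean lists."""
    return [bool(a or b) for a, b in zip(left, right)]


def logical_not(values):
    """Negate each boolean value."""
    return [not v for v in values]


in_north = is_in(["NY", "CA", "TX"], ["NY", "MA"])
is_large = [True, True, False]
print(logical_and(in_north, is_large))
```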

General Transform Primitives¶

`Absolute`()	Computes the absolute value of a number.
`Percentile`()	Determines the percentile rank for each value in a list.
`TimeSince`([unit])	Calculates time from a value to a specified cutoff datetime.
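
A percentile rank and a time-since calculation can be sketched in plain Python as follows. The function names are hypothetical. Giving tied values the average of their ranks and reporting elapsed time in seconds are assumptions of this example, not confirmed behaviour of `Percentile` or `TimeSince`.

```python
from datetime import datetime


def percentile_rank(values):
    """Rank of each value as a fraction of the list length.

    Tied values share the average of the positions they would occupy (assumption).
    """
    n = len(values)
    ranks = []
    for v in values:
        less = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        ranks.append((less + (equal + 1) / 2) / n)
    return ranks


def time_since(values, cutoff):
    """Seconds elapsed from each datetime to the cutoff (assumption: seconds)."""
    return [(cutoff - v).total_seconds() for v in values]


ranks = percentile_rank([10, 20, 20, 40])
elapsed = time_since([datetime(2017, 1, 1, 12, 0)], datetime(2017, 1, 1, 12, 5))
print(ranks, elapsed)
```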

Datetime Transform Primitives¶

`Second`()	Determines the seconds value of a datetime.
`Minute`()	Determines the minutes value of a datetime.
`Weekday`()	Determines the day of the week from a datetime.
`IsWeekend`()	Determines if a date falls on a weekend.
`Hour`()	Determines the hour value of a datetime.
`Day`()	Determines the day of the month from a datetime.
`Week`()	Determines the week of the year from a datetime.
`Month`()	Determines the month value of a datetime.
`Year`()	Determines the year value of a datetime.
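
The components listed above correspond closely to attributes of Python's own `datetime` type. The sketch below extracts them for a single timestamp. It assumes the Monday = 0 weekday convention and ISO week numbering, which may differ from what the primitives return.

```python
from datetime import datetime

ts = datetime(2017, 7, 1, 14, 30, 15)  # a Saturday

parts = {
    "second": ts.second,
    "minute": ts.minute,
    "hour": ts.hour,
    "day": ts.day,
    "month": ts.month,
    "year": ts.year,
    "weekday": ts.weekday(),          # Monday = 0 ... Sunday = 6 (assumption)
    "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
    "week": ts.isocalendar()[1],      # ISO week of the year (assumption)
}
print(parts)
```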

Cumulative Transform Primitives¶

`Diff`()	Compute the difference between the value in a list and the previous value in that list.
`TimeSincePrevious`([unit])	Compute the time since the previous entry in a list.
`CumCount`()	Calculates the cumulative count.
`CumSum`()	Calculates the cumulative sum.
`CumMean`()	Calculates the cumulative mean.
`CumMin`()	Calculates the cumulative minimum.
`CumMax`()	Calculates the cumulative maximum.
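
The cumulative operations above can be sketched with `itertools.accumulate`. This example treats the whole list as a single group and uses `None` for the first `Diff` value, which has no predecessor. Both choices are assumptions of the sketch rather than documented behaviour.

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

# Difference between each value and the previous one.
diff = [None] + [b - a for a, b in zip(values, values[1:])]

cum_count = list(range(1, len(values) + 1))
cum_sum = list(accumulate(values))
cum_min = list(accumulate(values, min))
cum_max = list(accumulate(values, max))
cum_mean = [s / n for s, n in zip(cum_sum, cum_count)]

print(diff, cum_sum, cum_max)
```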

Text Transform Primitives¶

`NumCharacters`()	Calculates the number of characters in a string.
`NumWords`()	Determines the number of words in a string by counting the spaces.
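
Counting words "by counting the spaces" can be read as the number of space characters plus one. The sketch below follows that reading with hypothetical function names. Note that under this reading, consecutive spaces inflate the count.

```python
def num_characters(text):
    """Number of characters in the string."""
    return len(text)


def num_words(text):
    """Number of words, taken as the number of spaces plus one (assumption)."""
    return text.count(" ") + 1


sentence = "deep feature synthesis"
print(num_characters(sentence), num_words(sentence))
```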

Location Transform Primitives¶

`Latitude`()	Returns the first tuple value in a list of LatLong tuples.
`Longitude`()	Returns the second tuple value in a list of LatLong tuples.
`Haversine`([unit])	Calculates the approximate haversine distance between two LatLong variable types.
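
For reference, the haversine formula itself can be written in a few lines of standard-library Python. The function name, the (latitude, longitude) argument order in degrees, and the Earth radius of 6371 km are assumptions of this sketch. It does not reproduce the `unit` handling of the `Haversine` primitive.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(latlong1, latlong2):
    """Great-circle distance in kilometres between two (lat, long) pairs in degrees."""
    lat1, lon1 = map(math.radians, latlong1)
    lat2, lon2 = map(math.radians, latlong2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1 to guard against rounding slightly above the valid asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


half_equator = haversine_km((0.0, 0.0), (0.0, 180.0))
print(round(half_equator, 1))
```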

Natural Language Processing Primitives¶

Natural Language Processing primitives create features for textual data. For more information on how to use and install these primitives, see here.

`DiversityScore`()	Calculates the overall complexity of the text based on the total
`LSA`()	Calculates the Latent Semantic Analysis Values of Text Input
`MeanCharactersPerWord`()	Determines the mean number of characters per word.
`PartOfSpeechCount`()	Calculates the occurrences of each different part of speech.
`PolarityScore`()	Calculates the polarity of a text on a scale from -1 (negative) to 1 (positive)
`PunctuationCount`()	Determines number of punctuation characters in a string.
`StopwordCount`()	Determines number of stopwords in a string.
`TitleWordCount`()	Determines the number of title words in a string.
`UniversalSentenceEncoder`()	Transforms a sentence or short paragraph to a vector using [tfhub model](https://tfhub.dev/google/universal-sentence-encoder/2)
`UpperCaseCount`()	Calculates the number of upper case letters in text.

Feature methods¶

`FeatureBase.rename`(name)	Rename Feature, returns copy
`FeatureBase.get_depth`([stop_at])	Returns depth of feature

Feature calculation¶

calculate_feature_matrix(features[, …])

Calculates a matrix for a given set of instance ids and calculation times.

Feature visualization¶

graph_feature(feature[, to_file])

Generates a feature lineage graph for the given feature

Feature encoding¶

encode_features(feature_matrix, features[, …])

Encode categorical features

Saving and Loading Features¶

`save_features`(features[, location, profile_name])	Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string.
`load_features`(features[, profile_name])	Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string.

EntitySet, Entity, Relationship, Variable Types¶

Constructors¶

`EntitySet`([id, entities, relationships])	Stores all actual data for an entityset
`Entity`(id, df, entityset[, variable_types, …])	Represents an entity in an EntitySet, and stores relevant metadata and data
`Relationship`(parent_variable, child_variable)	Class to represent a relationship between entities

EntitySet load and prepare data¶

`EntitySet.entity_from_dataframe`(entity_id, …)	Load the data for a specified entity from a Pandas DataFrame.
`EntitySet.add_relationship`(relationship)	Add a new relationship between entities in the entityset
`EntitySet.normalize_entity`(base_entity_id, …)	Create a new entity and relationship from unique values of an existing variable.
`EntitySet.add_interesting_values`([…])	Find interesting values for categorical variables, to be used to generate “where” clauses

EntitySet serialization¶

read_entityset(path[, profile_name])

Read entityset from disk, S3 path, or URL.

`EntitySet.to_csv`(path[, sep, encoding, …])	Write entityset to disk in the csv format, location specified by path.
`EntitySet.to_pickle`(path[, compression, …])	Write entityset in the pickle format, location specified by path.
`EntitySet.to_parquet`(path[, engine, …])	Write entityset to disk in the parquet format, location specified by path.

EntitySet query methods¶

`EntitySet.__getitem__`(entity_id)	Get entity instance from entityset
`EntitySet.find_backward_paths`(…)	Generator which yields all backward paths between a start and goal entity.
`EntitySet.find_forward_paths`(…)	Generator which yields all forward paths between a start and goal entity.
`EntitySet.get_forward_entities`(entity_id[, deep])	Get entities that are in a forward relationship with entity
`EntitySet.get_backward_entities`(entity_id[, …])	Get entities that are in a backward relationship with entity

EntitySet visualization¶

EntitySet.plot([to_file])

Create a UML diagram-ish graph of the EntitySet.

Entity methods¶

`Entity.convert_variable_type`(variable_id, …)	Convert variable in dataframe to different type
`Entity.add_interesting_values`([max_values, …])	Find interesting values for categorical variables, to be used to generate “where” clauses

Relationship attributes¶

`Relationship.parent_variable`	Instance of variable in parent entity
`Relationship.child_variable`	Instance of variable in child entity
`Relationship.parent_entity`	Parent entity object
`Relationship.child_entity`	Child entity object

Variable types¶

`Index`(id, entity[, name])	Represents variables that uniquely identify an instance of an entity
`Id`(id, entity[, name, categories])	Represents variables that identify another entity
`TimeIndex`(id, entity[, name])	Represents time index of entity
`DatetimeTimeIndex`(id, entity[, name, format])	Represents time index of entity that is a datetime
`NumericTimeIndex`(id, entity[, name, range, …])	Represents time index of entity that is numeric
`Datetime`(id, entity[, name, format])	Represents variables that are points in time
`Numeric`(id, entity[, name, range, …])	Represents variables that contain numeric values
`Categorical`(id, entity[, name, categories])	Represents variables that can take unordered discrete values
`Ordinal`(id, entity[, name])	Represents variables that take on an ordered discrete value
`Boolean`(id, entity[, name, true_values, …])	Represents variables that take on one of two values
`Text`(id, entity[, name])	Represents variables that are arbitrary strings
`LatLong`(id, entity[, name])	Represents an ordered pair (Latitude, Longitude). To make a latlong in a dataframe, use data['latlong'] = data[['latitude', 'longitude']].apply(tuple, axis=1)
`ZIPCode`(id, entity[, name, categories])	Represents a postal address in the United States.
`IPAddress`(id, entity[, name])	Represents a computer network address.
`FullName`(id, entity[, name])	Represents a person’s full name.
`EmailAddress`(id, entity[, name])	Represents an email box to which email messages are sent.
`URL`(id, entity[, name])	Represents a valid web url (with or without http/www)
`PhoneNumber`(id, entity[, name])	Represents any valid phone number.
`DateOfBirth`(id, entity[, name, format])	Represents a date of birth as a datetime
`CountryCode`(id, entity[, name, categories])	Represents an ISO-3166 standard country code.
`SubRegionCode`(id, entity[, name, categories])	Represents an ISO-3166 standard sub-region code.
`FilePath`(id, entity[, name])	Represents a valid filepath, absolute or relative

Variable Utils Methods¶

`find_variable_types`()	Retrieves all Variable types as a dictionary where key is type_string
`list_variable_types`()	Retrieves all Variable types as a dataframe, with the column headers
`graph_variable_types`([to_file])	Create a UML diagram-ish graph of all the Variables.

Feature Selection¶

remove_low_information_features(feature_matrix)

Select features that have at least 2 unique values and that are not all null
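
The selection rule above can be sketched without any dependencies by treating a feature matrix as a dictionary that maps column names to lists of values. The helper name `keep_informative` is hypothetical, and counting all nulls together as one distinct value is an assumption of this example. The real function operates on a feature matrix and may differ in detail.

```python
import math


def is_null(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_informative(columns):
    """Keep columns with at least 2 unique values that are not all null."""
    kept = {}
    for name, values in columns.items():
        non_null = [v for v in values if not is_null(v)]
        distinct = set(non_null)
        if len(non_null) < len(values):
            distinct.add(None)  # all nulls count as one distinct value (assumption)
        if len(distinct) >= 2 and non_null:
            kept[name] = values
    return kept


matrix = {
    "constant": [1, 1, 1],
    "varied": [1, 2, 1],
    "all_null": [None, None, None],
    "partly_null": [1, None, 1],
}
print(list(keep_informative(matrix)))
```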