API Reference#

Demo Datasets#

load_retail([id, nrows, return_single_table])

Returns the retail entityset example.

load_mock_customer([n_customers, ...])

Returns dataframes of mock customer data.

load_flight([month_filter, ...])

Download, clean, and filter flight data from 2017.

load_weather([nrows, return_single_table])

Load the Australian daily-min-temperatures weather dataset.

Deep Feature Synthesis#

dfs([dataframes, relationships, entityset, ...])

Calculates a feature matrix and features given a dictionary of dataframes and a list of relationships.

get_valid_primitives(entityset, ...[, ...])

Returns two lists of primitives (aggregation and transform) that can be applied to the specified target dataframe to create features.

Timedelta#

Timedelta(value[, unit, delta_obj])

Represents differences in time.

Time utils#

make_temporal_cutoffs(instance_ids, cutoffs)

Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.

Feature Primitives#

A list of all Featuretools primitives can be obtained by visiting primitives.featurelabs.com.

Primitive Types#

TransformPrimitive()

Feature for a dataframe that is based off of one or more other features in that dataframe.

AggregationPrimitive()

Feature for a parent dataframe that summarizes related instances in a child dataframe.

Aggregation Primitives#

Count()

Determines the total number of values, excluding NaN.

Mean([skipna])

Computes the average for a list of values.

Sum()

Calculates the sum of a list of values, ignoring NaN.

Min()

Calculates the smallest value, ignoring NaN values.

Max()

Calculates the highest value, ignoring NaN values.

Std()

Computes the dispersion relative to the mean value, ignoring NaN.

Median()

Determines the middlemost number in a list of values.

Mode()

Determines the most commonly repeated value.

AvgTimeBetween([unit])

Computes the average number of seconds between consecutive events.

TimeSinceLast([unit])

Calculates the time elapsed since the last datetime (default in seconds).

TimeSinceFirst([unit])

Calculates the time elapsed since the first datetime (in seconds).

NumUnique()

Determines the number of distinct values, ignoring NaN values.

PercentTrue()

Determines the percent of True values.

All()

Calculates if all values are 'True' in a list.

Any()

Determines if any value is 'True' in a list.

First()

Determines the first value in a list.

Last()

Determines the last value in a list.

Skew()

Computes the extent to which a distribution differs from a normal distribution.

Trend()

Calculates the trend of a column over time.

Entropy([dropna, base])

Calculates the entropy for a categorical column.

Transform Primitives#

Combine features#

IsIn([list_of_outputs])

Determines whether a value is present in a provided list.

And()

Performs element-wise logical AND of two lists.

Or()

Performs element-wise logical OR of two lists.

Not()

Negates a boolean value.

General Transform Primitives#

Absolute()

Computes the absolute value of a number.

SquareRoot()

Computes the square root of a number.

NaturalLogarithm()

Computes the natural logarithm of a number.

Sine()

Computes the sine of a number.

Cosine()

Computes the cosine of a number.

Tangent()

Computes the tangent of a number.

Percentile()

Determines the percentile rank for each value in a list.

TimeSince([unit])

Calculates time from a value to a specified cutoff datetime.

Datetime Transform Primitives#

Second()

Determines the seconds value of a datetime.

Minute()

Determines the minutes value of a datetime.

Weekday()

Determines the day of the week from a datetime.

IsLeapYear()

Determines the is_leap_year attribute of a datetime column.

IsLunchTime([lunch_hour])

Determines if a datetime falls during a configurable lunch hour, on a 24-hour clock.

IsMonthEnd()

Determines the is_month_end attribute of a datetime column.

IsMonthStart()

Determines the is_month_start attribute of a datetime column.

IsQuarterEnd()

Determines the is_quarter_end attribute of a datetime column.

IsQuarterStart()

Determines the is_quarter_start attribute of a datetime column.

IsWeekend()

Determines if a date falls on a weekend.

IsWorkingHours([start_hour, end_hour])

Determines if a datetime falls during working hours on a 24-hour clock.

IsYearEnd()

Determines if a date falls on the end of a year.

IsYearStart()

Determines if a date falls on the start of a year.

Hour()

Determines the hour value of a datetime.

Day()

Determines the day of the month from a datetime.

DayOfYear()

Determines the ordinal day of the year from the given datetime.

DaysInMonth()

Determines the number of days in the month of a datetime.

Week()

Determines the week of the year from a datetime.

Month()

Determines the month value of a datetime.

PartOfDay()

Determines the part of day of a datetime.

Quarter()

Determines the quarter a datetime column falls into (1, 2, 3, 4).

Year()

Determines the year value of a datetime.

Rolling Transform Primitives#

RollingCount([window_length, gap, min_periods])

Determines a rolling count of events over a given window.

RollingMax([window_length, gap, min_periods])

Determines the maximum of entries over a given window.

RollingMean([window_length, gap, min_periods])

Calculates the mean of entries over a given window.

RollingMin([window_length, gap, min_periods])

Determines the minimum of entries over a given window.

RollingSTD([window_length, gap, min_periods])

Calculates the standard deviation of entries over a given window.

RollingTrend([window_length, gap, min_periods])

Calculates the trend of a given window of entries of a column over time.
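
Conceptually, RollingMean with window_length=3 and gap=1 behaves roughly like the pandas sketch below; this is an approximation for intuition, not the library's implementation:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# gap=1 shifts the window so the current row is excluded;
# window_length=3 averages over the three preceding rows.
rolling = s.shift(1).rolling(window=3, min_periods=1).mean()
```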

NaturalLanguage Transform Primitives#

NumCharacters()

Calculates the number of characters in a string.

NumWords()

Determines the number of words in a string by counting the spaces.

Location Transform Primitives#

CityblockDistance([unit])

Calculates the distance between points in a city road grid.

GeoMidpoint()

Determines the geographic center of two coordinates.

Haversine([unit])

Calculates the approximate haversine distance between two LatLong columns.

IsInGeoBox([point1, point2])

Determines if coordinates are inside a box defined by two corner coordinate points.

Latitude()

Returns the first tuple value in a list of LatLong tuples.

Longitude()

Returns the second tuple value in a list of LatLong tuples.

Cumulative Transform Primitives#

Diff([periods])

Computes the difference between the value in a list and the previous value in that list.

DiffDatetime([periods])

Computes the timedelta between a datetime in a list and the previous datetime in that list.

TimeSincePrevious([unit])

Computes the time since the previous entry in a list.

CumCount()

Calculates the cumulative count.

CumSum()

Calculates the cumulative sum.

CumMean()

Calculates the cumulative mean.

CumMin()

Calculates the cumulative minimum.

CumMax()

Calculates the cumulative maximum.

Natural Language Processing Primitives#

Natural Language Processing primitives create features for textual data. For more information on how to use and install these primitives, see here.

Primitives in standard install#

CountString([string, ignore_case, ...])

Determines how many times a given string shows up in a text field.

DiversityScore()

Calculates the overall complexity of the text based on the total number of words used in the text.

LSA([random_seed, corpus, algorithm])

Calculates the Latent Semantic Analysis values of NaturalLanguage input.

MeanCharactersPerWord()

Determines the mean number of characters per word.

MedianWordLength([delimiters_regex])

Determines the median word length.

NumUniqueSeparators([separators])

Calculates the number of unique separators.

NumberOfCommonWords([word_set, delimiters_regex])

Determines the number of common words in a string.

PartOfSpeechCount()

Calculates the occurrences of each different part of speech.

PolarityScore()

Calculates the polarity of a text on a scale from -1 (negative) to 1 (positive).

PunctuationCount()

Determines number of punctuation characters in a string.

StopwordCount()

Determines number of stopwords in a string.

TitleWordCount()

Determines the number of title words in a string.

TotalWordLength([delimiters_regex])

Determines the total word length.

UpperCaseCount()

Calculates the number of upper case letters in text.

WhitespaceCount()

Calculates number of whitespaces in a string.

Primitives that require installing tensorflow#

Elmo()

Transforms a sentence or short paragraph using deep contextualized language representations.

UniversalSentenceEncoder()

Transforms a sentence or short paragraph to a vector using the TF Hub model at https://tfhub.dev/google/universal-sentence-encoder/2.

Feature methods#

FeatureBase.rename(name)

Renames the feature and returns a copy.

FeatureBase.get_depth([stop_at])

Returns the depth of the feature.

Feature calculation#

calculate_feature_matrix(features[, ...])

Calculates a matrix for a given set of instance ids and calculation times.

Feature descriptions#

describe_feature(feature[, ...])

Generates an English language description of a feature.

Feature visualization#

graph_feature(feature[, to_file, description])

Generates a feature lineage graph for the given feature.

Feature encoding#

encode_features(feature_matrix, features[, ...])

Encodes categorical features.

Feature Selection#

remove_low_information_features(feature_matrix)

Selects features that have at least 2 unique values and that are not all null.

remove_highly_correlated_features(feature_matrix)

Removes columns in feature matrix that are highly correlated with another column.

remove_highly_null_features(feature_matrix)

Removes columns from a feature matrix that have higher than a set threshold of null values.

remove_single_value_features(feature_matrix)

Removes columns in feature matrix where all the values are the same.

Feature Matrix utils#

replace_inf_values(feature_matrix[, ...])

Replace all np.inf values in a feature matrix with the specified replacement value.

Saving and Loading Features#

save_features(features[, location, profile_name])

Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string.

load_features(features[, profile_name])

Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string.

EntitySet, Relationship#

Constructors#

EntitySet([id, dataframes, relationships])

Stores all actual data and typing information for an entityset.

Relationship(entityset, ...)

Class to represent a relationship between dataframes.

EntitySet load and prepare data#

EntitySet.add_dataframe(dataframe[, ...])

Add a DataFrame to the EntitySet with Woodwork typing information.

EntitySet.add_interesting_values([...])

Find or set interesting values for categorical columns, to be used to generate "where" clauses.

EntitySet.add_last_time_indexes([...])

Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).

EntitySet.add_relationship([...])

Add a new relationship between dataframes in the entityset.

EntitySet.add_relationships(relationships)

Add multiple new relationships to an entityset.

EntitySet.concat(other[, inplace])

Combine entityset with another to create a new entityset with the combined data of both entitysets.

EntitySet.normalize_dataframe(...[, ...])

Create a new dataframe and relationship from unique values of an existing column.

EntitySet.set_secondary_time_index(...)

Set the secondary time index for a dataframe in the EntitySet using its dataframe name.

EntitySet.replace_dataframe(dataframe_name, df)

Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.

EntitySet serialization#

read_entityset(path[, profile_name])

Read entityset from disk, S3 path, or URL.

EntitySet.to_csv(path[, sep, encoding, ...])

Write entityset to disk in the csv format, location specified by path.

EntitySet.to_pickle(path[, compression, ...])

Write entityset in the pickle format, location specified by path.

EntitySet.to_parquet(path[, engine, ...])

Write entityset to disk in the parquet format, location specified by path.

EntitySet query methods#

EntitySet.__getitem__(dataframe_name)

Get a dataframe instance from the entityset.

EntitySet.find_backward_paths(...)

Generator which yields all backward paths between a start and goal dataframe.

EntitySet.find_forward_paths(...)

Generator which yields all forward paths between a start and goal dataframe.

EntitySet.get_forward_dataframes(dataframe_name)

Get dataframes that are in a forward relationship with the given dataframe.

EntitySet.get_backward_dataframes(dataframe_name)

Get dataframes that are in a backward relationship with the given dataframe.

EntitySet.query_by_values(dataframe_name, ...)

Query instances that have a column with a given value.

EntitySet visualization#

EntitySet.plot([to_file])

Create a UML diagram-ish graph of the EntitySet.

Relationship attributes#

Relationship.parent_column

Column in the parent dataframe.

Relationship.child_column

Column in the child dataframe.

Relationship.parent_dataframe

Parent dataframe object.

Relationship.child_dataframe

Child dataframe object.

Data Type Util Methods#

list_logical_types()

Returns a dataframe describing all of the available Logical Types.

list_semantic_tags()

Returns a dataframe describing all of the common semantic tags.

Primitive Util Methods#

list_primitives()

Returns a DataFrame that lists and describes each built-in primitive.

summarize_primitives()

Returns a metrics summary DataFrame of all primitives found in list_primitives.