API Reference
Demo Datasets
Returns the retail entityset example.
Return dataframes of mock customer data
Download, clean, and filter flight data from 2017.
Load the Australian daily-min-temperatures weather dataset.
Deep Feature Synthesis
Calculates a feature matrix and features given a dictionary of dataframes and a list of relationships.
Returns two lists of primitives (transform and aggregation) containing primitives that can be applied to the specific target dataframe to create features.
Wrappers
scikit-learn (BETA)
Transformer that exposes the scikit-learn interface for use in a Pipeline.
Timedelta
Represents differences in time.
Time utils
Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
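The idea of equally spaced cutoff times can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name `spaced_cutoffs` and its parameters are hypothetical, and it assumes the sequence ends at the input cutoff itself.

```python
from datetime import datetime, timedelta


def spaced_cutoffs(cutoff, window, num_windows):
    """Return num_windows equally spaced times, the last one being cutoff.

    Hypothetical helper for illustration only.
    """
    return [cutoff - window * i for i in reversed(range(num_windows))]


# Three cutoffs, two days apart, ending on 2017-01-10.
cutoffs = spaced_cutoffs(datetime(2017, 1, 10), timedelta(days=2), 3)
for moment in cutoffs:
    print(moment.date())
```

Applying the same helper to each instance id and its cutoff would give one block of cutoff times per instance.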
Feature Primitives
Primitive Types
Feature for a dataframe that is based on one or more other features in that dataframe.
Aggregation Primitives
Calculates if all values are 'True' in a list.
Determines if any value is 'True' in a list.
Computes the average number of seconds between consecutive events.
Determines the total number of values, excluding NaN.
Calculates the number of values that are above the mean.
Determines the number of values that are below the mean.
Determines the number of values greater than a controllable threshold.
Determines the count of observations that lie inside
Determines the number of values that fall within a certain range.
Determines the number of values less than a controllable threshold.
Determines the number of observations that lie outside
Determines the number of values that fall outside a certain range.
Calculates the entropy for a categorical column
Determines the first value in a list.
Determines the last value in a list.
Calculates the highest value, ignoring NaN values.
Determines the maximum number of consecutive False values in the input
Determines the maximum number of consecutive negative values in the input
Determines the maximum number of consecutive positive values in the input
Determines the maximum number of consecutive True values in the input
Determines the maximum number of consecutive zero values in the input
Computes the average for a list of values.
Determines the middlemost number in a list of values.
Calculates the smallest value, ignoring NaN values.
Determines the most commonly repeated value.
Determines the n most common elements.
Determines the length of the longest subsequence above the mean.
Determines the length of the longest subsequence below the mean.
Counts the number of True values.
Determines the number of distinct values, ignoring NaN values.
Determines the percent of True values.
Computes the extent to which a distribution differs from a normal distribution.
Computes the dispersion relative to the mean value, ignoring NaN.
Calculates the total addition, ignoring NaN.
Calculates the time elapsed since the first datetime (in seconds).
Calculates the time elapsed since the last datetime (default in seconds).
Calculates the time since the last False value.
Calculates the time since the maximum value occurred.
Calculates the time since the minimum value occurred.
Calculates the time since the last True value.
Calculates the trend of a column over time.
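A few of the aggregations described above can be sketched in plain Python to show what each one reduces a list to. The function names are hypothetical and this is not the library's code; in particular, using the natural logarithm for entropy is an assumption.

```python
import math
from collections import Counter


def count_above_mean(values):
    """Number of values strictly greater than the mean of the list."""
    mean = sum(values) / len(values)
    return sum(1 for v in values if v > mean)


def max_consecutive_true(flags):
    """Length of the longest unbroken run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def entropy(categories):
    """Shannon entropy of a categorical list (natural log, an assumption)."""
    total = len(categories)
    counts = Counter(categories)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


print(count_above_mean([1, 2, 3, 4, 10]))
print(max_consecutive_true([True, True, False, True, True, True]))
print(entropy(["a", "a", "b", "b"]))
```

Each function takes the child rows of one parent instance and returns a single value, which is the defining property of an aggregation.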
Transform Primitives
Binary Transform Primitives
Performs element-wise addition of two lists.
Adds a scalar to each value in the list.
Divides a scalar by each value in the list.
Divides each element in the list by a scalar.
Determines if values in one list are equal to another list.
Determines if values in a list are equal to a given scalar.
Determines if values in one list are greater than another list.
Determines if values in one list are greater than or equal to another list.
Determines if values are greater than or equal to a given scalar.
Determines if values are greater than a given scalar.
Determines if values in one list are less than another list.
Determines if values in one list are less than or equal to another list.
Determines if values are less than or equal to a given scalar.
Determines if values are less than a given scalar.
Computes the modulo of a scalar by each element in a list.
Performs element-wise modulo of two lists.
Computes the modulo of each element in the list by a given scalar.
Performs element-wise multiplication of two lists of boolean values.
Performs element-wise multiplication of a numeric list with a boolean list.
Multiplies each element in the list by a scalar.
Determines if values in one list are not equal to another list.
Determines if values in a list are not equal to a given scalar.
Subtracts each value in the list from a given scalar.
Performs element-wise subtraction of two lists.
Subtracts a scalar from each element in the list.
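Several of the scalar primitives above differ only in operand order, which is easy to misread. The plain-Python sketch below (illustrative values, not the library's code) shows the two directions side by side for division and subtraction.

```python
values = [2, 4, 5]
scalar = 10

# "Divides a scalar by each value in the list."
scalar_divided_by_each = [scalar / v for v in values]  # [5.0, 2.5, 2.0]

# "Divides each element in the list by a scalar."
each_divided_by_scalar = [v / scalar for v in values]  # [0.2, 0.4, 0.5]

# "Subtracts each value in the list from a given scalar."
scalar_minus_each = [scalar - v for v in values]  # [8, 6, 5]

# "Subtracts a scalar from each element in the list."
each_minus_scalar = [v - scalar for v in values]  # [-8, -6, -5]

print(scalar_divided_by_each, each_divided_by_scalar)
print(scalar_minus_each, each_minus_scalar)
```

The same distinction applies to the two scalar modulo variants.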
Combine features
Determines whether a value is present in a provided list.
Performs element-wise logical AND of two lists.
Performs element-wise logical OR of two lists.
Negates a boolean value.
Cumulative Transform Primitives
Computes the difference between the value in a list and the previous value in that list.
Computes the timedelta between a datetime in a list and the previous datetime in that list.
Computes the time since the previous entry in a list.
Calculates the cumulative count.
Calculates the cumulative sum.
Calculates the cumulative mean.
Calculates the cumulative minimum.
Calculates the cumulative maximum.
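Unlike aggregations, cumulative transforms return one output per input row. A minimal sketch using only the standard library (not the library's implementation; representing the missing first difference as `None` is an assumption):

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

cum_sum = list(accumulate(values))       # running total
cum_max = list(accumulate(values, max))  # running maximum
cum_min = list(accumulate(values, min))  # running minimum
cum_mean = [total / n for n, total in enumerate(cum_sum, start=1)]

# Difference from the previous value; the first row has no predecessor.
diff = [None] + [b - a for a, b in zip(values, values[1:])]

print(cum_sum, cum_max, cum_min)
print(cum_mean)
print(diff)
```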
Datetime Transform Primitives
Calculates the age in years as a floating point number given a
Transforms time of an instance into the holiday name, if there is one.
Determines the timezone of a datetime.
Determines the day of the month from a datetime.
Determines the ordinal day of the year from the given datetime
Determines the day of the month from a datetime.
Computes the number of days before or after a given holiday.
Determines the hour value of a datetime.
Determines if a given datetime is a federal holiday.
Determines the is_leap_year attribute of a datetime column.
Determines if a datetime falls during configurable lunch hour, on a 24-hour clock.
Determines the is_month_end attribute of a datetime column.
Determines the is_month_start attribute of a datetime column.
Determines the is_quarter_end attribute of a datetime column.
Determines the is_quarter_start attribute of a datetime column.
Determines if a date falls on a weekend.
Determines if a datetime falls during working hours on a 24-hour clock.
Determines if a date falls on the end of a year.
Determines if a date falls on the start of a year.
Determines the minutes value of a datetime.
Determines the month value of a datetime.
Determines the part of day of a datetime.
Determines the quarter a datetime column falls into (1, 2, 3, 4)
Determines the season of a given datetime.
Determines the seconds value of a datetime.
Determines the week of the year from a datetime.
Determines the day of the week from a datetime.
Determines the year value of a datetime.
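Most of the datetime primitives above extract one attribute from a timestamp. The sketch below derives a handful of them with the standard library for a single value; it only illustrates the quantities involved and is not how the library computes them over a column.

```python
import calendar
from datetime import datetime

ts = datetime(2020, 2, 29, 13, 45, 10)  # a Saturday in a leap year

day_of_year = ts.timetuple().tm_yday  # ordinal day of the year
quarter = (ts.month - 1) // 3 + 1     # 1, 2, 3, or 4
is_weekend = ts.weekday() >= 5        # Monday is 0, so 5 and 6 are the weekend
is_leap_year = calendar.isleap(ts.year)
is_month_end = ts.day == calendar.monthrange(ts.year, ts.month)[1]
is_month_start = ts.day == 1

print(day_of_year, quarter, is_weekend, is_leap_year, is_month_end, is_month_start)
```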
Email and URL Transform Primitives
Determines the domain of an email
Determines if an email address is from a free email domain.
Determines the domain of a url.
Determines the protocol (http or https) of a url.
Determines the top level domain of a url.
Exponential Transform Primitives
Computes the exponentially weighted moving average for a series of numbers
Computes the exponentially weighted moving standard deviation for a series of numbers
Computes the exponentially weighted moving variance for a series of numbers
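An exponentially weighted moving average gives recent values more weight than older ones. The sketch below uses the simple recursive form y[t] = alpha * x[t] + (1 - alpha) * y[t-1]. The function name and the `alpha` parameter are hypothetical, and the library's primitives may use a different weighting convention or parameterization.

```python
def ewma(values, alpha):
    """Recursive exponentially weighted moving average (illustrative only)."""
    averages = []
    for x in values:
        if not averages:
            averages.append(float(x))  # seed with the first observation
        else:
            averages.append(alpha * x + (1 - alpha) * averages[-1])
    return averages


print(ewma([1, 2, 3], alpha=0.5))
```

A larger `alpha` tracks the newest value more closely; a smaller one smooths more heavily.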
General Transform Primitives
Calculates the absolute difference from the previous element
Computes the absolute value of a number.
Computes the cosine of a number.
Determines if a value is null.
Computes the natural logarithm of a number.
Negates a numeric value.
Determines the percentile rank for each value in a list.
Computes the rate of change of a value per second.
Determines if a value is equal to the previous value in a list.
Computes the sine of a number.
Computes the square root of a number.
Computes the tangent of a number.
Calculates the variance of a list of numbers.
Location Transform Primitives
Calculates the distance between points in a city road grid.
Determines the geographic center of two coordinates.
Calculates the approximate haversine distance between two LatLong columns.
Determines if coordinates are inside a box defined by two corner coordinate points.
Returns the first tuple value in a list of LatLong tuples.
Returns the second tuple value in a list of LatLong tuples.
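The haversine distance mentioned above is the great-circle distance between two latitude/longitude points on a sphere. A minimal sketch, assuming a mean Earth radius of 6371 km and output in kilometers (the library's default unit and radius may differ; the function name is hypothetical):

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Half the equator: two points 180 degrees of longitude apart.
print(haversine_km((0.0, 0.0), (0.0, 180.0)))
```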
NaturalLanguage Transform Primitives
Determines how many times a given string shows up in a text field.
Determines the mean number of characters per word.
Determines the median word length.
Calculates the number of characters in a given string, including whitespace and punctuation.
Calculates the number of unique separators.
Determines the number of words in a string.
Determines the number of common words in a string.
Determines the number of hashtags in a string.
Determines the number of mentions in a string.
Determines the number of unique words in a string.
Determines the number of words in quotes in a string.
Determines the number of punctuation characters in a string.
Determines the number of title words in a string.
Determines the total word length.
Calculates the number of upper case letters in text.
Determines the number of words in a string that are entirely capitalized.
Calculates the number of whitespaces in a string.
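Several of these text counts can be approximated with a few lines of standard-library Python. The tokenization (splitting on whitespace) and the regular expressions for hashtags and mentions are assumptions made for illustration; the library's own rules for what counts as a word, hashtag, or mention may differ.

```python
import re

text = "Great day at #PyCon with @alice. Great talks!"

words = text.split()  # assumed tokenization: split on whitespace

num_characters = len(text)  # includes whitespace and punctuation
num_words = len(words)
num_hashtags = len(re.findall(r"#\w+", text))
num_mentions = len(re.findall(r"@\w+", text))
upper_case_count = sum(1 for ch in text if ch.isupper())
whitespace_count = sum(1 for ch in text if ch.isspace())

print(num_characters, num_words, num_hashtags, num_mentions)
print(upper_case_count, whitespace_count)
```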
Postal Code Primitives
Returns the one digit prefix of a given postal code.
Returns the two digit prefix of a given postal code.
Time Series Transform Primitives
Computes the expanding count of events over a given window.
Computes the expanding maximum of events over a given window.
Computes the expanding mean of events over a given window.
Computes the expanding minimum of events over a given window.
Computes the expanding standard deviation for events over a given window.
Computes the expanding trend for events over a given window.
Shifts an array of values by a specified number of periods.
Determines a rolling count of events over a given window.
Determines the maximum of entries over a given window.
Calculates the mean of entries over a given window.
Determines the minimum of entries over a given window.
Determines how many values are outliers over a given window.
Calculates the standard deviation of entries over a given window.
Calculates the trend of a given window of entries of a column over time.
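Rolling and shifting operations can be sketched over a plain list to show the shape of their output. The helpers below are hypothetical and row-based: they assume a fixed window counted in rows and emit `None` until a full window is available. The library's primitives may treat gaps, minimum periods, and time-based windows differently.

```python
def rolling_mean(values, window):
    """Mean of the most recent `window` rows; None until the window is full."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


def lag(values, periods):
    """Shift values forward by `periods` rows, padding the start with None."""
    return [None] * periods + values[: len(values) - periods]


print(rolling_mean([1, 2, 3, 4, 5], window=3))
print(lag([1, 2, 3, 4], periods=1))
```

An expanding calculation is the same idea with a window that always starts at the first row and grows with each step.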
Natural Language Processing Primitives
Natural Language Processing primitives create features for textual data. For more information on how to use and install these primitives, see here.
Primitives in standard install
Calculates the overall complexity of the text based on the total
Calculates the Latent Semantic Analysis Values of NaturalLanguage Input
Calculates the occurrences of each different part of speech.
Calculates the polarity of a text on a scale from -1 (negative) to 1 (positive)
Determines the number of stopwords in a string.
Primitives that require installing tensorflow
Transforms a sentence or short paragraph using deep contextualized language representations.
Transforms a sentence or short paragraph to a vector using [tfhub model](https://tfhub.dev/google/universal-sentence-encoder/2)
Feature methods
Rename Feature, returns copy.
Returns depth of feature
Feature calculation
Calculates a matrix for a given set of instance ids and calculation times.
Feature descriptions
Generates an English language description of a feature.
Feature visualization
Generates a feature lineage graph for the given feature
Feature encoding
Encode categorical features
Feature Selection
Select features that have at least 2 unique values and that are not all null
Removes columns in feature matrix that are highly correlated with another column.
Removes columns from a feature matrix that have higher than a set threshold of null values.
Removes columns in feature matrix where all the values are the same.
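The selection rules above can be illustrated on a tiny feature matrix held as a dict of columns. The helper names and the `threshold` parameter are hypothetical, nulls are represented as `None`, and the exact boundary behavior (whether a column exactly at the threshold is kept) is an assumption rather than the library's documented behavior.

```python
def drop_single_value(columns):
    """Keep columns with at least two distinct non-null values."""
    return {
        name: values
        for name, values in columns.items()
        if len({v for v in values if v is not None}) >= 2
    }


def drop_highly_null(columns, threshold):
    """Keep columns whose fraction of nulls does not exceed the threshold."""
    kept = {}
    for name, values in columns.items():
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction <= threshold:
            kept[name] = values
    return kept


matrix = {"a": [1, 1, 1], "b": [1, 2, None], "c": [None, None, None]}

print(list(drop_single_value(matrix)))
print(list(drop_highly_null(matrix, threshold=0.5)))
```

Column "a" is constant and column "c" is entirely null, so only "b" survives the first filter; the second filter removes only "c".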
Feature Matrix utils
Replace all
Saving and Loading Features
Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string.
Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string.
EntitySet, Relationship
Constructors
Stores all actual data and typing information for an entityset
Class to represent a relationship between dataframes
EntitySet load and prepare data
Add a DataFrame to the EntitySet with Woodwork typing information.
Find or set interesting values for categorical columns, to be used to generate "where" clauses
Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).
Add a new relationship between dataframes in the entityset.
Add multiple new relationships to an entityset
Combine entityset with another to create a new entityset with the combined data of both entitysets.
Create a new dataframe and relationship from unique values of an existing column.
Set the secondary time index for a dataframe in the EntitySet using its dataframe name.
Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.
EntitySet serialization
Read entityset from disk, S3 path, or URL.
Write entityset to disk in the csv format, location specified by path.
Write entityset in the pickle format, location specified by path.
Write entityset to disk in the parquet format, location specified by path.
EntitySet query methods
Get dataframe instance from entityset
Generator which yields all backward paths between a start and goal dataframe.
Generator which yields all forward paths between a start and goal dataframe.
Get dataframes that are in a forward relationship with dataframe
Get dataframes that are in a backward relationship with dataframe
Query instances that have column with given value
EntitySet visualization
Create a UML diagram-ish graph of the EntitySet.
Relationship attributes
Column in parent dataframe
Column in child dataframe
Parent dataframe object
Child dataframe object
Data Type Util Methods
Returns a dataframe describing all of the available Logical Types.
Returns a dataframe describing all of the common semantic tags.
Primitive Util Methods
Get a list of recommended primitives given an entity set.
Returns a DataFrame that lists and describes each built-in primitive.
Returns a metrics summary DataFrame of all primitives found in list_primitives.