API Reference#
Demo Datasets#
|
Returns the retail entityset example. |
|
Return dataframes of mock customer data |
|
Download, clean, and filter flight data from 2017. |
|
Load the Australian daily-min-temperatures weather dataset. |
Deep Feature Synthesis#
|
Calculates a feature matrix and features given a dictionary of dataframes and a list of relationships. |
|
Returns two lists of primitives (transform and aggregation) containing primitives that can be applied to the specific target dataframe to create features. |
Wrappers#
scikit-learn (BETA)#
|
Transformer that exposes the scikit-learn interface, for use in a Pipeline. |
Timedelta#
|
Represents differences in time. |
Time utils#
|
Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids. |
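
The idea of equally spaced cutoff times can be sketched in plain Python. This is only an illustration of the concept, not the library's implementation: the helper name `spaced_cutoffs` and its parameters are hypothetical, and it assumes the final time for each instance is the input cutoff itself.

```python
from datetime import datetime, timedelta


def spaced_cutoffs(instance_ids, cutoffs, window_size, num_windows):
    """Return (instance_id, time) pairs: num_windows times per instance,
    spaced window_size apart and ending at that instance's cutoff."""
    rows = []
    for instance_id, cutoff in zip(instance_ids, cutoffs):
        # steps_back counts down to 0, so times come out earliest first.
        for steps_back in range(num_windows - 1, -1, -1):
            rows.append((instance_id, cutoff - steps_back * window_size))
    return rows


rows = spaced_cutoffs(
    instance_ids=[1, 2],
    cutoffs=[datetime(2017, 1, 10), datetime(2017, 2, 1)],
    window_size=timedelta(days=1),
    num_windows=3,
)
for instance_id, time in rows:
    print(instance_id, time.date())
```

Each instance gets its own series of times, so one instance id appears once per window in the result.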
Feature Primitives#
Primitive Types#
Feature for a dataframe that is based on one or more other features in that dataframe. |
|
Aggregation Primitives#
|
Determines if all values in a list are 'True'. |
|
Determines if any value is 'True' in a list. |
|
Computes the average number of seconds between consecutive events. |
|
Determines the total number of values, excluding NaN. |
|
Calculates the number of values that are above the mean. |
|
Determines the number of values that are below the mean. |
|
Determines the number of values greater than a controllable threshold. |
|
Determines the count of observations that lie inside |
|
Determines the number of values that fall within a certain range. |
|
Determines the number of values less than a controllable threshold. |
|
Determines the number of observations that lie outside |
|
Determines the number of values that fall outside a certain range. |
|
Calculates the entropy for a categorical column |
|
Determines the first value in a list. |
|
Determines the last value in a list. |
|
Calculates the highest value, ignoring NaN values. |
Determines the maximum number of consecutive False values in the input |
|
|
Determines the maximum number of consecutive negative values in the input |
|
Determines the maximum number of consecutive positive values in the input |
Determines the maximum number of consecutive True values in the input |
|
|
Determines the maximum number of consecutive zero values in the input |
|
Computes the average for a list of values. |
|
Determines the middlemost number in a list of values. |
|
Calculates the smallest value, ignoring NaN values. |
|
Determines the most commonly repeated value. |
|
Determines the n most common elements. |
|
Determines the length of the longest subsequence above the mean. |
|
Determines the length of the longest subsequence below the mean. |
|
Counts the number of True values. |
Determines the number of distinct values, ignoring NaN values. |
|
Determines the percent of True values. |
|
|
Computes the extent to which a distribution differs from a normal distribution. |
|
Computes the dispersion relative to the mean value, ignoring NaN. |
|
Calculates the sum of all values, ignoring NaN. |
|
Calculates the time elapsed since the first datetime (in seconds). |
|
Calculates the time elapsed since the last datetime (default in seconds). |
Calculates the time since the last False value. |
|
Calculates the time since the maximum value occurred. |
|
Calculates the time since the minimum value occurred. |
|
Calculates the time since the last True value. |
|
|
Calculates the trend of a column over time. |
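
A few of the aggregations described above reduce a list of values to a single number. The sketch below mimics three of them in plain Python to make the semantics concrete. The function names are hypothetical stand-ins, not the library's classes, and missing values are not handled.

```python
from statistics import mean


def count_above_mean(values):
    """Number of values strictly greater than the mean of the list."""
    m = mean(values)
    return sum(1 for v in values if v > m)


def max_consecutive_true(flags):
    """Length of the longest unbroken run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0  # a False value resets the run
        best = max(best, run)
    return best


def percent_true(flags):
    """Fraction of values that are True."""
    return sum(1 for flag in flags if flag) / len(flags)


amounts = [1, 2, 3, 4, 10]
purchased = [True, True, False, True, True, True]

print(count_above_mean(amounts))
print(max_consecutive_true(purchased))
print(percent_true(purchased))
```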
Transform Primitives#
Binary Transform Primitives#
Performs element-wise addition of two lists. |
|
|
Adds a scalar to each value in the list. |
|
Divides a scalar by each value in the list. |
|
Divides each element in the list by a scalar. |
|
Determines if values in one list are equal to another list. |
|
Determines if values in a list are equal to a given scalar. |
Determines if values in one list are greater than another list. |
|
Determines if values in one list are greater than or equal to another list. |
|
|
Determines if values are greater than or equal to a given scalar. |
|
Determines if values are greater than a given scalar. |
|
Determines if values in one list are less than another list. |
Determines if values in one list are less than or equal to another list. |
|
|
Determines if values are less than or equal to a given scalar. |
|
Determines if values are less than a given scalar. |
|
Computes the modulo of a scalar by each element in a list. |
Performs element-wise modulo of two lists. |
|
|
Computes the modulo of each element in the list by a given scalar. |
Performs element-wise multiplication of two lists of boolean values. |
|
Performs element-wise multiplication of a numeric list with a boolean list. |
|
|
Multiplies each element in the list by a scalar. |
|
Determines if values in one list are not equal to another list. |
|
Determines if values in a list are not equal to a given scalar. |
|
Subtracts each value in the list from a given scalar. |
|
Performs element-wise subtraction of two lists. |
|
Subtracts a scalar from each element in the list. |
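
Several of the scalar primitives above come in pairs that differ only in operand order, which matters for division, modulo, and subtraction. The following plain-Python sketch shows the difference. The function names are hypothetical and are not the library's identifiers.

```python
def scalar_divided_by_values(values, scalar):
    """scalar / value for each value."""
    return [scalar / v for v in values]


def values_divided_by_scalar(values, scalar):
    """value / scalar for each value."""
    return [v / scalar for v in values]


def scalar_modulo_values(values, scalar):
    """scalar % value for each value."""
    return [scalar % v for v in values]


def values_modulo_scalar(values, scalar):
    """value % scalar for each value."""
    return [v % scalar for v in values]


def scalar_minus_values(values, scalar):
    """scalar - value for each value."""
    return [scalar - v for v in values]


def values_minus_scalar(values, scalar):
    """value - scalar for each value."""
    return [v - scalar for v in values]


values = [2, 4, 5]
print(scalar_divided_by_values(values, 10))
print(values_divided_by_scalar(values, 10))
print(scalar_modulo_values(values, 10))
print(values_modulo_scalar(values, 10))
print(scalar_minus_values(values, 10))
print(values_minus_scalar(values, 10))
```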
Combine features#
|
Determines whether a value is present in a provided list. |
|
Performs element-wise logical AND of two lists. |
|
Performs element-wise logical OR of two lists. |
|
Negates a boolean value. |
Cumulative Transform Primitives#
|
Computes the difference between the value in a list and the previous value in that list. |
|
Computes the timedelta between a datetime in a list and the previous datetime in that list. |
|
Computes the time since the previous entry in a list. |
|
Calculates the cumulative count. |
|
Calculates the cumulative sum. |
|
Calculates the cumulative mean. |
|
Calculates the cumulative minimum. |
|
Calculates the cumulative maximum. |
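
Cumulative primitives return one output per input row, each computed from all rows seen so far. A minimal standard-library sketch of the same calculations follows. It illustrates the arithmetic only and assumes the first difference has no predecessor and is therefore left empty.

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

cum_sum = list(accumulate(values))        # running total
cum_max = list(accumulate(values, max))   # largest value so far
cum_min = list(accumulate(values, min))   # smallest value so far
cum_count = list(range(1, len(values) + 1))
cum_mean = [s / n for s, n in zip(cum_sum, cum_count)]

# Difference from the previous value; the first row has no previous value.
diff = [None] + [cur - prev for prev, cur in zip(values, values[1:])]

print(cum_sum)
print(cum_max)
print(cum_min)
print(cum_mean)
print(diff)
```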
Datetime Transform Primitives#
|
Calculates the age in years as a floating point number given a |
|
Transforms time of an instance into the holiday name, if there is one. |
Determines the timezone of a datetime. |
|
|
Determines the day of the month from a datetime. |
Determines the ordinal day of the year from the given datetime |
|
Determines the number of days in the month of a datetime. |
|
|
Computes the number of days before or after a given holiday. |
|
Determines the hour value of a datetime. |
|
Determines if a given datetime is a federal holiday. |
Determines the is_leap_year attribute of a datetime column. |
|
|
Determines if a datetime falls during configurable lunch hour, on a 24-hour clock. |
Determines the is_month_end attribute of a datetime column. |
|
Determines the is_month_start attribute of a datetime column. |
|
Determines the is_quarter_end attribute of a datetime column. |
|
Determines the is_quarter_start attribute of a datetime column. |
|
Determines if a date falls on a weekend. |
|
|
Determines if a datetime falls during working hours on a 24-hour clock. |
Determines if a date falls on the end of a year. |
|
Determines if a date falls on the start of a year. |
|
|
Determines the minutes value of a datetime. |
|
Determines the month value of a datetime. |
Determines the part of day of a datetime. |
|
|
Determines the quarter a datetime column falls into (1, 2, 3, 4) |
|
Determines the season of a given datetime. |
|
Determines the seconds value of a datetime. |
|
Determines the week of the year from a datetime. |
|
Determines the day of the week from a datetime. |
|
Determines the year value of a datetime. |
Email and URL Transform Primitives#
Determines the domain of an email |
|
Determines if an email address is from a free email domain. |
|
Determines the domain of a url. |
|
Determines the protocol (http or https) of a url. |
|
|
Determines the top level domain of a url. |
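
The email and URL transformations above can be approximated with the standard library. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the set of free email domains is a two-entry placeholder, and the top level domain is taken naively as the last dot-separated label. The library's own parsing rules may differ.

```python
from urllib.parse import urlparse

# Placeholder list for illustration only; a real list would be much longer.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com"}


def email_domain(address):
    """Everything after the last '@', lowercased."""
    return address.rsplit("@", 1)[-1].lower()


def is_free_email(address):
    """True if the address's domain is in the placeholder free-domain set."""
    return email_domain(address) in FREE_EMAIL_DOMAINS


def url_parts(url):
    """Split a URL into protocol, domain, and a naive top level domain."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "protocol": parsed.scheme,
        "domain": host,
        # Naive: last label only, so "example.co.uk" yields "uk".
        "tld": host.rsplit(".", 1)[-1],
    }


print(email_domain("Jane.Doe@Example.com"))
print(is_free_email("someone@gmail.com"))
print(url_parts("https://www.example.com/docs?page=2"))
```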
Exponential Transform Primitives#
|
Computes the exponentially weighted moving average for a series of numbers |
|
Computes the exponentially weighted moving standard deviation for a series of numbers |
|
Computes the exponentially weighted moving variance for a series of numbers |
General Transform Primitives#
|
Calculates the absolute difference from the previous element |
|
Computes the absolute value of a number. |
|
Computes the cosine of a number. |
|
Determines if a value is null. |
Computes the natural logarithm of a number. |
|
|
Negates a numeric value. |
Determines the percentile rank for each value in a list. |
|
Computes the rate of change of a value per second. |
|
|
Determines if a value is equal to the previous value in a list. |
|
Computes the sine of a number. |
Computes the square root of a number. |
|
|
Computes the tangent of a number. |
|
Calculates the variance of a list of numbers. |
Location Transform Primitives#
|
Calculates the distance between points in a city road grid. |
Determines the geographic center of two coordinates. |
|
|
Calculates the approximate haversine distance between two LatLong columns. |
|
Determines if coordinates are inside a box defined by two corner coordinate points. |
|
Returns the first tuple value in a list of LatLong tuples. |
Returns the second tuple value in a list of LatLong tuples. |
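
For reference, the haversine distance between two latitude/longitude points can be computed as sketched below. This is the textbook formula rather than the library's code. The function name, the choice of kilometres, and the Earth radius constant are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (latitude, longitude) pairs
    given in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# One degree of latitude along a meridian.
print(haversine_km((0.0, 0.0), (1.0, 0.0)))
```

The distance is described as approximate because the formula treats the Earth as a perfect sphere.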
NaturalLanguage Transform Primitives#
|
Determines how many times a given string shows up in a text field. |
Determines the mean number of characters per word. |
|
|
Determines the median word length. |
Calculates the number of characters in a given string, including whitespace and punctuation. |
|
|
Calculates the number of unique separators. |
|
Determines the number of words in a string. |
|
Determines the number of common words in a string. |
Determines the number of hashtags in a string. |
|
Determines the number of mentions in a string. |
|
|
Determines the number of unique words in a string. |
|
Determines the number of words in quotes in a string. |
Determines the number of punctuation characters in a string. |
|
Determines the number of title words in a string. |
|
|
Determines the total word length. |
Calculates the number of upper case letters in text. |
|
Determines the number of words in a string that are entirely capitalized. |
|
Calculates the number of whitespace characters in a string. |
Postal Code Primitives#
Returns the one digit prefix of a given postal code. |
|
Returns the two digit prefix of a given postal code. |
Time Series Transform Primitives#
|
Computes the expanding count of events over a given window. |
|
Computes the expanding maximum of events over a given window. |
|
Computes the expanding mean of events over a given window. |
|
Computes the expanding minimum of events over a given window. |
|
Computes the expanding standard deviation for events over a given window. |
|
Computes the expanding trend for events over a given window. |
|
Shifts an array of values by a specified number of periods. |
|
Determines a rolling count of events over a given window. |
|
Determines the maximum of entries over a given window. |
|
Calculates the mean of entries over a given window. |
|
Determines the minimum of entries over a given window. |
|
Determines how many values are outliers over a given window. |
|
Calculates the standard deviation of entries over a given window. |
|
Calculates the trend of a given window of entries of a column over time. |
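
The difference between rolling, expanding, and lagged calculations is easiest to see side by side. The sketch below uses plain lists and hypothetical helper names. It assumes a trailing window that includes the current row, whereas the library's primitives may offset or parameterize the window differently.

```python
from statistics import mean


def rolling(values, window_length, func, min_periods=None):
    """Apply func to a trailing window ending at each row.
    Rows with fewer than min_periods values in the window yield None."""
    if min_periods is None:
        min_periods = window_length
    out = []
    for i in range(len(values)):
        window = values[max(0, i - window_length + 1): i + 1]
        out.append(func(window) if len(window) >= min_periods else None)
    return out


def expanding(values, func):
    """Apply func to all rows from the start up to and including each row."""
    return [func(values[: i + 1]) for i in range(len(values))]


def lag(values, periods=1):
    """Shift values forward by periods rows, padding the start with None."""
    return ([None] * periods + values)[: len(values)]


values = [1, 2, 3, 4, 5]
print(rolling(values, 3, mean))
print(rolling(values, 3, max))
print(expanding(values, mean))
print(lag(values, 1))
```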
Natural Language Processing Primitives#
Natural Language Processing primitives create features for textual data. For more information on how to use and install these primitives, see here.
Primitives in standard install#
Calculates the overall complexity of the text based on the total |
|
|
Calculates the Latent Semantic Analysis Values of NaturalLanguage Input |
Calculates the occurrences of each different part of speech. |
|
Calculates the polarity of a text on a scale from -1 (negative) to 1 (positive) |
|
Determines the number of stopwords in a string. |
Primitives that require installing tensorflow#
|
Transforms a sentence or short paragraph using deep contextualized language representations. |
Transforms a sentence or short paragraph to a vector using [tfhub model](https://tfhub.dev/google/universal-sentence-encoder/2) |
Feature methods#
|
Renames the feature and returns a copy. |
|
Returns depth of feature |
Feature calculation#
|
Calculates a matrix for a given set of instance ids and calculation times. |
Feature descriptions#
|
Generates an English language description of a feature. |
Feature visualization#
|
Generates a feature lineage graph for the given feature |
Feature encoding#
|
Encode categorical features |
Feature Selection#
|
Select features that have at least 2 unique values and that are not all null |
|
Removes columns in feature matrix that are highly correlated with another column. |
|
Removes columns from a feature matrix that have higher than a set threshold of null values. |
|
Removes columns in feature matrix where all the values are the same. |
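
Two of the selection rules above can be sketched on a toy feature matrix stored as a dict of columns. The helper names and the `max_null_fraction` parameter are hypothetical. The sketch assumes non-empty columns, uses `None` for missing values, and ignores missing values when counting distinct values.

```python
def remove_highly_null(columns, max_null_fraction=0.95):
    """Keep columns whose fraction of None values does not exceed the threshold."""
    kept = {}
    for name, values in columns.items():
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction <= max_null_fraction:
            kept[name] = values
    return kept


def remove_single_value(columns):
    """Keep columns that contain at least two distinct non-missing values."""
    return {
        name: values
        for name, values in columns.items()
        if len({v for v in values if v is not None}) > 1
    }


matrix = {
    "age": [31, 45, None, 27],
    "country": ["US", "US", "US", "US"],
    "notes": [None, None, None, "x"],
}

step1 = remove_highly_null(matrix, max_null_fraction=0.5)
step2 = remove_single_value(step1)
print(list(step1))
print(list(step2))
```

Applying the null filter first and the single-value filter second is a choice made for this example, not a required order.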
Feature Matrix utils#
|
Replace all |
Saving and Loading Features#
|
Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string. |
|
Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string. |
EntitySet, Relationship#
Constructors#
|
Stores all actual data and typing information for an entityset |
|
Class to represent a relationship between dataframes |
EntitySet load and prepare data#
|
Add a DataFrame to the EntitySet with Woodwork typing information. |
Find or set interesting values for categorical columns, to be used to generate "where" clauses |
|
Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed). |
|
|
Add a new relationship between dataframes in the entityset. |
|
Add multiple new relationships to an entityset. |
|
Combine entityset with another to create a new entityset with the combined data of both entitysets. |
|
Create a new dataframe and relationship from unique values of an existing column. |
Set the secondary time index for a dataframe in the EntitySet using its dataframe name. |
|
|
Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same. |
EntitySet serialization#
|
Read entityset from disk, S3 path, or URL. |
|
Write entityset to disk in the csv format, location specified by path. |
|
Write entityset in the pickle format, location specified by path. |
|
Write entityset to disk in the parquet format, location specified by path. |
EntitySet query methods#
|
Get dataframe instance from entityset |
Generator which yields all backward paths between a start and goal dataframe. |
|
Generator which yields all forward paths between a start and goal dataframe. |
|
|
Get dataframes that are in a forward relationship with dataframe |
|
Get dataframes that are in a backward relationship with dataframe |
|
Query instances that have a column with the given value. |
EntitySet visualization#
|
Create a UML-style diagram of the EntitySet. |
Relationship attributes#
Column in parent dataframe |
|
Column in child dataframe |
|
Parent dataframe object |
|
Child dataframe object |
Data Type Util Methods#
Returns a dataframe describing all of the available Logical Types. |
|
Returns a dataframe describing all of the common semantic tags. |
Primitive Util Methods#
|
Get a list of recommended primitives given an entity set. |
Returns a DataFrame that lists and describes each built-in primitive. |
|
Returns a metrics summary DataFrame of all primitives found in list_primitives. |