API Reference#

Demo Datasets#

`load_retail`([id, nrows, return_single_table])	Returns the retail entityset example.
`load_mock_customer`([n_customers, ...])	Return dataframes of mock customer data
`load_flight`([month_filter, ...])	Download, clean, and filter flight data from 2017.
`load_weather`([nrows, return_single_table])	Load the Australian daily-min-temperatures weather dataset.

Deep Feature Synthesis#

`dfs`([dataframes, relationships, entityset, ...])	Calculates a feature matrix and features given a dictionary of dataframes and a list of relationships.
`get_valid_primitives`(entityset, ...[, ...])	Returns two lists of primitives (transform and aggregation) containing primitives that can be applied to the specific target dataframe to create features.

Wrappers#

scikit-learn (BETA)#

`wrappers.DFSTransformer`([...])	Transformer using Scikit-Learn interface for Pipeline uses.

Timedelta#

`Timedelta`(value[, unit, delta_obj])	Represents differences in time.

Time utils#

`make_temporal_cutoffs`(instance_ids, cutoffs)	Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
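
The idea of equally spaced cutoff times leading up to an input cutoff can be sketched in plain Python. This is only an illustration: the helper `spaced_cutoffs` and its parameters `window_size` and `num_windows` are hypothetical names, not the signature of `make_temporal_cutoffs`, and including the input cutoff itself in the result is an assumption.

```python
from datetime import datetime, timedelta

def spaced_cutoffs(cutoff, window_size, num_windows):
    """Return num_windows equally spaced times ending at `cutoff` (hypothetical helper)."""
    # Step backwards from the cutoff, then reverse into chronological order.
    times = [cutoff - i * window_size for i in range(num_windows)]
    return times[::-1]

# Three cutoffs, three days apart, ending on 2017-01-10.
cutoffs = spaced_cutoffs(datetime(2017, 1, 10), timedelta(days=3), 3)
```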

Feature Primitives#

Primitive Types#

`TransformPrimitive`()	Feature for dataframe that is based on one or more other features in that dataframe.
`AggregationPrimitive`()

Aggregation Primitives#

`All`()	Calculates if all values are 'True' in a list.
`Any`()	Determines if any value is 'True' in a list.
`AvgTimeBetween`([unit])	Computes the average number of seconds between consecutive events.
`Count`()	Determines the total number of values, excluding NaN.
`CountAboveMean`([skipna])	Calculates the number of values that are above the mean.
`CountBelowMean`([skipna])	Determines the number of values that are below the mean.
`CountGreaterThan`([threshold])	Determines the number of values greater than a controllable threshold.
`CountInsideNthSTD`([n])	Determines the count of observations that lie inside the first N standard deviations (inclusive).
`CountInsideRange`([lower, upper, skipna])	Determines the number of values that fall within a certain range.
`CountLessThan`([threshold])	Determines the number of values less than a controllable threshold.
`CountOutsideNthSTD`([n])	Determines the number of observations that lie outside the first N standard deviations.
`CountOutsideRange`([lower, upper, skipna])	Determines the number of values that fall outside a certain range.
`Entropy`([dropna, base])	Calculates the entropy for a categorical column
`First`()	Determines the first value in a list.
`Last`()	Determines the last value in a list.
`Max`()	Calculates the highest value, ignoring NaN values.
`MaxConsecutiveFalse`()	Determines the maximum number of consecutive False values in the input
`MaxConsecutiveNegatives`([skipna])	Determines the maximum number of consecutive negative values in the input
`MaxConsecutivePositives`([skipna])	Determines the maximum number of consecutive positive values in the input
`MaxConsecutiveTrue`()	Determines the maximum number of consecutive True values in the input
`MaxConsecutiveZeros`([skipna])	Determines the maximum number of consecutive zero values in the input
`Mean`([skipna])	Computes the average for a list of values.
`Median`()	Determines the middlemost number in a list of values.
`Min`()	Calculates the smallest value, ignoring NaN values.
`Mode`()	Determines the most commonly repeated value.
`NMostCommon`([n])	Determines the n most common elements.
`NumConsecutiveGreaterMean`([skipna])	Determines the length of the longest subsequence above the mean.
`NumConsecutiveLessMean`([skipna])	Determines the length of the longest subsequence below the mean.
`NumTrue`()	Counts the number of True values.
`NumUnique`()	Determines the number of distinct values, ignoring NaN values.
`PercentTrue`()	Determines the percent of True values.
`Skew`()	Computes the extent to which a distribution differs from a normal distribution.
`Std`()	Computes the dispersion relative to the mean value, ignoring NaN.
`Sum`()	Calculates the total addition, ignoring NaN.
`TimeSinceFirst`([unit])	Calculates the time elapsed since the first datetime (in seconds).
`TimeSinceLast`([unit])	Calculates the time elapsed since the last datetime (default in seconds).
`TimeSinceLastFalse`()	Calculates the time since the last False value.
`TimeSinceLastMax`()	Calculates the time since the maximum value occurred.
`TimeSinceLastMin`()	Calculates the time since the minimum value occurred.
`TimeSinceLastTrue`()	Calculates the time since the last True value.
`Trend`()	Calculates the trend of a column over time.
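
To make two of the descriptions above concrete, here is a minimal plain-Python sketch of what "average number of seconds between consecutive events" and "maximum number of consecutive True values" compute. The function names are illustrative stand-ins, not the library's implementation, and details such as null handling are left out.

```python
from datetime import datetime

def avg_time_between(times):
    """Average gap, in seconds, between consecutive datetimes (illustrative)."""
    ordered = sorted(times)
    if len(ordered) < 2:
        return float("nan")  # no gap can be measured with fewer than two events
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

def max_consecutive_true(flags):
    """Length of the longest unbroken run of True values (illustrative)."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0  # a False value resets the current run
        best = max(best, run)
    return best

events = [
    datetime(2020, 1, 1, 0, 0, 0),
    datetime(2020, 1, 1, 0, 0, 10),
    datetime(2020, 1, 1, 0, 0, 40),
]
avg_gap = avg_time_between(events)  # gaps are 10 s and 30 s
longest_run = max_consecutive_true([True, True, False, True, True, True])
```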

Transform Primitives#

Binary Transform Primitives#

`AddNumeric`()	Performs element-wise addition of two lists.
`AddNumericScalar`([value])	Adds a scalar to each value in the list.
`DivideByFeature`([value])	Divides a scalar by each value in the list.
`DivideNumericScalar`([value])	Divides each element in the list by a scalar.
`Equal`()	Determines if values in one list are equal to another list.
`EqualScalar`([value])	Determines if values in a list are equal to a given scalar.
`GreaterThan`()	Determines if values in one list are greater than another list.
`GreaterThanEqualTo`()	Determines if values in one list are greater than or equal to another list.
`GreaterThanEqualToScalar`([value])	Determines if values are greater than or equal to a given scalar.
`GreaterThanScalar`([value])	Determines if values are greater than a given scalar.
`LessThan`()	Determines if values in one list are less than another list.
`LessThanEqualTo`()	Determines if values in one list are less than or equal to another list.
`LessThanEqualToScalar`([value])	Determines if values are less than or equal to a given scalar.
`LessThanScalar`([value])	Determines if values are less than a given scalar.
`ModuloByFeature`([value])	Computes the modulo of a scalar by each element in a list.
`ModuloNumeric`()	Performs element-wise modulo of two lists.
`ModuloNumericScalar`([value])	Computes the modulo of each element in the list by a given scalar.
`MultiplyBoolean`()	Performs element-wise multiplication of two lists of boolean values.
`MultiplyNumericBoolean`()	Performs element-wise multiplication of a numeric list with a boolean list.
`MultiplyNumericScalar`([value])	Multiplies each element in the list by a scalar.
`NotEqual`()	Determines if values in one list are not equal to another list.
`NotEqualScalar`([value])	Determines if values in a list are not equal to a given scalar.
`ScalarSubtractNumericFeature`([value])	Subtracts each value in the list from a given scalar.
`SubtractNumeric`([commutative])	Performs element-wise subtraction of two lists.
`SubtractNumericScalar`([value])	Subtracts a scalar from each element in the list.
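
Several of the scalar primitives above differ only in operand order. The following plain-Python sketch illustrates that difference as the descriptions state it. The variable names mirror the primitive names for readability only, and nothing here calls the library.

```python
values = [1.0, 2.0, 4.0]
scalar = 8.0

# "Divides each element in the list by a scalar."
divide_numeric_scalar = [v / scalar for v in values]

# "Divides a scalar by each value in the list."
divide_by_feature = [scalar / v for v in values]

# "Subtracts a scalar from each element in the list."
subtract_numeric_scalar = [v - scalar for v in values]

# "Subtracts each value in the list from a given scalar."
scalar_subtract_numeric_feature = [scalar - v for v in values]
```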

Combine features#

`IsIn`([list_of_outputs])	Determines whether a value is present in a provided list.
`And`()	Performs element-wise logical AND of two lists.
`Or`()	Performs element-wise logical OR of two lists.
`Not`()	Negates a boolean value.

Cumulative Transform Primitives#

`Diff`([periods])	Computes the difference between the value in a list and the previous value in that list.
`DiffDatetime`([periods])	Computes the timedelta between a datetime in a list and the previous datetime in that list.
`TimeSincePrevious`([unit])	Computes the time since the previous entry in a list.
`CumCount`()	Calculates the cumulative count.
`CumSum`()	Calculates the cumulative sum.
`CumMean`()	Calculates the cumulative mean.
`CumMin`()	Calculates the cumulative minimum.
`CumMax`()	Calculates the cumulative maximum.
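
As a rough illustration of the cumulative operations listed above, the sketch below computes them over a single plain list using only the standard library. It ignores any grouping or null handling the primitives may apply, so treat it as a picture of the arithmetic rather than of the library's behavior.

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

cum_sum = list(accumulate(values))       # running total
cum_max = list(accumulate(values, max))  # running maximum
cum_min = list(accumulate(values, min))  # running minimum

# Running mean: running total divided by the number of values seen so far.
cum_mean = [total / count for count, total in enumerate(cum_sum, start=1)]

# Difference from the previous value; the first entry has no predecessor.
diff = [None] + [curr - prev for prev, curr in zip(values, values[1:])]
```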

Datetime Transform Primitives#

`Age`()	Calculates the age in years as a floating point number given a date of birth.
`DateToHoliday`([country])	Transforms time of an instance into the holiday name, if there is one.
`DateToTimeZone`()	Determines the timezone of a datetime.
`Day`()	Determines the day of the month from a datetime.
`DayOfYear`()	Determines the ordinal day of the year from the given datetime
`DaysInMonth`()	Determines the number of days in the month of a datetime.
`DistanceToHoliday`([holiday, country])	Computes the number of days before or after a given holiday.
`Hour`()	Determines the hour value of a datetime.
`IsFederalHoliday`([country])	Determines if a given datetime is a federal holiday.
`IsLeapYear`()	Determines the is_leap_year attribute of a datetime column.
`IsLunchTime`([lunch_hour])	Determines if a datetime falls during configurable lunch hour, on a 24-hour clock.
`IsMonthEnd`()	Determines the is_month_end attribute of a datetime column.
`IsMonthStart`()	Determines the is_month_start attribute of a datetime column.
`IsQuarterEnd`()	Determines the is_quarter_end attribute of a datetime column.
`IsQuarterStart`()	Determines the is_quarter_start attribute of a datetime column.
`IsWeekend`()	Determines if a date falls on a weekend.
`IsWorkingHours`([start_hour, end_hour])	Determines if a datetime falls during working hours on a 24-hour clock.
`IsYearEnd`()	Determines if a date falls on the end of a year.
`IsYearStart`()	Determines if a date falls on the start of a year.
`Minute`()	Determines the minutes value of a datetime.
`Month`()	Determines the month value of a datetime.
`PartOfDay`()	Determines the part of day of a datetime.
`Quarter`()	Determines the quarter a datetime column falls into (1, 2, 3, 4)
`Second`()	Determines the seconds value of a datetime.
`Week`()	Determines the week of the year from a datetime.
`Weekday`()	Determines the day of the week from a datetime.
`Year`()	Determines the year value of a datetime.
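
Many of the datetime quantities above have direct standard-library equivalents. The sketch below shows a few of them for a single timestamp. It is not the library's code, and the weekday convention (Monday as 0) is that of Python's `datetime`, assumed here for illustration.

```python
import calendar
from datetime import datetime

dt = datetime(2020, 2, 15, 13, 30)

day = dt.day                                               # day of the month
day_of_year = dt.timetuple().tm_yday                       # ordinal day of the year
days_in_month = calendar.monthrange(dt.year, dt.month)[1]  # length of that month
is_leap_year = calendar.isleap(dt.year)
weekday = dt.weekday()                                     # Monday is 0, Sunday is 6
is_weekend = weekday >= 5
quarter = (dt.month - 1) // 3 + 1                          # 1, 2, 3 or 4
```

Note that the day of the month (15) and the number of days in the month (29, since 2020 is a leap year) are different quantities.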

Email and URL Transform Primitives#

`EmailAddressToDomain`()	Determines the domain of an email
`IsFreeEmailDomain`()	Determines if an email address is from a free email domain.
`URLToDomain`()	Determines the domain of a url.
`URLToProtocol`()	Determines the protocol (http or https) of a url.
`URLToTLD`()	Determines the top level domain of a url.
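
A naive standard-library sketch of the kind of extraction these primitives describe is shown below. The helpers `email_to_domain` and `url_parts` are hypothetical. The top-level-domain logic simply takes the last dot-separated label, which is an oversimplification (multi-part suffixes such as `co.uk` need a public suffix list), and the library may normalize domains differently.

```python
from urllib.parse import urlparse

def email_to_domain(address):
    """Return the part after the last '@', or None if there is none."""
    return address.rsplit("@", 1)[1] if "@" in address else None

def url_parts(url):
    """Return (protocol, host, naive top-level domain) for a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Naive TLD: the last label of the host name.
    tld = host.rsplit(".", 1)[-1] if "." in host else None
    return parsed.scheme, host, tld

domain = email_to_domain("user@example.com")
protocol, host, tld = url_parts("https://www.example.com/path?q=1")
```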

Exponential Transform Primitives#

`ExponentialWeightedAverage`([com, span, ...])	Computes the exponentially weighted moving average for a series of numbers
`ExponentialWeightedSTD`([com, span, ...])	Computes the exponentially weighted moving standard deviation for a series of numbers
`ExponentialWeightedVariance`([com, span, ...])	Computes the exponentially weighted moving variance for a series of numbers
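
For readers unfamiliar with exponential weighting, here is a minimal sketch of the simple recursive form of an exponentially weighted moving average, with the smoothing factor derived from `span` as alpha = 2 / (span + 1). Whether the primitives use this recursive form or a bias-adjusted variant is not stated above, so this is an assumption made purely for illustration.

```python
def ewma(values, span):
    """Recursive exponentially weighted moving average (illustrative sketch)."""
    alpha = 2.0 / (span + 1.0)
    smoothed = []
    previous = None
    for x in values:
        # The first value seeds the average; later values are blended in.
        previous = x if previous is None else alpha * x + (1.0 - alpha) * previous
        smoothed.append(previous)
    return smoothed

result = ewma([1.0, 2.0, 3.0], span=3)  # alpha is 0.5 for span=3
```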

General Transform Primitives#

`Absolute`()	Computes the absolute value of a number.
`Cosine`()	Computes the cosine of a number.
`IsNull`()	Determines if a value is null.
`NaturalLogarithm`()	Computes the natural logarithm of a number.
`Negate`()	Negates a numeric value.
`Percentile`()	Determines the percentile rank for each value in a list.
`RateOfChange`()	Computes the rate of change of a value per second.
`Sine`()	Computes the sine of a number.
`SquareRoot`()	Computes the square root of a number.
`Tangent`()	Computes the tangent of a number.

Location Transform Primitives#

`CityblockDistance`([unit])	Calculates the distance between points in a city road grid.
`GeoMidpoint`()	Determines the geographic center of two coordinates.
`Haversine`([unit])	Calculates the approximate haversine distance between two LatLong columns.
`IsInGeoBox`([point1, point2])	Determines if coordinates are inside a box defined by two corner coordinate points.
`Latitude`()	Returns the first tuple value in a list of LatLong tuples.
`Longitude`()	Returns the second tuple value in a list of LatLong tuples.
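
The haversine distance mentioned above is the great-circle distance between two (latitude, longitude) points on a sphere. A minimal standard-library sketch follows. The Earth radius constant and the choice of kilometres are assumptions for this example and do not describe the primitive's default unit.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius (assumed for this sketch)

def haversine(point1, point2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

# A quarter of the way around the equator.
quarter_equator = haversine((0.0, 0.0), (0.0, 90.0))
```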

NaturalLanguage Transform Primitives#

`CountString`([string, ignore_case, ...])	Determines how many times a given string shows up in a text field.
`MeanCharactersPerWord`()	Determines the mean number of characters per word.
`MedianWordLength`([delimiters_regex])	Determines the median word length.
`NumCharacters`()	Calculates the number of characters in a string.
`NumUniqueSeparators`([separators])	Calculates the number of unique separators.
`NumWords`()	Determines the number of words in a string by counting the spaces.
`NumberOfCommonWords`([word_set, delimiters_regex])	Determines the number of common words in a string.
`NumberOfHashtags`()	Determines the number of hashtags in a string.
`NumberOfMentions`()	Determines the number of mentions in a string.
`NumberOfUniqueWords`([case_insensitive])	Determines the number of unique words in a string.
`NumberOfWordsInQuotes`([quote_type])	Determines the number of words in quotes in a string.
`PunctuationCount`()	Determines number of punctuation characters in a string.
`TitleWordCount`()	Determines the number of title words in a string.
`TotalWordLength`([do_not_count])	Determines the total word length.
`UpperCaseCount`()	Calculates the number of upper case letters in text.
`WhitespaceCount`()	Calculates number of whitespaces in a string.
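
A few of these text counts are easy to picture with plain string operations. The sketch below is one possible reading of the descriptions (for example, treating only the space character as whitespace, and counting words as spaces plus one). The library's exact rules may differ.

```python
import string

text = "Hello World, this is Featuretools!"

num_characters = len(text)
upper_case_count = sum(1 for ch in text if ch.isupper())
punctuation_count = sum(1 for ch in text if ch in string.punctuation)
whitespace_count = text.count(" ")  # assumption: only plain spaces are counted
num_words = text.count(" ") + 1     # assumption: words are separated by single spaces
```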

Postal Code Primitives#

`OneDigitPostalCode`()	Returns the one digit prefix of a given postal code.
`TwoDigitPostalCode`()	Returns the two digit prefix of a given postal code.

Time Series Transform Primitives#

`ExpandingCount`([gap, min_periods])	Computes the expanding count of events over a given window.
`ExpandingMax`([gap, min_periods])	Computes the expanding maximum of events over a given window.
`ExpandingMean`([gap, min_periods])	Computes the expanding mean of events over a given window.
`ExpandingMin`([gap, min_periods])	Computes the expanding minimum of events over a given window.
`ExpandingSTD`([gap, min_periods])	Computes the expanding standard deviation for events over a given window.
`ExpandingTrend`([gap, min_periods])	Computes the expanding trend for events over a given window.
`Lag`([periods])	Shifts an array of values by a specified number of periods.
`RollingCount`([window_length, gap, min_periods])	Determines a rolling count of events over a given window.
`RollingMax`([window_length, gap, min_periods])	Determines the maximum of entries over a given window.
`RollingMean`([window_length, gap, min_periods])	Calculates the mean of entries over a given window.
`RollingMin`([window_length, gap, min_periods])	Determines the minimum of entries over a given window.
`RollingOutlierCount`([window_length, gap, ...])	Determines how many values are outliers over a given window.
`RollingSTD`([window_length, gap, min_periods])	Calculates the standard deviation of entries over a given window.
`RollingTrend`([window_length, gap, min_periods])	Calculates the trend of a given window of entries of a column over time.
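
The `window_length`, `gap` and `min_periods` parameters shared by the rolling primitives can be pictured with a row-based sketch. The interpretation below is an assumption for illustration: the window ends `gap` rows before the current row, spans `window_length` rows, and yields NaN when it holds fewer than `min_periods` values. The helpers `rolling_mean` and `lag` are hypothetical and do not reproduce the library's handling of time-based windows.

```python
import math

def rolling_mean(values, window_length, gap=0, min_periods=1):
    """Row-based rolling mean whose window ends `gap` rows before each row."""
    out = []
    for i in range(len(values)):
        end = i - gap + 1            # exclusive end of the window
        start = end - window_length  # inclusive start of the window
        window = values[max(start, 0):max(end, 0)]
        if len(window) < max(min_periods, 1):
            out.append(math.nan)     # not enough data in the window
        else:
            out.append(sum(window) / len(window))
    return out

def lag(values, periods=1):
    """Shift values forward by `periods` rows, padding the front with None."""
    return ([None] * periods + values)[:len(values)]

rolled = rolling_mean([1, 2, 3, 4, 5], window_length=2, gap=1, min_periods=2)
shifted = lag([1, 2, 3], periods=1)
```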

Natural Language Processing Primitives#

Natural Language Processing primitives create features for textual data. For more information on how to use and install these primitives, see here.

Primitives in standard install#

`DiversityScore`()	Calculates the overall complexity of the text based on the total
`LSA`([random_seed, corpus, algorithm])	Calculates the Latent Semantic Analysis values of NaturalLanguage input
`PartOfSpeechCount`()	Calculates the occurrences of each different part of speech.
`PolarityScore`()	Calculates the polarity of a text on a scale from -1 (negative) to 1 (positive)
`StopwordCount`()	Determines number of stopwords in a string.

Primitives that require installing tensorflow#

`Elmo`()	Transforms a sentence or short paragraph using deep contextualized language representations.
`UniversalSentenceEncoder`()	Transforms a sentence or short paragraph to a vector using [tfhub model](https://tfhub.dev/google/universal-sentence-encoder/2)

Feature methods#

`FeatureBase.rename`(name)	Rename Feature, returns copy.
`FeatureBase.get_depth`([stop_at])	Returns depth of feature

Feature calculation#

`calculate_feature_matrix`(features[, ...])	Calculates a matrix for a given set of instance ids and calculation times.

Feature descriptions#

`describe_feature`(feature[, ...])	Generates an English language description of a feature.

Feature visualization#

`graph_feature`(feature[, to_file, description])	Generates a feature lineage graph for the given feature

Feature encoding#

`encode_features`(feature_matrix, features[, ...])	Encode categorical features

Feature Selection#

`remove_low_information_features`(feature_matrix)	Select features that have at least 2 unique values and that are not all null
`remove_highly_correlated_features`(feature_matrix)	Removes columns in feature matrix that are highly correlated with another column.
`remove_highly_null_features`(feature_matrix)	Removes columns from a feature matrix that have higher than a set threshold of null values.
`remove_single_value_features`(feature_matrix)	Removes columns in feature matrix where all the values are the same.
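
The selection rules described above can be sketched on a toy feature matrix represented as a dict of columns. The helper names and the `threshold` parameter are hypothetical, `None` stands in for a null value, and ignoring nulls when counting distinct values is an assumption. This does not reproduce the library's defaults or options.

```python
def null_fraction(column):
    """Fraction of entries that are None (columns are assumed non-empty)."""
    return sum(1 for v in column if v is None) / len(column)

def drop_highly_null(columns, threshold):
    """Keep columns whose null fraction is not higher than `threshold`."""
    return {name: col for name, col in columns.items()
            if null_fraction(col) <= threshold}

def drop_single_value(columns):
    """Keep columns that have more than one distinct non-null value."""
    return {name: col for name, col in columns.items()
            if len({v for v in col if v is not None}) > 1}

matrix = {
    "a": [1, None, None, None],  # 75% null, one distinct value
    "b": [2, 2, 2, 2],           # no nulls, one distinct value
    "c": [1, 2, None, 4],        # 25% null, three distinct values
}
after_null_filter = drop_highly_null(matrix, threshold=0.5)
after_single_value_filter = drop_single_value(matrix)
```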

Feature Matrix utils#

`replace_inf_values`(feature_matrix[, ...])	Replace all np.inf values in a feature matrix with the specified replacement value.
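
As a small illustration of the operation described above, the following sketch replaces infinite values in a dict-of-columns matrix. The helper `replace_inf` is hypothetical, and treating both positive and negative infinity as values to replace is an assumption.

```python
import math

def replace_inf(columns, replacement):
    """Return a copy of `columns` with infinite floats swapped for `replacement`."""
    return {
        name: [replacement if isinstance(v, float) and math.isinf(v) else v
               for v in col]
        for name, col in columns.items()
    }

cleaned = replace_inf({"ratio": [1.0, math.inf, -math.inf, None]}, 0.0)
```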

Saving and Loading Features#

`save_features`(features[, location, profile_name])	Saves the features list as JSON to a specified filepath/S3 path, writes to an open file, or returns the serialized features as a JSON string.
`load_features`(features[, profile_name])	Loads the features from a filepath, S3 path, URL, an open file, or a JSON formatted string.

EntitySet, Relationship#

Constructors#

`EntitySet`([id, dataframes, relationships])	Stores all actual data and typing information for an entityset
`Relationship`(entityset, ...)	Class to represent a relationship between dataframes

EntitySet load and prepare data#

`EntitySet.add_dataframe`(dataframe[, ...])	Add a DataFrame to the EntitySet with Woodwork typing information.
`EntitySet.add_interesting_values`([...])	Find or set interesting values for categorical columns, to be used to generate "where" clauses
`EntitySet.add_last_time_indexes`([...])	Calculates the last time index values for each dataframe (the last time an instance or children of that instance were observed).
`EntitySet.add_relationship`([...])	Add a new relationship between dataframes in the entityset.
`EntitySet.add_relationships`(relationships)	Add multiple new relationships to an entityset
`EntitySet.concat`(other[, inplace])	Combine entityset with another to create a new entityset with the combined data of both entitysets.
`EntitySet.normalize_dataframe`(...[, ...])	Create a new dataframe and relationship from unique values of an existing column.
`EntitySet.set_secondary_time_index`(...)	Set the secondary time index for a dataframe in the EntitySet using its dataframe name.
`EntitySet.replace_dataframe`(dataframe_name, df)	Replace the internal dataframe of an EntitySet table, keeping Woodwork typing information the same.

EntitySet serialization#

`read_entityset`(path[, profile_name])	Read entityset from disk, S3 path, or URL.

`EntitySet.to_csv`(path[, sep, encoding, ...])	Write entityset to disk in the csv format, location specified by path.
`EntitySet.to_pickle`(path[, compression, ...])	Write entityset in the pickle format, location specified by path.
`EntitySet.to_parquet`(path[, engine, ...])	Write entityset to disk in the parquet format, location specified by path.

EntitySet query methods#

`EntitySet.__getitem__`(dataframe_name)	Get dataframe instance from entityset
`EntitySet.find_backward_paths`(...)	Generator which yields all backward paths between a start and goal dataframe.
`EntitySet.find_forward_paths`(...)	Generator which yields all forward paths between a start and goal dataframe.
`EntitySet.get_forward_dataframes`(dataframe_name)	Get dataframes that are in a forward relationship with dataframe
`EntitySet.get_backward_dataframes`(dataframe_name)	Get dataframes that are in a backward relationship with dataframe
`EntitySet.query_by_values`(dataframe_name, ...)	Query instances that have column with given value

EntitySet visualization#

`EntitySet.plot`([to_file])	Create a UML diagram-ish graph of the EntitySet.

Relationship attributes#

`Relationship.parent_column`	Column in parent dataframe
`Relationship.child_column`	Column in child dataframe
`Relationship.parent_dataframe`	Parent dataframe object
`Relationship.child_dataframe`	Child dataframe object

Data Type Util Methods#

`list_logical_types`()	Returns a dataframe describing all of the available Logical Types.
`list_semantic_tags`()	Returns a dataframe describing all of the common semantic tags.

Primitive Util Methods#

`get_recommended_primitives`(entityset[, ...])	Get a list of recommended primitives given an entity set.
`list_primitives`()	Returns a DataFrame that lists and describes each built-in primitive.
`summarize_primitives`()	Returns a metrics summary DataFrame of all primitives found in list_primitives.
