featuretools.primitives.PercentChange

class featuretools.primitives.PercentChange(periods=1, fill_method='pad', limit=None, freq=None)

Determines the percent difference between values in a list.

Description:

Given a list of numbers, return the percent change from each preceding number to the next. Percentages are shown in decimal form (not multiplied by 100). Uses pandas’ pct_change function.
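The arithmetic behind each output value can be sketched in plain Python. This is a minimal illustration, not the primitive's implementation (which delegates to pandas). The helper name `pct_change_sketch` is hypothetical, gap filling is left out, and the sketch assumes `periods >= 1` and non-zero previous values.

```python
import math


def pct_change_sketch(values, periods=1):
    """Return current / previous - 1 for each element, in decimal form.

    Hypothetical helper for illustration only; assumes periods >= 1 and
    that no previous value is zero.
    """
    result = []
    for i, current in enumerate(values):
        if i < periods:
            # No value `periods` positions earlier to compare against.
            result.append(math.nan)
        else:
            previous = values[i - periods]
            result.append(current / previous - 1)
    return result


changes = pct_change_sketch([2, 5, 15, 3, 3, 9, 4.5])
changes_2 = pct_change_sketch([2, 5, 15, 3, 3, 9, 4.5], periods=2)
```

The first `periods` entries are NaN because they have no earlier value to be compared with.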

Parameters:
  • periods (int) – Periods to shift for calculating percent change. Default is 1.

  • fill_method (str) – Method for filling gaps in reindexed Series. Valid options are backfill, bfill, pad, ffill. pad / ffill: fill gap with last valid observation. backfill / bfill: fill gap with next valid observation. Default is pad.

  • limit (int) – The max number of consecutive NaN values in a gap that can be filled. Default is None.

  • freq (DateOffset, timedelta, or offset alias string) –

    If freq is specified, instead of calculating change between subsequent points, PercentChange will calculate change between points with a certain interval between their date indices. freq defines the desired interval. When freq is used, the resulting index will also be filled to include any missing dates from the specified interval.

    If the index is not date/datetime and freq is used, it will raise a NotImplementedError.

    If freq is None, no frequency adjustment is applied and change is calculated between subsequent points. Default is None.
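How fill_method and limit interact can be sketched in plain Python. This is an illustration of the gap-filling step only, under the assumption that gaps are represented as `None`. The helper name `fill_gaps` is hypothetical and is not part of featuretools or pandas.

```python
def fill_gaps(values, method="pad", limit=None):
    """Fill None entries from the nearest valid neighbor.

    Hypothetical helper for illustration only. At most `limit` consecutive
    gaps are filled; the rest of a longer gap stays None.
    """
    forward = method in ("pad", "ffill")
    if not forward and method not in ("backfill", "bfill"):
        raise ValueError(f"unknown fill method: {method}")

    # A backfill is a forward fill over the reversed sequence.
    seq = list(values) if forward else list(reversed(values))

    filled, last_valid, run = [], None, 0
    for value in seq:
        if value is not None:
            last_valid, run = value, 0
            filled.append(value)
        else:
            run += 1
            within_limit = limit is None or run <= limit
            filled.append(last_valid if within_limit else None)

    return filled if forward else filled[::-1]


padded = fill_gaps([2, None, None, None, 4])
limited = fill_gaps([2, None, None, None, 4], limit=2)
backfilled = fill_gaps([2, 4, 8, None, 16, None, 32, None], method="backfill")
```

In this sketch a gap at the start (for pad) or at the end (for backfill) has no valid neighbor to copy from, so it stays unfilled and the corresponding percent change is NaN.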

Examples

>>> percent_change = PercentChange()
>>> percent_change([2, 5, 15, 3, 3, 9, 4.5]).to_list()
[nan, 1.5, 2.0, -0.8, 0.0, 2.0, -0.5]

We can control the number of periods to return the percent difference between points further from one another.

>>> percent_change_2 = PercentChange(periods=2)
>>> percent_change_2([2, 5, 15, 3, 3, 9, 4.5]).to_list()
[nan, nan, 6.5, -0.4, -0.8, 2.0, 0.5]

We can control the method used to handle gaps in data.

>>> percent_change = PercentChange()
>>> percent_change([2, 4, 8, None, 16, None, 32, None]).to_list()
[nan, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
>>> percent_change_backfill = PercentChange(fill_method='backfill')
>>> percent_change_backfill([2, 4, 8, None, 16, None, 32, None]).to_list()
[nan, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, nan]

We can also control the maximum number of NaN values to fill in a gap.

>>> percent_change = PercentChange()
>>> percent_change([2, None, None, None, 4]).to_list()
[nan, 0.0, 0.0, 0.0, 1.0]
>>> percent_change_limited = PercentChange(limit=2)
>>> percent_change_limited([2, None, None, None, 4]).to_list()
[nan, 0.0, 0.0, nan, nan]

Finally, we can specify a date frequency on which to calculate percent change.

>>> import pandas as pd
>>> dates = pd.DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-05'])
>>> x_indexed = pd.Series([1, 2, 3, 4], index=dates)
>>> percent_change = PercentChange()
>>> percent_change(x_indexed).to_list()
[nan, 1.0, 0.5, 0.33333333333333326]
>>> date_offset = pd.tseries.offsets.DateOffset(days=1)
>>> percent_change_freq = PercentChange(freq=date_offset)
>>> percent_change_freq(x_indexed).to_list()
[nan, 1.0, 0.5, nan]
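
The last value above is NaN because 2018-01-04 is missing, so 2018-01-05 has no observation exactly one day earlier. The date-based lookup can be sketched with the standard library alone. This is an illustration under simplifying assumptions (no gap filling, a plain dict instead of a pandas index), and the helper name `pct_change_by_offset` is hypothetical.

```python
import math
from datetime import date, timedelta


def pct_change_by_offset(observations, offset):
    """Compare each value with the value recorded exactly `offset` earlier.

    Hypothetical helper for illustration only. `observations` maps a date
    to a number; dates with no earlier counterpart yield NaN.
    """
    result = {}
    for day, value in observations.items():
        earlier = observations.get(day - offset)
        result[day] = math.nan if earlier is None else value / earlier - 1
    return result


observations = {
    date(2018, 1, 1): 1,
    date(2018, 1, 2): 2,
    date(2018, 1, 3): 3,
    date(2018, 1, 5): 4,
}
by_day = pct_change_by_offset(observations, timedelta(days=1))
```

Without a frequency, the same data would compare 2018-01-05 with 2018-01-03 simply because they are adjacent rows.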

__init__(periods=1, fill_method='pad', limit=None, freq=None)

Methods

  • __init__([periods, fill_method, limit, freq])

  • flatten_nested_input_types(input_types) – Flattens nested column schema inputs into a single list.

  • generate_name(base_feature_names)

  • generate_names(base_feature_names)

  • get_args_string()

  • get_arguments()

  • get_description(input_column_descriptions[, ...])

  • get_filepath(filename)

  • get_function()

Attributes

  • base_of

  • base_of_exclude

  • commutative

  • default_value – Default value this feature returns if no data found.

  • description_template

  • input_types – woodwork.ColumnSchema types of inputs

  • max_stack_depth

  • name – Name of the primitive

  • number_output_features – Number of columns in feature matrix associated with this feature

  • return_type – ColumnSchema type of return

  • stack_on

  • stack_on_exclude

  • stack_on_self

  • uses_calc_time

  • uses_full_dataframe