class featuretools.primitives.RollingMean(window_length=3, gap=1, min_periods=0)
Calculates the mean of entries over a given window.
Given a list of numbers and a corresponding list of datetimes, returns the rolling mean of the numeric values. Each window ends gap rows before the current row and extends backward from there over the span set by window_length.
Input datetimes should be monotonic.
window_length (int, string, optional) – Specifies the amount of data included in each window. If an integer is provided, it corresponds to a number of rows. For data with a uniform sampling frequency, an integer window_length therefore also corresponds to a period of time; for example, with one row per day, a window_length of 7 spans 7 days. If a string is provided, it must be one of pandas’ offset alias strings (‘1D’, ‘1H’, etc.), and it indicates the length of time that each window should span. The list of available offset aliases can be found at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases. Defaults to 3.
gap (int, string, optional) – Specifies a gap backwards from each instance before the window of usable data begins. If an integer is provided, it will correspond to a number of rows. If a string is provided, it must be one of pandas’ offset alias strings (‘1D’, ‘1H’, etc), and it will indicate a length of time between a target instance and the beginning of its window. Defaults to 1.
min_periods (int, optional) – Minimum number of observations required for performing calculations over the window. Can only be as large as window_length when window_length is an integer. When window_length is an offset alias string, this limitation does not exist, but care should be taken not to choose a min_periods that will always be larger than the number of observations in a window. Defaults to 0.
Only offset aliases with fixed frequencies can be used when defining gap and window_length. Aliases such as ‘M’ or ‘W’ cannot be used, because they can span different numbers of days: ‘M’ because months differ in length, and ‘W’ because a week alias is anchored to a specific day of the week (for example W-Wed), so the number of days it covers depends on the anchoring date.
When using an offset alias to define gap, an offset alias must also be used to define window_length. The reverse is not required: a window_length defined with an offset alias can be combined with a numeric gap. In fact, if the data has a uniform sampling frequency, a numeric gap is preferable because it is more efficient.
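The fixed-frequency restriction can be illustrated with the standard library alone. The sketch below is not part of featuretools; the helper name days_until_next_weekday is hypothetical and only demonstrates why month and anchored-week units do not map to a constant number of days.

```python
import calendar
from datetime import date, timedelta

# Fixed-frequency units always span the same amount of time.
one_day = timedelta(days=1)
one_hour = timedelta(hours=1)

# Calendar months do not: their length depends on the month (and the year).
days_per_month_2019 = [calendar.monthrange(2019, m)[1] for m in range(1, 13)]
distinct_month_lengths = sorted(set(days_per_month_2019))
print(distinct_month_lengths)  # [28, 30, 31]


def days_until_next_weekday(d, target_weekday):
    """Hypothetical helper: days from d to the next target weekday strictly
    after d (Monday=0 ... Sunday=6)."""
    return ((target_weekday - d.weekday() - 1) % 7) + 1


# A week anchored on Wednesday covers a different number of days depending
# on the starting date.
WEDNESDAY = 2
print(days_until_next_weekday(date(2019, 1, 1), WEDNESDAY))  # Tuesday -> 1
print(days_until_next_weekday(date(2019, 1, 3), WEDNESDAY))  # Thursday -> 6
```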
>>> import pandas as pd
>>> from featuretools.primitives import RollingMean
>>> rolling_mean = RollingMean(window_length=3)
>>> times = pd.date_range(start='2019-01-01', freq='1min', periods=5)
>>> rolling_mean(times, [4, 3, 2, 1, 0]).tolist()
[nan, 4.0, 3.5, 3.0, 2.0]
We can also control the gap before the rolling calculation.
>>> import pandas as pd
>>> rolling_mean = RollingMean(window_length=3, gap=0)
>>> times = pd.date_range(start='2019-01-01', freq='1min', periods=5)
>>> rolling_mean(times, [4, 3, 2, 1, 0]).tolist()
[4.0, 3.5, 3.0, 2.0, 1.0]
We can also control the minimum number of periods required for the rolling calculation.
>>> import pandas as pd
>>> rolling_mean = RollingMean(window_length=3, min_periods=3, gap=0)
>>> times = pd.date_range(start='2019-01-01', freq='1min', periods=5)
>>> rolling_mean(times, [4, 3, 2, 1, 0]).tolist()
[nan, nan, 3.0, 2.0, 1.0]
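To make the interaction of window_length, gap, and min_periods concrete, the row-based case can be sketched in plain Python. This is an illustration of the semantics described above for integer parameters only, not the library's implementation; the function name rolling_mean_rows is hypothetical, and offset alias strings are not handled.

```python
import math


def rolling_mean_rows(values, window_length=3, gap=1, min_periods=0):
    """Hypothetical sketch: for each row i, average up to `window_length`
    values from the window that ends `gap` rows before row i."""
    result = []
    for i in range(len(values)):
        end = i - gap + 1            # exclusive end index of the window
        start = end - window_length  # inclusive start index of the window
        window = values[max(start, 0):max(end, 0)]
        # An empty window has no mean, so at least one value is always needed.
        if len(window) >= max(min_periods, 1):
            result.append(sum(window) / len(window))
        else:
            result.append(math.nan)
    return result


values = [4, 3, 2, 1, 0]
print(rolling_mean_rows(values, window_length=3))
# [nan, 4.0, 3.5, 3.0, 2.0]
print(rolling_mean_rows(values, window_length=3, gap=0))
# [4.0, 3.5, 3.0, 2.0, 1.0]
print(rolling_mean_rows(values, window_length=3, min_periods=3, gap=0))
# [nan, nan, 3.0, 2.0, 1.0]
```

With the default gap of 1, the first row has no earlier rows to draw from, which is why its result is NaN; setting gap=0 lets each window include the current row.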
__init__([window_length, gap, min_periods])
flatten_nested_input_types(input_types) – Flattens nested column schema inputs into a single list.
compatibility – Additional compatible libraries.
default_value – Default value this feature returns if no data found.
input_types – woodwork.ColumnSchema types of inputs.
name – Name of the primitive.
number_output_features – Number of columns in feature matrix associated with this feature.
return_type – ColumnSchema type of return.