Feature primitives#

Feature primitives are the building blocks of Featuretools. They define individual computations that can be applied to raw datasets to create new features. Because a primitive constrains only its input and output data types, primitives can be applied across datasets and can be stacked to create new calculations.

Why primitives?#

The space of potential functions that humans use to create a feature is expansive. By breaking common feature engineering calculations down into primitive components, we are able to capture the underlying structure of the features humans create today.

A primitive constrains only its input and output data types, which means primitives can transfer calculations known in one domain to another. Consider a feature that data scientists often calculate for transactional or event log data: the average time between events. This feature is valuable for predicting fraudulent behavior or future customer engagement.
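To make the calculation concrete, here is a minimal hand-written sketch of "average time between events" using only the Python standard library. The timestamps are hypothetical, and the two steps are only meant to mirror the idea of a transform followed by an aggregation, not how Featuretools computes it internally.

```python
from datetime import datetime
from statistics import mean

# Hypothetical event timestamps for a single customer, in chronological order.
events = [
    datetime(2014, 1, 1, 0, 0, 0),
    datetime(2014, 1, 1, 0, 10, 0),
    datetime(2014, 1, 1, 0, 25, 0),
    datetime(2014, 1, 1, 0, 55, 0),
]

# Step 1 (transform): seconds elapsed since the previous event.
# The first event has no predecessor, so it contributes no gap.
gaps = [
    (later - earlier).total_seconds()
    for earlier, later in zip(events, events[1:])
]

# Step 2 (aggregation): reduce the gaps to a single average value.
average_gap = mean(gaps)

print(gaps)         # [600.0, 900.0, 1800.0]
print(average_gap)  # 1100.0
```

Written this way, the feature is tied to one dataset and one column. Splitting it into two reusable steps is what lets the same calculation be applied to any datetime column.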

DFS achieves the same feature by stacking two primitives, "time_since_previous" and "mean":

[1]:

import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)

feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["mean"],
    trans_primitives=["time_since_previous"],
    features_only=True,
)

feature_defs

[1]:

[<Feature: zip_code>,
 <Feature: MEAN(transactions.amount)>,
 <Feature: TIME_SINCE_PREVIOUS(join_date)>,
 <Feature: MEAN(sessions.MEAN(transactions.amount))>,
 <Feature: MEAN(sessions.TIME_SINCE_PREVIOUS(session_start))>]

Note

The primitive arguments to DFS (e.g., agg_primitives and trans_primitives in the example above) accept snake_case, camelCase, or TitleCase strings of included Featuretools primitives (i.e., time_since_previous, timeSincePrevious, and TimeSincePrevious are all acceptable inputs).
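As a rough illustration of why the three spellings are interchangeable, the sketch below reduces each one to the same snake_case string. The helper to_snake_case is hypothetical and written only for this example; it is not Featuretools' own implementation.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: convert camelCase or TitleCase to snake_case."""
    # Insert an underscore wherever a lowercase letter or digit is
    # immediately followed by an uppercase letter, then lowercase everything.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


spellings = ["time_since_previous", "timeSincePrevious", "TimeSincePrevious"]
normalized = {to_snake_case(s) for s in spellings}

print(normalized)  # {'time_since_previous'}
```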

Note

When dfs is called with features_only=True, only feature definitions are returned as output. By default, this parameter is set to False. It is useful for quickly inspecting the feature definitions before spending time calculating the feature matrix.

A second advantage of primitives is that they can be used to quickly enumerate many interesting features in a parameterized way. This is used by Deep Feature Synthesis to get several different ways of summarizing the time since the previous event.

[2]:

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["mean", "max", "min", "std", "skew"],
    trans_primitives=["time_since_previous"],
)

feature_matrix[
    [
        "MEAN(sessions.TIME_SINCE_PREVIOUS(session_start))",
        "MAX(sessions.TIME_SINCE_PREVIOUS(session_start))",
        "MIN(sessions.TIME_SINCE_PREVIOUS(session_start))",
        "STD(sessions.TIME_SINCE_PREVIOUS(session_start))",
        "SKEW(sessions.TIME_SINCE_PREVIOUS(session_start))",
    ]
]

[2]:

	MEAN(sessions.TIME_SINCE_PREVIOUS(session_start))	MAX(sessions.TIME_SINCE_PREVIOUS(session_start))	MIN(sessions.TIME_SINCE_PREVIOUS(session_start))	STD(sessions.TIME_SINCE_PREVIOUS(session_start))	SKEW(sessions.TIME_SINCE_PREVIOUS(session_start))
customer_id
5	1007.500000	1170.0	715.0	157.884451	-1.507217
4	999.375000	1625.0	650.0	308.688904	1.065177
1	966.875000	1170.0	715.0	171.754341	-0.254557
3	888.333333	1170.0	650.0	177.613813	0.434581
2	725.833333	975.0	520.0	194.638554	0.162631
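The five column names in this output follow one pattern: each aggregation is stacked on the same transform feature of the sessions dataframe. The sketch below only reproduces that naming pattern with string formatting. It is an illustration of the enumeration idea, not how Featuretools builds feature definitions internally.

```python
# Aggregation and transform names taken from the dfs call above.
agg_primitives = ["mean", "max", "min", "std", "skew"]
transform = "time_since_previous"

# The transform is applied to a column of the child dataframe first ...
base = f"sessions.{transform.upper()}(session_start)"

# ... and each aggregation is then stacked on top of that one feature.
feature_names = [f"{agg.upper()}({base})" for agg in agg_primitives]

for name in feature_names:
    print(name)
```

Adding a sixth aggregation to the list would add a sixth feature with no other change, which is the sense in which primitives enumerate features in a parameterized way.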

Aggregation vs Transform Primitive#

In the example above, we use two types of primitives.

Aggregation primitives: These primitives take related instances as an input and output a single value. They are applied across a parent-child relationship in an EntitySet. Examples: "count", "sum", "avg_time_between".

[Figure: feature lineage graph for COUNT(sessions). The sessions dataframe (session_id, customer_id) is grouped by customer_id and passed to the COUNT aggregation, which produces COUNT(sessions) on the target customers dataframe.]

Transform primitives: These primitives take one or more columns from a dataframe as an input and output a new column for that dataframe. They are applied to a single dataframe. Examples: "hour", "time_since_previous", "absolute".

[Figure: feature lineage graph for TIME_SINCE_PREVIOUS(join_date). The join_date column of the target customers dataframe is passed to the TIME_SINCE_PREVIOUS transform, which produces TIME_SINCE_PREVIOUS(join_date) on the same dataframe.]

The above graphs were generated using the graph_feature function. These feature lineage graphs help to visually show how primitives were stacked to generate a feature.
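The difference in output shape between the two primitive types can also be seen in a small sketch that avoids Featuretools and pandas altogether. The rows and the balance_change column are hypothetical, and plain Python stands in for the dataframe operations.

```python
from collections import Counter

# Hypothetical child rows: each session belongs to one customer.
sessions = [
    {"session_id": 1, "customer_id": 1, "balance_change": -20.0},
    {"session_id": 2, "customer_id": 1, "balance_change": 35.5},
    {"session_id": 3, "customer_id": 2, "balance_change": -7.25},
]

# Transform ("absolute"): one output value per input row, same dataframe.
absolute = [abs(row["balance_change"]) for row in sessions]

# Aggregation ("count"): group child rows by the parent key and reduce
# each group to a single value for the corresponding parent row.
count_sessions = Counter(row["customer_id"] for row in sessions)

print(absolute)              # [20.0, 35.5, 7.25]
print(dict(count_sessions))  # {1: 2, 2: 1}
```

The transform output has one entry per session, while the aggregation output has one entry per customer.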

For a DataFrame that lists and describes each built-in primitive in Featuretools, call ft.list_primitives().

[3]:

ft.list_primitives().head(5)

[3]:

	name	type	description	valid_inputs	return_type
0	first	aggregation	Determines the first value in a list.	<ColumnSchema>	None
1	kurtosis	aggregation	Calculates the kurtosis for a list of numbers	<ColumnSchema (Logical Type = Double) (Semanti...	<ColumnSchema (Logical Type = Double) (Semanti...
2	num_zero_crossings	aggregation	Determines the number of times a list crosses 0.	<ColumnSchema (Semantic Tags = ['numeric'])>	<ColumnSchema (Logical Type = Integer) (Semant...
3	count_below_mean	aggregation	Determines the number of values that are below...	<ColumnSchema (Semantic Tags = ['numeric'])>	<ColumnSchema (Logical Type = IntegerNullable)...
4	count_greater_than	aggregation	Determines the number of values greater than a...	<ColumnSchema (Semantic Tags = ['numeric'])>	<ColumnSchema (Logical Type = Integer) (Semant...

For a DataFrame of metrics that summarizes various properties and capabilities of all of the built-in primitives in Featuretools, call ft.summarize_primitives().

[4]:

ft.summarize_primitives()

[4]:

	Metric	Count
0	total_primitives	203
1	aggregation_primitives	65
2	transform_primitives	138
3	unique_input_types	23
4	unique_output_types	22
5	uses_multi_input	50
6	uses_multi_output	2
7	uses_external_data	1
8	are_controllable	87
9	uses_address_input	0
10	uses_age_input	0
11	uses_age_fractional_input	0
12	uses_age_nullable_input	0
13	uses_boolean_input	18
14	uses_boolean_nullable_input	12
15	uses_categorical_input	0
16	uses_country_code_input	0
17	uses_currency_code_input	0
18	uses_datetime_input	68
19	uses_double_input	4
20	uses_email_address_input	2
21	uses_filepath_input	1
22	uses_ip_address_input	0
23	uses_integer_input	4
24	uses_integer_nullable_input	0
25	uses_lat_long_input	6
26	uses_natural_language_input	17
27	uses_ordinal_input	4
28	uses_person_full_name_input	3
29	uses_phone_number_input	0
30	uses_postal_code_input	2
31	uses_sub_region_code_input	0
32	uses_timedelta_input	0
33	uses_url_input	3
34	uses_unknown_input	0
35	uses_numeric_tag_input	87
36	uses_category_tag_input	11
37	uses_index_tag_input	1
38	uses_time_index_tag_input	29
39	uses_date_of_birth_tag_input	1
40	uses_ignore_tag_input	0
41	uses_passthrough_tag_input	0
42	uses_foreign_key_tag_input	1

Defining Custom Primitives#

The library of primitives in Featuretools is constantly expanding. Users can define their own primitives using the APIs below. To define a primitive, a user will:

Specify the type of primitive: Aggregation or Transform
Define the input and output data types
Write a function in Python to do the calculation
Annotate with attributes to constrain how it is applied

Once a primitive is defined, it can stack with existing primitives to generate complex patterns. This enables primitives known to be important for one domain to be transferred automatically to another.

[5]:

import pandas as pd
from woodwork.column_schema import ColumnSchema
from woodwork.logical_types import Datetime, NaturalLanguage

from featuretools.primitives import AggregationPrimitive, TransformPrimitive
from featuretools.tests.testing_utils import make_ecommerce_entityset

Simple Custom Primitives#

[6]:

class Absolute(TransformPrimitive):
    name = "absolute"
    input_types = [ColumnSchema(semantic_tags={"numeric"})]
    return_type = ColumnSchema(semantic_tags={"numeric"})

    def get_function(self):
        def absolute(column):
            return abs(column)

        return absolute

Above, we created a new transform primitive that can be used with Deep Feature Synthesis. We derived a new primitive class from TransformPrimitive and overrode get_function to return a function that calculates the feature. We also set the input data types that the primitive applies to and the return data type. Input and return data types are defined using a Woodwork ColumnSchema. A full guide to Woodwork logical types and semantic tags can be found in the Woodwork guide Understanding Logical Types and Semantic Tags.

Similarly, we can make a new aggregation primitive using AggregationPrimitive.

[7]:

class Maximum(AggregationPrimitive):
    name = "maximum"
    input_types = [ColumnSchema(semantic_tags={"numeric"})]
    return_type = ColumnSchema(semantic_tags={"numeric"})

    def get_function(self):
        def maximum(column):
            return max(column)

        return maximum

Because we defined an aggregation primitive, the function takes in a list of values and returns a single value.

Now that we’ve defined two primitives, we can use them with the dfs function as if they were built-in primitives.

[8]:

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="sessions",
    agg_primitives=[Maximum],
    trans_primitives=[Absolute],
    max_depth=2,
)

feature_matrix.head(5)[
    [
        "customers.MAXIMUM(transactions.amount)",
        "MAXIMUM(transactions.ABSOLUTE(amount))",
    ]
]

[8]:

	customers.MAXIMUM(transactions.amount)	MAXIMUM(transactions.ABSOLUTE(amount))
session_id
1	146.81	141.66
2	149.02	135.25
3	149.95	147.73
4	139.43	129.00
5	149.95	139.20

Word Count Example#

Here we define a transform primitive, WordCount, which counts the number of words in each row of an input and returns a list of the counts.

[9]:

class WordCount(TransformPrimitive):
    """
    Counts the number of words in each row of the column. Returns a list
    of the counts for each row.
    """

    name = "word_count"
    input_types = [ColumnSchema(logical_type=NaturalLanguage)]
    return_type = ColumnSchema(semantic_tags={"numeric"})

    def get_function(self):
        def word_count(column):
            word_counts = []
            for value in column:
                words = value.split(None)
                word_counts.append(len(words))
            return word_counts

        return word_count
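The function returned by get_function is an ordinary Python callable, so its logic can be checked on its own before running DFS. The sketch below repeats the same word-splitting logic as a standalone function and applies it to a few hypothetical strings. A plain list stands in for the column here.

```python
def word_count(column):
    """Standalone copy of the logic inside WordCount.get_function above."""
    word_counts = []
    for value in column:
        # split() with no separator splits on any run of whitespace,
        # so repeated spaces do not produce empty words.
        words = value.split()
        word_counts.append(len(words))
    return word_counts


# Hypothetical comments, including an empty string and repeated spaces.
comments = ["coke zero", "", "car  door   handle"]
counts = word_count(comments)

print(counts)  # [2, 0, 3]
```

The output has one count per input row, which is what a transform primitive is expected to return. Note that a missing value such as None has no split method, so this logic assumes every entry is a string.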

[10]:

es = make_ecommerce_entityset()

feature_matrix, features = ft.dfs(
    entityset=es,
    target_dataframe_name="sessions",
    agg_primitives=["sum", "mean", "std"],
    trans_primitives=[WordCount],
)

feature_matrix[
    [
        "customers.WORD_COUNT(favorite_quote)",
        "STD(log.WORD_COUNT(comments))",
        "SUM(log.WORD_COUNT(comments))",
        "MEAN(log.WORD_COUNT(comments))",
    ]
]

[10]:

	customers.WORD_COUNT(favorite_quote)	STD(log.WORD_COUNT(comments))	SUM(log.WORD_COUNT(comments))	MEAN(log.WORD_COUNT(comments))
id
0	9.0	540.436860	2500.0	500.0
1	9.0	583.702550	1732.0	433.0
2	9.0	NaN	246.0	246.0
3	6.0	883.883476	1256.0	628.0
4	6.0	0.000000	9.0	3.0
5	12.0	19.798990	68.0	34.0

By adding some aggregation primitives as well, Deep Feature Synthesis was able to make four new features from one new primitive.

Multiple Input Types#

If a primitive requires multiple features as input, input_types has multiple elements. For example, [ColumnSchema(semantic_tags={'numeric'}), ColumnSchema(semantic_tags={'numeric'})] means the primitive requires two columns with the semantic tag numeric as input. Below is an example of a primitive that has multiple input features.

[11]:

class MeanSunday(AggregationPrimitive):
    """
    Finds the mean of non-null values of a feature that occurred on Sundays
    """

    name = "mean_sunday"
    input_types = [
        ColumnSchema(semantic_tags={"numeric"}),
        ColumnSchema(logical_type=Datetime),
    ]
    return_type = ColumnSchema(semantic_tags={"numeric"})

    def get_function(self):
        def mean_sunday(numeric, datetime):
            days = pd.DatetimeIndex(datetime).weekday.values
            df = pd.DataFrame({"numeric": numeric, "time": days})
            return df[df["time"] == 6]["numeric"].mean()

        return mean_sunday

[12]:

feature_matrix, features = ft.dfs(
    entityset=es,
    target_dataframe_name="sessions",
    agg_primitives=[MeanSunday],
    trans_primitives=[],
    max_depth=1,
)

feature_matrix[
    [
        "MEAN_SUNDAY(log.value, datetime)",
        "MEAN_SUNDAY(log.value_2, datetime)",
    ]
]

[12]:

	MEAN_SUNDAY(log.value, datetime)	MEAN_SUNDAY(log.value_2, datetime)
id
0	NaN	NaN
1	NaN	NaN
2	NaN	NaN
3	2.5	1.0
4	7.0	3.0
5	NaN	NaN
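The comparison with 6 in mean_sunday relies on the weekday convention in which Monday is 0 and Sunday is 6. Python's datetime.weekday() uses the same convention, so the selection logic can be sketched with the standard library alone. The values and dates below are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (value, timestamp) pairs for one session.
rows = [
    (2.0, datetime(2024, 1, 6, 9, 0)),   # Saturday
    (3.0, datetime(2024, 1, 7, 9, 0)),   # Sunday
    (5.0, datetime(2024, 1, 14, 9, 0)),  # Sunday
]

SUNDAY = 6  # weekday() counts from Monday = 0, so Sunday = 6

# Keep only the values whose timestamp falls on a Sunday.
sunday_values = [value for value, ts in rows if ts.weekday() == SUNDAY]

# With no Sunday rows there is nothing to average, so fall back to NaN.
result = mean(sunday_values) if sunday_values else float("nan")

print(result)  # 4.0
```

The NaN fallback corresponds to the NaN entries in the output above, where a session has no log rows that fall on a Sunday.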
