{ "cells": [ { "cell_type": "markdown", "id": "92a0dab5", "metadata": {}, "source": [ "# Deployment\n", "\n", "Deployment of machine learning models requires repeating feature engineering steps on new data. In some cases, these steps need to be performed in near real-time. Featuretools has capabilities to ease the deployment of feature engineering.\n", "\n", "## Saving Features\n", "\n", "First, let's generate some training and test data in the same format. We use a different random seed to generate different data for the test set." ] }, { "cell_type": "raw", "id": "129c8011", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. note ::\n", "\n", " Features saved in one version of Featuretools are not guaranteed to load in another. This means the features might need to be re-created after upgrading Featuretools." ] }, { "cell_type": "code", "execution_count": null, "id": "01c19e97", "metadata": {}, "outputs": [], "source": [ "import featuretools as ft\n", "\n", "es_train = ft.demo.load_mock_customer(return_entityset=True)\n", "es_test = ft.demo.load_mock_customer(return_entityset=True, random_seed=33)" ] }, { "cell_type": "markdown", "id": "042f8c02", "metadata": {}, "source": [ "Now let's build some feature definitions using DFS. Because we have categorical features, we also encode them with one-hot encoding based on the values in the training data."
] }, { "cell_type": "code", "execution_count": null, "id": "6bcc87a0", "metadata": {}, "outputs": [], "source": [ "feature_matrix, feature_defs = ft.dfs(entityset=es_train,\n", " target_dataframe_name=\"customers\")\n", "\n", "feature_matrix_enc, features_enc = ft.encode_features(feature_matrix, feature_defs)\n", "feature_matrix_enc" ] }, { "cell_type": "markdown", "id": "03ffe00a", "metadata": {}, "source": [ "Now, we can use [featuretools.save_features](../generated/featuretools.save_features.rst#featuretools.save_features) to save a list of features to a JSON file." ] }, { "cell_type": "code", "execution_count": null, "id": "79d4ff65", "metadata": {}, "outputs": [], "source": [ "ft.save_features(features_enc, \"feature_definitions.json\")" ] }, { "cell_type": "markdown", "id": "67723f25", "metadata": {}, "source": [ "## Calculating Feature Matrix for New Data\n", "\n", "We can use [featuretools.load_features](../generated/featuretools.load_features.rst#featuretools.load_features) to read in a list of saved features to calculate for our new entity set." ] }, { "cell_type": "code", "execution_count": null, "id": "a8f728c0", "metadata": {}, "outputs": [], "source": [ "saved_features = ft.load_features('feature_definitions.json')" ] }, { "cell_type": "markdown", "id": "1624ea4d", "metadata": {}, "source": [ "After we load the features back in, we can calculate the feature matrix." ] }, { "cell_type": "code", "execution_count": null, "id": "f37f61e0", "metadata": {}, "outputs": [], "source": [ "feature_matrix = ft.calculate_feature_matrix(saved_features, es_test)\n", "feature_matrix" ] }, { "cell_type": "markdown", "id": "c9f39b54", "metadata": {}, "source": [ "As you can see above, we have the exact same features as before, but calculated using the test data."
] }, { "cell_type": "markdown", "id": "42a47ad9", "metadata": {}, "source": [ "## Exporting Feature Matrix\n", "\n", "### Save as csv\n", "\n", "The feature matrix is a pandas DataFrame that we can save to disk as a CSV file." ] }, { "cell_type": "code", "execution_count": null, "id": "570c69fa", "metadata": {}, "outputs": [], "source": [ "feature_matrix.to_csv(\"feature_matrix.csv\")" ] }, { "cell_type": "markdown", "id": "f0fc5342", "metadata": {}, "source": [ "We can also read it back in as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "297db0a6", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "saved_fm = pd.read_csv(\"feature_matrix.csv\", index_col=\"customer_id\")\n", "saved_fm" ] }, { "cell_type": "code", "execution_count": null, "id": "1b84dc51", "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "import os\n", "os.remove(\"feature_definitions.json\")\n", "os.remove(\"feature_matrix.csv\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 5 }