SQL Database Integration#

featuretools_sql is an add-on library that supports automatic EntitySet creation from a relational database.

Currently, featuretools_sql is compatible with the following systems:

  • MySQL

  • PostgreSQL

  • Snowflake

The DBConnector object exposed by the featuretools_sql library provides the interface to connecting to the DBMS.

Installing featuretools_sql#

Install with pip

python -m pip install "featuretools[sql]"

Connecting to your database instance#

Depending on your choice of DBMS, you may have to provide different pieces of information to the DBConnector object.

If you want to connect to a MySQL instance, you must pass the string "mysql" into the system_name argument.

If you want to connect to a PostgreSQL instance, you must pass the string "postgresql" into the system_name argument.

If you want to connect to a Snowflake instance, you must pass the string "snowflake" into the system_name argument.

Here is an example call to the constructor of the object, connecting to a PostgreSQL database:

from featuretools_sql.connector import DBConnector

connector_object = DBConnector(
    system_name="postgresql",
    user="postgres",
    host="localhost",
    port="5432",
    database="postgres",
    schema="public",
)

Note that the choice of RDBMS does affect the required arguments – for example, if you were connecting to a MySQL instance, you would not need a schema argument.

Converting to an EntitySet#

You can call the get_entityset method to instruct the DBConnector object to build an EntitySet.

This method will loop through all the tables in the database and copy them into dataframes. Then it will populate the relationships data structure. It will finally pass those two arguments into the EntitySet constructor in Featuretools, and return the object.

es = connector_object.get_entityset()

Optionally, you can pass in table names to the select_only parameter if you only want to include a subset of the tables in the database.

es = connector_object.get_entityset(select_only=["Products", "Transactions"])

Examining the EntitySet’s member data#

You can examine the member data of the DBConnector object to ensure that it imported data correctly.

To access the dataframes it imported, access the .dataframes attribute. To access the relationships data structure, access the .relationships attribute.

If you would like to visualize the EntitySet as a graph, you can call es.plot().

Calling DFS#

The EntitySet object is ready to be passed into Featuretools’s DFS algorithm! Read more about DFS here.