SQL Database Integration#
featuretools_sql
is an add-on library that supports automatic EntitySet
creation from a relational database.
Currently, featuretools_sql
is compatible with the following systems:
MySQL
PostgreSQL
Snowflake
The DBConnector
object exposed by the featuretools_sql
library provides the interface to connecting to the DBMS.
Installing featuretools_sql#
Install with pip
python -m pip install "featuretools[sql]"
Connecting to your database instance#
Depending on your choice of DBMS, you may have to provide different pieces of information to the DBConnector
object.
If you want to connect to a MySQL
instance, you must pass the string "mysql"
into the system_name
argument.
If you want to connect to a PostgreSQL
instance, you must pass the string "postgresql"
into the system_name
argument.
If you want to connect to a Snowflake
instance, you must pass the string "snowflake"
into the system_name
argument.
Here is an example call to the constructor of the object, connecting to a PostgreSQL
database:
from featuretools_sql.connector import DBConnector
connector_object = DBConnector(
system_name="postgresql",
user="postgres",
host="localhost",
port="5432",
database="postgres",
schema="public",
)
Note that the choice of RDBMS does affect the required arguments – for example, if you were connecting to a MySQL
instance, you would not need a schema
argument.
Converting to an EntitySet#
You can call the get_entityset
method to instruct the DBConnector
object to build an EntitySet.
This method will loop through all the tables in the database and copy them into dataframes. Then it will populate the relationships data structure. It will finally pass those two arguments into the EntitySet constructor in Featuretools, and return the object.
es = connector_object.get_entityset()
Optionally, you can pass in table names to the select_only
parameter if you only want to include a subset of the tables in the database.
es = connector_object.get_entityset(select_only=["Products", "Transactions"])
Examining the EntitySet’s member data#
You can examine the member data of the DBConnector
object to ensure that it imported data correctly.
To access the dataframes it imported, access the .dataframes
attribute. To access the relationships data structure, access the .relationships
attribute.
If you would like to visualize the EntitySet as a graph, you can call es.plot()
.
Calling DFS#
The EntitySet object is ready to be passed into Featuretools’s DFS
algorithm! Read more about DFS
here.