
Pandas to_sql, the step by step guide to writing data
The Pandas method to_sql is used to send data from a DataFrame to a SQL database. Common use cases for this method are sending data back from batch machine learning predictions, tracking metrics, and storing prepared datasets.
The Pandas method to_sql
is used to send data from a DataFrame to a SQL database. Common use cases for this method are sending data back from batch machine learning predictions, tracking metrics, and storing prepared datasets.
Pandas to_sql
is simple to use, and this is the process:
- Install
pandas
andsqlalchemy
to your environment using Conda or Pip - Create a connection to your database using
sqlalchemy
- Create a DataFrame
- Define the table name what to do if the table already exists
- Open the connection and use the
to_sql
method on your DataFrame to send the data to the defined table
Pandas to_sql, let’s get started
The purpose of this post is to explain how to use the method and an example of it in action. I will split it up into the following sections:
- Installing requirements
- Connecting to your database, using Snowflake as an example
- Looking at the parameters for
to_sql
- Writing to a database using
to_sql
Installing packages required for to_sql
Let’s start by installing the required packages, which are pandas
and sqlalchemy
. It’s recommended to create a separate Python environment for each project, we will do this by using Conda:
conda create -n env_name python=3.9 pandas sqlalchemy
conda activate env_name
Connecting to a database using SQLAlchemy
In order to use to_sql
you need to provide a connection object for the database that you will be sending your data to. This connection object can be provided either by sqlite3
or sqlalchemy
, in this example we will be using sqlalchemy
as this is the recommended approach by Pandas.
sqlalchemy
provides support for multiple database types, but we are going to be using Snowflake as it is a common choice for many.
Create SQLAlchemy connection engine
Making the connection is a fairly simple process, but still requires us to enter in some information. The fields we need to enter are:
account_indentifier
: The identifier for your instance, which is normally found in the URL used for loading the Snowflake UI:“account_identifier".snowflakecomputing.com
user
: Your usernamepassword
: Your password
Once we have this information we can create a connection engine, which is what we will use for to_sql
:
from sqlalchemy import create_engine
#Create connection to Snowflake using your account and user
account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
conn_string = f"snowflake://{user}:{password}@{account_identifier}/"
engine = create_engine(conn_string)
to_sql parameters
to_sql
has several parameters which adjust the functionality and data that’s sent to the database. These are:
name
- SQL table name (required)con
- Connection provided by eithersqlite
orsqlalchemy
(required)schema
- SQL database schemaif_exists
- What to do if the table already exists, either ‘fail’, ‘replace’, or ‘append’index
- Whether to write the DataFrame index column to the table or notindex_label
- The name to give the index column if written to the tablechunksize
- Number of rows to be written at a timedtype
- Dictionary of all columns and their data typesmethod
- Specify the SQL insertion clause
The optional parameter that is of most interest to users is if_exists
as you could want to create a new table with your DataFrame or append an existing one.
Use to_sql to write data to Snowflake database
Now that we have installed our requirements, created the connection, and understood the parameters involved in the method, we can start writing data to our database.
In this example we will write a very simple DataFrame to a table in Snowflake:
import pandas as pd
from sqlalchemy import create_engine
#Create connection to Snowflake using your account and user
account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
conn_string = f"snowflake://{user}:{password}@{account_identifier}/"
engine = create_engine(conn_string)
#Create your DataFrame
table_name = 'jobs'
df = pd.DataFrame(data=[['Stephen','Data scientist'],['Jane','Data analyst']],columns=['Name','Job'])
#What to do if the table exists? replace, append, or fail?
if_exists = 'replace'
#Write the data to Snowflake
with engine.connect() as con:
df.to_sql(
name=table_name.lower(),
con=con,
if_exists=if_exists
)