Pandas to_sql, the step-by-step guide to writing data

The Pandas method to_sql is used to send data from a DataFrame to a SQL database. Common use cases for this method are sending data back from batch machine learning predictions, tracking metrics, and storing prepared datasets.

Pandas to_sql is simple to use, and this is the process:

  1. Install pandas and sqlalchemy to your environment using Conda or Pip
  2. Create a connection to your database using sqlalchemy
  3. Create a DataFrame
  4. Define the table name and what to do if the table already exists
  5. Open the connection and use the to_sql method on your DataFrame to send the data to the defined table

Pandas to_sql, let’s get started

The purpose of this post is to explain how to use the method and show an example of it in action. I will split it up into the following sections:

  1. Installing requirements
  2. Connecting to your database, using Snowflake as an example
  3. Looking at the parameters for to_sql
  4. Writing to a database using to_sql

Installing packages required for to_sql

Let’s start by installing the required packages, which are pandas and sqlalchemy. It’s recommended to create a separate Python environment for each project; we will do this by using Conda:

conda create -n env_name python=3.9 pandas sqlalchemy
conda activate env_name

Since we will be connecting to Snowflake, we also need the snowflake-sqlalchemy package, which registers the Snowflake dialect with SQLAlchemy:

pip install snowflake-sqlalchemy
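To confirm the environment is set up correctly, a quick sanity check is to import both packages and print their versions:

```python
# Quick sanity check that both packages import correctly
import pandas as pd
import sqlalchemy

print(pd.__version__)
print(sqlalchemy.__version__)
```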

Connecting to a database using SQLAlchemy

In order to use to_sql you need to provide a connection object for the database that you will be sending your data to. This connection object can be provided by either sqlite3 or sqlalchemy; in this example we will be using sqlalchemy, as this is the approach recommended by Pandas.

sqlalchemy provides support for multiple database types, but we are going to be using Snowflake as it is a common choice for many.

Create SQLAlchemy connection engine

Making the connection is a fairly simple process, but it still requires us to enter some information. The fields we need to enter are:

  • account_identifier : The identifier for your instance, which is normally found in the URL used for loading the Snowflake UI: <account_identifier>.snowflakecomputing.com
  • user : Your username
  • password : Your password

Once we have this information we can create a connection engine, which is what we will use with to_sql:

from sqlalchemy import create_engine

#Create connection to Snowflake using your account and user

account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'

conn_string = f"snowflake://{user}:{password}@{account_identifier}/"
engine = create_engine(conn_string)
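The same create_engine pattern works for any dialect SQLAlchemy supports. As a credential-free sanity check of the pattern, an in-memory SQLite engine (a stand-in for the Snowflake engine, not part of the original setup) behaves the same way:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite engine: same create_engine pattern, no credentials needed
engine = create_engine("sqlite://")

# Open a connection and run a trivial query to confirm the engine works
with engine.connect() as con:
    result = con.execute(text("SELECT 1")).scalar()

print(result)  # 1
```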

to_sql parameters

to_sql has several parameters which adjust the functionality and data that’s sent to the database. These are:

  • name - SQL table name (required)
  • con - Connection provided by either sqlite3 or sqlalchemy (required)
  • schema - SQL database schema
  • if_exists - What to do if the table already exists, either ‘fail’, ‘replace’, or ‘append’
  • index - Whether to write the DataFrame index column to the table or not
  • index_label - The name to give the index column if written to the table
  • chunksize - Number of rows to be written at a time
  • dtype - Dictionary of all columns and their data types
  • method - Specify the SQL insertion clause

The optional parameter of most interest to users is if_exists, as you may want to create a new table from your DataFrame or append to an existing one.
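To illustrate, here is a small sketch of how the three if_exists modes behave. It uses an in-memory SQLite database so it runs without Snowflake credentials; the table name "jobs" is just for illustration:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory database for illustration
df = pd.DataFrame({"Name": ["Stephen"], "Job": ["Data scientist"]})

# 'fail' (the default) raises an error if the table already exists;
# here the table doesn't exist yet, so this first write creates it
df.to_sql("jobs", con=engine, if_exists="fail", index=False)

# 'append' adds the DataFrame's rows to the existing table
df.to_sql("jobs", con=engine, if_exists="append", index=False)
print(pd.read_sql("SELECT COUNT(*) AS n FROM jobs", engine)["n"][0])  # 2

# 'replace' drops the table and recreates it from the DataFrame
df.to_sql("jobs", con=engine, if_exists="replace", index=False)
print(pd.read_sql("SELECT COUNT(*) AS n FROM jobs", engine)["n"][0])  # 1
```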

Use to_sql to write data to Snowflake database

Now that we have installed our requirements, created the connection, and understood the parameters involved in the method, we can start writing data to our database.

In this example we will write a very simple DataFrame to a table in Snowflake:

import pandas as pd
from sqlalchemy import create_engine

#Create connection to Snowflake using your account and user

account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'

conn_string = f"snowflake://{user}:{password}@{account_identifier}/"
engine = create_engine(conn_string)

#Create your DataFrame

table_name = 'jobs'
df = pd.DataFrame(
    data=[['Stephen', 'Data scientist'], ['Jane', 'Data analyst']],
    columns=['Name', 'Job']
)

#What to do if the table exists? replace, append, or fail?

if_exists = 'replace'

#Write the data to Snowflake

with engine.connect() as con:
    df.to_sql(
        name=table_name.lower(),
        con=con,
        if_exists=if_exists,
        index=False  # don't write the DataFrame index as an extra column
    )
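After the write, one way to verify the result is to read the table back with pandas' read_sql. The snippet below uses an in-memory SQLite engine as a stand-in so it can run standalone; with the Snowflake engine created above, the same calls apply:

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in engine so this snippet runs standalone; in practice, substitute
# the Snowflake engine created earlier
engine = create_engine("sqlite://")

df = pd.DataFrame(
    data=[['Stephen', 'Data scientist'], ['Jane', 'Data analyst']],
    columns=['Name', 'Job']
)

with engine.connect() as con:
    # Write the data, then read it back on the same connection to verify
    df.to_sql(name='jobs', con=con, if_exists='replace', index=False)
    result = pd.read_sql("SELECT * FROM jobs", con)

print(result)
```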

References

Pandas to_sql documentation
SQLAlchemy documentation

Stephen Allwright


I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.
Oslo, Norway