Write Pandas DataFrame to Snowflake table, explained (2022)

Being able to easily write a Pandas DataFrame to a Snowflake table will make your Python workflow considerably easier, whether that be production jobs like loading scheduled predictions or ad-hoc tasks such as uploading a set of prepared features. In this post I will show you the industry best practice for doing exactly this.

Write Pandas DataFrame to Snowflake table

The current best practice for how to write data from a Pandas DataFrame to a Snowflake table is:

  1. Install the required packages pandas, snowflake-connector-python, sqlalchemy, snowflake-sqlalchemy
  2. Connect to Snowflake using sqlalchemy
  3. Define what to do if the table already exists
  4. Write DataFrame to Snowflake using to_sql from Pandas and pd_writer from the Python connector

to_sql, write_pandas, and pd_writer, what’s the deal?

When discussing writing data to Snowflake from Pandas there are three methods or functions that get mentioned, which leads to a bit of confusion around what to use, when, and how. Here I will outline what these three functions do and how we will use them in this example:

to_sql

to_sql is a method in the Pandas package which can load a DataFrame to a SQL database, of which Snowflake is one. It requires a connection to your database, which is provided through the SQLAlchemy package.

to_sql is what most people use to send data to Snowflake, as it can create the table if it doesn't exist yet and offers options for what to do if it does. For this reason, we will be using it in our example.
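To see the mechanics in isolation, here is a minimal sketch of a to_sql call. An in-memory SQLite engine stands in for Snowflake (so the example runs anywhere); the to_sql call itself is identical for any SQLAlchemy-backed database.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for Snowflake; the to_sql call is the same.
engine = create_engine("sqlite://")
df = pd.DataFrame({"Name": ["Stephen", "Jane"], "City": ["Oslo", "Stockholm"]})

with engine.connect() as con:
    # Creates the table and inserts the rows in one call
    df.to_sql("cities", con=con, index=False)
    out = pd.read_sql("select * from cities", con)
```

The same call against a Snowflake engine will create and populate the table in your configured database and schema.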

write_pandas

write_pandas is a method in the Snowflake Connector for Python package which allows the user to append data from a DataFrame to an existing table in Snowflake.

The advantage of using write_pandas is that everything is contained within the Snowflake Connector; the tradeoff is that you can only append to tables that already exist, which is quite restrictive.

For this reason, we will not be using this method and have chosen to use to_sql instead.

pd_writer

pd_writer is a function in the Snowflake Connector package which can be used in the to_sql method to speed up the insertion into your table. We will be using this in our example as it provides a significant performance improvement, especially for large datasets.
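pd_writer is plugged in through to_sql's method parameter, which accepts any callable with the signature (table, conn, keys, data_iter). pd_writer itself needs a live Snowflake connection, so the sketch below uses a hypothetical custom writer and a stand-in SQLite engine purely to show the mechanism that pd_writer implements.

```python
import pandas as pd
from sqlalchemy import create_engine

calls = []

# Hypothetical writer with the same signature to_sql expects of pd_writer;
# pd_writer implements this interface with a fast Snowflake bulk load.
def logging_writer(table, conn, keys, data_iter):
    rows = list(data_iter)
    calls.append((table.name, list(keys), len(rows)))
    conn.execute(table.table.insert(), [dict(zip(keys, row)) for row in rows])

engine = create_engine("sqlite://")
df = pd.DataFrame({"Name": ["Stephen", "Jane"], "City": ["Oslo", "Stockholm"]})

with engine.connect() as con:
    df.to_sql("cities", con=con, index=False, method=logging_writer)
    out = pd.read_sql("select * from cities", con)
```

Swapping logging_writer for pd_writer (and the SQLite engine for a Snowflake one) gives you the fast path described above.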

Install SQLAlchemy, Snowflake Connector for Python and Pandas to your machine

In order to execute the code described in this post you need to first install some required packages to your environment or machine. These are:

  • pandas
  • snowflake-connector-python
  • sqlalchemy
  • snowflake-sqlalchemy (registers the snowflake:// dialect with SQLAlchemy)

The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:

conda create -n env_name python=3.9 pandas sqlalchemy
conda activate env_name
pip install "snowflake-connector-python[pandas]" snowflake-sqlalchemy

Connect to Snowflake using SQLAlchemy

Now onto the Python code. First of all, you need to connect to Snowflake using your credentials. We do this by using the SQLAlchemy package to create an engine, which allows us to interact with Snowflake.

from sqlalchemy import create_engine

#Create connection to Snowflake using your account and user

account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
database_name = '<database_name>'
schema_name = '<schema_name>'

conn_string = f"snowflake://{user}:{password}@{account_identifier}/{database_name}/{schema_name}"
engine = create_engine(conn_string)
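One practical wrinkle: if your password contains URL-reserved characters such as @ or /, the raw f-string above produces an invalid connection URL. A simple guard (the credentials here are made up) is to URL-encode the password first:

```python
from urllib.parse import quote_plus

# Hypothetical credentials; quote_plus escapes characters like '@' and '/'
# that would otherwise break the connection URL.
user = 'my_user'
password = 'p@ss/word!'
account_identifier = 'abc12345'
database_name = 'my_db'
schema_name = 'my_schema'

conn_string = f"snowflake://{user}:{quote_plus(password)}@{account_identifier}/{database_name}/{schema_name}"
```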

Entering Snowflake credentials in Python

As you can see, your Snowflake credentials are required to create this connection. You can of course decide what the best insertion method is for your project, whether that be environment variables or something else, but given that this is sensitive information you should remember the golden rule:

Do not enter your credentials in plain text, especially when working within a Git repository
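One common pattern is reading credentials from environment variables. The variable names below are an assumption, and the setdefault calls exist only so this sketch runs standalone; in practice the values come from your shell or a secrets manager, never from code.

```python
import os

# Illustrative variable names; setdefault is here only so the sketch runs
# standalone. In practice, set these in your shell or a secrets manager.
os.environ.setdefault("SNOWFLAKE_USER", "my_user")
os.environ.setdefault("SNOWFLAKE_PASSWORD", "my_password")
os.environ.setdefault("SNOWFLAKE_ACCOUNT", "abc12345")

user = os.environ["SNOWFLAKE_USER"]
password = os.environ["SNOWFLAKE_PASSWORD"]
account_identifier = os.environ["SNOWFLAKE_ACCOUNT"]

conn_string = f"snowflake://{user}:{password}@{account_identifier}/my_db/my_schema"
```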

Gaining write access to Snowflake tables

OK, one last note on connecting to your instance, I promise. Your Snowflake user will have a certain level of access which grants you rights to certain databases, schemas, and tables. So make sure that you have write access to the database and schema you are interacting with.

Define what to do if the table already exists in Snowflake

In this example we are using the to_sql method from Pandas to write our data to Snowflake, which is the current best practice when writing data. The reason it is so widely used instead of write_pandas is that we don't need to worry about whether the table already exists.

All we need to do is define what should happen if the table does in fact already exist. The options are:

  • append: Add the data at the end of the existing table
  • replace: Drop the table and create a new one in its place
  • fail: Return an error
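The three options can be seen side by side in a small runnable sketch. An in-memory SQLite engine stands in for Snowflake; if_exists behaves the same way for any backend.

```python
import pandas as pd
from sqlalchemy import create_engine

# SQLite stands in for Snowflake; if_exists behaves the same for any backend.
engine = create_engine("sqlite://")
df = pd.DataFrame({"Name": ["Stephen"], "City": ["Oslo"]})

with engine.connect() as con:
    df.to_sql("cities", con=con, index=False)                       # create
    df.to_sql("cities", con=con, if_exists="append", index=False)   # add rows
    n_append = len(pd.read_sql("select * from cities", con))        # now 2 rows
    df.to_sql("cities", con=con, if_exists="replace", index=False)  # drop + recreate
    n_replace = len(pd.read_sql("select * from cities", con))       # back to 1 row
    try:
        df.to_sql("cities", con=con, if_exists="fail", index=False)
        raised = False
    except ValueError:
        raised = True  # table exists, so 'fail' raises
```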

Use to_sql to write Pandas DataFrame to Snowflake

Next up is defining the table name, which will be searched for or created in the schema and database that we stated earlier.

After the table has been defined we will use the to_sql function to write the data, which handles all the behind-the-scenes SQL. Let's put this into action:

from snowflake.connector.pandas_tools import pd_writer
import pandas as pd

#Create your DataFrame

table_name = 'cities'
df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])

#What to do if the table exists? replace, append, or fail?

if_exists = 'replace'

#Write the data to Snowflake, using pd_writer to speed up loading
#index=False stops the DataFrame index being written as an extra column

with engine.connect() as con:
    df.to_sql(name=table_name.lower(), con=con, if_exists=if_exists, index=False, method=pd_writer)

You may have noticed that we pass table_name.lower() as the table name. This is because the Snowflake SQLAlchemy dialect treats lower-case names as case-insensitive, whereas upper- or mixed-case names get quoted and must then match the stored identifier exactly, which commonly results in errors.

Writing to Snowflake table from DataFrame, bringing it all together

Now that we have explored all aspects of the task at hand, it's time to bring it all together into one code snippet:

from snowflake.connector.pandas_tools import pd_writer
import pandas as pd
from sqlalchemy import create_engine

#Create connection to Snowflake using your account and user

account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
database_name = '<database_name>'
schema_name = '<schema_name>'

conn_string = f"snowflake://{user}:{password}@{account_identifier}/{database_name}/{schema_name}"
engine = create_engine(conn_string)

#Create your DataFrame

table_name = 'cities'
df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])

#What to do if the table exists? replace, append, or fail?

if_exists = 'replace'

#Write the data to Snowflake, using pd_writer to speed up loading
#index=False stops the DataFrame index being written as an extra column

with engine.connect() as con:
    df.to_sql(name=table_name.lower(), con=con, if_exists=if_exists, index=False, method=pd_writer)



Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.
Oslo, Norway