Use SQLAlchemy to create Snowflake table from Pandas DataFrame

Creating, replacing, or appending a table in Snowflake directly from a Pandas DataFrame in Python reduces the friction of infrastructure and gets the data into the hands of end users faster. In this post I will show how to create a table in Snowflake from Pandas using only SQLAlchemy.

Stephen Allwright

27 May 2022

Create Snowflake table from Pandas using SQLAlchemy

The current best practice for creating a Snowflake table using SQLAlchemy is:

Install the required packages: pandas sqlalchemy
Connect to Snowflake using sqlalchemy
Create Snowflake table using to_sql method from Pandas

Install SQLAlchemy and Pandas to your machine

In order to execute the code described in this post you need to first install some required packages to your environment or machine, these are:

pandas
sqlalchemy

The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:

conda create -n env_name python=3.9 pandas sqlalchemy
conda activate env_name

Connect to Snowflake using SQLAlchemy

Now onto the Python code. First of all you need to connect to Snowflake using your credentials. We do this by using the SQLAlchemy package to create the engine which allows us to interact with Snowflake.

from sqlalchemy import create_engine

#Create connection to Snowflake using your account and user

account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
database_name = '<database_name>'
schema_name = '<schema_name>'

conn_string = f"snowflake://{user}:{password}@{account_identifier}/{database_name}/{schema_name}"
engine = create_engine(conn_string)

Create table if it already exists using SQLAlchemy

If the table you want to create already exists then we need to know what to do with it. to_sql has a parameter if_exists for this eventuality, and there are three options that can be parsed in:

append. Add the data at the end of the existing table
replace. Drop the table and create a new one in it’s place
fail. Return an error

Use to_sql to create Snowflake table from Pandas

Now that we have a connection to Snowflake and decided what to do if the table already exists, we can define the table name. It’s important to note that this table name needs to be lower case, otherwise an error will be returned.

Once this is decided upon then we can let the to_sql method take our DataFrame, do some SQL magic in the background, and create our new table:

import pandas as pd

#Create your DataFrame

table_name = 'cities'
df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])

#What to do if the table exists? replace, append, or fail?

if_exists = 'replace'

#Write the data to Snowflake

with engine.connect() as con:
        df.to_sql(name=table_name.lower(), con=con, if_exists=if_exists)

Snowflake table from Pandas DataFrame using SQLAlchemy, tying it all together

Now that we have explored all aspects of the task as hand, it’s time to bring it all together into one code snippet:

import pandas as pd
from sqlalchemy import create_engine

#Create connection to Snowflake using your account and user

account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
database_name = '<database_name>'
schema_name = '<schema_name>'

conn_string = f"snowflake://{user}:{password}@{account_identifier}/{database_name}/{schema_name}"
engine = create_engine(conn_string)

#Create your DataFrame

table_name = 'cities'
df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])

#What to do if the table exists? replace, append, or fail?

if_exists = 'replace'

#Write the data to Snowflake

with engine.connect() as con:
        df.to_sql(name=table_name.lower(), con=con, if_exists=if_exists)

Write Pandas data to Snowflake table
Create Snowflake table from Pandas

References

SQLAlchemy create engine documentation
Pandas to_sql documentation

Snowfake in Python

Stephen Allwright Twitter

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.