
Write Pandas DataFrame to Snowflake table, explained (2022)
Being able to easily write a Pandas DataFrame to a Snowflake table will make your Python workflow considerably easier, whether this be production jobs like loading scheduled predictions or ad-hoc tasks such as a set of prepared features. In this post I will show you the industry best practice for doing exactly this.
Write Pandas DataFrame to Snowflake table
The current best practice for writing data from a Pandas DataFrame to a Snowflake table is:
- Install the required packages: pandas, snowflake-connector-python, sqlalchemy
- Connect to Snowflake using sqlalchemy
- Define what to do if the table already exists
- Write the DataFrame to Snowflake using to_sql from Pandas and pd_writer from the Python connector
to_sql, write_pandas, and pd_writer, what’s the deal?
When discussing writing data to Snowflake from Pandas there are three methods or functions that get mentioned, which leads to a bit of confusion around what to use, when, and how. Here I will outline what these three functions do and how we will use them in this example:
to_sql
to_sql is a method in the Pandas package which can load a DataFrame into a SQL database, of which Snowflake is one. It requires a connection to your database, which is provided through the SQLAlchemy package.
to_sql is what most people use to send data to Snowflake, as it can create the table if it doesn't exist yet and offers options for what to do if it does. For this reason, we will be using it in our example.
write_pandas
write_pandas is a method in the Snowflake Connector for Python package which allows the user to append data from a DataFrame to an existing table in Snowflake.
The upside of write_pandas is that everything is contained within the Snowflake Connector; the tradeoff is that you can only append to tables that already exist, which is quite restrictive. For this reason, we will not be using this method and have chosen to_sql instead.
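We won't use write_pandas in this post, but for completeness here is a minimal sketch of what an equivalent call looks like, assuming the cities table already exists in your schema (the placeholder credentials mirror the ones used later in this post):

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# write_pandas needs a plain connector connection, not a SQLAlchemy engine
conn = snowflake.connector.connect(
    account='<account_identifier>',
    user='<user_login_name>',
    password='<password>',
    database='<database_name>',
    schema='<schema_name>',
)

df = pd.DataFrame(data=[['Stephen', 'Oslo'], ['Jane', 'Stockholm']],
                  columns=['Name', 'City'])

# Appends df to the existing CITIES table; the return tuple reports
# whether the load succeeded and how many rows were written
success, num_chunks, num_rows, _ = write_pandas(conn, df, table_name='CITIES')
conn.close()
```

Note that if the table does not exist, this call fails rather than creating it, which is exactly the limitation discussed above.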
pd_writer
pd_writer is a function in the Snowflake Connector package which can be passed to the to_sql method to speed up the insertion into your table. We will be using this in our example as it provides a significant performance improvement, especially for large datasets.
Install SQLAlchemy, the Snowflake Connector for Python, and Pandas on your machine
In order to execute the code described in this post you first need to install some required packages in your environment or machine. These are:
pandas
snowflake-connector-python
sqlalchemy
The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:
conda create -n env_name python=3.9 pandas sqlalchemy
conda activate env_name
pip install "snowflake-connector-python[pandas]"
Connect to Snowflake using SQLAlchemy
Now onto the Python code. First of all you need to connect to Snowflake using your credentials. We do this by using the SQLAlchemy package to create the engine which allows us to interact with Snowflake.
from sqlalchemy import create_engine
#Create connection to Snowflake using your account and user
account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
database_name = '<database_name>'
schema_name = '<schema_name>'
conn_string = f"snowflake://{user}:{password}@{account_identifier}/{database_name}/{schema_name}"
engine = create_engine(conn_string)
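One caveat worth noting: since the credentials are embedded in a URL, any special characters in the password (such as @ or /) must be URL-encoded, otherwise the connection string is parsed incorrectly. A small sketch using the standard library, with a made-up example password:

```python
from urllib.parse import quote

# Passwords often contain characters that are special in a URL;
# quote() percent-encodes them so the connection string parses correctly.
password = 'p@ssw/rd!'  # hypothetical example password
safe_password = quote(password, safe='')  # 'p%40ssw%2Frd%21'

conn_string = f"snowflake://<user>:{safe_password}@<account_identifier>/<db>/<schema>"
```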
Entering Snowflake credentials in Python
As you can see, your Snowflake credentials are required to create this connection. How you supply them is up to you and your project, whether that be environment variables or something else, but given that this is sensitive information you should remember the golden rule:
Do not enter your credentials in plain text, especially when working within a Git repository
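One common way to honour that rule is to read the credentials from environment variables. A minimal sketch; the variable names here are my own convention, not something Snowflake mandates:

```python
import os

# Read credentials from the environment instead of hard-coding them.
# Set these variables in your shell, .env tooling, or CI secret store.
def conn_string_from_env() -> str:
    user = os.environ['SNOWFLAKE_USER']
    password = os.environ['SNOWFLAKE_PASSWORD']
    account = os.environ['SNOWFLAKE_ACCOUNT']
    database = os.environ['SNOWFLAKE_DATABASE']
    schema = os.environ['SNOWFLAKE_SCHEMA']
    return f"snowflake://{user}:{password}@{account}/{database}/{schema}"
```

Using os.environ (rather than os.environ.get) means you get an immediate KeyError if a variable is missing, instead of a confusing connection failure later.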
Gaining write access to Snowflake tables
OK, one last note on connecting to your instance, I promise. Your Snowflake user will have a certain level of access which grants you rights to certain databases, schemas, and tables. So make sure that you have write access to the database and schema you are interacting with.
Create the table in Snowflake if it doesn't already exist
In this example we are using the to_sql method from Pandas to write our data to Snowflake, which is the current best practice when writing data. The reason that it is so widely used, instead of write_pandas, is because we don't need to worry about whether the table already exists or not.
All we need to do is define what to do if the table does in fact already exist. The options are:
- append: Add the data at the end of the existing table
- replace: Drop the table and create a new one in its place
- fail: Raise an error
Use to_sql to write Pandas DataFrame to Snowflake
Next up is defining the table name, which will be searched for or created in the schema and database that we stated earlier.
After the table has been defined we will use the to_sql function to write the data, which handles all the behind-the-scenes SQL magic. Let's put this into action:
from snowflake.connector.pandas_tools import pd_writer
import pandas as pd
#Create your DataFrame
table_name = 'cities'
df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])
#What to do if the table exists? replace, append, or fail?
if_exists = 'replace'
#Write the data to Snowflake, using pd_writer to speed up loading
#index=False avoids writing the DataFrame index as an extra column
with engine.connect() as con:
    df.to_sql(name=table_name.lower(), con=con, if_exists=if_exists, index=False, method=pd_writer)
You may have noticed that we pass table_name.lower() as the table name. This is because the Snowflake SQLAlchemy dialect treats all-lowercase names as case-insensitive, while names containing uppercase characters are quoted and stored case-sensitively; passing a lowercase name avoids the identifier errors this can otherwise cause.
Writing to Snowflake table from DataFrame, bringing it all together
Now that we have explored all aspects of the task at hand, it’s time to bring it all together into one code snippet:
from snowflake.connector.pandas_tools import pd_writer
import pandas as pd
from sqlalchemy import create_engine
#Create connection to Snowflake using your account and user
account_identifier = '<account_identifier>'
user = '<user_login_name>'
password = '<password>'
database_name = '<database_name>'
schema_name = '<schema_name>'
conn_string = f"snowflake://{user}:{password}@{account_identifier}/{database_name}/{schema_name}"
engine = create_engine(conn_string)
#Create your DataFrame
table_name = 'cities'
df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])
#What to do if the table exists? replace, append, or fail?
if_exists = 'replace'
#Write the data to Snowflake, using pd_writer to speed up loading
#index=False avoids writing the DataFrame index as an extra column
with engine.connect() as con:
    df.to_sql(name=table_name.lower(), con=con, if_exists=if_exists, index=False, method=pd_writer)
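If you want to sanity-check the load, you can read the table straight back into a DataFrame with pd.read_sql, reusing the engine from above (a sketch, assuming the snippet above has already run successfully):

```python
import pandas as pd

# Read the freshly written table back to confirm the rows arrived
with engine.connect() as con:
    check = pd.read_sql(f"SELECT * FROM {table_name.lower()}", con)

print(check.shape)  # expect two rows if the example data loaded
```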
Related articles
Connect to Snowflake from Python
Create Snowflake table from Pandas
Read Snowflake data to Pandas DataFrame
Run Snowflake SQL queries in Python
References
write_pandas documentation
Python connector installation documentation
Pandas to_sql documentation
Snowflake pd_writer documentation
SQLAlchemy create engine documentation