write_pandas and Snowflake Connector for Python

write_pandas and Snowflake Connector for Python, explained

Write_pandas is a method in the Snowflake Connector for Python which allows the user to append data to a table in Snowflake from a Pandas DataFrame. In this post I will show how to use the method and answer some common questions that users have.

Stephen Allwright
Stephen Allwright

Write_pandas is a method in the Snowflake Connector for Python which allows the user to append data to a table in Snowflake from a Pandas DataFrame. In this post I will show how to use the method and answer some common questions that users have.

write_pandas and Snowflake Connector for Python

The best practice for using write_pandas to write a Pandas DataFrame to Snowflake is:

  1. Install the required packages: pandas snowflake-connector-python
  2. Connect to Snowflake using connect method in  snowflake-connector-python
  3. Write DataFrame to Snowflake using write_pandas method from Python connector

Install Snowflake Connector for Python and Pandas to your machine

In order to execute the code described in this post you need to first install some required packages to your environment or machine, these are:

  • pandas
  • snowflake-connector-python

The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:

conda create -n env_name python=3.9 pandas
conda activate env_name
pip install "snowflake-connector-python[pandas]"

Connect to Snowflake using the connector for Python

Now onto the Python code. First of all you need to connect to Snowflake using your credentials. We will be doing this of course by using the Snowflake Connector for Python connector:

import os
from snowflake.connector import connect

#Use your Snowfake user credentials, recommended to save these as env variables

conn = connect(
        user=os.environ["SNOW_USER"],
        password=os.environ["SNOW_PASSWORD"],
        account=os.environ["SNOW_ACCOUNT"],
        role=os.environ["SNOW_ROLE"],
    )

We have now created a connection object with Snowflake that we can use later on to interact with the instance.

Using Snowflake credentials in Python

As you can see, your Snowflake credentials are required to create this connection, and we have entered these using environment variables. You can of course decide what the best insertion method is for your project, but given that this is sensitive information you should remember the golden rule:

Do not enter your credentials in plain text, especially when working within a Git repository

User access rights in Snowflake

OK, one last note on connect to your instance, I promise. Your Snowflake user will have a certain level of access which grants you rights to certain databases, schemas, and tables. So make sure that you have the correct level of access for your needs.

Use write_pandas to write Pandas DataFrame to Snowflake

Now that you have your connection to the Snowflake instance you can start to do the interesting bit, writing the data.

In order to write the data you need to first define the table that you will be working with. One of the quirks, and downsides, of using the Snowflake connector functions is that this table needs to already exist before you can append it.

After the table has been defined I will use the write_pandas function to append the data, which does some SQL magic behind the scenes. It first uses a PUT command to upload the data to a temporary storage and then uses COPY INTO to move the data from that storage to the table. Let’s put the function into action:

from snowflake.connector.pandas_tools import write_pandas
import pandas as pd

#Define the table name, schema, and database you want to write to
#Note: the table, schema, and database need to already exist in Snowflake

table_name = 'CITIES'
schema = 'CUSTOMERS'
database = 'DW'

df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])

#Combine these using the function from the Snowflake connector

success, num_chunks, num_rows, output = write_pandas(
            conn=conn,
            df=df,
            table_name=table_name,
            database=database,
            schema=schema
        )

The write_pandas function only requires conn , df , and table_name but I have chosen to also define the database and schema as this is a best practice to ensure that the correct table is being modified.

Output from write_pandas function

The function will return some useful information for us so that we can understand if the appending has worked as expected, these are:

  • success : True/False for if the function was successful
  • num_chunks : The number of data chunks copied
  • num_rows : The number of rows appended
  • output : The COPY INTO command used

Write Pandas DataFrame to table using Snowflake Connector for Python

Now that we have explored all aspects of appending the data, it’s time to bring it all together into one code snippet where we go from connection to writing the data:

from snowflake.connector.pandas_tools import write_pandas
from snowflake.connector import connect
import pandas as pd
import os

#Use your Snowfake user credentials, recommended to save these as env variables

conn = connect(
        user=os.environ["SNOW_USER"],
        password=os.environ["SNOW_PASSWORD"],
        account=os.environ["SNOW_ACCOUNT"],
        role=os.environ["SNOW_ROLE"],
    )

#Define the table name, schema, and database you want to write to
#Note: the table, schema, and database need to already exist in Snowflake

table_name = 'CITIES'
schema = 'CUSTOMERS'
database = 'DW'

df = pd.DataFrame(data=[['Stephen','Oslo'],['Jane','Stockholm']],columns=['Name','City'])

#Combine these using the function from the Snowflake connector

success, num_chunks, num_rows, output = write_pandas(
            conn=conn,
            df=df,
            table_name=table_name,
            database=database,
            schema=schema
        )

Using Snowflake Connector when table doesn’t exist yet

If you try to run these code snippets for a table that doesn’t exist yet in Snowflake then an error will be returned. If you would like to create the table before you append then I would recommend using the best practice for writing Pandas data to Snowflake as described in this post.

Write Pandas data to Snowflake table
Create Snowflake table from Pandas
Create table from Pandas using Python connector

References

write_pandas documentation
Python connector installation documentation

Snowfake in Python

Stephen Allwright Twitter

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.

Comments