Use Snowflake Connector for Python to create table from Pandas DataFrame

Stephen Allwright

One common task that developers have is to create a Snowflake table using a Pandas DataFrame that they’ve created in Python. There are several ways to achieve this, but in this post I will run through how it can be done by using only the Snowflake Connector for Python.

Create table using Snowflake Connector for Python and Pandas

By utilising the Snowflake Connector for Python it is easy to read and write data between DataFrames and tables. However, it's not possible to natively create a table in Snowflake using the connector. Therefore we are going to need to create a workaround to achieve what we want, which will look like the following:

  1. Install the required packages: pandas and snowflake-connector-python
  2. Connect to Snowflake using snowflake-connector-python
  3. Create a custom SQL statement by looping through all columns in the DataFrame
  4. Execute this custom SQL command to create or replace the table
  5. Write to this newly created empty table using write_pandas

Install Snowflake Connector for Python and Pandas to your machine

In order to execute the code described in this post, you first need to install some required packages to your environment or machine. These are:

  • pandas
  • snowflake-connector-python

The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:

conda create -n env_name python=3.9 pandas
conda activate env_name
pip install "snowflake-connector-python[pandas]"
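
To confirm that both packages were installed correctly in the new environment, you can print their versions. This is just an optional check, assuming the environment above is activated:

import pandas
import snowflake.connector

# Print the installed versions of both packages
print(pandas.__version__)
print(snowflake.connector.__version__)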

Connect to Snowflake using the Python connector

Now onto the Python code. First of all, you need to connect to Snowflake using your credentials. We will do this using the Snowflake Connector for Python:

import os
from snowflake.connector import connect

# Use your Snowflake user credentials; it's recommended to save these as environment variables

conn = connect(
        user=os.environ["SNOW_USER"],
        password=os.environ["SNOW_PASSWORD"],
        account=os.environ["SNOW_ACCOUNT"],
        role=os.environ["SNOW_ROLE"],
    )

We have now created a connection object with Snowflake that we can use later on to interact with the instance.
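
If you want to be sure the credentials and account details are correct before moving on, a minimal sanity check is to run a trivial query through a cursor. This step isn't required for the workaround, it's just a quick test:

# Optional: verify the connection by running a simple query
cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")

# Print the Snowflake version returned by the query
print(cur.fetchone())
cur.close()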

SQL query for creating table using Snowflake Connector for Python

Due to the missing functionality for creating a table, we need to build a custom SQL query ourselves to do the job. We will loop through each column in the DataFrame to find its datatype and name, which are then appended to our custom SQL. This SQL query is then executed using our Snowflake connection object. It’s a bit messy, but that’s what you get with a workaround:

import pandas as pd

# df is the Pandas DataFrame you want to write to Snowflake

#Define the table name, schema, and database you want to write to

table_name = 'CITIES'
schema = 'CUSTOMERS'
database = 'DW'

#Create the SQL statement to create or replace the table

create_tbl_statement = "CREATE OR REPLACE TABLE " + database + "." + schema + "." + table_name + " (\n"

# Loop through each column finding the datatype and adding it to the statement

for column in df.columns:
    dtype = df[column].dtype.name
    if dtype in ("int", "int64"):
        create_tbl_statement = create_tbl_statement + column + " int"
    elif dtype == "object":
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"
    elif dtype == "datetime64[ns]":
        create_tbl_statement = create_tbl_statement + column + " datetime"
    elif dtype == "float64":
        create_tbl_statement = create_tbl_statement + column + " float8"
    elif dtype == "bool":
        create_tbl_statement = create_tbl_statement + column + " boolean"
    else:
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"

    # If column is not the last column, add a comma, else end the sql-query
    if column != df.columns[-1]:
        create_tbl_statement = create_tbl_statement + ",\n"
    else:
        create_tbl_statement = create_tbl_statement + ")"

#Execute the SQL statement to create the table

conn.cursor().execute(create_tbl_statement)

Once this command has been run, an empty table will be created in Snowflake which will then allow us to write data to that table.
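
If you want to double-check that the empty table was created before loading any data, one simple approach is to describe the table. This is just a sketch, reusing the same conn and table variables as above:

# Optional: confirm the empty table now exists in Snowflake
cur = conn.cursor()
cur.execute("DESCRIBE TABLE " + database + "." + schema + "." + table_name)

# Print each column name and its datatype
for row in cur.fetchall():
    print(row[0], row[1])
cur.close()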

Use write_pandas to write DataFrame to Snowflake

Now that we have created our empty table in Snowflake using the custom SQL query, we can fill it with the data from our Pandas DataFrame. This is done simply by using the write_pandas function from the Python connector:

from snowflake.connector.pandas_tools import write_pandas

# Loading our DataFrame data to the newly created empty table

success, num_chunks, num_rows, output = write_pandas(
    conn=conn,
    df=df,
    table_name=table_name,
    database=database,
    schema=schema
)

The write_pandas function only requires conn, df, and table_name, but I have chosen to also define the database and schema as this is a best practice to ensure that the correct table is being modified.

Output from write_pandas function

The function will return some useful information so that we can understand whether the append has worked as expected. These are:

  • success: True/False for whether the function was successful
  • num_chunks: The number of data chunks copied
  • num_rows: The number of rows appended
  • output: The COPY INTO command used
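
As a small illustration of how these return values can be used (this is just a sketch, not part of the workaround itself), you could log the outcome of the load:

# Report the outcome of the write_pandas call
if success:
    print("Loaded", num_rows, "rows in", num_chunks, "chunks into", table_name)
else:
    print("write_pandas did not complete successfully:", output)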

Create table from DataFrame using Snowflake Connector for Python, tying it all together

We’ve looked at all the constituent parts of this task, now let’s pull it all together into one Python script:

from snowflake.connector import connect
from snowflake.connector.pandas_tools import write_pandas
import pandas as pd
import os

# Use your Snowflake user credentials; it's recommended to save these as environment variables

conn = connect(
        user=os.environ["SNOW_USER"],
        password=os.environ["SNOW_PASSWORD"],
        account=os.environ["SNOW_ACCOUNT"],
        role=os.environ["SNOW_ROLE"],
    )

# df is the Pandas DataFrame you want to write to Snowflake

#Define the table name, schema, and database you want to write to

table_name = 'CITIES'
schema = 'CUSTOMERS'
database = 'DW'

#Create the SQL statement to create or replace the table

create_tbl_statement = "CREATE OR REPLACE TABLE " + database + "." + schema + "." + table_name + " (\n"

# Loop through each column finding the datatype and adding it to the statement

for column in df.columns:
    dtype = df[column].dtype.name
    if dtype in ("int", "int64"):
        create_tbl_statement = create_tbl_statement + column + " int"
    elif dtype == "object":
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"
    elif dtype == "datetime64[ns]":
        create_tbl_statement = create_tbl_statement + column + " datetime"
    elif dtype == "float64":
        create_tbl_statement = create_tbl_statement + column + " float8"
    elif dtype == "bool":
        create_tbl_statement = create_tbl_statement + column + " boolean"
    else:
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"

    # If column is not the last column, add a comma, else end the sql-query
    if column != df.columns[-1]:
        create_tbl_statement = create_tbl_statement + ",\n"
    else:
        create_tbl_statement = create_tbl_statement + ")"
       
#Execute the SQL statement to create the table

conn.cursor().execute(create_tbl_statement)

# Loading our DataFrame data to the newly created empty table

success, num_chunks, num_rows, output = write_pandas(
    conn=conn,
    df=df,
    table_name=table_name,
    database=database,
    schema=schema
)
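
One small addition that isn't in the original script but is generally good practice: close the connection once the load has finished.

# Close the Snowflake connection once the load is complete
conn.close()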


References

write_pandas documentation
Python connector installation documentation

Snowflake in Python


I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.