
Use Snowflake Connector for Python to create table from Pandas DataFrame
One common task that developers have is to create a Snowflake table using a Pandas DataFrame that they’ve created in Python. There are several ways to achieve this, but in this post I will run through how it can be done by using only the Snowflake Connector for Python.
Create table using Snowflake Connector for Python and Pandas
By utilising the Snowflake Connector for Python it is easy to read and write data between DataFrames and tables. However, it's not possible to natively create a table in Snowflake using the connector. Therefore we are going to need to create a workaround to achieve what we want, which will look like the following:
- Install the required packages: pandas and snowflake-connector-python
- Connect to Snowflake using snowflake-connector-python
- Create a custom SQL statement by looping through all columns in the DataFrame
- Execute this custom SQL command to create or replace the table
- Write to this newly created empty table using write_pandas
Install Snowflake Connector for Python and Pandas to your machine
To run the code described in this post, you first need to install two packages in your environment or machine: pandas and snowflake-connector-python.
The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:
conda create -n env_name python=3.9 pandas
conda activate env_name
pip install "snowflake-connector-python[pandas]"
Connect to Snowflake using the Python connector
Now onto the Python code. First you need to connect to Snowflake with your credentials, which we will of course do using the Snowflake Python connector:
import os
from snowflake.connector import connect

# Use your Snowflake user credentials; it is recommended to save these as env variables
conn = connect(
    user=os.environ["SNOW_USER"],
    password=os.environ["SNOW_PASSWORD"],
    account=os.environ["SNOW_ACCOUNT"],
    role=os.environ["SNOW_ROLE"],
)
We have now created a connection object with Snowflake that we can use later on to interact with the instance.
SQL query for creating table using Snowflake Connector for Python
Because the connector is missing the functionality to create a table for us, we have to build a custom SQL query ourselves to do the job. We will loop through each column in the DataFrame, look up its datatype, and append the column name and a matching Snowflake type to our custom SQL. This SQL query is then executed using our Snowflake connection object. It’s a bit messy, but that’s what you get with a workaround:
import pandas as pd

# df is the Pandas DataFrame you want to write to Snowflake

# Define the table name, schema, and database you want to write to
table_name = 'CITIES'
schema = 'CUSTOMERS'
database = 'DW'

# Create the SQL statement to create or replace the table
create_tbl_statement = "CREATE OR REPLACE TABLE " + database + "." + schema + "." + table_name + " (\n"

# Loop through each column, find its datatype, and add it to the statement
for column in df.columns:
    if df[column].dtype.name in ("int", "int64"):
        create_tbl_statement = create_tbl_statement + column + " int"
    elif df[column].dtype.name == "object":
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"
    elif df[column].dtype.name == "datetime64[ns]":
        create_tbl_statement = create_tbl_statement + column + " datetime"
    elif df[column].dtype.name == "float64":
        create_tbl_statement = create_tbl_statement + column + " float8"
    elif df[column].dtype.name == "bool":
        create_tbl_statement = create_tbl_statement + column + " boolean"
    else:
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"
    # If column is not the last column, add a comma, else end the SQL query
    if column != df.columns[-1]:
        create_tbl_statement = create_tbl_statement + ",\n"
    else:
        create_tbl_statement = create_tbl_statement + ")"

# Execute the SQL statement to create the table
conn.cursor().execute(create_tbl_statement)
Once this command has been run, an empty table will be created in Snowflake which will then allow us to write data to that table.
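As an aside, the same dtype-to-Snowflake-type mapping can be expressed more compactly as a dictionary lookup. The helper function and sample DataFrame below are my own illustration of that idea, not part of the connector:

```python
import pandas as pd

# Same mapping as the loop above, expressed as a dict;
# anything unmapped falls back to varchar(16777216)
DTYPE_MAP = {
    "int64": "int",
    "object": "varchar(16777216)",
    "datetime64[ns]": "datetime",
    "float64": "float8",
    "bool": "boolean",
}

def build_create_table_sql(df, database, schema, table_name):
    """Build a CREATE OR REPLACE TABLE statement from a DataFrame's dtypes."""
    columns = ",\n".join(
        f"{col} {DTYPE_MAP.get(df[col].dtype.name, 'varchar(16777216)')}"
        for col in df.columns
    )
    return f"CREATE OR REPLACE TABLE {database}.{schema}.{table_name} (\n{columns})"

# Hypothetical sample DataFrame to show the generated SQL
sample = pd.DataFrame({"CITY": ["Oslo"], "POPULATION": [709037], "AREA_KM2": [454.0]})
print(build_create_table_sql(sample, "DW", "CUSTOMERS", "CITIES"))
```

The resulting string can be passed to `conn.cursor().execute()` exactly as in the loop version above.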
Use write_pandas to write DataFrame to Snowflake
Now that we have created our empty table in Snowflake using the custom SQL query, we can fill it with the data from our Pandas DataFrame. This is done simply by using the write_pandas function from the Python connector:
from snowflake.connector.pandas_tools import write_pandas

# Load our DataFrame data into the newly created empty table
success, num_chunks, num_rows, output = write_pandas(
    conn=conn,
    df=df,
    table_name=table_name,
    database=database,
    schema=schema,
)
The write_pandas function only requires conn, df, and table_name, but I have chosen to also define the database and schema, as this is a best practice to ensure that the correct table is being modified.
Output from write_pandas function
The function will return some useful information for us so that we can check whether the append has worked as expected, these are:
- success: True/False for whether the function was successful
- num_chunks: The number of data chunks copied
- num_rows: The number of rows appended
- output: The COPY INTO command used
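If you want to act on these return values rather than just inspect them, a minimal sketch could look like the following (the helper name is my own, not part of the connector):

```python
def report_write_result(success, num_chunks, num_rows):
    """Summarise write_pandas return values, raising if the load failed."""
    if not success:
        raise RuntimeError("write_pandas reported a failed load")
    return f"Loaded {num_rows} rows in {num_chunks} chunk(s)"

# Example with values of the shape write_pandas returns
print(report_write_result(True, 1, 42))  # Loaded 42 rows in 1 chunk(s)
```

Failing loudly here is safer than silently continuing with an empty or partially loaded table.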
Tying it all together: create a table from a DataFrame using the Snowflake Connector for Python
We’ve looked at all the constituent parts of this task, now let’s pull it all together into one Python script:
import os

import pandas as pd
from snowflake.connector import connect
from snowflake.connector.pandas_tools import write_pandas

# Use your Snowflake user credentials; it is recommended to save these as env variables
conn = connect(
    user=os.environ["SNOW_USER"],
    password=os.environ["SNOW_PASSWORD"],
    account=os.environ["SNOW_ACCOUNT"],
    role=os.environ["SNOW_ROLE"],
)

# df is the Pandas DataFrame you want to write to Snowflake

# Define the table name, schema, and database you want to write to
table_name = 'CITIES'
schema = 'CUSTOMERS'
database = 'DW'

# Create the SQL statement to create or replace the table
create_tbl_statement = "CREATE OR REPLACE TABLE " + database + "." + schema + "." + table_name + " (\n"

# Loop through each column, find its datatype, and add it to the statement
for column in df.columns:
    if df[column].dtype.name in ("int", "int64"):
        create_tbl_statement = create_tbl_statement + column + " int"
    elif df[column].dtype.name == "object":
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"
    elif df[column].dtype.name == "datetime64[ns]":
        create_tbl_statement = create_tbl_statement + column + " datetime"
    elif df[column].dtype.name == "float64":
        create_tbl_statement = create_tbl_statement + column + " float8"
    elif df[column].dtype.name == "bool":
        create_tbl_statement = create_tbl_statement + column + " boolean"
    else:
        create_tbl_statement = create_tbl_statement + column + " varchar(16777216)"
    # If column is not the last column, add a comma, else end the SQL query
    if column != df.columns[-1]:
        create_tbl_statement = create_tbl_statement + ",\n"
    else:
        create_tbl_statement = create_tbl_statement + ")"

# Execute the SQL statement to create the table
conn.cursor().execute(create_tbl_statement)

# Load our DataFrame data into the newly created empty table
success, num_chunks, num_rows, output = write_pandas(
    conn=conn,
    df=df,
    table_name=table_name,
    database=database,
    schema=schema,
)