Snowflake to Pandas DataFrame using Python connector

Snowflake to Pandas DataFrame using fetch_pandas_all

Being able to read data from Snowflake into a Pandas DataFrame makes machine learning development much simpler. In this post I will explain in simple terms how you can do this from start to finish.

Stephen Allwright
Stephen Allwright

Being able to read data from Snowflake into a Pandas DataFrame makes machine learning development much simpler. In this post I will explain in simple terms how you can do this from start to finish.

Snowflake to Pandas DataFrame, what's the process?

The current best practice for reading data from Snowflake into a Pandas DataFrame is:

  1. Install the required packages pandas, snowflake-connector-python
  2. Connect to Snowflake using snowflake-connector-python
  3. Write the SQL select command you want to use
  4. Use fetch_pandas_all to run the defined SQL command and return the DataFrame

Install Snowflake Connector for Python and Pandas to your machine

In order to execute the code described in this post you need to first install some required packages to your environment or machine, these are:

  • pandas
  • snowflake-connector-python

The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:

conda create -n env_name python=3.9 pandas
conda activate env_name
pip install "snowflake-connector-python[pandas]"

Connect using Snowflake Connector for Python

Now we have the required packages, we can take the first step which is connecting to Snowflake.

Making the connection is a fairly simple process, but still requires us to enter in some information. The fields we need to enter are:

  • account_indentifier : The identifier for your instance, which is normally found in the URL use when loading up the Snowflake UI: “account_identifier".snowflakecomputing.com
  • user : Your username
  • password : Your password
from snowflake.connector import connect

#Use your Snowfake user credentials to connect

conn = connect(
        user='<user_login_name>',
        password='<password>',
        account='<account_identifier>'
    )

Additional parameters for Snowflake Connector for Python

There are also a number of optional fields that we could provide in this connection function, some of the most useful are:

  • database : The initial database for the session
  • schema : The initial schema for the session
  • warehouse : The warehouse for the session
  • role : The user role for the session

These additional parameters would be added to the same connection function:

from snowflake.connector import connect

#Use your Snowfake user credentials to connect

conn = connect(
        user='<user_login_name>',
        password='<password>',
        account='<account_identifier>',
        database='<database>',
        schema='<schema>',
        warehouse='<warehouse>',
        role='<role>'
    )

Using your credentials to connect to Snowflake

As we can see from the example above, you need to enter your Snowflake credentials to access the data. Because of this, there are two things you need to keep in mind:

  1. Do not enter your credentials in plain text, it's best to store them as environment variables
  2. Make sure you have the correct permissions for the given database, schema, and tables you are wanting to access

Use SQL select statement to fetch data into Pandas

The benefit of using this process to read data from Snowflake into Pandas is that it's not limited to reading only from defined tables and views. You can write any SQL select statement and the resulting data will be returned as a DataFrame.

This SQL command can either be written in a .sql file and read into Python, or it can be written inline. In this example I will show how to perform the latter.

# Create your SQL command

sql_query = 
	"""
    select name, city, age
    from customers
    """

Snowflake to Pandas DataFrame using fetch_pandas_all

Now you have the SQL command and the connection to Snowflake, we can complete the process by using fetch_pandas_all from the Snowflake Python Connector package to get the data and return it as a DataFame.

The method doesn't require any parameters, just that a cursor object with the desired SQL command be created.

In this code example I will combine all the above steps with the final task of fetching the data.

#Creating a Pandas dataframe from a SQL query in Snowflake

from snowflake.connector import connect
import pandas as pd

#Connect to Snowflake using the required user

conn = connect(
        user='<user_login_name>',
        password='<password>',
        account='<account_identifier>'
    )

#Create your SQL command

sql_query = 
	"""
    select name, city, age
    from customers
    """

#Create the cursor object with your SQL command

cursor = conn.cursor()
cursor.execute(sql)

#Convert output to a dataframe

df = cursor.fetch_pandas_all()
cursor.close()

Writing data back to Snowflake

If you are using data from Snowflake to train a machine learning model, you may be interested in writing your predictions back to Snowflake for further use. If so, then you might want to check out the following posts:

Write Pandas data to Snowflake table‌‌
Create Snowflake table from Pandas


Execute a Snowflake SQL command in Python

References

Snowflake documentation

‌‌

Snowfake in Python

Stephen Allwright Twitter

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.