Being able to read data from Snowflake into a Pandas DataFrame makes machine learning development much simpler. In this post I will explain in simple terms how you can do this from start to finish.
Snowflake to Pandas DataFrame, what's the process?
The current best practice for reading data from Snowflake into a Pandas DataFrame is:
- Install the required packages
- Connect to Snowflake using
- Write the SQL select command you want to use
fetch_pandas_allto run the defined SQL command and return the DataFrame
Install Snowflake Connector for Python and Pandas to your machine
In order to execute the code described in this post you need to first install some required packages to your environment or machine, these are:
The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:
conda create -n env_name python=3.9 pandas conda activate env_name pip install "snowflake-connector-python[pandas]"
Connect using Snowflake Connector for Python
Now we have the required packages, we can take the first step which is connecting to Snowflake.
Making the connection is a fairly simple process, but still requires us to enter in some information. The fields we need to enter are:
account_indentifier: The identifier for your instance, which is normally found in the URL use when loading up the Snowflake UI:
user: Your username
password: Your password
from snowflake.connector import connect #Use your Snowfake user credentials to connect conn = connect( user='<user_login_name>', password='<password>', account='<account_identifier>' )
Additional parameters for Snowflake Connector for Python
There are also a number of optional fields that we could provide in this connection function, some of the most useful are:
database: The initial database for the session
schema: The initial schema for the session
warehouse: The warehouse for the session
role: The user role for the session
These additional parameters would be added to the same connection function:
from snowflake.connector import connect #Use your Snowfake user credentials to connect conn = connect( user='<user_login_name>', password='<password>', account='<account_identifier>', database='<database>', schema='<schema>', warehouse='<warehouse>', role='<role>' )
Using your credentials to connect to Snowflake
As we can see from the example above, you need to enter your Snowflake credentials to access the data. Because of this, there are two things you need to keep in mind:
- Do not enter your credentials in plain text, it's best to store them as environment variables
- Make sure you have the correct permissions for the given database, schema, and tables you are wanting to access
Use SQL select statement to fetch data into Pandas
The benefit of using this process to read data from Snowflake into Pandas is that it's not limited to reading only from defined tables and views. You can write any SQL select statement and the resulting data will be returned as a DataFrame.
This SQL command can either be written in a
.sql file and read into Python, or it can be written inline. In this example I will show how to perform the latter.
# Create your SQL command sql_query = """ select name, city, age from customers """
Snowflake to Pandas DataFrame using fetch_pandas_all
Now you have the SQL command and the connection to Snowflake, we can complete the process by using
fetch_pandas_all from the Snowflake Python Connector package to get the data and return it as a DataFame.
The method doesn't require any parameters, just that a cursor object with the desired SQL command be created.
In this code example I will combine all the above steps with the final task of fetching the data.
#Creating a Pandas dataframe from a SQL query in Snowflake from snowflake.connector import connect import pandas as pd #Connect to Snowflake using the required user conn = connect( user='<user_login_name>', password='<password>', account='<account_identifier>' ) #Create your SQL command sql_query = """ select name, city, age from customers """ #Create the cursor object with your SQL command cursor = conn.cursor() cursor.execute(sql) #Convert output to a dataframe df = cursor.fetch_pandas_all() cursor.close()
Writing data back to Snowflake
If you are using data from Snowflake to train a machine learning model, you may be interested in writing your predictions back to Snowflake for further use. If so, then you might want to check out the following posts: