
Snowflake to Pandas DataFrame using fetch_pandas_all
Being able to read data from Snowflake into a Pandas DataFrame makes machine learning development much simpler. In this post I will explain in simple terms how you can do this from start to finish.
Being able to read data from Snowflake into a Pandas DataFrame makes machine learning development much simpler. In this post I will explain in simple terms how you can do this from start to finish.
Snowflake to Pandas DataFrame, what's the process?
The current best practice for reading data from Snowflake into a Pandas DataFrame is:
- Install the required packages
pandas, snowflake-connector-python
- Connect to Snowflake using
snowflake-connector-python
- Write the SQL select command you want to use
- Use
fetch_pandas_all
to run the defined SQL command and return the DataFrame
Install Snowflake Connector for Python and Pandas to your machine
In order to execute the code described in this post you need to first install some required packages to your environment or machine, these are:
pandas
snowflake-connector-python
The best practice is to create a separate Python environment for each project, so I will first create a Conda environment and then install the required packages:
conda create -n env_name python=3.9 pandas
conda activate env_name
pip install "snowflake-connector-python[pandas]"
Connect using Snowflake Connector for Python
Now we have the required packages, we can take the first step which is connecting to Snowflake.
Making the connection is a fairly simple process, but still requires us to enter in some information. The fields we need to enter are:
account_indentifier
: The identifier for your instance, which is normally found in the URL use when loading up the Snowflake UI:“account_identifier".snowflakecomputing.com
user
: Your usernamepassword
: Your password
from snowflake.connector import connect
#Use your Snowfake user credentials to connect
conn = connect(
user='<user_login_name>',
password='<password>',
account='<account_identifier>'
)
Additional parameters for Snowflake Connector for Python
There are also a number of optional fields that we could provide in this connection function, some of the most useful are:
database
: The initial database for the sessionschema
: The initial schema for the sessionwarehouse
: The warehouse for the sessionrole
: The user role for the session
These additional parameters would be added to the same connection function:
from snowflake.connector import connect
#Use your Snowfake user credentials to connect
conn = connect(
user='<user_login_name>',
password='<password>',
account='<account_identifier>',
database='<database>',
schema='<schema>',
warehouse='<warehouse>',
role='<role>'
)
Using your credentials to connect to Snowflake
As we can see from the example above, you need to enter your Snowflake credentials to access the data. Because of this, there are two things you need to keep in mind:
- Do not enter your credentials in plain text, it's best to store them as environment variables
- Make sure you have the correct permissions for the given database, schema, and tables you are wanting to access
Use SQL select statement to fetch data into Pandas
The benefit of using this process to read data from Snowflake into Pandas is that it's not limited to reading only from defined tables and views. You can write any SQL select statement and the resulting data will be returned as a DataFrame.
This SQL command can either be written in a .sql
file and read into Python, or it can be written inline. In this example I will show how to perform the latter.
# Create your SQL command
sql_query =
"""
select name, city, age
from customers
"""
Snowflake to Pandas DataFrame using fetch_pandas_all
Now you have the SQL command and the connection to Snowflake, we can complete the process by using fetch_pandas_all
from the Snowflake Python Connector package to get the data and return it as a DataFame.
The method doesn't require any parameters, just that a cursor object with the desired SQL command be created.
In this code example I will combine all the above steps with the final task of fetching the data.
#Creating a Pandas dataframe from a SQL query in Snowflake
from snowflake.connector import connect
import pandas as pd
#Connect to Snowflake using the required user
conn = connect(
user='<user_login_name>',
password='<password>',
account='<account_identifier>'
)
#Create your SQL command
sql_query =
"""
select name, city, age
from customers
"""
#Create the cursor object with your SQL command
cursor = conn.cursor()
cursor.execute(sql)
#Convert output to a dataframe
df = cursor.fetch_pandas_all()
cursor.close()
Writing data back to Snowflake
If you are using data from Snowflake to train a machine learning model, you may be interested in writing your predictions back to Snowflake for further use. If so, then you might want to check out the following posts:
Write Pandas data to Snowflake table
Create Snowflake table from Pandas
Related articles
Execute a Snowflake SQL command in Python
References