Select a random sample of rows in Snowflake

Learn how to select a sample of rows randomly from a table or view in Snowflake.

Stephen Allwright
Stephen Allwright

What is the sample function in Snowflake?

The sample function in Snowflake allows you to select either a fixed number or a certain percentage of rows in a table or view.

The syntax for the function is:

select * from table sample ...

What is the difference between sample and tablesample?

There are two functions in Snowflake that can be used to sample rows, they are sample and tablesample. They can be used interchangeably, but in this tutorial, we will be using the more commonly used sample.

Sample a percentage of rows

There are two main use cases for using the sample function, the first we will look at is when you want to sample a percentage of rows randomly from a table or view.

The syntax for returning a percentage of rows is:

select * from table sample (x);

Where x is the percentage you want to return, represented by an integer or float between 0 (no rows) and 100 (all rows).

Let's look at an example where you want to return 10.5% of the rows in your table. The query for this would be:

select * from table sample (10.5);
💡
What we're defining here is the probability that a row will be selected, but we can see it simply as the percentage of rows being returned.

Sample a fixed number of rows

Another common use case for using sample is to return a fixed number of rows randomly.

The syntax for doing this is:

select * from table sample (x rows);

Where x is the number of rows you want to return, represented by an integer between 0 and 1,000,000.

💡
Each row will then have an x/num_rows probability of being included in the sample.

Therefore, if you wanted to return 150 rows from your table, this would be the query:

select * from table sample (150 rows)

Summary of how to use the sample function

To summarise what we covered in this tutorial:

Sample a percentage of rows

select * from table sample (x);

Sample a fixed number of rows

select * from table sample (x rows);

Transformation functions

coalesce function
replace function
listagg function
concat function
substring function
ifnull function

References

Sample documentation

Snowflake

Stephen Allwright Twitter

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.