How to use SUBSTRING function in Snowflake

Substring, also known as substr, is a string and binary function in Snowflake. In this post I will walk through what it’s useful for and how to use it in your database.

Stephen Allwright
Stephen Allwright

Substring, also known as substr, is a string and binary function in Snowflake, in its simplest form all you need to run is select substring(column,start,length) from table. In this post I will walk through what it’s useful for and how to use it in your database.

Snowflake substring, what is it?

Substring is a Snowflake function which takes a column and reduces each value to a subset of the original, based on the start index and length defined. It is similar to the functions left and right and is also known as splitting, slicing, or cutting.

How do you use substring in Snowflake?

Using substring in Snowflake is easy once you get the hang of it. Firstly, let’s look at the syntax of the function:

substring( base_expr , start_expr , length_expr )

The parameters for the function are:

  • base_expr - The column or value which will be split
  • start_expr - The index position where the splitting will start
  • length_expr - The number of characters to return after start_expr

There are a few things that you should be aware of when using this function:

  • The first index value of a string is 1, not 0
  • The base_expr must either be a binary or a string, and the output value is the same type as this input
  • If start_expr is negative then this is the index in relation to the end of the base_expr
  • length_expr must be greater than 0, otherwise an empty string is returned
  • length_expr is not a required field. If it is blank then all characters after start_expr are returned
  • null is returned if any of the inputs are null

Example of using substring function in Snowflake

Substring is a very helpful function to know as it is a catch all for slicing column values. To demonstrate how this function could be used in a real use case, let’s look at an example where you want to return different sections of a phone number:

/*
| phone_number |
|--------------|
| 123456789    |
| 987654321    |
*/

select 
phone_number, 
substring(phone_number, 1, 3) as first_3_digits,
substring(phone_number, 4, 3) as middle_3_digits,
substring(phone_number, -3, 3) as last_3_digits,
substring(phone_number, -6) as last_6_digits
from phone_number

/*
| phone_number | first_3_digits | middle_3_digits | last_3_digits | last_6_digits |
|--------------|----------------|-----------------|---------------|---------------|
| 123456789    | 123            | 456             | 789           | 456789        |
| 987654321    | 987            | 654             | 321           | 654321        |
*/

Snowflake coalesce function
Snowflake replace function
Snowflake listagg function
Snowflake concat function
Select a random sample of rows in Snowflake

References

Snowflake substring documentation

Snowflake

Stephen Allwright Twitter

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.