Pandas groupby aggregate functions

Learn what the possible aggregate functions are for Pandas groupby

7 Feb 2023

What is the groupby function in Pandas?

groupby is a Pandas DataFrame method which allows you to aggregate a DataFrame up to a higher level of abstraction, so that each row of the result summarises a group of rows from the original data.

As an example, if you have row-level order data but want to aggregate the data to customer level, you could use groupby on the customer identifier. This allows you to present calculations such as total revenue per customer and mean revenue per order for each customer.
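To make the order example concrete, here is a minimal sketch. The DataFrame, its column names (customer_id, revenue) and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical row-level order data: one row per order
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b"],
    "revenue": [10.0, 20.0, 5.0, 15.0, 25.0],
})

# One row per customer: total revenue and mean revenue per order
per_customer = orders.groupby("customer_id")["revenue"].agg(["sum", "mean"])
print(per_customer)
```

The result is indexed by customer_id, with one column per aggregation.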

Groupby aggregate functions in Pandas

When using the groupby function, you must define which columns will be aggregated and what aggregation calculations should be undertaken.

There are a number of built-in aggregations within Pandas that are very simple to use. These are:

count() – Number of non-null observations
nunique() – Number of unique values
sum() – Sum of values
mean() – Mean of values
median() – Median of values
mad() – Mean absolute deviation of values (deprecated in Pandas 1.5 and removed in 2.0)
prod() – Product of values
min() – Minimum
max() – Maximum
mode() – Mode (not available as a groupby method, so pass pd.Series.mode or a lambda to agg instead)
std() – Standard deviation
var() – Variance
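Most of these can also be called directly on a grouped column. The sketch below uses made-up data and, as an assumption about what you might want, computes the mean absolute deviation and the mode by hand with a lambda, which should work regardless of Pandas version.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b"],
    "revenue": [10.0, 10.0, 40.0, 5.0, 15.0],
})

grouped = df.groupby("customer_id")["revenue"]

# Built-in aggregations called as methods
counts = grouped.count()
medians = grouped.median()

# Mean absolute deviation around the group mean, written out by hand
mean_abs_dev = grouped.agg(lambda s: (s - s.mean()).abs().mean())

# Mode per group; when several values tie, this keeps the smallest one
modes = grouped.agg(lambda s: s.mode().iloc[0])
```

Taking the first mode is a design choice for this sketch. If ties matter in your data, you may want to keep all of the modes instead.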

Use groupby with a single aggregation

To apply a single built-in Pandas aggregation to each column, pass agg a dictionary that maps each column name to the name of the aggregation:

df.groupby('customer_id').agg({'revenue':'sum','product_id':'count'})
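Here is a runnable version of that call, as a sketch on a small made-up DataFrame. The column names match the snippet above, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "product_id": ["p1", "p2", "p1", "p1", "p3"],
    "revenue": [10.0, 20.0, 5.0, 5.0, 30.0],
})

# One aggregation per column: total revenue and number of product rows
single = df.groupby("customer_id").agg({"revenue": "sum", "product_id": "count"})
print(single)

# customer_id becomes the index; reset_index() turns it back into a column
single_flat = single.reset_index()
```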

Use groupby with multiple aggregations

It's also possible to use multiple aggregate functions on the same column. To do that, we just need to pass a list of functions for that column, like so:

df.groupby('customer_id').agg({'revenue':['sum','mean','std'],'product_id':['count','nunique']})
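One thing to be aware of is that passing lists produces two-level column names. The sketch below, on made-up data, shows how to select from that result, and also shows named aggregation as one possible way to get flat column names. The output names (total_revenue and so on) are my own choices for illustration.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "product_id": ["p1", "p2", "p1", "p1", "p3"],
    "revenue": [10.0, 20.0, 5.0, 5.0, 30.0],
})

multi = df.groupby("customer_id").agg(
    {"revenue": ["sum", "mean", "std"], "product_id": ["count", "nunique"]}
)

# Columns are (column, aggregation) pairs, so select with a tuple
revenue_sum = multi[("revenue", "sum")]

# Named aggregation: output_name=(column, aggregation) gives flat columns
flat = df.groupby("customer_id").agg(
    total_revenue=("revenue", "sum"),
    mean_revenue=("revenue", "mean"),
    unique_products=("product_id", "nunique"),
)
print(flat)
```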

Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.