Pandas groupby aggregate functions
Learn which aggregate functions are available for Pandas groupby
What is the groupby function in Pandas?
groupby is a Pandas method that lets you aggregate a DataFrame to a coarser level of granularity.
For example, if you have row-level order data but want to summarise it at customer level, you can call groupby on the customer identifier and then calculate figures such as total revenue per customer and mean revenue per order.
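The customer-level example above can be sketched as follows. This is a minimal illustration that assumes a small, made-up orders DataFrame; the column names order_id, customer_id, and revenue are hypothetical and chosen only to match the description.

```python
import pandas as pd

# Hypothetical row-level order data: one row per order.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": ["A", "A", "B", "B", "B"],
    "revenue": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# Group the orders by customer, then select the column to aggregate.
revenue_by_customer = orders.groupby("customer_id")["revenue"]

total_revenue = revenue_by_customer.sum()        # total revenue per customer
mean_order_revenue = revenue_by_customer.mean()  # mean revenue per order, for each customer

print(total_revenue)
print(mean_order_revenue)
```

Each result is a Series indexed by customer_id, so the output has one row per customer rather than one row per order.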
Groupby aggregate functions in Pandas
When using groupby, you must define which columns will be aggregated and which aggregation calculations should be applied.
Pandas includes a number of built-in aggregations that are very simple to use:
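count() – Number of non-null observations
nunique() – Number of unique values
sum() – Sum of values
mean() – Mean of values
median() – Median of values
mad() – Mean absolute deviation of values
prod() – Product of values
min() – Minimum
max() – Maximum
mode() – Mode
std() – Standard deviation
var() – Variance
Two caveats: mad() was removed in pandas 2.0, and mode() is not exposed as a groupby method, so on current versions both require passing a custom function to agg().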
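As a rough illustration of the aggregations listed above, the sketch below calls a few of them directly on a grouped column and uses agg() with a custom function for mean absolute deviation and mode. The DataFrame and its column names are hypothetical, and the two lambda functions are one possible way to compute those values, not a built-in Pandas shortcut.

```python
import pandas as pd

# Hypothetical order data for illustration only.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B"],
    "product_id": ["p1", "p1", "p2", "p3", "p3"],
    "revenue": [10.0, 20.0, 30.0, 5.0, 15.0],
})

grouped = orders.groupby("customer_id")

# Built-in aggregations can be called directly on a grouped column.
order_count = grouped["revenue"].count()            # non-null observations
unique_products = grouped["product_id"].nunique()   # distinct products
median_revenue = grouped["revenue"].median()
max_revenue = grouped["revenue"].max()

# Custom functions passed to agg() cover cases without a groupby method.
# Mean absolute deviation around the group mean:
mean_abs_dev = grouped["revenue"].agg(lambda s: (s - s.mean()).abs().mean())
# Mode; if several values tie, this keeps the first one returned by Series.mode():
most_common_product = grouped["product_id"].agg(lambda s: s.mode().iloc[0])
```

Custom functions are generally slower than the built-in aggregations, so they are best kept for calculations that have no built-in equivalent.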
Use groupby with a single aggregation
To apply a single built-in aggregation to each column, pass agg a dictionary that maps each column name to the name of the aggregation:
df.groupby('customer_id').agg({'revenue':'sum','product_id':'count'})
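To make the call above runnable end to end, here is a minimal sketch with a made-up DataFrame. The data and the variable names orders, summary, and summary_flat are assumptions for illustration; only the agg call itself comes from the example above.

```python
import pandas as pd

# Hypothetical order data for illustration only.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B"],
    "product_id": ["p1", "p2", "p1"],
    "revenue": [10.0, 30.0, 5.0],
})

# One aggregation per column: the dictionary maps column name -> function name.
summary = orders.groupby("customer_id").agg({"revenue": "sum", "product_id": "count"})
print(summary)

# Optionally move customer_id from the index back into a regular column.
summary_flat = summary.reset_index()
```

The result keeps the original column names, so product_id here holds a count of rows rather than product identifiers; renaming the column afterwards may make that clearer.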
Use groupby with multiple aggregations
It's also possible to apply multiple aggregate functions to the same column. To do that, pass a list of function names for that column, like so:
df.groupby('customer_id').agg({'revenue':['sum','mean','std'],'product_id':['count','nunique']})
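Passing lists of functions produces two-level column labels, which can be awkward to work with. The sketch below, using made-up data, shows one way to flatten them, and also shows named aggregation as an alternative that sets the output column names up front. The output names total_revenue and unique_products are hypothetical choices.

```python
import pandas as pd

# Hypothetical order data for illustration only.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B"],
    "product_id": ["p1", "p2", "p1"],
    "revenue": [10.0, 30.0, 5.0],
})

# A list of functions per column yields two-level column labels,
# e.g. ("revenue", "sum") and ("revenue", "mean").
multi = orders.groupby("customer_id").agg(
    {"revenue": ["sum", "mean"], "product_id": ["count", "nunique"]}
)
print(multi.columns.tolist())

# Flatten the labels by joining the two levels: ("revenue", "sum") -> "revenue_sum".
multi.columns = ["_".join(pair) for pair in multi.columns]

# Named aggregation: each keyword is an output column name,
# and each value is a (source column, function) pair.
named = orders.groupby("customer_id").agg(
    total_revenue=("revenue", "sum"),
    unique_products=("product_id", "nunique"),
)
```

Named aggregation avoids the flattening step entirely, which tends to be more convenient when the result feeds into further processing.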