What is a Pandas groupby aggregate function?
Groupby is a Pandas DataFrame method that lets you aggregate a DataFrame up to a higher level of abstraction. For example, if you have row-level order data but want to calculate figures at customer level, you can group by the customer identifier, which lets you present calculations such as total revenue and mean revenue per order.
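As a minimal sketch of the customer-level example described above, the snippet below groups row-level orders by customer and computes total revenue and mean revenue per order. The DataFrame, its column names (customer_id, order_id, revenue) and the output column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical row-level order data: one row per order.
orders = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C2", "C2"],
    "order_id": [101, 102, 103, 104, 105],
    "revenue": [20.0, 30.0, 10.0, 15.0, 5.0],
})

# Group by the customer identifier, then aggregate revenue two ways.
# The keyword names become the output column names.
customer_summary = orders.groupby("customer_id")["revenue"].agg(
    total_revenue="sum",
    mean_revenue_per_order="mean",
)

print(customer_summary)
```

The result has one row per customer rather than one row per order, which is the higher level of abstraction referred to above.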
What are the possible Pandas groupby aggregate functions?
When using groupby you must define which columns will be aggregated and which aggregation calculations should be applied. You can use functions from separate packages such as NumPy for aggregations within groupby; however, there are a number of built-in aggregations that are very simple to use:
- count() – Number of non-null observations
- nunique() – Number of unique values
- sum() – Sum of values
- mean() – Mean of values
- median() – Arithmetic median of values
- mad() – Mean absolute deviation of values (deprecated in pandas 1.5 and removed in pandas 2.0)
- prod() – Product of values
- min() – Minimum
- max() – Maximum
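- mode() – Mode (not available directly on a groupby object; pass it through agg(), for example agg(pd.Series.mode))
- std() – Sample standard deviation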
- var() – Variance
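The sketch below tries a handful of these aggregations on a small, hypothetical DataFrame (the customer_id and revenue columns are assumptions for illustration). It also shows one possible way to compute the mode and the mean absolute deviation through agg() with a custom function, which is useful on pandas versions where these are not available as groupby methods.

```python
import pandas as pd

# Hypothetical data; note the missing revenue value for C2.
orders = pd.DataFrame({
    "customer_id": ["C1", "C1", "C1", "C2", "C2"],
    "revenue": [20.0, 20.0, 35.0, 10.0, None],
})

grouped = orders.groupby("customer_id")["revenue"]

# Built-in aggregations called directly on the grouped column.
order_count = grouped.count()      # counts non-null values only
unique_values = grouped.nunique()  # distinct non-null values
std_dev = grouped.std()            # sample standard deviation (ddof=1)

# Several built-in aggregations at once, one output column per name.
summary = grouped.agg(["sum", "mean", "median", "min", "max"])

# Custom aggregations passed to agg() as functions.
# If a group has several modes, this keeps the smallest one.
most_common = grouped.agg(lambda s: s.mode().iloc[0])
mean_abs_dev = grouped.agg(lambda s: (s - s.mean()).abs().mean())

print(summary)
```

Missing values are skipped by these aggregations, so C2 is summarised from its single non-null revenue value.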
You can use these aggregations in the following way: