Pandas groupby column and sum another column

Pandas groupby column and sum another column

GroupBy is a method in the Pandas package which allows the user to aggregate a DataFrame to a given column’s unique values. Whilst undertaking this operation it’s also possible to aggregate the values in other columns, such as taking the sum of all values.

Stephen Allwright
Stephen Allwright

Groupby is a method in the Pandas package which allows the user to aggregate a DataFrame to a given column’s unique values. Whilst undertaking this operation it’s also possible to aggregate the values in other columns, such as taking the sum of all values. In this post I will explain how we can use this method to aggregate one column whilst summing another.

Pandas groupby column and sum another column

To groupby one column whilst summing another column, you need to do the following:

  1. Define the column to aggregate, this will be used in the by parameter
  2. Define the column or columns you want to sum, these will be called on the groupby method
  3. Run the following: df.groupby(by='column1')['column2'].sum()

A more in depth example can be seen here:

import pandas as pd

df = pd.DataFrame(
    [[1, 30, 20, 40], [2, 65, 55, 75], [2, 82, 93, 104], [1, 11, 12, 13]],
    columns=["column1", "column2", "column3", "column4"],
)

print(df)

"""
Output:
column1  column2  column3  column4
1       30       20       40
2       65       55       75
2       82       93      104
1       11       12       13
"""

df_grouped = df.groupby(by="column1")["column2"].sum()

print(df_grouped)

"""
Output:
column1
1    41
2    147
Name: column2, dtype: int64
"""

Pandas groupby column and sum multiple columns

The process for summing multiple columns is very similar to the previous example, but we want to sum for a defined list of columns, not just one.

Sum multiple columns by using column names

In this example we will select multiples columns by their name:

df_grouped = df.groupby(by="column1")["column2","column3"].sum()

print(df_grouped)

"""
Output:
column1  column2  column3                
1             41       32
2            147      148
"""

Sum multiple columns by using column index values

If we have a large number of columns, it can be unrealistic to list all the names in plain text, so in this example we sum up all columns after the second column by using their index values:

df_grouped = df.groupby(by="column1")[df.columns[2:]].sum()

print(df_grouped)

"""
Output:
column1  column3  column4              
1             32       53
2            148      179
"""

Pandas groupby column and sum all columns

Summing up all columns is the simplest of all the possible situations, and can be done like so:

df_grouped = df.groupby(by="column1").sum()

print(df_grouped)

"""
Output:
column1  column2  column3  column4                        
1             41       32       53
2            147      148      179
"""

Pandas groupby aggregate functions

References

Groupby documentation

Pandas

Stephen Allwright Twitter

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.