
Pandas groupby column and sum another column
GroupBy is a method in the Pandas package which allows the user to aggregate a DataFrame to a given column’s unique values. Whilst undertaking this operation it’s also possible to aggregate the values in other columns, such as taking the sum of all values.
Groupby is a method in the Pandas package which allows the user to aggregate a DataFrame to a given column’s unique values. Whilst undertaking this operation it’s also possible to aggregate the values in other columns, such as taking the sum of all values. In this post I will explain how we can use this method to aggregate one column whilst summing another.
Pandas groupby column and sum another column
To groupby one column whilst summing another column, you need to do the following:
- Define the column to aggregate, this will be used in the
by
parameter - Define the column or columns you want to sum, these will be called on the
groupby
method - Run the following:
df.groupby(by='column1')['column2'].sum()
A more in depth example can be seen here:
import pandas as pd
df = pd.DataFrame(
[[1, 30, 20, 40], [2, 65, 55, 75], [2, 82, 93, 104], [1, 11, 12, 13]],
columns=["column1", "column2", "column3", "column4"],
)
print(df)
"""
Output:
column1 column2 column3 column4
1 30 20 40
2 65 55 75
2 82 93 104
1 11 12 13
"""
df_grouped = df.groupby(by="column1")["column2"].sum()
print(df_grouped)
"""
Output:
column1
1 41
2 147
Name: column2, dtype: int64
"""
Pandas groupby column and sum multiple columns
The process for summing multiple columns is very similar to the previous example, but we want to sum for a defined list of columns, not just one.
Sum multiple columns by using column names
In this example we will select multiples columns by their name:
df_grouped = df.groupby(by="column1")["column2","column3"].sum()
print(df_grouped)
"""
Output:
column1 column2 column3
1 41 32
2 147 148
"""
Sum multiple columns by using column index values
If we have a large number of columns, it can be unrealistic to list all the names in plain text, so in this example we sum up all columns after the second column by using their index values:
df_grouped = df.groupby(by="column1")[df.columns[2:]].sum()
print(df_grouped)
"""
Output:
column1 column3 column4
1 32 53
2 148 179
"""
Pandas groupby column and sum all columns
Summing up all columns is the simplest of all the possible situations, and can be done like so:
df_grouped = df.groupby(by="column1").sum()
print(df_grouped)
"""
Output:
column1 column2 column3 column4
1 41 32 53
2 147 148 179
"""
Related articles
Pandas groupby aggregate functions