
Set value for multiple rows in Pandas DataFrame
Being able to set or update the values in multiple rows within a DataFrame is useful when undertaking feature engineering or data cleaning. In this post I will show the various ways you can do this with some simple examples.
Pandas DataFrame set value for multiple rows
Setting a value for multiple rows in a DataFrame can be done in several ways, but the most common method is to set the new value based on a condition by doing the following: df.loc[df['column1'] >= 100, 'column2'] = 10
Set value for multiple rows based on a condition in Pandas
In this example we change values in the Score column based on a condition in the Age column: every row where Age is 50 or more gets a Score of 10.
import pandas as pd
df = pd.DataFrame(
[
["Stephen", 30, 8],
["Olga", 65, 5],
["David", 25, 9],
["Jane", 42, 2],
["Manny", 51, 3],
["Sigrid", 18, 6],
],
columns=["Name", "Age", "Score"],
)
print(df)
"""
Output:
Name Age Score
0 Stephen 30 8
1 Olga 65 5
2 David 25 9
3 Jane 42 2
4 Manny 51 3
5 Sigrid 18 6
"""
df.loc[df['Age'] >= 50, 'Score'] = 10
print(df)
"""
Output:
Name Age Score
0 Stephen 30 8
1 Olga 65 10
2 David 25 9
3 Jane 42 2
4 Manny 51 10
5 Sigrid 18 6
"""
Set value for multiple rows based on index values in Pandas
If you want to change a set of rows based on their index values rather than on a condition, there are several ways to do this. Each example below starts again from the original DataFrame.
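Using .loc Pandas method
This method allows you to set a value for a given slice of rows and list of column names. The slice is label-based and includes its end label, so :3 covers rows 0 to 3. (The .at accessor only reads or writes a single value, so it cannot be used with a slice.)
df.loc[:3, ["Age", "Score"]] = 100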
"""
Output:
Name Age Score
0 Stephen 100 100
1 Olga 100 100
2 David 100 100
3 Jane 100 100
4 Manny 51 3
5 Sigrid 18 6
"""
Using .iloc Pandas method
If you want to set the value for a slice of rows but don’t want to write out the column names, you can use the .iloc method, which selects rows and columns by integer position.
df.iloc[:3, [1, 2]] = 100
"""
Output:
Name Age Score
0 Stephen 100 100
1 Olga 100 100
2 David 100 100
3 Jane 42 2
4 Manny 51 3
5 Sigrid 18 6
"""
One difference to note between these two methods is that an .iloc slice excludes its end position (:3 covers rows 0 to 2), whilst a .loc slice includes its end label (:3 covers rows 0 to 3), which is why they update different rows with the same slice values.
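A quick way to check how slice endpoints are treated is to compare the number of rows that label-based .loc and position-based .iloc return for the same slice. This sketch uses a small single-column DataFrame built only for the comparison.

```python
import pandas as pd

df = pd.DataFrame({"Score": [8, 5, 9, 2, 3, 6]})

by_label = df.loc[:3]      # label slice: the end label 3 is included
by_position = df.iloc[:3]  # positional slice: the end position 3 is excluded

print(len(by_label))     # 4 rows: labels 0, 1, 2, 3
print(len(by_position))  # 3 rows: positions 0, 1, 2
```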
Set value for multiple rows by replacing all occurrences in Pandas
If you want to replace all occurrences of a value regardless of where it is in the DataFrame then the .replace method is a convenient approach. Note that it checks every column, so a 9 in the Age column would be replaced as well.
df.replace(9, 100, inplace=True)
"""
Output:
Name Age Score
0 Stephen 30 8
1 Olga 65 5
2 David 25 100
3 Jane 42 2
4 Manny 51 3
5 Sigrid 18 6
"""
References
Pandas replace documentation
Pandas at documentation
Pandas iloc documentation
Pandas loc documentation