Pandas loc vs iloc, what's the difference?

What are loc and iloc in Pandas?

loc and iloc are the two main data selection indexers in the Python package Pandas. Both let you index and select data from a DataFrame, but they go about it in slightly different ways: loc selects by label, while iloc selects by integer position. Because the two look so similar, data scientists are often unsure when to use loc and when to use iloc.

I hope to clear up some of this confusion in this post.

What does loc do in Pandas?

loc selects data in a DataFrame based on the label of the row or column. It is used with square brackets rather than parentheses, with the following syntax: DataFrame.loc[row, column]

Sounds simple enough, right? Well, it is. Let's look at an example to explain further.

Say you want to select the second column in your DataFrame, which is labelled 'Revenue'. With loc, you select it using the label 'Revenue'. The same logic applies to rows: if your rows have index labels, you select rows by those labels. It's important to remember that a label can be either a numerical value or a string, which is partly where the confusion between loc and iloc comes from.

Let us look at the above example coded out in Python:

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["David", "Jane", "Susan", "George"],
        "Revenue": [2000, 3000, 4000, 5000],
    },
    index=[100, 101, 102, 103],
)

# Select the 'Revenue' column, and all rows, using loc
df.loc[:, "Revenue"]

# Select the row labelled 101, and all columns, using loc
df.loc[101, :]
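
As a further sketch of the DataFrame.loc[row, column] syntax, the same DataFrame can be queried for a single cell, for several labels at once, or for a range of labels. The variable names below (single_value, subset, label_slice) are my own and only there for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["David", "Jane", "Susan", "George"],
        "Revenue": [2000, 3000, 4000, 5000],
    },
    index=[100, 101, 102, 103],
)

# A single cell: the row labelled 101 and the column labelled 'Revenue'
single_value = df.loc[101, "Revenue"]

# Several rows and columns at once, by passing lists of labels
subset = df.loc[[100, 102], ["Name", "Revenue"]]

# A label slice with loc includes both endpoints (rows 100, 101 and 102)
label_slice = df.loc[100:102, "Revenue"]
```

Note that the slice 100:102 is read as labels, not positions, so the row labelled 102 is part of the result.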

What does iloc do in Pandas?

iloc selects data in a DataFrame based on the integer position of the row or column, counting from 0, and uses the following syntax: DataFrame.iloc[row, column]

Let's revisit the example above and see how it works with iloc instead.

With iloc you select using the integer position instead of the label. This means that our column 'Revenue' is selected using its position, which is 1. The same applies to rows: to select the second row, we use its position, 1, and not its label, which in our case is 101. This can cause confusion when the labels are numerical, as in our example.

Again, let's look at this example using Python:

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["David", "Jane", "Susan", "George"],
        "Revenue": [2000, 3000, 4000, 5000],
    },
    index=[100, 101, 102, 103],
)

# Select the 'Revenue' column, and all rows, using iloc
df.iloc[:, 1]

# Select the second row, and all columns, using iloc
df.iloc[1, :]
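
To make the confusion around numerical labels concrete, here is a small sketch using the same DataFrame. It shows that the number 1 is a valid position for iloc but not a valid label for loc, since our labels start at 100. The variable names (second_row, label_found, first_two) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["David", "Jane", "Susan", "George"],
        "Revenue": [2000, 3000, 4000, 5000],
    },
    index=[100, 101, 102, 103],
)

# iloc counts positions from 0, so position 1 is the second row (label 101)
second_row = df.iloc[1, :]

# loc looks for a row *labelled* 1, which does not exist in this index
try:
    df.loc[1, :]
    label_found = True
except KeyError:
    label_found = False

# A position slice with iloc excludes the end position, like a Python list slice
first_two = df.iloc[0:2, 1]
```

Here label_found ends up False because loc raises a KeyError for the missing label, while iloc returns Jane's row without complaint.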

When should I use loc vs iloc?

As we have seen, loc and iloc have the same purpose but have a subtle difference in their approach. This subtle difference is easily remembered with the following phrase:

iloc for indexes, loc for labels

So...

loc vs iloc, when should you use them?

  • Use iloc when you want to select your rows and columns based on their integer index positions (0, 1, 2, ...)
  • Use loc when you want to select your rows and columns based on their labels
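
The two rules above can be sketched side by side. This example assumes a hypothetical string index ("a" to "d"), which is not in the original example but removes any ambiguity between labels and positions. It also uses Index.get_loc, which is one way to translate a label into its position if you ever need to mix the two approaches.

```python
import pandas as pd

# Hypothetical string index labels, chosen only for this illustration
df = pd.DataFrame(
    {
        "Name": ["David", "Jane", "Susan", "George"],
        "Revenue": [2000, 3000, 4000, 5000],
    },
    index=["a", "b", "c", "d"],
)

# Same cell, selected by label and by integer position
by_label = df.loc["b", "Revenue"]
by_position = df.iloc[1, 1]

# Translate labels into integer positions, then select with iloc
row_pos = df.index.get_loc("b")
col_pos = df.columns.get_loc("Revenue")
mixed = df.iloc[row_pos, col_pos]
```

All three selections point at the same cell, Jane's revenue, so which one you use comes down to whether you know the label or the position.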

Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.