Being able to set or update the values in multiple rows within a DataFrame is useful when undertaking feature engineering or data cleaning. In this post I will show the various ways you can do this with some simple examples.
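As a quick illustration of the idea, here is a minimal sketch of updating several rows at once. The DataFrame, its column names, and its values are made up for the example.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "city": ["Leeds", "York", "Leeds", "Bath"],
    "sales": [10, -5, 30, -1],
})

# Update every row that matches a condition: clip negative sales to 0
df.loc[df["sales"] < 0, "sales"] = 0

# Update a specific set of rows chosen by index label
df.loc[[0, 2], "city"] = "LEEDS"
```

Using `.loc` with a boolean mask or a list of labels writes to the original DataFrame in one step, which avoids the chained-assignment pitfalls of `df[mask]["col"] = value`.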
GroupBy is a method in the Pandas package which allows the user to group the rows of a DataFrame by the unique values of a given column. Whilst undertaking this operation it’s also possible to aggregate the values in other columns within each group, such as taking the sum of all values.
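A minimal sketch of grouping and aggregating might look like this. The `team` and `points` columns are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "team": ["a", "b", "a", "b"],
    "points": [1, 2, 3, 4],
})

# Group rows by the unique values in "team", then sum "points" per group
totals = df.groupby("team")["points"].sum()
```

Here `totals` is a Series indexed by the unique `team` values, with one summed `points` value per team.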
loc and iloc are data selection methods in the Python package, Pandas. They both allow the user to index and select data from a DataFrame, but go about it in slightly different ways: loc selects by label, whereas iloc selects by integer position.
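The difference is easiest to see on a DataFrame whose index is not the default integers. The following is a small sketch with invented names and values.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"age": [25, 32, 47]}, index=["ann", "bob", "cat"])

# loc selects by row and column label
by_label = df.loc["bob", "age"]

# iloc selects by integer position (second row, first column)
by_position = df.iloc[1, 0]

# Slices behave differently: loc includes the end label, iloc excludes the end position
label_slice = df.loc["ann":"bob"]
position_slice = df.iloc[0:1]
```

Both `by_label` and `by_position` point at the same cell, but `label_slice` contains two rows while `position_slice` contains one.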
When working on a data science or machine learning project it is common to use a Pandas DataFrame to store the data. However, when it comes to feature engineering, it can be hard to know what options are available for arithmetic operations on columns or rows.
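As a rough sketch of two of the options, the example below combines columns element-wise and sums across each row. The column names and values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"price": [2.0, 3.0], "quantity": [5, 4]})

# Column arithmetic is element-wise: each row's price times its quantity
df["revenue"] = df["price"] * df["quantity"]

# Row-wise arithmetic: sum the selected columns across each row (axis=1)
row_totals = df[["price", "quantity"]].sum(axis=1)
```

Operators such as `*` and `+` also have method equivalents (`mul`, `add`, and so on), which are useful when a fill value or an explicit axis is needed.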
Feature encoding is the process of taking a categorical variable and transforming it into a numerical feature for a machine learning model to learn from. This is a common step of feature engineering, as most machine learning models can only take in numerical features as input.
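Two common encodings can be sketched with Pandas alone: one-hot encoding and integer (label) encoding. The `colour` column is a made-up example, and other encoders exist beyond these two.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"colour": ["red", "green", "red"]})

# One-hot encoding: one 0/1 column per unique category
one_hot = pd.get_dummies(df["colour"], prefix="colour", dtype=int)

# Integer encoding: map each category to an integer code
# (codes follow the sorted category order, so green=0, red=1)
df["colour_code"] = df["colour"].astype("category").cat.codes
```

One-hot encoding avoids implying an order between categories, while integer codes keep the feature in a single column, which may suit tree-based models better.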