Feature engineering for machine learning

Code snippets and examples of how to prepare your data for modelling and create models typically used in data science.

This topic page gathers our content into helpful groupings.

Feature encoding

Use feature encoding to transform categorical features into numeric ones, such as sequential integer labels or one-hot columns

Label encode multiple columns in a Pandas DataFrame
Label encode unseen values in a Pandas DataFrame
One-hot encoding vs label encoding
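Here is a minimal sketch of label encoding several columns at once. It assumes the category mapping is learned from a training DataFrame and then reused on new data. Unseen values are mapped to -1 through `pd.Categorical`. That convention and the column names `colour` and `size` are illustrative assumptions, not a fixed recipe. The last step one-hot encodes the same columns with `pd.get_dummies` for comparison.

```python
import pandas as pd

# Hypothetical training and new data with two categorical columns
train = pd.DataFrame({
    "colour": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
})
new = pd.DataFrame({
    "colour": ["green", "purple"],  # "purple" never appears in training
    "size": ["L", "S"],
})

cat_cols = ["colour", "size"]

# Learn a sorted list of categories per column from the training data only
categories = {col: sorted(train[col].unique()) for col in cat_cols}


def label_encode(df, categories):
    """Return a copy of df with each listed column replaced by integer codes.

    Values outside the learned categories get code -1, which is how
    pandas encodes a value that is not among a Categorical's categories.
    """
    out = df.copy()
    for col, cats in categories.items():
        out[col] = pd.Categorical(out[col], categories=cats).codes
    return out


train_encoded = label_encode(train, categories)
new_encoded = label_encode(new, categories)

# One-hot encoding: one binary column per category instead of one integer column
train_onehot = pd.get_dummies(train, columns=cat_cols)

print(train_encoded)
print(new_encoded)
print(train_onehot.columns.tolist())
```

Label encoding keeps one column per feature but implies an order between categories. One-hot encoding avoids that implied order at the cost of one extra column per category.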

Feature scaling

Scale numerical columns to comparable ranges to help improve model training

Scale multiple columns in a Pandas DataFrame
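The sketch below scales several numeric columns at once using plain pandas. It assumes the two scalings of interest are min-max scaling to [0, 1] and standardisation to z-scores. The column names are hypothetical. In a real pipeline, the minimum, maximum, mean and standard deviation would be computed on the training data only and reused for new data.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "income": [1000.0, 2000.0, 4000.0, 5000.0],
    "city": ["a", "b", "a", "c"],  # non-numeric column, left untouched
})

num_cols = ["age", "income"]

# Min-max scaling: (x - min) / (max - min), computed column by column
mins = df[num_cols].min()
maxs = df[num_cols].max()
minmax = df.copy()
minmax[num_cols] = (df[num_cols] - mins) / (maxs - mins)

# Standardisation: (x - mean) / std, giving each column mean 0 and std 1
means = df[num_cols].mean()
stds = df[num_cols].std(ddof=0)  # population std, as scikit-learn's StandardScaler uses
standard = df.copy()
standard[num_cols] = (df[num_cols] - means) / stds

print(minmax)
print(standard)
```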

Data cleaning

Clean up your datasets to improve your model's ability to learn

Remove outliers from Pandas DataFrame
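One common way to remove outliers is the interquartile range (IQR) rule. Under this rule, a row is dropped when a value lies more than 1.5 × IQR below the first quartile or above the third quartile. The sketch below assumes that rule; the 1.5 multiplier is a convention, not the only choice. It applies the rule to several hypothetical numeric columns at once and keeps a row only if every checked value is within range.

```python
import pandas as pd

df = pd.DataFrame({
    "height": [160, 165, 170, 175, 180, 300],  # 300 is an obvious outlier
    "weight": [55, 60, 65, 70, 75, 72],
})


def remove_outliers_iqr(df, cols, k=1.5):
    """Drop rows where any of `cols` falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Bounds are Series indexed by column name, so comparisons align per column;
    # the mask is True for rows whose values are all inside the bounds
    mask = ((df[cols] >= lower) & (df[cols] <= upper)).all(axis=1)
    return df[mask]


clean = remove_outliers_iqr(df, ["height", "weight"])
print(clean)
```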

Groupby operations

Aggregate a dataset to a different level of detail using the Pandas groupby method

Pandas groupby aggregate functions
Pandas groupby column and sum another column
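Here is a short sketch of both patterns on a hypothetical sales table. The first sums one column per group. The second applies several aggregate functions at once using named aggregation in `agg`.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "product": ["x", "y", "x", "x", "y"],
    "revenue": [10, 20, 5, 15, 30],
})

# Group by one column and sum another; as_index=False keeps "store" as a column
revenue_per_store = sales.groupby("store", as_index=False)["revenue"].sum()

# Several aggregates at once with named aggregation: new_name=(column, function)
summary = sales.groupby("store").agg(
    total_revenue=("revenue", "sum"),
    mean_revenue=("revenue", "mean"),
    n_rows=("revenue", "count"),
    n_products=("product", "nunique"),
)

print(revenue_per_store)
print(summary)
```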

Selecting and changing values

Manipulate and analyse your dataset using the wide range of functions available in Pandas

Pandas loc vs iloc
Set value for multiple rows in Pandas DataFrame
Divide two columns in Pandas DataFrame
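The snippet below sketches all three tasks on a small hypothetical DataFrame:

- `loc` selects by index label, and `iloc` selects by integer position.
- `loc` with a boolean mask sets a value on every matching row.
- Dividing two columns is an element-wise operation.

Treating a zero denominator as missing (NaN) instead of infinity is an assumption made here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"clicks": [10, 0, 5, 8], "views": [100, 50, 0, 40]},
    index=["a", "b", "c", "d"],
)

# loc: label-based, end label included -> rows "a", "b", "c"
by_label = df.loc["a":"c", "clicks"]
# iloc: position-based, end position excluded -> rows 0 and 1 ("a", "b")
by_position = df.iloc[0:2, 0]

# Set a value for multiple rows: flag every row where views < 60
df["low_traffic"] = False
df.loc[df["views"] < 60, "low_traffic"] = True

# Divide two columns element-wise; replace 0 views with NaN to avoid inf
df["ctr"] = df["clicks"] / df["views"].replace(0, np.nan)

print(by_label.tolist(), by_position.tolist())
print(df)
```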