Pandas

Pandas DataFrame code snippets and real-world examples from data solutions in production. Learn how Pandas DataFrames are used in industry.

Divide two columns in Pandas DataFrame

In a data science or machine learning project, it is common to store the data in a Pandas DataFrame. When it comes to feature engineering, however, it can be unclear which options are available for arithmetic operations on columns or rows.
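As a minimal sketch of the column arithmetic described above, assuming a hypothetical DataFrame with `revenue` and `units` columns (names invented for illustration), one column can be divided by another with either the `/` operator or `Series.div`:

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({"revenue": [100.0, 250.0, 90.0], "units": [4, 10, 3]})

# Element-wise division of one column by another using the operator form.
df["price_per_unit"] = df["revenue"] / df["units"]

# Equivalent method form, which can be convenient when chaining calls.
df["price_per_unit_alt"] = df["revenue"].div(df["units"])
```

Both forms align rows on the index before dividing, so the two new columns hold the same values here.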

Stephen Allwright

One hot encoding vs label encoding, which is best?

Feature encoding is the process of taking a categorical variable and transforming it into a numerical feature for a machine learning model to learn from. This is usually a required step of feature engineering, as most machine learning models can only take numerical features as input.

Stephen Allwright

Remove outliers from Pandas DataFrame

Outliers are data points in a dataset that are considered to be extreme, false, or not representative of what the data is describing. These outliers can be caused by either incorrect data collection or genuine outlying observations.

Stephen Allwright

Label encode unseen values in a Pandas DataFrame

Label encoding is a popular method when preparing data for machine learning, but once a label encoder is fitted to a set of data, it raises an error when asked to transform a value not seen during fitting.
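One way to sidestep the unseen-value error can be sketched with pandas alone. This is not the label encoder the excerpt refers to: it swaps in `pd.Categorical`, which assigns the code -1 to any value outside the fitted categories. The series names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical training and scoring values, for illustration only.
train = pd.Series(["egg", "flour", "bread"])
score = pd.Series(["flour", "milk", "egg"])  # "milk" was not seen during fitting

# "Fit": record the categories seen in training, in order of first appearance.
categories = train.unique()

# "Transform": values outside the fitted categories receive the code -1
# instead of raising an error.
train_codes = pd.Categorical(train, categories=categories).codes
score_codes = pd.Categorical(score, categories=categories).codes
```

Whether -1 is an acceptable placeholder depends on the downstream model, so treat the choice as an assumption to revisit.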

Stephen Allwright

Scale multiple columns in a Pandas DataFrame

The practical problem with feature scaling in a real-world project is that you often need to scale multiple features, and then apply the same scaling that was fit on your training data to your scoring data later on.
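The fit-on-training, apply-to-scoring pattern can be sketched as a manual standardization (subtract the mean, divide by the standard deviation) in plain pandas. This is an illustrative stand-in rather than any particular scaler class, and the DataFrames and column names are hypothetical.

```python
import pandas as pd

# Hypothetical training and scoring data; column names are illustrative.
train = pd.DataFrame(
    {"age": [20.0, 30.0, 40.0], "income": [10.0, 20.0, 60.0], "city": ["a", "b", "c"]}
)
score = pd.DataFrame({"age": [30.0, 50.0], "income": [30.0, 80.0], "city": ["a", "c"]})

cols_to_scale = ["age", "income"]

# "Fit" on the training data only: store the per-column mean and
# population standard deviation.
means = train[cols_to_scale].mean()
stds = train[cols_to_scale].std(ddof=0)

# Apply the stored statistics to both datasets so they share one scale.
# Columns outside cols_to_scale are left untouched.
train_scaled = train.copy()
score_scaled = score.copy()
train_scaled[cols_to_scale] = (train[cols_to_scale] - means) / stds
score_scaled[cols_to_scale] = (score[cols_to_scale] - means) / stds
```

The key design point is that `means` and `stds` come from the training data alone and are reused unchanged for the scoring data.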

Stephen Allwright

Label encode multiple columns in a Pandas DataFrame

Label encoding is a feature engineering method for categorical features, where a column with values ['egg','flour','bread'] would be turned into [0,1,2], which is usable by a machine learning model.
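The mapping above can be sketched for several columns at once with `pd.factorize`, which assigns integer codes in order of first appearance and so reproduces the ['egg','flour','bread'] to [0,1,2] example. The DataFrame and column names below are hypothetical, and other encoders may order their codes differently.

```python
import pandas as pd

# Hypothetical data; column names are illustrative.
df = pd.DataFrame(
    {
        "ingredient": ["egg", "flour", "bread", "egg"],
        "store": ["north", "south", "north", "east"],
        "price": [1.5, 0.8, 2.0, 1.6],
    }
)

cols_to_encode = ["ingredient", "store"]

# pd.factorize returns (codes, uniques); only the codes are kept here.
encoded = df.copy()
for col in cols_to_encode:
    encoded[col], _ = pd.factorize(df[col])
```

Columns not listed in `cols_to_encode`, such as `price`, pass through unchanged.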

Stephen Allwright