Pandas
Pandas DataFrame code snippets and real-world examples from data solutions in production. Learn how Pandas DataFrames are used in industry.
Divide two columns in Pandas DataFrame
When working on a data science or machine learning project, it is common to store the data in a Pandas DataFrame. When it comes to feature engineering, however, it can be unclear which options are available for arithmetic operations on columns or rows.
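As a minimal sketch of the kind of column arithmetic described above, the snippet below divides one column by another. The column names and values are hypothetical, and replacing infinite results with NaN is one possible way to handle division by zero, not necessarily the approach the post takes.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"revenue": [100.0, 250.0, 80.0], "units": [4, 10, 0]})

# Element-wise division of two columns with the / operator.
# df["revenue"].div(df["units"]) is the equivalent method form.
df["price_per_unit"] = df["revenue"] / df["units"]

# Dividing a non-zero float by zero yields inf rather than raising,
# so replace infinities with NaN to mark the result as missing.
df["price_per_unit"] = df["price_per_unit"].replace([np.inf, -np.inf], np.nan)
```

The same pattern works for the other arithmetic operators (`+`, `-`, `*`), which also operate element-wise on aligned rows.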
One hot encoding vs label encoding, which is best?
Feature encoding is the process of transforming a categorical variable into a numerical feature that a machine learning model can learn from. It is a required step of feature engineering for most models, because they accept only numerical features as input.
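To make the two encodings concrete, here is a small sketch using only pandas. The `ingredient` column is a hypothetical example, and `pd.factorize` and `pd.get_dummies` are one possible pair of tools for the job, chosen here for brevity.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"ingredient": ["egg", "flour", "bread", "egg"]})

# Label encoding: a single integer column.
# pd.factorize assigns codes in order of first appearance.
codes, uniques = pd.factorize(df["ingredient"])
df["ingredient_label"] = codes

# One-hot encoding: one indicator column per distinct category.
one_hot = pd.get_dummies(df["ingredient"], prefix="ingredient")
```

Label encoding keeps the data narrow but implies an ordering between categories, while one-hot encoding avoids that implied ordering at the cost of one extra column per category.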
Remove outliers from Pandas DataFrame
Outliers are data points in a dataset that are considered to be extreme, false, or not representative of what the data is describing. These outliers can be caused by either incorrect data collection or genuine outlying observations.
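One common way to drop extreme values is the interquartile range (IQR) rule, sketched below. The data is made up, and the 1.5 multiplier is a conventional choice rather than something the post specifies.

```python
import pandas as pd

# Hypothetical data with one obviously extreme value.
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 95]})

# Compute the interquartile range of the column.
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1

# Keep only rows within 1.5 * IQR of the quartiles (bounds are inclusive).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
filtered = df[df["value"].between(lower, upper)]
```

Whether a flagged point should actually be removed depends on whether it is a collection error or a genuine observation, so the filter is best treated as a starting point for inspection.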
Label encode unseen values in a Pandas DataFrame
Label encoding is a popular method for preparing data for machine learning, but once a label encoder has been fitted to a dataset, it raises an error when asked to transform a value that was not seen during fitting.
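A minimal sketch of one workaround follows, assuming a plain dictionary mapping instead of any particular encoder class. The series names and the `UNSEEN` sentinel value of -1 are hypothetical choices made for illustration.

```python
import pandas as pd

# Hypothetical training and scoring data; "milk" never appears in training.
train = pd.Series(["egg", "flour", "bread"])
score = pd.Series(["flour", "milk"])

# "Fit": build a category -> integer mapping from the training data only.
mapping = {cat: code for code, cat in enumerate(sorted(train.unique()))}

# "Transform": categories missing from the mapping become NaN under .map(),
# which is then filled with a reserved sentinel code instead of raising.
UNSEEN = -1
encoded = score.map(mapping).fillna(UNSEEN).astype(int)
```

Reserving a single code for every unseen value keeps scoring from failing, but the model has never seen that code during training, so its predictions for such rows deserve extra scrutiny.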
Scale multiple columns in a Pandas DataFrame
The practical problem with feature scaling in a real-world project is that you often need to scale multiple features, and then apply the same scaling that was fit on your training data to your scoring data later on.
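The fit-on-train, apply-to-score pattern described above can be sketched with plain pandas. The column names and values are hypothetical, and standardization (subtract the mean, divide by the standard deviation) is assumed here as the scaling method.

```python
import pandas as pd

# Hypothetical training and scoring data with the same feature columns.
train = pd.DataFrame({"age": [20.0, 30.0, 40.0], "income": [1000.0, 2000.0, 3000.0]})
score = pd.DataFrame({"age": [30.0, 50.0], "income": [2000.0, 4000.0]})
cols = ["age", "income"]

# "Fit": compute per-column statistics from the training data only.
means = train[cols].mean()
stds = train[cols].std(ddof=0)  # population standard deviation

# "Transform": apply the training statistics to both datasets.
# Subtracting a Series from a DataFrame aligns on column names.
train_scaled = (train[cols] - means) / stds
score_scaled = (score[cols] - means) / stds
```

Storing `means` and `stds` alongside the model is what allows the scoring data to be scaled consistently later, since recomputing them on the scoring data would shift the features relative to what the model was trained on.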
Label encode multiple columns in a Pandas DataFrame
Label encoding is a feature engineering method for categorical features, where a column with values ['egg','flour','bread'] would be turned into [0,1,2], a form that is usable by a machine learning model.
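As a rough sketch of encoding several columns at once, the loop below applies `pd.factorize` to each categorical column and keeps the resulting mappings. The `store` column and the `_encoded` suffix are hypothetical, and this is one possible approach rather than the post's own.

```python
import pandas as pd

# Hypothetical data with two categorical columns.
df = pd.DataFrame({
    "ingredient": ["egg", "flour", "bread"],
    "store": ["north", "south", "north"],
})
cat_cols = ["ingredient", "store"]

# Encode each column and remember its category -> code mapping for later reuse.
mappings = {}
for col in cat_cols:
    # Codes are assigned in order of first appearance.
    codes, uniques = pd.factorize(df[col])
    df[col + "_encoded"] = codes
    mappings[col] = dict(zip(uniques, range(len(uniques))))
```

Because `pd.factorize` numbers categories in order of first appearance, ['egg','flour','bread'] becomes [0,1,2] here. An encoder that sorts categories alphabetically, such as scikit-learn's `LabelEncoder`, would instead produce [1,2,0] for the same input.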