Machine Learning 101 P1: Feature Scaling with Python
There are three main stages in any machine learning process:
- Data pre-processing
- Data modeling
- Evaluating the ML model
Feature scaling is part of the data pre-processing stage.
What is feature scaling in the data pre-processing stage?
It is a method used to normalize or standardize the range of independent variables (features) so that no feature unnecessarily dominates another simply because of the scale of the raw data. Feature scaling is always applied to columns, i.e. each feature is scaled independently.
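To make "applied to columns" concrete, here is a minimal NumPy sketch (the salary/age numbers are made up for illustration): each column is scaled using its own minimum and maximum, so a large-range feature like salary no longer dwarfs a small-range one like age.

```python
import numpy as np

# Two features on very different scales: salary (in dollars) and age (in years).
X = np.array([
    [72000.0, 44.0],
    [48000.0, 27.0],
    [54000.0, 30.0],
])

# Scaling is applied per column (axis=0): each feature gets its own min and max.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)

print(X_scaled)  # every column now lies in [0, 1]
```

After scaling, both features contribute on comparable scales, which matters for distance-based models such as k-NN.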
There are several feature scaling techniques; some of the most widely known include (note that these techniques are not the same):
- Normalization: recommended when most (or all) features follow a normal distribution.
- Standardization (Z-score normalization): works well all the time :)
- Min-max scaling
- Absolute maximum scaling
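The formulas behind these techniques can be sketched in a few lines of NumPy (the sample matrix is arbitrary; in practice you would typically reach for scikit-learn's `StandardScaler`, `MinMaxScaler`, or `MaxAbsScaler` instead):

```python
import numpy as np

X = np.array([[1.0, -10.0],
              [2.0,   0.0],
              [3.0,  10.0]])

# Standardization (Z-score): (x - mean) / std, per column.
# Result has mean 0 and standard deviation 1 in each column.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling: (x - min) / (max - min), per column -> values in [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Absolute maximum scaling: x / max(|x|), per column -> values in [-1, 1].
X_maxabs = X / np.abs(X).max(axis=0)
```

Note that min-max scaling bounds the output to a fixed range, while standardization does not; which is preferable depends on the model and on the presence of outliers.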
Feature scaling before or after splitting the dataset into training and test sets
The idea of splitting the dataset into a training set and a test set is to use the training set to train the ML model and the test set to evaluate it. Feature scaling manipulates the data so that all feature values fall within the same range. If we perform feature scaling before splitting the dataset…
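The usual way to avoid this problem is to split first and fit the scaler on the training set only, then apply those same statistics to the test set. A minimal sketch using standardization (the random data and 8/2 split are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(10, 2))

# Split first: here, the first 8 rows form the training set, the last 2 the test set.
X_train, X_test = X[:8], X[8:]

# Fit the scaler on the training set only...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# ...then apply the *same* training statistics to both sets, so no
# information from the test set leaks into pre-processing.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

With scikit-learn this corresponds to calling `fit_transform` on the training set and only `transform` on the test set.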