Machine learning basics (part 1)

Hang Nguyen
3 min read · Apr 12, 2022

The goal of machine learning is to make predictions and classifications. To achieve this goal, we use Testing Data to decide which method best fits our needs.

Basic terms

The original data set should be cleansed, modified and transformed, and is then normally split into 2 datasets:

  • Training Data: A random subset of the cleansed data
  • Testing Data: An even smaller random subset of the cleansed data, kept separate from the Training Data

The Training Data:Testing Data proportion is normally 2/3:1/3, but it can differ according to one’s own preferences. The two datasets are used as follows:
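To make the split concrete, here is a minimal sketch in plain Python. The helper name train_test_split, the seed, and the 30 stand-in records are assumptions made for illustration; they are not from the original post.

```python
import random

def train_test_split(rows, test_fraction=1/3, seed=0):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    shuffled = list(rows)                    # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)    # a fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

cleansed = list(range(30))                   # stand-in for 30 cleansed records
training_data, testing_data = train_test_split(cleansed)
print(len(training_data), len(testing_data))  # 20 10
```

Every record lands in exactly one of the two lists, so the Testing Data is never seen while the model is being built.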

  • Training Data: Used to build the model.
  • Testing Data: Used to evaluate Machine Learning methods. Regardless of the ML method, we should always check how it performs on the Testing Data.

Don’t be fooled by how well an ML method fits the training data. Fitting the training data well but making poor predictions on new data is called overfitting, and it is the problem at the heart of the Bias-Variance Tradeoff.
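A small sketch can show why a perfect fit on training data proves little. The synthetic data (a straight-line trend plus noise) and the memorizer function (a 1-nearest-neighbour lookup) are assumptions chosen for illustration, not something from the original post.

```python
import random

rng = random.Random(42)

def make_point():
    x = rng.uniform(0, 10)
    return x, 2 * x + rng.gauss(0, 1)        # true trend plus random noise

training_data = [make_point() for _ in range(20)]
testing_data = [make_point() for _ in range(10)]

def memorizer(x):
    """Predict the y value of the closest training x (1-nearest neighbour)."""
    return min(training_data, key=lambda p: abs(p[0] - x))[1]

def mean_squared_error(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

train_error = mean_squared_error(memorizer, training_data)
test_error = mean_squared_error(memorizer, testing_data)
print("training error:", train_error)
print("testing error: ", test_error)
```

The memorizer reproduces every training point exactly, so its training error is zero, yet it still makes mistakes on the Testing Data. Only the second number says anything about how it will behave on new data.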

Steps in ML

Suppose now that we already have the training data. We need to do 2 things:

  • Choose some suitable methods
  • Validate the chosen methods, also known as “testing the algorithm”
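The two steps above can be sketched as follows. The two candidate methods (predicting the mean, and a least-squares straight line) and the tiny hand-made datasets are assumptions picked only to keep the example short.

```python
def fit_mean(data):
    """Candidate 1: always predict the average y seen in training."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Candidate 2: least-squares straight line through the training data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_squared_error(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

training_data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
testing_data = [(5, 10.1), (6, 11.9)]

# Step 1: choose some candidate methods. Step 2: validate each on Testing Data.
candidates = {"mean": fit_mean, "line": fit_line}
test_errors = {name: mean_squared_error(fit(training_data), testing_data)
               for name, fit in candidates.items()}
best_method = min(test_errors, key=test_errors.get)
print(best_method)  # line
```

Each method is built from the Training Data only, and the choice between them is made purely on Testing Data error.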

OK, done with the theory part! How about the practical side of the problem? When we first split the data set into training data and testing data, assume…
