Machine learning basics (part 6): Classification

Hang Nguyen
9 min read · May 13, 2022

In this part, we take a concise look at the central tasks and concepts of machine learning, classification in particular. A task here is whatever problem machine learning is meant to improve performance on. When the task is classification, we need to learn an appropriate classifier from training data. There are several types of classifiers, for instance Bayesian classifiers and distance-based classifiers such as nearest neighbor search, to name a few. Generally, we call them models.

BASIC CONCEPTS

Label

To classify, the labels of the data cases must be known so that computational models can be built on a training set and validated with a test set. (In tasks other than classification, labels are typically unknown and can be generated by, for instance, clustering.)
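As a rough illustration of building a model on a training set and validating it with a test set, here is a minimal sketch of a nearest neighbor classifier in plain Python. The toy data points, the labels "a" and "b", and the helper names `predict_1nn` and `accuracy` are all made up for this example.

```python
import math

# Hypothetical toy data: each case is (features, label). Not from a real data set.
train_set = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.3), "a"),
    ((4.0, 4.2), "b"),
    ((4.1, 3.9), "b"),
    ((3.8, 4.0), "b"),
]
test_set = [
    ((1.1, 1.1), "a"),
    ((3.9, 4.1), "b"),
    ((0.8, 0.9), "a"),
    ((4.3, 4.0), "b"),
]


def predict_1nn(train, x):
    """Return the label of the training case closest to x (Euclidean distance)."""
    _, nearest_label = min(train, key=lambda case: math.dist(case[0], x))
    return nearest_label


def accuracy(train, test):
    """Fraction of test cases whose predicted label equals the known label."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)


test_accuracy = accuracy(train_set, test_set)
print(test_accuracy)
```

The known labels are used twice: the labels in the training set define the model, and the labels in the test set are only compared against the predictions to measure how well the model does on cases it has not seen.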

Noise

Data cases are represented with variables (attributes or features). Noise may distort variable values. It arises from erroneous measurements or random influences such as measuring inaccuracy. Missing values can also appear. Notwithstanding these issues that lower data quality, classification is still possible provided that the data set has not been heavily distorted. One consequence of noisy data is that it is generally not advisable to try to match the training data exactly, as this may lead to overfitting the noise.
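To make the overfitting point concrete, the sketch below compares a 1-nearest-neighbor classifier, which matches the training data exactly, with a 3-nearest-neighbor majority vote. The one-dimensional data, the deliberately wrong label at x = 2.0, and the helper names `predict_knn` and `accuracy` are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical 1-D toy data. The label at x = 2.0 is deliberately wrong,
# standing in for a noisy measurement.
train = [(0.0, "a"), (1.0, "a"), (2.0, "b"), (3.0, "a"), (4.0, "a"),
         (6.0, "b"), (7.0, "b"), (8.0, "b"), (9.0, "b"), (10.0, "b")]
# Test cases with correct (noise-free) labels.
test = [(1.9, "a"), (2.1, "a"), (3.5, "a"), (6.5, "b"), (8.5, "b")]


def predict_knn(train, x, k):
    """Majority label among the k training cases closest to x."""
    neighbours = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, cases, k):
    """Fraction of cases whose predicted label equals the given label."""
    return sum(predict_knn(train, x, k) == y for x, y in cases) / len(cases)


for k in (1, 3):
    print(k, accuracy(train, train, k), accuracy(train, test, k))
```

With this toy data, k = 1 reproduces every training label, including the wrong one, and then misclassifies the two test cases next to it. With k = 3 the model no longer agrees with the noisy training label, yet it classifies all five test cases correctly.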

BINARY CLASSIFICATION

