Machine learning basics (part 6): Classification

9 min read · May 13, 2022

In this part, we take a concise look at the central tasks and concepts in machine learning, classification in particular. A task here refers to whatever it is that machine learning is intended to improve performance on. When the task is classification, we need to learn an appropriate classifier from training data. There are several types of classifiers, for instance Bayesian classifiers and distance-based classifiers such as nearest neighbor search, to name a few. In general, we call them models.

BASIC CONCEPTS

Label

To classify, the labels of the data cases must be known, so that computational models can be built on the basis of a training set and validated with a test set. (In tasks other than classification, labels are typically unknown and can be generated by, for instance, clustering.)
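To make the idea of a labeled training set and test set concrete, here is a minimal sketch in plain Python. The toy data, the helper names (train_test_split, nearest_neighbor_predict, accuracy) and the 25% test fraction are all assumptions made for illustration, and the nearest neighbor classifier is just one simple choice of model.

```python
import math
import random

def train_test_split(cases, labels, test_fraction=0.25, seed=0):
    """Shuffle labeled cases and split them into a training and a test set."""
    indices = list(range(len(cases)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(cases[i], labels[i]) for i in train_idx]
    test = [(cases[i], labels[i]) for i in test_idx]
    return train, test

def nearest_neighbor_predict(train, x):
    """Return the label of the training case closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label

def accuracy(train, test):
    """Fraction of test cases whose predicted label equals the known label."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical toy data: two well-separated groups of 2D points on small grids.
group_a = [(i * 0.5, j * 0.5) for i in range(4) for j in range(4)]
group_b = [(x + 5.0, y + 5.0) for x, y in group_a]
cases = group_a + group_b
labels = ["a"] * len(group_a) + ["b"] * len(group_b)

# Build the model on the training set, validate it on the held-out test set.
train, test = train_test_split(cases, labels)
test_accuracy = accuracy(train, test)
print(f"train size={len(train)}, test size={len(test)}, "
      f"test accuracy={test_accuracy:.2f}")
```

The known labels are used twice: in the training set they define the model, and in the test set they are only compared with the predictions, which is what lets us validate the classifier on cases it has not seen.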

Noise

Data cases are represented with variables (attributes or features). Noise may distort variable values. It arises from erroneous measurements or random influences such as measurement inaccuracy. Missing values can also appear. Notwithstanding these factors that lower data quality, classification is still possible provided that the data set has not been heavily distorted. One consequence of noisy data is that it is generally not advisable to attempt to match the training data exactly, as this may lead to overfitting the noise.
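The effect of matching noisy training data exactly can be sketched with a small experiment. Everything here is an assumption for illustration: a one-dimensional toy data set, a class boundary at 9.5, two deliberately flipped labels standing in for noise, and a k-nearest-neighbor classifier with majority voting as the model.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training cases nearest to x (1D distance)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of cases in data that the k-NN model built on train gets right."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

def true_label(x):
    # Assumed noise-free rule: class "a" below 9.5, class "b" above.
    return "a" if x < 9.5 else "b"

def flip(label):
    return "b" if label == "a" else "a"

# Training set on the integers 0..19, with two labels flipped to simulate noise.
noisy_positions = {4.0, 15.0}
noisy_train = []
for i in range(20):
    x = float(i)
    y = true_label(x)
    noisy_train.append((x, flip(y) if x in noisy_positions else y))

# Noise-free test set at positions between the training cases.
test = [(i + 0.4, true_label(i + 0.4)) for i in range(20)]

results = {}
for k in (1, 3):
    results[k] = {
        "train": accuracy(noisy_train, noisy_train, k),
        "test": accuracy(noisy_train, test, k),
    }
    print(f"k={k}: train accuracy={results[k]['train']:.2f}, "
          f"test accuracy={results[k]['test']:.2f}")
```

With k=1 every training case is its own nearest neighbor, so the model reproduces the training labels exactly, flipped ones included, and repeats those errors on nearby test cases. With k=3 the majority vote outvotes each isolated flipped label, so the model disagrees with the noisy training labels but agrees better with the noise-free test labels.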

Written by Hang Nguyen

No responses yet