Machine Learning basics (part 2)

Hang Nguyen
3 min read · Apr 12, 2022

Continuing from part 1, this post discusses a few simple models, such as Linear Regression, along with some more basic concepts.

Basic terms

Overfit

The model fits the training data well but performs poorly on the testing data.

Underfit

The model does not fit the training data well, and therefore also performs poorly on the testing data.

Sum of squares

around the mean

Also written as SS(mean). It is calculated by taking the distance from the mean to each data point, squaring it, and adding those squares together.
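As a minimal sketch of this calculation, the snippet below computes SS(mean) in plain Python. The function name `ss_mean` and the sample values are made up for illustration.

```python
def ss_mean(values):
    """Sum of squared distances from each value to the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical sample data: the mean is 3.0.
y = [1.0, 2.0, 3.0, 4.0, 5.0]

# Squared distances are 4, 1, 0, 1, 4, which add up to 10.
print(ss_mean(y))  # 10.0
```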

around the line

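Also written as SS(fit). It is calculated like SS(mean), except that the mean of the training data is replaced by the values predicted by the fitted line: we take the distance from each data point to the line, square it, and add those squares together.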
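A minimal sketch of SS(fit), assuming the fitted line is an ordinary least-squares line through one input variable. The helper names `fit_line` and `ss_fit` and the sample data are illustrative assumptions, not part of the original post.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ss_fit(xs, ys, slope, intercept):
    """Sum of squared distances from each data point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical training data.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))          # 0.8 0.5
print(round(ss_fit(xs, ys, slope, intercept), 2))    # 1.8
```

For this data, the sum of squares around the mean of `ys` is 5.0, so the fitted line sits closer to the points than the mean does.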

Bias

Describes an ML method's inability to capture the true relationship between the chosen variables. The bigger the bias, the less well the model captures that relationship (the worse the model fits the training data).

To see how well a model fits the training data, we calculate the sum of squares: we measure the distances from the fitted line to the data points, square them (so that negative distances do not cancel out positive ones), and add them up.

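Formula:

$$SS(\text{fit}) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the observed value of the i-th data point, $\hat{y}_i$ is the value the fitted line predicts for it, and $n$ is the number of data points.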

Variance

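A number that shows how spread out the testing data is around the fitted line. When the variance is high, the model fits the training data much better than it fits the testing data.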
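To make overfit, underfit, bias and variance more concrete, here is a rough sketch that fits three toy models on a small training set and measures the sum of squares on both the training and the testing data. Everything in it is an assumption chosen for illustration: the data is made up, and the three models (a constant prediction, a least-squares line, and a lookup that memorizes the nearest training point) are stand-ins for an underfit, a reasonable, and an overfit model.

```python
def sum_of_squares(model, xs, ys):
    """Sum of squared distances between the data and the model's predictions."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


def fit_constant(xs, ys):
    """Underfit stand-in: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares line through the training data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Overfit stand-in: return the target of the nearest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


# Hypothetical data: the training targets follow y = x with noise of +1 / -1,
# the testing targets lie exactly on y = x.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 0, 3, 2, 5, 4]
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.4, 1.4, 2.4, 3.4, 4.4]

results = {}
for name, fit in [("constant", fit_constant),
                  ("line", fit_line),
                  ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (sum_of_squares(model, train_x, train_y),
                     sum_of_squares(model, test_x, test_y))
    print(f"{name}: SS(train)={results[name][0]:.2f}, SS(test)={results[name][1]:.2f}")
```

With this toy data, the constant model fits neither set well (underfit, high bias). The memorizer has zero error on the training data but a much larger error on the testing data than the line (overfit, high variance). The line has a moderate error on the training data and the smallest error on the testing data.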
