Machine Learning basics (part 2)
Continuing from part 1, here we discuss a few simple models such as Linear Regression, along with some more basic concepts.
Basic terms
Overfit
The model fits the training data well but performs poorly on the testing data.
Underfit
The model does not fit the training data well, and therefore also performs poorly on the testing data.
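To make the two terms concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data values are invented (points near the line y = 2x + 1 with a little noise), and the three "models" (predict the mean, a least-squares straight line, and a polynomial forced through every training point) are just convenient stand-ins for an underfit, a reasonable fit, and an overfit.

```python
# Invented training data: y = 2x + 1 with small alternating noise (+0.5, -0.5, ...).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.5, 2.5, 5.5, 6.5, 9.5, 10.5]

# Invented testing data: held-out points on the underlying line y = 2x + 1.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2.0, 4.0, 6.0, 8.0, 10.0]


def fit_mean(xs, ys):
    """Underfit candidate: ignore x and always predict the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line (closed-form slope and intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Overfit candidate: Lagrange polynomial passing through every training point."""

    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return predict


def sum_of_squares(predict, xs, ys):
    """Sum of squared distances between the data and the model's predictions."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


models = {
    "mean only": fit_mean(train_x, train_y),
    "straight line": fit_line(train_x, train_y),
    "interpolating polynomial": fit_interpolating_polynomial(train_x, train_y),
}

# For each model: (error on training data, error on testing data).
results = {
    name: (
        sum_of_squares(predict, train_x, train_y),
        sum_of_squares(predict, test_x, test_y),
    )
    for name, predict in models.items()
}

for name, (train_ss, test_ss) in results.items():
    print(f"{name}: train SS = {train_ss:.3f}, test SS = {test_ss:.3f}")
```

On this toy data the mean-only model has a large error on both sets (underfit), while the interpolating polynomial has zero error on the training data but a larger error on the testing data than the straight line (overfit).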
Sum of squares
around the mean
Written as SS(mean). Calculated by measuring the distance from the mean to each data point, squaring each distance, and adding those squares together.
around the line
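Written as SS(fit). Calculated like SS(mean), except that the distances are measured from the fitted line instead of from the mean: take the distance from each data point to the value the fitted line predicts for it, square it, and add those squares together.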
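Both sums can be computed in a few lines. The sketch below uses plain Python and a small invented data set; the least-squares line is an assumption about how the "fitted line" is obtained, and the names `ss_mean` and `ss_fit` are only illustrative.

```python
# Invented data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# SS(mean): squared distances from each data point to the mean of y.
ss_mean = sum((y - mean_y) ** 2 for y in ys)

# Fit a least-squares line y = slope * x + intercept (closed-form solution).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# SS(fit): squared distances from each data point to the fitted line.
ss_fit = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

print(f"SS(mean) = {ss_mean:.2f}")
print(f"SS(fit)  = {ss_fit:.2f}")
```

For this data SS(mean) is 6 and SS(fit) is about 2.4, so the fitted line sits closer to the points than the mean does.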
Bias
Describes an ML method's inability to capture the true relationship between the chosen variables. The larger the bias, the worse the model captures that relationship (the worse it fits the training data).
We calculate a sum of squares to see how well a model fits the training data: measure the distances from the fitted line to the data points, square them (so that positive and negative distances do not cancel out), and add them up.
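Formula: $SS(fit) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the observed value of data point $i$ and $\hat{y}_i$ is the value the fitted line predicts for it.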
Variance
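A number that shows how spread out the testing data are around the fitted line, that is, how much worse the fit is on testing data than on training data. A model with high variance fits the training data well, but its sum of squares on the testing data is large.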