Bias vs Variance in ML

Bias and variance often seem poorly explained, mostly because of unclear math notation.

The video gives a more precise meaning of bias and variance.

Expected error of algorithm

The goal of the algorithm is to reduce the total error when we make predictions (generalization).

Thus we need to calculate the expected error of the algorithm over the prediction data set.

In math: the expected error of an algorithm is taken over (x, y), which are drawn from the samples (for testing or prediction), and over D, the training set.

The term E[f^(x, D)] is the expected fitting function/classifier: for one training set D we train the model and get one f^(x); for another training set D we get another f^(x). Averaging over all those training sets D gives E[f^(x, D)].
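As a sketch of these two quantities in symbols, writing f^ as \hat f and assuming squared-error loss (the loss is an assumption here, and the name \bar f for the averaged predictor is introduced only for this illustration):

```latex
\begin{align*}
\mathrm{Err} &= \mathbb{E}_{(x,y)}\,\mathbb{E}_{D}\!\left[\bigl(\hat f(x, D) - y\bigr)^2\right] \\
\bar f(x) &= \mathbb{E}_{D}\!\left[\hat f(x, D)\right]
\end{align*}
```

The first line averages the squared error over both the prediction samples and the training sets; the second line is the "expected classifier" obtained by averaging the trained models over training sets.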

Variance

The variance measures the difference between one particular classifier/regression function, trained on one particular D, and the expected classifier/regression function.

Low variance means we get almost the same regression function even if the training set is different. For example: some linear functions.
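A minimal sketch of how the variance could be estimated by simulation. The sine target, the noise level, the training set size, and the two polynomial degrees are all assumptions picked for illustration, not anything from the video:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical "real" regression function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def draw_training_set(n=20, noise_std=0.3):
    # One training set D: n noisy samples of true_fn at random inputs.
    x = rng.uniform(0.0, 1.0, size=n)
    y = true_fn(x) + rng.normal(0.0, noise_std, size=n)
    return x, y

def estimate_variance(degree, n_sets=200):
    # Train one polynomial per training set and collect its predictions.
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        x, y = draw_training_set()
        preds[i] = Polynomial.fit(x, y, degree)(x_test)
    # Spread of each trained model around the average model E[f^(x, D)].
    mean_pred = preds.mean(axis=0)
    return float(np.mean((preds - mean_pred) ** 2))

variance_linear = estimate_variance(degree=1)
variance_poly9 = estimate_variance(degree=9)
print(f"variance, degree 1: {variance_linear:.4f}")
print(f"variance, degree 9: {variance_poly9:.4f}")
```

In this setup the linear fit changes little from one training set to the next, while the degree-9 fit swings much more, which is the low-variance versus high-variance contrast described above.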

Bias

The bias is the difference between the expected classifier/regression function and the real one. The expected one is the best this algorithm can do, so the bias captures the limitation of this algorithm.

Thus it is the bias of the model/algorithm itself.

High bias means, for example, using a linear function to fit a curve: no matter what we do, the bias is still there.
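A small sketch of the "linear function to fit a curve" case. The sine curve, the noise level, and the two training set sizes are assumptions for illustration. The point is only that giving the linear model far more data shrinks its variance but leaves its bias roughly where it was:

```python
import numpy as np

rng = np.random.default_rng(1)

def true_fn(x):
    # Hypothetical curved "real" function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def linear_bias_and_variance(n_train, n_sets=200, noise_std=0.3):
    x_test = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        # A fresh training set D of size n_train.
        x = rng.uniform(0.0, 1.0, size=n_train)
        y = true_fn(x) + rng.normal(0.0, noise_std, size=n_train)
        slope, intercept = np.polyfit(x, y, deg=1)
        preds[i] = slope * x_test + intercept
    # Average line over all training sets, i.e. an estimate of E[f^(x, D)].
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    variance = float(np.mean((preds - mean_pred) ** 2))
    return bias_sq, variance

bias_sq_small, variance_small = linear_bias_and_variance(n_train=20)
bias_sq_large, variance_large = linear_bias_and_variance(n_train=500)
print(f"n=20:  bias^2={bias_sq_small:.3f}  variance={variance_small:.4f}")
print(f"n=500: bias^2={bias_sq_large:.3f}  variance={variance_large:.4f}")
```

The squared bias stays large for both sizes because even the averaged line cannot follow the curve; that gap is the limitation of the algorithm, not of the data.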

The irreducible error

The irreducible error is the noise in the data itself, which no choice of algorithm can remove. Since all three terms are non-negative, the irreducible error forms a lower bound on the expected error on unseen samples.
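For reference, the three terms can be sketched with the standard squared-error decomposition. This assumes y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2 independent of D; the symbols f, \bar f, \varepsilon and \sigma are notation introduced here, with \bar f(x) = E_D[\hat f(x, D)]:

```latex
\begin{align*}
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(\hat f(x, D) - y\bigr)^2\right]
  &= \underbrace{\mathbb{E}_{D}\!\left[\bigl(\hat f(x, D) - \bar f(x)\bigr)^2\right]}_{\text{variance}}
   + \underbrace{\bigl(\bar f(x) - f(x)\bigr)^2}_{\text{bias}^2}
   + \underbrace{\sigma^2}_{\text{irreducible error}} \\
  &\ge \sigma^2
\end{align*}
```

The cross terms vanish because \varepsilon has zero mean and is independent of D, and because \hat f(x, D) - \bar f(x) has zero mean over D.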

Key to understanding

The key to understanding is to think of using multiple (more than 2, for better understanding) different training data sets, and to ask what the bias and variance errors are over the prediction set.

For bias: if it is low, the classifier/regression function averaged over the training sets is close to the real one.

For variance: if it is high, different training sets give quite different classifiers/regression functions; if it is low, we get almost the same one no matter which training set is used.

General graph:

For each algorithm, we can calculate the total error, bias, and variance (over all training sets D and the prediction set (x, y)), which gives the figure above.
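A rough sketch of that calculation, treating each polynomial degree as one "algorithm". The sine target, the noise level, and the degrees are assumptions for illustration. One further simplifying assumption: the training inputs sit on a fixed grid and only the label noise is redrawn per training set, which keeps the estimates stable:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)
NOISE_STD = 0.3  # assumed label noise; its square is the irreducible error

def true_fn(x):
    # Hypothetical "real" function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def decompose(degree, n_train=25, n_sets=300):
    x_train = np.linspace(0.0, 1.0, n_train)
    x_test = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_sets, x_test.size))
    sq_err = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        # New training set D: same inputs, fresh noisy labels.
        y_train = true_fn(x_train) + rng.normal(0.0, NOISE_STD, n_train)
        preds[i] = Polynomial.fit(x_train, y_train, degree)(x_test)
        # Noisy prediction-set labels for measuring the total error.
        y_test = true_fn(x_test) + rng.normal(0.0, NOISE_STD, x_test.size)
        sq_err[i] = (preds[i] - y_test) ** 2
    mean_pred = preds.mean(axis=0)
    return {
        "bias_sq": float(np.mean((mean_pred - true_fn(x_test)) ** 2)),
        "variance": float(np.mean((preds - mean_pred) ** 2)),
        "total": float(sq_err.mean()),
    }

results = {degree: decompose(degree) for degree in (1, 3, 12)}
for degree, r in results.items():
    print(f"degree {degree:2d}: bias^2={r['bias_sq']:.4f} "
          f"variance={r['variance']:.4f} total={r['total']:.4f}")
```

Plotting these three numbers against the degree would give a curve of the same general kind as the graph: bias falls and variance rises as the model gets more flexible, and the total error is lowest somewhere in between.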

More details at: