Bias-Variance Decomposition

Suppose that Y = f(X) + ε, where the noise term ε is independent of X with E(ε) = 0 and Var(ε) = σ².

Let g(X;θ) be a trained/fitted model used to predict Y based on X (i.e., ideally g ≈ f), where θ represents the vector of trainable model parameters.

Consider the expected squared prediction error at a new input point x, and write y for the response Y given X = x. The expectation below is taken over both the noise in y and the randomness in the fitted parameters θ (i.e., over training sets). Then

Sq.Err. = E([y - g(x;θ)]²)

= σ² + [E(y) - E(g(x;θ))]² + E([g(x;θ) - E(g(x;θ))]²)

= "irreducible error" + Bias²(g(x;θ)) + Variance(g(x;θ))