Bias-Variance Decomposition

Suppose that Y = f(X) + ε with noise term ε having Var(ε) = σ².
Let g(X;θ) be a trained/fitted model used to predict Y based on X (i.e., ideally g ≈ f), where θ represents the vector of trainable model parameters.
Consider the expected squared prediction error for a new input point x, and denote y = (Y|X=x). Then

Sq.Err. = E((y - g(x;θ)²)
= σ² + [E(y) - E(g(x;θ)]² + E([g(x;θ) - E(g(x;θ))]²)
= "irreducible error" + Bias²(g(x;θ)) + Variance(g(x;θ))

The social network of the future: No ads, no corporate surveillance, ethical design, and decentralization! Own your data with Mastodon!