Reinforcement Learning/Statistical estimators: Bias and Variance

Statistical estimator
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data.

Suppose we have a statistical model, parameterized by a real number θ, giving rise to a probability distribution for observed data, $$P_\theta(x) = P(x\mid\theta)$$.

Assume statistic $$\hat\theta$$ serves as an estimator of θ based on any observed data $$x$$. That is, we assume that our data follow some distribution $$P(x\mid\theta)$$ with unknown value of θ (in other words, θ is a fixed constant that is part of this distribution, but is unknown). We construct some estimator $$\hat\theta$$ that maps observed data to values that we hope are close to θ.
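As a concrete illustration (a sketch with invented values, not part of the original text): if the data are drawn from a normal distribution with unknown mean θ, the sample mean is one possible estimator $$\hat\theta$$.

```python
import random

# Hypothetical setup: x ~ Normal(theta, 1) with theta unknown.
# The sample mean is one rule (estimator) mapping data -> estimate of theta.
def sample_mean(xs):
    """Estimator: maps observed data to an estimate of theta."""
    return sum(xs) / len(xs)

random.seed(0)
theta = 2.0                                        # true but "unknown" parameter
data = [random.gauss(theta, 1.0) for _ in range(1000)]
theta_hat = sample_mean(data)                      # estimate computed from data
```

With 1000 observations the estimate lands close to the true θ, but it is a function of the random data, so it varies from sample to sample.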

Bias
The bias of an estimator $$\hat\theta$$ relative to $$\theta$$ is defined as
$$ \operatorname{Bias}_\theta[\,\hat\theta\,] = \operatorname{E}_{x\mid\theta}[\,\hat{\theta}\,]-\theta$$

where $$\operatorname{E}_{x\mid\theta}$$ denotes expected value over the distribution $$P(x\mid\theta)$$, i.e. averaging over all possible observations $$x$$.

A biased estimator exhibits a systematic difference between the estimate ($$\hat{\theta}$$) and the true value of the parameter ($$\theta$$). In many cases, however, the bias shrinks as the number of observations grows, so the estimator is asymptotically unbiased and remains useful in practice.

An estimator is said to be unbiased if its bias is equal to zero for all values of parameter θ.
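A standard example of a biased estimator is the plug-in variance estimator that divides by $$n$$ instead of $$n-1$$; its expectation is $$\tfrac{n-1}{n}\sigma^2$$, so its bias is $$-\sigma^2/n$$. The Monte Carlo sketch below (values chosen for illustration) approximates $$\operatorname{E}_{x\mid\theta}[\hat\theta]$$ by averaging over many simulated data sets:

```python
import random

random.seed(1)
sigma2 = 4.0      # true variance (the parameter theta being estimated)
n = 5             # small sample size, so the bias is clearly visible

def var_mle(xs):
    """Plug-in variance estimator (divides by n): biased downward."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Approximate E[theta_hat] by averaging the estimator over many data sets.
trials = 200_000
mean_est = sum(var_mle([random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
               for _ in range(trials)) / trials

bias = mean_est - sigma2   # theory predicts -sigma2 / n = -0.8
```

The empirical bias comes out close to the theoretical value $$-\sigma^2/n = -0.8$$, and it would shrink toward zero as $$n$$ grows.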

Variance
An estimator with high variance is one whose estimate ($$\hat{\theta}$$) is very sensitive to the particular observed data ($$x$$): different samples drawn from the same distribution yield very different estimates.

The variance of an estimator $$\hat\theta$$ is

$$\text{Var}(\hat{\theta}) = \mathbb{E}_{x|\theta} \big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]$$
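This spread can be measured empirically by recomputing the estimator on many independent data sets. A small sketch (assumed setup: sample mean of Normal(θ, 1) data, where the true variance of the estimator is $$1/n$$):

```python
import random

random.seed(2)

def sample_mean(xs):
    return sum(xs) / len(xs)

def estimator_variance(n, trials=50_000):
    """Variance of the estimator: spread of theta_hat across data sets
    drawn from the same distribution P(x | theta)."""
    ests = [sample_mean([random.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(trials)]
    m = sum(ests) / trials
    return sum((e - m) ** 2 for e in ests) / trials

# For the sample mean of Normal(theta, 1) data, Var(theta_hat) = 1/n.
v5 = estimator_variance(5)     # ~ 0.2
v50 = estimator_variance(50)   # ~ 0.02
```

Note that the variance of this estimator falls as $$1/n$$: more data makes the estimate less sensitive to any one sample.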

Mean squared error
The mean squared error (MSE) of an estimator $$\hat\theta$$ is $$\textrm{MSE}(\hat{\theta}) = \mathbb{E}_{x|\theta}\big[(\hat{\theta} - \theta)^2\big]$$, which decomposes into variance plus squared bias:

$$\textrm{MSE}(\hat{\theta}) = \textrm{Var}(\hat{\theta}) + \textrm{Bias}_\theta(\hat{\theta})^2$$
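The decomposition can be checked numerically. Reusing the biased plug-in variance estimator as an illustrative example (values assumed, not from the original text), the empirical MSE equals the empirical variance plus the squared empirical bias, up to floating-point rounding:

```python
import random

random.seed(3)
sigma2, n, trials = 4.0, 5, 100_000   # assumed illustrative values

def var_mle(xs):
    """Biased plug-in variance estimator (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# One estimate per simulated data set.
ests = [var_mle([random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
        for _ in range(trials)]

mean_est = sum(ests) / trials
bias = mean_est - sigma2                                  # empirical bias
var = sum((e - mean_est) ** 2 for e in ests) / trials     # empirical variance
mse = sum((e - sigma2) ** 2 for e in ests) / trials       # empirical MSE

# MSE = Var + Bias^2 holds exactly for these empirical quantities.
```

This also shows the bias-variance trade-off concretely: an estimator with some bias can still achieve a low MSE if its variance is small.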