In his latest entry Steffen (Mit dem Kopf voran) posted the following quote from Enrico Fermi:
"I remember my friend Johnny von Neumann used to say, 'with four parameters I can fit an elephant and with five I can make him wiggle his trunk.'" (A meeting with Enrico Fermi, Nature 427, 297; 2004)
I know that there exists a fool-proof method for sculpting an elephant (get a huge block of marble and chip away everything that doesn't look like an elephant), but fitting an elephant with four parameters is an impossible task. Actually, the question "How many parameters does it take to fit an elephant?" was already answered by Wel in 1975: he started with an idealized drawing (A) defined by 36 points and showed that one could be satisfied with the fit of a 30-term model.
[Figure: Wel's elephant; panel (A) is the idealized 36-point drawing]
The point of the elephant story* is what every student is told in the first econometrics lecture: adding explanatory variables (and therefore parameters) to a linear regression model can never increase the residual sum of squares. As a rule: the more explanatory variables, the better the in-sample fit. The problem with adding more and more regressors is that the estimators become very imprecise: overfitted models (models with too many parameters) have estimated (and actual) sampling variances that are needlessly large, so the model will perform poorly on new data.
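The first claim is easy to check numerically. The sketch below uses made-up simulated data (sample size, coefficients and seed are illustrative assumptions): y depends only on the first two regressors, the other eight are pure noise, and we fit nested OLS models with an intercept plus the first k regressors. The residual sum of squares never goes up, even when the added columns are irrelevant.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
n, k_max = 40, 10

# Hypothetical data: y depends only on the first two regressors;
# the other eight columns are pure-noise "explanatory variables".
Z = rng.normal(size=(n, k_max))
y = 1.0 + 2.0 * Z[:, 0] - 1.0 * Z[:, 1] + rng.normal(size=n)

# Nested models: intercept plus the first k regressors, k = 0, ..., k_max
rss_path = []
for k in range(k_max + 1):
    X = np.column_stack([np.ones(n), Z[:, :k]])
    rss_path.append(rss(X, y))

for k, value in enumerate(rss_path):
    print(f"{k:2d} regressors: RSS = {value:8.3f}")
```

The RSS drop after the first two regressors is real signal; every later drop comes from fitting noise, which is exactly the in-sample improvement that does not carry over to new data.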

For a linear model containing predictors $x_1, x_2, \dots, x_k$ with coefficient estimators $b_0, b_1, \dots, b_k$, suppose the regressor $x_{k+1}$ is added to the model, producing new coefficient estimators $b^*_0, b^*_1, \dots, b^*_{k+1}$. Then for $i = 0, \dots, k$:

$$\operatorname{Var}(b^*_i) \ge \operatorname{Var}(b_i)$$
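A minimal numerical check, assuming homoskedastic errors with a common variance σ² and conditioning on the regressors, so that Var(b) = σ²(X'X)⁻¹. The design below (three random regressors plus an extra regressor correlated with x_1) is hypothetical; we compare the diagonals of σ²(X'X)⁻¹ before and after adding the extra column.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
sigma2 = 1.0  # assumed common error variance (homoskedastic errors)

# Hypothetical design: intercept + k regressors, then an extra regressor
# x_{k+1} that is correlated with x_1
X_small = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
x_new = 0.8 * X_small[:, 1] + 0.6 * rng.normal(size=n)
X_big = np.column_stack([X_small, x_new])

# Conditional on X, Var(b) = sigma^2 (X'X)^{-1}; the diagonal holds Var(b_i)
var_b = sigma2 * np.diag(np.linalg.inv(X_small.T @ X_small))
var_b_star = sigma2 * np.diag(np.linalg.inv(X_big.T @ X_big))

for i in range(k + 1):
    print(f"i = {i}: Var(b_{i}) = {var_b[i]:.4f}, Var(b*_{i}) = {var_b_star[i]:.4f}")
```

The increase is largest for b_1, because the new regressor is strongly correlated with x_1; for a new regressor that is orthogonal to the existing ones the variances would stay the same, which is why the inequality is not strict in general.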

Ergo: if a model is made more complex, the variance of its predictions increases (or at least cannot decrease), since the parameters are estimated less precisely.
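One way to put a number on this, sketched under the same homoskedastic OLS assumptions: the fitted values are ŷ = Hy with the hat matrix H = X(X'X)⁻¹X', so Var(ŷ) = σ²H, and since the trace of H equals the number of estimated coefficients p, the average prediction variance over the n observed design points is σ²p/n. The regressors below are simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 30, 1.0          # assumed sample size and error variance
Z = rng.normal(size=(n, 8))  # hypothetical regressors

avg_pred_var = []
for k in range(Z.shape[1] + 1):
    X = np.column_stack([np.ones(n), Z[:, :k]])  # p = k + 1 coefficients
    # Hat matrix H = X (X'X)^{-1} X', so y_hat = H y and Var(y_hat) = sigma^2 H
    H = X @ np.linalg.solve(X.T @ X, X.T)
    avg_pred_var.append(sigma2 * np.trace(H) / n)

for k, v in enumerate(avg_pred_var):
    print(f"p = {k + 1}: average Var(y_hat_i) = {v:.4f}")
```

Each extra parameter adds σ²/n to the average prediction variance, whether or not the regressor belongs in the model.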

On the other hand, if important regressors are excluded, the fitted values ("y hat") will systematically deviate from the observed values (y), and the residuals (which absorb the omitted variable) will be unnecessarily large. If the assumed model is wrong because an important predictor was left out, the bias can be reduced by including that predictor.
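The classic omitted-variable bias formula makes this concrete: if the true model is y = β0 + β1·x1 + β2·x2 + noise and x2 is dropped, then conditional on the regressors E[b1] = β1 + β2·δ1, where δ1 is the slope from regressing x2 on the included regressors. Because OLS is linear in y, the expectation can be computed exactly by fitting the noise-free mean of y. All coefficients and the correlation between x1 and x2 below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
beta0, beta1, beta2 = 1.0, 2.0, 1.5  # assumed true coefficients

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)  # important predictor, correlated with x1
mean_y = beta0 + beta1 * x1 + beta2 * x2  # E[y | x1, x2]; the noise has mean zero

# "Short" model that omits x2
X_short = np.column_stack([np.ones(n), x1])

# OLS is linear in y, so E[b | X] is the OLS fit to E[y | X]
expected_b = np.linalg.lstsq(X_short, mean_y, rcond=None)[0]

# Auxiliary regression of the omitted x2 on the included regressors
delta = np.linalg.lstsq(X_short, x2, rcond=None)[0]

print("E[b_1] in the short model:", round(expected_b[1], 4))
print("beta1 + beta2 * delta_1:  ", round(beta1 + beta2 * delta[1], 4))
print("bias of b_1:              ", round(expected_b[1] - beta1, 4))
```

The bias does not shrink as n grows; only adding x2 to the model removes it.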

The figure below shows the trade-off between squared bias (solid line) and variance, plotted against the number of estimable parameters in the model. All model selection methods implicitly rely on some notion of this trade-off.
[Figure: squared bias (solid line) and variance versus the number of estimable parameters]
Keep in mind that the best approximating model need not lie exactly where the two curves intersect: it is the model that minimizes the sum of squared bias and variance. Conclusion: whether a predictor is worth adding to a model (for a fixed vector of regressors, x) depends on whether the reduction in squared bias exceeds the increase in variance.

Model too simple ⇔ high bias/low variance

Model too complex ⇔ low bias/high variance
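To see both curves in numbers, here is a sketch under made-up assumptions: polynomials of degree 0 to 9 (so 1 to 10 parameters) are fitted by OLS to a hypothetical true function sin(πx) observed at 25 equally spaced points with noise variance 0.25. Because the OLS prediction at a fixed point x0 is a linear combination w'y of the observations, its squared bias (w'f(x) − f(x0))² and variance σ²w'w can be computed exactly, without simulation.

```python
import numpy as np

n, sigma2 = 25, 0.25            # assumed sample size and noise variance
x = np.linspace(-1.0, 1.0, n)   # fixed design points
x0 = 0.5                        # fixed point at which predictions are judged

def f(t):
    return np.sin(np.pi * t)    # hypothetical true regression function

degrees = list(range(10))
bias2, variance = [], []
for d in degrees:
    X = np.vander(x, d + 1, increasing=True)           # columns 1, x, ..., x^d
    row = np.vander([x0], d + 1, increasing=True)[0]
    w = row @ np.linalg.pinv(X)                         # y_hat(x0) = w @ y
    bias2.append(float((w @ f(x) - f(x0)) ** 2))        # (E[y_hat(x0)] - f(x0))^2
    variance.append(float(sigma2 * (w @ w)))            # Var(y_hat(x0)) = sigma^2 w'w

mse = [b + v for b, v in zip(bias2, variance)]
best = degrees[mse.index(min(mse))]

for d in degrees:
    print(f"degree {d}: bias^2 = {bias2[d]:.5f}, variance = {variance[d]:.5f}, MSE = {mse[d]:.5f}")
print("degree with the smallest MSE:", best)
```

The variance column never decreases as the degree grows, while the squared bias falls; the selected degree is the one where their sum is smallest, not necessarily where the two columns cross.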

~~~

Note for Advanced Readers: Mean Squared Error (MSE):

Assume z = h(x, θ) + v, where z is the dependent variable, h(.) is some function, x is the vector of regressors, v is noise and θ is a vector of model parameters. The MSE of the model at a fixed x can be decomposed as:
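$$\operatorname{MSE}(x) = E\Big[\big(h(x,\hat\theta) - h(x,\theta)\big)^2\Big] = \underbrace{\operatorname{Var}\big(h(x,\hat\theta)\big)}_{\text{variance}} + \underbrace{\Big(E\big[h(x,\hat\theta)\big] - h(x,\theta)\Big)^2}_{\text{squared bias}}$$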
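A small Monte Carlo sketch of the decomposition, with every ingredient assumed for illustration: the true h is exp(x), the noise v is normal with standard deviation 0.5, and the fitted model is a deliberately misspecified straight line, so that both terms are nonzero. For the simulated estimates, the mean squared error equals squared bias plus variance exactly when the variance uses the divisor `reps` (NumPy's default `ddof=0`).

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, sigma = 20, 5000, 0.5  # assumed sample size, replications, noise sd
x = np.linspace(0.0, 1.0, n)     # fixed regressor values
x0 = 0.3                         # the fixed x at which the MSE is evaluated

def h(t):
    return np.exp(t)  # hypothetical true h(x, theta)

# Simulate reps data sets z = h(x, theta) + v (one per row)
Z = h(x) + sigma * rng.normal(size=(reps, n))

# Deliberately misspecified model: a straight line a + b*x fitted by OLS
X = np.column_stack([np.ones(n), x])
coef = np.linalg.lstsq(X, Z.T, rcond=None)[0]  # shape (2, reps)
estimates = coef[0] + coef[1] * x0              # h_hat(x0) for each data set

mse = np.mean((estimates - h(x0)) ** 2)
bias2 = (estimates.mean() - h(x0)) ** 2
variance = estimates.var()  # ddof=0, so the decomposition holds exactly
print(f"MSE = {mse:.6f}")
print(f"bias^2 = {bias2:.6f}, variance = {variance:.6f}, sum = {bias2 + variance:.6f}")
```

Adding a quadratic term to the fitted model would shrink the bias term and enlarge the variance term, which is the trade-off discussed above.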

*If someone shows up with a model with lots of parameters, opponents will argue that they have used enough parameters to fit an elephant.

Graphics taken from Model Selection and Multi-Model Inference by Kenneth Burnham & David Anderson.