Recently, R.J. Rummel looked at multicollinearity and concluded that if x_{1} and x_{2} are highly correlated, the estimate of one coefficient, say β_{1}, could be large at the expense of β_{2}. He actually spoke of "stealing".

Bryan Caplan replied:

People who use statistics often talk as if multicollinearity (high correlations between [explanatory] variables) biases results. But it doesn't. Multicollinearity leads to big standard errors, but if your [explanatory] variables are highly correlated, they SHOULD be big! Intuitively, big standard errors mean that the effects of different variables are highly uncertain, and if your [explanatory] variables are highly correlated, highly uncertain is what you should be.

To put it more formally: If the true model contains two explanatory variables and a constant, the formula for the variance of the slope coefficients looks as follows:

Var(β̂_{1}) = σ² / [(1 − r_{12}²) · Σ_{i} (x_{1i} − x̄_{1})²]
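Caplan's point can be checked numerically. Below is a minimal sketch in plain Python (simulated data; the sample size, seed, and the 0.95 correlation are illustrative choices, not from the post) that fits the two-regressor model by the centered normal equations and shows how the standard error of β̂_{1} inflates as the correlation between the regressors rises:

```python
import math
import random

def ols_two_regressors(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2 + e via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    cy = [v - my for v in y]
    S11 = sum(a * a for a in c1)
    S22 = sum(b * b for b in c2)
    S12 = sum(a * b for a, b in zip(c1, c2))
    S1y = sum(a * v for a, v in zip(c1, cy))
    S2y = sum(b * v for b, v in zip(c2, cy))
    det = S11 * S22 - S12 * S12            # = S11 * S22 * (1 - r12^2)
    b1 = (S22 * S1y - S12 * S2y) / det
    b2 = (S11 * S2y - S12 * S1y) / det
    resid = [v - b1 * a - b2 * b for v, a, b in zip(cy, c1, c2)]
    s2 = sum(e * e for e in resid) / (n - 3)   # estimate of sigma^2
    se_b1 = math.sqrt(s2 * S22 / det)          # = sqrt(s2 / ((1 - r12^2) * S11))
    return b1, b2, se_b1

def simulate(r12, n=2000, seed=42):
    """Draw x2 correlated with x1 at level r12; true model y = 1 + x1 + x2 + e."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [r12 * a + math.sqrt(1 - r12 ** 2) * rng.gauss(0, 1) for a in x1]
    y = [1 + a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return ols_two_regressors(x1, x2, y)

b1_lo, _, se_lo = simulate(r12=0.0)
b1_hi, _, se_hi = simulate(r12=0.95)
print(se_hi / se_lo)   # roughly 1 / sqrt(1 - 0.95^2), i.e. about 3.2
```

Both runs recover a β̂_{1} near the true value of 1 (no bias), but the standard error is several times larger in the correlated case — exactly the honest uncertainty Caplan describes.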

One can easily see that the coefficient's variance is increasing in the squared correlation coefficient between the two explanatory variables (r_{12}). This is why correlated regressors could make all coefficients insignificantly different from zero (tested one at a time). But what's more interesting is that a positive correlation between the regressors leads to a negative correlation between the estimated coefficients (this is why the complete set of regressors can be jointly significant although individual regressors are insignificantly different from zero):

Cov(β̂_{1}, β̂_{2}) = −r_{12} · σ² / [(1 − r_{12}²) · √(Σ_{i} (x_{1i} − x̄_{1})² · Σ_{i} (x_{2i} − x̄_{2})²)]

so that Corr(β̂_{1}, β̂_{2}) = −r_{12}.

Maybe that's what Rummel meant by "stealing". Rummel then suggests running two bivariate regressions, one for each explanatory variable. What is somehow forgotten in the whole discussion is that correlation among regressors is actually the only reason why we run a multiple regression in the first place. We care about net effects! When the true model contains two explanatory variables but we include only one, the resulting slope estimate would be biased (see my story about the omitted variable bias). A valid critique would be that the two variables in our regression are actually generated by a third variable which we do not observe, i.e. they are proxies for the same thing. In this case our model (with two regressors) is wrong/misspecified. The only solution to this problem is 1. to use more data and 2. to think harder about how our world works.
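The trouble with Rummel's bivariate suggestion can also be shown by simulation. A sketch in plain Python (simulated data; the 0.8 correlation, sample size, and seed are illustrative assumptions): with true model y = 1 + x_{1} + x_{2} + e, the bivariate slope of y on x_{1} alone absorbs part of x_{2}'s effect, while the multiple regression recovers the net effect:

```python
import math
import random

def slopes(r12, n=2000, seed=7):
    """True model: y = 1 + x1 + x2 + e, with corr(x1, x2) = r12.
    Returns (bivariate slope of y on x1 alone, multiple-regression slope on x1)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [r12 * a + math.sqrt(1 - r12 ** 2) * rng.gauss(0, 1) for a in x1]
    y = [1 + a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    cy = [v - my for v in y]
    S11 = sum(a * a for a in c1)
    S22 = sum(b * b for b in c2)
    S12 = sum(a * b for a, b in zip(c1, c2))
    S1y = sum(a * v for a, v in zip(c1, cy))
    S2y = sum(b * v for b, v in zip(c2, cy))
    biv = S1y / S11                          # ignores x2 entirely
    det = S11 * S22 - S12 * S12
    multi = (S22 * S1y - S12 * S2y) / det    # controls for x2
    return biv, multi

biv, multi = slopes(r12=0.8)
# biv picks up x2's effect through the correlation: roughly 1 + 0.8 = 1.8
# multi stays near the true net effect of 1
```

The gap between the two slopes is the omitted variable bias (here roughly r_{12} times the omitted coefficient), which is exactly why we care about net effects from the multiple regression.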

Mahalanobis - on 2005-09-23 15:14 - Category: mathstat

Paul N (guest) commented on 27 Sep, 08:18:

Good post. By the way, I wish there were a job where I could do "2." all day long and get paid for it.