
Fresh from the Fridge: Imagine you have collected data on the quit rates experienced by a set of firms during a given year, along with data on the firms' average wage rates (sample size = 100). The data are presented graphically in the figure below:
[Figure: wagequit01]
Each dot in this figure represents a quit-rate/hourly-wage combination for one of the hundred firms. Visual inspection suggests that firms paying higher wages in our (hypothetical) example do indeed have lower quit rates. But would it make sense to explain the quit rates solely with the wage rates? Definitely not. Economic theory suggests there are many factors besides wages that systematically influence quit rates. These include characteristics both of firms (e.g. employee benefits offered, working conditions, and firm size) and of their workers (e.g. age and level of training).

If any of the variables we have omitted from our analysis vary across firms systematically with the wage rates that the firms offer, the estimated relationship between wage rates and quit rates will be biased. In such cases, we must take these other variables into account by using a model with more than one independent variable.

The data in the plot given above were generated under the assumption that the only variable affecting a firm's quit rate besides its wage rate is the average age of its workforce. Older workers are, ceteris paribus, less likely to quit their jobs for a number of reasons (as workers grow older, ties to friends, neighbors, and co-workers become stronger, and the psychological costs involved in changing jobs--which often require a geographic move--grow larger). To cut a long story short: Half of the firms of our sample employed mainly young workers and paid an average wage of 8 Euro, the other half employed mainly old workers and therefore paid an average wage of 10 Euro. In both cases the standard deviation of the wage rate was two Euro. Furthermore, at any level of wages a firm's quit rate was 10 percentage points lower if it employed mainly old workers.
[Figure: wagequit02]
The red line is what we get when we do not control for age and run a regression over the whole sample. The blue regression lines emerge when we run separate regressions for firms with an old workforce and firms with a young workforce*. It can easily be seen that the first model exaggerates the effect of a wage increase: its slope is -3.5 instead of the true slope of -2.5. That gap is the omitted variable bias. In words: a wage increase of one Euro lowers the quit rate by 2.5 percentage points, not by 3.5.
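The -3.5 can be checked by hand with the standard omitted-variable-bias formula. The notation below is my own and not taken from the textbook: q is the quit rate, w the wage, old a dummy equal to 1 for firms with a mainly old workforce, and the error term is assumed to be uncorrelated with both regressors. With two equally sized groups, the numbers given above imply:

```latex
\begin{align*}
q_i &= \alpha + \beta\, w_i + \gamma\, \mathrm{old}_i + \varepsilon_i,
      \qquad \beta = -2.5,\quad \gamma = -10 \\
\operatorname{Var}(w) &= \underbrace{2^2}_{\text{within groups}}
      + \underbrace{\tfrac12 \cdot \tfrac12 \cdot (10 - 8)^2}_{\text{between groups}} = 5 \\
\operatorname{Cov}(w, \mathrm{old}) &= \tfrac12 \cdot \tfrac12 \cdot (10 - 8) = 0.5 \\
\operatorname{plim}\, \hat\beta_{\text{wage only}} &= \beta + \gamma\,
      \frac{\operatorname{Cov}(w, \mathrm{old})}{\operatorname{Var}(w)}
      = -2.5 + (-10) \cdot \frac{0.5}{5} = -3.5
\end{align*}
```

So the extra -1 in the red line's slope is the age effect (-10) times the slope from regressing the age dummy on the wage (0.1).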

Example taken from "Modern Labor Economics - Theory and Public Policy" (Ehrenberg & Smith).

*Actually, one should run a single regression that includes a dummy variable for age.
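
The single regression with an age dummy can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: the intercept of 50, the noise standard deviation of 2, the normally distributed wages, and the helper names `simulate` and `ols` do not come from the textbook example. Only the group means (8 and 10 Euro), the wage standard deviation (2 Euro), the slope (-2.5) and the age effect (-10) are taken from the text above.

```python
import numpy as np


def simulate(n_per_group=50, seed=0):
    """Draw hypothetical firms: half young (old=0), half old (old=1)."""
    rng = np.random.default_rng(seed)
    old = np.repeat([0.0, 1.0], n_per_group)
    # Average wage is 8 Euro for young firms, 10 Euro for old firms, sd 2.
    wage = rng.normal(8.0 + 2.0 * old, 2.0)
    # Assumed intercept (50) and noise sd (2); slope and age effect as in the post.
    noise = rng.normal(0.0, 2.0, size=old.size)
    quit_rate = 50.0 - 2.5 * wage - 10.0 * old + noise
    return wage, old, quit_rate


def ols(y, *regressors):
    """Least-squares coefficients; the first entry is the intercept."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


if __name__ == "__main__":
    wage, old, quit_rate = simulate()
    wage_only = ols(quit_rate, wage)        # omits age (the red line)
    with_dummy = ols(quit_rate, wage, old)  # wage plus age dummy
    print("wage slope, age omitted: ", wage_only[1])
    print("wage slope, age included:", with_dummy[1])
    print("age dummy coefficient:   ", with_dummy[2])
```

With only 100 firms the estimates bounce around from seed to seed. Raising `n_per_group` should bring the first slope close to -3.5 and the second close to -2.5.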
Paul N (guest) wrote on 15 Sep, 09:53:
If you want to know the true relationship between wage and quit rate, don't you also have to compare the effects of not just age, but also race, sex, religion, how much water you drink, etc?

The thing I don't get about regression analysis is, in order for it to work, you have to control for every variable that can affect what you're measuring, but how can you possibly know what every such variable is? It's hard enough in the hard sciences - I imagine it to be even harder in social sciences that are ~psychology. 
Mahalanobis replied on 15 Sep, 13:26:
"...don't you also have to compare the effects of..."
"Economic theory suggests there are many factors besides wages..." RTFP ;-D
"The thing I don't get about regression analysis is, in order for it to work, you have to control for every variable that can affect what you're measuring, but how can you possibly know what every such variable is?"
1. You do not have to control for every variable that can affect what you're measuring. The error term picks up all omitted variables and measurement errors. From a theoretical point of view, errors are often assumed to be normally distributed since they represent the combined effect of a myriad of omitted variables with good manners (see the Lindeberg-Levy or Lindeberg-Feller Central Limit Theorem). You only run into trouble if those omitted variables are not orthogonal to the regressors. In this case the estimates are biased. But keep in mind that if you include unnecessary regressors, your estimates become inefficient.

2. Running a regression is just one stage of a long process. Econometricians do such things as Model Selection, Diagnostic Checking, Specification Testing, Sensitivity Analysis... Sure, omitted variables are a nasty problem because residuals ("estimated errors") are always orthogonal to the regressors (that's what OLS does...). But even here the situation is not entirely hopeless.

3. What you are actually saying is that one cannot make any statements about economic relationships at all because there are too many effects which would have to be taken into account. The Austrian School would probably say that it is logical that wage increases reduce quit rates. Sure, but by 0.1%, 10%, or 100%? That's what non-ivory-tower people care about.
Paul N (guest) replied on 17 Sep, 00:19:
Well I like the example a lot, but I think it just illustrates this problem. Before you take other variables into account, you have a correlation. After you think about all the variables you can possibly imagine and control for them, then you still have a correlation.

I don't know much about econometrics but it's almost daily that you see correlational health studies reported (my favorite example is "women who do more housework found to have lower endometrial cancer rate"), with the clear implication that the result is somehow causative. In almost every case, you can think of at least 5, typically 10 variables that could play a role in the observed effect (e.g. age, race, income, job type, family status, other exercise, smoking, drinking, history, culture, etc.). Sometimes some of these variables are accounted for, but I suspect there's a strong bias to include only the most obvious ones, or ones that don't weaken the correlation; in any case, rarely, even for JAMA or NEJM articles like this, do you get the sense that other variables really aren't a problem.

So I guess I'm just taking out my frustration, because it infuriates me almost every time I see a correlational study reported - I feel like these data are typically latched onto because it gives people an excuse to believe what they want to be true. 
Kevin (guest) wrote on 19 Sep, 20:34:
2.5% or 3.5%? When does it really matter?
It's clear that there are some policy applications for which 2.5% is very far from 3.5%, but others for which these numbers are very close.

For the (wonderfully clear) particular example you specified, it's hard to tell whether or not the omitted variable actually is a problem or just an annoyance. 