David I. Levine

Industrial Relations BA 259C

January 2003

Regressions, Causality, and Returns to Education

Male college graduates earn about 70% more than high school graduates in the U.S. Does this fact mean that sending the typical high school graduate to 4 years of college would add 70% to his wage?

The answer is probably "No." High school and college graduates differ along many dimensions other than education; thus, some of the correlation is probably due to other factors. Consider yourselves: If for some reason you had been prohibited from entering college, the many cognitive, motivational and (on average) family advantages that brought you this far would typically have led you to have earnings above the average among high school graduates. 

To pick an extreme example, players in the National Basketball Association earn many times the average national income.  That fact does not imply that the average American, if given a job in the NBA, would earn much money.

Social scientists normally use ordinary least squares regressions to analyze questions such as the returns to education. For example, we might run:

log(wage) = B · years of schooling + other stuff.(1)

A more realistic statistical model of the situation with unmeasured characteristics is:

Y_it = B·S_it + C·X_it + A_i + e_it ,

where Y is the outcome of interest for person i at time t (for example, log(wage)), S is schooling, X is a vector of other characteristics, A is a constant but unmeasured person-specific factor, and e is a well-behaved residual. The unmeasured person-specific intercept is also called the fixed effect or the person-specific unobserved heterogeneity.

To the extent the unmeasured factors (A) are correlated with observed characteristics such as schooling, the estimate of the returns to schooling (B) will be biased. We normally think that unmeasured ability, for example, leads to both high income and high educational attainment. Thus, the estimate b will capture both the true effect of education and some of the fact that high-ability people get more education. I think of unmeasured factors as "leaking into" the measured factors with which they are correlated.(2)
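A small simulation can make the "leaking" concrete. The sketch below is illustrative only: the parameter values (a true return of 0.08, the strength of the ability effects, the sample size) and the function names are assumptions made up for the example, not estimates from any study. Ability raises both schooling and wages, and the OLS slope of log wage on schooling absorbs part of the ability effect. With these made-up numbers the leak is cov(S, 0.10·A) / V(S) = 0.10 / 5 = 0.02, so b should come out near 0.10 rather than 0.08.

```python
import random


def ols_slope(x, y):
    """One-variable OLS slope (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate_ability_bias(true_return=0.08, n=20000, seed=0):
    """Simulate log wages when unmeasured ability raises both schooling and pay."""
    rng = random.Random(seed)
    schooling, log_wage = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)                   # A: never seen by the analyst
        s = 13.0 + 1.0 * ability + rng.gauss(0.0, 2.0)  # abler people get more schooling
        y = true_return * s + 0.10 * ability + rng.gauss(0.0, 0.3)
        schooling.append(s)
        log_wage.append(y)
    return ols_slope(schooling, log_wage)


b_hat = simulate_ability_bias()
print(f"true B = 0.080, OLS estimate b = {b_hat:.3f}")
```

Setting the coefficient on ability to zero in the wage line removes the gap between b and B, which is one way to check that the bias comes only from the omitted factor.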

Some of these factors that both cause earnings and probably differ among high school and college graduates are easy to observe, such as age, race, and region. Other factors are measured in some data sets but not others: parental education or family income, scores on cognitive achievement tests, and self-reported motivation. No matter how many of these factors we control for, some of the remaining educational wage gap may be due to differences in unobservable factors.(3)

There are a number of strategies researchers use to control for selection effects.

Control for more observable factors

If we add more observable differences (age, race, region, parental education, family income, scores on cognitive achievement tests, self-reported motivation, etc.) and the gap remains, we are more confident that some of it would remain even with perfect controls. If a few observable factors influence the gap enormously, we suspect that controlling a bit more for the so-far unobservable factors would matter as well.

For example, Alan Krueger and Larry Summers added a number of measures of working conditions (danger, etc.) to a model with unexplained differences among high- and low-wage industries. They found that the gaps between industries did not narrow with a few sensible measures of working conditions, suggesting that the gap probably would not narrow if more controls were added. In several of my papers I have added measures of skill and working conditions and found that the variability between high- and low-wage employers remains almost unchanged (e.g., O'Shaughnessy, Levine and Cappelli, 1998).

BUT

To the extent these factors raise earnings by raising education levels, we can understate the effects of education by including the control variables. For example, if parental education had no effect on the earnings of children unless the children also had high education, then education actually is causally linked to higher earnings -- at least for those youths.

Find a natural experiment or an instrumental variable

We'd like to estimate the effect of attending a junior college on wages:

Log(wage) = controls + B · Attending a junior college + u.

Unfortunately, it is likely that attending a junior college is correlated with unobserved factors such as ability; that is, E(attending a junior college · u) is not zero.

Kane and Rouse noted that some people grew up near a junior college, while others grew up many miles away. Junior colleges rarely influenced where families located (assuming we are looking at families that moved there when the kids were young). At the same time, the presence of a junior college nearby affects the costs of attending. We can run a first-stage regression:

Pr(attending junior college) = controls + a * miles to nearest junior college.

We can then use the estimated coefficients to predict p^, the probability of attending junior college. Now we can run a second-stage wage equation:

wages = controls + B_2SLS · p^.

As long as miles to the nearest junior college is uncorrelated with ability, the two-stage least squares estimate b_2SLS will be a consistent estimate (that is, unbiased in large samples) of the true effect of junior college attendance on wages. This procedure is called two-stage least squares or the instrumental variables technique, with miles to the nearest junior college as the instrument.(4)

It might be simpler to think of the analysis in 2 pieces.  We can run a non-causal reduced form equation:

wages = controls + c * miles to nearest junior college.

Now the coefficient we care about is B, and it can be estimated by b = Δwage / Δattend JC = (Δwage / Δmiles to JC) / (ΔPr(attend JC) / Δmiles to JC) = c / a. Intuitively, assume junior colleges are sprinkled at random around the nation, but being near one leads to .2 more years of education. Moreover, due to the effects of junior college on wages, wages are 1 percent higher near a junior college. Then the implied causal effect of a year of junior college on wages = 1% higher wages near a JC / .2 years more education near a JC = 5% higher wages per year of junior college.
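The ratio c / a can be checked on made-up data. The sketch below is only an illustration under stated assumptions: it uses a hypothetical 0/1 "near a JC" indicator rather than miles, invents all of the coefficients, and lets simulated "years of junior college" be a continuous number. Being near a JC is assigned at random, so it is uncorrelated with ability. The first-stage slope a should be near .2, the reduced-form slope c near 1 percent, and their ratio near the assumed true effect of 5 percent, while OLS of wages on years of junior college is pushed up by ability.

```python
import random


def slope(x, y):
    """One-variable OLS slope (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_wald(true_effect=0.05, n=200_000, seed=1):
    """Compare OLS with the reduced-form / first-stage ratio (all numbers hypothetical)."""
    rng = random.Random(seed)
    near, years, log_wage = [], [], []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0      # near a JC, assigned at random
        ability = rng.gauss(0.0, 1.0)               # unobserved by the analyst
        s = 0.5 + 0.2 * z + 0.3 * ability + rng.gauss(0.0, 0.5)
        y = true_effect * s + 0.10 * ability + rng.gauss(0.0, 0.1)
        near.append(z)
        years.append(s)
        log_wage.append(y)
    a = slope(near, years)       # first stage: effect of being near a JC on schooling
    c = slope(near, log_wage)    # reduced form: effect of being near a JC on wages
    return {"a": a, "c": c, "wald": c / a, "ols": slope(years, log_wage)}


handout_example = 0.01 / 0.2     # 1% higher wages / .2 more years of education
estimates = simulate_wald()
print(f"handout arithmetic: {handout_example:.2f}")
print({name: round(value, 3) for name, value in estimates.items()})
```

The only thing the ratio needs from the instrument is that it moves schooling (a is not zero) and has no other path to wages; the simulation builds in both by construction.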

A good instrument is correlated with the main variable of interest (in this example, attending a junior college) but not correlated with the error term of the main equation (ability, etc.). Such IVs are rare!

Even when you find an instrument, you need to recall that it answers a specific question.  The “miles to junior college” instrument, for example, tells us how a year of junior college affects earnings for those whose decision might be influenced by travel times.  Thus, the estimates do not tell us much about returns to education for students who did not finish 10th grade, about students who attended an elite university, or about students who have spent their whole lives dreaming of a career that a junior college enables.  Each natural experiment affects just a subsample of people; as such, our results answer a question that refers to this subsample.

The best IVs are literally random, and these are even rarer. Barry Staw used lottery numbers in the Vietnam-era draft in his classic paper on cognitive dissonance. Draft lottery numbers were based on birthdays, a random event. Thus, Staw discovered a fantastic natural experiment.

Some years later, the economist Josh Angrist rediscovered birthdays as a good instrumental variable. For example, in California, children born in the summer start kindergarten when they have just turned five (unless their parents hold them back), while children born in the winter start kindergarten when they are 5½. At age 16, youth born in the summer have finished more months of schooling than have children born in the winter. Thus, the compulsory age of schooling (16 in most states) leads to several more months of required schooling and more actual schooling for summer births. Alan Krueger and Angrist used this fact to study the returns to education. Again, the required conditions were that birthdays predict months of education, and that birthdays be uncorrelated with ability. If younger students in first grade (for example) were disproportionately small, picked on, or less able to learn to read, we might expect that birthdays would be correlated with academic ability or other outcomes that affect dropping out, even prior to age 16.

Some analysts have used lottery winnings, shocks to weather (for farmers), or German reparations to victims of the Holocaust as fairly exogenous increases in income. They then looked at how the higher income affected consumption and other behaviors.

Here is a simplified version of the Staw, Angrist, and Kane and Rouse papers that may help illuminate the method.

Staw’s study

| | Lottery number that makes Vietnam likely | Lottery number that makes Vietnam unlikely |
| --- | --- | --- |
| Need to believe ROTC is ok to justify going | No (avoiding Vietnam is sufficient justification) | Yes |
| Performance in ROTC | Lousy | Good |

Angrist wanted to know how Vietnam service affected wages. OLS analysis was surely biased, as disadvantaged people were over-represented among those in the military.

Angrist study design (with illustrative numbers)

| | Lottery number that makes Vietnam likely | Lottery number that makes Vietnam unlikely | Difference |
| --- | --- | --- | --- |
| Odds of Vietnam | .8 (some were disabled, for example) | .3 (some volunteered, for example) | ΔPr(Vietnam service) / Δ draft category = .5 |
| Wages prior to Vietnam | $5 | $5.01 | Essentially zero, as randomized |
| Wages after Vietnam | $10 | $11 | Δwage / Δ draft category = -$1 |

The estimated causal effect we want is Δwage / ΔVietnam service.

Arithmetic tells us that Δwage / ΔVietnam service = (Δwage / Δ draft category) / [ΔPr(Vietnam service) / Δ draft category] = -$1 / .5 = -$2.

That is, an extra half of those with unlucky draft numbers served in Vietnam, so we double the raw effect of draft numbers on wages to get the best estimate of the causal effect: in this illustration, service lowers wages by about $2.

Kane and Rouse study design (with illustrative numbers)

| | Near Junior College | Far from JC | Difference |
| --- | --- | --- | --- |
| Odds of going to JC | .8 | .3 | ΔPr(attend JC) / Δ near JC = .5 |
| Average wages | $11 | $10 | Δwage / Δ near JC = $1 |

The estimated causal effect we want is Δwage / Δattend JC.

Arithmetic tells us that Δwage / Δattend JC = (Δwage / Δnear JC) / [ΔPr(attend JC) / Δnear JC] = $1 / .5 = $2.

That is, an extra half of those near a JC attend, so we double the raw effect of being near a JC on wages to get the best estimate of the causal effect.

BUT finding good natural experiments is difficult.

Finding good natural experiments is difficult. With the exception of birthdays, nature rarely randomizes.  (Where else does true randomization occur?)

If the first stage is weak, that is, if the correlation of the instruments with the main variable is low, then the result can be quite biased and noisy. Actually, what matters is the partial correlation of the instruments with the main variable, conditioning on all the control variables. That is, the instrument must have a large incremental R² in the first-stage equation, after including all the other controls. Most scholars do not present the first-stage regression, but you always should. Intuitively, the first-stage equation predicts X with the instrument Z with an equation like X = c·Z. The predicted values X^ = c^·Z are then used in the second-stage equation:

Y = B · X^ + etc.

The R² of the first-stage equation = (variance of X^) / (variance of X). If this R² is small, then V(X^) is small. Recall that in the one-variable case the second-stage estimate is b_IV = (X^'X^)^-1 (X^'Y) = cov(X^, Y) / V(X^). As V(X^) shrinks, the estimate becomes unstable -- tiny changes in cov(X^, Y) lead to big changes in b_IV.

Even worse, if the instrument is slightly imperfect then the IV estimate will be biased. We are typically concerned that the IV might have a little bit of correlation with the error. In that case, because V(X^) is small, even a little bit of covariance can foul things up a lot.

As a final obstacle, the theory of IVs is based on the large-sample properties of the estimators. In small samples IV estimates do even worse unless the relationship in the first-stage regression is quite strong.
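To see how a slightly imperfect instrument interacts with a weak first stage, recall the standard large-sample result for the one-variable case: b_IV converges to B + cov(Z, e) / cov(Z, X). The sketch below simply evaluates that ratio. The covariance values are hypothetical, chosen only to show the pattern: the same tiny flaw in the instrument does almost no harm when the first stage is strong and a great deal of harm when it is weak.

```python
def iv_large_sample_bias(cov_z_error, cov_z_x):
    """Large-sample bias of the one-variable IV estimate: cov(Z, e) / cov(Z, X)."""
    return cov_z_error / cov_z_x


SMALL_FLAW = 0.01  # hypothetical cov(Z, e): the instrument is only slightly invalid

# Hold the flaw fixed and weaken the first stage, cov(Z, X).
biases = {
    first_stage_cov: iv_large_sample_bias(SMALL_FLAW, first_stage_cov)
    for first_stage_cov in (1.0, 0.5, 0.1, 0.02)
}

for first_stage_cov, bias in biases.items():
    print(f"cov(Z, X) = {first_stage_cov:4.2f} -> large-sample bias in b_IV = {bias:.2f}")
```

The comparable OLS bias is cov(X, e) / V(X), so a weak and slightly invalid instrument can easily do worse than the OLS estimate it was meant to repair.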

Examine differences in policy

In many settings researchers assume that policy influences the factor of interest (e.g., schooling) but is uncorrelated with omitted factors A. When policy is exogenous in this sense, it can be used to identify the regression.

For example, David Card and Alan Krueger compared the earnings of blacks who grew up in North and South Carolina before the end of segregation. Although rather similar states, one Carolina had vastly worse schools for blacks than did the other. (Both sets of schools for blacks were substantially worse than schools for whites.) To the extent that blacks in the two states had similar innate ability, the differences in schooling caused by the North vs. South Carolina governments provide an unbiased way to measure the true effect of schooling.

When attacking a different problem, David Card used different minimum wages (relative to average wages) across states to identify the effect of the minimum wage on teen employment.

Some states have "right-to-work laws" that prohibit requiring employees to join a union. Henry Farber used differences in right-to-work laws across states to measure how these laws affect the percent of the workforce that is in a union.

Edward Lazear used differences in legally mandated costs of dismissal (e.g., severance pay) across European nations to identify whether costs of dismissal affected employment rates. Similarly, Christopher Ruhm used differences in mandated paid parental leave across European nations to identify how such leave affected female employment rates.

BUT policies are endogenous

Policies are themselves endogenous. If states with low unemployment raise the minimum wage, causality goes from employment to wages, not the opposite. If a recession sparks fear of unemployment and, thus, stricter rules against layoffs, causality goes from employment to layoff restrictions and severance pay, not the opposite. Anti-union states probably have right-to-work laws and lower %union. Sometimes the reverse causality is institutionalized: In the US, higher unemployment sometimes automatically raises the maximum duration of unemployment benefits.

Degrees of freedom are limited.

Whenever we are estimating the effects of policy changes, we have degrees of freedom equal to the number of distinct policy regimes. Thus, even if we have 60,000 people, if they are in 9 regions and the policy only varies across regions, the correctly estimated standard errors will be big. (We can estimate the standard errors correctly by running the regression at the region level, or by adjusting for clustering by region using a clustering correction (svyreg in Stata) or random effects [Moulton 1986].)

Even worse, states and nations that are near each other are more similar to each other than they are to distant regions. Thus, the 50 states are not really 50 independent draws. When we correct for regional autocorrelation, the effective degrees of freedom drops further and estimated standard errors rise further. Intuitively, southern states or Scandinavian nations have a lot in common, and the estimates should not treat them as uncorrelated observations.
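The degrees-of-freedom point can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: 9 regions, 500 people per region, a policy in place in 4 of the regions, a true policy effect of zero, and a shock shared by everyone in a region. It compares the standard error from a person-level regression, which treats all 4,500 people as independent, with the standard error from running the same regression on the 9 region means.

```python
import math
import random


def slope_and_se(x, y):
    """OLS slope of y on x (with intercept) and its conventional standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    b = sxy / sxx
    ssr = syy - b * sxy                       # residual sum of squares
    return b, math.sqrt(ssr / (n - 2) / sxx)


def simulate_regions(regions=9, people=500, seed=2):
    """Person-level vs. region-level regressions when policy varies only by region."""
    rng = random.Random(seed)
    person_x, person_y = [], []
    region_x, region_y = [], []
    for r in range(regions):
        policy = 1.0 if r < 4 else 0.0        # policy is set at the region level
        shock = rng.gauss(0.0, 1.0)           # shared by everyone in the region
        ys = [shock + rng.gauss(0.0, 1.0) for _ in range(people)]  # true effect is zero
        person_x.extend([policy] * people)
        person_y.extend(ys)
        region_x.append(policy)
        region_y.append(sum(ys) / people)
    return slope_and_se(person_x, person_y), slope_and_se(region_x, region_y)


(naive_b, naive_se), (region_b, region_se) = simulate_regions()
print(f"person-level: b = {naive_b:.3f}, se = {naive_se:.3f}")
print(f"region-level: b = {region_b:.3f}, se = {region_se:.3f}")
```

With equal-sized regions the two slopes are the same number; only the standard errors differ, and the region-level one is the honest measure of how little independent policy variation there is.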

Examine changes over time.

Panel or longitudinal estimates frequently assume the unobserved heterogeneity is constant, and examine changes over time. If we take the model above that includes the person-specific factor A and subtract its one-period lag, we get:

Y_i,t - Y_i,t-1 = B·(S_i,t - S_i,t-1) + C·(X_i,t - X_i,t-1) + (e_i,t - e_i,t-1).

The important thing to note is that the unobserved constant factor A has dropped out of the estimate. It is also possible to achieve the same result (and, with only two periods, identical estimates) by adding in a separate intercept for each individual. The complete set of intercepts is called "fixed effects." The variance of the fixed effects indicates the amount of heterogeneity among people (or states or firms). Thus, first-difference estimates and estimates with a complete set of dummies are also called fixed-effect estimates or least-squares-dummy-variable estimates.
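The way differencing removes A can be seen in a two-period simulation. This is a sketch under assumed values, not a re-creation of any study: the true coefficient of 0.5, the variances, and the function names are all made up. The person effect A raises both the regressor and the outcome in both periods, so pooled OLS on levels is biased, while the slope of the change in the outcome on the change in the regressor is not.

```python
import random


def slope(x, y):
    """One-variable OLS slope (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(true_b=0.5, n=5000, seed=3):
    """Two periods per person; the fixed effect enters both the regressor and the outcome."""
    rng = random.Random(seed)
    x_levels, y_levels, dx, dy = [], [], [], []
    for _ in range(n):
        a_i = rng.gauss(0.0, 1.0)                        # fixed effect A, unobserved
        x1 = a_i + rng.gauss(0.0, 1.0)
        x2 = a_i + rng.gauss(0.0, 1.0)
        y1 = true_b * x1 + a_i + rng.gauss(0.0, 0.5)
        y2 = true_b * x2 + a_i + rng.gauss(0.0, 0.5)
        x_levels += [x1, x2]
        y_levels += [y1, y2]
        dx.append(x2 - x1)                               # A cancels in the differences
        dy.append(y2 - y1)
    return slope(x_levels, y_levels), slope(dx, dy)


pooled_b, first_diff_b = simulate_panel()
print(f"true B = 0.50, pooled OLS = {pooled_b:.3f}, first difference = {first_diff_b:.3f}")
```

Note that the differenced regression uses only the within-person changes in the regressor, which is exactly the loss of information discussed in the next section.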

For example:

Charles Brown examined people changing jobs to see whether their wages rose when they moved to jobs with poorer working conditions (1980). Alan Krueger and Lawrence Summers examined people changing industries to see whether their wages rose when they moved to an industry that paid high wages in the cross section. Similarly, Charles Brown and James Medoff examined people changing employers to see whether their wages rose when the size of their employer rose. In each of these cases, stable forms of ability, family social background, height, and a host of other factors remained constant over time, and would not bias the estimate of B.

Sometimes the differencing can be within otherwise homogeneous categories (instead of over time). Cecilia Rouse and Orley Ashenfelter examined wage differences between twins who had different years of schooling. This differencing automatically controlled for all genetic factors and family resources, as well as the vast majority of differences in family and peer influences. Many other studies have compared siblings to control for shared family background (e.g., Korenman and Winship, 1995).

To use a firm-level example, Huselid and Becker examine the profitability of companies before vs. after they add new  work practices such as training and employee involvement.

Finally, most of the studies of policies noted above actually used first-difference estimates, comparing states or nations before vs. after a policy was enacted.

BUT First-differencing eliminates a lot of information

First-differencing eliminates a lot of “bad” variation (A), but it also eliminates useful information. Most basically, no information is available from people, firms or nations that did not change their policies. For schooling, it is rare to have data on earnings between high school and college, and then post-college earnings. Even if we had such data, it is likely that youth who work and then return to college are not a random sample; for example, they may have taken low-wage known-to-be-temporary jobs for the year or two between spells of full-time education. Even for observations with variation over time, all differences between people or organizations that persist over time are discarded.

Measurement error rises

Corresponding to the lower information available, the effects of random measurement error rise in longitudinal estimates. Random measurement error typically biases coefficients downward (toward zero). To see this, consider a carpenter measuring a series of perfectly square tables. With no measurement error, length always equals width, and the estimated correlation = true correlation = unity. With measurement error, sometimes length is measured larger or smaller than width, and the correlation declines.

If we measure the factor X with the variance of the measurement error equal to 2 percent of the variance of X itself, the noise/signal ratio is 2% and the bias in the simplest (one-variable) measurement error model is also about 2%. That is, the estimated coefficient will be about 2% below the true coefficient -- a rather small problem. If the change in X has only about 5% of the variance of its level, while the measurement error is no smaller, then the signal/noise ratio is now 5:2. The estimated coefficient is then only about 5/(5+2) = 71% of the true coefficient, a downward bias of nearly 30% -- a significant problem.
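The arithmetic here follows from the classical one-variable errors-in-variables result: the estimated coefficient converges to the true coefficient times signal / (signal + noise). The sketch below just evaluates that factor for the two cases, scaling the variance of X in levels to 100 (an arbitrary normalization chosen for the example). It assumes, as the text does, that the measurement error variance is the same in levels and in changes. The simple noise/signal ratios (2% and 40%) are first-order approximations to the bias; the exact factor gives a slightly smaller bias, and the gap between the two grows as the noise gets large relative to the signal.

```python
def attenuation_factor(signal_var, noise_var):
    """Share of the true coefficient that survives classical measurement error in X."""
    return signal_var / (signal_var + noise_var)


# Levels: measurement error variance is 2% of V(X).
levels = attenuation_factor(signal_var=100.0, noise_var=2.0)

# Changes: only 5% of the variance of X remains, with the same noise as before.
changes = attenuation_factor(signal_var=5.0, noise_var=2.0)

print(f"levels:  estimate is {levels:.1%} of the true coefficient")
print(f"changes: estimate is {changes:.1%} of the true coefficient")
```

If the measurement errors in the two periods are independent, differencing also doubles the noise variance, which would make the attenuation in changes even worse than shown.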

Thus, most of the studies cited above try to correct for measurement error. For example, the Ashenfelter and Rouse study of twins dropped twins from the sample if they disagreed on the level of education of both themselves and the other.

Unfortunately, each correction for measurement error adds more assumptions and can lead to problems of its own. Results are often quite sensitive to the adjustment for measurement error.

Changes are not exogenous.

We use first differencing to eliminate unobserved heterogeneity that is constant over time for each person. Nevertheless, unobserved heterogeneity remains between changers and others; otherwise they would not have changed.

In the Ashenfelter and Rouse study, why did one twin have more education than another? Whatever that factor was, it (not just the extra education) may be causing the higher wages.  For example, we know that twins often differ on birth weight, and birth weight affects education.

Why did one firm add more "high performance" workplace practices in Huselid and Becker? Whatever that factor was, it (not just the change in the workplace) may be causing the higher productivity.

In the studies of employees moving between employers or industries, it is possible that high-wage sectors have higher skill levels. At the same time, if firms (and perhaps even workers) cannot easily see skill levels, it may take time for ability to become clear. In that case, we expect people to start in low-wage industries and then for some of them to receive wage increases and be hired in high-wage industries. Thus, industries do not pay high wages (conditional on skill), they just have a lot of skilled people. The first-difference estimate is similar to the cross-section estimate even if skills are constant, merely due to improved sorting and learning about skills. (Lawrence Katz and Robert Gibbons model this situation and compare it with the data.)

Model the selection process explicitly

In some early work, James Heckman found a clever way to model the selection process formally that relied only on assumptions about the distributions of error terms. Unfortunately, such estimates are not robust, as they do not really have any identifying information. Instead, they rely on assumptions about functional forms, and researchers rarely have enough information about functional forms to make such assumptions. Thus, the results are not reliable and are quite sensitive to small changes in specification. If you ever see results with "Mills ratios" added in to correct for selection, be suspicious. (Shaver presents a recent example of organizational research using this unconvincing sample selection correction.)

In other cases, Heckman and others have had information about the process of selection. Consider the case of wages for women. In that case, women with professional degrees mostly work, but many women with lower education do not work for pay. It is likely that the women out of the paid labor force have below-average opportunities for paid labor (A < 0, in the model above). Thus, we would compare average high-educated women with high-wage lower-educated women, biasing the returns to education downward. In one recent paper, Hilary Hoynes and Nada Eissa argued that the presence of young children affects labor supply but not wages. (This assumption can be debated.) If so, we can formally model the decision to work, and use the predicted probability of working to adjust the estimates in the wage equation.
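The direction of this bias can be shown with a stylized simulation. The sketch illustrates only the selection problem, not the correction, and every number in it is an assumption: a true wage premium of 1.0 for high education, all high-education women working, and low-education women working only when their unmeasured factor A is above zero. Comparing average wages among women who work understates the premium, because the low-education workers are a favorably selected group.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_selection(true_premium=1.0, n=20000, seed=4):
    """Wage gap by education among workers vs. among everyone (hypothetical numbers)."""
    rng = random.Random(seed)
    high_all, low_all, low_workers = [], [], []
    for _ in range(n):
        high_educ = rng.random() < 0.5
        a = rng.gauss(0.0, 1.0)                       # unmeasured opportunities, A
        wage = true_premium * high_educ + 0.5 * a + rng.gauss(0.0, 0.5)
        if high_educ:
            high_all.append(wage)                     # assumed: all of them work
        else:
            low_all.append(wage)
            if a > 0:                                 # assumed: only A > 0 work for pay
                low_workers.append(wage)
    observed_gap = mean(high_all) - mean(low_workers)  # what the wage data show
    full_gap = mean(high_all) - mean(low_all)          # needs wages we never observe
    return observed_gap, full_gap


observed_gap, full_gap = simulate_selection()
print(f"gap among workers = {observed_gap:.2f}, gap if everyone's wage were seen = {full_gap:.2f}")
```

A correction along the lines described above would additionally need a variable, such as the presence of young children, that shifts the decision to work without shifting wages; nothing in this sketch supplies one.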

Perform a true experiment

Social scientists sometimes receive funding for true experiments. That is, we randomize who gets training from a waiting list (LaLonde, 1986), who gets larger class sizes (STAR experiment in Tennessee, Krueger, 1997), who gets a voucher to move from public housing into the suburbs (the Gautreaux experiment, Rosenbaum, 1995), and so forth. Field experiments have a disproportionate effect on social scientists' understanding of reality, as they are more convincing than other forms of research. They have an even stronger influence on policy, in large part because they are easy to explain.

BUT

Field experiments are costly.

True randomization is difficult. For example, people may try to re-apply if they lose a lottery.

True "treatment" is difficult. Some who are in the treatment group do not take advantage of the program. Some who are in the control group find similar nonexperimental programs (for example, to receive subsidized training).  Attrition occurs. Moreover, attrition is nonrandom, as those doing poorly and those not permitted into the experimental group are most likely to drop out of the sample.

For all of these reasons, experiments typically identify the “intention to treat,” as opposed to the treatment itself.  This effect is typically smaller than the effect of the treatment on those accepting the treatment.  For example, if 60% of those eligible participate in a program that raises their earnings by 10%, the effect of the program is 10% higher earnings on those accepting treatment (“treatment on the treated”) but only 6% higher earnings on those offered the training (“intention to treat”).  The six percent figure is more helpful in understanding what would happen if the program were offered more widely.  Without a model of who accepts the treatment, it is harder to understand the effect of the treatment on the treated, as some part of it may be due to self-selection into the training program.
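The relationship between the two effects in this example is simple arithmetic, sketched below. The function names are made up for illustration, and the calculation assumes that people who decline the program are unaffected by being offered it and that nobody in the control group obtains the treatment elsewhere.

```python
def intention_to_treat(effect_on_treated, take_up_rate):
    """Average effect on everyone offered the program."""
    return take_up_rate * effect_on_treated


def treatment_on_treated(itt_effect, take_up_rate):
    """Average effect on those who accept, recovered from the ITT and the take-up rate."""
    return itt_effect / take_up_rate


itt = intention_to_treat(effect_on_treated=0.10, take_up_rate=0.60)
tot = treatment_on_treated(itt, take_up_rate=0.60)
print(f"intention to treat = {itt:.0%}, treatment on the treated = {tot:.0%}")
```

Dividing by the take-up rate is the same move as dividing the reduced form by the first stage in the instrumental variable examples: the random offer is the instrument and participation is the variable of interest.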

Experiments rarely last as long as researchers would like, which makes it hard to know if people are excited to be in something novel, not responding as well as they would to a permanent program, responding more in the short run to take advantage of an opportunity they know will not last, or something else.

Field experiments often have tiny N, if we think of N = # of independent treatments. Consider two teachers who have a new method, compared with 2 teachers using an old method. Even if each teacher teaches 200 students, in some sense N = 4 here, which makes statistical inference weak. Intuitively, if the control group has 1 excellent teacher, his or her capability could swamp even quite a good experimental method.

Moreover, we suspect that innovative teachers volunteer to be in experiments, so the results may not generalize. (See Heckman, 1995 for a critique of field experiments.)

HINTS

Whenever you see randomization, consider writing a paper on it. Lotteries, factors correlated with birthdays, random choices from waiting lists, and so forth, are rare but important.

Whenever you see a waiting list, consider asking the people running the program to draw randomly. If they track any outcome data, this method will automatically perform an experiment.

Perform a quasi-experiment

Cheaper than a field experiment is presenting decision-makers with scenarios, and asking them what they would do in each situation. Because different subjects receive different scenarios, it is difficult for them to identify the question being asked by the researcher. Costs are lower, and the subjects and scenarios can be more realistic than in most laboratory experiments with college sophomores.

The downside is clear: Subjects have no incentive to give the decision the thought they would for a true decision. Also, as in any experiment, subjects may answer in ways they feel are preferred by society or by the experimenter.

For example, Kahneman, Knetsch and Thaler gave subjects scenarios, and asked how differences in the situation changed evaluations of pay cuts: real vs. nominal cut; cut base pay vs. cut bonus.

I gave scenarios to pay-setting executives with variations in inflation, relative wage changes in the market, unemployment, and the employers' ability to pay. Differences in suggested pay changes for different occupations tested theories of markets, fairness, and ability to pay (Levine, 1993).

Appendix: Reminders about ordinary least squares regression.

Consider the one-variable OLS model with schooling as the only variable X that determines income (Y):

A1) Y_i = B·X_i + e_i.

To reduce notation, we have already subtracted the mean of Y and of X from each data point. This transformation ensures the constant term is zero, so it is omitted from this Appendix.

In the ideal situation, the observable variables X are uncorrelated with the error term e. I use the notation X'e for the vector product; that is, X'e = Σ_i X_i·e_i. Thus, we can write the assumption that X and e are uncorrelated (that is, have a covariance of zero) three ways: cov(X, e) = E(X'e) = E(X_i·e_i) = 0.

The ordinary least squares (OLS) estimate of B is defined as:

A2) b_OLS = (X'X)^-1 · X'Y.

We can substitute in from (A1) for Y and get:

A3) b_OLS = X'(B·X + e) / X'X = (B·X'X + X'e) / X'X = B + X'e / X'X.

Take expectations and recall that X is uncorrelated with e (that is, E(X'e) = 0). Thus, we have

A4) E(b_OLS) = B·(X'X) / X'X = B.

In the one-variable case, we can rewrite the formula for b_OLS (A2) as

A5) b_OLS = (Σ_i X_i·Y_i) / (Σ_i X_i·X_i) = cov(X,Y) / V(X).

In words, the coefficient of X on Y is large if the two have a high covariance. The coefficient declines if X has a lot of variance. Intuitively, if we regress weight on height, then as we move the measure of height from feet to inches, the standard deviation of height rises by a factor of 12. The higher variance corresponds to a smaller b: the coefficient when height is measured in feet is 12 times the coefficient when height is measured in inches, because a one-foot change in height = a 12-inch change.
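The height example can be checked directly with the cov(X,Y) / V(X) formula. The five data points below are invented purely for illustration. Converting height from feet to inches multiplies the covariance with weight by 12 and the variance of height by 144, so the slope falls by a factor of 12.

```python
def ols_slope(x, y):
    """One-variable OLS slope after subtracting means: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: five people's heights (feet) and weights (pounds).
height_feet = [5.0, 5.5, 5.75, 6.0, 6.25]
weight_lbs = [120.0, 150.0, 155.0, 180.0, 190.0]
height_inches = [12.0 * h for h in height_feet]

b_per_foot = ols_slope(height_feet, weight_lbs)
b_per_inch = ols_slope(height_inches, weight_lbs)
print(f"pounds per foot = {b_per_foot:.2f}, pounds per inch = {b_per_inch:.2f}")
print(f"ratio = {b_per_foot / b_per_inch:.1f}")
```

The fitted relationship is the same in both cases; only the units of the coefficient change.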

Two key points

1. Anything that raises E(X'e), so that high values of X will have high error terms, will bias b_OLS. Intuitively, if an omitted factor raises Y (that is, raises the error term e), and is positively correlated with X, then E(X'e) > 0. Looking at (A3), we see that b_OLS can be expected to be > B. The effect of the omitted factor "leaks onto" the X that is included in the regression.

2. Anything that raises V(X) such as purely random measurement error will bias b_OLS down.



References

Abraham, Katharine G.; Farber, Henry S. Returns to Seniority in Union and Nonunion Jobs: A New Look at the Evidence. Industrial & Labor Relations Review v. 42, n1 (Oct 1988):3-19.

Abstract: Analysis using cross-sectional data indicates that the positive association between earnings and seniority is generally much stronger for nonunion than for union workers. This result is inconsistent with the general belief that seniority is more important in the union sector. Further analysis leads to the conclusion that standard estimates of the return to seniority will be biased upward because of unmeasured worker heterogeneity, job heterogeneity, or both. This upward bias is more likely to occur in the nonunion sector. This bias is corrected for in an analysis of data on male blue-collar workers for the period 1968-1980; the results show a larger return to seniority in the union sector. The analysis also indicates that workers on jobs of long duration will earn more than workers on short jobs. A union worker in a 20-year job will earn 9% more each year than a union worker in a 5-year job. A nonunion worker in a 20-year job will earn 18% more annually than one in a 5-year job.

Angrist, Joshua D. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review v80, n3 (Jun 1990):313-336. Stable URL: http://links.jstor.org/sici?sici=0002-8282%28199006%2980%3A3%3C313%3ALEATVE%3E2.0.CO%3B2-H

Angrist, Joshua D.; Krueger, Alan B. Does Compulsory School Attendance Affect Schooling and Earnings? Quarterly Journal of Economics v106, n4 (Nov 1991):979-1014.

Abstract: It is established that season of birth is related to educational attainment because of school start age policy and compulsory school attendance laws. Individuals born in the beginning of the year start school at an older age and can therefore drop out after completing less schooling than individuals born near the end of the year. Roughly 25% of potential dropouts remain in school because of compulsory schooling laws. The impact of compulsory schooling on earnings is estimated using quarter of birth as an instrument for education. The instrumental variables estimate of the return to education is close to the ordinary least squares estimate, suggesting that there is little bias in conventional estimates.

Ashenfelter, Orley; Rouse, Cecilia. Income, schooling, and ability: Evidence from a new sample of identical twins. Quarterly Journal of Economics v113, n1 (Feb 1998):253-284.

Abstract: A model is developed of optimal schooling investments and it is estimated using new data on approximately 700 identical twins. An average return to schooling of 9% for identical twins is estimated, but estimated returns appear to be slightly higher for less able individuals. Simple cross-section estimates are marginally upward biased. These empirical results imply that abler individuals attain more schooling because they face lower marginal costs of schooling, not because of higher marginal benefits.

Brown, Charles, "Equalizing Differences in Labor Markets," Quarterly Journal of Economics, 85, 1980.

Brown, Charles; Medoff, James. The Employer Size-Wage Effect. Journal of Political Economy v. 97, n5 (Oct 1989):1027-1059.

Abstract: There is considerable evidence that "large" employers pay more than "small" employers even when their union status is equal. The size-wage differential is one of the key differentials seen in labor markets. In an analysis, estimates are presented of size-wage differentials based on 5 data files, including the Current Population Survey. Some results are: 1. The effect of employer size on wages is both an establishment and a firm size effect. 2. Even for subsets of workers grouped by "collar color," union status, or industry, those who work for larger employers receive higher wages. 3. Within detailed, professional, technical, and managerial occupations, employer size premia are smallest (in percentage terms) in the highest pay grades. 4. The employer size effects are not significantly reduced by looking at changes in wages for particular workers as they move to different-sized employers. 5. The size premium occurs even in contexts in which the threat of unionization is implausible and in the union sector.

Card, David; Krueger, Alan B. School Quality and Black-White Relative Earnings: A Direct Assessment. Quarterly Journal of Economics v107, n1 (Feb 1992):151-200.

Abstract: Direct evidence of the role of school quality in explaining the growth of black-white relative earnings between 1960 and 1980 is presented. A strong relationship is found between school quality and the economic return to additional years of schooling for black and white workers. The estimates suggest that measures of school quality can explain 15%-25% of the convergence in relative rates of return to schooling for Southern-born black workers between 1960 and 1980. The remainder of the convergence in black-white relative returns to education is attributed to an economywide increase in the relative value of black education between 1970 and 1980. While returns to education for white workers fell sharply during the 1970s, returns for older cohorts of black workers were relatively stable. A direct relationship is also found between the relative quality of schools for black and white students from a particular state and cohort and their relative earnings later in life.

Card, David; Krueger, Alan B. Does School Quality Matter? Returns to Education and the Characteristics of Public Schools in the United States. Journal of Political Economy v100, n1 (Feb 1992):1-40.

Abstract: The effects of school quality - measured by the pupil-teacher ratio, average term length, and relative teacher pay - on the rate of return to education for men born between 1920 and 1949 are estimated. Using earnings data from the 1980 census, it is found that men who were educated in states with higher quality schools have a higher return to additional years of schooling. Rates of return are also higher for individuals from states with better educated teachers and with a higher fraction of female teachers. Holding constant school quality measures, however, no evidence is found that parental income or education affects average state-level rates of return.

Card, David E. and Alan B. Krueger. Myth and Measurement: The New Economics of the Minimum Wage. Princeton, N.J.: Princeton University Press, 1995.

Farber, Henry S. The Decline of Unionization in the United States: What Can Be Learned from Recent Experience? Journal of Labor Economics v8, n1, Part 2 (Jan 1990):S75-S105.

Abstract: Based on the May Current Population Surveys, the fraction of private nonagricultural employment made up of union members fell from 25.6% in 1973 to 14.1% in 1985. The dramatic decline in unionization in the US over the last decade is investigated in the context of a supply-demand model of union status determination using data from surveys of workers conducted in 1977 and 1984, along with data from the National Labor Relations Board on representation elections. It is concluded that the decline in unionization since 1977 is accounted for largely by 2 factors: 1. There has been an increase in employer resistance to unionization, probably due to increased product market competitiveness. 2. There has been a decrease in demand for union representation by nonunion workers due to an increase in the satisfaction of nonunion workers with their jobs and a decline in their beliefs that unions are able to improve wages and working conditions.

Gibbons, Robert, and Larry Katz. "Layoffs and Lemons." Journal of Labor Economics v9, n4 (Oct 1991):351-380.

Abstract: Theoretical and empirical analyses of an asymmetric-information model of layoffs are provided. When firms have discretion with respect to whom to lay off, the market infers that laid-off workers are of low ability. Assuming that no such negative inference is warranted if workers are displaced in a plant closing, postdisplacement wages should be lower and postdisplacement unemployment spells should be longer for those displaced by layoffs than for those displaced by plant closings, but predisplacement wages should not differ by cause of displacement. An analysis using Current Population Surveys data shows that, with respect to pre- and postdisplacement earnings, the postdisplacement earnings of white-collar workers who are displaced by layoffs are significantly lower than those of white-collar workers displaced by plant closings. Predisplacement earnings do not vary with cause of displacement.

Heckman, James J., "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, Fall 1976, 5:4, 475-92.

Heckman, James J., and V. Joseph Hotz, "Choosing among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs," J. of the American Statistical Association, Dec. 1992, v. 84: 262-280.

Heckman, James J., and Jeffrey A. Smith, "Assessing the Case for Social Experiments," Journal of Economic Perspectives, 9, 5, Spring 1995: 85-110.

Kane, Thomas J; Rouse, Cecilia Elena. Labor-market returns to two- and four-year college. American Economic Review v85, n3 (Jun 1995):600-614.

Abstract: Despite their importance as providers of post secondary education, little is known about the labor-market payoffs to a community-college education. An attempt is made to fill this gap by employing 2 different data sets that allow one to distinguish between 2-year and 4-year college attendance: 1. the National Longitudinal Study of the High School Class of 1972 (NLS-72) and 2. the National Longitudinal Survey of Youth (NLSY). Using the NLS-72, it is found that the average person who attended a 2-year college earned about 10% more than those without any college education, even without completing an associate's degree. Further, contrary to widespread skepticism regarding the value of a community-college education, the estimated returns to a credit at a 2-year or 4-year college are both positive and remarkably similar: roughly 4%-6% for every 30 completed credits (2 semesters). Evidence is also found of the additional value of an associate's degree for women and a bachelor's degree for men.

Huselid, Mark A; Becker, Brian E. Methodological issues in cross-sectional and panel estimates of the human resource-firm performance link. Industrial Relations v35, n3 (Jul 1996):400-22.

Abstract: Because companies differ in factors such as management ability that may lead to both high performance work systems and enhanced firm performance, conventional estimates of the effects of human resource management practices on firm performance may be biased upward. Alternatively, if HR management practices are measured with error, estimates of their effects on firm performance may be biased downward. It is found that, although longitudinal estimates that avoid the first source of bias are substantially smaller than cross-sectional estimates, the former are strongly influenced by errors in measuring HR management practices. Based on independent estimates of the measurement error, a

Kahneman, Daniel; Knetsch, Jack L.; Thaler, Richard. Fairness as a Constraint on Profit Seeking: Entitlements in the Market. American Economic Review v76, n4 (Sep 1986):728-741.

Abstract: While there may be good reasons to assume that firms seek their maximal profits as if they were subject only to legal and budgetary restraints, the patterns of incomplete adjustment often observed in markets suggest that some additional restraints are operative. Household surveys of public opinions are used here to infer rules of fairness for conduct in the market from evaluations of particular actions by hypothetical firms. The analysis considers 3 determinants of fairness judgments: 1. the reference transaction, 2. the outcomes to the firm and to the transactors, and 3. the occasion for the action of the firm. In customer or labor markets, it is acceptable for the firm to raise prices or cut wages when profits are threatened and to maintain prices when costs are reduced. However, it is unfair to exploit shifts in demand by raising prices or cutting wages. Several market anomalies are explained by assuming that these standards of fairness influence firms' behavior.

Korenman, Sanders, and Christopher Winship. "A Reanalysis of The Bell Curve." Cambridge, MA: National Bureau of Economic Research, W.P. no. 5230, 1995.

Krueger, Alan B. and Lawrence H. Summers. Reflections on the Inter-Industry Wage Structure. Cambridge, Mass.: National Bureau of Economic Research, W.P. no. 1968, 1986.

LaLonde, Robert J. Evaluating the Econometric Evaluations of Training Programs with Experimental Data. American Economic Review v76, n4 (Sep 1986):604-620.

Abstract: An attempt is made to compare the effect on trainee earnings of an employment program that was run as a field experiment where participants were randomly assigned to treatment and control groups with an array of estimates that an econometrician without experimental data might have produced. The results likely to be reported by an econometrician using nonexperimental data and the most modern methods are examined, and the extent to which the results are sensitive to alternative econometric specifications is tested. The field experiment is based on the National Supported Work (NSW) Demonstration program, a temporary employment program designed to help disadvantaged workers lacking basic job skills enter the labor market. The results show that many of the econometric procedures and comparison groups employed to evaluate employment and training programs would not have yielded accurate or precise estimates of the effect of the NSW program. The econometric estimates often differ considerably from the experimental results.

Lazear, Edward P. Job Security Provisions and Employment. Quarterly Journal of Economics, Aug 1990, 105(3): 699-726.

Abstract: Job security provisions and employment are examined by considering a 2-period labor market without any government-mandated or voluntary severance pay. The data consist of 468 observations made up of 22 countries times 29 years of experience between 1956 and 1984. The variables collected are civilian labor force, employment, population, average hours worked, and gross domestic product. The best estimates suggest that moving from no required severance pay to 3 months of required severance pay to employees with 10 years of service would reduce the employment-population ratio by approximately 1%. Although theory gives no guidance on the effects of severance pay on unemployment rates, mandated severance pay seems to increase unemployment rates. The estimates suggest that severance pay turns full-time jobs into part-time ones.

Levine, David I. Fairness, markets, and ability to pay: Evidence from compensation executives. American Economic Review v83, n5 (Dec 1993):1241-1259.

Abstract: A unique set of data based on surveys of 139 compensation executives is examined. Respondents read scenarios describing a hypothetical company and its labor market, and recommended wage changes for several positions. Contrary to some popular theories, differences in unemployment, quit rates, and a company's return on assets led to almost no change in respondents' recommended wage increases. When market wages for closely related occupations diverged, most respondents did not recommend adjusting relative wages within the company. However, when the occupations were not closely related (blue vs. white collar), most respondents recommended adjusting wages to reflect market forces.

Moulton, Brent R. Random Group Effects and the Precision of Regression Estimates. Journal of Econometrics v32, n3 (Aug 1986):385-397.

Abstract: When explanatory variable data in a regression model are derived from a population with grouped structure, the regression errors often are correlated within groups. Error component and random coefficient regression models are viewed as models of the intraclass correlation. Several empirical examples are analyzed to examine the applicability of random effects models and the consequences of inappropriately using ordinary least squares (OLS) estimation in the presence of random group effects. The main findings are that the assumption of independent errors is usually incorrect and that the unadjusted OLS standard errors often have a substantial downward bias, indicating a considerable danger of spurious regression.

Rosenbaum, James E. Changing the geography of opportunity by expanding residential choice: Lessons from the Gautreaux program. Housing Policy Debate v6, n1 (1995):231-269.

Abstract: The concept of geography of opportunity suggests that where individuals live affects their opportunities. While multivariate analyses cannot control completely for individual self-selection to neighborhoods, a residential integration program - the Gautreaux program - is examined, in which low-income blacks are randomly assigned to middle-income white suburbs or low-income mostly black urban areas. Compared with urban movers, adult suburban movers experience higher employment but no different wages or hours worked, and suburban mover youth do better on several educational measures and, if not in college, are more likely to have jobs with good pay and benefits. The 2 groups of youth are equally likely to interact with peers, but suburban movers are much more likely to interact with whites and only slightly less likely to interact with blacks.

Rouse, Cecilia Elena. Private school vouchers and student achievement: An evaluation of the Milwaukee parental choice program. Quarterly Journal of Economics v113, n2 (May 1998):553-602.

Abstract: In 1990, Wisconsin began providing vouchers to a small number of low-income students to attend nonsectarian private schools. Controlling for individual fixed-effects, the test scores of students selected to attend participating private school are compared with those of unsuccessful applicants and other students from the Milwaukee public schools. It is found that students in the Milwaukee Parental Choice Program had faster math score gains than, but similar reading score gains to, the comparison groups. The results appear robust to data imputations and sample attrition, although these deficiencies of the data should be kept in mind when interpreting the results.

Rouse, Cecilia Elena. Schools and student achievement: More evidence from the Milwaukee parental choice program. Economic Policy Review v4, n1 (Mar 1998):61-76. [ABI has full text.]

Abstract: In 1990, Wisconsin became the first state in the nation to implement a publicly funded school voucher program. The Milwaukee Parental Choice Program provides a voucher to low-income students to attend nonsectarian private schools. In this paper, 3 existing studies of the effects of the choice schools on student achievement are reviewed. Two of the studies report significant gains in math for the choice students and two report no significant effects in reading. The analysis is also extended to compare the achievement of students in the choice schools with that of three different types of schools: attendance area schools, magnet schools and attendance area schools with small class sizes and supplemental funding from the state of Wisconsin. Results from the studies are discussed in detail.

Ruhm, Christopher J. The economic consequences of parental leave mandates: Lessons from Europe. Quarterly Journal of Economics v113, n1 (Feb 1998):285-317. [NBER WP version]

Abstract: A study investigates the economic consequences of rights to paid parental leave in 9 European countries over the 1969 through 1993 period. Since women use virtually all parental leave in most nations, men constitute a reasonable comparison group, and most of the analysis examines how changes in paid leave affect the gap between female and male labor market outcomes. The employment-to-population ratios of women in their prime child-bearing years are also compared with those of corresponding aged men and older females. Parental leave is associated with increases in women's employment, but with reductions in their relative wages at extended durations.

Shaver, J Myles. Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival? Management Science v44, n4 (Apr 1998):571-585.

Abstract: Firms choose strategies based on their attributes and industry conditions; therefore, strategy choice is endogenous and self-selected. Empirical models that do not account for this and regress performance measures on strategy choice variables are potentially misspecified and their conclusions incorrect. It is highlighted how self-selection on hard-to-measure or unobservable characteristics can bias strategy performance estimates and an econometric technique that has been developed to account for this effect is recommended. Although this concern applies to a wide range of strategy questions, to demonstrate its effect it is empirically examined to see if entry mode choice (acquisition versus greenfield) influences foreign direct investment survival. In specifications that do not account for self-selection, it is found that greenfield entries have survival advantages compared to acquisitions. This confirms previous findings. However, the significance of this effect disappears once self-selection of entry mode in the empirical estimates is accounted for. The results confirm that estimates from models that do not account for self-selection of strategy choice can lead to incorrect or misleading conclusions.



Endnotes

1. The Appendix describes OLS regressions.

Notation note: Because we express wages in log terms, the coefficient B reflects the returns to an additional year of schooling. Thus, a coefficient of .09 implies that people with one additional year of schooling earn about 9% higher wages.
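A short sketch of the arithmetic behind this note: the "about 9%" reading of a log-wage coefficient is an approximation that works well for small coefficients and drifts for large ones. The function name below is hypothetical and the coefficient values are illustrative only.

```python
import math

def pct_change_from_log_coef(b: float) -> float:
    """Exact percentage wage difference implied by a log-wage coefficient b."""
    return (math.exp(b) - 1.0) * 100.0

# Compare the rule-of-thumb reading (100 * b) with the exact conversion.
for b in (0.09, 0.50):
    approx = b * 100.0
    exact = pct_change_from_log_coef(b)
    print(f"b = {b:.2f}: rule of thumb {approx:.0f}%, exact {exact:.1f}%")
```

For b = .09 the exact figure is about 9.4%, so the approximation is close; for b = .50 it is about 64.9%, so the gap is no longer negligible.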

2. Mathematical note: Formally, consider the model with schooling as the only variable X:

Yit = B·Xit + Ai + eit .

The OLS estimate of B is

bOLS = (X'X)^(-1) · X'Y = X'(B·X + A + e) / X'X

Taking expectations, and noting that X is uncorrelated with e (that is, E(X'e) = 0) but X is positively correlated with A (that is, E(X'A) > 0), we have:

E(bOLS) = B · (X'X) / X'X + X'A / X'X

= B + cov(X,A) / V(X) > B.

This result shows that if ability is positively correlated with schooling and we omit ability from the regression, then the coefficient on schooling is biased up.
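The bias formula in this note can be checked with a small simulation. This is a minimal sketch under assumed numbers: the true return (0.08), the strength of the link between ability and schooling, and the noise levels are all invented for illustration, and the helper names are hypothetical.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists (divides by n)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

def simulate_ols_bias(n=20_000, true_b=0.08, seed=0):
    """Return (b_ols, predicted) for Y = B*X + A + e when A is omitted."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # latent ability draw
    A = [0.10 * zi for zi in z]                    # person effect, log-wage units
    X = [12 + zi + rng.gauss(0, 2) for zi in z]    # schooling rises with ability
    Y = [true_b * x + a + rng.gauss(0, 0.3) for x, a in zip(X, A)]
    b_ols = cov(X, Y) / cov(X, X)                  # OLS slope with A left out
    predicted = true_b + cov(X, A) / cov(X, X)     # B + cov(X,A)/V(X)
    return b_ols, predicted

b_ols, predicted = simulate_ols_bias()
print(f"true B = 0.080, OLS b = {b_ols:.3f}, B + cov(X,A)/V(X) = {predicted:.3f}")
```

With these assumed parameters cov(X,A) is about 0.1 and V(X) is about 5, so the OLS slope should land near 0.10 rather than the true 0.08.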

3. Jargon note: Variation in unobservable factors is sometimes called "unobserved heterogeneity," "selection bias" (because a non-randomly selected group attends college), or "omitted variable bias" (because the regression omits the factors that led to the selection).

4. Mathematical note: We showed above that when ability is positively correlated with schooling, the OLS estimate of B is biased up:

E(bOLS) = B + cov(X,A) / V(X) > B

Assume we have an instrumental variable Z that is correlated with X but not with A or e (that is, E(Z'e) = E(Z'A) = 0). In words, Z is correlated with schooling but not ability. Then we can create the instrumental variable estimate:

bIV = Z'Y / Z'X = Z'(B·X + A + e) / Z'X

Taking expectations gives:

E(bIV) = E(B·Z'X + Z'A + Z'e) / Z'X

Using the facts that E(Z'e) = E(Z'A) = 0 we have

E(bIV) = B,

so the IV estimate is not biased. One can show with lots of algebra that the two-stage least squares estimate is the same as the formula above.
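The contrast between the OLS and IV estimates can be illustrated with simulated data. This is a sketch under assumptions, not an analysis of real data: the instrument Z is generated to shift schooling while being independent of ability, and every parameter value and helper name is hypothetical. Covariances stand in for Z'Y and Z'X, which is equivalent when the variables are measured as deviations from their means.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists (divides by n)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

def simulate_iv(n=50_000, true_b=0.08, seed=1):
    """Return (b_ols, b_iv) for Y = B*X + A + e with A unobserved."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    Z = [rng.gauss(0, 1) for _ in range(n)]        # instrument, independent of ability
    A = [0.10 * a for a in ability]                # person effect, log-wage units
    # Schooling depends on ability, the instrument, and noise.
    X = [12 + a + z + rng.gauss(0, 2) for a, z in zip(ability, Z)]
    Y = [true_b * x + ai + rng.gauss(0, 0.3) for x, ai in zip(X, A)]
    b_ols = cov(X, Y) / cov(X, X)                  # biased up by cov(X,A)/V(X)
    b_iv = cov(Z, Y) / cov(Z, X)                   # Z'Y / Z'X in deviations from means
    return b_ols, b_iv

b_ols, b_iv = simulate_iv()
print(f"true B = 0.080, OLS b = {b_ols:.3f}, IV b = {b_iv:.3f}")
```

In this setup the OLS slope should sit above 0.08 while the IV estimate should be close to it, up to sampling noise. The IV estimate is noisier than OLS, which is why the sketch uses a large sample.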