David I. Levine January 17, 2003

BA 254C Industrial Relations

Measuring Performance

Social scientists examine many measures of "outcomes." Each performance metric has good and bad aspects.

Organizational-level outcomes

Multiple measures exist for "effective" organizations. For most measures, we examine within-industry performance -- that is, we include industry dummies in regressions. Alternatively, researchers can match a firm with a similarly-sized firm within its industry. The subsequent performance of the matched firm is often a more precise comparison than a regression provides.
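The matching idea can be sketched in a few lines. This is only an illustration under stated assumptions: the firm records, the field names (name, industry, employees), and the choice of log employment as the size measure are all hypothetical, not taken from any particular study.

```python
import math


def find_match(firm, candidates):
    """Return the same-industry candidate closest in log employment, or None."""
    same_industry = [
        c for c in candidates
        if c["industry"] == firm["industry"] and c["name"] != firm["name"]
    ]
    if not same_industry:
        return None
    # Assumption: "similarly sized" means the smallest gap in log employment.
    return min(
        same_industry,
        key=lambda c: abs(math.log(c["employees"]) - math.log(firm["employees"])),
    )


# Hypothetical firms; names, industries, and sizes are made up.
firms = [
    {"name": "A", "industry": "steel", "employees": 1000},
    {"name": "B", "industry": "steel", "employees": 1200},
    {"name": "C", "industry": "steel", "employees": 5000},
    {"name": "D", "industry": "autos", "employees": 1050},
]

match = find_match(firms[0], firms)
print("Match for A:", match["name"])
```

Firm D is closer to A in size than C is, but it is never considered because it is in a different industry.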

Perceptual Outcomes

One might use the same survey that measures a work practice to ask respondents about the effects they perceive the practice to have.

Perceptual outcomes suffer from common methods bias. This bias can arise for three related reasons. First, some respondents may simply use large numbers more than others; without reverse-coded responses, this habit will lead to spurious correlations between reports of the use of practices and reports of their success. Second, respondents may be unwilling to admit that they engage in an activity that is not successful. This is a social desirability bias. Finally, cognitive dissonance may distort perceptions, making respondents unlikely to perceive that they engage in an activity that is not successful.

Comparing measures of practices from some respondents and effectiveness from others at the same organization can reduce these problems. For example, one study asked central headquarters to rate the performance of divisions, and measured management practices at the divisional level.

Operations-based measures of productivity or quality:

Examples include hours per (complexity-adjusted) car, percent of time a steel finishing line runs, minutes to carve a single piece on a machine tool (adjusting for piece complexity), or defect rates. These measures have the advantage that they measure productivity directly. Moreover, by examining very similar production processes, researchers hope to avoid confounding from differences in technology and the like.

The downside of these measures is that they are difficult to gather on a large sample -- it is rare to have 500 workplaces with similar production processes and almost-identical outputs. There is also always a problem controlling for technology, market conditions, etc. Finally, quality and complexity differences can be large and difficult to control for.

Financially-based Measures of Productivity.

One can obtain data on company performance from Compact Disclosure, a database on nearly 12,000 companies that is published on CD-ROM. Annual Fortune listings provide the number of employees in each firm.

One can calculate several measures of firm productivity. The first is sales per employee, a simple and relatively common productivity indicator. Better than sales per employee would be value added per employee, where value added is sales minus the costs of raw materials. Most financial datasets do not include the data on raw materials needed to calculate value added. Both sales per employee and value added per employee have the problem that they do not control for capital per worker -- that is, if two firms have equal value added per employee, but one has twice the capital intensity, the low-capital firm is more productive once we control for capital intensity.
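A small worked example may make the capital-intensity point concrete. The two firms and all of their figures are hypothetical, chosen so that value added per employee is equal while capital per employee differs by a factor of two.

```python
def productivity_measures(firm):
    """Return per-employee sales, value added, and capital for one firm."""
    employees = firm["employees"]
    value_added = firm["sales"] - firm["raw_materials"]
    return {
        "sales_per_employee": firm["sales"] / employees,
        "value_added_per_employee": value_added / employees,
        "capital_per_employee": firm["capital"] / employees,
    }


# Hypothetical firms (figures in thousands of dollars).
firm_a = {"sales": 500.0, "raw_materials": 300.0, "capital": 400.0, "employees": 10}
firm_b = {"sales": 260.0, "raw_materials": 60.0, "capital": 200.0, "employees": 10}

for name, firm in (("A", firm_a), ("B", firm_b)):
    print(name, productivity_measures(firm))
```

Sales per employee ranks firm A far ahead, value added per employee ranks the two firms equal, and only the capital column reveals that firm B achieves the same value added with half the capital.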

To address this problem, we can analyze total factor productivity, an economic measure of the efficiency with which firms use capital and labor. When data are available, one can include raw materials, energy, and other factors as additional inputs. A simple production function is the Cobb-Douglas for value added:

1) VA = K^a · E^(1-a) · residual

where VA = value added = Q minus the costs of raw materials, Q = production = sales + change in finished goods inventory, K = capital, and E = employment. Dividing by employment and taking logs, this equation can be rewritten as:

2) log(VA/E) = a log(K/E) + residual.

We then define total factor productivity = TFP = the estimated residual of the log equation (that is, the log of the residual in the Cobb-Douglas equation). TFP > 0 indicates that, given its capital-labor ratio, this company produces a surprisingly large amount.
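The residual-based TFP measure can be sketched as a one-regressor least squares fit of log(VA/E) on log(K/E). Two assumptions are mine rather than the handout's: the regression includes an intercept (so the residuals average zero across the sample), and the firm data and field names (K, E, VA) are hypothetical.

```python
import math


def ols_simple(x, y):
    """Least squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def tfp_residuals(firms):
    """Estimate a in log(VA/E) = c + a*log(K/E) + residual; return (a, residuals)."""
    x = [math.log(f["K"] / f["E"]) for f in firms]
    y = [math.log(f["VA"] / f["E"]) for f in firms]
    intercept, a = ols_simple(x, y)
    residuals = [yi - (intercept + a * xi) for xi, yi in zip(x, y)]
    return a, residuals


# Hypothetical firms: capital K, employment E, value added VA.
firms = [
    {"name": "A", "K": 100.0, "E": 10.0, "VA": 30.0},
    {"name": "B", "K": 400.0, "E": 20.0, "VA": 55.0},
    {"name": "C", "K": 50.0, "E": 25.0, "VA": 40.0},
    {"name": "D", "K": 900.0, "E": 30.0, "VA": 120.0},
]

a_hat, tfp = tfp_residuals(firms)
print("estimated capital share a:", round(a_hat, 3))
for firm, resid in zip(firms, tfp):
    print(firm["name"], "TFP (log residual):", round(resid, 3))
```

A firm with a positive residual produces more value added per employee than the fitted line predicts for its capital-labor ratio.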

A more flexible functional form removes the restriction that only the capital-labor ratio matters, and permits squared and interacted terms. For example, one can use a translog production function to estimate expected output given the firm's capital stock and employment:

3) log(VA) = b₁ log(K) + b₂ [log(K)]² + b₃ log(E) + b₄ [log(E)]² + b₅ log(K) · log(E) + residual

The residual of this regression is also a measure of total factor productivity (TFP).

Production functions have been used to study how such inputs as capital (Solow, 1957), education (Levine and Renelt, 1992), and human resource policies such as profit sharing (Kruse and Weitzman, 1990) affect productivity.

Production functions are subject to the problem of endogeneity. For example, high-quality managers might choose more of an input and produce more output, but the quality of the manager, not the input, is responsible for much of the higher output. More generally, whatever omitted factor caused the high input use might directly affect productivity.

Adding a fixed effect and looking at how changes in inputs predict changes in output can help solve some problems of endogeneity. The cost is higher measurement error. Moreover, we are still left with the problem that whatever omitted factor caused the change in input use might directly affect the change in productivity.
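A stylized two-period example shows what differencing buys. Everything here is constructed for illustration: manager quality is a fixed firm effect that raises both input use and output, the true effect of the input is set to 0.5, and there is no measurement error (so the example shows the benefit of differencing but not its cost).

```python
def ols_slope(x, y):
    """Least squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


TRUE_EFFECT = 0.5                      # assumed true effect of the input
quality = [1.0, 2.0, 3.0, 4.0]         # unobserved manager quality (fixed)
input_change = [0.5, 0.1, 0.3, 0.2]    # change in input use between periods

# Period 1: better managers choose more of the input.
x1 = list(quality)
x2 = [x + d for x, d in zip(x1, input_change)]
# Output depends on the input and, directly, on manager quality.
y1 = [TRUE_EFFECT * x + q for x, q in zip(x1, quality)]
y2 = [TRUE_EFFECT * x + q for x, q in zip(x2, quality)]

levels_slope = ols_slope(x1, y1)
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
difference_slope = ols_slope(dx, dy)

print("levels estimate:", round(levels_slope, 3))
print("first-difference estimate:", round(difference_slope, 3))
```

The levels regression credits the input with the manager's contribution, while the differenced regression removes the fixed quality term. If quality itself changed between periods, the differenced estimate would be biased as well.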

Financial returns

There are many measures of financial return, such as return on sales, return on assets, return on investment, and return on common equity (that is, accounting profits divided by sales, assets, shareholders' accumulated investments, and shareholder equity, respectively). Along these lines, one could analyze ROA as the dependent variable, and include log(sales) and log(equity) on the right-hand side as controls. Typically we use logarithms of most scalable variables; profits can be negative (that is, with an undefined logarithm), leading to problems here.

Financial variables have the problem that they do not account well for inflation, they do not control for all inputs (see the production function, above), and they can be manipulated by managers for a variety of purposes such as reducing taxes and fooling investors. Profits are reduced by investments in physical capital and other forms of investment such as R&D and training. Profits are also influenced by idiosyncratic events such as takeovers.

Some, but not all, of these problems are avoided by using measures of cashflow (profits + investment in plant and equipment), so companies do not look bad if they invest a lot. Often it makes sense to look at earnings before extraordinary expenses, to avoid single bad years in which big write-offs recognize many years' problems. Often it makes sense to examine returns including interest payments, so companies do not look good or bad if they borrow money to buy their own stock. Thus, a good cashflow measure is what accountants refer to as EBIDT (earnings before interest, depreciation, and taxes).
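The arithmetic behind such a measure is simply adding the excluded items back to net income. The sketch below uses two hypothetical firms with identical operations, one of which borrowed to buy back its own stock; all figures are invented.

```python
def ebidt(net_income, interest, depreciation, taxes):
    """Earnings before interest, depreciation, and taxes."""
    return net_income + interest + depreciation + taxes


# Hypothetical firms with the same operations but different financing.
unlevered = {"net_income": 60.0, "interest": 0.0, "depreciation": 20.0, "taxes": 40.0}
levered = {"net_income": 42.0, "interest": 30.0, "depreciation": 20.0, "taxes": 28.0}

print("net income:", unlevered["net_income"], "vs", levered["net_income"])
print("EBIDT:", ebidt(**unlevered), "vs", ebidt(**levered))
```

Net income makes the borrower look worse, while the before-interest measure treats the two firms as equally profitable.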

Stock Market Return.

In many normative and some positive models of a capitalist economy, the ultimate measure of success is stock market returns to shareholders: increased asset value of common equity and dividends. Stock market returns are attractive to the extent that investors value all the information about expected future returns. Returns are available from the Standard & Poor's Compustat tapes for the New York, American, and NASDAQ stock exchanges.

Examining the levels of stock market value requires a comparison--the stock market value compared to what? One sensible benchmark is the book value of the enterprise -- what the physical capital of the company cost when new. The ratio of stock market value to book value is also known as Tobin's Q (or average Q).

Tobin's Q captures good management. It also is forward-looking, which is an advantage over financial measures. Stock market value also reflects a strong brand reputation, a valuable patent, product-market monopoly power, or any other asset that is not plant and equipment, but it does not capture investments that investors have not yet observed.

Any stock market indicator of success has the problem that stock returns vary FAR more than would be predicted by (our best measures of) fundamentals. Thus, unless samples are quite large the results may be distorted by this noise in asset prices. Such noise is not purely random, as stock market values include "fads" of popularity for a stock or sector. Studies of Internet companies at the end of 1999 would reach quite different answers than the same studies at the end of 2000.

A flow measure of stock market value is shareholder return: the capital gain plus any dividends. Such a return must adjust for stock splits, and has problems with takeovers and mergers.
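The split adjustment can be illustrated with a short calculation. The function name, its parameters, and the prices are hypothetical; the key assumption is that dividends are expressed per share held at the start of the period.

```python
def shareholder_return(start_price, end_price, dividends_per_original_share,
                       split_factor=1.0):
    """Capital gain plus dividends per dollar invested at the start.

    split_factor is the number of shares held at the end for each share held
    at the start (2.0 for a 2-for-1 split, 1.0 for no split).
    """
    end_value = end_price * split_factor + dividends_per_original_share
    return (end_value - start_price) / start_price


# Hypothetical stock: $100 at the start, a 2-for-1 split, $55 at the end,
# and $2 of dividends per original share.
adjusted = shareholder_return(100.0, 55.0, 2.0, split_factor=2.0)
naive = shareholder_return(100.0, 55.0, 2.0)  # ignores the split

print("split-adjusted return:", round(adjusted, 2))
print("unadjusted return:", round(naive, 2))
```

Ignoring the split turns a modest gain into what looks like a large loss, which is why return series must be split-adjusted before analysis.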

One of the most convincing study designs examines stock market returns in the narrow window of a few days around announcements. This event study method looks at how the stocks of companies that face union drives (for example) perform compared to the market during the few days before and after the announcement that someone is trying to form a union there. Event studies have examined stock market returns around everything from takeovers to winning quality awards to CEO turnover. Event studies require that the window of a few days around the announcement date correspond to when the market learned of the news.
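A minimal sketch of the calculation follows, assuming the simplest benchmark (the stock's return minus the market's return on the same day). Published event studies often use a fitted market model instead; the daily returns below are invented.

```python
def cumulative_abnormal_return(stock_returns, market_returns, event_index,
                               half_window):
    """Sum of (stock - market) daily returns over the event window.

    The window runs from event_index - half_window to
    event_index + half_window, inclusive.
    """
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    start = event_index - half_window
    end = event_index + half_window
    if start < 0 or end >= len(stock_returns):
        raise ValueError("event window extends beyond the available data")
    return sum(
        stock_returns[t] - market_returns[t] for t in range(start, end + 1)
    )


# Hypothetical daily returns; the announcement is on day 3 (zero-based).
stock = [0.01, 0.00, -0.01, -0.04, -0.02, 0.01, 0.00]
market = [0.01, 0.00, 0.00, 0.01, 0.00, 0.01, 0.00]

car = cumulative_abnormal_return(stock, market, event_index=3, half_window=1)
print("three-day cumulative abnormal return:", round(car, 4))
```

If the market actually learned the news a week earlier, the same calculation would miss most of the price reaction, which is why the dating of the announcement matters so much.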

Problems with financial measures

Ideally we want to adjust financial and stock market returns for a measure of their risk. For example, if one profit stream or stock is far riskier than another, a slightly higher return might not compensate for the extra risk. The capital asset pricing model (CAPM) provides a means to adjust for risk.

We cannot just take the variance of past returns to measure risk, because some risks can be diversified away. For example, if 20 stocks each have high variance but are uncorrelated with each other, a portfolio of all 20 has very low variance. CAPM assumes that only the risk that cannot be diversified away affects market returns. To do this, we first calculate each stock's "beta"-- a measure of its nondiversifiable risk. Compustat includes several estimates of beta for each stock. The risk-adjusted return, then, is the raw return minus beta times the market price of nondiversifiable risk. (CAPM also provides a measure of the market price for nondiversifiable risk.) Unfortunately, slightly different techniques for estimating beta often give very different estimates, suggesting that risk-adjusting returns will always be controversial.
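The pieces of this argument can be sketched numerically. The sketch follows the text's simplified definition (raw return minus beta times the price of nondiversifiable risk); textbook CAPM works with returns in excess of a risk-free rate, which is omitted here. All return figures and the 6 percent price of risk are hypothetical.

```python
def beta(stock_returns, market_returns):
    """Covariance of stock and market returns divided by market variance."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum(
        (s - mean_s) * (m - mean_m)
        for s, m in zip(stock_returns, market_returns)
    ) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var_m


def risk_adjusted_return(raw_return, stock_beta, price_of_risk):
    """Raw return minus beta times the market price of nondiversifiable risk."""
    return raw_return - stock_beta * price_of_risk


def equal_weight_portfolio_variance(single_stock_variance, n_stocks):
    """Variance of an equal-weighted portfolio of uncorrelated stocks."""
    return single_stock_variance / n_stocks


# Hypothetical returns: the stock moves twice as much as the market.
market = [0.02, -0.01, 0.03, 0.00]
stock = [2 * m + 0.01 for m in market]

b = beta(stock, market)
print("beta:", round(b, 3))
print("risk-adjusted return:", round(risk_adjusted_return(0.15, b, 0.06), 3))
print("variance of 20-stock portfolio:",
      equal_weight_portfolio_variance(0.04, 20))
```

With 20 uncorrelated stocks the portfolio variance falls to one twentieth of a single stock's variance, which is why only the undiversifiable component should be priced.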

Financial and stock market returns (but not productivity measures) subtract off wages as a cost. Socially, wages are not less valuable than profits; the owner-oriented measures ignore the social benefits of high wages.

Similarly, all of these measures treat higher effort by workers as a good thing. Socially, we want people to work the efficient level of effort, realizing the effort is costly. To make this calculation (conceptually), we need to take output and subtract off the (dollar equivalent) cost of effort. Such adjustments are not feasible usually.

Finally, financial measures are almost always at the corporation level. Many work practices are adopted in much smaller units, making it tough to find effects. Public sources also stress very large employers; many times we want to understand what is happening at small and medium employers as well.

Sales growth

We expect well-run firms to increase their sales. Sales that are driven by high productivity or quality do measure "performance." Unit sales that are driven by price cuts may not be sustainable. Conversely, revenue growth that is driven by increases in monopoly power that increase prices measures higher performance for shareholders, but lower performance for society.

At the same time, sales are reduced by industry-wide shocks that are exogenous to this firm. Thus, it is important to control for exogenous demand shocks, perhaps by including industry dummies. In that case, the analysis of sales growth is also an analysis of changes in market share. That framing makes it clear that it is important to define the industry correctly.
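The link between industry-adjusted growth and market share is an identity in logs, as the sketch below illustrates. It measures industry-wide growth by total industry sales, which is a stand-in for what industry dummies absorb in a regression; the sales figures are hypothetical.

```python
import math


def log_growth(start, end):
    """Log growth rate between two positive levels."""
    return math.log(end / start)


def industry_adjusted_growth(firm_start, firm_end, industry_start, industry_end):
    """Firm log sales growth minus industry-wide log sales growth."""
    return log_growth(firm_start, firm_end) - log_growth(industry_start, industry_end)


def log_market_share_change(firm_start, firm_end, industry_start, industry_end):
    """Change in the log of the firm's share of industry sales."""
    return math.log(firm_end / industry_end) - math.log(firm_start / industry_start)


# Hypothetical sales: the firm grows 21 percent while its industry grows 10 percent.
adjusted = industry_adjusted_growth(100.0, 121.0, 1000.0, 1100.0)
share_change = log_market_share_change(100.0, 121.0, 1000.0, 1100.0)

print("industry-adjusted log growth:", round(adjusted, 4))
print("change in log market share:", round(share_change, 4))
```

The two numbers coincide, so a misdefined industry (one that leaves out the firm's real competitors) distorts the adjusted growth measure in the same way it would distort a market share figure.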

Organizational longevity / death

Organizational death is often the easiest form of outcome to measure. Thus, long datasets can be created when no other metrics exist.

The downside is that lots of forms of death exist. Start-ups that get bought for $zillions are not the same as start-ups that go bankrupt, or even those that get bought for a little bit. Conversely, lots of forms of life exist--lurching along as a 3-person part-time business is not the same as growing to dominate the global telecommunications industry.

Convergent measures?

Table 1 presents the correlations among several productivity, accounting and stock market performance measures. Many of the correlations are negligible, confirming the importance of analyzing multiple measures to capture the many aspects of organizational performance.

Employee-level Performance Outcomes

Supervisory ratings.

Lots of well-known biases concerning stereotypes, halo effects, etc. Ratings tend to have low variance and validity, especially when raters are not trained.

The problems of bias are particularly severe when the focus of the research is discrimination. For example if we find a group performs poorly on the application test and poorly on subsequent supervisory ratings, it is possible that both metrics are biased in a correlated fashion.

Self-reported behaviors (effort, intent to quit).

Lots of self-serving bias concerning self-reported effort. Hard to distinguish self-reported effort from other attitudes due to halo effect.

Observed productivity

These measures are my favorites: sales per salesperson (adjusted for quota), pieces per piece-rate worker, calls handled per hour per customer service representative. In spite of their advantages, they are hard to adjust for the quality of the work performed. They are also hard to adjust for how hard the job is (in sales settings, for the quota; in piece rates, for the standard time per piece).

Outcomes for Employees

A number of measures indicate "this is a good job" -- that is, performance from an employee's perspective.

Wages

It is important to control for hard-to-measure aspects of the job and the worker. (Longitudinal analyses of workers sometimes help.) For example, we want the present value of health and pension benefits, but such measures are rarely available.

When you do not control for working conditions, wages can be misleading. For example, miners' danger and teachers' summer vacations partly explain why (adjusting for education) miners earn so much relative to teachers.

Systematic reporting error is also a problem. Lots of people, especially low-earnings folk, work black market jobs, and are loath to tell government statisticians or other researchers. High-income folk are also often loath to report their incomes, and most statistical services top-code their incomes anyway.

Safety

Safety is an important outcome in many sectors.

Unfortunately, employee self-reports are influenced by their view of the employer. Disability rates and time to return to work are lengthened by generous benefits and weak labor markets, and can be reduced by employees' desire to return to work and skills that are less affected by an injury.

Employer reports are largely influenced by regulations. Reported ergonomic injuries at Big Three auto assembly plants sky-rocketed after OSHA imposed million dollar fines for under-reporting. More generally, occupational illnesses (that is, those that take a while to develop) are much harder to measure than most accidents.

Death rates due to industrial accidents are usually more reliable than other metrics. Fortunately, deaths at work are rare. Unfortunately, that makes them useless for most analyses at any but the largest scale (e.g., comparisons across nations).

Turnover.

Low rates of voluntary turnover (quits) should indicate a desirable job (or poor labor market alternatives, or golden handcuffs through delayed compensation, etc.). Unfortunately, it is often impossible to differentiate (induced) quits from layoffs from dismissals from retirements.

Moreover, when screening is costly, employers should hire some workers who are bad fits for the job. For such workers, quits reflect good news for both the employer and employee. When quits are beneficial, using low quit rates to measure high performance can be misleading.

Self-reported satisfaction.

Global measures of satisfaction with work are less useful than facet measures -- satisfaction with boss, with job duties, etc.

Self-serving bias can be powerful. To give an example from a different arena, few people publicly report hating blacks anymore, even when they do. Conversely, fifty years ago many respondents reported they would not let a person of Chinese descent stay in their motel, but did let one stay when he arrived with a white researcher.

Discussion

Each construct is extremely difficult to measure. They have problems with reliability (that is, random measurement error). Each measure also has problems with validity due to systematic measurement error (e.g., high earners under-report income the most; people report their largest projects are the most successful) and/or bad causality (e.g., when people manipulate a performance metric to raise their pay).

Micro data on corporate and employee performance is almost always subject to big outliers. Outliers contain lots of information; for example, we care that Southwest Airlines and Wal-Mart increased the most in stock market value over a 15-year span. At the same time, outliers can have lots of measurement error. Moreover, especially when using log transformations on changes, very small startups can show uninformative growth rates. Thus, check all results with methods robust to outliers.
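Two simple robustness checks are the median and a winsorized mean. The sketch below is one possible implementation, not a prescribed method; the growth rates are hypothetical, with the last value (2.30, roughly log 10) standing for a startup that grew from 1 to 10 employees.

```python
import statistics


def winsorize(values, n_tail):
    """Clip the n_tail smallest and largest values to the nearest kept value."""
    if n_tail < 0 or 2 * n_tail >= len(values):
        raise ValueError("n_tail must leave at least one value unclipped")
    ordered = sorted(values)
    low = ordered[n_tail]
    high = ordered[-n_tail - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical log growth rates for five firms.
growth = [0.02, 0.05, 0.03, 0.04, 2.30]

print("mean:", round(statistics.mean(growth), 3))
print("median:", statistics.median(growth))
print("winsorized mean:", round(statistics.mean(winsorize(growth, 1)), 3))
```

If a result holds for the raw mean but disappears with the median or the winsorized mean, it is probably driven by one or two extreme observations.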

Results are more convincing when they hold for multiple performance measures. For example, when a new work system produces more cars, and also higher quality cars, we believe it increases organizational effectiveness, broadly defined. For each bias or source of error, consider auxiliary testable hypotheses to identify what is going on.

Problems measuring performance faced by researchers are also faced by organizational leaders. Thus, each bias can be a source of research; conversely, many theories suggest reasons why measures can be misleading.

Table 1. Correlations among Financial Performance Measures

	1	2	3	4	5	6	7	8	9	10	11	12	13
1. Sales/employee T1
2. Sales/employee T2	.27
3. Total factor productivity T1	.55	.19
4. Total factor productivity T2	.59	.14	.90
5. Return on investment T1	-.01	-.00	.12	.04
6. Return on investment T2	-.02	-.07	.02	-.06	.00
7. Return on sales T1	.17	.04	-.02	-.03	.01	.07
8. Return on sales T2	.02	.06	.08	.02	.01	.00	.05
9. Return on assets T1	-.03	-.01	.08	-.04	.99	.00	.01	.01
10. Return on assets T2	-.05	-.06	-.02	-.06	.00	.92	.08	.03	.00
11. Return on equity T1	-.01	.00	.05	-.04	.97	-.01	.01	.00	.97	.00
12. Return on equity T2	.04	-.04	.05	-.00	.01	.65	.02	-.00	.01	.70	.01
13. Total stock market return to investors T1	-.03	.05	.03	-.01	.24	.13	.24	.07	.41	.14	.14	.09
14. Total stock market return to investors T2	.10	.07	.13	.18	.13	.19	.19	.19	.24	.50	.10	.13	.12

Notes:

1. Ns vary from 514 to 1034 firms.

2. T1 is the average of 1986 - 1988.

3. T2 is the average of 1989 - 1991.

4. Source = Compact Disclosure, Fortune 500 Industrial and 500 Service companies.