by Michael J. Feuer on September 25, 2019
Nearly forty-five years ago the preeminent scientist of psychological measurement, Lee
Cronbach, summarized “Five Decades of Public Controversy Over Mental Testing.”1
As we approach the centennial of the period covered in Cronbach’s review, there is no sign the controversy is subsiding. On the contrary, recent
scandals of testing fraud—involving parents obsessed with getting their kids into
elite colleges working in cahoots with unscrupulously greedy consultants and admissions officers, as well as not-so-distant memories of K–12 teachers and administrators who went to
jail for tampering with student scoresheets—have brought the whole American testing enterprise back onto the front page. Evidence and allegations of cheating add fuel to the flames of longstanding anxiety that tests perpetuate inequality rather than attenuate it, and that the tests don’t even measure what matters. Some pundits pile on in a frenzy of “I told you so” rhetoric that blames all of higher education for privileging the most
privileged, despite convincing data showing that returns to educational attainment and college completion are
increasing for all groups in society (although at different rates).
An extreme reaction to the latest uproar has been the call to
stop testing altogether. This is odd, as similar suggestions don’t come up in other cases of fraud. Does anyone think that the correct response to tax evasion is to end taxation? Or that the remedy for predatory and discriminatory lending is to shut down the banking system? Or that the way to stop price-gouging by poultry executives is to ban the eating of chicken? Yet when it comes to testing
mischief, we seem to excuse the perpetrators by shifting blame and suggesting that the system (if not the devil) made them do it.
Still, attention to systemic pressures that distort incentives and give people excuses to misbehave is warranted. The educational measurement community has long acknowledged the potential
consequences of testing, but we do not yet have a robust theory to explain, anticipate, measure, and reduce (if not prevent) distortions resulting from overuse and misuse of tests. Without such theory, there will be continued pressure to eliminate testing even when it is designed and applied intelligently. This would be unfortunate, especially as modern theories of cognition—coupled with advances in the analysis of large-scale data—lay the foundation for more sensible, fairer, and more efficient uses of standardized measures.
What might be the building blocks for a behavioral science of testing? As we argue in a recent
volume of the
Annals of the American Academy of Political and Social Science, discussions of the future of testing and assessment should start with an explicit differentiation of its plausible and relevant purposes. As a starting place we suggest these: monitoring of school systems at the regional, national, and international levels; holding schools and teachers accountable for student learning; identifying and placing students with special learning needs; promoting fairness in admissions; and providing instructional support to classroom teachers (at all levels). Other uses of tests might be added to the list, e.g., tests of civic knowledge as a
gatekeeper for immigration and naturalization, tests of skills and knowledge relevant to
job requirements in various occupations, or paper-and-pencil tests to assess the honesty and
integrity of prospective and current employees; these, too, would need to be included in a comprehensive cataloging, to allow for careful scrutiny of their strengths, weaknesses, benefits, and costs.
Preparing such a taxonomy, though, is only the first step. Next come three critical axioms: (1) no measurement or assessment system will be error-free, (2) no single assessment system should be expected to satisfy all of these purposes simultaneously, and (3) there will always be some drift in test use, leading to possible degradation in the validity of results. In one form or another, these are not exactly new ideas, but the policy world could use the reminder that, with any quantitative or qualitative technology designed to describe or predict human performance and potential, there will always be some misclassification.
The question becomes how to estimate the magnitude and distribution of the downside risks and unintended effects of testing. Social scientists like to cite Campbell’s Law: “the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures...” But we don’t yet have a model to predict more accurately when, how, and for whom the “law” will have its most damaging consequences. Lessons from the history of economic and political responses to problems of market failure and externalities, enriched with findings from institutional and behavioral economics that more explicitly account for constraints in human and organizational
rationality, could become theoretical foundations for an advanced science of testing. On these foundations one could design and interpret the experiments needed to test hypotheses regarding
incentives embedded in testing programs (including attention to pressures that cause otherwise ethical people to slip) and to estimate the relative efficacy of alternative policy strategies.
As I have argued elsewhere, if political economy is largely about the measurement of
externalities, then a behavioral science of testing would focus on the externalities of measurement. Unless we prefer to limit ourselves to papers lamenting another five decades (or more) of controversy, it is time to invest in research for a new behavioral science of testing. Such an investment would reap
positive externalities for policy and provide needed guidance to the next generation of educators and test developers.
Notes
1 Lee J. Cronbach, “Five Decades of Public Controversy Over Mental Testing,” American Psychologist 30, no. 1 (1975): 1–14.