The SAT is an integral part of the college admission experience for most prospective students in the United States. The American public accepts this test as a fair and valid mechanism to sort students into higher education and, ultimately, into different positions in society. Questions about the validity or fairness of such a test therefore raise issues of serious consequence in education.
In our Spring 2010 issue, the Harvard Educational Review (HER) published Maria Veronica Santelices and Mark Wilson’s article “Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning,” a fresh contribution to the debate that followed our 2003 publication of its intellectual predecessor, Roy Freedle’s “Correcting the SAT’s Ethnic and Social-Class Bias: A Method for Reestimating SAT Scores.” Both articles directly challenge the fairness of the SAT exam. Critics of Freedle’s article (Dorans, 2004; Dorans & Zeller, 2004) have impugned the soundness of his research methods and contended that his findings were not robust to alternative specifications of item bias.
This journal has a long history of being a space wherein productive discussion on issues of serious consequence in the field of education can take place. With this symposium, we have encouraged further debate on the fairness of the SAT by inviting both defenders and critics to reengage the topic. In doing so, we celebrate HER’s reputation as a journal that encourages debate in scholarship and acknowledge the power of such debate to challenge widely held beliefs, to advance thinking in the field, and to inspire re-examination of deeply entrenched educational policies and practices.
In their article, Santelices and Wilson (2010) argue that Freedle’s (2003) findings are not an artifact of his flawed methods. Using SAT data on students in California in 1994 and 1999—data that were made available only at the behest of the president of the nation’s largest system of public higher education—the authors find evidence of a correlation between differential item functioning (DIF) and item difficulty for African American and White students. Although they do not seek to advance a theory on the causal mechanisms behind these correlations, they do highlight the persistence of some bias on the SAT. Santelices and Wilson’s effort to show evidence of the SAT’s unfairness provokes readers to consider the consequences of bias in a measure that is used to determine college admissions and to question the opacity of testing organizations’ release of test data.
In this symposium, Freedle responds to Santelices and Wilson (2010) by challenging readers to think more broadly about bias and standardized testing. He asks readers to consider both differences in average scores and differences at the item level, such as DIF, as sources of racial and ethnic bias in standardized testing. Within this frame, Freedle urges readers to develop a stronger theoretical understanding of what causes bias on individual items. He also challenges those in the field of measurement to think critically and creatively about how to work toward eliminating racial and ethnic differences in mean scores on standardized tests. It is critical that practitioners and researchers hear the urgency in Freedle’s message when he explains that “the possibility of reducing ethnic group mean score difference is not a new idea, nor are the strategies to do so.” It is equally critical that education scholars answer his invitation to generate “a broader theoretical understanding for explaining the source of significant DIF . . . [and] move this conversation beyond a narrow focus on DIF.”
Neil Dorans, in his symposium response to Santelices and Wilson (2010), asks readers to question some of the assumptions that the authors make in their analysis and to consider the sensitivity of their conclusions to some of these critical assumptions. While Dorans’s criticisms respond to Santelices and Wilson’s arguments, they also speak to the universal struggles with which social scientists often grapple. His argument challenges the degree to which sampling idiosyncrasy—a concern that almost all research in education must face—may have influenced Santelices and Wilson’s conclusions regarding the correlation between DIF and item difficulty. He asserts that the degree of item bias on the test forms that Santelices and Wilson examine is overstated and urges readers to think about the power of words to influence the understanding of social phenomena for different groups of students. Further, Dorans asks us to consider whether a correlation between DIF and item difficulty even matters: “DIF is an established measure of fairness; the DIF/difficulty correlation is not. It is the magnitude of DIF on an item that matters, not the correlation between DIF and difficulty.” Finally, Dorans speculates about the extent to which Santelices and Wilson’s study suffers from a peculiar bias of its own: finding exactly what the study set out to find. Confirmation bias, though framed by Dorans as pertinent to the Santelices and Wilson article, is a concern to which all education research should give close attention.
Santelices and Wilson rejoin the conversation, speaking to the issues raised by Freedle and Dorans and rounding out the debate that we had hoped to inspire with this symposium. Santelices and Wilson offer a concise point-by-point response to the commentary of Dorans, noting their points of concurrence and bringing to bear what scholars in the field of measurement—including Dorans himself—have said in the past about their points of contention. They end with a strong affirmation of their original claim that their paper “confirms Freedle’s findings of a systematic relationship between item difficulty and DIF” and echo Freedle’s calls for more research into the DIF/item difficulty correlation, asking the field to investigate its “potential impact on total test scores and real life decisions.”
Although these three responses bring readers’ attention back to issues of bias in the SAT, there are other important facets of this conversation pertaining to the national dialogue about the role of standardized testing. Given the potential for racial and ethnic bias in the SAT, we challenge the field to think more critically about how these tests are used to label students and distribute society’s scarce rewards. However, other important issues about testing, such as those about standards-based examinations and their uses, deserve our critical attention, careful research, and spirited debate.
HER’s history of promoting constructive debate on issues of critical importance in education is long and rich, and we view this symposium as an opportunity to connect HER’s past with its present and future. The publication of Freedle’s 2003 article prompted a maelstrom of controversy that led to a national dialogue about the SAT’s role in college admissions and in American society. Many people are still deeply interested in carrying on this conversation. And we are honored to continue facilitating this discussion.
In an effort to continue the discussion of this very important issue beyond our printed pages, we are carrying the HER tradition of call-and-response into the twenty-first century. We have set up a forum for responses on our publishing group’s blog, http://www.hepg.org/blog, and we invite you to participate in this exciting new medium for productive dialogue. So, please join the conversation.
References
Dorans, N. (2004). Freedle’s Table 2: Fact or fiction. Harvard Educational Review, 74(1), 62–79.
Dorans, N., & Zeller, K. (2004). Examining Freedle’s claims and his proposed solution: Dated data, inappropriate measurement, and incorrect and unfair scoring (Report No. RR-04-26). Princeton, NJ: Educational Testing Service.
Freedle, R. (2003). Correcting the SAT’s ethnic and social-class bias: A method for reestimating SAT scores. Harvard Educational Review, 73(1), 1–44.
Santelices, M. V., & Wilson, M. (2010). Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Educational Review, 80(1), 106–133.