Voices in Education

Another Look at Bias in the SAT
As the debate over possible bias in the SAT continues, I want to address two of the many issues at stake.

First is the question of whether the SAT is biased against racial minorities, and particularly against African American students. In other words, are the SAT scores of African American students lower than those of White peers of similar ability? To answer this question, psychometric researchers rely on one of two approaches: differential item functioning (DIF) or differential test functioning (DTF).
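To make the DIF approach concrete, here is a minimal sketch of one standard DIF statistic, the Mantel-Haenszel common odds ratio, which compares the odds of answering a given item correctly in two groups after matching examinees on total test score. The data layout and function name are my own illustration, not taken from the studies discussed here.

```python
import math
from collections import defaultdict

def mh_odds_ratio(responses):
    """Mantel-Haenszel common odds ratio for one item.

    responses: iterable of (group, total_score, correct) tuples, where
    group is "ref" (reference) or "focal", total_score is the matching
    variable, and correct is True/False for this item.
    """
    # One 2x2 table per score stratum:
    # A = ref correct, B = ref incorrect, C = focal correct, D = focal incorrect
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, score, correct in responses:
        cell = strata[score]
        if group == "ref":
            cell["A" if correct else "B"] += 1
        else:
            cell["C" if correct else "D"] += 1

    num = den = 0.0
    for t in strata.values():
        n = t["A"] + t["B"] + t["C"] + t["D"]
        if n == 0:
            continue
        num += t["A"] * t["D"] / n
        den += t["B"] * t["C"] / n
    return num / den if den else float("nan")

def mh_d_dif(alpha):
    """ETS delta-scale DIF statistic; values near 0 mean little DIF."""
    return -2.35 * math.log(alpha)
```

A ratio of 1 (MH D-DIF of 0) means that, at equal total scores, the two groups have equal odds of success on the item; values away from 1 flag the item for DIF.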

From a DIF perspective, the two empirical studies published in the Harvard Educational Review (HER) (Freedle, 2003; Santelices & Wilson, 2010) do not portray the SAT as biased. They found that easy items are easier for White students and that difficult items are easier for African American students, but the effects are very small and seem to cancel one another out.

This leaves the possibility of assessing bias with the DTF approach, which explores whether the test can be biased as a whole, in similar ways across all items. The studies published in HER do not focus on this question, and neither do the responses to these studies.

The second issue is whether the SAT measures African American students and White students in the same way. It is indeed possible for a test to show DIF without giving a score advantage to either group. Judging from the Freedle (2003) and Santelices and Wilson (2010) studies published in HER, the answer to this comparability question seems to be no. That would be a problem, because it would mean that the abilities of the two groups are not measured in the same way.

The crucial finding is that item difficulty is correlated with DIF, which I call the DIF/diff correlation. The researchers who criticize the Freedle and Santelices and Wilson studies (Dorans, 2010; Wainer & Skorupski, 2005) are not surprised by the DIF/diff correlation and do not dispute it. Dorans (2004) criticized Freedle (2003) for other aspects of his study, not for the DIF/diff correlation, and Santelices and Wilson (2010) show that the supposed shortcomings of Freedle’s study cannot explain away the correlation in all cases. It seems that it is not the existence of the correlation that is at the heart of this debate, but its interpretation.

Freedle (2003, 2010) gives cognitive and cultural explanations. Santelices and Wilson (2010) refrain from offering an interpretation. Critical authors attribute the correlation to an artifact, though Dorans (2010) does not specify what kind of artifact: “there are good reasons to expect a nonzero correlation with real data (Dorans, in press)”. In addition, Wainer and Skorupski (2005) published an explanation based on simulated data from a DIF-free test. There are other possible explanations for why the DIF/diff correlation would appear in a DIF-free test, even when the mean scores of the two groups do not differ and even without guessing. In summary, the DIF/diff correlation does not prove DIF, much as an artifact explanation does not prove that the correlation follows from an artifact. Before any conclusions are drawn, the DIF/diff correlation needs to be thoroughly explained, and the explanation carefully tested against alternative explanations.

My own conclusions are as follows. First, the SAT does not show bias from a DIF point of view; DTF deserves further study and further methodological development, which would complete the picture regarding possible bias. Second, given its ambiguity, the correlation between difficulty and DIF is perhaps not the best approach for studying the comparability of what is being measured. Continuing to investigate the correlation between item difficulty and DIF is nonetheless of great interest, so that we may arrive at an understanding of its various possible causes.


About the Author: Paul De Boeck is a professor of psychology and psychometrics at the University of Amsterdam.