Editor’s Note

The National Center for Education Statistics recently released a report mapping proficiency standards on state accountability tests against the National Assessment of Educational Progress. The article below describes one state’s unusual public process for setting proficiency standards.

Volume 22, Number 4
July/August 2006

Proficiency for What?

A Wyoming standards-setting panel weighs the impact of its decisions

The No Child Left Behind Act of 2001 requires states to ensure that all schoolchildren are “proficient” in reading, writing, and mathematics by 2014. But what does proficiency really mean? In what testing expert W. James Popham calls “a genuine atypicality,” officials in the Wyoming State Department of Education decided to tackle this question head on by convening a panel of community leaders in June 2006 to recommend cutoff scores, known as cut-scores, for classifying student performance on the state’s new accountability tests.

“It’s a very, very important decision area,” notes Popham, professor emeritus at the UCLA Graduate School of Education and Information Studies and chair of Wyoming’s Technical Assistance Committee (TAC), made up of five assessment experts. “It is difficult to imagine a set of recommendations apt to have more impact on Wyoming education.”

In most states, the standard-setting process is carried on behind closed doors. State officials review teacher recommendations for various subjects and grade levels, and adjust the figures as they see fit. As additional tests are introduced at different grade levels, as per NCLB requirements, cut-scores are often calculated by extrapolating from existing standards. In Wyoming, however, education leaders have just introduced a completely new system of tests, known as PAWS (Proficiency Assessments for Wyoming Students). Designed to help teachers improve instruction, the tests are administered at grades 3-8 and 11 in three subject areas. (See “Red Light, Green Light,” Harvard Education Letter, March/April 2006.) Confronted with the need to set standards for all its tests at once, state officials seized the opportunity to open up a public discussion about the implications of this decision.

“This sort of standard-setting by a panel of distinguished citizens is quite rare,” acknowledges Wyoming state superintendent Jim McBride. Implicit in the decision to convene a highly visible, prestigious panel is the department’s recognition that the impact of the panel’s deliberations would be felt in areas beyond the K-12 public schools.

“We tried to convene a panel that would represent different interests,” explains Annette Bohling, deputy state superintendent and chief state schools officer. The 14-member panel convened by the Wyoming Department of Education included Governor David D. Freudenthal; several state legislators; the president of the University of Wyoming and the executive director of the Wyoming Community College Commission; the CEO of the Wyoming Business Council; and several members of the state board of education. The panel also included members of the Latino and Native American communities. “No other state is doing this,” Bohling emphasizes.

WyCAS vs. NCLB

Wyoming’s previous system of accountability tests, the Wyoming Comprehensive Assessment System (WyCAS), set very high standards for proficiency relative to those of other states. The tests were developed as part of a major school reform effort launched in 1997 that brought Wyoming from ranking in the bottom half of the states participating in the National Assessment of Educational Progress (NAEP) in 1998 to a position among the top 10 states in both math and reading on the 2005 NAEP. (Wyoming also ranked first in 2005 in reading and math performance by students in the lowest socioeconomic group.) But because WyCAS was designed to track the performance of whole schools, rather than individual children, it could not easily be adapted to meet the requirements of NCLB, which holds schools accountable for the performance of specific subgroups of students.

The shift from WyCAS to PAWS meant revisiting the definition of proficiency. To reach Wyoming’s standard for proficiency on WyCAS, students would have to score above the 89th percentile on NAEP in math and above the 80th percentile in reading. By comparison, a study of a dozen states including Wyoming found that in the other 11 states, cut-scores for proficiency were clustered around the 50th percentile and the next highest score was set at the 65th percentile.

But NCLB requires that all students reach proficiency by 2014. If the PAWS proficiency standards were set at the same level as the WyCAS standards, this would require the state’s students to perform well above national norms for grade-level work, as measured empirically by the TerraNova standardized test, notes TAC member Michael Flicek, director of assessment and research for Natrona County Schools in Casper, Wyo. According to Flicek, Wyoming would become a real-life Lake Wobegon, a place where, in Garrison Keillor’s words, “all the children are above average!”

Underlying Issues

To guide their deliberations, panel members were presented with a set of three “discussion starter” options of increasing stringency, using as a baseline the cut-scores recommended by a task force of teachers in each subject area. For each set of cut-scores, panelists were given graphs showing the estimated number of children who would be categorized as advanced, proficient, basic, or below basic based on that set of cut-scores, broken down by ethnic subgroup and special education status. Armed with all that information, the panel embarked on a two-day discussion in consultation with Wyoming Department of Education officials, TAC members, and representatives from Harcourt, the test developers.

The discussion touched on a wide range of issues associated with the definition of proficiency and its implications:

  • How arbitrary is the process of setting cut-scores for proficiency? “This is not a scientific endeavor,” cautioned TAC member David Berliner, Regents’ professor of education at Arizona State University. “It’s about judgment.” Berliner noted that there is always an element of arbitrariness in setting the passing level for, say, a driving test at 26 correct answers out of 40, rather than 25. Some items, for instance, may be more important than others: “Would you ever want to see a driver on the road,” he asked, “who cannot answer a question like this correctly: ‘What should you do when you see a sign that says School Xing?’” Nonetheless, the standard-setting process is not capricious. The root of the word “arbitrary,” he pointed out, is the same as for “arbitration,” implying a give and take among competing interests. “Nobody has a handle on doing cut-scores right,” Berliner told the panel. “We can try to be rational, but it’s hard to be right.”

  • Why not use the same definition of proficiency as the NAEP standard? Because standard-setting can be controversial, education officials often look to NAEP as a standard of comparison. Although NAEP is widely considered the “gold standard” among large-scale assessments, its performance standards have been widely criticized as technically problematic and too stringent, according to TAC member James Pellegrino, distinguished professor of cognitive psychology and education at the University of Illinois-Chicago. He told the panel that at current rates of improvement, it would take over 100 years for 100 percent of children to become proficient on the NAEP exam, and noted that even Secretary of Education Margaret Spellings has suggested that statewide proficiency levels for NCLB tests should be set closer to the “basic” performance level on the NAEP exam. “It’s useful information, but it should not be treated as gospel,” he said of the NAEP definition.

  • What does proficiency mean in the real world? “The question you have to ask yourselves,” Berliner said, “is ‘Proficient for what?’” Panelists debated a variety of suggestions: To enter college without needing remedial coursework? Not everyone is going to college, some panelists noted. To be prepared for the responsibilities of citizenship? To qualify for a good job? Each definition implied a different standard, and each standard implied different consequences for students.

  • How do proficiency standards affect aspiration and motivation? “You have to look at what motivates or demotivates kids,” noted Tucker Fagan, CEO of the Wyoming Business Council. High standards can motivate schools and children to achieve, but they can also cause more children to be labeled as failures, with potentially devastating consequences, he pointed out. “If we raise cut-scores [too high], we’re going to drive kids below the line and keep them below the line, and that’s not what we want to do.”

The trick, the panelists agreed, was to come up with a “Goldilocks” solution, where cut-scores were set not too low, not too high. “Standard-setting is always a judgmental process,” Pellegrino reassured the panel. “There’s no right or wrong. You’re trying to make a reasonable decision in a social, political, and economic context.”

“If you feel uncomfortable,” Bohling added, “that’s normal.”

Framework for a Decision

Following the discussion, panel members voted on a series of recommendations for the state superintendent. In the process, they evolved guidelines for framing their decisions. Among their suggestions:

  • Consider different goals in setting cut-scores for different performance levels. Raising the cut score for the “advanced” performance category, for example, can be a way to motivate students, without risking the adverse consequences associated with setting proficiency standards that may be out of many students’ reach. The panel recommended that the state superintendent consider raising cut-scores for the “advanced” category in all areas, independent of the level at which proficiency standards were set.

  • In defining proficiency, differentiate between grades 3-8 and 11th grade. In the elementary and middle grades, proficiency is defined as readiness to proceed to the next grade, panelists pointed out. Children who are not proficient in one grade may catch up in the next. By 11th grade, the stakes are higher. “You’re not asking a fifth grader to go out and make his way in the world,” noted Jim Rose, executive director of the Wyoming Community College Commission. The panel recommended that the superintendent make independent decisions on setting cut-scores for the earlier grades and for 11th graders.

  • Treat teachers’ recommendations for math and reading differently. Panelists observed that the cut-scores recommended by teachers for math were set significantly higher than the cut-scores their peers recommended for reading and would initially result in more children failing to achieve proficiency. “It’s typical,” Flicek explained, “for math teachers to recommend more rigorous cut-scores.” Math teachers in the higher grades, he noted, tended to have particularly high expectations. As a result, the panel voted against recommending cut-scores higher than those proposed by the math teachers.

  • Anticipate the impact of future performance. The panel voted to raise cut-scores for reading significantly above those proposed by teachers in that area, noting that the state has embarked on a number of reading initiatives that are expected to strengthen student performance significantly in the next few years. “It’s realistic to expect that we can meet a higher standard,” Bohling acknowledged. By contrast, the state has devoted significant resources to its writing programs over the last 25 years. “[The teachers] know exactly what they are looking for and they have high expectations,” she said. “We’ve seen some bumps in writing [performance] over the last several years. We’re not going to see those kinds of bumps in the future.” The panel voted to accept the teachers’ recommendations for the writing test.

The panel’s recommendations supported those made by teachers in some areas, but refined them or raised the bar in others. “Teachers have certain aspirations,” noted Flicek, “but there are a lot of other interests and aspirations in Wyoming. The point of the panel is to bring those into the process.”

At the end of the discussion, Thomas Buchanan, president of the University of Wyoming, commented, “I leave feeling a lot better about what’s going on in public education.”

Caroline Chauncey is editor of the Harvard Education Letter and assistant director of the Harvard Education Publishing Group.