Volume 29, Number 4
July/August 2013

Assessing the New Common Core Tests

An interview with Joan L. Herman


Joan L. Herman has studied the science of student assessment for more than 30 years. She is the former director and current senior scientist at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) at the University of California, Los Angeles, and the editor of Educational Assessment. Herman is also a technical adviser to the Smarter Balanced Assessment Consortium, which, with the Partnership for Assessment of Readiness for College and Careers (PARCC), is one of the two state consortia developing tests to assess individual students under the Common Core State Standards beginning next year. HEL editor Nancy Walser talked with Herman recently about how these new tests are shaping up. 

How are these two tests being developed?
Very differently than in the past. Right now, in many states, test developers are given the standards to measure; they disappear into what we call a “black box,” and out comes a test that supposedly measures the standards. Afterward, someone does a study that assesses the alignment between the standards and the assessment to see how well the test actually addresses the content and the intellectual demands that the standards call for. What’s new is that the consortia are using evidence-centered design (ECD), where alignment is built in from the beginning of test development, and the process is very transparent. So people—whether they are parents, teachers, or policy makers—if they take the time, can see what is being assessed and how.

With ECD, you start by establishing the claims about student learning that the test is supposed to evaluate. Both consortia have established four or five claims for both English language arts and math. In English language arts, for example, the claims are roughly that “students can read and understand increasingly complex texts,” “students can write for a variety of audiences and purposes,” and “students can conduct research.” Each of the claims is defined by a set of assessment targets, which comprise the potential evidence for substantiating each claim. The purpose of the test, then, is to collect evidence that will substantiate (or not) the claims for any individual student. The claims and targets essentially define what’s eligible for assessment—so it’s clear what is going to be assessed. Developers then create item specifications to guide the development of test items for each target. The ideal is that two item writers, using the same specifications, would come up with comparable items, which takes the mystery out of what the items will look like.

How are they different from current state tests, and why is it important to have new tests?
First of all, we have new standards that are “fewer, clearer, and higher,” so we need new tests that are aligned to them. A prime difference, and one that will be a shock to the system, is that the new Common Core consortia tests will be far more rigorous than existing state tests. Alignment studies of current state tests suggest that most of them primarily test lower-level skills, like recall, and the most basic applications, like “What is 12 × 134?” Because both of the new tests will contain performance tasks, they will enable the consortia to assess deeper learning. The tasks are actually going to ask kids to solve problems, synthesize information, reason mathematically, conduct research, integrate multiple sources, write coherent explanations, and make reasoned arguments. For example, after a class discussion and a poll of students, and using a set of charts, sixth-graders might be asked to write a note to their teacher recommending where the class should go on a field trip based on preferences, cost, and other considerations. Because the specifications for performance tasks will be available, teachers ideally can use them to create similar tasks in their classrooms to teach deeper learning; these will be tasks worth teaching to.

How are the Smarter Balanced and PARCC tests similar or different?
They are pretty similar. For both, there will be an end-of-year assessment, where kids will sit at the computer for an hour or two and respond to three types of questions, each of which is fairly quick to complete: selected response, short answer, or technology-enhanced items, which might involve dragging and dropping, creating graphs, ordering a sequence, or filling in a formula. All the questions on the end-of-year test can be scored automatically, so results will be available very quickly. In addition, both will have an end-of-year performance task. Results from these two components will be combined in some way to create an overall score for each student.

In terms of major differences, the Smarter Balanced end-of-year assessment is a computer adaptive test, which can be a more efficient way to test students because the computer selects the items administered to each student based on that student’s prior responses. The PARCC end-of-year assessment is a fixed-form test, meaning all students answer the same questions. The Smarter Balanced interim assessment includes both short-answer and performance task options, while PARCC offers a midyear performance task.
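The adaptive idea can be illustrated with a toy example. The interview does not say which measurement model or item-selection rule Smarter Balanced uses, so this minimal sketch assumes the Rasch (one-parameter logistic) model, a hypothetical five-item bank, a rule that picks the unused item whose difficulty is closest to the current ability estimate, and a damped maximum-likelihood update. All function names are hypothetical and for illustration only.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability that a student of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def next_item(theta: float, bank: list[float], used: set[int]) -> int:
    # Under the Rasch model an item is most informative when its difficulty
    # equals the student's ability, so choose the unused item closest to theta.
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - theta))


def update_theta(theta: float, responses: list[tuple[int, int]], bank: list[float]) -> float:
    # Damped Newton-Raphson on the Rasch log-likelihood. Each step is capped at 1.0
    # and theta is clamped to [-4, 4], because an all-correct or all-wrong
    # response pattern has no finite maximum-likelihood estimate.
    for _ in range(20):
        probs = [(u, p_correct(theta, bank[i])) for i, u in responses]
        grad = sum(u - p for u, p in probs)
        info = sum(p * (1.0 - p) for _, p in probs)
        step = max(-1.0, min(1.0, grad / info))
        theta = max(-4.0, min(4.0, theta + step))
    return theta


def run_cat(bank: list[float], answer, n_items: int) -> tuple[float, list[int]]:
    """Administer n_items adaptively; answer(i) returns 1 (correct) or 0 (incorrect) for item i."""
    theta, used, responses = 0.0, set(), []
    for _ in range(n_items):
        i = next_item(theta, bank, used)
        used.add(i)
        responses.append((i, answer(i)))
        theta = update_theta(theta, responses, bank)
    return theta, [i for i, _ in responses]


if __name__ == "__main__":
    bank = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical item difficulties
    # Hypothetical student who answers every item easier than difficulty 1.0 correctly.
    theta, order = run_cat(bank, lambda i: int(bank[i] < 1.0), n_items=2)
    print(order, round(theta, 2))  # [2, 4] 1.0
```

Because each item is matched to the current estimate, a correct answer routes the student to a harder item and an incorrect one to an easier item, which is where the efficiency comes from. Operational adaptive tests typically also enforce content coverage and limit how often any one item is shown, which this sketch omits.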

Earlier this year, you cowrote a report about how well the two new testing systems will assess the deeper learning that the Common Core encourages. What did you find?
We found that both of the consortia plan to assess deeper learning, meaning that their tests will emphasize students’ ability to apply and use their academic knowledge, to think critically and solve problems, and to communicate effectively. The report also highlights the contrast between what’s planned for the consortia tests and the rigor of existing state tests, based on a study by the RAND Corporation. While the RAND study suggests that no more than 10 percent of U.S. students currently are assessed on deeper learning, 100 percent would be held accountable for deeper learning on the consortia tests.

Do you think the new tests will be ready to administer in the spring of 2014?
I do. I don’t think they will meet everyone’s wishes and dreams in terms of totally transforming assessment, but they will be an important step forward. There are practical constraints that both consortia have to deal with, such as the limits of available technology, bandwidth, and the practicalities of testing time and cost. There may not be as many performance tasks or they may not be as extended as some would like because of these constraints. Then there’s the whole issue of the development schedule. It’s so quick that it doesn’t really allow time to develop radically new options. The item design doesn’t break all molds for assessment, but I keep saying and thinking, “Let’s not let the perfect be the enemy of the good.”

In your mind, what are the major technical issues that still need to be resolved?
There are a lot of technical issues around integrating performance tasks into the assessment system, particularly in light of the plans to use growth scores for teacher and school evaluations. We know from research that student performance can vary a lot depending on the topic of the task. For example, if I’m asked to write an essay on chaos theory and am not given access to sources or time to research the topic, I’m not going to do a very good job. But if I am asked to write about the causes of the Civil War, I’m likely to do a better job, because at least I know something about the topic. That exemplifies what we call individual-task interaction. It takes a number of tasks to get a stable estimate of student ability based on performance tasks, yet we don’t have the time or resources to administer multiple tasks. This creates particular problems when you are trying to compare scores from one year to the next, because it’s difficult to tell whether a change in scores is due to the particular performance task administered or to a change in student competency. Performance tasks raise a lot of comparability challenges.
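One standard way to see why several tasks are needed is the Spearman-Brown formula from classical test theory; this is an illustration, not an analysis from the interview or the report. If a single task yields score reliability ρ₁, a composite of k comparable tasks has reliability ρₖ. The single-task reliability of 0.5 used below is an assumed figure, not one reported for either consortium.

```latex
\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1},
\qquad
\rho_4 = \frac{4(0.5)}{1 + 3(0.5)} = \frac{2}{2.5} = 0.8
```

Under that assumption, one task stays at 0.5 while four tasks reach 0.8. The formula assumes the tasks are parallel, which topic effects of the kind Herman describes tend to violate, so it is only a rough guide.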

What are the benefits and drawbacks of giving these new tests online?
Number one, doing it online enables you to do automatic scoring and get feedback from the test much, much faster. Teachers always want quick feedback that they can use to respond to student needs. Technology is ultimately a more efficient way to administer and score—assuming you have the tools and the space to do it. You don’t have all this shipping and packaging of paper and all that coordination and security. The computer also enables you to expand the range of the item types on a test. For example, you can ask kids to move objects on the computer to show a sequence of events. It also enables you to build in accommodations for kids who need them. So, for kids who are English learners, you can build in a glossary to explain the meaning of words they may not understand. They can highlight a word in a math problem they don’t understand, and the software can give them a definition of the word so that students’ language ability doesn’t get in the way of their ability to show their math knowledge. If the state agreed that this was an appropriate accommodation, you could take the reading out of a test altogether and the computer could read the questions to the kids. In terms of drawbacks, there will be technical glitches, but that’s the reason you do pilot testing.

From the students’ point of view, what will it be like to take these tests?
They will definitely be harder, but they should be more interesting. They are going to require more thinking; there will be a greater variety of item types; there will be multimedia. I think they will more actively engage kids because they won’t just be circling responses. There should be extensive use of authentic texts and authentic, real-world problems.

You have said that teachers and others need to prepare parents for the “shock and awe” of the results next spring. What do you mean by that?
The new items are going to address deeper levels of learning. Kids are not used to having to think deeply to respond to tests, so they will not do as well. Some people think that the new tests will mirror the proficiency levels of the NAEP (National Assessment of Educational Progress), which sets proficiency at a higher level than many state tests do. That means that kids who currently score proficient in states with lower expectations will not be proficient on the new tests. To the public, it may look like a dramatic drop in performance. I think there needs to be a public relations campaign to let parents and communities know that we moved to the Common Core State Standards because we are raising our expectations for what students should know and be able to do so they are prepared for college, work, and life. Rather than treating the results as a cause for alarm, we should ask: How do we use the results to help students move up?

From a technical point of view, what do you think are the appropriate and inappropriate uses of these test results?
I worry about the results being used in value-added models for teacher evaluation. The results can give us a general barometer of where individual kids are relative to the Common Core standards. I think they will enable us, combined with information on school characteristics, to identify schools that are doing particularly well and those that aren’t. But I worry about making fine-grained distinctions between schools, teachers, and individual kids based on relatively minor differences in scores. Any score is an estimate.
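The point that any score is an estimate can be made concrete with the standard error of measurement (SEM) from classical test theory. The scale standard deviation of 10 and reliability of 0.9 below are hypothetical values chosen for illustration, not figures for either consortium’s tests.

```latex
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}} = 10\sqrt{1 - 0.9} \approx 3.16,
\qquad
\text{95\% band} \approx X \pm 1.96(3.16) \approx X \pm 6.2
```

Under these assumed numbers, two students whose scores differ by 3 points have heavily overlapping bands, so the difference says little about a real difference in achievement.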

For Further Information

J. Herman and R. Linn. On the Road to Assessing Deeper Learning: The Status of Smarter Balanced and PARCC Assessment Consortia. Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, 2013.