Why we are ‘smart’ about evaluating athletes and ‘dumb’ about assessing students, teachers and schools
The Answer Sheet asked prominent researcher and educational psychologist David Berliner of Arizona State University to explain why using a standardized test score as a single measure of academic achievement doesn’t make sense and why we should use multiple measures.
Stephen Krashen comment: David Berliner is right. It is highly likely that teacher evaluation does a better job of evaluating students than commercial standardized tests do: the repeated judgments of professionals who are with children every day are probably more valid than a test created by distant strangers. Moreover, teacher evaluations are "multiple measures," are closely aligned to the curriculum, and cover a variety of subjects.
For some evidence supporting this:
Bowen, W., Chingos, M., and McPherson, M. 2009. Crossing the Finish Line: Completing College at America's Public Universities. Princeton: Princeton University Press.
Geiser, S., and Santelices, M. V. 2007. Validity of high-school grades in predicting student success beyond the freshman year: High-school record vs. standardized tests as indicators of four-year college outcomes. Research and Occasional Papers Series: CSHE 6.07. University of California, Berkeley. http://cshe.berkeley.edu
For those who argue that we need standardized tests in order to compare student achievement over time and to compare subgroups of students: we already have an instrument for this, the NAEP. The NAEP is given every few years to samples of children, each of whom takes only a portion of the test. The results are then extrapolated to estimate how the larger population would score. No test prep is done, because the tests are zero-stakes: there are (or should be) no consequences for low or high scores. If we want a general picture of how children are doing, this is the way to get it. To find out about a patient's health, we need to look at only a small sample of their blood, not all of it.
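The sampling logic behind this argument can be illustrated with a minimal Python sketch. It assumes a hypothetical population of 100,000 student scores drawn from a normal distribution with a mean of 250 and a standard deviation of 35; these numbers are illustrative assumptions, not NAEP figures, and the sketch uses a simple random sample rather than NAEP's actual design, in which each student also takes only part of the test. It estimates the population average from 2,000 randomly chosen students and reports an approximate 95% margin of error.

```python
import random
import statistics

def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (sample_mean, margin), where margin is an approximate
    95% margin of error (1.96 standard errors of the mean).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / sample_size ** 0.5
    return mean, 1.96 * std_err

# Hypothetical population: 100,000 scores, mean 250, SD 35 (illustrative only).
rng = random.Random(42)
population = [rng.gauss(250, 35) for _ in range(100_000)]

true_mean = statistics.fmean(population)
est, margin = estimate_mean(population, 2_000)
print(f"true mean {true_mean:.1f}, estimate {est:.1f} +/- {margin:.1f}")
```

Although only 2 percent of the hypothetical population is "tested," the margin of error comes out to roughly a point and a half, which is the sense in which a small sample, like a small blood draw, is enough for a general picture.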
By David C. Berliner
Americans are smart about evaluating athletes and sports teams, and dumb about evaluating students, teachers and schools. Let me explain.
Recently, two NFL teams were still unbeaten more than a dozen games into the 2009 season. Then both lost.
Suppose that you had watched them on the day they lost, rather than on any of the 13 or 14 occasions when they had won. We would all agree that the day you watch a team matters, and that a single observation can therefore lead to a big mistake in judging a team's proficiency.
Suppose that on the day you watch one of these teams, the quarterback throws for 350 yards and three touchdowns. If that were all you assessed that day, and you reported the quarterback's performance to others, it would sound impressive.
But the team you watched actually lost its game because the quarterback threw two interceptions, fumbled once, and had negative rushing yards. So despite his three touchdowns and impressive passing yardage in that particular game, we might well have reason to boo him.
We might boo because we know that playing quarterback requires split-second decision making, skill in passing and running, holding on to the ball while being swarmed by blitzing linemen, reading defensive formations at the line of scrimmage, rallying teammates when the going gets rough, and a host of other skills.
In other words, the concept of quarterback is a complex one, made up of different bits of knowledge and skill that overlap and together make up the notion of “quarterbacking proficiency.”
“Touchdowns thrown” is by itself a pretty poor measure of the worth of a quarterback.
Even the most naïve sports fan knows that a team, or any player on it, should not be evaluated on the basis of a single observation.
Yet we often judge students, teachers, and schools on the basis of just that. A single test, often given over a few days in the spring semester, constitutes the assessment of our students' knowledge and skill for the purposes of evaluation under the No Child Left Behind (NCLB) law to which our schools must adhere.
But the test may contain a type of item the students haven't encountered before. For example, they might see the problem stacked vertically, with the 4 written above a + 8, rather than in the more familiar horizontal form, 4 + 8.
This little difference has been found to change the rate of correct responses in some primary grades by 20 percentage points or more!
Or, on any given day, students might misinterpret an item or two. That may not sound like much of a problem, but it can lower reported scores by many points, because scores are usually not reported to a student or for a class as the number of items answered correctly.
A small two-item difference on a test on any one day could therefore translate into much larger apparent score differences for students or schools. In addition, a student might misalign the questions with the answer sheet and then have fewer answers scored correct by the machine.
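The arithmetic behind a two-item difference can be sketched in Python. The sketch assumes, purely for illustration, that raw scores on a test are roughly normally distributed with a mean of 30 items correct and a standard deviation of 5 items, and that results are reported as percentile ranks; the norm group and the function name percentile_rank are hypothetical, not taken from any real test.

```python
from statistics import NormalDist

# Hypothetical norm group: raw scores ~ Normal(mean=30 items, sd=5 items).
NORMS = NormalDist(mu=30, sigma=5)

def percentile_rank(raw_score):
    """Approximate percentile rank of a raw score against the hypothetical norms."""
    return 100 * NORMS.cdf(raw_score)

for raw in (30, 32):
    print(f"{raw} items correct -> percentile {percentile_rank(raw):.0f}")

# The gap in reported rank produced by just two more items correct.
gap = percentile_rank(32) - percentile_rank(30)
print(f"gap: {gap:.1f} percentile points")
```

Under these assumptions, a student at the middle of the distribution who answers two more items correctly moves from the 50th to roughly the 66th percentile. Because most students score near the middle, where the distribution is densest, that is exactly where a couple of misread items look largest in reported scores.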
Perhaps many students stayed up late the night before watching their favorite basketball team play in a tournament or viewing a TV special, throwing their sleep pattern off for that particular day of testing. Or suppose the school cafeteria served a particularly nutritious breakfast the day of the test. Those students who received the school breakfast could have performed better than they usually do.
The point is that on any given day, in any curriculum area measured, dozens of influences could affect the scores of a student or a class. The scores obtained on one day may diverge a lot from those obtained on another. The solution is to collect multiple observations and average them, which better characterizes the typical score of a student or class.
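A minimal simulation shows why averaging helps. It assumes a hypothetical student whose typical score is 70 and whose score on any single day is that value plus independent random noise with a standard deviation of 8 points, standing in for influences like lost sleep, a good breakfast, or an unfamiliar item format. The constants and function names are illustrative assumptions. Averaging five testing days should shrink the day-to-day spread by about the square root of 5, to a bit under half.

```python
import random
import statistics

TRUE_SCORE = 70   # hypothetical "typical" score of one student
DAILY_NOISE = 8   # hypothetical SD of day-to-day fluctuation

def observed_score(rng):
    """One day's score: the typical score plus that day's random influences."""
    return rng.gauss(TRUE_SCORE, DAILY_NOISE)

def averaged_score(rng, days):
    """Mean of several independent testing days."""
    return statistics.fmean(observed_score(rng) for _ in range(days))

rng = random.Random(2010)
trials = 10_000
single = [observed_score(rng) for _ in range(trials)]
five_day = [averaged_score(rng, 5) for _ in range(trials)]

spread_single = statistics.stdev(single)
spread_five = statistics.stdev(five_day)
print(f"spread of one-day scores:    {spread_single:.2f}")
print(f"spread of five-day averages: {spread_five:.2f}")
# Expect the ratio to be close to 1/sqrt(5), about 0.45.
print(f"ratio: {spread_five / spread_single:.2f}")
```

The independence assumption is the best case for averaging; real influences can carry over from one day to the next, in which case the spread shrinks less, though multiple observations still beat a single one.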
In addition, the tests in the United States often have no items covering major areas of the domain that we call reading or mathematics.
We recognize that quarterbacking is multi-faceted and we understand that all of the skills needed to hold that position must be assessed or we would make a mistake in judging the worth of a particular quarterback.
Reading is no different. It is about more than decoding, spelling and punctuation. Reading is about making sense. It requires connecting what was read with one's own experience, being able to retell what was read in one's own words, predicting the next events in a story, analyzing plot and theme and the motives of characters, recognizing metaphors and symbols, discerning the author's intent, and dozens of other sophisticated forms of "comprehending" the text.
Mathematics is an equally complex domain. Being proficient at mathematics requires many skills, most of which are never tapped in the single administration of the tests we use throughout the nation.
Why in the world do we readily recognize the problems of using a single observation in judging sport figures or teams, and abandon those ideas in education?
Almost without exception, the tests for compliance with the No Child Left Behind law are given only once. And a single observation is no more appropriate for judging students, teachers, and schools than it is for judging athletes or their teams.
Education Secretary Arne Duncan is an athlete. He should know this.
But because Secretary Duncan does not have classroom experience, he may not know that teachers evaluate their students every day, 180 times a school year. The fact that under our laws these teachers have no say in evaluating their students’ skills and abilities is really quite ludicrous.
Their clinical knowledge, derived from many mini-tests, homework assignments, and classroom interactions, is devalued in the quest for "objective" scores. But we now see that those one-shot "objective" scores may be invalid as measures of what a child actually knows or what a teacher can accomplish.
Our national assessment practices have another problem. It makes the situation even worse for teachers and administrators than it is for students—and it is plenty bad for students.
Teachers and schools are evaluated on the basis of how well their students do, not on the basis of their actual teaching.
We all know that the coach of a Class D high school football team isn’t expected to win often if he plays against teams in tougher leagues. He simply hasn’t the talent pool to do so. And we are aware that physicians in cancer hospitals would have a low rating were we to judge their performance primarily on patient longevity.
Yet we rarely factor in students' social class or their family and neighborhood characteristics when judging teachers and schools.
Every student under NCLB has to be proficient, which is as ridiculous as saying every athlete has to excel and every patient has to get well. Is it so hard to acknowledge that English Language Learners will probably not do well, at first, on the literacy parts of the tests?
Do we really expect asthmatic children to learn as readily as healthy children? Are families with food insecurity likely to have children who can perform as well as some others?
What is it that lets most people cut the coach some slack and rate physicians on the basis of the severity of the illnesses that they treat, yet prevents them from applying anything like the same logic to teachers and schools?
Why do we judge teachers and schools to be superior, or not, regardless of the conditions under which they work, and on the basis of a one-shot test of student knowledge and skill that is clearly inadequate for assessing a) their typical performance, and b) the breadth of the skills needed to be proficient in the domain being assessed?
Although many politicians and parents defend them, the vast majority of the assessment systems we use for compliance with federal law are no damn good. Everyone understands that using a single observation, rather than multiple observations, to assess the competency of a quarterback or a team is neither valid nor fair.
Single observations are just as invalid and unfair when used to judge students, teachers, or schools.
We can do better.
David C. Berliner is co-author with Sharon L. Nichols of Collateral Damage: How High-Stakes Testing Corrupts America's Schools (2007). He is also co-author with Bruce J. Biddle of The Manufactured Crisis: Myths, Fraud, and the Attack on America's Public Schools (1995).
David Berliner, The Answer Sheet, Washington Post
2010-01-06