New York City
October 2003

The Statistics of Standards Erosion: An Interview with Dr. Valen Johnson
by Mitchell Levine

In the mid-Nineties, as a professor of Biostatistics at Duke University, Valen Johnson noticed a strange phenomenon in the grading system of that highly regarded institution: almost half the grades awarded by the college were in the A to A-plus range. Although grading varied across programs (arts and humanities departments were far less stringent than the sciences), it was clear that what he refers to as a “crisis of standards” was threatening the integrity of the academic process.

In his book, Grade Inflation: A Crisis in College Education [Springer-Verlag, 2003], he details the circumstances that led to DUET (Duke Undergraduates Evaluate Teaching), his research program into the causes of this breakdown in intellectual vetting. He had proposed replacing the traditional scheme with an “achievement index,” a weighted average that compensates for variations in grading strictness, but professors in several different departments handily vetoed the proposal. Afterward, he set up a website that allowed students to deliver their teacher evaluations online. By matching their commentary with their averages and cross-referencing it against the grade norms of their classes, their departments, and the college as a whole, he learned something that should shock no one, but surprised everyone: teachers are motivated to assign students the grades the students feel they deserve, because doing so advances the teachers’ careers. Speaking with Education Update, Dr. Johnson described his discovery: “Tenure and status promotions are in large part determined on the basis of student evaluations as a key factor. Students tend to view the process with an attribution bias: If they score well, it’s because of their intelligence and hard work. If they don’t, it’s because the grading was too strict.”

With these two tendencies interacting, teachers will often be pressured to grade leniently just to pander to their classes and their “enrollment vote.” Other theories, like the idea that classes with excellent teaching simply learn more and therefore score higher on average, or that self-selection of courses by motivated students leads to higher grades, he was able to discredit through a quantitative analysis of the data he collected. Instead, the analysis indicated that the correlation between grades and “Student Evaluations of Teaching (or SET) ratings is due to grade attribution and to a smaller extent to intervening factors.” That is, instructors who grade more severely are likely to receive lower SET ratings from more of their students than instructors who grade leniently, because those students feel that it is the instructor’s fault that they are earning a lower grade. One of the biggest myths his research was able to dispel is the commonly held, if counterintuitive, notion that SETs are actually measures of student learning: even if it is true that students don’t directly award teachers uniformly higher ratings simply for grading leniently, their ratings are still indices of student satisfaction, and not of higher levels of understanding of the course material.

Interestingly enough, he tells us, the problem probably can’t actually get much worse than it is now. In fact, if it did, almost every student would be receiving the highest marks. Nonetheless, the situation as it stands, he feels, is seriously undermining the credibility of higher education. When students dictate grades, and grad schools demonstrate indifference to grading their enrollees after those students have been admitted, who will be able to ensure that the graduates academia turns out are truly qualified in their fields?

The solution he proposes is as surprisingly simple to explain as it was politically impossible for him to implement. All that would be necessary to counteract the upward bias, he claims, would be to ignore the lowest and highest 10 percent or 20 percent of the class when tabulating the ratings, since these two groups are the most likely to be grade-biased when evaluating their instructors. Unfortunately, as reasonable as this sounds, no school that he knows of has been able to set such a policy in motion. He sums up the problem in a simple epigram: “To right the boat, two things must happen (and) more principled student grading practices must be adopted, and faculty assessment must be more closely linked to student achievement.”

When asked if it was fair that students graded with a weighted measure of performance instead of a traditional grade point average would be placed at a disadvantage when competing with students from institutions with “grade-biased” academics for admissions into graduate programs, he admitted that it would be a liability for them, but also pointed out that it would lead to more solidly prepared candidates overall. Students with valid measures of learning available will have deeper insight into how much they are actually learning, and will thus be empowered to learn more.

Any reader desiring to understand the true dynamics of grade assessment and academic integrity in higher education today (and that should include anyone teaching at, studying in, paying tuition to, or hiring graduates from any American college or university) must give themselves a flunking mark if they have not read this book.

