Tag Archives: Value-added modeling

VAM is not behaviorist.

9 Sep

I generally agree with Diane Ravitch on most education policy issues. I have consistently pointed out that value added measures (VAM) of teaching are statistically invalid, and I have often cited her blog posts on this issue. Thus, I was very disappointed to see her link to a post that characterized VAM as “behaviorist.” Here is the offending quotation:

“However, traditional standardized assessments mainly contain questions that are crafted from a behaviorist perspective. The conceptual understanding that is highlighted in the cognitivist perspective and the participation in practices that is highlighted in the situative perspective are not captured on traditional standardized assessments. Thus, the only valid inference that can be made from a value-added estimate is about a teacher’s ability to teach the basic skills and knowledge associated with the behaviorist perspective.”

These words show an appalling lack of familiarity with modern behavioral psychology. Look at any recent textbook on behavioral teaching methods (I like Behavior Analysis for Effective Teaching by Julie S. Vargas) and you will find critiques of the use of standardized testing. Here is what Vargas writes:

“Educators realize that the goal of education is to prepare students for a future that requires much more than the skills assessed on a test.”

This is from a chapter where Vargas describes techniques for encouraging creativity and curiosity among students.

In later posts I will write about how facile and inaccurate characterizations of behaviorism have denied our teachers access to a set of highly effective classroom techniques.


David Berliner: The fatal flaw of value added assessment

12 Jan

Educational psychologist David Berliner has published a paper on value added assessment of teachers. His conclusion:

 

 “I conclude that because of the effects of countless exogenous variables on student classroom achievement, value-added assessments do not now and may never be stable enough from class to class or year to year to be used in evaluating teachers. The hope is that with three or more years of value-added data, the identification of extremely good and bad teachers might be possible; but, that goal is not assured, and empirical results suggest that it really is quite hard to reliably identify extremely good and extremely bad groups of teachers. In fact, when picking extremes among teachers, both luck and regression to the mean will combine with the interactions of many variables to produce instability in the value-added scores that are obtained. Examination of the apparently simple policy goal of identifying the best and worst teachers in a school system reveals a morally problematic and psychometrically inadequate base for those policies. In fact, the belief that there are thousands of consistently inadequate teachers may be like the search for welfare queens and disability scam artists—more sensationalism than it is reality.”

 


Value added measures of teachers are invalid

2 Oct

This is not a blog about educational policy. However, findings in my field, educational psychology, often have a direct bearing on policy debates. In those cases, particularly when the consequences are great, it would be irresponsible for me not to speak out.

Value added measures of teacher performance are being widely adopted across the country. This adoption is occurring with very little discussion about the validity of these measures. I believe that these measures, at least as conceived today, are invalid.

A measurement can be defined as taking some property in the world and representing it as a number. An invalid measure is one that does not accurately reflect the property it is supposed to represent.

In the past few weeks I have been analyzing data from a research project. The topic is not important for our discussion here; the methodology, however, is. The approach I am using is called a gain score analysis. Participants are assigned to one of two groups, and each group receives a different intervention. For each group we measure our outcome variable at baseline, that is, before treatment. After the intervention we measure the outcome variable again. The gain score is defined as the final measurement minus the baseline measurement, in other words, the magnitude of the change. By focusing on the magnitude of the change we do not have to worry about the fact that the baseline scores were not identical. We then use a statistical test to see whether one group gained significantly more than the other.
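To make the arithmetic concrete, here is a minimal sketch of such a gain score analysis in Python. The group sizes, score scale, and effect sizes are invented purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 40  # participants per group; the number is arbitrary for this sketch

# Random assignment: both groups are drawn from the same population at baseline.
baseline_a = rng.normal(50, 10, n)
baseline_b = rng.normal(50, 10, n)

# Hypothetical interventions: group B's treatment adds a bit more on average.
final_a = baseline_a + rng.normal(5, 5, n)
final_b = baseline_b + rng.normal(8, 5, n)

# Gain score = final measurement minus baseline measurement.
gain_a = final_a - baseline_a
gain_b = final_b - baseline_b

# Test whether one group gained significantly more than the other.
t, p = stats.ttest_ind(gain_a, gain_b)
print(f"mean gain A = {gain_a.mean():.1f}, mean gain B = {gain_b.mean():.1f}, p = {p:.3f}")
```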

A value added measure of teaching is also a gain score analysis. Students' performance is measured at the beginning of the school year and again at year's end. The difference is the gain score or, as it is called in education, the value added. The average gain score for a group of students is said to be the value added by their teacher.
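In code, the core computation is simply the classroom average of those gains. The sketch below uses a toy table of fall and spring scores with hypothetical teacher names; actual value-added models layer statistical adjustments on top, but the underlying quantity is this mean gain.

```python
import pandas as pd

# Toy data: fall and spring scores for students assigned to two hypothetical teachers.
scores = pd.DataFrame({
    "teacher": ["Smith"] * 3 + ["Jones"] * 3,
    "fall":    [42, 55, 61, 48, 50, 39],
    "spring":  [50, 66, 70, 53, 57, 45],
})

# Each student's gain score: end-of-year score minus beginning-of-year score.
scores["gain"] = scores["spring"] - scores["fall"]

# The "value added" attributed to a teacher is the average gain of that teacher's students.
value_added = scores.groupby("teacher")["gain"].mean()
print(value_added)
```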

What is wrong with this approach? After all, it seems identical to what my colleagues and I are doing in our research. Unfortunately, there is a crucial difference: in my study the participants were randomly assigned to the two groups. A gain score analysis cannot be valid if the group assignments are not random.

If students are not randomly assigned to schools and classrooms, and of course they are not, then value added measures are invalid for comparisons between teachers.

We know that students learn at different rates. We know this because in research where teaching is held constant, such as in programmed instruction, students complete the material at different rates. Whatever the source of these differences in learning rate, a teacher's value added score will be, in part, a function of student characteristics that are not under the teacher's control. Thus, any policy based on value added measures is invalid and, by extension, unfair.
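A small simulation makes the problem concrete. In this sketch two hypothetical teachers deliver identical instruction, but students with faster learning rates are more likely to be assigned to one classroom; the learning-rate distribution and the assignment rule are my own invented assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Each student brings an individual learning rate the teacher does not control.
learning_rate = rng.normal(10, 3, n)

# Non-random assignment: faster learners are more likely to end up in classroom A.
in_class_a = learning_rate + rng.normal(0, 3, n) > 10

# Identical instruction in both classrooms: gains depend only on the students.
gain = learning_rate + rng.normal(0, 2, n)

print(f"teacher A 'value added' = {gain[in_class_a].mean():.1f}")
print(f"teacher B 'value added' = {gain[~in_class_a].mean():.1f}")
# The two scores differ even though the teaching was the same.
```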

I am not opposed to measurement in education. Indeed, I know that properly used measurement can benefit both students and teachers. But to base policy on a measurement that we know to be invalid is senseless.
