On Wednesday, I blogged about a new study of the effects of yoga on cognitive impairment. Thinking it over, I realized that some of the study’s results rest on a serious methodological flaw.
The study compares measures before the intervention to measures after the intervention within each group. For example, it looks at the Geriatric Depression Scale scores for the yoga group before and after the intervention and reports a statistically significant difference. But this is not the correct analysis: we want to compare the changes between the yoga group and the control group. An appropriate procedure would have been a gain score analysis. The authors could have subtracted each participant's before-treatment score from the after-treatment score and then compared the resulting change scores across the two groups with an appropriate statistical test.
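To make this concrete, here is a minimal sketch of a gain score analysis, using made-up numbers rather than the study's data:

```python
# Minimal sketch of a gain score analysis (hypothetical data, not the study's).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 40  # hypothetical number of participants per group

# Hypothetical Geriatric Depression Scale scores (lower = less depressed)
yoga_pre = rng.normal(10, 3, n)
yoga_post = yoga_pre - rng.normal(2.0, 2, n)         # assumed improvement
control_pre = rng.normal(10, 3, n)
control_post = control_pre - rng.normal(0.5, 2, n)   # assumed smaller change

# Gain (change) scores: after-treatment minus before-treatment
yoga_gain = yoga_post - yoga_pre
control_gain = control_post - control_pre

# The key step: compare the changes BETWEEN groups, not within each group
result = stats.ttest_ind(yoga_gain, control_gain)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

A common alternative to comparing gain scores is an ANCOVA on the after-treatment scores with the before-treatment scores as a covariate; either way, the comparison is between groups.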
In other words, the study had the data needed to compare the control and experimental groups but failed to do so. All it really shows is that the scores improved in the treatment group. That is an interesting finding, but it should be considered only exploratory and suggestive. I have no objection to publishing exploratory findings; I have done so myself. But the authors had the opportunity to perform a stronger test and they failed to take it.
There were many good presentations at APS this year, but by far the best was the three-hour workshop I attended on JASP and Bayesian analysis, run by Eric-Jan Wagenmakers. This led me to look up some of his writings, including this great paper: “Bayesian Benefits for the Pragmatic Researcher.”
By way of illustration, the paper tests the South Park hypothesis: the contention that there is no correlation between box office success and the quality of Adam Sandler movies. Quality is operationalized as the freshness rating at Rottentomatoes.com. A rough sketch of this kind of test follows the dialog below.
It is called the South Park hypothesis because of this bit of dialog:
“Producer: Watch this. A.W.E.S.O.M-O, given the current trends of the movie going public, can you come up with an idea for a movie that will break $100 million box office?
Cartman: [as A.W.E.S.O.M.-O] Um… Okay, how about this: Adam Sandler is like in love with some girl. But it turns out that the girl is actually a golden retriever or something.
Mitch: Oh! Perfect!
Executive: We’ll call it “Puppy Love”.
Mitch: Give us another movie idea, A.W.E.S.O.M.-O.
Cartman: Um… How about this: Adam Sandler inherits like, a billion dollars, but first he has to become a boxer or something.
Mitch: “Punch Drunk Billionaire”.”
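For illustration only, here is a rough Python sketch of the kind of Bayesian correlation test the paper reports. The pingouin package and the numbers below are my own choices, not the paper's; the point is simply that a Bayes factor can quantify evidence for the null (no correlation) as well as against it:

```python
# Rough sketch of a Bayesian correlation test; made-up numbers stand in
# for the real box office and Rotten Tomatoes data.
import numpy as np
import pingouin as pg

rng = np.random.default_rng(1)

# Hypothetical data for 15 Adam Sandler movies
box_office = rng.uniform(20, 200, 15)   # millions of dollars
freshness = rng.uniform(10, 80, 15)     # Rotten Tomatoes freshness percentage

result = pg.corr(box_office, freshness, method="pearson")
# 'r' is the sample correlation; 'BF10' is the Bayes factor for a nonzero
# correlation over the null of no correlation. A BF10 below 1 favors the
# null, i.e. the South Park hypothesis.
print(result[["n", "r", "BF10"]])
```

Unlike a nonsignificant p-value, which is mute about the null, a Bayes factor below 1 would count as positive evidence for the South Park hypothesis.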
The APS meeting was great. I heard many good talks, including John Kruschke’s “Some Bayesian approaches to replication analysis and planning.”
So says educational researcher Stephen Porter:
“Qual folks are also their single best enemy. I trained in comparative politics, where qual scholars are respected, because they adopt a case study approach. Many of the qual researchers I see in education and other areas tend to do dumb things like:
- Abandon any approach to representative sampling when they select participants. They refer to this as “purposive sampling” but it is often just an excuse for laziness – representative samples require a lot of work to collect. In a world where K-12 students are now being trained in the nuances of populations and samples, how do you think the average person, or policymaker, reacts to your study when you admit that the people you interviewed are not representative of anything?
- Some qual researchers insist there are multiple realities. What do you think the average person, who lives in a single reality like most of us, thinks of this idea?
- Some are also opposed to any notion of causality and reject the entire concept. Yet we live in a time when voters and policymakers are desperate for solutions to society’s problems. Do you honestly think they want to hear from someone who says, “Sorry, but I can’t really say whether smaller class size causes students’ test scores to increase. I can only describe the students’ experiences”? Such an approach is not very helpful to school districts trying to decide between hiring more teachers versus increasing teacher compensation.
In short, the future of qual research looks grim.”
I have blogged a number of times about the crisis in psychological research. Many widely publicized research findings have been called into question because of faulty methodology. These problems include small, underpowered studies, p-hacking, and failures to replicate.
Now stereotype threat, the claim that awareness of a stereotype about one’s own group will lead to a reduction in performance, has been called into question.
“Stereotype threat is one of the most famous and influential ideas in psychology. It is thought to be a key explanation for group differences in performance – whether the group is defined by gender, race or class. But now, stereotype threat itself is under threat. New studies are questioning just how robust it is, and even whether it exists at all. The same goes for many other staples of social psychology – to the point where the whole edifice is tottering badly.”
Recently, I’ve posted about research on the therapeutic potential of psychedelic drugs. Keith Humphreys, at The Reality Based Community, makes a case for skepticism:
“Being skeptical about miracle cures is simply playing the odds. As my colleague John Ioannidis pointed out in one of the most-read papers in medical history, most medical research findings are wrong. This is particularly true of small studies, which are usually followed by larger studies that disconfirm the original miracle finding (Fish oil pills are a good example).”
I think this is good advice. My intuition tells me that psychedelics might have value, but I am prepared to change my mind based on the emerging evidence.
This is a significant (excuse the pun) development that is unlikely to be reported in major news outlets. Basic and Applied Social Psychology, an academic journal, has banned authors from using null hypothesis significance testing procedures. This may seem like an obscure topic, but it has enormous implications for what counts as scientific evidence.
As a student, I was aware of the growing criticism of null hypothesis significance testing. Strangely, when I raised these issues with faculty, most of them were unaware of the criticism. Even today, when I try to publish a paper using modern or non-parametric methods, the reviewers will often either reject it out of hand or demand special justification.