Unreliable research: Trouble at the lab

“I SEE a train wreck looming,” warned Daniel Kahneman, an eminent psychologist, in an open letter last year. The premonition concerned research on a phenomenon known as “priming”. Priming studies suggest that decisions can be influenced by apparently irrelevant actions or events that took place just before the cusp of choice. They have been a boom area in psychology over the past decade, and some of their insights have already made it out of the lab and into the toolkits of policy wonks keen on “nudging” the populace.

Dr Kahneman and a growing number of his colleagues fear that a lot of this priming research is poorly founded. Over the past few years various researchers have made systematic attempts to replicate some of the more widely cited priming experiments. Many of these replications have failed. In April, for instance, a paper in PLoS ONE, a journal, reported that nine separate experiments had not managed to reproduce the results of a famous study from 1998 purporting to show that thinking about a professor before taking an intelligence test leads to a higher score than imagining a football hooligan.

The idea that the same experiments always get the same results, no matter who performs them, is one of the cornerstones of science’s claim to objective truth. If a systematic campaign of replication does not lead to the same results, then either the original research is flawed (as the replicators claim) or the replications are (as many of the original researchers on priming contend). Either way, something is awry.

To err is all too common

It is tempting to see the priming fracas as an isolated case in an area of science—psychology—easily marginalised as soft and wayward. But irreproducibility is much more widespread. A few years ago scientists at Amgen, an American drug company, tried to replicate 53 studies that they considered landmarks in the basic science of cancer, often co-operating closely with the original researchers to ensure that their experimental technique matched the one used first time round. According to a piece they wrote last year in Nature, a leading scientific journal, they were able to reproduce the original results in just six. Months earlier Florian Prinz and his colleagues at Bayer HealthCare, a German pharmaceutical giant, reported in Nature Reviews Drug Discovery, a sister journal, that they had successfully reproduced the published results in just a quarter of 67 seminal studies.

Written by: The Economist
Continue to the source article at economist.com

7 COMMENTS

  1. While the issues raised here are both important and interesting, I suspect both those truths are limited in their extent, and I have a number of problems with this attempted critique of the findings of peer-reviewed science.

    Firstly, the estimates of how serious the problem is are highly sensitive to unempirical guesses of certain variables, such as the percentage of hypotheses which are true. GIGO. How does anyone know it’s 10%? If anyone’s science is bad, it’s that of Ioannidis.
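
    As a rough illustration of that sensitivity, here is a minimal sketch of the standard positive-predictive-value calculation an Ioannidis-style estimate rests on; the 80% power and 5% significance level are assumptions chosen for illustration, not figures from the article:

    ```python
    # Minimal sketch (illustrative numbers only) of the positive-predictive-value
    # calculation behind Ioannidis-style estimates.
    #   prior - assumed fraction of tested hypotheses that are actually true
    #   power - probability that a real effect yields a positive result
    #   alpha - significance threshold (false-positive rate under the null)

    def ppv(prior, power=0.8, alpha=0.05):
        """Share of positive results that reflect true effects."""
        true_pos = prior * power
        false_pos = (1 - prior) * alpha
        return true_pos / (true_pos + false_pos)

    for prior in (0.01, 0.10, 0.50):
        print(f"assumed prior = {prior:.2f}  ->  PPV = {ppv(prior):.2f}")
    # assumed prior = 0.01  ->  PPV = 0.14
    # assumed prior = 0.10  ->  PPV = 0.64
    # assumed prior = 0.50  ->  PPV = 0.94
    ```

    The answer swings from “most positives are false” to “most positives are true” purely on the guessed prior, which is the GIGO point.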

    Secondly, while certain facts about research presented in this piece are what we would expect if the problem were serious, they often admit plausible alternative explanations. For example, perhaps positive results occur far more frequently than they would if we ran experiments at random (e.g. more than 5% of the time at a 5% significance level) partly because scientists have a good enough grasp of how the world works to know that some null hypotheses aren’t worth testing, as the result is unlikely to be positive. Notice that, because what matters to the calculation I critiqued above is the percentage of alternative hypotheses tested in published research that are true, this selective-testing explanation seriously undermines the few-conclusions-are-right finding. We can even venture an explanation that reflects badly on scientists (that many negative results aren’t published, which is highly plausible) yet doesn’t reduce our confidence in any peer-reviewed conclusions.
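
    The same toy arithmetic shows how informed hypothesis selection alone would push positive results above the 5% baseline; again, the 80% power figure is an assumption for illustration:

    ```python
    # Illustrative sketch: expected share of positive results when hypotheses are
    # pre-selected by scientists who know the field, rather than chosen at random.
    #   P(positive) = prior * power + (1 - prior) * alpha

    def positive_rate(prior, power=0.8, alpha=0.05):
        return prior * power + (1 - prior) * alpha

    for prior in (0.0, 0.2, 0.5):
        print(f"assumed prior = {prior:.1f}  ->  positives in {positive_rate(prior):.1%} of studies")
    # assumed prior = 0.0  ->  positives in 5.0% of studies
    # assumed prior = 0.2  ->  positives in 20.0% of studies
    # assumed prior = 0.5  ->  positives in 42.5% of studies
    ```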

    Thirdly, insofar as science adopts any incorrect consensus, it does so only if the same false finding is reproduced. What are the odds that every test of the effect of aspirin on headaches was wrong? Even if only three such tests have ever happened, those odds are quite remote. If you exponentially reduce the probability of false positives in Ioannidis’s argument to correspond to the case where findings are successfully reproduced, its estimate of what percentage of positives are false becomes minuscule. The main place the article’s objections matter is when unreproduced findings are reported, which is mostly in the “Scientists have found” nonsense stories in journalism. Oh, but not in The Economist, I suppose?
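
    To make the replication point concrete, here is a minimal sketch assuming, purely for illustration, 80% power, a 5% false-positive rate, a 10% prior and genuinely independent replications:

    ```python
    # Illustrative sketch: if a finding must survive k independent positive tests,
    # the chance that it is a pure false positive falls roughly like alpha**k
    # (this assumes the replications really are independent of one another).

    def ppv_after_k_positives(prior, k, power=0.8, alpha=0.05):
        true_pos = prior * power**k          # a real effect detected k times
        false_pos = (1 - prior) * alpha**k   # a null effect "detected" k times by chance
        return true_pos / (true_pos + false_pos)

    for k in (1, 2, 3):
        print(f"k = {k} positive tests  ->  PPV = {ppv_after_k_positives(0.10, k):.3f}")
    # k = 1 positive tests  ->  PPV = 0.640
    # k = 2 positive tests  ->  PPV = 0.966
    # k = 3 positive tests  ->  PPV = 0.998
    ```

    Even with only three aspirin-style confirmations, the residual chance of a wholly false consensus is tiny under these assumptions.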

    Fourth, I’m unconvinced of the reality of some of these alleged problems. For instance, is there any evidence that scientists really say, “Oh, you couldn’t get the same answer? Well, then that doesn’t count!” Even if they did, they would also have to say, “Wait, no-one else has found this? Then it hasn’t proven reproducible. What a shame.” So the idea that this effect can mislead the consensus is implausible.

    Fifth, if scientific findings aren’t generally right, how have they so effectively revolutionised our world, even just in the past few decades?

    Finally, what’s the alternative to peer review as we now know it? Does this author offer one? This is a work of journalism in a publication that names itself for economics, and it has an anonymous author. These are all reasons it’s less likely to be right than the publications it critiques.

    • In reply to #1 by Jos Gibbons:

      While the issues raised here are both important and interesting,

      Can you summarize in a sentence or two what these important and interesting issues are? It’s a serious question, because I read the article and I didn’t see any issues that I would describe that way. What I read was an article that said “science is hard, people make mistakes, here are some examples”. In fact the whole article seemed a bit schizophrenic to me; the tag line of the article is:

      Scientists like to think of science as self-correcting. To an alarming degree, it is not

      But the article was nothing but examples of science self-correcting.

      • In reply to #2 by Red Dog:

        Can you summarize in a sentence or two what these important and interesting issues are?

        To be honest I was trying to throw them a bone there, but I think it was still somewhat true, so I’ll try to explain.

        If only we knew the findings of all the studies that were being hidden rather than submitted, we’d know how far, if at all, positive results are overabundant. Since we don’t know that, and therefore don’t know how much we can trust these results, we should (a) train journalists to be responsible enough not to reference unreplicated conclusions, (b) make sure a large enough portion of research is dedicated to repeating as-yet one-off studies to see what happens (maybe it already is), (c) try to spread more knowledge of what counts as an appropriate statistical method on both sides of the review process, and (d) change how we incentivise researchers so they don’t feel they should rush their work. (I’m least clear on how we’d achieve (a), because I have a lower opinion of journalists than I do of politicians, especially “science journalists”, whom I catch being wrong most often of all. Next hardest for me would be (d), I think; I have no policies in mind.)

        But, having said all that, the most important issue I found in this article, ironically (since the author proved its existence by accident), is that people who want to criticise how we currently do science need to ask themselves whether a single one of the problems they’re considering ever results in a consensus favouring erroneous conclusions. Given that consensus demands reproducibility, I’m going with no.

    • In reply to #1 by Jos Gibbons:

      Fourth, I’m unconvinced of the reality of some of these alleged problems. For instance, is there any evidence that scientists really say, “Oh, you couldn’t get the same answer? Well, then that doesn’t count!” Even if they did, they would also have to say, “Wait, no-one else has found this? Then it hasn’t proven reproducible. What a shame.” So the idea that this effect can mislead the consensus is implausible.

      I think the key issue is that this is talking about perceptions of psychology, not physics! Psychology has had whole schools of thought that were somewhat weird in the past. Some of them have since been corrected.

      • In reply to #3 by Alan4discussion:

        In reply to #1 by Jos Gibbons:

        Fourth, I’m unconvinced of the reality of some of these alleged problems. For instance, is there any evidence that scientists really say, “Oh, you couldn’t get the same answer? Well, then that doesn’t count!” Even if they did, they would also have to say, “Wait, no-on…

        “Psychology has had whole schools of thought that were somewhat weird in the past. Some of them have since been corrected.”

        Not too sure of that. You still have subdisciplines in psychology, such as social psychology and early childhood development, among others, that are quite weird.

        • In reply to #5 by Neodarwinian:

          You think social psychology is weird but not psychoanalysis? I’m probably biased, but social psychology (at least in the limited understanding I have of it) makes a lot more sense than psychoanalysis, even if at times it seems similarly unscientific.

          This is a problem of the social sciences in general, of which psychology (which isn’t a social science in its entirety) isn’t even the worst offender. It’s hard to “prove” anything, and most findings can and will be interpreted differently by many people, even by those in the same field. To those in “harder” sciences, this is seen as a serious fault with the social sciences. To the social scientists themselves… well, they are still arguing whether it’s a good or bad thing.

  2. “It is tempting to see the priming fracas as an isolated case in an area of science—psychology—easily marginalised as soft and wayward.”

    OK, I am tempted!

    “But some new journals—PLoS One, published by the not-for-profit Public Library of Science, was the pioneer—make a point of being less picky. These ‘minimal-threshold’ journals…”

    This could be a problem in actual science, as corner-cutting researchers might go there to get published instead of being repeatedly shot down by journals such as Nature.

    Still, this article shows that science is ever and always self-correcting, just by being published.
