Just how much of science is actually cargo cult science? Recall that cargo cult science is an activity that looks like science but does not actually work. So how do we tell whether science “works”? I would propose that it works when an experiment and its results can be independently replicated. So we need to ask how much of science has indeed been independently replicated.
We don’t really know how much of science is actually cargo cult science, but I suspect it is more than most people think. Consider the article by Ed Yong, “Replication studies: Bad copy. In the wake of high-profile controversies, psychologists are facing up to problems with replication.” He begins by noting an experiment, published in the peer-reviewed literature, that showed evidence of psychic abilities. Because the findings were so controversial, three other labs tried to independently replicate them (less controversial studies apparently do not receive the same scrutiny). They failed to do so. But in the process they stumbled upon a bigger problem in science: “they faced serious obstacles to publishing their results.” Yong explains the situation as follows:
Positive results in psychology can behave like rumours: easy to release but hard to dispel. They dominate most journals, which strive to present new, exciting research. Meanwhile, attempts to replicate those studies, especially when the findings are negative, go unpublished, languishing in personal file drawers or circulating in conversations around the water cooler. “There are some experiments that everyone knows don’t replicate, but this knowledge doesn’t get into the literature,” says Wagenmakers. The publication barrier can be chilling, he adds. “I’ve seen students spending their entire PhD period trying to replicate a phenomenon, failing, and quitting academia because they had nothing to show for their time.” These problems occur throughout the sciences, but psychology has a number of deeply entrenched cultural norms that exacerbate them. It has become common practice, for example, to tweak experimental designs in ways that practically guarantee positive results. And once positive results are published, few researchers replicate the experiment exactly, instead carrying out ‘conceptual replications’ that test similar hypotheses using different methods. This practice, say critics, builds a house of cards on potentially shaky foundations.
It is interesting to note that in 2012, “once positive results are published, few researchers replicate the experiment exactly, instead carrying out ‘conceptual replications’ that test similar hypotheses using different methods.” Recall that Feynman outlined the same situation back in the 1940s:
She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happens.
So a lot of science is not being exactly replicated, and apparently has not been since at least the 1940s. That doesn’t mean it is necessarily cargo cult science, but it does mean a lot of it could be. Yet it gets worse. Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn recently published a paper in Psychological Science entitled “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” They show that it is simply too easy to generate positive results for just about any topic. They write:
Perhaps the most costly error is a false positive, the incorrect rejection of a null hypothesis. First, once they appear in the literature, false positives are particularly persistent. Because null results have many possible causes, failures to replicate previous findings are never conclusive. Furthermore, because it is uncommon for prestigious journals to publish null findings or exact replications, researchers have little incentive to even attempt them. Second, false positives waste resources: They inspire investment in fruitless research programs and can lead to ineffective policy changes. Finally, a field known for publishing false positives risks losing its credibility.
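The point that “failures to replicate previous findings are never conclusive” is partly a matter of statistical power, and a quick simulation illustrates it (the effect size, sample size, and cutoff below are illustrative assumptions, not figures from the paper): even when an effect is perfectly real, small replication studies will often miss it, so any single null result can be explained away.

```python
# Sketch (hypothetical numbers): even a real effect often fails to
# "replicate" when samples are small, so one null result is never
# conclusive -- it genuinely does have "many possible causes."
import random
import statistics

random.seed(1)

def one_study(effect=0.5, n=20):
    """Simulate a two-group study; True if it reaches p < .05."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(effect, 1.0) for _ in range(n)]
    # Welch-style t statistic; |t| > 2.02 approximates p < .05 here
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    t = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(t) > 2.02

trials = 2000
power = sum(one_study() for _ in range(trials)) / trials
print(f"Chance a real, medium-sized effect reaches p < .05 at n = 20/group: {power:.0%}")
```

Under these assumptions the real effect is detected only about a third of the time, so most small replication attempts of a true finding would “fail,” which is exactly why a few failed replications never settle anything.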
The key point, IMO, is how a failure to replicate previous findings can be explained away with many possible causes. This means there is a built-in intellectual inertia that favors the existence of cargo cult science within science. In other words, if lab X cannot replicate the work of lab Y, lab X’s failure is likely to be attributed to technicalities and to unseen, but assumed to be unimportant, variables. So the failure to replicate is simply filed away. False positives are thus immunized to a certain extent. But it’s even worse than that:
In this article, we show that despite the nominal endorsement of a maximum false-positive rate of 5% (i.e., p ≤ .05), current standards for disclosing details of data collection and analyses make false positives vastly more likely. In fact, it is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis.
So it’s easy to statistically demonstrate a positive result when no such result truly exists. In fact, consider the paper’s abstract:
In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.
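The kind of “flexibility” the authors describe can be sketched with a short simulation (a hypothetical illustration, not the paper’s own code or numbers). Here there is no real effect at all, yet a researcher who measures several outcome variables and reports whichever one clears p < .05 gets a “significant” result far more often than the nominal 5%:

```python
# Sketch of "researcher degrees of freedom": with no true effect,
# testing four outcome measures and reporting any hit inflates the
# false-positive rate well past the nominal 5% cutoff.
import random
import statistics

random.seed(2)

def significant(n=20):
    """One null comparison (no true effect); True if ~p < .05."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(0.0, 1.0) for _ in range(n)]
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    t = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(t) > 2.02  # approximates the p < .05 threshold

trials = 2000
# Honest analysis: one pre-specified outcome measure
honest = sum(significant() for _ in range(trials)) / trials
# "Flexible" analysis: four outcome measures, report whichever works
flexible = sum(any(significant() for _ in range(4)) for _ in range(trials)) / trials
print(f"One pre-specified measure: {honest:.0%} false positives")
print(f"Best of four measures:     {flexible:.0%} false positives")
```

With four measures the false-positive rate roughly quadruples, which is why the paper’s proposed fix centers on disclosure: a reviewer cannot judge a p-value without knowing how many analyses were tried.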
So, if it is unacceptably easy “to accumulate (and report) statistically significant evidence for a false hypothesis,” and a large chunk of science is not independently replicated and confirmed for a variety of reasons, it stands to reason that a large chunk of mainstream science is actually cargo cult science. Ah, but maybe this is a problem only for the social sciences, which rely heavily on statistics to show their positive results. Surely this is not a problem for the hard sciences, right? Well, consider this recent report:
A former researcher at Amgen Inc has found that many basic studies on cancer — a high proportion of them from university labs — are unreliable, with grim consequences for producing new medicines in the future. During a decade as head of global cancer research at Amgen, C. Glenn Begley identified 53 “landmark” publications — papers in top journals, from reputable labs — for his team to reproduce. Begley sought to double-check the findings before trying to build on them for drug development. Result: 47 of the 53 could not be replicated. He described his findings in a commentary piece published on Wednesday in the journal Nature.
Here’s some more:
Other scientists worry that something less innocuous explains the lack of reproducibility. Part way through his project to reproduce promising studies, Begley met for breakfast at a cancer conference with the lead scientist of one of the problematic studies. “We went through the paper line by line, figure by figure,” said Begley. “I explained that we re-did their experiment 50 times and never got their result. He said they’d done it six times and got this result once, but put it in the paper because it made the best story. It’s very disillusioning.”
“The surest ticket to getting a grant or job is getting published in a high-profile journal,” said Fang. “This is an unhealthy belief that can lead a scientist to engage in sensationalism and sometimes even dishonest behavior.” The academic reward system discourages efforts to ensure a finding was not a fluke. Nor is there an incentive to verify someone else’s discovery. As recently as the late 1990s, most potential cancer-drug targets were backed by 100 to 200 publications. Now each may have fewer than half a dozen. “If you can write it up and get it published you’re not even thinking of reproducibility,” said Ken Kaitin, director of the Tufts Center for the Study of Drug Development. “You make an observation and move on. There is no incentive to find out it was wrong.”
So it’s not just a problem for the social sciences, now is it? What are we to make of all this? Does it mean we can toss out all of science? Of course not, as many scientific studies have been replicated in the lab or in the form of the generation of new technologies.
What it means is that anytime we are presented with new research findings, we should remain skeptical until the results have been independently replicated. This is especially true when someone who seems to have a political or metaphysical agenda is pushing the results from this study or that study. If you encounter such a person, the proper response is: “Has that study been replicated and, if so, can you provide the reference?” If they cannot show the study has been independently replicated by a different lab, then it is perfectly fair to remain skeptical and raise the very real possibility that someone is trying to advance their agenda by citing cargo cult science. They may accuse you of being “anti-science” for rejecting their favorite study, but you should then point out that anyone who is bothered or upset by the need to replicate scientific findings is the one who is truly “anti-science.”