Do we need more information on how science is used (or not) in policy? Absolutely. For all the reasons that Drs. Edenhofer and Kowarsch articulate, science alone cannot tell us what, if anything, to do about disruptive climate change (or any other complex social challenge). But the natural sciences do tell us that if we continue business as usual, sea levels will rise, biodiversity will be lost, and people will get hurt. The social sciences further tell us that trillions of dollars will be spent dealing with climate damages—money that could be used in happier and more productive ways—and we are all going to end up a lot poorer. The point of this book is to explain how and why this science is likely to be worthy of our trust. Should we go further and think harder about the basis for trust in scientific assessments, such as the reports of the Intergovernmental Panel on Climate Change, which attempt to collate, judge, adjudicate, and otherwise evaluate scientific evidence for the purpose of informing policy? By all means.
The fact that I did not address this issue in these chapters should not be taken as disparaging its significance. On the contrary, I have written a different book on that topic!10 But since it is bad form to answer a serious question by saying “read my other book,” let me say a few words here.
Assessment for policy occurs in a different context from everyday science: namely, that a problem has been identified and some agency of governance has asked for information to help guide policy choices. Often this involves deadlines. Reports are needed in time for a particular press conference, congressional session, international meeting, or the like. There is a demand for an answer, even if the science required to supply it may be evolving and incomplete. This—along with their complex moral and political landscapes—makes assessments for policy more complex than science left to its own devices.
This is not to say that everyday science does not also encounter real, suspected, or alleged problems and threats: Edward Clarke certainly believed that female higher education was a threat. But there is a major difference: the IPCC was created to inform international climate governance, including the UN Framework Convention on Climate Change, which formally recognized anthropogenic climate change as a threat to sustainable development. (In this sense, a value premise—the value of sustainable development—was embedded into its creation.) This instrument of international governance asked the scientific community, qua community, to give its best assessment of the consensus of scientific opinion—the state of scientific knowledge—relative to this challenge. No one asked Dr. Clarke for his opinion on female higher education; no institutional body was waiting for his answer. Thus, one obvious response to Clarke’s work is to point out that it was a single study by a single author; there was nothing remotely approaching a consensus on the matter at hand: not on the proposed solution and not even on the existence of the alleged problem.
I endorse Edenhofer and Kowarsch’s view that the complexity of climate policy issues requires “serious, integrated assessments to facilitate a learning process about the available policy pathways,” and that any such process must necessarily include both the natural and social sciences, as well as perspectives from law, government, religion, and the humanities. Their position is entirely compatible with the arguments I have presented. I also agree that values cannot be excised from such discussions; value differences are a central reason why we have political and social conflict. But I do maintain that ethical overlap among agonists is often greater than may appear. Edenhofer and Kowarsch implicitly acknowledge as much when they invoke “fundamental rights” and the (implicitly undisputed) “backdrop of Sustainable Development Goals,” as well as the prospects for “rational discussion about value-laden policy.” We don’t all agree on everything, but many of us agree on some things, and some of us agree on many things.
My argument complements theirs, insofar as we are all arguing for the open discussion of the role of values in both science and policy. I am not naively suggesting that if only we are transparent about values, all will be right with the world. I am arguing that if we can make the overlap in our values explicit, this may, in some cases, help us to overcome distrust that is rooted in a perceived clash of values. But this is hard for most natural scientists to do.
Because they have been enculturated in the norms of value neutrality, most scientists feel the need to hide or expunge their values from scientific practice and discussion.11 I argue that this is unnecessary and possibly counterproductive. For example: many scientists, even if they are not themselves religious believers, hold values that overlap with those of believers. This was demonstrated in a series of meetings at the Pontifical Academy of Sciences, which helped to lay the foundation for the Papal Encyclical on Climate Change and Inequality, Laudato Si’.12 The scientists and theologians who attended those meetings did not by any means agree on all things, but we found considerable common ground, which Pope Francis made explicit in his writings.
Diverse people will never agree on all things. My Christian friends believe in the divinity of Jesus Christ and I do not. That is unlikely to change. My point is not that we will reach theological or ethical consensus, but only that, if we share some values, then we can find common ground for a conversation. And that may help us to overcome what otherwise appears to be an insurmountable divide, not only on climate change, but perhaps on other matters as well.
* * *
Professor Jon Krosnick calls attention to a serious issue in contemporary science: the “replication crisis.”13 It is an issue with potential to undermine public trust in science, as well as to refute my argument that the communal processes of vetting scientific claims are likely to lead to reliable results so long as the vetters are diverse and open to self-criticism.
The issue is this: there have been a number of well-publicized examples of papers published in reputable journals—and in some cases heavily cited—whose results could not be replicated. Some papers have been retracted, leading commentators to declare a “retraction crisis.”14 Much of the discussion of the replication crisis, as well as of potential remedies, has focused specifically on psychology and biomedicine.15 However, Professor Krosnick claims that the problem pertains to all contemporary science, because of the incentive structure that rewards rapid publication at the expense of care and diligence. This may be, but Krosnick’s specific examples are all from psychology and biomedicine, and the latter predominantly from clinical trials of drugs. Both are domains in which statistical analysis plays a central role, and both are areas wherein the misuse of statistics—particularly p-hacking—has been demonstrated.
In 2019, a paper published in Nature called for a rethinking of the entire manner in which statistical tests of significance are conventionally used in science. Its authors noted that the failure of a test of an effect to achieve statistical significance at the 0.05 level is not proof that the effect does not exist, yet scientists often claim that it is. Likewise, the finding that the difference between two groups does not achieve statistical significance at the 0.05 level does not prove that there is no difference between those groups, but scientists often make that claim too.16 The authors called for an end to the use of p-values in a dichotomous way and for “the entire concept of statistical significance to be abandoned.” Their paper was supported by the signatures of over eight hundred additional scientists, suggesting that these concerns are widely shared. We might expect that in any field that relies heavily on statistics—particularly where students are taught statistical tools in a “black-box” fashion—these problems might indeed be widespread.17
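The first of these points is easy to see in a simulation. The sketch below is mine, not the Nature authors’; the effect size, sample size, and noise level are all hypothetical choices, but they show how a perfectly real effect can routinely fail to clear the 0.05 bar.

```python
# A minimal sketch (mine, not the Nature authors'): a real effect can
# routinely fail a p < 0.05 test. Effect size, sample size, and noise
# level are hypothetical choices for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, trials = 30, 0.3, 10_000  # hypothetical parameters

misses = 0
for _ in range(trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(effect, 1.0, n)  # the true effect is real
    _, p = stats.ttest_ind(treated, control)
    if p >= 0.05:
        misses += 1

# With these parameters roughly four runs in five come out "not
# significant," yet concluding "no effect" would be wrong every time.
print(f"Fraction of runs failing p < 0.05: {misses / trials:.2f}")
```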
There is evidence of this. In a series of recent papers, my colleagues and I have demonstrated that the misapplication of statistics to historical temperature records—combined with social and political pressures—led many climate scientists to conclude wrongly that global warming had stopped, “paused,” or experienced a “hiatus” in the 2000s.18 Despite our work, the misimpressions persist: One government science agency blog post in 2018 misleadingly posed the question “Why did Earth’s surface temperature stop rising in the past decade?” Later, it was updated to inform the reader: “Since this article was last updated, the slowdown in the rate of average global surface warming that took place from 1998–2012 (relative to the preceding 30 years) has unequivocally ended.”19
This sentence illustrates how scientists tried to save face when it was demonstrated that Earth’s surface temperature did not stop rising in the past decade: they altered their terms, replacing stoppage, pause, and hiatus with “slowdown.” The latter term reflected the fact that the rate of warming appeared to be lower in the 2000s when compared to a baseline representing the period during which anthropogenic climate change has been underway.
This may seem like a trivial replacement—merely semantics, in some views—but it is not. It is well known that Earth’s climate fluctuates, so even in the face of a steady rise in atmospheric greenhouse gas concentrations, the rate at which the planet would warm would vary. No scientist would expect otherwise, but, ceteris paribus, we would expect the overall direction of change to remain positive. This is, in fact, what happened. In other words: nothing abnormal or unexpected occurred. The observed slowdown was neither scientifically surprising nor epistemically problematic. It was not something that required explanation. Yet, many scientists treated it as if it were, leading to a great deal of misleading conversation both in the scientific community and in public arenas.20
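The same point can be made with a toy calculation. The numbers below are hypothetical, not real temperature data: a steady underlying trend plus ordinary year-to-year noise will regularly produce short windows in which the fitted trend is near zero, even though the underlying process never changes.

```python
# A toy calculation (hypothetical numbers, not real temperature data):
# a steady warming trend plus year-to-year noise routinely yields short
# windows whose fitted trend is near zero, i.e., an apparent "pause"
# that needs no special explanation.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1970, 2020)
trend = 0.018  # assumed underlying warming rate, deg C per year
temps = trend * (years - years[0]) + rng.normal(0.0, 0.1, years.size)

window = 10  # short windows invite spurious "pauses"
slopes = [np.polyfit(years[i:i + window], temps[i:i + window], 1)[0]
          for i in range(years.size - window + 1)]

print(f"Long-run fitted trend: {np.polyfit(years, temps, 1)[0]:+.4f} C/yr")
print(f"Lowest {window}-year trend: {min(slopes):+.4f} C/yr")  # near zero or below
```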
It seems reasonable therefore to conclude that the misuse of statistics is not restricted to psychology and biomedicine. But is there a broader problem with science, writ large? Here the evidence becomes more ambiguous, and I find it surprising that Professor Krosnick—who stresses the importance of rigorous empirical research—makes broad claims on limited evidence and lumps together phenomena that may be distinct.
In his opening, he offers a story of outright fraud—a professor who had fabricated data in over one hundred publications in leading journals. No doubt this is bad stuff, but fraud is a feature of all human activity. Is it more common in science than in finance? Or real estate? Or mineral prospecting? The information offered here does not enable us to judge.21 What it does enable us to do is to ask why this fraud was not detected sooner, reminding us that science (like every human activity) demands oversight, and to consider whether science needs better oversight mechanisms.
Then Krosnick offers us something completely different: the story of a paper that claimed to demonstrate the reality of ESP and “set off a firestorm, because the results seemed implausible and could not be reproduced.” This is the opposite of fraud: it illustrates science working as it should. A paper was published that made a strong, surprising, and implausible claim. Immediately it received tough critical scrutiny, and the psychology community rejected it. One might query why this paper was published in the first place, but if science is to be open to diverse ideas (as I have argued it must be), then it is inevitable that incorrect, stupid, and even absurd items will sometimes make their way into print. By itself, that is not an indictment of science. On the contrary, it is evidence that the scientific community has remained open, even to ideas that some of us might think should be closed down.
Then we have the example of one of the most well-known studies in the recent history of psychology—the famous (or infamous) Stanford prison experiment. Here, we are told that the BBC—which is not a scientific organization, so one has immediately to wonder about motivations and possible bias—tried and failed to replicate that study.22 Now we have study 1 versus study 2. What are we to think of that? Four options present themselves:
1. Study 1 is correct, and study 2 failed to replicate it because of flaws in the latter.
2. Study 2 is correct, and study 1 should be considered refuted.
3. Both studies are incorrect, albeit in different ways.
4. Both studies are correct, but the conditions under which they were performed were different, and therefore they provide different information about how conditions affect human behavior.
Without additional information, it is impossible to determine which of these four options is the right one.23
Most of the studies that Professor Krosnick offers as evidence of trouble in science are single studies that were later shown to be faulty. But the thrust of my argument is to stress that scientific knowledge is never created by a single study, no matter how famous, important, or well-designed. What leads to reliable scientific knowledge is the process by which claims are vetted. Crucially, that vetting must involve diverse perspectives and the presentation of evidence collected in diverse ways. This means that a single paper cannot be the basis for reliable scientific knowledge. In hindsight we might conclude that the Stanford prison experiment was given far too much weight, considering that it was a single study.
Albert Einstein’s celebrated 1905 paper on special relativity is a case in point: many people know only of that paper and think that Einstein, on his own, overturned Newtonian mechanics. This is an incorrect view, made possible by ignorance of history. Many of Einstein’s contemporaries helped to lay the groundwork that made the 1905 paper both possible and plausible (most famously, Hendrik Lorentz), and much subsequent work went into consolidating the epistemic gain of the 1905 paper. The same was true of general relativity: various colleagues, including the mathematician Emmy Noether, helped Einstein to resolve difficulties in the theory, and it was the Englishman Sir Arthur Eddington who undertook the observational confirmation that convinced the world that the theory was true.24
Professor Krosnick’s commentary thus reinforces my argument about consensus: We should be skeptical of any single paper in science. Scientific discovery is a process, not an event. In that process, many provisional claims—perhaps even most provisional claims—will be shown to be incomplete and sometimes erroneous. As several past presidents of the US National Academy of Sciences recently argued, refutation and retraction, if done in a timely manner, may be viewed as science correcting itself as it should.25 Conventionally, we have called this process progress.
Admittedly, this does put us in a difficult situation when we have to make decisions on the basis of scientific knowledge that may in the future be challenged. What are we to do at any given moment, when we cannot say which of our current claims will be sustained and which will be rejected? This is one of the central questions that I have raised. Because we cannot know which of our current claims will be sustained, the best we can do is to consider the weight of scientific evidence, the fulcrum of scientific opinion, and the trajectory of scientific knowledge. This is why consensus matters: If scientists are still debating a matter, then we may well be wise to “wait and see,” if conditions permit.26 If the available empirical evidence is thin, we may want to do more research.
But the uncertainty of future scientific knowledge should not be used as an excuse for delay. As the epidemiologist Sir Austin Bradford Hill famously argued, “All scientific work is incomplete—whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time.”27 At any given moment, it makes sense to make decisions on the information we have, and to be prepared to alter our plans if future evidence warrants.28
Returning to psychology, I was surprised that Professor Krosnick did not offer what I consider to be the most egregious recent example of bad science in that field: the “critical positivity ratio.” This was the claim that a very specific number—2.9013—could be used in a variety of ways to distinguish psychologically healthy individuals from unhealthy ones.29 After the paper was published in 2005, it was cited over one thousand times before being debunked in 2013, when a graduate student, Nick Brown, collaborated with physicist Alan Sokal and psychologist Harris Friedman on a reanalysis of the data.30 In hindsight it is bizarre that this paper—with its implausibly broad and ambitious claims and its absurdly precise “ratio,” quoted to five significant figures—would have been broadly accepted. Its theoretical reliance on nonlinear dynamics might also have suggested that the paper was little more than trendy hype.31 But the crucial point here is this: it was a single paper. It may have been heavily cited, but it did not represent the consensus of professional experts.
Perhaps Professor Krosnick did not include it because it does seem to suggest that something is rotten in the state of psychology. But that is not the conclusion Krosnick wants and, perhaps for this reason, he paints with a broad brush and speaks in general terms. I think this is unfortunate, for it does not help us to delineate the extent and character of the problem. He gives us a single example of fraud in political science and uses it to implicate the entire field. He speaks of the “physical sciences” when he is referring to biomedicine. He suggests that problems in engineering are “not uncommon,” but then offers only hearsay and anecdotes about engineering, and no evidence at all from physics, physical chemistry, geology, geophysics, meteorology, or climate science. He acknowledges that his observations about the causes of the alleged crisis are “mostly speculations.”
Then there is the claim that retractions are “skyrocketing.” In a world of skyrocketing numbers of publications, this is not a meaningful claim. The relevant metric here is the retraction rate, so let’s look at that. Steen et al. (2013) conclude that the retraction rate has increased since 1995, but that the overall retraction rate in the period 1973–2011 (based on an analysis of 21.2 million articles published in that interval) was 1 in 23,799 articles, or 0.004%.32 Fang et al. (2012) conclude that the percentage of scientific articles retracted because of fraud has increased ∼10-fold since 1975, but this still leaves the overall retraction rate at < 0.01%. It is difficult to see how this constitutes a general crisis in science.33
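Spelling out the arithmetic behind those figures, using only the numbers reported above:

\[
\text{retraction rate} \;=\; \frac{1}{23{,}799} \;\approx\; 4.2 \times 10^{-5} \;\approx\; 0.004\%
\]

Even a tenfold increase on a base rate of this order of magnitude still leaves retractions below one article in ten thousand, which is the sense in which Fang et al.’s figure of < 0.01% should be read.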