# The “Gateway Belief” illusion: reanalyzing the results of a scientific-consensus messaging study

### Abstract:

This paper analyzes data collected but not reported in the study by van der Linden, Leiserowitz, Feinberg, and Maibach [van der Linden et al., 2015] (hereinafter “VLFM”). VLFM report finding that a “scientific consensus” message “increased” experiment subjects' “key beliefs about climate change” and “in turn” their “support for public action” to mitigate it. However, VLFM fail to report that message-exposed subjects' “beliefs about climate change” and “support for public action” did not vary significantly, in statistical or practical terms, from those of a message-unexposed control group. The paper also shows how this absence of an experimental effect was obscured by a misspecified structural equation model.

13 December 2017

### 1 Introduction: filling in the analysis gap

Van der Linden, Leiserowitz, Feinberg, and Maibach [2015] report finding evidence that “scientific consensus messaging” can be expected to increase public support for reducing human-caused climate change. According to VLFM, advising study subjects that “97% of climate scientists have concluded that human-caused climate change is happening” induces subjects to revise upward their own estimate of the proportion of scientists who subscribe to this position. The increase in subjects’ estimates, they conclude, is “causally associated with an increase in the belief that climate change is happening, [is] human-caused and [is] a worrisome threat”; changes in these beliefs and attitudes “in turn …increase[] support for public action.” On this basis, VLFM characterize “perceived scientific agreement as a ‘Gateway Belief’ that either supports or undermines other key beliefs about climate change” and climate-change mitigation. They describe their findings “as the strongest evidence to date” that “consensus messaging” — social-marketing campaigns that communicate scientific consensus on human-caused global warming — “is consequential” [van der Linden et al., 2015, p. 6].

This paper examines the data collected by VLFM and uploaded to the PloS One site.1 The paper’s principal aim is to furnish information on experimental results not reported or acknowledged in the VLFM paper.

The VLFM study design involved comparing the before-and-after opinions of subjects exposed to a consensus message with those assigned to a control group, whose members read “distractor” news stories unrelated to climate change. As VLFM accurately report, relative to members of the control group, subjects told that “97% of climate scientists have concluded that human-caused climate change is happening” increased their estimates of “the percentage of climate scientists [who] have concluded that human-caused climate change is happening” [van der Linden et al., 2015, pp. 1–2].

But VLFM said nothing about whether subjects exposed to such a message “increased” their “key beliefs about climate change” or became more supportive of “public action” relative to the control-group subjects. On “belief in climate change,” both groups revised their assessments upwards by equal amounts. On support for mitigation, the revised positions of the treatment and control group members differed by a trivial and statistically nonsignificant margin. These unreported results are inconsistent with VLFM’s hypotheses and announced findings on the impact of consensus messaging.

The nonsignificance of the study findings was obscured by VLFM’s analytical methods. VLFM report means for the study’s “before-and-after” outcome measures (hereinafter “differentials”), which appear to show very small yet statistically significant changes in a host of beliefs and attitudes. The reported means, however, aggregated the responses of consensus-message and control-group subjects. Had the two groups’ responses been separately reported, it would have been clear that on key mediator and outcome variables the differentials for consensus-message subjects did not differ significantly from the differentials of control-condition subjects, who viewed only a “distractor” news story.

The differences in the responses of the two groups were also elided by VLFM’s use of a misspecified structural equation model. Like VLFM’s summary of the study response means, their structural equation model ignores the impact of the experimental treatment on every variable other than the subjects’ estimates of the percentage of scientists who subscribe to the consensus position on human-caused climate change. The statistically significant model parameters that VLFM characterize as evidence of a “causal relationship” between the consensus-message treatment and “key beliefs on climate change” [van der Linden et al., 2015, pp. 5–6] are merely simple correlations unrelated to the experimental treatment [Bolsen and Druckman, 2017].2

It is important to emphasize that the failure of VLFM’s data to support their announced conclusions does not mean that it is inappropriate to promote public awareness of scientific consensus on human-caused climate change. On the contrary, there is ample evidence that members of the public view scientific consensus as a valid guide for policymaking on climate change just as they do in other domains of policymaking [Lewandowsky, Gignac and Vaughan, 2013].

There is a serious scholarly debate, however, about whether social marketing campaigns that feature the existence of scientific consensus (a staple of climate advocacy for over a decade) can be expected to promote constructive engagement with climate science [National Academies of Sciences, Engineering, and Medicine, 2017, pp. 28–29]. In an important early paper, Lewandowsky, Gignac and Vaughan [2013] reported that members of an Australian convenience sample were more likely to accept that human activity is causing climate change after being exposed to a consensus message. But when Cook and Lewandowsky [2016] conducted a similar cross-cultural study they found that “consensus information” triggered a “backfire effect” among U.S. study subjects subscribing to a “free market” worldview.

In another valuable study, Deryugina and Shurchkov [2016] found that immediately after being exposed to consensus messaging, U.S. study subjects revised upward their estimate of the probability that human activity was causing climate change. The subjects did not evince increased support, however, for climate-change mitigation policies. Moreover, in a follow-up survey six months later the subjects’ estimates of the probability that human activity is causing climate change no longer differed significantly from their pre-message estimates.

Even more recently, Bolsen and Druckman [2017] found that consensus messaging had no impact on support for climate-mitigation policies. Indeed, consistent with Cook and Lewandowsky [2016], Bolsen and Druckman report finding a “backfire” effect among sophisticated Republicans.

Finally, Dixon, Hmielowski and Ma [2017] attempted a partial replication of VLFM. They report that “consensus messages were ineffective at influencing beliefs about climate change” [p. 526].

The point of this paper, however, is not to offer an assessment of the weight of the evidence on consensus messaging. It is only to clarify, for the benefit of those studying this issue and of those making practical use of this research, the actual contribution that one particular study — VLFM’s — makes to this body of work.

This paper begins with an overview of the VLFM experiment. It then presents analyses of the VLFM data.

### 2 Description of VLFM study

The announced aim of the VLFM paper was to remedy a “major short-coming” in existing studies on the “role of perceived scientific agreement in generating public support for climate change policies”: namely, their reliance on “survey data” that was purely “correlational in nature” [p. 2]. For this purpose, VLFM proposed to test experimentally the “causal relationship between public perceptions of the scientific consensus on climate change and support for public action” [p. 2]. The paper describes as “confirmed” [p. 6] the authors’ hypothesis that an “experimentally induced change in the level of perceived consensus” would “lead to a change in respondents’ support for societal action on climate change” [p. 2].

To evaluate VLFM’s interpretation of their study results, it is necessary to start with a summary of how the VLFM experiment was structured and what it actually measured. Neither appears in the VLFM paper itself.3

#### 2.1 Experimental conditions and outcome measures

The VLFM sample consisted of 1104 individuals, who were randomly assigned to one of 11 conditions. In ten of them, subjects were exposed to one or another form of message indicating that “97% of scientists have concluded that human-caused climate change is happening” [van der Linden et al., 2014, Supp. Information].4 In the eleventh, subjects were assigned to a “control” that was not exposed to this message.

All subjects responded twice to an identical series of questions relating to climate change. The first time was in the midst of a survey canvassing opinion on a variety of issues. Individuals exposed to a consensus message then answered those same questions again after they had been shown the particular version of the message selected for their experimental condition. Individuals in the control group answered the questions for a second time after viewing “distractor” material, including news stories “on Angelina Jolie’s double mastectomy” and an “article about the upcoming 2014 Star Wars Animated Series” [van der Linden et al., 2014, p. 257].

One of the items (“Perceived Scientific Consensus”) related to the extent of scientific consensus. That item instructed subjects to indicate “to the best of your knowledge, what percentage of climate scientists have concluded that human-caused climate change is happening?” on a scale from 0% to 100% [van der Linden et al., 2014, Supp. Information].5

Other items related to climate-change beliefs and attitudes. One (“Belief in Climate Change”) asked subjects,

“How strongly do you believe that climate change is or is not happening?” Response options were given on a continuum, ranging from 0 (I strongly believe that climate change is not happening), 50 (I am unsure whether or not climate change is happening) to 100 (I strongly believe climate change IS happening).

Another item (“Worry About Climate Change”) asked,

“On a scale from 0 to 100, how worried are you about climate change?” Response options were given on a continuum, ranging from 0 (“I am not at all worried”), 50 (“neutral”) to 100 (“I am very worried”).

VLFM state that they assessed subjects’ “belief …climate change is …human-caused” [p. 6]. No item featured in the reported study results, however, measures such a belief. Instead, the item that VLFM characterize as measuring “belief in human-caused climate change” states:

Assuming climate change IS happening: how much of it do you believe is caused by human activities, natural changes in the environment, or some combination of both?

“Response options,” according to the on-line supplemental information furnished in van der Linden et al. [2014],

were given on a continuum, ranging from 0 (I believe that climate change is caused entirely by natural changes in the environment), 50 (I believe that climate change is caused equally by natural changes and human activities) to 100 (I believe that climate change is caused entirely by human activities).

According to public opinion surveys, between 20% and 25% of Americans do not believe “there [is] solid evidence that the average temperature on earth has been getting warmer” [Funk and Rainie, 2015]. Because they do not believe climate change is occurring at all, such individuals necessarily do not believe human beings are causing it. Had VLFM included in their before-and-after array an item that simply asked subjects whether they believed human beings are causing climate change [e.g., Cook and Lewandowsky, 2016; Deryugina and Shurchkov, 2016], it would have been clear what fraction of these subjects — the ones who reject outright that global warming is occurring — had been persuaded to revise that view and adopt the scientific-consensus position. But because VLFM instructed their subjects to “[a]ssum[e] climate change IS happening,” it is impossible to know whether any of those subjects genuinely changed their belief that climate change is not happening — no matter how they answered the study’s “human causation” item [hereinafter “Counterfactual Causation”].

Finally, another item was intended to measure “support for public action on climate change” [p. 5]. Not specifically referring to any particular set of government policies [cf. Deryugina and Shurchkov, 2016], this item (“Support for Action”) asked:

“Do you think people should be doing more or less to reduce climate change?” Response options were given on a continuum, ranging from 0 (Much less), 50 (Same amount) to 100 (Much more).

Support for Action was the key outcome variable in the VLFM study. Exposure to a consensus message, they hypothesized, would “lead to a change in respondents’ support for societal action on climate change” as measured by this item [p. 2, italics added]. This effect, VLFM posited, would be “fully mediated by key beliefs about climate change” [p. 2] — a set of relations they labeled the Gateway Belief Model [Figure 1]. Corroboration of these hypotheses, they reasoned, would supply proof of the “causal relationship between public perceptions of the scientific consensus on climate change and support for public action” that studies “based on cross-sectional survey data” could not convincingly establish [p. 2].

#### 2.2 Reported results

As indicated, the VLFM experiment used a between-subjects design. Whether the results supported VLFM’s hypotheses, then, depended on whether subjects treated with a consensus message formed beliefs and attitudes more supportive of action on climate change than did those exposed to only the “distractor” news stories [e.g., Deryugina and Shurchkov, 2016]. Indeed, VLFM explicitly state that their analyses “compared” the response of the consensus-message and control-group subjects [p. 4].

But no such comparisons were reported in the paper. VLFM’s Table 1 presents differences in “Pre-Test” and “Post-Test means” for all the study’s outcome measures (Table 1). Readers would be likely to understand this table to be comparing the “before” and “after” responses of subjects exposed to a consensus message. In fact, that is the case only for “Estimate of Scientific Consensus”: the reported values for that entry refer to how being exposed to such a message increased consensus-message subjects’ estimates of the percentage of scientists who accept that human beings are causing global warming. For all the other items, including Support for Action, the reported “Pre-” and “Post-test” means and the “difference” between the two combine the responses of the consensus-message subjects with the control-group subjects. Differences in the respective responses of the two groups were not reported.

Table 1: VLFM “pre-” and “post-test” means.

The same is true for VLFM’s Gateway Model path diagram (Figure 1). VLFM report the results of a structural equation model that they characterize as measuring the degree to which exposure to a consensus message increased “Support for Action” through the mediation of the message’s impact on the study’s other belief and attitude variables (Figure 5). The model was configured to assess the effect of being exposed to such a message on only one variable — Scientific Agreement, the item measuring subjects’ estimates of the percentage of climate scientists who believe humans are causing global warming. The model was not specified to assess how being assigned to the consensus-message rather than the control-group condition affected Belief in Climate Change, Worry about Climate Change, Counterfactual Belief, or Support for Action. The parameter estimates reported for the paths between “perceived scientific agreement” and all of these variables (Figure 5) are merely “correlational in nature” — and thus reflect the same “major short-coming” that VLFM attribute to studies using observational methods [p. 2].

It is only possible to assess VLFM’s reported finding that consensus messages “increased” the study subjects’ “belief that climate change is happening” and “in turn” their “support for public action” [van der Linden et al., 2015, pp. 1, 6] by comparing the responses of subjects exposed to such a message with the responses of those who weren’t [e.g., Deryugina and Shurchkov, 2016; Bolsen and Druckman, 2017; Cook and Lewandowsky, 2016]. Such an analysis is presented in the next section, which also describes various other features of the data that were not reported and that are essential to making inferences about the results of the VLFM experiment.6

### 3 Analysis of the VLFM data

#### 3.1 Treatment vs. control: simple comparisons of “after message” means

a. Main effects. Between the two conditions, there was a 12.8 percentage-point difference ($t=5.82$, $p<0.01$7) in the average “after message” estimate of the percentage of scientists who adhere to the consensus-position on global warming. Indeed, after being told that “97% of scientists have concluded that human-caused climate change is happening,” consensus-message subjects’ modal estimate jumped from 50% to 97% — a response selected by nearly one-third of the subjects assigned to that condition (Figure 2).

There was no meaningful difference, however, between the consensus-message and control-group subjects’ “after message” level of support for climate mitigation. The modal response of consensus-message subjects on the key outcome variable Support for Action was an impressive 100 (Figure 3). But the same was true for the control group, 29% of whose members — as opposed to 27% for the consensus-message group — selected that level of response after reading the news story about the “2014 Star Wars Animated Series” [van der Linden et al., 2014]. The medians were 80 and 82, respectively, values trivially different from the sample-wide 81 in the highly skewed before-message responses.8

Parsing the means of these groups for “significant” differences in responses is arguably an inferentially meaningless exercise given how skewed toward positive values the responses of both groups were. But not surprisingly, the trivial 2.5-point difference in the mean “after message” responses of the two groups on the 0–100 scale used to measure Support for Action was neither statistically nor practically significant ($t=1.09$, $p=0.28$).

The same conclusion applies to the study’s measures of “key beliefs about climate change” [van der Linden et al., 2015, p. 2]. In comparison to control group subjects, exposure to a consensus message did not meaningfully increase “belief that climate change is happening” [van der Linden et al., 2015, p. 2]. The consensus-message and control-group subjects had identical median scores on Belief in Climate Change — 86, an increase for both relative to the already high sample-wide “Pre-Test” median of 81 (Figure 3). The difference in the mean “after-message” responses of the two groups — 1.3 points on the item’s 0–100 scale — was trivial and statistically non-significant ($t=0.51$, $p=0.61$). The mean differences in the differentials for Worry About Climate Change (2.5 points, $t=0.86$, $p=0.39$) and Counterfactual Causation (2.6 points, $t=1.03$, $p=0.30$) were also neither statistically nor practically significant (Figure 4).
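The between-group comparisons described in this subsection can be sketched in a few lines. The snippet below is a minimal illustration on simulated placeholder data, not the VLFM dataset; the group sizes, means, and spreads are assumptions chosen only to mirror the scale of the reported results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated placeholder data (NOT the VLFM dataset): "after message"
# Support for Action responses on the study's 0-100 scale, with a
# true between-group gap of ~2.5 points, roughly as in the paper.
consensus_msg = rng.normal(loc=77.5, scale=24, size=980).clip(0, 100)
control = rng.normal(loc=75.0, scale=24, size=120).clip(0, 100)

# Welch's t-test (no equal-variance assumption) on the two groups'
# "after message" means -- the treatment-vs.-control comparison
# this reanalysis argues VLFM omitted.
t, p = stats.ttest_ind(consensus_msg, control, equal_var=False)
print(f"difference = {consensus_msg.mean() - control.mean():.1f}, t = {t:.2f}, p = {p:.2f}")
```

With gaps this small relative to the spread of responses, the resulting $p$ values will typically land well above conventional significance thresholds, which is the pattern reported above.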

#### 3.2 Treatment vs. control: VLFM’s “correlational in nature” path model

a. Model misspecification. VLFM present a Structural Equation Model to support their conclusion that exposing subjects to a consensus message “causes a significant increase” in “key beliefs about climate change” and consequently in “support for public action” to reduce it [p. 6]. The VLFM model, however, fails to measure the impact of being told that “97% of climate scientists have concluded that human-caused climate change is happening” on any study outcome variable besides the subjects’ subsequent estimates of “the percentage of climate scientists [who] have concluded that human-caused global warming is happening.”

When analyzing experimental results with a structural equation model, “it is the random assignment of the independent variable that validates the causal inferences such that X causes Y, not the simple drawing of an arrow going from X towards Y in the path diagram” [Wu and Zumbo, 2007, p. 373, italics added]. In such a diagram, every observed variable at the end of one or more single-headed arrows is regressed on the predictor variables from which such an arrow originates [Kline, 2015]. If the predictor variable reflects assignment to an experimental treatment condition, the corresponding path coefficient is a measure of the experimental treatment’s effect on that variable.

In the VLFM model (Figure 5), the experimental treatment is connected by an arrow to the differential for Perceived Scientific Agreement (i.e., the difference between the “before-” and “after” responses for that item). The corresponding parameter estimate ($B=12.80$, $SE=2.13$) thus represents the effect of being assigned to the consensus-message group rather than the control group (Table 2, SR(1)).

There is no arrow, however, between the study’s experimental assignment and the other depicted variables, all of which combine the responses of consensus-message and control-group subjects. In order to infer that an experimental treatment affects an outcome variable, “there must be an overall treatment effect on the outcome variable”; likewise, in order to infer that an experimental treatment affects an outcome variable through its effect on a “mediator” variable, “there must be a treatment effect on the mediator” [Muller, Judd and Yzerbyt, 2005, p. 853]. In a structural equation model, such effects typically are estimated in exactly the same way they would be in a straightforward linear regression model: with a contrast variable (usually a 0/1 dummy) that reflects the subjects’ experimental assignment [Kraemer et al., 2002, p. 878; see also Hoyle and Smith, 1994, pp. 436–437]. Because the VLFM structural equation model lacks such variables, there is nothing in it that measures the impact of being “treated” with a consensus message on any of the study’s key climate change belief and attitude measures.9 As Bolsen and Druckman [2017, p. 11] put it, “[w]hile perception of a scientific consensus is a significant predictor of belief in human-induced change, and the latter affects policy beliefs, these effects are not from the experimental stimuli.” In short, the model is misspecified.

Table 2: Reproduction of VLFM SEM model. $N=1104$. Unstandardized MLE regression coefficients (with FIML for missing data). Standard errors listed parenthetically. Although the full specification of the VLFM path model is not indicated in the VLFM article or supporting material, this reproduction, derived by trial-and-error, confirms that the experimental treatment was treated as a predictor only for Scientific Agreement. VLFM indicate that their model was “adjusted for important covariates, including gender, education, age and political party” [van der Linden et al., 2015, p. 4]. The reproduction of the analysis shows that gender, age (reflected in seven age-range categories), education (four categories), and dummy variables for two of the three categories of political self-identification (“Democrat,” “Independent,” and “Republican”) were treated as predictors only in the estimate of the impact of Belief in Climate Change, Worry About Climate Change, and Counterfactual Causation on Support for Action. The only discrepancy between the reproduced model and the model reported by VLFM relates to the parameter estimate for Worry About Climate Change: VLFM report an unstandardized maximum likelihood coefficient of 0.19, whereas the reanalysis generated an MLE coefficient of 0.20, a difference that is immaterial.

In the VLFM path diagram, for example, the differential for Perceived Scientific Agreement is used to estimate the differentials for Belief in Climate Change, Counterfactual Causation, and Worry About Climate Change (Figure 5). The responses to Perceived Scientific Agreement, however, do not distinguish between treated and untreated subjects. As a result, the model parameter estimates tell us only how Perceived Scientific Agreement correlates with Belief in Climate Change, Counterfactual Causation, and Worry About Climate Change on average for the entire sample (Table 2, SR(2)-(4)), regardless of whether a subject was exposed to a consensus message or instead only a “distractor” news story. If what we are interested in is how a change in perceived consensus “experimentally induced” by a consensus message affects these posited “mediator” variables [van der Linden et al., 2015, p. 2], the underlying regression model must include variables that reflect the impact of being treated with a consensus message [Muller, Judd and Yzerbyt, 2005; Kraemer et al., 2002]. VLFM’s does not.

Nor does their SEM contain the variables needed to determine whether the experimental assignment made any difference in Support for Action, the study’s ultimate outcome variable. The model supplies parameter estimates for Belief in Climate Change, Counterfactual Causation, and Worry About Climate Change (Table 2, SR(5)). But again, without anything in the underlying regression model to distinguish control-group from consensus-message subjects’ responses, these estimates tell us only what the correlation was between these predictors and Support for Action on average for the entire sample, regardless of experimental assignment. Nothing in the SEM captures the effect of being exposed to a consensus message on Support for Action at all, much less the impact of such a message through its effect on the “mediators” posited by VLFM [Muller, Judd and Yzerbyt, 2005; Kraemer et al., 2002]. A model so specified cannot distinguish between outcomes that confirm and ones that disconfirm the Gateway Belief Model, a point illustrated by a statistical simulation presented in the supplementary material.

Table 3: Regression models for experimental results. $N=1104$. Dependent variables are “before” and “after” differentials. Unstandardized MLE regression coefficients (with FIML for missing data). Coefficient z-statistics denoted parenthetically. The predictor “97% msg” indicates assignment to an experimental group exposed to a “scientific consensus” message (=1) as opposed to the study control group (=0); the coefficient for that predictor indicates how much larger the mean difference in “before” and “after” scores was for subjects assigned to the “97% message” condition. Bold denotes that the indicated coefficient is significant at $p<0.05$. “Bayes Factor vs. Null” denotes the relative support of the observed data for two competing models: one that posits that exposure to a consensus message affects the indicated variable, and one that posits that it does not. A Bayes Factor greater than one indicates that the evidence is more consistent with the null model. The factor was derived from the Bayes Information Criterion differentials for models including and not including the consensus-message predictor, respectively [Raftery, 1995, pp. 133–134; Wagenmakers, 2007, pp. 796–797; Masson, 2011, p. 681].

b. Main effects. The analyses that must be performed to tell the difference are reported in Table 3. Consistent with well-established practice [Muller, Judd and Yzerbyt, 2005, p. 853], and with the method used in other consensus-message studies [e.g., Deryugina and Shurchkov, 2016], the outcome and each posited mediator are individually regressed on the experimental treatment. The coefficient for the predictor “97% msg” reflects how much assignment to the consensus-message as opposed to the control group affected the specified variable [Muller, Judd and Yzerbyt, 2005; Kraemer et al., 2002]. 10 This analysis shows that, contrary to VLFM’s hypothesis (and announced conclusion), exposure to a consensus message did not lead to a meaningful “change in respondents’ support for societal action” among message-exposed subjects “compared to a ‘control’ group” [van der Linden et al., 2015, p. 4]. The difference in the “before” and “after” responses of consensus-message and control-group subjects on the 101-point Support for Action measure — 2.12 points — was trivially small and not statistically significant ($z=1.60$, $p=0.11$).11

VLFM characterize their results as the “strongest evidence to date” that “consensus messaging” is “consequential” [van der Linden et al., 2015, p. 6]. But when the responses of the consensus-message and control subjects are genuinely compared, it turns out that whether subjects viewed a consensus message or instead a distractor news story explained 0% of the variance in the differences in the “before” and “after” responses to Support for Action (Table 3), the study’s key outcome variable. In sum, the results of consensus-messaging, as tested by VLFM, were of no consequence in the regression analysis.

Because VLFM’s experiment failed to generate any evidence of a “causal relationship” between an “experimentally induced change in public perceptions of …scientific consensus on climate change and support for public action,” there is no occasion to assess whether any such “effect” was “mediated,” “fully” or partially, by “key beliefs about climate change” [p. 2] [Kenny, Kashy and Bolger, 1998; Baron and Kenny, 1986]. One could simply assume that the message had an effect on Support for Action “through” its effect on the posited “mediator” variables but that the effect was too “small” to be detected [Judd, Yzerbyt and Muller, 2014]. Making that assumption, however, would defeat the purpose of the study: to remedy the “major short-coming” of previous “correlational” studies by testing — not assuming — “the proposed causal relationship between public perceptions of the scientific consensus on climate change and support for public action” [p. 2].

In addition, treating a null effect on the outcome variable as consistent with the study hypotheses would not dispense with the need to demonstrate an experimental impact on the VLFM model’s posited “mediator” variables [Judd, Yzerbyt and Muller, 2014]. Such an analysis shows that, contrary to VLFM’s hypothesis, the experimental manipulation did not meaningfully affect the “key belief” that climate change is actually occurring. The difference in the “before” and “after” responses of consensus-message and control-group subjects on the 101-point Belief in Climate Change measure was also trivially small (2.49) and not statistically significant ($z=1.39$, $p=0.17$). Again, the variance explained by the experimental treatment was 0.0% (Table 3).

This result is strongly at odds with the Gateway Belief Model hypothesis. VLFM predicted that “experimentally manipulating” perceptions of scientific consensus with a consensus message would increase “key beliefs about climate change” and “in turn” increase “public support for action” [van der Linden et al., 2015, p. 2]. But in fact, subjects whose perceptions of scientific consensus were experimentally manipulated in this fashion did not increase either their “belief in climate change” or their “support for action” to mitigate it.

The differentials for consensus-message subjects exceeded by a statistically significant margin the differentials of the control-group subjects only on VLFM’s Counterfactual Causation and Worry About Climate Change items. But even here the differences on the 101-point scales were exceedingly small: the experimental assignment explained less than 0.5% (i.e., one-half of one percent) of the variance in the “before” and “after” responses of the consensus-message subjects relative to the control group on Worry About Climate Change, and less than 0.7% of the variance between the conditions on Counterfactual Causation (Table 3).

Of course, readers should draw their own inferences from this pattern of results. But they can do that only if they can see what the VLFM data actually show. The analysis in this paper is designed to make that possible.

c. Political polarization. Also at odds with the Gateway Belief Model was the impact of the experimental manipulation on subjects of varying political affiliations. Without reporting any supporting data or analyses, VLFM make a series of representations relating to how the tested consensus messages affected individuals of opposing political party affiliations. They say, for example, that “the consensus message had a larger influence on Republican respondents,” and that “consensus-messaging …shift[ed] the opinions of both Democrats and Republicans in directions consistent with the conclusions of climate science” [p. 6]. The natural interpretation of this language is that the experimental manipulation influenced Republicans’ responses more than Democrats’ on the key mediator and outcome variables.

But when all the VLFM data are inspected, they do not support that conclusion (Table 4). The only variable on which the experimental manipulation had an impact on Republicans that exceeded that of Democrats by a statistically significant margin was Perceived Scientific Agreement (Table 4) — i.e., the estimated percentage of scientists who accept human-caused climate change. Compared to Democratic subjects, Republicans in the consensus-messaging condition did not modify to a statistically or practically significant degree their after-message assessment of the need for action to combat climate change (Public Support). Nor did they modify to a statistically significant extent their assessment of the existence of climate change, the counterfactual contribution of humans to global warming, or worry about climate change. If reducing political polarization is a critical step in the democratic enactment of climate-mitigation policies in the U.S., the results of the VLFM study suggest that simply telling Republicans that “97% of climate scientists have concluded that human-caused climate change is happening” does not contribute to this end.12

Table 4: Partisan individual differences in impact of experimental manipulation. $N=1104$. Dependent variables are “before” and “after” differentials. Consistent with VLFM’s analytic strategy, predictors are estimated using a structural equation model. Unstandardized MLE regression coefficients (with FIML for missing data). Coefficient z-statistics denoted parenthetically. The predictors for “Republican” and “Independent” are dummy variables that reflect responses of 5 to 7 and 4, respectively, on the study’s 7-point party identification variable; Democrat-identifying subjects (score 1–3 on the 7-point party identification variable) are the omitted reference group. The model constant reflects the impact of the “distractor” message on Democrats, and “97% msg” the additional impact on Democrats of having been assigned to the consensus-messaging condition. The variable “Repub” reflects the difference between Republicans and Democrats in the control condition; “msg_x_Repub” the incremental effect of the consensus-messaging treatment on Republicans relative to Democrats. Bold denotes that indicated coefficient is significant at $p<0.05$.
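The dummy and interaction coding described in the table note can be sketched as follows. This is a hypothetical illustration of the coding, not the VLFM analysis script; the Independent interaction term (`msg_x_indep`) is my assumption, included for symmetry with `msg_x_Repub`.

```python
def design_row(party_id, got_consensus_msg):
    """Code one subject for a Table 4-style change-score regression.

    party_id: the study's 7-point party identification measure
    (1-3 Democrat, 4 Independent, 5-7 Republican). Democrats are the
    omitted reference group, so the constant captures the control
    condition's effect on Democrats. Hypothetical sketch only.
    """
    msg = 1 if got_consensus_msg else 0        # "97% msg"
    repub = 1 if 5 <= party_id <= 7 else 0     # "Repub"
    indep = 1 if party_id == 4 else 0          # "Independent"
    # msg * repub is "msg_x_Repub": the incremental treatment effect on
    # Republicans relative to Democrats; msg * indep is an assumed analogue
    return (1, msg, repub, indep, msg * repub, msg * indep)
```

Under this coding, the interaction coefficient (not the Republican dummy alone) is what would have to be significant for the claim that the message “had a larger influence on Republican respondents.”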

#### 3.3 Sample size, statistical significance, and evidentiary weight

While the number of subjects in the VLFM study ($N=1104$) was large by conventional standards, the negligible size of the experimental effects still rendered it “too small” to find statistically significant differences between responses of subjects assigned to the control and consensus-message groups. Researchers aware of this constraint might decide to remedy it by re-running the study after arbitrarily inflating the sample size by, say, a factor of six [van der Linden, Leiserowitz and Maibach, 2016]. Resorting to a massively large sample to guarantee “statistical significance” is recognized as a poor research practice because it mistakes statistical significance for inferential weight [Fung, 2014; Kim and Ji, 2015]. “$P$ values are not and should not be used to define moderators and mediators of treatment, because then moderator or mediator status would change with sample size” [Kraemer et al., 2002, p. 881].

The only informative way to convey the results of experiments that purport to test hypothesized mediation effects is to report statistics that indicate the evidentiary weight of the data in relation to models configured to reflect competing hypotheses [Kim and Ji, 2015]. Consistent with this approach, the Bayesian Information Criterion was calculated for the appropriately specified models reflected in Table 3 [Raftery, 1995; Goodman, 1999; Goodman, 2005; Wagenmakers, 2007]. In relation to Support for Action, this test showed that the data were 9 times more consistent with a “null effect” model — in which exposure to a consensus message was hypothesized to have no effect — than with a model hypothesizing message exposure would have the effect associated with the VLFM model (Table 3). The data were 12 times more consistent with a null-effect model than with the consensus-messaging model for Belief in Climate Change. The data were more consistent too — albeit by smaller margins — with no effect for Worry about Climate Change (2.8x) and Counterfactual Causation (1.2x).13
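The “$x$ times more consistent” figures are Bayes factors, which can be approximated from the difference in BIC between the competing models [Raftery, 1995; Wagenmakers, 2007]. A minimal sketch of the approximation, using illustrative BIC values rather than those underlying Table 3:

```python
import math

def bf01_from_bic(bic_null, bic_alt):
    """Approximate the Bayes factor favoring the null-effect model over
    the alternative: BF01 ~= exp((BIC_alt - BIC_null) / 2).
    Values greater than 1 mean the data favor the null-effect model.
    """
    return math.exp((bic_alt - bic_null) / 2.0)

# hypothetical BIC values for a null-effect and a treatment-effect model;
# a BIC gap of 4.4 in the null's favor yields a Bayes factor of about 9
bf = bf01_from_bic(1000.0, 1004.4)
```

The approximation makes the key point concrete: unlike a $p$ value, the Bayes factor quantifies the relative evidentiary support for the two competing model specifications.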

But in any event, insufficient power to observe a “statistically significant effect” does not license reporting a significant effect that does not actually exist. VLFM claim that their data show that being exposed to a consensus message generated “a significant increase” in “key beliefs about climate change” when “experimental consensus-message interventions were collapsed into a single ‘treatment’ category and subsequently compared to [a] ‘control’ group” [van der Linden et al., 2015, p. 4]. This representation is incorrect.

### 4 Conclusion

VLFM “stands in contrast” [Cook and Lewandowsky, 2016, p. 175] with more recent examinations of consensus messaging. VLFM claim that showing subjects a consensus message “cause[d] a significant increase” in “key beliefs about climate change” and a corresponding increase in “support for public action” to mitigate it [van der Linden et al., 2015, p. 6]. Such a result would be in tension with Cook and Lewandowsky’s finding of a “worldview backfire effect,” in which a consensus message amplified U.S. conservatives’ distrust in climate scientists and magnified political polarization in the sample as a whole [2016, pp. 169–172], a result also reported by Bolsen and Druckman [2017]. The reported VLFM finding would also be in direct conflict with Dixon, Hmielowski and Ma [2017], who found that exposing subjects to a 97%-consensus message had no significant effect; and with Deryugina and Shurchkov [2016], who found that the immediate impact of a consensus message on beliefs in human-caused climate change did not translate into greater support for climate mitigation policies — either immediately or in a six-month follow-up study, in which subjects’ beliefs in human-caused climate change no longer differed significantly from their pre-message beliefs.

Much of the conflict disappears, however, when one simply looks at the VLFM data and how VLFM analyzed them. Unlike these other researchers, VLFM did not report the responses of subjects in the study control group, the members of which read only “distractor” news stories on popular entertainment. Subjects who were told that “97% of climate scientists have concluded that human-caused climate change is happening” did increase their estimates of the “percentage of climate scientists [who] have concluded that human-caused climate change is happening.” But consistent with the result in Dixon, Hmielowski and Ma [2017] and Deryugina and Shurchkov [2016], the degree to which those subjects increased their “support for public action” — 2 points on a 101-point scale — was not significant, in statistical or practical terms, compared to the responses of the VLFM control group subjects.

Unlike Deryugina and Shurchkov [2016], Cook and Lewandowsky [2016], Bolsen and Druckman [2017], and Dixon, Hmielowski and Ma [2017], VLFM did not collect data on their subjects’ actual beliefs in human-caused climate change. But on a 101-point measure of “belief in climate change” that made no reference to cause, VLFM subjects who viewed a consensus message and those who read about a Star Wars cartoon series both increased their “after message” responses by small amounts. The 2.5-point difference between those increases was not significant (in statistical or practical terms).

This information, essential to evaluating the meaning and importance of the VLFM study, was not reported in the paper. On the contrary, it was obscured by the authors’ use of a misspecified structural equation model that reported correlations between various study outcome measures without regard to how those responses varied among subjects who were and were not exposed to a consensus message.

Once again, the point of this paper was not to determine which position is correct on the use of consensus messaging. It was only to assure that scholars would have access to all the data collected by VLFM when assessing that study’s contribution to knowledge.

### References

References include sources cited in the supplementary material.

Baron, R. M. and Kenny, D. A. (1986). ‘The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations’. Journal of Personality and Social Psychology 51 (6), pp. 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173.

Bolsen, T. and Druckman, J. (2017). Do Partisanship and Politicization Undermine the Impact of Scientific Consensus on Climate Change Beliefs? URL: http://www.law.northwestern.edu/research-faculty/searlecenter/events/roundtable/documents/Druckman_Partisan_Group_Identity_Belief_in_Human-Caused_Climate_Change.pdf.

Brewer, M. B. and Crano, W. D. (2014). ‘Research Design and Issues of Validity’. In: Handbook of Research Methods in Social and Personality Psychology. Ed. by H. T. Reis and C. M. Judd, pp. 11–26. https://doi.org/10.1017/cbo9780511996481.005.

Carlsson, R., Schimmack, U., Williams, D. and Bürkner, P.-C. (2016). ‘Bayesian Evidence Synthesis is No Substitute for Meta-analysis: A Re-analysis of Scheibehenne, Jamil and Wagenmakers’. Psychological Science.

Cook, J. and Lewandowsky, S. (2016). ‘Rational Irrationality: Modeling Climate Change Belief Polarization Using Bayesian Networks’. Topics in Cognitive Science 8 (1), pp. 160–179. https://doi.org/10.1111/tops.12186. PMID: 26749179.

Deryugina, T. and Shurchkov, O. (2016). ‘The Effect of Information Provision on Public Consensus about Climate Change’. PLOS ONE 11 (4). Ed. by J. Hewitt, e0151469. https://doi.org/10.1371/journal.pone.0151469.

Dixon, G., Hmielowski, J. and Ma, Y. (2017). ‘Improving Climate Change Acceptance Among U.S. Conservatives Through Value-Based Message Targeting’. Science Communication 39 (4), pp. 520–534. https://doi.org/10.1177/1075547017715473.

Fox, J. (2015). Applied regression analysis and generalized linear models. 3rd ed. Los Angeles, U.S.A.: SAGE.

Fung, K. (2014). Big Data, Plainly Spoken. URL: http://junkcharts.typepad.com/numbersruleyourworld/2014/11/gelman-explains-why-massive-sample-sizes-to-chase-after-tiny-effects-is-silly.html.

Funk, C. and Rainie, L. (1st July 2015). ‘Americans, Politics and Science Issues’. Pew Research Center. URL: http://www.pewinternet.org/2015/07/01/americans-politics-and-science-issues/.

Goodman, S. N. (1999). ‘Toward Evidence-Based Medical Statistics. 2: The Bayes Factor’. Annals of Internal Medicine 130 (12), pp. 1005–1013. https://doi.org/10.7326/0003-4819-130-12-199906150-00019.

— (2005). ‘Introduction to Bayesian methods I: measuring the strength of evidence’. Clinical Trials 2 (4), pp. 282–290. https://doi.org/10.1191/1740774505cn098oa.

Gujarati, D. N. and Porter, D. C. (2009). Basic econometrics. 5th ed. Boston, U.S.A.: McGraw-Hill Irwin.

Hoyle, R. H. and Smith, G. T. (1994). ‘Formulating clinical research hypotheses as structural equation models: A conceptual overview’. Journal of Consulting and Clinical Psychology 62 (3), pp. 429–440. https://doi.org/10.1037/0022-006x.62.3.429.

Judd, C. M. (2000). ‘Everyday Data Analysis in Social Psychology: Comparisons of Linear Models’. In: Handbook of Research Methods in Social and Personality Psychology. Ed. by H. T. Reis and C. M. Judd. New York, U.S.A.: Cambridge University Press, pp. 370–392.

Judd, C. M., Yzerbyt, V. Y. and Muller, D. (2014). ‘Mediation and Moderation’. In: Handbook of Research Methods in Social and Personality Psychology. Ed. by H. T. Reis and C. M. Judd. Cambridge, U.K.: Cambridge University Press, pp. 653–676. https://doi.org/10.1017/cbo9780511996481.030.

Kenny, D., Kashy, D. and Bolger, N. (1998). ‘Data analysis in social psychology’. In: The Handbook of Social Psychology. 1. Ed. by D. Gilbert, S. Fiske and G. Lindzey. Boston, MA, U.S.A.: McGraw-Hill, pp. 233–265.

Kim, J. H. and Ji, P. I. (2015). ‘Significance testing in empirical finance: A critical review and assessment’. Journal of Empirical Finance 34, pp. 1–14. https://doi.org/10.1016/j.jempfin.2015.08.006.

King, G. and Roberts, M. E. (2015). ‘How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It’. Political Analysis 23 (02), pp. 159–179. https://doi.org/10.1093/pan/mpu015.

Kline, R. B. (2015). Principles and practice of structural equation modeling. New York, U.S.A.: The Guilford Press.

Kraemer, H. C., Wilson, G. T., Fairburn, C. G. and Agras, W. S. (2002). ‘Mediators and Moderators of Treatment Effects in Randomized Clinical Trials’. Archives of General Psychiatry 59 (10), pp. 877–883. https://doi.org/10.1001/archpsyc.59.10.877.

Lewandowsky, S., Gignac, G. E. and Vaughan, S. (2013). ‘The pivotal role of perceived scientific consensus in acceptance of science’. Nature Climate Change 3 (4), pp. 399–404. https://doi.org/10.1038/nclimate1720.

Masson, M. E. J. (2011). ‘A tutorial on a practical Bayesian alternative to null-hypothesis significance testing’. Behavior Research Methods 43 (3), pp. 679–690. https://doi.org/10.3758/s13428-010-0049-5.

Muller, D., Judd, C. M. and Yzerbyt, V. Y. (2005). ‘When moderation is mediated and mediation is moderated’. Journal of Personality and Social Psychology 89 (6), pp. 852–863. https://doi.org/10.1037/0022-3514.89.6.852.

National Academies of Sciences, Engineering, and Medicine (2017). Communicating Science Effectively: A Research Agenda. Washington, DC, U.S.A.: National Academies Press. https://doi.org/10.17226/23674.

Raftery, A. E. (1995). ‘Bayesian Model Selection in Social Research’. Sociological Methodology 25, pp. 111–163. https://doi.org/10.2307/271063.

Slovic, P., Finucane, M. L., Peters, E. and MacGregor, D. G. (2004). ‘Risk as Analysis and Risk as Feelings: Some Thoughts about Affect, Reason, Risk, and Rationality’. Risk Analysis 24 (2), pp. 311–322. https://doi.org/10.1111/j.0272-4332.2004.00433.x.

Slovic, P., Peters, E., Finucane, M. L. and MacGregor, D. G. (2005). ‘Affect, risk, and decision making’. Health Psychology 24 (4, Suppl), S35–S40. https://doi.org/10.1037/0278-6133.24.4.s35.

van der Linden, S. L. and Chryst, B. (2017). ‘No Need for Bayes Factors: A Fully Bayesian Evidence Synthesis’. Frontiers in Applied Mathematics and Statistics 3. https://doi.org/10.3389/fams.2017.00012.

van der Linden, S. L., Leiserowitz, A. and Maibach, E. W. (2016). ‘Communicating the Scientific Consensus on Human-Caused Climate Change is an Effective and Depolarizing Public Engagement Strategy: Experimental Evidence from a Large National Replication Study’. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2733956.

van der Linden, S. L., Leiserowitz, A. A., Feinberg, G. D. and Maibach, E. W. (2014). ‘How to communicate the scientific consensus on climate change: plain facts, pie charts or metaphors?’ Climatic Change 126 (1-2), pp. 255–262. https://doi.org/10.1007/s10584-014-1190-4.

— (2015). ‘The Scientific Consensus on Climate Change as a Gateway Belief: Experimental Evidence’. PLOS ONE 10 (2). Ed. by K. L. Ebi, e0118489. https://doi.org/10.1371/journal.pone.0118489.

Wagenmakers, E.-J. (2007). ‘A practical solution to the pervasive problems of p values’. Psychonomic Bulletin & Review 14 (5), pp. 779–804. https://doi.org/10.3758/bf03194105.

Wu, A. D. and Zumbo, B. D. (2007). ‘Understanding and Using Mediators and Moderators’. Social Indicators Research 87 (3), pp. 367–392. https://doi.org/10.1007/s11205-007-9143-1.

### Author

Dan Kahan is the Elizabeth K. Dollard Professor of Law and Professor of Psychology at Yale Law School. He is a member of the Cultural Cognition Project, an interdisciplinary team of scholars who use empirical methods to examine the impact of group values on perceptions of risk and science communication. In studies funded by the National Science Foundation, Professor Kahan and his collaborators have investigated public dissensus over climate change, public reactions to emerging technologies, and public understandings of scientific consensus across disputed issues. Articles featuring the Project’s studies have appeared in a variety of peer-reviewed scholarly journals including the Journal of Risk Research, Judgment and Decision Making, Nature Climate Change, Science, and Nature. The Project’s current focus is on field research to integrate insights from the science of science communication into the craft of professional science communicators in various domains, including democratic decisionmaking, education, and popular engagement with science. Professor Kahan is a Senior Fellow at the National Center for Science and Civic Engagement and a member of the American Academy of Arts and Sciences. E-mail: dan.kahan@yale.edu.

### How to cite

Kahan, D. (2017). ‘The “Gateway Belief” illusion: reanalyzing the results of a scientific-consensus messaging study’. JCOM 16 (05), A03. https://doi.org/10.22323/2.16050203.

### Endnotes

Supplementary material available at https://jcom.sissa.it/archive/16/05/JCOM_1605_2017_A03.

1The data for van der Linden et al. [2015] can be accessed with the doi specified at http://tinyurl.com/zve57wo. Characteristically for “open-access” journals, PLOS ONE’s “data availability” policy requires authors of papers in that journal “to make all data underlying the findings described in their manuscript fully available without restriction” for the purpose of enabling “validation, replication, reanalysis, new analysis, reinterpretation or inclusion into meta-analyses.”

2In the supplementary material, simulated data are used to illustrate that the misspecified model featured in VLFM is unable to distinguish between study results that support their Gateway Belief Model hypotheses and study results that refute them.

3The description here is based on van der Linden et al. [2014] and the on-line supplemental information for that paper.

4The relative effect of these ten different messages on subjects’ estimates of the percentage of climate scientists that accept human-caused climate change was analyzed in van der Linden et al. [2014]. Neither VLFM nor van der Linden et al. [2014] reproduce all of the messages; the on-line “supplemental information” document for the latter shows only four “examples.” There is also no variable in the VLFM dataset indicating which of the ten messages subjects received.

5The wording for this and other survey measures can be found in the supplemental information for van der Linden et al. [2014].

6 The internal validity of the VLFM study is open to reasonable dispute. Telling subjects that “97% of scientists have concluded that human-caused climate change is happening” and then asking them “to the best of your knowledge, what percentage of climate scientists have concluded that human-caused climate change is happening” could be viewed as presenting a fairly obvious demand-effect confound [Brewer and Crano, 2014].

7All reported “$p$ values” are two-tailed. A one-tailed $p$-value would be inappropriate: not only was it possible for message-exposed subjects to change their “after” responses in a negative direction, but researchers have reported finding exactly this sort of backfire effect in consensus-messaging studies [National Academies of Sciences, Engineering, and Medicine, 2017, p. 62; Cook and Lewandowsky, 2016; Bolsen and Druckman, 2017].

8The skew toward belief in and concern over global warming reflected in the “before message” responses (Figure 3) is wildly out of synch with the intensity of public division on this issue. The most likely explanation is that the study’s unusually worded items and 0–100 response scales were simply ill-designed for study of public opinion on climate.

9The VLFM model was reproduced to verify that in fact it was specified in the manner the paper represents (Table 2).

10VLFM “controlled” for gender, education, age and political party in their estimate of Support for Action (Table 2). Because the study used randomized assignment, these variables are ignorable in estimating the impact of the experimental assignment. Nevertheless, an analysis including these variables found that they did not affect estimates of the treatment effect on the study outcome and mediator variables as reported in Table 3.

11Consistent with VLFM’s decision to analyze their experimental results with a structural equation model, maximum likelihood regression (with FIML for missing data) was used to calculate the parameter estimates for the treatment effects on model outcome and mediator variables. The effects were identical when estimated with ordinary least squares regression. OLS regression was used to assess the potential impact of heteroscedasticity on the parameter estimates. Under OLS regression, “unequal error variance typically distorts statistical inference only when the problem is severe” [Fox, 2015, p. 306]. Because any method of computing “robust” standard errors to correct for heteroscedasticity will itself be vulnerable to over- or under-estimation, commentators recommend that nonconstant error variance be regarded as warranting some form of correction “only when the magnitude (i.e., the standard deviation) of the errors varies by more than a factor of about 3 — that is, when the largest error variance is more than about 10 times the smallest” [Fox, 2015, p. 307; Gujarati and Porter, 2009, p. 400]. In the case of Support for Action, the standard errors associated with the OLS estimate of the experimental effect (1.32) and with a robust generalized least squares estimate (0.82) varied by a factor of less than 2. As a result, the magnitude of the error variances differed by a factor of less than 3 — well under the factor of 10 threshold [ibid.]. The same was true for each of the study mediator variables. Even where heteroscedasticity does warrant correction, moreover, the appropriate remedy is to treat it as evidence of model misspecification, not to paper it over with one or another nonparametric alternative that generates the desired “$p<0.05$” answer sought by the researcher. “The bigger the difference robust standard errors make, the stronger the evidence for misspecification” [King and Roberts, 2015, p. 177].
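The factor-of-3 rule of thumb cited in this note can be expressed as a simple diagnostic. The sketch below is a hypothetical illustration, assuming residuals have been partitioned into groups (for example by experimental condition or fitted-value range); it is not drawn from the reanalysis itself.

```python
from statistics import pstdev

def needs_heteroscedasticity_correction(residual_groups, sd_factor=3.0):
    """Fox's [2015] rule of thumb: treat nonconstant error variance as
    warranting correction only when the error standard deviation varies
    across groups by more than about a factor of 3 (i.e., the largest
    error variance is more than roughly 10 times the smallest).
    """
    sds = [pstdev(group) for group in residual_groups if len(group) > 1]
    return max(sds) / min(sds) > sd_factor

# hypothetical residual groups: mild vs. severe nonconstant variance
mild = [[-1.0, 1.0, -1.2, 1.2], [-2.0, 2.0, -2.2, 2.2]]
severe = [[-0.5, 0.5, -0.4, 0.4], [-4.0, 4.0, -5.0, 5.0]]
```

In the first case the error spread differs by less than a factor of 3 and no correction is indicated; in the second it differs by roughly a factor of 10 in variance terms and correction (or, per King and Roberts [2015], respecification) is warranted.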

12Additional analyses of the impact of consensus-messaging on subjects of opposing political affiliations are presented in the supplementary material.

13van der Linden and Chryst [2017] recently purported to show that Bayes factor scores are inferior to Bayesian synthesis, an alternative Bayesian method for determining study effect sizes. They apparently overlooked Carlsson et al. [2016, in press], which demonstrates the invalidity of the Bayesian synthesis method.