18/05/2022
Caveats
A number of caveats apply both to the proposed Bayesian meta-analysis approach specifically and to meta-analysis in general. The main danger is that researchers treat the outcome of a meta-analysis as definitive without taking into account the assumptions and limitations of the approach. In general, there are many uncertainties when applying meta-analysis; the proposed approach uses Bayesian model averaging to address one of them (i.e., whether a fixed-effect or a random-effects model should be used). One uncertainty that the approach does not address is whether the assumption of a normal distribution of true study effects is plausible. This assumption may be problematic for several reasons. For example, there may be dependencies between different effect sizes due to including multiple effect sizes from the same articles or multiple studies from the same lab. Moreover, there may be sequential dependencies given that researchers may inform their study designs by reading the literature (this may be less of a concern for many-labs meta-analyses). Furthermore, researchers should be aware that there may be measurement-error and range-restriction issues. A number of methods have been proposed to address these caveats (e.g., Cheung & Chan, 2008; Schmidt & Hunter, 2015; Tipton, 2015). Another caveat is that the presence of publication bias may distort the meta-analytic result. Publication bias can be ruled out when the complete set of studies has been preregistered (e.g., in the form of a Registered Replication Report; Chambers, 2017; van Elk et al., 2015).
Whenever publication bias cannot be ruled out, a number of methods have been proposed for estimating the extent of this publication bias and for correcting the meta-analytic effect size estimate (e.g., Gronau, Duizer, et al., 2017; Simonsohn et al., 2014a, 2014b; van Assen et al., 2015). Furthermore, our lab has recently proposed an extension of the Bayesian model-averaged meta-analysis procedure that takes into account the possibility of publication bias (Bartoš et al., 2020; Maier et al., 2020). In any case, it is important to emphasize that researchers should not blindly trust meta-analysis results but should take into account substantive expertise and knowledge about the limitations of the procedure.
Beyond overall effects
In addition to the key questions Q1 and Q2, researchers may often be interested in incorporating discrete and continuous moderators at the study level. Although we did not discuss this possibility here, the metaBMA package does provide functionality for including moderators. Including moderators in the analysis is one way of accounting for the fact that different subsets of studies might have different latent effect sizes. Another possible way of incorporating and testing this assumption would be to change the distribution of the latent study effects. Instead of assuming a single continuous normal distribution of effect sizes, one could assume a latent mixture of normal distributions and then test how many components are necessary to describe the distribution of latent study effects best (e.g., Moreau & Corballis, 2019).
An additional approach to Bayesian meta-analysis is to focus on the entire distribution of study effects instead of the overall effect. For instance, Rouder et al. (2019) proposed to test whether all studies in the meta-analytic sample show an effect in the same, expected direction or whether some studies show an opposite effect. An appropriate model for this analysis is one in which both the distribution of the overall effect and the distribution of individual study effects are truncated; the latter truncation is imposed to allow individual study effects in one direction only (upper level of Fig. 1). This model can then be compared with the unconstrained alternative (i.e., the random-effects alternative). Similar tests have been proposed in the clinical literature, in which meta-analysis also serves the purpose of testing whether one treatment is superior for one patient population and another treatment is superior for another patient population (Gail & Simon, 1985). Such a “Does every study show an effect?” analysis is implemented in the metaBMA package.
As a final word of caution, we would like to stress again that, in line with the adage “garbage in, garbage out,” no statistical analysis can provide high-quality inference based on low-quality data that might be the result of problematic study design, shortcomings of the implementation or sample, publication bias, significance chasing, and so on; Bayesian model-averaged meta-analysis is no exception. For instance, one may use the procedure to analyze studies that have not been preregistered; however, the conclusions might need to be interpreted with skepticism in case the quality of the included studies is questionable or if the included studies represent a biased sample of all conducted studies in a field. In contrast, when the set of studies is of high quality, preregistered, and possibly even the result of a Registered (Replication) Report, we believe that Bayesian model-averaged meta-analysis can be a valuable tool that allows researchers to address key questions of interest in a principled manner.
Appendix
Changing the prior probabilities of the hypotheses
When computing Bayes factors (BFs) that compare two models, such as BFHf1,Hf0 (see Equation 2 and Equation 3), the prior probabilities of the hypotheses do not affect the resulting BF. For instance, when inserting the expressions for the posterior probabilities in Equation 3, the prior probabilities cancel out:
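\[
\mathrm{BF}_{H_{f1},H_{f0}} = \frac{p(\text{data} \mid H_{f1})\,p(H_{f1})}{p(\text{data} \mid H_{f0})\,p(H_{f0})} \Bigg/ \frac{p(H_{f1})}{p(H_{f0})} = \frac{p(\text{data} \mid H_{f1})}{p(\text{data} \mid H_{f0})}. \tag{6}
\]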
In contrast, when computing inclusion BFs that involve more than two models, the prior probabilities affect the resulting BFs. For instance, when inserting the expressions for the posterior probabilities in Equation 4, the prior probabilities do not cancel out:
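\[
\mathrm{BF}_{10} = \frac{p(\text{data} \mid H_{f1})\,p(H_{f1}) + p(\text{data} \mid H_{r1})\,p(H_{r1})}{p(\text{data} \mid H_{f0})\,p(H_{f0}) + p(\text{data} \mid H_{r0})\,p(H_{r0})} \Bigg/ \frac{p(H_{f1}) + p(H_{r1})}{p(H_{f0}) + p(H_{r0})}. \tag{7}
\]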
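To make the contrast between Equations 6 and 7 concrete, the following minimal Python sketch evaluates both expressions under two prior settings. The marginal likelihoods are hypothetical values chosen purely for illustration (they are not the ones from the self-concept maintenance example), and the function names are our own.

```python
# Hypothetical marginal likelihoods p(data | H); illustrative values only,
# NOT taken from the self-concept maintenance example.
marginal = {"Hf0": 0.40, "Hr0": 0.08, "Hf1": 0.03, "Hr1": 0.01}


def pairwise_bf(marginal, prior, a, b):
    """Equation 6: posterior odds of a versus b divided by the prior odds."""
    posterior_odds = (marginal[a] * prior[a]) / (marginal[b] * prior[b])
    prior_odds = prior[a] / prior[b]
    return posterior_odds / prior_odds


def inclusion_bf10(marginal, prior):
    """Equation 7: model-averaged (inclusion) BF for an overall effect."""
    numerator = marginal["Hf1"] * prior["Hf1"] + marginal["Hr1"] * prior["Hr1"]
    denominator = marginal["Hf0"] * prior["Hf0"] + marginal["Hr0"] * prior["Hr0"]
    prior_odds = (prior["Hf1"] + prior["Hr1"]) / (prior["Hf0"] + prior["Hr0"])
    return (numerator / denominator) / prior_odds


equal_prior = {h: 0.25 for h in marginal}
favor_hf0 = {"Hf0": 0.70, "Hr0": 0.10, "Hf1": 0.10, "Hr1": 0.10}

pair_equal = pairwise_bf(marginal, equal_prior, "Hf1", "Hf0")
pair_favor = pairwise_bf(marginal, favor_hf0, "Hf1", "Hf0")
incl_equal = inclusion_bf10(marginal, equal_prior)
incl_favor = inclusion_bf10(marginal, favor_hf0)

print(f"pairwise BF:  equal = {pair_equal:.4f}, favor Hf0 = {pair_favor:.4f}")
print(f"inclusion BF: equal = {incl_equal:.4f}, favor Hf0 = {incl_favor:.4f}")
```

With these made-up inputs the pairwise BF is the same under both prior settings, whereas the inclusion BF is not: the prior probabilities act as weights inside the sums in Equation 7 and therefore cannot be factored out.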
Here we demonstrate the effect of changing the prior probabilities of the hypotheses using the self-concept maintenance example. Specifically, we show how the posterior probabilities of the hypotheses and the inclusion BFs change when (a) increasing the prior probability of the winning hypothesis Hf0 from 0.25 to 0.70 and (b) increasing the prior probability of the worst hypothesis Hr1 from 0.25 to 0.70.
The remaining prior probability, 0.30, is distributed evenly across the other three hypotheses (i.e., each of the remaining hypotheses is assigned prior probability 0.10).
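As a rough illustration of how these prior settings are constructed and how they enter the posterior probabilities, consider the following Python sketch. The helper names (`boosted_prior`, `posterior_probs`) and the marginal likelihoods are hypothetical and serve only to show the mechanics; they do not reproduce the values reported in the tables.

```python
HYPOTHESES = ("Hf0", "Hr0", "Hf1", "Hr1")


def boosted_prior(target, mass=0.70):
    """Assign `mass` to `target` and split the remainder evenly over the rest."""
    rest = (1.0 - mass) / (len(HYPOTHESES) - 1)
    return {h: (mass if h == target else rest) for h in HYPOTHESES}


def posterior_probs(marginal, prior):
    """Bayes' rule over the four hypotheses."""
    weighted = {h: marginal[h] * prior[h] for h in HYPOTHESES}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


# Hypothetical marginal likelihoods p(data | H); illustrative values only.
marginal = {"Hf0": 0.40, "Hr0": 0.08, "Hf1": 0.03, "Hr1": 0.01}

prior_a = boosted_prior("Hf0")  # setting (a): favor Hf0
prior_b = boosted_prior("Hr1")  # setting (b): favor Hr1

post_a = posterior_probs(marginal, prior_a)
post_b = posterior_probs(marginal, prior_b)

for label, post in [("(a) favor Hf0", post_a), ("(b) favor Hr1", post_b)]:
    ranking = sorted(post, key=post.get, reverse=True)
    print(label, {h: round(p, 3) for h, p in post.items()}, "ranking:", ranking)
```

Because each posterior probability is proportional to the product of marginal likelihood and prior probability, shifting prior mass toward one hypothesis can change the posterior ordering of hypotheses whose marginal likelihoods are close to one another.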
Increasing the prior probability of Hf0
Hypotheses posterior probabilities
Table 2 displays the prior probabilities of the hypotheses and the posterior probabilities of the hypotheses for each of the three different prior specifications for μ. Although the numbers changed, the ordering of the posterior probabilities is identical to the one obtained when using equal prior probabilities for all four hypotheses: For all prior specifications, the fixed-effect null hypothesis (Hf0) receives most posterior probability, followed by the random-effects null hypothesis (Hr0), the fixed-effect alternative hypothesis (Hf1), and the random-effects alternative hypothesis (Hr1).
Table 2. Prior and Posterior Probabilities of the Four Hypotheses of Interest
Model-averaged BF for an overall effect
For the default (two-sided) prior setting, BF10 ≈ 0.077. Consequently, BF01 ≈ 12.987, which indicates strong evidence for the absence of an effect. Recall that equal prior probabilities for all four hypotheses yielded BF01 ≈ 8.696, which indicates moderate evidence for the absence of an effect. For the default (one-sided) prior setting, BF10 ≈ 0.016. Consequently, BF01 ≈ 62.5, which indicates very strong evidence for the absence of an effect. Equal prior probabilities for all four hypotheses yielded BF01 ≈ 47.619, which also indicates very strong evidence for the absence of an effect. For the informed (one-sided) prior setting, BF10 ≈ 0.004. Consequently, BF01 ≈ 250, which indicates extreme evidence for the absence of an effect. Equal prior probabilities for all four hypotheses yielded BF01 ≈ 200, which also indicates extreme evidence for the absence of an effect. In sum, the inclusion BFs based on the modified prior probabilities of the four hypotheses (see Table 2) qualitatively agree with the ones obtained when using equal prior probabilities: There is evidence for the absence of an effect. However, they differ in the degree of that evidence.
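The reciprocals and verbal labels in the preceding paragraph can be reproduced with a few lines of Python. The thresholds in the sketch below (3, 10, 30, and 100) are an assumption on our part: they follow a commonly used classification scheme for Bayes factors and are consistent with the labels used in the text, but they should be read as a rough guide rather than as strict cutoffs.

```python
def evidence_label(bf):
    """Verbal label for a Bayes factor of at least 1 (assumed thresholds)."""
    if bf < 1:
        raise ValueError("Pass the Bayes factor in the favored direction (>= 1).")
    for upper, label in [(3, "anecdotal"), (10, "moderate"),
                         (30, "strong"), (100, "very strong")]:
        if bf < upper:
            return label
    return "extreme"


# BF10 values reported above for the three prior settings.
reported_bf10 = {
    "default (two-sided)": 0.077,
    "default (one-sided)": 0.016,
    "informed (one-sided)": 0.004,
}

results = {}
for setting, bf10 in reported_bf10.items():
    bf01 = 1.0 / bf10  # evidence for the absence of an effect
    results[setting] = (bf01, evidence_label(bf01))
    print(f"{setting}: BF01 = {bf01:.3f} ({results[setting][1]})")
```

Note that the reported BF01 values are reciprocals of the rounded BF10 values, so they inherit the rounding error of the latter.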
Model-averaged BF for heterogeneity
For the default (two-sided) prior setting, BFrf ≈ 0.119. Consequently, BFfr ≈ 8.403, which indicates moderate evidence for the absence of heterogeneity. Recall that equal prior probabilities for all four hypotheses yielded BFfr ≈ 5.291, which also indicates moderate evidence for the absence of heterogeneity. For the default (one-sided) prior setting, BFrf ≈ 0.111. Consequently, BFfr ≈ 9.009, which indicates moderate evidence for the absence of heterogeneity. Equal prior probabilities for all four hypotheses yielded BFfr ≈ 5.263, which also indicates moderate evidence for the absence of heterogeneity. For the informed (one-sided) prior setting, BFrf ≈ 0.107. Consequently, BFfr ≈ 9.346, which indicates moderate evidence for the absence of heterogeneity. Equal prior probabilities for all four hypotheses yielded BFfr ≈ 5.263, which also indicates moderate evidence for the absence of heterogeneity. In sum, the inclusion BFs based on the modified prior probabilities of the four hypotheses (see Table 2) qualitatively agree with the ones obtained when using equal prior probabilities: There is evidence for the absence of heterogeneity. However, they differ in the degree of that evidence.
Increasing the prior probability of Hr1
Hypotheses posterior probabilities
Table 3 displays the prior probabilities of the hypotheses and the posterior probabilities of the hypotheses for each of the three different prior specifications for μ. Although the numbers changed, the ordering of the posterior probabilities is similar to the one obtained when using equal prior probabilities for all four hypotheses: For all prior specifications, the fixed-effect null hypothesis Hf0 receives most posterior probability, followed by the random-effects null hypothesis Hr0. However, now the fixed-effect alternative hypothesis Hf1 receives less posterior probability than the random-effects alternative hypothesis Hr1.
Table 3. Prior and Posterior Probabilities of the Four Hypotheses of Interest
Model-averaged BF for an overall effect
For the default (two-sided) prior setting, BF10 ≈ 0.056. Consequently, BF01 ≈ 17.857, which indicates strong evidence for the absence of an effect. Recall that equal prior probabilities for all four hypotheses yielded BF01 ≈ 8.696, which indicates moderate evidence for the absence of an effect. For the default (one-sided) prior setting, BF10 ≈ 0.011. Consequently, BF01 ≈ 90.909, which indicates very strong evidence for the absence of an effect. Equal prior probabilities for all four hypotheses yielded BF01 ≈ 47.619, which also indicates very strong evidence for the absence of an effect. For the informed (one-sided) prior setting, BF10 ≈ 0.003. Consequently, BF01 ≈ 333.333, which indicates extreme evidence for the absence of an effect. Equal prior probabilities for all four hypotheses yielded BF01 ≈ 200, which also indicates extreme evidence for the absence of an effect. In sum, the inclusion BFs based on the modified prior probabilities of the four hypotheses (see Table 3) qualitatively agree with the ones obtained when using equal prior probabilities: There is evidence for the absence of an effect. However, they differ in the degree of that evidence.
Model-averaged BF for heterogeneity
For the default (two-sided) prior setting, BFrf ≈ 0.076. Consequently, BFfr ≈ 13.158, which indicates strong evidence for the absence of heterogeneity. Recall that equal prior probabilities for all four hypotheses yielded BFfr ≈ 5.291, which indicates moderate evidence for the absence of heterogeneity. For the default (one-sided) prior setting, BFrf ≈ 0.054. Consequently, BFfr ≈ 18.519, which indicates strong evidence for the absence of heterogeneity. Equal prior probabilities for all four hypotheses yielded BFfr ≈ 5.263, which indicates moderate evidence for the absence of heterogeneity. For the informed (one-sided) prior setting, BFrf ≈ 0.049. Consequently, BFfr ≈ 20.408, which indicates strong evidence for the absence of heterogeneity. Equal prior probabilities for all four hypotheses yielded BFfr ≈ 5.263, which indicates moderate evidence for the absence of heterogeneity. In sum, the inclusion BFs based on the modified prior probabilities of the four hypotheses (see Table 3) qualitatively agree with the ones obtained when using equal prior probabilities: There is evidence for the absence of heterogeneity. However, they differ in the degree of that evidence.
Summary
In sum, changing the prior probabilities of the hypotheses affects, as expected, the posterior probabilities of the hypotheses. It also affects the inclusion BFs, that is, the degree of model-averaged evidence. However, in this particular example and for the particular changes to the prior probabilities that we considered, it does not change the qualitative overall conclusions that there is evidence for the absence of an effect and that there is evidence for the absence of heterogeneity. In general, we believe that unless strong prior knowledge suggests setting the prior probabilities differently, it is prudent to set the prior probabilities of all four hypotheses uniformly to 0.25.
Transparency