Replication Crisis in Psychology Research: Who’s to Blame?

[By: Umay Șen]

As part of my PhD program, I attend a lot of classes in addition to doing research. Recently, I took a course on Bayesian statistics and had many chances to compare the Bayesian approach to data and statistical analysis with the frequentist approach, which is (or was) more common in the field. The course gave me a critical perspective on the frequentist approach, which is one of the most common ways of handling data in psychology. In this blog post, I would like to share some basic ideas of the Bayesian approach and discuss them in relation to the replication crisis in psychology.

The field of psychology has been in an unfortunate state for the last couple of years. Previously published findings often turn out to be hard to replicate. When researchers attempt to replicate them, significance sometimes disappears or the effect sizes get smaller and smaller. According to research, only one third of attempted replications succeeded, which I think is quite alarming.

One of the most important reasons for the current replication crisis is methodological. Frequentist analysis, as it is practiced, creates publication pressure on researchers by putting excessive emphasis on significant p values. Researchers, in turn, put too much weight on published positive findings without taking high false positive rates into consideration. Szucs and Ioannidis (2017) argue that there are more false positives in the literature than the conventional 5% level would lead us to assume. When we read the literature, we assume that a single significant result conveys valid evidence from a well-designed study, even though we know little about what happened backstage: the null results, the pilots, and whether the α and β values were determined in advance. Journals, for their part, usually favor statistically significant results over null findings. This pushes researchers into a significance-oriented mindset. Even if they are not aware of it, many researchers fall into the errors of HARKing (hypothesizing after the results are known), adjusting the α and β values after seeing the results, reporting only significant results, and interpreting a significant result as absolute truth. Fishing expeditions for significant p values and nontransparent procedures and analyses then become hard to avoid, and they result in false positives.

Bayesian analysis, on the other hand, can provide support for the null hypothesis as well as evidence against it. It avoids the binary thinking of significant versus non-significant by emphasizing how strongly the data support a belief in the results. Bayesian statistics can therefore be considered a good candidate for tackling the issues behind the replication crisis.
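To make two of these points concrete, here is a minimal sketch in Python. The function names and all the numbers (power, prior probability of a true effect, the coin-flip style data) are my own illustrative assumptions and are not taken from Szucs and Ioannidis (2017). The first function shows how the share of false positives among significant results can exceed 5% when power and the prior probability of a true effect are low. The second computes a simple Bayes factor for a point null (a proportion of 0.5) against a uniform prior on the proportion, which is just one possible choice of alternative.

```python
from math import comb


def false_report_probability(alpha, power, prior_true):
    """Share of significant results that are false positives.

    alpha      : significance level (type I error rate)
    power      : 1 - beta, the chance of detecting a true effect
    prior_true : assumed proportion of tested hypotheses that are true
    """
    false_positives = alpha * (1 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


def bf01_binomial(k, n):
    """Bayes factor for H0: theta = 0.5 against H1: theta ~ Uniform(0, 1).

    k successes out of n trials. Values above 1 favor the null,
    values below 1 favor the alternative.
    """
    likelihood_h0 = comb(n, k) * 0.5 ** n
    # Under a uniform prior, every k in 0..n is equally likely a priori.
    marginal_h1 = 1 / (n + 1)
    return likelihood_h0 / marginal_h1


if __name__ == "__main__":
    # Hypothetical scenarios, chosen only for illustration.
    print(false_report_probability(alpha=0.05, power=0.8, prior_true=0.5))
    print(false_report_probability(alpha=0.05, power=0.2, prior_true=0.1))
    print(bf01_binomial(k=50, n=100))
    print(bf01_binomial(k=70, n=100))
```

With high power and even prior odds, roughly 6% of significant results are false positives in this toy setup; with low power and few true hypotheses, the share rises to roughly 69%. In the second part, 50 successes out of 100 give a Bayes factor of about 8 in favor of the null, which is the kind of positive support for "no effect" that a non-significant p value cannot express.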

I also consider pre-registration an effective remedy for the pressure researchers feel to publish only significant results. It could also prevent some problems related to null hypothesis significance testing, such as HARKing and choosing the critical α value after seeing the results. These practices clearly raise many questions about the validity and reliability of the results, and pre-registration may increase the overall quality of research by eliminating them. However, pre-registration should not be seen as a solution to the theoretical problems of the frequentist approach or to the problems with the p value itself. Those issues remain even when a study is pre-registered. Our job is to look for ways to improve our research. From my point of view, Bayesian analysis and pre-registration could be the way to go.




Szucs, D., & Ioannidis, J. P. (2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15(3), 1-18.