Facing the replication crisis in machine learning modeling
The DFG project "Facing the Replication Crisis in Machine Learning Modeling" was approved as part of the DFG priority program "META-REP".
The initial situation
Predictive modeling using machine learning (ML) algorithms is gaining popularity in many scientific disciplines, including medicine, epidemiology, and psychology. However, transferring complex statistical methods to application areas outside their original domain is prone to error. Initially promising results have thus often rested on incorrectly validated models that yielded overly optimistic estimates of predictive accuracy (e.g., in predicting the risk of suicide). Because such methodological shortcomings can have serious negative consequences for both individuals and society, some researchers warn of a "new" replication crisis in ML-based research. Previous work has largely focused on the algorithmic aspects of this crisis, ignoring challenges specific to psychological research, such as unreliable indicators, small samples, and missing data. We propose a workflow tailored to ML research in psychology that highlights typical challenges and pitfalls. It consists of five steps: (1) conceptualization, (2) preprocessing, (3) model training, (4) validation and evaluation, and (5) interpretation and generalizability. Beyond the technical-statistical steps, the workflow also covers the conceptual aspects that must be addressed to implement ML modeling successfully in psychological research.
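To make the pitfall of incorrectly validated models concrete, consider the following minimal sketch in Python using scikit-learn (a hypothetical illustration, not material from the project itself). On data that contains no signal at all, selecting predictors on the full data set before cross-validation produces an apparently well-performing model, while performing the selection inside the cross-validation loop correctly reveals chance-level accuracy:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Pure noise, sized like a typical psychological data set:
# 100 observations, 1000 uninformative predictors, random binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))
y = rng.integers(0, 2, size=100)

# Leaky validation: the 10 "best" predictors are chosen on ALL the data,
# so information from the test folds leaks into the model.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(), X_selected, y, cv=5)

# Correct validation: the selection step sits inside a pipeline and is
# therefore refit on the training folds only.
pipe = Pipeline([("select", SelectKBest(f_classif, k=10)),
                 ("clf", LogisticRegression())])
honest = cross_val_score(pipe, X, y, cv=5)

print(f"leaky CV accuracy:  {leaky.mean():.2f}")   # well above chance
print(f"honest CV accuracy: {honest.mean():.2f}")  # near 0.50 (chance)
```

Even though the outcome here is pure noise, the leaky estimate suggests real predictive power; only the pipeline-based estimate is honest. This is exactly the kind of overly optimistic validation the workflow is designed to catch.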
The work program
As a first project, we will conduct a comprehensive systematic review of the predictive modeling literature across psychological subdisciplines over the past decade. The goal is to provide an overview of common practices in psychological research regarding conceptualization, data preprocessing, model training and validation, generalizability claims, and open science practices. In a second project, building on the systematic review, we will identify typical pitfalls and develop a checklist that helps authors navigate the ML workflow. In addition, we will compile a brief Risk of Bias assessment for ML modeling that can be used to judge the quality of ML studies, for example when conducting a meta-analysis. In a third project, we will set up an ML prediction challenge and evaluate our best-practice recommendations in an experimental setting: in one condition, we will provide no guidance or restrictions beyond the description of the prediction task, while in the other condition, we will give researchers recommendations and information on how to identify and avoid common ML modeling pitfalls. We will then test whether following the recommendations leads to more robust, transparent, and reproducible predictions. In a fourth project, we will develop an open online course teaching the logic and techniques of ML modeling. All four projects will provide tools and resources to help mitigate the replication crisis in ML-based research.