Complexity and interpretability in clinical prediction modeling

When selecting among multiple models that perform equally well, the first principle of “Occam’s razor” holds that the simplest should be preferred (1). Interpreted clinically, we argue that simplicity acts as a proxy for comprehension and scalability. Deep learning has dramatically advanced many fields, including image and speech recognition. However, there remain various domains in which deep models might not be the optimal choice compared with their classical counterparts, given the input structure and available sample size (2). Especially for the common use case of predicting from tabulated patient-level data in clinical settings with relatively small sample sizes (small, that is, compared with the millions of images available to companies such as Google), the performance gain from adopting a more elaborate nonlinear model, while controlling for overfitting and class imbalance, is negligible. Many modern neural networks are poorly calibrated (3), and calibration is crucial in clinical practice. Simpler models, such as logistic regression or Cox proportional hazards models, often calibrate better when their complexity is appropriate to the classification problem at hand. It is therefore instructive to compare the calibration curves of complex models such as neural networks with those of simpler models such as a linear SVM. Although neural networks, as more complex models, can produce more accurate predictions, they continue to be treated as black boxes by many clinicians and inherently lack interpretability when compared with simple yet powerful models such as logistic regression (4). For clinicians, it is important to understand the underlying reasoning of a machine-learning approach and how to apply it correctly; otherwise, such approaches will not be well accepted into clinical routine despite being strong prediction tools.
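To make the notion of calibration concrete, the following is a minimal sketch in plain Python of how a calibration curve and two summary measures (expected calibration error and Brier score) can be computed. The function names and the toy outcomes and probabilities are hypothetical and chosen only for illustration; they are not taken from the letter or from any study. Both sets of toy predictions rank the patients identically, so they discriminate equally well, yet only one of them reports probabilities that match the observed event rates.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed event rate, count)
    tuple per non-empty bin, i.e. the points of a calibration curve.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        sums[idx][0] += p
        sums[idx][1] += y
        sums[idx][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in sums if n > 0]


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> float:
    """Count-weighted mean gap between predicted and observed rates."""
    total = len(y_true)
    bins = calibration_bins(y_true, y_prob, n_bins)
    return sum(n / total * abs(obs - pred) for pred, obs, n in bins)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: eight patients, 1 = event occurred.
OUTCOMES = [0, 0, 0, 1, 0, 1, 1, 1]
# Hypothetical predictions whose probabilities match the observed rates.
CALIBRATED = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
# Same ranking of patients, but probabilities pushed towards 0 and 1.
OVERCONFIDENT = [0.05, 0.05, 0.05, 0.05, 0.95, 0.95, 0.95, 0.95]

if __name__ == "__main__":
    for name, probs in [("calibrated", CALIBRATED), ("overconfident", OVERCONFIDENT)]:
        print(name)
        for pred, obs, n in calibration_bins(OUTCOMES, probs):
            print(f"  predicted {pred:.2f}  observed {obs:.2f}  (n={n})")
        print(f"  ECE   = {expected_calibration_error(OUTCOMES, probs):.4f}")
        print(f"  Brier = {brier_score(OUTCOMES, probs):.4f}")
```

In this toy setting the overconfident predictions show a larger gap between predicted and observed rates and a worse Brier score, even though the ordering of patients is unchanged. In practice one would compute such curves on held-out data, and equal-width binning is only one of several reasonable binning choices.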


  1. Domingos P. The role of Occam’s Razor in knowledge discovery. Data Min Knowl Discov 1999;3:409–25.
  2. He T, Kong R, Holmes AJ, Nguyen M, Sabuncu MR, Eickhoff SB, et al. Do deep neural networks outperform kernel regression for functional connectivity prediction of behavior? bioRxiv 473603; doi:
  3. Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks [abstract]. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia. PMLR 2017;70.
  4. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 2019;1:206–15.

From: Kernbach, J.M., Staartjes, V.E., 2020. Predicted Prognosis of Pancreatic Cancer Patients by Machine Learning—Letter. Clin Cancer Res 26, 3891–3891.
