Class imbalance is present when one class, the minority class, is much rarer than the other, the majority class. Machine learning (ML) models extract features better and are most robust when all classes are approximately equally represented. If considerable class imbalance is present, ML models often become “lazy”: instead of learning to discriminate between classes, they simply vote for the majority class. This bias yields artificially high AUC, accuracy, and specificity, but unusably low sensitivity. The “accuracy paradox” denotes the situation in which a high accuracy merely reflects the underlying class distribution of imbalanced data.
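To make the accuracy paradox concrete, the following minimal Python sketch scores a classifier that always predicts the majority class on a hypothetical cohort of 100 patients, 10 of whom have the outcome. The helper `confusion_metrics` and the cohort are illustrative assumptions, not data from any study.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else float("nan")  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical cohort: 90 patients without the outcome (0), 10 with it (1).
y_true = [0] * 90 + [1] * 10

# A "lazy" classifier that always votes for the majority class.
y_pred = [0] * len(y_true)

acc, sens, spec = confusion_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
# prints: accuracy=0.90 sensitivity=0.00 specificity=1.00
```

The model has learned nothing, yet its accuracy equals the share of the majority class; only sensitivity exposes the failure.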
As an example, one might want to predict complications from a registry in which 90% of patients have no complications. By largely voting for the majority class (no complication), the model would achieve an accuracy of around 90% and a specificity close to 100%, but a very low sensitivity, without actually learning from the data. This can be countered by adjusting class weights within the model, by undersampling (removing observations from the majority class), or by oversampling the minority class.1 Specifically, the synthetic minority oversampling technique (SMOTE) has been validated, shows robust performance, and is easy to employ.2 SMOTE creates new synthetic observations for the minority class by interpolating between existing minority observations and their k nearest minority-class neighbors.
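The core step of SMOTE can be sketched in plain Python as below. This is a simplified, hypothetical implementation (the function `smote` and its parameters are illustrative): rather than oversampling every minority observation by a fixed percentage as in the original description, it repeatedly picks a random minority observation, finds its k nearest minority-class neighbors by Euclidean distance, and places a new point at a random position on the line segment toward one of them. For real analyses, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package is preferable.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Simplified SMOTE: create n_synthetic new minority-class samples.

    minority    -- list of feature vectors (tuples of floats), minority class only
    n_synthetic -- number of synthetic samples to generate
    k           -- number of nearest minority-class neighbors to consider
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Choose a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest neighbors among the other minority samples (Euclidean).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(base, m))[:k]
        # Place a new point at a random position on the segment base -> neighbor.
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (v - b) for b, v in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class feature vectors (two features each).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6, k=2, seed=0)
print(len(new_points))  # 6
```

Because each synthetic point lies between two real minority observations, it stays within the region occupied by the minority class rather than duplicating existing rows. Resampling is normally applied to the training split only, so that test-set sensitivity and specificity still reflect the real class distribution.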
Neurosurgical data are often prone to class imbalance. With the emergence of many studies that aim to predict neurosurgical outcomes using ML, it is crucial to ensure methodological quality. In general, if class imbalance is present, care should be taken to weight classes or to under- or oversample using data science techniques such as SMOTE. Accuracy and AUC alone do not always give a full picture of an ML model’s performance; in our view, additionally reporting sensitivity and specificity is essential.
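Class weighting and random undersampling can likewise be sketched with the Python standard library. The weighting below uses the inverse-frequency formula n_samples / (n_classes * n_in_class), the same formula scikit-learn documents for `class_weight="balanced"`; such weights would be passed to a model's loss so that misclassifying a minority observation costs more. The function names, the 90/10 labels, and the placeholder features are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_undersample(X, y, seed=None):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_keep = min(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for cls, samples in by_class.items():
        for xi in rng.sample(samples, n_keep):  # sampling without replacement
            X_out.append(xi)
            y_out.append(cls)
    return X_out, y_out


# Hypothetical registry: 90 patients without complications (0), 10 with (1).
y = [0] * 90 + [1] * 10
X = [[float(i)] for i in range(len(y))]  # placeholder single-feature rows

weights = balanced_class_weights(y)
print(weights)         # {0: 0.5555555555555556, 1: 5.0}

X_bal, y_bal = random_undersample(X, y, seed=42)
print(Counter(y_bal))  # Counter({0: 10, 1: 10})
```

The two approaches trade off differently: undersampling discards most majority-class observations, whereas weighting keeps every observation and only changes how errors are penalized.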
1. Batista GEAPA, Prati RC, Monard MC: A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. SIGKDD Explor Newsl 6:20–29, 2004
2. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP: SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res 16:321–357, 2002
From: Staartjes, Victor E., and Marc L. Schröder. “Class Imbalance in Machine Learning for Neurosurgical Outcome Prediction: Are Our Models Valid?” Journal of Neurosurgery. Spine 29, no. 5 (01 2018): 611–12. https://doi.org/10.3171/2018.5.SPINE18543.