People Keep Confusing Imbalanced Data With Small Samples

Rare events vs. rare data: Clarifying a common data science error

Aug 11, 2025

In my previous post, I discussed the common misconception that the ROC-AUC score is inherently unreliable on imbalanced datasets. I explained why this isn't true and showed that ROC-AUC is, in fact, very stable regardless of how imbalanced the dataset is.
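To make this claim concrete, here is a minimal simulation sketch. It is an illustration, not code from the previous post. It assumes hypothetical Gaussian model scores, with positives drawn from N(1, 1) and negatives from N(0, 1), which gives a theoretical ROC-AUC of Φ(1/√2) ≈ 0.76. ROC-AUC is computed with the Mann-Whitney rank formulation, which is the probability that a randomly chosen positive scores higher than a randomly chosen negative. The number of positives is held fixed at 2,000 while the number of negatives grows from 2,000 to 100,000. The helper names `roc_auc` and `simulate_auc` are hypothetical.

```python
import math
import random


def roc_auc(y_true, y_score):
    """ROC-AUC via the Mann-Whitney U statistic: the probability that a
    random positive scores higher than a random negative (ties count 1/2)."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    # Assign 1-based ranks, averaging the ranks of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U counts how many (positive, negative) pairs are ordered correctly.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def simulate_auc(n_pos, n_neg, seed=0):
    """Hypothetical score model: positives ~ N(1, 1), negatives ~ N(0, 1)."""
    rng = random.Random(seed)
    scores = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return roc_auc(labels, scores)


if __name__ == "__main__":
    # P(X_pos > X_neg) where X_pos - X_neg ~ N(1, 2), i.e. Phi(1 / sqrt(2)).
    theory = 0.5 * (1 + math.erf(0.5))
    balanced = simulate_auc(2000, 2000, seed=1)
    imbalanced = simulate_auc(2000, 100_000, seed=2)
    print(f"theoretical ROC-AUC:       {theory:.3f}")
    print(f"balanced (50% positives):  {balanced:.3f}")
    print(f"imbalanced (~2% positives): {imbalanced:.3f}")
```

Both estimates should land close to the theoretical value, because ROC-AUC depends only on how positives are ranked relative to negatives, not on their proportion. Keeping the number of positives fixed is a deliberate choice in this sketch: it changes the degree of imbalance without shrinking the number of positive examples.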

In this post, I’ll keep talking about the same topic, but from a slightly different perspective. …

Continue reading this post for free, courtesy of Samuele Mazzanti.

From Data to Decisions