From Data to Decisions

People Keep Confusing Imbalanced Data With Small Samples

Rare events vs. rare data: Clarifying a common data science error

Samuele Mazzanti
Aug 11, 2025

In my previous post, I addressed the common misconception that the ROC-AUC score is inherently unreliable on imbalanced datasets. I explained why this isn’t true and highlighted that ROC-AUC is, in fact, very stable regardless of the dataset’s class imbalance.
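To make that stability claim concrete, here is a minimal sketch on synthetic data. It is not the experiment from the previous post: the Gaussian score distributions, the sample size, the prevalence levels, and the helper names `roc_auc` and `simulate_auc` are all assumptions chosen for illustration. Positives get scores drawn from N(1, 1) and negatives from N(0, 1), so the theoretical ROC-AUC is about 0.76 whatever the class ratio.

```python
import random


def roc_auc(scores, labels):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity.

    Assumes labels are 0/1 and that there are no tied scores.
    """
    order = sorted(range(len(scores)), key=scores.__getitem__)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of the 1-based ranks held by the positive examples.
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_auc(n_total, prevalence, rng):
    """Hypothetical setup: positives score ~ N(1, 1), negatives ~ N(0, 1)."""
    n_pos = round(n_total * prevalence)
    labels = [1] * n_pos + [0] * (n_total - n_pos)
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    return roc_auc(scores, labels)


rng = random.Random(0)  # fixed seed so the run is reproducible
results = {p: simulate_auc(200_000, p, rng) for p in (0.5, 0.1, 0.01)}
for p, auc in results.items():
    print(f"prevalence={p:.2f}  ROC-AUC={auc:.3f}")
```

With 200,000 rows, even the 1% setting still contains 2,000 positives, so the three estimates should land close to each other. The sample is deliberately large: the sketch varies only the class ratio and keeps the number of positives high in every setting.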

In this post, I’ll keep talking about the same topic, but from a slightly different perspective. …
