From Data to Decisions

People Keep Confusing Imbalanced Data With Small Samples

Rare events vs. rare data: Clarifying a common data science error

Samuele Mazzanti
Aug 11, 2025

In my previous post, I discussed the common misconception that the ROC-AUC score is inherently unreliable on imbalanced datasets. I explained why this isn’t true and showed that, in fact, ROC-AUC is very stable regardless of the dataset’s imbalance.
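
A quick simulation can make the stability claim concrete. The sketch below is only an illustration under assumptions of my own: positive and negative scores are drawn from Gaussians (N(1, 1) and N(0, 1)), and `roc_auc` and `simulate_auc` are hypothetical helpers that compute ROC-AUC with the Mann-Whitney rank formula instead of a library call. Holding the number of positives fixed while adding more and more negatives changes the imbalance ratio, yet the AUC should stay close to its theoretical value Φ(1/√2) ≈ 0.760.

```python
import math
import random


def roc_auc(scores_pos, scores_neg):
    """ROC-AUC via the Mann-Whitney U statistic: the probability that a random
    positive scores higher than a random negative (ties count as 0.5)."""
    labeled = [(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg]
    labeled.sort(key=lambda t: t[0])
    # Assign 1-based ranks, averaging the ranks of tied scores
    ranks = [0.0] * len(labeled)
    i = 0
    while i < len(labeled):
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, labeled) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def simulate_auc(n_pos, n_neg, seed=0):
    """Hypothetical setup: positives ~ N(1, 1), negatives ~ N(0, 1)."""
    rng = random.Random(seed)
    pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    return roc_auc(pos, neg)


# pos - neg ~ N(1, 2), so AUC = P(pos > neg) = Phi(1 / sqrt(2)) = 0.5 * (1 + erf(1/2))
theoretical = 0.5 * (1 + math.erf(0.5))

if __name__ == "__main__":
    print(f"theoretical AUC = {theoretical:.3f}")
    # Same 2,000 positives, increasingly many negatives (1:1 up to 1:25)
    for n_neg in (2_000, 20_000, 50_000):
        auc = simulate_auc(n_pos=2_000, n_neg=n_neg)
        print(f"positives=2000, negatives={n_neg:>6}: AUC = {auc:.3f}")
```

Each term of the Mann-Whitney statistic compares one positive with one negative, so the class ratio cancels out of the expected AUC; what the number of examples (especially in the smaller class) controls is how noisy a single estimate is.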

In this post, I’ll keep talking about the same topic, but from a slightly different perspective. …

© 2025 Samuele Mazzanti