This is the last in a series of seven Segmentation and Clustering articles. The Clustering Illusion is the intriguing human tendency to erroneously perceive clusters in small sets of random samples.

 

Definition

The Clustering Illusion is the tendency to erroneously perceive small samples from random distributions as having significant ‘streaks’ or ‘clusters’. It is caused by the human tendency to underestimate how much variability is likely to appear, purely by chance, in a small sample of random or semi-random data. For instance, if you were to flip a coin and have heads turn up ten times in a row, you might think that the coin is biased. But if you were to flip that coin 1,000 times, the odds of seeing ten or more identical outcomes in a row (all heads or all tails) are a surprising 62 percent or so. Even for heads alone, the odds are roughly 38 percent.
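Streak probabilities like the one above can be computed exactly rather than guessed. The following is a minimal Python sketch, assuming a fair coin and independent flips; the function name `prob_streak` and its parameters are illustrative and do not come from any library. It tracks the probability of every possible current streak length, flip by flip.

```python
def prob_streak(n_flips, streak_len, either_side=False):
    """Probability that n_flips fair, independent coin flips contain a
    streak of at least streak_len heads (or, with either_side=True,
    at least streak_len identical outcomes of either kind)."""
    if streak_len <= 0:
        return 1.0
    if n_flips < streak_len:
        return 0.0
    if either_side:
        # A streak of k identical outcomes is the same event as k-1
        # consecutive "same as the previous flip" results among the
        # n-1 transitions, and each transition is itself a fair coin.
        return prob_streak(n_flips - 1, streak_len - 1)

    # state[j] = probability that no streak has occurred yet and the
    # sequence currently ends in exactly j heads.
    state = [0.0] * streak_len
    state[0] = 1.0
    for _ in range(n_flips):
        new_state = [0.0] * streak_len
        new_state[0] = 0.5 * sum(state)          # tails resets the streak
        for j in range(streak_len - 1):
            new_state[j + 1] = 0.5 * state[j]    # heads extends the streak
        # Heads from state[streak_len - 1] completes a streak and leaves
        # the "no streak yet" set, so that mass is dropped.
        state = new_state
    return 1.0 - sum(state)


if __name__ == "__main__":
    print(f"heads only : {prob_streak(1000, 10):.3f}")
    print(f"either side: {prob_streak(1000, 10, either_side=True):.3f}")
```

For 1,000 flips and a streak length of ten, this gives roughly 38 percent when only heads count and roughly 62 percent when a streak of either heads or tails counts.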

 

Consequences

The Clustering Illusion creates traps for marketers. If they believe they have found meaningful patterns in a random jumble of information, they tend to wrongly generalize those patterns to a larger dataset. A winning streak may indicate that the clustering exercise is sound, but it may also be a statistical anomaly.
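One reason the trap is easy to fall into is that a clustering algorithm will report segments whether or not any exist. The sketch below is a deliberately small, pure-Python k-means written for illustration (it is not a production implementation, and the names `kmeans` and `uniform_noise` are hypothetical). It is run on points drawn uniformly at random, which by construction contain no real clusters.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny 2-D k-means for illustration only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared distance).
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups


def uniform_noise(n, seed=1):
    """n points spread uniformly over the unit square: pure noise."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


if __name__ == "__main__":
    centers, groups = kmeans(uniform_noise(300), k=3)
    for center, group in zip(centers, groups):
        print(f"center=({center[0]:.2f}, {center[1]:.2f})  size={len(group)}")
```

The output is a tidy list of three centers with group sizes, which looks just like a real segmentation. Nothing in that output says the groups are meaningful; that has to be checked separately, for example by testing whether the same segments reappear on a fresh sample.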

 

Is More Data a Quick Fix?

It is true that many things go through short, intense cycles. But it’s also true that things tend to regress to the mean over longer periods. So if an investment, for example, has a long-term record of good returns but goes through a period of lower returns, it doesn’t mean that the investment will give a poor return going forward. In fact, it might mean just the opposite: it might be an even better time to invest. In business and economics, you often have to look at 10-, 15-, or 20-year periods to see the real trends.
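Regression to the mean can be seen in a small simulation. The sketch below assumes that returns in each period are independent draws from a normal distribution; the mean of 7 percent, the standard deviation of 15 percent, and the function name `returns_after_bad_streaks` are all made-up illustrative choices, not market estimates. Under that independence assumption it shows only that a losing streak does not predict poor returns afterwards; it does not show that returns afterwards are better than average.

```python
import random
import statistics


def returns_after_bad_streaks(n_periods=20000, window=3,
                              mean=0.07, sd=0.15, seed=42):
    """Simulate independent period returns and compare losing streaks
    with the periods that immediately follow them.

    Returns (overall mean, mean return during losing streaks,
             mean return in the window right after each streak)."""
    rng = random.Random(seed)
    returns = [rng.gauss(mean, sd) for _ in range(n_periods)]

    during, after = [], []
    for start in range(n_periods - 2 * window + 1):
        streak = returns[start:start + window]
        if max(streak) < 0:  # every period in the window lost money
            following = returns[start + window:start + 2 * window]
            during.append(statistics.fmean(streak))
            after.append(statistics.fmean(following))

    return (statistics.fmean(returns),
            statistics.fmean(during),
            statistics.fmean(after))


if __name__ == "__main__":
    overall, during, after = returns_after_bad_streaks()
    print(f"long-run average return : {overall:.3f}")
    print(f"average during streaks  : {during:.3f}")
    print(f"average right afterwards: {after:.3f}")
```

In this setup the average return right after a losing streak sits close to the long-run average, not close to the streak itself.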

Getting more data is helpful, but only if the sample is representative. If the additional data is fraught with biases, for example from a stratified sample whose strata are not weighted back to their true proportions, then the Clustering Illusion will persist. The clustering outputs will lack statistical validity, and generalizing from them remains risky.

 

How to Minimize Clustering Illusions

  • Don’t place too much emphasis on short-term performance, whether positive or negative. Remember, hot and cold streaks are common and can be due to nothing more than luck.
  • While clustering, don’t rely on your intuition to guide you. Instead, use fact-based rules and disciplined strategies to overcome cognitive biases like the Clustering Illusion.
  • Don’t check your results too frequently. The more often you look at the data, the more likely you are to see trends that don’t really exist.
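The last point above can be made concrete with a little arithmetic. As a rough sketch, assume each look at the data is independent and has a fixed chance `alpha` of showing a ‘trend’ that is really just noise (both are simplifying assumptions, and the function name is illustrative). The chance of seeing at least one false trend across several looks is then 1 − (1 − alpha)^looks.

```python
def chance_of_false_pattern(looks, alpha=0.05):
    """Probability of at least one spurious 'trend' across a number of
    independent looks, each with false-alarm probability alpha."""
    return 1.0 - (1.0 - alpha) ** looks


if __name__ == "__main__":
    for looks in (1, 5, 14, 50):
        print(f"{looks:>3} looks -> {chance_of_false_pattern(looks):.2f}")
```

With a 5 percent false-alarm rate per look, a spurious trend becomes more likely than not by the fourteenth look.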

 

Food for Thought

The Clustering Illusion demonstrates how hardheaded we mere mortals can be when it comes to facing facts that don’t support our beliefs. Awareness of the Clustering Illusion can help you avoid the trap of seeing patterns that don’t exist, and can improve the accuracy and consistency of your clustering outputs.

-Authored by Bipin Kapri, Data Scientist at Absolutdata

Technical articles are published from the Absolutdata Labs group, and hail from The Absolutdata Data Science Center of Excellence. These articles also appear in BrainWave, Absolutdata’s quarterly data science digest.
