This is the third in a series of Segmentation and Clustering articles. Ensemble Segmentation combines several segmentation solutions to create a richer, more multi-dimensional segmentation model.
Ensemble Segmentation, as the name suggests, implies combining several segmentation solutions that have been developed on the same data to create ensemble segments that best solve the problem at hand. Ensemble Segmentation first appeared in data mining literature in the mid-1990s.
Ensemble Segmentation is motivated by the fact that an ensemble is likely to be richer than its one-dimensional constituent solutions. In the context of market research, Ensemble Segmentation may mean combining segmentation solutions that have been developed independently from behavioral and attitudinal data on the same sample, thereby increasing our understanding of customers. In the context of CRM data, this may mean combining solutions that have been developed independently using marketing data and sales data, thereby allowing us to add customer intelligence to sales activity.
Typical Phases of Ensemble Segmentation
Phase 1: Develop multiple solutions that vary in the method employed (e.g. K-Means, Hierarchical Clustering), the number of clusters (ranging from 2 to 30), and the measures used.
Phase 2: Cluster and group respondents based on the analyses generated in Phase 1. Different meta-clustering algorithms can be used to cluster respondents; one approach is to use distance-based methods like K-Means.
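The two phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the respondent data is invented, the function names (kmeans, ensemble_segments, and so on) are hypothetical, and a small hand-written K-Means with deterministic farthest-first seeding stands in for whatever clustering library would be used in practice. Phase 1 clusters each set of measures with several cluster counts; Phase 2 encodes every respondent's Phase 1 memberships as 0/1 indicators and runs K-Means on that encoding.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption made for reproducibility):
    start at the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def one_hot(labels, k):
    """Encode cluster labels as 0/1 membership indicators."""
    return [[1.0 if l == c else 0.0 for c in range(k)] for l in labels]

def ensemble_segments(views, base_ks, final_k):
    """Phase 1: cluster every view with every k in base_ks.
    Phase 2: K-Means on the concatenated membership indicators."""
    n = len(views[0])
    encoded = [[] for _ in range(n)]
    for view in views:
        for k in base_ks:
            for row, indicators in zip(encoded, one_hot(kmeans(view, k), k)):
                row.extend(indicators)
    return kmeans(encoded, final_k)

# Invented toy data: nine respondents, measured two ways.
behavioral = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4],
              [20, 20], [20, 21], [21, 20]]
attitudinal = [[0, 0], [1, 0], [0, 1], [20, 20], [20, 21], [21, 20],
               [24, 24], [24, 25], [25, 24]]

segments = ensemble_segments([behavioral, attitudinal], base_ks=(2, 3), final_k=3)
print(segments)
```

In this toy data the two-cluster behavioral solution merges the first six respondents, while the two-cluster attitudinal solution merges the last six. Neither two-cluster solution alone separates all three groups, but the meta-clustering over the combined memberships does.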
Advantages of Ensemble Segmentation
There are some inherent advantages of creating an ensemble of segments:
- Combining groupings from alternate and dissimilar sets of variables (e.g. demographics, lifestyle behaviors, desired benefits or needs, etc.) is likely to lead to richer insights.
- Analysts can include a variety of clustering techniques when building the ensemble in Phase 1; the ensemble is not restricted to one approach.
- Legacy clusters based on internal data can also be incorporated.
- The ensemble tends to uncover cluster solutions that are less sensitive to sample variations and outliers.
- Solutions that would not have been obvious with a single approach become apparent.
Approaches to Ensemble Segmentation
There are several approaches that can be used in Ensemble Segmentation: Bayesian methods using Gibbs Sampling, EM Algorithms, Hypergraph Partitioning, K-Means Clustering, Natural Clusters Combined from Shared Nearest Neighbors, etc.
Reference for Further Reading
An Ensemble Method for Clustering – Andreas Weingessel, Evgenia Dimitriadou & Kurt Hornik
Technical articles are published from the Absolutdata Labs group, and hail from The Absolutdata Data Science Center of Excellence. These articles also appear in BrainWave, Absolutdata’s quarterly data science digest.