Can Big Data identify every targetable household in the US? Absolutely.

Looking into the eye of a huge blue wave.

A nationwide manufacturer and retailer had a unique problem: a really, really big area to cover. Our client, a designer clothing company, usually contacted its customers directly through a trained sales team, but it needed to make its sales process more efficient. But how can you analyze and predict buying behaviors for the entire USA? At last count, that’s 322 million people living in 124.6 million households.

In the end, our client needed to know which of the households were most likely to buy their product. They also wanted to find out the most effective way to reach these potential customers and what each household might spend.

Predictive Analytics on a Whole Country Looks Like This

The plan to come up with actionable data looked like this:

  • Prepare data from multiple sources.
  • Identify which factors would increase potential purchases.
  • Build hypotheses around these factors.
  • Create a model based on this information.
  • Identify targetable customers based on their spending habits and demographics data.
  • Understand the potential value of every household based on the above factors.
  • Use geospatial data to figure out the trade area and the areas where customers are most concentrated.

There were more than 10,000 variables to sift through, including overall demographics, geospatial locations for customers and sales agents, and previous purchasing patterns. This amounted to 200+ gigabytes of information that was studied and documented in a data dictionary. The data dictionary also included details (variance, mean, and quintiles of the variables) that helped us remove several outliers.
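A data dictionary entry of this kind can be sketched in a few lines of Python. This is only an illustration, not the tooling used on the project: the function names, the sample spend figures, and the choice of Tukey's interquartile-range fences as the outlier rule are all assumptions.

```python
import statistics


def describe_variable(values):
    """Summarise one numeric variable for a data dictionary entry."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance
        "quintiles": statistics.quantiles(values, n=5),  # 4 cut points
    }


def flag_outliers(values, k=1.5):
    """Return values outside Tukey's fences: Q1 - k*IQR and Q3 + k*IQR.

    The fence rule and k=1.5 are assumptions; the article does not say
    which outlier rule was applied.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical annual spend per household, in dollars.
spend = [120, 135, 150, 160, 175, 180, 190, 210, 230, 5000]
entry = describe_variable(spend)
outliers = flag_outliers(spend)
print(entry)
print(outliers)
```

With thousands of variables, the same summary would be computed per column and stored alongside each variable's description, so that suspicious values stand out before modeling starts.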

Next, an exploratory analysis helped identify variables that had an impact on customers’ purchase decisions. Given the huge number of variables that emerged, it became critical to understand what needed to be left out. With this information in place, hypothesis building could begin.

We had the data points. We had the hypotheses. Now, we had to construct meaningful models.

The Big Data Modeling Process And Beyond

To start our model, we:

  • Removed collinear and highly correlated variables from our data.
  • Identified and eliminated high-leverage observations.
  • Found the variables that made the most sense for our client.
  • Checked for purchasing behavior trends in the clusters we had already established.
  • Selected the most appropriate model for the job. (Our list included Generalized Linear Models, Splines, Decision Trees, Random Forests, and Gradient Boosting Machine (GBM) models.)
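To make the first step concrete, here is a minimal sketch of one common way to prune correlated variables: keep a variable only if its absolute Pearson correlation with every variable already kept stays below a threshold. The 0.9 threshold, the column names, and the sample values are assumptions for illustration; pairwise correlation also does not catch collinearity that involves three or more variables, which usually calls for something like variance inflation factors.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep columns whose |correlation| with all kept ones is low."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical household variables (income figures in thousands of dollars).
columns = {
    "household_income": [40, 55, 70, 85, 100, 120],
    "disposable_income": [30, 42, 53, 64, 76, 91],  # tracks income closely
    "household_size": [4, 1, 3, 2, 5, 2],
}
kept = drop_correlated(columns)
print(kept)
```

Because the pass is greedy, the order of the columns decides which member of a correlated pair survives, so in practice the variables that make the most business sense would be listed first.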

Next, we moved on to defining customer segmentation and trade area mapping.

Creating accurate customer segmentation meant profiling various clusters, which meant seemingly endless iterations. With some additional input from our clients, we were able to profile customers and associate values with each customer segment. The result was a highly refined set of clusters, with each customer having an associated value.
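As a rough illustration of clustering households and then attaching a value to each segment, the sketch below runs a small k-means loop and averages spend per cluster. The article does not name the clustering algorithm that was used, so k-means is an assumption here, as are the two pre-scaled features, the starting centroids, and the spend figures.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean) for each point."""
    return [
        min(
            range(len(centroids)),
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        for point in points
    ]


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        centroids = updated
    return assign(points, centroids), centroids


def segment_values(labels, spend):
    """Average spend per segment label."""
    grouped = {}
    for label, amount in zip(labels, spend):
        grouped.setdefault(label, []).append(amount)
    return {label: sum(v) / len(v) for label, v in grouped.items()}


# Hypothetical households: two features already scaled to the 0-1 range.
households = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
              (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
spend = [100, 120, 110, 900, 1000, 950]  # hypothetical annual spend

labels, centroids = kmeans(households, [(0.0, 0.0), (1.0, 1.0)])
values = segment_values(labels, spend)
print(labels, values)
```

The many iterations mentioned above correspond to rerunning a loop like this with different features, cluster counts, and starting points until the segments are easy to describe and act on.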

Finally, there was a huge amount of geospatial data to process. Getting our clients to see exactly where their valuable customers were located was challenging in many ways. So we created an interactive dashboard. With this tool, our clients could easily see where marketing budgets could be aligned and where more salespeople were needed.
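The kind of question such a dashboard answers can be sketched with a great-circle distance calculation: how much estimated household value sits within a given radius of a sales agent? The haversine formula below is standard, but the 50 km radius, the coordinates, the household values, and the function names are hypothetical, and the article does not describe how the trade areas were actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trade_area_value(agent, households, radius_km):
    """Sum the estimated value of households within radius_km of an agent."""
    return sum(
        h["value"]
        for h in households
        if haversine_km(agent[0], agent[1], h["lat"], h["lon"]) <= radius_km
    )


agent = (40.7128, -74.0060)  # hypothetical sales agent in New York City
households = [
    {"lat": 40.7357, "lon": -74.1724, "value": 400},  # Newark, nearby
    {"lat": 40.6782, "lon": -73.9442, "value": 250},  # Brooklyn, nearby
    {"lat": 39.9526, "lon": -75.1652, "value": 900},  # Philadelphia, far away
]
total = trade_area_value(agent, households, radius_km=50)
print(total)
```

Repeating this for every agent shows which areas hold a lot of value with little sales coverage, which is the signal for where to add salespeople or shift marketing budget.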

Big Data = Big Value

In the end, our journey was about much more than tools, models, and a mountain of consolidated data. It was about effectively showing segmentation for over 300 million possible customers, a market size of $1 billion. And it was about taking our clients from the largest-scale picture all the way to the most detailed, thanks to an interactive dashboard. Now, our clients can drill down to find their most profitable geographic locations as well as any adjustments they need to make to stay streamlined.

So can Big Data deliver big results and bigger value? The answer is one very big Yes.

Authored by Rajat Narang – Associate Director at Absolutdata