This is Big Data at its best. Is it possible to identify every targetable household in the US? Absolutdata teamed up with a client in the fashion retail business to do just that.
Our client had a problem. As a designer clothing manufacturer and retailer, they were used to contacting their customers directly via their trained sales team. Like everyone else, they needed to boost the efficiency of their sales process. But here’s the thing.
As a national retailer, their prospective client base is spread out over 300 million people and 3.8 million square miles. That’s a whole lot of data to collect, sort through and analyze.
Nevertheless, our client needed to know which households (out of 117 million or so) would be most likely to buy their product. And how they could best reach these potential customers. And what they might expect each customer to spend.
Here’s how we did it.
Mapping a Really Big Data Study
At Absolutdata, we combined their fashion merchandise expertise with our team of retail experts, data engineers, and data scientists. Our goal was to find out if we could, indeed, come up with actionable data for over 100 million prospects. The plan looked like this:
- Combine and consolidate data from multiple disparate sources.
- Build hypotheses around factors that would increase the likelihood of purchase.
- Create a model based on the previous two steps.
- Identify targetable customers, based on their spending habits and demographic data. Use this to understand the potential value of each person.
- Map out where customers are, using geospatial data. Use this to understand the trade area and where customers are the most concentrated geographically.
Fitting Thousands of Variables Into One Analysis
In this case, there were more than 10,000 variables to sift through, including overall demographics, geospatial location for customers and sales agents, and previous purchasing patterns. All in all, it amounted to over 200 gigabytes of information.
We studied this mass of variables and documented the meaning of each one in a data dictionary. The data dictionary also recorded details like the variance, mean, and quintiles of each variable, which helped us spot and remove several outliers.
Next, we moved on to a detailed exploratory analysis. The main goal here was to identify the variables that affect customers’ purchase decisions.
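For the curious, here is a minimal sketch of what one data dictionary entry and a simple outlier check might look like, using only the Python standard library. The function names, the sample incomes, and the 1.5 x IQR rule of thumb are illustrative assumptions on our part, not the project's actual code or thresholds.

```python
import statistics

def describe_variable(name, values):
    """Build one data-dictionary entry: mean, variance, and quintile cut points."""
    return {
        "name": name,
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance
        # n=5 returns the four cut points that split the data into fifths
        "quintile_cuts": statistics.quantiles(values, n=5),
    }

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical household incomes, in thousands of dollars
incomes = [42, 48, 51, 55, 58, 61, 64, 70, 75, 900]
entry = describe_variable("household_income", incomes)
outliers = flag_outliers(incomes)
print(entry)
print("Possible outliers:", outliers)
```

With 10,000-plus variables, the same summary would simply be repeated per column, so the dictionary doubles as a quick screen for values that deserve a second look.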
Now it was time to build our hypothesis. Given the huge number of variables that emerged, we especially needed to understand what needed to be left out. It turned out to be quite the crash course in fashion for data geeks like us. Which was fun, and sometimes pretty funny.
During this time, our clients really got to understand the link between purchasing behaviors and the demographics of their customers. For us, it was an especially exciting phase as we got to see some clear trends in how shifts in demographics changed purchase behaviors. And it all sounded extremely intuitive to our client. We were on the right track!
We had the data points. We had the hypothesis. Now, we had to construct meaningful models for our fashionista brand.
Our Models Were So Scientific It Was Beautiful
Math and statistics may not be as exciting as designer duds, but this is where data scientists and analytics pros dig in. To start the modeling, we went through several steps, including:
- Tidied up the data by removing collinear and highly correlated variables
- Identified high-leverage observations and removed them
- Found high information variables that made the most sense for our client
- Referred to the initial clusters we had previously established to check for purchasing behavior trends
- Selected the most appropriate model for the job. No pun intended. Our roster included Generalized Linear Models, Splines, Decision Trees, Random Forests, and Gradient Boosting Machine (GBM) models.
And this was a lot of work, but we were still far from finished.
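The first two tidy-up steps in the list above can be sketched in a few lines of plain Python. This is a simplified illustration, not what ran on the real data: the 0.9 correlation threshold, the greedy keep-or-drop order, the one-predictor leverage formula, the 2p/n cutoff, and the sample columns are all assumptions we picked to keep the example small.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a column only if |r| with every kept column is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def leverage(x):
    """Leverage in a one-predictor linear regression: h_i = 1/n + (x_i - mean)^2 / Sxx."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1 / n + (v - mean) ** 2 / sxx for v in x]

# Hypothetical columns; income_monthly is just income / 12, so it adds no information
columns = {
    "income": [40, 55, 60, 75, 90, 120],
    "income_monthly": [v / 12 for v in [40, 55, 60, 75, 90, 120]],
    "age": [25, 52, 31, 47, 38, 60],
}
kept = drop_correlated(columns)

# Flag observations whose leverage exceeds 2p/n, with p = 2 (intercept + slope)
x = columns["income"]
h = leverage(x)
high_leverage = [v for v, hv in zip(x, h) if hv > 2 * 2 / len(x)]
print("Kept columns:", kept)
print("High-leverage incomes:", high_leverage)
```

With many predictors, leverage comes from the diagonal of the full hat matrix rather than this one-variable shortcut, but the idea is the same: observations far from the bulk of the data get a disproportionate say in the fit.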
Putting It All Together, In a Stylish New Package
It was time to produce some actionable information. This meant defining customer segmentation and trade area mapping.
Creating accurate customer segmentation seemed like a never-ending process. Getting the final segments in place by profiling various clusters meant trying out endless iterations. In turn, this meant lots of conversations with our fashion-expert clients. Thanks to their input, we were able to profile the customers and associate values with each customer segment. The result was a highly refined set of clusters, with each customer having an associated value.
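To give a flavor of what "clusters with an associated value" means, here is a toy sketch. It uses plain k-means as a stand-in (the actual clustering method and features are not described here), and the customer figures, starting centroids, and function names are hypothetical. Real features would also be scaled first so that one variable does not dominate the distance.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, init, iterations=10):
    """Plain Lloyd's algorithm: alternate assignment and centroid updates."""
    centroids = list(init)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assign(points, centroids)

def segment_values(spend, labels):
    """Average spend per segment, used as that segment's value."""
    totals, counts = {}, {}
    for s, l in zip(spend, labels):
        totals[l] = totals.get(l, 0.0) + s
        counts[l] = counts.get(l, 0) + 1
    return {l: totals[l] / counts[l] for l in totals}

# Hypothetical customers: (household income, annual spend), both in thousands of dollars
customers = [(30, 0.2), (35, 0.3), (32, 0.25), (90, 2.0), (95, 2.4), (100, 2.2)]
centroids, labels = kmeans(customers, init=[(30, 0.2), (100, 2.2)])
values = segment_values([spend for _, spend in customers], labels)
print("Segment of each customer:", labels)
print("Average spend per segment:", values)
```

Every customer in a segment inherits that segment's value, which is why getting the segment definitions right took so many iterations.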
The final piece of the puzzle needed to be placed. How appropriate that it was geographical. We had a huge amount of associated geospatial data. Getting our clients to see exactly where their valuable customers are located was challenging for us and for them.
To meet this challenge, we provided an interactive dashboard for the clients’ use. With this tool, they could clearly find where marketing budgets needed alignment and where there was a dearth of salespeople.
This was on track to become the single most important dashboard for the entire company.
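The dashboard itself does not fit in a snippet, but one question behind it, which households sit too far from any salesperson, can be sketched with the haversine great-circle formula. The 25-mile trade-area radius, the function names, and the sample coordinates below are assumptions for illustration only.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(a, b):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def uncovered_households(households, agents, radius_miles=25):
    """Households with no sales agent within the assumed trade-area radius."""
    return [h for h in households
            if all(haversine_miles(h, a) > radius_miles for a in agents)]

# Hypothetical locations: one agent in New York City, households in Newark and Philadelphia
agents = [(40.7128, -74.0060)]
newark = (40.7357, -74.1724)
philadelphia = (39.9526, -75.1652)
uncovered = uncovered_households([newark, philadelphia], agents)
print("Households outside every trade area:", uncovered)
```

Counting uncovered high-value households by region is one simple way to surface where salespeople are thin on the ground.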
From Big Data Comes Big Value
In the end, we created more than tools, models, and a mountain of consolidated data. We were able to effectively show segmentation for over 300 million possible customers — a market size of $1 billion. And from that big picture, our clients could drill down to find their most profitable geographic locations. The fact that our clients could do this by means of an interactive dashboard was the cherry on top of one very big, very Big Data sundae.