I Have All My Big Data in Hadoop; Now What?

Big Data is exciting because it has the potential to deliver insights that can transform your marketing. Storing and processing Big Data has become a relatively well-understood problem, and most enterprises have made good progress toward collecting this treasure trove of customer data in next-generation Hadoop clusters or other Big Data repositories.

But determining what to actually do with that data is another matter.

Most marketers immediately think of using customer data to establish rules for how to treat their customers. For example, if a prospect or customer actively browses your website on an Android device, but has not downloaded your Android app yet, then you should promote the mobile app to them, right?
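As a rough illustration, the rule above could be written down as follows. This is only a sketch: the `Visitor` fields and the function name are hypothetical and do not come from any particular marketing platform.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    device_os: str         # hypothetical field, e.g. "android" or "ios"
    has_android_app: bool  # hypothetical flag: has the app been downloaded?


def should_promote_android_app(visitor: Visitor) -> bool:
    """Hand-written rule: promote the app to Android browsers who lack it."""
    return visitor.device_os == "android" and not visitor.has_android_app
```

A single rule like this is easy to read. The difficulty described next comes from how many such conditions a real decision needs.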

Rules are great for simple situations, but offers and decisions are rarely simple and clear-cut for marketers.

For example, try writing down a set of rules to let your computer decide whether you should cook dinner tonight, go to a restaurant, or order in pizza. Sound simple? You’ll probably take into account how much time you have, how hungry you are, how tired you are, what food you have in the refrigerator, what you had for lunch, how much spare cash you have, whether you are trying to lose weight, what’s available in the neighborhood, and the weather outside. Not to mention who else will be dining with you and what you feel like eating! You’ll quickly get to a long and complex list of nested rules. Maybe you can bear to write them down once, but it’s no fun to maintain and manage them.

Rules created by humans and faithfully executed by a computer have a serious drawback: they don’t help you discover what you don’t already know. People tend to set rules that “make sense,” which really means they’re imposing their own preconceptions on the data and limiting the scope of the use case to their imaginations. So, how can a marketer discover unexpected but valuable insights?

Statistical models enable marketers to be much more accurate. Regression models, decision trees, and similar techniques let analysts build models that describe the past and predict the future. This kind of problem is called “supervised learning”: there is a correct answer in hindsight, and everything you need is captured in the recorded data.

Analysts such as Nate Silver of The New York Times have helped popularize these techniques and demonstrate great accuracy in predicting anything from election results to baseball performance. The same techniques can be applied to your customer data to yield unexpected insights and customer intelligence.

For example, say a financial services company uses these modeling techniques to predict which of its website visitors are most likely to open a student account or take out a student loan. The company finds that one of the most powerful predictors of student account interest is Web activity later in the day. Whether that is because students are busy at lectures all morning or recovering from last night’s party, would most marketers have predicted time of day to be one of the strongest predictors?

Statistical models and supervised learning can be powerful and reveal unexpected insights, but they require expert modeling skills and significant ongoing maintenance to ensure they don’t become stale.

Just as rules-based systems will not help marketers uncover new insights, models built on historical data are only as rich as the data they’re modeling. How do you evaluate brand new ideas which, by definition, may not appear in your historical data? Sometimes you just have to test them out.

Imagine you finally decide to eat out tonight, and now you’re choosing a restaurant. You have a few favorites you’ve enjoyed in the past, so you might go to one of those. But if you always go to the same restaurants, you’ll miss out on other places. People learn by augmenting experience with experimentation, and your Big Data technology should do the same.

One of the great benefits of digital marketing is that responses are measurable. You can serve a digital advertisement to a customer and directly record whether or not they respond. But you cannot know how a customer in a micro-segment will respond to a completely different ad unless you try it. Marketers need to serve ads or offers to see the response.

Online machine learning and offer testing solve this problem by combining regulated experimentation with the extraction of statistical knowledge from data. Online learning uses Big Data, testing, and continual learning to auto-generate better and better statistical models “on the fly.” It maximizes the rate of learning, while minimizing the opportunity cost of experimenting.
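One simple way to balance experimentation against using what has already been learned is an epsilon-greedy strategy. The sketch below is an assumption chosen for illustration, not necessarily the method any particular product uses. The class name, offer names, and the `epsilon` parameter are hypothetical, and the response rates passed to `simulate` are made-up inputs, not measured data.

```python
import random


class EpsilonGreedyOfferTester:
    """Mostly serve the best-known offer, but keep testing the others."""

    def __init__(self, offers, epsilon=0.1, seed=None):
        self.offers = list(offers)
        self.epsilon = epsilon  # fraction of traffic spent experimenting
        self.rng = random.Random(seed)
        self.shown = {offer: 0 for offer in self.offers}
        self.responses = {offer: 0 for offer in self.offers}

    def response_rate(self, offer):
        shown = self.shown[offer]
        return self.responses[offer] / shown if shown else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.offers)  # experiment
        # Otherwise exploit the offer with the best observed response rate.
        return max(self.offers, key=self.response_rate)

    def record(self, offer, responded):
        self.shown[offer] += 1
        if responded:
            self.responses[offer] += 1


def simulate(true_rates, rounds, epsilon=0.1, seed=0):
    """Serve offers against hypothetical, made-up true response rates."""
    tester = EpsilonGreedyOfferTester(true_rates, epsilon, seed)
    world = random.Random(seed + 1)
    for _ in range(rounds):
        offer = tester.choose()
        tester.record(offer, world.random() < true_rates[offer])
    return tester
```

A larger `epsilon` learns about new offers faster but spends more traffic on offers that may perform worse. That is the trade-off between rate of learning and opportunity cost described above.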

Online learning has other advantages: it automatically adapts to change and is efficient for large data volumes, because data is streamed through models and only processed once.
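To make "streamed through models and only processed once" concrete, here is a minimal sketch of a logistic regression model updated one record at a time with stochastic gradient descent. The class and function names, the feature layout, and the learning rate are illustrative assumptions, not a description of any specific product.

```python
import math


class OnlineLogisticModel:
    """Logistic regression that learns from one record at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, tune in practice

    def predict(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-35.0, min(35.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, features, responded):
        # One gradient step on the log loss for this single record.
        error = self.predict(features) - (1.0 if responded else 0.0)
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def train_on_stream(model, stream):
    """Consume (features, responded) pairs once; nothing is stored."""
    for features, responded in stream:
        model.learn(features, responded)
    return model
```

Because each record updates the model and is then discarded, memory use does not grow with data volume, and recent records keep shifting the weights, which is how a model of this kind can follow changing customer behavior.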

As data gets ever bigger, faster, and more diverse, and demand for data scientists continues to rapidly outstrip supply, marketers and their agencies will quickly find that automated online machine learning is the only scalable approach to extracting the maximum value from their customer data.

Jason McFall is CTO and Matt Reading is VP of client services at Causata.
