The Inside Scoop Blog

You're So Predictable: A Lesson in Managing Data


"Our lives are not our own. We are bound to others, past and present, and by each crime and every kindness, we birth our future." –from Cloud Atlas by David Mitchell

Most people follow a recognizable path on the way to wherever they end up. That's true in life, and it's also true of purchasing decisions. Purchases often follow a consistent pattern, and as marketers, we can capture that pattern and find other people bound to it.

For example, people who join a gym are also likely to purchase gym shoes, supplements, maybe a yoga mat or some limited in-home equipment. People who purchase a kitchen mixer will likely buy baking pans. People who purchase a boat may be interested in buying fishing gear, and so on. We can draw these inferences about a consumer from the mass of consumers before them who did the same things.
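One common way to put numbers on "people who buy X also buy Y" is association-rule mining. A rule is scored by three measures. Support is the share of customers who bought both items. Confidence is the share of X buyers who also bought Y. Lift is confidence divided by Y's overall purchase rate. The sketch below computes these measures for a single rule. The purchase histories and the `rule_stats` helper are hypothetical and for illustration only; this is not the method used at Pluris Marketing.

```python
def rule_stats(histories, antecedent, consequent):
    """Support, confidence, and lift for the rule `antecedent -> consequent`."""
    n = len(histories)
    has_a = sum(1 for h in histories if antecedent in h)
    has_c = sum(1 for h in histories if consequent in h)
    has_both = sum(1 for h in histories if antecedent in h and consequent in h)
    support = has_both / n                             # share of all customers who bought both
    confidence = has_both / has_a if has_a else 0.0    # P(consequent | antecedent)
    lift = confidence / (has_c / n) if has_c else 0.0  # confidence relative to the base rate
    return support, confidence, lift

# Hypothetical purchase histories: each set is one past customer's purchases.
customers = [
    {"gym membership", "gym shoes", "supplements"},
    {"gym membership", "gym shoes", "yoga mat"},
    {"gym membership", "supplements"},
    {"kitchen mixer", "baking pans"},
    {"kitchen mixer", "baking pans", "gym shoes"},
    {"boat", "fishing gear"},
]

for a, c in [("gym membership", "gym shoes"), ("kitchen mixer", "baking pans")]:
    s, conf, lift = rule_stats(customers, a, c)
    print(f"{a} -> {c}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means that buying the first item makes the second more likely than it is for customers in general. In this toy data, two of three gym members bought gym shoes, so confidence is 0.67 and lift is about 1.33.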

But persona marketing can be a tricky science. Luckily, there are tools and methods that can help us determine which attributes to select, and they all start with data, which businesses have in abundance. But what if we don't want to handcraft everything that goes into a model? Say we have hundreds of things we're trying to predict. On top of that, say a portion of the population has an unclear or unstructured past: a scattered purchase history. Maybe I joined a gym, bought a mixer, and I like to fish.

There will always be little drivers that steer us off the path, and depending on how far they steer us, they can completely change the course of our future. When making a prediction, these little drivers are sometimes the most important attributes of a model; they might be the detail that lets us fully customize a customer's experience. But because we don't completely understand these little drivers, we can't necessarily label them or organize them into usable, structured databases. Enter unsupervised learning, which finds structure in data without needing labels. It's the key to organizing and understanding the small drivers.

Unsupervised learning can be a powerful modeling method. It can group unstructured past and present data on its own, without human-provided labels. This saves human time and energy, but at the cost of machine processing time. To keep results accurate, an unsupervised model typically has to be retrained whenever the underlying data changes. So if something needs to be predicted in an up-to-the-minute fashion, unsupervised learning may not be the best solution. It can be time-consuming to re-collect, reorganize, and re-cleanse data, and that's all before retraining the model.
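To make "grouping data on its own" concrete, here is a minimal k-means clustering sketch in plain Python. K-means is one common unsupervised method, chosen here for illustration; the article does not name a specific algorithm. The three features, the customer data, and the deterministic farthest-point seeding are all assumptions made to keep the example small and repeatable.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, max_iters=100):
    """Plain k-means: alternate assigning each point to its nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return labels, centroids

# Hypothetical customers: [gym visits/month, baking purchases, fishing purchases]
customers = [
    (12, 0, 0), (10, 1, 0), (11, 0, 0),  # gym-goers
    (0, 6, 0), (1, 5, 0), (0, 7, 0),     # bakers
    (0, 0, 4), (1, 0, 5), (0, 1, 4),     # anglers
]
labels, centroids = kmeans(customers, k=3)
print(labels)  # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

No labels go in, yet three groups come out. This version refits on the full dataset every time it is called, which is exactly the retraining cost the paragraph above warns about when data changes frequently.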

Here are some things that can be done to make retraining easier. Together they act as inductive biases, the built-in assumptions that steer what a model learns:

Label as much data as possible. That way, the model doesn't have to learn everything from scratch. For our gym goer, for example, we would capture everything in the first round: What time does that person go to the gym? Have they eaten first? Do they go with a friend?

Get rid of useless features. These won't always be the same, and there's no silver bullet, but it is important to reduce your features to a more manageable set. Start with the larger set, then drill down from there.
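One simple, widely used way to drop useless features is to discard any column that barely varies across customers, since such a column cannot help tell customers apart. The sketch below applies this rule to hypothetical gym-goer features modeled on the questions above. The column names, the data, and the `drop_low_variance` helper are illustrative assumptions. scikit-learn's `VarianceThreshold` implements the same idea.

```python
from statistics import pvariance

def drop_low_variance(rows, names, threshold=0.0):
    """Keep only the feature columns whose population variance exceeds `threshold`."""
    keep = [i for i in range(len(names)) if pvariance([row[i] for row in rows]) > threshold]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, kept_rows

# Hypothetical gym-goer features; "has_account" is 1 for everyone, so it carries no signal.
names = ["visit_hour", "ate_first", "with_friend", "has_account"]
rows = [
    [6, 1, 0, 1],
    [18, 0, 1, 1],
    [7, 1, 1, 1],
    [19, 0, 0, 1],
]
kept_names, kept_rows = drop_low_variance(rows, names)
print(kept_names)  # → ['visit_hour', 'ate_first', 'with_friend']
```

Variance depends on each feature's units, so a nonzero threshold is only meaningful once features are on comparable scales. Low variance is also just one criterion; a feature can vary plenty and still be useless for a given prediction.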

Simplify your hypotheses. Occam's razor tells us that when multiple hypotheses explain the data equally well, we should prefer the simpler one. There are infinite ways to answer the question “What are these customers?” Scale it down, even further than “Are they likely to buy these ten products in the next year?” Start with one question: “Is someone who goes to the gym likely to buy a yoga mat?” If you start with a simple question, it's much easier to develop a model that predicts whether consumers will do that one thing. Once we establish that, we move on: “Of those who purchased a yoga mat, will their next purchase be gym shoes?” The idea is to take your mass of data and boil it down to one simple question to get started.
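Each of the single questions above reduces to a conditional rate that can be computed directly from purchase histories, before any complex model is built. The sketch below answers both questions on hypothetical, ordered purchase histories. The data and the helper names `rate` and `next_purchase` are illustrative assumptions, not part of the original analysis.

```python
def rate(histories, condition, outcome):
    """Share of histories meeting `condition` that also meet `outcome` (None if none match)."""
    matching = [h for h in histories if condition(h)]
    if not matching:
        return None
    return sum(1 for h in matching if outcome(h)) / len(matching)

def next_purchase(history, item):
    """The purchase made right after the first purchase of `item`, or None."""
    if item in history:
        i = history.index(item)
        if i + 1 < len(history):
            return history[i + 1]
    return None

# Hypothetical ordered purchase histories, oldest purchase first.
histories = [
    ["gym membership", "yoga mat", "gym shoes"],
    ["gym membership", "supplements", "yoga mat", "gym shoes"],
    ["gym membership", "gym shoes"],
    ["gym membership", "yoga mat", "supplements"],
    ["kitchen mixer", "yoga mat"],
    ["boat", "fishing gear"],
]

# Question 1: is a gym goer likely to buy a yoga mat? Compare with the overall base rate.
p_mat_given_gym = rate(histories, lambda h: "gym membership" in h, lambda h: "yoga mat" in h)
p_mat_overall = rate(histories, lambda h: True, lambda h: "yoga mat" in h)

# Question 2: of yoga-mat buyers, what share buy gym shoes as their very next purchase?
p_shoes_next = rate(histories, lambda h: "yoga mat" in h,
                    lambda h: next_purchase(h, "yoga mat") == "gym shoes")

print(p_mat_given_gym, p_mat_overall, p_shoes_next)  # → 0.75 0.6666666666666666 0.5
```

Comparing the conditional rate (0.75) with the base rate (about 0.67) is the simplest version of the question. Only if the gap holds up on real data would a fuller model be worth building.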

Ultimately, the best approach might be a cross between the two types of learning (supervised and unsupervised), known as semi-supervised learning. The best model may need to be formulated from the known alongside the unknown: a small amount of labeled data combined with a larger pool of unlabeled data. It should be based on a hypothesis that is neither too simple nor too complex, one that clearly gets to the root of the problem and offers a solution that is executable. There are many “little drivers” of life that we could focus on, and we might even formulate better hypotheses based on these findings. But as long as we get where we need to go, the small things might just be that: small things.
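Self-training is one simple semi-supervised technique; the article does not say which one it has in mind. A model is fit on the few labeled customers. It then adopts its own confident predictions on unlabeled customers as extra labels, refits, and repeats. The sketch below uses a nearest-centroid classifier as the base model. The features, the labels, the `margin` confidence rule, and all helper names are hypothetical. scikit-learn offers production versions of related ideas, such as `SelfTrainingClassifier` and `LabelSpreading`.

```python
import math

def class_centroids(points, labels):
    """Mean feature vector for each label."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: [sum(col) / len(ps) for col in zip(*ps)] for y, ps in groups.items()}

def predict(centroids, point):
    """Nearest-centroid prediction."""
    return min(centroids, key=lambda y: math.dist(point, centroids[y]))

def self_train(labeled, labels, unlabeled, margin=1.0, max_rounds=10):
    """Pseudo-label unlabeled points whose nearest class centroid beats the runner-up
    by at least `margin`, refit, and repeat until no new labels are added."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = class_centroids(labeled, labels)
        still_unlabeled = []
        for p in pool:
            ranked = sorted((math.dist(p, c), y) for y, c in centroids.items())
            if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
                labeled.append(p)              # confident: adopt the pseudo-label
                labels.append(ranked[0][1])
            else:
                still_unlabeled.append(p)      # ambiguous: leave it for a later round
        if len(still_unlabeled) == len(pool):  # nothing new was labeled this round
            break
        pool = still_unlabeled
    return class_centroids(labeled, labels), pool

# Hypothetical features: [gym visits/month, fitness classes/month]. Only two customers are labeled.
known = [(12, 4), (2, 0)]
known_labels = ["buys yoga mat", "no yoga mat"]
unknown = [(11, 3), (10, 5), (3, 1), (1, 0), (7, 2)]

centroids, unresolved = self_train(known, known_labels, unknown)
print(unresolved)                  # → [(7, 2)]
print(predict(centroids, (7, 2)))  # → buys yoga mat
```

The known labels anchor the classes, and the unlabeled customers refine where those classes sit. The borderline customer at (7, 2) is never pseudo-labeled, so at least one wrong guess is kept out of the training data.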



Hamen Lo McLaughlin is a statistical database analyst at Pluris Marketing where she focuses on marketing enablement, analytics, and optimization solutions.
