Learning about Machine Learning the Easy Way
Huge ROI in machine learning, but only for the deeply qualified
From Blade Runner to I, Robot to Transformers, Hollywood's robots-take-control genre has long profited from fears about artificial intelligence's future role in society. At this particular juncture, though, AI looks more likely to determine which life insurance policy or hiking boots people buy than whether cyborgs or humans will control the world.
No worthwhile data management platform or analytics solution emerges today without touting its machine learning capabilities and powers of predictive analytics. But machine learning's roots are not foreign to business people. It's an outgrowth of statistics: a ramped-up, higher-tech take on predictive modeling.
Delivering a primer on the topic at a client conference, SAS Manager of Data Science Technologies Wayne Thompson noted that the main difference between statistics and machine learning is that “statistics focuses more on inferential analysis or hypothesis testing to make predictions about a larger population than the sample represents. Machine learning uses massive amounts of observational data and, as a branch of artificial intelligence, focuses on automation.”
With each iteration of an email campaign or each interaction with a website, machine learning offers marketers the promise that it will store intelligence gained and use that intelligence to make the next connection with a customer more relevant and more revenue-producing.
Machine learning is powered by algorithms that can predict, for instance, which insurance customers are most likely to file a claim—the type of information in high demand by data-driven marketers whose compensation packages are increasingly dependent on ROI.
Machine learning applications run largely on “supervised learning” algorithms that use historical data to predict future events. Credit card companies, which have long experience in using data to predict which customers will be the most profitable and which the most risky, use supervised learning to screen out applicants with the greatest potential for fraud. This is done by randomly selecting card applicants (training data, in ML parlance), itemizing customer activity like account balances and frequency of transactions (features), adding data from subsequent interactions (output variables), and feeding this binary soup into the algorithm. A model is created and employed in a subsequent marketing campaign that picks out and eliminates applicants at high risk for fraud. Further intelligence is compiled and written into the rules for a decision tree.
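For readers who want to see the moving parts, here is a minimal sketch of that workflow in plain Python. Everything in it is an assumption made for illustration: the eight applicants, their balances and transaction counts, the fraud labels, and the function names fit_stump and predict are all hypothetical. It also learns only the simplest possible decision tree, a single yes/no rule (sometimes called a decision stump), rather than whatever model a card issuer would actually deploy.

```python
# Hypothetical training data. Each row holds two features:
# (account_balance, transactions_per_month).
TRAIN_X = [
    (200.0, 45), (150.0, 60), (300.0, 52), (90.0, 70),       # later flagged as fraud
    (2500.0, 12), (1800.0, 20), (3200.0, 8), (2100.0, 15),   # no fraud observed
]
# Output variable for each row: 1 = fraud, 0 = no fraud.
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def fit_stump(rows, labels):
    """Find the single feature/threshold rule with the fewest training errors.

    Returns (feature_index, threshold, label_if_at_or_below_threshold).
    """
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between neighboring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for below in (0, 1):
                errors = sum(
                    (below if row[f] <= t else 1 - below) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, below)
    _, f, t, below = best
    return f, t, below


def predict(stump, row):
    """Apply the learned rule to one applicant."""
    f, t, below = stump
    return below if row[f] <= t else 1 - below


stump = fit_stump(TRAIN_X, TRAIN_Y)
print(stump)                          # (0, 1050.0, 1)
print(predict(stump, (120.0, 65)))    # 1: flagged as high risk
print(predict(stump, (2800.0, 10)))   # 0: not flagged
```

On this toy data the learned rule reads "flag any applicant whose balance is at or below 1,050." The point is only that the rule came from the labeled history rather than from a programmer's guess.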
Ah, but decision trees grow only to fall. An algorithm called “random forest,” employed in machine learning systems, combines many individual decision trees; when new input enters the system, that input is run down all of the trees and their results are combined. “If I'm fitting around a random forest, I'll build decision trees on many random subsets of the data and then average them to build the final model,” noted SAS's Thompson in his machine learning primer for clients.
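Thompson's description can be sketched in a few dozen lines of plain Python. This is a simplified illustration, not SAS's implementation: the data is the same kind of hypothetical applicant table (balance and transactions per month, with made-up fraud labels), each "tree" is a one-split stump rather than a full decision tree, and the names fit_forest and predict_proba are invented here. Each stump is fit on a random resample of the rows and a randomly chosen feature, and the forest's answer is the average of the stumps' votes.

```python
import random
from collections import Counter

# Hypothetical applicants: (account_balance, transactions_per_month).
ROWS = [
    (200.0, 45), (150.0, 60), (300.0, 52), (90.0, 70),
    (2500.0, 12), (1800.0, 20), (3200.0, 8), (2100.0, 15),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = fraud, 0 = no fraud


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """One-split tree on one feature: (feature, threshold, left_label, right_label)."""
    values = sorted({row[feature] for row in rows})
    # Fallback when no split is possible: always predict the majority label.
    best = (len(rows) + 1, float("inf"), majority(labels), majority(labels))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= t]
        right = [y for row, y in zip(rows, labels) if row[feature] > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit each stump on a random resample of rows and a random feature."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feature = rng.randrange(len(rows[0]))        # pick one feature at random
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict_proba(forest, row):
    """Run the row down every stump and average the fraud votes."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


forest = fit_forest(ROWS, LABELS)
risky = predict_proba(forest, (120.0, 65))    # low balance, heavy activity
safe = predict_proba(forest, (2800.0, 10))    # high balance, light activity
print(risky, safe)
```

Because every stump saw a slightly different slice of the data, no single bad split decides the outcome; the averaged vote is what gets used as the risk score.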
By now, you should be getting the idea that machine learning is not a toy to be tinkered with by the uninitiated. Gaming pioneer Arthur Samuel, whose Samuel Checkers-Playing Program is credited with having been the first self-learning program, succinctly summed up machine learning as a “field of study that gives computers the ability to learn without being explicitly programmed.” But in a recent blog post for the Toptal developer community, software engineer Nick McCrea held that the science behind the solution is very tricky.
“There are many subtleties and pitfalls in ML, and many ways to be led astray by what appears to be a perfectly well-tuned thinking machine,” McCrea wrote. Truth is, he said, there aren't enough capable machine learning designers out there to keep up with demand from the world of commerce.