Arria NLG: data in, language out

Dr Robert Dale cares about ideas.

“I’m not quite satisfied with the answer I just gave you, Kim. Let me try again.” His soft Scottish accent has survived transplant to Sydney, Australia, where he spent more than twelve years as a professor in the department of computing, specializing in “natural language generation, reference and anaphora, intelligent text processing, information extraction, automated grammar and style checking, the semantics of typography, spoken language dialogue systems” (as the university website still tells us).

Chatting with Dale, currently Chief Strategy Scientist and Chief Technology Officer at Arria NLG, about natural language generation is like participating in a seminar, in the best possible way. He’s clear, helpful, and willing to be challenged by questions. But let’s get the acronyms out of the way.

The Science

“NLG is technically one half of NLP,” he told me. NLP is natural language processing: the other half of it is NLU, natural language understanding. NLU renders text as data. It’s been around a while, and is “a much more popular and visible space” than NLG. NLU has powered the automation of text analytics and most forms of sentence analysis. “80 percent of energy, effort, and money has gone into NLU,” Dale says, “and that’s not surprising. NLG has been the poorer sister, but that’s changing now.”

NLU reduces text to information. Given the vast quantities of text appearing online every second of the day, converting it into actionable data has been a vital task. While that was happening, Dale told me, “NLG remained an academic exercise.” Today, however, it’s increasingly big data which needs to be explained in language. In essence, NLG takes numbers and automatically converts them into textual commentary and reports.

The Use Cases

London-based Arria NLG creates science-based “information delivery solutions” for clients. What that means, in practice, is building software which can generate written content from a client’s data. “You can scale up the production of textual reporting beyond what would be humanly feasible,” Dale said. One example is Arria’s input into weather forecasting by the Met Office, the UK’s national weather service. In the past, the Met Office had issued forecasts only for some 60 locations–not because it lacked more granular data, but for economic and logistical reasons. Using NLG software, the office can now create reports for thousands of locations in seconds. “It can scale indefinitely,” Dale commented.
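To make the idea of scaling concrete, here is a minimal Python sketch of template-based data-to-text generation. It is an illustration only, not Arria's software or the Met Office's system: the function name, field names, rain thresholds, and sample readings are all assumptions invented for this example.

```python
# Illustrative sketch only: names, thresholds, and data are hypothetical.

def describe_forecast(location, high_c, low_c, rain_chance):
    """Turn one row of numeric forecast data into a one-sentence report."""
    # Map the numeric rain probability onto a verbal phrase (assumed cut-offs).
    if rain_chance >= 0.7:
        rain = "rain is likely"
    elif rain_chance >= 0.3:
        rain = "there is a chance of showers"
    else:
        rain = "it should stay dry"
    return f"{location}: highs of {high_c}C and lows of {low_c}C; {rain}."


# Made-up readings; a real feed could hold thousands of rows.
readings = [
    {"location": "Aberdeen", "high_c": 12, "low_c": 5, "rain_chance": 0.8},
    {"location": "Exeter", "high_c": 17, "low_c": 9, "rain_chance": 0.1},
]

# The same function runs once per location, however many there are.
reports = [describe_forecast(**row) for row in readings]
for report in reports:
    print(report)
```

Because the wording lives in a function rather than in a writer's head, adding a location costs one more row of data. Production NLG systems do far more than fill templates, but the scaling argument is the same.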

NLG can also be used to make written recommendations for action, based on data. “Language is the best device for conveying information,” Dale told me. As data gets more complex, graphic representations get cluttered and overloaded. For an oil and gas client in the Gulf of Mexico, there was a need to boil down masses of data generated by sensors into text reports focused on key events. For this, Arria tapped human expertise to teach their NLG software. “What would the expert say about such and such a thing?” was the question, according to Dale. Structured interviews with experts, combined with reverse engineering of existing, human-generated, written reports, provided the intelligence which Arria’s software could automate and scale.

NLG-generated reports can be finely tailored to an audience too. Dale told me about work in the neo-natal intensive care unit at the Edinburgh Royal Infirmary. The same data, from sensors and medical records, informs several different types of written report: for physicians treating patients; for nurses handing over shifts; and for concerned parents.

Getting it Right

These use cases underline the importance of being able to rely on the accuracy and validity of reports and recommendations which are, essentially, written by a robot. I questioned Dale on the increased need for QA as reports become more remote from the data on which they’re based. His first response was to remind me that “ultimately, there’s nothing unique about this application in that regard. With technology, there’s always scope for error.” In some highly sensitive use cases, there’s always going to be an expert in the loop, but it’s significantly less work to have an engineer look over reports and approve them than to have the engineer create them from scratch.

That’s when Dale paused, and delivered what he thought (and I agreed) was a better answer. “We’re adding a layer of textual interpretation to the data. ‘Tell me what that graphic means.’ You’re suggesting that the graph is a more direct presentation of the data. I’m not sure that’s true. Visualization is not necessarily that close to the raw data either. Graphing data can involve manipulation too, like smoothing it.” At the end of the day, in any case, we rely on machine interpretations of data all the time, he said. Think of an automobile dashboard.

The Marketing Angle

Clearly this is a developing technology with multiple applications. Why do marketers need to know about it? There’s a use case for advertising campaigns. Digital data on audience response can be matched with predictive analytics to generate real-time written recommendations on how to improve the campaign’s effectiveness.

But there are deeper possibilities. “There are two facets to this,” Dale said. First, “you can tailor content based on what you know about a person.” Second, “you can play around with the language” to discover the most effective form of communication. In other words, said Dale, you can go beyond a kind of Reader’s Digest-type language template, and be more creative. “In most cases, a human would be more subtle and nuanced,” he admitted, but the advantage, of course, is scale.

There’s a further step Arria has not yet taken, which is the automatic generation of “manipulative” language. “You can start to look at the psychology of language use,” said Dale. This may sound sinister, but according to Dale “advertising is all about persuasion anyway.” He anticipates a reaction against marketing and advertising which targets individuals so directly. As he asked at the New York Sentiment Symposium earlier this year, are individuals comfortable with swift messaging, in effect “targeted at you”? He compares the situation with Google Glass. At first people are uncomfortable about a new technology, but the pendulum can swing back. “People need to become acclimatized,” he said.

Finally, if NLG can conjure targeted advertising messages out of a universe of numbers, it could certainly produce robo-written news stories and blogs on assigned topics too. But I promise, this isn’t one of them.
