Tools for Automated Text Analysis
Some forms of text analysis have been around for years. Much of the initial work was grounded in academic research on artificial intelligence. That research also encompassed speech processing, computer vision, language translation, neural networks and other fields. Though all of these have met with some success, text analysis has progressed much further in its practical applications.
What pulled text analysis out of the lab was the Internet. Two crucial Internet functions require text analysis: search engines and e-mail response management. At the height of the tech boom, text analysis specialists raced to develop products that drew on their skills. The search engines and automated response systems they created are now so much a part of our everyday lives that we don't think of them as based on advanced technologies.
Of course, some people might argue that a really advanced technology would produce fewer irrelevant search results and fewer off-target automated replies. Many text analysis gurus would agree, because the most sophisticated text analysis approaches are not yet embedded in common Internet products.
The technical details of these methods are best left to specialists. But in general, their approach is to move beyond scanning for key words - still the most common approach for search engines and automated response systems - to classifying documents based on the concepts expressed in their text. Some systems derive classifications from general information such as dictionaries and grammars; some develop custom classifications from input documents; and some rely on classification schemes provided by the user. Many use a combination of these methods.
Once the categories are established, they can be applied to new documents, which then can be accessed by category rather than by key words. This avoids two common key word problems: matches that are missed because the document uses a synonym, and irrelevant results returned because a word has more than one meaning.
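To make the contrast between key word scanning and category lookup concrete, here is a minimal sketch in Python. It assumes the simplest case described above, a classification scheme provided by the user. The concept names, the term lists and the function names are all hypothetical, and the commercial products use far more sophisticated methods than a term lookup.

```python
import re

# Hypothetical user-supplied classification scheme:
# each concept maps to the terms assumed to express it.
CONCEPTS = {
    "billing": {"invoice", "bill", "charge", "refund"},
    "shipping": {"delivery", "shipment", "package", "courier"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_match(query, document):
    """Plain key word scan: true only if the literal query word appears."""
    return query.lower() in tokenize(document)

def categorize(document):
    """Return the set of concepts with at least one term in the document."""
    words = set(tokenize(document))
    return {concept for concept, terms in CONCEPTS.items() if words & terms}

doc = "My package never arrived and the courier has no record of it."

# The key word scan misses the document because it never says "delivery".
print(keyword_match("delivery", doc))  # False
# The category lookup still files it under shipping.
print(categorize(doc))                 # {'shipping'}
```

A lookup this simple handles synonyms but not words with multiple meanings; telling those apart requires looking at the surrounding context, which is where the more advanced methods earn their keep.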
In addition to assigning documents to categories, most systems can establish relationships among the categories themselves. This lets them identify concepts and documents that are similar to others, and sometimes even arrange these concepts and documents in hierarchies. Related functions can identify the most important sentences within a document and use these to build document summaries.
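Similarity and summarization can both be illustrated with simple word counts. The Python sketch below is a textbook approach and not the method of any vendor discussed here: it measures similarity as the cosine of the angle between two word-count vectors, and it picks a summary sentence by scoring each sentence on how frequent its words are in the whole document. The function names are assumptions made for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count each lowercase alphabetic word in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_sentence(document):
    """Return the sentence whose words are, on average, most frequent
    in the document as a whole: a crude one-sentence summary."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = bag_of_words(document)

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return max(sentences, key=score)

# Rank two candidate documents by similarity to a reference document.
reference = "The refund for my order has not arrived."
candidates = [
    "I am still waiting for the refund on my order.",
    "The weather was sunny all week.",
]
for text in sorted(candidates, key=lambda t: cosine_similarity(reference, t), reverse=True):
    print(round(cosine_similarity(reference, text), 2), text)
```

Raw word counts give too much weight to common words such as "the"; practical systems usually remove or down-weight them before comparing documents.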
Some systems also identify specific bits of information within a document, such as names, dates and locations. This capability, called feature extraction, can convert unstructured text into a structured database record. This is useful, since the records then can be processed with conventional data management tools.
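As a rough illustration of feature extraction, the Python sketch below pulls names, dates and locations out of a sentence and returns them as a structured record. Everything in it is an assumption made for the example: the patterns only recognize names preceded by a title and dates written as month/day/year, and the location list is a small hypothetical lookup table. Real products rely on much richer linguistic knowledge.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup list of place names (a tiny "gazetteer").
KNOWN_LOCATIONS = {"Boston", "Chicago", "Denver"}

# Simplified patterns: dates like 3/14/2003, names like "Dr. Alvarez".
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
NAME_PATTERN = re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\. [A-Z][a-z]+\b")

@dataclass
class ExtractedRecord:
    """Structured record built from one piece of unstructured text."""
    names: list = field(default_factory=list)
    dates: list = field(default_factory=list)
    locations: list = field(default_factory=list)

def extract_features(text):
    """Fill a record with the names, dates and locations found in the text."""
    return ExtractedRecord(
        names=NAME_PATTERN.findall(text),
        dates=DATE_PATTERN.findall(text),
        locations=[
            loc for loc in sorted(KNOWN_LOCATIONS)
            if re.search(rf"\b{re.escape(loc)}\b", text)
        ],
    )

note = "Dr. Alvarez called from Denver on 3/14/2003 about a delayed order."
record = extract_features(note)
print(record)
```

Once the text is reduced to fields like these, each record can be loaded into an ordinary database table and queried like any other customer data.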
Though these methods have not been widely adopted, they have been available for years in specialized products. For example, Autonomy (http://www.autonomy.com) and Semio (http://www.semio.com) have long provided search tools using advanced text categorization and similarity measures. Wider use has been limited by practical obstacles such as cost, scalability and difficulty of deployment. These barriers will fall as the technologies mature. So now is the time to imagine what a marketer's world will look like when advanced text analysis is readily available.
One change should be an improvement in existing applications. More intelligent search engines should make life easier in general and allow advances in tools that scan the Web for specific information - say, new competitors or prospects - and summarize the results. (This column reviewed one scanning product, Intarka, two years ago, but the system is apparently no longer available.) More accurate automated responses also should cut customer service costs and improve satisfaction.
But the real change should come from new applications. Perhaps the most intriguing is mining customer comments for trends and opportunities in the same way that companies today mine their structured data.
One system offered for this purpose today is PolyAnalyst (Megaputer Intelligence, http://www.megaputer.com , 812/330-0110). PolyAnalyst is a set of modules that analyzes both text and conventional data. Text functions include categorization and feature extraction. This means the system could identify common themes in customer comments, code individual records with these themes and then prepare a detailed statistical analysis of the records. Such tight integration of text and data analysis is unusual and obviously convenient. It helps that PolyAnalyst's conventional data analysis functions are extensive and impressive.
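The general workflow described here (find themes, code each record, then analyze the coded records alongside conventional data) can be sketched in a few lines of Python. This is not PolyAnalyst's method or interface. The themes, cue words, field names and sample comments are all hypothetical, and the "statistical analysis" is reduced to a count of themes by customer segment.

```python
import re
from collections import Counter, defaultdict

# Hypothetical themes and cue words; a real system would discover
# or refine these from the comments themselves.
THEME_TERMS = {
    "price": {"price", "expensive", "cost"},
    "support": {"support", "agent", "helpdesk"},
}

def code_themes(comment):
    """Return the sorted list of themes whose cue words appear in the comment."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return sorted(theme for theme, terms in THEME_TERMS.items() if words & terms)

def theme_counts_by_segment(records):
    """Combine the text-derived theme codes with a structured field."""
    counts = defaultdict(Counter)
    for record in records:
        for theme in code_themes(record["comment"]):
            counts[record["segment"]][theme] += 1
    return counts

# Made-up sample records mixing structured data and free text.
records = [
    {"segment": "retail", "comment": "The price is too high and support was slow."},
    {"segment": "retail", "comment": "Too expensive for what you get."},
    {"segment": "business", "comment": "The support agent solved my problem quickly."},
]

counts = theme_counts_by_segment(records)
for segment, themes in sorted(counts.items()):
    print(segment, dict(themes))
```

The point of the example is the join: once a comment carries a theme code, it can be cross-tabulated with any structured field already on the customer record.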
Megaputer offers a separate text analysis product, TextAnalyst, using a different approach from PolyAnalyst. While PolyAnalyst provides detailed analysis of individual records, TextAnalyst is oriented to organizing groups of documents. Megaputer is also working on yet another product, due in several months, that will do text processing such as classifying and routing e-mails.
Island Data (http://www.islanddata.com , 760/517-4100) already offers text processing based on concepts in combination with key words. The vendor's flagship product, Express Response, is used for online customer service such as automated e-mail response and message routing. Using concepts, the system can classify messages in terms such as tone and urgency and can identify situations such as sales opportunities or attrition risks.
This can happen in real time, allowing immediate response when an opportunity presents itself. Most other text analysis products work as batch processes. Island Data is working on a new product that provides similar capabilities but is oriented toward marketing applications rather than customer service operations.
Island Data works as an application service provider - that is, messages are routed to its computers and processed there - rather than selling its software for operation by its customers.
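The kind of message classification and routing described above can be pictured with a small rule-based sketch in Python. It is not how Express Response works: it relies on hypothetical cue words rather than concepts, and the queue names and cue lists are invented for the example. It does show how a message can be tagged for urgency and routed as an attrition risk or a sales opportunity as soon as it arrives.

```python
import re

# Hypothetical cue words for each situation the router looks for.
URGENT_CUES = {"immediately", "urgent", "asap", "today"}
ATTRITION_CUES = {"cancel", "switch", "competitor"}
SALES_CUES = {"upgrade", "pricing", "quote"}

def route_message(text):
    """Classify one incoming message and choose a queue for it.

    Attrition risks take priority over sales opportunities;
    everything else goes to general support.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    urgent = bool(words & URGENT_CUES)
    if words & ATTRITION_CUES:
        queue = "retention"
    elif words & SALES_CUES:
        queue = "sales"
    else:
        queue = "general_support"
    return {"queue": queue, "urgent": urgent}

# Each message is handled on arrival rather than in a nightly batch.
for message in [
    "I will cancel today unless this is fixed.",
    "Can I get a quote for the upgrade?",
    "How do I reset my password?",
]:
    print(route_message(message), message)
```

Cue words will misfire on messages such as "please do not cancel my order," which is exactly the weakness that concept-based classification is meant to address.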
Text Analysis International (http://www.textanalysis.com) takes the opposite approach, offering tools for users who want to build their own text processing systems. Capabilities include categorization, summarization, natural language queries, text analysis, indexing and data extraction. The vendor also provides a programming language tailored for natural language processing, a knowledge base management system, a rule generation engine and a runtime text analyzer. Clients can combine these to build and access databases of information extracted from unstructured text.
These are just a few vendors with interesting text analysis products. For a more complete list, go to www.kdnuggets.com/software/text.html.