Speech Recognition: Path to CRM

A great deal of attention lately has focused on the pros and cons of customer relationship management tools — much from the point of view of “which tool can I buy, and what is it going to do for me?”

One of the things that’s been overlooked is that for CRM to function properly, it needs to receive constant, meaningful input from an outside system — input that acts as the raw meat for its real work, which is the analysis of what customers want and whether they got it.

We’re used to thinking of two main data inputs from the customer. One, interactive voice response, has been around forever and is widely understood. There are few new ways to construct a branching tree routing script or to parse an incoming account number. Application development for IVR has been made easy enough for an intelligent nonexpert to put together working, useful applications.

The other input mode is the very rich, multitextured Web interface. This has developed so rapidly that there are already many techniques for feeding customer input from a Web site through to a CRM system and into a contact center. All the many data-gathering and connection modes fall into this camp: e-mail, click-to-call, text chat and Web-based e-commerce forms.

Almost off the radar, though, is another technology that has reached strong maturity. Speech recognition, which is making big waves in the consumer technology world, is still seen as something of an afterthought in the CRM/contact center world. In fact, it’s a lot more than just a replacement for push-button input.

Why is it different? I’d like to argue that it creates the opportunity for a completely different type of interaction than does IVR, which, despite having “interactive” as part of its name, is really a one-way channel. Customers enter their identification and pull out a small subset of data that pertain to them.

Remember why IVR caught on in the first place: It automated the dumping of small bits of repetitive information to the caller, keeping those calls away from expensive agents. Relatively low-tech, profoundly efficient and easy to diagram into an existing call flow, it was the very definition of a no-brainer.

But when you add speech to the same call, you add several orders of complexity. Forget about the complexity of the technology that you need to operate it; instead, concentrate on the complexity of the information flow between you and the customers. Instead of asking questions that get answered in only numeric digits, you can draw out responses that are far more subtle.

Few people are going to enter an address using keys that have three letters apiece. Asking for a stock quote using the Charles Schwab & Co. IVR is hard enough — each letter of the alphabet is assigned a two-key code. Schwab had to send customers little wallet cards to remind them of the alphabetic cipher just to be able to retrieve stock quotes. This was in 1996, before the company was heavily online and before it installed a speech-recognition system.
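To see why a two-key alphabet is such a burden, here is a minimal sketch of one possible scheme. It is an illustration only, not Schwab's actual cipher, which isn't described here: the first key is assumed to be the button that carries the letter on a modern keypad, and the second its position on that button. The names `KEYPAD` and `encode_symbol` are hypothetical.

```python
# Hypothetical two-key scheme (NOT Schwab's documented cipher):
# first key  = the keypad button that shows the letter,
# second key = the letter's 1-based position on that button.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Build the letter -> two-digit code lookup once.
LETTER_TO_CODE = {
    letter: key + str(position)
    for key, letters in KEYPAD.items()
    for position, letter in enumerate(letters, start=1)
}


def encode_symbol(symbol: str) -> str:
    """Return the key presses needed to enter a ticker symbol."""
    return "".join(LETTER_TO_CODE[ch] for ch in symbol.upper())


print(encode_symbol("IBM"))  # prints 432261
```

Even under this fairly simple assumed scheme, a three-letter symbol takes six key presses that the caller has to look up or memorize, which is exactly the kind of friction a spoken "IBM" removes.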

Stock quotes are one of those basic information retrieval apps that the Web does well as a replacement for the phone system. But with speech you can do things that are so much richer, limited only by the system resources available to parse the speech and the power of the recognition engine.

According to SpeechWorks, one of the companies offering powerful speech recognition tools, you need four things to run high-quality speech applications:

* state-of-the-art technology;

* high-level building blocks — essentially this means that the recognition engine contains prefabricated modules for handling certain types of speech;

* tight integration on robust telephony platforms;

* tools for analyzing and tuning applications.

Those four make the speech recognition work. To make the application it’s running a success as well, you also need:

* appropriate application development procedures;

* an understanding of what your app is ultimately supposed to do for you in terms of what would make it a success;

* a good user interface.

SpeechWorks said that, based on several of its customer installations, the average cost per minute of an agent-attended call is $1.50; by contrast, the average cost of a speech-recognition-attended call is 25 cents to 35 cents. That’s not too surprising, and it rightly said that the speech-recognition cost varies based on the contract the call center has with its local telephone companies for long-distance traffic.

At one of its installations, the length of customer interactions was reduced from 12.5 minutes through push-button to two to three minutes using speech. This goes right to the question of speech recognition as a rough equivalent to IVR. If anything like that reduction can be repeated across the board or even in a significant minority of applications, then speech looks like a better way into the database despite the higher level of technology it requires.
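The figures SpeechWorks quotes can be put through some rough arithmetic. The sketch below rests on two assumptions that go beyond the numbers as given: that the 25-to-35-cent speech figure is a per-minute cost, like the agent figure, and that those averages can be applied to the two-to-three-minute calls at the installation mentioned. The function names are hypothetical.

```python
# Figures quoted by SpeechWorks. Treating the speech figure as a
# per-minute cost is an assumption made for this sketch.
AGENT_COST_PER_MIN = 1.50            # dollars per agent-attended minute
SPEECH_COST_PER_MIN = (0.25, 0.35)   # dollars per minute, low and high
SPEECH_CALL_MINUTES = (2.0, 3.0)     # call length with speech, low and high


def speech_call_cost_range() -> tuple[float, float]:
    """Cheapest and costliest speech-attended call under the assumptions."""
    low = SPEECH_CALL_MINUTES[0] * SPEECH_COST_PER_MIN[0]
    high = SPEECH_CALL_MINUTES[1] * SPEECH_COST_PER_MIN[1]
    return low, high


def agent_to_speech_ratio() -> tuple[float, float]:
    """How many times more an agent minute costs than a speech minute."""
    return (
        AGENT_COST_PER_MIN / SPEECH_COST_PER_MIN[1],  # against the high speech cost
        AGENT_COST_PER_MIN / SPEECH_COST_PER_MIN[0],  # against the low speech cost
    )


low, high = speech_call_cost_range()
print(f"Speech call: ${low:.2f} to ${high:.2f}")
r_low, r_high = agent_to_speech_ratio()
print(f"Agent minute costs {r_low:.1f}x to {r_high:.1f}x a speech minute")
```

On those assumptions, a whole speech-attended call costs somewhere between 50 cents and $1.05, less than a single minute of an agent's time, and each agent minute costs roughly four to six times what a speech minute does.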

From a call flow and design point of view, though, it would be a mistake to think of speech recognition as “talking IVR.”

When we speak of a richer interaction, we mean that you do not have to delineate options one through four and leave people scratching their heads to figure out where their particular problems fit into your scheme.

The sophisticated application will acknowledge that there are ambiguities of response and will tailor prompts to try to zero in on what the customer needs without being as linear as IVR.

People who are expert at using the system can take a shortcut through it, for example, or can barge in — that is, talk while the system is talking and have it know that it should stop and listen.

Those points apply to the IVR/speech-recognition comparison, which is how you look at speech recognition when its main purpose is to identify callers and route them to the right agent. But again, the interaction can be richer and used to gather information that you didn’t have already. Once you’ve used the system to identify the caller, you can ask questions that have more detailed answers, even questions that are tailored to a particular audience or context.

The stronger the speech-recognition engine, the more you’ll be able to segment what a caller says. Again, its strength in the long run will not be that it gets the information to the caller at a lower cost; rather, it will be that it gets information from the caller to you in a more meaningful and spontaneous way. It’s easier to say something into a phone than to fill out a survey and mail it back or even to fill out a form on a Web site.

When we look at the spectrum of CRM-style applications that will roll out during the next two years, the common element is needing an information channel that brings information reliably from the customer to the company. We’re used to customers calling when they want something, and parsing the data that comes with the call is so old hat that it’s never even mentioned anymore. We’re quickly getting used to customers using e-mail and the Web for interactions.

We’re moving inexorably to where all transactions are recorded, stored and analyzed using advanced data-mining techniques.

If universal recording and archiving does arrive, data collected through speech recognition could be a valuable addition to that analysis.

CRM, so much a buzz term now, is an idea that represents a range of future technologies, some of which will catch on and some of which will not. It’s the theory that matters, that information flows freely among all systems and is parsed somewhere inside the organization, probably away from the call center.

The theory of CRM will depend on controlling the most customer information at the lowest cost. It now seems that speech recognition has a good shot at replacing IVR as the primary information gathering tool for phone-only interactions.
