Is Speech Recognition Ready for the Big Time?
What is speech recognition? It is computer hardware and software that listens to what you have to say, parses it, matches what it finds against a database of known information and responds accordingly. You use speech recognition, for example, when you "verbally" dial on a hands-free cell phone or use one of the new PC-based dictation systems.
This technology is exploding for two reasons. First, PC horsepower has progressed to the point where a desktop machine can process speech in real time. Second, the algorithms that allow a computer to recognize speech have been refined. For speech recognition to be useful, the system must recognize the patterns that underlie the speech, without regard to accent, speed or the quirks of any individual user.
While the computer press and the popular media have focused much attention on the possibility of people interacting with a computer through PC dictation systems, it is clear that one of the big winners from advances in speech recognition will be the call center industry.
Reaching New Classes of Callers
Speech recognition is gaining a toehold in call centers as a tool customers use to reach the proper person or extract information from a host database. By itself, speech recognition doesn't add new functionality - callers have made automated choices for years by pushing buttons. But the ability to speak your selections adds new callers: those with rotary phones, those who are mobile and those so pressed for time they can't be bothered to do anything but speak. The call center then processes those callers using the same tools it has used for years. The same benefits flow from speech recognition as from interactive voice response (IVR): fewer calls going to an agent, shorter calls and more self-service.
The most active users of this technology have been financial services firms. These companies are the early adopters of many call center technologies, especially the Internet. That's because their businesses focus on providing repetitive, data-centric transactions in a compressed time frame. The input a speech recognition system processes in these transactions is narrowly defined - sequences of digits for things such as account numbers, phone numbers, Social Security numbers or passwords.
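To make the digit-sequence case concrete, here is a minimal sketch in Python of what happens after the recognizer has done its work. It assumes the recognizer hands back a plain string of recognized words; the names WORD_TO_DIGIT, words_to_digits and looks_like_account_number, and the eight-digit account length, are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup table: spoken digit words to digit characters.
# "oh" is included because callers often say it in place of "zero".
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def words_to_digits(transcript: str) -> str:
    """Turn recognized digit words into a digit string.

    Raises ValueError if any word falls outside the digit vocabulary,
    so the application can re-prompt the caller.
    """
    digits = []
    for word in transcript.lower().split():
        if word not in WORD_TO_DIGIT:
            raise ValueError(f"not a digit word: {word!r}")
        digits.append(WORD_TO_DIGIT[word])
    return "".join(digits)


def looks_like_account_number(digits: str, expected_length: int = 8) -> bool:
    """Assumed rule for this sketch: an account number is a fixed-length digit string."""
    return len(digits) == expected_length and digits.isdigit()


if __name__ == "__main__":
    spoken = "four oh two one seven seven three nine"
    account = words_to_digits(spoken)
    print(account, looks_like_account_number(account))
```

Because the expected input is so constrained, anything outside the digit vocabulary can simply be rejected and the caller asked to repeat it.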
There are two distinct types of speech recognition - speaker-dependent and speaker-independent. The two diverge wildly in the types of things they are good at, and the types of systems needed to make them run.
Most of the attention in the world at large has been paid to speaker-dependent recognition. In this type of system, the user trains the computer to understand the patterns in his or her own speech. By training the system over time, a user can teach it to understand a very broad vocabulary of words, and it can approach 98 percent accuracy in transcription with certain types of text.
Call Centers: A Speaker-Independent World
That level of accuracy is impressive, but it has little or nothing to do with call centers, where applications must be speaker-independent. Call centers handle calls from a vast number of people, all with different voices and speech patterns. But speaker-independent recognition is harder to accomplish, requiring more resources than a personal dictation system. For that reason, call center systems tend to be less inclusive, with smaller vocabularies. What you want is a system that responds to the likely inputs - common words such as yes, no, help and operator as well as digits and letters of the alphabet.
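A small vocabulary of likely inputs might be handled along the following lines. This is only a sketch in Python under stated assumptions: the recognizer is assumed to return a list of (word, score) guesses with scores between 0 and 1, and the names VOCABULARY, pick_in_vocabulary and min_score are hypothetical rather than taken from any particular product.

```python
import string

# The likely inputs named above: a few command words, digits and letters.
COMMANDS = {"yes", "no", "help", "operator"}
DIGIT_WORDS = {"zero", "oh", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"}
LETTERS = set(string.ascii_lowercase)
VOCABULARY = COMMANDS | DIGIT_WORDS | LETTERS


def pick_in_vocabulary(hypotheses, min_score=0.5):
    """Return the best-scoring in-vocabulary word, or None.

    `hypotheses` is a list of (word, score) pairs, an assumed output
    format for the recognizer. Words outside the vocabulary and
    guesses scoring below `min_score` are ignored.
    """
    candidates = [
        (score, word.lower())
        for word, score in hypotheses
        if word.lower() in VOCABULARY and score >= min_score
    ]
    if not candidates:
        return None  # the caller should be re-prompted or sent to an agent
    return max(candidates)[1]


if __name__ == "__main__":
    guesses = [("yeah", 0.9), ("yes", 0.8), ("no", 0.3)]
    print(pick_in_vocabulary(guesses))
```

Restricting the search to a few dozen words is what lets a speaker-independent system cope with many different voices: a guess such as "yeah" is discarded because it is not in the vocabulary, however confident the recognizer is.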
Internationally, there is still a large installed base of potential callers who cannot access IVR, typically because they lack touch-tone phones. These callers are expensive to process when they reach a call center because they must be held in queue until an agent is ready.
Speech recognition will become an increasingly important technology for call centers over the next few years. The centers that prosper will be those that manage the paradoxical mission of simultaneously reducing costs and improving service.
Speech recognition is expensive to develop, but once it is done, it's done forever. In 1998, costs dropped as better tool kits became available for creating speech applications, and the vocabularies those applications could process grew.
Maintaining a speech recognition system involves few of the headaches associated with computer telephony integration or other "fancy" call center technologies. Once you tease meaning out of the speech, it becomes just like information entered through the Web or a touch-tone phone.
Call centers that add value to the customer are the most valuable. The more people you can encourage to call, the better off you are. Capture data about them. Learn their likes and dislikes. And leave them with a positive experience to tell others about.
That's the essence of speech recognition - a simple front-end tool without many bells and whistles. Just as customers came to accept IVR as their entry point into a call center, they will accept speech recognition as a natural way to connect with a company.