The first attempt at speech recognition was made at least 50 years before digital computers were invented. Alexander Graham Bell, in an attempt to help his deaf wife understand what people said, tried to build a device that would turn words uttered into the machine into visual images. While he managed to produce spectrographic images of sound, his wife could not decipher them. This work, however, led to the invention of the telephone.
It was not until the advent of digital computers that further serious attempts at speech recognition were made. In 1952, Bell Labs introduced the first automatic speech recognizer, named 'Audrey'. It could recognize only the ten digits, with 97 to 99% accuracy, provided the speaker was male, paused about 350 ms between words, and Audrey had been adjusted to the user's speech profile; otherwise, accuracy fell to about 60%. The principle behind Audrey, recognizing phonemes, served as the reference model for the largely unsuccessful research of the years that followed. It was the collective work of Noam Chomsky and Morris Halle in phonology, with generative grammar's idea that language could be analyzed programmatically, that led mainstream linguistics to move beyond phonemes and break sound patterns down into smaller, more discrete features.
Years of futile research led Bell Labs to shut down further work for almost a decade. The defense research agency ARPA, however, continued funding research during that time, and under its sponsorship 'Harpy' was born at Carnegie Mellon University. Though it was slow, far from real time, and required training, Harpy could recognize connected speech from a vocabulary of about 1,000 words. Hidden Markov Models, which grew out of the same era of research, went on to become and long remain the most popular model for speech recognition.
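To illustrate the idea behind HMM-based recognition, here is a minimal sketch of Viterbi decoding over a toy hidden Markov model. The states, probabilities, and observation labels below are invented for illustration only; real recognizers of this lineage modeled phoneme states with probabilistic emissions over acoustic features, not hand-written tables.

```python
# Toy Viterbi decoder: find the most likely hidden state sequence
# given a sequence of observations. All model parameters here are
# made up for demonstration.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for the observation sequence."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Two hypothetical "phoneme" states emitting coarse acoustic labels.
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"low": 0.8, "high": 0.2}, "s2": {"low": 0.1, "high": 0.9}}

print(viterbi(["low", "low", "high"], states, start_p, trans_p, emit_p))
```

The appeal of this formulation for speech is that the hidden states can stand for sub-word units while the observations are noisy acoustic measurements, and the decoder recovers the best-scoring path through both.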
In the 1980s and 1990s, DARPA (previously ARPA) floated the same challenge with more stringent performance rules, and the results brought the word error rate down from 10% to a few percent. Another well-established school of thought held that speech recognition was basically pattern recognition, and that brain-like models, namely Artificial Neural Networks, could possibly lead to brain-like performance; thus another dimension of speech recognition research sprang up.
Microsoft released a speech recognition system compatible with Office XP. It too required training and a static environment, and worked for only a single user. Later, while Microsoft was demonstrating the speech recognition capabilities of Windows Vista, the system performed well at opening and accessing files, but was far less accurate at transcribing documents. As the field continues to thrive, more and more companies have emerged. Dragon NaturallySpeaking from Nuance is a popular speech-to-text product. Other companies competing in this technology include NICE Systems, Verint Systems, Vlingo, Unisys, ChaCha, SpeechCycle, Klausner Technologies, and Sensory.