What is Speech Recognition?

Speech recognition is the technology by which a computer recognizes human speech and then performs a voice-initiated program or function. Interpreting speech across the full range of accents, pitch, tone, articulation, nasality, vocalization and pronunciation comes easily to the human brain, but it is a hard problem for a computer. Moreover, natural voice production in humans is a non-linear process that is not only under conscious control but also varies with factors as diverse as gender, upbringing and emotional state. The resulting signal is further distorted by noise and echoes in the surrounding environment.

Another challenge is that speech is seldom discrete; it is usually a continuous stream of words, and the pauses between them are hard to discern. The classic example is to say the words "recognize speech" at varying speeds: without appropriate pauses, it can sound like "wreck a nice beach". The presence of homonyms (words that sound alike but differ in meaning) further aggravates the situation. All of this not only gives processors plenty of numbers to crunch, but also gives innovators and scientists ample food for thought in devising novel ways to improve on prevailing technologies and bring them to the state of the art.
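
The word-boundary problem can be made concrete with a small sketch. It is only an illustration under stated assumptions, not how any particular recognizer works: the four-word lexicon and the function name segmentations are hypothetical, the sound symbols are simplified ARPAbet-style labels, and the pair "ice cream" / "I scream" is used because its two readings share exactly the same sound sequence. Given a stream of sounds with no pauses marked, the code lists every way of splitting it into known words.

```python
from typing import Dict, List, Sequence, Tuple

# Toy pronunciation lexicon (an assumption for illustration only).
# Symbols are simplified ARPAbet-style labels without stress marks.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "ice": ("AY", "S"),
    "cream": ("K", "R", "IY", "M"),
    "i": ("AY",),
    "scream": ("S", "K", "R", "IY", "M"),
}


def segmentations(
    phonemes: Sequence[str],
    lexicon: Dict[str, Tuple[str, ...]] = LEXICON,
) -> List[List[str]]:
    """Return every way to split `phonemes` into words from `lexicon`."""
    if not phonemes:
        # An empty stream has exactly one segmentation: no words at all.
        return [[]]
    results: List[List[str]] = []
    for word, pron in lexicon.items():
        n = len(pron)
        # Does this word's pronunciation match the start of the stream?
        if tuple(phonemes[:n]) == pron:
            # If so, segment whatever is left and prepend the word.
            for rest in segmentations(phonemes[n:], lexicon):
                results.append([word] + rest)
    return results


# A continuous stream of sounds with no pauses marked.
stream = ("AY", "S", "K", "R", "IY", "M")
for words in segmentations(stream):
    print(" ".join(words))
```

Both word sequences fit the sounds equally well, so the audio alone cannot settle which one was meant; a practical system would need further evidence, such as the surrounding words, to choose between them.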