Speech Recognition: Connectionism

Connectionism / Artificial Neural Networks

After considerable success in classifying speech segments as voiced/unvoiced or nasal/plosive, researchers moved on to phoneme classification, where neural networks achieved very competitive results. Connectionist approaches to the speech recognition problem fall into two broad categories: static and dynamic.

In the static approach, the network considers the whole speech segment at once and makes a single decision. The input is fed to a multilayer perceptron whose hidden units also act as feature detectors, helping it classify important classes of sounds, such as vowels or consonants, with high accuracy. The network's output is the classification decision.
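To make the static approach concrete, here is a minimal Python sketch of a multilayer perceptron with one hidden layer that classifies a whole segment as voiced or unvoiced in a single decision. It assumes each segment has already been reduced to two hypothetical features, normalised energy and zero-crossing rate, and it trains on synthetic toy data rather than real speech. The class name `TinyMLP` and the helpers `make_toy_segments` and `classify` are illustrative only, not taken from any particular system.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer; the hidden units act as learned feature detectors."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden layer: each unit responds to some combination of the input features.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output unit: probability that the whole segment is voiced.
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train(self, data, epochs=1000, lr=0.5):
        # Plain stochastic gradient descent on the cross-entropy loss.
        for _ in range(epochs):
            for x, target in data:
                h, y = self.forward(x)
                d_out = y - target  # gradient w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    # Back-propagate through hidden unit j before updating its outgoing weight.
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out


def make_toy_segments(n, seed=1):
    # Hypothetical per-segment features: (normalised energy, zero-crossing rate).
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        data.append(((0.8 + rng.gauss(0, 0.05), 0.2 + rng.gauss(0, 0.05)), 1))  # voiced
        data.append(((0.2 + rng.gauss(0, 0.05), 0.8 + rng.gauss(0, 0.05)), 0))  # unvoiced
    return data


def classify(model, features):
    # Static decision: one output for the entire segment.
    _, p_voiced = model.forward(features)
    return "voiced" if p_voiced >= 0.5 else "unvoiced"


model = TinyMLP(n_in=2, n_hidden=3)
model.train(make_toy_segments(20))
print(classify(model, (0.8, 0.2)))
print(classify(model, (0.2, 0.8)))
```

Voiced sounds tend to have high energy and a low zero-crossing rate, so the toy classes are easy to separate, and the three hidden units learn whichever combinations of the two inputs best split them. A real system would use richer spectral features and many more hidden units.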

The dynamic approach uses methods such as Time Delay Neural Networks (TDNNs) and Recurrent Neural Networks. Here the network makes local decisions, each based on a small window of the input rather than on the complete segment as in static methods, and these local decisions are then integrated into a global decision. While the static approach gives good results for phoneme classification, the dynamic approach fares better with words and sentences.
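The local-then-global idea behind the dynamic approach can be sketched as follows. This minimal Python sketch applies the same small set of weights (one kernel per class) at every time shift of a 1-D feature track, in the way a TDNN shares its weights across time, turns each window's scores into local class probabilities, and then averages them into one global decision. The hand-set kernels, the two toy class labels and the use of averaging as the integration rule are all assumptions made for illustration; a real TDNN learns its weights and stacks several such layers.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_decisions(frames, kernels, width):
    """Apply the same (shared) weights at every time shift, TDNN-style."""
    decisions = []
    for t in range(len(frames) - width + 1):
        window = frames[t:t + width]  # small local view of the input
        scores = [sum(w * f for w, f in zip(kernel, window)) for kernel in kernels]
        decisions.append(softmax(scores))  # local class probabilities
    return decisions


def global_decision(frames, kernels, labels, width=3):
    local = local_decisions(frames, kernels, width)
    # Assumed integration rule: average each class probability over time.
    n_classes = len(labels)
    avg = [sum(d[c] for d in local) / len(local) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return labels[best], avg


frames = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9]  # hypothetical per-frame feature track
kernels = [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]]  # toy "rising" and "falling" detectors
label, probs = global_decision(frames, kernels, ["rising", "falling"])
print(label, probs)
```

Because each window only sees three frames, no single local decision covers the whole input; the global label emerges from combining them, which is what lets this style of network scale from short units toward longer stretches of speech.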

Current Scenario

One early goal of speech recognition was to do the work of a medical transcriptionist. Unsurprisingly, that was not possible at the time, given the infrastructure and the limited state of the technology, but speech recognition (SR) now seems to be gathering steam among developers again, especially in the military. Several militaries are investing heavily in the technology, not only for medical purposes but also to gain a tactical edge in combat systems. Fighter jet cockpits are being fitted with SR systems that let the pilot perform various non-critical tasks by voice command, and their performance under high g-loads is being tested and improved.

The US F-16, the French Mirage, the Eurofighter Typhoon (flown by the UK, among others) and the Swedish Gripen are all examples of aircraft in which such technology is being deployed. In helicopters, the pilot seldom wears a face mask, so more background noise reaches the microphone and the SR system has a harder time interpreting commands, which makes building a robust system an additional challenge. Air traffic control (ATC) training systems use such techniques to stand in for the pilots who would otherwise have to interact with ATC trainees, reducing the scarce personnel needed for such tasks. Microsoft’s Tellme and Yahoo’s oneSearch have steadily improved their voice search capabilities in some parts of the world. Interactive voice response (IVR) systems are constantly evolving, as are desktop systems with SR capabilities.

Modern vehicles are being fitted with SR systems for improved accessibility. Ibn Sina, a multilingual talking humanoid, is being developed at an advanced research lab in the UAE. The horizons widen with each improvement, and a day may soon come when a tin box actually holds a heart-to-heart conversation with you.