Speech recognition is the ability of a computer system to understand human speech and convert it into text or another machine-readable format. For example, with a speech-recognition-enabled car music system, you can change the song by voice command while driving.
Speech is a complex, dynamic process carried by a continuous audio stream, with no clear separation between its components. These components include language, vocabulary, punctuation, speaking speed, tone, and accent, and some of them are difficult to measure independently. This makes speech recognition systems complex to design.
Many speech recognition systems use Hidden Markov Models (HMMs) to handle this complexity. An HMM treats speech as a sequence of short segments, on the order of ten milliseconds each, within which the signal is assumed to be approximately stationary. Speech recognition models also need to be continually trained and optimized as more data is acquired over time.
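To make the HMM idea concrete, here is a minimal sketch of Viterbi decoding, the standard way to find the most likely hidden state sequence in an HMM. Everything specific in it is an assumption made for illustration: the two states ("silence" and "speech"), the "low"/"high" energy labels standing in for per-frame observations, and all of the probabilities. A real recognizer works with acoustic feature vectors and far more states, so treat this as a toy, not as how any particular system is built.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability) for a discrete HMM.

    Assumes every probability used is non-zero, so math.log never sees 0.
    """
    # best[t][s]: log-probability of the best path that ends in state s at frame t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: the previous state on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: all names and numbers below are illustrative.
states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

# One coarse energy label per short frame of audio.
frames = ["low", "low", "high", "high", "low"]

path, log_prob = viterbi(frames, states, start_p, trans_p, emit_p)
print(path)
```

For these made-up numbers the decoder labels the two "high" frames as speech and the rest as silence. Working in log-probabilities avoids numerical underflow when the frame sequence is long, which matters for real audio with hundreds of frames per second of speech.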
Speech recognition systems are adding a new dimension to Human-Computer Interaction (HCI). They also play a vital role in assisting differently-abled people with daily activities: for example, a person without hands can operate many devices by voice.
Application areas of speech recognition include speech-to-text converters and digital voice assistants. Well-known voice assistants include Apple’s Siri, Amazon Alexa, Google Assistant, and Microsoft’s Cortana.