Converting speech into text is a complex task usually associated with machine learning or deep learning. It involves a program that identifies words and phrases spoken in a specific language and converts them into text.
However, there is a simpler route: for a high school Python course, you can use the Google Speech Recognition API. It requires only a basic knowledge of Python and involves a simple process.
Siri, Google Assistant, Cortana, and other virtual assistants that interact through voice use this technology, and its use is likely to become more widespread in the coming years.
These speech recognition technologies, built into many premium products, help elderly people and people with physical or visual impairments interact with devices conveniently.
What You Need for Transcribing
To complete this tutorial, you need the following Python libraries installed on your desktop or laptop:
- PyAudio library
- SpeechRecognition library
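Before moving on, it can help to confirm that both libraries are importable. The snippet below is a minimal sketch that uses only the standard library. It assumes the usual import names `speech_recognition` and `pyaudio` (the packages are published on PyPI as SpeechRecognition and PyAudio); the helper name `missing_modules` is made up for this illustration.

```python
import importlib.util

# Import names of the two libraries this tutorial relies on.
# On PyPI they are published as "SpeechRecognition" and "PyAudio".
REQUIRED_MODULES = ["speech_recognition", "pyaudio"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If a module is reported as missing, installing the corresponding package with pip should resolve it.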
With the SpeechRecognition library, you can recognize speech with the help of several online and offline engines and APIs. The supported engines are:
- CMU Sphinx, which works offline
- Google Speech Recognition
- Google Cloud Speech API
- Houndify API
- Microsoft Bing Voice Recognition
- IBM Speech to Text
- Snowboy Hotword Detection, which also works offline
The Google Speech Recognition API is free for basic use, so you can use it for your high school Python course. However, it may limit the number of requests that you can send during a specific period. In the following tutorial, the Google Speech Recognition API is used to perform speech recognition. The audio can be fed either directly from the microphone or from an audio file.
- Speech recognition from microphone
You record the audio from the microphone using PyAudio and send it to the Google speech-to-text recognition engine. The engine performs the recognition and returns the transcribed text.
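The microphone workflow described above could look roughly like the following. This is a sketch, not a definitive implementation: it assumes the SpeechRecognition library (imported as `speech_recognition`) with PyAudio installed, and a working default microphone. The function name `transcribe_from_microphone` and the `language` default are choices made for this example.

```python
def transcribe_from_microphone(language="en-US"):
    """Record one phrase from the default microphone and return its text.

    Returns None when the speech could not be understood.
    """
    # Imported inside the function so that this sketch can be loaded
    # even before the SpeechRecognition package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # Microphone requires PyAudio
        # Sample the background noise briefly to calibrate the recognizer.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Say something...")
        audio = recognizer.listen(source)  # records until a pause is detected

    try:
        # Sends the recording to the Google Speech Recognition API.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the engine could not make out any words
    except sr.RequestError as error:
        # Raised when the API is unreachable or rejects the request.
        raise RuntimeError(f"Speech service request failed: {error}") from error
```

Calling `transcribe_from_microphone()` waits for one spoken phrase and returns the recognized text as a string, or `None` if nothing intelligible was heard.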
- Speech recognition from audio file
In this process, speech recognition is performed on an audio file, and you have to change only one line of code. Instead of using a microphone as the audio source, you give the path to the audio file that you want to transcribe to text.
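A file-based version might be sketched as follows, again assuming the SpeechRecognition library. The only real change from a microphone source is that `sr.AudioFile(path)` supplies the audio, and `record` reads it. The function name and the example file name `interview.wav` are hypothetical.

```python
def transcribe_audio_file(path, language="en-US"):
    """Transcribe a whole audio file and return its text.

    Returns None when the speech could not be understood.
    """
    # Imported inside the function so that this sketch can be loaded
    # even before the SpeechRecognition package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # AudioFile reads WAV, AIFF, and FLAC files.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file into memory

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the engine could not make out any words
    except sr.RequestError as error:
        raise RuntimeError(f"Speech service request failed: {error}") from error
```

For example, `transcribe_audio_file("interview.wav")` would return the transcript of that file, provided the file exists and the API is reachable.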
- Speech recognition from a long audio source
In the case of very long audio, the process can slow down if you have to load the whole audio into memory and send it over the API. Instead, you can split the long audio source into small chunks and perform speech recognition on the individual chunks.
You can use PyDub to split the long audio source into small chunks. PyDub can be installed with pip.
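One way to apply the chunking idea is sketched below. It assumes PyDub (installed with `pip install pydub`) and the SpeechRecognition library. The helper names `chunk_ranges` and `transcribe_long_audio` and the 30-second default chunk length are arbitrary choices for this example, not values prescribed by either library.

```python
import io


def chunk_ranges(total_ms, chunk_ms):
    """Return (start, end) millisecond pairs that together cover total_ms."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


def transcribe_long_audio(path, chunk_ms=30_000, language="en-US"):
    """Split an audio file into fixed-length chunks and transcribe each one."""
    # Imported lazily so chunk_ranges works without the third-party packages.
    import speech_recognition as sr
    from pydub import AudioSegment

    recognizer = sr.Recognizer()
    audio = AudioSegment.from_file(path)  # len(audio) is the duration in ms
    pieces = []
    for start, end in chunk_ranges(len(audio), chunk_ms):
        # Export the slice as WAV into memory instead of a temporary file.
        buffer = io.BytesIO()
        audio[start:end].export(buffer, format="wav")
        buffer.seek(0)
        with sr.AudioFile(buffer) as source:
            chunk_audio = recognizer.record(source)
        try:
            pieces.append(
                recognizer.recognize_google(chunk_audio, language=language))
        except sr.UnknownValueError:
            continue  # skip chunks with no intelligible speech
    return " ".join(pieces)
```

Fixed-length cuts are simple but can split a word across two chunks. Cutting at pauses instead, for instance with PyDub's silence utilities, is a possible refinement.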
Now that you know how to incorporate speech recognition into a project for your high school Python course, you will see that it is easier than you might have expected. Speech recognition systems are of great help to elderly and physically challenged individuals in many households, and their use may increase considerably in the future.