Better models and neural networks power improved Google voice search
Google has announced that its research labs have developed new models to power Google’s voice search and dictation on smartphones, built on Recurrent Neural Networks (RNNs) trained with Connectionist Temporal Classification (CTC) and sequence discriminative training techniques. The RNN models replace the Deep Neural Networks (DNNs) that Google adopted in 2012, which in turn replaced the Gaussian Mixture Model (GMM) that had been the industry standard for the previous 30 years. Google says the new technology produces more accurate results, even in noisy environments, and produces those results much more quickly.
The new RNNs that Google is using contain feedback loops, which let the model capture temporal information: older models only looked for the presence of sounds, while the new technology also accounts for where a sound occurs in time relative to other sounds. Google also indicates the new RNNs use memory cells and a gating mechanism to decide which sound information to remember, which helps improve accuracy.
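The “memory cells and gating mechanism” described here are the hallmark of Long Short-Term Memory (LSTM) style cells. As a rough illustration of that idea, here is a minimal single-cell sketch in Python with NumPy; the weights, sizes, and inputs are invented for the example and are not Google’s actual model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of an LSTM-style cell: gates decide what the memory
    cell keeps, forgets, and exposes at each time step."""
    z = W @ x + U @ h_prev + b                     # all four gate pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate memory content
    c = f * c_prev + i * g                         # memory cell: keep old, add new
    h = o * np.tanh(c)                             # gated output, fed back next step
    return h, c

# Illustrative sizes: 13-dim acoustic features, 8 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 13, 8
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for frame in rng.standard_normal((20, n_in)):      # 20 fake audio frames
    h, c = lstm_step(frame, h, c, W, U, b)         # the feedback loop over time
```

The feedback loop is what carries context: each frame’s output depends on the hidden state left behind by every frame before it.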
Once the RNN recognizes the sounds in a given utterance, Google runs the output through CTC. This step lets the technology recognize the sounds in an utterance without having to predict a sound at every instant; instead, the software builds a sequence of “spikes,” each marking a sound it detected in the waveform, that can then be processed.
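The article does not detail how those spikes become text, but the standard greedy CTC decoding rule is simple: take the most likely label at each frame, merge consecutive repeats, and drop the “blank” symbol the model emits while it is undecided. A minimal sketch, with an invented frame sequence:

```python
# Greedy CTC decoding: collapse per-frame outputs into a label sequence.
# The frame labels below are invented; a real model emits a probability
# distribution over labels (plus blank) at every frame.
BLANK = "_"

def ctc_collapse(frame_labels):
    """Merge consecutive repeats, then drop blanks -- the CTC rule that
    lets the model emit one 'spike' per sound, whenever it is confident,
    instead of labeling every single frame."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

# Mostly-blank frames with occasional spikes; a blank between two
# identical labels keeps them both (the double 'l' here).
frames = ["_", "_", "h", "h", "_", "e", "e", "_", "l", "l", "_", "l", "_", "o", "_"]
print(ctc_collapse(frames))   # ['h', 'e', 'l', 'l', 'o']
```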
One of the challenges Google researchers faced was making all of this happen in real time, or as close to it as possible. One change they incorporated was to train the models on larger audio chunks than the 10-millisecond frames traditional methods employed, which meant fewer computations had to be performed, making everything run faster. The researchers also added artificial noise and reverberation to the training data to make the technology better at coping with background noise.
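As a rough sketch of those two tricks, stacking consecutive 10 ms feature frames into bigger chunks (so the network takes fewer, larger steps) and mixing noise into the training audio might look like this; the stacking factor, feature sizes, and noise level are all assumptions for illustration:

```python
import numpy as np

def stack_frames(features, k=3):
    """Concatenate every k consecutive 10 ms frames into one wider chunk,
    so the network runs 1/k as many steps per utterance. k=3 is illustrative."""
    n = (len(features) // k) * k          # drop any leftover frames
    return features[:n].reshape(n // k, -1)

def add_noise(waveform, snr_db=10.0, rng=None):
    """Mix artificial noise into training audio at a chosen signal-to-noise
    ratio so the model learns to cope with noisy environments."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.standard_normal(len(waveform))
    signal_power = np.mean(waveform ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return waveform + scale * noise

wav = np.sin(np.linspace(0, 20 * np.pi, 1600))            # a fake 0.1 s tone
noisy = add_noise(wav, snr_db=10.0)                       # noisier training copy
feats = np.random.default_rng(1).standard_normal((100, 13))  # 100 x 10 ms frames
print(stack_frames(feats).shape)                          # (33, 39): a third the steps
```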
The final step before deploying the new technology to end users was training the model to output its predictions closer to when the speech was actually occurring. Google found that the neural network was “smart” enough to figure out that it could delay its output by around 300 ms so it could listen further ahead for additional speech signals to improve its predictions. That extra lag hurt responsiveness, so the researchers had to train it out of the model.
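The article doesn’t say how the lag was removed; one simple way to express the idea is an extra training penalty whenever the model emits a label too long after the sound occurred. This hypothetical sketch uses invented function names and an assumed frame budget:

```python
# Hypothetical sketch of the fix described above: penalize the model when
# it emits a label more than max_delay frames after the frame where that
# sound actually occurred (per some reference alignment). The 10-frame
# budget (~100 ms at 10 ms per frame) is an assumption for illustration.
MAX_DELAY = 10

def delay_penalty(emission_frame, reference_frame, max_delay=MAX_DELAY):
    """Return an extra loss term if a label's emission spike trails its
    reference time by more than the allowed delay; zero while on time."""
    delay = emission_frame - reference_frame
    return max(0, delay - max_delay)

# Example: the network 'listens ahead' and spikes 30 frames (~300 ms) late,
# so it pays for the 20 frames beyond the allowed budget.
print(delay_penalty(emission_frame=130, reference_frame=100))   # -> 20
```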
The bottom line for users is that voice searches using the Google app on both Android and iOS should perform much better now.
source: Google