Neural network-based models for speech recognition systems (CA, SL, EN, ES, DE, PT, IT, NL, FR)
These are automatic speech recognition (ASR) models for Catalan/Valencian (CA), Slovenian (SL), English (EN), Spanish (ES), German (DE), Portuguese (PT), Italian (IT), Dutch (NL) and French (FR). For each language, an acoustic model and a language model are included. They support automatic transcription and subtitling of video and audio files and signals, both offline (on a deferred basis) and in real time.

They are state-of-the-art models, comparable to those used in the best current commercial systems from large technology companies. They have been trained on large amounts of data from different sources (educational, television, journalistic, legal, etc.), which yields highly accurate results across a range of domains. Our systems based on these models have won first place in international competitions (RTVE-IberSpeech TV Speech-to-Text Challenge 2018; Conference on Machine Translation, WMT18 and WMT19). The MLLP-VRAIN group of the UPV can customize the models by adapting them to the client’s domain to further increase accuracy.

From a technical point of view, the acoustic models are hybrids that combine continuous hidden Markov models (HMMs) with deep neural networks (DNNs); the language models are statistical, neural, or combinations of both. All models are fully trained. The automatic speech recognition software (registered at UPV, ref. S-19912-2018) requires an acoustic model and a language model for the same language in order to transcribe speech in that language.
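To make the hybrid architecture more concrete, here is a minimal Python sketch of how such components are conventionally combined when scoring one candidate transcription. It assumes the textbook hybrid approach (DNN state posteriors divided by state priors to obtain scaled emission likelihoods), linear interpolation as one possible way of combining a statistical n-gram LM with a neural LM, and a log-linear LM scale factor. All class names, fields, the default `lm_scale` and the toy probabilities are hypothetical; they illustrate the general technique and do not describe the internals or API of the registered UPV software.

```python
import math
from dataclasses import dataclass


@dataclass
class HybridAcousticModel:
    """Hypothetical stand-in for a hybrid HMM/DNN acoustic model."""
    language: str
    # P(s): prior of each HMM state, e.g. its relative frequency in training alignments
    state_priors: dict[str, float]

    def scaled_log_likelihood(self, state: str, posterior: float) -> float:
        # The DNN estimates P(s | x) for a frame x; dividing by P(s) gives
        # p(x | s) / p(x), i.e. the HMM emission likelihood up to a factor that
        # is the same for every state and can be ignored when comparing hypotheses.
        return math.log(posterior) - math.log(self.state_priors[state])


@dataclass
class InterpolatedLanguageModel:
    """Hypothetical bigram LM that linearly interpolates an n-gram and a neural model."""
    language: str
    ngram: dict[tuple[str, str], float]   # P_ngram(word | previous word)
    neural: dict[tuple[str, str], float]  # P_nn(word | previous word), precomputed here
    weight: float = 0.5                   # interpolation weight given to the n-gram model

    def log_prob(self, prev: str, word: str) -> float:
        key = (prev, word)
        p = self.weight * self.ngram[key] + (1.0 - self.weight) * self.neural[key]
        return math.log(p)


def score_hypothesis(am, lm, alignment, words, lm_scale=10.0):
    """Combined log score of one word sequence given a fixed frame-to-state alignment.

    alignment: list of (hmm_state, dnn_posterior) pairs, one per acoustic frame.
    A real decoder searches over alignments and word sequences (e.g. Viterbi beam
    search); this sketch only scores a single, already aligned hypothesis.
    """
    if am.language != lm.language:
        # Mirrors the requirement that both models correspond to the same language.
        raise ValueError(f"language mismatch: AM={am.language!r}, LM={lm.language!r}")
    acoustic = sum(am.scaled_log_likelihood(s, p) for s, p in alignment)
    history = ["<s>"] + words
    linguistic = sum(lm.log_prob(prev, w) for prev, w in zip(history, words + ["</s>"]))
    # Log-linear combination: lm_scale balances the dynamic ranges of the two models.
    return acoustic + lm_scale * linguistic


# Toy usage with made-up probabilities (hypothetical values, for illustration only).
am = HybridAcousticModel(language="en", state_priors={"h": 0.5, "i": 0.25})
lm = InterpolatedLanguageModel(
    language="en",
    ngram={("<s>", "hi"): 0.2, ("hi", "</s>"): 0.5},
    neural={("<s>", "hi"): 0.4, ("hi", "</s>"): 0.5},
)
print(score_hypothesis(am, lm, [("h", 0.5), ("i", 0.5)], ["hi"], lm_scale=1.0))  # ≈ -1.204 (= ln 0.3)
```

Linear interpolation is only one option for combining statistical and neural LMs; another common choice is to decode with the n-gram model and rescore the resulting n-best lists or lattices with the neural model.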