TT-Streaming: RPC API for Transcription and Translation of Live Audio Broadcasts
Artificial intelligence technologies for language processing, such as automatic speech recognition (ASR) and machine translation (MT), are key, above all, to guaranteeing universal accessibility of audiovisual content on a large scale. They can be applied both to on-demand (off-line) content and to live broadcast (on-line or streaming) content of any type (e.g. informative, sports, educational, entertainment).
The automatic generation of multilingual subtitles provides great benefits beyond accessibility, for example content indexing, semantic search and recommendation, and automatic generation of summaries. These technologies are also an essential requirement for the operation of chatbots, personal assistants (e.g. smart speakers) and voice control systems (ubiquitous in vehicles). The deployment of these technologies in educational repositories, television broadcasters, and content management platforms (CMS), for audiovisual content consumed on demand (off-line), is already covered by the group’s software ‘S-17943-2016 TLP: The transLectures-UPV Platform. Multilingual subtitling and text translation for MOOCs and media repositories’. This software is currently in operation, offering automatic multilingual subtitling services to many national and international organizations and institutions, including the institutional repository UPV[Media] (poliMedia). However, it is not designed for processing audiovisual content that is broadcast live, nor for use in contexts that require an immediate response from the system, such as personal assistants, chatbots, or voice control systems.
The software ‘TT-Streaming: RPC API for transcription and translation of live audio streams’ responds to the growing interest in, and need for, transcription and translation services that work in real time or with a fast response. It is offered as SaaS (Software as a Service) and implements an API (Application Programming Interface) based on the standard RPC (Remote Procedure Call) protocol. This API allows continuous audio streams to be transcribed and translated in real time, internally using the transcription (ASR) and translation (MT) systems of the MLLP-VRAIN group.
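As a rough illustration of the streaming pattern described above, the following Python sketch splits a continuous audio signal into short chunks and passes each one to a remote procedure, collecting partial results as they arrive. It does not reproduce the actual TT-Streaming API: the function names, the audio format (16 kHz, 16-bit mono PCM), the 250 ms chunk size, and the `fake_rpc` stand-in are all assumptions made for this example.

```python
from typing import Callable, Iterable, Iterator

# Assumed audio format, chosen only for this illustration.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
CHUNK_MS = 250        # hypothetical chunk duration


def chunk_audio(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Split raw PCM audio into consecutive chunks of about chunk_ms each."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def stream_transcribe(
    chunks: Iterable[bytes],
    rpc_call: Callable[[bytes], str],
) -> Iterator[str]:
    """Send each chunk to a remote procedure and yield non-empty results."""
    for chunk in chunks:
        text = rpc_call(chunk)
        if text:
            yield text


def fake_rpc(chunk: bytes) -> str:
    """Hypothetical stand-in for the remote call.

    It reports the chunk duration instead of returning real transcribed text.
    """
    ms = len(chunk) * 1000 // (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    return f"<{ms} ms of audio>"


# One second of silence, processed chunk by chunk.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
results = list(stream_transcribe(chunk_audio(one_second), fake_rpc))
```

In a real client, `fake_rpc` would be replaced by a call to the service's RPC stub, and the audio would come from a live source rather than an in-memory buffer.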
These systems are at the forefront of the technology and clearly surpass, in quality, the ASR systems of Google Speech-To-Text Cloud in tasks representative of the real world, such as television content (RTVE, À Punt), educational content (poliMèdia, VideoLectures.NET), and informative content (TED), in different languages: Valencian, Spanish, English and Slovenian. This software will therefore allow the UPV to offer cutting-edge real-time transcription and translation services at low cost, a capability that today is offered directly only by a small group of large technology companies (Google, Microsoft).