Center for Language and Speech Technology (CLST), Radboud University, Nijmegen, The Netherlands
The EU SignON project aims to bridge the communication gap between deaf, hard-of-hearing, and hearing individuals by developing a user-centered, community-driven platform. The platform facilitates communication between sign languages and spoken languages, including English, Spanish, Dutch, and Irish. Users can interact with the system via text, speech, or signed video, and the system responds with translated output such as subtitles, synthesized speech, or a 3D avatar. Automatic speech recognition (ASR) is provided through a RESTful ASR webservice that interacts with the SignON framework. Initially we employed lightweight, low-latency Kaldi ASR models, which require large amounts of training data. In recent years, however, pretrained models such as wav2vec 2.0 XLS-R have gained prominence in ASR because they can be fine-tuned with far less data while achieving lower word error rates. More recently, Whisper has offered a universal ASR model that supports over 50 languages and delivers strong performance. Recognizing the need for an ASR webservice that freely serves fine-tuned wav2vec 2.0 models as well as all Whisper models, we developed a new ASR webservice. It gives users full control over the ASR process by letting them select the ASR model and the language to transcribe. In this presentation, I will discuss the impact of our work in the SignON project and introduce the newly developed ASR webservice for end-to-end ASR models, highlighting its benefits and the areas that still require improvement.
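To illustrate the kind of per-request control described above, the sketch below builds an HTTP transcription request in which the client selects the ASR model and language. The endpoint URL, header names, and model identifiers are hypothetical placeholders, not the actual SignON webservice API, which may differ.

```python
import urllib.request

# Hypothetical endpoint -- illustrative only; the real SignON ASR
# webservice exposes its own URL and parameter conventions.
ASR_URL = "https://example.org/asr/transcribe"

def build_request(audio_path: str, model: str, language: str) -> urllib.request.Request:
    """Build a POST request that lets the caller pick the ASR model and language.

    The custom X-ASR-* headers are assumed names for this sketch.
    """
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    headers = {
        "Content-Type": "audio/wav",
        "X-ASR-Model": model,        # e.g. "whisper-large" or a fine-tuned wav2vec 2.0 model
        "X-ASR-Language": language,  # e.g. "nl", "en", "es", "ga"
    }
    return urllib.request.Request(ASR_URL, data=audio_bytes,
                                  headers=headers, method="POST")
```

A client would send the resulting request with `urllib.request.urlopen` (or an HTTP library of choice) and read the transcription from the response body.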