ODWN, OMW: issues when dealing with spoken languages, but especially also with sign languages

Ineke Schuurman

KU Leuven

Bram Vanroy

KU Leuven

Vincent Vandeghinste

Instituut voor de Nederlandse Taal / KU Leuven

Caro Brosens

Vlaams GebarentaalCentrum

Margot Janssens

Vlaams GebarentaalCentrum

Thierry Declerck

Deutsches Forschungszentrum für Künstliche Intelligenz GmbH

Sam Bigeard

Universität Hamburg

WordNets, lexical databases that link words based on their semantic relations, exist for many spoken languages (SpLs), but they are scarce for sign languages (SLs). Work is ongoing to link signs of several SLs to WordNets, and to link them with other SLs through the Open Multilingual WordNet (OMW). We are taking the first steps towards creating SignNets for VGT (Vlaamse Gebarentaal; Flemish Sign Language) and, to a lesser extent, for NGT (Nederlandse Gebarentaal; Dutch Sign Language). In doing so, we ran into a number of issues. In part, they concern general semantic and regional variation in spoken Dutch that is difficult to map to WordNets and SignNets. Several of them are related to the quality of Open Dutch WordNet (ODWN), which was created through a forced transition from Cornetto. Many adjectives, for example, are lacking or are treated as nouns. ‘Zwart’ (black) is only recognized as a noun, in a synset (set of synonyms) with ‘moor’ and other (derogatory) words that denote black people. ‘Rood’ (red) is likewise only a noun, and ‘rood’ and ‘keel’ (throat) constitute a synset. The corresponding adjectives are simply missing.

Several synsets are also incomplete. Words used in Flanders (in newspapers and in news on radio and TV) are often lacking: for ‘microwave’, for example, only ‘magnetron’ is listed, not ‘microgolf’. Nor are all senses of polysemous words included. ‘Lopen’ can mean ‘to walk’ and ‘to run’, but the second reading, which is common in Flanders, is lacking. Conversely, ‘voormiddag’ and ‘namiddag’ are only listed in their Flemish usage, whereas in the Netherlands a different nuance in terms of time frame is often intended.

These shortcomings matter to us when designing the framework for SignNets, but they are also of interest to people working on the semantics of spoken languages.

When dealing with SLs, there are other hurdles to clear as well. It is, for example, not unusual to have to split the meaning of a word into one meaning for the spoken language and one for the sign language. Many SLs have different signs for ‘applause’ (hearing people) and ‘applause’ (deaf people), whereas most SpLs do not yet have words to distinguish between the two. ‘Handgeklap’ (clapping) belongs only to the synset of the first type of applause. The second type cannot be named, unless a special identifier such as ‘applaus(doof)’ is used. In other cases, existing synsets may need to be split up so that signs can be linked to appropriate synsets. This is quite often the case when iconic signs are involved, for example when it matters whether a movement is vertical or horizontal (vertical: ‘een schilderij aan de muur hangen’ (to hang a painting on the wall) vs. horizontal: ‘een aanhangwagen aan de auto hangen’ (to hitch a trailer to the car)). Such a distinction often results in two or more signs.
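As an illustration only, the splitting of a synset described above could be sketched as follows. This is a minimal sketch and not the project's actual data model: the `Synset` class, the `split_synset` helper, and all synset and sign identifiers are hypothetical; only the Dutch examples and the ‘applaus(doof)’ identifier are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A minimal, hypothetical synset: identifier, gloss, lemmas, linked signs."""
    synset_id: str
    gloss: str
    lemmas: list[str] = field(default_factory=list)
    sign_ids: list[str] = field(default_factory=list)


def split_synset(original: Synset, parts: dict[str, dict]) -> list[Synset]:
    """Derive narrower synsets from one synset; the original is left unchanged.

    `parts` maps a suffix (e.g. "vertical") to the gloss, lemmas and sign IDs
    of the narrower synset. Lemmas default to those of the original synset.
    """
    return [
        Synset(
            synset_id=f"{original.synset_id}#{suffix}",
            gloss=spec["gloss"],
            lemmas=list(spec.get("lemmas", original.lemmas)),
            sign_ids=list(spec.get("sign_ids", [])),
        )
        for suffix, spec in parts.items()
    ]


# Hypothetical identifiers; the example phrases come from the text.
hangen = Synset("hypo-hangen-v", "to hang something onto something", ["hangen"])

vertical, horizontal = split_synset(
    hangen,
    {
        "vertical": {
            "gloss": "een schilderij aan de muur hangen",
            "sign_ids": ["SIGN-HANG-VERTICAL"],
        },
        "horizontal": {
            "gloss": "een aanhangwagen aan de auto hangen",
            "sign_ids": ["SIGN-HANG-HORIZONTAL"],
        },
    },
)

# The second type of applause has no Dutch word, so its only "lemma"
# is the special identifier mentioned in the text.
applause_hearing = Synset(
    "hypo-applaus-n#hearing", "applause (hearing people)", ["applaus", "handgeklap"]
)
applause_deaf = Synset(
    "hypo-applaus-n#deaf", "applause (deaf people)", ["applaus(doof)"]
)
```

In this sketch each narrower synset keeps the spoken-language lemma but is linked to a single sign, so that a sign is never attached to a synset that is broader than its meaning.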

In this submission we will discuss the challenges we face and the course of action we take to mitigate them, both for WordNet and for the new SignNets.

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023