Dutch Language Institute
Accurate and consistent terminology is essential for effective communication within specialized domains. Terminology management systems support both domain specialists and communication professionals, such as translators, in maintaining terminological consistency. Within these systems, termbases serve as structured datasets that contain domain-specific terminology along with associated information. The compilation of termbases typically proceeds in two steps. First, candidate terms are extracted from a domain-specific text collection or corpus using an automatic term extraction tool. Second, a selection of relevant terms is imported into a termbase editor, where terminologists can organize them at the concept level and add definitions, translations, examples of contextual use, and other relevant data.
Typically, term extraction and termbase editing are implemented in separate applications. A wide range of tools and web applications is available, most of which are designed to be language-independent. However, some tools have been developed with customized functionality for specific languages. For Dutch, separate applications for term extraction (TermTreffer; van der Vliet 2015) and termbase editing (TermBeheerder) were developed in the early 2010s by two commercial companies at the request of the Union for the Dutch Language (Taalunie). Since 2016 they have been hosted by the Dutch Language Institute (INT - Instituut voor de Nederlandse Taal) in its role as Centre of Expertise for Dutch Terminology (ENT - Expertisecentrum Nederlandstalige Terminologie; Steurs 2021).
Because these two legacy applications are difficult to maintain, the Taalunie and INT have decided to develop a new, free and open-source web application that builds on the in-house corpus and database tools of the INT. The application will offer a single integrated environment for both term extraction and termbase editing, so that users do not have to switch between tools and transfer datasets (although export and import functions will still be available). Additionally, the application is designed to scale to larger corpora by leveraging the BlackLab corpus retrieval system (de Does, Niestadt & Depuydt 2017), and to allow the integration of externally developed state-of-the-art modules for term extraction, notably sequential term taggers like D-Terminer (Rigouts Terryn, Hoste & Lefever 2022). This poster presents the results of the pilot study and the resulting prototype, outlines the plans for the final release version, and invites feedback on those plans.
References:
Does, Jesse de, Jan Niestadt, and Katrien Depuydt. 2017. ‘Creating research environments with BlackLab’. In Jan Odijk and Arjan van Hessen (eds.), CLARIN in the Low Countries, 151-165. London: Ubiquity Press.
Rigouts Terryn, Ayla, Veronique Hoste, and Els Lefever. 2022. ‘D-Terminer: Online Demo for Monolingual and Bilingual Automatic Term Extraction’. In Proceedings of the TERM21 Workshop, 33-40. Language Resources and Evaluation Conference (LREC 2022).
Steurs, Frieda. 2021. ‘Centre of Expertise for Dutch Terminology: A Digital Platform for Professional Language’. Academic Journal of Modern Philology, no. 13: 315-24.
Vliet, Hennie van der. 2015. ‘TermTreffer, a term extractor for Dutch’. Poster at Computational Linguistics in the Netherlands 22, Antwerp.