LT3, Language and Translation Technology Team, Ghent University
The advent and popularisation of Large Language Models (LLMs) have given rise to prompt-based NLP techniques which eliminate the need for large manually annotated corpora and computationally expensive training or fine-tuning processes. Zero-shot learning (ZSL) in particular presents itself as an attractive alternative to the classical train-development-test paradigm for many downstream tasks as it provides a quick and inexpensive way of leveraging implicitly encoded knowledge in LLMs.
Despite the large interest in zero-shot applications within the domain of NLP as a whole, there is often no consensus on the methodology, analysis and evaluation of zero-shot pipelines. As a tentative step towards finding such a consensus, this work provides a detailed overview of available methods, resources, caveats and evaluation strategies for zero-shot prompting within the Dutch language domain.
Additionally, we present a centralised zero-shot benchmark for a large variety of Dutch NLP tasks using a series of standardised data sets and unified evaluation strategies. To ensure that this benchmark is representative, we investigate a selection of diverse prompting strategies and methodologies for a variety of state-of-the-art Dutch Natural Language Inference models, masked language models (BERTje, RobBERT) and autoregressive language models (Dutch GPT2, Flan-T5). As evaluation tasks, we include span detection (aspect extraction, event detection, event argument extraction) and classification tasks (sentiment analysis, emotion detection, irony detection, news topic classification and die/dat prediction). These tasks also vary in subjectivity and domain, ranging from more subjective tasks (emotion and irony detection for social media) to more factual ones (topic classification and event detection for news).
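The NLI-based zero-shot setup mentioned above can be sketched as follows: each candidate label is cast as a hypothesis sentence, and an entailment model scores that hypothesis against the input text (the premise); the label whose hypothesis is most strongly entailed is predicted. The Dutch hypothesis template and example labels below are illustrative assumptions, not the exact prompts or models used in the benchmark.

```python
# Minimal sketch of NLI-based zero-shot classification prompting.
# The hypothesis template and labels are illustrative assumptions,
# not the paper's exact setup.

def build_nli_pairs(text, labels, template="Deze tekst gaat over {}."):
    """Pair the input text (premise) with one hypothesis per candidate label.

    Each (premise, hypothesis) pair would then be scored by an NLI model,
    and the label with the highest entailment probability wins.
    """
    return [(text, template.format(label)) for label in labels]


pairs = build_nli_pairs(
    "De regering kondigde nieuwe klimaatmaatregelen aan.",
    ["politiek", "sport", "economie"],
)
for premise, hypothesis in pairs:
    print(premise, "=>", hypothesis)
```

In practice such pairs are fed to a pretrained (Dutch) NLI model; because only the label set and template change per task, no task-specific training data is required.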