LT3, Language and Translation Technology Team, Ghent University
KU Leuven
Large language models (LLMs) are likely to become an integral part of written language production. While these models can certainly assist with generating text, they cannot replace the critical thinking, creativity, and effective communication that characterise strong human writing. Of the four basic language skills, however, writing is the one that causes the most problems (Herelixka & Verhulst, 2014). In this respect, research has indicated that good corrective feedback can be beneficial for enhancing writing skills (Link, Mehrzad & Rahimi, 2022). But providing such feedback is extremely time-consuming (Godwin-Jones, 2022), which is why automated writing support systems have been extensively researched.
In this research we explore LLMs’ capabilities for writing error detection, which can be seen as a first step towards automated writing support. A reliable and accurate detection system can effectively identify errors in text, enabling targeted corrections and improving the overall quality of writing (Yuan et al., 2021). Moreover, detection has a pedagogical advantage over correction: it promotes self-correction and language learning by helping learners identify and understand their mistakes (Volodina et al., 2023).
Our work focuses on Dutch writing error detection targeting two envisaged end users: L1 and L2 adult speakers of Dutch. For the L1 speakers we could rely on an existing dataset (Deveneyns & Tummers, 2013); for the learners of Dutch we relied on an in-house dataset from the Leuven Language Institute. Following the best practices set out in the recent MultiGED shared task (Volodina et al., 2023), both datasets were split into a train and a test set. For both groups we experimented with fine-tuning and combining different monolingual (BERTje, RobBERT) and multilingual (mBERT, XLM-RoBERTa) LLMs.
Our results reveal that the multilingual LLMs are better suited to the task, which is in line with similar work on other languages. We will thoroughly present and discuss all results while zooming in on the specificities of working with either L1 or L2 data. Given the current rise of generative AI and autoregressive LLMs, we will also discuss how our results compare to prompting ChatGPT.
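To make the task concrete: the MultiGED shared task frames error detection as binary token classification ("c" for correct, "i" for incorrect tokens), scored with F0.5 so that precision weighs more than recall. The sketch below illustrates such an evaluation on a toy Dutch sentence; the helper function name and the example labels are our own, not part of the shared task code.

```python
# Minimal sketch of MultiGED-style token-level detection scoring:
# each token carries a binary label, "c" (correct) or "i" (incorrect),
# and systems are scored with F0.5, favouring precision over recall.

def detection_scores(gold, pred, positive="i", beta=0.5):
    """Precision, recall and F-beta for the 'incorrect' token class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    fbeta = ((1 + b2) * precision * recall / (b2 * precision + recall)
             if precision + recall else 0.0)
    return precision, recall, fbeta

# Toy sentence: "Ik heb gisteren naar school gegaan" (auxiliary error on "heb").
gold = ["c", "i", "c", "c", "c", "c"]   # only "heb" is marked incorrect
pred = ["c", "i", "c", "i", "c", "c"]   # system also falsely flags "naar"
p, r, f05 = detection_scores(gold, pred)
print(round(p, 2), round(r, 2), round(f05, 2))  # → 0.5 1.0 0.56
```

Because beta is 0.5, the false alarm on "naar" drags F0.5 down sharply, reflecting the pedagogical preference for not flagging correct tokens in learner feedback.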
Deveneyns, A., & Tummers, J. (2013). Zoek de fout: Een foutenclassificatie als aanzet tot gerichte remediëring Nederlands in het hoger professioneel onderwijs in Vlaanderen [Find the error: An error classification as a starting point for targeted remediation of Dutch in higher professional education in Flanders]. Levende Talen Tijdschrift, 14(3), 14-26.
Godwin-Jones, R. (2022). Partnering with AI: Intelligent writing assistance and instructed language learning. Language Learning and Technology, 26, 5-24.
Herelixka, C., & Verhulst, S. (2014). Nederlands in het hoger onderwijs: Een verkennende literatuurstudie naar taalvaardigheid en taalbeleid [Dutch in higher education: An exploratory literature study of language proficiency and language policy]. Nederlandse Taalunie.
Link, S., Mehrzad, M., & Rahimi, M. (2022). Impact of automated writing evaluation on teacher feedback, student revision, and writing improvement. Computer Assisted Language Learning, 35(4), 605-634.
Volodina, E., Bryant, C., Caines, A., De Clercq, O., Frey, J.-C., Ershova, E., Rosen, A., & Vinogradova, O. (2023). MultiGED-2023 shared task at NLP4CALL: Multilingual grammatical error detection. In Proceedings of NLP4CALL 2023, pages 1-16.
Yuan, Z., Taslimipoor, S., Davis, C., & Bryant, C. (2021). Multi-class grammatical error detection for correction: A tale of two systems. In Proceedings of EMNLP 2021, pages 8722-8736.