The Dutch Law as a Semantic Role Labeling Dataset

Romy van Drie

TNO

Maaike de Boer

TNO

Roos Bakker

TNO

Ioannis Tolios

TNO

Daan Vos

TNO

[NOTE: REPEAT OF PRESENTATION TO BE GIVEN AT ICAIL (JUNE 2023) AND PAPER TO BE PUBLISHED IN CONFERENCE PROCEEDINGS]

Legal documents, and law texts in particular, are not easy for humans to understand. Their specialized terminology and distinctive sentence constructions also make them difficult for machines to process. In this paper, we present a publicly available benchmark dataset of Dutch law texts that can be used to train AI models that assist people tasked with interpreting legal texts. The dataset can also be used in a broader context, such as semantic role labeling of Dutch (legal) texts. It contains 4463 annotated sentences from 55 different Dutch laws, in which human annotators labeled four roles: action, actor, object and recipient. The inter-annotator agreement is substantial (𝜅=0.75). In experiments with a rule-based and a transformer-based method, the transformer-based method performs well on the dataset (accuracy > 0.8). These results indicate that actions, actors, objects and recipients in legal texts can be predicted reliably, which can help people tasked with the formal interpretation of legal texts.

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023