Enhancing the Parallel Meaning Bank: Enriched Linguistic Resources and Comprehensive Evaluation with Challenge Sets

Xiao Zhang

CLCG, University of Groningen

Chunliu Wang

CLCG, University of Groningen

Rik van Noord

CLCG, University of Groningen

Johan Bos

CLCG, University of Groningen

The Parallel Meaning Bank is a semantically annotated parallel corpus for multiple languages, aimed primarily at developing semantic parsing systems. We present a new release of this linguistic resource with several key changes: (1) switching from clause notation to sequence notation for meaning representation; (2) adding Chinese data to the existing languages (English, Dutch, German, Italian); (3) facilitating alignment between words and their meanings; (4) reorganizing the train/development/test sets based on length distribution; and (5) introducing two challenge sets. The first challenge set mainly contains long texts. The second challenge set is generated by recombining decomposed parts across sentences using Combinatory Categorial Grammar, and aims to test compositional generalization to combinations not found in the training set. To enable further comparison, we present the performance of previous parsing models on the new release as baselines.

CLIN33

The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)

UAntwerpen City Campus: Building R

Rodestraat 14, Antwerp, Belgium

22 September 2023