Leiden University Centre for Linguistics, Leiden University
Leiden University Centre for Digital Humanities, Leiden University
Leiden University Centre for Digital Humanities, Leiden University
Leiden University Centre for Digital Humanities, Leiden University
Metaphor detection and interpretation are challenging tasks in Natural Language Processing (NLP). Corpus research indicates that, on average, one in every seven to eight words is used metaphorically, and that conventional metaphors account for 99% of all linguistic metaphors in news, literature, academic texts and conversations (Steen et al. 2010). This ubiquity poses a number of specific challenges. Linguistic metaphors are difficult to process automatically for three reasons:
• First, metaphor interpretation varies depending on the context. The metaphorical meanings of conventional metaphors have become lexicalized through high-frequency usage, and are sometimes even more frequent than the non-metaphorical basic meaning (e.g. "inflation" in the economic sense is more frequent than the basic meaning relating to a physical process). Most people do not realize that such “dead” metaphors are metaphorical. For machines, traditional word sense disambiguation and metaphorical sense detection are still inadequate for determining whether the contextual sense of a word is metaphorical.
• Second, shared cultural and social knowledge is the basis for linguistic metaphor understanding. Traditional language models trained solely on textual data lack the capability to understand the physical world and build connections between metaphors and the sociocultural knowledge that motivates them (Liu et al. 2022).
• Third, existing models for conventional metaphor detection do not distinguish between different sub-types of metaphor in terms of linguistic form (e.g. word class, signalled vs implicit), conceptual structure (novel vs conventional) or communicative function (deliberate vs non-deliberate).
The emergence of Large Language Models (LLMs) creates new possibilities for metaphor detection and sub-type labelling. Recent research indicates that LLMs demonstrate superior performance in language understanding and contextual semantic comprehension compared to previous language models (Zhou et al. 2023). Prompting (or in-context learning) approaches appear to be useful techniques for applying LLMs to NLP tasks (Chung et al. 2022). In this study, we explore the capabilities of LLMs on conventional metaphor detection across different prompting setups.
In this talk we present an ongoing project that tests the performance of ChatGPT, BLOOMZ (Muennighoff et al. 2022) and Flan-T5 (Chung et al. 2022) on the task of conventional metaphor detection. We use a subset of the VUAMC metaphor corpus (Steen et al. 2010) consisting of 3,811 sentences. The performance of the LLMs is evaluated under the following settings (a sketch of the prompt construction follows the list):
1. Zero-shot (a bare task description, to see what the model does “out of the box”)
2. Few-shot (N > 0) prompting (providing N examples)
2.1 Labelled examples:
- providing sentences with a label for every word indicating whether it is used metaphorically
2.2 Reasoning examples:
- as in 2.1, but additionally providing an explanation for the metaphors in the examples
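To illustrate how these settings can be operationalized, the sketch below builds the three prompt variants and queries Flan-T5 through the Hugging Face transformers pipeline. The task wording, example sentence, labels and explanation are hypothetical illustrations, not the exact prompts or annotations used in the study; ChatGPT and BLOOMZ would be queried through their own interfaces.

# Illustrative sketch of the three prompting settings (prompt wording,
# example sentence and labels are hypothetical, not the study's actual prompts).
from transformers import pipeline

TASK = ("Label every word in the sentence as metaphorical (M) or "
        "non-metaphorical (O).")

# 2.1-style labelled example
LABELLED_EXAMPLE = (
    "Sentence: She attacked his argument.\n"
    "Labels: She/O attacked/M his/O argument/O ./O"
)

# 2.2-style reasoning example: the same labels plus a short explanation
REASONING_EXAMPLE = (
    LABELLED_EXAMPLE + "\n"
    "Explanation: 'attacked' is used metaphorically; its basic, physical sense "
    "(to assault) is mapped onto criticizing a claim."
)

def build_prompt(sentence, setting):
    """Assemble a prompt for one of the three settings."""
    if setting == "zero-shot":
        examples = ""
    elif setting == "labelled":
        examples = LABELLED_EXAMPLE + "\n\n"
    elif setting == "reasoning":
        examples = REASONING_EXAMPLE + "\n\n"
    else:
        raise ValueError(f"unknown setting: {setting}")
    return f"{TASK}\n\n{examples}Sentence: {sentence}\nLabels:"

# Flan-T5 can be queried via the standard text2text-generation pipeline.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

sentence = "The economy is overheating."
for setting in ("zero-shot", "labelled", "reasoning"):
    prompt = build_prompt(sentence, setting)
    output = generator(prompt, max_new_tokens=64)[0]["generated_text"]
    print(setting, "->", output)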
Additionally, we conduct an error analysis to identify linguistic information relevant for fine-tuning LLMs for metaphor detection. By assessing the success of various LLMs in detecting conventional metaphors, the project provides new insights into the impact of different architectures and training data on conventional metaphor detection.
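By way of illustration, the following sketch shows how per-word predictions could be scored against VUAMC-style gold labels and how misclassified tokens could be collected for error analysis. The token sequence, labels and function name are invented for the example and do not reflect the study's actual data or evaluation code.

# Minimal evaluation sketch against VUAMC-style per-word gold labels
# (1 = metaphor-related word, 0 = not); data here is invented for illustration.
from sklearn.metrics import precision_recall_fscore_support

def evaluate(tokens, gold, predicted):
    """Score per-word predictions and collect misclassified tokens."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        gold, predicted, average="binary", pos_label=1, zero_division=0
    )
    errors = [(tok, g, p) for tok, g, p in zip(tokens, gold, predicted) if g != p]
    return {"precision": precision, "recall": recall, "f1": f1, "errors": errors}

# Hypothetical case: gold marks "overheating" as metaphor-related,
# but the model missed it and wrongly flagged "economy".
tokens    = ["The", "economy", "is", "overheating", "."]
gold      = [0, 0, 0, 1, 0]
predicted = [0, 1, 0, 0, 0]

scores = evaluate(tokens, gold, predicted)
print(scores["precision"], scores["recall"], scores["f1"])
for token, gold_label, pred_label in scores["errors"]:
    print(f"{token}: gold={gold_label}, predicted={pred_label}")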
References
Chung, H.W. et al. (2022) Scaling instruction-finetuned language models, arXiv.org. Available at: https://arxiv.org/abs/2210.11416 (Accessed: 15 June 2023).
Liu, E. et al. (2022) ‘Testing the ability of language models to interpret figurative language’, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies [Preprint]. doi:10.18653/v1/2022.naacl-main.330.
Muennighoff, N. et al. (2022) Crosslingual generalization through multitask finetuning, arXiv.org. Available at: https://arxiv.org/abs/2211.01786 (Accessed: 15 June 2023).
Steen, G.J. et al. (2010) ‘Metaphor in usage’, Cognitive Linguistics, 21(4), pp. 765–796. doi:10.1515/cogl.2010.024.
Zhou, C. et al. (2023) A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT, arXiv.org. Available at: https://arxiv.org/abs/2302.09419 (Accessed: 15 June 2023).