Comparative Evaluation of Topic Generation: Human vs. LLMs

Andriy Kosar

Textgain / University of Antwerp (CLiPS)

Guy De Pauw

Textgain

Walter Daelemans

University of Antwerp

This research delves into topic identification and generation in news texts, drawing on a comparative study of human participants from Belgium, the USA, and Ukraine, and Large Language Models (LLMs). In the first experiment, 110 participants from diverse backgrounds assigned topics to three news texts each. The findings revealed significant variation in topic assignment and naming, indicating a need for new evaluative metrics that move beyond simple binary matches. In the second experiment, seven native speakers and two LLMs generated topics for seven news texts. These generated topics were then anonymously assessed by a jury of three experts against the criteria of relevance, completeness, and clarity. The detailed results shed light on the potential use of LLMs for topic detection and underscore the subjective nature of news topic identification by human annotators. The study highlights the need to acknowledge and accommodate the inherent diversity and subjectivity in topic identification, particularly when applying LLMs to topic detection and naming.
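
As a purely illustrative sketch of what a metric beyond binary matching might look like (the function names, threshold, and example topics below are assumptions, not part of the study), one could score a generated topic against a set of reference topics with a token-overlap similarity instead of an exact string match:

def token_overlap(topic_a: str, topic_b: str) -> float:
    """Jaccard overlap between the lower-cased token sets of two topic labels."""
    tokens_a, tokens_b = set(topic_a.lower().split()), set(topic_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def soft_match(generated: str, references: list[str], threshold: float = 0.5) -> bool:
    """Count a generated topic as a match if it overlaps sufficiently with any
    reference topic, rather than requiring an exact (binary) string match."""
    return any(token_overlap(generated, ref) >= threshold for ref in references)

# "war in Ukraine" vs. "Ukraine war" would fail a binary match but passes here.
print(soft_match("war in Ukraine", ["Ukraine war", "inflation"]))  # True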

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023