University of Antwerp
University of Toronto, Department of Computer Science
KITE Research Institute
Vector Institute for AI
Vector Institute for AI
In 2011, IBM’s Watson beat human competitors in the quiz show "Jeopardy!", paving the way for AI systems to be applied to games that require natural language understanding and creative problem-solving skills [1]. It remains an open question, however, whether ever-larger large language models (LLMs) will surpass humans when confronted with challenges that rely on cultural references, world knowledge, and semantic and pragmatic inferences.
This study builds on recent research on the Only Connect Walls (OCW) segment of the popular British TV show “Only Connect”, in which each wall of 16 clue words must be grouped into four sets of four based on heterogeneous connections between the words, with one keyword connecting each set. The game is designed to mislead competitors with red herrings and seemingly plausible alternative groupings, yet each wall has exactly one correct solution. The OCW dataset consists of 618 walls collected from 15 seasons of the show. Previous approaches, such as clustering static and contextual word embeddings and few-shot prompting of LLMs, still fall far short of human performance [2].
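For concreteness, a minimal sketch of such an embedding-clustering baseline is given below. The encoder name, the greedy size-constrained grouping, and the example wall are illustrative assumptions only; the experiments reported in [2] use their own models and clustering setup.

# Minimal sketch of an embedding-clustering baseline for one wall (cf. [2]).
# The sentence-transformers model and the greedy size-constrained grouping
# are illustrative choices, not the exact setup of the original experiments.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def solve_wall(clues: list[str], group_size: int = 4) -> list[list[str]]:
    """Greedily group 16 clue words into 4 groups of 4 by embedding similarity."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
    emb = model.encode(clues, normalize_embeddings=True)
    sim = cosine_similarity(emb)

    unassigned = set(range(len(clues)))
    groups = []
    while unassigned:
        # Seed each group with the unassigned word most similar to the others.
        seed = max(unassigned, key=lambda i: sum(sim[i][j] for j in unassigned if j != i))
        # Add its (group_size - 1) nearest unassigned neighbours.
        neighbours = sorted(
            (j for j in unassigned if j != seed),
            key=lambda j: sim[seed][j],
            reverse=True,
        )[: group_size - 1]
        group = [seed, *neighbours]
        groups.append([clues[i] for i in group])
        unassigned -= set(group)
    return groups

if __name__ == "__main__":
    # Made-up wall (not from the OCW dataset): planets, composers, members of
    # Queen, and Greek gods, with deliberate red herrings such as "Mercury".
    wall = ["Mercury", "Venus", "Mars", "Saturn",
            "Holst", "Elgar", "Britten", "Purcell",
            "Freddie", "Roger", "Brian", "John",
            "Ares", "Apollo", "Hermes", "Athena"]
    for group in solve_wall(wall):
        print(group)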
To address this challenge, we propose to explore tree-of-thought reasoning [3], which lets LLMs self-evaluate intermediate reasoning steps and backtrack from dead ends. Alternatively, we suggest reasoning explicitly over the different meanings, synonyms, cultural references, linguistic properties, and semantic and pragmatic inferences of the clue words: building on the framework proposed in [4], we will investigate few-shot prompting LLMs, as well as training smaller LMs, to deduce the different senses of each clue word. By exploiting these word-sense expansions, we aim to discover the connections between the clue words.
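As an illustration of the word-sense expansion step, the sketch below shows one possible few-shot prompt and a simple parsing routine. The prompt wording, the example senses, and the llm_complete callable are hypothetical placeholders, not the exact setup of [4] or a finalised design.

# Minimal sketch of a few-shot "word sense expansion" prompt (inspired by [4]).
# `llm_complete` stands in for whichever LLM completion API is used.
from typing import Callable

FEW_SHOT = """List distinct senses of the clue word, including names, titles, and cultural references.

Clue: Mercury
Senses: planet; Roman god; chemical element; Freddie Mercury (Queen); Mercury programme (NASA)

Clue: {clue}
Senses:"""

def expand_senses(clue: str, llm_complete: Callable[[str], str]) -> list[str]:
    """Query an LLM for candidate senses of a clue word and parse the list."""
    completion = llm_complete(FEW_SHOT.format(clue=clue))
    return [sense.strip() for sense in completion.split(";") if sense.strip()]

# Downstream, senses shared across clue words (e.g. several clues mapping to a
# "members of Queen" sense) would serve as candidate group connections.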
While this is a difficult challenge because of the niche, UK-centric knowledge domains and the game’s deliberately misleading setup, we expect to improve on the established baselines by guiding the models through the web of semantic and pragmatic inferences evoked by the clue words and by leveraging this world knowledge to reason over the connections between them.
This research also paves the way for broader applications of commonsense reasoning. By uncovering effective strategies to navigate complex semantic spaces, interpret cultural references, and reason over linguistic properties, our approach is a step towards commonsense reasoning across a wide range of domains and problem-solving scenarios.
SOURCES
[1] Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., Lally, A., Murdock, J. W., Nyberg, E., Prager, J., Schlaefer, N., and Welty, C. “Building Watson: An Overview of the DeepQA Project”. AI Magazine, vol. 31, no. 3, 2010, pp. 59-79, doi:10.1609/aimag.v31i3.2303.
[2] Naeini, S. A., Saqur, R., Saeidi, M., Giorgi, J., and Taati, B. “Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset”. Under review, 2023.
[3] Yao, S., et al. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models”. arXiv preprint arXiv:2305.10601, 2023.
[4] West, P., et al. “Symbolic Knowledge Distillation: from General Language Models to Commonsense Models”. arXiv preprint arXiv:2110.07178, 2021.