Representational Biases in Emergent Communication in a Multi-Agent System

Daniël Akkerman

Department of Cognitive Science and Artificial Intelligence, Tilburg University

Phong Le

Amazon Alexa, Cambridge

Raquel G. Alhama

Department of Cognitive Science and Artificial Intelligence, Tilburg University

Language emergence in multi-agent systems is often studied using a referential game. This setup typically involves two agents, a sender and a receiver, that must develop a communication protocol to refer to a target. The sender's task is to generate a message referring to the target, while the receiver has to decode that message to pick out the target from a set that also contains a number of distractors.
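For concreteness, the sketch below shows one round of such a game with toy stand-ins for the agents. All names (sender, receiver, play_round), the message alphabet, and the seeded "policy" are illustrative; the actual experiments use deep networks trained end-to-end, not these hand-coded functions.

```python
import random

VOCAB = list("abcdefgh")  # toy message alphabet
MESSAGE_LEN = 2

def sender(target):
    # Toy stand-in for a learned sender policy: deterministically
    # maps a target object to a discrete message.
    rng = random.Random(target)
    return "".join(rng.choice(VOCAB) for _ in range(MESSAGE_LEN))

def receiver(message, candidates):
    # Toy receiver: picks the candidate whose message matches.
    # A trained receiver would instead score candidates given the message.
    for candidate in candidates:
        if sender(candidate) == message:
            return candidate
    return random.choice(candidates)

def play_round(target, distractors):
    candidates = distractors + [target]
    random.shuffle(candidates)
    message = sender(target)
    choice = receiver(message, candidates)
    return choice == target  # communication success

print(play_round("circle", ["square", "triangle"]))
```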

Existing work almost exclusively uses images as the data source from which targets and distractors are selected (see Lazaridou and Baroni, 2020, for a survey, and Słowik et al., 2021, for an exception). This is a natural choice for recreating a visually grounded scenario for language emergence, but the downside is that the agents must perform low-level processing, since the information in images is distributed over pixels. Human language, however, may build upon higher-level conceptual representations. Prior work has shown that agents do not readily induce a conceptual system from images while playing a referential game (Bouchacourt and Baroni, 2018). In our work, we simulate a scenario in which conceptual representations have already been induced from the visual input before the agents engage in a communicative task. In other words, we explore the idea that the conceptual system preceded the emergence of language.

To this end, we perform a series of carefully controlled experiments that investigate the impact of lower- versus higher-level input in a dyadic referential game played by Deep Learning agents. In particular, we represent higher-level input as directed labeled graphs. To maximize the alignment between images and graphs, we generate a simple customized dataset: the images consist of combinations of 50 basic shapes positioned in two-by-two grids, while the graphs represent those shapes as nodes and their relative positions as labels on the connecting edges.
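To illustrate the graph side of this pairing, the following sketch builds one such scene as a directed labeled graph with networkx. The shape names and edge labels (left_of, above) are our own illustrative choices, not necessarily those used in the dataset.

```python
import networkx as nx

# One scene: four of the 50 basic shapes placed in a 2x2 grid,
# keyed by (row, column) position.
scene = {(0, 0): "circle", (0, 1): "star",
         (1, 0): "square", (1, 1): "triangle"}

def scene_to_graph(scene):
    """Nodes carry shapes; edge labels encode relative position."""
    g = nx.DiGraph()
    for pos, shape in scene.items():
        g.add_node(pos, shape=shape)
    for (r, c) in scene:
        if (r, c + 1) in scene:           # horizontal neighbor
            g.add_edge((r, c), (r, c + 1), label="left_of")
        if (r + 1, c) in scene:           # vertical neighbor
            g.add_edge((r, c), (r + 1, c), label="above")
    return g

g = scene_to_graph(scene)
print(g.nodes(data=True))
print(g.edges(data=True))
```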

Our studies manipulate the amount of novelty in the test data in relation to the training data (e.g., unseen shapes). Current results show that graph-based agents outperform image-based agents in terms of communication success: they achieve higher accuracy, and they scale better as the number of distractors increases. Our work is still ongoing, but preliminary results suggest that graph-based agents perform best in all the conditions above, lending support to the idea that higher-level representations may provide the right inductive biases for agents to develop more structured languages (Goyal and Bengio, 2022).
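To see why scaling with the number of distractors is a meaningful measure, note that the chance level of communication success drops as candidates are added: with n distractors, the receiver chooses among n + 1 candidates, so random guessing succeeds with probability 1/(n + 1). The snippet below simply makes this baseline explicit; the distractor counts shown are illustrative.

```python
# Chance-level communication success for a receiver choosing
# uniformly among the target plus n distractors.
for n in (1, 3, 7, 15):  # illustrative distractor counts
    print(f"{n} distractors: chance accuracy = {1 / (n + 1):.3f}")
```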

Bouchacourt, D., & Baroni, M. (2018). How agents see things: On visual representations in an emergent language game. arXiv preprint arXiv:1808.10696.

Goyal, A., & Bengio, Y. (2022). Inductive biases for deep learning of higher-level cognition. Proceedings of the Royal Society A, 478(2266), 20210068.

Lazaridou, A., & Baroni, M. (2020). Emergent multi-agent communication in the deep learning era. arXiv preprint arXiv:2006.02419.

Słowik, A., Gupta, A., Hamilton, W. L., Jamnik, M., Holden, S. B., & Pal, C. (2021). Structural inductive biases in emergent communication. arXiv preprint arXiv:2002.01335.
