Interplay between linguistic alignment and sentiment in online discussions

Suzanna Wentzel

Universiteit Twente

Mariët Theune

Universiteit Twente

Sumit Srivastava

Universiteit Twente

Doina Bucur

Universiteit Twente

When people hold conversations, they tend to adapt to each other in many behaviors, one of which is linguistic alignment (converging on word use, word categories and structures). It is the primary form of accommodation in online conversations (Wang et al., 2017). Furthermore, previous research indicates that there could be an interplay between linguistic alignment and sentiment (Bernhold & Giles, 2020; Niederhoffer & Pennebaker, 2002), though the studies disagree on the nature of that interplay. This study therefore investigates the following question:

How does linguistic alignment relate to the expressed sentiments of interlocutors in forum posts about political topics?

At the time of writing, the study is still a work in progress, but the results should be ready by the conference.

A side contribution is that this research tackles challenges that have not been solved before. Online discussions are multi-party, whereas previous research focuses on two-party conversations. Another challenge is how to construct conversation threads, as online fora offer multiple ways to view discussions. In addition, there are many methods of extracting linguistic alignment and many ways to perform sentiment analysis, and it is not yet clear which methods best suit posts in online discussions. This research proposes some methods and will give insight into overcoming these challenges.

The dataset used for this study is the Internet Argument Corpus (v2) (Abbott et al., 2016). It contains online multi-party discussions about several political topics, extracted from a discussion forum. An inspection of the dataset showed that discussions range from 2 to 1291 posts in length. On average, there are 4 posts per author, but this ranges from 1 to 285 posts. A heatmap of the number of authors per discussion length showed that the number of authors levels off as discussions get longer.

We have started investigating lexical alignment in a simplified way using Jaccard similarity, which can measure multiset overlap. Initial results show that the average overlap between posts in a discussion ranges from 0 to 0.2 (where 0.5 is the maximum possible overlap).
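As an illustration of the overlap measure, the following is a minimal sketch and not the implementation used in this study. It assumes the multiset variant of Jaccard similarity in which the shared token count is divided by the combined size of both posts, since that variant has a maximum of 0.5, consistent with the range reported above. The function names `tokenize` and `multiset_jaccard`, and the simple regular-expression tokenizer, are hypothetical choices made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def multiset_jaccard(tokens_a, tokens_b):
    """Multiset Jaccard overlap between two token lists.

    Assumption: the denominator is the multiset sum (all tokens of both
    posts), so two identical posts score 0.5 and disjoint posts score 0.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Counter & Counter keeps the minimum count of each shared token.
    shared = sum((counts_a & counts_b).values())
    total = sum(counts_a.values()) + sum(counts_b.values())
    return shared / total if total else 0.0


post_1 = tokenize("The tax plan is bad")
post_2 = tokenize("The tax plan is good")
overlap = multiset_jaccard(post_1, post_2)  # 4 shared tokens out of 10 in total
```

Averaging this score over pairs of posts in a discussion would give one overlap value per discussion; which pairs to compare (for example, each post with the post it replies to) is a design choice that the sketch leaves open.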

Next, we plan to investigate how alignment changes over the posts of each discussion. Using time-series clustering, we will extract classes of trends in alignment over time. Then, sentiment analysis will be performed with VADER, a lexicon- and rule-based sentiment analysis tool specifically designed for social media text (Hutto & Gilbert, 2014). Again, changes in sentiment over posts will be computed for all discussions, and trends will be investigated with time-series clustering. With the results of these two analyses, we plan to examine the interplay between the two phenomena.

Abbott, R., Ecker, B., Anand, P., & Walker, M. A. (2016). Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 4445–4452.

Bernhold, Q. S., & Giles, H. (2020). Vocal Accommodation and Mimicry. Journal of Nonverbal Behavior, 44(1), 41–62.

Hutto, C., & Gilbert, E. (2014). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), Article 1.

Niederhoffer, K. G., & Pennebaker, J. W. (2002). Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology, 21(4), 337–360.

Wang, Y., Reitter, D., & Yen, J. (2017). How Emotional Support and Informational Support Relate to Linguistic Alignment (p. 34).

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023