Who are the haters? A corpus-based demographic analysis of authors of hate speech

Lisa Hilte

LH: CLiPS, University of Antwerp

Ilia Markov

IM: CLTL, Vrije Universiteit Amsterdam

Nikola Ljubešić

NL: Jožef Stefan Institute, Ljubljana; University of Ljubljana; Institute of Contemporary History, Ljubljana

Darja Fišer

DF: Jožef Stefan Institute, Ljubljana; University of Ljubljana; Institute of Contemporary History, Ljubljana

Walter Daelemans

WD: CLiPS, University of Antwerp

We examine the profiles of hate speech authors in a multilingual dataset of Facebook reactions to news posts on topics related to migrants and the LGBT+ community. The dataset covers four languages: English, Dutch, Slovenian, and Croatian. First, all utterances were manually annotated as either hateful or acceptable speech. Next, we used binary logistic regression to examine how the production of hateful comments is affected by authors' profiles (i.e., their age, gender, and language). Our results corroborate previous findings: in all four languages, men produce more hateful comments than women, and people produce more hate speech as they grow older. However, our findings also add important nuance to previously attested tendencies: the specific age and gender dynamics vary slightly across languages or cultures, suggesting that distinct (e.g., socio-political) realities are at play. Finally, we discuss why author demographics matter in the study of hate speech: the profiles of prototypical "haters" can inform hate speech detection, awareness-raising about (online) hatred, and counter-initiatives against its spread.

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023