Controversy Detection in Dutch News

Hannah van Goor

Vrije Universiteit Amsterdam

Anil Yaman

Vrije Universiteit Amsterdam

João Pedro Correia dos Reis

DEUS AI

Niya Stoimenova

DEUS AI

Navigating controversial topics in the news plays a vital role in fostering social awareness, promoting civil discourse, and combating online polarization. As one component of a larger system, the ability to anticipate whether a particular news post will be controversial can help reduce polarization, for example by showing opposing views alongside controversial posts. The benefits of exposure to a wide range of viewpoints are well documented, and technology can be used to broaden people's perspectives. In this research, we investigate which models are effective at predicting controversy in Dutch news posts, and we propose a variety of content-based, generalizable modelling approaches for this task. Furthermore, we developed the first sizeable dataset for controversy detection in the Netherlands: 10k news posts obtained from the 10 largest Dutch news sources, each annotated with an entropy measure over its Facebook reactions, which serves as a proxy for controversy. We tested three vectorization techniques: tf-idf, Word2Vec, and BERT embeddings. We also implemented a range of traditional machine learning regressors as well as a language model approach. As a baseline, we use a dummy regressor that always predicts the mean entropy of the posts from the same source. All experiments are set up in a pipeline with hyperparameter tuning, and the models are evaluated with 10-fold cross-validation. The language model yields the best mean squared error (MSE) of 0.099, against a baseline MSE of 0.11. Most models outperform the baseline and are thus reasonably successful in predicting controversial news posts. Nevertheless, our work is grounded in the understanding that its effectiveness and ethical implications are deeply intertwined with the socio-technical ecosystem and the actual environment in which it will operate.
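To make the controversy proxy and the baseline concrete, the following is a minimal sketch and not the authors' implementation. It assumes the entropy measure is Shannon entropy over a post's reaction counts, optionally normalized by the logarithm of the number of reaction types; the abstract does not specify the exact variant. The function names, reaction labels, and source names are hypothetical, and the baseline is fitted and scored on the same data, so the 10-fold cross-validation described above is omitted.

```python
import math
from collections import defaultdict


def reaction_entropy(counts, normalize=True):
    """Shannon entropy of a post's reaction distribution (assumed variant).

    counts: dict mapping a reaction label to its count for one post.
    With normalize=True the result is divided by log(number of reaction
    types), so it lies between 0 (one reaction only) and 1 (even split).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no reactions: treat the post as uncontroversial
    probs = [c / total for c in counts.values() if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    if normalize and len(counts) > 1:
        h /= math.log(len(counts))
    return h


def per_source_mean_baseline(posts):
    """Predict, for every post, the mean entropy of its news source.

    posts: list of (source, entropy) pairs.
    Returns one prediction per post, in the same order.
    """
    by_source = defaultdict(list)
    for source, entropy in posts:
        by_source[source].append(entropy)
    means = {s: sum(v) / len(v) for s, v in by_source.items()}
    return [means[s] for s, _ in posts]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical reaction counts for three posts from two made-up sources.
raw_posts = [
    ("source_a", {"like": 100, "love": 0, "angry": 0, "sad": 0}),
    ("source_a", {"like": 25, "love": 25, "angry": 25, "sad": 25}),
    ("source_b", {"like": 50, "love": 0, "angry": 50, "sad": 0}),
]

labelled = [(src, reaction_entropy(c)) for src, c in raw_posts]
targets = [e for _, e in labelled]
predictions = per_source_mean_baseline(labelled)
baseline_mse = mean_squared_error(targets, predictions)
```

Under this assumed definition, a post whose reactions are all of one type scores 0, while a post with reactions spread evenly over all types scores the maximum, which matches the intuition that mixed reactions signal disagreement. A learned regressor would replace `per_source_mean_baseline` and would be compared against it on held-out folds.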

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023