On the portability of economic event detection and event-based sentiment analysis to under-resourced languages: a case study for Russian

Natalia Sugrobova

LT3, Ghent University

Loic De Langhe

LT3, Ghent University

Veronique Hoste

LT3, Ghent University

Since the early 2000s, when research on sentiment or opinion analysis gained momentum, interest has gradually shifted to more fine-grained approaches focusing on emotions, aspects, or implicit expressions of sentiment. In the latter case, individuals do not explicitly express their opinion on a certain topic, but a positive or negative impression of it can nonetheless be inferred from world knowledge or from cultural or historical context, a phenomenon which Toprak et al. (2010) denoted as ‘polar facts’. Financial newswire text is an interesting genre for investigating polar facts, as it is widely acknowledged that both how companies are perceived by investors and how investors react are influenced by the news events published about those companies (e.g., Tetlock, 2007). Economic news events carry factual information but are often also characterized by a sentiment conveying subjective information about the event (Van de Kauter et al., 2015).

The present research investigates to what extent an event-based sentiment analysis methodology developed for a high-resourced language such as English (Jacobs, 2021) can be ported to a lower-resourced language, in this study Russian. To this end, economic texts taken from the online news platforms Kommersant (ru: Коммерсантъ) and Vedomosti (ru: Ведомости) were labeled with information on event triggers, participant arguments, event coreference, and event attributes (type, subtype, negation, and modality). The event annotations were further enriched with sentiment annotations for explicit and implicit polarity (positive, negative, or neutral investor sentiment), resulting in a test set of 41 documents, 790 event triggers and 1495 sentiment annotations, which was subsequently translated into English using DeepL. Three sets of classification tasks were performed: (1) event extraction and classification using variations of the BERT model architecture for token representations within the DYGIE++ framework (Wadden et al., 2019); (2) coarse-grained sentence-level sentiment analysis exploiting different LLM architectures; and (3) <polar span, target span, polarity> triple extraction using the Grid Tagging Scheme (GTS) model proposed by Wu et al. (2020). We discuss the (mixed) results for each task, focusing in particular on a qualitative analysis of the specific challenges involved when porting a system developed for English to a machine-translated and morphologically more complex language such as Russian.
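To illustrate the third task, the sketch below shows how a grid of word-pair tags in the spirit of GTS (Wu et al., 2020) can be decoded into <polar span, target span, polarity> triples. It is a minimal, hypothetical simplification, not the implementation used in this study. It assumes an upper-triangular grid in which diagonal cells tagged A (target word) or O (polar-span word) delimit the spans, off-diagonal cells tagged POS, NEU or NEG link a target word to a polar word, and N means no relation; each target/polar-span pair then receives the polarity that occurs most often among the cells linking them. The tag names, the functions `diagonal_spans` and `decode_triples`, and the toy sentence are all illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical tag inventory, modelled loosely on GTS (Wu et al., 2020):
# "A" marks target-span words and "O" marks polar-span words (diagonal cells),
# "POS"/"NEU"/"NEG" link a target word to a polar word (off-diagonal cells),
# and "N" means no relation.
Span = Tuple[int, int]  # inclusive (start, end) token indices
POLARITY = {"POS": "positive", "NEU": "neutral", "NEG": "negative"}


def diagonal_spans(grid: List[List[str]], tag: str) -> List[Span]:
    """Return maximal runs of consecutive tokens whose diagonal cell is `tag`.

    Simplification (assumption): adjacent spans with the same tag are merged.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i in range(len(grid)):
        if grid[i][i] == tag:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(grid) - 1))
    return spans


def decode_triples(grid: List[List[str]]) -> List[Tuple[Span, Span, str]]:
    """Decode <polar span, target span, polarity> triples from an upper-triangular grid."""
    targets = diagonal_spans(grid, "A")
    polar_spans = diagonal_spans(grid, "O")
    triples = []
    for t_start, t_end in targets:
        for p_start, p_end in polar_spans:
            votes: Counter = Counter()
            # Only the upper triangle (row <= column) is tagged, so normalise each word pair.
            for i in range(t_start, t_end + 1):
                for j in range(p_start, p_end + 1):
                    tag = grid[min(i, j)][max(i, j)]
                    if tag in POLARITY:
                        votes[tag] += 1
            if votes:  # keep the pair only if at least one cell links the two spans
                majority_tag, _ = votes.most_common(1)[0]
                triples.append(((p_start, p_end), (t_start, t_end), POLARITY[majority_tag]))
    return triples


if __name__ == "__main__":
    # Toy sentence (illustrative only): "Gazprom profits fell sharply"
    N = "N"
    grid = [
        ["A", "A", "NEG", "NEG"],
        [N, "A", "NEG", "NEG"],
        [N, N, "O", "O"],
        [N, N, N, "O"],
    ]
    print(decode_triples(grid))  # [((2, 3), (0, 1), 'negative')]
```

Here the target span "Gazprom profits" (tokens 0 to 1) and the polar span "fell sharply" (tokens 2 to 3) are linked only by NEG cells, so a single negative triple is returned. A real system would additionally need to separate adjacent spans carrying the same tag and map token indices back to the Russian source or its English translation.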

Jacobs, G. (2021). Extracting fine-grained events and sentiment from economic news (PhD thesis). Ghent University.

Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139–1168.

Toprak, C., Jakob, N., & Gurevych, I. (2010). Sentence and expression level annotation of opinions in user-generated discourse. In Proceedings of ACL 2010 (pp. 575–584). Uppsala, Sweden.

Van de Kauter, M., Breesch, D., & Hoste, V. (2015). Fine-grained analysis of explicit and implicit sentiment in financial news articles. Expert Systems with Applications, 42(11), 4999–5010.

Wadden, D., et al. (2019). Entity, relation, and event extraction with contextualized span representations. In Proceedings of EMNLP-IJCNLP 2019 (pp. 5784–5789). Hong Kong, China.

Wu, Z., et al. (2020). Grid tagging scheme for aspect-oriented fine-grained opinion extraction. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 2576–2585). Online.

CLIN33

The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)

UAntwerpen City Campus: Building R

Rodestraat 14, Antwerp, Belgium

22 September 2023