Use of generative pre-trained large language models to predict suicide risk on social media texts

Luca Leone

Tilburg University

Eva Vanmassenhove

Tilburg University

Depression is considered one of the most pressing public health concerns, affecting millions of people worldwide (King & Vidourek, 2012). There are clear signs that the percentage of people affected by depression is increasing, especially among teenagers (King & Vidourek, 2012). If depression goes undetected or untreated, those affected may intentionally harm themselves or even take their own lives (Haque et al., 2021). In 2015, 1.1% of all deaths in the European Union were due to intentional self-harm (Eurostat, 2018), making suicide a major public health problem. Predicting suicide risk at an early stage may help prevent some of those deaths, leading to higher life expectancy and better quality of life (Łyszczarz, 2021). Machine Learning (ML) and Natural Language Processing (NLP) methods have proven to be valuable instruments for the early prediction of suicide risk (Abdulsalam & Alhothali, 2022; Haque et al., 2021; Ji et al., 2020; Liu et al., 2020).

Previous research and existing methods have approached this task using data from health records, publicly available questionnaires, and suicide notes (Haque et al., 2021; Ji et al., 2020). Accessing and collecting such data is, however, very difficult, which results in small datasets and findings that are hard to generalise (Haque et al., 2021). Social media, by contrast, are used by millions of people, particularly young people, to share their feelings, concerns, and opinions. Multiple studies have shown that social media data can be used effectively to detect suicide risk (Abdulsalam & Alhothali, 2022; Haque et al., 2021; Ji et al., 2020; Shen & Rudzicz, 2017), using, for instance, time-aware transformers (STATEnet) (Sawhney et al., 2020) or complex embedding mechanisms (Liu et al., 2020).

Recent open-source models, GPT-Neo and BLOOM, have shown promising results in analysing natural language and optimising classification (Black et al., 2022; Scao et al., 2022). Current research has not yet explored the potential of such pre-trained large language models for identifying suicide risk. In this paper, we therefore investigate how effective generative pre-trained large language models, specifically BLOOM and GPT-Neo, are at detecting suicide risk compared to encoder-only embedding models. Furthermore, given that ensemble models have proven successful in enhancing a model’s average performance (Gao et al., 2021), we investigate whether ensembling BLOOM and GPT-Neo can further improve performance.

With this paper, we aim to address the growing need for accurate and timely identification of individuals at risk of suicide. To overcome the shortage of available data, the study uses data collected from Reddit and highlights the potential of social media data for suicide risk detection. The results show that both BLOOM and GPT-Neo, as well as their ensemble, outperform encoder-only embedding models in terms of AUC scores. Additionally, SHAP values are calculated to provide insight into the models’ classification decisions. The findings suggest that pre-trained large language models are a promising alternative to encoder-only embedding models for suicide risk detection.
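To illustrate how an ensemble of two classifiers can be compared with its members on AUC, the following is a minimal sketch. It assumes the simplest form of ensembling, averaging the two models' predicted probabilities (soft voting); the abstract does not state which ensembling scheme was actually used. The function names `ensemble_average` and `roc_auc` and all scores are hypothetical and invented for illustration; they are not results from the study.

```python
from typing import List, Sequence


def ensemble_average(p_a: Sequence[float], p_b: Sequence[float]) -> List[float]:
    """Soft voting (an assumption): average two models' positive-class probabilities."""
    if len(p_a) != len(p_b):
        raise ValueError("both models must score the same posts")
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for six posts (1 = at risk, 0 = not at risk).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # hypothetical scores of the first model
model_b = [0.6, 0.8, 0.3, 0.2, 0.4, 0.1]  # hypothetical scores of the second model
ensemble = ensemble_average(model_a, model_b)

for name, scores in [("model A", model_a), ("model B", model_b), ("ensemble", ensemble)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

In this toy data each model misranks one positive/negative pair, but the two models err on different posts, so averaging their scores ranks every pair correctly. This is only meant to show why ensembling can raise AUC; it does not show that it always does.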

CLIN33

The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)

UAntwerpen City Campus: Building R

Rodestraat 14, Antwerp, Belgium

22 September 2023