On Text-based Personality Computing: Challenges and Future Directions

Qixiang Fang

Utrecht University

Anastasia Giachanou

Utrecht University

Ayoub Bagheri

Utrecht University

Laura Boeschoten

Utrecht University

Erik-Jan van Kesteren

Utrecht University

Mahdi Shafiee Kamalabad

Utrecht University

Daniel L Oberski

Utrecht University and University Medical Center Utrecht

Text-based Personality Computing (TPC) refers to automatic personality assessment based on text data (e.g., tweets, essays). Many empirical studies and datasets exist, along with a few survey papers. However, there is not yet a position paper that evaluates the quality of current TPC research, suggests solutions and future directions, and combines perspectives from NLP and psychology.

In this paper, we review 60 TPC papers from the ACL Anthology and identify 15 challenges that we consider deserving of the research community's attention. We organise the 15 challenges into six topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, and ethics and fairness. In light of these challenges, we offer concrete recommendations for future TPC research, which we summarise below:

Personality taxonomies: Choose Big-5 over MBTI; try modelling facets and using other taxonomies such as HEXACO where appropriate.

Measurement quality: Pay attention to measurement error in personality measurements, whether they are based on questionnaires or models; try to reduce measurement error by design (e.g., choose higher-quality instruments; use better data collection practices); provide quality evaluation (i.e., validity and reliability) for any new (and also existing) approaches.

Datasets: Make TPC datasets shareable, and include fine-grained personality measurements and descriptions of the target population.

Performance evaluation: Report a diverse set of performance metrics; report R² for a regression task.

Modelling choices: Make use of the psychometric properties of personality traits when modelling them (e.g., use joint modelling; modify the loss function to preserve the covariance information); for even better predictions, try incorporating personality questionnaire texts, applying data augmentation and dimensionality reduction techniques, and incorporating more personality-related variables.

Ethics and fairness: Avoid unnecessary TPC; apply TPC to clinical, professional and educational settings; investigate fairness.

Lastly, engage in (interdisciplinary) research work with survey methodologists, psychologists, and psychometricians.
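
As an illustration of the performance evaluation recommendation, the following minimal sketch reports the coefficient of determination alongside two error metrics for a single trait treated as a regression target. The function name `regression_report`, the 1-to-5 score scale, and the example scores are assumptions made for illustration only; they do not come from any of the reviewed papers.

```python
import math


def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R^2 for one trait (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    # R^2 is undefined when the observed scores have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up questionnaire scores and model predictions (assumed 1-5 scale).
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.5, 4.5]
report = regression_report(y_true, y_pred)

# Baseline that always predicts the mean observed score; its R^2 is 0.
mean_score = sum(y_true) / len(y_true)
baseline = regression_report(y_true, [mean_score] * len(y_true))

print(report)
print(baseline)
```

Comparing against the mean-only baseline shows why R² is informative here: it expresses how much of the variance in trait scores the model explains beyond always predicting the average, which error metrics such as MAE or RMSE do not convey on their own.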

We hope that our paper will inspire better TPC research and new research directions.

CLIN33

The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)

UAntwerpen City Campus: Building R

Rodestraat 14, Antwerp, Belgium

22 September 2023