Utrecht University
Utrecht University and University Medical Center Utrecht
Text-based Personality Computing (TPC) refers to automatic personality assessment based on text data (e.g., tweets, essays). Many empirical studies and datasets exist, along with a few survey papers. However, there is not yet a position paper that evaluates the quality of current TPC research, suggests solutions and future directions, and combines perspectives from NLP and psychology.
In this paper, we review 60 TPC papers from the ACL Anthology and identify 15 challenges that we consider deserving of the research community's attention. We organize these challenges into six topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, and ethics and fairness. In light of these challenges, we offer concrete recommendations for future TPC research, which we summarise below:
Personality taxonomies: Choose Big-5 over MBTI; Try modelling facets and using other taxonomies like HEXACO where appropriate.
Measurement quality: Pay attention to measurement error in personality measurements, be they based on questionnaires or models; Try to reduce measurement error by design (e.g., choose higher-quality instruments; use better data collection practices); Provide quality evaluation (i.e., validity and reliability) for any new (and also existing) approaches.
Datasets: Make TPC datasets shareable; Include fine-grained personality measurements and descriptions of the target population in them.
Performance evaluation: Report a diverse set of performance metrics; Report R² for a regression task.
Modelling choices: Make use of the psychometric properties of personality traits when modelling them (e.g., use joint modelling; modify the loss function to preserve the covariance information); For even better predictions, try incorporating personality questionnaire texts, applying data augmentation and dimensionality reduction techniques, as well as incorporating more personality-related variables.
Ethics and fairness: Avoid unnecessary TPC; Apply TPC to clinical, professional and educational settings; Investigate fairness.
Lastly, engage in (interdisciplinary) research work with survey methodologists, psychologists, and psychometricians.
We hope that our paper will inspire better TPC research and new research directions.
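As a small illustration of the performance-evaluation recommendation above, the following sketch computes R² alongside Pearson's r and RMSE for a regression-style trait prediction. It is a minimal example under stated assumptions, not the procedure used in any reviewed paper: the function names and the toy scores are hypothetical and chosen only to show how the metrics can disagree.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted scores."""
    mean_true = sum(y_true) / len(y_true)
    mean_pred = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / sqrt(var_true * var_pred)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical questionnaire-based trait scores and model predictions
# (made-up numbers; the predictions are systematically shifted by +2).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [3.0, 4.0, 5.0, 6.0, 7.0]

metrics = {
    "pearson_r": pearson_r(observed, predicted),
    "r_squared": r_squared(observed, predicted),
    "rmse": rmse(observed, predicted),
}
print(metrics)
```

In this toy case the predictions correlate perfectly with the observed scores, yet R² is negative because every prediction is off by a constant. Reporting correlation alone would hide that systematic bias, which is one reason a diverse set of metrics can be informative.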