Minimising brilliancy bias within GPT-2 by utilising the PALMS method

Tsvetomira Krikoryan

Bachelor's student, Artificial Intelligence - Radboud University

Erkan Basar

Bachelor's thesis supervisor, Artificial Intelligence - Radboud University

At age 5, children of both genders tend to associate brilliance with their own gender. By ages 6-7, however, a noticeable disparity emerges: girls become less likely to associate brilliance with the female gender.

This phenomenon is known as brilliancy bias: the tendency to attribute greater brilliance to one group than to another. The bias is not limited to humans; it also appears in large language models, which can inadvertently amplify existing societal biases. Brilliancy bias has been identified in GPT-2 as well as in its successor, GPT-3.

To address this issue, researchers have developed the PALMS (Process for Adapting Language Models to Society) method, in which a language model is fine-tuned on a small, carefully selected values-tagged dataset. The values-tagged dataset is constructed by taking an existing dataset that contains brilliancy-related biases and inverting its gendered words, names, and pronouns. Because the inverted data runs counter to a bias the model already reflects, fine-tuning on it is intended to counteract and thereby mitigate that bias.
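The inversion of gendered words, names, and pronouns can be illustrated as a word-level swap. This is a minimal sketch under stated assumptions: the swap table `SWAP`, the name pair John/Mary, and the functions `invert_gender` and `invert_dataset` are hypothetical and not taken from the thesis, and a realistic table would be far larger.

```python
import re

# Hypothetical swap table for illustration only. Some forms are ambiguous:
# "her" can correspond to "him" or "his"; this sketch always maps it to "his".
SWAP = {
    "he": "she", "she": "he",
    "him": "her", "his": "her", "her": "his",
    "hers": "his",
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "boy": "girl", "girl": "boy",
    "son": "daughter", "daughter": "son",
    "father": "mother", "mother": "father",
    "john": "mary", "mary": "john",  # hypothetical name pair
}

_WORD = re.compile(r"[A-Za-z]+")


def invert_gender(text: str) -> str:
    """Swap every gendered word in `text` that appears in SWAP, keeping its casing."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word                  # not a gendered word: leave unchanged
        if len(word) > 1 and word.isupper():
            return target.upper()        # e.g. "MEN" -> "WOMEN"
        if word[0].isupper():
            return target.capitalize()   # e.g. "He" -> "She"
        return target
    return _WORD.sub(swap, text)


def invert_dataset(sentences: list[str]) -> list[str]:
    """Build the inverted dataset by applying invert_gender to every sentence."""
    return [invert_gender(s) for s in sentences]


print(invert_dataset(["John is brilliant.", "He said his son is a genius."]))
```

A plain lookup table keeps the sketch transparent, but it cannot resolve context-dependent forms such as "her"; a more careful pipeline would use part-of-speech information for those cases.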

To evaluate the effect of the PALMS method, text analysis is used to identify the words that occur most frequently in connection with each gender, and a lexicon-based analysis measures the sentiment expressed towards each gender. The findings indicate that, after fine-tuning, both men and women are more frequently associated with words connoting brilliance, such as "brilliant," "smart," and "genius." The lexicon analysis further reveals a significant increase in positive sentiment towards men, while women are ascribed substantially more authority and dominance.
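The two analyses can be sketched as follows, assuming the generated texts have already been grouped by the gender they refer to. The stop-word list, the function names, and `TOY_LEXICON` are hypothetical; the lexicon's (valence, dominance) scores are invented placeholders, and a real analysis would load a published lexicon instead.

```python
import re
from collections import Counter

# Hypothetical stop-word list; gendered pronouns are excluded so that they
# do not dominate the frequency counts.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to",
              "he", "she", "his", "her", "him"}

# Toy lexicon with INVENTED (valence, dominance) scores in [0, 1],
# for illustration only.
TOY_LEXICON = {
    "brilliant": (0.9, 0.7),
    "genius": (0.9, 0.8),
    "smart": (0.8, 0.7),
    "leader": (0.7, 0.9),
    "lazy": (0.2, 0.3),
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(texts: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-words across `texts`."""
    counts = Counter(t for text in texts for t in tokenize(text)
                     if t not in STOP_WORDS)
    return counts.most_common(n)


def mean_lexicon_scores(texts, lexicon):
    """Average (valence, dominance) over tokens found in `lexicon`; None if none match."""
    matched = [lexicon[t] for text in texts for t in tokenize(text) if t in lexicon]
    if not matched:
        return None
    valence = sum(v for v, _ in matched) / len(matched)
    dominance = sum(d for _, d in matched) / len(matched)
    return valence, dominance


male_texts = ["He is a brilliant leader.", "He is smart."]
female_texts = ["She is a genius.", "She is smart and brilliant."]
for label, texts in [("male", male_texts), ("female", female_texts)]:
    print(label, top_words(texts, 3), mean_lexicon_scores(texts, TOY_LEXICON))
```

Running both functions on generations from the base model and from the fine-tuned model, separately for each gender, yields the kind of before/after comparison described above.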

Together, the word-frequency and sentiment analyses indicate that fine-tuning GPT-2 with the PALMS method can mitigate brilliancy bias. This work contributes to ongoing efforts to foster gender equality and fair representation in language models and, by extension, in society.

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023