Informatics Institute (University of Amsterdam) - NLP
Computational Humanities (University of Amsterdam) - NLP
Natural Language Processing (NLP) has advanced rapidly in recent years, extending its applications to various domains, including the biomedical field. This study evaluates how well language models' word embeddings capture semantic relationships between biomedical terms. Specifically, we assess BioBERT and BlueBERT, language models pretrained on biomedical text, against the general-domain baseline BERT through an intrinsic evaluation task: outlier detection on gene names. A domain expert assessed the models' accuracy, establishing BioBERT at 37.5%, BlueBERT at 47.6%, and BERT at 33.3%. Surprisingly, BioBERT only slightly outperformed the baseline BERT, whereas BlueBERT performed best. The overall performance of all three models, however, remains poor, which could be attributed to the inherent difficulty of the task arising from the ambiguity of gene names: there is no standardised form, and gene names can vary across the literature.
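The outlier detection task described above can be sketched as follows: given embedding vectors for a set of gene names, the name that is least similar, by mean cosine similarity, to the others is flagged as the outlier. The toy vectors and gene symbols below are illustrative placeholders, not the study's data; in the actual evaluation the vectors would be embeddings produced by BioBERT, BlueBERT, or BERT.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def find_outlier(embeddings):
    """Return the item whose vector has the lowest mean cosine
    similarity to the rest of the set."""
    scores = {}
    for name, vec in embeddings.items():
        others = [cosine(vec, v) for k, v in embeddings.items() if k != name]
        scores[name] = sum(others) / len(others)
    return min(scores, key=scores.get)

# Toy 3-d vectors standing in for contextual embeddings of gene names
# (hypothetical values chosen so that one item is a planted outlier).
toy = {
    "BRCA1": [0.90, 0.10, 0.00],
    "BRCA2": [0.85, 0.15, 0.05],
    "TP53":  [0.80, 0.20, 0.10],
    "INS":   [0.10, 0.10, 0.95],  # planted outlier
}

print(find_outlier(toy))  # → INS
```

A model's accuracy on this task is then the fraction of gene-name sets in which the flagged item matches the outlier identified by the domain expert.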
This research addresses a notable gap in the literature by undertaking, to our knowledge, the first intrinsic evaluation of BioBERT, BlueBERT, and BERT in the biomedical domain. Despite the wide adoption of these specialised language models, their ability to capture the nuanced semantics of biomedical language has yet to be thoroughly examined. By evaluating the word embeddings these models produce, this study offers insight into their efficacy and their potential to enhance language comprehension in biomedical research.