
Protein language model pseudolikelihoods capture features of in vivo B cell selection and evolution

  • Daphne van Ginneken
  • Anamay Samant
  • Karlis Daga-Krumins
  • Wiona Glänzer
  • Andreas Agrafiotis
  • Evgenios Kladis
  • Sai T Reddy
  • Alexander Yermanos*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

B cell selection and evolution play crucial roles in dictating successful immune responses. Recent advancements in sequencing technologies and deep-learning strategies have paved the way for generating and exploiting an ever-growing wealth of antibody repertoire data. The self-supervised nature of protein language models (PLMs) has demonstrated the ability to learn complex representations of antibody sequences and has been leveraged for a wide range of applications including diagnostics, structural modeling, and antigen-specificity predictions. PLM-derived likelihoods have been used to improve antibody affinities in vitro, raising the question of whether PLMs can capture and predict features of B cell selection in vivo. Here, we explore how general and antibody-specific PLM-generated sequence pseudolikelihoods (SPs) relate to features of in vivo B cell selection such as expansion, isotype usage, and somatic hypermutation (SHM) at single-cell resolution. Our results demonstrate that the type of PLM and the region of the antibody input sequence significantly affect the generated SP. Contrary to previous in vitro reports, we observe a negative correlation between SPs and binding affinity, whereas repertoire features such as SHM and isotype usage were strongly correlated with SPs. By constructing evolutionary lineage trees of B cell clones from human and mouse repertoires, we observe that SHMs are routinely among the most likely mutations suggested by PLMs and that mutating residues have lower absolute likelihoods than conserved residues. Our findings highlight the potential of PLMs to predict features of antibody selection and further suggest their potential to assist in antibody discovery and engineering.
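To make the notion of a sequence pseudolikelihood concrete, here is a minimal sketch of one common formulation: mask each position in turn, ask the model for a distribution over amino acids at that position, and average the log-probability of the residue actually observed. The abstract does not spell out the exact definition or normalization used in the paper, so the averaging, the `MaskedPredictor` interface, and the `uniform_predictor` stand-in are all illustrative assumptions rather than the authors' implementation or the API of any particular PLM library.

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interface (an assumption, not a real library API): given a
# sequence and the index of a masked position, return a probability for
# each amino acid at that position.
MaskedPredictor = Callable[[str, int], Dict[str, float]]


def uniform_predictor(sequence: str, position: int) -> Dict[str, float]:
    """Toy stand-in for a PLM: every amino acid is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def sequence_pseudolikelihood(sequence: str, predict: MaskedPredictor) -> float:
    """Mean log-probability of each observed residue when its position is masked.

    Averaging over length is one possible normalization (assumed here) so
    that sequences of different lengths can be compared.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    total = 0.0
    for i, residue in enumerate(sequence):
        probs = predict(sequence, i)      # distribution at masked position i
        total += math.log(probs[residue])  # log-probability of the true residue
    return total / len(sequence)


if __name__ == "__main__":
    # With the uniform toy model, every residue scores log(1/20).
    print(sequence_pseudolikelihood("EVQLVESGG", uniform_predictor))
```

In practice `predict` would wrap a masked-language-model forward pass, and, as the abstract notes, the resulting score depends on both the choice of PLM and the region of the antibody sequence supplied as input.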

Original language: English
Article number: bbaf418
Number of pages: 11
Journal: Briefings in Bioinformatics
Volume: 26
Issue number: 4
Publication status: Published - 2 Jul 2025
