Protein language model pseudolikelihoods capture features of in vivo B cell selection and evolution

Daphne van Ginneken, Anamay Samant, Karlis Daga-Krumins, Andreas Agrafiotis, Evgenios Kladis, Sai T. Reddy, Alexander Yermanos*

*Corresponding author for this work

Research output: Working paper › Preprint › Academic

Abstract

B cell selection and evolution play crucial roles in dictating successful immune responses. Recent advancements in sequencing technologies and deep-learning strategies have paved the way for generating and exploiting an ever-growing wealth of antibody repertoire data. Protein language models (PLMs), which are trained in a self-supervised manner, have demonstrated the ability to learn complex representations of antibody sequences and have been leveraged for a wide range of applications, including diagnostics, structural modeling, and antigen-specificity prediction. PLM-derived likelihoods have been used to improve antibody affinities in vitro, raising the question of whether PLMs can capture and predict features of B cell selection in vivo. Here, we explore how general and antibody-specific PLM-generated sequence pseudolikelihoods (SPs) relate to features of in vivo B cell selection such as expansion, isotype usage, and somatic hypermutation (SHM) at single-cell resolution. Our results demonstrate that the type of PLM and the region of the antibody input sequence significantly affect the generated SP. Contrary to previous in vitro reports, we observe a negative correlation between SPs and binding affinity, whereas repertoire features such as SHM, isotype usage, and antigen specificity were strongly correlated with SPs. By constructing evolutionary lineage trees of B cell clones from human and mouse repertoires, we observe that SHMs are routinely among the most likely mutations suggested by PLMs and that mutating residues have lower absolute likelihoods than conserved residues. Our findings highlight the potential of PLMs to predict features of antibody selection and further suggest their potential to assist in antibody discovery and engineering.
Original language: English
Publisher: BioRxiv
Number of pages: 27
DOIs
Publication status: Published - 11 Dec 2024
