Session: Leveraging Pre-Trained Transformer Models for Protein Function Prediction

Transformer-based models, such as ProtGPT2 and ESM, are revolutionizing protein sequence analysis by providing rich sequence embeddings that enable advanced function prediction. This talk provides a hands-on introduction to using pre-trained, open-source transformer models to generate protein embeddings and leverage them for classification tasks. Attendees will learn to tokenize sequences, extract embeddings, and implement machine-learning pipelines for protein function annotation based on Gene Ontology (GO) terms or Enzyme Commission (EC) numbers. The session will also show how pre-trained transformers can democratize access to advanced protein analysis techniques while addressing scalability and explainability challenges. After the talk, the speaker will provide a notebook for testing basic functionality, enabling participants to explore the concepts discussed.
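
To give a flavor of the workflow (tokenize, embed, classify), the minimal sketch below uses a small pre-trained ESM-2 checkpoint from Hugging Face to mean-pool per-residue embeddings into fixed-length vectors and fits a scikit-learn classifier on top. The checkpoint name (facebook/esm2_t6_8M_UR50D), the toy sequences, and the binary labels (standing in for GO terms or EC classes) are illustrative assumptions, not materials from the session notebook.

```python
# Sketch: protein embeddings from a pre-trained ESM-2 model + a simple classifier.
# Assumes the `transformers`, `torch`, and `scikit-learn` packages are installed.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Small ESM-2 checkpoint chosen for speed; larger checkpoints follow the same API.
MODEL_NAME = "facebook/esm2_t6_8M_UR50D"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(sequence: str) -> torch.Tensor:
    """Tokenize one protein sequence and mean-pool the last hidden states
    (special tokens included) into a single fixed-length embedding."""
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

# Toy data: protein sequences paired with made-up binary function labels,
# standing in for GO-term or EC-number annotations.
sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MALWMRLLPLLALLALWGPDPAAA"]
labels = [0, 1]

# Stack embeddings into a feature matrix and train a lightweight classifier.
X = torch.stack([embed(s) for s in sequences]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```

In practice, the same pattern scales to real annotation tasks by batching sequences, caching embeddings, and replacing the toy labels with curated GO or EC annotations.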

Presenters: