Protein language models like ESM2-3B have revolutionized structure prediction, achieving remarkable accuracy by learning from millions of protein sequences. However, their complex neural network architectures make it difficult to understand how they transform sequence information into structural predictions.
This is Reticular's interactive visualization of our work on mechanistic interpretability of protein structure prediction. We've trained Matryoshka Sparse Autoencoders (SAEs) on ESM2-3B, the base model for ESMFold, to extract thousands of interpretable features that reveal how sequence representations inform structural outcomes.
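As a rough sketch of this setup (not the exact architecture or hyperparameters from our paper), a Matryoshka SAE in the style of Nabeshima (2024) and Bussmann et al. (2024) can be written in a few lines of PyTorch: a sparse autoencoder whose reconstruction loss is computed over nested prefixes of the dictionary, so that coarse features concentrate in the earliest latents. The input width (2560, matching ESM2-3B's hidden size) is real; the group sizes and sparsity coefficient below are illustrative.

```python
import torch
import torch.nn as nn

class MatryoshkaSAE(nn.Module):
    """Sparse autoencoder trained with nested (Matryoshka) reconstruction losses.

    d_model: width of the ESM2 hidden states being dictionary-learned.
    group_sizes: nested prefix sizes of the dictionary; each prefix must
    reconstruct the input on its own, pushing coarse features into the
    earliest latents. (Sizes here are illustrative.)
    """

    def __init__(self, d_model=2560, group_sizes=(512, 2048, 8192)):
        super().__init__()
        self.group_sizes = group_sizes
        self.enc = nn.Linear(d_model, group_sizes[-1])
        self.dec = nn.Linear(group_sizes[-1], d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))          # sparse latent activations
        recons = []
        for k in self.group_sizes:           # decode from each nested prefix
            z_k = torch.zeros_like(z)
            z_k[..., :k] = z[..., :k]
            recons.append(self.dec(z_k))
        return z, recons

def matryoshka_loss(x, z, recons, l1_coef=1e-3):
    # Sum of reconstruction errors over all nested prefixes,
    # plus an L1 penalty encouraging sparse latents.
    recon = sum(torch.mean((r - x) ** 2) for r in recons)
    return recon + l1_coef * z.abs().mean()
```

Training would minimize `matryoshka_loss` over hidden states collected from a chosen ESM2 layer.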
By uncovering meaningful, human-understandable features within the complex layers of protein language models, we enable both scientific insights into how these models work and practical control over their predictions.
This visualization accompanies our paper, "Towards Interpretable Protein Structure Prediction with Sparse Autoencoders," accepted to the GEM Bio workshop at ICLR.
In the visualization, you can:

- Explore hierarchically organized protein features discovered by our Matryoshka SAE
- See how these features activate on different protein sequences (a sketch of this computation follows the list)
- View LLM-generated feature descriptions based on activation patterns found across 50,000 diverse proteins from SwissProt
- Visualize how manipulating specific features affects predicted protein structures
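To make the second item above concrete, here is a hedged sketch of how per-residue feature activations can be read out: collect hidden states from one layer of ESM2-3B via the fair-esm package and pass them through the SAE encoder. The layer index, feature index, toy sequence, and the freshly constructed MatryoshkaSAE instance are placeholders; in practice, trained SAE weights would be loaded.

```python
import torch
import esm  # fair-esm package

# Load ESM2-3B (the base model for ESMFold).
model, alphabet = esm.pretrained.esm2_t36_3B_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

data = [("query", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]  # toy sequence
_, _, tokens = batch_converter(data)

LAYER = 24  # hypothetical choice of layer to probe
with torch.no_grad():
    out = model(tokens, repr_layers=[LAYER])
    hidden = out["representations"][LAYER]  # (batch, seq_len, 2560)

# MatryoshkaSAE from the sketch above; trained weights would be loaded here.
sae = MatryoshkaSAE()
z, _ = sae(hidden)

# Per-residue activation profile of one feature (index is illustrative).
feature_1337 = z[0, 1:-1, 1337]  # drop BOS/EOS positions
```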
In our research, we demonstrate how targeted feature manipulation can control specific protein properties. The case study visualization shows how steering particular features influences the solvent accessibility of protein structures while maintaining their overall integrity.
This ability to "steer" protein properties opens exciting possibilities for protein engineering applications. By understanding how specific features influence structural outcomes, we can potentially design proteins with desired characteristics more effectively.
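The intervention itself can be sketched as a latent-space edit, reusing the MatryoshkaSAE from the first sketch: encode the hidden states, clamp a single feature, decode, and carry the SAE's residual error along so that unrelated information is preserved. The feature index, target value, and the omitted patching step are illustrative assumptions, not the exact procedure from the paper.

```python
import torch

def steer_feature(hidden, sae, feature_idx, target_value):
    """Clamp one SAE feature and return edited hidden states.

    Conceptual sketch: encode, overwrite a single latent, decode, and
    add back the SAE's residual error so content the SAE does not
    capture passes through unchanged.
    """
    z, recons = sae(hidden)
    recon = recons[-1]            # reconstruction from the full dictionary
    resid = hidden - recon        # what the SAE fails to capture
    z_edit = z.clone()
    z_edit[..., feature_idx] = target_value
    return sae.dec(z_edit) + resid

# Downstream, the edited hidden states would be patched back into the
# model at the hooked layer before the folding head runs; the exact
# hooking mechanism depends on the ESMFold implementation and is omitted.
```

Folding the steered representation and comparing per-residue solvent accessibility against the unsteered prediction is the experiment illustrated in the case study.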
We thank Lin et al. (Meta AI) for open-sourcing ESM2 and ESMFold. We also acknowledge Simon and Zou (InterPLM, 2024) for pioneering work in protein language model interpretability, as well as Nabeshima (2024) and Bussmann et al. (2024) for their work on Matryoshka SAEs. Contact us for research inquiries and collaboration opportunities.