MIMIC is a generative multimodal foundation model that jointly models DNA, RNA, proteins, and cellular context in one framework.
Most biological AI systems treat sequence, structure, and function as separate tasks. MIMIC instead learns a shared distribution over molecular states, enabling any-to-any inference and design across modalities.
- Biological function emerges from coupled constraints across sequence, structure, regulation, and context.
- Single-modality models miss information that is available in complementary modalities.
- Many high-value problems are inverse problems: generate sequences that satisfy desired structural or regulatory outcomes.
- Any-to-any generation: Condition on any observed subset of modalities and infer the rest.
- Splicing prediction and design: Improves splice prediction and enables targeted sequence redesign under fixed constraints.
- Protein design: Uses multimodal conditioning (e.g., backbone + surface context) to generate diverse high-confidence binders.
- RNA structure support: Predicts probing-like reactivity tracks that improve downstream RNA secondary-structure inference.
- Transfer learning: Delivers strong performance across diverse RNA and protein downstream benchmarks.
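
To make the any-to-any idea concrete, here is a minimal sketch of splitting one example's observed modalities into a conditioning set and a target set. The modality names, the `split_observed` helper, and the uniform sampling rule are assumptions made for illustration; they are not taken from the MIMIC code, which has not been released.

```python
import random

# Hypothetical modality names; MIMIC's actual track names may differ.
MODALITIES = ("dna", "rna", "protein", "context")


def split_observed(example, rng):
    """Split the modalities present in `example` into conditioning and target lists."""
    present = [m for m in MODALITIES if example.get(m) is not None]
    if len(present) < 2:
        raise ValueError("need at least two observed modalities")
    # Pick a non-empty proper subset to condition on; the rest become targets.
    k = rng.randint(1, len(present) - 1)
    conditioning = set(rng.sample(present, k))
    targets = [m for m in present if m not in conditioning]
    return sorted(conditioning), targets


# "context" is missing here, so it is neither conditioned on nor predicted.
example = {"dna": "ATGGCC", "rna": "AUGGCC", "protein": "MA", "context": None}
cond, tgt = split_observed(example, random.Random(0))
print("condition on:", cond, "| infer:", tgt)
```

Any split of the observed modalities is valid under this scheme, which is what lets one model cover both forward prediction and inverse design.
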
- ~1B-parameter encoder-decoder transformer
- Split-track multimodal representation (nucleic acid, protein, semantic context, etc.)
- Localized positional encoding within each track
- Register-token compression for global molecular context
- Multi-pathway training for partially observed modality combinations
- Curriculum scaling of context length (1k to 10k tokens)
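
Two of the items above can be illustrated with a short sketch: position indices that restart within each track, and a context-length schedule that grows from 1k to 10k tokens. The function names, the track names, and the linear shape of the schedule are assumptions; the source states only the 1k to 10k range and that positions are local to each track.

```python
def track_positions(track_lengths):
    """Return position indices that restart at 0 within each track."""
    return {name: list(range(n)) for name, n in track_lengths.items()}


def context_length(step, total_steps, start=1_000, end=10_000):
    """Grow the context length linearly from `start` to `end` tokens (assumed shape)."""
    if total_steps <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range steps stay within the bounds.
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return int(round(start + frac * (end - start)))


# Hypothetical track names and lengths, for illustration only.
positions = track_positions({"nucleic_acid": 4, "protein": 2})
print(positions)
print(context_length(0, 100), context_length(99, 100))
```

Restarting positions per track means the protein track's first residue gets index 0 regardless of how long the nucleic acid track is.
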
LORE aligns heterogeneous molecular data into coherent, partially observed examples with shared transcript/protein anchors.
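
As a rough illustration of anchor-based alignment, the sketch below groups per-modality records by a shared anchor ID into one example per anchor, leaving missing modalities absent. The record layout, the anchor IDs, and the `align_by_anchor` helper are hypothetical and do not describe the actual LORE pipeline.

```python
from collections import defaultdict


def align_by_anchor(records):
    """Group (anchor_id, modality, value) records into one example per anchor."""
    examples = defaultdict(dict)
    for anchor_id, modality, value in records:
        examples[anchor_id][modality] = value
    return dict(examples)


# Hypothetical records keyed by a shared transcript anchor.
records = [
    ("tx1", "rna", "AUGGCC"),
    ("tx1", "protein", "MA"),
    ("tx2", "rna", "AUGUUU"),
]
examples = align_by_anchor(records)
# tx1 carries both modalities; tx2 is partially observed (RNA only).
print(examples)
```
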
Scale highlights:
- 13M RNA transcripts
- 15.5M proteins
- 4B+ natural language tokens
- 6000+ organisms
- Paper: arXiv:2604.24506
- Blog post: MIMIC announcement and technical overview
MIMIC model code/weights and LORE release assets are in preparation for public release.
If you use this work, please cite:
@article{golkar2026mimic,
title={MIMIC: A Generative Multimodal Foundation Model for Biomolecules},
author={Golkar, Siavash and others},
year={2026},
eprint={2604.24506},
archivePrefix={arXiv},
primaryClass={q-bio}
}

This project is licensed under the MIT License.


