ML Atomistic
Foundational libraries enabling ML models and simulation engines to communicate
Context
Atomistic ML lives in an O(M x E) world: M models (PyTorch, JAX, MACE, PET, …) times E simulation engines (LAMMPS, GROMACS, ASE, eOn). Every model-engine pair needs its own glue code, and a model trained in one framework cannot be dropped into another engine without rewriting that glue.
The approach
Metatensor provides a shared data format (TensorMap) for atomistic ML tensors (descriptors, predictions, training labels), together with metadata recording which atoms and properties each block describes (Bigi et al. 2025). The format is framework-agnostic, sparse, and block-structured. Metatomic wraps trained models behind one interface that simulation engines call directly. Integration cost drops from O(M x E) to O(M + E).
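To make the block-structured idea concrete, here is a minimal conceptual sketch of a sparse, metadata-labelled tensor map in plain Python. It mirrors the idea behind TensorMap (blocks keyed by metadata, with labelled samples and properties) but the class and field names here are illustrative, not metatensor's actual API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Block:
    values: np.ndarray    # (n_samples, n_properties) data for this block
    samples: list         # which (structure, atom) each row describes
    properties: list      # what each column means

class TensorMapSketch:
    """Sparse map from metadata keys to labelled blocks (illustrative only)."""

    def __init__(self):
        self._blocks = {}

    def add_block(self, key, block):
        # Keys carry metadata, e.g. (center_species,). Sparsity is natural:
        # a species absent from the data simply has no block.
        self._blocks[key] = block

    def block(self, key):
        return self._blocks[key]

    @property
    def keys(self):
        return list(self._blocks)

# One block per chemical species, holding per-atom predictions:
tm = TensorMapSketch()
tm.add_block(
    (1,),  # hydrogen-centered environments
    Block(
        values=np.array([[0.1], [0.2]]),
        samples=[(0, 0), (0, 1)],   # (structure index, atom index)
        properties=["energy"],
    ),
)
print(tm.keys)                       # [(1,)]
print(tm.block((1,)).values.shape)   # (2, 1)
```

Because every block carries its own sample and property labels, a consumer can interpret the data without knowing which model produced it, which is what lets one interface serve many engines.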
My contributions
At labCOSMO (EPFL) I work on the systems-level pieces that make metatensor practical: DLPack support for zero-copy tensor interchange between PyTorch, NumPy, and the Rust core; device-aware execution that routes tensors to the right hardware; and performance tuning on the Rust side. On the simulation side, I maintain the integration with eOn for saddle point searches and contribute to the GROMACS integration for scalable MD.
Code
- metatensor – Contributor (DLPack, device management, Rust core)
- metatomic – Contributor
- metatrain – Contributor (training infrastructure)
- metatensor-gromacs – GROMACS integration (DD scalability, device-aware threading)
- vesin – Neighbor list library
- rgpot – RPC-based potential interface (GitHub)
- ChemGP – Chemically motivated Gaussian processes
- Atomistic Cookbook – Tutorial recipes for the ecosystem
Open directions
- Hardware-aware execution paths in metatomic: automatically selecting GPU vs CPU kernels based on system size and available devices.
- Extending metatensor to handle long-range electrostatics and periodic boundary conditions natively in the tensor format.