Research Vision
This page summarizes the research programme I plan to lead as an independent faculty member. For the present state of individual threads, see Research.
The problem
Computational chemistry is the discipline of translating quantum mechanics into predictions about macroscopic matter. Sixty years in, two bottlenecks still dominate: (1) the cost of the electronic-structure calculation, and (2) the cost of sampling the region of configuration space that matters for a given question - reaction rates, phase behaviour, materials response under load. Machine-learned potentials have made (1) tractable for a growing class of systems. Sampling (2) is where the next decade of progress has to happen, and it is where my work has consistently landed.
The thesis
Better representations beat better models. If the surrogate for a potential energy surface captures the invariances of chemistry (permutation, rotation, composition) in its distance metric, the statistical model becomes far simpler and far more data-efficient. The Optimal Transport Gaussian Process framework is one concrete instance of this thesis. The principle generalizes - to coarse-grained dynamics, to kinetic Monte Carlo on adaptive landscapes, and to uncertainty quantification across benchmark problems.
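To make the thesis concrete, here is a toy illustration (emphatically not the OT-GP metric itself): a descriptor built from sorted interatomic distances is invariant to atom permutation, rigid rotation, and translation, so a Gaussian kernel on that descriptor automatically treats symmetry-equivalent configurations as identical - no data is spent relearning the symmetry.

```python
import numpy as np

def invariant_descriptor(positions):
    """Sorted pairwise-distance vector: invariant to atom
    permutation, rigid rotation, and translation."""
    n = len(positions)
    i, j = np.triu_indices(n, k=1)
    d = np.linalg.norm(positions[i] - positions[j], axis=1)
    return np.sort(d)

def gp_kernel(x_a, x_b, length_scale=1.0):
    """Squared-exponential kernel on the invariant descriptors."""
    diff = invariant_descriptor(x_a) - invariant_descriptor(x_b)
    return np.exp(-0.5 * np.sum(diff ** 2) / length_scale ** 2)

# A rotated, permuted copy of a configuration sits at distance zero,
# so the kernel cannot tell the two apart.
rng = np.random.default_rng(0)
conf = rng.normal(size=(5, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
conf2 = (conf @ R.T)[rng.permutation(5)]
print(gp_kernel(conf, conf2))  # ≈ 1.0 up to floating point
```

This toy descriptor ignores composition; the point of the OT-GP work is precisely to build a metric that handles all three invariances at once without losing information.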
Programme for the first five years
Adaptive kinetic Monte Carlo with learned saddle priors. Scaling the OT-GP framework from single saddle searches to aKMC runs of millions of steps. Target: a million-atom, thousand-step-per-day capability that lets us simulate long-timescale processes (corrosion, catalyst deactivation) in realistic systems. Deliverables: an open-source aKMC driver built on eOn + metatomic, and benchmarks on at least three industry-relevant surfaces.
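The inner loop being scaled here is standard KMC under harmonic transition-state theory; a minimal sketch, with an illustrative barrier list as input (in real aKMC the barrier table is discovered on the fly by the saddle searches the OT-GP accelerates):

```python
import math
import random

def kmc_step(barriers_eV, temperature_K, prefactor_hz=1e12, rng=None):
    """One KMC step under harmonic transition-state theory:
    rate_i = nu * exp(-E_i / kT). Choose an event with probability
    proportional to its rate; advance time by dt = -ln(u) / R_total."""
    rng = rng or random.Random(0)
    kT = 8.617333e-5 * temperature_K          # Boltzmann constant, eV/K
    rates = [prefactor_hz * math.exp(-e / kT) for e in barriers_eV]
    total = sum(rates)
    dt = -math.log(rng.random()) / total      # exponential residence time
    x = rng.random() * total                  # tower sampling over the rates
    for i, rate in enumerate(rates):
        x -= rate
        if x <= 0.0:
            return i, dt
    return len(rates) - 1, dt
```

The exponential dependence of the rates on the barrier is why learned saddle priors pay off: every avoided or cheapened saddle search is amortized over the millions of steps that reuse its barrier.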
Hybrid GP + ML-potential corrections. The GP predicts deltas on the ML potential, not the full DFT surface; the ML potential absorbs the bulk transferability. This should simultaneously cut training-data requirements and fix the tail-behaviour problem ML potentials suffer from near transition states. Deliverables: a trained correction layer for at least one foundation potential (PET-MAD or MACE-MP), and reaction-rate benchmarks against a DFT reference.
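The delta-learning structure is easy to see in one dimension. A sketch under stated assumptions - the `mlp` and `dft` functions below are illustrative stand-ins, not real potentials - in which a small GP is trained only on the residual between the two:

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# Illustrative stand-ins: a cheap, transferable baseline and the
# expensive reference it slightly disagrees with.
mlp = lambda x: np.sin(x)                  # "ML potential"
dft = lambda x: np.sin(x) + 0.1 * x ** 2   # "truth" with a smooth residual

# Train the GP only on the residual dft - mlp at a few reference points.
x_train = np.linspace(-2.0, 2.0, 7)
residual = dft(x_train) - mlp(x_train)
K = rbf(x_train, x_train) + 1e-8 * np.eye(len(x_train))
alpha = np.linalg.solve(K, residual)

def hybrid(x):
    """Baseline ML potential plus the GP-predicted delta."""
    return mlp(x) + rbf(np.atleast_1d(x), x_train) @ alpha
```

Because the residual is smaller and smoother than the full surface, the GP needs far fewer reference calculations than it would to learn the surface outright - that is the whole data-efficiency argument.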
Bayesian inference for benchmark-driven algorithm design. The hierarchical framework from the AIP Advances paper extends beyond saddle search. Any discipline that compares algorithms on heterogeneous test problems (solver suites, sampling methods, ML-potential architectures) needs honest uncertainty on the ranking. Deliverables: a domain-agnostic R/Stan package, and community benchmark suites that report posteriors instead of point estimates.
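"Posteriors instead of point estimates" can be shown with a deliberately simplified, non-hierarchical toy (the real framework pools across heterogeneous problems; the success counts here are invented for illustration): model each method's success probability with a conjugate Beta-Binomial and report the posterior probability of the ranking.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical successes / attempts for two methods on one benchmark
# suite (illustrative numbers, not real results).
wins = {"A": (88, 100), "B": (80, 100)}

# Beta(1, 1) prior + Binomial likelihood -> Beta posterior, which we
# can sample directly instead of running MCMC.
post = {k: rng.beta(1 + s, 1 + n - s, size=100_000)
        for k, (s, n) in wins.items()}

# Report the probability of the ranking, not a league-table number:
# "A beats B" is a claim with uncertainty attached.
p_a_beats_b = float(np.mean(post["A"] > post["B"]))
```

A point estimate would declare A the winner outright; the posterior makes clear how much of that ranking is within-noise - which is exactly the honesty heterogeneous benchmark suites currently lack.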
Infrastructure that survives the decade. A continuing commitment to eOn, metatensor, f2py, and the broader scientific Python stack. These are the vehicles by which any academic algorithmic result reaches practitioners. My students will ship working code that others use, not one-off notebooks.
Why me
- Publications: 29 tracked, spanning computational chemistry, scientific software, Bayesian benchmarking, and ultrafast spectroscopy.
- Software: Lead maintainer of eOn; commit-rights maintainer of f2py (NumPy); integrated HiGHS into SciPy; ported OpenBLAS to Meson; JOSS editor (2024-present).
- Mentorship: GSoC student (2021), mentor (2022, 2023), admin (2024); NumFOCUS SDGs; Summer of Nix; DVS.
- Teaching: University courses in Machine Learning and Software Quality Management at the University of Iceland; ten+ Carpentries workshops; Stanford Code in Place section leader and teaching mentor; invited C++ and web-development workshops.
- Service: 47 verified peer reviews on Web of Science; JOSS editor; session-chair duty at APS; IEEE P3173 Vice Chair for reproducible neuroimaging.
The common thread across all of this: I build algorithms that rest on better representations, ship them as software others can depend on, and teach the ideas so they outlive the codebase.
Grand challenges I want students to own
- A BLAS for chemical kinetics: standardized, optimized building blocks that any simulation code can call, the way LAPACK standardized linear algebra (argued in the thesis conclusion).
- Uncertainty-aware foundation potentials: ML potentials that report calibrated predictive intervals, not point estimates, with the uncertainty driving active learning on the fly.
- Reproducible-by-construction HPC pipelines: Nix-based infrastructure where a published calculation includes the exact environment that ran it, not a README that lists dependencies.
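The uncertainty-aware-potentials challenge hinges on one mechanism: GP-style predictive variance needs no labels, so it can steer acquisition on the fly. A minimal 1-D sketch of that loop (toy kernel and points, nothing resembling a foundation potential):

```python
import numpy as np

def rbf(a, b, ell=0.4):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_variance(x_query, x_train, jitter=1e-8):
    """Predictive variance of a unit-prior-variance GP. No labels are
    needed, which is what makes it cheap enough to drive on-the-fly
    active learning."""
    K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
    k_star = rbf(x_query, x_train)               # (n_query, n_train)
    return 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))

# Acquire the next expensive calculation where the model is least certain.
x_train = np.array([-1.0, 0.0, 1.0])
candidates = np.linspace(-2.0, 2.0, 81)
next_point = candidates[np.argmax(gp_variance(candidates, x_train))]
```

The variance is largest far from existing data, so the loop labels the least-sampled region first - calibrating those intervals, and scaling the idea to foundation potentials, is the students' problem to own.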