Bayesian Statistics
Hierarchical models and uncertainty quantification for algorithm benchmarking
The problem
Computational chemistry benchmarks typically report point estimates across a handful of test cases. These numbers hide the variability between problems and give no indication of whether observed differences between algorithms are statistically meaningful.
Hierarchical Bayesian approach
We use brms (Bayesian Regression Models using Stan) to fit hierarchical models in which each test case is treated as a draw from a population of problems (Goswami 2025). The model estimates both the average performance of each algorithm and the between-problem variance, producing full posterior distributions over algorithm rankings.
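The actual fitting is done with brms and Stan, but the hierarchical structure itself is simple enough to sketch directly. Below is a minimal, self-contained Python Gibbs sampler for the normal-normal hierarchy (per-problem effect theta_j drawn from a population with mean mu and between-problem sd tau); the data, noise level, and prior choices are illustrative assumptions, not the paper's actual model or the brms API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one performance difference per problem (e.g. a log
# step-count ratio between two optimizers), observed with known noise sigma.
J = 50                                    # number of test problems
true_mu, true_tau = 0.3, 0.5              # population mean / between-problem sd
theta_true = rng.normal(true_mu, true_tau, J)
sigma = 0.2                               # within-problem measurement sd
y = rng.normal(theta_true, sigma)         # observed per-problem estimates

# Gibbs sampler for:  y_j ~ N(theta_j, sigma^2),  theta_j ~ N(mu, tau^2)
# with a flat prior on mu and a weak inverse-gamma prior on tau^2.
n_iter, burn = 4000, 1000
a0, b0 = 1.0, 1.0                         # IG(a0, b0) prior on tau^2
mu, tau2 = 0.0, 1.0
draws_mu = []
for it in range(n_iter):
    # theta_j | rest: precision-weighted combination of data and population mean
    prec = 1.0 / sigma**2 + 1.0 / tau2
    m = (y / sigma**2 + mu / tau2) / prec
    theta = rng.normal(m, np.sqrt(1.0 / prec))
    # mu | rest: normal around the mean of the theta_j
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / J))
    # tau^2 | rest: inverse-gamma (sampled as 1 / Gamma)
    ss = np.sum((theta - mu) ** 2)
    tau2 = 1.0 / rng.gamma(a0 + J / 2.0, 1.0 / (b0 + 0.5 * ss))
    if it >= burn:
        draws_mu.append(mu)

draws_mu = np.array(draws_mu)
print(f"posterior mean of mu: {draws_mu.mean():.2f}")
print(f"P(mu > 0): {(draws_mu > 0).mean():.2f}")
```

The point of the hierarchy is visible in the theta update: each problem's estimate is shrunk toward the population mean in proportion to how noisy it is, and the posterior for mu carries uncertainty from both levels.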
Applied to dimer method rotation optimizers (conjugate gradient vs. L-BFGS) across 500 molecular systems, the model separates performance differences that are real from those attributable to noise in problem selection.
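A probabilistic ranking can be read directly off posterior draws: the probability that one optimizer outperforms the other is simply the fraction of joint posterior draws in which it wins. The snippet below illustrates this with fabricated stand-in draws (the numbers and optimizer means are invented); in practice the arrays would come from the fitted hierarchical model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in posterior draws of each optimizer's population-mean cost
# (e.g. mean log gradient calls per converged search). These values are
# fabricated for illustration, not results from the actual benchmark.
draws = {
    "CG":     rng.normal(3.10, 0.05, 8000),
    "L-BFGS": rng.normal(2.95, 0.06, 8000),
}

# Probability that L-BFGS has the lower population-mean cost than CG.
p_lbfgs_better = np.mean(draws["L-BFGS"] < draws["CG"])
print(f"P(L-BFGS better than CG) = {p_lbfgs_better:.3f}")
```

Because this is a full posterior comparison rather than a point-estimate difference, a claim like "L-BFGS is faster" comes with an explicit probability instead of an unqualified ranking.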
Transferability
The methodology applies beyond saddle point searches. Any computational benchmark in which algorithms are compared across a set of test problems can reuse the same hierarchical structure to attach honest uncertainty to performance claims.