Reproducible HPC
Nix, containers, and workflow tools for HPC environments
Context
HPC environments drift. Module systems, shared libraries, and compiler versions change out from under you; the calculation that ran last month will not necessarily run today. A README listing what you installed is not a reproducibility strategy.
Nix for HPC
Nix pins every dependency, from compiler to MPI implementation, in a content-addressed store, producing bit-for-bit reproducible builds. Benchmarks on the Elja cluster show the resulting binaries match the performance of manually tuned builds (Goswami et al. 2023).
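The content-addressed store is the key mechanism: a package's store path is derived from a hash of everything that went into building it, so identical inputs always yield the same path and changed inputs can never silently overwrite an old build. A toy sketch of the idea (this is an illustration, not Nix's actual scheme, which hashes a serialized derivation; all names and hash values below are made up):

```python
import hashlib

def store_path(name: str, inputs: dict) -> str:
    """Toy content-addressed store path: hash the build inputs.

    Illustrative only -- real Nix hashes a serialized derivation
    (sources, build script, and the store paths of all dependencies).
    """
    # Serialize inputs in sorted order so the hash is deterministic.
    payload = repr(sorted(inputs.items())).encode()
    digest = hashlib.sha256(payload).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"

# Same inputs -> same path; a compiler bump yields a distinct path,
# so both builds coexist instead of one clobbering the other.
a = store_path("openmpi-4.1", {"gcc": "12.2", "src": "sha256:..."})
b = store_path("openmpi-4.1", {"gcc": "12.2", "src": "sha256:..."})
c = store_path("openmpi-4.1", {"gcc": "13.1", "src": "sha256:..."})
assert a == b and a != c
```

This is also why module drift disappears: an environment is just a set of immutable store paths, and reproducing it means re-deriving the same hashes.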
Literate reproducible workflows
Multi-tool pipelines (alignment, tree building, statistical analysis) drift worst of all because each stage is usually someone else’s script. We combined org-mode with Snakemake so the analysis prose, code, and execution live in one file, regenerated deterministically (Goswami and S 2023).
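The core mechanic such workflow tools provide can be sketched in a few lines: each stage declares its inputs and output, and a stage reruns only when the output is missing or older than an input. This is a deliberately simplified, mtime-only model of what Snakemake's scheduler decides (Snakemake also tracks parameter, code, and container changes); the function names here are illustrative, not its API:

```python
import os

def needs_rebuild(inputs, output):
    """Rerun a stage if its output is missing or older than any input.

    Simplified mtime-based model of a workflow scheduler's decision.
    """
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(i) > out_mtime for i in inputs)

def run_pipeline(stages):
    """stages: list of (inputs, output, action) in dependency order.

    Each action would invoke one tool, e.g. alignment or tree building.
    """
    for inputs, output, action in stages:
        if needs_rebuild(inputs, output):
            action()
```

The payoff for multi-tool pipelines is exactly this incrementality: editing one stage's script triggers regeneration of only that stage and everything downstream, so the prose, code, and results in the org file cannot drift apart.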
Broader ecosystem work
I consult for the Icelandic National Competence Center for HPC & AI (2021-present), where I set up the Nix-based software module system on the Elja supercomputer. Alongside that work I maintain a number of conda-forge feedstocks, contribute to pixi and Spack, and help individual projects adopt reproducible build practices. The blog collects the resulting Nix, pixi, and packaging patterns across more than a dozen posts.
Code
- hzArchiso – Custom Arch Linux installation media
Related writing
- HPC series on rgoswami.me (per-project Spack baselines, Spack + PyTorch workflow).
- Local Nix without root (how to bootstrap Nix on a cluster where you do not have admin).
See also: the packaging tutorials under Teaching for Nix workshop material.
Open directions
- A “BLAS for chemical kinetics” (from the thesis conclusion): standardized, optimized building blocks that any simulation code can call, the way LAPACK standardized linear algebra.
- Extending the Nix-based HPC approach to GPU-accelerated workflows with proper CUDA/ROCm dependency management.