
Reproducible HPC


Nix, containers, and workflow tools for HPC environments

[Figure: Reproducible HPC workflow diagram]

Context

HPC environments drift. Module systems, shared libraries, and compiler versions change out from under you; the calculation that ran last month will not necessarily run today. A README listing what you installed is not a reproducibility strategy.

Nix for HPC

Nix pins every dependency, from compiler to MPI implementation, in a content-addressed store, producing bit-for-bit reproducible builds. Benchmarks on the Elja cluster show the resulting binaries match the performance of manually tuned builds (Goswami et al. 2023).
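As a minimal sketch of what pinning looks like in practice (the actual Elja module set is more involved), a `shell.nix` might fix the entire toolchain to one nixpkgs revision. The revision placeholder, hash placeholder, and package choices below are illustrative, not the Elja configuration:

```nix
# Illustrative only: pin nixpkgs to an exact revision so every rebuild
# resolves the same compiler, MPI, and library closure.
let
  pkgs = import (fetchTarball {
    # Hypothetical pin; substitute a revision and hash you have audited.
    url = "https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz";
    sha256 = "<sha256>";
  }) { };
in
pkgs.mkShell {
  buildInputs = [
    pkgs.gfortran   # pinned compiler
    pkgs.openmpi    # pinned MPI implementation
    pkgs.fftw       # example numeric dependency
  ];
}
```

Entering `nix-shell` in this directory then yields the same toolchain on any machine with Nix available, which is the property the Elja module system builds on.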

Literate reproducible workflows

Multi-tool pipelines (alignment, tree building, statistical analysis) drift worst of all, because each stage is usually someone else’s script. We combined Org-mode with Snakemake so that the analysis prose, code, and execution live in one file and can be regenerated deterministically (Goswami and S 2023).
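The shape of such a pipeline can be sketched as a Snakefile fragment (in the literate setup this would live in a tangled Org-mode block). The rule names, file paths, and tool invocations below are hypothetical stand-ins, not the published workflow:

```snakemake
# Hypothetical three-stage phylogenetics pipeline: each rule declares its
# inputs and outputs, so Snakemake re-executes only the stages whose
# dependencies changed.
rule all:
    input: "results/tree.treefile"

rule align:
    input: "data/sequences.fasta"
    output: "results/aligned.fasta"
    shell: "mafft {input} > {output}"

rule tree:
    input: "results/aligned.fasta"
    output: "results/tree.treefile"
    shell: "iqtree2 -s {input} --prefix results/tree"
```

Because the dependency graph is explicit, rerunning the whole analysis is a single `snakemake` invocation rather than a folder of scripts executed in remembered order.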

Broader ecosystem work

I consult for the Icelandic National Competence Center for HPC & AI (2021-present), where I set up the Nix-based software module system on the Elja supercomputer. Alongside that work, I maintain several conda-forge feedstocks, contribute to pixi and spack, and help individual projects adopt reproducible build practices. The blog collects these Nix, pixi, and packaging patterns across more than a dozen posts.

Code

  • hzArchiso – Custom Arch Linux installation media
  • HPC series on rgoswami.me (per-project Spack baselines, Spack + PyTorch workflow).
  • Local Nix without root (how to bootstrap Nix on a cluster where you do not have admin).

See also: packaging tutorials under Teaching for Nix workshop material.

Open directions

  • A “BLAS for chemical kinetics” (from the thesis conclusion): standardized, optimized building blocks that any simulation code can call, the way LAPACK standardized linear algebra.
  • Extending the Nix-based HPC approach to GPU-accelerated workflows with proper CUDA/ROCm dependency management.

References

Goswami, Rohit, Ruhila S, Amrita Goswami, Sonaly Goswami, and Debabrata Goswami. 2023. “Reproducible High Performance Computing without Redundancy with Nix.” In 2022 Seventh International Conference on Parallel, Distributed and Grid Computing (PDGC).
Goswami, Rohit, and Ruhila S. 2023. “High Throughput Reproducible Literate Phylogenetic Analysis.” In 2022 Seventh International Conference on Parallel, Distributed and Grid Computing (PDGC).

← All research threads