Reproducible HPC
Nix, containers, and workflow tools for HPC environments
The problem
HPC systems routinely break reproducibility. Module systems, shared libraries, and compiler version drift mean that a calculation that ran last month may not run today. The standard approach – documenting the environment in a README – fails the moment a dependency updates.
Nix for HPC
We demonstrated that Nix can provide bit-reproducible builds on HPC systems without sacrificing performance (Goswami et al. 2023). Nix derivations pin every dependency from compiler to MPI implementation, and the resulting binaries match the performance of manually optimized builds.
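The pinning idea can be illustrated with a small sketch. This is Python, not Nix, and it is not the code from the paper. It only mimics, under simplifying assumptions, how an identifier derived from all build inputs changes whenever any dependency in the chain changes. The function `derivation_hash` and the package names and versions are hypothetical.

```python
import hashlib
import json


def derivation_hash(name, version, deps=()):
    """Hash a package together with the hashes of its dependencies.

    A toy stand-in for input addressing: the identifier covers the
    whole dependency closure, so changing any input changes the result.
    """
    payload = json.dumps(
        {"name": name, "version": version, "deps": sorted(deps)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical toolchain: compiler -> MPI implementation -> application.
compiler = derivation_hash("gcc", "12.2.0")
mpi = derivation_hash("openmpi", "4.1.4", deps=[compiler])
app = derivation_hash("simulation", "1.0", deps=[compiler, mpi])

# Bumping only the compiler propagates to everything built with it.
compiler_new = derivation_hash("gcc", "13.1.0")
mpi_new = derivation_hash("openmpi", "4.1.4", deps=[compiler_new])
app_new = derivation_hash("simulation", "1.0", deps=[compiler_new, mpi_new])

print("application identifier changed:", app != app_new)
```

In this toy model the MPI version is untouched, yet its identifier still changes because its compiler input changed. A README listing only top-level versions cannot capture that property.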
Literate reproducible workflows
For workflows that span multiple tools (alignment, tree building, statistical analysis), we developed a literate programming approach using org-mode and Snakemake (Goswami and S 2023). Each step is documented alongside its execution, producing a self-contained reproducible document.
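As a rough illustration of keeping documentation next to execution, the sketch below pairs each step with a one-line description and derives the run order from declared dependencies. It uses only the Python standard library (`graphlib`, Python 3.9+) and is not Snakemake or org-mode. The step names follow the examples above, while the descriptions and the `render_plan` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step records what it does and what it needs.
STEPS = {
    "alignment": {
        "doc": "Align the input sequences.",
        "needs": set(),
    },
    "tree_building": {
        "doc": "Build a tree from the alignment.",
        "needs": {"alignment"},
    },
    "statistical_analysis": {
        "doc": "Run statistics on the resulting tree.",
        "needs": {"tree_building"},
    },
}


def execution_order(steps):
    """Return step names so that every step follows its dependencies."""
    graph = {name: spec["needs"] for name, spec in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def render_plan(steps):
    """Produce a numbered, human-readable plan: documentation plus order."""
    lines = []
    for index, name in enumerate(execution_order(steps), start=1):
        lines.append(f"{index}. {name}: {steps[name]['doc']}")
    return "\n".join(lines)


print(render_plan(STEPS))
```

A real workflow tool additionally tracks input and output files and reruns only stale steps. The sketch covers just the ordering and the documentation pairing.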
Broader ecosystem work
Beyond papers, I maintain conda-forge packages, contribute to pixi (a package manager and task runner built on the conda ecosystem), and help scientific projects adopt reproducible build practices. My blog at rgoswami.me documents Nix, pixi, and packaging patterns for mixed-language scientific codes.
Code
- hzArchiso – Custom Arch Linux installation media
Related writing
- Non-CRAN R packages with Nix
- Mach-nix and Niv for Python
- Customizing ArchLinux installation media
- SSH access via container VPNs