ATP
Created Wednesday 24 August 2016
ATP in Theta
At higher core counts, other strategies are needed: disabling core dumps, or even lightweight files, becomes necessary when 10,000 processes fall over and each one chooses to write a core dump. Lustre doesn't take kindly to that, and it likes it even less when you then run operations in a directory holding 10,000 (or even 100,000) files. I got burned a few times over the years on that one. I never tried breaking a GPFS system that way, so YMMV.
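One way to do the "disabling core dumps" part is to set the core-file size limit to zero in the job script before the launch line. This is a minimal sketch, not a tested Theta recipe: the helper name `disable_core_dumps` is made up for illustration, and whether a limit set in the batch shell actually reaches the compute-node processes depends on how the launcher forwards resource limits, which is assumed here rather than confirmed.

```shell
#!/bin/bash
# Sketch: stop processes started from this shell from writing core files.
# `ulimit -c` is the shell builtin for the core-file size limit;
# a value of 0 means no core file is written. -S changes the soft limit only.
disable_core_dumps() {   # hypothetical helper name
    ulimit -S -c 0
}

disable_core_dumps

# The launch line stays elided, as in the rest of these notes:
# aprun [...]
```

Running `ulimit -S -c` afterwards in the same shell should print `0`, which is a quick way to check the setting took effect before submitting at scale.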
There are two other ways to debug this situation non-interactively:
- module load atp
I can confirm that ATP works under these conditions:
a) compile with Cray
b) put this in your job script:
module load atp
export ATP_ENABLED=1
aprun [...]
ATP in NERSC
http://www.nersc.gov/users/software/performance-and-debugging-tools/stat-and-atp/#toc-anchor-2