rsl.out.???? files truncated by segmentation fault

andythewxman

Hello! I am currently running WRFV4.5.2 using Slurm. When I run X processors on a single node, everything works as expected. However, when I spread the same number of processors evenly over two nodes, the model still completes, but I get a segmentation fault message, and the rsl.out.???? files are truncated and do not end with the usual "SUCCESS" message. I did not observe this behavior when running WRFV3.8. I am a bit stumped at this point. Do you have any idea where I can start tracking this down? Thanks.
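One way to start narrowing this down is to list exactly which rsl.out.???? files are missing the success message. If the truncated files cluster on the ranks placed on one node, that may point toward the interconnect or MPI setup rather than WRF itself. Below is a minimal Python sketch, assuming the files sit in the run directory and that a clean run writes the string "SUCCESS COMPLETE WRF"; the marker string and the helper name find_incomplete_rsl are assumptions for illustration, not something from this thread.

```python
import glob
import os
import sys
from typing import List

# Assumption: a clean WRF exit writes this marker into each rsl file.
# Adjust it if your build prints something different.
SUCCESS_MARKER = "SUCCESS COMPLETE WRF"


def find_incomplete_rsl(run_dir: str, pattern: str = "rsl.out.*") -> List[str]:
    """Return the names of rsl files in run_dir that lack SUCCESS_MARKER."""
    incomplete = []
    for path in sorted(glob.glob(os.path.join(run_dir, pattern))):
        # errors="replace" keeps a truncated or partially binary file from
        # raising a UnicodeDecodeError.
        with open(path, "r", errors="replace") as fh:
            if SUCCESS_MARKER not in fh.read():
                incomplete.append(os.path.basename(path))
    return incomplete


if __name__ == "__main__":
    # Hypothetical usage: python check_rsl.py /path/to/run_dir
    run_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    bad = find_incomplete_rsl(run_dir)
    print(f"{len(bad)} rsl.out file(s) without the success marker:")
    for name in bad:
        print("  " + name)
```

The numeric suffix of each rsl.out file is the MPI rank, so comparing the reported names against how Slurm placed ranks on the two nodes shows whether the failures line up with one node.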
 
Hi,
Apologies for the long delay in responding while our team tended to time-sensitive obligations. Thank you for your patience. This issue is likely specific to your computing environment. I would suggest discussing it with a systems administrator at your institution.
 
Thank you. We were able to resolve the issue and confirm that it was not WRF-related. A co-worker had run into the same thing with another HPC program we were running. Apparently it is a known Cray issue involving an environment variable: setting FI_VERBS_PREFER_XRC=0 solved it for us.
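In a Slurm batch script this workaround is normally a single "export FI_VERBS_PREFER_XRC=0" line before launching wrf.exe. For anyone driving runs from Python instead, here is a hedged sketch that sets the same variable only in the child process environment. The helper name build_wrf_launch and the plain "srun ./wrf.exe" command are illustrative assumptions; match them to your own job script and launcher options.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def build_wrf_launch(
    base_env: Dict[str, str], wrf_exe: str = "./wrf.exe"
) -> Tuple[List[str], Dict[str, str]]:
    """Return (command, environment) for launching WRF with the workaround.

    The caller's environment dict is copied, never modified in place.
    """
    env = dict(base_env)
    # Workaround reported in this thread for a Cray-specific problem that
    # showed up on multi-node runs.
    env["FI_VERBS_PREFER_XRC"] = "0"
    # Assumption: srun without extra flags; add your own -N/-n options here.
    cmd = ["srun", wrf_exe]
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_wrf_launch(dict(os.environ))
    # Run inside an allocation; exit with srun's return code.
    sys.exit(subprocess.run(cmd, env=env).returncode)
```

Because the variable is added to a copy of the environment, other programs launched from the same Python process are unaffected unless you pass them the same dict.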
 