Hello,
I have tried various suggestions for fixing a segmentation fault when running WRF, but the problem has not been solved so far.
Can anyone help me fix it?
Thanks,
Loading openmpi/4.1.6/gcc11.4.0-cuda12.3.2
Loading requirement: cuda/12.3.2 ucx/1.15.0/cuda12.3.2
starting wrf task 36 of 48
starting wrf task 2 of 48
starting wrf task 5 of 48
starting wrf task 11 of 48
starting wrf task 14 of 48
starting wrf task 39 of 48
starting wrf task 8 of 48
starting wrf task 10 of 48
starting wrf task 43 of 48
starting wrf task 3 of 48
starting wrf task 18 of 48
starting wrf task 6 of 48
starting wrf task 13 of 48
starting wrf task 20 of 48
starting wrf task 26 of 48
starting wrf task 37 of 48
starting wrf task 44 of 48
starting wrf task 42 of 48
starting wrf task 25 of 48
starting wrf task 7 of 48
starting wrf task 17 of 48
starting wrf task 41 of 48
starting wrf task 47 of 48
starting wrf task 34 of 48
starting wrf task 19 of 48
starting wrf task 22 of 48
starting wrf task 31 of 48
starting wrf task 46 of 48
starting wrf task 35 of 48
starting wrf task 27 of 48
starting wrf task 15 of 48
starting wrf task 29 of 48
starting wrf task 30 of 48
starting wrf task 45 of 48
starting wrf task 28 of 48
starting wrf task 38 of 48
starting wrf task 23 of 48
starting wrf task 24 of 48
starting wrf task 12 of 48
starting wrf task 0 of 48
starting wrf task 32 of 48
starting wrf task 40 of 48
starting wrf task 21 of 48
starting wrf task 9 of 48
starting wrf task 33 of 48
starting wrf task 1 of 48
starting wrf task 4 of 48
starting wrf task 16 of 48
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 24 with PID 0 on node bnode045 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
============================================================
Request ID: 440571.nqsv
Request Name: runwrf.sh
Queue: gpu@nqsv
Number of Jobs: 1
Created Request Time: Mon Oct 21 16:55:31 2024
Started Request Time: Mon Oct 21 16:56:04 2024
Ended Request Time: Mon Oct 21 18:36:57 2024
Resources Information:
Elapse: 6057S
Remaining Elapse: 80343S
============================================================