I have a server with two Intel Xeon Gold 6148 CPUs, whose cores are distributed across OpenMP domains as follows:
CPUs 0-19 in domain 0
CPUs 20-39 in domain 1
I've compiled wrf.exe (ARW v3.9.1.1) with the Intel compilers (v19.1.3.304) using the dm+sm option, with support for the AVX-512 instruction set (see the configure.wrf file). The build created all executables except real.exe. (That was not a problem, because I already had a real.exe compiled with the dm-only option and AVX-512 support.)
When I run wrf.exe with the command:
$ mpirun -np 2 -genv OMP_NUM_THREADS=19 -genv I_MPI_DEBUG=5 -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=omp -genv I_MPI_PIN_ORDER=compact -genv I_MPI_PIN_CELL=core -genv I_MPI_PIN_PROCESSOR_LIST=1-19,21-39 ./wrf.exe
it crashes with a segmentation fault.
Here is the output:
--------------------------------------------------------------------------------------------------
[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9 Build 20200923 (id: abd58e492)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
starting wrf task 1 of 2
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 32308 localhost.localdomain {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}
[0] MPI startup(): 1 32309 localhost.localdomain {19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/compilers_and_libraries_2020.4.304/linux/mpi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_PIN=1
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=1-19,21-39
[0] MPI startup(): I_MPI_PIN_CELL=core
[0] MPI startup(): I_MPI_PIN_DOMAIN=omp
[0] MPI startup(): I_MPI_PIN_ORDER=compact
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=5
starting wrf task 0 of 2
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 32308 RUNNING AT localhost.localdomain
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 32309 RUNNING AT localhost.localdomain
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
----------------------------------------------------------------------------------------------------
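One thing that can be read directly from this log: the "Pin cpu" sets reported by Intel MPI are not the ones requested in I_MPI_PIN_PROCESSOR_LIST. The following Python sketch only re-checks the numbers already shown above; the helper names (parse_pinning, expand_list, domains_touched) are made up for illustration, and the domain layout is the one stated at the top of this post. It does not establish whether the pinning is related to the segmentation fault.

```python
import re

# Domain layout as described in the post: CPUs 0-19 in domain 0, 20-39 in domain 1.
DOMAINS = {0: set(range(0, 20)), 1: set(range(20, 40))}

# The two pinning lines copied from the I_MPI_DEBUG=5 output.
PIN_LINES = """\
[0] MPI startup(): 0 32308 localhost.localdomain {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}
[0] MPI startup(): 1 32309 localhost.localdomain {19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37}
"""

# Matches: rank, pid, node name, {comma-separated CPU list}
PIN_RE = re.compile(r"MPI startup\(\):\s+(\d+)\s+\d+\s+\S+\s+\{([\d,]+)\}")


def parse_pinning(text):
    """Return {rank: set of CPU ids} from the 'Pin cpu' lines of the log."""
    return {
        int(m.group(1)): {int(c) for c in m.group(2).split(",")}
        for m in PIN_RE.finditer(text)
    }


def expand_list(spec):
    """Expand a range list such as '1-19,21-39' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def domains_touched(cpus):
    """Return the sorted list of domains that contain at least one of the CPUs."""
    return sorted(d for d, members in DOMAINS.items() if cpus & members)


pins = parse_pinning(PIN_LINES)
requested = expand_list("1-19,21-39")      # value of I_MPI_PIN_PROCESSOR_LIST
actual = set().union(*pins.values())       # CPUs actually reported in the log

for rank, cpus in sorted(pins.items()):
    print(f"rank {rank}: {len(cpus)} CPUs, domains {domains_touched(cpus)}")
print("requested but not used:", sorted(requested - actual))
print("used but not requested:", sorted(actual - requested))
```

According to the log, rank 1 is pinned to CPUs 19-37, so it includes CPU 19 from domain 0 and spans both domains, while CPUs 38 and 39 are left unused. As far as I understand the Intel MPI documentation, I_MPI_PIN_PROCESSOR_LIST is ignored when I_MPI_PIN_DOMAIN is set, which would explain the mismatch.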
However, it runs without any problems when wrf.exe is compiled in dm-only mode:
$ mpirun -n 38 ./wrf.exe
See my namelist.input file and the WRF run logs.
Can anyone help me?