
Error when running WRF

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

Enea2301

New member
Hi! I'm a fairly experienced WRF user, but today I ran into an error I've never seen before.

Today I decided to compile WRF with the Intel compilers to test it against GCC, using the latest versions of both compilers and of the libraries.

Compilation went smoothly and everything seemed fine. real.exe worked flawlessly, and wrf.exe seemed to as well, but when the program had to start the calculations...

As you can see in the attached screenshot, the program shows this error:

"forrtl: severe (174): SIGSEGV, segmentation fault occurred"

Well, I can assure you it's not a memory problem. ulimit was set to unlimited, and this same domain run with the GCC build of WRF uses only about 5 GB of RAM.

What can it be? Any ideas?

I'm totally out of ideas.

Some more information that may be relevant: WRF compiled with DMPAR worked, and wrf.exe ran (it was noticeably slower than the GCC smpar build, which is what I currently use operationally). The problem occurs when WRF is compiled with SMPAR. Why SMPAR? I have a single-CPU computer, and the performance gain from DMPAR to SMPAR is impressive on my Ryzen CPU.
 

Attachments

  • Captura de pantalla de 2020-11-18 22-03-44.png
I've been able to trace the error down to radiation schemes.

I use option 4 (RRTMG) for both longwave and shortwave radiation, and with GCC there are no problems. With the Intel compilers, however, it crashes.

If I switch to option 1 for both (Dudhia for shortwave, RRTM for longwave), everything seems fine.
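
For reference, the change between the two radiation setups corresponds to two entries in the &physics section of namelist.input. This is a minimal sketch assuming a single domain; the poster's actual namelist is not shown in the thread, and all other &physics entries are omitted here and would stay as they are.

```
&physics
 ra_lw_physics = 1,   ! longwave:  1 = RRTM    (4 = RRTMG, the setting that crashed)
 ra_sw_physics = 1,   ! shortwave: 1 = Dudhia  (4 = RRTMG, the setting that crashed)
/
```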

Any ideas on why this is happening?
 
It seems that the case failed immediately after wrf.exe started, which often indicates a data issue or a memory issue.
For this case, I suspect it is a memory issue. I have a few questions:
(1) What version of WRF are you running?
(2) How did you build and run it?
(3) We recommend compiling WRF in dmpar mode and running with multiple processors.
 
Hi! I really don't think it's a memory issue.

The same domain, for the same period and with the same physics options, runs without a problem on WRF compiled with GCC in smpar mode, using only 5-6 GB of the 32 GB of RAM I have available (ulimit -s unlimited was set in both cases).

I built this WRF with the latest Intel compilers and the default optimization flags. Since it was built in smpar mode, I ran it with OMP_NUM_THREADS=12, so it uses 12 of my CPU's 24 threads, or in other words, its 12 physical cores.
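
The run setup described in this post can be sketched as the shell commands below. The `ulimit` and `OMP_NUM_THREADS` values come from the thread. The commented-out `OMP_STACKSIZE` line is an assumption that nobody in the thread tested: it is the standard OpenMP variable for the stack size of the worker threads, which `ulimit -s` does not set, and the value shown is only illustrative.

```shell
# Stack limit for the main thread, as set in both the GCC and Intel runs.
ulimit -s unlimited || echo "could not raise the stack limit" >&2

# 12 OpenMP threads, one per physical core.
export OMP_NUM_THREADS=12

# Assumption, not tested in this thread: per-thread stack size for the
# OpenMP worker threads. Uncomment to try it; the value is illustrative.
# export OMP_STACKSIZE=512M

# Run from the directory that holds wrf.exe and namelist.input.
if [ -x ./wrf.exe ]; then
    ./wrf.exe
fi
```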

All my WRF installations are built with smpar because I run the model on a desktop PC with a powerful CPU. After several tests it was fairly clear that SMPAR was faster than DMPAR and gave shorter simulation times (probably because of the architecture of my CPU, a Ryzen, which clearly benefits from the way SMPAR works).

(For example, running the same WRF with the same compilers and flags in both modes, the SMPAR build was around 15% faster than the DMPAR build.)
 
I notice that your tile size is pretty small (the grid number along the Y direction is only 7 for each tile). Can you reduce the number of processors you used to run this case, for example, set OMP_NUM_THREADS=4, and try again? Please let me know whether it works.
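
The effect of the thread count on tile size can be sketched with some integer arithmetic. This assumes that an smpar run splits the domain along Y into one tile per OpenMP thread, and it uses a hypothetical Y dimension of 84 grid points, chosen only because it matches 7 rows per tile at 12 threads. The real domain size is not given in the thread, and `rows_per_tile` is a helper made up for this illustration.

```shell
# Hypothetical Y dimension: 84 rows gives 7 rows per tile with 12 threads.
ny=84

# Rows per tile if the domain is split along Y into one tile per thread
# (integer division; any remainder rows are ignored in this sketch).
rows_per_tile() {
    echo $(( $1 / $2 ))
}

for threads in 12 4; do
    echo "threads=$threads rows_per_tile=$(rows_per_tile "$ny" "$threads")"
done
```

Under these assumptions, dropping from 12 to 4 threads raises the tile height from 7 to 21 rows.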
 
Ming Chen said:
I notice that your tile size is pretty small (the grid number along the Y direction is only 7 for each tile). Can you reduce the number of processors you used to run this case, for example, set OMP_NUM_THREADS=4, and try again? Please let me know whether it works.

I tried it, and the result was pretty much the same. The only way I was able to make WRF work with the Intel compilers was in DMPAR mode.
 
Actually, dmpar mode is recommended for compiling WRF. Please stay with this mode now that it works fine. We have seen issues on some machines/compilers when the code is compiled in dm+sm mode or sm mode. It is hard for us to reproduce these problems and come up with a solution.
 