
[SOLVED] Segmentation Fault when executing a large run

sergiotwp

Hello everyone,
I've compiled the latest version of WRF (4.6.0) with the dm+sm configure option on an aarch64 architecture. To check whether the build worked, we first tried a small run of 160x100 cells at 5 km resolution. It ran properly, so we assumed the compilation was fine.
In later tests, a bigger run of 400x300 cells at 5 km resolution ended in a segmentation fault. At first we got this error even when running real.exe, but that was solved by running "ulimit -s unlimited" and "ulimit -c unlimited". However, wrf.exe still fails.
One thing to note is that our WPS was compiled with the serial option for x86-64, with some modifications to make it work on aarch64. There seems to be no issue with WPS, though.
We have also tried setting WRFIO_NCD_LARGE_FILE_SUPPORT to 1 and OMP_NUM_THREADS to 4 with no improvement.
I'm attaching the compile log as well as the logs of the run (two different logs for two different numbers of threads).

EDIT: We made some changes to the namelist (for example, reducing the domain size, which made no difference), and it finally worked after removing and changing some physics options. Can anyone spot inconsistencies in "error_namelist.input.txt", which worked fine in WRF 4.5.1? I have also attached the two namelists that worked.

EDIT 2: After removing the fdda diagnostics from namelist.input, the model worked fine. Does anyone know why?

EDIT 3: The size of the run still seems to be part of the issue, since increasing it results in a segmentation fault.


Attachments

  • log.compile.txt (1 MB)
  • rsl.error.0000.txt (7.4 KB)
  • rsl.error.0015.txt (2 KB)
  • error_namelist.input.txt (6.2 KB)
  • working_namelist1.input.txt (5.9 KB)
  • working_namelist2.input.txt (5.9 KB)
This is obviously a memory issue. I would suggest increasing the number of processors, which spreads the case over more total memory, and rerunning it. If it still doesn't work, please ask your system administrator about getting more memory.
 
We managed to run the simulation with 48 cores and OMP_NUM_THREADS=4, and also with 40 cores and OMP_NUM_THREADS=16. So it does seem to be an issue related to memory and processors. The problem is that we could run the same simulation in WRF 4.5.1 using only 4 threads and 8 processors, so I think some configuration must be preventing WRF from working properly.
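To put rough numbers on these core counts: in a dm (MPI) run WRF divides the horizontal grid into one patch per rank, and in a dm+sm run each OpenMP thread then works on a tile of that patch, so more ranks means each rank and thread holds a smaller piece of the domain. The sketch below is a back-of-the-envelope estimate only. It assumes the 400x300 domain from the first post and treats the core counts as MPI ranks. The function `patch_size` is hypothetical and simply takes the most nearly square factor pair of the rank count; WRF's own decomposition (or `nproc_x`/`nproc_y` if set in the namelist) may differ, and halo points are ignored.

```bash
#!/usr/bin/env bash
# Rough estimate of per-rank patch size when an nx_grid x ny_grid domain is
# split over nranks MPI ranks. Assumption: the rank layout is the most nearly
# square factor pair, with the larger factor along the first (west-east) axis.
# This is NOT WRF's actual decomposition algorithm.
patch_size() {
  local nx_grid=$1 ny_grid=$2 nranks=$3
  local best_x=1 f
  # Walk candidate factors up to sqrt(nranks); the last divisor found gives
  # the most nearly square pair (nranks / f, f).
  for (( f = 1; f * f <= nranks; f++ )); do
    if (( nranks % f == 0 )); then
      best_x=$(( nranks / f ))
    fi
  done
  local best_y=$(( nranks / best_x ))
  # Ceiling division gives the largest patch in each direction.
  local px=$(( (nx_grid + best_x - 1) / best_x ))
  local py=$(( (ny_grid + best_y - 1) / best_y ))
  echo "${nranks} ranks -> ${best_x}x${best_y} layout, largest patch ${px}x${py} ($(( px * py )) grid columns)"
}

# Domain from the first post (400x300 cells); rank counts mentioned in the thread.
for n in 1 8 40 48; do
  patch_size 400 300 "$n"
done
```

Under these assumptions, going from 8 to 48 ranks shrinks the largest patch from 15000 to 2500 grid columns, so each rank's share of every 3D field drops by roughly a factor of six.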
 
I am glad it finally works, and thanks for the update!

It is possible that WRF V4.6 requires more memory due to some changes or updates.
 
You are welcome. We still have problems, though. Increasing the number of cores is causing performance issues (described in this thread), and we still get segmentation faults with fewer cores no matter what we try. I don't think this is a problem with the WRF version: version 4.5 can run the same simulation on a single core. Does anyone have an idea how to get rid of the segmentation faults?
 
I finally found the solution: export OMP_STACKSIZE=128M. The exact value can be almost anything, but the variable has to be exported to prevent the segmentation faults.
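A likely reason why `ulimit -s unlimited` was enough for real.exe but not for wrf.exe: `ulimit -s` sets the stack limit for each process's main thread, while the additional threads that the OpenMP runtime starts in a dm+sm build get stacks sized by `OMP_STACKSIZE` (or an implementation-defined default when it is unset). The variable also only reaches wrf.exe if it is exported, because a plain shell assignment is not passed to child processes. The sketch below illustrates both points; `OMP_NUM_THREADS=4` and the commented launch lines are examples only, not part of the fix reported here.

```bash
#!/usr/bin/env bash
# Sketch: environment for launching a dm+sm (MPI + OpenMP) wrf.exe.

# Start clean so the demonstration is not affected by an inherited value.
unset OMP_STACKSIZE

# A plain assignment creates a shell variable only; child processes do not see it.
OMP_STACKSIZE=128M
bash -c 'echo "without export: ${OMP_STACKSIZE:-<unset>}"'   # prints "without export: <unset>"

# After export, the variable is part of the environment passed to children.
export OMP_STACKSIZE
bash -c 'echo "with export: ${OMP_STACKSIZE:-<unset>}"'      # prints "with export: 128M"

# Example thread count only; choose it to match your core layout.
export OMP_NUM_THREADS=4

# Main-thread stack limit, as used earlier in the thread. ulimit affects this
# shell and the processes it starts, and fails if the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "note: could not raise the stack limit" >&2

# Hypothetical launch lines: launcher, rank count and flags depend on your MPI.
# On multi-node runs some launchers must be told to forward variables,
# e.g. with Open MPI: mpirun -x OMP_STACKSIZE -x OMP_NUM_THREADS -np 8 ./wrf.exe
# mpirun -np 8 ./wrf.exe
```

Putting these lines in the same job script that starts wrf.exe keeps the limits and variables attached to the processes that actually run the model.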
 