I can't run wrf.exe with more than 6 processes.

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

KiyoTom

Dear Team

I downloaded and built WRF Model Version 4.2.1 and tried to run it on an AWS RHEL (x86_64) system.
When I run wrf.exe on a 96-vCPU instance with 8 processes x 12 threads, I get a segmentation fault.
However, with 6 processes x 16 threads, it completes normally.
I would like to use a cluster to run large-scale calculations and get results faster.
What is causing the segmentation fault?

I set "ulimit -s unlimited" before execution.

Compiler: Intel 19.1.2.254 20200623
MPI: Intel(R) MPI Library 2019 Update 8 for Linux*

Attached are namelist.input, rsl.out.* and rsl.error.*.
 
This depends on how your model domain is decomposed.
With 6 processors, your domain is decomposed reasonably. With 8 processors, however, 4 are allocated to the Y dimension, and the number of grid points assigned to a single processor can become very small, which causes the model to fail. Below is an example:
    WRF TILE 12 IS 1 IE 165 JS 159 JE 165
    WRF NUMBER OF TILES = 12
    forrtl: severe (174): SIGSEGV, segmentation fault occurred

Tile 12 spans only 7 grid points in the Y direction (JS 159 to JE 165), which is too small. We recommend that each processor cover at least 20 grid points in each direction.
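To spot thin tiles like tile 12 without reading every rsl file by hand, a small script can scan the log for tile lines. This is a minimal sketch that assumes the tile lines in rsl.error.* look exactly like the excerpt above; the function name `narrow_tiles` is hypothetical, and the default threshold of 20 is taken from the recommendation rather than from WRF itself.

```python
import re
from typing import Iterable, List, Tuple

# Assumed format, copied from the excerpt above:
#   "WRF TILE 12 IS 1 IE 165 JS 159 JE 165"
TILE_RE = re.compile(
    r"WRF TILE\s+(\d+)\s+IS\s+(\d+)\s+IE\s+(\d+)\s+JS\s+(\d+)\s+JE\s+(\d+)"
)


def narrow_tiles(lines: Iterable[str], min_points: int = 20) -> List[Tuple[int, int, int]]:
    """Return (tile, nx, ny) for each tile spanning fewer than min_points in X or Y."""
    flagged = []
    for line in lines:
        match = TILE_RE.search(line)
        if match is None:
            continue  # not a tile line
        tile, i_start, i_end, j_start, j_end = map(int, match.groups())
        nx = i_end - i_start + 1  # index ranges are inclusive
        ny = j_end - j_start + 1
        if nx < min_points or ny < min_points:
            flagged.append((tile, nx, ny))
    return flagged


if __name__ == "__main__":
    excerpt = [
        "WRF TILE 12 IS 1 IE 165 JS 159 JE 165",
        "WRF NUMBER OF TILES = 12",
    ]
    print(narrow_tiles(excerpt))  # [(12, 165, 7)]
```

To check a real run, pass an open file, e.g. `with open("rsl.error.0000") as f: print(narrow_tiles(f))`, and repeat for each rsl.error.* file.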
 
When choosing the number of processors, it is better to pick a number whose allocations to the X and Y dimensions are similar.

For example,

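The number 16 is good because 16 = 4 x 4: 4 is allocated to Y and 4 to X.
The number 18 is a poorer choice because it only factors unevenly (2 x 9 or 3 x 6), so one dimension receives at least twice as many processors as the other. The resulting thin patches can cause the WRF run to fail.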
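The trade-off in these examples can be explored with a short script that lists every way to split a process count into X x Y and the smallest patch each split would produce. This is a sketch under stated assumptions: the 200 x 200 domain is hypothetical (not the poster's), patches are assumed to divide the grid as evenly as possible, and the helper names are illustrative. It does not reproduce how WRF itself picks the split; it only makes the candidates visible.

```python
from typing import List, Tuple


def process_splits(nprocs: int, nx: int, ny: int) -> List[Tuple[int, int, int, int]]:
    """Return (px, py, min_x, min_y) for every px * py == nprocs.

    min_x and min_y are the smallest patch widths in grid points when nx and ny
    are divided as evenly as possible. This does not model WRF's own choice
    of split; it only lists the candidates.
    """
    splits = []
    for px in range(1, nprocs + 1):
        if nprocs % px == 0:
            py = nprocs // px
            splits.append((px, py, nx // px, ny // py))
    return splits


def imbalance(px: int, py: int) -> float:
    """Larger process count divided by the smaller one (1.0 means square)."""
    return max(px, py) / min(px, py)


if __name__ == "__main__":
    NX, NY = 200, 200  # hypothetical domain size, not the poster's
    for n in (16, 18):
        splits = process_splits(n, NX, NY)
        best = min(imbalance(px, py) for px, py, _, _ in splits)
        print(f"{n} processes, most balanced X/Y ratio = {best}")
        for px, py, min_x, min_y in splits:
            print(f"  {px:2d} x {py:2d} -> smallest patch {min_x} x {min_y} points")
```

With this hypothetical domain, 16 processes allow a 4 x 4 split with 50 x 50 patches, whereas every split of 18 is at least twice as long in one direction as the other; 2 x 9, for example, leaves patches of only 22 points in Y.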
 