
Issue with LANDUSEF variable during geogrid.exe

After running geogrid.exe, the variable LANDUSEF in geo_em.d04.nc appears to have incorrect values.

In ncview, the range shows 0 to 1e+20, whereas for geo_em.d03.nc and the other domains the range is correctly 0 to 1. I'm not sure anything is actually wrong, but it seems unusual. The geogrid.log and namelist.wps files are attached. I tried to compress the geo_em files, but the size didn't shrink much; if there is another way to send large files (about 250 MB), please let me know.
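For reference, a rough Python sketch (assuming the netCDF4 package is available; the 1e+19 cutoff is just an arbitrary threshold for spotting the ~1e+20 entries) to check how much of LANDUSEF actually holds those huge values:

"""
from netCDF4 import Dataset
import numpy as np

nc = Dataset("geo_em.d04.nc")
nc.set_auto_mask(False)  # return raw stored values, no automatic fill-value masking

# LANDUSEF: fractional land use, dimensions (Time, land_cat, south_north, west_east)
landusef = nc.variables["LANDUSEF"][:]

print("raw range:", landusef.min(), "to", landusef.max())

# Treat very large entries (~1e+20) as undefined and look at the rest
valid = landusef[landusef < 1.0e19]
print("range excluding huge values:", valid.min(), "to", valid.max())
print("fraction of entries that are huge:", np.mean(landusef > 1.0e19))

nc.close()
"""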

Any ideas would be appreciated.
 

Attachments

  • geogrid.log (104.5 KB)
  • namelist.wps (1.3 KB)
Hi,
I ran a test with your namelist.wps file and I see the same thing you do; however, I don't easily see any issue with the geo_em.d04 file. Can you try to run ungrib and metgrid and see if things work and look reasonable? If so, continue to run real and wrf, and if everything looks okay, it should be okay to ignore the issue.
 
Thank you for your helpful reply.

The same issue is observed with the met_em.d04* and wrfinput_d04 files, while everything seems normal for the met_em and wrfinput files in other domains.
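A similar rough sketch (again Python with netCDF4; the filename pattern below is just the standard metgrid naming) to check whether the LANDUSEF fractions in the met_em.d04 files still sum to roughly 1 in each grid cell:

"""
import glob
import numpy as np
from netCDF4 import Dataset

# Point this at one of the met_em files for the innermost domain
path = sorted(glob.glob("met_em.d04.*.nc"))[0]

with Dataset(path) as nc:
    nc.set_auto_mask(False)  # raw values, no automatic fill-value masking
    landusef = nc.variables["LANDUSEF"][:]  # (Time, land_cat, south_north, west_east)

# The land-use fractions in each grid cell should add up to about 1
total = landusef.sum(axis=1)  # sum over the land_cat dimension
print("min/max of per-cell category sums:", total.min(), total.max())
print("cells far from 1:", np.count_nonzero(np.abs(total - 1.0) > 0.01))
"""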

I should mention that when I run "mpirun -np 36 ./wrf.exe" (or with other -np values, which I believe are not relevant to this issue), the model runs normally with max_dom = 3. With max_dom = 4, however, the model stops and prints the following message:

"""
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 20962 RUNNING AT compute13
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

"""
On the other hand, "mpirun -np 16 ./wrf.exe" does seem to let the model run with max_dom = 4, albeit with very slow progress.

The namelist.input and rsl files for max_dom = 4 and "mpirun -np 36 ./wrf.exe" are attached.
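For anyone looking at the attached rsl files, a small Python sketch (assuming the standard rsl.error.* naming in the run directory) that prints the last few lines of every rank's log, which shows how far the run got before the processes were killed:

"""
import glob

# wrf.exe writes one rsl.error.* (and rsl.out.*) file per MPI rank
for path in sorted(glob.glob("rsl.error.*")):
    with open(path, errors="replace") as f:
        lines = f.read().splitlines()
    print(f"=== {path} ===")
    for line in lines[-3:] or ["<empty file>"]:
        print("   ", line)
"""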

Any feedback or comments would be greatly appreciated.
 

Attachments

  • namelist.input (4.5 KB)
  • rsl_code9.tar.gz (28.9 KB)
I would like to add that the run for max_dom = 4, using a Slurm script with 2 nodes and 36 tasks per node, is currently in progress. I am still uncertain about the LANDUSEF values and am waiting for the WRF run to complete so I can check for any anomalies in the outputs.

On the other hand, given that the resolutions of the third and fourth domains are 1 km and 333.33 m respectively, is MYJ still an appropriate choice for bl_pbl_physics, or should I switch to the Shin-Hong scheme for all domains?
 
Core dumped!

Still, the results appear reasonable, even though the process only progressed for a few minutes.
 

For max_dom = 4, using a Slurm script with 2 nodes and 36 (and even 16) tasks per node, and setting bl_pbl_physics = 11 (Shin-Hong) and sf_sfclay_physics = 1 (Revised MM5), the process encountered a core dump once again. The only difference from the runs with the CONUS bl_pbl_physics and sf_sfclay_physics settings was that a few time steps completed successfully before the crash.

If a core dump occurs solely because the resolution increases when a nested domain is added, what is the typical solution? Would modifying the physics configuration help? It did not seem effective in my case: as mentioned above, I tried options such as bl_pbl_physics (Shin-Hong) and sf_sfclay_physics (Revised MM5) without success.

Any comments or suggestions would be greatly appreciated.
 
Hi,
I think the problem is that you're using entirely too few processors (max 36) for your domain sizes, which are:

e_we = 239, 493, 967, 1840,
e_sn = 150, 316, 595, 1129,

Domain 4 is much larger than domain 1, which may make these two domain sizes difficult to run together. See Choosing an Appropriate Number of Processors for details.
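As a rough illustration (assuming the commonly quoted rule of thumb of at least about 25 x 25 and at most about 100 x 100 grid points per processor patch; please check the linked page for the exact guidance), a quick calculation for these domain sizes:

"""
# Rough processor-count range per domain, assuming patches between
# roughly 25x25 (upper bound on processor count) and 100x100
# (lower bound) grid points per MPI rank.
e_we = [239, 493, 967, 1840]
e_sn = [150, 316, 595, 1129]

for d, (nx, ny) in enumerate(zip(e_we, e_sn), start=1):
    fewest = max((nx // 100) * (ny // 100), 1)
    most = (nx // 25) * (ny // 25)
    print(f"d{d:02d}: {nx} x {ny} -> roughly {fewest} to {most} processors")
"""

Under that assumption, domain 4 alone would call for a couple of hundred processors, while domain 1 tops out around fifty, which is why these two sizes are hard to accommodate in the same run.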
 