Issue with LANDUSEF variable during geogrid.exe

After running geogrid.exe, the variable LANDUSEF in geo_em.d04.nc appears to have incorrect values.

Using ncview, the range shows 0 to 1e+20, whereas for geo_em.d03.nc and the other domains the range is correctly 0 to 1. I'm not sure whether anything is actually wrong, but it seems unusual. The geogrid.log and namelist.wps files are attached. I tried to compress the geo_em files, but the size didn't shrink much; if there is another way to send large files (about 250 MB), please let me know.
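
In case it helps to compare the domains outside ncview, here is a minimal sketch of the same check in Python (an assumption on my side: the netCDF4 and numpy packages are available, and LANDUSEF has the usual (Time, land_cat, south_north, west_east) ordering):

# Minimal sketch: compare the LANDUSEF field in two geogrid output files.
# Assumes netCDF4/numpy are installed and the files sit in the current directory.
import numpy as np
from netCDF4 import Dataset

for fname in ["geo_em.d03.nc", "geo_em.d04.nc"]:
    with Dataset(fname) as nc:
        # netCDF4 returns a masked array if a _FillValue/missing_value is set
        landusef = np.ma.masked_invalid(nc.variables["LANDUSEF"][:])
        print(f"{fname}: LANDUSEF min = {landusef.min():.4g}, max = {landusef.max():.4g}")
        # The land-use fractions are expected to sum to roughly 1 at each grid
        # cell; axis=1 assumes the land_cat dimension comes second.
        cat_sum = landusef.sum(axis=1)
        print(f"  per-cell category sums: {cat_sum.min():.4g} to {cat_sum.max():.4g}")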

Any ideas would be appreciated.
 

Attachments

  • geogrid.log (104.5 KB)
  • namelist.wps (1.3 KB)
Hi,
I ran a test with your namelist.wps file and I see the same thing you do; however, I don't easily see any issue with the geo_em.d04 file. Can you try to run ungrib and metgrid and see if things work and look reasonable? If so, continue to run real and wrf, and if everything looks okay, it should be okay to ignore the issue.
 
Thank you for your helpful reply.

The same issue is observed with the met_em.d04* and wrfinput_d04 files, while everything seems normal for the met_em and wrfinput files in other domains.

I should mention that when running "mpirun -np 36 ./wrf.exe" (or with other -np values, which I believe are not relevant to this issue), the model runs normally for max_dom = 3. However, when max_dom = 4, the run stops and displays the following message:

"""
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 20962 RUNNING AT compute13
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

"""
On the other hand, using "mpirun -np 16 ./wrf.exe" seems to make the model run for max_dom = 4, albeit with very slow progress.

The namelist.input and rsl files for max_dom = 4 and "mpirun -np 36 ./wrf.exe" are attached.

Any feedback or comments would be greatly appreciated.
 

Attachments

  • namelist.input (4.5 KB)
  • rsl_code9.tar.gz (28.9 KB)
I would like to add that the run for max_dom = 4, using a Slurm script with 2 nodes and 36 tasks per node, is currently in progress. I am uncertain about the LANDUSEF values and am waiting for the WRF run to complete so I can check the outputs for any anomalies.

On the other hand, considering that the resolutions of the third and fourth domains are 1 km and 333.33 m respectively, I am wondering whether MYJ is an appropriate bl_pbl_physics option at these resolutions, or whether I should switch to the Shin-Hong scheme for all domains.
 
The max_dom = 4 run mentioned above has core dumped!

However, the results appear reasonable, even though the run only progressed for a few minutes before failing.
 

For max_dom = 4, using a Slurm script with 2 nodes and 36 (and even 16) tasks per node, and setting bl_pbl_physics = 11 (Shin-Hong) and sf_sfclay_physics = 1 (Revised MM5), the run again ended with a core dump. The only difference from the run with the problematic CONUS bl_pbl_physics and sf_sfclay_physics settings was that a few time steps completed successfully.

If a core dump occurs solely because of the increased resolution from adding a nested domain, what is the typical solution? Would modifying the physics configuration help? It did not seem effective in my case: as mentioned above, I tried different options such as bl_pbl_physics = 11 (Shin-Hong) and sf_sfclay_physics = 1 (Revised MM5).

Any comments or suggestions would be greatly appreciated.
 
Hi,
I think the problem is that you're using entirely too few processors (max 36) for your domain sizes, which are:

e_we = 239, 493, 967, 1840,
e_sn = 150, 316, 595, 1129,

Domain 4 is much larger than domain 1, which may make these two domain sizes difficult to run together. See Choosing an Appropriate Number of Processors for details.
Thank you for your response.

Based on my calculations, the maximum number of processors is about 3323 and the minimum is about 3. I also tested with -np 72; both 36 and 72 exceed that minimum. I plan to try a larger number of processes if our servers permit it.

If you believe the issue could be linked to the number of processors, I would greatly appreciate any suggestions you might have for a number of processors that works well.

In the meantime, could you please confirm whether the namelist settings appear correct? Specifically, I am curious whether odd values such as 1129, 967, or 493 might have any unintended effects. With max_dom = 3, however, I can confirm that wrf.exe ran successfully.

I look forward to hearing from you.
 
Apologies for the delay. To determine the number of processors you can use, you have to base it off the max # that can be used for the smallest domain and the minimum # that can be used for the largest domain. Your smallest domain is d01 (239x150). Based on the rough rule of thumb calculation shown in Choosing an Appropriate Number of Processors, the max you should use is about 54 processors.

However, your largest domain is d04 (1840x1129), and again, you need to calculate the minimum number of processors that can be used for the largest domain. Per the rough calculation, the minimum number that can be used for this domain is about 207.

This means if you want to run all of these domains together, simultaneously, you cannot use more than 54 processors, but can't use fewer than 207, which, of course, is impossible. Therefore you are going to need to use the ndown program to run d04 separately from the other domains. This will allow you to use more processors when processing d04 alone.
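
For anyone who wants to reproduce these figures, here is a small sketch of that rough rule of thumb as I read it from the Choosing an Appropriate Number of Processors page (each MPI patch should be no smaller than roughly 25x25 and no larger than roughly 100x100 grid points; the exact rounding convention there may differ, so treat the numbers as indicative only):

# Rough rule-of-thumb sketch: decomposition patches between ~25x25 and ~100x100
# grid points per processor. Domain sizes are taken from the namelist above;
# the flooring below is my assumption and may differ slightly from the page.
domains = {
    "d01": (239, 150),
    "d02": (493, 316),
    "d03": (967, 595),
    "d04": (1840, 1129),
}

for name, (e_we, e_sn) in domains.items():
    max_procs = (e_we // 25) * (e_sn // 25)    # smallest acceptable patches (~25x25)
    min_procs = (e_we // 100) * (e_sn // 100)  # largest acceptable patches (~100x100)
    print(f"{name}: roughly {min_procs} to {max_procs} processors")

# A single run of all four domains is capped by the smallest domain's maximum
# (d01) yet needs at least the largest domain's minimum (d04); those ranges do
# not overlap here, which is why running d04 separately via ndown is suggested.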
 
I appreciate your helpful and detailed reply.

I’ll give it a try and get back to you with the results—likely in a few weeks.
 