Fails to run with two nested domains but ok with single nest

Dear altruists,

I am trying to run ./real.exe using 3 domains at 20 km, 10 km & 5 km. It stops partway through, with the last message being "Full level index = 34 Height = 19871.2 m Thickness = 920.1 m". The simulation completes successfully with 2 domains. The same thing happens with other domain resolutions, such as 12-4-1 km, and with different numbers of vertical levels. I have also tried several WRF versions, such as 4.3.3 & 4.1.3, and have used up to 56 cores for these simulations. None of this solves the problem of running with 3 domains. I have attached the log file and namelist.input herewith. The PC has a very large amount of memory.
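(For reference, a minimal way to find where it stopped; a sketch assuming a standard dmpar run, where each MPI rank writes its own rsl files:)

# The last lines of the rank-0 logs usually show the point of failure
tail -n 20 rsl.out.0000 rsl.error.0000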

Thanks in advance.
Golam Rabbani
 

Attachments

  • log.txt (7.5 KB)
  • namelist(1).input (4 KB)
Can you provide your namelist.wps file as well?
 
Whatheway said:
Can you provide your namelist.wps file as well?

Thank you for your reply. I have attached namelist.wps and namelist.input. FYI, I am using an Ubuntu server with 128 cores & 132 GB RAM. I still cannot run with more than one nested domain. The same thing happens with both the FNL & ECMWF datasets.
 

Attachments

  • namelist.input (3.9 KB)
  • namelist.wps (783 bytes)
  • rsl.out.0000.txt (1.2 MB)
rabbanidu93 said:
Whatheway said:
Can you provide your namelist.wps file as well?

I don't see any errors in the namelists.

When you run

./geogrid.exe
./ungrib.exe
./metgrid.exe

Do any errors show?
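(A quick way to check; a sketch, assuming the default WPS log file names from a serial build:)

# Each WPS program reports success at the end of its log
tail -n 2 geogrid.log ungrib.log metgrid.log
# look for lines like "Successful completion of geogrid."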
 
Whatheway said:
rabbanidu93 said:
Whatheway said:
Can you provide your namelist.wps file as well?

I don't see any errors in the namelists.

When you run

./geogrid.exe
./ungrib.exe
./metgrid.exe

Do any errors show?

Actually, they show no errors. The full simulation completes successfully with 2 domains. When I use 3 domains, the simulation stops at the very beginning with the message written in rsl.out.0000. Running geogrid.exe, ungrib.exe & metgrid.exe shows no errors. When I run 'mpirun -np 40 ./real.exe', the simulation stops.
 
What is your forcing data (i.e., the data you ungrib)? I found that you have only 10 vertical levels of data up to 50 hPa, but 34 model levels. This may lead to problems when WRF does the vertical interpolation, especially when the horizontal resolution is high (in your case, 5 km).

Also, we recommend using an odd nesting ratio, for example parent_grid_ratio = 1, 3, 3 or parent_grid_ratio = 1, 5, 5. This is because, for even values, interpolation errors arise due to the nature of the Arakawa C-grid staggering.
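(As a quick check; the values shown below are the recommended odd ratios, not necessarily your current settings:)

# parent_grid_ratio appears in both WPS and WRF namelists
grep parent_grid_ratio namelist.wps namelist.input
# recommended form for 3 domains, e.g.:
#  parent_grid_ratio = 1, 3, 3,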
 
Ming Chen said:
What is your forcing data (i.e., the data you ungrib)? I found that you have only 10 vertical levels of data up to 50 hPa, but 34 model levels. This may lead to problems when WRF does the vertical interpolation, especially when the horizontal resolution is high (in your case, 5 km).

Also, we recommend using an odd nesting ratio, for example parent_grid_ratio = 1, 3, 3 or parent_grid_ratio = 1, 5, 5. This is because, for even values, interpolation errors arise due to the nature of the Arakawa C-grid staggering.
Dear Ming Chen,
Thank you for your reply.
I used the newly published ECMWF forecast data (https://data.ecmwf.int/forecasts/), which has 10 num_metgrid_levels & 0 num_metgrid_soil_levels. Its resolution is 0.4 deg by 0.4 deg. I took a sample Vtable.ECMWF from this forum and edited it myself. I have attached the Vtable.ECMWF herewith; in case I edited it the wrong way, please check it.

Following your recommendation, I have run some new simulations keeping parent_grid_ratio = 1, 3, 3 for 36-12-4 km domains, but it still stops with 3 domains and runs fine with 2. I then changed the dataset to NCEP FNL data (1-degree by 1-degree), which has 34 num_metgrid_levels & 4 num_metgrid_soil_levels, and still got the same issue with 3 domains.
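(For reference, I understand the level counts can be confirmed from the met_em files like this; a sketch, assuming the netCDF ncdump utility is available:)

# num_metgrid_levels is a global attribute of the met_em files
ncdump -h $(ls met_em.d01.* | head -n 1) | grep -i num_metgrid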

*Should I use options like use_adaptive_time_step, step_to_output_time, target_cfl, max_step_increase_pct, starting_time_step, max_time_step, min_time_step, use_surface, force_sfc_in_vinterp, etc.? I am not very familiar with these options; if needed, can you please suggest what the values should be in my case? FYI, I am using an Ubuntu server with 128 cores & 132 GB RAM.
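(For context, this is roughly how I understand those options would look in the &domains section; the values below are only illustrative placeholders taken from example namelists, not something I have tested:)

&domains
 use_adaptive_time_step = .true.,
 step_to_output_time    = .true.,
 target_cfl             = 1.2, 1.2, 1.2,
 max_step_increase_pct  = 5, 51, 51,
 starting_time_step     = -1,
 max_time_step          = -1,
 min_time_step          = -1,
 use_surface            = .true.,
 force_sfc_in_vinterp   = 1,
/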

I am eagerly waiting for your kind suggestions.
 

Attachments

  • namelist.input_FNL.txt (3.9 KB)
  • namelist.input_ECMWF.txt (3.9 KB)
  • namelist.wps_ECMWF.txt (787 bytes)
  • namelist.wps_FNL.txt (783 bytes)
  • Vtable.ECMWF.txt (2.2 KB)
Please let me know the following information:
(1) Which version of WRF/WPS are you using?
(2) How did you compile WRF/WPS and run the job? (e.g., which compiler, and whether you chose dmpar or smpar mode)
(3) Where did you download the FNL/ECMWF data?
(4) When you run the job by mpirun -np 40 ./real.exe, did you see 40 rsl.out and rsl.error files in your working directory?
 
Ming Chen said:
Please let me know the following information:
(1) Which version of WRF/WPS are you using?
(2) How did you compile WRF/WPS and run the job? (e.g., which compiler, and whether you chose dmpar or smpar mode)
(3) Where did you download the FNL/ECMWF data?
(4) When you run the job by mpirun -np 40 ./real.exe, did you see 40 rsl.out and rsl.error files in your working directory?

1. WRF 4.3 & WPS 4.3
2. gfortran & dmpar
3. FNL: https://rda.ucar.edu/datasets/ds083.2/
ECMWF: https://data.ecmwf.int/forecasts/
4. Yes. There are 40 rsl.out and 40 rsl.error files
 
I suspect this could be a memory issue. Since your 3rd domain has a large number of grid points, it may require more memory.

Can you try to run with more processors, for example 48 or 56?

It may also be worth trying to unlimit the stack size. In sh/bash, you can run 'ulimit -s unlimited', and in csh/tcsh you can run 'limit stacksize unlimited'.

Please try.
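For example, in bash:

# remove the stack size limit for this shell, then rerun with more ranks
ulimit -s unlimited
mpirun -np 56 ./real.exe

# csh/tcsh equivalent of the first line:
# limit stacksize unlimited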
 
Ming Chen said:
I suspect this could be a memory issue. Since your 3rd domain has a large number of grid points, it may require more memory.

Can you try to run with more processors, for example 48 or 56?

It may also be worth trying to unlimit the stack size. In sh/bash, you can run 'ulimit -s unlimited', and in csh/tcsh you can run 'limit stacksize unlimited'.

Please try.

Yes, I have tried several core counts, such as 48 and 56, up to 100 cores. 'ulimit -s unlimited' is always set in my .bashrc. While running ./real.exe, I observed the real-time RAM usage (see the sketch below): real.exe uses at most 33% of the RAM (out of 111 GB) and stops at that point. For your kind information, the parent domain covers 10S to 40N & 60W to 130E, including the full Himalayan region, India, Bangladesh, the Bay of Bengal, etc. Does the problem occur because of the complex topography of these regions? In that case, what would be the necessary steps to run the simulation considering these issues?
*I have completed a successful simulation using 3 domains (2 nests) that cover fewer grid points (up to 100) over 20-30N & 80-95E.
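(Roughly what I used to watch memory; the refresh interval and process name here are illustrative:)

# system-wide memory, refreshed every 5 s
watch -n 5 free -h
# per-rank memory of real.exe in one batch snapshot
top -b -n 1 | grep real.exe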
 
I agree that high topography and fine resolution may sometimes cause problems. In your case, however, I still think it is a memory issue. Based on your message that you have completed a successful simulation with 3 domains (2 nests) covering fewer grid points over 20-30N & 80-95E, I suppose the model should work fine over this region.
 