Fails to run with two nested domains but ok with single nest

Dear altruists,

I am trying to run ./real.exe with 3 domains of 20 km, 10 km & 5 km. It stops partway through, showing "Full level index = 34 Height = 19871.2 m Thickness = 920.1 m". The simulation completes successfully with 2 domains. The same thing happens with other domain resolutions, such as 12-4-1 km, and with different numbers of vertical levels. I have also tried several WRF versions, such as 4.3.3 & 4.1.3, and core counts up to 56. None of this solves the problem of running with 3 domains. I have attached the log file and namelist.input herewith. The PC has a very large amount of memory.
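For context, a 3-domain 20-10-5 km setup like this is configured in the &domains section of namelist.input roughly as sketched below; every value here is illustrative (the poster's actual settings are in the attached namelist.input):

&domains
 max_dom                = 3,
 dx                     = 20000, 10000, 5000,  ! metres
 e_we                   = 100, 121, 151,       ! nests must satisfy e_we = n*ratio + 1
 e_sn                   = 100, 121, 151,
 e_vert                 = 35, 35, 35,          ! assumed for illustration
 parent_id              = 1, 1, 2,
 parent_grid_ratio      = 1, 2, 2,             ! 20 -> 10 -> 5 km
 parent_time_step_ratio = 1, 2, 2,
 i_parent_start         = 1, 30, 40,
 j_parent_start         = 1, 30, 40,
/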

Thanks in advance.
Golam Rabbani
 

Attachments

  • log.txt
    7.5 KB · Views: 11
  • namelist(1).input
    4 KB · Views: 41
Can you provide your namelist.wps file as well?
 
Whatheway said:
Can you provide your namelist.wps file as well?

Thank you for your reply. I have attached namelist.wps and namelist.input. FYI, I am using an Ubuntu server with 128 cores & 132 GB RAM, yet I still cannot run with more than 1 nested domain. The same thing happens with both FNL & ECMWF datasets.
 

Attachments

  • namelist.input
    3.9 KB · Views: 21
  • namelist.wps
    783 bytes · Views: 24
  • rsl.out.0000.txt
    1.2 MB · Views: 9
rabbanidu93 said:
Whatheway said:
Can you provide your namelist.wps file as well?

I don't see any errors in the namelists.

When you run

./geogrid.exe
./ungrib.exe
./metgrid.exe

Do any errors show?
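For readers following along, one way to confirm those WPS steps succeeded is to check each program's log for its completion message (a sketch; under dmpar the logs are named geogrid.log.0000, metgrid.log.0000, etc.):

./geogrid.exe
grep "Successful completion" geogrid.log   # geogrid and metgrid log a success line
./ungrib.exe
tail -n 2 ungrib.log                       # ungrib's log should end with a success message
./metgrid.exe
grep "Successful completion" metgrid.log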
 
Whatheway said:
rabbanidu93 said:
Whatheway said:
Can you provide your namelist.wps file as well?

I don't see any errors in the namelists.

When you run

./geogrid.exe
./ungrib.exe
./metgrid.exe

Do any errors show?

Actually, it shows no errors. The full simulation completes successfully with 2 domains. When I use 3 domains, the simulation stops at the very beginning, showing the condition written in rsl.out.0000. Running geogrid.exe, ungrib.exe & metgrid.exe shows no errors. When I run 'mpirun -np 40 ./real.exe', the simulation stops.
 
What is your forcing data (i.e., the data you ungrib)? I found that you only have 10 vertical levels of data up to 50 hPa, and 34 model levels. This may lead to problems when WRF conducts vertical interpolation, especially when the horizontal resolution is high (in your case 5 km).

Also, we recommend using an odd nesting ratio, for example, parent_grid_ratio = 1, 3, 3 or parent_grid_ratio = 1, 5, 5. This is because, for even values, interpolation errors arise due to the nature of Arakawa C-grid staggering.
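A minimal sketch of the recommended odd-ratio nesting (dimensions illustrative; each nest must also satisfy e_we = n*parent_grid_ratio + 1):

&domains
 parent_grid_ratio      = 1, 3, 3,
 parent_time_step_ratio = 1, 3, 3,
 e_we                   = 100, 133, 157,  ! 132 and 156 are divisible by 3
 e_sn                   = 100, 133, 157,
/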
 
Ming Chen said:
What is your forcing data (i.e., the data you ungrib)? I found that you only have 10 vertical levels of data up to 50 hPa, and 34 model levels. This may lead to problems when WRF conducts vertical interpolation, especially when the horizontal resolution is high (in your case 5 km).

Also, we recommend using an odd nesting ratio, for example, parent_grid_ratio = 1, 3, 3 or parent_grid_ratio = 1, 5, 5. This is because, for even values, interpolation errors arise due to the nature of Arakawa C-grid staggering.
Dear Ming Chen,
Thank you for your reply.
I used ECMWF's newly published forecast data (https://data.ecmwf.int/forecasts/), which has 10 num_metgrid_levels & 0 num_metgrid_soil_levels. Its resolution is 0.4 deg by 0.4 deg. I got a sample Vtable.ECMWF from this forum and then edited it myself. I have attached the Vtable.ECMWF herewith; in case I edited it in a wrong way, please check it.

Following your recommendation, I have conducted some new simulations with parent_grid_ratio = 1, 3, 3 for 36-12-4 km domains. It still stops with 3 domains and runs fine with 2. I then changed the dataset to NCEP FNL data (1-degree by 1-degree), which has 34 num_metgrid_levels & 4 num_metgrid_soil_levels, and still got the same issue with 3 domains.
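As a quick sanity check on the vertical levels, the metgrid-level count can be read straight out of a met_em file with ncdump (the file name here is only an example):

ncdump -h met_em.d01.2023-01-01_00:00:00.nc | grep num_metgrid
# the num_metgrid_levels dimension here should match num_metgrid_levels in namelist.input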

*Should I use options like use_adaptive_time_step, step_to_output_time, target_cfl, max_step_increase_pct, starting_time_step, max_time_step, min_time_step, use_surface, force_sfc_in_vinterp, etc.? I am not quite familiar with these options. If needed, can you please suggest what the values should be in my case? FYI, I am using an Ubuntu server with 128 cores & 132 GB RAM.
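For reference, an adaptive time-step block in &domains typically looks like the sketch below; these are commonly cited illustrative values, not settings confirmed for this case:

&domains
 use_adaptive_time_step = .true.,
 step_to_output_time    = .true.,     ! adjust the step so output times are hit exactly
 target_cfl             = 1.2, 1.2, 1.2,
 max_step_increase_pct  = 5, 51, 51,  ! nests may grow their step faster
 starting_time_step     = -1, -1, -1, ! -1 lets WRF pick a default based on dx
 max_time_step          = -1, -1, -1,
 min_time_step          = -1, -1, -1,
/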

I am eagerly waiting for your kind suggestions.
 

Attachments

  • namelist.input_ECMWF.txt
    3.9 KB · Views: 10
  • namelist.input_FNL.txt
    3.9 KB · Views: 12
  • namelist.wps_FNL.txt
    783 bytes · Views: 12
  • namelist.wps_ECMWF.txt
    787 bytes · Views: 8
  • Vtable.ECMWF.txt
    2.2 KB · Views: 7
Please let me know the following information:
(1) Which version of WRF/WPS are you using?
(2) How did you compile WRF/WPS and run the job (e.g., compiler, dm or sm mode)?
(3) Where did you download the FNL/ECMWF data?
(4) When you run the job with mpirun -np 40 ./real.exe, did you see 40 rsl.out and rsl.error files in your working directory?
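For question (4), the rsl files can be checked quickly with something like this sketch:

ls rsl.out.* rsl.error.* | wc -l     # expect 2 files per MPI rank, i.e. 80 for -np 40
tail -n 20 rsl.error.0000            # the root rank usually carries the fatal message
grep -iE "fatal|error" rsl.error.*   # scan all ranks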
 
Ming Chen said:
Please let me know the following information:
(1) Which version of WRF/WPS are you using?
(2) How did you compile WRF/WPS and run the job (e.g., compiler, dm or sm mode)?
(3) Where did you download the FNL/ECMWF data?
(4) When you run the job with mpirun -np 40 ./real.exe, did you see 40 rsl.out and rsl.error files in your working directory?

1. WRF 4.3 & WPS 4.3
2. gfortran & dmpar
3. FNL: https://rda.ucar.edu/datasets/ds083.2/
ECMWF: https://data.ecmwf.int/forecasts/
4. Yes. There are 40 rsl.out and 40 rsl.error files
 
I suspect this could be a memory issue. Since your 3rd domain has a large number of grid points, it may require more memory.

Can you try running with more processors, for example 48 or 56?

It may also be worth trying to unlimit the stacksize. In sh/bash, you can run 'ulimit -s unlimited' and in csh/tcsh you can run 'limit stacksize unlimited'.

Please try.
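Putting the two suggestions together, a minimal sketch of such a run (the process count is illustrative):

# sh/bash
ulimit -s unlimited
mpirun -np 56 ./real.exe

# csh/tcsh equivalent for the stack limit:
#   limit stacksize unlimited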
 
Ming Chen said:
I suspect this could be a memory issue. Since your 3rd domain has a large number of grid points, it may require more memory.

Can you try running with more processors, for example 48 or 56?

It may also be worth trying to unlimit the stacksize. In sh/bash, you can run 'ulimit -s unlimited' and in csh/tcsh you can run 'limit stacksize unlimited'.

Please try.

Yes, I have tried several core counts, such as 48 and 56, up to 100 cores. 'ulimit -s unlimited' is always set in my .bashrc. While running ./real.exe, I have observed real-time RAM usage: real.exe uses at most 33% of the RAM (out of 111 GB) and stops in that state. For your kind information, the parent domain covers 10°S to 40°N & 60°W to 130°E, including the full Himalayan region, India, Bangladesh, the Bay of Bengal, etc. Does the problem occur because of the complex topography of these regions? In that case, what would be the necessary steps to run the simulation considering these issues?
*I have done a successful simulation using 3 domains (2 nested) covering fewer grid points (up to 100) over 20-30°N & 80-95°E.
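A simple way to watch that memory usage while real.exe runs (a sketch; assumes the GNU watch and free utilities are available):

mpirun -np 56 ./real.exe &
watch -n 5 free -h    # refresh system-wide memory usage every 5 s while real.exe runs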
 
I agree that high topography and fine resolution may sometimes cause problems. In your case, however, I still think it is a memory issue. Given your note that "I have done a successful simulation using 3 domains (2 nested) covering fewer grid points (up to 100) over 20-30°N & 80-95°E", I suppose the model should work fine over the region.
 
rabbanidu93, Ming Chen, I am also facing the same issue: my real program works well up to the 1st domain, then crashes indicating
"Full level index = 32 Height = 18845.8 m Thickness = 1023.7 m". I am using a CESM dataset. I have attached the files here. Can you please tell me how you solved this problem?
 

Attachments

  • namelist.input
    4.1 KB · Views: 1
  • namelist.wps
    1.4 KB · Views: 0
  • rsl.error.0000
    268.5 KB · Views: 0
  • rsl.out.0000
    489.4 KB · Views: 0