
Writing large WRF model output files with pnetcdf

jkukulies

New member
I am running an idealized case (em_quarter_ss) with a large domain: 250 m grid spacing, 2496 grid cells in both directions, and 96 vertical levels. While the same simulation at 500 m (with 1248 x 1248 grid cells) runs without problems on Derecho, the 250 m simulation hangs when it tries to write the first output file. I used io_format=102 for wrfinput_d01, but io_format=2 for the output. I have tried up to 60 nodes (128 cores each) on Derecho and wonder whether this problem can be solved by going even higher, or whether I should find a different solution, such as using the parallel-netcdf library.
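For reference, the split between formats I describe corresponds to namelist entries along these lines (a sketch; the io_form_* names are the standard WRF &time_control options, and the comments are my reading of what each setting does):

```
&time_control
 io_form_input   = 102,   ! wrfinput written/read as split netCDF, one file per MPI task
 io_form_history = 2,     ! history (wrfout) written as a single classic netCDF file
 io_form_restart = 2,
/
```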

I have tried to compile WRF with Pnetcdf, but was not successful.

Attached you can find the namelist and error logs from the simulation attempt, as well as the compile log and the changes I made to wrf_io.F90 to use the pnetcdf library for large netCDF file output.
 

Attachments

  • namelist.input.txt (5.4 KB)
  • rsl.error.0000.txt (7.1 KB)
  • rsl.error.7679.txt (28.4 KB)
  • compile_with_pnetcdf.log (816 KB)
  • wrf_io.F90.txt (123.6 KB)
If you want to have a look at the case, it is located here: glade/work/kukulies/WRF/TempScripts/19_2011-07-13_CTRL_Midwest_-Loc1_MCS_Storm-Nr_JJA-8-TH5/250
 
Hi,
I would like to apologize for the long delay in response. Between holidays, unplanned time out of the office, the AMS conference, and planning for our WRF tutorial taking place this week, we got pretty behind on forum questions. Thank you so much for your patience.

Are you still experiencing issues with this? I don't believe you need additional processors. Take a look at Choosing an Appropriate Number of Processors to determine an appropriate number for you.
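For concreteness, the rule of thumb from that guidance (smallest count roughly (e_we/100) x (e_sn/100), largest roughly (e_we/25) x (e_sn/25)) works out as follows for your domain; this is just a quick sketch of that arithmetic:

```python
# Rule-of-thumb processor counts for WRF domain decomposition,
# following the "Choosing an Appropriate Number of Processors" guidance:
#   smallest ~ (e_we/100) * (e_sn/100)
#   largest  ~ (e_we/25)  * (e_sn/25)
def processor_range(e_we, e_sn):
    smallest = round((e_we / 100) * (e_sn / 100))
    largest = round((e_we / 25) * (e_sn / 25))
    return smallest, largest

# The 2496 x 2496 domain discussed in this thread:
print(processor_range(2496, 2496))  # -> (623, 9968)
```

So your 60-node (7680-core) runs were already inside that window, and going far beyond ~10K processors is unlikely to help by itself.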

I do believe, however, that when you use the io_format=102 option for any of the executables, you then have to keep using that format for the remaining executables; i.e., because you used it to create the wrfinput file, you probably have to use it to run WRF as well.
 
No problem, and thank you for the answer! Unfortunately I still have problems getting the simulation to run. I tried different numbers of processors based on the guidelines you linked, but with io_format=2 it always hangs when wrfinput is written. The same simulation with a quarter of the grid cells (dx=500 m) runs quickly, and there is no problem writing the output files. Do you think the problem is related to the size of the domain, and that with io_format=2 only one MPI process writes the file while the others wait? I know it is a large domain, but I thought it should still be within what WRF can handle with the normal output format.

I also tried io_format=102 for both IDEAL and WRF, which works fine. The problem there is that I could not get the joiner program to work on Derecho, and it seems that nobody has managed to do so thus far?
 
Again, I'd like to apologize for the delay. Right after the tutorial, I had to be out of the office for several days, unexpectedly. Are you still struggling with this? I tried to look around in the directory you pointed to above, but it looks like the rsl files stop while running ideal.exe, not wrf.exe. Are you now having problems with that as well? If so, I do see the error

Code:
ERROR: ghg_input available only for these radiation schemes: CAM, RRTM, RRTMG, RRTMG_fast
           And the LW and SW schemes must be reasonably paired together:
           OK = CAM LW with CAM SW
           OK = RRTM, RRTMG LW or SW, RRTMG_fast LW or SW may be mixed

To overcome this problem, set "ghg_input = 0" in the &physics section of the namelist.
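That is, in namelist.input (a minimal fragment; ghg_input is the real &physics option named in the error):

```
&physics
 ghg_input = 0,   ! disable greenhouse-gas input, which requires a CAM/RRTM-family scheme
/
```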

A couple other thoughts:
1) Yes, I believe the code is struggling to write files that large.
2) Set debug_level = 0. Turning this on rarely provides any useful information and just makes the output files very large and difficult to read. I don't think this is causing your issue, but it could down the road.
3) For a domain this size, you should probably be using something like 15K processors; at the VERY minimum, use around 1000. Have you tried using that many?
 
Thanks for all the tips! I set debug_level and ghg_input to 0 and tried to run IDEAL again on 120 nodes (15360 processors), but without success. No error comes up (I think even before, when ghg_input was not yet set to 0, that error did not cause IDEAL to fail).

It looks like the problem is that the program cannot finish writing the wrfinput file within the 12-hour wall time, even with >15,000 cores. Is only one processor responsible for the actual output writing when io_format is not set to 102? If so, would it be an option to compile WRF with pnetcdf? Otherwise, how would you approach this, given that I cannot use the joiner program on Derecho? As it stands, an unreasonable number of core hours are burned just because writing the output takes so long.
 

Attachments

  • rsl.error.00015359.txt (744 bytes)
**Edited 2/26/24**

I should have been clearer that when I suggested using up to 15K processors, that was for running wrf.exe, not ideal.exe. For some ideal cases you can only use a single processor to run ideal.exe, but fortunately for this one you can use more. I should mention that in my 12 years of experience, I've never seen a case with much more than 1000 grid cells in each direction, so your case is more than double the size of any I've encountered.

Regarding the "joiner" program, have you seen this post, and would that help you?
 
Hi again,

I believe the issue is related to the number of vertical levels you're using. I ran a test with your set-up and was able to run ideal.exe with 1024 processors if I used the default number of vertical levels (41), but when I use 96, like you are requesting, it hangs - meaning something is wrong. If you want to use more than 41 levels, you may have to do some quick tests to see what the max is that you can use, but at least we know that ideal.exe can be run with your domain size - you'll just need to modify the number of vertical levels.
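In other words, as a first working configuration you could drop back to the default in &domains (the 41-level default is the value my test used; the comment reflects what we've seen so far):

```
&domains
 e_vert = 41,   ! ideal.exe completed at this domain size with 41 levels; 96 hung
/
```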
 
Thanks for running a test with a smaller number of vertical levels. That is a good point! However, my aim is to compare the 250 m simulation with the same simulation at different grid spacings, so it would be ideal if I could use the same number of vertical levels.

Since I have the parallel output files (io_format=102), I am back to making the joiner program work. Do you know if there is a working version somewhere on Casper? I was not able to compile it on Derecho despite the changes you pointed me to in the other post.

If this does not work, do you have another strategy for combining the files (e.g., using different software or a programming language other than Fortran)?
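For what it's worth, the core of what I'd need is just reassembling tiles by their patch offsets. A rough pure-Python sketch of that stitching step (hypothetical helper; in reality each split file would be read with something like netCDF4, and the (j0, i0) offsets would come from each file's patch-start global attributes, which are 1-based there but assumed 0-based here):

```python
def stitch_tiles(tiles, ny, nx):
    """Reassemble a global 2-D field from per-task tiles.

    tiles: iterable of (j0, i0, data), where data is a list of rows and
    (j0, i0) is the tile's 0-based offset in the global ny x nx grid.
    """
    # Start from an empty global field and copy each tile into place.
    field = [[None] * nx for _ in range(ny)]
    for j0, i0, data in tiles:
        for dj, row in enumerate(data):
            for di, val in enumerate(row):
                field[j0 + dj][i0 + di] = val
    return field

# Two 2 x 2 tiles side by side form a 2 x 4 global field:
tiles = [(0, 0, [[1, 2], [3, 4]]), (0, 2, [[5, 6], [7, 8]])]
print(stitch_tiles(tiles, 2, 4))  # -> [[1, 2, 5, 6], [3, 4, 7, 8]]
```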
 
Unfortunately I'm not aware of any working version of the joiner program, or any other software others have used. The joiner program was given to us by an outside colleague and is not officially supported by our group. I would recommend perhaps commenting on other posts that mention patching the tiles back together to see if they have any tips for you. If you do happen to get something working, and you don't mind, please share that code with us so that we can share it with others. Unfortunately we don't currently have the resources to work with the code. Thanks.
 