WRFv4.5.1 run fails after w-damping: forrtl: error (78): process killed (SIGTERM)

jkukulies

Member
Hi Ming!

As discussed, I am having an issue running a WRF simulation with the following setup using WRF version 4.5.1. The simulation uses a fairly large domain of 1368 x 1016 grid cells (dx = 4 km over CONUS). The compilation works well, and I successfully created the input data with the metgrid and real programs after using era5_to_int.py to convert ERA5 files to the intermediate format. The input files look fine to me, and metgrid and real did not raise any errors.

I have tried many different core configurations (testing anything between 128 and 2400 CPUs) and also made sure I have enough memory, so I don't think the error I am getting is memory-related. Also, it always happens at the same point: when w-damping is called (even for damping = 0). I have no CFL or instability-related errors in any of the rsl files, though.
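Checking every rsl file for CFL or crash messages can be scripted. The following is a minimal sketch, not a WRF tool: the function name `scan_rsl` and the keyword list are assumptions chosen for illustration and are not an exhaustive set of WRF failure messages.

```python
import re
from pathlib import Path

# Illustrative keyword list (an assumption), matched case-insensitively.
PATTERNS = re.compile(
    r"cfl|\bnan\b|forrtl|sigsegv|sigterm|segmentation fault",
    re.IGNORECASE,
)

def scan_rsl(run_dir):
    """Return {filename: [(line_number, text), ...]} for lines in
    rsl.* files under run_dir that match PATTERNS."""
    hits = {}
    for path in sorted(Path(run_dir).glob("rsl.*")):
        matches = []
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if PATTERNS.search(line):
                    matches.append((lineno, line.rstrip()))
        if matches:
            hits[path.name] = matches
    return hits
```

Calling `scan_rsl(".")` from the run directory returns only the files that contain a match, which makes it easier to see whether one particular MPI task reports something the others do not.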

Thank you so much in advance!!

//Julia
 

Attachments

  • namelist.input.txt
    8.6 KB · Views: 2
  • rsl.error.0000.txt
    8.3 KB · Views: 1
Hi Julia,
Your namelist.input looks fine to me. However, the rsl file indicates that wrf.exe crashed immediately after it started. This often means that either the input data is wrong or there is not enough memory to run this case.

Let's first check whether the memory is an issue. Please delete the following two options:
Code:
nproc_x                             = 16,
nproc_y                             = 8,

Then run wrf.exe using 2 nodes (256 processors). Please let me know whether you still have the same issue.
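To get a feel for how the number of MPI tasks changes the patch each task has to hold, here is a rough sketch. It assumes that, with nproc_x and nproc_y unset, the tasks are split into the two factors closest to a square with the smaller factor in x, and that 1368 x 1016 is west-east by south-north. Both are assumptions, and the helper names are hypothetical, so the decomposition WRF actually reports in the rsl files should be treated as authoritative.

```python
import math

def near_square_factors(ntasks):
    """Return (nproc_x, nproc_y) with nproc_x * nproc_y == ntasks,
    nproc_x <= nproc_y, and the two factors as close as possible.
    This mimics a near-square split; it is an assumption, not WRF code."""
    for nx in range(math.isqrt(ntasks), 0, -1):
        if ntasks % nx == 0:
            return nx, ntasks // nx

def patch_size(n_we, n_sn, nproc_x, nproc_y):
    """Approximate grid points per patch in each direction (ceiling division)."""
    return -(-n_we // nproc_x), -(-n_sn // nproc_y)

# Domain size from the original post; task counts from the range that was tested.
for ntasks in (128, 256, 2400):
    nx, ny = near_square_factors(ntasks)
    print(ntasks, (nx, ny), patch_size(1368, 1016, nx, ny))
```

Smaller patches mean less memory per task, so if the failure really is unchanged across this whole range of task counts, that points away from a memory limit.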

If the case still fails, please let me know where your wrfinput and wrfbdy files are located. I would like to take a look at these data files.

Thanks.
 
Hey Ming,

I already tried both one and two nodes with nproc_x and nproc_y left undefined.

My input data is located at: /glade/work/kukulies/wrf/run/

Thank you so much for looking into this, I really appreciate it!!
 
As an update: I tried to run the same case with a different version of WRF that works well with different input data (same domain, but different initial time), and it also fails at the same point. Therefore, I think there must be an issue with the input data I created using the latest GitHub version of era5_to_int. The input data in /glade/work/kukulies/wrf/run/ looks right, though. I am attaching the metgrid.log file from the creation of the met*nc files used for wrfin and wrfbdy.
 

Attachments

  • metgrid.log.txt
    2.4 MB · Views: 1
Hi Julia,
Thank you for the information.
I reran your case and was able to reproduce your errors. The case failed immediately after wrf.exe started, which makes me suspect that the input data is wrong.
I will continue to look at this case and get back to you once I know for sure what is wrong. It may take some time. Thank you for your patience.
 
Hi Ming!

I also tried a new, simpler case with a much smaller domain in /glade/work/kukulies/wrf_morr/test/em_real. This case uses the same WRF 4.5.1 build (with a few changes I made to the microphysics scheme). I have also tried the same case with the precompiled versions of WRFv4.5.1 and WRF4.6.1 on Derecho, but they all fail at the same point (when w-damping is called).

In addition to those tests above, I have tried using era5_to_int.py with model_levels (and then subsequently the calc_ecmwf_p.exe utility program from WPS) and with pressure levels. And I have tried using different versions of WPS to process the intermediate format data into the geogrid and metgrid files needed to run WRF.

I am at a bit of a loss as to what else to try. I have also looked at the fields of the input data and compared them with data that I believe contains reasonable ranges of values. I cannot find any obvious issues with the input data, although it still seems like this problem is most likely related to it.
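A comparison like this can also be automated once the fields have been read into plain Python sequences (for example with a NetCDF reader, which is left out here). This is only a sketch: the function name `check_fields` and the bounds used in the example call are hypothetical placeholders, not recommended limits.

```python
import math

def check_fields(fields, expected):
    """fields:   {name: iterable of floats}
    expected: {name: (lowest_plausible, highest_plausible)}
    Returns a list of human-readable problem descriptions."""
    problems = []
    for name, values in fields.items():
        values = list(values)
        finite = [v for v in values if math.isfinite(v)]
        n_bad = len(values) - len(finite)
        if n_bad:
            problems.append(f"{name}: {n_bad} non-finite value(s)")
        if name in expected and finite:
            lo, hi = expected[name]
            vmin, vmax = min(finite), max(finite)
            if vmin < lo or vmax > hi:
                problems.append(
                    f"{name}: range [{vmin}, {vmax}] outside [{lo}, {hi}]"
                )
    return problems

# Example call with made-up values and made-up bounds.
print(check_fields(
    {"T2": [280.0, float("nan"), 500.0], "PSFC": [101325.0]},
    {"T2": (180.0, 340.0), "PSFC": (30000.0, 110000.0)},
))
```

A check like this only catches NaNs, infinities, and grossly implausible values. It would not flag fields that are individually reasonable but inconsistent with each other.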
 
Hi Julia,
Thank you for creating a smaller case, which makes it easier for me to debug what is wrong. I am puzzled why there is no error message except the segmentation fault. Your input data looks 100% fine to me.
I will continue to work on this case and get back to you if there is any progress.
 