Error while running WRF ndown.exe

I am using ndown.exe with WRF V3.8.1 to downscale a coarse-resolution run to a finer-resolution domain. I have completed every step up to running wrf.exe for the fine domain, but wrf.exe keeps failing with the following error:
d01 2015-07-14_04:01:00 RRTMG: pre-computed snow effective radius found, setting inflglw=5 and iceflglw=5
d01 2015-07-14_04:01:00 RRTMG: pre-computed cloud droplet effective radius found, setting inflglw=3
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.31.s 0000148A07B5D8C0 Unknown Unknown Unknown
libpthread-2.31.s 0000148A07B5C623 __write Unknown Unknown
wrf.exe 0000000002B11BE8 for__write_output Unknown Unknown
wrf.exe 0000000002B12B09 for__put_sf Unknown Unknown
wrf.exe 0000000002B3B62F for_write_seq_fmt Unknown Unknown
wrf.exe 0000000002B3930F for_write_seq_fmt Unknown Unknown
wrf.exe 0000000000755A48 Unknown Unknown Unknown
wrf.exe 000000000099F0AF Unknown Unknown Unknown
wrf.exe 0000000001D00651 Unknown Unknown Unknown
wrf.exe 00000000017B3B0E Unknown Unknown Unknown
wrf.exe 000000000189E754 Unknown Unknown Unknown
wrf.exe 000000000129903A Unknown Unknown Unknown
wrf.exe 00000000011468BC Unknown Unknown Unknown
wrf.exe 000000000054F6EF Unknown Unknown Unknown
wrf.exe 000000000040F071 Unknown Unknown Unknown
wrf.exe 000000000040F031 Unknown Unknown Unknown
wrf.exe 000000000040EFCD Unknown Unknown Unknown
libc-2.31.so 0000148A03E1029D __libc_start_main Unknown Unknown
wrf.exe 000000000040EEFA Unknown Unknown Unknown

I cannot tell what went wrong from the message above. The namelist.input and rsl.error.0000 files are attached. Do you have any idea what could be causing this?

Thanks,
 

Attachments

  • namelist.input
    1.9 KB
  • rsl.error.0000
    5.2 MB
Hi,
It's difficult to say for sure, but the model very likely cannot run a domain this large (703 x 703) on only 64 processors. Can you try a much larger number of processors and see whether the run gets further?
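
For a rough sense of the load per processor, the sketch below estimates how many grid points each MPI task would own on a 703 x 703 domain. It assumes the task count is split into the factor pair closest to a square, which only approximates how WRF picks its decomposition. The function names `closest_factor_pair` and `tile_size` are made up for this illustration.

```python
import math


def closest_factor_pair(ntasks):
    """Return (nx, ny) with nx * ny == ntasks and nx <= ny, as close to square as possible."""
    nx = math.isqrt(ntasks)
    # Walk down from the integer square root until we hit a divisor.
    while ntasks % nx != 0:
        nx -= 1
    return nx, ntasks // nx


def tile_size(e_we, e_sn, ntasks):
    """Approximate the largest per-task tile (in grid points) for an e_we x e_sn domain."""
    nx, ny = closest_factor_pair(ntasks)
    return math.ceil(e_we / nx), math.ceil(e_sn / ny)


if __name__ == "__main__":
    for ntasks in (64, 640):
        print(ntasks, closest_factor_pair(ntasks), tile_size(703, 703, ntasks))
```

Under that assumption, 64 tasks gives tiles of roughly 88 x 88 grid points each, while 640 tasks gives roughly 36 x 22.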

If that doesn't help, can you try this with the latest version of WRF to see if that makes any difference? Unfortunately V3.8.1 is very old and we are no longer able to support it.
 
Thank you for your suggestions.
I increased the number of cores from 64 to 640, and the run still fails. The new error message is:
d01 2015-07-14_04:01:00 RRTMG_LWF: Number of columns is 1716
d01 2015-07-14_04:01:00 RRTMG_LWF: Number of columns per partition is 8
d01 2015-07-14_04:01:00 RRTMG_LWF: Number of partitions is 215
MPT ERROR: Rank 160(g:160) received signal SIGSEGV(11).
Process ID: 65877, Host: r9i6n22, Program: /glade/scratch/zhifengy/data/experiment_output/enkf_pecan/Visit_PSU2023/ndown_EnKF_conv_11radar_DeterministicForecast/0400_mean/wrf.exe
MPT Version: HPE MPT 2.25 08/14/21 03:05:20

MPT: --------stack traceback-------
MPT: Attaching to program: /proc/65877/exe, process 65877
MPT: Missing separate debuginfo for /glade/u/apps/ch/os/lib64/libm.so.6
MPT: Try: zypper install -C "debuginfo(build-id)=cc12cf31ea4a157ebc7ac7bdfc09d5bfa3e0f3e0"
MPT: (No debugging symbols found in /glade/u/apps/ch/os/lib64/libm.so.6)
MPT: Missing separate debuginfo for /glade/u/apps/ch/os/lib64/libdl.so.2
MPT: Try: zypper install -C "debuginfo(build-id)=86536433e7d5a205deca926f6f00436b2ff206b1"
MPT: (No debugging symbols found in /glade/u/apps/ch/os/lib64/libdl.so.2)
MPT: Missing separate debuginfo for /glade/u/apps/ch/os/lib64/librt.so.1

The cause is still not obvious to me from this error.

If I switch to a recent version of WRF (4.4 or 4.5), can it read wrfout files from WRF V3.8.1 as input for ndown.exe? Could you also let me know which versions of WRF you still support?
 
Hi,
I first would like to apologize for the long delay in response. We have been occupied with preparing for our upcoming code release and have gotten behind with forum posts. Thank you so much for your patience.

Officially, we only support the latest version of the code, but if an issue is shown to be unrelated to the code version, we can try to help with versions back to V4.0. As for whether ndown works properly with input from older code: if you were able to run ndown.exe without any problems and it produced the new wrfbdy and wrfinput files (and they look okay), then I would assume that isn't a problem. Once you start running your finer-resolution domain, it is no different from any other model simulation, so the problem shouldn't be related to the ndown program.

If you're still experiencing trouble with this, can you package all of your rsl* files (from your 640 processor simulation) into a single *.tar file and attach that, along with the namelist.input file that was used to run the original coarse domain simulation (if you still have that)? Please also look at your fine-resolution wrfbdy and wrfinput files, just to make sure nothing looks wrong. Thanks!
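
One way to bundle the rsl files is sketched below with Python's standard `tarfile` module. The function name `pack_rsl_files` and the archive name `rsl_files.tar` are made up for this illustration, and the sketch assumes the `rsl.error.*` and `rsl.out.*` files sit directly in the run directory.

```python
import glob
import os
import tarfile


def pack_rsl_files(run_dir=".", archive_name="rsl_files.tar"):
    """Collect every rsl.* file in run_dir into one uncompressed tar archive."""
    pattern = os.path.join(run_dir, "rsl.*")
    rsl_files = sorted(glob.glob(pattern))
    archive_path = os.path.join(run_dir, archive_name)
    with tarfile.open(archive_path, "w") as tar:
        for path in rsl_files:
            # Store only the file name so the archive has no directory prefix.
            tar.add(path, arcname=os.path.basename(path))
    return archive_path, len(rsl_files)


if __name__ == "__main__":
    path, count = pack_rsl_files()
    print(f"Wrote {count} files to {path}")
```

Run from the directory of the 640-processor simulation, this writes a single tar file that can be attached alongside the namelist.input.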
 