WPS ungrib seg fault

smeech84

Hello,

I am trying to rebuild my WRF workflow on Derecho but am running into problems almost constantly. The new problem seems to be with ungrib executing.

I use WRF v4.2.2 compiled with option 46, which is supposed to be the fastest. I compiled the most recent WPS from GitHub against it using option 35.

Geogrid seems to work, but ungrib does not. The hard part is that I don't see any helpful information about the seg fault either:

*** Starting program ungrib.exe ***
Start_date = 2021-12-31_00:00:00 , End_date = 2022-01-05_00:00:00
output format is WPS
Path to intermediate files is ./
dec2397.hsn.de.hpc.ucar.edu: rank 0 died from signal 11

I've also tried to use the precompiled code in:
/glade/u/home/wrfhelp/derecho_pre_compiled_code/wpsv4.2/

I still get seg fault errors without any information. I would have thought there would be messages like "wrong namelist variable" or "time not found in files". Anything would help in diagnosing whether this is a setup problem. I also get a big binary file called 'core' after it fails.
Are there certain environment modules that should be loaded when I run these executables? I never had this problem when running the exact same cases on Cheyenne...

Any advice would be appreciated as I navigate these Derecho problems.

Thanks
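
For reference, "signal 11" in the log above is a segmentation fault (SIGSEGV), and the 'core' file is the memory dump written when the process died. A minimal sketch of how one might confirm both from the shell follows; the gdb lines are left as comments because they assume gdb is available on the system, which the thread does not confirm.

```shell
# Print the name of signal 11 as the shell knows it (SEGV on Linux).
kill -l 11

# Print the core-file size limit for this shell; 0 means core dumps are disabled.
ulimit -c

# Assumption: gdb is installed. A backtrace could then be read from the core file with:
#   gdb ./ungrib.exe core
#   (gdb) bt
```

A backtrace is only informative if the executable was built with debugging symbols, so this may show little for a precompiled ungrib.exe.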
 
Hi Ming,

I have two otherwise identical test areas right now: 1) one using the Derecho precompiled WPS (test2) and 2) one using WPS compiled in a CCE environment (test3). Both give the same immediate seg fault.

/glade/derecho/scratch/smeech/wrf_workflow/tests/workflow_output/DPG_10yeardebug/test2
/glade/derecho/scratch/smeech/wrf_workflow/tests/workflow_output/DPG_10yeardebug/test3

I am using data from /glade/campaign/collections/rda/data/ds094.0/.
 
Hi,
I can reproduce the error, i.e., ungrib.exe doesn't work. We are not sure yet whether this is a data issue or a machine issue.
Can you try to ungrib CFSR data, which can be downloaded from NCAR RDA dataset ds093.0, and let me know whether it works?
We used CFSR data before on Cheyenne, and we know it works.
 
Try loading the modules for the compilers and libraries that WRF and WPS depend on, for example:
Code:
module load intel
module load impi

Also, don't forget to check the path to your data when linking the GRIB files.
Code:
./link_grib.csh /path/to/your/data
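
Since link_grib.csh creates symbolic links named GRIBFILE.AAA, GRIBFILE.AAB, and so on in the current directory, one way to check the path is to verify that each link resolves to a real file. The following is a minimal sketch; check_grib_links is a hypothetical helper name, not part of WPS.

```shell
# Report any GRIBFILE.* entries in a directory that do not resolve to an existing file.
# check_grib_links is a hypothetical helper name, not part of WPS.
check_grib_links() {
    dir="${1:-.}"
    status=0
    found=0
    for f in "$dir"/GRIBFILE.*; do
        # An unmatched glob stays literal; skip it unless it is at least a symlink.
        [ -e "$f" ] || [ -L "$f" ] || continue
        found=1
        # -e follows symlinks, so it fails for a link whose target is missing.
        if [ ! -e "$f" ]; then
            echo "broken: $f"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no GRIBFILE.* links in $dir"
        status=1
    fi
    return "$status"
}

# Usage (from the WPS run directory):
#   check_grib_links .
```

The function prints nothing and returns 0 when every link resolves, so it can be used as a guard before running ungrib.exe.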
 
Thanks for your help!

Unfortunately, I seem to have the same seg fault problem.

I made two more test directories, both using the Derecho precompiled code:
/glade/derecho/scratch/smeech/wrf_workflow/tests/workflow_output/DPG_10yeardebug/test4
/glade/derecho/scratch/smeech/wrf_workflow/tests/workflow_output/DPG_10yeardebug/test5

Both tests use data downloaded from the RDA site rather than from the /glade repo. test4 uses CFSv2 data (ds094.0) and test5 uses CFSR data (ds093.0).

I also tried using Derecho's default modules:
Currently Loaded Modules:
1) ncarenv/23.06 (S) 2) craype/2.7.20 3) intel/2023.0.0 4) ncarcompilers/1.0.0 5) cray-mpich/8.1.25 6) hdf5/1.12.2 7) netcdf/4.9.2

I also tried loading the suggested modules 'intel' and 'impi'. Derecho doesn't have the 'impi' module; was that perhaps a typo?

Thanks again
-Scott
 
Hello,

I've continued trying to understand this problem, but I cannot find anyone else who can see what is causing it. I've run a few more tests using different data, dates, loaded modules, $PATH and $LD_LIBRARY_PATH settings, and different versions of the compiled code on Derecho, and I still get an immediate seg fault every time I run ungrib. Is there any other advice someone can give me?

Thanks
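
When varying modules and $LD_LIBRARY_PATH as described above, one quick check is whether the executable's shared libraries all resolve in the current environment. The sketch below filters `ldd` output for unresolved entries; missing_libs is a hypothetical helper name. It only covers dynamically linked libraries, so a problem inside a statically linked library would not show up here.

```shell
# missing_libs is a hypothetical helper: it reads `ldd` output on stdin and
# prints the name of each shared library that the loader could not find.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (run in the directory that holds the executable):
#   ldd ./ungrib.exe | missing_libs
```

Empty output means the loader found every shared library it looked for, which narrows the search but does not rule out a faulty library build.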
 
The segmentation fault is caused by an improper installation of the GRIB2 libraries, including the libs under ~/wrfhelp.

Until we fix the library issue, please copy the executable file saved at

/glade/derecho/scratch/chenming/WPS/ungrib.exe.mchen.

I have run a quick test and this file works fine.

Please let me know if you still have issues running ungrib.exe.
 
Hi Ming,

Thanks. I can confirm that the same test case runs with the ungrib.exe you provided. I hope the GRIB2 library fix will help any other users who encounter this on Derecho. Again, thanks for all the help this month!
 