segmentation fault while running init_atmosphere_model

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

Hi all,

I get the following error when running init_atmosphere_model to generate static.nc and init.nc, and also when running atmosphere_model:

$ mpirun -np 1 --mca btl ^openib ./init_atmosphere_model

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x3FFFAD6A9ED3
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 144190 on node p8umbc2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The MPAS version is 6.0, and the WRF (WPS) version is 3.7. I have also attached the namelist.init_atmosphere and streams.init_atmosphere files for both the static run and the init run. To generate static.nc, I used namelist.init_atmosphere_static and streams.init_atmosphere_static; to generate init.nc, I used namelist.init_atmosphere_init and streams.init_atmosphere_init.
I was able to run init_atmosphere_model before, but after some modifications it no longer runs, and I am not sure why. Could you please help me double-check my namelist and streams files? Thank you.

Zhifeng
 

Attachments

  • namelist_init_atmosphere_init.txt (1.2 KB)
  • namelist_init_atmosphere_static.txt (1.2 KB)
  • streams_init_atmosphere_init.txt (659 bytes)
  • streams_init_atmosphere_static.txt (696 bytes)
After I switched to single precision, both init_atmosphere_model and atmosphere_model seem to work for 1 June 2015. However, for 10 January 2018, init_atmosphere_model runs fine, but atmosphere_model still fails with the segmentation fault. I realized there are differences between the GFS initial data from June 2015 and January 2018. Both are now processed with WPS 3.9.

I am not sure whether it is a memory problem. I have attached namelist.atmosphere (namelist_atmosphere.txt) and the atmosphere_model log file (log_atmosphere_0000_out.txt).

Zhifeng
 

Attachments

  • namelist_atmosphere.txt (1.7 KB)
  • log_atmosphere_0000_out.txt (10.1 KB)
Which compilers have you used to build the MPAS init_atmosphere_model and atmosphere_model programs? We've found in the past that, if the Intel compilers are used, it's necessary to unlimit the stack size ("ulimit -s unlimited" in bash, or "limit stacksize unlimited" in tcsh). The Intel compilers seem to allocate many arrays on the stack (rather than on the heap), perhaps for performance reasons, and this can cause the default stack size limit to be exceeded.
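A minimal bash sketch of the stack-limit workaround described above. The helper name raise_stack_limit is hypothetical: it first tries the "ulimit -s unlimited" suggested in the reply and, if the hard limit does not allow that, falls back to the largest value the hard limit permits. The new limit applies only to this shell and the processes it starts, so it should run in the same shell or job script that calls mpirun (tcsh users would use "limit stacksize unlimited" instead).

```shell
#!/bin/bash
# Hypothetical helper: raise the soft stack-size limit for this shell
# and every process it launches (e.g. mpirun and the MPAS executables).
raise_stack_limit() {
  if ulimit -s unlimited 2>/dev/null; then
    echo "stack limit: $(ulimit -s)"
  else
    # The hard limit caps the soft limit; use the largest value allowed.
    ulimit -s "$(ulimit -H -s)"
    echo "stack limit: $(ulimit -s) (capped by hard limit)"
  fi
}

raise_stack_limit
# Then launch the model from this same shell, e.g.:
# mpirun -np 1 --mca btl ^openib ./init_atmosphere_model
```

Checking the printed value before launching makes it obvious whether the limit was actually lifted or only raised to a site-imposed hard cap.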
 
I just tried ulimit -s unlimited, but the error is still there. I am not sure whether I need to recompile MPAS-Atmosphere; I guess not.
I am using gfortran for this run, and I am not sure why the 2015 case works while the 2018 case does not. I suspect the error is related to the initial data, so I am now comparing the GFS initial data.
 
Can you try recompiling the atmosphere core with debugging options enabled? Specifically, you can run:
Code:
make clean CORE=atmosphere
make gfortran CORE=atmosphere PRECISION=single DEBUG=true
You may need to add "USE_PIO2=true" as before.
 
I recompiled the code with the commands above, and the new error message is more specific:

$ mpirun -np 4 --mca btl ^openib ./atmosphere_model

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x3FFFADB89ED3
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 92377 on node p8umbc2 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------

From the new error, I suspect the problem is caused by the input (GFS initial) data.
First, I compared the GFS initial data for 2015 and 2018 (e.g. gfs.0p25.2015060100.f000.grib2). The min/max values in the 2015 and 2018 GFS files are consistent, with no abnormal values.

Second, I compared the WPS intermediate files (e.g. FILE:2015-06-01_00) and found 10 variables with different min/max values (SNOW, SNOWH, SM000010, SM010040, SM040100, SM100200, ST000010, ST010040, ST040100, ST100200). They are related to snow, soil moisture, and soil temperature. It seems that missing values (e.g. 1.e+20) are being treated as normal values. I printed detailed information on these variables using NCL; the output is attached. 2015_ungrib.txt is for FILE:2015-06-01_00 and 2018_ungrib.txt is for FILE:2018-01-10_00. Both FILE:* files were produced by WPS v3.9, and I found the same issue with WPS v3.7. Both WPS v3.7 and v3.9 were compiled with gfortran.

Third, I compared the initial-condition files (x1.2562.init.nc) produced by init_atmosphere_model and found abnormal min/max values for the following variables:

Differences between 2015 and 2018 in x1.2562.init.nc:

2015 (normal):

(0)     Now working on = smois   
(0)     min=0.02900000102818012   max=1
(0)     Now working on = tslb
(0)     min=220.0579893467179   max=315.149462617051
(0)     Now working on = snow
(0)     min=0   max=447.9040651414476
(0)     Now working on = snowh
(0)     min=0   max=2.239520325707238


2018 (abnormal):

(0)     Now working on = smois   
(0)     min=0.02215833000351746   max=6.432640957819654e+20
(0)     Now working on = tslb
(0)     min=221.9800769973877   max=6.432640957819654e+20
(0)     Now working on = snow
(0)     min=0   max=1.999800052110802e+21
(0)     Now working on = snowh
(0)     min=0   max=9.999000260554009e+18
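The min/max screening above can be scripted. This is a minimal bash/awk sketch, assuming the report lines keep exactly the "(0) Now working on = <field>" / "(0) min=<value> max=<value>" layout shown above (the tool that printed them is not named in the thread). The helper name flag_suspect_fields and the default cutoff of 1e10 are hypothetical choices; the cutoff is simply far above any physical value of smois, tslb, snow, or snowh.

```shell
#!/bin/bash
# Hypothetical helper: read min/max report lines in the layout shown above
#   (0)     Now working on = smois
#   (0)     min=0.022   max=6.43e+20
# and flag any field whose |min| or |max| reaches THRESH (default 1e10,
# an arbitrary cutoff far above physical values for these fields).
flag_suspect_fields() {
  awk -v thresh="${1:-1e10}" '
    BEGIN { t = thresh + 0 }
    /Now working on =/ {
      sub(/.*Now working on = */, "")   # strip the prefix, keep the field name
      name = $1
      next
    }
    /min=.*max=/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^min=/) lo = substr($i, 5) + 0
        if ($i ~ /^max=/) hi = substr($i, 5) + 0
      }
      if (hi >= t || lo <= -t)
        printf "%s: suspect (min=%g, max=%g)\n", name, lo, hi
    }'
}
```

For example, saving the 2018 lines above to a hypothetical report_2018.txt and running flag_suspect_fields < report_2018.txt should list all four fields, while the 2015 lines produce no output.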

From this comparison, I would say the problem comes from WPS ungrib.exe. Next, I will dig into ungrib to find where it processes the GFS data.

What do you think? Thank you for your help.
 

Attachments

  • 2015_ungrib_wps3.9.txt (78.5 KB)
  • 2018_ungrib_wps3.9.txt (90.9 KB)
That the soil fields have bad values is the clue we need, I think! NCEP made some changes to the GFS soil fields in 2017 that required an update to the WPS. Here are the WPS release notes regarding this issue: http://www2.mmm.ucar.edu/wrf/users/wpsv3.9/updates-3.9.0.1.html . Can you try processing the 2018 GFS data using the latest WPS code, then re-creating the MPAS-A initial conditions file?
 
Thank you very much. I had searched the known issues on the WRF website but did not find this one, so this is great. I downloaded the newest version of WPS (v4.0), and it worked well; the problem is now solved. I can run atmosphere_model using the 2018 GFS data.

However, I still see some abnormal values in the variables mentioned above: the minimum values are extremely small, around -1e+30. Here is one example, for the variable "SM000010":

(0) Field #7
(0) name : 'SM000010'
(0) description : 'Soil Moist 0-10 cm below grn layer (Up)'
(0) units : 'fraction'
(0) min/max value : -1e+30 / 0.468
(0) date : '2018-01-10_00:00:00'
(0) map source : 'NCEP GFS Analysis'
(0) version : 5
(0) forecast hour : 0
(0) level : 200100
(0) ny x nx : 721 x 1440
(0) projection : 0
(0) startlat : 90
(0) startlon : 0
(0) deltalat : -0.25
(0) deltalon : 0.25
(0) earth_radius : 6371.23

The atmosphere_model run went well, and the following variables look reasonable. I have attached plots of them:

temperature at 2 meter
relative humidity
surface pressure
accumulated total grid-scale precipitation

I may also double-check the other variables related to soil moisture, soil temperature, and snow.
 

Attachments

  • 2018_09_26_201801case_v01.pptx (28.4 MB)
The init_atmosphere core specifically looks for the value -1E30 and treats it as a missing value in fields (see https://github.com/MPAS-Dev/MPAS-Model/blob/v6.1/src/core_init_atmosphere/mpas_init_atm_cases.F#L3329). As long as the range of values in your model initial conditions is reasonable, there should be no problem.
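To check that the remaining range is reasonable once the -1E30 flag is set aside, a field's min/max can be computed with the flagged points excluded. This is an illustrative bash/awk sketch, not the MPAS implementation (the linked mpas_init_atm_cases.F is the authoritative logic). The helper name range_without_missing, the one-value-per-line input (e.g. a hypothetical sm000010_values.txt exported with NCL), and the small tolerance around -1E30 are all assumptions.

```shell
#!/bin/bash
# Hypothetical helper: min/max of one value per line, setting aside the
# -1E30 missing-value flag (e.g. the SM000010 minimum shown above).
range_without_missing() {
  awk '
    NF == 0 { next }                       # skip blank lines
    {
      v = $1 + 0
      # Count anything at or below -0.999E30 as missing; the tolerance is an
      # assumption to absorb single-precision rounding of -1E30.
      if (v <= -0.999e30) { skipped++; next }
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      n++
    }
    END {
      printf "valid=%d missing=%d min=%g max=%g\n", n, skipped + 0, min, max
    }'
}
```

Usage: range_without_missing < sm000010_values.txt. For SM000010, a result whose min and max both fall within a plausible soil-moisture fraction (the unmasked maximum above is 0.468) would support the reply's point that the -1E30 minimum is harmless.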
 