WRFDA execution error: Program received signal SIGABRT: Process abort signal.

Dear WRFDA developer team,

We are facing complex execution errors while running WRFDA with conventional BUFR observations (downloaded from the NCAR website). We tried to debug the WRFDA code ourselves and found what looks like a larger issue that could affect many parts of the code.

Here is our configuration:
- WRFDA 4.5.2
- Compiler: GNU/gfortran (configure option 34)
- Domain: 450 x 450 km, 5 km grid spacing, 40 vertical levels
- Machine with 38 CPUs and 250 GB RAM

First, we got an error related to the size of the sfzo array, which is currently hardcoded at 50; in our case it had to be increased to 61.

Then we got a SIGABRT at WRFV4.5.2/var/build/da_minimisation.f:3397 (or a SEGFAULT when running under gdb). The problem is that cv_size is negative.
A quick analysis shows that the calculation of cv_size causes a 32-bit integer overflow:
=> Line 553 of var/da/da_main/da_solve.inc
be % cv % size = be%cv%size_jb + be%cv%size_je + be%cv%size_jp + be%cv%size_js + be%cv%size_jl + be%cv%size_jt
cv_size = be % cv % size
In our case:

(gdb) p be % cv
$7 = ( size = -2106604383, size_jb = 32457761, size_je = 538976288, size_jp = 538976288, size_js = 538976288, size_jl = 0, size_jt = 538976288, size1c = 538976288, size2c = 538976288, size3c = 0, size4c = 0, size5c = 0, size6c = 0, size7c = 0, size8c = 0, size9c = 0, size10c = 0, size11c = 0, size_alphac = 0, size1 = 8064040, size2 = 8064040, size3 = 8064040, size4 = 8064040, size5 = 201601, size6 = 0, size7 = 0, size8 = 0, size9 = 0, size10 = 0, size11i = 0, size1l = 930378601, size2l = 1681679455, size3l = 1767992687, size4l = 1012874862, size5l = 0, sizechemic = (0) )

(gdb) p be%cv%size_jb + be%cv%size_je + be%cv%size_jp + be%cv%size_js + be%cv%size_jl + be%cv%size_jt

$15 = 2188362913
The sum exceeds the largest signed 32-bit integer, 2^31 - 1 = 2147483647.
When 2188362913 is stored in a 32-bit integer, it wraps around to -2106604383.
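
The arithmetic can be checked outside WRFDA. The following is a minimal Python sketch, not WRFDA code: it takes the six component sizes from the gdb output, sums them exactly, and then reinterprets the result as a signed 32-bit integer. The helper name wrap_int32 is hypothetical and only serves this illustration.

```python
import ctypes

# Component sizes copied from the gdb output of be % cv.
sizes = {
    "size_jb": 32457761,
    "size_je": 538976288,
    "size_jp": 538976288,
    "size_js": 538976288,
    "size_jl": 0,
    "size_jt": 538976288,
}

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def wrap_int32(value):
    """Reinterpret a Python int as a two's-complement signed 32-bit integer."""
    # ctypes integer types do no overflow checking, so the value wraps.
    return ctypes.c_int32(value).value


exact_sum = sum(sizes.values())      # Python ints do not overflow
wrapped_sum = wrap_int32(exact_sum)  # what a 32-bit integer would hold

print("exact sum       :", exact_sum)
print("INT32_MAX       :", INT32_MAX)
print("exceeds int32   :", exact_sum > INT32_MAX)
print("stored as int32 :", wrapped_sum)
```

The wrapped value should match the negative size printed by gdb, which is consistent with the sum overflowing a default 4-byte integer.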

We attach the input namelist and the error output from WRFDA.

Thanks in advance for any help you can provide in understanding the source of these errors, or what we are doing wrong when using WRFDA.

Best regards,

Pauline Martinet
 

Attachments

  • out_mpi.zip
    11.3 KB