Hello,
Edit: I first thought the problem came from the ref lat/lon in the wrfbdy_d02 output from ndown.exe, but after another dedicated test run (totally different config, CFSR forced), it appears that wrfbdy_d02 always has the ref lat/lon from d01 instead of d02 (see second post below). Could anyone confirm that this is normal?
About the most common reasons for a segfault:
1) I ran several tests with fewer or more processors: same results, so that is not the problem. OK
2) I checked for disk space : OK
3) Input data seemed OK, and WPS, REAL and NDOWN all reported success: OK --> NOK. Probable origin of the segfault: corrupted ndown outputs for some variables (notably the pressures and QVAPOR). I had only checked T2, TSK, SST, U10 and V10, which were fine, so I didn't look any further, mea culpa.
4) I checked for CFL errors : OK
5) I checked for memory size : OK
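Regarding item 3, a range check on every field would probably have flagged the corrupted variables sooner than eyeballing a few surface fields. Below is a minimal sketch, assuming the values of each variable have already been read into flat Python sequences (for example with the netCDF4 package, which is not shown here). The function names and the plausibility bounds are illustrative assumptions, not WRF defaults:

```python
import math

# Hypothetical plausibility bounds per variable (illustrative, not WRF defaults).
BOUNDS = {
    "P": (-5.0e4, 5.0e4),    # perturbation pressure, Pa
    "PB": (0.0, 1.2e5),      # base-state pressure, Pa
    "QVAPOR": (0.0, 0.05),   # water vapour mixing ratio, kg/kg
}


def suspect_count(values, low, high):
    """Count values that are NaN, infinite, or outside [low, high]."""
    bad = 0
    for v in values:
        if math.isnan(v) or math.isinf(v) or v < low or v > high:
            bad += 1
    return bad


def check_fields(fields, bounds=BOUNDS):
    """Return {name: number_of_suspect_values} for fields that have bounds."""
    report = {}
    for name, values in fields.items():
        if name in bounds:
            low, high = bounds[name]
            report[name] = suspect_count(values, low, high)
    return report
```

Any non-zero count in the report is a hint to inspect that variable before launching wrf.exe; variables without an entry in the bounds table are simply skipped.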
More info: I work on the "Datarmor" supercomputer. I suspected a problem with ref_lat/ref_lon because I had already had a segfault, probably caused by a typo in these variables in the namelist of a previous run (see: (RESOLVED) forrtl: severe (174): SIGSEGV, segmentation fault occurred --> namelist typo).
If anyone has an idea: ndown.exe normally outputs two files, wrfbdy_d02 and wrfinput_d02. I expected these two gridded datasets to be on the same grid, but here only wrfinput_d02 is correctly centered:
> nch wrfinput_d02 | gall    (i.e. ncdump -h $1 | grep <wanted variables>)
:WEST-EAST_GRID_DIMENSION = 121 ;
:SOUTH-NORTH_GRID_DIMENSION = 121 ;
:BOTTOM-TOP_GRID_DIMENSION = 33 ;
DX = 7000.f ;
DY = 7000.f ;
:USE_THETA_M = 0 ;
:GWD_OPT = 0 ;
:GRID_ID = 2 ;
:I_PARENT_START = 318 ;
:J_PARENT_START = 65 ;
DT = 40.f ;
:CEN_LAT = -17.62045f ; --> OK
:CEN_LON = -149.5606f ; --> OK

> nch wrfbdy_d02 | gall
:WEST-EAST_GRID_DIMENSION = 121 ;
:SOUTH-NORTH_GRID_DIMENSION = 121 ;
:BOTTOM-TOP_GRID_DIMENSION = 33 ;
DX = 7000.f ;
DY = 7000.f ;
:USE_THETA_M = 0 ;
:GWD_OPT = 0 ;
:GRID_ID = 2 ;
:I_PARENT_START = 318 ;
:J_PARENT_START = 65 ;
DT = 40.f ;
:CEN_LAT = -17.90401f ; --> NOK
:CEN_LON = -172.7851f ; --> NOK
Note that only CEN_LAT and CEN_LON are wrong compared to the grid of wrfinput_d02.
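The header comparison above can be automated instead of done by eye. Here is a minimal sketch, assuming the output of `ncdump -h` for each file has been captured as text; `parse_attrs` and `diff_attrs` are hypothetical helper names, and the parser only handles simple `name = value ;` lines (global attributes and dimension lines), not per-variable attributes:

```python
import re

# Matches lines such as ":CEN_LAT = -17.62045f ;" (leading colon optional).
ATTR_RE = re.compile(r"^\s*:?([A-Za-z_][\w-]*)\s*=\s*(.+?)\s*;")


def parse_attrs(header_text):
    """Extract {attribute: raw value string} from `ncdump -h` style text."""
    attrs = {}
    for line in header_text.splitlines():
        m = ATTR_RE.match(line)
        if m:
            attrs[m.group(1)] = m.group(2)
    return attrs


def diff_attrs(a, b):
    """Return {name: (value_in_a, value_in_b)} for attributes that differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}
```

Passing the parsed headers of wrfinput_d02 and wrfbdy_d02 to `diff_attrs` lists every attribute that differs, so a mismatch is not missed just because it was left out of the grep pattern.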
Moreover, I don't see any error in the ndown.exe rsl* files (which made me doubt a little that this was the cause of my problem):
> scs (equivalent to grep 'SUCCESS' rsl.e*)
rsl.error.0000: ndown_em: SUCCESS COMPLETE NDOWN_EM INIT
rsl.error.0001: ndown_em: SUCCESS COMPLETE NDOWN_EM INIT
rsl.error.0002: ndown_em: SUCCESS COMPLETE NDOWN_EM INIT
rsl.error.0003: ndown_em: SUCCESS COMPLETE NDOWN_EM INIT
... and so on for all rsl.error*
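The same check can be scripted so that a rank without the success line stands out. This is a minimal sketch, assuming the rsl.error.* files sit in one run directory; `rsl_status` is a hypothetical helper name, and the marker string is taken from the log lines above. A success line only shows that the program finished, not that its output fields are valid:

```python
import glob
import os


def rsl_status(run_dir, marker="SUCCESS COMPLETE"):
    """Split rsl.error.* files into those containing `marker` and those not."""
    ok, missing = [], []
    for path in sorted(glob.glob(os.path.join(run_dir, "rsl.error.*"))):
        # errors="replace" keeps the scan going if a log holds odd bytes.
        with open(path, errors="replace") as fh:
            found = any(marker in line for line in fh)
        (ok if found else missing).append(os.path.basename(path))
    return ok, missing
```

Calling `rsl_status(".")` in the run directory returns two lists of file names; any entry in the second list is a rank whose log deserves a closer look.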
Of course, when I try to run WRF with these as inputs (after renaming *d02 to *d01 as necessary), it fails and the WRF rsl* files read:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Please help.
Short description of the configs:
Coarse is composed of 3 domains with d01 centered on 17.90401°S / 172.7851°W and nested d02/d03 both centered on 17.62045°S / 149.5606°W
Fine is composed of 2 domains with d01/d02 respectively equivalent to Coarse's d02/d03 above, hence also centered on 17.62045°S / 149.5606°W
See the namelist.input* attached.