philipdumont
We have been using WPS/WRF version 3.6 for some time.
Recently, we started tripping on a bug in nested domains when the region of interest was centered on the 180 degree longitude. We wanted a fix for this problem, heard that it is fixed in version 3.9, and so we are attempting to upgrade to WPS/WRF 3.9.latest. (We avoided going to WRF 4, expecting such a move to require a more difficult/involved integration of our software.)
To the extent possible, we are trying to use the same configs/namelists in 3.9 as we did in 3.6.
One exception is the default value of o3input, which changed between 3.6 and 3.9. (I think the doc said it changed at 3.7.) Since our namelist.input file did not set o3input, we were getting a different default in 3.9 than in 3.6, which caused wrf.exe to fail immediately at launch, complaining that it could not find file ozone.something-or-other. This was easily fixed by giving an explicit "o3input=0" in namelist.input. (0 was the default value in 3.6.)
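Since a changed default only bites when the namelist leaves the variable unset, a quick check along these lines can help when carrying a namelist across versions. This is only a sketch: namelist_sets is a made-up helper name, not part of WRF, and it assumes each variable is assigned at the start of its own line (as in "o3input = 0,"), so it will miss variables that share a line with others.

```shell
# Sketch: succeed if a namelist file assigns the given variable explicitly.
# namelist_sets is a hypothetical helper, not something shipped with WRF.
namelist_sets() {
    # $1 = variable name, $2 = namelist file
    # Match "name =" at the start of a line, ignoring case and leading blanks.
    # Lines commented out with "!" do not match.
    grep -Eiq "^[[:space:]]*$1[[:space:]]*=" "$2"
}
```

For example, `namelist_sets o3input namelist.input || echo "o3input not set, version default applies"` would flag the situation described above.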
With that change, wrf.exe 3.9 gets started and generates some output: one or two of the wrfout_d* files show up (out of an expected few dozen). But then output just stops. Nothing more is written to any of wrfout*, rsl*, or any other file. The wrf.exe processes are still running and still using as much CPU as they can get, but they are not producing anything.
When it gets to this state, I grab all of the wrf.exe processes with strace(1) (multiple -p options), and all I see is a whole lot of sched_yield(2) system calls. If I add an option to the strace command line to trace everything *but* sched_yield, I see that none of the wrf.exe processes are calling any other system calls at all.
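For anyone who wants to reproduce the observation, the strace step looks roughly like the following. It is a sketch rather than the exact command line: pid_args and trace_wrf are made-up helper names, it assumes the processes are named exactly wrf.exe, and it needs permission to attach to them (same user or root).

```shell
# Turn a list of PIDs into repeated "-p PID" arguments for strace.
# pid_args is a hypothetical helper name.
pid_args() {
    for pid in "$@"; do
        printf '%s %s ' -p "$pid"
    done
}

# Attach to every running wrf.exe and show all system calls except
# sched_yield. trace_wrf is a hypothetical helper name.
trace_wrf() {
    # pgrep -x matches the exact process name. The command substitutions
    # are left unquoted on purpose so they split into separate arguments.
    strace -e 'trace=!sched_yield' $(pid_args $(pgrep -x wrf.exe))
}
```

Running `trace_wrf` while the job is stuck prints nothing at all in our case, which is what tells me the processes are doing nothing but sched_yield.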
Any idea what might cause this? Any idea how to go about debugging it?
Thanks.