Hello,
I am trying to run wrf.exe in parallel, but it keeps failing after 5-6 hours of wall time. real.exe runs successfully with the same settings (mpirun -np 42). I have not found anything in the rsl files that indicates why wrf.exe fails. It does appear to be running in parallel ("Ntasks in X 6 , ntasks in Y 7"). I have also tried 20, 30, and 60 cores, with the same result.
I compiled WRF following the directions in Full WRF and WPS Installation Example (GNU). I first followed those directions as written, then recompiled with the most recent stable MPICH version (mpich-4.1.2), and most recently rebuilt with updated versions of all libraries, but I still get the same result. I'm not sure what else to try at this point. I have attached one of my rsl files. Any help would be appreciated. Thank you.
Update
I've noticed that I get the following message in my rsl file:
**WARNING** Time in input file not equal to time on domain **WARNING**
**WARNING** Trying next time in file wrflowinp_d01 ...
Also, my wrfinput* files contain only one time point (there should be 48 for the entire simulation). I've checked namelist.input, and the start and end dates/times are correct; the rsl* files show that real.exe completes successfully. I've also attached the real.exe rsl.error.0000 file (real_rsl.error.0000).
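In case it helps anyone checking the same thing, here is a minimal sketch of how the rsl files could be scanned for the warnings quoted above across all ranks. The function names (scan_rsl, scan_files) are hypothetical, the default glob "rsl.error.*" assumes the files sit in the current run directory, and the extra keywords (FATAL, SIGSEGV) are only generic guesses at what a failure line might contain, not messages taken from my logs.

```python
import glob
import re
from collections import Counter

# The first two patterns are the warnings quoted in this post.
# The last two are assumed generic failure keywords, not confirmed WRF output.
PATTERNS = [
    r"Time in input file not equal to time on domain",
    r"Trying next time in file",
    r"FATAL",
    r"SIGSEGV",
]


def scan_rsl(lines, patterns=PATTERNS):
    """Count how many lines match each pattern (case-insensitive)."""
    compiled = [(p, re.compile(p, re.IGNORECASE)) for p in patterns]
    counts = Counter()
    for line in lines:
        for name, rx in compiled:
            if rx.search(line):
                counts[name] += 1
    return counts


def scan_files(glob_pattern="rsl.error.*"):
    """Scan every file matching glob_pattern; keep only files with hits."""
    results = {}
    for path in sorted(glob.glob(glob_pattern)):
        with open(path, errors="replace") as fh:
            counts = scan_rsl(fh)
        if counts:
            results[path] = counts
    return results


if __name__ == "__main__":
    for path, counts in scan_files().items():
        for pattern, n in counts.items():
            print(f"{path}: {n} x {pattern!r}")
```

Run from the directory holding the rsl files, this prints one line per file and pattern with the number of matching lines, which makes it easier to see whether the warnings appear on every rank or only on some.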