On my PC, the serial and parallel simulations produce identical results, but on clusters the results are not even close. In some cases, the T2 variable differs by more than 5°C at a given grid point. I really don't know what to do. I'm attaching the configure.wrf files from the serial and parallel (dmpar) builds.
Can you attach the namelist.input and namelist.wps files you're using for these cases? I'd like to see if I also get different values for the three different build options. Can you also let me know which version of WRF you're using? Thanks!
In both simulations (serial and parallel) I used the same attached namelist files. I emphasize that I did not compile WPS in parallel; that is, the same met_em.d0?* files were used in both simulations. In my most recent tests I used version 4.5, but I had already seen the same problem with other versions.
Thank you in advance for the attention of the whole team.
I tried to replicate your issue using V4.5 and essentially your namelists. I was unable to run a serial simulation with the domain sizes you are using (I'm not sure how you were able to), so I had to decrease the domain sizes. However, the serially-compiled simulation and a dmpar-compiled simulation produced identical results.
Have you made any modifications to your WRF code? Are there any differences, whatsoever, between the simulations, besides the fact that one is compiled serially and one is compiled for dmpar?
I haven't made any changes to the code. I'm surprised I can't run the model in serial. I haven't had any problems with this, even running it on a workstation.
I have uploaded the domain 3 results to the Nextcloud storage so that you can verify the differences between the simulations.
Files: wrfout_d03_2040-01-01_serial and wrfout_d03_2040-01-01_dmpar.
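One way to quantify the difference between two such files is the largest absolute T2 difference and the grid point where it occurs. The following is only a minimal sketch in Python: it assumes the T2 fields for one output time have already been read into nested lists (the NetCDF reading step is not shown), and the function name max_abs_diff and the sample values are made up for illustration.

```python
def max_abs_diff(field_a, field_b):
    """Return (largest absolute difference, (j, i) index where it occurs).

    field_a and field_b are 2D nested lists of equal shape, e.g. T2 at one
    output time from the serial and dmpar files. Reading the files is not
    shown here. Returns (0.0, None) when the fields are identical.
    """
    if len(field_a) != len(field_b):
        raise ValueError("fields have different numbers of rows")
    worst, where = 0.0, None
    for j, (row_a, row_b) in enumerate(zip(field_a, field_b)):
        if len(row_a) != len(row_b):
            raise ValueError(f"row {j} has different lengths")
        for i, (a, b) in enumerate(zip(row_a, row_b)):
            diff = abs(a - b)
            if diff > worst:
                worst, where = diff, (j, i)
    return worst, where


# Made-up values in kelvin, for illustration only (not from the real files).
serial = [[290.0, 291.5], [289.0, 288.25]]
dmpar = [[290.0, 291.5], [289.0, 293.5]]
print(max_abs_diff(serial, dmpar))   # prints (5.25, (1, 1))
print(max_abs_diff(serial, serial))  # prints (0.0, None)
```

A result of (0.0, None) means the two fields are bit-for-bit equal at that output time; anything else gives the size and location of the worst mismatch.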
Thanks for sharing those. Can you also do the following?
1) Please go to your serially-compiled wrf running directory and make a copy of the namelist ( > cp namelist.input namelist.input.serial)
2) Go to your dmpar-compiled wrf running directory and do the same with that namelist ( > cp namelist.input namelist.input.dmpar)
Then send the following files (you can package all of these into a single *.tar or *.zip file and upload it to Nextcloud):
- log file(s) for both simulations (output log for the serial simulation and rsl* files for the dmpar simulation)
- wrfbdy_d01 and wrfinput* files for each simulation (renamed to correspond to the compile type)
I did exactly as instructed and uploaded the files to Nextcloud. You can access them in the serial_x_dmpar.zip file.
- I used exactly the same namelist in both simulations; I even kept the same "HISTORY_OUTNAME". After the serial simulation finished, I renamed the file to "wrfout_d03_2040-01-01_serial".
- Unlike the dmpar run, which generated several rsl* files, the serial run generated only one log file, "wrf-serial.o16989".
Thanks for sharing those. I've compared all of your files and they are, indeed, identical. The only files I didn't have from you were the wrflowinp* files (I forgot to ask for those) so I wasn't able to run a test using sst_update. I also am still not able to run your 3 domain simulation with a single processor, but I ran a dmpar and serial simulation with a single domain and the wrfout* files are identical at the end. Unfortunately I'm out of ideas at the moment. The only other thing I thought you could try is to obtain new, clean versions of the source code and then, in new directories, compile them again (one with a dmpar configuration and one with a serial configuration) to see if you still have the same issue. I know that sounds silly, but simply recompiling the code has worked on more than one occasion when it doesn't seem to make sense.
I performed a series of tests on two different HPCs and got the same pattern of results.
Runs with different numbers of cores but the same NTASK_X produce identical results (wrfinput_d* and wrfout*), but when NTASK_X differs, the results differ. For example, runs with 1, 2, 3, 5 and 7 cores (NTASK_X = 1 and NTASK_Y = 1, 2, 3, 5, 7) give identical results. Runs with 12 and 21 cores (NTASK_X = 3 and NTASK_Y = 4, 7) also match each other, but differ from the results obtained with NTASK_X = 1. There seems to be a problem when the model splits the grid along the X axis. Does this information help in solving the problem?
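The core counts listed above fit a simple rule: the total number of tasks is factored into the pair closest to square, with the smaller factor along X. The Python sketch below assumes that rule. It is inferred from the numbers reported in this post, not taken from the WRF source, and the function name split_tasks is hypothetical. It then groups the tested core counts by their X split.

```python
def split_tasks(ntasks):
    """Factor ntasks into (ntask_x, ntask_y) with ntask_x <= ntask_y,
    choosing the factor pair closest to square.

    This rule is an assumption that matches the runs reported in the
    thread; it is not copied from the model code.
    """
    if ntasks < 1:
        raise ValueError("ntasks must be a positive integer")
    best = (1, ntasks)
    x = 1
    while x * x <= ntasks:
        if ntasks % x == 0:
            # Larger x that still divides ntasks is closer to square.
            best = (x, ntasks // x)
        x += 1
    return best


# Group the tested core counts by their X split.
groups = {}
for n in (1, 2, 3, 5, 7, 12, 21):
    ntask_x, ntask_y = split_tasks(n)
    groups.setdefault(ntask_x, []).append(n)

print(groups)  # prints {1: [1, 2, 3, 5, 7], 3: [12, 21]}
```

Under this assumed rule, every prime core count ends up with NTASK_X = 1, so none of those runs split the grid along X at all. A run with 4 or 6 cores (which would give NTASK_X = 2) could serve as a small additional test case.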
I'd first like to apologize for the long delay in response. We have been busy preparing for the WRF tutorial taking place this week and have gotten quite behind on forum responses. I have tried running with the task settings you mention, and even tried with GNU and Intel, and I am still not able to repeat the issue. Does it take a certain amount of simulation time for this to occur, or does it happen quickly (say, with a 3 or 6 hour simulation)?
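To check how quickly two runs diverge, one option is to scan the output times in order and report the first one at which any grid point differs by more than a tolerance. A minimal Python sketch follows, assuming each run's T2 has already been loaded as a list of 2D nested lists, one per output time (the loading step is not shown). The name first_divergence and the sample data are made up for illustration.

```python
def first_divergence(series_a, series_b, tol=0.0):
    """Return the index of the first output time at which any grid point
    differs by more than tol, or None if the runs agree throughout.

    series_a and series_b are lists of 2D nested lists (time, y, x).
    """
    for t, (field_a, field_b) in enumerate(zip(series_a, series_b)):
        for row_a, row_b in zip(field_a, field_b):
            if any(abs(a - b) > tol for a, b in zip(row_a, row_b)):
                return t
    return None


# Made-up three-step series: the runs agree at times 0 and 1, differ at 2.
run_a = [[[290.0, 291.0]], [[290.5, 291.5]], [[291.0, 292.0]]]
run_b = [[[290.0, 291.0]], [[290.5, 291.5]], [[291.0, 292.5]]]
print(first_divergence(run_a, run_b))           # prints 2
print(first_divergence(run_a, run_b, tol=1.0))  # prints None
```

With tol=0.0 the check is for bit-for-bit equality; a small positive tolerance would instead flag only differences large enough to matter physically.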