The same initial and boundary conditions get different results

wrfuser

I ran wrf.exe for two cases (run1 and run2). wrfbdy_d01 and wrfinput_d0* are the same in the two cases, so I expect wrfout_d0* to be identical between run1 and run2. Is my understanding correct? To confirm this, I checked the values of "U10" in wrfout_d*.

1) 2021-01-01 18:00:00 (initial time)
ncdump -v "U10" ./run1/wrfout_d01_2021-01-01_18\:00\:00 > out_run1.txt
ncdump -v "U10" ./run2/wrfout_d01_2021-01-01_18\:00\:00 > out_run2.txt
out_run1.txt and out_run2.txt are the same.

2) 2021-01-01 19:00:00
ncdump -v "U10" ./run1/wrfout_d01_2021-01-01_19\:00\:00 > out_run1.txt
ncdump -v "U10" ./run2/wrfout_d01_2021-01-01_19\:00\:00 > out_run2.txt
out_run1.txt and out_run2.txt are different. I have attached the files.
It seems strange that out_run1.txt and out_run2.txt differ. Do you know why? Am I doing something wrong?
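
For reference, this is roughly how I compare the two runs (a minimal sketch; the paths are from my setup, and diff/cmp are just the standard tools):

# dump the same variable from both runs (as above)
ncdump -v "U10" ./run1/wrfout_d01_2021-01-01_19\:00\:00 > out_run1.txt
ncdump -v "U10" ./run2/wrfout_d01_2021-01-01_19\:00\:00 > out_run2.txt

# text-level comparison of the dumped values
diff out_run1.txt out_run2.txt

# stricter, byte-for-byte comparison of the raw netCDF files
# (should report no differences if the runs are truly bit-for-bit identical)
cmp ./run1/wrfout_d01_2021-01-01_19\:00\:00 ./run2/wrfout_d01_2021-01-01_19\:00\:00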
 

Attachments

  • log_run1.txt (39.5 KB)
  • log_run2.txt (39.5 KB)
  • namelist.input.txt (4.3 KB)
  • out_run1.txt (168 KB)
  • out_run2.txt (168.1 KB)
Thanks for your reply.
Yes, I ran the two cases using the same wrf.exe file and the same number of processors. The number of processors was set as follows:
export OMP_NUM_THREADS=64
./wrf.exe
You can see "WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS = 64" in the log files (e.g., log_run1.txt). Am I doing something wrong?
 
I also ran the test case in Chapter 4 of the WRF Users' Guide (Real Data Test Case: 2000 January 24/12 through 25/12):
real.exe → wrf.exe (run1) → wrf.exe (run2)

I checked the outputs,
ncdump -v "U10" ./run1/wrfout_d01_2000-01-24_12\:00\:00 > out_run1.txt
ncdump -v "U10" ./run2/wrfout_d01_2000-01-24_12\:00\:00 > out_run2.txt
and found that out_run1.txt and out_run2.txt are different.
 
I am not sure whether the setting OMP_NUM_THREADS=64 will ensure the same number of processors for your case.
Anyway, we expect slight differences in the results when running the same case with a different number of processors, but the differences shouldn't be large enough to impact the physics/dynamics of the model.
 
Thank you so much for your reply. Could you please tell me the following points?
I am not sure whether the setting OMP_NUM_THREADS=64 will ensure the same number of processors for your case.
Q1. How can I make sure that the number of processors is the same? How do I get the same number of processors?

Anyway, we expect slight differences in the results when running the same case with a different number of processors, but the differences shouldn't be large enough to impact the physics/dynamics of the model.
Q2. I understand that the difference between out_run1.txt and out_run2.txt may come from a difference in the number of processors. However, I cannot understand why out_run1.txt and out_run2.txt are identical at 1) 2021-01-01 18:00:00 (the initial time). Could you please tell me the reason?
 
@Whatheway
You are right that we need to compile WRF in dmpar mode if we run MPI.
To run WRF compiled in smpar mode (i.e., OpenMP), we need to setenv OMP_NUM_THREADS N (N is the number of processors).
To run WRF compiled in dmpar mode (i.e., MPI), the command is mpirun -np N wrf.exe.
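
For example (bash syntax shown for the OpenMP case; N = 64 is just the value from your log):

# smpar (OpenMP) executable: one process running N threads
export OMP_NUM_THREADS=64    # in csh/tcsh: setenv OMP_NUM_THREADS 64
./wrf.exe

# dmpar (MPI) executable: N separate MPI tasks
mpirun -np 64 ./wrf.exe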
 
@wrfuser
I have to say that there might be some issues in the OpenMP option of WRF.
Would you please recompile WRF in dmpar mode (MPI option), and rerun the case?
 
@Whatheway
@Ming Chen
I am sorry for my late reply.
I recompiled WRF in dmpar mode and got identical results! Thanks so much for your helpful comments.
Could you tell me why in smpar mode the same initial and boundary conditions get different results?
 
You will probably have to reinstall all the libraries and install MPICH, then reinstall WRF and WPS with dmpar enabled.
 
@Whatheway
Thank you for your quick response.

In smpar mode, the same initial and boundary conditions give different results (run1 /= run2).
In dmpar mode, the same initial and boundary conditions give the same results (run1 = run2). The problem in the first post did not occur with dmpar.

Do I need to reinstall all libraries, install mpich, then reinstall WRF and WPS with dmpar?
 
No problem.

So I believe it will be better going forward to reinstall everything using MPICH. As @Ming Chen said, dmpar is needed, and MPICH is required for dmpar.

In my personal experience, I always install all the libraries with MPICH's compiler wrappers rather than the serial GNU compilers when I build a dmpar WRF. I'm not certain it's needed, but I want to make sure all the libraries are built in the same environment in which I will build WRF and WPS.

Before you delete and rebuild everything from scratch, let's wait until @Ming Chen or @kwerner (NCAR Staff) add their thoughts on this.

If you don't need to rebuild everything from scratch, that would be preferred.
 
Sorry to jump into this discussion. I am testing WRF with MPI and MPI+OpenMP on cheyenne. The code is reproducible when OpenMP is off, but is NOT reproducible when OpenMP is on. By reproducible, I mean run the exact same case with the same binary twice and get bit-for-bit identical results. Is irreproducibility with OpenMP expected?

Ming Chen in #10 above suggested "I have to say that there might have some issues in the OpenMP option of WRF". What does this mean? I understand that different pe counts can produce different results, but running the exact same case on the same number of MPI tasks and OpenMP threads should produce exactly the same internal decomposition including threading. Is that not true? Is bit-for-bit reproducibility a requirement and tested for in WRF with OpenMP on?
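
For context, the hybrid runs are launched along these lines (a sketch only; the exact launcher and the task/thread counts depend on the MPI stack and batch system, and the numbers here are just examples):

# hybrid MPI+OpenMP: several MPI tasks, each running several OpenMP threads
export OMP_NUM_THREADS=4     # threads per MPI task (example value)
mpirun -np 36 ./wrf.exe      # 36 MPI tasks (example value)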
 
To get bit-for-bit reproducibility, we have to reduce the optimization level when compiling WRF. Please try to recompile WRF with the option:
./configure -d

Remember to do ./clean -a before recompiling.
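
A minimal rebuild sequence would look something like this (the em_real target and log file name are just the usual convention; use whatever case you normally build):

cd WRF
./clean -a                               # remove all previous build artifacts
./configure -d                           # choose the same smpar/dmpar option; -d builds with debugging and no optimization
./compile em_real > compile.log 2>&1     # rebuild and check compile.log for errors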
 
Thanks @Ming Chen, it looks like that sets the compiler flags to -O0. I will test that. Do you know why WRF is not reproducible with OpenMP on and standard optimization? Are you arguing that the decomposition isn't reproducible (I think it is)? Are there some global sums in the OpenMP loops that might depend on thread ordering or something? It would be helpful to understand why -O0 fixes the problem rather than just knowing that it does.

What I really want is to be able to run in production with reproducibility. I'll see how -O0 performs, but my guess is that it'll cancel out some or most of the benefit of using OpenMP. You can get reproducibility with OpenMP; it just takes some debugging. The fact that WRF is not reproducible with OpenMP on (and -O2) suggests that there are some bugs in the OpenMP implementation. It's not just "it's a different decomposition" every time WRF is run; I don't think that's what's happening. It's more like some variables that should be private in OpenMP loops are not. That's a bug that can change a simulation and produce wrong answers, and it's why it's important to understand why OpenMP may be behaving the way it is. I'm trying to test/debug WRF a bit at the moment, but I'm not a WRF expert. I'll let you know if I find anything. Any insights would be greatly appreciated.
 
Yes, we are using "-fp-model precise" with the Intel compiler. Thanks for the suggestion.

I have been debugging the OpenMP. Just to make things clear, we are using WRF 3.7.1 for a number of reasons. I don't know whether the latest version of WRF has the same issues, but I have found lots of bugs in the OpenMP implementation that I have fixed in my version, all related to private variable declarations. If a variable should be private in OpenMP but isn't, it means that a local variable is being set and used by multiple WRF tiles, but the value of that variable can be different on different tiles. That is a race condition, so when running that section of code threaded, the results will be irreproducible because the value of that variable will depend on the order in which threads hit that section of code, and that order is not guaranteed with standard OpenMP threading. It could be that the "weather" WRF generates is the same when this happens, but this is certainly an issue that is NOT similar, for instance, to roundoff errors in global sums due to order-of-operations changes. These OpenMP errors are real bugs that could produce bad simulations.

So far as I can tell, the WRF decomposition is entirely deterministic, so when running the same binary on the same case twice, the results should be bit-for-bit identical. That is true of our version when running with MPI only; now I'm fixing the OpenMP so it's true with MPI+OpenMP as well. And by the way, this is true when running with the default optimization (basically -O3 -ip -fp-model precise -qopenmp -fpp -auto), which I'm using now. I backed off to -O0 as part of the debugging process, but now have optimization back up where it should be and am getting bit-for-bit results, still debugging a few OpenMP loops.

I would strongly encourage the WRF team to start testing some configurations with MPI+OpenMP for reproducibility. I'd be happy to provide guidance about what to do and how to identify and fix problems as they occur. These OpenMP bugs can be really tedious to find, but I think it's incredibly important to have confidence in the OpenMP implementation. If a section cannot be made to thread reproducibly, then you can always turn the threading off in that section. I feel reproducibility should be a requirement. Again, as I said, with our version of WRF 3.7.1, I found lots of errors relatively quickly and am now left with just a couple of threaded sections that continue to give me problems. I may solve those problems, or I'll turn off the threading in those sections in production runs.
 