Forced exit from Rosenbrock

zxdawn

Dear Sir/Madam,

I compiled WRF-Chem V3.5.1 with KPP successfully and tried to run wrf.exe.
However, I got this error at the beginning of calling kpp_mechanism_driver:

Code:
Forced exit from Rosenbrock due to the following error:

 No of steps exceeds maximum bound
 T=   72.0000000000000      and H=   72.0000000000000

 Forced exit from Rosenbrock due to the following error:

 No of steps exceeds maximum bound
 T=   72.0000000000000      and H=   72.0000000000000

It's related to this file WRFV3/chem/KPP/kpp/kpp-2.1/int/rosenbrock_tlm.f90:

Code:
IF ( Nstp > Max_no_steps ) THEN  ! Too many steps
   CALL ros_ErrorMsg(-6,T,H,IERR)
   RETURN
END IF
IF ( ((T+0.1d0*H) == T).OR.(H <= Roundoff) ) THEN  ! Step size too small
   CALL ros_ErrorMsg(-7,T,H,IERR)
   RETURN
END IF

The model fails while solving the chemical equations.
As Gabriele suggested on https://groups.google.com/a/ucar.edu/forum/?hl=en#!topic/wrf-chem-run/xKd6GHTYPQ4, I checked the input files.
I also compiled WRF-Chem without KPP and ran it with the same input files, and it works fine.
So how can I debug what is going wrong with KPP?

Versions:
byacc-20180609, flex-2.5.3, ifort-17.0.4.196, netcdf-4.6.2, hdf5-1.8.21

Environment:
Code:
((EM_CORE=WRF_EM_CORE=WRF_CHEM=1))
((NMM_CORE=WRF_NMM_CORE=0))
export EM_CORE WRF_EM_CORE WRF_CHEM
export NMM_CORE WRF_NMM_CORE

export NETCDF4=1
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export PATH="${software}/byacc-20180609:${software}/flex-2.5.3/bin:$PATH"
export YACC="${software}/byacc-20180609/yacc -d"
export FLEX=${software}/flex-2.5.3/bin/flex
export FLEX_LIB_DIR=${software}/flex-2.5.3/lib
 
Hi,

It is unlikely that the problem is with the Rosenbrock solver itself. More likely, your initial conditions and model inputs are producing an unrealistic, unstable state that the solver cannot handle. To help you, we will need more information about your model setup and input files.

Please could you post the namelist.input files that you use when running with & without KPP (I assume you are using different chemistry options)? Also, details on what meteorological and chemical input files you are using would be very useful, as well as what your domain setup is (what geographic location your domain is, etc). From this information we can start suggesting tests for you to carry out to try to find the source of your problem.

cheers,
Doug
 
Hi Doug,

I attached namelist.wps and namelist.input.

I'm using the same chemistry option, r2smh, in both cases. This chemistry mechanism was developed by Berkeley.
In any case, I get the same error with other KPP chemistry options when using the same wrfinput* and other input files as for r2smh.

Meteorological files: NARR
Chemical input files: MEGAN, NEI and MOZBC

I compiled WRF-Chem with KPP on another HPC system and it works fine there.
I attach two logs covering wrfinput_d01 and wrfbdy_d01: one from the successful run and one from the failing run. They are the same!

All steps for creating the input files are the same on the two HPC systems. The only difference is the environment.

Regards,
Xin
 

Attachments

  • namelist.input (8.8 KB)
  • namelist.wps (1.5 KB)
  • failure.log (7.8 KB)
  • success.log (7.9 KB)
Environment of failure:

1) intel-compilers/2017_update4
2) MPI/Intel/MPICH/3.2-icc2017-dyn
3) hdf5-1.8.21
4) netcdf-4.6.2
5) zlib/1.2.11-icc17
6) jasper/1.900.1/01-CF-15-libpng

Environment of success:

1) intel/18.0.0
2) impi/2018.0.128
3) netcdf/4.3.0
4) hdf5/1.8.20
5) zlib/1.2.7
6) jasper/1.900.1
7) libpng/1.5.13
 
Likely the compiler. Intel 17.x and Intel 18.x can generate bad code. I've been battling this problem for a while on STAMPEDE2.

Try building with a lower optimization level.
 
Hi Kevin,

Regarding "Try building with a lower optimization level": both of the examples were compiled with `-O2`.

Switching to Intel 2015 still gives the same error.

However, I forgot to mention that the successful build used option_19 (Linux x86_64 i486 i586 i686, ifort compiler with icc (dmpar)),
while the failing one used option_24 (Linux x86_64 i486 i586 i686, Xeon (SNB with AVX mods) ifort compiler with icc (dmpar)).

So this time I tried Intel 2015 with option_19, and it works fine now!

I'm confused. Why does it work with option_19 but not with option_24?

I attached the compile.log for both options so you can check them.

To show how KPP is compiled differently, take line 309 of each log as an example:
Option 19:
Code:
ifort -c -O2 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -i4  module_kpp_cbmz_bb_Precision.f90
Option 24:
Code:
ifort  -c -O2 -xAVX -w  -auto -ftz -fno-alias -fp-model fast=1 -no-prec-div -no-prec-sqrt -FR -convert big_endian -auto -align array64byte    -i4  module_kpp_cbmz_bb_Precision.f90
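To see at a glance which flags differ between the two commands, something like the sketch below can help. The helper name `diff_flags` is hypothetical, and the sketch simply splits each command on spaces, so a flag and its value (for example `-fp-model` and `precise`) show up as separate tokens.

```bash
# Hypothetical helper: print tokens that occur in only one of two compile commands.
# Unindented lines occur only in the first command; tab-indented lines only in the second.
diff_flags() {
    local a b
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    printf '%s\n' "$1" | tr -s ' ' '\n' | LC_ALL=C sort -u > "$a"
    printf '%s\n' "$2" | tr -s ' ' '\n' | LC_ALL=C sort -u > "$b"
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# The two compile commands from the logs above (file name omitted).
opt19='ifort -c -O2 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian -i4'
opt24='ifort -c -O2 -xAVX -w -auto -ftz -fno-alias -fp-model fast=1 -no-prec-div -no-prec-sqrt -FR -convert big_endian -auto -align array64byte -i4'

diff_flags "$opt19" "$opt24"
```

Tokens common to both commands (such as `-O2` and `-ftz`) are suppressed, so what remains is the list of candidates worth testing one at a time.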

I also tested the speed of these two options:
option_24 without KPP takes 3.69293 elapsed seconds per time step;
option_19 with KPP takes 26.98240 elapsed seconds per time step.
So option_24 without KPP looks much quicker.

Cheers,
Xin
 

Attachments

  • compile_option19.log (1.2 MB)
  • compile_option24.log (1.2 MB)
Finally, lowering the optimization level from '-O2' to '-O1' makes option_24 work!
With AVX and KPP, the speed is ~22 elapsed seconds per time step,
which is a little faster than option_19 (Linux x86_64 i486 i586 i686, ifort compiler with icc (dmpar)).

I still have two questions:

1. Why is option_24 without KPP so much faster? Is the chemistry calculation not running?
How can I check whether the chemistry is being computed correctly?

2. Option 19 uses '-O2' too. Why does it work?

Option 19:
Code:
ifort -c -O2 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -i4  module_kpp_cbmz_bb_Precision.f90
Option 24:
Code:
ifort  -c -O2 -xAVX -w  -auto -ftz -fno-alias -fp-model fast=1 -no-prec-div -no-prec-sqrt -FR -convert big_endian -auto -align array64byte    -i4  module_kpp_cbmz_bb_Precision.f90

Here's line 309 of the compile log with '-O1' (option_24):
Code:
/lib/cpp -C -P -I../../../../inc -DEM_CORE=1 -DNMM_CORE=0 -DNMM_MAX_DIM=2600 -DCOAMPS_CORE= -DDA_CORE= -DEXP_CORE= -DIWORDSIZE=4 -DDWORDSIZE=8 -DRWORDSIZE=4 -DLWORDSIZE=4 -DNONSTANDARD_SYSTEM_FUNC -DCHUNK=64 -DXEON_OPTIMIZED_WSM5 -DOPTIMIZE_CFL_TEST -DWRF_USE_CLM -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT  -DDM_PARALLEL -DNETCDF -DPNETCDF -DUSE_ALLOCATABLES -DGRIB1 -DINTIO -DLIMIT_ARGS -DCONFIG_BUF_LEN=65536 -DMAX_DOMAINS_F=21 -DMAX_HISTORY=25 -DNMM_NEST= -DWRF_CHEM -DBUILD_CHEM=1 -DWRF_KPP -I. -traditional -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT  module_kpp_cbmz_bb_Precision.b  > module_kpp_cbmz_bb_Precision.f90

----------------------------------------------------------------------------------------

In case this is confusing, here is a summary:

Option_19: Linux x86_64 i486 i586 i686, ifort compiler with icc (dmpar)
Option_24: Linux x86_64 i486 i586 i686, Xeon (SNB with AVX mods) ifort compiler with icc (dmpar)

| Configure option | Optimization level | With KPP | Chemistry works |
| ---------------- | ------------------ | -------- | --------------- |
| 19 | default (-O2) | Y | Y |
| 24 | -O2 | N | Y |
| 24 | -O2 | Y | N |
| 24 | -O1 | Y | Y |

Thank you everyone!
Cheers,
Xin
 
hi Xin,

Option 24 without KPP will be a lot faster because (as you suggest) the chemistry calculations are not being run.

As to why Option 19 works, I think that this flag might be the reason:
Code:
-fp-model precise

On the UK national HPC (ARCHER) I found that I could get intel 17.x to create a working executable by including this flag on this line of configure.wrf (when using configure option "INTEL (ftn/icc): Cray XC" with "dmpar"):
Code:
FCOPTIM         =       -ip -O3 -fp-model precise $(OPTAVX)
I don't think that my problem was in the KPP chemistry code (it was a couple of years ago, so I had forgotten that it might be an issue, and I don't recall the exact details), but it did involve NaNs appearing in the data arrays and then being propagated through the whole domain. A similar problem in your case could cause the Rosenbrock solver to fail to find a solution.

You could try replacing
Code:
fast=1
(which allows optimisations in floating point math) with
Code:
precise
(which forces precise floating point math) in the configure.wrf file generated for option 24, and see if it will then create code that will work for you.
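A minimal sketch of that edit, assuming the flag appears literally as `-fp-model fast=N` in configure.wrf. The helper name `use_precise_fp_model` is hypothetical and not part of WRF. It keeps a backup copy so the original settings can be restored.

```bash
# Hypothetical helper: replace "-fp-model fast=N" with "-fp-model precise"
# in a configure.wrf-style file, keeping the original as <file>.bak.
use_precise_fp_model() {
    local file="$1"
    cp "$file" "$file.bak" || return 1
    sed 's/-fp-model fast=[0-9][0-9]*/-fp-model precise/g' "$file.bak" > "$file"
}
```

Run it as `use_precise_fp_model configure.wrf` after `./configure` and before `./compile`, because re-running configure writes a fresh configure.wrf.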

cheers,
Doug
 
Hi Doug,

Thank you! You're right!

Here's the comparison:

| Configure option | Optimization level | fp-model | Speed (seconds per time step) | Cores |
| ---------------- | ------------------ | -------- | ----------------------------- | ----- |
| 19 | O2 | precise | ~27 | 96 |
| 24 | O1 | fast=1 | ~22 | 96 |
| 24 | O2 | precise | ~23.5 | 96 |
 
Good to hear that this fix works for you too.

Looking back at my notes, I see that I found similarly small differences in model performance between -O1, -O2 -fp-model precise, and -O3 -fp-model precise. I guess the lack of floating point math optimization cancels out the other performance gains made going from -O1 to -O2 or -O3.
 
Yes, I tried -O3 -fp-model precise, and it is a little faster than -O1 -fp-model fast=1.

But the results of option_19 differ from those of option_24, whether option_24 is built with -O3 -fp-model precise or with -O1 -fp-model fast=1.

For example, the difference in NO2 at the ground is mostly smaller than 2 pptv, but reaches 40 pptv in some places.
The pattern of the difference is the same for the different optimization levels and floating point models with option_24.

So the result of option_24 differs from that of option_19.

Which factor would cause this difference?
Is option_19 the one you commonly use?
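This is not a diagnosis, only a small illustration of why builds that order floating point operations differently (for example through different vectorization or optimization settings) need not agree bit for bit. It assumes an `awk` that uses IEEE double arithmetic, and the function name `fp_order_demo` is hypothetical. Whether this effect accounts for the NO2 differences above is an assumption that would need checking.

```bash
# Hypothetical demo: the same three numbers summed in two different orders.
# Floating point addition is not associative, so the two results can differ.
fp_order_demo() {
    awk 'BEGIN {
        a = 1e16; b = -1e16; c = 1
        # First value: (a + b) + c. Second value: a + (b + c).
        printf "%.1f %.1f\n", (a + b) + c, a + (b + c)
    }'
}

fp_order_demo
```

In the second ordering the small term is absorbed by the large one before the cancellation happens, so it is lost. A stiff chemistry solver repeats operations like this many times per step, which is one way small differences between builds can arise.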


You might check to see if your hardware supports AVX2. I don't think there is a default config for that.

I asked the administrator and he said the hardware supports AVX.
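For anyone who wants to check this directly, here is a minimal sketch, assuming a Linux system that lists CPU feature flags in /proc/cpuinfo. The helper name `list_avx_flags` is hypothetical.

```bash
# Hypothetical helper: list the distinct AVX-family feature flags found in a
# cpuinfo-style file (defaults to /proc/cpuinfo on Linux).
list_avx_flags() {
    grep -o -w -E 'avx[0-9a-z_]*' "${1:-/proc/cpuinfo}" | LC_ALL=C sort -u
}
```

Calling `list_avx_flags` with no argument reads /proc/cpuinfo. If the output contains `avx` but not `avx2`, the CPU supports AVX only. It is worth running this on a compute node as well, since it may have a different CPU from the login node.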
 
Hi,
I am facing the same issue. I have tried compiling with -O1 optimization, but I still get the error. I am using WRF-Chem 4.1 with the Intel compiler 2016. Could you advise on what I should try to remove the error? I am attaching the namelist.
 

Attachments

  • namelist.input (8.5 KB)
I met the same issue when I compiled WRF with

Code:
CFLAGS_LOCAL    =       -w -O3 -ip -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -no-multibyte-chars -xCOMMON-AVX512
LDFLAGS_LOCAL   =       -ip -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -align all -fno-alias -fno-common -xCOMMON-AVX512
FCBASEOPTS_NO_G =       -ip -fp-model precise -w -ftz -align all -fno-alias $(FORMAT_FREE) $(BYTESWAPIO) -fp-model fast=2 -no-heap-arrays -no-prec-div -no-prec-sqrt -fno-common -xCOMMON-AVX512

What option should I compile WRF with?
Thank you.
 
In my tests, "-O3 -fp-model precise" is likely to be the best solution to the Rosenbrock error.

Code:
CFLAGS_LOCAL    =       -w -O3 -ip -xHost -fp-model precise -no-prec-div -no-prec-sqrt -ftz -no-multibyte-chars -xCORE-AVX512
LDFLAGS_LOCAL   =       -ip -xHost -fp-model precise -no-prec-div -no-prec-sqrt -ftz -align all -fno-alias -fno-common -xCORE-AVX512
FCBASEOPTS_NO_G =       -ip -fp-model precise -w -ftz -align all -fno-alias $(FORMAT_FREE) $(BYTESWAPIO) -xHost -fp-model precise -no-heap-arrays -no-prec-div -no-prec-sqrt -fno-common -xCORE-AVX512

Also the compiler option "15. (dmpar) INTEL (ifort/icc)" shows similar speed without the Rosenbrock error.
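Some of the flag lines quoted in this thread carry more than one `-fp-model` setting (the failing FCBASEOPTS_NO_G line has both `precise` and `fast=2`). A small sketch for spotting that in a configure.wrf-style file, using a hypothetical helper named `list_fp_models`:

```bash
# Hypothetical helper: print every "-fp-model <value>" occurrence in a file,
# prefixed by its line number, so that conflicting settings are easy to spot.
list_fp_models() {
    grep -n -o -E -- '-fp-model +[A-Za-z0-9=]+' "$1"
}
```

`list_fp_models configure.wrf` prints one `line:setting` pair per occurrence. If a line shows up with two different values, it is probably safest to leave only the one you intend rather than rely on how the compiler resolves the conflict.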
 
I want to compile the model with the Intel compiler. Since you ran the model with chem_opt 202 successfully, could you tell me which WRF-Chem version you compiled? I would also like to know which versions of netcdf-c, netcdf-f, hdf5, libpng, zlib, jasper and mpich you used.

When you configured the model, which option did you select (15, 19 or 24)?
 