runtime error with Intel compiler - OK with GNU

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

alainF
Hello,
I am having trouble running WRF with an executable built with the Intel compiler (ifort).
First, I tried with GNU.
I followed all the steps on this page: http://www2.mmm.ucar.edu/wrf/OnLineTutorial/compilation_tutorial.php#STEP1
With the GNU compiler everything was OK, from building the executables to running the model. But execution was a little slow, so I wanted to build the executables with the Intel compiler.
With the Intel compiler:
I changed the necessary environment variables from GNU to Intel. They are in the attached file "setEnvForWRF_intel", whose contents I put in my .bash_profile.
All the small Fortran and C tests from the page above were OK.
I succeeded in compiling all the libraries, such as netcdf and mpich.
WRF and WPS compile without problems (see the attached log files).
geogrid, ungrib and metgrid OK
real.exe OK
But wrf.exe stops after the first time step. I have attached the corresponding namelist files and the rsl.error files. The program stops after a message reporting a segmentation fault (signal 11), with no other information.
I tried compiling WRF in debug mode by typing ./configure -d, but no additional information came out, so I don't know what to do. The attached compilation log was made without debug mode.
I used version 15 of the Intel compiler because this post: http://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=40&t=571&p=1851&hilit=intel+error#p1851 says that WRF won't run with Intel compiler versions 17 and later, while version 15 seemed OK. For me, it does not work.
My OS is CentOS 7, running on an Intel i7-8086K with 6 cores.

Thank you for your help
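
When a run dies with a segmentation fault, it can help to see which per-rank log actually recorded it. The following is a minimal sketch, assuming the default rsl.error.NNNN file names; the function names and the search patterns are illustrative guesses, since the exact wording of the message depends on the compiler and MPI library.

```shell
#!/bin/sh
# Sketch: list the rsl.error.* files that mention a segmentation fault.
# Assumes the default per-rank log names (rsl.error.0000, rsl.error.0001, ...).
# Function names and patterns are hypothetical, not part of WRF.

find_crashed_ranks() {
    dir="${1:-.}"   # run directory holding the rsl.* files
    # -l prints only matching file names; -i ignores case; -E allows alternation.
    grep -l -i -E 'segmentation fault|SIGSEGV|signal 11' "$dir"/rsl.error.* 2>/dev/null
}

# Print the tail of each matching log, to see how far that rank got.
show_crash_context() {
    find_crashed_ranks "${1:-.}" | while IFS= read -r f; do
        echo "== $f"
        tail -n 20 "$f"
    done
}
```

Calling `show_crash_context .` from the run directory prints the last 20 lines of every log that matched.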
 

Attachments

  • setEnvForWRF_intel.txt
    450 bytes
  • compileWRF.log.txt
    660.1 KB
  • rsl.error.0000.txt
    1.9 KB
  • namelist.input
    6.7 KB
Hi,

1) When you ran this with GNU, were you using the exact same namelist.input file, input data, domain, dates, etc.? Did the run complete without problems?
2) While some may have had problems with Intel versions 17+, this is not a known issue. I have compiled and run the code perfectly fine with Intel, so it may not be necessary to revert to v15.
3) The first thing that strikes me is that you have set your decomposition to be 1x6 (with the nproc_x and nproc_y settings). I would recommend trying this without those settings, so that the natural decomposition will be 2x3.

Looking at your namelist.input file, it looks like you are using a lot of advanced options. If trying #3 above doesn't help, I'd recommend starting with a basic namelist, using your domain, dates, input data, and see if that works. If it does, then try to add in or change a few new options at a time. Using this method, you can hopefully track down what about your setup is causing the problem.

Kelly
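
Suggestion 3 amounts to removing the explicit nproc_x and nproc_y entries so that WRF picks the decomposition itself. A minimal sketch of that edit follows, assuming each setting sits on its own line in namelist.input; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print a copy of a namelist with any explicit nproc_x / nproc_y
# lines removed, so WRF falls back to its automatic decomposition.
# Assumes each setting is on its own line; strip_decomposition is a
# hypothetical helper, not a WRF tool.

strip_decomposition() {
    src="$1"   # e.g. namelist.input
    # Drop lines of the form "  nproc_x = 1," (any spacing, either case).
    grep -v -i -E '^[[:space:]]*nproc_[xy][[:space:]]*=' "$src"
}

# Usage (commented out so the sketch has no side effects):
#   cp namelist.input namelist.input.orig
#   strip_decomposition namelist.input.orig > namelist.input
```

Keeping the original file as namelist.input.orig makes it easy to go back to the 1x6 layout if the change makes no difference.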
 
Hello,
Here are some answers:
1) When you ran this with GNU, were you using the exact same namelist.input file, input data, domain, dates, etc.? Did the run complete without problems?
Definitely yes. Yesterday I rebuilt the whole system with GNU, and wrf.exe finished normally: I have all the output files that I requested.

3) The first thing that strikes me is that you have set your decomposition to be 1x6 (with the nproc_x and nproc_y settings). I would recommend trying this without those settings, so that the natural decomposition will be 2x3.
OK, my next step will be to make this change in my namelist. But first I have to rebuild with Intel. I will keep version 15 because it is already installed on my computer, and if that works, I can try the newest compiler version.

Looking at your namelist.input file, it looks like you are using a lot of advanced options.
Yes. I was using the UEMS system from this institution: http://strc.comet.ucar.edu/index.htm. In that system, a script guides the user through many steps to get a consistent set of parameters, and the namelist.input is then generated automatically by the script. I took the namelist generated by UEMS and used it with the "standard" WRF system. I had to make some adjustments, especially to namelist.wps, because the syntax is sometimes different, but that was not difficult. Without the help of UEMS, I wouldn't have been able to write such a complicated namelist.input by myself.

I will let you know how it goes!

Regards,

Alain
 
Hello,
I have finally installed the latest version of the Intel compiler, which is ifort version 19.0.3.199.
To minimize the risk of inconsistent parameters in namelist.input, I have rewritten the file in a simpler form. I started from the example namelist that is provided by default in the run directory after WRF is installed, and changed it to reflect my own domain configuration and the dates of my GFS input files.

The crash is still there. But with the new version of the Intel compiler, I get some error messages in the rsl.error.0000 file. I have attached the files from the last run. WRF writes a first output file and then crashes.
I hope this will help. Thanks a lot for any idea or suggestion that could solve the problem,

Alain
 

Attachments

  • namelist.input.txt
    2.5 KB
  • rsl.error.0000.txt
    3 KB
  • configure.wrf.txt
    20.4 KB
  • namelist.wps.txt
    1.6 KB
Hello,
Following up on my previous post, I have rebuilt both WRF and WPS with ./configure -d.
The input parameter files are attached.
It still crashes, but at least it now says where. I hope this will help to solve the problem.

Thank you for your help,

Alain
 

Attachments

  • namelist.wps.txt
    1.5 KB
  • namelist.input.txt
    2.5 KB
  • rsl.error.0000.txt
    3 KB
Thank you for sending those, and for attempting the compile with no optimization. Can you package together all of your rsl* files and send those, as well? Can you also send your met_em* files for the first 2 time periods so that I can try this out?

Thanks,
Kelly
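
One way to bundle the requested files is sketched below. The archive names match the attachments posted in reply; the function name, the domain id "d01" and the run-directory argument are assumptions, and the met_em files are assumed to carry the date in their names so that a plain sort is chronological.

```shell
#!/bin/sh
# Sketch: bundle the logs and the first two met_em time periods for upload.
# package_for_support is a hypothetical helper; "d01" is an assumed domain id.

package_for_support() {
    dir="${1:-.}"
    (
        cd "$dir" || exit 1
        # All per-rank logs: rsl.out.* and rsl.error.*
        tar -czf Allrsl.tar.gz rsl.* || exit 1
        # Sort by name (which embeds the date) and keep the first two files.
        printf '%s\n' met_em.d01.* | sort | head -n 2 | tar -czf met.tar.gz -T -
    )
}
```

Running `package_for_support .` in the run directory leaves Allrsl.tar.gz and met.tar.gz next to the original files.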
 
Hello,
Here are the requested files.

Regards,
Alain
 

Attachments

  • Allrsl.tar.gz
    1.7 KB
  • met.tar.gz
    60.1 MB
Hi,
Thanks for sending those. I was just able to run a test case using your namelist.input file, your met_em* files, WRFV4.0.3, with Intel V17.0.1. I did this with 6 processors, as you've done. I was able to run 6 hours to completion without any problems. This indicates that the problem is likely something going on with your system or environment. Unfortunately you'll need to talk to the systems administrator at your institution to see if they have any advice to solve the problem you're having with Intel. If you do figure it out, and you'd like to share the solution here, it may help a future user who is having a similar issue. Good luck!
 
OK,
Thank you for your answer.
Just a question.
I am using a VMware virtual machine. Have you ever heard of problems with such a configuration?
Regards,
Alain
 
Alain,

VMware should not be a reason for WRF not to work. But I got an idea when you mentioned that.

You are using mpich, is that correct? How do you start wrf.exe? Something like "mpirun -n 6 ./wrf.exe"?

If that's correct, can you try, as a test, running WRF without mpich? Just "./wrf.exe", please?

Ivan
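
To compare the two ways of launching, one can check each run's log for WRF's completion message. This is a minimal sketch: the function name is hypothetical, and the log path is passed in because the file to inspect depends on the build (rsl.error.0000 is the usual choice for an MPI build).

```shell
#!/bin/sh
# Sketch: decide whether a run finished by looking for WRF's completion
# message in a log file. wrf_run_succeeded is a hypothetical helper.

wrf_run_succeeded() {
    log="$1"
    [ -f "$log" ] && grep -q 'SUCCESS COMPLETE WRF' "$log"
}

# Usage (commented out; needs a built wrf.exe in the current directory):
#   mpirun -n 6 ./wrf.exe
#   wrf_run_succeeded rsl.error.0000 && echo "MPI run OK" || echo "MPI run failed"
#   ./wrf.exe
#   wrf_run_succeeded rsl.error.0000 && echo "single-process run OK" || echo "single-process run failed"
```

If both launches fail in the same way, MPI itself is unlikely to be the cause.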
 
Hello Ivan,
Nice to hear from you.
Yes, I use mpirun in the way you described.
I also tried with only ./wrf.exe, as you suggested, and the result is the same: it stops after the first time step.
 
Alain,

OK, then MPI is no longer a suspect, and I don't have further ideas. You might want to try a completely clean compilation environment: for example, build everything from scratch in a new system user's directory tree, first all the compile-time dependencies and then WRF.

Ivan
 
Hello everyone,
So it seems that I have solved my problem. ;)
I found the solution here:
http://wiki.seas.harvard.edu/geos-chem/index.php/Intel_Fortran_Compiler
https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/270043

I had to set the following in my .bash_profile:

ulimit -s unlimited
export OMP_STACKSIZE=500M

It was a stack size problem: it seems that binaries built with the Intel compiler need a lot of stack memory.

Thanks to all for your support,

Alain
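
For anyone who cannot set the limit to unlimited (for example, because the hard limit is lower), the same fix can be applied more defensively. This is a minimal sketch, assuming a shell whose ulimit builtin accepts -s, -S and -H, as bash and dash do; the function name is hypothetical. Note that OMP_STACKSIZE only affects OpenMP threads, so for an MPI-only build the ulimit setting is probably the one that matters.

```shell
#!/bin/sh
# Sketch: raise the stack limit for the current shell before launching WRF.
# raise_stack_limit is a hypothetical helper, not part of WRF.

raise_stack_limit() {
    # Try "unlimited" for the soft limit; if the hard limit forbids that,
    # settle for the hard limit itself.
    ulimit -S -s unlimited 2>/dev/null ||
        ulimit -S -s "$(ulimit -H -s)" 2>/dev/null ||
        echo "warning: could not raise the stack limit" >&2
    ulimit -S -s   # report the resulting soft limit (KB, or "unlimited")
}

raise_stack_limit
# Per-thread stack for OpenMP worker threads; 500M is the value from the post.
export OMP_STACKSIZE=500M

# Then launch as before, e.g.:  mpirun -n 6 ./wrf.exe
```

The limit applies only to processes started from the shell where it was set, which is why putting it in .bash_profile works for interactive runs.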
 