(SOLVED) Segmentation fault after CAM-CLWRF co2vmr


Post by Klemens » Thu Sep 10, 2020 3:16 pm

Dear all,

I am trying to run a simulation (WRF 3.8.1) with input from ECHAM6. I have run many simulations for other days with a very similar configuration, but this error appears for some runs/days:

d04 2004-09-21_12:04:00 CAM-CLWRF co2vmr: 3.790000046137720E-004 n2ovmr: 3.190000086306100E-007 ch4vmr: 1.773999997567444E-006
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 00000000042172ED for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B728DECE630 Unknown Unknown Unknown
wrf.exe 0000000002347DDE Unknown Unknown Unknown
wrf.exe 000000000231D288 Unknown Unknown Unknown
wrf.exe 000000000230EA68 Unknown Unknown Unknown
wrf.exe 000000000230A4B7 Unknown Unknown Unknown
wrf.exe 000000000194C20B Unknown Unknown Unknown
wrf.exe 0000000001B17104 Unknown Unknown Unknown
wrf.exe 0000000001369783 Unknown Unknown Unknown
wrf.exe 0000000001180BEC Unknown Unknown Unknown
wrf.exe 00000000005358D3 Unknown Unknown Unknown
wrf.exe 0000000000535EBC Unknown Unknown Unknown
wrf.exe 0000000000535EBC Unknown Unknown Unknown
wrf.exe 0000000000535EBC Unknown Unknown Unknown
wrf.exe 000000000040E041 Unknown Unknown Unknown
wrf.exe 000000000040DFFF Unknown Unknown Unknown
wrf.exe 000000000040DF9E Unknown Unknown Unknown
libc-2.17.so 00002B728E3FF555 __libc_start_main Unknown Unknown
wrf.exe 000000000040DE92 Unknown Unknown Unknown


I cannot find any CFL violations or other errors in the rsl files. However, I do find
DEBUG: top of integrate(), clock time step = 0000000000_000:000:000
d04 2004-09-21_12:04:00 DEBUG wrf_timeinttoa(): returning with str = [0000000000_000:000:000]
even though I am not using adaptive time steps.
I have already set w_damping=1 and epssm=0.3.
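
For anyone repeating the check for CFL messages across many rsl files, here is a minimal sketch. It assumes the rsl.* files sit directly in the run directory and that CFL violations show up as lines containing the text "cfl"; the function name count_cfl_lines is made up for this example.

```python
import glob
import os
import re

# Assumption: CFL violations appear in rsl files as lines containing "cfl".
CFL_PATTERN = re.compile(r"cfl", re.IGNORECASE)


def count_cfl_lines(run_dir="."):
    """Return {path: number of lines mentioning 'cfl'} for rsl files with hits."""
    hits = {}
    for path in sorted(glob.glob(os.path.join(run_dir, "rsl.*"))):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with open(path, errors="replace") as fh:
            n = sum(1 for line in fh if CFL_PATTERN.search(line))
        if n:
            hits[path] = n
    return hits
```

Calling count_cfl_lines(".") in the run directory returns an empty dict when no rsl file mentions a CFL problem.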

namelist.input is attached.

What else can I try?
Many thanks in advance!

Klemens


Post by kwerner » Fri Sep 11, 2020 3:25 am

Hi,
First, remove the debug_level option from your namelist (or set it to 0). This option was removed from the default namelist a few years ago because it is rarely useful for finding anything significant, it adds a lot of extra junk to your rsl files, and it can sometimes make the files so large that they themselves cause the segmentation fault. I assume that you increased that number after you started getting the error, so it is probably not the root of the problem for you.

However, I would like to see all of your rsl.error.* files, and it will be nearly impossible for me to look at them if they have all those extra lines. So after you remove that option from the namelist, rerun the failing simulation, package your rsl.error.* files into a single *.tar file (not a .rar), and send that to me so that I can take a look. Thanks!
NCAR/MMM
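
The packaging step can be done in a few lines. This is a minimal sketch, assuming the rsl.error.* files are in the run directory; the function name pack_rsl_errors and the default archive name are placeholders chosen for this example.

```python
import glob
import os
import tarfile


def pack_rsl_errors(run_dir=".", archive="rsl_error.tar"):
    """Bundle every rsl.error.* file in run_dir into one uncompressed .tar.

    Returns the number of files added.
    """
    files = sorted(glob.glob(os.path.join(run_dir, "rsl.error.*")))
    # Mode "w" writes a plain tar archive without compression.
    with tarfile.open(archive, "w") as tar:
        for path in files:
            # Store only the file name, not the full directory path.
            tar.add(path, arcname=os.path.basename(path))
    return len(files)
```

Running pack_rsl_errors(".") after the rerun leaves a single rsl_error.tar in the current directory.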


Post by Klemens » Fri Sep 11, 2020 12:51 pm

Dear kwerner,

Many thanks for your help.
I have uploaded the rsl.error files to the cloud. The filename is KB_rsl.error.tar.

Best regards,
Klemens


Post by kwerner » Mon Sep 14, 2020 9:37 pm

Hi,
Thanks for sending those. I went back and looked at your namelist a bit more closely. I think the problem is almost certainly the size of your domains, for two reasons: 1) d01 and d02 are far too small; 2) the difference in size between d01/d02 and d04 is likely problematic. I'm not sure why this setup does work sometimes, but it does not comply with our recommendations. Take a look at these for some explanation:

FAQ regarding the number of processors to choose, vs. size of domains:
http://forum.mmm.ucar.edu/phpBB3/viewto ... =73&t=5082

Best Practices to use when setting up your namelist/domains:
http://www2.mmm.ucar.edu/wrf/users/name ... c_wps.html
NCAR/MMM
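
The domain-size concern can be checked mechanically before a run. This is a minimal sketch, assuming a rule of thumb that every domain should have at least 100 grid points in each direction. That threshold and the function name small_domains are assumptions made for illustration and are not taken from this thread, so check the linked pages for the actual recommendations.

```python
def small_domains(e_we, e_sn, min_points=100):
    """Return 1-based ids of domains below min_points in either direction.

    e_we and e_sn are the per-domain grid dimensions from namelist.input.
    min_points=100 is an assumed rule of thumb, not an official limit.
    """
    return [
        i + 1
        for i, (nx, ny) in enumerate(zip(e_we, e_sn))
        if nx < min_points or ny < min_points
    ]
```

With the e_we and e_sn lists copied from namelist.input, the function returns the ids of the domains worth enlarging, or an empty list if all of them meet the chosen threshold.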


Post by Klemens » Wed Oct 07, 2020 12:20 pm

Dear kwerner,

Sorry for my late reply, but I wanted to be sure that all my simulations finished successfully.

When I posted the first message, I had run around 400 individual simulations (36 hours), more than 50 of which did not finish successfully.
Most of them stopped at quite different points during the run, with different error messages (e.g. segmentation fault) and sometimes with no error message at all.

All of them finished successfully once I enlarged the outer domains of the nesting, as you advised.
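
With several hundred runs, one way to spot the ones that stopped early, including those that ended without any error message, is to look for WRF's completion message in the log. This is a minimal sketch, assuming one directory per run and that a finished run writes "SUCCESS COMPLETE WRF" to rsl.error.0000; the function names are made up for this example.

```python
import os

# Assumption: a run that completed writes this text to its rsl.error.0000.
SUCCESS_MARKER = "SUCCESS COMPLETE WRF"


def finished_ok(run_dir, log_name="rsl.error.0000"):
    """Return True if the run's log contains the completion message."""
    path = os.path.join(run_dir, log_name)
    if not os.path.isfile(path):
        return False  # a missing log is treated as an unfinished run
    with open(path, errors="replace") as fh:
        return any(SUCCESS_MARKER in line for line in fh)


def unfinished_runs(run_dirs):
    """Return the run directories whose log lacks the completion message."""
    return [d for d in run_dirs if not finished_ok(d)]
```

Passing the list of run directories to unfinished_runs gives the subset that needs a closer look.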

(I had been trying to get rid of the problem with help from the internet by changing time steps, damping and sometimes radiation schemes. It was not clear to me that the domain sizes could produce all these different kinds of errors; I only knew that they affect the quality of the simulated results.)

Many thanks again for your help!

Klemens


Post by kwerner » Wed Oct 07, 2020 5:33 pm

Klemens,
That is great news! Thank you so much for updating the post.
NCAR/MMM
