
(SOLVED) Segmentation fault after CAM-CLWRF co2vmr

Posted: Thu Sep 10, 2020 3:16 pm
by Klemens
Dear all,

I am trying to run a simulation (WRF 3.8.1) with input from ECHAM6. I have run many simulations for other days with a very similar configuration, but this error appears for some runs/days:

d04 2004-09-21_12:04:00 CAM-CLWRF co2vmr: 3.790000046137720E-004 n2ovmr: 3.190000086306100E-007 ch4vmr: 1.773999997567444E-006
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 00000000042172ED for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B728DECE630 Unknown Unknown Unknown
wrf.exe 0000000002347DDE Unknown Unknown Unknown
wrf.exe 000000000231D288 Unknown Unknown Unknown
wrf.exe 000000000230EA68 Unknown Unknown Unknown
wrf.exe 000000000230A4B7 Unknown Unknown Unknown
wrf.exe 000000000194C20B Unknown Unknown Unknown
wrf.exe 0000000001B17104 Unknown Unknown Unknown
wrf.exe 0000000001369783 Unknown Unknown Unknown
wrf.exe 0000000001180BEC Unknown Unknown Unknown
wrf.exe 00000000005358D3 Unknown Unknown Unknown
wrf.exe 0000000000535EBC Unknown Unknown Unknown
wrf.exe 0000000000535EBC Unknown Unknown Unknown
wrf.exe 0000000000535EBC Unknown Unknown Unknown
wrf.exe 000000000040E041 Unknown Unknown Unknown
wrf.exe 000000000040DFFF Unknown Unknown Unknown
wrf.exe 000000000040DF9E Unknown Unknown Unknown
libc-2.17.so 00002B728E3FF555 __libc_start_main Unknown Unknown
wrf.exe 000000000040DE92 Unknown Unknown Unknown


I do not find any CFL violations or errors in the rsl files. But I do find
DEBUG: top of integrate(), clock time step = 0000000000_000:000:000
d04 2004-09-21_12:04:00 DEBUG wrf_timeinttoa(): returning with str = [0000000000_000:000:000]
although I do not use adaptive time steps.
I have already set w_damping=1 and epssm=0.3.
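
The check for CFL messages described above can be scripted. This is only a minimal sketch, assuming the rsl.error.* and rsl.out.* files sit in the run directory and that CFL problems show up on lines containing the text "cfl"; the function name `find_cfl_lines` is made up for illustration.

```python
from pathlib import Path


def find_cfl_lines(run_dir="."):
    """Return (file name, line number, text) for every rsl line mentioning 'cfl'."""
    hits = []
    for path in sorted(Path(run_dir).glob("rsl.*")):
        # errors="replace" keeps the scan going if a log contains odd bytes
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if "cfl" in line.lower():
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    for name, number, text in find_cfl_lines():
        print(f"{name}:{number}: {text}")
```

An empty result only means that no line matched; it does not rule out other causes of the crash.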

namelist.input is attached.

What else can I do?
Many thanks in advance!

Klemens


Posted: Fri Sep 11, 2020 3:25 am
by kwerner
Hi,
First, remove the debug_level option from your namelist (or set it to 0). This option was removed from the default namelist a few years ago because it rarely helps in finding anything significant, it adds a lot of extra junk to your rsl files, and it can sometimes make the files so large that they themselves cause the segmentation fault. I assume that you increased that number after you started getting the error, so it is probably not the root of the problem for you. However, I would like to see all of your rsl.error.* files, and it will be nearly impossible for me to look at them if they have all those extra lines. So after you remove that from the namelist, rerun the failing simulation, package your rsl.error.* files into a single *.TAR (not a .rar) file, and send that to me so that I can take a look. Thanks!
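
In case it helps, the packaging step could look roughly like this. It is a minimal sketch, assuming the rsl.error.* files are in the run directory; the function name `pack_rsl_errors` and the default archive name are made up for illustration.

```python
import tarfile
from pathlib import Path


def pack_rsl_errors(run_dir=".", archive="rsl_errors.tar"):
    """Bundle every rsl.error.* file in run_dir into one uncompressed tar."""
    files = sorted(Path(run_dir).glob("rsl.error.*"))
    # Mode "w" writes a plain .tar archive without compression
    with tarfile.open(archive, "w") as tar:
        for path in files:
            # arcname stores only the file name, not the full directory path
            tar.add(str(path), arcname=path.name)
    return [path.name for path in files]
```

The function returns the names it packed, so an empty list means no rsl.error.* files were found in that directory.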


Posted: Fri Sep 11, 2020 12:51 pm
by Klemens
Dear kwerner,

many thanks for your help.
I uploaded the rsl.error files to the cloud. Filename is KB_rsl.error.tar

Best regards,
Klemens


Posted: Mon Sep 14, 2020 9:37 pm
by kwerner
Hi,
Thanks for sending those. I went back and looked at your namelist more closely. I think the problem is most likely the size of your domains, for two reasons: 1) d01 and d02 are way too small; 2) the difference in size between d01/d02 and d04 is likely problematic. I'm not sure why this does work sometimes, but it doesn't follow our recommendations. Take a look at these for some explanation:

FAQ regarding the number of processors to choose, vs. size of domains:
http://forum.mmm.ucar.edu/phpBB3/viewto ... =73&t=5082

Best Practices to use when setting up your namelist/domains:
http://www2.mmm.ucar.edu/wrf/users/name ... c_wps.html
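
A rough sketch of the kind of size check those pages describe is below. The numbers in it are assumptions, not quoted from this thread: a commonly cited rule of thumb of at least about 100 x 100 grid points per domain, and a processor range of (e_we/100)*(e_sn/100) up to (e_we/25)*(e_sn/25). Please verify both against the linked pages. The function name `check_domain` and the example sizes are hypothetical.

```python
def check_domain(e_we, e_sn, min_points=100):
    """Apply assumed rule-of-thumb checks to one domain's grid dimensions."""
    return {
        # Assumed guideline: flag domains with fewer than min_points per side
        "too_small": e_we < min_points or e_sn < min_points,
        # Assumed processor range; integer division, never below one processor
        "min_procs": max(1, (e_we // 100) * (e_sn // 100)),
        "max_procs": max(1, (e_we // 25) * (e_sn // 25)),
    }


if __name__ == "__main__":
    # Hypothetical sizes; the real values are in the attached namelist.input
    for name, (e_we, e_sn) in {"d01": (61, 61), "d04": (301, 301)}.items():
        print(name, check_domain(e_we, e_sn))
```

With nested runs every domain uses the same processor count, so a count that suits a large inner domain may already exceed the upper bound for a small outer one, which is one way a size mismatch between domains can cause trouble.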


Posted: Wed Oct 07, 2020 12:20 pm
by Klemens
Dear kwerner,

sorry for my late reply/feedback, but I wanted to be sure that all my simulations finished successfully:

When I posted the first message, I had run around 400 individual simulations (36 hours), and more than 50 of them had not finished successfully.
Most of them stopped at quite different simulation steps during runtime, with different error messages (e.g. segmentation fault) and sometimes with no error message at all.

All of them finished successfully after I enlarged the outer domains for the nesting (your advice).

(I had been trying to get rid of the problem with help from the internet, by changing time steps, damping and sometimes radiation schemes. It was not clear to me that the domain sizes could produce all these various kinds of errors; I had only expected them to affect the quality of the simulated results.)

Many thanks again for your help!

Klemens


Posted: Wed Oct 07, 2020 5:33 pm
by kwerner
Klemens,
That is great news! Thank you so much for updating the post.