(RESOLVED) forrtl: severe (66): output statement overflows record, unit -5, file Internal List-Directed Write

kellynunez · Oct 16, 2020

Hi there,

I am running WRF on Cheyenne with a test run of just 2 days (my actual simulation will be 20 days). I'm getting a "forrtl: severe (66): output statement overflows record, unit -5, file Internal List-Directed Write" error. I have attached my namelist.input and a screenshot of the error. Any help will be much appreciated!

Kelly

kellynunez · Oct 16, 2020

Also, I forgot to mention that I'm using the pre-compiled WRF version 3.9.1 on Cheyenne.

kwerner · Oct 20, 2020

Hi,
Can you point me to your running directory on Cheyenne so that I can take a look at the output files? Thanks!

kellynunez · Oct 21, 2020

Hi!

It is /glade/scratch/knocasio/WRF_research/wrf/run_test1/

Thanks!

kwerner · Oct 22, 2020

Thanks for sending that. I think the problem is almost certainly the number of processors you are using. To run your very large domain 3, you will need many more processors; however, the number required for d03 is going to be too many for d01. Take a look at this FAQ that explains how to choose the correct number of processors, and pay close attention to the final paragraph, which discusses this problem.

kellynunez · Oct 22, 2020

Thank you for this information. Based on this, I have tried several node counts but haven't been successful (given the 36 processors per node on Cheyenne):
- 9 nodes for a total of 324 processors
- 36 nodes for a total of 1296 processors
- 81 nodes for a total of 2916 processors (still waiting for this one to run)

Note that, given your equations, the most processors I should use is 3,157 and the fewest is 6.

I think before attempting ndown, I might just go back and reduce my domains (especially my domain 3). In that case would I just need to re-run geogrid.exe, metgrid.exe and real.exe? Or would I also have to run ungrib.exe?

Thanks again,
Kelly

kwerner · Oct 23, 2020

Hi Kelly,
As a test, you could try running a quick test with only 2 domains to make sure that it's the contrast in sizes that is causing the problem. But yes, those are the only executables you need to re-run. It's not necessary to run ungrib again, as that is independent from the domain settings.

kellynunez · Oct 23, 2020

I did the test runs for 2 and 1 domain and still, I'm getting the same error. I wonder if it could be related to something else or if I should just shrink all 3 domains then.

Kelly

kwerner · Oct 23, 2020

Kelly,
I took a look at your runs. For these 1- and 2-domain runs, you are using too many processors. Remember that the maximum should be based on your smallest domain. So if you're running only d01 (388x144) and d02 (997x340), then the most processors you should be using is about (388/25) x (144/25), which is about 89, and you are using 324. Try running with something close to 90 and see if that works better for you. You could use that same number for running both d01 and d02 together.
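As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `max_processors` and the dictionary layout are only illustrative; the 25-grid-points-per-processor divisor and the domain sizes are the ones quoted above.

```python
# Rule of thumb from the post above: the upper limit on processors is
# (e_we / 25) * (e_sn / 25), evaluated for the smallest domain.
# max_processors is an illustrative helper, not part of WRF.

def max_processors(e_we, e_sn, points_per_processor=25):
    """Approximate upper bound on processors for one domain."""
    return (e_we / points_per_processor) * (e_sn / points_per_processor)

# Domain dimensions quoted in this thread.
domains = {"d01": (388, 144), "d02": (997, 340)}

limits = {name: max_processors(*dims) for name, dims in domains.items()}

# The domain with the lowest limit sets the cap for the whole run.
smallest = min(limits, key=limits.get)

for name, limit in limits.items():
    print(f"{name}: about {int(limit)} processors at most")
print(f"Cap for the run is set by {smallest}: about {int(limits[smallest])}")
```

With these sizes, d01 gives the lower limit, so it sets the cap when d01 and d02 run together.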

kellynunez · Oct 25, 2020

Hi there,

I tried multiple combinations with fewer processors. I tried 3 nodes using 36 processors in each (108 total processors), 2 nodes also using 36 processors in each (72 total processors) and 3 nodes using 30 processors in each (90 total processors), but still, I get the same error.

I think the issue then might be something regarding the landsea mask. I am wondering if it has to do with the input data. I linked the glade rda data to the folder: /glade/scratch/knocasio/ERA5_Aug_Sept_2006. In this folder I also linked the e5.oper.invariant.128_172_lsm.ll025sc.1979010100_1979010100.grb file. I thought that the ungrib would be smart enough to use this file for all the input files for the model. I don't want to re-configure my domains just yet since I have not been able to run it even with just 1 domain and 90 processors.

I thank you in advance for any additional suggestions you may have.

kwerner · Oct 28, 2020

Hi Kelly,
If you are using time-invariant fields, you will need to process your input files a different way. You will need to run ungrib twice - once for the invariant data, and then again for all the other fields. You can use the same Vtable each time, but you need to use a different prefix for the invariant fields. You then need to add a line to the &metgrid section of namelist.wps, called "constants_name." Take a look at the description in Chapter 3 of the WRF Users Guide. The first part discusses the "constants_name" line, and the second tells you how to run ungrib for multiple data types. You can follow the basic logic of those for what you're trying to do. Since your run is so long, I would recommend just going through this for just a few time frames to see if it can run all the way through WRF. You don't need to re-run geogrid.

kellynunez · Nov 4, 2020

Thank you for all your help.

I have re-run following your latest instructions, using the same dimensions (d01 (388x144) and d02 (997x340)) and running wrf.exe with 90 processors and just 2 domains. Unfortunately, I am getting another error, although it is similar to the last one. I tried reducing my time step from 216 to 60 but I am still getting this error (attached). Perhaps it is time to redefine my domains?
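For context on the time step values, a minimal sketch of the commonly cited WRF starting point of about 6 x dx seconds, with dx in km. The 36 km grid spacing below is an assumption, chosen only because it reproduces the 216 s value; the thread does not state dx. The helper name `suggested_time_step` is hypothetical.

```python
# Common rule of thumb: time_step (seconds) is at most about 6 * dx (km).
# suggested_time_step is a hypothetical helper, not part of WRF.

def suggested_time_step(dx_km, factor=6):
    """Return the rule-of-thumb maximum time step in seconds."""
    return factor * dx_km

dx_km = 36  # ASSUMPTION: d01 grid spacing in km, not stated in the thread
print(f"Rule-of-thumb maximum time step: {suggested_time_step(dx_km)} s")
```

A value well below this limit, such as 60, is a more conservative choice when a run is unstable.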

kellynunez · Nov 4, 2020

I have attempted one more thing. I re-ran real.exe and wrf.exe using surface_input_source (in the physics options of namelist.input) set to 1 instead of 2. Now I have a completely new error, but the run seems to have gotten further than it did with option 2. I have attached a screenshot of an rsl.out and an rsl.error file. I think the issue is still with the surface inputs or the landsea mask; otherwise domain 1 should have run by now, based on our thread.

Ming Chen · Nov 6, 2020

I just wonder whether you are still using the precompiled code of WRFV3.9.1? If so, please recompile the code. We know that the precompiled code no longer works due to a software update.
Please recompile and rerun the case. Let us know if you still have problems.

kellynunez · Nov 7, 2020

I am using the precompiled code of WRF3.9.1. Do you mean that all of the precompiled codes do not work anymore or just the one for the 3.9.1 version?

Ming Chen · Nov 7, 2020

I guess all the precompiled codes no longer work. I didn't test all of them, but at least those I tested (V3.9.1, V4.0 and V4.1) need to be recompiled.

kellynunez · Nov 11, 2020

I have recompiled WRF. Once I did that I re-ran my case, but I got the same type of error (the last error I shared in the thread). In addition, I ran a test case following the WRF tutorial and it completed successfully. This leads me to conclude that my issue must be my input data and/or namelist.input. My next attempt is to try another input dataset I have used in the past (ERA-Interim), but not the RDA ERA-Interim.

It has been a long month of going at this... I thank you for your guidance, hopefully we can find the 'bug' soon.

Ming Chen · Nov 12, 2020

Please try and let me know if you still have problems. By the way, I believe ERA-I downloaded from RDA should work perfectly. You may also try GFS.

kellynunez · Nov 15, 2020

Thank you all for your help. I was able to run WRF for my case. I realized that after re-compiling I was getting some CFL errors. I changed my time step and now it's running. Thanks again!

Ming Chen · Nov 19, 2020

Thanks for the update. We appreciate you confirming that the precompiled code no longer works.
