
Another forrtl: error (78): process killed (SIGTERM)

Afernandez

I would greatly appreciate it if you could point me in the right direction to fix errors I get while running WRF 4.5.2. I’m running a 6-hour test simulation using MPI-ESM1-2-HR as input data, which I converted to WPS-ready intermediate files using this application (GitHub - lzhenn/cmip6-to-wrfinterm: tools to process cmip6 data to drive wrf). I have 3 nested domains at 30 km, 10 km, and 2 km. real.exe runs fine, but wrf.exe stops just after creating the first wrfout file for d03. When I check the rsl files, I keep finding the error “forrtl: error (78): process killed (SIGTERM)”. After reviewing the forum, I have tried the following:
  • Changing the number of nodes/cores
  • Changing time steps: I have tried from 180 to 30s for d01
  • Changing domain sizes: I initially started with a 300x300 outermost domain, then 2 domains of 151x151 (d03 within d02). Now my intermediate domain is 325x325 and the innermost is 351x351.
  • Setting w_damp to 0 and to 1
  • Setting smooth_cg_topo=.true., but the MPI-ESM data do not have the SOILHGT variable, so real.exe won’t run
  • Recompiling WRF with ./configure -D and testing with options 66 and 67
I attach my namelist.input, rsl files (including one from real.exe), and the srun script from my last attempt in case you can see something I haven’t. A few days ago I ran WRF successfully using the same data in a different region, although in that case I used only one domain.

Thank you very much in advance
 

Attachments

  • namelist.input (6.8 KB)
  • rsl.error.0000 (16.8 KB)
  • rsl.out.0000 (15.1 KB)
  • REAL_EXE_rsl.error.0000 (5.6 KB)

There are several things that could be contributing to your issue.

1. Are you getting any cfl errors in any of your rsl* files? You can check this with the following command:
Code:
grep cfl rsl*
If anything prints out, then the answer is 'yes.' Otherwise, there is no need to reduce your time_step or use smooth_cg_topo.
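
If the plain grep output is long, a small wrapper can show which rsl files are affected and how many matching lines each one has. This is only a sketch: cfl_summary is a made-up helper name, and it assumes the files are named rsl.* as in a standard WRF run directory.

```shell
# Hypothetical helper: list only the rsl files that contain "cfl",
# together with the number of matching lines in each file.
cfl_summary() {
    dir="${1:-.}"   # directory holding the rsl.* files (default: current directory)
    # /dev/null is passed as an extra file so grep always prints "file:count",
    # even when only one rsl file exists; files with zero matches are dropped.
    grep -c cfl "$dir"/rsl.* /dev/null 2>/dev/null | grep -v ':0$'
}
```

Running cfl_summary in the run directory prints nothing when no file mentions cfl.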

2. 900 processors is too many for domains of this size. What other counts have you tried? See "Choosing an Appropriate Number of Processors" for guidance.
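
As a rough illustration of why 900 is too many: a rule of thumb often quoted for WRF is that each processor's tile should cover somewhere between about 25x25 and 100x100 grid points. Treat those two limits as an assumption here and check the linked guidance for the exact recommendation. The function names are made up, and the sizes are the two nest sizes quoted in the question (the d01 size for the last attempt isn't stated, so it should be included if it is the smallest or largest domain).

```shell
# Assumed rule of thumb (see the note above): tiles no smaller than
# about 25x25 and no larger than about 100x100 grid points.
max_procs() {   # $1 = e_we, $2 = e_sn of the SMALLEST domain
    echo $(( ($1 / 25) * ($2 / 25) ))
}
min_procs() {   # $1 = e_we, $2 = e_sn of the LARGEST domain
    echo $(( ($1 / 100) * ($2 / 100) ))
}

max_procs 325 325   # prints 169
min_procs 351 351   # prints 9
```

Under that assumption, a few dozen to well under 200 processors would be the range to experiment in.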

3. The SOILHGT variable is a mandatory input field for running WRF, so if your data lack it, that could be the issue. When the model stops almost immediately, it often means there is a problem with the input data.

4. You have only num_metgrid_soil_levels = 2, and you are using the Noah land surface scheme, which needs 4 soil levels. I believe the only scheme that accepts 2 levels is RUC.
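
One way to confirm points 3 and 4 is to look at the header of a met_em file from metgrid.exe. The sketch below reads `ncdump -h` output on standard input and reports whether the two names appear. check_met_em_header is a hypothetical helper, and it assumes the soil level count is stored in the header under the name NUM_METGRID_SOIL_LEVELS.

```shell
# Hypothetical helper: scan `ncdump -h` output (read from stdin) for the
# names discussed in points 3 and 4 and report each as present or missing.
check_met_em_header() {
    header=$(cat)
    for name in SOILHGT NUM_METGRID_SOIL_LEVELS; do
        if printf '%s\n' "$header" | grep -q "$name"; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}

# Usage (needs the netCDF utilities; replace <date> with your file's timestamp):
#   ncdump -h met_em.d01.<date>.nc | check_met_em_header
```

If NUM_METGRID_SOIL_LEVELS is present, grepping the same header for that name shows the value that real.exe will compare against the namelist.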

5. Setting parent_grid_ratio to different values for the child domains sometimes causes problems. You have it set to 1, 3, 5. You could first try running just two domains to see whether the model is okay with that. If so, you could then try something like parent_grid_ratio = 1, 3, 3 or 1, 5, 5.
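
Keep in mind that changing the ratios also changes the nest grid spacing, and that nest dimensions have to fit the ratio (e_we - 1 and e_sn - 1 should be integer multiples of parent_grid_ratio). A minimal sketch of both checks, using the 30 km outer spacing and the 351-point innermost domain from the question; nest_dx and nest_size_ok are made-up helper names.

```shell
# Hypothetical helper: print the grid spacing (metres) of each domain,
# given the d01 spacing followed by one parent_grid_ratio per domain.
nest_dx() {
    dx="$1"; shift
    for ratio in "$@"; do
        dx=$(awk -v dx="$dx" -v r="$ratio" 'BEGIN { printf "%.2f", dx / r }')
        echo "$dx"
    done
}

# Hypothetical helper: succeed if a nest dimension fits its ratio,
# i.e. (dimension - 1) is an integer multiple of parent_grid_ratio.
nest_size_ok() {   # $1 = e_we or e_sn of the nest, $2 = its parent_grid_ratio
    [ $(( ($1 - 1) % $2 )) -eq 0 ]
}

nest_dx 30000 1 3 5   # 30000.00, 10000.00, 2000.00
nest_dx 30000 1 3 3   # 30000.00, 10000.00, 3333.33
nest_size_ok 351 3 || echo "351 does not fit a ratio of 3"
```

So with 1, 3, 3 the innermost domain would no longer be at 2 km, and its 351x351 size would need adjusting as well.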

I would highly recommend trying to run a very simple case first, since this data type isn't one that's commonly used. Just try a single domain, small, short simulation, using the default namelist. If that is successful, then you can move on to trying two domains, etc. You can slowly add more complicated options to your namelist, until you find the culprit that's causing the issue. Again, though, I'm not sure you can use these data if you're missing one of the required fields.
 