WRF run terminated with MPI

tikargrg

Hi everyone,

I’m seeking assistance regarding an issue with my WRF simulation that was unexpectedly terminated.

I am running WRF with a nested domain and using a restart file. The simulation worked perfectly from the beginning of 2016 until May 16, 2017, at 21:00. However, it suddenly terminated, and I can’t seem to get it to proceed. I encountered the following error message: “mpirun noticed that process rank 38 with PID 1042446 on node c2014 exited on signal 11 (Segmentation fault).”

Could this issue be related to WRF itself, or might it stem from the machine? For your reference, I have attached my `namelist.input`, `rsl.error.0000`, and job submission error files.

Thank you for your help!

Best regards,
 

Attachments

  • namelist.input (4.6 KB)
  • rsl.error.0000 (865.4 KB)
  • wrf_job.error.txt (11.7 KB)
This is definitely a model issue. Something went wrong with your case. Can you upload your rsl.error.0038 and rsl.out.0038 for me to take a look?
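
For readers following along: WRF writes one rsl.error / rsl.out pair per MPI task, numbered by rank with zero padding, which is why rank 38 in the mpirun message points to the two files requested above. A minimal Python sketch of that mapping (the helper name `rsl_files_for_rank` is hypothetical, not part of WRF):

```python
def rsl_files_for_rank(rank: int) -> tuple[str, str]:
    """Return the (rsl.error, rsl.out) file names for a given MPI rank."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    suffix = f"{rank:04d}"  # four-digit zero padding, as in rsl.error.0000
    return f"rsl.error.{suffix}", f"rsl.out.{suffix}"


# Rank taken from the mpirun message quoted in the first post.
print(rsl_files_for_rank(38))  # ('rsl.error.0038', 'rsl.out.0038')
```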
 
Hi Ming,

Thank you for your response. As you suggested, I have attached the error files from my recent run: `rsl.error.0038`, `rsl.out.0038`, along with `rsl.error.0000`, `rsl.out.0000`, and the job submission error file. I appreciate your support!

Best regards,
 

Attachments

  • rsl.error.0000 (865.4 KB)
  • rsl.error.0038.txt (575.9 KB)
  • rsl.out.0000 (864 KB)
  • rsl.out.0038.txt (574.9 KB)
  • wrf_job.error.txt (11.7 KB)
Hi,

In your rsl files, I found many messages reporting CFL violations, e.g.,

Max W: 290 64 30 W: 21.10 w-cfl: 2.20 dETA: 0.04

This indicates that the model has become numerically unstable. I have a few concerns regarding your namelist.input:

(1) Please turn on w_damping (w_damping = 1).
(2) Increase the value of epssm (epssm = 0.6 or even larger).
(3) Is there any special reason you set etac = 0.04? Its default value is 0.2, which we expect to be a reasonable value.
(4) Reduce time_step to 4 x DX (DX in km, time_step in seconds), e.g., time_step = 48.
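
The arithmetic behind point (4) can be sketched as follows. This assumes DX is given in km and time_step in seconds, and it assumes a 12 km outer grid, which is inferred only from the example value 48 = 4 x 12 and not from the namelist itself. The function name is hypothetical.

```python
def suggested_time_step(dx_km: float, factor: float = 4.0) -> int:
    """Time step in seconds as factor x DX (DX in km), rounded down."""
    if dx_km <= 0 or factor <= 0:
        raise ValueError("dx_km and factor must be positive")
    return int(dx_km * factor)


dx_km = 12.0  # assumed grid spacing, see the note above
print(suggested_time_step(dx_km))       # conservative 4 x DX
print(suggested_time_step(dx_km, 6.0))  # the usual 6 x DX upper bound
```

The 6 x DX figure is the commonly quoted upper limit for time_step; dropping to 4 x DX trades run time for a larger stability margin.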

Please turn off spectral nudging and see whether this case can run.
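
To check whether the changes above actually reduce the CFL violations, the rsl files can be tallied with a short script. This is only a sketch: the regular expression assumes the exact line layout quoted earlier in this post (other WRF versions print these messages differently), and the function names are hypothetical.

```python
import re

# Pattern modelled on the line quoted above:
#   Max W: 290 64 30 W: 21.10 w-cfl: 2.20 dETA: 0.04
CFL_LINE = re.compile(
    r"Max\s+W:\s*(\d+)\s+(\d+)\s+(\d+)\s+"
    r"W:\s*(-?[\d.]+)\s+w-cfl:\s*([\d.]+)\s+dETA:\s*([\d.]+)",
    re.IGNORECASE,
)


def parse_cfl_lines(lines):
    """Collect one record per line that matches the assumed CFL format."""
    hits = []
    for line in lines:
        match = CFL_LINE.search(line)
        if match:
            hits.append({
                # The three grid indices exactly as printed; their order is not assumed.
                "indices": tuple(int(match.group(n)) for n in (1, 2, 3)),
                "w": float(match.group(4)),
                "w_cfl": float(match.group(5)),
                "deta": float(match.group(6)),
            })
    return hits


def worst_cfl(hits):
    """Return the record with the largest w-cfl value, or None if there are none."""
    return max(hits, key=lambda h: h["w_cfl"]) if hits else None


sample = ["Max W: 290 64 30 W: 21.10 w-cfl: 2.20 dETA: 0.04"]
print(worst_cfl(parse_cfl_lines(sample)))
```

For a real run, pass an open file instead of `sample`, e.g. `with open("rsl.error.0038") as f: hits = parse_cfl_lines(f)`, and compare `len(hits)` before and after changing the namelist.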

By the way, where is your domain located? What forcing data are you using for this case?
 
Hi Ming,
Thank you for your response. The model did indeed become numerically unstable, but it had been running for more than a year of simulated time before it terminated. Can you explain how that can happen?

I've started running the model with the settings you recommended. I set etac = 0.04 because of the high-altitude terrain: the domain covers the High-Mountain Asia region.

Best regards,
 