wrf.exe process manager error

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

Catalyst26

Hello All,

I am having a problem with wrf.exe: it crashes. The response I got is attached.

I use the command "mpirun -np 8 ./wrf.exe". I have used the same command before without problems, even for the same domain, but it recently started crashing and I don't know what could have caused it. I tried reducing the number of processors with "mpirun -np 6 ./wrf.exe", but it still crashed.

I'd appreciate your guidance and help in solving this problem.

Thanks and kind regards,

Catalyst
 

Attachments

  • IMG_20201017_182609.jpg
    1.7 MB · Views: 641
Hi,
I have moved this post to its own topic since it was not related to the topic where you originally posted.

I believe this error is system-related, and not related to the WRF model. Do you have rsl.error* files after the run? If so, can you package those into a single *.TAR file and attach that, along with your namelist.input file so that I can make sure I don't see any specific WRF-related error? Thanks.
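One way to package the requested files is sketched below. The function name pack_wrf_logs and the archive name wrf_debug_files.tar are made up for illustration, and the sketch assumes the rsl.error.* files and namelist.input sit together in the directory where wrf.exe was run.

```shell
# Hypothetical helper: bundle rsl.error.* and namelist.input from a WRF run
# directory into a single tar archive written inside that directory.
pack_wrf_logs() {
    run_dir="${1:-.}"                       # directory where wrf.exe was run
    archive="${2:-wrf_debug_files.tar}"     # assumed archive name
    (
        cd "$run_dir" || exit 1
        # The glob expands to every rsl.error file (one per MPI task).
        tar -cf "$archive" rsl.error.* namelist.input
    )
}

# Usage, from the run directory:
#   pack_wrf_logs . wrf_debug_files.tar
#   tar -tf wrf_debug_files.tar    # list contents before attaching
```

Listing the archive with "tar -tf" before attaching it is a quick way to confirm that every rsl.error file and the namelist made it in.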
 
Dear kwerner,

Thanks for your response. Even with 4 processors, WRF still crashed. The attached archive contains the rsl.error files and namelist.input. I'd appreciate your help, as this is urgent. Also, I don't know where you moved the post to; kindly mention me there.

Thanks and kind regards
 

Attachments

  • wrf_files.tar
    14.5 MB · Views: 37
Hi,
I meant that I moved the post here, as its own topic. It isn't anywhere else.

Thanks for sending those files. I have a few thoughts about what may be going on:
1) You have "frames_per_outfile = 1000, 1000, 1000,". This means you are writing up to 1000 output times into each wrfout* file, so these files may be getting too large. NetCDF places a limit on file size. Check the size of those files; if they are reaching 2GB, you can try recompiling the code with large file support. You will need to clean the code (./clean -a), and then set the following before you reconfigure and recompile:
Code:
setenv WRFIO_NCD_LARGE_FILE_SUPPORT 1
(This is csh syntax. For bash, use "export WRFIO_NCD_LARGE_FILE_SUPPORT=1".)
This will allow your files to be as large as 4GB without any problems. However, it may be easier to simply reduce "frames_per_outfile" so that the files aren't so large.
2) Since you are running for a long time, you may be running out of disk space. Check whether there is enough space left on the disk where the output is being written (note that disk space is different from memory).
3) If neither of these helps the problem, try running only domain 01 to see if you still have problems. If not, try d01 and d02. This will help to narrow down which domain is causing the problem.
4) I see you are using V3.8.1, which is pretty old now. If none of the above helps, can you run this test with the latest version of the code to see whether a code problem has been fixed since that older version?
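Checks 1) and 2) can be scripted roughly as follows. This is only a sketch: the function name check_wrfout_sizes and the default threshold of 1900000000 bytes (a little under 2 GB) are assumptions chosen for illustration, not WRF or NetCDF settings.

```shell
# Hypothetical helper: print every wrfout* file in a directory whose size
# is at or above a threshold given in bytes.
check_wrfout_sizes() {
    dir="${1:-.}"
    limit_bytes="${2:-1900000000}"   # assumed threshold, a little under 2 GB
    for f in "$dir"/wrfout*; do
        [ -e "$f" ] || continue      # skip the literal pattern if no match
        size=$(( $(wc -c < "$f") ))  # file size in bytes
        if [ "$size" -ge "$limit_bytes" ]; then
            echo "LARGE: $f ($size bytes)"
        fi
    done
}

# Usage, from the directory where output is written:
#   check_wrfout_sizes .
#   df -h .    # free space on the filesystem holding this directory
```

If the function prints nothing and "df -h ." shows plenty of free space, file size and disk space are less likely to be the cause, and the domain-by-domain test in 3) is the next thing to try.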
Thanks!
 