Not successful in running real.exe

satiz

New member
Greetings,
I am trying to reproduce a study (Recovery processes in a large offshore wind farm) that examines flow recovery in wind farms. The paper uses WRF 4.2.1 (the bug-fix version), but I am trying to reproduce the results with WRF 4.3.3 (with WPS 4.3.1). I have followed the procedure described in the paper and edited the code accordingly in WRF 4.3.3 + WPS 4.3.1. I have successfully compiled WPS, but I am stuck at running real.exe. Please help me resolve this issue.
Thank you.
Note: Enclosed are the namelist files and the rsl.error.out file.
 

Attachments

  • namelist.input.txt
    5.1 KB · Views: 3
  • namelist.wps.txt
    871 bytes · Views: 2
  • rsl error 0000.txt
    13.9 KB · Views: 4

kwerner

Administrator
Staff member
Hi,
The issue is most likely that your domains are pretty large and you are only using a single processor to run real.exe. The real program typically does not need a lot of processors (not nearly as many as wrf.exe), but since your domains are on the large side, I believe you will need more than 1. In case you're interested, when you are ready to run wrf.exe, take a look at this FAQ that describes the process for choosing a reasonable number of processors, based on your domain size.
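As a rough illustration of sizing a run from the domain dimensions, the sketch below uses a commonly cited WRF rule of thumb: at least (e_we/100) * (e_sn/100) processors and at most (e_we/25) * (e_sn/25). Whether this matches the FAQ referenced above exactly is an assumption, so check the FAQ itself before relying on it. The function name, the integer rounding, and the grid dimensions are all hypothetical; the real e_we and e_sn values are in namelist.input.

```python
def processor_range(e_we: int, e_sn: int) -> tuple:
    """Return (smallest, largest) suggested processor counts for one domain.

    Assumes the rule of thumb of roughly 100x100 grid points per processor
    at the low end and 25x25 grid points per processor at the high end.
    Integer division is a simplification chosen for this sketch.
    """
    smallest = max(1, (e_we // 100) * (e_sn // 100))
    largest = max(1, (e_we // 25) * (e_sn // 25))
    return smallest, largest


# Hypothetical grid dimensions, not taken from the attached namelist.
print(processor_range(500, 400))  # (20, 320)
```

With nested domains, the lower bound is usually taken from the largest domain and the upper bound from the smallest one, so the function would be called once per domain.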
 

satiz

New member
I have tried running real.exe with multiple processors in both interactive and batch modes on the HPC system, but both failed with errors I could not identify. I first tried interactive mode, which ended in an error. I then submitted a batch job (since the domain is large), and it also ended with an error. I have attached the relevant files.
File descriptions:
ErrorBatch1.png - Batch job showing a segmentation fault
ErrorInter1.png (and all other ErrorInter1***.png) - Interactive job submission
140722PBS_sh - Batch script for running the batch job
140722PBS_output - Output from the batch script
140722PBS_error - Error from the batch script
 

Attachments

  • ErrorBatch1.png
    16.2 KB · Views: 3
  • ErrorInter1.png
    249.8 KB · Views: 3
  • ErrorInter11.png
    177.7 KB · Views: 3
  • ErrorInter111.png
    180.2 KB · Views: 3
  • ErrorInter11111.png
    32.6 KB · Views: 3
  • ErrorInter1111.png
    238.7 KB · Views: 3
  • 140722PBS_sh.txt
    641 bytes · Views: 3
  • 140722PBS_output).txt
    43 bytes · Views: 2
  • 140722PBS_error.txt
    195 bytes · Views: 1
  • rsl-error-0000.txt
    13.7 KB · Views: 3

kwerner

Administrator
Staff member
Hi,
I am still a bit confused about how many processors you're using. The 140722PBS_sh.txt script shows that you're asking for, perhaps, 49 processors in this line:
Code:
#PBS -l select=7:ncpus=7
but then you are requesting 51 processors in the actual command line:
Code:
time -p mpirun -np 51 ./real.exe
But then in the rsl.error.0000 file you sent, again, it's only showing a single processor:
Code:
 Ntasks in X            1 , ntasks in Y            1

Using either 1 or 51 is probably not going to work. 1 is likely too small, and 51 factors only as 3x17 (or 1x51), meaning the decomposition will be badly lopsided. Can you run again, making sure that you are using, say, 16 processors (4x4)? If you are still getting errors, can you package all of the rsl.* files together in a *.tar file (not .rar - we can't open those) and attach it? Thanks!
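To see why some processor counts decompose better than others, here is a minimal sketch that splits a task count into the factor pair closest to a square. It assumes the model chooses its X-by-Y layout this way; which factor is assigned to X and which to Y in WRF itself is not modelled here. The function name is hypothetical.

```python
import math


def near_square_decomposition(ntasks: int) -> tuple:
    """Return (a, b) with a * b == ntasks, a <= b, and a as large as possible."""
    if ntasks < 1:
        raise ValueError("ntasks must be a positive integer")
    a = math.isqrt(ntasks)
    # Walk down from the square root until a divisor is found; 1 always divides.
    while ntasks % a != 0:
        a -= 1
    return a, ntasks // a


for n in (1, 16, 49, 51, 64):
    a, b = near_square_decomposition(n)
    print(f"{n:3d} tasks -> {a} x {b}")
# Prints 1 x 1, 4 x 4, 7 x 7, 3 x 17 and 8 x 8 for the five counts.
```

Counts such as 16, 49 and 64 give square layouts, while 51 gives long, thin patches.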
 

satiz

New member
I requested 49 processors (7x7) and used 49 processors to run real.exe. The 51 was a typo; sorry about that. I am now running it with 16 processors, as you suggested, and will upload the files once it is done.
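A mismatch between the cores requested from PBS and the task count passed to mpirun can be caught before submitting. The sketch below is a hypothetical helper, not part of WRF or PBS: it reads the two lines quoted earlier in this thread and compares them. It assumes the simple `select=N:ncpus=M` form and treats N * M as the core count, ignoring `mpiprocs` and any other resource options a site may require.

```python
import re
from typing import Optional


def requested_cores(script_text: str) -> Optional[int]:
    """Cores implied by a '#PBS -l select=N:ncpus=M' line, or None if absent."""
    m = re.search(r"^#PBS\s+-l\s+select=(\d+):ncpus=(\d+)", script_text, re.MULTILINE)
    return int(m.group(1)) * int(m.group(2)) if m else None


def launched_tasks(script_text: str) -> Optional[int]:
    """Task count passed to 'mpirun -np', or None if absent."""
    m = re.search(r"mpirun\s+-np\s+(\d+)", script_text)
    return int(m.group(1)) if m else None


# The two lines quoted from 140722PBS_sh.txt in this thread.
script = """#PBS -l select=7:ncpus=7
time -p mpirun -np 51 ./real.exe
"""

cores = requested_cores(script)
tasks = launched_tasks(script)
print(f"requested cores: {cores}, mpirun tasks: {tasks}")
if cores != tasks:
    print("Mismatch: the scheduler request and the mpirun task count differ.")
```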
 

satiz

New member
Hi,
This time I used 64 processors with a 40-hour walltime. It was still not successful. Please find the output, error, and batch job script in jobFiles.tar.gz and the rsl files in rsl.tar.gz. I checked the modification dates of the rsl files to see whether they are from the current run, but the dates shown are in June, whereas I submitted this run in the first week of August. I would greatly appreciate your help.
Thank you.
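Since the modification dates suggest the rsl files may predate this run, a quick check like the following sketch can list any rsl.* files older than the job's start time. The function name, the run directory, and the job start date are hypothetical placeholders; the real start time would come from the batch system's records.

```python
from datetime import datetime
from pathlib import Path


def stale_rsl_files(run_dir, job_start: datetime) -> list:
    """Return (name, modified) for rsl.* files last modified before job_start."""
    stale = []
    for path in sorted(Path(run_dir).glob("rsl.*")):
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if modified < job_start:
            stale.append((path.name, modified))
    return stale


# Hypothetical job start time and run directory (current directory).
job_start = datetime(2022, 8, 1)
for name, modified in stale_rsl_files(".", job_start):
    print(f"{name} was last written {modified:%Y-%m-%d %H:%M}, before the job started")
```

If every rsl file is reported as stale, the job may not have written any rsl output in that directory, which would point to a problem before real.exe even started there.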
 

Attachments

  • rsl.tar.gz
    2.7 KB · Views: 2
  • jobFiles.tar.gz
    6.7 KB · Views: 2

kwerner

Administrator
Staff member
Hi,
Unfortunately, the model still thinks you're only using a single processor. At the top of the new rsl.error.0000 file, it still shows:
Code:
 Ntasks in X            1 , ntasks in Y            1

When you compiled the code, did you compile it for a serial build, or a dmpar build? If you aren't sure, can you send your configure.wrf file so I can check it? Thanks.
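The task count the model actually sees can be read straight from the rsl log. This is a minimal sketch that parses the line format shown above; the function name is hypothetical, and it assumes the line appears in the log text passed in.

```python
import re
from typing import Optional, Tuple

# Matches the log line quoted in this thread, tolerating variable spacing.
NTASKS_RE = re.compile(r"Ntasks in X\s+(\d+)\s*,\s*ntasks in Y\s+(\d+)", re.IGNORECASE)


def tasks_seen_by_model(rsl_text: str) -> Optional[Tuple[int, int]]:
    """Return (ntasks_x, ntasks_y) from rsl log text, or None if the line is absent."""
    m = NTASKS_RE.search(rsl_text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


sample = " Ntasks in X            1 , ntasks in Y            1\n"
print(tasks_seen_by_model(sample))  # (1, 1)
```

For a real log, the text could be read with `Path("rsl.error.0000").read_text()`. The product of the two numbers should equal the task count given to mpirun; (1, 1) means the executable believes it is running on a single task.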
 

satiz

New member
I am not sure about that. However, some time ago I used WRF 4.3 and ran simulations on multiple processors. I followed the same compilation procedure here for WRF 4.3.3 (attached as WRF_RUN.pdf). Please find the latest configure.wrf file attached.
 

Attachments

  • configure_wrf.tar.gz
    5.6 KB · Views: 1
  • WRF_RUN.pdf
    26 KB · Views: 2

kwerner

Administrator
Staff member
Based on your configure.wrf file, you did compile for parallel processing (dmpar). Since you are trying to use multiple processors in your batch scripts, but the model is still only running on a single processor, I'd recommend reaching out to a systems administrator at your institution to see if they have any ideas about what's going on, or what you can modify in your running script to make sure you're getting multiple processors. If you figure it out, let us know - it could help someone else in the future!
 