"starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Topics specifically related to the wrf.exe program
mcanonic
Posts: 11
Joined: Tue May 26, 2020 8:21 am

"starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by mcanonic » Tue May 26, 2020 9:04 am

Hi all,
I'm new to this field and I'm helping a colleague run this model in the cloud. I created an Ubuntu virtual machine (VM) and followed the steps described here to install and configure all the software.

On a previous VM everything worked fine. When we ran:
mpirun -np 4 wrf.exe
the output was:
starting wrf task 0 of 4
starting wrf task 1 of 4
starting wrf task 3 of 4
starting wrf task 2 of 4

but on the new VM the same command produces:
starting wrf task 0 of 1
starting wrf task 0 of 1
starting wrf task 0 of 1
starting wrf task 0 of 1

As I mentioned, I don't have much experience. I looked at the output files such as rsl.error.0000 but did not find any hints.

Could you suggest where I am going wrong?

The VM that I used has 16 VCPUs.

Thanks,
Massimo
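
The two outputs differ only in the number after "of", which is the MPI job size as each process sees it. Four processes that each report "0 of 1" were all started, but none of them knows about the others. As a minimal sketch, assuming the launcher output has been saved or piped in (the function names reported_world_sizes and check_world_size are hypothetical, not part of WRF or MPI), the symptom can be checked like this:

```shell
# Hypothetical helper: read launcher output on stdin and print each distinct
# job size (the number after "of") found in the "starting wrf task" lines.
reported_world_sizes() {
    awk '/starting wrf task/ { print $NF }' | sort -n -u
}

# Hypothetical helper: succeed only if every task line on stdin reports
# exactly the expected job size (the value passed to mpirun -np).
check_world_size() {
    expected=$1
    sizes=$(reported_world_sizes)
    [ "$sizes" = "$expected" ]
}
```

With the output of the new VM saved to a file (say launcher_output.txt, a made-up name), check_world_size 4 < launcher_output.txt would fail, while the output of the previous VM would pass.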

Ming Chen
Posts: 1101
Joined: Mon Apr 23, 2018 9:42 pm

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by Ming Chen » Tue May 26, 2020 6:21 pm

Massimo,
It seems that only one processor is activated to run your case. My questions are:
How did you compile WRF (in serial, dmpar or smpar mode)?
Did the case run to the end?
Are there any other error messages in your rsl files or log file?
WRF Help Desk

mcanonic
Posts: 11
Joined: Tue May 26, 2020 8:21 am

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by mcanonic » Tue May 26, 2020 9:29 pm

Hi Ming,
I followed the instructions available here:
https://www2.mmm.ucar.edu/wrf/OnLineTut ... .php#STEP2
and when I configured WRF, I selected option 34:
32. (serial) 33. (smpar) 34. (dmpar) 35. (dm+sm) GNU (gfortran/gcc)

I re-ran ./configure (selecting 34 and then 1),
and then ran ./compile em_real >& log.compile
but now for some reason I get this error:
---> Problems building executables, look for errors in the build log <---

I'm attaching the log.compile file; maybe you can help me.

Thanks again,
M
Attachment: log.compile (176.13 KiB)

Ming Chen
Posts: 1101
Joined: Mon Apr 23, 2018 9:42 pm

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by Ming Chen » Tue May 26, 2020 11:19 pm

Massimo,
I guess you didn't type ./clean -a before you recompiled the code. Please let me know if I am wrong.
./clean -a removes all previously compiled code. Without it, the old and new settings get mixed and the compilation fails.
Please type ./clean -a, then recompile and save the log file for me to take a look.
By the way, are you working on the Amazon cloud?
WRF Help Desk
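
The rebuild sequence described above can be sketched as follows, using only the commands that appear in this thread. The function names rebuild_wrf and build_log_ok are hypothetical, and the check relies on the assumption that a failed build always writes the "Problems building executables" banner quoted earlier into log.compile.

```shell
# Run from the top-level WRF directory. Nothing here is executed until
# rebuild_wrf is called explicitly.
rebuild_wrf() {
    ./clean -a       # remove all previously compiled code and old settings
    ./configure      # interactive: the thread selects 34 (dmpar, GNU), then 1
    ./compile em_real > log.compile 2>&1   # same as ">& log.compile" in csh/bash
}

# Hypothetical helper: succeed only if the log exists and does not contain
# the failure banner quoted in this thread.
build_log_ok() {
    log=${1:-log.compile}
    [ -f "$log" ] || return 1
    ! grep -q "Problems building executables" "$log"
}
```

After rebuild_wrf finishes, build_log_ok log.compile gives a quick pass or fail before the log is attached to a post. It does not replace reading the log for the first real compiler error.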

mcanonic
Posts: 11
Joined: Tue May 26, 2020 8:21 am

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by mcanonic » Wed May 27, 2020 11:47 am

Thanks! After running ./clean -a, the exe files are back.
We use the Chameleon project, which uses OpenStack as its cloud platform. If you have any questions about cloud computing, we can discuss them privately.
With the new exe files, my colleague ran this:

$:~/WRF/test/em_real$ mpirun -np 4 real.exe
 starting wrf task            0  of            1
 starting wrf task            0  of            1
 starting wrf task            0  of            1
 starting wrf task            0  of            1


Running the following command produced some errors:

$:~/WRF/test/em_real$ mpirun -np 4 wrf.exe&
[1] 19219
$:~/WRF/test/em_real$  
 starting wrf task            0  of            1
 starting wrf task            0  of            1
 starting wrf task            0  of            1
 starting wrf task            0  of            1

Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[51791,1],2]
  Exit code:    1
--------------------------------------------------------------------------

The error file contains the following:

-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  module_date_time.G  LINE:     910
WRFU_TimeSet() in wrf_atotime() FAILED   Routine returned error code =           -1
-------------------------------------------
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
Any suggestions?

Thanks,
M

Ming Chen
Posts: 1101
Joined: Mon Apr 23, 2018 9:42 pm

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by Ming Chen » Wed May 27, 2020 5:53 pm

(1) With mpirun -np 4, you should have four rsl files. Is this what you have?
(2) By "Primary job terminated normally", do you mean the case ran to the end? If not, how long did it integrate before it crashed?
(3) Which version of WRF are you using? Please send me your namelist.input to take a look.
WRF Help Desk
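
Question (1) can be checked mechanically. A dmpar run writes one rsl.error.NNNN file per MPI task (rsl.error.0000 is the one mentioned in the first post), so the file count should match the value given to -np. A minimal sketch, with count_rsl_files as a hypothetical helper name:

```shell
# Hypothetical helper: count the rsl.error.* files in a run directory
# (defaults to the current directory). One file per MPI task is expected.
count_rsl_files() {
    dir=${1:-.}
    n=0
    for f in "$dir"/rsl.error.*; do
        # When nothing matches, the pattern stays literal and -f is false.
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

Run in test/em_real after mpirun -np 4 wrf.exe, count_rsl_files should print 4. A result of 1 suggests every process wrote to the same rsl.error.0000, which fits the "0 of 1" lines.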

mcanonic
Posts: 11
Joined: Tue May 26, 2020 8:21 am

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by mcanonic » Thu May 28, 2020 8:07 am

1) It creates just one file.
2) That message is the output I get when I execute the command "mpirun -np 4 wrf.exe&".
3) WRF Model Version 4.2
I'm attaching the file namelist.input.
Attachment: namelist.output.txt (83.98 KiB)

I can give you (or anyone who wants to take a look at the VM) access; I just need the public key.

Thanks for your help!
M

mcanonic
Posts: 11
Joined: Tue May 26, 2020 8:21 am

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by mcanonic » Mon Jun 01, 2020 7:08 am

Hi all,
Any news on this?
Thanks,
M

Ming Chen
Posts: 1101
Joined: Mon Apr 23, 2018 9:42 pm

Re: "starting wrf task 0 of 1" instead of "starting wrf task 0 of 4"

Post by Ming Chen » Mon Jun 01, 2020 4:46 pm

I don't think this problem is related to the model; it is a machine issue. I believe that either the libraries on the machine or the environment settings are wrong in this case, which causes the MPI run to fail.
Please consult your system administrator or colleagues regarding the machine issue.
WRF Help Desk
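
One possible reading of the logs above, offered as an assumption rather than a confirmed diagnosis: the "Per user-direction, the job has been aborted" text looks like output from Open MPI's mpirun, while "application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0" and "[unset]: write_line error" resemble messages from an MPICH-built program. When the launcher and the library an executable is linked against come from different MPI implementations, each process typically starts as its own one-task job, which would match "0 of 1". A minimal sketch for comparing the two sides follows. The function mpi_family is hypothetical and its string matching is only a heuristic.

```shell
# Hypothetical helper: guess the MPI family from text on stdin, such as the
# output of "mpirun --version" or "ldd ./wrf.exe". Heuristic only.
mpi_family() {
    text=$(cat)
    if printf '%s\n' "$text" | grep -q -e "Open MPI" -e "libopen-pal"; then
        echo openmpi
    elif printf '%s\n' "$text" | grep -q -i -e "hydra" -e "mpich"; then
        echo mpich
    else
        echo unknown
    fi
}
```

Comparing mpirun --version | mpi_family with ldd ./wrf.exe | mpi_family in test/em_real shows whether the launcher and the executable agree. If they differ, using the mpirun that belongs to the MPI installation WRF was compiled with is the usual remedy. An "unknown" result means only that the heuristic found no match.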
