WRF output JOINER

This post was from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled, and if you have follow-up questions related to this post, please start a new thread from the forum home page.

hahnd_

New member
Hello,

I downloaded, compiled, and ran the wrfout joiner today. I ran into a few problems. I was able to resolve a couple of them, so I have suggestions to improve the code and documentation for those, and would like help with another one.

I downloaded the JOINER code from https://www2.mmm.ucar.edu/wrf/users/special_code.html

1. The documentation shows that the namelist.join file should be redirected into the executable. I found that this does not work; the namelist file should be passed as a command-line argument instead. The documentation says in multiple places to run "joinwrf < namelist.join". This should be updated to show "joinwrf namelist.join".

2. The provided namelist.join file has an error in the &patch section. It seems that the proc_sw variable has been replaced by proc_start_x and proc_start_y. The comments and parameters in the namelist.join file should be updated to reflect this change.
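For reference, a minimal sketch of how the renamed entries might look in that section (the values here are placeholders, and any other entries in the distributed namelist.join are left as they are; only the rename is being illustrated):

 &patch
   proc_start_x = 0,    ! replaces the old proc_sw entry; placeholder value
   proc_start_y = 0,    ! placeholder value
 /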

This is the one I would like some help with:
3. After the above changes, I was able to run the code as a single process without MPI; however, the MPI run failed. The command and errors are below. Any ideas what the problem might be?

mpirun -np 4 joinwrf namelist.join
...
...
WRF file 15: patch - 571 = ./wrfout_d01_2018-08-18_22:00:00_0570
WRF file 15: patch - 572 = ./wrfout_d01_2018-08-18_22:00:00_0571
WRF file 15: patch - 573 = ./wrfout_d01_2018-08-18_22:00:00_0572
WRF file 15: patch - 574 = ./wrfout_d01_2018-08-18_22:00:00_0573
WRF file 15: patch - 575 = ./wrfout_d01_2018-08-18_22:00:00_0574
WRF file 15: patch - 576 = ./wrfout_d01_2018-08-18_22:00:00_0575
nextdfil = 15
ids, ide, jds, jde = 1 800 1 800
idss, idse, jdss, jdse = 1 799 1 799

*****************************
The joined subdomain is: stag - ids = 1, ide = 800;
jds = 1, jde = 800.
unstag - idss= 1, idse= 799;
jdss= 1, jdse= 799.

Done writing nopatch variable Times
Abort(643844) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Send: Invalid tag, error stack:
PMPI_Send(159): MPI_Send(buf=0x7a35a0, count=1, MPI_INTEGER, dest=0, tag=1002001, MPI_COMM_WORLD) failed
PMPI_Send(97).: Invalid tag, value is 1002001
Abort(67752708) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Send: Invalid tag, error stack:
PMPI_Send(159): MPI_Send(buf=0x7a35a0, count=1, MPI_INTEGER, dest=0, tag=2002001, MPI_COMM_WORLD) failed
PMPI_Send(97).: Invalid tag, value is 2002001
Abort(134861572) on node 3 (rank 3 in comm 0): Fatal error in PMPI_Send: Invalid tag, error stack:
PMPI_Send(159): MPI_Send(buf=0x7a35a0, count=1, MPI_INTEGER, dest=0, tag=3002001, MPI_COMM_WORLD) failed
PMPI_Send(97).: Invalid tag, value is 3002001
 
Hi,
First, thank you so much for providing the updates to the documentation. This is an older script that has been passed down and it's not something we officially support, so it's great to have updates to provide to others who may run into similar issues.

I did run this by someone who is really familiar with the program, and they agree with your documentation modifications. Regarding the issue you're seeing with MPI jobs, they said they have started running into the same issue in various cloud instances, but are not seeing it on their local machine or on our NCAR HPC. They weren't able to fix the issue, but said they weren't sure running it with MPI was necessary anyway. In some timing tests on our HPC, running it with a single processor vs. multiple processors gave about the same timing, so they've let go of the MPI setup and have been content running it on a single processor.
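One guess (not something we've confirmed) about why it only fails on some systems: judging from the tag values in your traceback, the joiner seems to build its MPI tags from processor and variable indices, and the MPI standard only guarantees that tags up to MPI_TAG_UB (at least 32767) are valid; that limit can be much lower on some cloud MPI stacks than on a local machine. If you want to rule this out, a small standalone check of the limit could look something like the sketch below (this is not from the joiner source):

      program check_tag_ub
        use mpi
        implicit none
        integer :: ierr, rank
        logical :: flag
        integer(kind=MPI_ADDRESS_KIND) :: tag_ub

        call MPI_Init(ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
        ! MPI_TAG_UB is a predefined attribute on MPI_COMM_WORLD holding the
        ! largest tag value this MPI library accepts (at least 32767).
        call MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, tag_ub, flag, ierr)
        if (rank == 0) print *, 'MPI_TAG_UB = ', tag_ub
        call MPI_Finalize(ierr)
      end program check_tag_ub

If the value it prints is smaller than the tags in your error messages (1002001 and up), that would at least explain the aborts.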

They did point out another glitch, however: the mpistatus variable should be declared with "dimension(MPI_STATUS_SIZE)". This solved problems on some platforms for them, but may not be the issue here.
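In Fortran, the change they described amounts to a declaration like the one below (the surrounding joiner code isn't reproduced here, so treat this as a sketch of the fix rather than a verbatim patch):

      ! status array for MPI_Recv; MPI_STATUS_SIZE comes from the mpi module (or mpif.h)
      integer, dimension(MPI_STATUS_SIZE) :: mpistatus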