Inserting write-statements into PIO

Carl Ponder
Posts: 15
Joined: Mon Jun 22, 2020 7:55 am

Post by Carl Ponder » Mon Jun 22, 2020 8:03 am

I'm trying to debug a problem I'm seeing with MPAS-A running with PIO (and other components).
I'm able to insert statements like this

Code:

call mpas_log_write('Debug checkpoint 2')

into the MPAS-A source, and see the corresponding output.
But if I insert this into the PIO source I don't see any output

Code:

write(6,*) "Debug checkpoint 2.1"

suggesting to me that MPAS-A is re-mapping the I/O streams somehow.
Are there channels I can use with Fortran write and C printf to make sure the output makes it to the screen?

Post by Carl Ponder » Mon Jun 22, 2020 3:10 pm

This doesn't show up either:

Code:

write(0,*) "Debug checkpoint 2.1"

Post by Carl Ponder » Mon Jun 22, 2020 3:19 pm

Nor did these show up:

Code:

USE ISO_FORTRAN_ENV, ONLY : ERROR_UNIT, OUTPUT_UNIT
....
write(OUTPUT_UNIT,*) "Debug 2.0 (output)"
flush(unit=OUTPUT_UNIT)
write(ERROR_UNIT,*) "Debug 2.0 (output)"
flush(unit=ERROR_UNIT)

mgduda
Posts: 458
Joined: Mon Feb 26, 2018 7:35 pm

Post by mgduda » Mon Jun 22, 2020 6:45 pm

Before the introduction of the logging module in MPAS v6.0, we did re-direct stderr and stdout to log files; but, since MPAS v6.0, this is no longer the case. I've had no problems using simple "write(0,*)" statements to debug PIO in the past; is it possible that there's an issue with the batch system not capturing stdout/stderr from the jobs you're running?
NCAR/MMM

Post by Carl Ponder » Tue Jun 23, 2020 4:58 pm

(also logged as an MPAS-A issue on this page
https://github.com/MPAS-Dev/MPAS-Model/issues/609
but closed, to carry the discussion here instead)

Post by Carl Ponder » Tue Jun 23, 2020 5:02 pm

Our SLURM system has never shown this sort of problem.
I'm using the OpenACC version of MPAS-A which may be a snapshot prior to your un-doing the re-direction.
I believe that streams 2 & 3 usually make it to the screen in C, are there any other Fortran units that I can try using?

mcurry
Posts: 77
Joined: Mon Oct 29, 2018 5:33 pm
Location: Boulder, Co

Post by mcurry » Tue Jun 23, 2020 7:51 pm

Is execution getting to the output statements in PIO? Does a write statement in a higher-level MPAS function, such as the run function in mpas_atm_core.F or mpas_init_atm_core.F, produce output? Is it possible to open a file and write to it?

Carl Ponder wrote: "I'm using the OpenACC version of MPAS-A which may be a snapshot prior to your un-doing the re-direction."

If this is the case, then it would most likely be the cause of the problem you are experiencing. Can you confirm whether you are using pre- or post-v6.0 code?

Carl Ponder wrote: "I believe that streams 2 & 3 usually make it to the screen in C, are there any other Fortran units that I can try using?"

In C, file descriptors 1 and 2 are the output streams (stdout and stderr, respectively). Either should produce output if it isn't being redirected. Likewise, if stdout and stderr aren't being redirected, then Fortran units 0 (stderr) and 6 (stdout) should produce output on the terminal.
NCAR|MMM
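
The suggestions above (write to the standard units, or open a file and write to it) could be sketched roughly as follows. This is a minimal stand-alone illustration, not code from MPAS or PIO; the program name and the file name pio_debug.txt are hypothetical. It uses the named constants from ISO_FORTRAN_ENV because the unit values 0 and 6 are a common compiler convention rather than something the Fortran standard fixes.

```fortran
program debug_output_sketch
   ! Portable names for the preconnected units. On many compilers these
   ! are 0 (stderr) and 6 (stdout), but the standard does not fix the values.
   use iso_fortran_env, only : error_unit, output_unit
   implicit none

   integer :: dbg_unit   ! unit number chosen by the runtime via newunit=
   integer :: ios

   write(error_unit, '(a)') 'debug: checkpoint on stderr'
   flush(error_unit)     ! push the line out before a possible crash

   write(output_unit, '(a)') 'debug: checkpoint on stdout'
   flush(output_unit)

   ! Fallback that does not depend on stdout/stderr being captured:
   ! append to a plain file. 'pio_debug.txt' is a hypothetical name.
   open(newunit=dbg_unit, file='pio_debug.txt', status='unknown', &
        position='append', action='write', iostat=ios)
   if (ios == 0) then
      write(dbg_unit, '(a)') 'debug: checkpoint in file'
      flush(dbg_unit)
      close(dbg_unit)
   else
      write(error_unit, '(a,i0)') 'debug: could not open pio_debug.txt, iostat=', ios
   end if
end program debug_output_sketch
```

In a multi-rank MPI run, a per-rank file name would probably be needed so that processes do not interleave their lines in a single file.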

Post by Carl Ponder » Tue Jun 23, 2020 9:32 pm

The routine in the PIO is called by the MPAS-A routine where this was called:

call mpas_log_write('Debug checkpoint 2')

The above did work. Once it got down into PIO, though, I couldn't get any output through.
I'll try working with the MPAS-A 7.0 code instead; it's showing the same failure as the OpenACC code.
Have you ever tested it with PGI 20.5 and OpenMPI 4.0.4?

Post by Carl Ponder » Tue Jun 23, 2020 11:59 pm

Same issue with MPAS-A 7.0 source code.
This call from the file MPAS-Model-7.0/src/framework/mpas_io.F

1328 call mpas_log_write('Carl-debug checkpoint 1')
1329 call PIO_initdecomp(handle % ioContext % pio_iosystem, pio_type, dimlist, compdof, new_decomp % decomphandle % pio_iodesc)
1330 call mpas_log_write('Carl-debug checkpoint 2')

gives output from the first line but evidently the call never returns.
Putting write-statements into the PIO_initdecomp doesn't produce any output.

Also, I'd tried to use write statements in the other MPAS-A snapshot I have, but haven't been able to get any output from there either.
What unit is the mpas_log_write function using?

Post by mgduda » Wed Jun 24, 2020 12:03 am

The OpenACC version of MPAS-A is based on MPAS v6.1; that, combined with the fact that the mpas_log_write function exists, suggests that there shouldn't be an issue of having code that still redirects stdout/stderr.

As a test, you could try adding a write(0,*) statement before and after the call to mpas_init in src/driver/mpas.F. If you can see the write statement before the call to mpas_init, but not the write statement after, that would suggest that stderr is somehow getting redirected by MPAS; but, if you can't see either write statement, then it suggests an issue with the batch system or with buffering of stdout/stderr.
NCAR/MMM

Post by Carl Ponder » Wed Jun 24, 2020 12:30 am

What unit is the mpas_log_write using? It'd be easier for me to just use that.

Post by mgduda » Wed Jun 24, 2020 1:08 am

The unit numbers used by stand-alone MPAS-Atmosphere aren't hard-wired anywhere in the code, but are determined at runtime. Fortunately, they should be set deterministically. This code block in mpas_log.F sets the Fortran unit numbers used by the log files.
NCAR/MMM
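
Since the log unit numbers are only known at runtime, one way to discover them while debugging might be to ask the Fortran runtime which units are currently connected. The sketch below is a generic, stand-alone illustration and is not taken from mpas_log.F; the program name and the scan limit max_unit are assumptions. In practice the loop would be placed in the code being debugged, after the log files have been opened.

```fortran
program list_open_units
   use iso_fortran_env, only : error_unit
   implicit none

   integer, parameter :: max_unit = 200   ! arbitrary scan limit (an assumption)
   integer :: u, ios
   logical :: is_open, has_name
   character(len=512) :: fname

   do u = 0, max_unit
      is_open  = .false.
      has_name = .false.
      fname    = ''
      inquire(unit=u, opened=is_open, named=has_name, name=fname, iostat=ios)
      if (ios /= 0) cycle                 ! unit number not usable with this compiler
      if (.not. is_open) cycle
      if (has_name) then
         write(error_unit, '(a,i0,2a)') 'unit ', u, ' is connected to ', trim(fname)
      else
         write(error_unit, '(a,i0,a)') 'unit ', u, ' is connected (no file name)'
      end if
   end do
   flush(error_unit)
end program list_open_units
```

Units obtained with newunit= have negative numbers and would not show up in a scan like this one.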

Post by Carl Ponder » Wed Jun 24, 2020 2:14 am

Using this

Code:

     13    write(0,*) "Carl-debug: Checkpoint 0.1"
     14    call mpas_init()
     15    write(0,*) "Carl-debug: Checkpoint 0.2"

I get the first output. The second doesn't show up because the mpas_init is dying.
But it does still work when it gets down to here in ./MPAS-Model-7.0/src/framework/mpas_io.F

Code:

   1327       dimlist(ndims) = field_cursor % fieldhandle % dims(ndims) % dimsize
   1328           call mpas_log_write('Carl-debug checkpoint 1')
   1329    write(0,*) "Carl-debug: Checkpoint 1"
   1330       call PIO_initdecomp(handle % ioContext % pio_iosystem, pio_type, dimlist, compdof, new_decomp % decomphandle % pio_iodesc)

but similar write statements don't work inside the PIO call.

That being said, maybe the failure is in the de-referencing of the parameters rather than inside the PIO call itself.
The failure-messages I see from the execution are these

Code:

+ mpirun -n 1 -N 1 ../MPAS-Model-7.0/atmosphere_model.2020-06-08
 Carl-debug: Checkpoint 0.1
 Carl-debug: Checkpoint 1
[prm-dgx-04:26367:0:26367] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7fffd3055ff8)
==== backtrace (tid:  26367) ====
 0  /gpfs/fs1/SHARE/Utils/UCX/1.8.0/GCC-BASE-7.4.0_CUDA-11.0.1.0_450.36.06/lib/libucs.so.0(ucs_handle_error+0x124) [0x7f1386910de4]
 1  /gpfs/fs1/SHARE/Utils/UCX/1.8.0/GCC-BASE-7.4.0_CUDA-11.0.1.0_450.36.06/lib/libucs.so.0(+0x24215) [0x7f1386911215]
 2  /gpfs/fs1/SHARE/Utils/UCX/1.8.0/GCC-BASE-7.4.0_CUDA-11.0.1.0_450.36.06/lib/libucs.so.0(+0x244a9) [0x7f13869114a9]
 3  /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f138904e890]
 4  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x9de67c]
 5  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x9ea283]
 6  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x9ecc84]
 7  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x9e6e0f]
 8  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x9c2069]
 9  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x9c2202]
10  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x9045bf]
11  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x91aa2e]
12  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x917090]
13  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x40d99c]
14  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x40c24d]
15  ../MPAS-Model-7.0/atmosphere_model.2020-06-08() [0x40c1d3]
=================================
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 26367 on node prm-dgx-04 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The UCX errors suggest that the failure has something to do with MPI, but given that the backtrace doesn't show any PIO layer, I'm thinking that it's dying on line 1330 of the MPAS-A code and the UCX messages are due to an unclean exit instead.

Post by Carl Ponder » Wed Jun 24, 2020 9:51 am

The backtrace details are a little different if I compile with DEBUG=on

[prm-dgx-30:74879:0:74879] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7fff65ac2ff8)
==== backtrace (tid: 74879) ====
0 /gpfs/fs1/SHARE/Utils/UCX/1.8.0/GCC-BASE-7.4.0_CUDA-11.0.1.0_450.36.06/lib/libucs.so.0(ucs_handle_error+0x124) [0x7f911e064de4]
1 /gpfs/fs1/SHARE/Utils/UCX/1.8.0/GCC-BASE-7.4.0_CUDA-11.0.1.0_450.36.06/lib/libucs.so.0(+0x24215) [0x7f911e065215]
2 /gpfs/fs1/SHARE/Utils/UCX/1.8.0/GCC-BASE-7.4.0_CUDA-11.0.1.0_450.36.06/lib/libucs.so.0(+0x244a9) [0x7f911e0654a9]
3 /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f91207a2890]
4 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(pioassert+0xc) [0x15fd55c]
5 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(coord_to_lindex+0x63) [0x1609163]
6 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(box_rearrange_create+0x994) [0x160bb64]
7 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(PIOc_InitDecomp+0x8bf) [0x1605cef]
8 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(piolib_mod_pio_initdecomp_internal_+0x3d9) [0x15e0f49]
9 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(piolib_mod_pio_initdecomp_dof_i8_+0xd2) [0x15e10e2]
10 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(mpas_io_mpas_io_set_var_indices_+0x6431) [0x13bf051]
11 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(mpas_bootstrapping_mpas_io_setup_cell_block_fields_+0x8d6) [0x14846a6]
12 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(mpas_bootstrapping_mpas_bootstrap_framework_phase1_+0x16b6) [0x147d966]
13 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(mpas_subdriver_mpas_init_+0x3c6b) [0x44211b]
14 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(MAIN_+0xae) [0x43e41e]
15 ../MPAS-Model-7.0/atmosphere_model.2020-06-08(main+0x33) [0x43e353]

Maybe the PIO layer was not reported before because the PIO library is compiled-in as a ".a" archive instead of a ".so" shared-object, so the backtrace shows the execution being in the MPAS-A source rather than the PIO.

Post by Carl Ponder » Wed Jun 24, 2020 9:53 am

It looks like the writes aren't working because I put them in the wrong subroutine.
The routines are called through an overloaded interface, and the call stack refers to a different specific routine than the one I edited.
I'm going to close this issue. Hopefully I'll make more progress figuring out where the failure is.
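
The overloaded-interface pitfall can be illustrated with a small stand-alone sketch. All names here (decomp_sketch, init_decomp, init_decomp_i4, init_decomp_i8) are hypothetical and only loosely modeled on the i8 variant that appears in the DEBUG backtrace; this is not PIO's actual source. Each specific routine carries its own tagged write, so the output shows which implementation a given call resolves to.

```fortran
module decomp_sketch
   use iso_fortran_env, only : error_unit, int32, int64
   implicit none
   private
   public :: init_decomp

   ! One generic name, two specific procedures. The compiler selects the
   ! specific routine from the kind of the actual argument.
   interface init_decomp
      module procedure init_decomp_i4
      module procedure init_decomp_i8
   end interface init_decomp

contains

   subroutine init_decomp_i4(compdof)
      integer(int32), intent(in) :: compdof(:)
      write(error_unit, '(a,i0)') 'debug: in init_decomp_i4, n = ', size(compdof)
      flush(error_unit)
   end subroutine init_decomp_i4

   subroutine init_decomp_i8(compdof)
      integer(int64), intent(in) :: compdof(:)
      write(error_unit, '(a,i0)') 'debug: in init_decomp_i8, n = ', size(compdof)
      flush(error_unit)
   end subroutine init_decomp_i8

end module decomp_sketch

program generic_demo
   use iso_fortran_env, only : int64
   use decomp_sketch, only : init_decomp
   implicit none
   integer(int64) :: dof(3) = [1_int64, 2_int64, 3_int64]

   ! Resolves to init_decomp_i8. A write placed only in init_decomp_i4
   ! would never execute for this call.
   call init_decomp(dof)
end program generic_demo
```

Tagging every specific routine behind the generic name, rather than guessing one, avoids having to work out the resolution by hand; a backtrace from a build with symbols gives the same information.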

Post by Carl Ponder » Wed Jun 24, 2020 9:54 am

Can you close this? I don't see a button to do it.

Post by mgduda » Wed Jun 24, 2020 1:41 pm

Thanks very much for following up. It's good to know that there isn't any redirection of stdout/stderr at fault. I remember running into similar problems when debugging PIO with print statements in the past, too -- it took me a few wrong guesses before I put my write statements in the correct implementation of an overloaded interface.

There isn't really a "close" option for discussion threads, so no worries there.
NCAR/MMM
