ERROR: MPAS IO Error: Bad return value from PIO

This post is from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled; if you have follow-up questions related to this post, please start a new thread from the forum home page.

Hi MPAS support team,
I ran into a problem when running a 72-h simulation with the MPAS model.
The error I got is:
ERROR: MPAS IO Error: Bad return value from PIO
I also attach my log files below.
Here is my namelist.atmosphere:
Code:
&nhyd_model
    config_time_integration_order = 2
    config_dt = 18.0
    config_start_time = '2015-07-27_00:00:00'
    config_run_duration = '0000-00-00_72:00:00'
    config_split_dynamics_transport = true
    config_number_of_sub_steps = 2
    config_dynamics_split_steps = 3
    config_h_mom_eddy_visc2 = 0.0
    config_h_mom_eddy_visc4 = 0.0
    config_v_mom_eddy_visc2 = 0.0
    config_h_theta_eddy_visc2 = 0.0
    config_h_theta_eddy_visc4 = 0.0
    config_v_theta_eddy_visc2 = 0.0
    config_horiz_mixing = '2d_smagorinsky'
    config_len_disp = 3000.0
    config_visc4_2dsmag = 0.05
    config_w_adv_order = 3
    config_theta_adv_order = 3
    config_scalar_adv_order = 3
    config_u_vadv_order = 3
    config_w_vadv_order = 3
    config_theta_vadv_order = 3
    config_scalar_vadv_order = 3
    config_scalar_advection = true
    config_positive_definite = false
    config_monotonic = true
    config_coef_3rd_order = 0.25
    config_epssm = 0.1
    config_smdiv = 0.1
/
&damping
    config_zd = 22000.0
    config_xnutr = 0.2
/
&limited_area
    config_apply_lbcs = true
/
&io
    config_pio_num_iotasks = 0
    config_pio_stride = 1
/
&decomposition
    config_block_decomp_file_prefix = 'vietnam.graph.info.part.'
/
&restart
    config_do_restart = false
/
&printout
    config_print_global_minmax_vel = true
    config_print_detailed_minmax_vel = false
/
&IAU
    config_IAU_option = 'off'
    config_IAU_window_length_s = 21600.
/
&physics
    config_sst_update = false
    config_sstdiurn_update = false
    config_deepsoiltemp_update = false
    config_radtlw_interval = '00:30:00'
    config_radtsw_interval = '00:30:00'
    config_bucket_update = 'none'
    config_physics_suite = 'mesoscale_reference'
/
&soundings
    config_sounding_interval = 'none'
/
And this is my streams.atmosphere:
Code:
<streams>
<immutable_stream name="input"
                  type="input"
                  filename_template="vietnam.init.nc"
                  input_interval="initial_only" />

<immutable_stream name="restart"
                  type="input;output"
                  filename_template="restart.$Y-$M-$D_$h.$m.$s.nc"
                  input_interval="initial_only"
                  output_interval="6:00:00" />

<stream name="output"
        type="output"
        filename_template="history.$Y-$M-$D_$h.$m.$s.nc"
        clobber_mode="overwrite"
        output_interval="6:00:00" >

        <file name="stream_list.atmosphere.output"/>
</stream>

<stream name="diagnostics"
        type="output"
        filename_template="diag.$Y-$M-$D_$h.$m.$s.nc"
        clobber_mode="overwrite"
        output_interval="6:00:00" >

        <file name="stream_list.atmosphere.diagnostics"/>
</stream>
</streams>
Please help me to solve this problem!
Thank you in advance.
 
Could you attach your log.atmosphere.0000.out file? The "ERROR: MPAS IO Error: Bad return value from PIO" error message is usually associated with stream output rather than stream input, and it would be helpful to get an idea of which output stream might be the problem.

How many horizontal grid cells are in your limited-area domain, and how many MPI tasks are you using for your simulation?
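For reference, the PIO layout is controlled by the &io group of namelist.atmosphere; below is a minimal sketch of how the number of I/O tasks and their spacing could be set, assuming 40 MPI tasks (the specific values are only illustrative):
Code:
&io
    config_pio_num_iotasks = 4    ! use 4 of the 40 MPI tasks for I/O
    config_pio_stride = 10        ! place an I/O task on every 10th MPI task
/
With config_pio_num_iotasks = 0, as in the namelist above, all MPI tasks perform I/O.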
 
Hi mgduda,
Sorry for my late reply.
My log files are attached below.
For your second question, my domain has 835,586 horizontal grid cells (60-km – 3-km mesh), and I used 2 nodes with 20 tasks per node, so there are 40 MPI tasks in my case.
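For reference, the partition file named by config_block_decomp_file_prefix must match the MPI task count; a sketch of how such a file is typically generated with METIS (assuming the mesh's graph file is named vietnam.graph.info, as the prefix in the namelist suggests):
Code:
# Generate a 40-way partition of the mesh graph for a 40-task run
gpmetis vietnam.graph.info 40
# This writes vietnam.graph.info.part.40, matching config_block_decomp_file_prefix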
 

Attachments

  • log.atmosphere.0000.out.txt (679 KB)
  • log.atmosphere.0006.err.txt (294 bytes)
 
Hi mgduda,

I am having the same error.

I built MPAS V7.0 on Cheyenne following the instructions from a recent MPAS tutorial. I had to add two more steps to the seven-step procedure outlined in the tutorial. My build steps were as follows.

Code:
git clone https://github.com/MPAS-Dev/MPAS-Model.git
module unload netcdf
module load netcdf-mpi
module load pnetcdf
module load pio
cd MPAS-Model
make ifort CORE=init_atmosphere PRECISION=single USE_PIO2=true
make clean CORE=atmosphere
make ifort CORE=atmosphere PRECISION=single USE_PIO2=true

If you will allow it, I can also upload my .out and .err files. I am using the x4.163842 mesh and have attempted to run with 64 and 12 MPI tasks, with the same results. My run only gets to the point where it writes the initial time, and then the PIO error occurs. I inspected the output file using ncdump, and the numbers all look reasonable.

Searching some online forums, I found suggestions that the error may be related to the fact that parallel-netCDF has a 2 GB limit per write of a variable. According to my quick calculations, all of the variables in my file should be well below that limit.
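For reference, a rough version of that size check (the file name and vertical level count here are illustrative assumptions):
Code:
# List the dimensions of the output file to get nCells and nVertLevels
ncdump -h history.nc | grep -E "nCells|nVertLevels"
# Example estimate for a 3-D single-precision field on the x4.163842 mesh:
#   163842 cells x 56 levels x 4 bytes ≈ 37 MB per time slice, well under 2 GB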

Thank you for any help you can provide.

Jen
 
@Jen If you're seeing the same PIO error on Cheyenne, I'd be glad to see whether I can reproduce it and track down the cause. Could you let me know which directory you're running in?
 
Hi Mgduda,

Sorry for the delay in responding. I didn't see the response earlier.

I am running in /glade/work/hegarty. I have a run script called run_mpas.csh in that directory that I submit using qsub.

Jen
 
@jhegarty I think the issue in your case may be the definition of the "diagnostics" stream:
Code:
<stream name="diagnostics"
        type="output"
        filename_template="diagnostics.$Y-$M-$D_$h:$m:$s.nc"
        output_interval="1:00:00">
          <stream name="diagnostics"/>
</stream>
The definition is recursive, in that the contents of the "diagnostics" stream come from the definition of the "diagnostics" stream itself (<stream name="diagnostics"/>). Can you try again after disabling output of this stream (you can set output_interval="none") or changing its contents so that the stream doesn't reference itself? See the sketch below. Additionally, on Cheyenne, you may need to use more than two nodes for this simulation, or to request two large-memory nodes.
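For example, a non-recursive definition would take its contents from a stream list file rather than from the stream itself; a sketch based on the default MPAS-Atmosphere run-directory files:
Code:
<stream name="diagnostics"
        type="output"
        filename_template="diagnostics.$Y-$M-$D_$h:$m:$s.nc"
        output_interval="1:00:00">

        <file name="stream_list.atmosphere.diagnostics"/>
</stream>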
 
@ntmanhvn181 Could you try your simulation again, but first ensure that any existing output files have been removed? It's possible that there is an issue with overwriting existing output (even though you have specified clobber_mode="overwrite" in your stream definitions).

The issue seems to be with a stream that is written after 6 hours; it could be that the restart stream is the problem. If running again after removing all existing output files still doesn't work, could you try again after setting output_interval="none" for the restart stream?
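For example, based on the stream definitions you posted earlier, the restart output could be temporarily disabled for this test like so (a sketch for testing only):
Code:
<immutable_stream name="restart"
                  type="input;output"
                  filename_template="restart.$Y-$M-$D_$h.$m.$s.nc"
                  input_interval="initial_only"
                  output_interval="none" />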

You may be able to save some simulation time for these tests if you reduce the output_interval for all of your streams from 6:00:00 to, say, 1:00:00.
 
Hi Mgduda,

Those suggestions seemed to work. Removing the recursive definition from the diagnostics stream eliminated the PIO error, and setting mem=109GB in the PBS statement has enabled the run to continue. I will let you know if it encounters additional problems.
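For reference, a sketch of how such a memory request could look in the PBS directives (the node and core counts here are assumptions):
Code:
#PBS -l select=2:ncpus=36:mpiprocs=36:mem=109GB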

Thanks,

Jen
 
Hi Mgduda,
Thank you for your useful suggestions.
I tried setting clobber_mode to 'overwrite' for the restart stream, like this:
Code:
<immutable_stream name="restart"
                  type="input;output"
                  filename_template="restart.$Y-$M-$D_$h.$m.$s.nc"
                  input_interval="initial_only"
                  clobber_mode="overwrite"
                  output_interval="6:00:00" />
It worked without error.
Thank you very much!
 
@ntmanhvn181 Thanks for following up. It's good to hear that adding clobber_mode="overwrite" to the "restart" stream worked!
 