
An error occurred when generating the static file (MPAS-A)

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

X1a0wu

New member
The program stopped when I ran the model to generate the static fields for the 60-3 km mesh, and the static file stopped growing at 1.04 GB.

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
init_atmosphere_m 0000000000679704 for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B4A4B12B5D0 Unknown Unknown Unknown
libc-2.17.so 00002B4A4B6CE621 Unknown Unknown Unknown
libc-2.17.so 00002B4A4B6C8E75 Unknown Unknown Unknown
libmpi.so.12.0 00002B4A4A0AD1F6 Unknown Unknown Unknown
libmpi.so.12 00002B4A4A0B7AE4 ADIOI_GEN_WriteSt Unknown Unknown
libmpi.so.12.0 00002B4A4A576AAC Unknown Unknown Unknown
libmpi.so.12 00002B4A4A577B25 PMPI_File_write_a Unknown Unknown
libpnetcdf.so.4.0 00002B4A492ACB48 ncmpio_read_write Unknown Unknown
libpnetcdf.so.4.0 00002B4A492A7726 Unknown Unknown Unknown
libpnetcdf.so.4.0 00002B4A492A5173 Unknown Unknown Unknown
libpnetcdf.so.4.0 00002B4A492A2DD6 Unknown Unknown Unknown
libpnetcdf.so.4.0 00002B4A491E7E62 ncmpi_wait_all Unknown Unknown
libpioc.so.1.3.1 00002B4A48595394 flush_output_buff Unknown Unknown
libpioc.so.1.3.1 00002B4A4856B53D PIOc_write_darray Unknown Unknown
libpioc.so.1.3.1 00002B4A485956BF flush_buffer Unknown Unknown
libpioc.so.1.3.1 00002B4A4856DF32 PIOc_sync Unknown Unknown
libpiof.so.1.2.1 00002B4A48208862 piolib_mod_mp_syn Unknown Unknown
init_atmosphere_m 0000000000650D03 Unknown Unknown Unknown
init_atmosphere_m 00000000005BC4D5 Unknown Unknown Unknown
init_atmosphere_m 00000000005BBEEA Unknown Unknown Unknown
init_atmosphere_m 000000000044BD58 Unknown Unknown Unknown
init_atmosphere_m 000000000040DB67 Unknown Unknown Unknown
init_atmosphere_m 000000000040DAEE Unknown Unknown Unknown
init_atmosphere_m 000000000040DA9E Unknown Unknown Unknown
libc-2.17.so 00002B4A4B65C3D5 __libc_start_main Unknown Unknown
init_atmosphere_m 000000000040D9A9 Unknown Unknown Unknown
 

Attachments

  • log.init_atmosphere.0000.nc (598.6 KB)
If you haven't already done so, it may be worth trying to unlimit the stacksize before running the init_atmosphere_model program. In sh/bash, you can run 'ulimit -s unlimited' and in csh/tcsh you can run 'limit stacksize unlimited'.

It looks like the segfault may be happening within a call to the PIO library. Which version of the PIO library (and other I/O libraries) are you using?
 
mgduda said:
If you haven't already done so, it may be worth trying to unlimit the stacksize before running the init_atmosphere_model program. In sh/bash, you can run 'ulimit -s unlimited' and in csh/tcsh you can run 'limit stacksize unlimited'.

It looks like the segfault may be happening within a call to the PIO library. Which version of the PIO library (and other I/O libraries) are you using?

The PIO version is pio/intel/2.5.1. I tried 'ulimit -s unlimited', but it was invalid. In addition, generating the static file works when I use meshes coarser than 60-3 km (e.g., 46-12 km and 120 km), or a regional area of the 60-3 km mesh.
 
Could you clarify: when you say that 'ulimit -s unlimited' is invalid, do you mean that the ulimit command couldn't be run, or that successfully running 'ulimit -s unlimited' in your shell before running the init_atmosphere_model program didn't resolve the segmentation fault?

That you're able to run successfully for coarser meshes suggests a memory issue of some sort. By default, output files are written through the Parallel-NetCDF library, but you could try writing your static file using the serial NetCDF library by adding io_type="netcdf", i.e.,
Code:
<immutable_stream name="output"
                  type="output"
                  io_type="netcdf"
                  filename_template="hn_static.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />
to the definition of the "output" stream in your streams.init_atmosphere file. If the init_atmosphere_model still fails, then the problem may not be with the Parallel-NetCDF library, but elsewhere. I don't know that I've used PIO 2.5.1 myself, but I have used other 2.5.x versions without problem.
 
mgduda said:
Could you clarify: when you say that 'ulimit -s unlimited' is invalid, do you mean that the ulimit command couldn't be run, or that successfully running 'ulimit -s unlimited' in your shell before running the init_atmosphere_model program didn't resolve the segmentation fault?

That you're able to run successfully for coarser meshes suggests a memory issue of some sort. By default, output files are written through the Parallel-NetCDF library, but you could try writing your static file using the serial NetCDF library by adding io_type="netcdf", i.e.,
Code:
<immutable_stream name="output"
                  type="output"
                  io_type="netcdf"
                  filename_template="hn_static.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />
to the definition of the "output" stream in your streams.init_atmosphere file. If the init_atmosphere_model still fails, then the problem may not be with the Parallel-NetCDF library, but elsewhere. I don't know that I've used PIO 2.5.1 myself, but I have used other 2.5.x versions without problem.
Hi,
Thanks for your advice!
The 'ulimit -s unlimited' command ran successfully, but it didn't resolve the segmentation fault.
Fortunately, after adding io_type="netcdf", the static fields for the 60-3 km mesh were generated successfully. Was the fault caused by the file type? Can a file written with the default I/O type not exceed 1 GB?

X1a0wu
 
It should definitely be possible to write files larger than 1 GB with the Parallel-NetCDF library (the default for MPAS output streams, with io_type="pnetcdf"). It could be that the Parallel-NetCDF library you used when compiling the PIO library has some issues. Do you know which version of the Parallel-NetCDF library you are using?

Anyway, I think knowing that the static file can be written with io_type="netcdf" gets us closer to finding the source of the original problem. In general, it would be valuable to have the ability to write output files in parallel with the Parallel-NetCDF library.
 
mgduda said:
It should definitely be possible to write files larger than 1 GB with the Parallel-NetCDF library (the default for MPAS output streams, with io_type="pnetcdf"). It could be that the Parallel-NetCDF library you used when compiling the PIO library has some issues. Do you know which version of the Parallel-NetCDF library you are using?

Anyway, I think knowing that the static file can be written with io_type="netcdf" gets us closer to finding the source of the original problem. In general, it would be valuable to have the ability to write output files in parallel with the Parallel-NetCDF library.

The version of pnetcdf is mathlib/pnetcdf/intel/1.12.1
 
I think I've used Parallel-NetCDF 1.12.1 with success in the past, so that seems fine.

Now that you have a static file, the interpolation of atmospheric initial conditions can be run in parallel using any Metis graph partition file. It might be interesting to try generating the "init.nc" file with multiple MPI tasks and to revert to using the Parallel-NetCDF library to write that file; all that you'd need to do is to omit the io_type="netcdf" attribute in your "output" stream. I think it would be interesting to know whether the Parallel-NetCDF library can write files in parallel, and whether the issue apparently arises only when writing files with a single MPI task.
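As a rough sketch of that suggestion, the relevant part of streams.init_atmosphere for the initial-conditions step might look like the following. It assumes the static file is still named hn_static.nc (as in the stream definition above) and uses hn_init.nc as a purely hypothetical name for the new file. The "input" stream reads the static file, and the "output" stream simply has no io_type attribute, so it should fall back to the default Parallel-NetCDF I/O. Other streams in the file are omitted here.

```xml
<streams>
<!-- Sketch only: the other streams in streams.init_atmosphere are omitted. -->

<!-- Read the static file that was written earlier with io_type="netcdf". -->
<immutable_stream name="input"
                  type="input"
                  filename_template="hn_static.nc"
                  input_interval="initial_only" />

<!-- No io_type attribute, so this stream uses the default Parallel-NetCDF
     ("pnetcdf") I/O. The file name hn_init.nc is hypothetical. -->
<immutable_stream name="output"
                  type="output"
                  filename_template="hn_init.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />
</streams>
```

When running with N MPI tasks, a matching graph partition file (e.g. one ending in .graph.info.part.N) needs to be available, with its prefix set through config_block_decomp_file_prefix in namelist.init_atmosphere.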

Did you install the Parallel-NetCDF library yourself from source? If so, did you run all of the pre-installation tests, and did they all pass?
 