fails to run atmosphere_model

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

sekluzia

Member
Hi,

I am trying to run the model with mpiexec -n 32 ./atmosphere_model, but it fails and I do not understand why. Please find my files attached.

Artur
 

Attachments

  • streams.atmosphere.txt
    1.5 KB · Views: 62
  • namelist.atmosphere.txt
    1.7 KB · Views: 61
  • log.atmosphere.0000.out.txt
    10.5 KB · Views: 62
  • log.atmosphere.0000.err.txt
    389 bytes · Views: 76
Hi Artur,

The model is giving you the error:

ERROR: Writing to stream 'diagnostics' would clobber file 'diag.2010-10-23_00.00.00.nc',
ERROR: but clobber_mode is set to 'never_modify'.

because there is already a diagnostics file in your run directory and the model has not been told that it is okay to overwrite it. By default the model gives this error so that no data is accidentally overwritten. The same applies to all other output streams (history, restart, and any output streams you may have defined).

If you want to keep this data, you can move or rename the file and then run the model again; if you don't care about the data, you can delete the file and then run the model again. You can also set
Code:
clobber_mode="overwrite"
in the diagnostics stream definition (and in other output streams) to have the model overwrite any existing output files it finds, e.g.:

Code:
<stream name="diagnostics"
        type="output"
        filename_template="diag.$Y-$M-$D_$h.$m.$s.nc"
        clobber_mode="overwrite"
        output_interval="1:00:00" >

	<file name="stream_list.atmosphere.diagnostics"/>
</stream>

You can see additional options for
Code:
clobber_mode
in section 5.2 of the atmosphere user guide.
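For the "keep the data" option above, a minimal shell sketch of moving existing output files aside before re-running might look like the following. The helper name stash_outputs and the directory name old_output are hypothetical, and the history/restart patterns assume the default stream filename templates; adjust them to match your streams.atmosphere file.

```shell
# Hypothetical helper: move existing output files out of the run directory
# so the model can write fresh ones without clobbering anything.
# Usage: stash_outputs <backup_dir> <file>...
stash_outputs() {
    backup_dir=$1
    shift
    mkdir -p "$backup_dir" || return 1
    for f in "$@"; do
        # A glob that matched nothing is passed through literally; skip it.
        [ -e "$f" ] || continue
        mv -- "$f" "$backup_dir"/ || return 1
    done
}

# Example (patterns are assumptions based on the default filename templates):
# stash_outputs old_output diag.*.nc history.*.nc restart.*.nc
```

This keeps clobber_mode at its default, so the model will still refuse to overwrite anything you forgot to move.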
 
The clobber errors should not be "fatal" errors (in that, by themselves, they should not cause the model simulation to stop), and it does look like the log.atmosphere.0000.out file ends abruptly during the first model timestep. Was the atmosphere model intentionally terminated, or did it fail on its own during the first timestep?
 
Hi,

You are right, the clobber errors are not "fatal" errors. I added clobber_mode="overwrite" to the streams.atmosphere file (attached), but it did not help. The atmosphere model runs for 1-2 min, then memory usage abruptly rises close to the maximum (56 GB out of 64 GB) and the model hangs. Please also find attached my log.out file (mpiexec -n 32 ./atmosphere_model >& log.out).

Artur
 

Attachments

  • streams.atmosphere.txt
    1.6 KB · Views: 74
  • log.out.txt
    564.2 KB · Views: 66
Can you try running the model with just a single MPI task? It seems like there may be some problem running the model in parallel.

I did notice in your log.out.txt file that there appear to be about 1024 stack traces, rather than the expected 32 (since you are running with 32 MPI tasks). I also noticed that in your namelist.atmosphere file, you have
Code:
&decomposition
    config_block_decomp_file_prefix = 'x1.10242.graph.info.part.32'
/
Generally, config_block_decomp_file_prefix should be the prefix of the mesh partition filename only. There should not be a numerical suffix specified, since the model will automatically append to the prefix a number corresponding to the MPI task count. With your specification of config_block_decomp_file_prefix, the model should actually be trying to read a file named "x1.10242.graph.info.part.3232".
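As a rough illustration of the naming rule just described (prefix followed directly by the MPI task count), the sketch below builds the file name the model would look for and checks whether it exists. Both function names are hypothetical helpers, not part of MPAS.

```shell
# Build the partition file name as described above: prefix + MPI task count.
# Usage: partition_file_name <prefix> <ntasks>
partition_file_name() {
    printf '%s%s\n' "$1" "$2"
}

# Hypothetical pre-flight check: report whether the expected file exists
# in the current directory.
check_partition_file() {
    file=$(partition_file_name "$1" "$2")
    if [ -f "$file" ]; then
        echo "found: $file"
    else
        echo "missing: $file" >&2
        return 1
    fi
}

# partition_file_name x1.10242.graph.info.part.32 32   # prefix with the stray suffix
# partition_file_name x1.10242.graph.info.part. 32     # prefix without a suffix
```

With the suffix left in the prefix, the first call yields x1.10242.graph.info.part.3232; the second yields x1.10242.graph.info.part.32.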

Perhaps it's a coincidence, but 32 * 32 = 1024, and there were what appeared to be 1024 stack traces in your "log.out.txt" file. Is it possible that in running the job, you may somehow actually be using 1024 MPI tasks rather than 32?
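One way to compare the number of stack traces against the expected task count is to count a marker line that appears once per trace. This is only a sketch: the helper name is hypothetical, and the marker string in the example is an assumption (gfortran-built programs typically print it once per failing process), so substitute whatever line actually introduces each trace in your log.

```shell
# Count lines in a log file that contain a fixed marker string.
# Usage: count_marker <marker> <logfile>
count_marker() {
    # -F: treat the marker as a literal string
    # -c: print the number of matching lines instead of the lines themselves
    grep -F -c -- "$1" "$2"
}

# Example (marker string is an assumption; adjust to your log):
# count_marker 'Backtrace for this error:' log.out.txt
```

If the count is far larger than the -n value passed to mpiexec, more processes were started than intended.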
 
Hi,

I corrected my namelist.atmosphere file (attached) and ran the model with a single MPI task (if I did it correctly, by typing
./atmosphere_model >& log.out
). The updated log.out and log.err files are also attached.
When I try to run the atmosphere model with the mpiexec command, the following message appears:

ubuntu@wrf:~/MPAS-Model/run$ mpiexec -n 32 ./atmosphere_model
ssh: Could not resolve hostname wrf: Name or service not known
 

Attachments

  • log.atmosphere.0000.err.txt
    530 bytes · Views: 54
  • log.atmosphere.0000.out.txt
    10.5 KB · Views: 60
  • log.out.txt
    17.6 KB · Views: 61
  • namelist.atmosphere.txt
    1.7 KB · Views: 62
It may be that there is a problem with the MPI library on your system. Could you try compiling the attached MPI test program with
Code:
mpicc -o mpitest mpitest.c
Then, try running the resulting program in two ways:
Code:
./mpitest >& 1task.log
and
Code:
mpiexec -n 32 ./mpitest >& 32tasks.log
Could you then attach the resulting 1task.log and 32tasks.log files?
 

Attachments

  • mpitest.c
    808 bytes · Views: 57
Yes, it seems that there is a problem with the MPI library on my system. Please find attached my *log files. How can I fix that? Please also look at the message in 32tasks_root.log, which I get when I try to run as root.

Artur
 

Attachments

  • 32tasks.log
    64 bytes · Views: 62
  • 1task.log
    11 bytes · Views: 51
  • 32tasks_root.log
    604 bytes · Views: 55
@sekluzia Is there a systems administrator that manages computing systems at your institution? If so, it may be best to ask them for assistance in re-installing an MPI library and testing it to verify that it works correctly.
 
I re-installed MPI, version 3.3.1. The MPI test now seems to show that everything is fine (1task.log.txt and 32tasks.log.txt files are attached). However, after running the atmosphere model (./atmosphere_model >& log.out) there are errors again (log.atmosphere.0000.out.txt, log.atmosphere.0000.err.txt and log.out.tar.gz files are attached).

Artur
 

Attachments

  • 32tasks.log.txt
    406 bytes · Views: 57
  • 1task.log.txt
    11 bytes · Views: 58
  • log.atmosphere.0000.out.txt
    10.5 KB · Views: 56
  • log.atmosphere.0000.err.txt
    530 bytes · Views: 51
  • log.out.tar.gz
    738.8 KB · Views: 57
It does look like the MPI tests are passing, so reinstalling MPI does seem to have corrected that issue.

The messages in the log.out.tar.gz file suggest that there may be a problem in the physics schemes, which could be due to problems with the initial conditions in the model. To test whether the atmosphere_model program itself is working, it may be worth trying to make a simulation with some input files that are known to be good. Could you try downloading this 120-km benchmark case? After unpacking the .tar.gz file, you can link your atmosphere_model executable into the resulting directory and try running the benchmark case. If this works, then it suggests that your atmosphere_model executable is working, and the problem that you are seeing is related to the initial conditions.
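The linking step could be sketched roughly as follows. The helper name link_model is hypothetical, and the paths in the example are placeholders, since the name of the unpacked benchmark directory depends on the archive you download.

```shell
# Hypothetical helper: symlink an executable into a case directory.
# Usage: link_model <path/to/executable> <case_dir>
link_model() {
    exe=$1
    case_dir=$2
    [ -x "$exe" ] || { echo "not executable: $exe" >&2; return 1; }
    [ -d "$case_dir" ] || { echo "no such directory: $case_dir" >&2; return 1; }
    # Use an absolute path so the link still resolves from inside case_dir.
    exe_abs=$(cd "$(dirname "$exe")" && pwd)/$(basename "$exe")
    ln -sf "$exe_abs" "$case_dir/"
}

# Example (both paths are placeholders):
# link_model /path/to/MPAS-Model/atmosphere_model /path/to/benchmark_dir
# cd /path/to/benchmark_dir && ./atmosphere_model >& log.out
```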
 
Please find attached my log files for the test case. It seems that the same problem exists.
 

Attachments

  • log.out.txt
    467.7 KB · Views: 31
  • log.atmosphere.0000.out.txt
    10.1 KB · Views: 29
There are some indications in your log files (e.g., references to libgomp.so.1 and libpthread.so.0, as well as symbol names like __mpas_atmphys_driver_MOD_physics_driver._omp_fn.8) that suggest you are running the model with OpenMP threading. When you compiled MPAS-Atmosphere, did you specify OPENMP=true in your make command? If so, it might be worth cleaning and recompiling the model without OPENMP=true so that no threading is enabled.
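A quick way to check whether an existing executable was linked against the GNU OpenMP runtime is to inspect its shared-library dependencies. This is a sketch only: the helper name is hypothetical, it assumes a Linux system with ldd and a GCC-built executable (other compilers use a different OpenMP runtime library), and the gfortran build target in the comments is an assumption, so use whichever target you built with originally.

```shell
# Report whether a dynamically linked executable depends on the GNU OpenMP
# runtime (libgomp). Returns 0 if it does, non-zero otherwise.
# Usage: uses_libgomp <executable>
uses_libgomp() {
    ldd "$1" 2>/dev/null | grep -q 'libgomp'
}

# Example:
# if uses_libgomp ./atmosphere_model; then
#     echo "linked against libgomp; consider rebuilding without OPENMP=true"
# fi
#
# Rebuild without threading (build target is an assumption):
# make clean CORE=atmosphere
# make gfortran CORE=atmosphere
```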
 