I can't tell from the logs you provided. All I can say with confidence is that your job died during the initialization phase of the atmosphere core.
Information about this type of failure is typically written to STDOUT or STDERR. If you are running on an HPC system, these are usually redirected to a file for you, or you may have saved this output yourself. Otherwise, you'd need to look back through the history of whatever shell you ran atmosphere_model from (or just re-run).
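If you are launching the model by hand, here is a minimal sketch of saving that output yourself, assuming a POSIX shell. The helper name run_with_logs, the "atm" file prefix, and the mpiexec line in the comment are my own illustrative assumptions, not something MPAS provides. STDOUT and STDERR are not environment settings; they are the two output streams every program has, and the shell redirects them with > and 2>.

```shell
# Hypothetical helper: run any command, saving STDOUT and STDERR
# to PREFIX.out and PREFIX.err in the current directory.
run_with_logs() {
    prefix="$1"
    shift
    # '>' redirects STDOUT (stream 1), '2>' redirects STDERR (stream 2).
    # The function returns the exit status of the command it ran.
    "$@" > "${prefix}.out" 2> "${prefix}.err"
}

# Example call (launcher and task count are assumptions; adjust for your system):
# run_with_logs atm mpiexec -n 4 ./atmosphere_model
```

After a failed run, the end of atm.err (or atm.out) is the first place I would look.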
You mention a "core" file. Assuming this is the literal name and it's present in your run directory, you could use the gdb tool to examine the core file to understand the failure. A procedure for this generally looks like:
Bash:
# Setup the environment to use the same software you compiled/ran MPAS with.
# This is system dependent so I'm not adding the commands here
# Start gdb, replace each PATH_TO with your values (if needed)
gdb ${PATH_TO}/atmosphere_model ${PATH_TO}/core
# There will then be quite a bit of output. You should look at the last few lines
# before the prompt. It may contain something like SIGFPE or SIGSEGV to
# tell you how it failed.
# You should also see a line above the prompt that tells you what the last
# line of code executed was. Depending on a few factors, you could get
# more info with the backtrace (or 'bt') command
(gdb) bt
# gdb will spit out many lines of output. There's usually not much else I do
# inside gdb. You could read up on gdb to see how else to use the tool.
# I typically just exit/quit here and look at the MPAS code to understand the error.
(gdb) quit
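If you only want the backtrace, the same steps can be run without an interactive session. This is a rough sketch rather than a fixed recipe: the helper name core_backtrace is hypothetical, and it assumes gdb is on your PATH after you set up the same environment as above. The -batch and -ex options are standard gdb command-line options.

```shell
# Hypothetical helper: print a backtrace from a core file and exit.
core_backtrace() {
    exe="$1"
    core="$2"
    # Refuse to start gdb if either file is missing.
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace EXECUTABLE COREFILE" >&2
        return 2
    fi
    # -batch: exit once the commands have run; -ex: run one gdb command.
    gdb -batch -ex "bt" "$exe" "$core"
}

# Example call, replacing each PATH_TO with your values (if needed):
# core_backtrace ${PATH_TO}/atmosphere_model ${PATH_TO}/core > backtrace.txt 2>&1
```

Saving the result to a file this way also makes it easy to attach to a post.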
You may need to re-compile with DEBUG=true in your make command to get the most useful information from gdb. I can usually get at least the last line before the failure occurred even without building with debug flags.
Thank you for your response. I checked the core file with the debugging tool and obtained the following results, which do not seem helpful.
Interestingly, the model runs successfully after I switched the run directory and linked namelist.atmosphere, streams.atmosphere, region.init.nc, and all the lbc* files into the MPAS_Model folder where I built the MPAS model. As expected, I found three output files: log.atmosphere.0000.out, diag.2017-06-14_00.00.00.nc, and history.2017-06-14_00.00.00.nc.
Where should I set up the environment parameters, such as STDOUT or STDERR, that you mentioned?
Thanks again for your time and assistance!
-------------
gdb atmosphere_model core.90312
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from MPAS/MPAS_Model/MPAS-Model-8.2.2/atmosphere_model...done.
warning: core file may not match specified executable file.
[New LWP 90312]
[New LWP 90334]
[New LWP 90333]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
warning: the debug information found in "/usr/lib/debug/usr/lib64/libz.so.1.2.7.debug" does not match "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libz.so.1" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib64/libz.so.1.2.7.debug" does not match "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libz.so.1" (CRC mismatch).
.....
Core was generated by `./atmosphere_model'.
Program terminated with signal 11, Segmentation fault.
#0 mca_btl_smcuda_sendi () at ../../../../../opal/mca/btl/smcuda/btl_smcuda.c:938
938 ../../../../../opal/mca/btl/smcuda/btl_smcuda.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install libibverbs-41mlnx1-OFED.4.5.0.1.0.45101.x86_64 libmlx4-41mlnx1-OFED.4.5.0.0.3.45101.x86_64 libmlx5-41mlnx1-OFED.4.5.0.3.8.45101.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-41mlnx1-OFED.4.2.0.1.3.45101.x86_64 librxe-41mlnx1-OFED.4.4.2.4.6.45101.x86_64 nspr-4.35.0-1.el7_9.x86_64 nss-3.90.0-2.el7_9.x86_64 nss-softokn-freebl-3.90.0-6.el7_9.x86_64 nss-util-3.90.0-1.el7_9.x86_64 numactl-libs-2.0.9-7.el7.x86_64 openssl-libs-1.0.2k-26.el7_9.x86_64 zlib-1.2.7-21.el7_9.x86_64