
60-km uniform mesh: model integration failed

This post is from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled; if you have follow-up questions related to this post, please start a new thread from the forum home page.

pouwereou Nimon

New member
Hello everyone,
Thank you for all the support.
I am running a 60-km resolution mesh and I am having a problem with the model integration. What could be wrong, and how can I solve it?
Please find the log.atmosphere.0000.out, atmos_stderr, streams.atmosphere, and namelist.atmosphere files attached.

Thank you for your help
 

Attachments

  • namelist.atmosphere.dat (1.7 KB)
  • log.atmosphere.0000.out.dat (8.5 KB)
  • atmos_stderr.dat (1.2 KB)
  • streams.atmosphere.dat (1.5 KB)
Thanks very much for attaching your namelist.atmosphere, streams.atmosphere, and log files. The error message
At line 1495 of file mpas_timekeeping.F
Fortran runtime error: Bad integer for item 1 in list input
in your stderr log generally indicates that the model is attempting to parse an invalid date-time string.

In the &physics group in your namelist.atmosphere file, SST updates are enabled with config_sst_update:
Code:
&physics
    config_sst_update = true
    config_sstdiurn_update = true
    config_deepsoiltemp_update = false
    config_radtlw_interval = '00:30:00'
    config_radtsw_interval = '00:30:00'
    config_bucket_update = '153:00:00'
    config_physics_suite = 'mesoscale_reference'
/
but the "surface" stream in your streams.atmosphere file doesn't have a valid input interval:
Code:
<stream name="surface"
        type="input"
        filename_template="sfc_update.nc"
        filename_interval="none"
        input_interval="none" >

        <file name="stream_list.atmosphere.surface"/>
</stream>
In this case, I think the code beginning at line 554 of mpas_atmphys_manager.F will try to use the string "none" to set an alarm, leading to the error you've encountered.

Could you try providing an input_interval that matches the interval at which you've created the SST and sea-ice update file? If you haven't created such a file as described in Section 8.1 of the User's Guide, could you try setting config_sst_update = false in your namelist.atmosphere file?
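For example, if your sfc_update.nc file contains SST and sea-ice records at a 6-hour interval, the "surface" stream definition might look like the following (a sketch based on your existing stream; adjust input_interval to match the interval actually present in your update file):
Code:
<stream name="surface"
        type="input"
        filename_template="sfc_update.nc"
        filename_interval="none"
        input_interval="06:00:00" >

        <file name="stream_list.atmosphere.surface"/>
</stream>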
 
I also noticed in your namelist.atmosphere file that you have the bucket update interval set to 153 hours ("153:00:00"):
Code:
&physics
    config_sst_update = true
    config_sstdiurn_update = true
    config_deepsoiltemp_update = false
    config_radtlw_interval = '00:30:00'
    config_radtsw_interval = '00:30:00'
    config_bucket_update = '153:00:00'
    config_physics_suite = 'mesoscale_reference'
/

The second post in this thread about precipitation post-processing briefly describes how bucket updates work. While an update interval of 153 hours isn't necessarily wrong, it might be worth considering whether an update interval of, e.g., 24 hours could simplify your post-processing workflow.
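For example, a daily bucket reset would need only one line changed in your &physics group (a sketch showing just that line; everything else would stay as it is):
Code:
&physics
    config_bucket_update = '24:00:00'
/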
 
Dear mgduda,

Thank you very much for your reply. I am very grateful.
As you suggested, I have set input_interval="21600" because I created the SST and sea-ice update file at a 6-hour interval. I have also changed config_bucket_update to '24:00:00'. With these changes, the model integration stopped partway through and produced only "diag.2006-05-01_00.00.00.nc" and "history.2006-05-01_00.00.00.nc".
Later on, I changed config_bucket_update to 'none', but I am still getting the same result.

These are the elements in my "mpas_atmos" working directory:

atmosphere_model core.34248 core.61610
atmos_stderr core.34249 core.61611
atmos_stdout core.34250 core.61612
build_tables core.34251 core.61613
CAM_ABS_DATA.DBL core.34252 core.61614
CAM_AEROPT_DATA.DBL core.34253 core.61615
core.112282 core.34254 core.61616
core.112283 core.34255 core.61617
core.112284 core.34256 core.61618
core.112285 core.34257 core.61619
core.112286 core.34258 core.61620
core.112287 core.34259 core.61621
core.112288 core.34260 core.61622
core.112289 core.34261 default_inputs
core.112290 core.34262 diag.2006-05-01_00.00.00.nc
core.112291 core.54602 GENPARM.TBL
core.112292 core.54603 history.2006-05-01_00.00.00.nc
core.112293 core.54606 init.nc
core.112294 core.54607 LANDUSE.TBL
core.112295 core.54608 log.atmosphere.0000.out
core.112296 core.54609 namelist.atmosphere
core.112297 core.54610 OZONE_DAT.TBL
core.143423 core.54611 OZONE_LAT.TBL
core.143424 core.54612 OZONE_PLEV.TBL
core.143425 core.54613 RRTMG_LW_DATA
core.143426 core.54614 RRTMG_LW_DATA.DBL
core.143427 core.54615 RRTMG_SW_DATA
core.143428 core.54616 RRTMG_SW_DATA.DBL
core.143429 core.54617 run_mpas_init_smp.qsub
core.143430 core.54962 sfc_update.nc
core.143431 core.54963 SOILPARM.TBL
core.143432 core.54964 static.nc
core.143433 core.54965 stream_list.atmosphere.diagnostics
core.143434 core.54966 stream_list.atmosphere.output
core.143435 core.54967 stream_list.atmosphere.surface
core.143436 core.54968 streams.atmosphere
core.143437 core.54969 streams.init_atmosphere
core.143438 core.54970 testing_and_setup
core.25350 core.54971 VEGPARM.TBL
core.25356 core.54972 x4.163842.graph.info.part.12
core.25357 core.54973 x4.163842.graph.info.part.128
core.25359 core.54975 x4.163842.graph.info.part.192
core.25360 core.61607 x4.163842.grid.nc
core.25361 core.61608
core.34247 core.61609


This is tail -f log.atmosphere.0000.out:

Timing for stream input: 10.0579418210000 s
----------------------------------------------------------------------
--- time to update background surface albedo, greeness fraction.
--- time to run the LW radiation scheme L_RADLW =T
--- time to run the SW radiation scheme L_RADSW =T
--- time to run the convection scheme L_CONV =T
--- time to update the ozone climatology for RRTMG radiation codes
--- time to apply limit to accumulated rainc and rainnc L_ACRAIN =F
--- time to apply limit to accumulated radiation diags. L_ACRADT =F
--- time to calculate additional physics_diagnostics =F

This is the content of atmos_stderr:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Thank you very much for your support
 
It's good to hear that setting the "input_interval" for the "surface" stream helped. We have been debating whether to add a check in the code to ensure that a valid input_interval is set whenever config_sst_update is enabled in the namelist.atmosphere file; hopefully this will save others from encountering this issue in the future.

Which compilers have you used when building MPAS-Atmosphere? If you do happen to be using the Intel compilers (ifort + icc), it may help to try unlimiting the stack size; in the bash shell, you can try
Code:
ulimit -s unlimited
In the csh or tcsh shells, you can try
Code:
limit stacksize unlimited
The Intel compilers tend to generate code that aggressively allocates memory on the stack, and this can lead to model crashes if the default stack size is inadequate.

If you are not using the Intel compilers, though, could you say how much memory you have on each node of your machine, and how many nodes you're using with the 60-km simulation?
 
Dear mgduda,
thank you for your quick reply

My MPAS-Atmosphere was built with the Intel compilers. As you recommended, I tried unlimiting the stack size using "ulimit -s unlimited"; in fact, it is already in my ".bashrc" file. But I am getting the same result.

Concerning the memory on each node of my machine, these are some of its characteristics: it has 1368 compute nodes with 24 cores and 128 GiB of memory each (360 nodes have only 64 GiB), and five large-memory "fat" nodes with 56 cores and 1 TiB each, all interconnected using FDR 56 Gb/s InfiniBand and accessing 4 PB of shared storage over the Lustre filesystem.

For my 60-km uniform run, I am using 8 nodes with ncpus=16:mpiprocs=16; 8*16 = 128 MPI tasks, which corresponds to "x4.163842.graph.info.part.128". That's what I did.

I am new to MPAS and there are some technical aspects I don't know yet.
 
Thanks very much for the complete summary of the nodes. I think 8 nodes with 128 GiB/node should provide more than enough memory for a 60-km simulation. Do you know whether your ".bashrc" file is being sourced when batch jobs are run?
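If you're unsure, one quick way to find out is to print the stack limit from inside a batch job itself (a sketch; the scheduler details are an assumption based on the run_mpas_init_smp.qsub script in your directory listing):
Code:
```shell
#!/bin/bash
# Hypothetical check script: submit this through your usual qsub workflow
# and inspect the job's output file. If the limit printed here is not
# "unlimited", then .bashrc is not taking effect in batch jobs, and
# "ulimit -s unlimited" should be added to the job script itself,
# before the line that launches atmosphere_model.
echo "stack limit inside job: $(ulimit -s)"
```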

It might be worth trying to clean and re-compile the model with DEBUG=true to see whether that generates any useful error messages:
Code:
make clean CORE=atmosphere
make ifort CORE=atmosphere DEBUG=true
 
Dear mgduda,
Sorry for the late reply. I have been working on the model integration issue. Finally, I found that the problem was the way I produced my intermediate files from the CFSR data. A colleague helped me produce the intermediate files, and the model is running very well now.

Thank you very much
 
Thanks so much for following up! It's helpful to know that the issue might have been in the initial conditions. Perhaps in a future release of MPAS-Atmosphere we can add some diagnostics to check the ranges of values of input fields to help in identifying similar issues.
 
Dear mgduda,
I would like to thank you for your assistance. How can one process precipitation from the MPAS model when the config_bucket_update option is set to 'none'? Thank you in advance.
 
I just posted a reply in your other thread in the Post-processing section. Please don't hesitate to follow up there if there are any details that I can help to clarify. We're glad to help!
 