dashline
I am running an ensemble integration, and I added perturbations to the variables in the restart files. When I cycle the model (120-km uniform mesh) from these restart files, I get the following error:
[cn2083:677120:0:677120] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 677120) ====
0 0x00000000000534c9 ucs_debug_print_backtrace() ???:0
1 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
2 0x0000000002ac551f __libm_powf_e7() ???:0
3 0x0000000000cc2078 atm_time_integration_mp_atm_recover_large_step_variables_work_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:3027
4 0x0000000000cb9ea5 atm_time_integration_mp_atm_recover_large_step_variables_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:2896
5 0x0000000000c2c6ec atm_time_integration_mp_atm_srk3_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:917
6 0x0000000000c0303c atm_time_integration_mp_atm_timestep_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:121
7 0x0000000000bc1e44 atm_core_mp_atm_do_timestep_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/mpas_atm_core.F:873
8 0x0000000000bbd20d atm_core_mp_atm_core_run_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/mpas_atm_core.F:664
9 0x0000000000419bae mpas_subdriver_mp_mpas_run_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/driver/mpas_subdriver.F:347
10 0x0000000000414c6a MAIN__() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/driver/mpas.F:16
11 0x0000000000414be2 main() ???:0
12 0x0000000000023493 __libc_start_main() ???:0
13 0x0000000000414aee _start() ???:0
=================================
forrtl: error (75): floating point exception
Image PC Routine Line Source
libnetcdf.so.19.0 0000151B003E891C for__signal_handl Unknown Unknown
libpthread-2.28.s 0000151AFDC3EB20 Unknown Unknown Unknown
atmosphere_model 0000000002AC551F Unknown Unknown Unknown
atmosphere_model 0000000000CC2078 atm_time_integrat 3027 mpas_atm_time_integration.F
atmosphere_model 0000000000CB9EA5 atm_time_integrat 2896 mpas_atm_time_integration.F
atmosphere_model 0000000000C2C6EC atm_time_integrat 917 mpas_atm_time_integration.F
atmosphere_model 0000000000C0303C atm_time_integrat 121 mpas_atm_time_integration.F
atmosphere_model 0000000000BC1E44 atm_core_mp_atm_d 873 mpas_atm_core.F
atmosphere_model 0000000000BBD20D atm_core_mp_atm_c 664 mpas_atm_core.F
atmosphere_model 0000000000419BAE mpas_subdriver_mp 347 mpas_subdriver.F
atmosphere_model 0000000000414C6A MAIN__ 16 mpas.F
atmosphere_model 0000000000414BE2 Unknown Unknown Unknown
libc-2.28.so 0000151AFD508493 __libc_start_main Unknown Unknown
atmosphere_model 0000000000414AEE Unknown Unknown Unknown
yhrun: error: cn2083: task 120: Aborted (core dumped)
yhrun: First task exited 60s ago
yhrun: StepId=628335.0 tasks 0-119,121-127: running
yhrun: StepId=628335.0 task 120: exited abnormally
yhrun: launch/slurm: _step_signal: Terminating StepId=628335.0
yhrun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 628335.0 ON cn2080 CANCELLED AT 2022-09-19T19:33:01 ***
yhrun: error: cn2082: tasks 64-85,87-95: Killed
yhrun: error: cn2082: task 86: Killed
yhrun: error: cn2081: tasks 32-63: Killed
yhrun: error: cn2080: tasks 0-31: Killed
yhrun: error: cn2083: tasks 96-103,105-119,121-127: Killed
yhrun: error: cn2083: task 104: Killed
This is my log.out file; there is no log.err file. In log.out I found "global min, max u -1633.39 1748.76". These wind speeds do not look physically reasonable. Is this a problem with my restart files?
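A quick way to follow up on the suspicious `u` range is to scan the log for every `global min, max u` line and flag the first one that is non-finite or outside a plausible bound. The sketch below is a minimal Python example. It assumes the log lines have exactly the format of the quoted `global min, max u -1633.39 1748.76`. The 150 m/s threshold and the helper names `flag_wind_extremes` and `power_is_real` are hypothetical choices for illustration, not MPAS settings. The second helper relates to the backtrace, where the crash happens inside `__libm_powf_e7` called from `atm_recover_large_step_variables_work`. Raising a negative base to a non-integer power has no real result, and IEEE arithmetic signals this as an "invalid operation". Whether that is what happened at line 3027 is an assumption here, not something the log confirms.

```python
import math
import re

# Matches lines such as "global min, max u -1633.39 1748.76".
# The exact log format is an assumption based on the single line quoted above.
U_RANGE = re.compile(r"global min, max u\s+(\S+)\s+(\S+)")


def flag_wind_extremes(lines, limit=150.0):
    """Return (line_number, umin, umax) for each 'global min, max u' line whose
    values are NaN/Inf, unparsable, or exceed `limit` in magnitude (m/s).
    The 150 m/s default is a hypothetical bound, not an MPAS setting."""
    flagged = []
    for lineno, line in enumerate(lines, start=1):
        match = U_RANGE.search(line)
        if match is None:
            continue
        try:
            umin, umax = float(match.group(1)), float(match.group(2))
        except ValueError:
            # Unparsable numbers (e.g. Fortran '*****' overflow output) are suspicious too.
            flagged.append((lineno, match.group(1), match.group(2)))
            continue
        if not (math.isfinite(umin) and math.isfinite(umax)):
            flagged.append((lineno, umin, umax))
        elif max(abs(umin), abs(umax)) > limit:
            flagged.append((lineno, umin, umax))
    return flagged


def power_is_real(base, exponent):
    """math.pow raises ValueError for a negative base with a non-integer exponent,
    the same domain error that IEEE pow reports as an 'invalid operation'."""
    try:
        math.pow(base, exponent)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    log_lines = [
        "global min, max u -1633.39 1748.76",  # line quoted in the post
        "an unrelated log line",
    ]
    print(flag_wind_extremes(log_lines))  # [(1, -1633.39, 1748.76)]
    print(power_is_real(-0.5, 0.4))       # False
    print(power_is_real(0.5, 0.4))        # True
```

To check a whole run, pass the open file directly, e.g. `with open("log.out") as f: print(flag_wind_extremes(f))`. The earliest flagged line shows when the state first became unphysical, which helps separate a bad perturbed restart file (flagged at the first step) from an instability that grows later.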