Floating point exception: floating-point invalid operation

dashline

Member
I am running an ensemble integration and have added perturbations to the variables in the restart files. When I cycle the model (120-km uniform mesh) from these restart files, I get the following error:
[cn2083:677120:0:677120] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 677120) ====
 0 0x00000000000534c9 ucs_debug_print_backtrace() ???:0
 1 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
 2 0x0000000002ac551f __libm_powf_e7() ???:0
 3 0x0000000000cc2078 atm_time_integration_mp_atm_recover_large_step_variables_work_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:3027
 4 0x0000000000cb9ea5 atm_time_integration_mp_atm_recover_large_step_variables_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:2896
 5 0x0000000000c2c6ec atm_time_integration_mp_atm_srk3_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:917
 6 0x0000000000c0303c atm_time_integration_mp_atm_timestep_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/dynamics/mpas_atm_time_integration.F:121
 7 0x0000000000bc1e44 atm_core_mp_atm_do_timestep_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/mpas_atm_core.F:873
 8 0x0000000000bbd20d atm_core_mp_atm_core_run_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/core_atmosphere/mpas_atm_core.F:664
 9 0x0000000000419bae mpas_subdriver_mp_mpas_run_() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/driver/mpas_subdriver.F:347
10 0x0000000000414c6a MAIN__() /fs1/home/tianxj/yhluo/MPAS73/MPAS-Model-7.3/src/driver/mpas.F:16
11 0x0000000000414be2 main() ???:0
12 0x0000000000023493 __libc_start_main() ???:0
13 0x0000000000414aee _start() ???:0
=================================
forrtl: error (75): floating point exception
Image              PC                Routine            Line     Source
libnetcdf.so.19.0  0000151B003E891C  for__signal_handl  Unknown  Unknown
libpthread-2.28.s  0000151AFDC3EB20  Unknown            Unknown  Unknown
atmosphere_model   0000000002AC551F  Unknown            Unknown  Unknown
atmosphere_model   0000000000CC2078  atm_time_integrat  3027     mpas_atm_time_integration.F
atmosphere_model   0000000000CB9EA5  atm_time_integrat  2896     mpas_atm_time_integration.F
atmosphere_model   0000000000C2C6EC  atm_time_integrat  917      mpas_atm_time_integration.F
atmosphere_model   0000000000C0303C  atm_time_integrat  121      mpas_atm_time_integration.F
atmosphere_model   0000000000BC1E44  atm_core_mp_atm_d  873      mpas_atm_core.F
atmosphere_model   0000000000BBD20D  atm_core_mp_atm_c  664      mpas_atm_core.F
atmosphere_model   0000000000419BAE  mpas_subdriver_mp  347      mpas_subdriver.F
atmosphere_model   0000000000414C6A  MAIN__             16       mpas.F
atmosphere_model   0000000000414BE2  Unknown            Unknown  Unknown
libc-2.28.so       0000151AFD508493  __libc_start_main  Unknown  Unknown
atmosphere_model   0000000000414AEE  Unknown            Unknown  Unknown
yhrun: error: cn2083: task 120: Aborted (core dumped)
yhrun: First task exited 60s ago
yhrun: StepId=628335.0 tasks 0-119,121-127: running
yhrun: StepId=628335.0 task 120: exited abnormally
yhrun: launch/slurm: _step_signal: Terminating StepId=628335.0
yhrun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 628335.0 ON cn2080 CANCELLED AT 2022-09-19T19:33:01 ***
yhrun: error: cn2082: tasks 64-85,87-95: Killed
yhrun: error: cn2082: task 86: Killed
yhrun: error: cn2081: tasks 32-63: Killed
yhrun: error: cn2080: tasks 0-31: Killed
yhrun: error: cn2083: tasks 96-103,105-119,121-127: Killed
yhrun: error: cn2083: task 104: Killed
My log.out file is attached; no log.err file was produced. In log.out I found "global min, max u -1633.39 1748.76", and this wind speed does not look normal. Is the problem in my restart file?
 

Attachments

  • log.atmos_mem025.out.txt
    10.5 KB · Views: 2
  • namelist.atmosphere.txt
    1.9 KB · Views: 1
Just to confirm, if you restart the model from one of your restart files with no perturbations, the restart simulation does run without issue, right?

Which variables are you perturbing, and what is the magnitude of the perturbations?
 
Thanks. The original, unperturbed restart file continues the simulation normally. I generated many ensemble members, and only two of them hit this error. The problem was indeed in my superimposed perturbations, and in the end I discarded the two restart files that contained the bad perturbations.
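
For anyone who runs into the same thing, here is a minimal sketch of the kind of pre-launch screening that could flag a bad member before the model is started. It is not part of MPAS. The function name `screen_member`, the `LIMITS` table, and every numeric limit are assumptions for illustration only, and the fields are passed in as flat lists of numbers that you would first read from the perturbed restart file with whatever NetCDF tool you use. `u` is the wind field reported in log.out; `theta` and `rho` are included as examples of fields that should stay positive, on the unconfirmed guess that the failing power function received a negative argument.

```python
import math

# Hypothetical plausibility limits per field: (lower, upper), both inclusive.
# All numbers are illustrative assumptions, not values taken from MPAS.
LIMITS = {
    "u": (-200.0, 200.0),      # m/s
    "theta": (150.0, 3000.0),  # K
    "rho": (1.0e-6, 2.0),      # kg/m^3
}


def screen_member(fields, limits=LIMITS):
    """Return a list of (field_name, reason) for every field that fails a check.

    fields maps a field name to a flat iterable of numbers.
    An empty list means the member passed all checks.
    """
    problems = []
    for name, values in fields.items():
        values = list(values)
        if not values:
            problems.append((name, "empty"))
            continue
        # NaN or Inf anywhere in the field is an immediate failure.
        if not all(math.isfinite(v) for v in values):
            problems.append((name, "non-finite value"))
            continue
        # Range check only for fields that have limits defined.
        if name in limits:
            lo, hi = limits[name]
            vmin, vmax = min(values), max(values)
            if vmin < lo or vmax > hi:
                reason = f"range [{vmin:g}, {vmax:g}] outside [{lo:g}, {hi:g}]"
                problems.append((name, reason))
    return problems


# Toy data: the "bad" member reuses the wind extremes quoted from log.out.
good = {"u": [-35.2, 12.0, 48.7], "theta": [288.0, 310.5], "rho": [1.1, 0.4]}
bad = {"u": [-1633.39, 1748.76], "theta": [288.0, float("nan")], "rho": [1.1, 0.4]}

print(screen_member(good))
print(screen_member(bad))
```

Members that return a non-empty list can be regenerated or discarded before any compute time is spent on them. The limits need tuning for your own mesh, model top, and perturbation method.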
 