
FATAL CALLED FROM FILE: <stdin> LINE: 1591 WARNING: Extreme t_soisno at c, level 1 1

Tbahaga
Dear All,

I am using WRF-ARW v4.4. My WRF simulation has repeatedly been interrupted with the following error in the rsl file.

FATAL CALLED FROM FILE: <stdin> LINE: 1591
WARNING: Extreme t_soisno at c, level 1 1

and in my_wrf.log:
MPI_ABORT was invoked on rank 160 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------

I am currently downscaling a CFSv2 hindcast simulation over the East Africa region with physics_suite = 'tropical' and the lake model activated.

I couldn't find an existing discussion of this issue, so I would appreciate your technical support.

Best Regards,
Titike Bahaga
 
Hi,
Can you please attach your namelist.input file, as well as all of your rsl* files? You can package the rsl* files into a single *.tar file. Thanks!
 
Dear Kwerner,

Thank you for getting in touch. I've been eagerly awaiting a response on this issue; apologies for the delayed reply. As requested, I have attached my namelist.input file and a zip containing all the rsl* files for your reference.

Best regards,
Titike.
 

Attachments

  • namelist.input (6.3 KB)
  • rsl.zip (1.2 MB)
Hi,

Are you using a queueing system and submitting a batch script to run this? If so, is it possible that you ran out of wall-clock time on your shared system?

Otherwise, it looks like you may be running WRF-Hydro. If that's the case, see Questions related to WRF-Hydro.
 
Hi,
I am running the simulation by submitting a batch script; I have checked with the HPC admin, and it has nothing to do with wall-clock time. Also, I am not running WRF-Hydro, although some of my output will be used as input for a WRF-Hydro simulation.

The problem seems related to a sanity check in module_sf_lake.f90 that ensures the variable t_soisno (the snow/soil temperature) stays within a physically reasonable range. Kindly check again.
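For illustration, the kind of bounds check that aborts a run when a variable goes out of range looks roughly like this. This is a Python sketch, not the actual Fortran code in module_sf_lake.f90; the threshold values and the function name are assumptions.

```python
# Illustrative sketch of a plausibility check like the one that prints
# "WARNING: Extreme t_soisno ..." -- thresholds here are assumptions.
TFRZ = 273.15                              # freezing point of water [K]
T_MIN, T_MAX = TFRZ - 80.0, TFRZ + 50.0    # hypothetical plausibility bounds

def check_t_soisno(t_soisno, column, level):
    """Raise (i.e., abort the run) if the snow/soil temperature is extreme."""
    if not (T_MIN <= t_soisno <= T_MAX):
        raise RuntimeError(
            f"WARNING: Extreme t_soisno at c, level {column} {level}"
        )
    return t_soisno

check_t_soisno(275.0, 1, 1)   # a normal mid-latitude value passes silently
```

When a check like this fires, the interesting question is what drove t_soisno out of range upstream (bad initial lake/soil fields, numerical instability, etc.), not the check itself.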
 
Hi,

Your namelist.input looks fine to me, except that you turn on many options at once, which makes it harder to debug what is wrong.

Did the case crash immediately? If not, how long did it run before crashing?

Can you run a single-domain case first? Please also turn off the lake model. If that works, add the lake model back and rerun the case. These tests will tell us whether the issue is caused by the lake model.

If the model keeps crashing, please recompile WRF in debug mode, i.e., ./clean -a and ./configure -D. In debug mode you can find exactly when and where the error first appears, which will give you hints for solving the issue.
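The rebuild steps above look roughly like this (assuming a standard WRF source tree; the em_real compile target is an assumption based on this being a real-data run):

```shell
# Rebuild WRF with debugging enabled (bounds checks, traceback, -g).
cd WRF                 # path to your WRF source tree (assumed)
./clean -a             # remove all previous build artifacts and configuration
./configure -D         # pick your platform as before, but with debug flags
./compile em_real >& compile_debug.log   # recompile; inspect the log for errors
```

Debug builds run much slower, so use them only to localize the crash, then switch back to an optimized build.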
 
Hi,

I work on the same project/machine but process a different time period (the MAM season), and I have exactly the same issue (the error on the same line number). It happens both with the default physics and with the 'tropical' suite we use. It appears to crash at random points in time: I have had it crash after 2 hours, after 15 hours, and, in the last run, after only 1 minute. I have also had runs of around 30 hours that did not crash. I am using the pre-installed module WRF/4.4-foss-2022a-dmpar.

The wall-time limit is set to 44 hours. Could it be low on resources in some way, e.g. memory?
I run in the "normal" queue with 8-10 nodes and 32 tasks per node. The system has 996 nodes, each with 32 CPUs and 59 GiB RAM, i.e. slightly less than 2 GiB per CPU. Is this normally enough?
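As a quick sanity check of the per-CPU figure above (the node specs are taken from the post; the arithmetic is just confirmation):

```python
# Per-CPU memory on the "normal" nodes described above.
node_ram_gib = 59.0    # RAM per node [GiB]
cpus_per_node = 32

gib_per_cpu = node_ram_gib / cpus_per_node
print(f"{gib_per_cpu:.2f} GiB per CPU")   # prints "1.84 GiB per CPU"
```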

I could try the 'bigmem' queue with 494 GiB RAM, which is meant for jobs needing more than 4 GiB per CPU. The documentation says: "For bigmem jobs, the queue system hands out cpus and memory, not whole nodes.", so I am a bit skeptical about performance.
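If I do try bigmem, I assume the request would look roughly like this (a SLURM sketch; the partition name and sizes are assumptions based on the documentation quoted above):

```shell
#!/bin/bash
#SBATCH --job-name=wrf_bigmem
#SBATCH --partition=bigmem      # hypothetical partition name from the docs
#SBATCH --ntasks=256            # e.g. 8 nodes x 32 tasks, as in the normal runs
#SBATCH --mem-per-cpu=4G        # bigmem hands out cpus and memory, not whole nodes
#SBATCH --time=44:00:00         # current wall-time limit

srun ./wrf.exe
```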

If I rebuild WRF with the debug configuration, should I build the latest version (4.6), and do I need to update and rerun WPS as well?

Thank you.
 