WRFDA 4D-VAR running error

Polly_LO

New member
Good day!
I am having trouble running my WRFDA 4D-VAR case, and I am looking for someone with more experience running this module.

For information:
I use CentOS 7.
I have successfully compiled 3D-VAR (4.1.3), WRFPLUS and 4D-VAR using the GNU dmpar configuration. I built them against netcdf-c-4.9.2, netcdf-fortran-4.6.1, hdf5-1.10.5, zlib-1.2.13, jasper-1.900, libpng-1.6.37 and mpich-3.3.1.
The 3D-VAR test ran successfully, and there were no errors in the compile log files.

When I run my case on 8 processors, I do not get any direct error message, but the da_wrfvar.exe process is terminated after some time.
rsl.error ends with:
...
Timing for main: time 2023-07-30_17:58:30 on domain 1: 1.68950 elapsed seconds
Timing for main: time 2023-07-30_18:00:00 on domain 1: 1.80228 elapsed seconds
Swap time: <2023-07-30_12:00:00>and: <2023-07-30_18:00:00>
Swap time: <2023-07-30_13:00:00>and: <2023-07-30_17:00:00>
Swap time: <2023-07-30_14:00:00>and: <2023-07-30_16:00:00>
wrf: calling adjoint integrate

rsl.out ends with:
...
Timing for main: time 2023-07-30_17:58:30 on domain 1: 1.68950 elapsed seconds
Timing for main: time 2023-07-30_18:00:00 on domain 1: 1.80228 elapsed seconds
Calculate innovation vector(iv)
..
Minimize cost function using CG method
..
Swap time: <2023-07-30_12:00:00>and: <2023-07-30_18:00:00>
Swap time: <2023-07-30_13:00:00>and: <2023-07-30_17:00:00>
Swap time: <2023-07-30_14:00:00>and: <2023-07-30_16:00:00>
wrf: calling adjoint integrate
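
Since each MPI task writes its own rsl.error.NNNN file, comparing the last lines of all of them can show whether every task stopped at the same step. This is a minimal sketch, assuming the rsl files are in the run directory; `show_rsl_tails` is a hypothetical helper name, not part of WRFDA.

```shell
# Print the last few lines of each per-task log so that a task that
# stopped earlier (or later) than the others stands out.
# show_rsl_tails is a hypothetical helper; 3 lines is an arbitrary default.
show_rsl_tails() {
    dir="${1:-.}"   # directory holding the rsl.error.* files
    n="${2:-3}"     # number of trailing lines to show per file
    for f in "$dir"/rsl.error.*; do
        [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
        echo "== $f"
        tail -n "$n" "$f"
    done
}
```

For example, `show_rsl_tails . 3` run from the 4D-VAR working directory.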

The termination looks like:
starting wrf task 0 of 8
starting wrf task 1 of 8
starting wrf task 2 of 8
starting wrf task 3 of 8
starting wrf task 4 of 8
starting wrf task 6 of 8
starting wrf task 5 of 8
starting wrf task 7 of 8

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 68254 RUNNING AT servicenew
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
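
Signal 9 is SIGKILL, so the process was killed from outside rather than crashing by itself. On Linux, one common source of SIGKILL is the kernel out-of-memory killer. That is only an assumption here, since nothing in the logs above confirms it. A minimal sketch for checking it follows; `filter_oom_lines` is a hypothetical helper name.

```shell
# Signal 9 is SIGKILL; `kill -l 9` prints its name.
kill -l 9

# filter_oom_lines is a hypothetical helper: it keeps only those kernel-log
# lines that mention the out-of-memory killer. Feed it the kernel log, e.g.:
#   dmesg | filter_oom_lines        (may require root on some systems)
filter_oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process'
}
```

If the filter prints lines naming da_wrfvar.exe around the time of the failure, memory is the likely cause. If it prints nothing, something else sent the signal.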

Could it be related to the size and resolution of my domain, or to the input files (sound temp reports)? Or maybe I should use more processors?

I have attached my configuration file.
I would be very grateful for any ideas on how to solve this problem.
 

Attachments

  • namelist.input (1.7 KB)
  • rsl.error.0000 (29.2 KB)
  • rsl.out.0000 (31.3 KB)
  • wrfda_4d.log (929 bytes)

I have the same problem. Could you let me know if you solve it?

I found a solution that I cannot fully explain, but it worked both for the test case and for my own 4D-VAR assimilation run.
It looks like this:
> mpirun -np 2 ./da_wrfvar.exe >& wrfda_4d.log

The assimilation module (4D-VAR) ran correctly only on 2 processors.
This may be due to instability of my cluster's file system or, which seems more likely to me, to incorrect parallelization of the 4D-VAR code itself.
I would like to ask the WRFDA developers whether they have an example of 4D-VAR running stably on multiprocessor systems, with the code compiled for Intel systems.
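
To narrow down at which process count the run starts to fail, the same command can be repeated with several values of -np while recording the exit status of each run. This is only a sketch: `try_np_counts` and the `MPIRUN` override variable are hypothetical names introduced for illustration, and each real run may take a long time.

```shell
# try_np_counts is a hypothetical helper: it runs the given executable under
# mpirun once per requested process count and reports each exit status.
# MPIRUN is a hypothetical override for the launcher (defaults to mpirun).
try_np_counts() {
    exe="$1"; shift
    for np in "$@"; do
        status=0
        # One log file per process count, so earlier logs are not overwritten.
        "${MPIRUN:-mpirun}" -np "$np" "$exe" > "wrfda_4d.np${np}.log" 2>&1 || status=$?
        echo "np=$np exit=$status"
    done
}
```

For example, `try_np_counts ./da_wrfvar.exe 2 4 8` run from the 4D-VAR working directory.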
 