Observation nudging in a nest

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

wallis

Member
I am setting up observation nudging in my nested domains. Currently I am generating little_r files and everything works well as long as I don't activate obs nudging on the nest with multiple MPI processes.
If I run with >2 MPI processes and have an OBS_DOMAIN201 file present, the model crashes during initialization at module_dm:5527:
DO N = 1, NSTA
ERRF(1,IFULL_BUFFER(N)) = FULL_BUFFER(N)
END DO

If I run with 1 MPI process everything works fine. If I run nudging on only the outer domain with OBS_DOMAIN101, everything works fine. Please let me know if you have some ideas.

Namelist:
&time_control
<snip>
auxinput11_interval = 1,1,
/

&fdda
obs_nudge_opt = 1,1
max_obs = 10000
fdda_start = 0,0
fdda_end = 360,360
obs_twindo = 0.5,0.5
obs_ipf_errob = .true.
obs_prt_max = 100
obs_prt_freq = 2
obs_ipf_nudob = .true.
obs_ionf = 10
rinblw = 250,250
obs_ipf_init = .true.
/
 
Interestingly, if I set numtiles = 1, I am able to run with multiple MPI processes. With any value higher than 1, it crashes.
So I suspect there is a problem with the tiling/MPI code, which seems eerily similar to https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=45&t=5391
 
To make this weirder: if it's compiled with dm+sm, it works; if it's compiled with dm alone, it doesn't.
Something is definitely wrong in the tiling in fdda.
 
Hi,
Is this essentially the same problem as you posted about here?
https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=45&t=9490&p=18195#p18195

If so, let me know and we can just work from this post from here on out. In the future, please keep each problem to a single post, as it gets confusing for us and for other users to follow slightly different versions of the same issue.

As I replied to the other post, can you let me know the version of the model you are running, and please attach the following:
1) namelist.input
2) the full error log (e.g., rsl.error.0000)
3) Issue the following commands (from the directory where you are running wrf):
Code:
ls -ls >& ls.log
ncdump -v Times wrffdda_d01 >& d01.log
ncdump -v Times wrffdda_d02 >& d02.log
and please also attach those 3 *.log files. If you need instructions on attaching log files, please see the home page for this forum. Thanks!
 
Hi,

Thanks for looking into this. I am only trying to do obs nudging (obs_nudge_opt), not grid nudging currently, so I think wrffdda isn't being used(?).

As for this issue though, see attached as requested.
Version is WRF 4.2.1

The linked post is a different problem. I have since discovered that that particular error can be avoided by disabling DFI, but I haven't investigated it any further yet. I will update that post, including the title, with my latest findings and debugging shortly.
 

Attachments: logs.tgz (54.9 KB)
Thanks for sending those. You are correct that you will not have wrffdda* files for observation nudging. I apologize for the miscommunication.

I just ran a test with V4.2.1 compiled with the dmpar option, and used a nested case with obs_nudge_opt = 1, 1 (for both domains). This ran without any problems. So that I can test your case, can you attach (or upload if they are too large - see home page of the forum for uploading large files) the following files:

OBS_DOMAIN101
OBS_DOMAIN201
wrfinput_d01
wrfinput_d02
wrfbdy_d01
namelist.input

Thanks!
 
I have made some progress: setting max_obs to extreme values seems to avoid the crash. A value >1000000 is needed even though I only have a few hundred observations. Is max_obs possibly meant to be set to num_observations * MPI processes * tiles? I am running with >60 MPI processes and >30 tiles.
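To put a number on that guess, here is a minimal sketch of the arithmetic. The formula is only the hypothesis from this post, not documented WRF behaviour, and the inputs (600 observations, 60 processes, 30 tiles) are illustrative values chosen to sit near the "few hundred", ">60" and ">30" figures mentioned above. The function name is made up for this example.

```python
def hypothesized_max_obs(num_observations, mpi_processes, tiles):
    """Buffer size implied by the guess: obs * MPI processes * tiles.

    This is an unconfirmed hypothesis, not a rule taken from the WRF docs.
    """
    return num_observations * mpi_processes * tiles


# Illustrative inputs only; the real counts are not given exactly in the post.
estimate = hypothesized_max_obs(num_observations=600, mpi_processes=60, tiles=30)
print(f"max_obs under this hypothesis: {estimate}")
```

With these made-up inputs the product is a little over one million, which is at least the same order of magnitude as the value needed to avoid the crash. That does not confirm the hypothesis.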
Unfortunately even though the model seems to run successfully, and the rsl.out files indicate nudging is happening, no nudging seems to actually be taking place.
You can test this by running once with obs_nudge_opt = 0,0 and once with 1,1, and observing that the T2 field is bit-for-bit identical in both outputs.
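For anyone wanting to repeat that check, below is a rough sketch of a bit-for-bit comparison in plain Python. It assumes the T2 values from each run have already been read into flat lists of floats (for example with a netCDF reader, not shown here) and that the field is stored in single precision. The helper names are hypothetical.

```python
import struct


def float32_bytes(values):
    """Pack floats as little-endian float32 (assumed output precision)."""
    values = list(values)
    return struct.pack(f"<{len(values)}f", *values)


def bitwise_identical(a, b):
    """True only if both sequences have exactly the same float32 bit patterns."""
    return float32_bytes(a) == float32_bytes(b)


def first_difference(a, b):
    """Index of the first element whose float32 bits differ, or None if none do."""
    for i, (x, y) in enumerate(zip(a, b)):
        if struct.pack("<f", x) != struct.pack("<f", y):
            return i
    # All shared elements match; report a length mismatch if there is one.
    return None if len(a) == len(b) else min(len(a), len(b))


# Made-up values standing in for T2 from the two runs.
t2_no_nudge = [280.5, 281.25, 279.0]
t2_nudged = [280.5, 281.25, 279.0]
print(bitwise_identical(t2_no_nudge, t2_nudged))
```

Comparing packed bytes rather than using a tolerance is deliberate: a tolerance-based check could hide a very small nudging increment, whereas this only reports a match when nothing changed at all.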
I have uploaded the files as demo.tgz
 
Thanks for sending that. I'm glad you got past the initial problem. As for the new problem with no nudging, are you able to get it to nudge when you only run d01, or does it not work either way?

What type of build are you using - dmpar, dm+sm, smpar?
 