
real.exe can't run with met_em files generated from RegCM output

GewellLlorin

Hello!

This issue is a continuation of the process I described in a previous thread. To recap, I'm attempting to downscale the results of a RegCM model (HadGEM2-ES) by using its output files as the boundary conditions of a WRFv4.2 simulation. I used the Python pywinter library to write data from the RegCM NetCDF output files into the WPS intermediate file format, then ran metgrid.exe to generate the met_em files needed to run real.exe.

Initially, running real.exe with the RegCM-based met_em files gave the error "troubles, could not find trapping x locations." After reading the User Guide and forum threads on the same issue (particularly this one), I tried increasing the vertical resolution of the met_em files through linear interpolation, since other methods did not work (ptop_requested was already 5000, and the log files showed no NaN data). Following the pressure levels of a standard GFS-FNL file, I increased the original 13 levels to 26 in the met_em files and ran real.exe again. This time, it stopped in the middle of the PSFC computations with no error message. Setting etac to 0.1 in namelist.input changed nothing.
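A minimal sketch of the kind of level-refinement described above, for one column of one field. The post only says "linear interpolation"; interpolating linearly in ln(p) and holding the end values constant outside the source range are assumptions made here, and the function name interp_to_levels is hypothetical.

```python
import math
from bisect import bisect_left

def interp_to_levels(src_p, src_vals, dst_p):
    """Interpolate one vertical profile from src_p to dst_p (pressures in Pa).

    Assumption: interpolation is linear in ln(p). Target levels outside the
    source range are clamped to the nearest source value (no extrapolation).
    """
    # Sort the source profile by ascending ln(p) so bisect can be used.
    pairs = sorted(zip((math.log(p) for p in src_p), src_vals))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    out = []
    for p in dst_p:
        x = math.log(p)
        if x <= xs[0]:
            out.append(ys[0])
        elif x >= xs[-1]:
            out.append(ys[-1])
        else:
            k = bisect_left(xs, x)  # xs[k-1] < x <= xs[k]
            w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            out.append(ys[k - 1] + w * (ys[k] - ys[k - 1]))
    return out

# Example: temperature (K) on two levels, refined to a level halfway in ln(p).
mid_p = math.sqrt(100000.0 * 50000.0)
profile = interp_to_levels([100000.0, 50000.0], [300.0, 250.0], [mid_p])
```

Note that interpolation only adds levels; it cannot repair source values that are in the wrong units, so the inputs are worth checking first.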

I'm at a loss for what else to do to diagnose and resolve this issue. Any comments would be greatly appreciated. I've uploaded my namelist.input files, log files, and some sample met_em files to this drive for reference, due to their large file sizes.

Thank you for your time and insights!
 
Hi,
The model stopping may not have anything to do with the fact that you've modified the num_metgrid_levels. Looking at your namelist.input file, though, I can see that your domain sizes are entirely too small, which can cause problems, and because they are so small, there isn't enough buffer space between your coarse and fine-resolution domains. Take a look at this best practices page that discusses some of the recommendations for setting up your domain.
 
Hello,

Thank you so much for your advice, and apologies for the late reply. I've spent some time reading through the WPS best practices page, especially the sections relevant to setting up my domains. Following its recommendations, I've reworked my two domains to increase their size and provide a buffer between them. However, these runs then led to the error "forrtl: severe (174): SIGSEGV, segmentation fault occurred."

Research on this error (particularly this thread) as well as advice from my supervisors led me to try the following solutions:
  • executing ulimit -s unlimited and checking memory >> plenty of space was still available
  • reducing frames_per_outfile to 6 (my run time is only 18 hours)
  • checking namelist.wps for missing commas or wrong information >> nothing to report
  • changing time_step and parent_time_step_ratio according to DX >> time steps of 60, 30, 18, 15, and 10 all gave the same error, as did ratios of 1:2, 1:3, and 1:6
  • trying more domain configurations >> changing the grid ratio from 1:5 to 1:3 gave the same error, and even adding a new domain (1:3:3) was futile
  • specifying the 35 target eta_levels in namelist.input
Unfortunately, none of them worked; each gave the same error message, albeit with variations in the map factor (1.0 to 1.02) every time the domains were changed. In later iterations I had to reduce the grid dimensions again since our study area is the City of Manila. The RegCM data I'm downscaling has a 25 km resolution. The input RegCM data more than covers all the domains I've tried, and no data seems to be missing from any level. Currently, I'm comparing metgrid files generated from GFS-FNL (standard for our lab) with the ones from RegCM to check for missing data. Any more suggested courses of action would be appreciated.

I've uploaded some relevant files in the same drive I created for this thread, under the "segmentationError" folder. These include a few slides detailing the experimental runs with varying domains, and the diagnostic and input files for the latest simulation (#6).

Thank you for your time and insight!
 
I believe the problem now may be that you are only using a single processor. Take a look at this FAQ that discusses how to choose a reasonable number of processors, depending on the size of your domain.

If using more processors doesn't help, I'd recommend trying just the default namelist that comes with the new code (only modifying the settings specific to your domains and input data) to see if that runs. If it does, then you can slowly add in other options to see which one(s) is/are causing the issue.
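As a rough illustration of the processor question, the sketch below encodes a rule of thumb that is commonly quoted for WRF: at least (e_we/100) * (e_sn/100) and at most (e_we/25) * (e_sn/25) processors for a domain. Treat both divisors as assumptions and check the linked FAQ for the exact guidance; the function name processor_range is hypothetical.

```python
def processor_range(e_we, e_sn):
    """Return a (fewest, most) processor estimate for one domain.

    Assumption: the rule of thumb of roughly 100x100 grid points per
    processor at the low end and 25x25 grid points at the high end.
    """
    fewest = max(1, int((e_we / 100) * (e_sn / 100)))
    most = max(1, int((e_we / 25) * (e_sn / 25)))
    return fewest, most

# Example: a 100 x 100 domain.
fewest, most = processor_range(100, 100)
```

With nests, the usual advice is to take the lower bound from the largest domain and the upper bound from the smallest one.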
 
Hello!

Apologies again for the delayed reply. I was working on my other tasks while incorporating your suggestions. When using multiple processors and simplifying my runs did not help, I revisited my input files. I pinpointed them as the culprit when the exact same settings ran with sample GFS-FNL data without any problems. Much to my frustration, the problem all along was my assumption that the RegCM output used pascals, when it was apparently in hectopascals. Fixing that in the FILEs generation allowed me to run my simulations using my original domain configuration, expanded to 3 domains (I attached an overview image to this post).
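A unit mix-up like this can be caught before the intermediate files are written. The sketch below guesses hPa versus Pa from the magnitude of the values; the thresholds and the name to_pascals are illustrative assumptions, not part of pywinter or WPS, and the guess is only meaningful for data that includes near-surface pressures.

```python
def to_pascals(values):
    """Return pressures in Pa, guessing the input unit from its magnitude.

    Assumption: the data includes near-surface values, so a maximum below
    2000 means hPa and a maximum up to 120000 means Pa. Anything larger is
    rejected as implausible rather than silently rescaled.
    """
    vmax = max(values)
    if vmax < 2000.0:
        return [v * 100.0 for v in values]  # hPa -> Pa
    if vmax <= 120000.0:
        return [float(v) for v in values]   # already Pa
    raise ValueError(f"implausible pressure maximum: {vmax}")

# Example: standard levels given in hPa.
levels_pa = to_pascals([1000, 850, 500])
```

Raising on implausible values, instead of rescaling them, makes a double conversion (for example, multiplying values already in Pa by 100) fail loudly.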

However, I now know from the WPS best practices page that the size of the domains for this successful run is much too small. I then configured a new set of domains following the suggested guidelines. The new issue I run into is that real.exe once again suddenly stops in the middle of the run, despite working with the fixed met_em files. Checking the rsl.error file shows:

"d01 2005-01-02_18:00:00 Yes, this special data is acceptable to use: OUTPUT FROM METGRID V4.2
d01 2005-01-02_18:00:00 Input data is acceptable to use: met_em.d01.2005-01-02_18_00_00.nc
metgrid input_wrf.F first_date_input = 2005-01-02_18:00:00
metgrid input_wrf.F first_date_nml = 2005-01-02_00:00:00
d01 2005-01-02_18:00:00 Timing for input 0 s.
d01 2005-01-02_18:00:00 flag_soil_layers read from met_em file is 1
Missing surface RH, replaced with closest level, use_surface set to false.
Using sfcprs to compute psfc
forrtl: severe (174): SIGSEGV, segmentation fault occurred"


I've attached the namelist.input and rsl.error files below for reference. Any suggestions are much appreciated! So far, I've tried:
  1. Running with just 1/2 domains
  2. Changing time_step to 60/150
  3. Checking the met_em files for PSFC (present.)
  4. Experimenting with the feedback and smooth_option arguments
  5. Running with WRFv3.9 or on another server (currently in progress)

As always, thank you so much for your patience and insights!
 

Attachments

  • Progress Report #5 (LLORIN).pptx.jpg
  • namelist.input
  • rsl.error.0000
I know you said you checked the memory before - did you also check disk space? I think the namelist you attached may be your old one because domain 1's size is still small, and it's only running with a single domain. Can you attach the latest one - that matches the rsl file? Will you also go ahead and send a couple time periods of met_em* files so I can try to run a test? If they are too large to attach, see the home page of this forum for instructions on sharing large files. Thanks!
 
Hello,

The server I'm working on still has 700 GB of disk space available, which I believe is more than enough for my test runs.

Also, apologies for overlooking the number of domains in the namelist.input file; that one was from trying the run with a single domain. It is otherwise the same one I use for my 3-domain run, with the max_dom parameter just set to 3. I know it's not recommended for domains to be smaller than 100 x 100, but I'm greatly constrained by the domain of the RegCM output files given to me. I've attached a sample figure below to show its extent. I'll still try to expand my domain, maybe this time to fit the entirety of the available extent of the RegCM data.

In the meantime, I've uploaded the met_em files for 2005 January 2 as a zip file (metFiles.zip) to the same drive I made for this thread. This was faster than uploading to NextCloud; my apologies for using an alternative platform. I've also reattached the namelist.input file to this post with the max_dom parameter edited. I hope these are fine.

Thank you very much for your help!
 

Attachments

  • 307888203_492703599388072_4902527239596688559_n.png
  • namelist.input
Thanks for sending those. It's helped a lot. The bottom line is that I believe some things are wrong with your input files. I do believe you need more than 13 num_metgrid_levels. I'm not sure what the magic number is, but the lowest I've seen is from GFS/FNL, like you were trying earlier. I think the reason that didn't work for you is that there are still other issues. I also ran a single-domain case, and this is what I found.

Looking at the rsl.out.* files, I found this at the bottom of some:

Code:
 all_dim =            1
 order =            1
 i,j =           56          50
 p array =    9.210340
 f array =   0.0000000E+00
 p target=    11.50988       11.50317       11.49459       11.48370
   11.46995       11.45273       11.43137       11.40517       11.37348
   11.33572       11.29152       11.24074       11.18352       11.12014
   11.05044       10.97377       10.88946       10.79733       10.70122
   10.60512       10.50901       10.41290       10.31679       10.22068
   10.12457       10.02846       9.932343       9.836230       9.740117
   calculating min and maxes for SM000010.      9.355664       9.259551

I'm not sure why it stops there and doesn't continue to print the next part of the coded message, but this is what should have printed after that, according to the code in dyn_em/module_initialize_real.F:

Code:
      IF ( all_dim .LT. n+1 ) THEN
print *,'all_dim = ',all_dim
print *,'order = ',n
print *,'i,j = ',i,j
print *,'p array = ',all_x
print *,'f array = ',all_y
print *,'p target= ',target_x
         CALL wrf_message ( 0 , 'Troubles, the interpolating order is too large for this few input values' )
         CALL wrf_message ( 0 , 'This is usually caused by bad pressures' )
         CALL wrf_message ( 0 , 'At this (i,j), look at the input value of pressure from metgrid' )
         CALL wrf_message ( 0 , 'The surface pressure and the sea-level pressure should be reviewed, also from metgrid' )
         CALL wrf_message ( 0 , 'Finally, ridiculous values of moisture can mess up the vertical pressures, especially aloft' )
         CALL wrf_message ( 0 , 'The variable type is ' // var_type // '. This is not a unique identifer, but a type of field' )
         CALL wrf_message ( 0 , 'Check to see if all time periods with this data fail, or just this one' )
         CALL wrf_error_fatal ( 'This vertical interpolation failure is more typically associated with untested data sources to ungrib' )
      END IF

So this made me look into both the pressure and moisture fields in your files. The i,j location in the rsl file is 56,50. I notice the pressure value (PRES) for that location, at the bottom level is 10,188,300 Pa (101,883 hPa), which is obviously unrealistic. Similar values are evident over the entire domain. Then I looked at soil moisture. The values for all your soil moisture (the 3d "SM" value and 2d "SM*****" values) all range beyond 1, and the value should be a fraction, where 1 is the greatest value and fully saturated (e.g., a point in the ocean).

So I'm not exactly sure how this happened, but it likely stems from the conversion of your data file type. I would recommend going back over those and re-checking everything. There may be issues with other variables, as well, but I didn't check others. Try to just create a file for a single time period and a single domain to start with (to eliminate extra work) and then view the met_em* file values using something like "ncview." If you have questions about what some fields are, you can find them in the Registry/Registry.EM_COMMON file, with descriptions and units.
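The checks described above (pressure magnitudes and soil moisture fractions) can also be scripted. The sketch below works on values that have already been read from a met_em file and flattened into plain lists; reading the file is not shown. The limits in LIMITS and SM_LIMITS are illustrative assumptions, not values taken from WRF, and check_fields is a hypothetical helper.

```python
# Illustrative plausibility limits (assumptions, not WRF-defined values).
LIMITS = {
    "PRES": (100.0, 110000.0),    # Pa, 3D pressure
    "PSFC": (40000.0, 110000.0),  # Pa, surface pressure
    "PMSL": (85000.0, 110000.0),  # Pa, sea-level pressure
}
SM_LIMITS = (0.0, 1.0)            # soil moisture as a volumetric fraction

def limits_for(name):
    """Return (low, high) for a field name, or None if it is not checked."""
    # Matches the 3D "SM" field and layer fields such as "SM000010".
    if name == "SM" or (name.startswith("SM") and name[2:].isdigit()):
        return SM_LIMITS
    return LIMITS.get(name)

def check_fields(fields):
    """fields maps a field name to a flat list of its values."""
    problems = []
    for name, values in fields.items():
        lim = limits_for(name)
        if lim is None:
            continue
        lo, hi = min(values), max(values)
        if lo < lim[0] or hi > lim[1]:
            problems.append(
                f"{name}: range [{lo:g}, {hi:g}] outside expected "
                f"[{lim[0]:g}, {lim[1]:g}]"
            )
    return problems

# Example using the bottom-level pressure reported in this thread.
report = check_fields({
    "PRES": [10188300.0, 5000000.0],
    "PSFC": [100900.0],
    "SM000010": [0.30, 0.45],
})
```

Running such a check on a single time period and a single domain keeps the turnaround short while the conversion is being debugged.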
 
Hi again,

Thank you so much for taking the time to check my files. I looked at them again as advised and realized that while I fixed the pressure levels to be in pascals instead of hectopascals, I still hadn't checked the actual values for the fields I was using for PSFC and MSLP. As it turns out, the RegCM ps data (surface pressure) was in pascals, but psl (sea level pressure) was in hectopascals. The data I was using for soil moisture was also in kg per square meter, instead of the cubic meter per cubic meter expected by real.exe. After fixing those, real.exe ran just fine.
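For reference, a minimal sketch of the two conversions mentioned above. The post does not say how the soil moisture was converted; dividing the mass per unit area by the density of water times the soil layer thickness is the assumption made here, and the layer thickness has to come from the source model's soil layer definition. Both function names are hypothetical.

```python
RHO_WATER = 1000.0  # kg m-3, density of liquid water

def hpa_to_pa(p_hpa):
    """Convert a pressure from hectopascals to pascals."""
    return p_hpa * 100.0

def volumetric_soil_moisture(mass_kg_m2, layer_thickness_m):
    """Convert soil moisture from kg m-2 to m3 m-3.

    Assumption: the mass is the liquid water held in a soil layer of the
    given thickness, so theta = mass / (rho_water * thickness).
    """
    if layer_thickness_m <= 0.0:
        raise ValueError("layer thickness must be positive")
    return mass_kg_m2 / (RHO_WATER * layer_thickness_m)

# Example: 30 kg m-2 of water in a 0.1 m layer, and a sea-level pressure in hPa.
theta = volumetric_soil_moisture(30.0, 0.1)
psl_pa = hpa_to_pa(1012.5)
```

The result should fall between 0 and 1; a value above 1 suggests the wrong layer thickness or a field that covers more than one layer.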
 