
Slow Processing of Soil Fields with GEFS Reforecast-Generated Intermediate Files

This post was from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled. If you have follow-up questions related to this post, please start a new thread from the forum home page.


New member
I recently posted a topic about generating intermediate files from the GEFS reforecast set using two Python packages (PyGrib/PyWinter), and had to solve an interpolation error in that set. However, I have now stumbled onto a new issue. For some unknown reason, metgrid takes a very long time to process the soil parameters (ST/SM) from these generated intermediate files; even in HPC environments it can take minutes to process a single field with double-digit processor counts. As a comparative test, I ran the same domain using CFSv2 data processed via two ungrib runs, and those fields are processed almost immediately (WRF/WPS V4.2).

The only difference I'm noticing in the metgrid output is that, unlike the CFS run, each field is processed through all of its levels at once (i.e., TT @ 1000, TT @ 975, TT @ 950, and so on) instead of what I normally see, where all fields are processed level by level (i.e., TT @ 1000, UU @ 1000, VV @ 1000, and so on). Running rd_intermediate does not show anything out of the ordinary in the intermediate files.

The only strange thing I have noticed in the files is some kind of value-translation issue: printing the array in Python shows incorrect values, with many values close to the extremes (e.g., 0.9999997, 0.000001) but never quite touching the fractional limits (1.0, 0.0), unlike other sets I have seen. I'm also noticing some strange masked-array behavior from some of the files when loading the GRIB data in PyGrib, but I'm not sure exactly what is happening there.
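For what it's worth, a minimal numpy sketch of one way masked GRIB data can turn into surprising values: PyGrib hands back numpy masked arrays when a GRIB message carries a bitmap, and dropping the mask exposes whatever underlying data sits at the masked points. The array contents here are hypothetical placeholders, not values from the actual files.

```python
import numpy as np

# Hypothetical stand-in for a soil-moisture field loaded via PyGrib,
# which returns a numpy masked array when the message has a bitmap.
sm = np.ma.masked_array(
    data=[0.32, 9999.0, 0.28],   # underlying data at the masked point is junk
    mask=[False, True, False],   # middle point is masked (e.g., a water point)
)

# Converting to a plain ndarray silently drops the mask and exposes the
# junk value underneath -- one way "incorrect" values can show up in print:
plain = np.asarray(sm)           # -> array([0.32, 9999.0, 0.28])

# Making the replacement explicit avoids surprises downstream:
safe = sm.filled(np.nan)         # masked points become NaN
```

If the write path ever sees `plain` instead of `safe`, the junk under the mask goes straight into the intermediate file.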

I have attached a sample of the intermediate file (note the file extension) generated from the GEFS reforecast set, along with the metgrid logs for both the GEFS and CFSv2 tests, and the rd_intermediate output from the sample GEFS file. Any assistance in pinning down what's going on here, or suggestions on things to try, would be greatly appreciated. I can provide the Python script I used to generate these intermediate files if you feel that would help identify potential issues.



  • rd_out.txt (55.7 KB)
  • metgrid.CFS.log (143.1 KB)
  • metgrid.GEFS.log (390.9 KB)
  • SampleIFile.log (141.8 MB)
I just wanted to post an update on some additional things I have tried over the past week, with no success:

  • Changed the precision of the GRIB files from float64 to float32 to allow for direct type matching with the Fortran code in PyWinter (2.0.5)
  • Truncated the extraneous decimal values to three digits to stop the 0.999999999999999 cases
  • Reordered the file so that the variables appear in the order given in the WRF required-variables documentation
  • Removed the numpy array masking on all files loaded via PyGrib to ensure only native floats are used
  • Forced all values of SM, ST, and SKINTEMP to np.nan where LANDSEA = 0 (water)
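The last step in the list above can be sketched as follows; the array names and values are hypothetical placeholders for the 2-D fields carried in the intermediate files:

```python
import numpy as np

# Hypothetical 2-D fields standing in for those in the intermediate files.
landsea = np.array([[1.0, 0.0],
                    [1.0, 1.0]])      # 1 = land, 0 = water
sm = np.array([[0.31, 0.02],
               [0.27, 0.33]])         # one soil-moisture layer

# Blank out soil/skin values over water, leaving land values untouched.
sm_masked = np.where(landsea == 0.0, np.nan, sm)
```

The same `np.where` pattern applies to each ST layer and to SKINTEMP.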

At this point I'm completely baffled and out of things to try, save for the point I mentioned about variables being processed completely instead of level by level (not sure how that would affect the outcome, though). I have compared the values generated in my files against other datasets (CFSv2 and NARR) and nothing seems out of place. I have also used int2nc on the files so they could be loaded in Panoply, and again nothing looks out of place or different from the other files (save for the recent test of NaN-ing out the values, which blanked those areas).

This is a fairly significant blocking issue in my research work, so any pointers on how to get over this issue would be greatly appreciated.


I am also having this exact same issue with some intermediate files I generated using NCL from CMIP output. In my case, it looks like the met_em netCDF files have errors in the skin temperature and soil fields, which take very long to produce in metgrid, although I'm not sure why, as analysis of the intermediate files does not show any obvious errors.

Did you have any luck figuring out what was taking metgrid so long to write your soil and skin temperature fields? I will continue to explore this and let you know if I find anything.

If you are running WPS to process the GEFS 0.5 deg data, please take a look at the website here:
I hope it is helpful for solving the problem.

However, if you are using GEFS 1.0 deg data, then please don't bother.
Hi everyone,

Thanks, Ming, for this useful link. Although it doesn't connect directly with my issue, it does point out the importance of the land mask.

In my case, I was able to solve this problem when I realized that I had used land fraction in my intermediate files instead of land mask values. The three fields that were very slow to process were soil moisture, soil temperature, and skin temperature. Looking at METGRID.TBL, you can see that these fields have more complex interpolation routines: interp_option=sixteen_pt+four_pt+wt_average_4pt+wt_average_16pt+search

First, I changed these interpolation options in METGRID.TBL and found that metgrid no longer took so long to process the three fields. But I wanted to keep the original interpolation method, so after thinking about what was really happening, I realized there was probably a problem with the mask. Indeed, I had accidentally used land-fraction values (ranging continuously from 0 to 1) instead of a land mask (either exactly 0 or exactly 1).
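A minimal sketch of that fix, for anyone hitting the same thing (the 0.5 threshold is my illustrative choice; the point is simply that LANDSEA must be strictly binary before metgrid's masked interpolation sees it):

```python
import numpy as np

# Land FRACTION: continuous values in [0, 1] -- what was accidentally written.
land_frac = np.array([[1.0, 0.73],
                      [0.02, 0.0]])

# Land MASK: strictly 0 or 1 -- what the masked interpolation expects.
# The 0.5 threshold here is an illustrative assumption for this sketch.
land_mask = np.where(land_frac >= 0.5, 1.0, 0.0)

# Every point is now exactly 0 or 1:
assert set(np.unique(land_mask)) <= {0.0, 1.0}
```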

I believe this may be the issue with the original post as well, since the rd_out.txt file says the LANDSEA field has units of fraction (although I have not looked at the values of that field in the uploaded file).

Hope this helps!