
(Resolved) troubles, could not find trapping x locations

mengronglu

New member
Previously, I had run multiple simulations using the same dataset, model, and parameters. Later, I set p_top_requested = 10000 and successfully completed a full simulation, obtaining the wrfout output.


When I tried to run it again afterwards, the following error occurred:

Error info 1:
===========================================================
DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 330337484 bytes allocated
med_initialdata_input: calling input_input
Input data is acceptable to use:
CURRENT DATE = 2022-09-01_00:00:00
SIMULATION START DATE = 2022-09-01_00:00:00
Max map factor in domain 1 = 1.01. Scale the dt in the model accordingly.
D01: Time step = 120.0000 (s)
D01: Grid Distance = 30.00000 (km)
D01: Grid Distance Ratio dt/dx = 4.000000 (s/km)
D01: Ratio Including Maximum Map Factor = 4.041906 (s/km)
D01: NML defined reasonable_time_step_ratio = 6.000000
---- WARNING : Older v3 input data detected
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 684
---- Error : Cannot use moist theta option with old data
-------------------------------------------
Abort(1) on node 22 (rank 22 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 22
===========================================================

I tried many more times without success. I checked the data and confirmed that the TITLE attribute containing V4.* was present; I have always included it. I then prepared to rerun real.exe to regenerate the input data, but a new error occurred:

Error info 2:
===========================================================
Using sfcprs3 to compute psfc
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 6506
troubles, could not find trapping x locations
-------------------------------------------
Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
===========================================================

I then found on the forum that, in some cases, errors occurred after setting p_top_requested = 10000. So I reset p_top_requested = 5000 and regenerated the met_em.d0* data starting from WPS. However, the same error, “troubles, could not find trapping x locations,” still occurred, as shown in the attached file.
This happened even after I changed the input data (I originally used ERA5_pl together with surface reanalysis data and Vtable.ECMWF, then tried GFS-FNL data with Vtable.GFS) and even after recompiling WPS (v4.1) and WRF (v4.5.2, the same version as before). The problem persisted. (rsl.error.0014 is attached.)

Therefore, my question is: which variables in my sample file are incorrect? Is it the PRES variable, whose data at k=0 come from PSFC? However, I have previously run large-domain simulations in which PRES at k=0 was also taken from PSFC, and I did not encounter similar issues.

Since all these problems began after I changed p_top_requested once, I wonder whether that change could have had a lasting effect on the system environment.
After error info 1 occurred, I also changed p_top_requested and ran the model with the original met_em.d0* data, without rerunning WPS. The error message was:

Error info 3:
===========================================================
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 95 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 96 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 97 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 98 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 99 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 100 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 101 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 102 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 103 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 104 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 105 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 106 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 107 30 38 , setting Qv to 0
d01 2022-09-01_09:00:00 t(i,j,k) was 0 at 108 30 38 , setting Qv to 0
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 1339
grid%p_top > previous value
-------------------------------------------
Abort(1) on node 2 (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
===========================================================
 

Attachments

  • rsl_error_0014.txt
  • rsl_out_0014.txt
  • namelist.input
  • namelist.wps
The error message seems to indicate that you are running a new version of WRF (newer than v4.0) but your input data were produced by an older version of WPS.

Can you run WPSv4.5 to create met_em files, then rerun real.exe?

WPSv4.5 should be consistent with the WRF version you are using (WRFv4.5.2). Please let me know if you still have any issues.
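Error info 1 (“Older v3 input data detected”) and the TITLE check mentioned earlier both concern the version string stored in a file's TITLE global attribute, which can be inspected with `ncdump -h`. The Python sketch below illustrates that kind of check. It is a simplified illustration, not WRF's own code; the exact TITLE wording (for example "OUTPUT FROM METGRID V4.5") and the rule that a major version of 4 or higher counts as v4-era data are assumptions.

```python
import re
from typing import Optional

# Assumed TITLE format, e.g. "OUTPUT FROM METGRID V4.5" (hypothetical example).
_VERSION_TOKEN = re.compile(r"\bV(\d+)\.(\d+)")


def major_version_from_title(title: str) -> Optional[int]:
    """Return the major version found in a TITLE string, or None if absent."""
    match = _VERSION_TOKEN.search(title)
    return int(match.group(1)) if match else None


def looks_like_v4_or_newer(title: str) -> bool:
    """Simplified rule (assumption): treat major version >= 4 as v4-era input."""
    major = major_version_from_title(title)
    return major is not None and major >= 4


if __name__ == "__main__":
    for title in ("OUTPUT FROM METGRID V4.5", "OUTPUT FROM METGRID V3.9.1", "no version"):
        print(f"{title!r}: v4 or newer = {looks_like_v4_or_newer(title)}")
```

In practice the TITLE string would be read from the file itself (for example with the netCDF4 package, where global attributes are available as `Dataset(path).TITLE`); that step is left out so the sketch runs without extra dependencies.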
 
Hi Ming,

I have updated WPS to version 4.5, but the issue remains.

I ran the following tests:
Test 1:
Following this thread:
https://forum.mmm.ucar.edu/threads/era5-landmask-error.21837/
I downloaded the ERA5 pressure-level data from RDA:
https://rda.ucar.edu/datasets/d633000/dataaccess/#
and the invariant fields:

e5.oper.invariant.128_129_z.regn320sc.2016010100_2016010100.nc
e5.oper.invariant.128_172_lsm.ll025sc.1979010100_1979010100.nc

I then ran era5_to_int.py; the log file (era5.log) is attached.

Error when running real.exe:
Using sfcprs3 to compute psfc
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 6506
troubles, could not find trapping x locations
----------------------------------------------------

test_1_met_em.d0*: met_em_files
The files test_1_namelist.wps, test_1_namelist.input, and test_rsl.* are attached.


Test 2:
ERA5 pressure-level and surface data from CDS:
ERA5 hourly data on pressure levels from 1940 to present

Error when running real.exe:
Using sfcprs3 to compute psfc
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 6506
troubles, could not find trapping x locations
-----------------------------------------------------
test_2_met_em.d0*: met_em_files

My workflow is as follows:
1. Link the data and test_2_Vtable (derived from Vtable.ECMWF).
2. Run ./ungrib.exe and ./metgrid.exe (namelist attached: test_2_namelist.wps).

My questions:
  1. Which part is most likely causing the problem: WPS or WRF?
  2. Why does the vertical distribution of PRES look like this, and could it be the cause?
    (I ask because my earlier input fields looked similar, yet I was able to generate the wrfinput_d0* files, the PB field was normal, and the simulation ran without problems. That run used the same data and process as test 2, only with a 100×100 grid domain; met_em.d0*: met_em_files.)
  3. If not, what else could cause this result?
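Question 2, like the earlier question about PRES at k=0 coming from PSFC, can be checked column by column. A minimal Python sketch, assuming the met_em layout in which PRES at k=0 holds surface pressure and k >= 1 holds the isobaric levels ordered from the bottom up: it flags non-finite values and isobaric levels that do not decrease with height. Isobaric levels whose pressure exceeds the surface value lie below ground and are deliberately not flagged; treating them as normal (as is typical over elevated terrain) is an assumption of this sketch, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def check_pressure_column(pres: Sequence[float]) -> List[str]:
    """Flag suspicious values in one met_em-style PRES column (Pa).

    Assumed layout: pres[0] is surface pressure (from PSFC) and pres[1:]
    are isobaric levels ordered from the bottom of the atmosphere upward.
    """
    problems: List[str] = []
    # NaN or +/-inf anywhere in the column cannot be interpolated.
    for k, p in enumerate(pres):
        if not math.isfinite(p):
            problems.append(f"k={k}: non-finite pressure {p}")
    # Isobaric levels (k >= 1) should decrease strictly with k.
    # Level k=1 is not compared with the surface value at k=0.
    for k in range(2, len(pres)):
        below, above = pres[k - 1], pres[k]
        if math.isfinite(below) and math.isfinite(above) and above >= below:
            problems.append(f"k={k}: {above} Pa is not below level k={k - 1} ({below} Pa)")
    return problems


if __name__ == "__main__":
    # Surface at 950 hPa, so the 1000 hPa level is below ground (not flagged).
    print(check_pressure_column([95000.0, 100000.0, 92500.0, 85000.0]))
    print(check_pressure_column([float("nan"), 100000.0, 92500.0]))
```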
(Image: PRES, test_2, layer=0)

(Image: PRES, test_2, layer=1)

Attachments

  • test_2_rsl_error_0015.txt
  • test_2_namelist.wps.wps
  • test_2_namelist.input
  • test_1_rsl_out_0004.txt
  • test_1_rsl_error_0004.txt
  • test_1_namelist.wps
  • test_1_namelist.input
  • era5.log
Hi,

Let's set aside test 2, which used data from CDS. I am not familiar with the CDS data; we always use ERA5 from NCAR RDA.

For test 1, I believe it is a data issue. You may not have downloaded all the required ERA5 data. I will take a look and get back to you.
 
In your namelist.input, the option interval_seconds = 10800 indicates that your ERA5 data should be at a 3-hour interval.

However, your era5.log shows the data at a 6-hour interval (interval_hours = 6:00:00).

Can you make these two options consistent?
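The mismatch is simple arithmetic: interval_seconds = 10800 is 3 hours, whereas interval_hours = 6:00:00 is 21600 seconds. A small Python sketch of that consistency check, assuming the H:MM:SS format shown in the era5.log line; the function names are hypothetical.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an 'H:MM:SS' string such as '6:00:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def intervals_consistent(interval_seconds: int, interval_hms: str) -> bool:
    """True when the namelist interval_seconds matches the interval in the log."""
    return interval_seconds == hms_to_seconds(interval_hms)


if __name__ == "__main__":
    # Values quoted in this thread: namelist.input vs era5.log.
    print(hms_to_seconds("6:00:00"))               # 21600
    print(intervals_consistent(10800, "6:00:00"))  # False
```

Either interval_seconds = 21600 or 3-hourly input data would make the two agree.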

Also, please remove the options below:

! nproc_x = 6
! nproc_y = 9
nproc_x = -1,
nproc_y = -1,
numtiles = 1
 
Hi Ming,
Thanks for your reply!
I updated the parameters in namelist.input and ran the whole process again, but the issue remains:
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 6506
troubles, could not find trapping x locations
-------------------------------------------
test_6h_met_em.d0*: met_em_files
Could an environment-variable issue cause this error? It appeared suddenly after everything had run normally for a long time, and I haven’t been able to find or fix the cause by changing parameters or input data. I don’t think there’s anything wrong with my met_em.d0* files. I’m going to download the official namelist.input again to see if that helps.
 

Attachments

  • test_6h_rsl_out_0004.txt
  • test_6h_rsl_error_0004.txt
  • namelist.input
Hi Ming,
I ran more test cases (2022-09-01 to 2022-09-05, ref_lat = 30, ref_lon = 120) and found the following:
test_a:
e_sn × e_we = 100 × 100, dx = dy = 30 km, e_vert = 50
Failed, with the same error as test_6h above.
test_b:
I deleted the SPECHUMD variable.
e_sn × e_we = 100 × 100, dx = dy = 30 km, e_vert = 50
Succeeded.
test_c:
I deleted the SPECHUMD variable and changed the domain grid to 120 × 120, keeping the other parameters the same (dx = dy = 30 km, e_vert = 50).
Failed. The error and out files are attached as test_c_rsl*:
d01 2022-09-01_18:00:00 Timing for input 0 s.
d01 2022-09-01_18:00:00 flag_soil_layers read from met_em file is 1
grid%p_top from last time period = 5000.000
grid%p_top from this time period = 9660.516
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 1359
grid%p_top > previous value
-------------------------------------------
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
test_d:
Same as test_c, but with p_top_requested = 10000.
Failed. Error (test_d_rsl*):
-------------------------------------------------------
d01 2022-09-01_18:00:00 Timing for input 0 s.
d01 2022-09-01_18:00:00 flag_soil_layers read from met_em file is 1
Using sfcprs3 to compute psfc
i,j = 1 31
target pressure and value = NaN 1.4012985E-45
column of pressure and value = NaN 0.0000000E+00
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = NaN NaN
column of pressure and value = -Infinity NaN
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 6567
troubles, could not find trapping x locations
-------------------------------------------
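The test_d log shows why the search fails: both the target pressure and the whole column are NaN. Under IEEE 754 rules every ordered comparison involving NaN is false, so a search for two levels that bracket (trap) the target can never succeed. The Python function below is a simplified analogue of such a search, written for illustration only; it is not the actual WRF routine, and it assumes pressure decreases with k.

```python
import math
from typing import Optional, Sequence


def find_trapping_index(column: Sequence[float], target: float) -> Optional[int]:
    """Return k with column[k] >= target >= column[k + 1], or None.

    Assumes pressure decreases with k. Comparisons with NaN are always
    False, so a NaN target or NaN levels never yield a bracket.
    """
    for k in range(len(column) - 1):
        if column[k] >= target >= column[k + 1]:
            return k
    return None


if __name__ == "__main__":
    nan = math.nan
    print(find_trapping_index([100000.0, 85000.0, 70000.0], 80000.0))  # 1
    print(find_trapping_index([nan, nan, nan], 80000.0))               # None
    print(find_trapping_index([100000.0, 85000.0], nan))               # None
```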
test_e:
I deleted the SPECHUMD variable and changed to dx = dy = 15 km, with a 120 × 120 grid and e_vert = 50.
Failed. Error (test_e_rsl*):
----------------------------------------
d01 2022-09-01_18:00:00 Timing for input 0 s.
d01 2022-09-01_18:00:00 flag_soil_layers read from met_em file is 1
grid%p_top from last time period = 5000.000
grid%p_top from this time period = 9481.984
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 1359
grid%p_top > previous value
-------------------------------------------
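test_c and test_e stop at a sanity check that is visible in the logs: real.exe compares p_top for each time period with the previous one and calls FATAL when it increases (5000.000 Pa followed by 9660.516 Pa in test_c, 9481.984 Pa in test_e). A minimal Python sketch of that comparison using the logged values; the function name is hypothetical and this is not the WRF source.

```python
from typing import Optional, Sequence


def first_p_top_increase(p_tops: Sequence[float]) -> Optional[int]:
    """Return the index of the first time period whose p_top (Pa) exceeds
    the previous period's value, or None if p_top never increases."""
    for i in range(1, len(p_tops)):
        if p_tops[i] > p_tops[i - 1]:
            return i
    return None


if __name__ == "__main__":
    print(first_p_top_increase([5000.0, 9660.516]))  # test_c: 1
    print(first_p_top_increase([5000.0, 9481.984]))  # test_e: 1
    print(first_p_top_increase([5000.0, 5000.0]))    # None
```

In this thread the same inputs later ran successfully on a single core, which is consistent with the jump coming from the parallel run rather than from the namelist.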

I'm really confused by these results. Do you have any advice?
Please let me know if you need any additional files. Thank you very much for your help.
 

Attachments

  • test_c_rsl_error_0000.txt
  • test_c_rsl_out_0000.txt
  • test_d_rsl_error_0004.txt
  • test_d_rsl_out_0004.txt
  • test_e_rsl_error_0004.txt
  • test_e_rsl_out_0004.txt
  • namelist.input
Are you running WRF-Chem? How did you compile the code?

The error message indicates that some namelist options are changed while real.exe is running. This is not allowed when running WRF-ARW.

I suspect that data communication during the parallel run is somehow not working correctly. This looks more like a compilation or machine issue.
 
You were absolutely right.
Yesterday I tested with a single core via srun, and all the input fields that had previously failed ran successfully. It turns out the failures were caused by an MPI issue in the parallel runs.
Many thanks for your kind and professional help.