
Invalid DateTime string during tutorial run

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

m_a_russo

New member
Hello,

I am trying to run the MPAS tutorial and everything worked well until I tried running the regional section 4.4.

I get a segmentation fault from the system, and the MPAS log says that there is an error in the DateTime string (see the attached log and slurm.out files).

My namelist and streams files are the same as in the tutorial, since I modified them according to the instructions.

I'm quite stuck here. I've checked all the dates: they are all the same, and the starting date exists in the input files.

Am I missing something obvious?

Thanks in advance,
Michael Russo
 

Attachments

  • log_atmosphere.0000_err.txt
    313 bytes · Views: 73
  • slurm-3392624_out.txt
    1.3 KB · Views: 69
  • log_atmosphere.0000_out.txt
    10 KB · Views: 64
Based on the log files, it looks like there may be an issue in reading the initial LBC file. Could you run 'ncdump -v xtime Mediterranean.init.nc' and 'ncdump -v xtime lbc.2019-09-01_00.00.00.nc' to see whether there is any difference in the time-stamps for these two files? The initial conditions file was apparently read without trouble, and I'm wondering whether there may be some corruption of the first LBC file (perhaps invalid characters at the end of the 'xtime' variable, or something similar).
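
As a rough illustration of the check suggested above, the sketch below tests whether the raw bytes of one 'xtime' entry form a valid time-stamp. It assumes the entry is an ASCII string of the form YYYY-MM-DD_hh:mm:ss padded with blanks or NUL bytes; the function name is_valid_xtime is hypothetical and not part of MPAS or netCDF.

```python
from datetime import datetime


def is_valid_xtime(raw: bytes) -> bool:
    """Return True if raw looks like a time-stamp such as 2019-09-01_00:00:00.

    Assumption: the string is ASCII and may be padded on the right with
    blanks or NUL bytes to fill a fixed-length character variable.
    """
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        # Bytes outside the ASCII range cannot be part of a valid time-stamp.
        return False
    text = text.rstrip(" \x00")
    try:
        datetime.strptime(text, "%Y-%m-%d_%H:%M:%S")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_xtime(b"2019-09-01_00:00:00" + b" " * 45))
    print(is_valid_xtime(b"'\x87\x00\x00\x00\x00\x00\x00"))
```

The bytes themselves could come from any netCDF reader; the sketch only covers the validation step.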

It may also help in narrowing down the problem to clean and re-compile the model with 'DEBUG=true'. Which compiler (and compiler version) are you using?
 
Thank you for helping out.

There does seem to be an issue with the LBC files (the Mediterranean file has the correct xtime). The dates are correct in the file names but the xtime variable is not even a date. But no errors were given when creating the LBC files (see attached log for LBC).

xtime =
"\'\207\000\000\000\000\000\000(\207\000\000\000\000\000\000)\207\000\000\000\000\000\000*\207\000\000\000\000\000\000+\207\000\000\000\000\000\000,\207\000\000\000\000\000\000-\207\000\000\000\000\000\000.\207" ;
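
One way to inspect a value like this is to undo ncdump's escaping and look at the raw bytes. The sketch below does that for the leading part of the string above and then reads the bytes as little-endian 64-bit integers. That interpretation is an assumption made purely for illustration, and the helper names are hypothetical.

```python
import re
import struct

# Leading part of the xtime value as printed by ncdump above.
NCDUMP_TEXT = r"\'\207\000\000\000\000\000\000(\207\000\000\000\000\000\000)\207"


def unescape_ncdump(text):
    """Convert backslash escapes (octal \\ooo or \\c) into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            octal = re.match(r"[0-7]{1,3}", text[i + 1:])
            if octal:
                out.append(int(octal.group(), 8))
                i += 1 + len(octal.group())
            else:
                # Escaped literal character such as \' or \"
                out.append(ord(text[i + 1]))
                i += 2
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def as_int64_le(data):
    """Assumption: view the bytes as little-endian 64-bit integers (zero-padded)."""
    padded = data + b"\x00" * (-len(data) % 8)
    return list(struct.unpack("<%dq" % (len(padded) // 8), padded))


values = as_int64_le(unescape_ncdump(NCDUMP_TEXT))
print(values)
```

Read this way, the bytes form consecutive integers (34599, 34600, 34601), which hints that the variable holds binary data rather than text. It does not by itself say where that data came from.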

I'm not sure about the specific version but I'm using an Intel Compiler from 2018.

Currently re-compiling with debug active.
 

Attachments

  • log_init_atmosphere_0000_out.txt
    110.9 KB · Views: 67
If it's the case that one or more input files are corrupted (in this case, at least the time string in an LBC file), then we may not need to focus on debugging the model itself at present.

Do any of the other LBC files show corrupted strings or nonsense in the 'xtime' variable? In the initial LBC file, does the 'lbc_theta' field look like it contains reasonable potential temperatures? It might point us in the right direction to know whether there are other files or variables that might not have been written properly by the 'init_atmosphere_model' program.

We've used the Intel 18 compilers with no problems on our computing cluster at NCAR, so that should be fine.

Which version of the PIO, parallel-netCDF, and netCDF libraries are you using?

Apologies for all of the questions -- there's nothing that stands out as obviously problematic, so I'm hoping that we'll stumble upon something that will lead us in the right direction.
 
All the LBC files have those random strings in the xtime variable. There are temperatures of about 280K but they quickly progress to over 400K and up to 800K! So I would say they are not reasonable.

PIO is 2.5.0, parallel-netCDF is 1.12.1 and netCDF is 4.4.0.

Please ask anything you wish, I really appreciate the help! More questions lead to fewer random attempts on my part to find an answer :)
 
Potential temperatures of ~800K at heights of 30 km seem reasonable; note that 'theta' is potential temperature and not temperature.
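
To make the distinction concrete, here is a small sketch that converts temperature to potential temperature with the usual dry-air formula theta = T * (p0 / p)^(R_d / c_p). The temperature and pressure used for 30 km are approximate standard-atmosphere values chosen for illustration, not values taken from the LBC files.

```python
P0 = 100000.0   # reference pressure in Pa
R_D = 287.0     # gas constant for dry air in J kg^-1 K^-1
C_P = 1004.0    # specific heat of dry air at constant pressure in J kg^-1 K^-1


def potential_temperature(temperature_k, pressure_pa):
    """Dry potential temperature in K for a given temperature and pressure."""
    return temperature_k * (P0 / pressure_pa) ** (R_D / C_P)


# Assumed, approximate conditions near 30 km: about 226.5 K and 1197 Pa.
theta_30km = potential_temperature(226.5, 1197.0)
print(round(theta_30km))
```

With these inputs an air temperature of roughly 226 K corresponds to a potential temperature of about 800 K, so values of that size near the model top are not a sign of corruption.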

The last version of PIO that I had been using was 2.4.4. I just tried using PIO 2.5.0 on our computing cluster with the Intel 18.0.5 compilers, the parallel-netCDF 1.12.1 library, and the netCDF 4.7.3 library. Running through section 4 of the tutorial, the 'xtime' variable in my LBC files looks correct. So, perhaps this isn't a library issue.

Since it would only take a couple of minutes, can you try deleting the lbc*nc files, then re-running the init_atmosphere_model program as in section 4.3 of the tutorial? It seems unlikely that the outcome will be different, but this would be an easy test to verify that the problem is reproducible.
 
I see that the PIO 2.5.0 release states that netCDF 4.6.1 or newer is required. However, since you have parallel-netCDF 1.12.1, it might be worth forcing the use of the parallel-netCDF library to write output files from MPAS. Although the default "io_type" (see Section 5.2 of the User's Guide) is the parallel-netCDF library, I think MPAS (or PIO) may switch to the serial netCDF library if only one MPI task is used. Could you try generating the LBC files again, but running the 'init_atmosphere_model' program with, say, four MPI tasks?
 
Hello again, I'm sorry for the temperature part! I still have a lot to learn.

I deleted the LBC files and changed the io_type to parallel-netCDF, then ran with 4 MPI tasks. I got a segmentation fault error within 10 seconds of it starting but it still created an lbc file, although it seems to only have the variables created without any values, so it did not go far.

Then I remembered that the first time I created the LBC files I got a seg fault as well. That time I did not delete the LBC files before re-submitting the job, so it created the rest of the files without any error messages, but the files have that weird string in xtime.

I tried to recreate this by running the model again after getting a seg fault once or twice. It ran to completion with no error messages in the log. The lbc_theta values are similar to those I reported, but the xtime variable is once again nonsense (slightly different strings, but still seemingly random characters with backslashes, like this: "\020o\205\2568Fb@\231\353w\").

Basically I get seg fault errors a couple of times but eventually, after not deleting the first files, it runs until the end and creates corrupted files. I assume it does this because the clobber mode is set to never_modify, otherwise it would always stop at the first?
Also, I have no idea what could be the origin of the seg fault error.
 

UPDATE:
I asked our cluster support for help with this. He tested submitting the model with different MPI launch commands than the ones suggested in the tutorial, and it seems to have worked! We replaced 'mpiexec -n 4' with 'mpirun -np 4', and I now have a proper date in the LBC files. It seems that this works for our system configuration.

I will continue the tutorial and later update this thread with any news.
Thank you so much for the help! All the questions were really helpful in making me review everything properly.
 
Thanks very much for the update, and it's very interesting that switching from mpiexec to mpirun has apparently resolved the issue of invalid data in the output files. From what I can tell, the differences between 'mpiexec' and 'mpirun', if any, can vary by MPI implementation. For example, on my laptop with OpenMPI 4.0.1, both 'mpirun' and 'mpiexec' are symlinks to the same binary. So, perhaps there's a critical difference between the two on your system. In any case, I'm glad to hear that you've been able to create the LBC files!
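
A quick way to see whether the two launchers differ on a given system is to resolve each one through the PATH and any symlinks. The sketch below uses only the Python standard library; the function names are hypothetical, and it only compares file paths, not the behavior of the launchers.

```python
import os
import shutil


def resolve_launchers(names=("mpiexec", "mpirun")):
    """Map each launcher name to its fully resolved path, or None if not found."""
    resolved = {}
    for name in names:
        path = shutil.which(name)
        resolved[name] = os.path.realpath(path) if path else None
    return resolved


def same_binary(resolved):
    """True only if every launcher was found and all resolve to one file."""
    paths = set(resolved.values())
    return None not in paths and len(paths) == 1


if __name__ == "__main__":
    found = resolve_launchers()
    for name, path in found.items():
        print(name, "->", path)
    print("same binary:", same_binary(found))
```

If the two names resolve to different files, they may belong to different MPI installations or to a scheduler wrapper, which would be one possible explanation for the different behavior reported above.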
 