
Trouble running MPAS-Atmosphere with high resolution

satya

New member
Hello, I am facing issues running MPAS-Atmosphere (atmosphere_model) with a high-resolution grid (nCells=999426). However, it runs successfully with a low-resolution grid (nCells=40962). I have attached the log files and the namelist. Could you please help?
 

Attachments

  • TEST1.out.txt
    67.2 KB · Views: 4
  • namelist.atmosphere.txt
    1.8 KB · Views: 3
  • log.atmosphere.0000.out.txt
    22.5 KB · Views: 3
I am out of the office for a business trip. I will look at your issue after I am back on July 5. Thank you for your patience.

Just a quick question: how many processors do you use to run this case?
 
Thanks for your response Ming.

> how many processors do you use to run this case?
I am using 512 processors (8 nodes, 64 processors per node).
 
Dear Ming, I hope you have had a chance to check the issue I posted. Could you please help me resolve it?

With thanks
Satya
 
Hi Satya,

Thank you for reminding me of this issue; I was distracted by many other issues after being out of the office for weeks.

I looked at your log file and TEST1.out.txt. The only error message I can find is 'segmentation fault', which can be attributed to various reasons.

In your namelist.input, you set config_len_disp = 10000.0. Would you please tell me what mesh you used to run this case?

Did you modify any code in MPAS-V8.2.2? What data did you ungrib to produce the initial conditions for this case?

Thanks.
 
Hi Ming,

Thanks for responding to my message.

>Would you please tell me what mesh did you use to run this case?
I am using x6.999426 mesh.

>Did you modify any codes in MPAS-V8.2.2?
No, I have not made any change to the code. I have tried doing the same simulation with 8.3.1 as well but got the same error.

>What data did you ungrib to produce initial condition for this case?
For initial conditions, I tried with both ERA5 and GFS data. No success.

With thanks
Satya
 
Satya,

The x6.999426 mesh is a global 60-10 km variable-resolution mesh. Please tell me the following:

(1) Did you run grid_rotate to relocate the 10-km mesh to some other area?

(2) Please send me your namelist and streams files (for static and initial condition processing as well as for the model run itself)

(3) Please stay with GFS data, and rerun your failed case using more than 1500 processors (this test will tell us whether memory could be an issue in the model crash)

Thanks.
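To illustrate why a higher processor count is a useful memory test, here is a small sketch of how the per-task share of the mesh shrinks as tasks are added (an illustration only; it assumes a roughly even cell partition, and the task counts are the ones discussed in this thread):

```python
# Sketch: spreading the x6.999426 mesh over more MPI tasks lowers
# the per-task (and hence per-node) memory footprint.
# Assumes roughly even partitioning of cells across tasks.
ncells = 999426  # cells in the x6.999426 (global 60-10 km) mesh

for ntasks in (512, 1024, 1536):
    cells_per_task = ncells / ntasks
    print(f"{ntasks:>4} tasks -> ~{cells_per_task:.0f} cells per task")
# -> 512 tasks ~1952 cells/task, 1024 tasks ~976, 1536 tasks ~651
```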
 
Hi Ming, yes I am trying to do a global simulation.

>(1) Did you run grid_rotate to relocate the 10-km mesh to some other area?
Yes, we have rotated the mesh and fixed it over the Indian region. The issue persisted even when we ran the model without the rotated grid.
We have also experimented with the uniform 30 km grid (x1.655362.grid.nc), but the issue persisted.

>(2) please send me your namelist and streams files (for static and initial condition as well as for the model run itself)
I have attached the ns.tar file containing all namelist and streams files. These namelist and streams file sets were used for creating the static, initial, and boundary conditions. The namelist whose name contains "fct" was used to run the model in forecast mode.

>(3) please stay with GFS data, and rerun your failed case using more than 1500 processors (this test will tell us whether the memory could be an issue for the model crash)
Since we do not have 1500 processors, we are unable to run on that many. The maximum I can use is 1024 processors.

With thanks
Satya
 

Attachments

  • ns.tar
    20 KB · Views: 1
Satya,

Thank you for sending the files. I will take a look and get back to you. It may take some time because this week is WRF tutorial week. Thanks for your patience.

Just a quick question: have you run the case with 1024 processors? How large is your run-time memory? I suspect that insufficient memory could be a reason for the model crash.
 
Hello Ming, yes I have run the same case on 1024 processors. It also stops after giving the segmentation fault error. The log file (log.atmosphere.0000.txt) is attached to this thread. By "run-time memory" do you mean the total memory usage? I am submitting the job using qsub and don't know how to find the total memory usage when the job is run that way. However, the log file also contains something regarding memory allocation, as follows.

Allocating fields ...
97 MB allocated for fields on this task
98958 MB total allocated for fields across all tasks
----- done allocating fields -----

If this issue is because of insufficient memory, could you help me fix it?

With thanks
Satya
 

Attachments

  • log.atmosphere.0000.txt
    22.9 KB · Views: 1
Hi Satya,

I have looked at all the files included in your ns.tar file. They look fine to me, except that in your namelist.atmosphere_fct, config_dt could be larger. For the finest mesh spacing of 10 km, config_dt can be 60; however, your choice of config_dt = 30 should be fine, only the model run could be slow with a smaller-than-normal time step. It is not the reason for the model crash.

I believe your case failure is attributable to insufficient memory. I have talked to our software engineer. Based on our estimation, a single column of MPAS will need about 0.4 MB of memory. For your case, the total memory required is close to 400000 MB.

If you don't have enough memory, you will have to run MPAS with a smaller mesh.

Hope this is helpful for you. Let me know if you have more questions.
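For reference, a back-of-the-envelope sketch of the arithmetic behind this estimate (using the ~0.4 MB-per-column figure quoted above):

```python
# Back-of-the-envelope memory estimate for the x6.999426 mesh,
# using the ~0.4 MB-per-column figure quoted in this thread.
ncells = 999426      # columns in the x6.999426 mesh
mb_per_column = 0.4  # estimated memory per column (MB)

total_mb = ncells * mb_per_column
print(f"~{total_mb:.0f} MB total (~{total_mb / 1024:.0f} GB)")
# -> ~399770 MB (~390 GB)
```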
 
Hello Ming,

I checked the memory available per node on our HPC: it is about 32 GB. I am using 16 nodes (64 processors per node) to run the model. Is that not sufficient?

Earlier I had installed MPAS version 7.3 (MPAS_V7.3) on another HPC (OLD_HPC), which has almost half the resources of the current HPC (NEW_HPC). It runs successfully on OLD_HPC. NEW_HPC was recently installed, so I installed MPAS_V7.3 there. It installed successfully, but while running it gave a segmentation fault error. I tried other versions (v8.2.2 and v8.3.1) too, but all have the same problem.

In fact, I faced a similar problem while running WRF. I resolved it by compiling again with an additional flag (-heap-arrays). I tried to compile MPAS with the -heap-arrays flag as well, but the issue remained the same. Can this be a compiler issue? I am using ifort version 2021.2.0 on NEW_HPC. OLD_HPC has version 16.0.1.

I have attached a log file obtained after running MPAS with valgrind.

Looking forward to your response.

With thanks
Satya
 

Attachments

  • TEST.err.txt.bz2
    301.1 KB · Views: 0
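Putting the numbers from this thread together, here is a quick per-node feasibility check (a sketch only; it assumes the ~0.4 MB-per-column estimate quoted earlier and an even spread of memory across nodes):

```python
# Compare the estimated total memory for the x6.999426 run against
# the per-node memory of the machine described above (~32 GB/node),
# assuming the ~0.4 MB-per-column estimate and an even memory spread.
ncells = 999426      # columns in the x6.999426 mesh
mb_per_column = 0.4  # assumed memory per column (MB)
node_mem_gb = 32     # reported memory per NEW_HPC node

need_gb = ncells * mb_per_column / 1024
for nnodes in (8, 16):  # the 512- and 1024-processor runs
    print(f"{nnodes:>2} nodes: ~{need_gb / nnodes:.1f} GB needed per node "
          f"vs ~{node_mem_gb} GB available")
# -> 8 nodes: ~48.8 GB per node; 16 nodes: ~24.4 GB per node
```

On this rough estimate, the 8-node (512-processor) runs would need well over 32 GB per node, and even the 16-node runs leave little headroom once MPI buffers and OS overhead are counted.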