Server BIOS configuration

This post is from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled; if you have follow-up questions related to this post, please start a new thread from the forum home page.

MarcelloCasula

New member
Hi all,
I am running WRF 4.2 on a Huawei 2488 server equipped with 4 Intel(R) Xeon(R) Platinum 8168 CPUs @ 2.70GHz (4 x 24 = 96 cores in total) and 388 GB of RAM, and I am seeing some strange behavior. While doing test runs I encountered the following anomalies:
1) the calculation time remains unchanged whether the simulation is launched with 24 cores or more (up to all 96);
2) launching 2 identical runs simultaneously, each with 24 cores, the time of the single run doubles compared to the single run with 24 cores;
3) launching 2 identical runs simultaneously, each with 12 cores, the time of the single run remains unchanged compared to the single run with 24 cores;
4) WRF was built with both the Intel and GNU compilers, with the same results;
5) in htop you can see that, up to a run with 24 cores, the cores are almost always correctly exploited at 100%, while as the number of cores increases the utilization of each one drops in proportion to the number of cores used;
6) in all the test runs the RAM in use stays around 20% of the total, so it is not a problem of insufficient installed memory.

It seems as if there is a threshold on the maximum number of operations the system can perform per unit of time.
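One rough way to probe for such a ceiling, assuming Python 3 and NumPy happen to be available on the node (the process counts and array size below are arbitrary choices), is to run the same memory-heavy kernel in a growing number of processes and see whether the total rate stops growing:

# Rough check for a per-node throughput ceiling: run the same
# memory-heavy kernel in 1, 2, 4, ... processes and see whether the
# aggregate rate keeps growing or flattens out.
import time
import multiprocessing as mp

import numpy as np

N = 50_000_000      # ~400 MB per array, well outside the CPU caches
REPS = 20

def worker(q):
    b = np.ones(N)
    c = np.ones(N)
    t0 = time.perf_counter()
    for _ in range(REPS):
        a = b + 2.0 * c                      # simple memory-bound kernel
    elapsed = time.perf_counter() - t0
    # roughly 3 arrays x 8 bytes per element moved per repetition
    q.put(3 * 8 * N * REPS / elapsed / 1e9)  # GB/s seen by this process

if __name__ == "__main__":
    for nproc in (1, 2, 4, 8, 16, 24, 48):
        q = mp.Queue()
        procs = [mp.Process(target=worker, args=(q,)) for _ in range(nproc)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        rates = [q.get() for _ in range(nproc)]
        print(f"{nproc:3d} procs: {sum(rates):8.1f} GB/s total, "
              f"{sum(rates) / nproc:6.1f} GB/s per process")

If the per-process rate halves as soon as two sets of 24 processes are active while the total stays flat, that would match the behavior in points 1-3 and 5 and point to a shared resource (most likely memory bandwidth) being saturated rather than to WRF itself.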

A little improvement turned up after disabling NUMA in the BIOS: the new threshold became 48 processors. But, as above, when I run two simulations at the same time with 48 cores each, the execution time of each one exactly doubles instead of staying about the same. I'm quite sure this is not a problem with WRF; in any case I'm knocking at the door of the community's experience, hoping for at least a tip to solve this issue.
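Related to the NUMA setting, it may also be worth checking where each process is actually allowed to run. The sketch below is Linux-only and only illustrative (it relies on os.sched_getaffinity and the /sys/devices/system/node layout, and would have to be called from inside each rank, e.g. through a small wrapper script); it prints how the CPUs a process may use are spread over the NUMA nodes:

# Show which CPUs the current process may run on, and how those CPUs
# are distributed over the NUMA nodes of the machine (Linux only).
import glob
import os

allowed = os.sched_getaffinity(0)
print(f"process may run on {len(allowed)} CPUs: {sorted(allowed)}")

def parse_cpulist(text):
    # /sys cpulist entries look like "0-23,96-119"; expand to CPU ids
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node_dir, "cpulist")) as f:
        node_cpus = parse_cpulist(f.read())
    overlap = allowed & node_cpus
    print(f"{os.path.basename(node_dir)}: {len(overlap)} of the allowed CPUs")

If a 48-core job turns out to be free to wander over all four sockets, or all of its ranks are confined to a single node, that alone can contribute to the kind of flat scaling described above.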

Does anybody have a suggestion?

Thanks in advance,

Marcello
 
Hi Marcello,
I'm a little confused about what a "single run" means in your post. For instance, in 2), you said the two runs were identical, both with 24 cores. I assume one of the two identical simulations is the "single run," but what is the other one? I thought you meant a run with a single node, but given the content, I don't think that is correct.

Unfortunately, our team at NCAR likely won't be able to help with this, as it sounds like it's a system issue. Do you have a systems administrator at your institution that could help with the problem? And as you said, perhaps someone in the community with experience will be able to help. I hope so!
 
Sorry for the misleading expression "single" in:
2) launching 2 identical runs simultaneously, each with 24 cores, the time of the single run doubles compared to the single run with 24 cores
3) launching 2 identical runs simultaneously, each with 12 cores, the time of the single run remains unchanged compared to the single run with 24 cores

in points 2 and 3 I meant that if I run just one simulation in directory AAA using 24 of the 96 cores it takes, for instance, 1 hour, but if I duplicate directory AAA into BBB and run 2 simulations, one in AAA (using 24 of the 96 cores) and the other in BBB (using 24 of the 96 cores), each one takes almost twice as long. So I observe a doubling of the time even though the system is not exploiting all of its resources. This seems anomalous to me. Is my impression correct? And if it is, does anybody have any suggestion?
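If it helps anyone reproduce this, one way to make the comparison more controlled is to pin the two copies to disjoint core ranges and time them together. The sketch below is only illustrative: the directory names, core ranges and the mpirun -np 24 ./wrf.exe command line are placeholders for whatever is used locally, and some MPI launchers apply their own binding that can override taskset.

# Hypothetical harness: start the same case in AAA and BBB at the same
# time, each copy restricted (via taskset) to its own block of cores,
# and report the wall-clock time of the pair.
import subprocess
import time

RUNS = [
    ("AAA", "0-23"),    # first copy on cores 0-23
    ("BBB", "24-47"),   # second copy on cores 24-47
]

t0 = time.perf_counter()
procs = []
for rundir, cores in RUNS:
    cmd = ["taskset", "-c", cores, "mpirun", "-np", "24", "./wrf.exe"]
    procs.append(subprocess.Popen(cmd, cwd=rundir))
for p in procs:
    p.wait()
print(f"two concurrent pinned runs took {time.perf_counter() - t0:.1f} s")

Comparing that number against a single pinned run, and then against two runs pinned to core ranges on the same socket, should show whether the doubling follows the cores themselves or some resource they share.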
 
Thanks for clarifying. That makes more sense. It does sound like a problem with your particular system. If there is anyone in the systems group at your institution you can discuss the problem with, I'd recommend starting there. They should know how the machine ought to perform, what to expect, and how to correct the behavior.
 