MarcelloCasula
New member
Hi all,
I am running WRF 4.2 on a Huawei 2488 server equipped with 4 Intel(R) Xeon(R) Platinum 8168 CPUs @ 2.70GHz (4x24 = 96 cores in total) and 388GB of RAM, and I have noticed some strange behavior.
While running test simulations I encountered the following anomalies:
1) the calculation time remains unchanged whether the simulation is launched with 24 cores or more (up to all 96)
2) launching 2 identical runs simultaneously, each with 24 cores, the time of each run doubles compared to a single run with 24 cores
3) launching 2 identical runs simultaneously, each with 12 cores, the time of each run remains unchanged compared to a single run with 24 cores
4) the system was built with both the Intel and GNU compilers, with the same results
5) watching htop, you can see that up to a run with 24 cores the cores are almost always correctly exploited at 100%, while as the number of cores increases, the utilization of each one drops in proportion to the number of cores used
6) in all the test runs the RAM in use is around 20% of the total, so it is not a problem of insufficient installed memory.
It looks as if there were a ceiling on the maximum number of operations the system can perform per unit of time.
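For reference, a micro-benchmark of this kind could confirm whether aggregate memory bandwidth stops scaling as workers are added (which would match the symptoms above). This is only an illustrative sketch, not WRF: the array size and worker counts are arbitrary, and in pure Python the interpreter overhead dominates, so a real STREAM benchmark in C would be more conclusive; this only shows the measurement idea.

```python
import multiprocessing as mp
import time
from array import array

N = 1_000_000  # doubles per array (~8 MB each); illustrative size

def triad(_):
    # STREAM-style triad a[i] = b[i] + 3.0*c[i]; measures one worker's throughput
    b = array("d", [1.0]) * N
    c = array("d", [2.0]) * N
    t0 = time.perf_counter()
    a = array("d", (bi + 3.0 * ci for bi, ci in zip(b, c)))
    dt = time.perf_counter() - t0
    assert len(a) == N
    # 3 arrays touched (read b, read c, write a), 8 bytes per double
    return 3 * 8 * N / dt / 1e9  # GB/s for this worker

def aggregate_bw(workers):
    # If this sum plateaus as workers grow, the machine is bandwidth-bound
    with mp.Pool(workers) as pool:
        return sum(pool.map(triad, range(workers)))

if __name__ == "__main__":
    for w in (1, 2, 4):
        print(f"{w} workers: aggregate {aggregate_bw(w):.2f} GB/s")
```

If the aggregate GB/s figure is roughly flat from some worker count onward, the extra cores are simply waiting on memory, exactly like the WRF runs.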
A small improvement appeared after disabling NUMA in the BIOS: the new threshold moved from 24 to 48 processors. But, as above, when running two simulations at the same time with 48 cores each, the execution time of each simulation exactly doubles instead of remaining about the same. I'm fairly sure this is not a problem with WRF itself; anyway, I knock at the door of the community's experience, hoping for at least a tip to solve this issue.
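One thing that often matters on 4-socket machines is process placement: if the MPI ranks are not pinned, they can migrate and all end up hammering one socket's memory controllers. The commands below are a sketch assuming Open MPI and the `numactl` tool are installed; the binding flag names differ for Intel MPI or MPICH, so they should be checked against the local MPI documentation.

```shell
# Inspect the NUMA layout first (node count, per-node memory, CPU lists)
numactl --hardware
lscpu | grep -i numa

# Open MPI example: spread 24 ranks across the 4 sockets and pin each
# rank to a core; --report-bindings prints where every rank landed.
mpirun -np 24 --map-by socket --bind-to core --report-bindings ./wrf.exe
```

Comparing the run time with and without explicit binding would at least show whether placement is part of the problem.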
Does anybody have a suggestion?
Thanks in advance
Marcello