Hi,
I am running WRF v4.5 simulations on a cluster. I performed a few tests to determine which number of nodes/tasks/CPUs is optimal for my case. I tried several configurations and measured the time per timestep. Here is a sample of the results:
1 node, 1 task, 1 core : 15 s
1 node, 4 tasks, 4 cores : 53 s
1 node, 4 tasks, 48 cores : 53 s
4 nodes, 4 tasks, 4 cores : 7 s
4 nodes, 9 tasks, 9 cores : 53 s
9 nodes, 9 tasks, 9 cores : 3.5 s
16 nodes, 16 tasks, 16 cores : 2 s
The nodes involved are always the same ones. My simulation consists of 3 nested domains of 120x120x52 cells each.
I have difficulty understanding these results. It seems that when multiple cores are used on the same node, the program is very slow. I would have expected the opposite: shouldn't message passing inside a single node be faster than between two nodes?
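To make the comparison easier to read, the timings above can be converted into speedup and parallel efficiency relative to the 1-core run. This is only a small Python sketch: the function name `scaling_table` is made up for illustration, and efficiency is assumed here to mean speedup divided by the number of MPI tasks.

```python
# Timings copied from the list above:
# (nodes, MPI tasks, cores, seconds per timestep)
timings = [
    (1, 1, 1, 15.0),
    (1, 4, 4, 53.0),
    (1, 4, 48, 53.0),
    (4, 4, 4, 7.0),
    (4, 9, 9, 53.0),
    (9, 9, 9, 3.5),
    (16, 16, 16, 2.0),
]


def scaling_table(rows, baseline_seconds):
    """Return (nodes, tasks, cores, seconds, speedup, efficiency) per row.

    speedup    = baseline time / measured time
    efficiency = speedup / number of MPI tasks (assumed definition)
    """
    table = []
    for nodes, tasks, cores, seconds in rows:
        speedup = baseline_seconds / seconds
        efficiency = speedup / tasks
        table.append((nodes, tasks, cores, seconds, speedup, efficiency))
    return table


# The serial run (1 node, 1 task, 1 core) is used as the baseline.
table = scaling_table(timings, baseline_seconds=15.0)

print(f"{'nodes':>5} {'tasks':>5} {'cores':>5} {'s/step':>7} {'speedup':>8} {'eff.':>6}")
for nodes, tasks, cores, seconds, speedup, efficiency in table:
    print(f"{nodes:>5} {tasks:>5} {cores:>5} {seconds:>7.1f} {speedup:>8.2f} {efficiency:>6.2f}")
```

With these numbers, every run that places more than one task on a node has a speedup below 1 (slower than the serial run), while the runs with one task per node reach roughly 47 to 54 % efficiency.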
Mathieu