Hi All,
I am building a small cluster to run WRF in my group. I'm aiming for a handful of nodes with about 200-400 cores in total. I have a few technical questions that I hope this group can answer.
What is the recommended connectivity between the nodes?
The last time I built a machine like this (over 10 years ago), InfiniBand was a must, and the top speed was 40 Gbps. I've learned that InfiniBand now reaches more than 100 Gbps, but also that prices have skyrocketed (over $25k for an NVIDIA InfiniBand switch). Is InfiniBand the only solution? Are there alternatives (optical or copper)?
What is the recommended number of cores per node?
AMD now has a single CPU with 128 cores, so I was initially considering building nodes with 256 cores each (two sockets). However, I'm worried that memory access might become an issue. Even though newer processors have more memory channels, 256 WRF processes accessing the same memory at the same time might become a bottleneck. So would it make more sense to limit each node to about 40 (?) cores and buy 5 times as many nodes? Any recommendations?
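To put rough numbers on this concern, here is a back-of-the-envelope sketch of theoretical peak memory bandwidth per core. The hardware figures are assumptions for illustration only: 12 DDR5 channels per socket at 4800 MT/s and 8 bytes per transfer, applied both to a dual-socket 128-core node and to a hypothetical single-socket 40-core node. Peak figures are upper bounds; sustained bandwidth (e.g. as measured with STREAM) will be lower.

```python
# Theoretical peak memory bandwidth per core.
# ASSUMPTION: all channel counts and DDR speeds below are illustrative,
# not the specs of any particular vendor's node.

def peak_bandwidth_gbs(channels: int, mega_transfers: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth of one socket in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers * bytes_per_transfer / 1000.0

def bandwidth_per_core(sockets: int, channels_per_socket: int,
                       mega_transfers: int, cores_per_socket: int) -> float:
    """Node-wide peak bandwidth divided evenly across all cores on the node."""
    total = sockets * peak_bandwidth_gbs(channels_per_socket, mega_transfers)
    return total / (sockets * cores_per_socket)

# (sockets, channels per socket, MT/s, cores per socket) -- hypothetical configs
configs = {
    "2 x 128-core sockets (256 cores/node)": (2, 12, 4800, 128),
    "1 x 40-core socket (40 cores/node)": (1, 12, 4800, 40),
}

for name, (s, ch, mt, c) in configs.items():
    print(f"{name}: {bandwidth_per_core(s, ch, mt, c):.2f} GB/s per core")
```

Under these assumptions the dense node gives each core roughly a third of the peak bandwidth the smaller node does, which is the trade-off behind the question.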
Number of CPUs per node?
For the same reason as above: is it better to use single-CPU or dual-CPU motherboards? One vendor mentioned that the HPC community is moving toward single-CPU motherboards in each node, and that Intel will not even produce dual-socket motherboards in the coming years. Is the WRF community following this trend? Any recommendations?
What is the recommended memory?
In terms of memory, what is the minimum requirement for WRF? Is 4 GB per core enough, or should one aim for 6 GB to give some slack? That would mean 1.5 TB of RAM for 256 cores.
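A small sizing sketch of the arithmetic above (e.g. 256 cores x 6 GB = 1536 GB, about 1.5 TB). It assumes one DIMM per memory channel, 24 channels per node (two sockets x 12 channels), and a hypothetical list of available RDIMM capacities; all three are assumptions to adjust for a real quote.

```python
# Node RAM sizing at one DIMM per channel.
# ASSUMPTION: the DIMM capacities and channel counts are illustrative only.
DIMM_SIZES_GB = [16, 32, 48, 64, 96, 128]  # hypothetical available capacities, ascending

def node_ram_plan(cores: int, gb_per_core: float, channels: int):
    """Return (required GB, DIMM size GB, installed GB) for a node."""
    required = cores * gb_per_core
    per_channel = required / channels
    # Pick the smallest DIMM that covers the per-channel share.
    for size in DIMM_SIZES_GB:
        if size >= per_channel:
            return required, size, size * channels
    raise ValueError("requirement exceeds largest DIMM at one DIMM per channel")

for gb_per_core in (4, 6):
    required, dimm, installed = node_ram_plan(256, gb_per_core, channels=24)
    print(f"{gb_per_core} GB/core: need {required} GB -> 24 x {dimm} GB = {installed} GB")
```

Populating every channel with identical DIMMs keeps the node's memory bandwidth balanced, so the installed total is often somewhat above the strict requirement.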
Thanks in advance,
Henrique