Hi everyone. I've been using WRF for a long time, and the performance gains from adding CPU cores have always fallen short of my expectations. I know that, for a given hardware and compiler, the speedup depends on many factors, such as the size of the computational domain and the resolution. However, beyond a few cores (5, 6, or 7 at most) I no longer see any improvement. Am I doing something wrong, or is there a setting I don't know about?
Here is my latest system: Intel compiler, Linux machine, SSD, 16 GB RAM (little of it is used during simulations).
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i9-12900T
CPU family: 6
Model: 151
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 59%
CPU max MHz: 4900,0000
For the ARW, this is the configuration:
&domains
time_step = 40,
time_step_fract_num = 0,
time_step_fract_den = 1,
time_step_dfi = 40,
use_adaptive_time_step = .false.,
step_to_output_time = .false.,
target_cfl = 1.2,
max_step_increase_pct = 10,
starting_time_step = 40,
starting_time_step_den = 0,
max_time_step = 108,
min_time_step = 6,
adaptation_domain = 1,
max_dom = 1,
s_we = 1,
e_we = 198,
s_sn = 1,
e_sn = 198,
s_vert = 1,
e_vert = 51,
num_metgrid_levels = 61,
num_metgrid_soil_levels = 6
dx = 7000,
dy = 7000,
grid_id = 1,
parent_id = 1,
i_parent_start = 1,
j_parent_start = 1,
parent_grid_ratio = 1,
parent_time_step_ratio = 1,
numtiles = 1,
p_top_requested = 5000.
smooth_option = 0
feedback = 0
/
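To put the domain size above in perspective, here is a minimal sketch of how large each MPI rank's patch would be for a given number of processes. It assumes a plain near-square 2D decomposition, which may not match exactly what WRF does internally, and the function names and the halo width of 3 points are hypothetical placeholders, not values taken from WRF.

```python
import math


def near_square_factors(n):
    """Return (px, py) with px * py == n and px <= py, as close to square as possible."""
    px = math.isqrt(n)
    while n % px != 0:
        px -= 1
    return px, n // px


def patch_size(e_we, e_sn, nprocs):
    """Approximate per-rank patch dimensions (in grid points) for a 2D decomposition."""
    px, py = near_square_factors(nprocs)
    return math.ceil(e_we / px), math.ceil(e_sn / py)


def halo_fraction(nx, ny, halo=3):
    """Ratio of halo points to interior points; the halo width is an assumed value."""
    total = (nx + 2 * halo) * (ny + 2 * halo)
    return (total - nx * ny) / (nx * ny)


if __name__ == "__main__":
    # e_we = e_sn = 198 comes from the namelist above.
    for n in (1, 4, 7, 8, 16, 24):
        nx, ny = patch_size(198, 198, n)
        print(f"{n:2d} ranks: patch {nx} x {ny}, halo/interior = {halo_fraction(nx, ny):.3f}")
```

In this sketch 7 ranks give a 1 x 7 layout because 7 is prime, so each rank gets a long, thin strip. The more ranks there are, the larger the share of halo points per patch, which is one possible reason why adding cores helps less on a small domain.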
Each process appears to fully use its core (top output):
672414 master 20 0 3238616 566752 107684 R 100,0 3,5 55:47.97 wrf_arw.exe
672409 master 20 0 3249520 579408 122660 R 100,0 3,6 55:49.55 wrf_arw.exe
672410 master 20 0 3238620 566992 107684 R 100,0 3,5 56:00.34 wrf_arw.exe
672411 master 20 0 3236568 567452 110036 R 100,0 3,5 55:42.65 wrf_arw.exe
672413 master 20 0 3238620 567124 110512 R 100,0 3,5 55:54.50 wrf_arw.exe
672412 master 20 0 3240672 567712 109832 R 100,0 3,5 55:56.74 wrf_arw.exe
672415 master 20 0 3218644 543264 105232 R 100,0 3,4 55:44.20 wrf_arw.exe
I only use 7 cores here because, as I increase the number of cores, the run time drops quickly for the first few cores but stops improving once I reach 7. I know my domain is small, but I didn't expect the gains to vanish at just 6 or 7 cores. I also tried a larger domain (all of Europe) and nothing changed!
I have always seen similar behavior on other computers too.
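To make the plateau easier to discuss, here is a minimal sketch of how speedup and parallel efficiency could be computed from wall-clock timings, together with Amdahl's law and the Karp-Flatt metric for estimating the serial fraction. The function names are my own, and the parallel fraction of 0.9 in the demo loop is purely illustrative, not a measurement from my runs.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the 1-core run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(t_serial, t_parallel) / n


def amdahl_speedup(parallel_fraction, n):
    """Speedup predicted by Amdahl's law for n cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def karp_flatt(measured_speedup, n):
    """Experimentally determined serial fraction from a measured speedup on n cores."""
    if n < 2:
        raise ValueError("Karp-Flatt metric needs n >= 2")
    return (1.0 / measured_speedup - 1.0 / n) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # Illustrative only: assumed parallel fraction, not measured on my system.
    p = 0.9
    for n in (1, 2, 4, 7, 8, 16, 24):
        print(f"{n:2d} cores: Amdahl speedup = {amdahl_speedup(p, n):.2f}")
```

Feeding real timings for 1, 2, 4, 7, ... cores into `karp_flatt` would show whether the estimated serial fraction stays roughly constant (an Amdahl-type limit) or grows with the core count (which would point to overhead such as communication or shared memory bandwidth).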
What do you think? Any suggestions?
Thank you all!