
Few CORES and the SPEEDUP no longer IMPROVES on my domains

GervyWRF

Hi everyone. I've been using WRF for a long time, and the speedup I get from adding CPU cores has always fallen short of my expectations. I know that, for a given hardware and compiler, the speedup depends on many factors, such as the size and resolution of the computational domain. However, beyond a few cores (5, 6 or 7 at most) performance no longer improves. Am I doing something wrong, or is there a setting I don't know about?

Here is my latest system: Intel compiler, Linux machine, SSD, 16 GB RAM (little of it is used during simulations).
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i9-12900T
CPU family: 6
Model: 151
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 59%
CPU max MHz: 4900,0000

For the ARW, this is the configuration:
&domains
time_step = 40,
time_step_fract_num = 0,
time_step_fract_den = 1,
time_step_dfi = 40,
use_adaptive_time_step = .false.,
step_to_output_time = .false.,
target_cfl = 1.2,
max_step_increase_pct = 10,
starting_time_step = 40,
starting_time_step_den = 0,
max_time_step = 108,
min_time_step = 6,
adaptation_domain = 1,
max_dom = 1,
s_we = 1,
e_we = 198,
s_sn = 1,
e_sn = 198,
s_vert = 1,
e_vert = 51,
num_metgrid_levels = 61,
num_metgrid_soil_levels = 6
dx = 7000,
dy = 7000,
grid_id = 1,
parent_id = 1,
i_parent_start = 1,
j_parent_start = 1,
parent_grid_ratio = 1,
parent_time_step_ratio = 1,
numtiles = 1,
p_top_requested = 5000.
smooth_option = 0
feedback = 0
/

Each core seems to be well used in terms of CPU:
672414 master 20 0 3238616 566752 107684 R 100,0 3,5 55:47.97 wrf_arw.exe
672409 master 20 0 3249520 579408 122660 R 100,0 3,6 55:49.55 wrf_arw.exe
672410 master 20 0 3238620 566992 107684 R 100,0 3,5 56:00.34 wrf_arw.exe
672411 master 20 0 3236568 567452 110036 R 100,0 3,5 55:42.65 wrf_arw.exe
672413 master 20 0 3238620 567124 110512 R 100,0 3,5 55:54.50 wrf_arw.exe
672412 master 20 0 3240672 567712 109832 R 100,0 3,5 55:56.74 wrf_arw.exe
672415 master 20 0 3218644 543264 105232 R 100,0 3,4 55:44.20 wrf_arw.exe

I only use 7 cores here because, as I add cores, the run time drops quickly for the first few but stops improving once I reach 7. I know my domain is small, but I didn't expect the gains to vanish at just 6 or 7 cores. I also tried a larger domain (all of Europe) and nothing changed!
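One way to put numbers on this plateau is to compare the mean wall-clock time per model step across runs with different task counts. The sketch below assumes the per-step timing lines that WRF writes to its rsl.error.0000 log have the form "Timing for main: ... elapsed seconds"; the function names are illustrative, and no timings from this system are implied.

```python
import re
from statistics import mean

# Assumed shape of a per-step timing line in the WRF log, e.g.
# "Timing for main: time 2023-04-01_00:00:40 on domain   1:    0.51234 elapsed seconds"
TIMING_RE = re.compile(r"Timing for main:.*?:\s*([0-9.]+)\s+elapsed seconds")


def mean_step_seconds(log_text):
    """Return the mean elapsed seconds per model step found in log_text."""
    times = [float(m.group(1)) for m in TIMING_RE.finditer(log_text)]
    if not times:
        raise ValueError("no 'Timing for main' lines found")
    return mean(times)


def speedup_table(step_seconds_by_tasks):
    """Map task count -> (speedup, parallel efficiency).

    Both values are relative to the run with the fewest tasks.
    """
    base_n = min(step_seconds_by_tasks)
    base_t = step_seconds_by_tasks[base_n]
    table = {}
    for n, t in sorted(step_seconds_by_tasks.items()):
        speedup = base_t / t
        table[n] = (speedup, speedup * base_n / n)
    return table
```

Feeding `speedup_table` the per-step means from runs with 1, 2, 4, 7 and more tasks shows where the efficiency starts to collapse, which is more informative than the total run time alone because it excludes start-up and output phases.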

A similar situation has always happened to me with other computers too...
What do you think? Any suggestions?

Thank you all!
 
Code:
Product Collection: 12th Generation Intel® Core™ i9 Processors
Code Name: Products formerly Alder Lake
Processor Number: i9-12900T
Use Conditions: PC/Client/Tablet, Workstation

CPU Specifications
Total Cores: 16
# of Performance-cores: 8
# of Efficient-cores: 8
Total Threads: 24
Max Turbo Frequency: 4.90 GHz
Performance-core Max Turbo Frequency: 4.80 GHz
Efficient-core Max Turbo Frequency: 3.60 GHz
Performance-core Base Frequency: 1.40 GHz
Efficient-core Base Frequency: 1.00 GHz

I believe what you are hitting is the performance-core vs. efficient-core issue. I have noticed on my Intel 13900K that when I use more than the P-cores, my speed gets worse.

My computer science friends have told me that when the E-cores and P-cores work together, they run less efficiently than the P-cores do by themselves.
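To test this idea, one first needs to know which logical CPUs are P-cores. Here is a minimal sketch, assuming a Linux system that exposes `cpuinfo_max_freq` under `/sys/devices/system/cpu/cpuN/cpufreq/`, and assuming P-cores report a higher maximum frequency than E-cores (consistent with the 4.80 GHz vs 3.60 GHz figures in the spec sheet quoted above). The function names are illustrative.

```python
from collections import defaultdict
from pathlib import Path


def read_max_freqs(root="/sys/devices/system/cpu"):
    """Return {logical_cpu: max_freq_khz} from sysfs.

    Assumes the cpufreq files exist; returns an empty dict if they do not.
    """
    freqs = {}
    for path in Path(root).glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
        cpu = int(path.parts[-3][3:])  # "cpu12" -> 12
        freqs[cpu] = int(path.read_text().strip())
    return freqs


def group_by_max_freq(freqs):
    """Group logical CPUs by maximum frequency, fastest group first."""
    groups = defaultdict(list)
    for cpu, khz in freqs.items():
        groups[khz].append(cpu)
    return [(khz, sorted(cpus)) for khz, cpus in sorted(groups.items(), reverse=True)]


if __name__ == "__main__":
    for khz, cpus in group_by_max_freq(read_max_freqs()):
        print(f"{khz / 1e6:.2f} GHz: logical CPUs {cpus}")
```

On a hybrid chip the slowest group should be the E-cores. More than two groups can appear, since some P-cores may report a slightly higher turbo limit than others. Note also that 8 P-cores with two threads each plus 8 single-threaded E-cores gives the 24 logical CPUs listed, so two logical CPUs in a P-core group may share one physical core. The resulting CPU list could then be used with a pinning tool such as `taskset -c` to keep the run on P-cores only.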
 
Hello and thanks for your reply!

Some observations: I have always had this problem, even on older PCs with multicore CPUs that, as far as I know, do not have this hybrid P/E architecture. Even with the old NMM core of WRF, at the same domain and resolution, I couldn't get past 5 cores...

Also: I would then expect a performance improvement at least up to 8 cores, not 6 or 7 (5 in the case of the NMM, which I tested on the same computer).

Finally: the Intel Thread Director should be responsible for coordinating the P-cores and E-cores. It is a hardware solution designed to optimize task management by choosing, moment by moment, the most suitable core for a given thread. As needed, the Thread Director can therefore assign the heaviest tasks to the Performance cores and the lighter ones to the Efficient cores.

And then: same problem with another computer, but AMD (not Intel)...


I really can't understand...
 
I guess that with more processors, communication between them takes more time, which eventually offsets the computational gain from the extra processors.
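This trade-off can be checked against measured timings. The sketch below uses the Karp-Flatt metric, which turns a measured speedup into an effective serial-plus-overhead fraction, alongside Amdahl's law for comparison. The function names are illustrative and no measurements from this thread are used.

```python
def karp_flatt(speedup, n_tasks):
    """Experimentally determined serial fraction e = (1/S - 1/n) / (1 - 1/n)."""
    if n_tasks < 2:
        raise ValueError("need at least 2 tasks")
    return (1.0 / speedup - 1.0 / n_tasks) / (1.0 - 1.0 / n_tasks)


def amdahl_speedup(serial_fraction, n_tasks):
    """Upper bound on speedup under Amdahl's law for a fixed serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_tasks)
```

If the Karp-Flatt value stays roughly constant as tasks are added, a fixed serial portion is the limit. If it grows with the task count, overhead such as communication or contention for shared resources is increasing, which matches the explanation given here.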
 
I thought so too, but I see many applications that use dozens of cores, or even computer clusters (where communication speed is a much bigger problem), and yet I stop at 6 or 7 cores?
 
Your grid is only 198 x 198 points, and 6-7 processors are already sufficient for this case. Adding more cores is not necessary and doesn't really help.
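To get a feel for how much work each MPI task receives on a grid of this size, the patch dimensions can be estimated directly. The sketch below splits the grid into a near-square process layout and reports how large the halo region is relative to each patch. The halo width of 3 cells and the factorisation rule are assumptions for illustration, not values read from WRF.

```python
import math


def near_square_layout(n_tasks):
    """Return (px, py) with px * py == n_tasks and px <= py, as square as possible."""
    px = math.isqrt(n_tasks)
    while n_tasks % px:
        px -= 1
    return px, n_tasks // px


def patch_stats(nx, ny, n_tasks, halo=3):
    """Estimate patch size and halo-to-interior ratio for an nx x ny grid.

    The ratio is an upper bound: it gives every patch a halo on all four
    sides, which is only true for patches away from the domain edge.
    """
    px, py = near_square_layout(n_tasks)
    patch_x = math.ceil(nx / px)
    patch_y = math.ceil(ny / py)
    interior = patch_x * patch_y
    with_halo = (patch_x + 2 * halo) * (patch_y + 2 * halo)
    return {
        "layout": (px, py),
        "patch": (patch_x, patch_y),
        "halo_ratio": (with_halo - interior) / interior,
    }


if __name__ == "__main__":
    for n in (1, 4, 7, 8, 16, 24):
        print(n, patch_stats(198, 198, n))
```

Under these assumptions, a prime task count such as 7 yields long thin strips (a 1 x 7 layout), and the halo share rises steadily as the patches shrink, so each added task brings less useful work per exchanged cell.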
 
As I said at the beginning: "I also tried increasing the domain size (all of Europe) and nothing changed"...
 