|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
# 6309ff98 | 04-Mar-2025 | Ahmed S. Darwish <[email protected]>
x86/cacheinfo: Remove unnecessary headers and reorder the rest
Remove the headers at cacheinfo.c that are no longer required.
Alphabetically reorder what remains since more headers will be included in further commits.
Signed-off-by: Ahmed S. Darwish <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# b3a756bd | 04-Mar-2025 | Thomas Gleixner <[email protected]>
x86/cacheinfo: Remove the P4 trace leftovers for real
Commit 851026a2bf54 ("x86/cacheinfo: Remove unused trace variable") removed the switch case for LVL_TRACE but did not get rid of the surrounding gunk.
Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ahmed S. Darwish <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# 8177c6be | 04-Mar-2025 | Ahmed S. Darwish <[email protected]>
x86/cacheinfo: Validate CPUID leaf 0x2 EDX output
CPUID leaf 0x2 emits one-byte descriptors in its four output registers EAX, EBX, ECX, and EDX. For these descriptors to be valid, the most significant bit (MSB) of each register must be clear.
The historical Git commit:
019361a20f016 ("- pre6: Intel: start to add Pentium IV specific stuff (128-byte cacheline etc)...")
introduced leaf 0x2 output parsing. It only validated the MSBs of EAX, EBX, and ECX, but left EDX unchecked.
Validate EDX's most-significant bit.
Signed-off-by: Ahmed S. Darwish <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
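For illustration, a minimal user-space sketch of the rule described above (my own example, not the kernel's code): leaf 0x2 is queried through the compiler's cpuid helper and a register's descriptor bytes are consumed only when its most significant bit is clear.

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int regs[4];   /* EAX, EBX, ECX, EDX */

        if (!__get_cpuid(0x2, &regs[0], &regs[1], &regs[2], &regs[3]))
            return 1;

        /* A register holds valid one-byte descriptors only when its MSB
         * (bit 31) is clear.  The low byte of EAX is the leaf's iteration
         * count, not a descriptor, so it is skipped. */
        for (int r = 0; r < 4; r++) {
            if (regs[r] & (1u << 31))
                continue;   /* register output invalid, ignore it */
            for (int b = (r == 0 ? 1 : 0); b < 4; b++) {
                unsigned int desc = (regs[r] >> (8 * b)) & 0xff;
                if (desc)
                    printf("descriptor 0x%02x\n", desc);
            }
        }
        return 0;
    }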
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
# 9677be09 | 28-Nov-2024 | Ricardo Neri <[email protected]>
x86/cacheinfo: Delete global num_cache_leaves
Linux remembers cpu_cacheinfo::num_leaves per CPU, but x86 initializes all CPUs from the same global "num_cache_leaves".
This is erroneous on systems such as Meteor Lake, where each CPU has a distinct num_leaves value. Delete the global "num_cache_leaves" and initialize num_leaves on each CPU.
init_cache_level() no longer needs to set num_leaves. Also, it never had to set num_levels as it is unnecessary in x86. Keep checking for zero cache leaves. Such condition indicates a bug.
[ bp: Cleanup. ]
Signed-off-by: Ricardo Neri <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Cc: [email protected] # 6.3+ Link: https://lore.kernel.org/r/[email protected]
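To make the per-CPU nature of this concrete, a small user-space sketch (illustrative only, not the kernel code, and assuming the Intel leaf 0x4 enumeration; AMD uses leaf 0x8000001d) counts the cache parameter leaves the same way the deterministic enumeration does. Pin it to different core types on a hybrid part and the counts can differ.

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx, leaves = 0;

        /* CPUID leaf 0x4: subleaf N describes one cache; the enumeration
         * ends when the cache type field EAX[4:0] reads 0 (null). */
        for (unsigned int sub = 0; ; sub++) {
            if (!__get_cpuid_count(0x4, sub, &eax, &ebx, &ecx, &edx))
                break;
            if ((eax & 0x1f) == 0)
                break;
            leaves++;
        }
        printf("cache leaves on this CPU: %u\n", leaves);
        return 0;
    }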
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2 |
|
# ffc92cf3 | 24-Jan-2024 | Kirill A. Shutemov <[email protected]>
x86/pat: Simplify the PAT programming protocol
The programming protocol for the PAT MSR follows the MTRR programming protocol. However, this protocol is cumbersome and requires disabling caching (CR0.CD=1), which is not possible on some platforms.
Specifically, a TDX guest is not allowed to set CR0.CD. It triggers a #VE exception.
It turns out that the requirement to follow the MTRR programming protocol for PAT programming is unnecessarily strict. The new Intel Software Developer Manual (http://www.intel.com/sdm) (December 2023) relaxes this requirement, please refer to the section titled "Programming the PAT" for more information.
In short, this section provides an alternative PAT update sequence which doesn't need to disable caches around the PAT update but only to flush those caches and TLBs.
The AMD documentation does not link PAT programming to MTRR and is therefore fine, too.
The kernel only needs to flush the TLB after updating the PAT MSR. The set_memory code already takes care of flushing the TLB and cache when changing the memory type of a page.
[ bp: Expand commit message. ]
Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected]
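Read literally, the relaxed sequence boils down to something like the kernel-style sketch below. This is only an illustration of the description above, not the patch itself: the helper names are the usual x86 ones, and flushing of affected cache lines is assumed to happen in the set_memory code as stated.

    #include <asm/msr.h>
    #include <asm/tlbflush.h>

    /* Sketch: program a new PAT layout without the MTRR-style CR0.CD dance. */
    static void pat_update_relaxed(u64 pat)
    {
        unsigned long flags;

        local_irq_save(flags);
        wrmsrl(MSR_IA32_CR_PAT, pat);   /* install the new PAT entries */
        __flush_tlb_all();              /* drop TLB entries caching the old
                                           per-mapping PAT index */
        local_irq_restore(flags);
    }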
|
# 89b0f15f | 13-Feb-2024 | Thomas Gleixner <[email protected]>
x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
Now that __num_cores_per_package and __num_threads_per_package are available, cpuinfo::x86_max_cores and the related math all over the place can be replaced with the ready to consume data.
Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# f7fb3b2d | 13-Feb-2024 | Thomas Gleixner <[email protected]>
x86/cpu: Provide an AMD/HYGON specific topology parser
AMD/HYGON uses various methods for topology evaluation:
- Leaf 0x80000008 and 0x8000001e based with an optional leaf 0xb, which is the preferred variant for modern CPUs. Leaf 0xb will be superseded by leaf 0x80000026 soon, which is just another variant of the Intel 0x1f leaf for whatever reasons.
- Subleaf 0x80000008 and NODEID_MSR base
- Legacy fallback
That code is following the principle of random bits and pieces all over the place which results in multiple evaluations and impenetrable code flows in the same way as the Intel parsing did.
Provide a sane implementation by clearly separating the three variants and bringing them in the proper preference order in one place.
This provides the parsing for both AMD and HYGON because there is no point in having a separate HYGON parser which only differs by 3 lines of code. Any further divergence between AMD and HYGON can be handled in different functions, while still sharing the existing parsers.
Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Zhang Rui <[email protected]> Tested-by: Wang Wendy <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# 7e3ec628 | 13-Feb-2024 | Thomas Gleixner <[email protected]>
x86/cpu/amd: Provide a separate accessor for Node ID
AMD (ab)uses topology_die_id() to store the Node ID information and topology_max_dies_per_pkg to store the number of nodes per package.
This collides with the proper processor die level enumeration which is coming on AMD with CPUID 8000_0026, unless there is a correlation between the two. There is zero documentation about that.
So provide new storage and new accessors which for now still access die_id and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are converted over.
Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Zhang Rui <[email protected]> Tested-by: Wang Wendy <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7 |
|
# 6e290323 | 14-Aug-2023 | Thomas Gleixner <[email protected]>
x86/cpu: Move cpu_l[l2]c_id into topology info
The topology IDs which identify the LLC and L2 domains clearly belong to the per CPU topology information.
Move them into cpuinfo_x86::cpuinfo_topo and get rid of the extra per CPU data and the related exports.
This also paves the way to do proper topology evaluation during early boot because it removes the only per CPU dependency for that.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Reviewed-by: Arjan van de Ven <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# 8a169ed4 | 14-Aug-2023 | Thomas Gleixner <[email protected]>
x86/cpu: Move cpu_die_id into topology info
Move the next member.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# 02fb601d | 14-Aug-2023 | Thomas Gleixner <[email protected]>
x86/cpu: Move phys_proc_id into topology info
Rename it to pkg_id which is the terminology used in the kernel.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# b9655e70 | 14-Aug-2023 | Thomas Gleixner <[email protected]>
x86/cpu: Encapsulate topology information in cpuinfo_x86
The topology related information is randomly scattered across cpuinfo_x86.
Create a new structure cpuinfo_topo and move in a first step initial_apicid and apicid into it.
Aside from being more readable, this is in preparation for replacing the horribly fragile CPU topology evaluation code further down the road.
Consolidate APIC ID fields to u32 as that represents the hardware type.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
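In rough shape (a sketch trimmed to the two members the changelog names, not the full structure as merged):

    /* Topology data pulled out of the flat cpuinfo_x86 fields. */
    struct cpuinfo_topology {
        u32 apicid;         /* APIC ID currently in use */
        u32 initial_apicid; /* APIC ID reported by CPUID at boot */
    };

    struct cpuinfo_x86 {
        /* ... existing fields ... */
        struct cpuinfo_topology topo;
        /* ... */
    };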
|
|
Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2 |
|
# a32226fa | 12-May-2023 | Thomas Gleixner <[email protected]>
x86/cpu/cacheinfo: Remove cpu_callout_mask dependency
cpu_callout_mask is used for the stop machine based MTRR/PAT init.
In preparation of moving the BP/AP synchronization to the core hotplug code, use a private CPU mask for cacheinfo and manage it in the starting/dying hotplug state.
Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Tested-by: Helge Deller <[email protected]> # parisc Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8 |
|
# 851026a2 | 10-Feb-2023 | Borislav Petkov (AMD) <[email protected]>
x86/cacheinfo: Remove unused trace variable
15cd8812ab2c ("x86: Remove the CPU cache size printk's") removed the last use of the trace local var. Remove it too and the useless trace cache case.
No functional changes.
Reported-by: Jiapeng Chong <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: http://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4 |
|
# 30f89e52 | 02-Nov-2022 | Juergen Gross <[email protected]>
x86/cacheinfo: Switch cache_ap_init() to hotplug callback
Instead of explicitly calling cache_ap_init() in identify_secondary_cpu(), use a CPU hotplug callback. By registering the callback only after having started the non-boot CPUs and initializing cache_aps_delayed_init with "true", calling set_cache_aps_delayed_init() at boot time can be dropped.
It should be noted that this change results in cache_ap_init() being called a little bit later when hotplugging CPUs. By using a new hotplug slot right at the start of the low level bringup this is not problematic, as no operations requiring a specific caching mode are performed that early in CPU initialization.
Suggested-by: Borislav Petkov <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]>
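For readers not familiar with the hotplug state machine, the registration amounts to roughly the sketch below. It is illustrative only; the state and callback names follow the pattern used in this area and are not copied from the patch.

    #include <linux/cpuhotplug.h>

    /* Runs on the hotplugged CPU itself, early in low-level bringup. */
    static int cache_ap_online(unsigned int cpu)
    {
        cache_ap_init();
        return 0;
    }

    static int __init cache_ap_register(void)
    {
        /* STARTING-section states are invoked on the upcoming CPU with
         * interrupts disabled, before it runs any scheduled work. */
        return cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_STARTING,
                                         "x86/cachectrl:starting",
                                         cache_ap_online, NULL);
    }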
|
# adfe7512 | 02-Nov-2022 | Juergen Gross <[email protected]>
x86: Decouple PAT and MTRR handling
Today, PAT is usable only with MTRR being active, with some nasty tweaks to make PAT usable when running as a Xen PV guest which doesn't support MTRR.
The reason for this coupling is that both PAT MSR changes and MTRR changes require a similar sequence and so full PAT support was added using the already available MTRR handling.
Xen PV PAT handling can work without MTRR, as it just needs to consume the PAT MSR setting done by the hypervisor without the ability and need to change it. This in turn has resulted in a convoluted initialization sequence and wrong decisions regarding cache mode availability due to misguiding PAT availability flags.
Fix all of that by allowing to use PAT without MTRR and by reworking the current PAT initialization sequence to match better with the newly introduced generic cache initialization.
This removes the need of the recently added pat_force_disabled flag, so remove the remnants of the patch adding it.
Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]>
|
# 0b9a6a8b | 02-Nov-2022 | Juergen Gross <[email protected]>
x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()
Instead of having a stop_machine() handler for either a specific MTRR register or all state at once, add a handler just for calling cache_cpu_init() if appropriate.
Add functions for calling stop_machine() with this handler as well.
Add a generic replacement for mtrr_bp_restore() and a wrapper for mtrr_bp_init().
Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]>
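A hedged sketch of the pattern (the function names are illustrative, not necessarily those of the patch, and the real handler only calls cache_cpu_init() if appropriate):

    #include <linux/stop_machine.h>

    /* stop_machine() handler: every online CPU runs this with interrupts
     * disabled, so caching state is (re)programmed fully serialized. */
    static int cache_rendezvous_handler(void *unused)
    {
        cache_cpu_init();
        return 0;
    }

    static void cache_cpu_init_all(void)
    {
        stop_machine(cache_rendezvous_handler, NULL, cpu_online_mask);
    }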
|
# 955d0e08 | 02-Nov-2022 | Juergen Gross <[email protected]>
x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init
In order to prepare decoupling MTRR and PAT replace the MTRR-specific mtrr_aps_delayed_init flag with a more generic cache_aps_delayed_init one.
Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]>
|
# 7d71db53 | 02-Nov-2022 | Juergen Gross <[email protected]>
x86/mtrr: Disentangle MTRR init from PAT init
Add a main cache_cpu_init() init routine which initializes MTRR and/or PAT support depending on what has been detected on the system.
Leave the MTRR-specific initialization in a MTRR-specific init function where the smp_changes_mask setting happens now with caches disabled.
This global mask update was done with caches enabled before probably because atomic operations while running uncached might have been quite expensive.
But since only systems with a broken BIOS should ever need to set any bit in smp_changes_mask, hurting those devices with a penalty of a few microseconds during boot shouldn't be a real issue.
Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]>
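Pieced together from the changelog, the main routine ends up roughly as below. This is a sketch of the described structure, not the committed code; identifier names follow the series as best I can tell.

    void cache_cpu_init(void)
    {
        unsigned long flags;

        local_irq_save(flags);
        cache_disable();    /* set CR0.CD, flush caches, disable MTRRs */

        if (memory_caching_control & CACHE_MTRR)
            mtrr_generic_set_state();   /* now also updates smp_changes_mask */
        if (memory_caching_control & CACHE_PAT)
            pat_cpu_init();

        cache_enable();     /* re-enable MTRRs and caching */
        local_irq_restore(flags);
    }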
|
# 23a63e36 | 02-Nov-2022 | Juergen Gross <[email protected]>
x86/mtrr: Move cache control code to cacheinfo.c
Prepare making PAT and MTRR support independent from each other by moving some code needed by both out of the MTRR-specific sources.
[ bp: Massage commit message. ]
Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]>
|
# 45fa71f1 | 02-Nov-2022 | Juergen Gross <[email protected]>
x86/mtrr: Replace use_intel() with a local flag
In MTRR code use_intel() is only used in one source file, and the relevant use_intel_if member of struct mtrr_ops is set only in generic_mtrr_ops.
Replace use_intel() with a single flag in cacheinfo.c which can be set when assigning generic_mtrr_ops to mtrr_if. This allows dropping use_intel_if from mtrr_ops, while preparing to decouple PAT from MTRR. As another preparation for the PAT/MTRR decoupling, use one bit for MTRR control and one for PAT control. For now set both bits together; this can be changed later.
As the new flag will be set only if mtrr_enabled is set, the test for mtrr_enabled can be dropped at some places.
[ bp: Massage commit message. ]
Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]>
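A sketch of the control-word approach described above (identifiers follow the series as best I recall; the helper at the end is hypothetical and only shows where the bits get set):

    /* One word of cache-management control bits replaces use_intel(). */
    #define CACHE_MTRR  0x01
    #define CACHE_PAT   0x02

    unsigned int memory_caching_control;

    /* Hypothetical helper: when the generic (Intel) MTRR ops are assigned
     * to mtrr_if, claim both MTRR and PAT control; later patches split
     * the two apart. */
    static void __init generic_cache_control_set(void)
    {
        memory_caching_control |= CACHE_MTRR | CACHE_PAT;
    }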
|
|
Revision tags: v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5 |
|
# adbcaef8 | 02-Jul-2022 | Sander Vanheule <[email protected]>
x86/cacheinfo: move shared cache map definitions
Patch series "cpumask: Fix invalid uniprocessor assumptions", v4.
On uniprocessor builds, it is currently assumed that any cpumask will contain the single CPU: cpu0. This assumption is used to provide optimised implementations.
The current assumption also appears to be wrong, by ignoring the fact that users can provide empty cpumasks. This can result in bugs as explained in [1] - for_each_cpu() will run one iteration of the loop even when passed an empty cpumask.
This series introduces some basic tests, and updates the optimisations for uniprocessor builds.
The x86 patch was written after the kernel test robot [2] ran into a failed build. I have tried to list the files potentially affected by the changes to cpumask.h, in an attempt to find any other cases that fail on !SMP. I've gone through some of the files manually, and ran a few cross builds, but nothing else popped up. I (build) checked about half of the potentially affected files, but I do not have the resources to do them all. I hope we can fix other issues if/when they pop up later.
[1] https://lore.kernel.org/all/[email protected]/ [2] https://lore.kernel.org/all/[email protected]/
This patch (of 5):
The maps to keep track of shared caches between CPUs on SMP systems are declared in asm/smp.h, among them specifically cpu_llc_shared_map. These maps are externally defined in cpu/smpboot.c. The latter is only compiled on CONFIG_SMP=y, which means the declared extern symbols from asm/smp.h do not have a corresponding definition on uniprocessor builds.
The inline cpu_llc_shared_mask() function from asm/smp.h refers to the map declaration mentioned above. This function is referenced in cacheinfo.c inside for_each_cpu() loop macros, to provide cpumask for the loop. On uniprocessor builds, the symbol for the cpu_llc_shared_map does not exist. However, the current implementation of for_each_cpu() also (wrongly) ignores the provided mask.
By sheer luck, the compiler thus optimises out this unused reference to cpu_llc_shared_map, and the linker therefore does not require the cpu_llc_shared_mask to actually exist on uniprocessor builds. Only on SMP builds does smpboot.o exist to provide the required symbols.
To no longer rely on compiler optimisations for successful uniprocessor builds, move the definitions of cpu_llc_shared_map and cpu_l2c_shared_map from smpboot.c to cacheinfo.c.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/e8167ddb570f56744a3dc12c2149a660a324d969.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Marco Elver <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Yury Norov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
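In concrete terms the move looks roughly like this (a sketch; the declarations keep their existing names):

    /* arch/x86/include/asm/smp.h: declarations stay visible to all users ... */
    DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
    DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);

    /* ... while the definitions now live in cacheinfo.c, which is built for
     * both SMP and uniprocessor kernels, instead of SMP-only smpboot.c. */
    DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
    DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);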
|
|
Revision tags: v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3 |
|
# 66558b73 | 24-Sep-2021 | Tim Chen <[email protected]>
sched: Add cluster scheduler level for x86
There are x86 CPU architectures (e.g. Jacobsville) where the L2 cache is shared among a cluster of cores instead of being exclusive to one single core.
To prevent oversubscription of the L2 cache, load should be balanced between such L2 clusters, especially for tasks with no shared data. On benchmarks such as the SPECrate mcf test, this change provides a performance boost, especially on medium-load systems. On a Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4 cores each, the benchmark numbers are as follows:
Improvement over baseline kernel for mcf_r:

    copies    run time    base rate
         1     -0.1%       -0.2%
         6     25.1%       25.1%
        12     18.8%       19.0%
        24      0.3%        0.3%
So this looks pretty good. In terms of the system's task distribution, some pretty bad clumping can be seen for the vanilla kernel without the L2 cluster domain for the 6 and 12 copies case. With the extra domain for cluster, the load does get evened out between the clusters.
Note this patch isn't a universal win as spreading isn't necessarily a win, particularly for those workloads which can benefit from packing.
Signed-off-by: Tim Chen <[email protected]> Signed-off-by: Barry Song <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.15-rc2, v5.15-rc1 |
|
# 4b92d4ad | 31-Aug-2021 | Thomas Gleixner <[email protected]>
drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
DEFINE_SMP_CALL_CACHE_FUNCTION() was useful before the CPU hotplug rework to ensure that the cache related functions are called on the upcoming CPU because the notifier itself could run on any online CPU.
The hotplug state machine guarantees that the callbacks are invoked on the upcoming CPU. So there is no need to have this SMP function call obfuscation. That indirection was missed when the hotplug notifiers were converted.
This also solves the problem of ARM64 init_cache_level() invoking ACPI functions which take a semaphore in that context. That's invalid as SMP function calls run with interrupts disabled. Running it just from the callback in context of the CPU hotplug thread solves this.
Fixes: 8571890e1513 ("arm64: Add support for ACPI based firmware tables") Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Guenter Roeck <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
|
|
Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6 |
|
# dda451f3 | 31-Mar-2021 | Yang Li <[email protected]>
x86/cacheinfo: Remove unneeded dead-store initialization
$ make CC=clang clang-analyzer
(needs clang-tidy installed on the system too)
on x86_64 defconfig triggers:
arch/x86/kernel/cpu/cacheinfo.c:880:24: warning: Value stored to 'this_cpu_ci'
    during its initialization is never read [clang-analyzer-deadcode.DeadStores]
        struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
                              ^
arch/x86/kernel/cpu/cacheinfo.c:880:24: note: Value stored to 'this_cpu_ci'
    during its initialization is never read
So simply remove this unneeded dead-store initialization.
As compilers will detect this unneeded assignment and optimize this anyway the resulting object code is identical before and after this change.
No functional change. No change to object code.
[ bp: Massage commit message. ]
Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
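In effect the change is just this (a paraphrase of the hunk, not the exact diff):

    /* Before: the value stored by the initializer is never read. */
    struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);

    /* After: no initializer; the pointer is assigned where it is first used. */
    struct cpu_cacheinfo *this_cpu_ci;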
|