History log of /linux-6.15/arch/x86/kernel/cpu/cacheinfo.c (Results 1 – 25 of 37)
Revision  Date  Author  Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6
# 6309ff98 04-Mar-2025 Ahmed S. Darwish <[email protected]>

x86/cacheinfo: Remove unnecessary headers and reorder the rest

Remove the headers at cacheinfo.c that are no longer required.

Alphabetically reorder what remains since more headers will be included
in further commits.

Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# b3a756bd 04-Mar-2025 Thomas Gleixner <[email protected]>

x86/cacheinfo: Remove the P4 trace leftovers for real

Commit 851026a2bf54 ("x86/cacheinfo: Remove unused trace variable") removed
the switch case for LVL_TRACE but did not get rid of the surrounding gunk.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 8177c6be 04-Mar-2025 Ahmed S. Darwish <[email protected]>

x86/cacheinfo: Validate CPUID leaf 0x2 EDX output

CPUID leaf 0x2 emits one-byte descriptors in its four output registers
EAX, EBX, ECX, and EDX. For these descriptors to be valid, the most
significant bit (MSB) of each register must be clear.

The historical Git commit:

019361a20f016 ("- pre6: Intel: start to add Pentium IV specific stuff (128-byte cacheline etc)...")

introduced leaf 0x2 output parsing. It only validated the MSBs of EAX,
EBX, and ECX, but left EDX unchecked.

Validate EDX's most-significant bit.

Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
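
As an illustration of the rule described above, a minimal C sketch of the per-register check might look like the following; cpuid2_reg_valid() is a hypothetical name, not the kernel's actual helper:

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Descriptors in a CPUID leaf 0x2 output register (EAX, EBX, ECX or EDX)
     * are only meaningful when that register's most significant bit (bit 31)
     * is clear. Each register is checked independently; the commit above adds
     * the previously missing check for EDX.
     */
    static bool cpuid2_reg_valid(uint32_t reg)
    {
            return !(reg & (1u << 31));
    }

A register that fails this check would simply be skipped when its descriptor bytes are parsed.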


Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1
# 9677be09 28-Nov-2024 Ricardo Neri <[email protected]>

x86/cacheinfo: Delete global num_cache_leaves

Linux remembers cpu_cacheinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".

This is erroneous on systems such as Meteor Lake, where each CPU has a
distinct num_leaves value. Delete the global "num_cache_leaves" and
initialize num_leaves on each CPU.

init_cache_level() no longer needs to set num_leaves. Also, it never had to
set num_levels as it is unnecessary in x86. Keep checking for zero cache
leaves. Such a condition indicates a bug.

[ bp: Cleanup. ]

Signed-off-by: Ricardo Neri <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected] # 6.3+
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2
# ffc92cf3 24-Jan-2024 Kirill A. Shutemov <[email protected]>

x86/pat: Simplify the PAT programming protocol

The programming protocol for the PAT MSR follows the MTRR programming
protocol. However, this protocol is cumbersome and requires disabling
caching (CR0.CD=1), which is not possible on some platforms.

Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
a #VE exception.

It turns out that the requirement to follow the MTRR programming
protocol for PAT programming is unnecessarily strict. The new Intel
Software Developer Manual (http://www.intel.com/sdm) (December 2023)
relaxes this requirement; please refer to the section titled
"Programming the PAT" for more information.

In short, this section provides an alternative PAT update sequence which
doesn't need to disable caches around the PAT update but only to flush
those caches and TLBs.

The AMD documentation does not link PAT programming to MTRR and is
therefore fine, too.

The kernel only needs to flush the TLB after updating the PAT MSR. The
set_memory code already takes care of flushing the TLB and cache when
changing the memory type of a page.

[ bp: Expand commit message. ]

Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
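
For readers unfamiliar with the relaxed sequence being referenced, a rough kernel-style sketch is shown below; pat_update_relaxed() is a hypothetical name and this is not the code path the commit adds, only an outline of "flush caches and TLBs instead of setting CR0.CD":

    #include <linux/types.h>
    #include <asm/msr.h>            /* wrmsrl(), MSR_IA32_CR_PAT */
    #include <asm/special_insns.h>  /* wbinvd() */
    #include <asm/tlbflush.h>       /* flush_tlb_all() */

    /* Hedged sketch of the relaxed PAT update, not the kernel's implementation. */
    static void pat_update_relaxed(u64 pat)
    {
            wbinvd();                        /* flush caches instead of disabling them */
            wrmsrl(MSR_IA32_CR_PAT, pat);    /* program the PAT MSR */
            flush_tlb_all();                 /* drop TLB entries carrying stale memory types */
    }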


# 89b0f15f 13-Feb-2024 Thomas Gleixner <[email protected]>

x86/cpu/topology: Get rid of cpuinfo::x86_max_cores

Now that __num_cores_per_package and __num_threads_per_package are
available, cpuinfo::x86_max_cores and the related math all over the place
can be replaced with the ready-to-consume data.

Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# f7fb3b2d 13-Feb-2024 Thomas Gleixner <[email protected]>

x86/cpu: Provide an AMD/HYGON specific topology parser

AMD/HYGON uses various methods for topology evaluation:

- Leaf 0x80000008 and 0x8000001e based with an optional leaf 0xb,
which is the preferred variant for modern CPUs.

Leaf 0xb will be superseded by leaf 0x80000026 soon, which is just
another variant of the Intel 0x1f leaf for whatever reasons.

- Subleaf 0x80000008 and NODEID_MSR base

- Legacy fallback

That code is following the principle of random bits and pieces all over the
place which results in multiple evaluations and impenetrable code flows in
the same way as the Intel parsing did.

Provide a sane implementation by clearly separating the three variants and
bringing them in the proper preference order in one place.

This provides the parsing for both AMD and HYGON because there is no point
in having a separate HYGON parser which only differs by 3 lines of
code. Any further divergence between AMD and HYGON can be handled in
different functions, while still sharing the existing parsers.

Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Tested-by: Wang Wendy <[email protected]>
Tested-by: K Prateek Nayak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 7e3ec628 13-Feb-2024 Thomas Gleixner <[email protected]>

x86/cpu/amd: Provide a separate accessor for Node ID

AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.

This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.

So provide new storage and new accessors which for now still access die_id
and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are
converted over.

Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Tested-by: Wang Wendy <[email protected]>
Tested-by: K Prateek Nayak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7
# 6e290323 14-Aug-2023 Thomas Gleixner <[email protected]>

x86/cpu: Move cpu_l[l2]c_id into topology info

The topology IDs which identify the LLC and L2 domains clearly belong to
the per CPU topology information.

Move them into cpuinfo_x86::cpuinfo_topo and get rid of the extra per CPU
data and the related exports.

This also paves the way to do proper topology evaluation during early boot
because it removes the only per CPU dependency for that.

No functional change.

Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Reviewed-by: Arjan van de Ven <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 8a169ed4 14-Aug-2023 Thomas Gleixner <[email protected]>

x86/cpu: Move cpu_die_id into topology info

Move the next member.

No functional change.

Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 02fb601d 14-Aug-2023 Thomas Gleixner <[email protected]>

x86/cpu: Move phys_proc_id into topology info

Rename it to pkg_id which is the terminology used in the kernel.

No functional change.

Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# b9655e70 14-Aug-2023 Thomas Gleixner <[email protected]>

x86/cpu: Encapsulate topology information in cpuinfo_x86

The topology related information is randomly scattered across cpuinfo_x86.

Create a new structure cpuinfo_topo and move in a first step initial_apicid
and apicid into it.

Aside from being more readable, this is in preparation for replacing the
horribly fragile CPU topology evaluation code further down the road.

Consolidate APIC ID fields to u32 as that represents the hardware type.

No functional change.

Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2
# a32226fa 12-May-2023 Thomas Gleixner <[email protected]>

x86/cpu/cacheinfo: Remove cpu_callout_mask dependency

cpu_callout_mask is used for the stop machine based MTRR/PAT init.

In preparation for moving the BP/AP synchronization to the core hotplug
code, use a private CPU mask for cacheinfo and manage it in the
starting/dying hotplug state.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8
# 851026a2 10-Feb-2023 Borislav Petkov (AMD) <[email protected]>

x86/cacheinfo: Remove unused trace variable

15cd8812ab2c ("x86: Remove the CPU cache size printk's") removed the
last use of the trace local var. Remove it too and the useless trace
cache case.

No functional changes.

Reported-by: Jiapeng Chong <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: http://lore.kernel.org/r/[email protected]


Revision tags: v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4
# 30f89e52 02-Nov-2022 Juergen Gross <[email protected]>

x86/cacheinfo: Switch cache_ap_init() to hotplug callback

Instead of explicitly calling cache_ap_init() in
identify_secondary_cpu(), use a CPU hotplug callback. By
registering the callback only after having started the non-boot CPUs
and initializing cache_aps_delayed_init with "true", calling
set_cache_aps_delayed_init() at boot time can be dropped.

It should be noted that this change results in cache_ap_init() being
called a little bit later when hotplugging CPUs. By using a new
hotplug slot right at the start of the low level bringup this is not
problematic, as no operations requiring a specific caching mode are
performed that early in CPU initialization.

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
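
As a sketch of the registration pattern described above (assuming the hotplug slot is named CPUHP_AP_CACHECTRL_STARTING; the callback body and names here are illustrative, not the commit's actual diff):

    #include <linux/cpuhotplug.h>
    #include <linux/init.h>

    /* Runs on the upcoming CPU early in low-level bringup. */
    static int cache_ap_online(unsigned int cpu)
    {
            /* cache_ap_init() would be invoked from here */
            return 0;
    }

    static int __init cache_ap_register(void)
    {
            /* CPUHP_AP_CACHECTRL_STARTING is an assumed slot name for this sketch */
            cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_STARTING,
                                      "x86/cachectrl:starting",
                                      cache_ap_online, NULL);
            return 0;
    }

Because the callback is registered only after the non-boot CPUs are up, the boot-time path no longer needs an explicit call.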


# adfe7512 02-Nov-2022 Juergen Gross <[email protected]>

x86: Decouple PAT and MTRR handling

Today, PAT is usable only with MTRR being active, with some nasty tweaks
to make PAT usable when running as a Xen PV guest which doesn't support
MTRR.

The reason for this coupling is that both PAT MSR changes and MTRR
changes require a similar sequence and so full PAT support was added
using the already available MTRR handling.

Xen PV PAT handling can work without MTRR, as it just needs to consume
the PAT MSR setting done by the hypervisor without the ability and need
to change it. This in turn has resulted in a convoluted initialization
sequence and wrong decisions regarding cache mode availability due to
misguiding PAT availability flags.

Fix all of that by allowing PAT to be used without MTRR and by reworking
the current PAT initialization sequence to match better with the newly
introduced generic cache initialization.

This removes the need of the recently added pat_force_disabled flag, so
remove the remnants of the patch adding it.

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>


# 0b9a6a8b 02-Nov-2022 Juergen Gross <[email protected]>

x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()

Instead of having a stop_machine() handler for either a specific
MTRR register or all state at once, add a handler just for calling
cache_cpu_init() if appropriate.

Add functions for calling stop_machine() with this handler as well.

Add a generic replacement for mtrr_bp_restore() and a wrapper for
mtrr_bp_init().

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>


# 955d0e08 02-Nov-2022 Juergen Gross <[email protected]>

x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init

In order to prepare decoupling MTRR and PAT, replace the MTRR-specific
mtrr_aps_delayed_init flag with a more generic cache_aps_delayed_init
one.

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>


# 7d71db53 02-Nov-2022 Juergen Gross <[email protected]>

x86/mtrr: Disentangle MTRR init from PAT init

Add a main cache_cpu_init() init routine which initializes MTRR and/or
PAT support depending on what has been detected on the system.

Leave the MTRR-specific initialization in a MTRR-specific init function
where the smp_changes_mask setting happens now with caches disabled.

This global mask update was done with caches enabled before probably
because atomic operations while running uncached might have been quite
expensive.

But since only systems with a broken BIOS should ever require to set any
bit in smp_changes_mask, hurting those devices with a penalty of a few
microseconds during boot shouldn't be a real issue.

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>


# 23a63e36 02-Nov-2022 Juergen Gross <[email protected]>

x86/mtrr: Move cache control code to cacheinfo.c

Prepare making PAT and MTRR support independent from each other by
moving some code needed by both out of the MTRR-specific sources.

[ bp: Massage commit message. ]

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>


# 45fa71f1 02-Nov-2022 Juergen Gross <[email protected]>

x86/mtrr: Replace use_intel() with a local flag

In MTRR code use_intel() is only used in one source file, and the
relevant use_intel_if member of struct mtrr_ops is set only in
generic_mtrr_ops.

Replace use_intel() with a single flag in cacheinfo.c which can be
set when assigning generic_mtrr_ops to mtrr_if. This allows dropping
use_intel_if from mtrr_ops, while preparing to decouple PAT from MTRR.
As another preparation for the PAT/MTRR decoupling, use a bit for MTRR
control and one for PAT control. For now, set both bits together; this
can be changed later.

As the new flag will be set only if mtrr_enabled is set, the test for
mtrr_enabled can be dropped at some places.

[ bp: Massage commit message. ]

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>


Revision tags: v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5
# adbcaef8 02-Jul-2022 Sander Vanheule <[email protected]>

x86/cacheinfo: move shared cache map definitions

Patch series "cpumask: Fix invalid uniprocessor assumptions", v4.

On uniprocessor builds, it is currently assumed that any cpumask will
contain the single CPU: cpu0. This assumption is used to provide
optimised implementations.

The current assumption also appears to be wrong, by ignoring the fact that
users can provide empty cpumasks. This can result in bugs as explained in
[1] - for_each_cpu() will run one iteration of the loop even when passed
an empty cpumask.

This series introduces some basic tests, and updates the optimisations for
uniprocessor builds.

The x86 patch was written after the kernel test robot [2] ran into a
failed build. I have tried to list the files potentially affected by the
changes to cpumask.h, in an attempt to find any other cases that fail on
!SMP. I've gone through some of the files manually, and ran a few cross
builds, but nothing else popped up. I (build) checked about half of the
potentially affected files, but I do not have the resources to do them
all. I hope we can fix other issues if/when they pop up later.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/


This patch (of 5):

The maps to keep track of shared caches between CPUs on SMP systems are
declared in asm/smp.h, among them specifically cpu_llc_shared_map. These
maps are externally defined in cpu/smpboot.c. The latter is only compiled
on CONFIG_SMP=y, which means the declared extern symbols from asm/smp.h do
not have a corresponding definition on uniprocessor builds.

The inline cpu_llc_shared_mask() function from asm/smp.h refers to the map
declaration mentioned above. This function is referenced in cacheinfo.c
inside for_each_cpu() loop macros, to provide cpumask for the loop. On
uniprocessor builds, the symbol for the cpu_llc_shared_map does not exist.
However, the current implementation of for_each_cpu() also (wrongly)
ignores the provided mask.

By sheer luck, the compiler thus optimises out this unused reference to
cpu_llc_shared_map, and the linker therefore does not require the
cpu_llc_shared_mask to actually exist on uniprocessor builds. Only on SMP
builds does smpboot.o exist to provide the required symbols.

To no longer rely on compiler optimisations for successful uniprocessor
builds, move the definitions of cpu_llc_shared_map and cpu_l2c_shared_map
from smpboot.c to cacheinfo.c.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/e8167ddb570f56744a3dc12c2149a660a324d969.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Yury Norov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
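
To make the uniprocessor pitfall concrete, here is a simplified, hedged C sketch (not the kernel's exact macro): the legacy !SMP optimisation ignored the mask argument, so even an empty cpumask produced one loop iteration, and the mask symbol never had to exist at link time.

    struct cpumask;                         /* opaque here; only its address is passed */

    /* Simplified stand-in for the old uniprocessor for_each_cpu(): the mask
     * is never evaluated, so the body always runs exactly once, for CPU 0. */
    #define for_each_cpu_up_legacy(cpu, mask) \
            for ((cpu) = 0; (cpu) < 1; (cpu)++)

    static unsigned int count_cpus(const struct cpumask *mask)
    {
            unsigned int cpu, n = 0;

            for_each_cpu_up_legacy(cpu, mask)   /* one iteration even for an empty mask */
                    n++;

            return n;                           /* always 1 with the legacy macro */
    }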


Revision tags: v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3
# 66558b73 24-Sep-2021 Tim Chen <[email protected]>

sched: Add cluster scheduler level for x86

There are x86 CPU architectures (e.g. Jacobsville) where L2 cache is
shared among a cluster of cores instead of being exclusive to one
single core.

To prevent oversubscription of L2 cache, load should be balanced
between such L2 clusters, especially for tasks with no shared data.
On benchmarks such as the SPECrate mcf test, this change provides a boost
to performance, especially on medium-load systems on Jacobsville. On a
Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4
cores each, the benchmark numbers are as follows:

Improvement over baseline kernel for mcf_r
copies    run time    base rate
1         -0.1%       -0.2%
6         25.1%       25.1%
12        18.8%       19.0%
24        0.3%        0.3%

So this looks pretty good. In terms of the system's task distribution,
some pretty bad clumping can be seen for the vanilla kernel without
the L2 cluster domain for the 6 and 12 copies case. With the extra
domain for cluster, the load does get evened out between the clusters.

Note this patch isn't a universal win as spreading isn't necessarily
a win, particularly for those workloads which can benefit from packing.

Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v5.15-rc2, v5.15-rc1
# 4b92d4ad 31-Aug-2021 Thomas Gleixner <[email protected]>

drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()

DEFINE_SMP_CALL_CACHE_FUNCTION() was useful before the CPU hotplug rework
to ensure that the cache related functions are called on the upcoming CPU
because the notifier itself could run on any online CPU.

The hotplug state machine guarantees that the callbacks are invoked on the
upcoming CPU. So there is no need to have this SMP function call
obfuscation. That indirection was missed when the hotplug notifiers were
converted.

This also solves the problem of ARM64 init_cache_level() invoking ACPI
functions which take a semaphore in that context. That's invalid as SMP
function calls run with interrupts disabled. Running it just from the
callback in context of the CPU hotplug thread solves this.

Fixes: 8571890e1513 ("arm64: Add support for ACPI based firmware tables")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx


Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6
# dda451f3 31-Mar-2021 Yang Li <[email protected]>

x86/cacheinfo: Remove unneeded dead-store initialization

$ make CC=clang clang-analyzer

(needs clang-tidy installed on the system too)

on x86_64 defconfig triggers:

arch/x86/kernel/cpu/cacheinfo.c:880:24: warning: Value stored to 'this_cpu_ci' \
during its initialization is never read [clang-analyzer-deadcode.DeadStores]
struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
^
arch/x86/kernel/cpu/cacheinfo.c:880:24: note: Value stored to 'this_cpu_ci' \
during its initialization is never read

So simply remove this unneeded dead-store initialization.

As compilers will detect this unneeded assignment and optimize this
anyway the resulting object code is identical before and after this
change.

No functional change. No change to object code.

[ bp: Massage commit message. ]

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
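
For context, the flagged pattern boils down to the hedged sketch below (hypothetical helpers, not the actual kernel hunk): the initializer's value is never read because the variable is unconditionally reassigned, so dropping the initializer changes neither behaviour nor object code.

    struct cpu_cacheinfo;
    struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
    struct cpu_cacheinfo *lookup_cacheinfo_slowpath(unsigned int cpu);
    unsigned int num_leaves_of(struct cpu_cacheinfo *ci);

    static unsigned int example(unsigned int cpu)
    {
            /* Dead store: this initial value is never read ... */
            struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);

            /* ... because the variable is reassigned before any use. */
            this_cpu_ci = lookup_cacheinfo_slowpath(cpu);

            return num_leaves_of(this_cpu_ci);
    }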
