History log of /linux-6.15/arch/powerpc/platforms/pseries/hotplug-cpu.c (Results 1 – 25 of 94)
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3
# 6142be7e 10-Oct-2024 Thomas Weißschuh <[email protected]>

powerpc: Split systemcfg struct definitions out from vdso

The systemcfg data has nothing to do anymore with the vdso.
Split it into a dedicated header file.

Signed-off-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]


# 1184674d 10-Oct-2024 Thomas Weißschuh <[email protected]>

powerpc: Split systemcfg data out of vdso data page

The systemcfg data only has minimal overlap with the vdso data.
Splitting the two avoids mapping the implementation-defined vdso data
into /proc/ppc64/systemcfg.
It is also a preparation for the standardization of vdso data storage.

The only field actually used by both systemcfg and vdso is
tb_ticks_per_sec and it is only changed once during time_init().
Initialize it in both structures there.

Signed-off-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]


Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# b76e0d42 22-Aug-2024 Haren Myneni <[email protected]>

powerpc/pseries: Use correct data types from pseries_hp_errorlog struct

The __be32 type is defined for some elements in the pseries_hp_errorlog
struct, but they are also used as u32 after be32_to_cpu() conversion.

Example: In handle_dlpar_errorlog()
hp_elog->_drc_u.drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);

And later assigned to u32 type
dlpar_cpu() - u32 drc_index = hp_elog->_drc_u.drc_index;

This incorrect usage produces the following warnings; the patch
resolves them by using the correct types in the assignments.

arch/powerpc/platforms/pseries/dlpar.c:398:53: sparse: sparse:
incorrect type in argument 1 (different base types) @@
expected unsigned int [usertype] drc_index @@
got restricted __be32 [usertype] drc_index @@
...
arch/powerpc/platforms/pseries/dlpar.c:418:43: sparse: sparse:
incorrect type in assignment (different base types) @@
expected restricted __be32 [usertype] drc_count @@
got unsigned int [usertype] @@

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Haren Myneni <[email protected]>

v3:
- Fix warnings from using incorrect data types in pseries_hp_errorlog
struct
v2:
- Remove pr_info() and TODO comments
- Update more information in the commit logs

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
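
The pattern behind these sparse warnings can be illustrated in user-space C. This is only a sketch, not the kernel's code: `hp_errorlog_model`, `model_be32_to_cpu()` and `drc_index_of()` are hypothetical names, and the struct merely mimics a field stored big-endian. The idea is to leave the big-endian field untouched and convert into a separate host-order value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a log entry whose field is stored big-endian. */
struct hp_errorlog_model {
	uint8_t drc_index_be[4];	/* wire format: most significant byte first */
};

/* Portable big-endian to host conversion; plays the role of be32_to_cpu(). */
uint32_t model_be32_to_cpu(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Convert into a separate host-order value instead of writing the
 * converted number back into the big-endian field.
 */
uint32_t drc_index_of(const struct hp_errorlog_model *elog)
{
	assert(elog != NULL);
	return model_be32_to_cpu(elog->drc_index_be);
}
```

Keeping the stored field and the converted value in differently typed variables is what lets a checker such as sparse tell the two representations apart.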


Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1
# d1099e22 05-Jul-2023 Michael Ellerman <[email protected]>

powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

Integrate with the generic SMT support, so that when a CPU is DLPAR
onlined it is brought up with the correct SMT mode.

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]


# 3b3a4d0f 05-Jul-2023 Michael Ellerman <[email protected]>

powerpc/pseries: Initialise CPU hotplug callbacks earlier

As part of the generic HOTPLUG_SMT code, there is support for disabling
secondary SMT threads at boot time, by passing "nosmt" on the kernel
command line.

The way that is implemented is the secondary threads are brought partly
online, and then taken back offline again. That is done to support x86
CPUs needing certain initialisation done on all threads. However powerpc
has similar needs, see commit d70a54e2d085 ("powerpc/powernv: Ignore
smt-enabled on Power8 and later").

For that to work the powerpc CPU hotplug callbacks need to be registered
before secondary CPUs are brought online, otherwise __cpu_disable()
fails due to smp_ops->cpu_disable being NULL.

So split the basic initialisation into pseries_cpu_hotplug_init() which
can be called early from setup_arch(). The DLPAR related initialisation
can still be done later, because it needs to do allocations.

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]


Revision tags: v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2
# 857d423c 10-Mar-2023 Rob Herring <[email protected]>

powerpc: Use of_property_present() for testing DT property presence

It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.

Signed-off-by: Rob Herring <[email protected]>
[mpe: Drop change in ppc4xx_probe_pci_bridge(), formatting]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]


Revision tags: v6.3-rc1, v6.2, v6.2-rc8
# 08273c9f 10-Feb-2023 Nathan Lynch <[email protected]>

powerpc/rtas: arch-wide function token lookup conversions

With the tokens for all implemented RTAS functions now available via
rtas_function_token(), which is optimal and safe for arbitrary
contexts, there is no need to use rtas_token() or cache its result.

Most conversions are trivial, but a few are worth describing in more
detail:

* Error injection token comparisons for lockdown purposes are
consolidated into a simple predicate: token_is_restricted_errinjct().

* A couple of special cases in block_rtas_call() do not use
rtas_token() but perform string comparisons against names in the
function table. These are converted to compare against token values
instead, which is logically equivalent but less expensive.

* The lookup for the ibm,os-term token can be deferred until needed,
instead of caching it at boot to avoid device tree traversal during
panic.

* Since rtas_function_token() accesses a read-only data structure
without taking any locks, xmon's lookup of set-indicator can be
performed as needed instead of cached at startup.

Signed-off-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6
# f6aa37c5 14-Nov-2022 Laurent Dufour <[email protected]>

powerpc/pseries: unregister VPA when hot unplugging a CPU

The VPA should unregister when offlining a CPU. Otherwise there could be
a short window where 2 CPUs could share the same VPA.

This happens because the hypervisor is still keeping the VPA attached to
the vCPU even if it became offline.

Here is a potential situation:
1. remove proc A,
2. add proc B. If proc B gets proc A's place in cpu_present_mask, then
it registers proc A's VPAs.
3. If proc B is then re-added to the LP, its threads are sharing VPAs
with proc A briefly as they come online.

As the hypervisor may check for the VPA's yield_count field oddity, it
may detect an unexpected value and kill the LPAR.

Suggested-by: Nathan Lynch <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Reviewed-by: Nathan Lynch <[email protected]>
[mpe: s/cpu_present_map/cpu_present_mask/ in change log]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4
# 6ec4836f 21-Jun-2022 Liang He <[email protected]>

powerpc/pseries: Add missing of_node_put()s in hotplug-cpu.c

In pseries_cpuhp_cache_use_count() and pseries_cpuhp_detach_nodes(),
we need to carefully hold the reference returned by
of_find_next_cache_node() and use it to call of_node_put() to keep
the refcount balanced.

Signed-off-by: Liang He <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
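
The refcount discipline described here can be sketched with a toy model in user-space C. Everything below is hypothetical (`node_model`, `model_find_next_cache()`, `model_node_put()`, `count_cache_levels()`); it only assumes, as the message states, that the lookup returns a node with a reference held that the caller must later drop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted node; stands in for a device tree node. */
struct node_model {
	int refcount;
	struct node_model *next_cache;	/* next cache level, or NULL */
};

/* Return the next cache node with an extra reference held, or NULL. */
struct node_model *model_find_next_cache(struct node_model *np)
{
	struct node_model *next = np->next_cache;

	if (next)
		next->refcount++;
	return next;
}

/* Drop one reference; a NULL node is ignored. */
void model_node_put(struct node_model *np)
{
	if (np) {
		assert(np->refcount > 0);
		np->refcount--;
	}
}

/* Count cache levels below 'cpu' while keeping every refcount balanced. */
int count_cache_levels(struct node_model *cpu)
{
	struct node_model *np, *next;
	int levels = 0;

	np = model_find_next_cache(cpu);
	while (np) {
		levels++;
		next = model_find_next_cache(np);	/* take the next reference first */
		model_node_put(np);			/* then drop the one we hold */
		np = next;
	}
	return levels;
}
```

After the walk, every node's count is back to its starting value; dropping the `model_node_put()` call would leave each visited node one reference too high.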


Revision tags: v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3
# 48b63961 11-Apr-2022 Oscar Salvador <[email protected]>

powerpc/numa: Associate numa node to its cpu earlier

powerpc is the only platform that does not rely on
cpu_up()->try_online_node() to bring up a numa node,
and special cases it, instead, deep in its own machinery:

dlpar_online_cpu
  find_and_online_cpu_nid
    try_online_node

This should not be needed, but the try_online_node() call from
cpu_up() will not apply to the right node, because cpu_to_node()
will return the old numa<->cpu mapping that is set at boot
for all possible cpus.

That can be seen easily if we try to print out the numa node passed
to try_online_node() in cpu_up().

The thing is that the numa<->cpu mapping does not get updated till a much
later stage in start_secondary:

start_secondary:
set_numa_node(numa_cpu_lookup_table[cpu])

But we do not really care, as we already know the
CPU <-> NUMA associativity back in find_and_online_cpu_nid(),
so let us make use of that and set the proper numa<->cpu mapping,
so cpu_to_node() in cpu_up() returns the right node and
try_online_node() can do its work.

Signed-off-by: Oscar Salvador <[email protected]>
Tested-by: Geetika Moolchandani <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1
# 3b54c715 05-Nov-2021 Nicholas Piggin <[email protected]>

powerpc/pseries: use slab context cpumask allocation in CPU hotplug init

Slab is up at this point, using the bootmem allocator triggers a
warning. Switch to using the regular cpumask allocator.

Signed-off-by: Nicholas Piggin <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Reviewed-by: Nathan Lynch <[email protected]>
Reviewed-by: Laurent Dufour <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4
# f9473a65 27-Sep-2021 Nathan Lynch <[email protected]>

powerpc/pseries/cpuhp: remove obsolete comment from pseries_cpu_die

This comment likely refers to the obsolete DLPAR workflow where some
resource state transitions were driven more directly from user space
utilities, but it also seems to contradict itself: "Change isolate state to
Isolate [...]" is at odds with the preceding sentences, and it does not
relate at all to the code that follows.

Remove it to prevent confusion.

Signed-off-by: Nathan Lynch <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# fa2a5dfe 27-Sep-2021 Nathan Lynch <[email protected]>

powerpc/pseries/cpuhp: delete add/remove_by_count code

The core DLPAR code supports two actions (add and remove) and three
subtypes of action:

* By DRC index: the action is attempted on a single specified resource.
This is the usual case for processors.
* By indexed count: the action is attempted on a range of resources
beginning at the specified index. This is implemented only by the memory
DLPAR code.
* By count: the lower layer (CPU or memory) is responsible for locating the
specified number of resources to which the action can be applied.

I cannot find any evidence of the "by count" subtype being used by drmgr or
qemu for processors. And when I try to exercise this code, the add case
does not work:

$ ppc64_cpu --smt ; nproc
SMT=8
24
$ printf "cpu remove count 2" > /sys/kernel/dlpar
$ nproc
8
$ printf "cpu add count 2" > /sys/kernel/dlpar
-bash: printf: write error: Invalid argument
$ dmesg | tail -2
pseries-hotplug-cpu: Failed to find enough CPUs (1 of 2) to add
dlpar: Could not handle DLPAR request "cpu add count 2"
$ nproc
8
$ drmgr -c cpu -a -q 2 # this uses the by-index method
Validating CPU DLPAR capability...yes.
CPU 1
CPU 17
$ nproc
24

This is because find_drc_info_cpus_to_add() does not increment drc_index
appropriately during its search.

This is not hard to fix. But the _by_count() functions also have the
property that they attempt to roll back all prior operations if the entire
request cannot be satisfied, even though the rollback itself can encounter
errors. It's not possible to provide transaction-like behavior at this
level, and it's undesirable to have code that can only pretend to do that.
Any users of these functions cannot know what the state of the system is in
the error case. And the error paths are, to my knowledge, impossible to
test without adding custom error injection code.

Summary:

* This code has not worked reliably since its introduction.
* There is no evidence that it is used.
* It contains questionable rollback behaviors in error paths which are
difficult to test.

So let's remove it.

Fixes: ac71380071d1 ("powerpc/pseries: Add CPU dlpar remove functionality")
Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality")
Fixes: b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR support for drc-info property")
Signed-off-by: Nathan Lynch <[email protected]>
Tested-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
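
The class of bug named above (a search that does not advance its index) can be illustrated with a simplified user-space model. This is a sketch under stated assumptions, not the removed kernel code: `drc_range_model` and `collect_free_indexes()` are hypothetical, and the "in use" list is supplied by the caller purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one range of DRC indexes; field names are illustrative. */
struct drc_range_model {
	uint32_t drc_index_start;	/* index of the first element */
	uint32_t num_sequential_elems;	/* number of elements in the range */
	uint32_t sequential_inc;	/* step between consecutive indexes */
};

/* Return 1 if 'idx' appears in the caller-supplied 'used' list. */
int index_in_use(uint32_t idx, const uint32_t *used, size_t n_used)
{
	size_t i;

	for (i = 0; i < n_used; i++) {
		if (used[i] == idx)
			return 1;
	}
	return 0;
}

/*
 * Collect up to 'want' indexes from the range that are not in 'used'.
 * Returns the number of indexes stored in 'out'.
 */
size_t collect_free_indexes(const struct drc_range_model *r,
			    const uint32_t *used, size_t n_used,
			    uint32_t *out, size_t want)
{
	uint32_t drc_index;
	uint32_t i;
	size_t found = 0;

	assert(r != NULL && out != NULL);

	drc_index = r->drc_index_start;
	for (i = 0; i < r->num_sequential_elems && found < want; i++) {
		if (!index_in_use(drc_index, used, n_used))
			out[found++] = drc_index;
		/* Advance to the next element; without this the same index is re-tested. */
		drc_index += r->sequential_inc;
	}
	return found;
}
```

If the increment is left out, the loop inspects only the first index of the range, so it can report at most one candidate however many are requested, which would match the "1 of 2" shortfall in the log above.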


# 7edd5c9a 27-Sep-2021 Nathan Lynch <[email protected]>

powerpc/pseries/cpuhp: cache node corrections

On pseries, cache nodes in the device tree can be added and removed by the
CPU DLPAR code as well as the partition migration (mobility) code. PowerVM
partitions in dedicated processor mode typically have L2 and L3 cache
nodes.

The CPU DLPAR code has the following shortcomings:

* Cache nodes returned as siblings of a new CPU node by
ibm,configure-connector are silently discarded; only the CPU node is
added to the device tree.

* Cache nodes which become unreferenced in the processor removal path are
not removed from the device tree. This can lead to duplicate nodes when
the post-migration device tree update code replaces cache nodes.

This is long-standing behavior. Presumably it has gone mostly unnoticed
because the two bugs have the property of obscuring each other in common
simple scenarios (e.g. remove a CPU and add it back). Likely you'd notice
only if you cared to inspect the device tree or the sysfs cacheinfo
information.

Booted with two processors:

$ pwd
/sys/firmware/devicetree/base/cpus
$ ls -1d */
l2-cache@2010/
l2-cache@2011/
l3-cache@3110/
l3-cache@3111/
PowerPC,POWER9@0/
PowerPC,POWER9@8/
$ lsprop */l2-cache
l2-cache@2010/l2-cache
00003110 (12560)
l2-cache@2011/l2-cache
00003111 (12561)
PowerPC,POWER9@0/l2-cache
00002010 (8208)
PowerPC,POWER9@8/l2-cache
00002011 (8209)
$ ls /sys/devices/system/cpu/cpu0/cache/
index0 index1 index2 index3

After DLPAR-adding PowerPC,POWER9@10, we see that its associated cache
nodes are absent, its threads' L2+L3 cacheinfo is unpopulated, and it is
missing a cache level in its sched domain hierarchy:

$ ls -1d */
l2-cache@2010/
l2-cache@2011/
l3-cache@3110/
l3-cache@3111/
PowerPC,POWER9@0/
PowerPC,POWER9@10/
PowerPC,POWER9@8/
$ lsprop PowerPC\,POWER9@10/l2-cache
PowerPC,POWER9@10/l2-cache
00002012 (8210)
$ ls /sys/devices/system/cpu/cpu16/cache/
index0 index1
$ grep . /sys/kernel/debug/sched/domains/cpu{0,8,16}/domain*/name
/sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT
/sys/kernel/debug/sched/domains/cpu0/domain1/name:CACHE
/sys/kernel/debug/sched/domains/cpu0/domain2/name:DIE
/sys/kernel/debug/sched/domains/cpu8/domain0/name:SMT
/sys/kernel/debug/sched/domains/cpu8/domain1/name:CACHE
/sys/kernel/debug/sched/domains/cpu8/domain2/name:DIE
/sys/kernel/debug/sched/domains/cpu16/domain0/name:SMT
/sys/kernel/debug/sched/domains/cpu16/domain1/name:DIE

When removing PowerPC,POWER9@8, we see that its cache nodes are left
behind:

$ ls -1d */
l2-cache@2010/
l2-cache@2011/
l3-cache@3110/
l3-cache@3111/
PowerPC,POWER9@0/

When DLPAR is combined with VM migration, we can get duplicate nodes. E.g.
removing one processor, then migrating, adding a processor, and then
migrating again can result in warnings from the OF core during
post-migration device tree updates:

Duplicate name in cpus, renamed to "l2-cache@2011#1"
Duplicate name in cpus, renamed to "l3-cache@3111#1"

and nodes with duplicated phandles in the tree, making lookup behavior
unpredictable:

$ lsprop l[23]-cache@*/ibm,phandle
l2-cache@2010/ibm,phandle
00002010 (8208)
l2-cache@2011#1/ibm,phandle
00002011 (8209)
l2-cache@2011/ibm,phandle
00002011 (8209)
l3-cache@3110/ibm,phandle
00003110 (12560)
l3-cache@3111#1/ibm,phandle
00003111 (12561)
l3-cache@3111/ibm,phandle
00003111 (12561)

Address these issues by:

* Correctly processing siblings of the node returned from
dlpar_configure_connector().
* Removing cache nodes in the CPU remove path when it can be determined
that they are not associated with other CPUs or caches.

Use the of_changeset API in both cases, which allows us to keep the error
handling in this code from becoming more complex while ensuring that the
device tree cannot become inconsistent.

Fixes: ac71380071d1 ("powerpc/pseries: Add CPU dlpar remove functionality")
Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality")
Signed-off-by: Nathan Lynch <[email protected]>
Tested-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7
# 8b893ef1 16-Aug-2021 Michael Ellerman <[email protected]>

powerpc/pseries: Fix build error when NUMA=n

As reported by lkp, if NUMA=n we see a build error:

arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 'pseries_cpu_hotplug_init':
arch/powerpc/platforms/pseries/hotplug-cpu.c:1022:8: error: 'node_to_cpumask_map' undeclared
1022 | node_to_cpumask_map[node]);

Use cpumask_of_node() which has an empty stub for NUMA=n, and when
NUMA=y does a lookup from node_to_cpumask_map[].

Fixes: bd1dd4c5f528 ("powerpc/pseries: Prevent free CPU ids being reused on another node")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v5.14-rc6
# 8ddc6448 12-Aug-2021 Aneesh Kumar K.V <[email protected]>

powerpc/pseries: Consolidate different NUMA distance update code paths

The associativity details of newly added resources are collected from
the hypervisor via the "ibm,configure-connector" RTAS call. Update the NUMA
distance details of the newly added NUMA node after that call.

Instead of updating the NUMA distance every time we look up a node id
from the associativity property, add helpers that can be used
during boot and do this only once. Also remove the distance
update from the node id lookup helpers.

Currently, we duplicate the parsing code for ibm,associativity and
ibm,associativity-lookup-arrays in the kernel. The associativity arrays provided
by these device tree properties are very similar and hence can use
a common helper to parse the node id and NUMA distance details.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1
# bd1dd4c5 29-Apr-2021 Laurent Dufour <[email protected]>

powerpc/pseries: Prevent free CPU ids being reused on another node

When a CPU is hot added, the CPU ids are taken from the available mask
from the lower possible set. If that set of values was previously used
for a CPU attached to a different node, it appears to an application as
if these CPUs have migrated from one node to another node which is not
expected.

To prevent this, record the CPU ids used for each node and do not reuse
them on another node. However, to prevent CPU hot plug from failing when
a node is starved of CPU ids, the capability to reuse other nodes’ free
CPU ids is kept. A warning is displayed in such a case to warn the user.

A new CPU bit mask (node_recorded_ids_map) is introduced for each
possible node. It is populated with the CPUs onlined at boot time, and
then when a CPU is hot plugged to a node. The bits in that mask remain
set when the CPU is hot unplugged, to record that these CPU ids have been
used for this node.

If no id set was found, a retry is made without removing the ids used on
the other nodes to try reusing them. This is the way ids have been
allocated prior to this patch.

The effect of this patch can be seen by removing and adding CPUs using
the Qemu monitor. In the following case, the first CPU from the node 2
is removed, then the first one from the node 1 is removed too. Later,
the first CPU of the node 2 is added back. Without that patch, the
kernel will number these CPUs using the first CPU ids available which
are the ones freed when removing the second CPU of the node 0. This
leads to the CPU ids 16-23 to move from the node 1 to the node 2. With
the patch applied, the CPU ids 32-39 are used since they are the lowest
free ones which have not been used on another node.

At boot time:
[root@vm40 ~]# numactl -H | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Vanilla kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47

Patched kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Signed-off-by: Laurent Dufour <[email protected]>
Reviewed-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
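
The allocation policy described in this message can be sketched as a small user-space model. It is an illustration only: `cpu_id_model`, `find_run()`, `assign_cpu_ids()` and `release_cpu_ids()` are hypothetical names, ids are limited to 64 so that one `uint64_t` holds a mask, and the warning printed in the fallback case is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_NODES 3

/* Hypothetical model state: one bit per CPU id, at most 64 ids. */
struct cpu_id_model {
	uint64_t possible;			/* ids that may ever exist */
	uint64_t present;			/* ids currently in use */
	uint64_t recorded[MODEL_NR_NODES];	/* ids ever used on each node */
};

/* Return the lowest aligned run of 'nthreads' set bits in 'mask', or -1. */
int find_run(uint64_t mask, int nthreads)
{
	uint64_t want;
	int first;

	assert(nthreads > 0 && nthreads < 64);
	want = (UINT64_C(1) << nthreads) - 1;
	for (first = 0; first + nthreads <= 64; first += nthreads) {
		if (((mask >> first) & want) == want)
			return first;
	}
	return -1;
}

/*
 * Pick ids for a core hot-added to 'node'. The first pass skips ids
 * recorded for other nodes; if that fails, retry allowing them.
 * Returns the first id of the run, or -1 if no run is free.
 */
int assign_cpu_ids(struct cpu_id_model *m, int node, int nthreads)
{
	uint64_t candidates, run;
	int n, first;

	assert(node >= 0 && node < MODEL_NR_NODES);

	candidates = m->possible & ~m->present;
	for (n = 0; n < MODEL_NR_NODES; n++) {
		if (n != node)
			candidates &= ~m->recorded[n];
	}

	first = find_run(candidates, nthreads);
	if (first < 0)	/* node is starved: fall back to any free ids */
		first = find_run(m->possible & ~m->present, nthreads);
	if (first < 0)
		return -1;

	run = ((UINT64_C(1) << nthreads) - 1) << first;
	m->present |= run;
	m->recorded[node] |= run;	/* stays set even after hot unplug */
	return first;
}

/* Hot unplug clears 'present' only; the per-node record is kept. */
void release_cpu_ids(struct cpu_id_model *m, int first, int nthreads)
{
	assert(first >= 0 && first + nthreads <= 64);
	m->present &= ~(((UINT64_C(1) << nthreads) - 1) << first);
}
```

Starting from the boot layout shown above (ids 0-15, 16-31 and 32-47 on nodes 0, 1 and 2), releasing ids 32-39 and 16-23 and then adding a core to node 2 selects ids 32-39 in this model. A second add to node 2 finds no unrecorded ids and falls back to 16-23.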


Revision tags: v5.12, v5.12-rc8
# ed8029d7 18-Apr-2021 Michael Ellerman <[email protected]>

powerpc/pseries: Stop calling printk in rtas_stop_self()

RCU complains about us calling printk() from an offline CPU:

=============================
WARNING: suspicious RCU usage
5.12.0-rc7-02874-g7cf90e481cb8 #1 Not tainted
-----------------------------
kernel/locking/lockdep.c:3568 RCU-list traversed in non-reader section!!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7-02874-g7cf90e481cb8 #1
Call Trace:
dump_stack+0xec/0x144 (unreliable)
lockdep_rcu_suspicious+0x124/0x144
__lock_acquire+0x1098/0x28b0
lock_acquire+0x128/0x600
_raw_spin_lock_irqsave+0x6c/0xc0
down_trylock+0x2c/0x70
__down_trylock_console_sem+0x60/0x140
vprintk_emit+0x1a8/0x4b0
vprintk_func+0xcc/0x200
printk+0x40/0x54
pseries_cpu_offline_self+0xc0/0x120
arch_cpu_idle_dead+0x54/0x70
do_idle+0x174/0x4a0
cpu_startup_entry+0x38/0x40
rest_init+0x268/0x388
start_kernel+0x748/0x790
start_here_common+0x1c/0x614

Which happens because by the time we get to rtas_stop_self() we are
already offline. In addition the message can be spammy, and is not that
helpful for users, so remove it.

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 29c9a269 16-Apr-2021 Daniel Henrique Barboza <[email protected]>

powerpc/pseries: Set UNISOLATE on dlpar_cpu_remove() failure

The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
for both QEMU and phyp. This gives us an opportunity to use this
behavior to signal the hypervisor layer when an error during device
removal happens, allowing it to do a proper error handling, while not
breaking QEMU/phyp implementations that don't have this support.

This patch introduces this idea by unisolating all CPU DRCs that failed
to be removed by dlpar_cpu_remove_by_index(), when handling the
PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
only because it's the only CPU removal event QEMU uses, and there's no
need at this moment to add this mechanism for phyp-only code.

Signed-off-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v5.12-rc7, v5.12-rc6, v5.12-rc5
# d19b3ad0 23-Mar-2021 Daniel Henrique Barboza <[email protected]>

powerpc/pseries/hotplug-cpu: Show 'last online CPU' error in dlpar_cpu_offline()

One of the reasons that dlpar_cpu_offline can fail is when attempting to
offline the last online CPU of the kernel. This can be observed in a
pseries QEMU guest that has hotplugged CPUs. If the user offlines all
other CPUs of the guest, and a hotplugged CPU is now the last online
CPU, trying to reclaim it will fail.

The current error message in this situation returns rc with -EBUSY and a
generic explanation, e.g.:

pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16

EBUSY can be caused by other conditions, such as cpu_hotplug_disable
being true. Throwing a more specific error message for this case,
instead of just "Failed to offline CPU", makes it clearer that the error
is in fact a known error situation rather than some other generic or
unknown cause.

This patch adds a 'last online' check in dlpar_cpu_offline() to catch
the 'last online CPU' offline error, returning a more informative error
message:

pseries-hotplug-cpu: Unable to remove last online CPU PowerPC,POWER9

Signed-off-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
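
The shape of such a pre-check can be sketched in user-space C. This is a hypothetical helper (`check_not_last_online()` is not a kernel function); the -EBUSY return value and the way the message is handed back to the caller are assumptions made for illustration, with only the message text taken from the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical pre-check before offlining the CPU called 'name'.
 * Writes a diagnostic into 'msg' and returns -EBUSY when it is the
 * last CPU online; returns 0 when the offline attempt may proceed.
 */
int check_not_last_online(unsigned int num_online, const char *name,
			  char *msg, size_t len)
{
	assert(name != NULL && msg != NULL && len > 0);

	if (num_online == 1) {
		snprintf(msg, len, "Unable to remove last online CPU %s", name);
		return -EBUSY;
	}
	msg[0] = '\0';
	return 0;
}
```

Testing for the known condition before the generic failure path is what allows the specific message to replace the generic "Failed to offline CPU" text.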


Revision tags: v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6
# 01b0f0ea 26-Nov-2020 Nicholas Piggin <[email protected]>

powerpc/64s: Trim offlined CPUs from mm_cpumasks

When offlining a CPU, powerpc/64s does not flush TLBs, rather it just
leaves the CPU set in mm_cpumasks, so it continues to receive TLBIEs
to manage its TLBs.

However the exit_flush_lazy_tlbs() function expects that after
returning, all CPUs (except self) have flushed TLBs for that mm, in
which case TLBIEL can be used for this flush. This breaks for offline
CPUs because they don't get the IPI to flush their TLB. This can lead
to stale translations.

Fix this by clearing the CPU from mm_cpumasks, then flushing all TLBs
before going offline.

These offlined CPU bits stuck in the cpumask also prevent the cpumask
from being trimmed back to local mode, which means continual broadcast
IPIs or TLBIEs are needed for TLB flushing. This patch prevents that
situation too.
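
The fix can be pictured with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `struct model_mm`, its 64-bit `cpumask`, and all `model_` functions are hypothetical, and the TLB flush is reduced to a counter. It shows the two steps named above (clear the CPU from every mask, then flush) and why a mask with no stale bits can fall back to "local" mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an mm: one bit per CPU that may hold its TLB entries. */
struct model_mm {
	uint64_t cpumask;
};

/* Counts simulated full TLB flushes; stands in for the real flush. */
static unsigned int model_flush_count;

static void model_flush_all_tlbs(int cpu)
{
	(void)cpu;
	model_flush_count++;
}

/*
 * Called for a CPU that is about to go offline (cpu must be 0..63 here):
 * drop it from every mm's mask, then flush its TLB so that no stale
 * translations survive once it stops receiving flush requests.
 */
static void model_cpu_going_offline(struct model_mm *mms, int nr_mms, int cpu)
{
	for (int i = 0; i < nr_mms; i++)
		mms[i].cpumask &= ~(UINT64_C(1) << cpu);
	model_flush_all_tlbs(cpu);
}

/* An mm is "local" to cpu when cpu is the only bit left in its mask. */
static bool model_mm_is_local(const struct model_mm *mm, int cpu)
{
	return mm->cpumask == (UINT64_C(1) << cpu);
}
```

In this model, an mm that was shared by CPUs 0 and 1 becomes local to CPU 0 once CPU 1 goes offline, so it no longer needs broadcast flushes.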

A cast of many were involved in working this out, but in particular
Milton, Aneesh, Paul made key discoveries.

Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Debugged-by: Milton Miller <[email protected]>
Debugged-by: Aneesh Kumar K.V <[email protected]>
Debugged-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



Revision tags: v5.10-rc5, v5.10-rc4
# a40fdaf1 11-Nov-2020 Zhang Xiaoxu <[email protected]>

Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"

This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980.

Since the commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR
support for drc-info property"), the 'cpu_drcs' buffer is no longer double
freed when the 'cpus' node is not found.

So we needn't apply this patch; otherwise the memory will be leaked.

Fixes: a0ff72f9f5a7 ("powerpc/pseries/hotplug-cpu: Remove double free in error path")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zhang Xiaoxu <[email protected]>
[mpe: Caused by me applying a patch to a function that had changed in the interim]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



Revision tags: v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2
# 39f87561 19-Aug-2020 Michael Ellerman <[email protected]>

powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self()

We have smp_ops->cpu_die() and ppc_md.cpu_die(). One of them offlines
the current CPU and one offlines another CPU, can you guess which is
which? Also one is in smp_ops and one is in ppc_md?

So rename ppc_md.cpu_die(), to cpu_offline_self(), because that's what
it does. And move it into smp_ops where it belongs.

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



Revision tags: v5.9-rc1
# 801980f6 11-Aug-2020 Michael Roth <[email protected]>

powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death

For a power9 KVM guest with XIVE enabled, running a test loop
where we hotplug 384 vcpus and then unplug them, the following traces
can be seen (generally within a few loops) either from the unplugged
vcpu:

cpu 65 (hwid 65) Ready to die...
Querying DEAD? cpu 66 (66) shows 2
list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le)
MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000842 XER: 20040000
...
NIP __list_del_entry_valid+0xac/0x100
LR __list_del_entry_valid+0xa8/0x100
Call Trace:
__list_del_entry_valid+0xa8/0x100 (unreliable)
free_pcppages_bulk+0x1f8/0x940
free_unref_page+0xd0/0x100
xive_spapr_cleanup_queue+0x148/0x1b0
xive_teardown_cpu+0x1bc/0x240
pseries_mach_cpu_die+0x78/0x2f0
cpu_die+0x48/0x70
arch_cpu_idle_dead+0x20/0x40
do_idle+0x2f4/0x4c0
cpu_startup_entry+0x38/0x40
start_secondary+0x7bc/0x8f0
start_secondary_prolog+0x10/0x14

or on the worker thread handling the unplug:

pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
Querying DEAD? cpu 314 (314) shows 2
BUG: Bad page state in process kworker/u768:3 pfn:95de1
cpu 314 (hwid 314) Ready to die...
page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
flags: 0x5ffffc00000000()
raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
Workqueue: pseries hotplug workque pseries_hp_work_fn
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
bad_page+0x12c/0x1b0
free_pcppages_bulk+0x5bc/0x940
page_alloc_cpu_dead+0x118/0x120
cpuhp_invoke_callback.constprop.5+0xb8/0x760
_cpu_down+0x188/0x340
cpu_down+0x5c/0xa0
cpu_subsys_offline+0x24/0x40
device_offline+0xf0/0x130
dlpar_offline_cpu+0x1c4/0x2a0
dlpar_cpu_remove+0xb8/0x190
dlpar_cpu_remove_by_index+0x12c/0x150
dlpar_cpu+0x94/0x800
pseries_hp_work_fn+0x128/0x1e0
process_one_work+0x304/0x5d0
worker_thread+0xcc/0x7a0
kthread+0x1ac/0x1c0
ret_from_kernel_thread+0x5c/0x80

The latter trace is due to the following sequence:

page_alloc_cpu_dead
drain_pages
drain_pages_zone
free_pcppages_bulk

where drain_pages() in this case is called under the assumption that
the unplugged cpu is no longer executing. To ensure that is the case,
an early call is made to __cpu_die()->pseries_cpu_die(), which runs a
loop that waits for the cpu to reach a halted state by polling its
status via query-cpu-stopped-state RTAS calls. It only polls for 25
iterations before giving up, however, and in the trace above this
results in the following being printed only 0.1 seconds after the
hotplug worker thread begins processing the unplug request:

pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
Querying DEAD? cpu 314 (314) shows 2

At that point the worker thread assumes the unplugged CPU is in some
unknown/dead state and proceeds with the cleanup, causing the race
with the XIVE cleanup code executed by the unplugged CPU.

Fix this by waiting indefinitely, but also making an effort to avoid
spurious lockup messages by allowing for rescheduling after polling
the CPU status and printing a warning if we wait for longer than 120s.
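
The shape of such a wait loop can be sketched in userspace. This is an illustrative model only: `struct model_ctx`, the simulated clock, and all `model_` functions are hypothetical and replace the real RTAS query and kernel timekeeping. It shows the three properties described above: no iteration cap, a point where rescheduling would happen, and a warning once a configurable interval (120s in the text) has passed.

```c
#include <stdio.h>

enum model_cpu_state { MODEL_CPU_RUNNING, MODEL_CPU_STOPPED };

/* Hypothetical simulation state: a fake clock and a countdown to "stopped". */
struct model_ctx {
	unsigned long now_s;     /* simulated time in seconds */
	unsigned long step_s;    /* time consumed by each poll */
	unsigned int polls_left; /* polls that still report RUNNING */
};

/* Stand-in for querying the CPU's stopped state; advances the fake clock. */
static enum model_cpu_state model_query(struct model_ctx *ctx)
{
	ctx->now_s += ctx->step_s;
	if (ctx->polls_left == 0)
		return MODEL_CPU_STOPPED;
	ctx->polls_left--;
	return MODEL_CPU_RUNNING;
}

/*
 * Poll until the CPU reports stopped, with no upper bound on iterations.
 * A warning is printed each time warn_after_s elapses without success.
 * Returns the number of warnings printed.
 */
static unsigned int model_wait_for_cpu_stopped(struct model_ctx *ctx,
					       unsigned long warn_after_s)
{
	unsigned long deadline = ctx->now_s + warn_after_s;
	unsigned int warnings = 0;

	while (model_query(ctx) != MODEL_CPU_STOPPED) {
		/* Kernel code would place a rescheduling point here. */
		if (ctx->now_s >= deadline) {
			fprintf(stderr, "still waiting for CPU to stop\n");
			warnings++;
			deadline = ctx->now_s + warn_after_s;
		}
	}
	return warnings;
}
```

Because the loop exits only on a confirmed stopped state, the caller never proceeds to cleanup while the unplugged CPU may still be running its own teardown.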

Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Suggested-by: Michael Ellerman <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Tested-by: Greg Kurz <[email protected]>
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
[mpe: Trim oopses in change log slightly for readability]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



Revision tags: v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1
# a0ff72f9 19-Sep-2019 Nathan Lynch <[email protected]>

powerpc/pseries/hotplug-cpu: Remove double free in error path

In the unlikely event that the device tree lacks a /cpus node,
find_dlpar_cpus_to_add() oddly frees the cpu_drcs buffer it has been
passed before returning an error. Its only caller also frees the
buffer on error.

Remove the less conventional kfree() of a caller-supplied buffer from
find_dlpar_cpus_to_add().
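
The ownership convention this message argues for can be sketched as below. This is a hypothetical userspace model, not the kernel functions: `model_find_cpus()`, `model_add_cpus()`, the `have_cpus_node` flag, and the -ENOENT return value are all assumptions made for illustration. The point is only that the callee never frees a buffer it was handed, and the caller frees it exactly once on every path.

```c
#include <errno.h>
#include <stdlib.h>

/* Counts frees so the single-free property can be checked. */
static int model_free_count;

static void model_free(void *p)
{
	free(p);
	model_free_count++;
}

/*
 * Callee: fills the caller-supplied buffer. On error it returns a negative
 * errno and leaves the buffer alone; it does not own it.
 */
static int model_find_cpus(unsigned int *buf, int want, int have_cpus_node)
{
	if (!have_cpus_node)
		return -ENOENT; /* hypothetical error for a missing node */
	for (int i = 0; i < want; i++)
		buf[i] = (unsigned int)i;
	return want;
}

/* Caller: allocates the buffer and frees it once, on success or failure. */
static int model_add_cpus(int want, int have_cpus_node)
{
	unsigned int *buf;
	int found;

	if (want <= 0)
		return -EINVAL;
	buf = calloc((size_t)want, sizeof(*buf));
	if (!buf)
		return -ENOMEM;

	found = model_find_cpus(buf, want, have_cpus_node);
	model_free(buf); /* the only free of this buffer */
	return found;
}
```

Keeping allocation and release in the same function makes both a double free and a leak easier to rule out by inspection.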

Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality")
Signed-off-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
