|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
07864f1a |
| 05-Mar-2025 |
Yosry Ahmed <[email protected]> |
mm: zsmalloc: remove object mapping APIs and per-CPU map areas
zs_map_object() and zs_unmap_object() are no longer used, so remove them. Since these were the only users of the per-CPU mapping_areas, remove those and the associated CPU hotplug callbacks too.
[[email protected]: update the docs] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Sergey Senozhatsky <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Nhat Pham <[email protected]> Cc: Chengming Zhou <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
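For context, a minimal sketch of how a caller such as zswap used the removed mapping API (signatures per the pre-6.15 <linux/zsmalloc.h>; the helper itself is illustrative):

    #include <linux/string.h>
    #include <linux/zsmalloc.h>

    /* Copy an object out of the pool via the old map/unmap API. The
     * mapping was serviced by the per-CPU mapping_areas removed here. */
    static void read_object(struct zs_pool *pool, unsigned long handle,
                            void *dst, size_t len)
    {
            void *src;

            src = zs_map_object(pool, handle, ZS_MM_RO);
            memcpy(dst, src, len);
            zs_unmap_object(pool, handle);
    }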
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
d1a89197 |
| 26-Sep-2024 |
Frederic Weisbecker <[email protected]> |
kthread: Default affine kthread to its preferred NUMA node
Kthreads attached to a preferred NUMA node for their task structure allocation can also be assumed to run preferably within that same node.
A more precise affinity is usually notified by calling kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.
For the others, a default affinity to the node is desired and sometimes implemented with more or less success when it comes to dealing with hotplug events and nohz_full / CPU isolation interactions:
- kcompactd is affine to its node and handles hotplug but not CPU isolation

- kswapd is affine to its node and ignores hotplug and CPU isolation

- A bunch of drivers create their kthreads on a specific node and don't take care of affining them further
Handle that default node affinity preference at the generic level instead, provided a kthread is created on an actual node and doesn't apply any specific affinity such as a given CPU or a custom cpumask to bind to before its first wake-up.
This generic handling is aware of CPU hotplug events and CPU isolation such that:
* When a housekeeping CPU goes up that is part of the node of a given kthread, the related task is re-affined to its own node if it was previously running on the default last resort online housekeeping set from other nodes.
* When a housekeeping CPU goes down while it was part of the node of a kthread, the running task is migrated (or the sleeping task is woken up) automatically by the scheduler to other housekeepers within the same node or, as a last resort, to all housekeepers from other nodes.
Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
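A minimal sketch of the case this covers (function and names are illustrative, not from the patch): a kthread created for a given node now gets the default node affinity with no extra driver code:

    #include <linux/err.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>

    static int node_worker_fn(void *data)
    {
            set_current_state(TASK_INTERRUPTIBLE);
            while (!kthread_should_stop()) {
                    schedule();
                    set_current_state(TASK_INTERRUPTIBLE);
            }
            __set_current_state(TASK_RUNNING);
            return 0;
    }

    static struct task_struct *start_node_worker(int node)
    {
            struct task_struct *t;

            t = kthread_create_on_node(node_worker_fn, NULL, node,
                                       "node_worker/%d", node);
            if (!IS_ERR(t))
                    wake_up_process(t); /* now affine to 'node' by default */
            return t;
    }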
|
| #
25caea95 |
| 31-Oct-2024 |
Inochi Amaoto <[email protected]> |
irqchip: Add T-HEAD C900 ACLINT SSWI driver
Add a driver for the T-HEAD C900 ACLINT SSWI device. This device allows systems with T-HEAD CPUs to send IPIs via a fast device interface.
Signed-off-by: Inochi Amaoto <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
|
| #
9e9af8bb |
| 10-Oct-2024 |
Kan Liang <[email protected]> |
perf/x86/rapl: Clean up cpumask and hotplug
The rapl PMU is die scoped, which is supported by the generic perf_event subsystem now.
Set the scope for the rapl PMU and remove all the cpumask and hotplug code.
Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Oliver Sang <[email protected]> Tested-by: Dhananjay Ugwekar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
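The resulting pattern, as a minimal sketch (not the driver's literal code; the callbacks are placeholders): the PMU declares its scope and the perf core manages a capable reader CPU per die, including event migration on hotplug:

    #include <linux/perf_event.h>

    static int  rapl_event_init(struct perf_event *event);
    static int  rapl_event_add(struct perf_event *event, int flags);
    static void rapl_event_del(struct perf_event *event, int flags);
    static void rapl_event_read(struct perf_event *event);

    static struct pmu rapl_pmu = {
            .scope      = PERF_PMU_SCOPE_DIE, /* generic die scope */
            .event_init = rapl_event_init,
            .add        = rapl_event_add,
            .del        = rapl_event_del,
            .read       = rapl_event_read,
    };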
|
| #
e1dce564 |
| 28-Oct-2024 |
Gowthami Thiagarajan <[email protected]> |
perf/marvell: Marvell PEM performance monitor support
The PCI Express Interface PMU includes various performance counters to monitor the data that is transmitted over the PCIe link. The counters track various inbound and outbound transactions, with separate counters for posted/non-posted/completion TLPs. Inbound and outbound memory read requests, along with their latencies, can also be monitored. Address Translation Services (ATS) events such as ATS Translation, ATS Page Request and ATS Invalidation, along with their corresponding latencies, are also supported.
The performance counters are 64 bits wide.
For instance, perf stat -e ib_tlp_pr <workload> tracks the inbound posted TLPs for the workload.
Co-developed-by: Linu Cherian <[email protected]> Signed-off-by: Linu Cherian <[email protected]> Signed-off-by: Gowthami Thiagarajan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
08155c7f |
| 02-Aug-2024 |
Kan Liang <[email protected]> |
perf/x86/intel/cstate: Clean up cpumask and hotplug
There are three cstate PMUs with different scopes: core, die and module. The scopes are supported by the generic perf_event subsystem now.
Set the scope for each PMU and remove all the cpumask and hotplug code.
Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
ae16f05c |
| 23-Aug-2024 |
Tianyang Zhang <[email protected]> |
irqchip/loongarch-avec: Add AVEC irqchip support
Introduce the advanced extended interrupt controllers (AVECINTC). This feature will allow each core to have 256 independent interrupt vectors, and MSI interrupts can be independently routed to any vector on any CPU.
The whole topology of irqchips in LoongArch machines looks like this if AVECINTC is supported:
        +-----+     +-----------------------+     +-------+
        | IPI | --> |        CPUINTC        | <-- | Timer |
        +-----+     +-----------------------+     +-------+
                     ^          ^          ^
                     |          |          |
              +---------+ +----------+ +---------+     +-------+
              | EIOINTC | | AVECINTC | | LIOINTC | <-- | UARTs |
              +---------+ +----------+ +---------+     +-------+
                   ^            ^
                   |            |
              +---------+  +---------+
              | PCH-PIC |  | PCH-MSI |
              +---------+  +---------+
               ^       ^        ^
               |       |        |
        +---------+ +---------+ +---------+
        | Devices | | PCH-LPC | | Devices |
        +---------+ +---------+ +---------+
                         ^
                         |
                    +---------+
                    | Devices |
                    +---------+
Co-developed-by: Jianmin Lv <[email protected]> Signed-off-by: Jianmin Lv <[email protected]> Co-developed-by: Liupu Wang <[email protected]> Signed-off-by: Liupu Wang <[email protected]> Co-developed-by: Huacai Chen <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Tianyang Zhang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
|
| #
9e83dd3e |
| 23-Aug-2024 |
Huacai Chen <[email protected]> |
irqchip/loongson-eiointc: Rename CPUHP_AP_IRQ_LOONGARCH_STARTING
Rename CPUHP_AP_IRQ_LOONGARCH_STARTING to CPUHP_AP_IRQ_EIOINTC_STARTING because the upcoming AVECINTC irqchip driver will introduce a new state and so both are clearly identifiable.
Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Tianyang Zhang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
|
|
Revision tags: v6.11-rc1 |
|
| #
3908ba2e |
| 17-Jul-2024 |
Nick Hu <[email protected]> |
RISC-V: Enable the IPI before workqueue_online_cpu()
Sometimes the hotplugged CPU stalls in arch_cpu_idle() for a while after workqueue_online_cpu(). While the CPU stalls in the idle loop, the reschedule IPI is pending. However, the enable bit is not set yet, so the CPU stalls at WFI until the watchdog times out. Therefore enable the IPI before workqueue_online_cpu() to fix the issue.
Fixes: 63c5484e7495 ("workqueue: Add multiple affinity scopes and interface to select them") Signed-off-by: Nick Hu <[email protected]> Reviewed-by: Anup Patel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
|
| #
2accfdb7 |
| 29-Jul-2024 |
Linus Torvalds <[email protected]> |
profiling: attempt to remove per-cpu profile flip buffer
This is the really old legacy kernel profiling code, which has long since been obviated by "real profiling" (ie 'prof' and company), and mainly remains as a source of syzbot reports.
There are anecdotal reports that people still use it for boot-time profiling, but it's unlikely that such use would care about the old NUMA optimizations in this code from 2004 (commit ad02973d42: "profile: 512x Altix timer interrupt livelock fix" in the BK import archive at [1])
So in order to head off future syzbot reports, let's try to simplify this code and get rid of the per-cpu profile buffers that are quite a large portion of the complexity footprint of this thing (including CPU hotplug callbacks etc).
It's unlikely anybody will actually notice, or possibly, as Thomas put it: "Only people who indulge in nostalgia will notice :)".
That said, if it turns out that this code is actually actively used by somebody, we can always revert this removal. Thus the "attempt" in the summary line.
[ Note: in a small nod to "the profiling code can cause NUMA problems", this also removes the "increment the last entry in the profiling array on any unknown hits" logic. That would account any program counter in a module to that single counter location, and might exacerbate any NUMA cacheline bouncing issues ]
Link: https://lore.kernel.org/all/CAHk-=wgs52BxT4Zjmjz8aNvHWKxf5_ThBY4bYL1Y6CTaNL2dTw@mail.gmail.com/ Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1] Cc: Thomas Gleixner <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
| #
10a0e6f3 |
| 17-Jul-2024 |
Anna-Maria Behnsen <[email protected]> |
timers/migration: Move hierarchy setup into cpuhotplug prepare callback
When a CPU comes online the first time, it is possible that a new top level group will be created. In general all propagation is done from the bottom to the top. This minimizes complexity and prevents possible races. But when a new top level group is created, the formerly top level group needs to be connected to the new level. This is the only time the direction of propagating changes is reversed: the changes are propagated from the top (new top level group) to the bottom (formerly top level group).
This introduces two races (see (A) and (B)) as reported by Frederic:
(A) This race happens when the formerly top level group is marked active, but the last active CPU of the formerly top level group goes idle. Then it is likely that the formerly top level group is no longer active, but nevertheless marked as active in the new top level group:
                        [GRP0:0]
                     migrator = 0
                     active   = 0
                     nextevt  = KTIME_MAX
                      /           \
                     0          1 .. 7
                 active           idle
0) Hierarchy has for now only 8 CPUs and CPU 0 is the only active CPU.
                        [GRP1:0]
                  migrator = TMIGR_NONE
                  active   = NONE
                  nextevt  = KTIME_MAX
                                  \
         [GRP0:0]                   [GRP0:1]
      migrator = 0               migrator = TMIGR_NONE
      active   = 0               active   = NONE
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
       /           \
      0          1 .. 7               8
  active           idle            !online
1) CPU 8 is booting and creates a new first level group GRP0:1 and therefore also a new top level group GRP1:0. For now the setup code has proceeded only up to the connection between GRP0:1 and the new top level group. The connection between CPU 8 and GRP0:1 is not yet established and CPU 8 is still !online.
                        [GRP1:0]
                  migrator = TMIGR_NONE
                  active   = NONE
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = 0               migrator = TMIGR_NONE
      active   = 0               active   = NONE
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
       /           \
      0          1 .. 7               8
  active           idle            !online
2) Setup code now connects GRP0:0 to GRP1:0 and observes while in tmigr_connect_child_parent() that GRP0:0 is not TMIGR_NONE. So it prepares to call tmigr_active_up() on it. It hasn't done it yet.
                        [GRP1:0]
                  migrator = TMIGR_NONE
                  active   = NONE
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = TMIGR_NONE      migrator = TMIGR_NONE
      active   = NONE            active   = NONE
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
       /           \
      0          1 .. 7               8
    idle           idle            !online
3) CPU 0 goes idle. Since GRP0:0->parent has been updated by CPU 8 with GRP0:0->lock held, CPU 0 observes GRP1:0 after calling tmigr_update_events() and it propagates the change to the top (no change there and no wakeup programmed since there is no timer).
                        [GRP1:0]
                  migrator = GRP0:0
                  active   = GRP0:0
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = TMIGR_NONE      migrator = TMIGR_NONE
      active   = NONE            active   = NONE
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
       /           \
      0          1 .. 7               8
    idle           idle            !online
4) Now the setup code finally calls tmigr_active_up() and sets GRP0:0 active in GRP1:0.
                        [GRP1:0]
                  migrator = GRP0:0
                  active   = GRP0:0, GRP0:1
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = TMIGR_NONE      migrator = 8
      active   = NONE            active   = 8
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
       /           \                  |
      0          1 .. 7               8
    idle           idle            active
5) Now CPU 8 is connected with GRP0:1 and CPU 8 calls tmigr_active_up() out of tmigr_cpu_online().
                        [GRP1:0]
                  migrator = GRP0:0
                  active   = GRP0:0
                  nextevt  = T8
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = TMIGR_NONE      migrator = TMIGR_NONE
      active   = NONE            active   = NONE
      nextevt  = KTIME_MAX       nextevt  = T8
       /           \                  |
      0          1 .. 7               8
    idle           idle              idle
6) CPU 8 goes idle with a timer T8 and relies on GRP0:0 as the migrator. But GRP0:0 is not really active, so T8 gets ignored.
--> The update done in the third step is not noticed by the setup code. So a wrong migrator is set in the top level group and a timer could get ignored.
(B) Reading group->parent and group->childmask when a hierarchy update is ongoing and reaches the formerly top level group is racy as those values could be inconsistent. (The notation of migrator and active now changes slightly in contrast to the above example, as the childmasks are now used.)
                        [GRP1:0]
                  migrator = TMIGR_NONE
                  active   = 0x00
                  nextevt  = KTIME_MAX
                                  \
         [GRP0:0]                   [GRP0:1]
      migrator = TMIGR_NONE      migrator = TMIGR_NONE
      active   = 0x00            active   = 0x00
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
      childmask= 0               childmask= 1
      parent   = NULL            parent   = GRP1:0
       /           \
      0          1 .. 7               8
    idle           idle            !online
                                  childmask=1
1) Hierarchy has 8 CPUs. CPU 8 is at the moment in the process of onlining but did not yet connect GRP0:0 to GRP1:0.
                        [GRP1:0]
                  migrator = TMIGR_NONE
                  active   = 0x00
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = TMIGR_NONE      migrator = TMIGR_NONE
      active   = 0x00            active   = 0x00
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
      childmask= 0               childmask= 1
      parent   = GRP1:0          parent   = GRP1:0
       /           \
      0          1 .. 7               8
    idle           idle            !online
                                  childmask=1
2) Setup code (running on CPU 8) now connects GRP0:0 to GRP1:0 and updates the parent pointer of GRP0:0 and ...
                        [GRP1:0]
                  migrator = TMIGR_NONE
                  active   = 0x00
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = 0x01            migrator = TMIGR_NONE
      active   = 0x01            active   = 0x00
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
      childmask= 0               childmask= 1
      parent   = GRP1:0          parent   = GRP1:0
       /           \
      0          1 .. 7               8
  active           idle            !online
                                  childmask=1
tmigr_walk.childmask = 0
3) ... CPU 0 becomes active at the same time. As the migrator in GRP0:0 was TMIGR_NONE, the childmask of GRP0:0 is stored in the update propagation data structure tmigr_walk (as the childmask update is not yet visible). And now ...
                        [GRP1:0]
                  migrator = TMIGR_NONE
                  active   = 0x00
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = 0x01            migrator = TMIGR_NONE
      active   = 0x01            active   = 0x00
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
      childmask= 2               childmask= 1
      parent   = GRP1:0          parent   = GRP1:0
       /           \
      0          1 .. 7               8
  active           idle            !online
                                  childmask=1
tmigr_walk.childmask = 0
4) ... childmask of GRP0:0 is updated by CPU 8 (still part of setup code).
                        [GRP1:0]
                  migrator = 0x00
                  active   = 0x00
                  nextevt  = KTIME_MAX
                   /              \
         [GRP0:0]                   [GRP0:1]
      migrator = 0x01            migrator = TMIGR_NONE
      active   = 0x01            active   = 0x00
      nextevt  = KTIME_MAX       nextevt  = KTIME_MAX
      childmask= 2               childmask= 1
      parent   = GRP1:0          parent   = GRP1:0
       /           \
      0          1 .. 7               8
  active           idle            !online
                                  childmask=1
tmigr_walk.childmask = 0
5) CPU 0 sees the connection to GRP1:0 and now propagates the active state to GRP1:0, but with childmask = 0 as stored in the propagation data structure.
--> Now GRP1:0 always has a migrator as 0x00 != TMIGR_NONE and for all CPUs it looks like GRP1:0 is always active.
To prevent those races, the setup of the hierarchy is moved into the cpuhotplug prepare callback. The prepare callback is not executed by the CPU which will come online, it is executed by the CPU which prepares the onlining of the other CPU. This CPU is active while it connects the formerly top level to the new one. This prevents (A) from happening, and it also prevents any further walk above the formerly top level until that active CPU becomes inactive, releasing the new ->parent and ->childmask updates to be visible to any subsequent walk above the formerly top level hierarchy. This prevents (B) from happening. The direction of the updates is now forced to look like "from bottom to top".
However, while the active CPU prevents tmigr_cpu_(in)active() from walking up with the update not or only half visible, nothing prevents tmigr_handle_remote_up() or tmigr_requires_handle_remote_up() from walking up to the new top with a 0 childmask if the active CPU doing the prepare is not the migrator. But that is fine because:
* tmigr_check_migrator() should just return false

* The migrator is active and will eventually observe the new childmask in a future tick.
Split the setup functionality of the online callback into the cpuhotplug prepare callback and set up the hotplug state. Change the init call into an early_initcall() to make sure an already active CPU prepares everything for newly upcoming CPUs. Reorder the code so that all prepare related functions are close to each other, and the online and offline callbacks are also close together.
Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model") Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
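The shape of the change as a sketch (state and callback names here are illustrative, not necessarily the literal upstream identifiers): a prepare-stage state runs on an already-active control CPU before the new CPU starts, and registration moves to an early_initcall():

    #include <linux/cpuhotplug.h>
    #include <linux/init.h>

    static int tmigr_cpu_prepare(unsigned int cpu)
    {
            /* Runs on the control CPU, which is active while it extends
             * the hierarchy and connects a new top level group. */
            return 0;
    }

    static int __init tmigr_init(void)
    {
            return cpuhp_setup_state(CPUHP_TMIGR_PREPARE, "tmigr:prepare",
                                     tmigr_cpu_prepare, NULL);
    }
    early_initcall(tmigr_init);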
|
|
Revision tags: v6.10 |
|
| #
4bdc3eaa |
| 10-Jul-2024 |
Chris Packham <[email protected]> |
clocksource/drivers/realtek: Add timer driver for rtl-otto platforms
The timer/counter block on the Realtek SoCs provides up to 5 timers. It also includes a watchdog timer which is handled by the realtek_otto_wdt.c driver.
One timer will be used per CPU as a local clock event generator. An additional timer will be used as an overall stable clocksource.
Signed-off-by: Markus Stockhausen <[email protected]> Signed-off-by: Sander Vanheule <[email protected]> Signed-off-by: Chris Packham <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Daniel Lezcano <[email protected]>
|
|
Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
f45a6051 |
| 25-Mar-2024 |
Costa Shulyupin <[email protected]> |
cpu/hotplug: Fix typo in comment
Signed-off-by: Costa Shulyupin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.9-rc1, v6.8 |
|
| #
21a8f8a0 |
| 07-Mar-2024 |
Anup Patel <[email protected]> |
irqchip: Add RISC-V incoming MSI controller early driver
The RISC-V advanced interrupt architecture (AIA) specification defines a new MSI controller called incoming message signalled interrupt controller (IMSIC) which manages MSI on a per-HART (or per-CPU) basis. It also supports IPIs as software injected MSIs. (For more details, refer to https://github.com/riscv/riscv-aia.)
Add an early irqchip driver for RISC-V IMSIC which sets up the IMSIC state and provides IPIs.
Signed-off-by: Anup Patel <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Björn Töpel <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.8-rc7, v6.8-rc6 |
|
| #
3ad6eb06 |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick: Start centralizing tick related CPU hotplug operations
During the CPU offlining process, the various timer tick features are shut down from scattered places, sometimes from teardown callbacks on stop machine, sometimes through explicit calls, sometimes from the control CPU after the CPU died. The reason why these shutdown operations are spread around is not always clear and it makes the tick lifecycle hard to follow.
The tick should be shut down in order from highest to lowest level:
On stop machine from the dying CPU (high-level):
1) Hand-over the timekeeping duty (tick_handover_do_timer()) 2) Cancel the tick implementation called by the clockevent callback (tick_cancel_sched_timer()) 3) Shutdown broadcasting (tick_offline_cpu() / tick_broadcast_offline())
On stop machine from the dying CPU (low-level):
4) Shutdown clockevents drivers (CPUHP_AP_*_TIMER_STARTING states)
From the control CPU after the CPU died (low-level):
5) Shutdown/unregister/cleanup clockevents for the dead CPU (tick_cleanup_dead_cpu())
Instead the current order is 2, 4 (both from CPU hotplug states), then 1 and 3 through direct calls. This layout and order don't make much sense. The operations 1, 2, 3 should be gathered together and in order.
Sort this situation out by creating a new TICK shutdown CPU hotplug state and start by introducing the timekeeping duty hand-over there. The state must precede hrtimers migration because the tick hrtimer will be stopped from it in a later patch.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
7ee98877 |
| 22-Feb-2024 |
Anna-Maria Behnsen <[email protected]> |
timers: Implement the hierarchical pull model
Placing timers at enqueue time on a target CPU based on dubious heuristics does not make any sense:
1) Most timer wheel timers are canceled or rearmed before they expire.
2) The heuristics to predict which CPU will be busy when the timer expires are wrong by definition.
So placing the timers at enqueue wastes precious cycles.
The proper solution to this problem is to always queue the timers on the local CPU and allow the non-pinned timers to be pulled onto a busy CPU at expiry time.
Therefore split the timer storage into local pinned and global timers: Local pinned timers are always expired on the CPU on which they have been queued. Global timers can be expired on any CPU.
As long as a CPU is busy it expires both local and global timers. When a CPU goes idle it arms for the first expiring local timer. If the first expiring pinned (local) timer is before the first expiring movable timer, then no action is required because the CPU will wake up before the first movable timer expires. If the first expiring movable timer is before the first expiring pinned (local) timer, then this timer is queued into an idle timerqueue and eventually expired by another active CPU.
To avoid global locking the timerqueues are implemented as a hierarchy. The lowest level of the hierarchy holds the CPUs. The CPUs are associated to groups of 8, which are separated per node. If more than one CPU group exist, then a second level in the hierarchy collects the groups. Depending on the size of the system more than 2 levels are required. Each group has a "migrator" which checks the timerqueue during the tick for remote expirable timers.
If the last CPU in a group goes idle, it reports the first expiring event in the group up to the next group(s) in the hierarchy. If the last CPU in the whole system goes idle, it arms its timer for the first system wide expiring timer to ensure that no timer event is missed.
Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
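A usage-level illustration (names are placeholders): both timers below are queued locally; only the TIMER_PINNED one is guaranteed to expire on this CPU, while the other may be pulled to a busy CPU at expiry time:

    #include <linux/jiffies.h>
    #include <linux/timer.h>

    static struct timer_list movable_timer, pinned_timer;

    static void example_timer_fn(struct timer_list *t)
    {
            /* expiry work */
    }

    static void arm_example_timers(void)
    {
            timer_setup(&movable_timer, example_timer_fn, 0);
            timer_setup(&pinned_timer, example_timer_fn, TIMER_PINNED);

            mod_timer(&movable_timer, jiffies + HZ); /* may be pulled */
            mod_timer(&pinned_timer, jiffies + HZ);  /* expires locally */
    }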
|
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8 |
|
| #
8ba2f844 |
| 28-Dec-2023 |
Chengming Zhou <[email protected]> |
mm/zswap: change per-cpu mutex and buffer to per-acomp_ctx
First of all, we need to rename the acomp_ctx->dstmem field to buffer, since we are now using it for purposes other than compression.
Then we change the per-cpu mutex and buffer to be per-acomp_ctx, since they belong to the acomp_ctx and are necessary parts of the compress/decompress contexts.
So we can remove the old per-cpu mutex and dstmem.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Chris Li <[email protected]> (Google) Reviewed-by: Nhat Pham <[email protected]> Cc: Barry Song <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
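A sketch of the resulting per-context layout (field names follow the description above; the exact struct in mm/zswap.c may differ in detail):

    #include <crypto/acompress.h>
    #include <linux/mutex.h>

    struct crypto_acomp_ctx {
            struct crypto_acomp *acomp;
            struct acomp_req *req;
            struct crypto_wait wait;
            u8 *buffer;         /* formerly the per-CPU dstmem */
            struct mutex mutex; /* formerly the per-CPU mutex */
    };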
|
|
Revision tags: v6.7-rc7 |
|
| #
fe22944c |
| 19-Dec-2023 |
xiaoming Wang <[email protected]> |
cpu/hotplug: Increase the number of dynamic states
The dynamically allocatable hotplug state space can be exhausted by the existing drivers and infrastructure which install CPU hotplug states dynamically. That prevents new drivers and infrastructure from installing dynamically allocated states.
Increase the size of the CPUHP_AP_ONLINE_DYN state by 10 to make room.
Signed-off-by: Xiaoming Wang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
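For reference, this is how drivers consume the dynamic range (callback names are placeholders): passing CPUHP_AP_ONLINE_DYN asks the core to allocate a free state, and the call fails once all dynamic slots are taken, which is what the extra room avoids:

    #include <linux/cpuhotplug.h>
    #include <linux/init.h>

    static int mydrv_cpu_online(unsigned int cpu)  { return 0; }
    static int mydrv_cpu_offline(unsigned int cpu) { return 0; }

    static int __init mydrv_init(void)
    {
            int state;

            /* Returns the allocated state number (> 0) on success, or a
             * negative error once the dynamic space is exhausted. */
            state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mydrv:online",
                                      mydrv_cpu_online, mydrv_cpu_offline);
            return state < 0 ? state : 0;
    }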
|
|
Revision tags: v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
15bece7b |
| 24-Nov-2023 |
Zenghui Yu <[email protected]> |
cpu/hotplug: Remove unused CPU hotplug states
There are unused hotplug states which either have never been used or whose usage was removed without removing the state constant.
Drop them to reduce the size of the cpuhp_hp_states array.
Signed-off-by: Zenghui Yu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5 |
|
| #
70da1d01 |
| 02-Oct-2023 |
Vlastimil Babka <[email protected]> |
cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks
The CPUHP_SLAB_PREPARE hooks are only used by SLAB, which has been removed. SLUB defines them as NULL, so we can remove them altogether.
Acked-by: Thomas Gleixner <[email protected]> Acked-by: David Rientjes <[email protected]> Tested-by: David Rientjes <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
5c0930cc |
| 07-Nov-2023 |
Thomas Gleixner <[email protected]> |
hrtimers: Push pending hrtimers away from outgoing CPU earlier
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug") solved the straightforward CPU hotplug deadlock vs. the scheduler bandwidth timer. Yu discovered a more involved variant where a task which has a bandwidth timer started on the outgoing CPU holds a lock and then gets throttled. If the lock is required by one of the CPU hotplug callbacks, the hotplug operation deadlocks because the unthrottling timer event is not handled on the dying CPU and can only be recovered once the control CPU reaches the hotplug state which pulls the pending hrtimers from the dead CPU.
Solve this by pushing the hrtimers away from the dying CPU in the dying callbacks. Nothing can queue a hrtimer on the dying CPU at that point because all other CPUs spin in stop_machine() with interrupts disabled and once the operation is finished the CPU is marked offline.
Reported-by: Yu Liao <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Liu Tie <[email protected]> Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
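Illustrative cpuhp_hp_states[] fragment for this approach (state and callback names follow the fix's description; the exact table layout is assumed): the dying-stage teardown runs on the outgoing CPU during stop_machine():

    [CPUHP_AP_HRTIMERS_DYING] = {
            .name            = "hrtimers:dying",
            .startup.single  = NULL,
            .teardown.single = hrtimers_cpu_dying,
    },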
|
| #
20f3b8ea |
| 16-Oct-2023 |
Mark Rutland <[email protected]> |
arm64/arm: xen: enlighten: Fix KPTI checks
When KPTI is in use, we cannot register a runstate region as XEN requires that this is always a valid VA, which we cannot guarantee. Due to this, xen_starting_cpu() must avoid registering each CPU's runstate region, and xen_guest_init() must avoid setting up features that depend upon it.
We tried to ensure that in commit:
f88af7229f6f22ce ("xen/arm: do not setup the runstate info page if kpti is enabled")
... where we added checks for xen_kernel_unmapped_at_usr(), which wraps arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit arm.
Unfortunately, as xen_guest_init() is an early_initcall, this happens before secondary CPUs are booted and before arm64 has finalized the ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs arm64_kernel_unmapped_at_el0(), and so the cpucap can subsequently be set as secondary CPUs are onlined. On a big.LITTLE system where the boot CPU does not require KPTI but some secondary CPUs do, this will result in xen_guest_init() initializing features that depend on the runstate region, and xen_starting_cpu() registering the runstate region on some CPUs before KPTI is subsequently enabled, resulting in the problems the aforementioned commit tried to avoid.
Handle this more robustly by deferring the initialization of the runstate region until secondary CPUs have been initialized and the ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is moved into a new hotplug starting function which is registered later, when we're certain that KPTI will not be used.
Fixes: f88af7229f6f ("xen/arm: do not setup the runstate info page if kpti is enabled") Signed-off-by: Mark Rutland <[email protected]> Cc: Bertrand Marquis <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Suzuki K Poulose <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
|
| #
166b76a0 |
| 16-Oct-2023 |
Mark Rutland <[email protected]> |
clocksource/drivers/arm_arch_timer: Initialize evtstrm after finalizing cpucaps
We attempt to initialize each CPU's arch_timer event stream in arch_timer_evtstrm_enable(), which we call from the arch_timer_starting_cpu() cpu hotplug callback which is registered early in boot. As this is registered before we initialize the system cpucaps, the test for ARM64_HAS_ECV will always be false for CPUs present at boot time, and will only be taken into account for CPUs onlined late (including those which are hotplugged out and in again).
Due to this, CPUs present at boot time may not use the intended divider and scale factor to generate the event stream, and may differ from other CPUs.
Correct this by only initializing the event stream after cpucaps have been finalized, registering a separate CPU hotplug callback for the event stream configuration. Since the caps must be finalized by this point, use cpus_have_final_cap() to verify this.
Signed-off-by: Mark Rutland <[email protected]> Acked-by: Marc Zyngier <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: Suzuki K Poulose <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
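The fix's shape as a sketch (all identifiers below are placeholders, not the driver's literal code): the event-stream setup gets its own hotplug callback that is registered only after cpucaps are finalized, so cpus_have_final_cap() is valid inside it:

    #include <linux/cpuhotplug.h>
    #include <linux/init.h>
    #include <asm/cpufeature.h>

    static void configure_evtstrm(bool ecv); /* placeholder helper */

    static int evtstrm_starting_cpu(unsigned int cpu)
    {
            /* Safe: cpucaps are finalized before this callback can run. */
            configure_evtstrm(cpus_have_final_cap(ARM64_HAS_ECV));
            return 0;
    }

    static int __init evtstrm_hp_init(void)
    {
            return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                     "clockevents/evtstrm:online",
                                     evtstrm_starting_cpu, NULL);
    }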
|
|
Revision tags: v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1 |
|
| #
32e4fa37 |
| 04-Sep-2023 |
Olaf Hering <[email protected]> |
cpu/hotplug: Remove unused cpuhp_state CPUHP_AP_X86_VDSO_VMA_ONLINE
Commit b2e2ba578e01 ("x86/vdso: Initialize the CPU/node NR segment descriptor earlier") removed the single user of this constant.
Remove it to reduce the size of cpuhp_hp_states[].
Signed-off-by: Olaf Hering <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
ef7d9593 |
| 11-Sep-2023 |
Darrick J. Wong <[email protected]> |
xfs: remove CPU hotplug infrastructure
There are no users of the CPU hotplug hooks in xfs now, so remove the infrastructure. This reverts f1653c2e2831e ("xfs: introduce CPU hotplug infrastructure").
Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
|