Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7 |
# a2a3374c | 10-Jan-2025 | Andrea Righi <[email protected]>
sched_ext: idle: Refresh idle masks during idle-to-idle transitions
With the consolidation of put_prev_task/set_next_task(), see commit 436f3eed5c69 ("sched: Combine the last put_prev_task() and the first set_next_task()"), we are now skipping the transition between these two functions when the previous and the next tasks are the same.
As a result, the scx idle state of a CPU is updated only when transitioning to or from the idle thread. While this is generally correct, it can lead to uneven and inefficient core utilization in certain scenarios [1].
A typical scenario involves proactive wake-ups: scx_bpf_pick_idle_cpu() selects and marks an idle CPU as busy, followed by a wake-up via scx_bpf_kick_cpu(), without dispatching any tasks. In this case, the CPU continues running the idle thread, returns to idle, but remains marked as busy, preventing it from being selected again as an idle CPU (until a task eventually runs on it and releases the CPU).
For example, running a workload that uses 20% of each CPU, combined with an scx scheduler using proactive wake-ups, results in the following core utilization:
CPU 0: 25.7%   CPU 1: 29.3%   CPU 2: 26.5%   CPU 3: 25.5%
CPU 4:  0.0%   CPU 5: 25.5%   CPU 6:  0.0%   CPU 7: 10.5%
To address this, refresh the idle state also in pick_task_idle(), during idle-to-idle transitions, but only trigger ops.update_idle() on actual state changes to prevent unnecessary updates to the scx scheduler and maintain balanced state transitions.
With this change in place, the core utilization in the previous example becomes the following:
CPU 0: 18.8%   CPU 1: 19.4%   CPU 2: 18.0%   CPU 3: 18.7%
CPU 4: 19.3%   CPU 5: 18.9%   CPU 6: 18.7%   CPU 7: 19.3%
[1] https://github.com/sched-ext/scx/pull/1139
Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
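A minimal sketch of the idea, assuming hypothetical names (scx_idle_cpumask, notify_update_idle()) rather than the actual patch: refresh the mask on every transition, including idle-to-idle, but fire the ops.update_idle() notification only when the tracked state really flips.

    /*
     * Hypothetical sketch: refresh the per-CPU idle state on idle-to-idle
     * transitions too, but only call back into the BPF scheduler when the
     * state actually changes.
     */
    static void scx_idle_refresh(int cpu, bool idle)
    {
            bool changed;

            if (idle)
                    /* test_and_set returns the previous bit: 0 means a real flip */
                    changed = !cpumask_test_and_set_cpu(cpu, scx_idle_cpumask);
            else
                    changed = cpumask_test_and_clear_cpu(cpu, scx_idle_cpumask);

            if (changed)
                    notify_update_idle(cpu, idle);  /* hypothetical ops.update_idle() hook */
    }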
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
# 46d076af | 31-Oct-2024 | Nam Cao <[email protected]>
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimer_setup_on_stack() takes the callback function pointer as argument and initializes the timer completely.
Replace hrtimer_init_on_stack() and the open coded initialization of hrtimer::function with the new setup mechanism.
The conversion was done with Coccinelle.
Signed-off-by: Nam Cao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/17f9421fed6061df4ad26a4cc91873d2c078cb0f.1730386209.git.namcao@linutronix.de
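The shape of the conversion; the timer and callback names below match the idle-injection code in this file, but treat the snippet as an illustrative sketch:

    /* Before: two-step init plus open-coded callback assignment */
    hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
    it.timer.function = idle_inject_timer_fn;

    /* After: hrtimer_setup_on_stack() initializes the timer completely */
    hrtimer_setup_on_stack(&it.timer, idle_inject_timer_fn,
                           CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);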
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3 |
# 8e113df9 | 09-Oct-2024 | Zhongqiu Han <[email protected]>
sched: idle: Optimize the generic idle loop by removing needless memory barrier
The memory barrier rmb() in the generic idle loop's do_idle() function is not needed: it does not order any load instructions. Remove it, as a needless rmb() can cost performance.
The rmb() was introduced by the tglx/history.git commit f2f1b44c75c4 ("[PATCH] Remove RCU abuse in cpu_idle()") to order the loads between cpu_idle_map and pm_idle. It pairs with the wmb() in cpu_idle_wait().
With the removal of cpu_idle_state from cpu_idle() and of the wmb() from cpu_idle_wait() in commit 783e391b7b5b ("x86: Simplify cpu_idle_wait"), the rmb() no longer had a reason to exist.
Later, commit d16699123434 ("idle: Implement generic idle function") implemented a generic idle function, cpu_idle_loop(), which resembles the functionality found in arch/, and it retained the rmb() in the generic idle loop, now in kernel/cpu/idle.c.
Finally, commit cf37b6b48428 ("sched/idle: Move cpu/idle.c to sched/idle.c") moved cpu/idle.c to sched/idle.c, and commit c1de45ca831a ("sched/idle: Add support for tasks that inject idle") renamed cpu_idle_loop() to do_idle().
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: Zhongqiu Han <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
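For reference, a simplified sketch of where the barrier sat in the loop (condensed from do_idle(); the elided body is unchanged):

    while (!need_resched()) {
            rmb();          /* removed: orders no loads, pairs with no wmb() */
            local_irq_disable();
            /* ... cpuidle / arch idle entry ... */
    }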
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
# b2d70222 | 13-Aug-2024 | Peter Zijlstra <[email protected]>
sched: Add put_prev_task(.next)
In order to tell the previous sched_class what the next task is, add put_prev_task(.next).
Notably, SCX will use this to:

 1) determine whether the next task will leave the SCX sched class, and push the current task to another CPU if possible;
 2) gather statistics on how often it is preempted, and by which other classes.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
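In sketch form, the method gains a third parameter (simplified; surrounding members elided):

    struct sched_class {
            /* ... */
            void (*put_prev_task)(struct rq *rq, struct task_struct *p,
                                  struct task_struct *next);
            /* ... */
    };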
# fd03c5b8 | 13-Aug-2024 | Peter Zijlstra <[email protected]>
sched: Rework pick_next_task()
The current rule is that:
pick_next_task() := pick_task() + set_next_task(.first = true)
And many classes implement it directly as such. Change things around to make pick_next_task() optional while also changing the definition to:
pick_next_task(prev) := pick_task() + put_prev_task() + set_next_task(.first = true)
The reason is that sched_ext would like to have a 'final' call that knows the next task. By placing put_prev_task() right next to set_next_task() (as it already is for sched_core) this becomes trivial.
As a bonus, this is a nice cleanup on its own.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
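A sketch of the resulting default composition for a class that supplies only pick_task(); helper names are simplified and this is not the exact core code:

    static struct task_struct *
    pick_next_task_default(struct rq *rq, const struct sched_class *class,
                           struct task_struct *prev)
    {
            struct task_struct *next = class->pick_task(rq);

            if (next) {
                    put_prev_task(rq, prev);        /* 'final' call, knows next */
                    set_next_task(rq, next, true);  /* .first = true */
            }
            return next;
    }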
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3 |
# 863ccdbb | 03-Apr-2024 | Peter Zijlstra <[email protected]>
sched: Allow sched_class::dequeue_task() to fail
Change the function signature of sched_class::dequeue_task() to return a boolean, allowing future patches to 'fail' dequeue.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Tested-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
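The signature change in sketch form; per the changelog, a false return lets later patches express a failed dequeue:

    /* before */
    void (*dequeue_task)(struct rq *rq, struct task_struct *p, int flags);

    /* after: returns false when the dequeue could not be carried out */
    bool (*dequeue_task)(struct rq *rq, struct task_struct *p, int flags);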
# a110a81c | 27-May-2024 | Daniel Bristot de Oliveira <[email protected]>
sched/deadline: Deferrable dl server
Among the motivations for the DL servers is the real-time throttling mechanism. This mechanism works by throttling the rt_rq after running for a long period without leaving space for fair tasks.
The base dl server avoids this problem by boosting fair tasks instead of throttling the rt_rq. The point is that it boosts without waiting for potential starvation, causing some non-intuitive cases.
For example, an IRQ dispatches two tasks on an idle system, a fair and an RT. The DL server will be activated, running the fair task before the RT one. This problem can be avoided by deferring the dl server activation.
By setting the defer option, the dl_server will dispatch an SCHED_DEADLINE reservation with replenished runtime, but throttled.
The dl_timer will be set to fire at the defer time, (period - runtime) ns from the start time, thus boosting the fair rq at the defer time.
If the fair scheduler has the opportunity to run while waiting for the defer time, the dl server runtime will be consumed. If the runtime is completely consumed before the defer time, the server will be replenished while still in a throttled state. Then, the dl_timer will be reset to the new defer time.
If the fair server reaches the defer time without consuming its runtime, the server will start running, following CBS rules (thus without breaking SCHED_DEADLINE). The server then continues running (without deferring) until the fair tasks are able to execute under the regular fair scheduler (end of the starvation).
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Link: https://lore.kernel.org/r/dd175943c72533cd9f0b87767c6499204879cc38.1716811044.git.bristot@kernel.org
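A sketch of the arming described above, using sched_dl_entity field names but otherwise illustrative. For example, with runtime = 50ms and period = 1s, the timer fires 950ms after activation, the zero-laxity point:

    /*
     * Arm the dl_timer at the zero-laxity point: the latest moment at
     * which 'runtime' can still be fully served within 'period'.
     */
    ktime_t defer = ktime_add_ns(start, dl_se->dl_period - dl_se->dl_runtime);

    hrtimer_start(&dl_se->dl_timer, defer, HRTIMER_MODE_ABS_HARD);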
# a7a9fc54 | 18-Jun-2024 | Tejun Heo <[email protected]>
sched_ext: Add boilerplate for extensible scheduler class
This adds dummy implementations of sched_ext interfaces which interact with the scheduler core and hook them in the correct places. As they're all dummies, this doesn't cause any behavior changes. This is split out to help reviewing.
v2: balance_scx_on_up() dropped. This will be handled in sched_ext proper.
Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Acked-by: Josh Don <[email protected]>
Acked-by: Hao Luo <[email protected]>
Acked-by: Barret Rhoden <[email protected]>
# 764d5fcc | 03-Jun-2024 | Christian Loehle <[email protected]>
idle: Remove stale RCU comment
The call of rcu_idle_enter() from within cpuidle_idle_call() was removed in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path") which makes the comment out of place.
Signed-off-by: Christian Loehle <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
# 402de7fc | 27-May-2024 | Ingo Molnar <[email protected]>
sched: Fix spelling in comments
Do a spell-checking pass.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7 |
# 2be2a197 | 29-Feb-2024 | Thomas Gleixner <[email protected]>
sched/idle: Conditionally handle tick broadcast in default_idle_call()
The x86 architecture has an idle routine for AMD CPUs which are affected by erratum 400. On the affected CPUs the local APIC timer stops in the C1E halt state.
It therefore requires tick broadcasting. The invocation of tick_broadcast_enter()/exit() from this function violates the RCU constraints because it can end up in lockdep or tracing, which rightfully triggers a warning.
tick_broadcast_enter()/exit() must be invoked before ct_cpuidle_enter() and after ct_cpuidle_exit() in default_idle_call().
Add a static branch conditional invocation of tick_broadcast_enter()/exit() into this function to allow X86 to replace the AMD specific idle code. It's guarded by a config switch which will be selected by x86. Otherwise it's a NOOP.
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
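A sketch of the conditional invocation; the config and key names below approximate the description and may not match the final patch exactly:

    #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE
    DEFINE_STATIC_KEY_FALSE(arch_needs_tick_broadcast);

    static inline void cond_tick_broadcast_enter(void)
    {
            if (static_branch_unlikely(&arch_needs_tick_broadcast))
                    tick_broadcast_enter();
    }

    static inline void cond_tick_broadcast_exit(void)
    {
            if (static_branch_unlikely(&arch_needs_tick_broadcast))
                    tick_broadcast_exit();
    }
    #else
    static inline void cond_tick_broadcast_enter(void) { }
    static inline void cond_tick_broadcast_exit(void) { }
    #endif

x86 can then flip the key once at boot on erratum-400 systems instead of carrying a separate AMD-specific idle routine.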
Revision tags: v6.8-rc6 |
# 500f8f9b | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Assume timekeeping is correctly handed over upon last offline idle call
The timekeeping duty is handed over from the outgoing CPU on stop machine, then the oneshot tick is stopped right after. Therefore it's guaranteed that the current CPU isn't the timekeeper upon its last call to idle.
Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes into idle suggests that the tick is going to be stopped while it is actually stopped already from the appropriate CPU hotplug state.
Remove the confusing call and the obsolete case handling and convert it to a sanity check that verifies the above assumption.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
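A hypothetical sketch of the converted offline path; the precise check in the patch may differ:

    if (cpu_is_offline(cpu)) {
            /*
             * Timekeeping duty was handed over on stop machine and the
             * tick already stopped from the hotplug state: just assert it.
             */
            WARN_ON_ONCE(tick_do_timer_cpu == cpu);
    }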
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2 |
# dd540386 | 14-Nov-2023 | Frederic Weisbecker <[email protected]>
sched/cpuidle: Comment about timers requirements VS idle handler
Add missing explanation concerning IRQs re-enablement constraints in the cpuidle path against timers.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Revision tags: v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2 |
# cff9b233 | 15-Sep-2023 | Liam R. Howlett <[email protected]>
kernel/sched: Modify initial boot task idle setup
Initial booting is setting the task flag to idle (PF_IDLE) by the call path sched_init() -> init_idle(). Having the task idle and calling call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be set. Subsequent calls to any cond_resched() will enable IRQs, potentially earlier than the IRQ setup has completed. Recent changes have caused just this scenario and IRQs have been enabled early.
This causes a warning later in start_kernel() as interrupts are enabled before they are fully set up.
Fix this issue by setting the PF_IDLE flag later in the boot sequence.
Although the boot task was marked as idle since (at least) d80e4fda576d, I am not sure that it is wrong to do so. The forced context-switch on idle task was introduced in the tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.
Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
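A sketch of the later placement, at idle-loop entry instead of init_idle() (simplified from cpu_startup_entry()):

    void cpu_startup_entry(enum cpuhp_state state)
    {
            /* Mark the task idle only now, once IRQ setup has completed. */
            current->flags |= PF_IDLE;
            arch_cpu_idle_prepare();
            cpuhp_online_idle(state);
            while (1)
                    do_idle();
    }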
# e23edc86 | 19-Sep-2023 | Ingo Molnar <[email protected]>
sched/fair: Rename check_preempt_curr() to wakeup_preempt()
The name is a bit opaque - make it clear that this is about wakeup preemption.
Also rename the ->check_preempt_curr() methods similarly.
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2 |
# 071c44e4 | 14-Feb-2023 | Josh Poimboeuf <[email protected]>
sched/idle: Mark arch_cpu_idle_dead() __noreturn
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead() return"), in Xen, when a previously offlined CPU was brought back online, it unexpectedly resumed execution where it left off in the middle of the idle loop.
There were some hacks to make that work, but the behavior was surprising as do_idle() doesn't expect an offlined CPU to return from the dead (in arch_cpu_idle_dead()).
Now that Xen has been fixed, and the arch-specific implementations of arch_cpu_idle_dead() also don't return, give it a __noreturn attribute.
This will cause the compiler to complain if an arch-specific implementation might return. It also improves code generation for both caller and callee.
Also fixes the following warning:
vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction
Reported-by: Paul E. McKenney <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <[email protected]>
# dfb0f170 | 14-Feb-2023 | Josh Poimboeuf <[email protected]>
sched/idle: Make sure weak version of arch_cpu_idle_dead() doesn't return
arch_cpu_idle_dead() should never return. Make it so.
Link: https://lore.kernel.org/r/cf5ad95eef50f7704bb30e7770c59bfe23372af7.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <[email protected]>
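Combined with the __noreturn annotation from the preceding entry, the prototype and the weak fallback end up along these lines (sketch; the exact spin body may differ):

    /* declaration: callers and objtool know it never returns */
    void __noreturn arch_cpu_idle_dead(void);

    /* weak fallback: spin forever rather than returning */
    void __weak arch_cpu_idle_dead(void)
    {
            while (1)
                    cpu_relax();
    }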
Revision tags: v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4 |
# 89b30987 | 12-Jan-2023 | Peter Zijlstra <[email protected]>
arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabled
Current arch_cpu_idle() is called with IRQs disabled, but will return with IRQs enabled.
However, the very first thing the generic code does after calling arch_cpu_idle() is raw_local_irq_disable(). This means that architectures that can idle with IRQs disabled end up doing a pointless 'enable-disable' dance.
Therefore, push this IRQ disabling into the idle function, meaning that those architectures can avoid the pointless IRQ state flipping.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Tony Lindgren <[email protected]>
Tested-by: Ulf Hansson <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Acked-by: Mark Rutland <[email protected]> [arm64]
Acked-by: Rafael J. Wysocki <[email protected]>
Acked-by: Guo Ren <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
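An x86-flavoured sketch of the new contract, assuming the default HLT idle routine:

    /*
     * Called with IRQs disabled; per the new convention it must also
     * return with IRQs disabled.
     */
    void __cpuidle default_idle(void)
    {
            raw_safe_halt();         /* sti;hlt - IRQs enabled so a wakeup can arrive */
            raw_local_irq_disable(); /* re-establish the invariant before returning */
    }

The generic caller can then drop its unconditional raw_local_irq_disable() after arch_cpu_idle(), and architectures that can idle with IRQs disabled never flip them at all.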
# a01353cf | 12-Jan-2023 | Peter Zijlstra <[email protected]>
cpuidle: Fix ct_idle_*() usage
The whole disable-RCU, enable-IRQS dance is very intricate since changing IRQ state is traced, which depends on RCU.
Add two helpers for the cpuidle case that mirror the entry code:
    ct_cpuidle_enter()
    ct_cpuidle_exit()
And fix all the cases where the enter/exit dance was buggy.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Tony Lindgren <[email protected]>
Tested-by: Ulf Hansson <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
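The intended calling convention, sketched and heavily simplified from the default idle call path:

    raw_local_irq_disable();  /* IRQ state must not be traced past this point */

    ct_cpuidle_enter();       /* RCU stops watching, mirroring the entry code */
    arch_cpu_idle();          /* may re-enable IRQs internally to allow wakeup */
    ct_cpuidle_exit();        /* RCU watching again: tracing/lockdep are safe */

    raw_local_irq_enable();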
Revision tags: v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2 |
# e67198cc | 08-Jun-2022 | Frederic Weisbecker <[email protected]>
context_tracking: Take idle eqs entrypoints over RCU
The RCU dynticks counter is going to be merged into the context tracking subsystem. Start with moving the idle extended quiescent states entrypoints to context tracking. For now those are dumb redirections to existing RCU calls.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicolas Saenz Julienne <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Xiongfeng Wang <[email protected]>
Cc: Yu Liao <[email protected]>
Cc: Phil Auld <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Alex Belits <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Nicolas Saenz Julienne <[email protected]>
Tested-by: Nicolas Saenz Julienne <[email protected]>
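Per the changelog, the new entrypoints start out as dumb redirections, along these lines:

    /* sketch: context-tracking wrappers around the existing RCU calls */
    void ct_idle_enter(void)
    {
            rcu_idle_enter();
    }

    void ct_idle_exit(void)
    {
            rcu_idle_exit();
    }

so the idle loop can switch to ct_idle_enter()/ct_idle_exit() without any behavioral change.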
Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3 |
# 16bf5a5e | 13-Apr-2022 | Thomas Gleixner <[email protected]>
smp: Rename flush_smp_call_function_from_idle()
This is invoked from the stopper thread too, which is definitely not idle. Rename it to flush_smp_call_function_queue() and fixup the callers.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Revision tags: v5.18-rc2, v5.18-rc1, v5.17 |
# 8b023acc | 14-Mar-2022 | Nick Desaulniers <[email protected]>
lockdep: Fix -Wunused-parameter for _THIS_IP_
While looking into a bug related to the compiler's handling of addresses of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep. Drive by cleanup.
-Wunused-parameter:
kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip'
kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip'
kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip'
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
# 5b6547ed | 16-Mar-2022 | Peter Zijlstra <[email protected]>
sched/core: Fix forceidle balancing
Steve reported that ChromeOS encounters the forceidle balancer being run from rt_mutex_setprio()'s balance_callback() invocation and explodes.
Now, the forceidle balancer gets queued every time the idle task gets selected, set_next_task(), which is strictly too often. rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:
    queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
    running = task_current(rq, p); /* rq->curr == p */

    if (queued)
            dequeue_task(...);
    if (running)
            put_prev_task(...);

    /* change task properties */

    if (queued)
            enqueue_task(...);
    if (running)
            set_next_task(...);
However, rt_mutex_setprio() will explicitly not run this pattern on the idle task (since priority boosting the idle task is quite insane). Most other 'change' pattern users are pidhash based and would also not apply to idle.
Also, the change pattern doesn't contain a __balance_callback() invocation and hence we could have an out-of-band balance-callback, which *should* trigger the WARN in rq_pin_lock() (which guards against this exact anti-pattern).
So while none of that explains how this happens, it does indicate that having it in set_next_task() might not be the most robust option.
Instead, explicitly queue the forceidle balancer from pick_next_task() when it does indeed result in forceidle selection. Having it here ensures it can only be triggered under the __schedule() rq->lock instance, and hence must be run from that context.
This also happens to clean up the code a little, so win-win.
Fixes: d2dfa17bc7de ("sched: Trivial forced-newidle balancer")
Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: T.J. Alumbaugh <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
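A hedged sketch of the relocated queueing (field and helper names as in the core-scheduling code; the exact condition and placement may differ):

    /* in pick_next_task(), core-scheduling path, under the rq->lock */
    if (rq->core->core_forceidle_count && next == rq->idle)
            queue_core_balance(rq);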
Revision tags: v5.17-rc8, v5.17-rc7, v5.17-rc6 |
# f96eca43 | 22-Feb-2022 | Ingo Molnar <[email protected]>
sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there
Similarly to kernel/sched/build_utility.c, collect all 'scheduling policy' related source code files into kernel/sched/build_policy.c:
    kernel/sched/idle.c
    kernel/sched/rt.c
    kernel/sched/cpudeadline.c
    kernel/sched/pelt.c
    kernel/sched/cputime.c
    kernel/sched/deadline.c
With the exception of fair.c, which we continue to build as a separate file for build efficiency and parallelism reasons.
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Peter Zijlstra <[email protected]>
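The mechanism is a single compilation unit that #includes the policy .c files; an abridged sketch of its shape (the real file carries additional headers):

    /* kernel/sched/build_policy.c (abridged) */
    #include "sched.h"

    #include "idle.c"
    #include "rt.c"
    #include "cpudeadline.c"
    #include "pelt.c"
    #include "cputime.c"
    #include "deadline.c"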
Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1 |
# 98484179 | 06-Sep-2021 | Sebastian Andrzej Siewior <[email protected]>
sched/idle: Make the idle timer expire in hard interrupt context
The intel powerclamp driver will setup a per-CPU worker with RT priority. The worker will then invoke play_idle() in which it remains in the idle poll loop until it is stopped by the timer it started earlier.
That timer needs to expire in hard interrupt context on PREEMPT_RT. Otherwise the timer will expire in ksoftirqd as a SOFT timer but that task won't be scheduled on the CPU because its priority is lower than the priority of the worker which is in the idle loop.
Always expire the idle timer in hard interrupt context.
Reported-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
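The fix reduces to selecting the hard-expiry hrtimer mode in play_idle_precise() (sketch; duration_ns is that function's parameter):

    hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
    it.timer.function = idle_inject_timer_fn;
    hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
                  HRTIMER_MODE_REL_HARD);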