|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
dd5bdaf2 |
| 17-Mar-2025 |
Ingo Molnar <[email protected]> |
sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
All the big Linux distros enable CONFIG_SCHED_DEBUG, because the various features it provides help not just with kernel development, but with system administration and user-space software development as well.
Reflect this reality and enable this functionality unconditionally.
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Shrikanth Hegde <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
f7d2728c |
| 17-Mar-2025 |
Ingo Molnar <[email protected]> |
sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
The scheduler has this special SCHED_WARN_ON() facility that depends on CONFIG_SCHED_DEBUG.
Since CONFIG_SCHED_DEBUG is getting removed, convert SCHED_WARN_ON() to WARN_ON_ONCE().
Note that the warning output isn't 100% equivalent:
#define SCHED_WARN_ON(x) WARN_ONCE(x, #x)
Because SCHED_WARN_ON() would output the 'x' condition as well, while WARN_ON_ONCE() will only show a backtrace.
Hopefully these are rare enough to not really matter.
If it does, we should probably introduce a new WARN_ON() variant that outputs the condition in stringified form, or improve WARN_ON() itself.
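For illustration, a minimal sketch of such a variant, assuming only the existing WARN_ONCE() macro (the name below is hypothetical, not an in-tree API):

```
/* Hypothetical variant: warn once and also print the stringified
 * condition, matching what SCHED_WARN_ON() used to show. */
#define WARN_ON_ONCE_STR(cond)	WARN_ONCE(cond, "%s\n", #cond)
```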
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Shrikanth Hegde <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.14-rc7 |
|
| #
45007c6f |
| 13-Mar-2025 |
Juri Lelli <[email protected]> |
sched/deadline: Generalize unique visiting of root domains
Bandwidth checks and updates that work on root domains currently employ a cookie mechanism for efficiency. This mechanism is very much tied to when root domains are first created and initialized.
Generalize the cookie mechanism so that it can also be used later, at runtime, while updating root domains. Additionally, guard it with sched_domains_mutex, since domains need to be stable while being updated (this will also be required for further dynamic changes).
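A rough sketch of the generalized cookie idea under the locking described above (field and helper names are illustrative, not necessarily the in-tree ones):

```
/* Bump a generation counter once per traversal; a root domain already
 * stamped with the current cookie has been visited and is skipped. */
static u64 rd_visit_gen;	/* serialized by sched_domains_mutex */

static bool rd_visited(struct root_domain *rd, u64 cookie)
{
	lockdep_assert_held(&sched_domains_mutex);

	if (rd->visit_cookie == cookie)
		return true;
	rd->visit_cookie = cookie;
	return false;
}
```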
Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
Reported-by: Jon Hunter <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Tested-by: Waiman Long <[email protected]>
Tested-by: Jon Hunter <[email protected]>
Tested-by: Dietmar Eggemann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
ee13da87 |
| 05-Feb-2025 |
Nam Cao <[email protected]> |
sched: Switch to use hrtimer_setup()
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism.
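For illustration, the shape of the conversion on a scheduler timer (a sketch; the hrtick example mirrors similar in-tree code):

```
/* Before: two-step init plus open-coded callback assignment. */
hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
rq->hrtick_timer.function = hrtick;

/* After: one call sets clock, mode and callback together. */
hrtimer_setup(&rq->hrtick_timer, hrtick, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
```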
Signed-off-by: Nam Cao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/a55e849cba3c41b4c5708be6ea6be6f337d1a8fb.1738746821.git.namcao@linutronix.de
|
|
Revision tags: v6.14-rc1 |
|
| #
1751f872 |
| 28-Jan-2025 |
Joel Granados <[email protected]> |
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/infiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command:

Spatch:

    virtual patch

    @ depends on !(file in "net") disable optional_qualifier @
    identifier table_name != {
            watchdog_hardlockup_sysctl,
            iwcm_ctl_table,
            ucma_ctl_table,
            memory_allocation_profiling_sysctls,
            loadpin_sysctl_table
    };
    @@

    + const struct ctl_table table_name [] = { ... };

sed:

    sed --in-place \
            -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
            kernel/utsname_sysctl.c
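For illustration, a sketch of what a converted scheduler table ends up looking like (entries abridged; exact contents assumed):

```
/* const lets the array live in .rodata, so the proc_handler
 * pointer below can no longer be overwritten at runtime. */
static const struct ctl_table sched_rt_sysctls[] = {
	{
		.procname	= "sched_rt_period_us",
		.data		= &sysctl_sched_rt_period,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= sched_rt_handler,
	},
	/* ... */
};
```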
Reviewed-by: Song Liu <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/
Reviewed-by: Martin K. Petersen <[email protected]> # SCSI
Reviewed-by: Darrick J. Wong <[email protected]> # xfs
Acked-by: Jani Nikula <[email protected]>
Acked-by: Corey Minyard <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Bill O'Donnell <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Ashutosh Dixit <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Joel Granados <[email protected]>
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
af0c8b2b |
| 09-Oct-2024 |
Peter Zijlstra <[email protected]> |
sched: Split scheduler and execution contexts
Let's define the "scheduling context" as all the scheduler state in task_struct for the task chosen to run, which we'll call the donor task, and the "execution context" as all state required to actually run the task.
Currently both are intertwined in task_struct. We want to logically split these such that we can use the scheduling context of the donor task selected to be scheduled, but use the execution context of a different task to actually be run.
To this purpose, introduce a new rq->donor field to point to the task_struct chosen from the runqueue by the scheduler; it will be used for scheduler state, while rq->curr is preserved to indicate the execution context of the task that will actually be run.
This patch introduces the donor field as a union with curr, so it doesn't cause the contexts to be split yet, but adds the logic to handle everything separately.
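A minimal sketch of the union described above (other runqueue fields elided):

```
struct rq {
	/* ... */
	union {
		struct task_struct __rcu *donor; /* scheduling context: task picked */
		struct task_struct __rcu *curr;  /* execution context: task run */
	};
	/* ... */
};
```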
[add additional comments and update more sched_class code to use rq::proxy]
[jstultz: Rebased and resolved minor collisions, reworked to use accessors, tweaked update_curr_common to use rq_proxy fixing rt scheduling issues]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
Signed-off-by: Connor O'Brien <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Metin Kaya <[email protected]>
Tested-by: K Prateek Nayak <[email protected]>
Tested-by: Metin Kaya <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
18adad1d |
| 09-Oct-2024 |
Connor O'Brien <[email protected]> |
sched: Consolidate pick_*_task to task_is_pushable helper
This patch consolidates the rt and deadline pick_*_task functions into a task_is_pushable() helper.
This patch was broken out from a larger chain migration patch originally by Connor O'Brien.
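A sketch of the consolidated helper, assuming the checks the old rt/deadline pick_*_task functions shared (the exact in-tree form may differ):

```
static inline bool task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
{
	/* Pushable iff not currently running, allowed on the target CPU,
	 * and not pinned to a single CPU. */
	if (!task_on_cpu(rq, p) &&
	    cpumask_test_cpu(cpu, &p->cpus_mask) &&
	    p->nr_cpus_allowed > 1)
		return true;

	return false;
}
```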
[jstultz: split out from larger chain migration patch, renamed helper function]
Signed-off-by: Connor O'Brien <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Metin Kaya <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Christian Loehle <[email protected]>
Tested-by: K Prateek Nayak <[email protected]>
Tested-by: Metin Kaya <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
2b05a0b4 |
| 09-Oct-2024 |
Connor O'Brien <[email protected]> |
sched: Add move_queued_task_locked helper
Switch the logic that deactivates, sets the task cpu, and reactivates a task on a different rq to use a helper that will later be extended to push entire blocked task chains.
This patch was broken out from a larger chain migration patch originally by Connor O'Brien.
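The helper, sketched from the description above (both runqueue locks are assumed to be held by the caller):

```
static inline void
move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct task_struct *task)
{
	lockdep_assert_rq_held(src_rq);
	lockdep_assert_rq_held(dst_rq);

	deactivate_task(src_rq, task, 0);	/* take it off the old rq */
	set_task_cpu(task, dst_rq->cpu);	/* retarget */
	activate_task(dst_rq, task, 0);		/* queue it on the new rq */
}
```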
[jstultz: split out from larger chain migration patch]
Signed-off-by: Connor O'Brien <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Metin Kaya <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Qais Yousef <[email protected]>
Tested-by: K Prateek Nayak <[email protected]>
Tested-by: Metin Kaya <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
b2d70222 |
| 13-Aug-2024 |
Peter Zijlstra <[email protected]> |
sched: Add put_prev_task(.next)
In order to tell the previous sched_class what the next task is, add put_prev_task(.next).
Notably, SCX will use this to:

 1) determine whether the next task will leave the SCX sched class, and push the current task to another CPU if possible;
 2) gather statistics on how often and by which other classes it gets preempted.
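For illustration, a sketch of the extended hook's shape (argument names assumed from the text):

```
/* The sched_class hook gains a third argument: prev's class can now
 * see which task is about to run; next may belong to another class. */
void (*put_prev_task)(struct rq *rq, struct task_struct *prev,
		      struct task_struct *next);
```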
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
fd03c5b8 |
| 13-Aug-2024 |
Peter Zijlstra <[email protected]> |
sched: Rework pick_next_task()
The current rule is that:
pick_next_task() := pick_task() + set_next_task(.first = true)
And many classes implement it directly as such. Change things around to make pick_next_task() optional while also changing the definition to:
pick_next_task(prev) := pick_task() + put_prev_task() + set_next_task(.first = true)
The reason is that sched_ext would like to have a 'final' call that knows the next task. By placing put_prev_task() right next to set_next_task() (as it already is for sched_core) this becomes trivial.
As a bonus, this is a nice cleanup on its own.
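A sketch of the new default composition for classes that supply only pick_task() (the glue code and function name are illustrative):

```
static struct task_struct *
pick_next_task_sketch(struct rq *rq, struct task_struct *prev,
		      const struct sched_class *class)
{
	struct task_struct *next = class->pick_task(rq);

	if (next) {
		/* put_prev_task() now sits right next to set_next_task(),
		 * so a class can get a 'final' call that knows next. */
		prev->sched_class->put_prev_task(rq, prev);
		next->sched_class->set_next_task(rq, next, true); /* .first = true */
	}
	return next;
}
```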
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3 |
|
| #
863ccdbb |
| 03-Apr-2024 |
Peter Zijlstra <[email protected]> |
sched: Allow sched_class::dequeue_task() to fail
Change the function signature of sched_class::dequeue_task() to return a boolean, allowing future patches to 'fail' dequeue.
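The changed method signature, for reference (a sketch of the sched_class hook):

```
/* Returning false lets a class refuse or defer the dequeue,
 * which future patches build on. */
bool (*dequeue_task)(struct rq *rq, struct task_struct *p, int flags);
```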
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Tested-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
| #
5f6bd380 |
| 27-May-2024 |
Peter Zijlstra <[email protected]> |
sched/rt: Remove default bandwidth control
Now that fair_server exists, we no longer need RT bandwidth control unless RT_GROUP_SCHED is enabled.
Enable fair_server with parameters equivalent to RT throttling.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: "Peter Zijlstra (Intel)" <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: "Vineeth Pillai (Google)" <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Link: https://lore.kernel.org/r/14d562db55df5c3c780d91940743acb166895ef7.1716811044.git.bristot@kernel.org
|
| #
78eb4ea2 |
| 24-Jul-2024 |
Joel Granados <[email protected]> |
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch

@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int write, void *buffer, size_t *lenp, loff_t *ppos);

@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }

@r3@
identifier func;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
  ,int , void *, size_t *, loff_t *);

@r4@
identifier func, ctl;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int , void *, size_t *, loff_t *);

@r5@
identifier func, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
  ,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax were adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration.
Co-developed-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Thomas Weißschuh <[email protected]>
Co-developed-by: Joel Granados <[email protected]>
Signed-off-by: Joel Granados <[email protected]>
|
| #
402de7fc |
| 27-May-2024 |
Ingo Molnar <[email protected]> |
sched: Fix spelling in comments
Do a spell-checking pass.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1 |
|
| #
f532376e |
| 27-Jun-2023 |
Joel Granados <[email protected]> |
scheduler: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels), which will reduce the overall build-time size of the kernel and its run-time memory bloat by ~64 bytes per sentinel (further information at Link: https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/).
rm sentinel element from ctl_table arrays
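Illustrative shape of the change (entries abridged; the table name and the ARRAY_SIZE()-based registration are assumptions of this sketch):

```
static struct ctl_table sched_rr_sysctls[] = {
	{
		.procname	= "sched_rr_timeslice_ms",
		.data		= &sysctl_sched_rr_timeslice,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= sched_rr_handler,
	},
	/* the empty { } terminator that used to sit here is gone;
	 * registration derives the count from ARRAY_SIZE() instead. */
};
```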
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Valentin Schneider <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Signed-off-by: Joel Granados <[email protected]>
|
| #
5d69eca5 |
| 04-Nov-2023 |
Peter Zijlstra <[email protected]> |
sched: Unify runtime accounting across classes
All classes use sched_entity::exec_start to track runtime and have copies of the exact same code around to compute runtime.
Collapse all that.
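A sketch of the collapsed helper (close to, but not necessarily identical with, the in-tree version):

```
static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
{
	u64 now = rq_clock_task(rq);
	s64 delta_exec = now - curr->exec_start;

	if (unlikely(delta_exec <= 0))
		return delta_exec;

	curr->exec_start = now;
	curr->sum_exec_runtime += delta_exec;

	/* callers charge this delta to their class-specific state */
	return delta_exec;
}
```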
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Phil Auld <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Link: https://lkml.kernel.org/r/54d148a144f26d9559698c4dd82d8859038a7380.1699095159.git.bristot@kernel.org
|
| #
f0498d2a |
| 10-Oct-2023 |
Peter Zijlstra <[email protected]> |
sched: Fix stop_one_cpu_nowait() vs hotplug
Kuyo reported sporadic failures on a sched_setaffinity() vs CPU hotplug stress-test -- notably affine_move_task() remains stuck in wait_for_completion(), leading to a hung-task detector warning.
Specifically, it was reported that stop_one_cpu_nowait(.fn = migration_cpu_stop) returns false -- this stopper is responsible for the matching complete().
The race scenario is:
    CPU0                                    CPU1

                                            // doing _cpu_down()

    __set_cpus_allowed_ptr()
      task_rq_lock();
                                            takedown_cpu()
                                              stop_machine_cpuslocked(take_cpu_down..)

                                            <PREEMPT: cpu_stopper_thread()
                                              MULTI_STOP_PREPARE
                                              ...
      __set_cpus_allowed_ptr_locked()
        affine_move_task()
          task_rq_unlock();

    <PREEMPT: cpu_stopper_thread()>
      ack_state()
                                            MULTI_STOP_RUN
                                              take_cpu_down()
                                                __cpu_disable();
                                                stop_machine_park();
                                                  stopper->enabled = false;
                                            />
    />
          stop_one_cpu_nowait(.fn = migration_cpu_stop);
            if (stopper->enabled) // false!!!
That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the stopper thread gets a chance to preempt and allows the cpu-down for the target CPU to complete.
OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to issue a wakeup, it must not be run under the scheduler locks.
Solve this apparent contradiction by keeping preemption disabled over the unlock + queue_stopper combination:
    preempt_disable();
    task_rq_unlock(...);
    if (!stop_pending)
            stop_one_cpu_nowait(...);
    preempt_enable();
This respects the lock ordering constraints while still avoiding the above race. That is, if we find the CPU is online under rq-lock, the targeted stop_one_cpu_nowait() must succeed.
Apply this pattern to all similar stop_one_cpu_nowait() invocations.
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: "Kuyo Chang (張建文)" <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: "Kuyo Chang (張建文)" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
| #
7bc26384 |
| 09-Oct-2023 |
Vincent Guittot <[email protected]> |
sched/topology: Consolidate and clean up access to a CPU's max compute capacity
Remove the rq::cpu_capacity_orig field and use arch_scale_cpu_capacity() instead.
The scheduler uses 3 methods to get access to a CPU's max compute capacity:
- arch_scale_cpu_capacity(cpu) which is the default way to get a CPU's capacity.
- cpu_capacity_orig field which is periodically updated with arch_scale_cpu_capacity().
- capacity_orig_of(cpu) which encapsulates rq->cpu_capacity_orig.
There is no real need to save the value returned by arch_scale_cpu_capacity() in struct rq. arch_scale_cpu_capacity() returns:
- either a per_cpu variable.
- or a const value for systems which have only one capacity.
Remove rq::cpu_capacity_orig and use arch_scale_cpu_capacity() everywhere.
No functional changes.
Some performance tests on Arm64:
- small SMP device (hikey): no noticeable changes
- HMP device (RB5): hackbench shows minor improvement (1-2%)
- large SMP (thx2): hackbench and tbench show minor improvement (1%)
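The substitution, sketched:

```
/* Before: read the periodically refreshed cache in struct rq. */
unsigned long cap = capacity_orig_of(cpu);	/* rq->cpu_capacity_orig */

/* After: ask the arch hook directly; it is a per-CPU variable or a
 * compile-time constant, so caching it in struct rq buys nothing. */
unsigned long cap = arch_scale_cpu_capacity(cpu);
```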
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
089768df |
| 08-Oct-2023 |
Yajun Deng <[email protected]> |
sched/rt: Change the type of 'sysctl_sched_rt_period' from 'unsigned int' to 'int'
Doing this matches the natural 'int'-based calculus in sched_rt_handler(), and also enables adding a correct upper-bounds check on the sysctl interface.
[ mingo: Rewrote the changelog. ]
Signed-off-by: Yajun Deng <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
079be8fc |
| 02-Oct-2023 |
Cyril Hrubis <[email protected]> |
sched/rt: Disallow writing invalid values to sched_rt_period_us
The validation of the value written to sched_rt_period_us was broken because:
- sysctl_sched_rt_period is declared as unsigned int
- it is parsed by proc_dointvec()
- the range is asserted only after the value has been parsed

Because of this, negative values written to the file were stored in an unsigned integer and later interpreted as large positive integers, which passed the check:

    if (sysctl_sched_rt_period <= 0)
            return EINVAL;

This commit fixes the parsing by setting an explicit range for both period_us and runtime_us in the sched_rt_sysctls table and processing the values with proc_dointvec_minmax() instead.

Alternatively, if we wanted to use the full range of unsigned int for the period value, we would have to split the proc_handler and use proc_douintvec() for it; however, even Documentation/scheduler/sched-rt-group.rst describes the range as 1 to INT_MAX.

As far as I can tell, the only problem this causes is that the sysctl file allows writing negative values which, when read back, may confuse userspace.
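The fixed table entry looks roughly like this (a sketch; the SYSCTL_ONE/SYSCTL_INT_MAX bounds are assumed from the 1..INT_MAX range quoted above):

```
{
	.procname	= "sched_rt_period_us",
	.data		= &sysctl_sched_rt_period,
	.maxlen		= sizeof(int),
	.mode		= 0644,
	.proc_handler	= sched_rt_handler,	/* now uses proc_dointvec_minmax() */
	.extra1		= SYSCTL_ONE,		/* reject values < 1 */
	.extra2		= SYSCTL_INT_MAX,	/* cap at INT_MAX */
},
```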
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/[email protected]/
Signed-off-by: Cyril Hrubis <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
612f769e |
| 11-Aug-2023 |
Valentin Schneider <[email protected]> |
sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask
Sebastian noted that the rto_push_work IRQ work can be queued for a CPU that has an empty pushable_tasks list, which means nothing useful will be done in the IPI other than queue the work for the next CPU on the rto_mask.
rto_push_irq_work_func() only operates on tasks in the pushable_tasks list, but the conditions for that irq_work to be queued (and for a CPU to be added to the rto_mask) rely on rt_rq->rt_nr_migratory instead.
rt_nr_migratory is increased whenever an RT task entity is enqueued and has nr_cpus_allowed > 1. Unlike the pushable_tasks list, rt_nr_migratory includes an rt_rq's current task. This means an rt_rq can have a migratable current, N non-migratable queued tasks, and be flagged as overloaded / have its CPU set in the rto_mask, despite having an empty pushable_tasks list.
Make an rt_rq's overload logic be driven by {enqueue,dequeue}_pushable_task(). Since rt_rq->{rt_nr_migratory,rt_nr_total} become unused, remove them.
Note that the case where the current task is pushed away to make way for a migration-disabled task remains unchanged: the migration-disabled task has to be in the pushable_tasks list in the first place, which means it has nr_cpus_allowed > 1.
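A sketch of the enqueue side (the dequeue side mirrors it; the body is abridged and only illustrates the idea that overload now follows the pushable list):

```
static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
{
	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
	plist_node_init(&p->pushable_tasks, p->prio);
	plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);

	/* Overload state is driven by the pushable list itself. */
	if (!rq->rt.overloaded) {
		rt_set_overload(rq);	/* sets this CPU in the rto_mask */
		rq->rt.overloaded = 1;
	}
}
```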
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
e23edc86 |
| 19-Sep-2023 |
Ingo Molnar <[email protected]> |
sched/fair: Rename check_preempt_curr() to wakeup_preempt()
The name is a bit opaque - make it clear that this is about wakeup preemption.
Also rename the ->check_preempt_curr() methods similarly.
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
|
| #
c1fc6484 |
| 02-Aug-2023 |
Cyril Hrubis <[email protected]> |
sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
The sched_rr_timeslice can be reset to the default by writing a value that is <= 0. However, after reading from this file we always got the last value written, which is not useful at all:

    $ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms
    $ cat /proc/sys/kernel/sched_rr_timeslice_ms
    -1

Fix this by setting the variable that holds the sysctl file value to jiffies_to_msecs(RR_TIMESLICE) whenever a value <= 0 is written.
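A sketch of the handler logic after the fix (assuming the handler shape used by these sysctls at the time):

```
static int sched_rr_handler(struct ctl_table *table, int write,
			    void *buffer, size_t *lenp, loff_t *ppos)
{
	int ret = proc_dointvec(table, write, buffer, lenp, ppos);

	if (!ret && write) {
		sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ?
			RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice);
		/* The fix: reads now report the effective default, not -1. */
		if (sysctl_sched_rr_timeslice <= 0)
			sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE);
	}
	return ret;
}
```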
Signed-off-by: Cyril Hrubis <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Petr Vorel <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Tested-by: Petr Vorel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
c7fcb998 |
| 02-Aug-2023 |
Cyril Hrubis <[email protected]> |
sched/rt: Fix sysctl_sched_rr_timeslice initial value
There is a 10% rounding error in the initial value of sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
    sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
    sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
    sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90

What this test does is compare the return value from sched_rr_get_interval() with the sched_rr_timeslice_ms sysctl file, and it fails if they do not match.
The problem it found is the initial sysctl file value, which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is a multiple of HZ; however, it introduces a 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
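The one-line fix, sketched as before/after (RR_TIMESLICE expands to 100 * HZ / 1000, per the arithmetic above):

```
/* Before: MSEC_PER_SEC / HZ truncates (1000/300 == 3) => 90 ms. */
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;

/* After: multiply first, divide last => exactly 100 ms for HZ=300. */
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ;
```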
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
Signed-off-by: Cyril Hrubis <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Petr Vorel <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Tested-by: Petr Vorel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3 |
|
| #
feffe5bb |
| 28-Aug-2022 |
Schspa Shi <[email protected]> |
sched/rt: Fix bad task migration for rt tasks
Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing") allows find_lock_lowest_rq() to pick a task with migration disabled. The purpose of that commit is to push the currently running task away from a CPU that has a migrate_disable() task queued on it.
However, there is a race which allows a migrate_disable() task to be migrated. Consider:
    CPU0                                    CPU1
    push_rt_task
      check is_migration_disabled(next_task)
                                            task not running and
                                            migration_disabled == 0
      find_lock_lowest_rq(next_task, rq);
        _double_lock_balance(this_rq, busiest);
          raw_spin_rq_unlock(this_rq);
          double_rq_lock(this_rq, busiest);
                                            <<wait for busiest rq>>
                                            <wakeup>
                                            task become running
                                            migrate_disable();
                                            <context out>
      deactivate_task(rq, next_task, 0);
      set_task_cpu(next_task, lowest_rq->cpu);
      WARN_ON_ONCE(is_migration_disabled(p));
Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
Signed-off-by: Schspa Shi <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Tested-by: Dwaine Gonyier <[email protected]>
|