|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2 |
|
| #
1a491044 |
| 02-Dec-2024 |
Vincent Guittot <[email protected]> |
sched/fair: Use the new cfs_rq.h_nr_runnable
Use the new h_nr_runnable that tracks only queued and runnable tasks in the statistics that are used to balance the system:
- PELT runnable_avg
- deciding if a group is overloaded or has spare capacity
- NUMA stats
- reduced capacity management
- load balance
- nohz kick
Note that rq->nr_running still counts the delayed-dequeue tasks, as delayed dequeue is a fair-class feature that is meaningless at the core level.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dietmar Eggemann <[email protected]> Link: https://lore.kernel.org/r/[email protected]
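As a rough illustration of the distinction (a simplified sketch, not the kernel's actual data structures; the two counter names follow the commit text, everything else is hypothetical):

    /* Hypothetical, simplified view of the two hierarchical counters. */
    struct cfs_rq_counts {
        unsigned int h_nr_queued;    /* all queued tasks, including
                                        delayed-dequeue ones */
        unsigned int h_nr_runnable;  /* only tasks that can actually run */
    };

    /* Balance-side statistics must not see delayed-dequeue tasks: */
    static inline unsigned int nr_tasks_for_balance(const struct cfs_rq_counts *c)
    {
        return c->h_nr_runnable;     /* not h_nr_queued */
    }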
|
| #
7b8a702d |
| 02-Dec-2024 |
Vincent Guittot <[email protected]> |
sched/fair: Rename h_nr_running into h_nr_queued
With the delayed-dequeue feature, a sleeping sched_entity remains queued in the rq until its lag has elapsed, but it can't run. Rename h_nr_running to h_nr_queued to reflect this new behavior.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dietmar Eggemann <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
76f2f783 |
| 02-Dec-2024 |
Peter Zijlstra <[email protected]> |
sched/eevdf: More PELT vs DELAYED_DEQUEUE
Vincent and Dietmar noted that while commit fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE") fixes the entity runnable stats, it does not adjust the cfs_rq runnable stats, which are based on h_nr_running.
Track h_nr_delayed such that we can discount those and adjust the signal.
Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE") Closes: https://lore.kernel.org/lkml/[email protected]/ Closes: https://lore.kernel.org/lkml/CAKfTPtCNUvWE_GX5LyvTF-WdxUT=ZgvZZv-4t=eWntg5uOFqiQ@mail.gmail.com/ [ Fixes checkpatch warnings and rebased ] Reported-by: Dietmar Eggemann <[email protected]> Reported-by: Vincent Guittot <[email protected]> Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dietmar Eggemann <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
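Conceptually, the runnable figure is the queued count minus the delayed-dequeue count (a sketch based on the commit text; the helper name is illustrative):

    /* h_nr_delayed counts queued-but-not-runnable (delayed-dequeue)
     * entities; discounting it adjusts the runnable signal. */
    static inline unsigned int h_nr_runnable_of(unsigned int h_nr_queued,
                                                unsigned int h_nr_delayed)
    {
        return h_nr_queued - h_nr_delayed;
    }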
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
af0c8b2b |
| 09-Oct-2024 |
Peter Zijlstra <[email protected]> |
sched: Split scheduler and execution contexts
Let's define the "scheduling context" as all the scheduler state in task_struct for the task chosen to run, which we'll call the donor task, and the "ex
sched: Split scheduler and execution contexts
Let's define the "scheduling context" as all the scheduler state in task_struct for the task chosen to run, which we'll call the donor task, and the "execution context" as all state required to actually run the task.
Currently both are intertwined in task_struct. We want to logically split these such that we can use the scheduling context of the donor task selected to be scheduled, but use the execution context of a different task to actually be run.
To this end, introduce an rq->donor field that points to the task_struct chosen from the runqueue by the scheduler and will be used for scheduler state, and preserve rq->curr to indicate the execution context of the task that will actually be run.
This patch introduces the donor field as a union with curr, so it doesn't cause the contexts to be split yet, but adds the logic to handle everything separately.
[add additional comments and update more sched_class code to use rq::proxy] [jstultz: Rebased and resolved minor collisions, reworked to use accessors, tweaked update_curr_common to use rq_proxy fixing rt scheduling issues]
Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: Connor O'Brien <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Metin Kaya <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Tested-by: Metin Kaya <[email protected]> Link: https://lore.kernel.org/r/[email protected]
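A minimal sketch of the union described above (illustrative only; the kernel's fields carry __rcu annotations and are reached through accessors):

    struct task_struct;                 /* forward declaration for the sketch */

    struct rq_sketch {
        union {
            struct task_struct *donor;  /* scheduling context: state used
                                           for scheduler decisions */
            struct task_struct *curr;   /* execution context: the task
                                           that actually runs */
        };
    };

    /* Until the contexts are actually split, both names alias one task: */
    static inline struct task_struct *rq_donor(struct rq_sketch *rq)
    {
        return rq->donor;               /* == rq->curr for now */
    }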
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
902d67a2 |
| 11-Sep-2024 |
Tejun Heo <[email protected]> |
sched: Move update_other_load_avgs() to kernel/sched/pelt.c
96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") added update_other_load_avgs() in kernel/sched/
sched: Move update_other_load_avgs() to kernel/sched/pelt.c
96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") added update_other_load_avgs() in kernel/sched/syscalls.c right above effective_cpu_util(). This location didn't fit that well in the first place, and with 5d871a63997f ("sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c") moving effective_cpu_util() to kernel/sched/fair.c, it looks even more out of place.
Relocate the function to kernel/sched/pelt.c where all its callees are.
No functional changes.
Signed-off-by: Tejun Heo <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]>
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
402de7fc |
| 27-May-2024 |
Ingo Molnar <[email protected]> |
sched: Fix spelling in comments
Do a spell-checking pass.
Signed-off-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected]
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
d4dbc991 |
| 26-Mar-2024 |
Vincent Guittot <[email protected]> |
sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
Now that cpufreq provides a pressure value to the scheduler, rename arch_update_thermal_pressure() to arch_update_hw_pressure() to reflect that it returns a pressure applied by hardware (i.e. one that can change at high frequency) and that it is not always related to thermal mitigation; it can also be generated by a max-current limitation, for example. Such a high-frequency signal needs filtering to be smoothed into a value that reflects the average available capacity on the scheduler's time scale.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Lukasz Luba <[email protected]> Reviewed-by: Qais Yousef <[email protected]> Reviewed-by: Lukasz Luba <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.9-rc1 |
|
| #
d72cf624 |
| 12-Mar-2024 |
Ingo Molnar <[email protected]> |
sched/balancing: Fix a couple of outdated function names in comments
The 'idle_balance()' function hasn't existed for years, and there's no load_balance_newidle() either - both are sched_balance_newidle() today.
Reported-by: Honglei Wang <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/ZfAwNufbiyt/[email protected]
|
|
Revision tags: v6.8 |
|
| #
391b7a53 |
| 08-Mar-2024 |
Ingo Molnar <[email protected]> |
sched/balancing: Rename update_blocked_averages() => sched_balance_update_blocked_averages()
Standardize scheduler load-balancing function names on the sched_balance_() prefix.
Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Shrikanth Hegde <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6 |
|
| #
1b8a955d |
| 12-Oct-2023 |
Mathieu Desnoyers <[email protected]> |
sched: Make PELT acronym definition searchable
The PELT acronym definition can be found right at the top of kernel/sched/pelt.c (of course), but it cannot be found through use of
grep -r PELT kernel/sched/
Add the acronym "(PELT)" after "Per Entity Load Tracking" at the top of the source file.
Signed-off-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6 |
|
| #
f96eca43 |
| 22-Feb-2022 |
Ingo Molnar <[email protected]> |
sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there
Similarly to kernel/sched/build_utility.c, collect all 'scheduling policy' related source code files into kernel/sched/build_policy.c:
kernel/sched/idle.c
kernel/sched/rt.c
kernel/sched/cpudeadline.c
kernel/sched/pelt.c
kernel/sched/cputime.c
kernel/sched/deadline.c
With the exception of fair.c, which we continue to build as a separate file for build efficiency and parallelism reasons.
Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Peter Zijlstra <[email protected]>
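The resulting file is essentially a series of #include directives pulling the listed .c files into one translation unit (a sketch of the pattern; the real file also pulls in shared headers first):

    /* kernel/sched/build_policy.c -- pattern sketch */
    #include "idle.c"
    #include "rt.c"
    #include "cpudeadline.c"
    #include "pelt.c"
    #include "cputime.c"
    #include "deadline.c"
    /* fair.c intentionally remains a separate translation unit. */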
|
|
Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4 |
|
| #
3b03706f |
| 18-Mar-2021 |
Ingo Molnar <[email protected]> |
sched: Fix various typos
Fix ~42 single-word typos in scheduler code comments.
We have accumulated a few fun ones over the years. :-)
Signed-off-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ben Segall <[email protected]> Cc: Mel Gorman <[email protected]> Cc: [email protected]
|
|
Revision tags: v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4 |
|
| #
9d246053 |
| 29-Jun-2020 |
Phil Auld <[email protected]> |
sched: Add a tracepoint to track rq->nr_running
Add a bare tracepoint, trace_sched_update_nr_running_tp, which tracks ->nr_running of a CPU's rq. This is used to accurately trace this data and provide a visualization of scheduler imbalances in, for example, the form of a heat map. The tracepoint is accessed by loading an external kernel module. An example module (forked from Qais' module and including the PELT-related tracepoints) can be found at:
https://github.com/auldp/tracepoints-helpers.git
A script to turn the trace-cmd report output into a heatmap plot can be found at:
https://github.com/jirvoz/plot-nr-running
The tracepoints are added to add_nr_running() and sub_nr_running(), which are in kernel/sched/sched.h. In order to avoid CREATE_TRACE_POINTS in the header, a wrapper call is used and the trace/events/sched.h include is moved before sched.h in kernel/sched/core.c.
Signed-off-by: Phil Auld <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
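A bare tracepoint is declared with DECLARE_TRACE() rather than TRACE_EVENT(), so no default ftrace event is generated and a module must attach to it; the wrapper keeps CREATE_TRACE_POINTS out of sched.h (a sketch following the commit description; the exact prototype is an assumption):

    /* include/trace/events/sched.h (sketch): */
    DECLARE_TRACE(sched_update_nr_running_tp,
        TP_PROTO(struct rq *rq, int change),
        TP_ARGS(rq, change));

    /* kernel/sched/core.c (sketch): wrapper callable from sched.h
     * inlines without defining tracepoints in the header. */
    void call_trace_sched_update_nr_running(struct rq *rq, int count)
    {
        trace_sched_update_nr_running_tp(rq, count);
    }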
|
|
Revision tags: v5.8-rc3, v5.8-rc2, v5.8-rc1 |
|
| #
87e867b4 |
| 12-Jun-2020 |
Vincent Guittot <[email protected]> |
sched/pelt: Cleanup PELT divider
Factorize in a single place the calculation of the divider used to compute *_avg from *_sum values.
Suggested-by: Dietmar Eggemann <[email protected]> Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
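The factored-out helper returns the maximum *_sum attainable at the current position in the PELT window, so that *_avg = *_sum / divider (a sketch; get_pelt_divider() and LOAD_AVG_MAX match the kernel's names, but treat the exact form as an assumption):

    static inline u32 get_pelt_divider(struct sched_avg *avg)
    {
        /* LOAD_AVG_MAX is the sum of the full geometric series; the
         * current 1024us period is only period_contrib/1024 complete. */
        return LOAD_AVG_MAX - 1024 + avg->period_contrib;
    }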
|
| #
844eb645 |
| 03-Jun-2020 |
Dietmar Eggemann <[email protected]> |
sched/pelt: Remove redundant cap_scale() definition
Besides PELT, cap_scale() is used in the deadline scheduler class for scale-invariant bandwidth enforcement. Remove the cap_scale() definition in kernel/sched/pelt.c and keep the one in kernel/sched/sched.h.
Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5 |
|
| #
95d68593 |
| 06-May-2020 |
Vincent Guittot <[email protected]> |
sched/pelt: Sync util/runnable_sum with PELT window when propagating
update_tg_cfs_*() propagate the impact of attaching/detaching an entity down the cfs_rq hierarchy and must stay in sync with the current PELT window.
Even if we can't sync a child cfs_rq and its group se, we can sync the group se and its parent cfs_rq with the current position in the PELT window. In fact, we must keep them in sync in order to also stay synced with the other entities and group entities that are already attached to the cfs_rq.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
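In practice, syncing means giving the group se the same window position as its parent cfs_rq before adjusting its sums (a sketch of the idea; the helper name is hypothetical and the real update_tg_cfs_*() functions do more bookkeeping):

    static void sync_pelt_window(struct sched_avg *se_avg,
                                 const struct sched_avg *cfs_rq_avg)
    {
        /* Same last update time and same position inside the current
         * 1024us period keep future decays coherent with the entities
         * already attached to this cfs_rq. */
        se_avg->last_update_time = cfs_rq_avg->last_update_time;
        se_avg->period_contrib   = cfs_rq_avg->period_contrib;
    }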
|
|
Revision tags: v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3 |
|
| #
76504793 |
| 22-Feb-2020 |
Thara Gopinath <[email protected]> |
sched/pelt: Add support to track thermal pressure
Extrapolating from the existing framework that tracks rt/dl utilization using PELT signals, add a similar mechanism to track thermal pressure. The difference from rt/dl utilization tracking is that, instead of tracking time spent by a CPU running an RT/DL task through util_avg, the average thermal pressure is tracked through load_avg. This is because the thermal pressure signal is a weighted, time-varying "delta" capacity, unlike util_avg, which is binary. "Delta capacity" here means the delta between the actual capacity of a CPU and its decreased capacity due to a thermal event.
In order to track average thermal pressure, a new sched_avg variable avg_thermal is introduced. The function update_thermal_load_avg() can be called to do the periodic bookkeeping (accumulate, decay and average) of the thermal pressure.
Reviewed-by: Vincent Guittot <[email protected]> Signed-off-by: Thara Gopinath <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
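Since the signal is carried by load_avg, the periodic bookkeeping feeds the capacity delta in as the tracked quantity (a sketch following the commit text; the internal helpers and the exact signature are assumptions):

    /* Called periodically; 'capacity' is the delta between the CPU's
     * max capacity and its thermally capped capacity. */
    int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
    {
        if (___update_load_sum(now, &rq->avg_thermal,
                               capacity, capacity, capacity)) {
            ___update_load_avg(&rq->avg_thermal, 1);
            return 1;   /* avg_thermal.load_avg was refreshed */
        }
        return 0;
    }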
|
| #
9f683953 |
| 24-Feb-2020 |
Vincent Guittot <[email protected]> |
sched/pelt: Add a new runnable average signal
Now that runnable_load_avg has been removed, we can replace it with a new signal that will highlight the runnable pressure on a cfs_rq. This signal tracks the waiting time of tasks on the rq and can help to better define the state of rqs.
For now, only util_avg is used to define the state of a rq: a rq with more than around 80% utilization and more than one task is considered overloaded.
But the util_avg signal of a rq can become temporarily low after a task has migrated to another rq, which can bias the classification of the rq.
When tasks compete for the same rq, their runnable average signal will be higher than util_avg, as it includes the waiting time, and we can use this signal to better classify cfs_rqs.
The new runnable_avg will track the runnable time of a task, which simply adds the waiting time to the running time. The runnable_avg of a cfs_rq will be the sum of the se's runnable_avg, and the runnable_avg of a group entity will follow that of the rq, similarly to util_avg.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Dietmar Eggemann <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Phil Auld <[email protected]> Cc: Hillf Danton <[email protected]> Link: https://lore.kernel.org/r/[email protected]
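The relationship between the two signals can be summarized as follows (an illustrative sketch only, using the ~80% rule of thumb quoted above; the helper is hypothetical):

    /* runnable time = running time + time spent waiting on the rq,
     * so when tasks compete, runnable_avg rises above util_avg even
     * if a recent migration has temporarily deflated util_avg. */
    static inline int rq_looks_overloaded(unsigned long runnable_avg,
                                          unsigned long capacity,
                                          unsigned int nr_running)
    {
        return nr_running > 1 && runnable_avg * 100 > capacity * 80;
    }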
|
| #
0dacee1b |
| 24-Feb-2020 |
Vincent Guittot <[email protected]> |
sched/pelt: Remove unused runnable load average
Now that runnable_load_avg is no longer used, we can remove it to make space for a new signal.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Dietmar Eggemann <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Phil Auld <[email protected]> Cc: Hillf Danton <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2 |
|
| #
d040e073 |
| 13-Dec-2019 |
Peng Wang <[email protected]> |
sched/fair: Skip calculating @contrib without load
Because of the:
    if (!load)
        runnable = running = 0;
clause in ___update_load_sum(), all the actual users of @contrib in accumulate_sum():
    if (load)
        sa->load_sum += load * contrib;
    if (runnable)
        sa->runnable_load_sum += runnable * contrib;
    if (running)
        sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
don't happen, and therefore we don't care what @contrib actually is; calculating it is pointless.
If we count the times when @load equals zero and when it does not, as below:

    if (load) {
        load_is_not_zero_count++;
        contrib = __accumulate_pelt_segments(periods,
                        1024 - sa->period_contrib, delta);
    } else
        load_is_zero_count++;
As we can see, load_is_zero_count is much bigger than load_is_not_zero_count, and the gap is gradually widening:
    load_is_zero_count:      6016044 times
    load_is_not_zero_count:   244316 times
    19:50:43 up 1 min, 1 user, load average: 0.09, 0.06, 0.02

    load_is_zero_count:      7956168 times
    load_is_not_zero_count:   261472 times
    19:51:42 up 2 min, 1 user, load average: 0.03, 0.05, 0.01

    load_is_zero_count:     10199896 times
    load_is_not_zero_count:   278364 times
    19:52:51 up 3 min, 1 user, load average: 0.06, 0.05, 0.01

    load_is_zero_count:     14333700 times
    load_is_not_zero_count:   318424 times
    19:54:53 up 5 min, 1 user, load average: 0.01, 0.03, 0.00
Perhaps we can gain some performance advantage by skipping these unnecessary calculations.
Signed-off-by: Peng Wang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4 |
|
| #
8de6242c |
| 04-Jun-2019 |
Qais Yousef <[email protected]> |
sched/debug: Add new tracepoint to track PELT at se level
The new tracepoint allows tracking PELT signals at the sched_entity level, which is supported for CFS tasks and task groups only.
Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-König <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
| #
ba19f51f |
| 04-Jun-2019 |
Qais Yousef <[email protected]> |
sched/debug: Add new tracepoints to track PELT at rq level
The new tracepoints allow tracking PELT signals at rq level for all scheduling classes + irq.
Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-König <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
| #
8ec59c0f |
| 17-Jun-2019 |
Vincent Guittot <[email protected]> |
sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is unused since commit:
765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'")
Remove it.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Viresh Kumar <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4 |
|
| #
23127296 |
| 23-Jan-2019 |
Vincent Guittot <[email protected]> |
sched/fair: Update scale invariance of PELT
The current implementation of load tracking invariance scales the contribution with the current frequency and uarch performance (only for utilization) of the CPU. One main result of this formula is that the figures are capped by the current capacity of the CPU. Another is that load_avg is not invariant because it is not scaled with uarch.
The util_avg of a periodic task that runs r time slots every p time slots varies in the range:

    U * (1-y^r)/(1-y^p) * y^i < Utilization < U * (1-y^r)/(1-y^p)

where U is the max util_avg value, i.e. SCHED_CAPACITY_SCALE.

At a lower capacity, the range becomes:

    U * C * (1-y^r')/(1-y^p) * y^i' < Utilization < U * C * (1-y^r')/(1-y^p)

with C reflecting the compute-capacity ratio between the current capacity and the max capacity.

So C tries to compensate for changes in (1-y^r'), but it can't be accurate.
Instead of scaling the contribution value of the PELT algorithm, we should scale the running time. The PELT signal aims to track the amount of computation of tasks and/or rqs, so it seems more correct to scale the running time to reflect the effective amount of computation done since the last update.
In order to be fully invariant, we need to apply the same amount of running time and idle time whatever the current capacity. Because running at lower capacity implies that the task will run longer, we have to ensure that the same amount of idle time will be applied when the system becomes idle and no idle time has been "stolen". But reaching the maximum utilization value (SCHED_CAPACITY_SCALE) means that the task is seen as an always-running task whatever the capacity of the CPU (even at max compute capacity). In this case, we can discard this "stolen" idle time, which becomes meaningless.
In order to achieve this time scaling, a new clock_pelt is created per rq. The increase of this clock scales with the current capacity when something is running on the rq and synchronizes with clock_task when the rq is idle. With this mechanism, we ensure the same running and idle time whatever the current capacity. This also makes it possible to simplify the PELT algorithm by removing all references to uarch and frequency and applying the same contribution to utilization and loads. Furthermore, the scaling is done only once per clock update (update_rq_clock_task()) instead of during each update of sched_entities and the cfs/rt/dl_rq of the rq as in the current implementation. This is interesting when cgroups are involved, as shown in the results below:
Tests ran on a HiKey (octo-core Arm64 platform), with the performance cpufreq governor and only the shallowest C-state enabled to remove the variance generated by those power features, so that we only track the impact of the PELT algorithm.

Each test runs 16 times:
./perf bench sched pipe (higher is better)

    kernel           tip/sched/core       + patch
                     ops/seconds          ops/seconds          diff
    cgroup root      59652 (+/- 0.18%)    59876 (+/- 0.24%)    +0.38%
    cgroup level1    55608 (+/- 0.27%)    55923 (+/- 0.24%)    +0.57%
    cgroup level2    52115 (+/- 0.29%)    52564 (+/- 0.22%)    +0.86%

hackbench -l 1000 (lower is better)

    kernel           tip/sched/core       + patch
                     duration (sec)       duration (sec)       diff
    cgroup root      4.453 (+/- 2.37%)    4.383 (+/- 2.88%)    -1.57%
    cgroup level1    4.859 (+/- 8.50%)    4.830 (+/- 7.07%)    -0.60%
    cgroup level2    5.063 (+/- 9.83%)    4.928 (+/- 9.66%)    -2.66%
Then, the responsiveness of PELT is improved when the CPU is not running at max capacity with this new algorithm. Below are some examples of the duration needed to reach some typical load values, according to the capacity of the CPU, with the current implementation and with this patch. These values have been computed based on the geometric series and the half-period value:
    Util (%)      max capacity    half capacity (mainline)    half capacity (w/ patch)
    972 (95%)     138ms           not reachable               276ms
    486 (47.5%)   30ms            138ms                       60ms
    256 (25%)     13ms            32ms                        26ms
On my HiKey (octo-core Arm64 platform) with the schedutil governor, the time to reach the max OPP when starting from null utilization decreases from 223ms with the current scale invariance to 121ms with the new algorithm.
Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
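The heart of the mechanism is that clock_pelt advances more slowly than clock_task when capacity is reduced, and snaps back to clock_task at idle (a sketch following the commit description; rq_is_idle() is illustrative, and the exact form of the real function is an assumption):

    static void update_rq_clock_pelt(struct rq *rq, s64 delta)
    {
        if (rq_is_idle(rq)) {
            /* Sync with clock_task: apply the idle time that was not
             * "stolen" while running at reduced capacity. */
            rq->clock_pelt = rq->clock_task;
            return;
        }

        /* Scale elapsed running time by uarch and frequency capacity:
         * lower capacity => slower PELT clock => the same signal as if
         * the work had been done at max capacity. */
        delta = cap_scale(delta, arch_scale_cpu_capacity(cpu_of(rq)));
        delta = cap_scale(delta, arch_scale_freq_capacity(cpu_of(rq)));
        rq->clock_pelt += delta;
    }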
|
|
Revision tags: v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8 |
|
| #
4a465e3e |
| 03-Aug-2018 |
Dietmar Eggemann <[email protected]> |
sched/fair: Remove setting task's se->runnable_weight during PELT update
A CFS (SCHED_OTHER, SCHED_BATCH or SCHED_IDLE policy) task's se->runnable_weight must always be in sync with its se->load.weight.
se->runnable_weight is set to se->load.weight when the task is forked (init_entity_runnable_average()) or reniced (reweight_entity()).
There are two cases in set_load_weight() which, since they currently only set se->load.weight, could lead to a situation in which se->load.weight differs from se->runnable_weight for a CFS task:
(1) A task switches to SCHED_IDLE.
(2) A SCHED_FIFO, SCHED_RR or SCHED_DEADLINE task which has been reniced (during which only its static priority gets set) switches to SCHED_OTHER or SCHED_BATCH.
Set se->runnable_weight to se->load.weight in these two cases to prevent this. This eliminates the need to explicitly set it to se->load.weight during PELT updates in the CFS scheduler fastpath.
Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vincent Guittot <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
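The fix amounts to updating both fields together in set_load_weight() (a sketch with the surrounding details elided; the simplified signature is an assumption):

    static void set_load_weight(struct task_struct *p)
    {
        struct load_weight *load = &p->se.load;

        /* ... priority-to-weight lookup setting load->weight elided ... */

        /* Keep the pair in sync for the SCHED_IDLE switch and the
         * reniced policy-switch cases described above: */
        p->se.runnable_weight = load->weight;
    }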
|