|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
f3fa0e40 |
| 05-Feb-2025 |
Yafang Shao <[email protected]> |
sched/clock: Don't define sched_clock_irqtime as static key
The sched_clock_irqtime was defined as a static key in:
8722903cbb8f ("sched: Define sched_clock_irqtime as static key")
However, this change introduces a 'sleeping in atomic context' warning:
arch/x86/kernel/tsc.c:1214 mark_tsc_unstable() warn: sleeping in atomic context
As analyzed by Dan, the affected code path is as follows:
vcpu_load() <- disables preempt -> kvm_arch_vcpu_load() -> mark_tsc_unstable() <- sleeps
virt/kvm/kvm_main.c
   166  void vcpu_load(struct kvm_vcpu *vcpu)
   167  {
   168          int cpu = get_cpu();
                          ^^^^^^^^^^
This get_cpu() disables preemption.

   169
   170          __this_cpu_write(kvm_running_vcpu, vcpu);
   171          preempt_notifier_register(&vcpu->preempt_notifier);
   172          kvm_arch_vcpu_load(vcpu, cpu);
   173          put_cpu();
   174  }

arch/x86/kvm/x86.c
  4979          if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable()) {
  4980                  s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
  4981                          rdtsc() - vcpu->arch.last_host_tsc;
  4982                  if (tsc_delta < 0)
  4983                          mark_tsc_unstable("KVM discovered backwards TSC");

arch/x86/kernel/tsc.c
  1206  void mark_tsc_unstable(char *reason)
  1207  {
  1208          if (tsc_unstable)
  1209                  return;
  1210
  1211          tsc_unstable = 1;
  1212          if (using_native_sched_clock())
  1213                  clear_sched_clock_stable();
-->1214          disable_sched_clock_irqtime();
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

kernel/jump_label.c
   245  void static_key_disable(struct static_key *key)
   246  {
   247          cpus_read_lock();
                ^^^^^^^^^^^^^^^^
This lock has a might_sleep() in it which triggers the static checker warning.

   248          static_key_disable_cpuslocked(key);
   249          cpus_read_unlock();
   250  }
Let's revert this change for now, as {disable,enable}_sched_clock_irqtime are used in many places, as pointed out by Sean, including the following:
The code path in clocksource_watchdog():
clocksource_watchdog()
  -> spin_lock(&watchdog_lock);
  -> __clocksource_unstable()
  -> clocksource.mark_unstable() == tsc_cs_mark_unstable()
  -> disable_sched_clock_irqtime()
And the code path in sched_clock_register():
        /* Cannot register a sched_clock with interrupts on */
        local_irq_save(flags);

        ...

        /* Enable IRQ time accounting if we have a fast enough sched_clock() */
        if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
                enable_sched_clock_irqtime();

        local_irq_restore(flags);
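For illustration only (an editor's sketch, not part of the commit): the underlying problem is that toggling a static key may sleep, while toggling a plain variable never does. The helper and variable names below are hypothetical; only DEFINE_STATIC_KEY_FALSE() and static_branch_disable() are real jump-label interfaces.

  #include <linux/jump_label.h>

  /* Static-key flag: the toggle path ends up in static_key_disable(),
   * which takes cpus_read_lock() and may sleep, so it must not be
   * called with preemption disabled (as under vcpu_load() above).
   */
  DEFINE_STATIC_KEY_FALSE(irqtime_key_example);

  static void disable_irqtime_key_example(void)
  {
          static_branch_disable(&irqtime_key_example);    /* may sleep */
  }

  /* Plain-variable flag, roughly what the revert restores: a simple
   * store that is safe from any context, including atomic ones.
   */
  static int irqtime_flag_example;

  static void disable_irqtime_flag_example(void)
  {
          irqtime_flag_example = 0;                       /* never sleeps */
  }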
[ [email protected]: reported a build error in the prev version ] [ mingo: cherry-picked it over into sched/urgent ]
Closes: https://lore.kernel.org/kvm/[email protected]/ Fixes: 8722903cbb8f ("sched: Define sched_clock_irqtime as static key") Reported-by: Dan Carpenter <[email protected]> Debugged-by: Dan Carpenter <[email protected]> Debugged-by: Sean Christopherson <[email protected]> Debugged-by: Michal Koutný <[email protected]> Signed-off-by: Yafang Shao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
| #
b9f2b29b |
| 05-Feb-2025 |
Yafang Shao <[email protected]> |
sched: Don't define sched_clock_irqtime as static key
The sched_clock_irqtime was defined as a static key in commit 8722903cbb8f ('sched: Define sched_clock_irqtime as static key'). However, this change introduces a 'sleeping in atomic context' warning, as shown below:
arch/x86/kernel/tsc.c:1214 mark_tsc_unstable() warn: sleeping in atomic context
As analyzed by Dan, the affected code path is as follows:
vcpu_load() <- disables preempt -> kvm_arch_vcpu_load() -> mark_tsc_unstable() <- sleeps
virt/kvm/kvm_main.c
   166  void vcpu_load(struct kvm_vcpu *vcpu)
   167  {
   168          int cpu = get_cpu();
                          ^^^^^^^^^^
This get_cpu() disables preemption.

   169
   170          __this_cpu_write(kvm_running_vcpu, vcpu);
   171          preempt_notifier_register(&vcpu->preempt_notifier);
   172          kvm_arch_vcpu_load(vcpu, cpu);
   173          put_cpu();
   174  }

arch/x86/kvm/x86.c
  4979          if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable()) {
  4980                  s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
  4981                          rdtsc() - vcpu->arch.last_host_tsc;
  4982                  if (tsc_delta < 0)
  4983                          mark_tsc_unstable("KVM discovered backwards TSC");

arch/x86/kernel/tsc.c
  1206  void mark_tsc_unstable(char *reason)
  1207  {
  1208          if (tsc_unstable)
  1209                  return;
  1210
  1211          tsc_unstable = 1;
  1212          if (using_native_sched_clock())
  1213                  clear_sched_clock_stable();
-->1214          disable_sched_clock_irqtime();
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

kernel/jump_label.c
   245  void static_key_disable(struct static_key *key)
   246  {
   247          cpus_read_lock();
                ^^^^^^^^^^^^^^^^
This lock has a might_sleep() in it which triggers the static checker warning.

   248          static_key_disable_cpuslocked(key);
   249          cpus_read_unlock();
   250  }
Let's revert this change for now, as {disable,enable}_sched_clock_irqtime are used in many places, as pointed out by Sean, including the following:
The code path in clocksource_watchdog():
clocksource_watchdog()
  -> spin_lock(&watchdog_lock);
  -> __clocksource_unstable()
  -> clocksource.mark_unstable() == tsc_cs_mark_unstable()
  -> disable_sched_clock_irqtime()
And the code path in sched_clock_register():
        /* Cannot register a sched_clock with interrupts on */
        local_irq_save(flags);

        ...

        /* Enable IRQ time accounting if we have a fast enough sched_clock() */
        if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
                enable_sched_clock_irqtime();

        local_irq_restore(flags);
[[email protected]: reported a build error in the prev version]
Closes: https://lore.kernel.org/kvm/[email protected]/ Fixes: 8722903cbb8f ("sched: Define sched_clock_irqtime as static key") Reported-by: Dan Carpenter <[email protected]> Debugged-by: Dan Carpenter <[email protected]> Debugged-by: Sean Christopherson <[email protected]> Debugged-by: Michal Koutný <[email protected]> Signed-off-by: Yafang Shao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6 |
|
| #
8722903c |
| 03-Jan-2025 |
Yafang Shao <[email protected]> |
sched: Define sched_clock_irqtime as static key
Since CPU time accounting is a performance-critical path, let's define sched_clock_irqtime as a static key to minimize potential overhead.
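As an aside (editor's sketch, not taken from the patch): the point of a static key is that the hot-path check compiles to a patched jump instead of a memory load and compare. The default-false declaration and the helper names are assumptions; DEFINE_STATIC_KEY_FALSE(), static_branch_likely() and static_branch_enable() are the standard jump-label interfaces.

  #include <linux/jump_label.h>

  DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime);

  /* Hot path (per-tick accounting): a no-op while the key is disabled,
   * patched into a taken branch once it is enabled.
   */
  static inline bool irqtime_enabled_example(void)
  {
          return static_branch_likely(&sched_clock_irqtime);
  }

  /* Slow path: toggled rarely, e.g. once sched_clock() is known to be
   * fast enough.
   */
  void enable_irqtime_example(void)
  {
          static_branch_enable(&sched_clock_irqtime);
  }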
Signed-off-by: Yafang Shao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
| #
77baa5ba |
| 26-Jul-2024 |
Zheng Zucheng <[email protected]> |
sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
In extreme test scenarios, the 14th field (utime) in /proc/xx/stat can become greater than sum_exec_runtime: utime = 18446744073709518790 ns, rtime = 135989749728000 ns.
In cputime_adjust(), stime ends up greater than rtime because of a mul_u64_u64_div_u64() precision problem. Before calling mul_u64_u64_div_u64(): stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. After calling mul_u64_u64_div_u64(): stime = 135989949653530.
Unsigned wraparound occurs because rtime is less than stime: utime = rtime - stime = 135989749728000 - 135989949653530 = -199925530 = (u64)18446744073709518790.
Trigger conditions:
 1) The user task runs in kernel mode most of the time.
 2) ARM64 architecture.
 3) TICK_CPU_ACCOUNTING=y and CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set.
Fix the mul_u64_u64_div_u64() precision loss by resetting stime to rtime in this case.
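To make the wraparound concrete, here is a small userspace sketch by the editor (not the kernel patch itself); it assumes the essence of the fix is clamping stime to rtime before the subtraction, as described above.

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          /* Values taken from the report above. */
          uint64_t rtime = 135989749728000ULL;
          uint64_t stime = 135989949653530ULL;    /* scaled stime ended up > rtime */

          /* Without a clamp, the u64 subtraction wraps around to a huge utime. */
          uint64_t utime_wrapped = rtime - stime;

          /* With the clamp, utime degrades gracefully to 0 instead. */
          uint64_t stime_clamped = stime > rtime ? rtime : stime;
          uint64_t utime_clamped = rtime - stime_clamped;

          printf("wrapped utime: %llu\n", (unsigned long long)utime_wrapped);
          printf("clamped utime: %llu\n", (unsigned long long)utime_clamped);
          return 0;
  }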
Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") Signed-off-by: Zheng Zucheng <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
402de7fc |
| 27-May-2024 |
Ingo Molnar <[email protected]> |
sched: Fix spelling in comments
Do a spell-checking pass.
Signed-off-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
89d6910c |
| 10-Apr-2024 |
Alexander Gordeev <[email protected]> |
sched/vtime: Get rid of generic vtime_task_switch() implementation
The generic vtime_task_switch() implementation gets built only if __ARCH_HAS_VTIME_TASK_SWITCH is not defined, but requires an architecture to implement arch_vtime_task_switch() callback at the same time, which is confusing.
Further, arch_vtime_task_switch() is implemented only for the 32-bit PowerPC architecture, and the generic vtime_task_switch() variant is rather superfluous.
Simplify the whole vtime_task_switch() wiring by moving the existing generic implementation to PowerPC.
Signed-off-by: Alexander Gordeev <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> Acked-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/2cb6e3caada93623f6d4f78ad938ac6cd0e2fda8.1712760275.git.agordeev@linux.ibm.com
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1 |
|
| #
c8997020 |
| 20-Dec-2022 |
Nicholas Piggin <[email protected]> |
cputime: remove cputime_to_nsecs fallback
The archs that use cputime_to_nsecs() internally provide their own definition and don't need the fallback. cputime_to_usecs() is unused except in this fallback, and is not defined anywhere.
This removes the final remnant of the cputime_t code from the kernel.
Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5 |
|
| #
1fcf54de |
| 29-Jun-2022 |
Josh Don <[email protected]> |
sched/core: add forced idle accounting for cgroups
Commit 4feee7d1260 previously added per-task forced idle accounting. This patch extends this to also include cgroups.
rstat is used for cgroup accounting, except for the root, which uses kcpustat in order to bypass the need for doing an rstat flush when reading root stats.
Only cgroup v2 is supported. Similar to the task accounting, the cgroup accounting requires that schedstats is enabled.
Signed-off-by: Josh Don <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6 |
|
| #
f96eca43 |
| 22-Feb-2022 |
Ingo Molnar <[email protected]> |
sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there
Similarly to kernel/sched/build_utility.c, collect all 'scheduling policy' related source code files into kernel/sched/build_policy.c:
kernel/sched/idle.c
kernel/sched/rt.c
kernel/sched/cpudeadline.c
kernel/sched/pelt.c
kernel/sched/cputime.c
kernel/sched/deadline.c
With the exception of fair.c, which we continue to build as a separate file for build efficiency and parallelism reasons.
Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Peter Zijlstra <[email protected]>
|
|
Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15 |
|
| #
e7f2be11 |
| 26-Oct-2021 |
Frederic Weisbecker <[email protected]> |
sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full
getrusage(RUSAGE_THREAD) with nohz_full may return shorter utime/stime than the actual time.
task_cputime_adjusted() snapshots utime and stime and then adjust their sum to match the scheduler maintained cputime.sum_exec_runtime. Unfortunately in nohz_full, sum_exec_runtime is only updated once per second in the worst case, causing a discrepancy against utime and stime that can be updated anytime by the reader using vtime.
To fix this situation, perform an update of cputime.sum_exec_runtime when the cputime snapshot reports the task as actually running while the tick is disabled. The related overhead is then contained within the relevant situations.
Reported-by: Hasegawa Hitomi <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Hasegawa Hitomi <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Masayoshi Mizuma <[email protected]> Acked-by: Phil Auld <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
9731698e |
| 15-Nov-2021 |
Andrey Ryabinin <[email protected]> |
cputime, cpuacct: Include guest time in user time in cpuacct.stat
cpuacct.stat in non-root cgroups shows user time without guest time included in it. This doesn't match the user time shown in the root cpuacct.stat and in /proc/<pid>/stat. This also affects cgroup2's cpu.stat in the same way.
Make account_guest_time() add user time to the cgroup's cpustat to fix this.
Fixes: ef12fefabf94 ("cpuacct: add per-cgroup utime/stime statistics") Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Daniel Jordan <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4 |
|
| #
3b03706f |
| 18-Mar-2021 |
Ingo Molnar <[email protected]> |
sched: Fix various typos
Fix ~42 single-word typos in scheduler code comments.
We have accumulated a few fun ones over the years. :-)
Signed-off-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ben Segall <[email protected]> Cc: Mel Gorman <[email protected]> Cc: [email protected]
|
|
Revision tags: v5.12-rc3 |
|
| #
6516b386 |
| 09-Mar-2021 |
Thomas Gleixner <[email protected]> |
irqtime: Make accounting correct on RT
vtime_account_irq() and irqtime_account_irq() base their checks on preempt_count(), which fails on RT because preempt_count() does not contain the softirq accounting, which is separate on RT.
These checks do not need the full preempt count as they only operate on the hard and softirq sections.
Use irq_count() instead, which provides the correct value on both RT and non-RT kernels. The compiler is clever enough to fold the masking for !RT:

  99b:  65 8b 05 00 00 00 00    mov    %gs:0x0(%rip),%eax
- 9a2:  25 ff ff ff 7f          and    $0x7fffffff,%eax
+ 9a2:  25 00 ff ff 00          and    $0xffff00,%eax
Reported-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Sebastian Andrzej Siewior <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7 |
|
| #
d3759e71 |
| 02-Dec-2020 |
Frederic Weisbecker <[email protected]> |
irqtime: Move irqtime entry accounting after irq offset incrementation
IRQ time entry is currently accounted before HARDIRQ_OFFSET or SOFTIRQ_OFFSET are incremented. This is convenient to decide to which index the cputime to account is dispatched.
Unfortunately it prevents tick_irq_enter() from being called under HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ entry accounting due to the necessary clock catch up. As a result we don't benefit from appropriate lockdep coverage on tick_irq_enter().
To prepare for fixing this, move the IRQ entry cputime accounting after the preempt offset is incremented. This requires the cputime dispatch code to handle the extra offset.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
8a6a5920 |
| 02-Dec-2020 |
Frederic Weisbecker <[email protected]> |
sched/vtime: Consolidate IRQ time accounting
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatches the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function.
Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture.
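A rough editor's sketch of the resulting split (the dispatch helper and its signature are illustrative, not the actual patch; account_system_index_time() and the CPUTIME_* indices are existing kernel interfaces):

  /* Core side decides which index a measured delta belongs to; the
   * architecture only supplies the delta itself.
   */
  static void dispatch_irq_delta_example(struct task_struct *tsk, u64 delta,
                                         bool hardirq, bool softirq)
  {
          if (hardirq)
                  account_system_index_time(tsk, delta, CPUTIME_IRQ);
          else if (softirq)
                  account_system_index_time(tsk, delta, CPUTIME_SOFTIRQ);
          else
                  account_system_index_time(tsk, delta, CPUTIME_SYSTEM);
  }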
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
2b91ec9f |
| 02-Dec-2020 |
Frederic Weisbecker <[email protected]> |
s390/vtime: Use the generic IRQ entry accounting
s390 has its own version of IRQ entry accounting because it doesn't account the idle time the same way the other architectures do. Only the actual idle sleep time is accounted as idle time; the rest of the idle task's execution is accounted as system time.
Make the generic IRQ entry accounting aware of architectures that have their own way of accounting idle time and convert s390 to use it.
This prepares s390 to get involved in further consolidations of IRQ time accounting.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
7197688b |
| 02-Dec-2020 |
Frederic Weisbecker <[email protected]> |
sched/cputime: Remove symbol exports from IRQ time accounting
account_irq_enter_time() and account_irq_exit_time() are not called from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ cputime accounting functions called from there.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7 |
|
| #
3dc167ba |
| 19-May-2020 |
Oleg Nesterov <[email protected]> |
sched/cputime: Improve cputime_adjust()
People report that utime and stime from /proc/<pid>/stat become very wrong when the numbers are big enough, especially if you watch these counters incrementally.
Specifically, the current implementation of stime*rtime/total results in a saw-tooth function on top of the desired line, where the teeth grow in size the larger the values become. IOW, it has a relative error.
The result is that, when watching incrementally as time progresses (for large values), we'll see periods of pure stime or utime increase, irrespective of the actual ratio we're striving for.
Replace scale_stime() with a math64.h helper: mul_u64_u64_div_u64() that is far more accurate. This also allows architectures to override the implementation -- for instance they can opt for the old algorithm if this new one turns out to be too expensive for them.
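An editor's userspace sketch of the precision point (not kernel code): the exact variant mirrors the shape of mul_u64_u64_div_u64() with a 128-bit intermediate, while the lossy variant is merely a stand-in for bit-dropping scaling, not the old scale_stime() algorithm itself.

  #include <stdio.h>
  #include <stdint.h>

  /* a * b / c with a 128-bit intermediate, as mul_u64_u64_div_u64() does. */
  static uint64_t mul_div_exact(uint64_t a, uint64_t b, uint64_t c)
  {
          return (uint64_t)((unsigned __int128)a * b / c);
  }

  /* Stand-in for lossy scaling: drop low bits until a * b fits in 64 bits. */
  static uint64_t mul_div_lossy(uint64_t a, uint64_t b, uint64_t c)
  {
          while ((a >> 32) || (b >> 32)) {
                  a >>= 1;
                  b >>= 1;
                  c >>= 2;
          }
          return c ? a * b / c : 0;
  }

  int main(void)
  {
          /* Arbitrarily large stime/rtime/total values for illustration. */
          uint64_t stime = 175136586720000ULL;
          uint64_t utime = 1416780000ULL;
          uint64_t rtime = 135989749728000ULL;
          uint64_t total = stime + utime;

          printf("exact: %llu\n", (unsigned long long)mul_div_exact(stime, rtime, total));
          printf("lossy: %llu\n", (unsigned long long)mul_div_lossy(stime, rtime, total));
          return 0;
  }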
Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6 |
|
| #
e0d648f9 |
| 27-Mar-2020 |
Borislav Petkov <[email protected]> |
sched/vtime: Work around an uninitialized variable warning
Work around this warning:
kernel/sched/cputime.c: In function ‘kcpustat_field’:
kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized]
because GCC can't see that val is used only when err is 0.
Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5 |
|
| #
f1dfdab6 |
| 23-Jan-2020 |
Chris Wilson <[email protected]> |
sched/vtime: Prevent unstable evaluation of WARN(vtime->state)
As the vtime is sampled under loose seqcount protection by kcpustat, the vtime fields may change as the code flows. Where logic dictates a field has a static value, use a READ_ONCE.
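A tiny editor's sketch of the pattern (the helper is hypothetical; struct vtime, VTIME_INACTIVE and READ_ONCE() are existing kernel interfaces): sample the racy field once into a local so the check and any later use see the same value.

  static void vtime_state_warn_example(struct vtime *vtime)
  {
          /* vtime->state may change under us; read it exactly once so the
           * compiler cannot re-load it between the check and later uses.
           */
          int state = READ_ONCE(vtime->state);

          WARN_ON_ONCE(state == VTIME_INACTIVE);
          /* ... keep using 'state' rather than vtime->state ... */
  }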
Signed-off-by: Chris Wilson <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Fixes: 74722bb223d0 ("sched/vtime: Bring up complete kcpustat accessor") Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v5.5-rc7, v5.5-rc6, v5.5-rc5 |
|
| #
9dec1b69 |
| 02-Jan-2020 |
Alex Shi <[email protected]> |
sched/cputime: move rq parameter in irqtime_account_process_tick
Every call to irqtime_account_process_tick() happens in an interrupt, and every caller gets and passes the parameter rq = this_rq(). This is unnecessary and increases the code size a little bit. It is better to fetch rq inside irqtime_account_process_tick() itself.
            base            with this patch
cputime.o   578792 bytes    577888 bytes
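Sketch of the shape of the change (editor's illustration, not the exact diff; the pre-change signature is an assumption):

  /* Before: every caller fetched and passed the same runqueue, e.g.
   *      irqtime_account_process_tick(p, user_tick, this_rq(), 1);
   */
  static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
                                           struct rq *rq, int ticks);

  /* After: the runqueue is fetched internally, where it is always this_rq(). */
  static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
                                           int ticks)
  {
          struct rq *rq = this_rq();
          /* ... accounting as before ... */
  }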
Signed-off-by: Alex Shi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4 |
|
| #
74722bb2 |
| 21-Nov-2019 |
Frederic Weisbecker <[email protected]> |
sched/vtime: Bring up complete kcpustat accessor
Many callsites want to fetch the values of system, user, user_nice, guest or guest_nice kcpustat fields altogether or at least a pair of these.
In that case, calling kcpustat_field() for each requested field brings unnecessary overhead when we could fetch all of them in a row.
So provide kcpustat_cpu_fetch() that fetches the whole kcpustat array in a vtime safe way under the same RCU and seqcount block.
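A usage sketch by the editor (the reporting function is hypothetical; kcpustat_cpu_fetch(), kcpustat_field(), kcpustat_cpu() and the CPUTIME_* indices are believed to be the existing interfaces, but treat the exact signatures as assumptions):

  #include <linux/kernel_stat.h>

  static void report_cpu_times_example(int cpu)
  {
          struct kernel_cpustat snap;
          u64 user, nice, sys;

          /* One vtime-aware snapshot, taken under a single RCU/seqcount block... */
          kcpustat_cpu_fetch(&snap, cpu);

          user = snap.cpustat[CPUTIME_USER];
          nice = snap.cpustat[CPUTIME_NICE];
          sys  = snap.cpustat[CPUTIME_SYSTEM];

          /* ...instead of one kcpustat_field() call per requested field, e.g.:
           *      user = kcpustat_field(&kcpustat_cpu(cpu), CPUTIME_USER, cpu);
           */
          pr_info("cpu%d user=%llu nice=%llu system=%llu\n", cpu, user, nice, sys);
  }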
Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Yauheni Kaliuta <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
| #
5a1c9558 |
| 21-Nov-2019 |
Frederic Weisbecker <[email protected]> |
sched/cputime: Support other fields on kcpustat_field()
Provide support for user, nice, guest and guest_nice fields through kcpustat_field().
Whether we account the delta to a nice or not-nice field is decided on top of the nice value snapshot taken at the time we call kcpustat_field(). If the nice value of the task has been changed since the last vtime update, we may have an inaccurate distribution of the nice vs. un-nice cputime.
However this is considered as a minor issue compared to the proper fix that would involve interrupting the target on nice updates, which is undesired on nohz_full CPUs.
Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Yauheni Kaliuta <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5 |
|
| #
64eea63c |
| 25-Oct-2019 |
Frederic Weisbecker <[email protected]> |
sched/kcpustat: Introduce vtime-aware kcpustat accessor for CPUTIME_SYSTEM
Kcpustat is not correctly supported on nohz_full CPUs. The tick doesn't fire and the cputime therefore doesn't move forward. The issue showed up after the removal of the remaining 1Hz tick, which made the stall visible.
We solve that by checking the task running on a CPU through RCU and reading its vtime delta, which we add to the raw kcpustat values.
We make sure that we fetch a coherent raw-kcpustat/vtime-delta pair while checking that the CPU referred to by the target vtime is the correct one, under the locked vtime seqcount.
Only CPUTIME_SYSTEM is handled here as a start because it's the trivial case. User and guest time will require more preparation work to correctly handle niceness.
Reported-by: Yauheni Kaliuta <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wanpeng Li <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v5.4-rc4 |
|
| #
e44fcb4b |
| 16-Oct-2019 |
Frederic Weisbecker <[email protected]> |
sched/vtime: Rename vtime_accounting_cpu_enabled() to vtime_accounting_enabled_this_cpu()
Standardize the naming on top of the vtime_accounting_enabled_*() base. Also make it clear we are checking the vtime state of the *current* CPU with this function. We'll need to add an API to check that state on remote CPUs as well, so we must disambiguate the naming.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Jacek Anaszewski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Yauheni Kaliuta <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|