|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6 |
|
| #
37acade0 |
| 10-Oct-2023 |
Matthew Wilcox (Oracle) <[email protected]> |
sched: remove wait bookmarks
There are no users of wait bookmarks left, so simplify the wait code by removing them.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Benjamin Segall <[email protected]> Cc: Bin Lai <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2 |
|
| #
6f63904c |
| 08-Mar-2023 |
Andrei Vagin <[email protected]> |
sched: add a few helpers to wake up tasks on the current cpu
Add complete_on_current_cpu, wake_up_poll_on_current_cpu helpers to wake up tasks on the current CPU.
These two helpers are useful when the task needs to make a synchronous context switch to another task. In this context, synchronous means it wakes up the target task and falls asleep right after that.
One example of such a workload is seccomp user notification. This mechanism allows a supervisor process to handle system calls on behalf of a target process. While the supervisor is handling an intercepted system call, the target process is blocked in the kernel, waiting for a response to come back.
On-CPU context switches are much faster than regular ones.
Signed-off-by: Andrei Vagin <[email protected]> Acked-by: "Peter Zijlstra (Intel)" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
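A minimal sketch of the intended handoff, assuming a hypothetical 'notif' structure (the real user is the seccomp user-notification code):

    struct notif {
    	struct completion ready;	/* supervisor -> target: reply is ready */
    };

    /* Target side: block until the supervisor replies. */
    static int target_wait(struct notif *n)
    {
    	return wait_for_completion_interruptible(&n->ready);
    }

    /*
     * Supervisor side: wake the target on the current CPU and block right
     * after, so the pair behaves like one synchronous context switch
     * instead of a cross-CPU wakeup.
     */
    static void supervisor_reply(struct notif *n)
    {
    	complete_on_current_cpu(&n->ready);
    	/* ... immediately sleep waiting for the next request ... */
    }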
|
| #
ef73d6a4 |
| 02-Jun-2023 |
Arve Hjønnevåg <[email protected]> |
sched/wait: Fix a kthread_park race with wait_woken()
kthread_park and wait_woken have a similar race that kthread_stop and wait_woken used to have before it was fixed in commit cb6538e740d7 ("sched/wait: Fix a kthread race with wait_woken()"). Extend that fix to also cover kthread_park.
[jstultz: Made changes suggested by Peter to optimize memory loads]
Signed-off-by: Arve Hjønnevåg <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]
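For context, a sketch of the kthread pattern the fix protects; 'worker_wq', work_available() and process_work() are illustrative stand-ins for driver code:

    static DECLARE_WAIT_QUEUE_HEAD(worker_wq);

    static int worker_fn(void *data)
    {
    	DEFINE_WAIT_FUNC(wait, woken_wake_function);

    	add_wait_queue(&worker_wq, &wait);
    	while (!kthread_should_stop()) {
    		if (kthread_should_park()) {
    			kthread_parkme();
    			continue;
    		}
    		if (work_available(data))
    			process_work(data);
    		else
    			wait_woken(&wait, TASK_INTERRUPTIBLE,
    				   MAX_SCHEDULE_TIMEOUT);
    	}
    	remove_wait_queue(&worker_wq, &wait);
    	return 0;
    }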
|
|
Revision tags: v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6 |
|
| #
ee7dc86b |
| 15-Nov-2022 |
Gabriel Krisman Bertazi <[email protected]> |
wait: Return the number of exclusive waiters awakened
Sbitmap code will need to know how many waiters were actually woken for its batched wakeups implementation. Return the number of woken exclusive waiters from __wake_up() to facilitate that.
Suggested-by: Jan Kara <[email protected]> Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
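A minimal sketch of a caller consuming the new return value; the batch-accounting names here are illustrative, not sbitmap's actual code:

    static void wake_batch(struct wait_queue_head *wq_head, int batch,
    		       atomic_t *pending)
    {
    	int woken;

    	/* returns how many exclusive waiters were actually woken */
    	woken = __wake_up(wq_head, TASK_NORMAL, batch, NULL);
    	atomic_sub(woken, pending);
    }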
|
|
Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6 |
|
| #
801c1419 |
| 22-Feb-2022 |
Ingo Molnar <[email protected]> |
sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there
Collect all utility functionality source code files into a single kernel/sched/build_utility.c file, via #include-ing the .c files:
kernel/sched/clock.c
kernel/sched/completion.c
kernel/sched/loadavg.c
kernel/sched/swait.c
kernel/sched/wait_bit.c
kernel/sched/wait.c
CONFIG_CPU_FREQ: kernel/sched/cpufreq.c
CONFIG_CPU_FREQ_GOV_SCHEDUTIL: kernel/sched/cpufreq_schedutil.c
CONFIG_CGROUP_CPUACCT: kernel/sched/cpuacct.c
CONFIG_SCHED_DEBUG: kernel/sched/debug.c
CONFIG_SCHEDSTATS: kernel/sched/stats.c
CONFIG_SMP: kernel/sched/cpupri.c kernel/sched/stop_task.c kernel/sched/topology.c
CONFIG_SCHED_CORE: kernel/sched/core_sched.c
CONFIG_PSI: kernel/sched/psi.c
CONFIG_MEMBARRIER: kernel/sched/membarrier.c
CONFIG_CPU_ISOLATION: kernel/sched/isolation.c
CONFIG_SCHED_AUTOGROUP: kernel/sched/autogroup.c
The goal is to amortize the 60+ KLOC header bloat from over a dozen build units into a single build unit.
The build time of build_utility.c also roughly matches the build time of core.c and fair.c - allowing better load-balancing of scheduler-only rebuilds.
Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Peter Zijlstra <[email protected]>
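The mechanism is plain #include of .c files, with the config-dependent units guarded; an abridged sketch of the resulting kernel/sched/build_utility.c:

    #include "clock.c"
    #include "completion.c"
    #include "loadavg.c"
    #include "swait.c"
    #include "wait_bit.c"
    #include "wait.c"

    #ifdef CONFIG_SMP
    # include "cpupri.c"
    # include "stop_task.c"
    # include "topology.c"
    #endif

    #ifdef CONFIG_PSI
    # include "psi.c"
    #endif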
|
|
Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5 |
|
| #
42288cb4 |
| 09-Dec-2021 |
Eric Biggers <[email protected]> |
wait: add wake_up_pollfree()
Several ->poll() implementations are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'.
However, that has a bug: wake_up_poll() calls __wake_up() with nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters, and the wakeup function for the first one returns a positive value, only that one will be called. That's *not* what's needed for POLLFREE; POLLFREE is special in that it really needs to wake up everyone.
Considering the three non-blocking poll systems:
- io_uring poll doesn't handle POLLFREE at all, so it is broken anyway.
- aio poll is unaffected, since it doesn't support exclusive waits. However, that's fragile, as someone could add this feature later.
- epoll doesn't appear to be broken by this, since its wakeup function returns 0 when it sees POLLFREE. But this is fragile.
Although there is a workaround (see epoll), it's better to define a function which always sends POLLFREE to all waiters. Add such a function. Also make it verify that the queue really becomes empty after all waiters have been woken up.
Reported-by: Linus Torvalds <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
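A sketch of the intended call site, assuming a hypothetical per-task object 'my_thread' (binder and signalfd are the real users); the synchronize_rcu() mirrors what those users do before freeing:

    static void my_thread_free(struct my_thread *t)
    {
    	/*
    	 * Wake *all* waiters with EPOLLHUP | POLLFREE and verify that
    	 * the queue really is empty afterwards.
    	 */
    	wake_up_pollfree(&t->wait);

    	/* let woken waiters finish removing themselves */
    	synchronize_rcu();

    	kfree(t);
    }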
|
|
Revision tags: v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6 |
|
| #
11c7aa0d |
| 07-Jun-2021 |
Jan Kara <[email protected]> |
rq-qos: fix missed wake-ups in rq_qos_throttle try two
Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") tried to fix a problem that a process could be sleeping in rq_qos_wait() without anyone to wake it up. However the fix is not complete and the following can still happen:
CPU1 (waiter1)                    CPU2 (waiter2)                    CPU3 (waker)
rq_qos_wait()                     rq_qos_wait()
  acquire_inflight_cb() -> fails
                                    acquire_inflight_cb() -> fails
                                                                    completes IOs,
                                                                    inflight decreased
  prepare_to_wait_exclusive()
                                    prepare_to_wait_exclusive()
  has_sleeper = !wq_has_single_sleeper()
    -> true as there are two sleepers
                                    has_sleeper = !wq_has_single_sleeper()
                                      -> true
  io_schedule()                     io_schedule()
Deadlock, as now there's nobody to wake up the two waiters. The logic of automatically blocking when there are already sleepers is really subtle, and the only way to make it work reliably is to check whether there are some waiters in the queue when adding ourselves there. That way, we are guaranteed that at least the first process to enter the wait queue will recheck the waiting condition before going to sleep and thus guarantee forward progress.
Fixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") CC: [email protected] Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
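The fix makes prepare_to_wait_exclusive() report whether the queue was empty at insertion time, so a waiter only defers to an earlier sleeper that is guaranteed to recheck the condition; an abridged sketch of the waiter side:

    /* returns true iff the queue was empty when we were added */
    has_sleeper = !prepare_to_wait_exclusive(&rqw->wait, &data.wq,
    					 TASK_UNINTERRUPTIBLE);
    do {
    	if (data.got_token)
    		break;
    	if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
    		break;		/* got a slot ourselves */
    	io_schedule();
    	has_sleeper = true;	/* we were woken as the first waiter */
    	set_current_state(TASK_UNINTERRUPTIBLE);
    } while (1);
    finish_wait(&rqw->wait, &data.wq);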
|
|
Revision tags: v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2 |
|
| #
c4d51a52 |
| 27-Oct-2020 |
David Woodhouse <[email protected]> |
sched/wait: Add add_wait_queue_priority()
This allows an exclusive wait_queue_entry to be added at the head of the queue, instead of the tail as normal. Thus, it gets to consume events first without allowing non-exclusive waiters to be woken at all.
The (first) intended use is for KVM IRQFD, which currently has inconsistent behaviour depending on whether posted interrupts are available or not. If they are, KVM will bypass the eventfd completely and deliver interrupts directly to the appropriate vCPU. If not, events are delivered through the eventfd and userspace will receive them when polling on the eventfd.
By using add_wait_queue_priority(), KVM will be able to consistently consume events within the kernel without accidentally exposing them to userspace when they're supposed to be bypassed. This, in turn, means that userspace doesn't have to jump through hoops to avoid listening on the erroneously noisy eventfd and injecting duplicate interrupts.
Signed-off-by: David Woodhouse <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
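A sketch of an exclusive, head-of-queue waiter in the KVM irqfd style; the my_irqfd_* names are illustrative:

    static int my_irqfd_wakeup(struct wait_queue_entry *wait,
    			   unsigned int mode, int sync, void *key)
    {
    	/* consume the event entirely in-kernel; a nonzero return from
    	 * an exclusive waiter stops the wakeup walk */
    	return 1;
    }

    static void my_irqfd_attach(struct wait_queue_head *wqh,
    			    struct wait_queue_entry *wait)
    {
    	init_waitqueue_func_entry(wait, my_irqfd_wakeup);
    	add_wait_queue_priority(wqh, wait);	/* head of queue, exclusive */
    }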
|
|
Revision tags: v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7 |
|
| #
c6fe44d9 |
| 23-Jul-2020 |
Linus Torvalds <[email protected]> |
list: add "list_del_init_careful()" to go with "list_empty_careful()"
That gives us ordering guarantees around the pair.
Signed-off-by: Linus Torvalds <[email protected]>
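A sketch of the pairing this enables on a generic wait entry: the waker's careful deletion is ordered against the waiter's lockless emptiness check:

    /* Waker, under the waitqueue lock: last access to the entry. */
    static void waker_side(struct wait_queue_entry *wq_entry)
    {
    	list_del_init_careful(&wq_entry->entry);
    }

    /*
     * Waiter, lockless: if the entry reads as empty, everything the
     * waker did before the deletion is visible too.
     */
    static bool waiter_done(struct wait_queue_entry *wq_entry)
    {
    	return list_empty_careful(&wq_entry->entry);
    }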
|
|
Revision tags: v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1 |
|
| #
f94df989 |
| 24-Sep-2019 |
David Howells <[email protected]> |
Add wake_up_interruptible_sync_poll_locked()
Add a wakeup call for a case whereby the caller already has the waitqueue spinlock held. This can be used by pipes to alter the ring buffer indices and issue a wakeup under the same spinlock.
Signed-off-by: David Howells <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]>
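A sketch of the pattern it enables; the 'ring' type and ring_update_indices() helper are hypothetical (the real user is the pipe code):

    static void ring_post(struct ring *ring, struct wait_queue_head *wq)
    {
    	spin_lock_irq(&wq->lock);
    	ring_update_indices(ring);	/* mutate state under the waitqueue lock */
    	wake_up_interruptible_sync_poll_locked(wq, EPOLLIN | EPOLLRDNORM);
    	spin_unlock_irq(&wq->lock);
    }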
|
| #
ce4dd442 |
| 16-Oct-2019 |
David Howells <[email protected]> |
Remove the nr_exclusive argument from __wake_up_sync_key()
Remove the nr_exclusive argument from __wake_up_sync_key() and derived functions as everything seems to set it to 1. Note also that if it wasn't set to 1, it would clear WF_SYNC anyway.
Signed-off-by: David Howells <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]>
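The resulting signature (nr_exclusive is now hard-wired to 1):

    void __wake_up_sync_key(struct wait_queue_head *wq_head,
    			unsigned int mode, void *key);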
|
|
Revision tags: v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5 |
|
| #
016190a4 |
| 11-Jun-2019 |
Pavel Begunkov <[email protected]> |
sched/wait: Deduplicate code with do-while
Statements in the loop's body and before it are identical. Use do-while to not repeat it.
Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.com Signed-off-by: Ingo Molnar <[email protected]>
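The shape of the cleanup, with hypothetical names:

    /* before: the first attempt is duplicated ahead of the loop */
    ret = try_once(arg);
    while (need_retry(ret))
    	ret = try_once(arg);

    /* after: do-while runs the body at least once, no duplication */
    do {
    	ret = try_once(arg);
    } while (need_retry(ret));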
|
|
Revision tags: v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1 |
|
| #
457c8996 |
| 19-May-2019 |
Thomas Gleixner <[email protected]> |
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
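The added identifier is a single comment at the top of each file:

    // SPDX-License-Identifier: GPL-2.0-only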
|
|
Revision tags: v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1 |
|
| #
34ec35ad |
| 03-Jan-2019 |
Davidlohr Bueso <[email protected]> |
kernel/sched/: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
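The shape of the change: the branch hint is dropped at call sites because signal_pending() already carries an unlikely() internally.

    /* before */
    if (unlikely(signal_pending_state(state, current)))
    	return -ERESTARTSYS;

    /* after */
    if (signal_pending_state(state, current))
    	return -ERESTARTSYS;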
|
|
Revision tags: v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1 |
|
| #
e05a8e4d |
| 22-Aug-2018 |
Christoph Hellwig <[email protected]> |
sched/wait: assert the wait_queue_head lock is held in __wake_up_common
Better ensure we actually hold the lock using lockdep than just commenting on it. Due to the various exported _locked interfaces it is far too easy to get the locking wrong.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Al Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Baron <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
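The assertion added at the top of __wake_up_common():

    lockdep_assert_held(&wq_head->lock);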
|
|
Revision tags: v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6 |
|
| #
7696f991 |
| 16-Jul-2018 |
Andrea Parri <[email protected]> |
sched/Documentation: Update wake_up() & co. memory-barrier guarantees
Both the implementation and the users' expectations [1] for the various wakeup primitives have evolved over time, but the documentation has not kept up with these changes: bring it into 2018.
[1] http://lkml.kernel.org/r/[email protected]
Also applied feedback from Alan Stern.
Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Andrea Parri <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Akira Yokosawa <[email protected]> Cc: Alan Stern <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Daniel Lustig <[email protected]> Cc: David Howells <[email protected]> Cc: Jade Alglave <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luc Maranget <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
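The guarantee being documented is the canonical pairing, sketched here with a generic condition flag:

    /* waker */
    CONDITION = 1;
    wake_up(&wq);		/* orders the store against the wakeup */

    /* sleeper */
    wait_event(wq, CONDITION);
    /* if we slept and were woken, CONDITION == 1 is visible here */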
|
| #
76e079fe |
| 16-Jul-2018 |
Andrea Parri <[email protected]> |
sched/core: Use smp_mb() in wake_woken_function()
wake_woken_function() synchronizes with wait_woken() as follows:
[wait_woken]                        [wake_woken_function]

entry->flags &= ~wq_flag_woken;     condition = true;
smp_mb();                           smp_wmb();
if (condition)                      wq_entry->flags |= wq_flag_woken;
   break;
This commit replaces the above smp_wmb() with an smp_mb() in order to guarantee that either wait_woken() sees the wait condition being true or the store to wq_entry->flags in woken_wake_function() follows the store in wait_woken() in the coherence order (so that the former can eventually be observed by wait_woken()).
The commit also fixes a comment associated to set_current_state() in wait_woken(): the comment pairs the barrier in set_current_state() to the above smp_wmb(), while the actual pairing involves the barrier in set_current_state() and the barrier executed by the try_to_wake_up() in wake_woken_function().
Signed-off-by: Andrea Parri <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4 |
|
| #
325ea10c |
| 03-Mar-2018 |
Ingo Molnar <[email protected]> |
sched/headers: Simplify and clean up header usage in the scheduler
Do the following cleanups and simplifications:
- sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again.
- order the <linux/sched/*.h> headers alphabetically
- add all <linux/sched/*.h> headers to kernel/sched/sched.h
- remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h.
Finally, make all scheduler .c files use a single common header:
#include "sched.h"
... which now contains a union of the relied upon headers.
This makes the various .c files easier to read and easier to handle.
Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
| #
97fb7a0a |
| 03-Mar-2018 |
Ingo Molnar <[email protected]> |
sched: Clean up and harmonize the coding style of the scheduler code base
A good number of small style inconsistencies have accumulated in the scheduler core, so do a pass over them to harmonize all these details:
- fix spelling in comments,
- use curly braces for multi-line statements,
- remove unnecessary parentheses from integer literals,
- capitalize consistently,
- remove stray newlines,
- add comments where necessary,
- remove invalid/unnecessary comments,
- align structure definitions and other data types vertically,
- add missing newlines for increased readability,
- fix vertical tabulation where it's misaligned,
- harmonize preprocessor conditional block labeling and vertical alignment,
- remove line-breaks where they uglify the code,
- add newline after local variable definitions,
No change in functionality:
md5: 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm
Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3 |
|
| #
c6b9d9a3 |
| 06-Dec-2017 |
Omar Sandoval <[email protected]> |
sched/wait: Fix add_wait_queue() behavioral change
The following cleanup commit:
50816c48997a ("sched/wait: Standardize internal naming of wait-queue entries")
... unintentionally changed the behavior of add_wait_queue() from inserting the wait entry at the head of the wait queue to the tail of the wait queue.
Beyond a negative performance impact this change in behavior theoretically also breaks wait queues which mix exclusive and non-exclusive waiters, as non-exclusive waiters will not be woken up if they are queued behind enough exclusive waiters.
Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: ("sched/wait: Standardize internal naming of wait-queue entries") Link: http://lkml.kernel.org/r/a16c8ccffd39bd08fdaa45a5192294c784b803a7.1512544324.git.osandov@fb.com Signed-off-by: Ingo Molnar <[email protected]>
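The restored behaviour, sketched: non-exclusive entries are inserted at the head again, while exclusive waiters stay at the tail.

    void add_wait_queue(struct wait_queue_head *wq_head,
    		    struct wait_queue_entry *wq_entry)
    {
    	unsigned long flags;

    	wq_entry->flags &= ~WQ_FLAG_EXCLUSIVE;
    	spin_lock_irqsave(&wq_head->lock, flags);
    	__add_wait_queue(wq_head, wq_entry);	/* list_add(): head */
    	spin_unlock_irqrestore(&wq_head->lock, flags);
    }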
|
|
Revision tags: v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7 |
|
| #
11a19c7b |
| 25-Aug-2017 |
Tim Chen <[email protected]> |
sched/wait: Introduce wakeup bookmark in wake_up_page_bit
Now that we have added breaks in the wait queue scan and allow a bookmark on the scan position, we put this logic in the wake_up_page_bit function.
We can have very long page wait lists in large systems where multiple pages share the same wait list. We break up the wake-up walk here to give other CPUs a chance to access the list, and to avoid disabling interrupts while traversing the list for too long. This reduces the interrupt and rescheduling latency, and excessive page wait queue lock hold time.
[ v2: Remove bookmark_wake_function ]
Signed-off-by: Tim Chen <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
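An abridged sketch of the resulting wake_up_page_bit() loop (these bookmark APIs were later removed by 37acade0 above):

    wait_queue_entry_t bookmark;

    bookmark.flags = 0;
    bookmark.private = NULL;
    bookmark.func = NULL;
    INIT_LIST_HEAD(&bookmark.entry);

    spin_lock_irqsave(&q->lock, flags);
    __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);

    while (bookmark.flags & WQ_FLAG_BOOKMARK) {
    	/* drop the lock so others can touch the list, then resume
    	 * the walk from the bookmarked position */
    	spin_unlock_irqrestore(&q->lock, flags);
    	cpu_relax();
    	spin_lock_irqsave(&q->lock, flags);
    	__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
    }
    spin_unlock_irqrestore(&q->lock, flags);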
|
| #
2554db91 |
| 25-Aug-2017 |
Tim Chen <[email protected]> |
sched/wait: Break up long wake list walk
We encountered workloads that have very long wake-up lists on large systems. A waker takes a long time to traverse the entire wake list and execute all the wake functions.
We saw page wait lists that are up to 3700+ entries long in tests of large 4- and 8-socket systems. It took 0.8 sec to traverse such a list during wake up. Any other CPU that contends for the list spin lock will spin for a long time. It is a result of the NUMA balancing migration of hot pages that are shared by many threads.
Multiple CPUs waking are queued up behind the lock, and the last one queued has to wait until all CPUs did all the wakeups.
The page wait list is traversed with interrupts disabled, which caused various problems. This was the original cause that triggered the NMI watchdog timer in: https://patchwork.kernel.org/patch/9800303/ . Only extending the NMI watchdog timer there helped.
This patch bookmarks the waker's scan position in the wake list and breaks up the wake-up walk, to allow access to the list before the waker resumes its walk down the rest of the wait list. It lowers the interrupt and rescheduling latency.
This patch also provides a performance boost when combined with the next patch to break up page wakeup list walk. We saw 22% improvement in the will-it-scale file pread2 test on a Xeon Phi system running 256 threads.
[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and simply access to flags. ]
Reported-by: Kan Liang <[email protected]> Tested-by: Kan Liang <[email protected]> Signed-off-by: Tim Chen <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
| #
3510ca20 |
| 27-Aug-2017 |
Linus Torvalds <[email protected]> |
Minor page waitqueue cleanups
Tim Chen and Kan Liang have been battling a customer load that shows extremely long page wakeup lists. The cause seems to be constant NUMA migration of a hot page that is shared across a lot of threads, but the actual root cause for the exact behavior has not been found.
Tim has a patch that batches the wait list traversal at wakeup time, so that we at least don't get long uninterruptible cases where we traverse and wake up thousands of processes and get nasty latency spikes. That is likely 4.14 material, but we're still discussing the page waitqueue specific parts of it.
In the meantime, I've tried to look at making the page wait queues less expensive, and failing miserably. If you have thousands of threads waiting for the same page, it will be painful. We'll need to try to figure out the NUMA balancing issue some day, in addition to avoiding the excessive spinlock hold times.
That said, having tried to rewrite the page wait queues, I can at least fix up some of the braindamage in the current situation. In particular:
(a) we don't want to continue walking the page wait list if the bit we're waiting for already got set again (which seems to be one of the patterns of the bad load). That makes no progress and just causes pointless cache pollution chasing the pointers.
(b) we don't want to put the non-locking waiters always on the front of the queue, and the locking waiters always on the back. Not only is that unfair, it means that we wake up thousands of reading threads that will just end up being blocked by the writer later anyway.
Also add a comment about the layout of 'struct wait_page_key' - there is an external user of it in the cachefiles code that means that it has to match the layout of 'struct wait_bit_key' in the two first members. It so happens to match, because 'struct page *' and 'unsigned long *' end up having the same values simply because the page flags are the first member in struct page.
Cc: Tim Chen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7 |
|
| #
2055da97 |
| 20-Jun-2017 |
Ingo Molnar <[email protected]> |
sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
So I've noticed a number of instances where it was not obvious from the code whether ->task_list was for a wait-queue head or a wait-queue entry.
Furthermore, there's a number of wait-queue users where the lists are not for 'tasks' but other entities (poll tables, etc.), in which case the 'task_list' name is actively confusing.
To clear this all up, name the wait-queue head and entry list structure fields unambiguously:
struct wait_queue_head::task_list   => ::head
struct wait_queue_entry::task_list  => ::entry
For example, this code:
rqw->wait.task_list.next != &wait->task_list
... it was pretty unclear (to me) what it's doing, while now it's written this way:
rqw->wait.head.next != &wait->entry
... which makes it pretty clear that we are iterating a list until we see the head.
Other examples are:
list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
list_for_each_entry(wq, &fence->wait.task_list, task_list) {
... where it's unclear (to me) what we are iterating, and during review it's hard to tell whether it's trying to walk a wait-queue entry (which would be a bug), while now it's written as:
list_for_each_entry_safe(pos, next, &x->head, entry) {
list_for_each_entry(wq, &fence->wait.head, entry) {
Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
| #
5dd43ce2 |
| 20-Jun-2017 |
Ingo Molnar <[email protected]> |
sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
The wait_bit*() types and APIs are mixed into wait.h, but they are a pretty orthogonal extension of wait-queues.
Furthermore, only about 50 kernel files use these APIs, while over 1000 use the regular wait-queue functionality.
So clean up the main wait.h by moving the wait-bit functionality out of it, into a separate .h and .c file:
include/linux/wait_bit.h  for types and APIs
kernel/sched/wait_bit.c   for the implementation
Update all header dependencies.
This reduces the size of wait.h rather significantly, by about 30%.
Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
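Callers of the bit-wait API now get it from the new header; a minimal sketch with an illustrative wrapper:

    #include <linux/wait_bit.h>

    /* sleep until bit 0 of *word is cleared and wake_up_bit() is called */
    static int wait_for_flag(unsigned long *word)
    {
    	return wait_on_bit(word, 0, TASK_UNINTERRUPTIBLE);
    }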
|