|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
cec78a59 |
| 17-Sep-2024 |
Zijun Hu <[email protected]> |
list: Remove duplicated and unused macro list_for_each_reverse
Remove the macro list_for_each_reverse for the following reasons:
- it is identical to list_for_each_prev;
- it is not used anywhere in the current kernel tree.
Signed-off-by: Zijun Hu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
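For reference, the surviving helper that the removed macro duplicated looks roughly like this (paraphrased from <linux/list.h>; recent kernels spell the loop condition via !list_is_head()):

	/* Iterate a list backwards; the removed list_for_each_reverse()
	 * expanded to exactly the same loop. */
	#define list_for_each_prev(pos, head) \
		for (pos = (head)->prev; pos != (head); pos = pos->prev)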
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4 |
|
| #
2932fb0a |
| 08-Feb-2024 |
Wei Yang <[email protected]> |
list: leverage list_is_head() for list_entry_is_head()
This is exactly what list_is_head() does.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Cc: Andy Shevchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
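A minimal sketch of the change, assuming the upstream definitions:

	static inline int list_is_head(const struct list_head *list,
				       const struct list_head *head)
	{
		return list == head;
	}

	/* Before: (&pos->member == (head)); after: delegate to the helper. */
	#define list_entry_is_head(pos, head, member) \
		list_is_head(&pos->member, (head))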
|
|
Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7 |
|
| #
a43c4756 |
| 04-Jan-2024 |
Pierre Gondois <[email protected]> |
list: add hlist_count_nodes()
Add a generic hlist_count_nodes() function and use it in two drivers.
This patch (of 3):
Add a function to count nodes in an hlist. hlist_count_nodes() is similar to list_count_nodes().
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pierre Gondois <[email protected]> Reviewed-by: Carlos Llamas <[email protected]> Acked-by: Coly Li <[email protected]> Acked-by: Marco Elver <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Martijn Coenen <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Todd Kjos <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
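A sketch of the helper as described; this matches the shape of the upstream function, mirroring list_count_nodes() for hlists:

	/* Count nodes by walking the whole hlist; O(n). */
	static inline size_t hlist_count_nodes(struct hlist_head *head)
	{
		struct hlist_node *pos;
		size_t count = 0;

		hlist_for_each(pos, head)
			count++;

		return count;
	}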
|
|
Revision tags: v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
083772c9 |
| 26-Nov-2023 |
Jakub Kicinski <[email protected]> |
net: page_pool: record pools per netdev
Link the page pools with netdevs. This needs to be netns-compatible, so we have two options: either we record the pools per netns and have to worry about moving them as the netdev gets moved, or we record them directly on the netdev so that they move with the netdev without any extra work.
Implement the latter option. Since pools may outlast the netdev, we need a place to store orphans. In time-honored tradition, use loopback for this purpose.
Reviewed-by: Mina Almasry <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
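A hypothetical sketch of the linkage described above; the field names here are illustrative, not necessarily the exact upstream ones:

	/* Illustrative only: pools hang off the netdev so they move with it. */
	struct net_device {
		/* ... */
		struct hlist_head	page_pools;	/* pools bound to this netdev */
	};

	struct page_pool {
		/* ... */
		struct hlist_node	user_node;	/* entry on dev->page_pools */
	};

	/* On creation: hlist_add_head(&pool->user_node, &dev->page_pools);
	 * pools that outlive their netdev are rehomed onto loopback. */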
|
|
Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3 |
|
| #
8bf0cdfa |
| 21-Sep-2023 |
Ingo Molnar <[email protected]> |
<linux/list.h>: Introduce the list_for_each_reverse() method
The list_head counterpart of list_for_each_entry_reverse() was missing; add it to complete the list-handling APIs in <linux/list.h>.
[ This new API is also relied on by a WIP scheduler patch, so this variant is not a theoretical possibility only. ]
Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected]
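The added macro mirrored list_for_each(), walking ->prev instead of ->next; roughly:

	#define list_for_each_reverse(pos, head) \
		for (pos = (head)->prev; pos != (head); pos = pos->prev)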
|
|
Revision tags: v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6 |
|
| #
aebc7b0d |
| 11-Aug-2023 |
Marco Elver <[email protected]> |
list: Introduce CONFIG_LIST_HARDENED
Numerous production kernel configs (see [1, 2]) are choosing to enable CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened configs [3]. The motivation behind this is that the option can be used as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025 are mitigated by the option [4]).
The feature has never been designed with performance in mind, yet common list manipulation is happening across hot paths all over the kernel.
Introduce CONFIG_LIST_HARDENED, which performs list pointer checking inline, and only upon list corruption calls the reporting slow path.
To generate optimal machine code with CONFIG_LIST_HARDENED:
1. Elide checking for pointer values which upon dereference would result in an immediate access fault (i.e. minimal hardening checks). The trade-off is lower-quality error reports.
2. Use the __preserve_most function attribute (available with Clang, but not yet with GCC) to minimize the code footprint for calling the reporting slow path. As a result, function size of callers is reduced by avoiding saving registers before calling the rarely called reporting slow path.
Note that all TUs in lib/Makefile already disable function tracing, including list_debug.c, and __preserve_most's implied notrace has no effect in this case.
3. Because the inline checks are a subset of the full set of checks in __list_*_valid_or_report(), always return false if the inline checks failed. This avoids redundant compare and conditional branch right after return from the slow path.
As a side-effect of the checks being inline, if the compiler can prove some condition to always be true, it can completely elide some checks.
Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the Kconfig variables are changed to reflect that: DEBUG_LIST selects LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on DEBUG_LIST.
Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with "preserve_most") shows throughput improvements, in my case of ~7% on average (up to 20-30% on some test cases).
Link: https://r.android.com/1266735 [1] Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2] Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3] Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4] Signed-off-by: Marco Elver <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
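A simplified sketch of the pattern described in points 1-3 above (not the exact upstream code):

	/* Minimal inline checks; the reporting slow path runs only on
	 * corruption. Pointer values that would fault on dereference
	 * anyway are not checked here, trading report quality for speed. */
	static __always_inline bool __list_add_valid(struct list_head *new,
						     struct list_head *prev,
						     struct list_head *next)
	{
		if (likely(next->prev == prev && prev->next == next &&
			   new != prev && new != next))
			return true;
		/* Rarely taken; __preserve_most keeps the call cheap. */
		return __list_add_valid_or_report(new, prev, next);
	}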
|
| #
b16c42c8 |
| 11-Aug-2023 |
Marco Elver <[email protected]> |
list_debug: Introduce inline wrappers for debug checks
Turn the list debug checking functions __list_*_valid() into inline functions that wrap the out-of-line functions. Care is taken to ensure the inline wrappers are always inlined, so that additional compiler instrumentation (such as sanitizers) does not result in redundant outlining.
This change is preparation for performing checks in the inline wrappers.
No functional change intended.
Signed-off-by: Marco Elver <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
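A sketch of the wrapper shape this change introduces (illustrative):

	/* Forced inline so sanitizer instrumentation cannot outline it;
	 * the actual checks stay out of line at this point in the series. */
	static __always_inline bool __list_add_valid(struct list_head *new,
						     struct list_head *prev,
						     struct list_head *next)
	{
		return __list_add_valid_or_report(new, prev, next);
	}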
|
|
Revision tags: v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8 |
|
| #
4d70c746 |
| 30-Nov-2022 |
Andy Shevchenko <[email protected]> |
i915: Move list_count() to list.h as list_count_nodes() for broader use
Some of the existing users, and certainly some future ones, want to count the existing nodes in a list. Provide a generic API for that by moving the code from i915 to list.h.
Reviewed-by: Lucas De Marchi <[email protected]> Acked-by: Jani Nikula <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
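A sketch of the helper as moved into list.h, matching the upstream shape:

	/* Count nodes by walking the whole list; O(n). */
	static inline size_t list_count_nodes(struct list_head *head)
	{
		struct list_head *pos;
		size_t count = 0;

		list_for_each(pos, head)
			count++;

		return count;
	}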
|
| #
51daa42d |
| 30-Nov-2022 |
Greg Kroah-Hartman <[email protected]> |
Revert "i915: Move list_count() to list.h for broader use"
This reverts commit a9efc04cfd05690e91279f41c2325c46335c43ef as it breaks the build.
Link: https://lore.kernel.org/r/20221130131854.35b58b
Revert "i915: Move list_count() to list.h for broader use"
This reverts commit a9efc04cfd05690e91279f41c2325c46335c43ef as it breaks the build.
Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Cc: Lucas De Marchi <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Andy Shevchenko <[email protected]> Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Revision tags: v6.1-rc7 |
|
| #
a9efc04c |
| 23-Nov-2022 |
Andy Shevchenko <[email protected]> |
i915: Move list_count() to list.h for broader use
Some of the existing users, and certainly some future ones, want to count the existing nodes in a list. Provide a generic API for that by moving the code from i915 to list.h.
Reviewed-by: Lucas De Marchi <[email protected]> Acked-by: Jani Nikula <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Revision tags: v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18 |
|
| #
ad25f5cb |
| 21-May-2022 |
David Howells <[email protected]> |
rxrpc: Fix locking issue
There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. If it puts the call and this reduces the refcount to 0, the call will be removed from the list; since the _bh variants of the locking functions aren't used, this can deadlock.
================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ #10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&rxnet->call_lock);
  <Interrupt>
    lock(&rxnet->call_lock);

 *** DEADLOCK ***

1 lock held by ksoftirqd/2/25:
 #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d

Changes
=======
ver #2)
 - Changed to using list_next_rcu() rather than rcu_dereference() directly.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
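For context, list_next_rcu() (from <linux/rculist.h>) exposes the ->next pointer with the __rcu annotation so it can be used with the RCU accessors:

	#define list_next_rcu(list)	(*((struct list_head __rcu **)(&(list)->next)))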
|
|
Revision tags: v5.18-rc7, v5.18-rc6 |
|
| #
2fbdf45d |
| 06-May-2022 |
Ricardo Martinez <[email protected]> |
list: Add list_next_entry_circular() and list_prev_entry_circular()
Add macros to get the next or previous entry, wrapping around if needed. For example, calling list_next_entry_circular() on the last element should return the first element in the list.
Signed-off-by: Ricardo Martinez <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
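A sketch of the next-entry variant, assuming the existing helpers list_is_last(), list_first_entry() and list_next_entry():

	#define list_next_entry_circular(pos, head, member)		\
		(list_is_last(&(pos)->member, head) ?			\
		 list_first_entry(head, typeof(*(pos)), member) :	\
		 list_next_entry(pos, member))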
|
|
Revision tags: v5.18-rc5 |
|
| #
d679ae94 |
| 29-Apr-2022 |
Kuniyuki Iwashima <[email protected]> |
list: fix a data-race around ep->rdllist
ep_poll() first calls ep_events_available() with no lock held and checks if ep->rdllist is empty by list_empty_careful(), which reads rdllist->prev. Thus all accesses to it need some protection to avoid store/load-tearing.
Note INIT_LIST_HEAD_RCU() already has the annotation for both prev and next.
Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") added the first lockless ep_events_available(); commit c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") made some ep_events_available() calls lockless and added a single call under a lock; finally, commit e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") made the last ep_events_available() lockless.
BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait

write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0:
 INIT_LIST_HEAD include/linux/list.h:38 [inline]
 list_splice_init include/linux/list.h:492 [inline]
 ep_start_scan fs/eventpoll.c:622 [inline]
 ep_send_events fs/eventpoll.c:1656 [inline]
 ep_poll fs/eventpoll.c:1806 [inline]
 do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234
 do_epoll_pwait fs/eventpoll.c:2268 [inline]
 __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
 __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1:
 list_empty_careful include/linux/list.h:329 [inline]
 ep_events_available fs/eventpoll.c:381 [inline]
 ep_poll fs/eventpoll.c:1797 [inline]
 do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234
 do_epoll_pwait fs/eventpoll.c:2268 [inline]
 __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
 __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xffff88810480c7d0 -> 0xffff888103c15098

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Link: https://lkml.kernel.org/r/[email protected] Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reported-by: [email protected] Cc: Al Viro <[email protected]>, Andrew Morton <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: "Soheil Hassas Yeganeh" <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Sridhar Samudrala" <[email protected]> Cc: Alexander Duyck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
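A sketch of the kind of annotations the fix adds in list.h (illustrative; see the commit for the exact diff):

	/* Mark both pointer stores so lockless readers cannot observe torn
	 * values; pairs with the marked loads in list_empty_careful(). */
	static inline void INIT_LIST_HEAD(struct list_head *list)
	{
		WRITE_ONCE(list->next, list);
		WRITE_ONCE(list->prev, list);
	}

	static inline int list_empty_careful(const struct list_head *head)
	{
		struct list_head *next = smp_load_acquire(&head->next);

		return list_is_head(next, head) && (next == READ_ONCE(head->prev));
	}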
|
|
Revision tags: v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1 |
|
| #
04254730 |
| 20-Jan-2022 |
Andy Shevchenko <[email protected]> |
list: introduce list_is_head() helper and re-use it in list.h
Introduce list_is_head() in a similar (*) way to list_entry_is_head(), and make use of it in list.h.
*) it is implemented as an inline function and not a macro, to be aligned with the other list_is_*() APIs; while at it, make all three share the same style.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Heikki Krogerus <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
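The three helpers in their shared style, matching the upstream shape:

	static inline int list_is_first(const struct list_head *list, const struct list_head *head)
	{
		return list->prev == head;
	}

	static inline int list_is_last(const struct list_head *list, const struct list_head *head)
	{
		return list->next == head;
	}

	static inline int list_is_head(const struct list_head *list, const struct list_head *head)
	{
		return list == head;
	}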
|
|
Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1 |
|
| #
cd7187e1 |
| 09-Nov-2021 |
Andy Shevchenko <[email protected]> |
include/linux/list.h: replace kernel.h with the necessary inclusions
When kernel.h is used in headers it adds a lot to dependency hell, especially when circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
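Illustrative shape of the change (the exact include list is in the commit itself):

	-#include <linux/kernel.h>
	+#include <linux/container_of.h>
	+#include <linux/const.h>
	+#include <linux/poison.h>
	+#include <linux/stddef.h>
	+#include <linux/types.h>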
|
|
Revision tags: v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5 |
|
| #
4704bd31 |
| 16-Nov-2020 |
Mauro Carvalho Chehab <[email protected]> |
list: Fix a typo at the kernel-doc markup
hlist_add_behing -> hlist_add_behind
Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Revision tags: v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6 |
|
| #
1eafe075 |
| 20-Sep-2020 |
Asif Rasheed <[email protected]> |
list.h: Update comment to explicitly note circular lists
The students in the Operating System Lecture Section at the American University of Sharjah were confused by the header comment in include/linux/list.h, which says "Simple doubly linked list implementation". This comment means "simple" as in "not complex", but "simple" is often used in this context to mean "not circular". This commit therefore avoids this ambiguity by explicitly calling out "circular".
Signed-off-by: Asif Rasheed <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
|
| #
e1308161 |
| 16-Oct-2020 |
Andy Shevchenko <[email protected]> |
include/linux/list.h: add a macro to test if entry is pointing to the head
Add a macro to test if entry is pointing to the head of the list which is useful in cases like:
	list_for_each_entry(pos, &head, member) {
		if (cond)
			break;
	}
	if (list_entry_is_head(pos, &head, member))
		return -ERRNO;

This avoids adding an extra variable just to track whether the loop was stopped in the middle.
While here, convert list_for_each_entry*() family of macros to use a new one.
Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Cezary Rojewski <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
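The macro as introduced here; its definition was later rewritten in terms of list_is_head() (see commit 2932fb0a above):

	#define list_entry_is_head(pos, head, member)	\
		(&pos->member == (head))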
|
|
Revision tags: v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7 |
|
| #
c6fe44d9 |
| 23-Jul-2020 |
Linus Torvalds <[email protected]> |
list: add "list_del_init_careful()" to go with "list_empty_careful()"
That gives us ordering guarantees around the pair.
Signed-off-by: Linus Torvalds <[email protected]>
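A sketch of the pairing, assuming the upstream shape of the two helpers at the time:

	/* Writer: the release store to ->next pairs with the acquire load
	 * in list_empty_careful(), ordering the reinitialization. */
	static inline void list_del_init_careful(struct list_head *entry)
	{
		__list_del_entry(entry);
		entry->prev = entry;
		smp_store_release(&entry->next, entry);
	}

	/* Reader: acquire on ->next, then check ->prev agrees. */
	static inline int list_empty_careful(const struct list_head *head)
	{
		struct list_head *next = smp_load_acquire(&head->next);

		return (next == head) && (next == head->prev);
	}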
|
|
Revision tags: v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2 |
|
| #
c84716c4 |
| 21-Mar-2019 |
NeilBrown <[email protected]> |
list/hashtable: minor documentation corrections.
hash_for_each_safe() and hash_for_each_possible_safe() need to be passed a temp 'struct hlist_node' pointer, but do not say that in the documentation - they just say a 'struct'.
Also the documentation for hlist_for_each_entry_safe() describes @n as "another" hlist_node, but in reality it is the only one.
Signed-off-by: NeilBrown <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
|
| #
46deb744 |
| 09-Nov-2019 |
Paul E. McKenney <[email protected]> |
rcu: Add and update docbook header comments in list.h
[ paulmck: Fix typo found by kbuild test robot. ] Signed-off-by: Paul E. McKenney <[email protected]>
|
| #
28ca0d6d |
| 28-Nov-2019 |
Pavel Begunkov <[email protected]> |
list: introduce list_for_each_continue()
Like the other *_continue() helpers, this continues iteration from a given position.
Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
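The helper as introduced (illustrative; the loop condition was later rewritten via list_is_head()):

	#define list_for_each_continue(pos, head) \
		for (pos = pos->next; pos != (head); pos = pos->next)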
|
| #
c54a2744 |
| 07-Nov-2019 |
Eric Dumazet <[email protected]> |
list: Add hlist_unhashed_lockless()
We would like to use hlist_unhashed() from timer_pending(), which runs without protection of a lock.
Note that other callers might also want to use this variant.
Instead of forcing a READ_ONCE() for all hlist_unhashed() callers, add a new helper with an explicit _lockless suffix in the name to better document what is going on.
Also add various WRITE_ONCE() in __hlist_del(), hlist_add_head() and hlist_add_before()/hlist_add_behind() to pair with the READ_ONCE().
Signed-off-by: Eric Dumazet <[email protected]> Cc: Thomas Gleixner <[email protected]> [ paulmck: Also add WRITE_ONCE() to rculist.h. ] Signed-off-by: Paul E. McKenney <[email protected]>
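A sketch of the new helper, matching the upstream shape:

	static inline int hlist_unhashed_lockless(const struct hlist_node *h)
	{
		/* READ_ONCE() pairs with the WRITE_ONCE()s added to
		 * __hlist_del(), hlist_add_head() and
		 * hlist_add_before()/hlist_add_behind(). */
		return !READ_ONCE(h->pprev);
	}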
|
| #
c8af5cd7 |
| 28-Jun-2019 |
Toke Høiland-Jørgensen <[email protected]> |
xskmap: Move non-standard list manipulation to helper
Add a helper in list.h for the non-standard way of clearing a list that is used in xskmap. This makes it easier to reuse it in the other map types, and also makes sure this usage is not forgotten in any list refactorings in the future.
Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
|
| #
e900a918 |
| 14-May-2019 |
Dan Williams <[email protected]> |
mm: shuffle initial free memory to improve memory-side-cache utilization
Patch series "mm: Randomize free memory", v10.
This patch (of 3):
Randomization of the page allocator improves the average utilization of a direct-mapped memory-side-cache. Memory side caching is a platform capability that Linux has been previously exposed to in HPC (high-performance computing) environments on specialty platforms. In that instance it was a smaller pool of high-bandwidth-memory relative to higher-capacity / lower-bandwidth DRAM. Now, this capability is going to be found on general purpose server platforms where DRAM is a cache in front of higher latency persistent memory [1].
Robert offered an explanation of the state of the art of Linux interactions with memory-side-caches [2], and I copy it here:
It's been a problem in the HPC space: http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
A kernel module called zonesort is available to try to help: https://software.intel.com/en-us/articles/xeon-phi-software
and this abandoned patch series proposed that for the kernel: https://lkml.kernel.org/r/[email protected]
Dan's patch series doesn't attempt to ensure buffers won't conflict, but also reduces the chance that the buffers will. This will make performance more consistent, albeit slower than "optimal" (which is near impossible to attain in a general-purpose kernel). That's better than forcing users to deploy remedies like: "To eliminate this gradual degradation, we have added a Stream measurement to the Node Health Check that follows each job; nodes are rebooted whenever their measured memory bandwidth falls below 300 GB/s."
A replacement for zonesort was merged upstream in commit cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability"). With this numa_emulation capability, memory can be split into cache sized ("near-memory" sized) numa nodes. A bind operation to such a node, and disabling workloads on other nodes, enables full cache performance. However, once the workload exceeds the cache size then cache conflicts are unavoidable. While HPC environments might be able to tolerate time-scheduling of cache sized workloads, for general purpose server platforms, the oversubscribed cache case will be the common case.
The worst case scenario is that a server system owner benchmarks a workload at boot with an un-contended cache only to see that performance degrade over time, even below the average cache performance due to excessive conflicts. Randomization clips the peaks and fills in the valleys of cache utilization to yield steady average performance.
Here are some performance impact details of the patches:
1/ An Intel internal synthetic memory bandwidth measurement tool saw a 3X speedup in a contrived case that tries to force cache conflicts. The contrived case used the numa_emulation capability to force an instance of the benchmark to be run in two of the near-memory sized numa nodes. If both instances were placed on the same emulated node they would fit and cause zero conflicts. While on separate emulated nodes without randomization they underutilized the cache and conflicted unnecessarily due to the in-order allocation per node.
2/ A well known Java server application benchmark was run with a heap size that exceeded cache size by 3X. The cache conflict rate was 8% for the first run and degraded to 21% after page allocator aging. With randomization enabled the rate levelled out at 11%.
3/ A MongoDB workload did not observe measurable difference in cache-conflict rates, but the overall throughput dropped by 7% with randomization in one case.
4/ Mel Gorman ran his suite of performance workloads with randomization enabled on platforms without a memory-side-cache and saw a mix of some improvements and some losses [3].
While there is potentially significant improvement for applications that depend on low latency access across a wide working-set, the performance may be negligible to negative for other workloads. For this reason the shuffle capability defaults to off unless a direct-mapped memory-side-cache is detected. Even then, the page_alloc.shuffle=0 parameter can be specified to disable the randomization on those systems.
Outside of memory-side-cache utilization concerns there is potentially security benefit from randomization. Some data exfiltration and return-oriented-programming attacks rely on the ability to infer the location of sensitive data objects. The kernel page allocator, especially early in system boot, has predictable first-in-first out behavior for physical pages. Pages are freed in physical address order when first onlined.
Quoting Kees: "While we already have a base-address randomization (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and memory layouts would certainly be using the predictability of allocation ordering (i.e. for attacks where the base address isn't important: only the relative positions between allocated memory). This is common in lots of heap-style attacks. They try to gain control over ordering by spraying allocations, etc.
I'd really like to see this because it gives us something similar to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."
While SLAB_FREELIST_RANDOM reduces the predictability of some local slab caches it leaves vast bulk of memory to be predictably in order allocated. However, it should be noted, the concrete security benefits are hard to quantify, and no known CVE is mitigated by this randomization.
Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform a Fisher-Yates shuffle of the page allocator 'free_area' lists when they are initially populated with free memory at boot and at hotplug time. Do this based on either the presence of a page_alloc.shuffle=Y command line parameter, or autodetection of a memory-side-cache (to be added in a follow-on patch).
The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free pages, where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e. order 10 (4MB); this trades off randomization granularity against time spent shuffling. MAX_ORDER-1 was chosen to be minimally invasive to the page allocator while still showing memory-side-cache behavior improvements, with the expectation that the security implications of finer-granularity randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM. The performance impact of the shuffling appears to be in the noise compared to other memory initialization work.
This initial randomization can be undone over time so a follow-on patch is introduced to inject entropy on page free decisions. It is reasonable to ask if the page free entropy is sufficient, but it is not enough due to the in-order initial freeing of pages. At the start of that process putting page1 in front or behind page0 still keeps them close together, page2 is still near page1 and has a high chance of being adjacent. As more pages are added ordering diversity improves, but there is still high page locality for the low address pages and this leads to no significant impact to the cache conflict rate.
[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/ [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM [3]: https://lkml.org/lkml/2018/10/12/309
[[email protected]: fix shuffle enable] Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com [[email protected]: fix SHUFFLE_PAGE_ALLOCATOR help texts] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Qian Cai <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Keith Busch <[email protected]> Cc: Robert Elliott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
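For readers unfamiliar with the algorithm named above, a generic Fisher-Yates shuffle over an array looks like this; a plain illustration, not the kernel's shuffle_zone() code, which operates on free_area lists:

	#include <stddef.h>

	/* Generic Fisher-Yates: given a uniform rand_below(), every
	 * permutation of the array is equally likely after the loop. */
	static void fisher_yates_shuffle(int *arr, size_t n,
					 size_t (*rand_below)(size_t bound))
	{
		if (n < 2)
			return;
		for (size_t i = n - 1; i > 0; i--) {
			size_t j = rand_below(i + 1);	/* 0 <= j <= i */
			int tmp = arr[i];

			arr[i] = arr[j];
			arr[j] = tmp;
		}
	}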
|