|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4 |
|
# 8c57b687 | 22-Feb-2025 | Alexei Starovoitov <[email protected]>
mm, bpf: Introduce free_pages_nolock()
Introduce free_pages_nolock() that can free pages without taking locks. It relies on trylock and can be called from any context. Since spin_trylock() cannot be used in PREEMPT_RT from hard IRQ or NMI context, it uses a lockless link list to stash the pages, which will be freed by a subsequent free_pages() call from a good context.
Do not use llist unconditionally. BPF maps continuously allocate/free, so we cannot unconditionally delay the freeing to llist. When the memory becomes free, make it available to the kernel and BPF users right away if possible, and fall back to llist only as a last resort.
Acked-by: Vlastimil Babka <[email protected]> Acked-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
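A minimal sketch of the trylock-or-defer pattern described above, using made-up my_* names rather than the actual mm/page_alloc.c code (the commit stashes struct page itself on the lockless list; the wrapper struct here is only for illustration):

#include <linux/llist.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>

struct my_page {
        struct llist_node llnode;       /* used only while parked on the deferred list */
};

struct my_zone {
        spinlock_t lock;                /* normally taken when freeing into the zone */
        struct llist_head deferred;     /* lockless stash for unsafe contexts */
};

static void my_free_page_nolock(struct my_zone *zone, struct my_page *page)
{
        unsigned long flags;

        /*
         * Prefer the regular path so the memory is reusable right away.
         * On PREEMPT_RT, spin_trylock() must not be attempted from hard IRQ
         * or NMI, so those contexts go straight to the deferred list.
         */
        if (!(IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) &&
            spin_trylock_irqsave(&zone->lock, flags)) {
                /* ... return the page to the zone's free lists here ... */
                spin_unlock_irqrestore(&zone->lock, flags);
                return;
        }

        /* Last resort: stash the page; a later free from a good context drains it. */
        llist_add(&page->llnode, &zone->deferred);
}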
|
# 97769a53 | 22-Feb-2025 | Alexei Starovoitov <[email protected]>
mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
Tracing BPF programs execute from tracepoints and kprobes where running context is unknown, but they need to request additional memory. The prior workarounds were using pre-allocated memory and BPF specific freelists to satisfy such allocation requests. Instead, introduce gfpflags_allow_spinning() condition that signals to the allocator that running context is unknown. Then rely on percpu free list of pages to allocate a page. try_alloc_pages() -> get_page_from_freelist() -> rmqueue() -> rmqueue_pcplist() will spin_trylock to grab the page from percpu free list. If it fails (due to re-entrancy or list being empty) then rmqueue_bulk()/rmqueue_buddy() will attempt to spin_trylock zone->lock and grab the page from there. spin_trylock() is not safe in PREEMPT_RT when in NMI or in hard IRQ. Bailout early in such case.
The support for gfpflags_allow_spinning() mode for free_page and memcg comes in the next patches.
This is a first step towards supporting BPF requirements in SLUB and getting rid of bpf_mem_alloc. That goal was discussed at LSFMM: https://lwn.net/Articles/974138/
Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
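A rough sketch of the opportunistic-allocation shape described above; the my_* locks and helper are stand-ins, not the real rmqueue_pcplist()/rmqueue_buddy() code:

#include <linux/preempt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_pcp_lock);    /* stands in for the per-CPU list lock */
static DEFINE_SPINLOCK(my_zone_lock);   /* stands in for zone->lock */

static struct page *my_try_alloc_page(void)
{
        struct page *page = NULL;
        unsigned long flags;

        /* On PREEMPT_RT, trylock is not safe from NMI or hard IRQ: bail out early. */
        if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
                return NULL;

        /* 1) Try the per-CPU free list first. */
        if (spin_trylock_irqsave(&my_pcp_lock, flags)) {
                /* ... pop a page from the per-CPU list if one is available ... */
                spin_unlock_irqrestore(&my_pcp_lock, flags);
        }

        /* 2) Fall back to the zone free lists, again only via trylock. */
        if (!page && spin_trylock_irqsave(&my_zone_lock, flags)) {
                /* ... grab a page from the buddy free lists ... */
                spin_unlock_irqrestore(&my_zone_lock, flags);
        }

        return page;    /* NULL means "could not allocate without spinning" */
}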
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
# 031e04bd | 22-Nov-2024 | Marco Elver <[email protected]>
stackdepot: fix stack_depot_save_flags() in NMI context
Per documentation, stack_depot_save_flags() was meant to be usable from NMI context if STACK_DEPOT_FLAG_CAN_ALLOC is unset. However, it still would try to take the pool_lock in an attempt to save a stack trace in the current pool (if space is available).
This could result in deadlock if an NMI is handled while pool_lock is already held. To avoid deadlock, in NMI context only try to take the lock and give up if unsuccessful.
The documentation is fixed to clearly convey this.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 4434a56ec209 ("stackdepot: make fast paths lock-less again") Signed-off-by: Marco Elver <[email protected]> Reported-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
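The locking rule the fix describes, as a simplified sketch; my_pool_lock and the function are illustrative, not the actual lib/stackdepot.c code:

#include <linux/preempt.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(my_pool_lock);

static bool my_save_in_current_pool(void)
{
        unsigned long flags;

        if (in_nmi()) {
                /* In NMI, never wait for the lock: give up if it is contended. */
                if (!raw_spin_trylock_irqsave(&my_pool_lock, flags))
                        return false;
        } else {
                raw_spin_lock_irqsave(&my_pool_lock, flags);
        }

        /* ... store the stack trace into the current pool if space allows ... */

        raw_spin_unlock_irqrestore(&my_pool_lock, flags);
        return true;
}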
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
# 70c435ca | 30-Apr-2024 | Dave Chinner <[email protected]>
stackdepot: use gfp_nested_mask() instead of open coded masking
The stackdepot code is used by KASAN and lockdep for recording stack traces. Both of these track allocation context information, and so their internal allocations must obey the caller's allocation context to avoid generating their own false positive warnings that have nothing to do with the code they are instrumenting/tracking.
We also don't want recording stack traces to deplete emergency memory reserves - debug code is useless if it creates new issues that can't be replicated when the debug code is disabled.
Switch the stackdepot allocation masking to use gfp_nested_mask() to address these issues. gfp_nested_mask() also strips GFP_ZONEMASK naturally, so that greatly simplifies this code.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Andrey Konovalov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
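A hedged usage sketch of the idea, assuming gfp_nested_mask() as introduced by this series; my_record_alloc is illustrative, not the actual depot allocation path:

#include <linux/gfp.h>

static void *my_record_alloc(gfp_t caller_gfp)
{
        /*
         * Constrain the nested allocation to what the caller's context allows,
         * drop zone modifiers, and stay away from emergency reserves.
         */
        gfp_t gfp = gfp_nested_mask(caller_gfp);

        return (void *)__get_free_page(gfp);
}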
|
|
Revision tags: v6.9-rc6, v6.9-rc5 |
|
# 6fe60465 | 18-Apr-2024 | Andrey Ryabinin <[email protected]>
stackdepot: respect __GFP_NOLOCKDEP allocation flag
If stack_depot_save_flags() allocates memory, it always drops the __GFP_NOLOCKDEP flag. So when KASAN tries to track a __GFP_NOLOCKDEP allocation, we may end up with a lockdep splat like the one below:
====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ #49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs]
but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79
Always preserve __GFP_NOLOCKDEP to fix this.
Link: https://lkml.kernel.org/r/[email protected] Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <[email protected]> Reported-by: Xiubo Li <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reported-by: Damien Le Moal <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Dave Chinner <[email protected]> Tested-by: Xiubo Li <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
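The shape of the fix, as a hedged sketch: __GFP_NOLOCKDEP is now part of the set of caller flags the depot's own allocation may keep (my_depot_gfp is a hypothetical helper, not the literal patch):

#include <linux/gfp.h>

static gfp_t my_depot_gfp(gfp_t caller_gfp)
{
        /* Previously __GFP_NOLOCKDEP was missing from this mask and got dropped. */
        gfp_t allowed = GFP_KERNEL | __GFP_NOLOCKDEP;

        return (caller_gfp & allowed) | __GFP_NOWARN;
}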
|
|
Revision tags: v6.9-rc4, v6.9-rc3 |
|
# a6c1d9cb | 02-Apr-2024 | Peter Collingbourne <[email protected]>
stackdepot: rename pool_index to pool_index_plus_1
Commit 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") changed the meaning of the pool_index field to mean "the pool index plus 1". This made the code accessing this field less self-documenting, as well as causing debuggers such as drgn to not be able to easily remain compatible with both old and new kernels, because they typically do that by testing for presence of the new field. Because stackdepot is a debugging tool, we should make sure that it is debugger friendly. Therefore, give the field a different name to improve readability as well as enabling debugger backwards compatibility.
This is needed in 6.9, which would otherwise become an odd release with the new semantics but the old name, so debuggers wouldn't recognize the new semantics there.
Fixes: 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") Link: https://lkml.kernel.org/r/[email protected] Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88 Signed-off-by: Peter Collingbourne <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Acked-by: Oscar Salvador <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Omar Sandoval <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
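An illustrative decode with the renamed field; the layout below is a simplification (the real handle is a packed union), but it shows the explicit minus-one that the new name documents:

#include <linux/types.h>

struct my_handle_parts {
        u32 pool_index_plus_1;  /* 0 is reserved so that handle 0 stays invalid */
        u32 offset;             /* position of the record inside the pool */
};

static void *my_handle_to_record(struct my_handle_parts h, void **pools)
{
        if (!h.pool_index_plus_1)       /* "no handle" */
                return NULL;

        return (char *)pools[h.pool_index_plus_1 - 1] + h.offset;
}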
|
|
Revision tags: v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6 |
|
# dc245594 | 23-Feb-2024 | Dan Carpenter <[email protected]>
lib/stackdepot: off by one in depot_fetch_stack()
The stack_pools[] array has DEPOT_MAX_POOLS. The "pools_num" tracks the number of pools which are initialized. See depot_init_pool() for more details.
If pool_index == pools_num_cached, this will read one element beyond what we want. If not all the pools are initialized, then the pool will be NULL, triggering a WARN(), and if they are all initialized it will read one element beyond the end of the array.
Link: https://lkml.kernel.org/r/[email protected] Fixes: b29d31885814 ("lib/stackdepot: store free stack records in a freelist") Signed-off-by: Dan Carpenter <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
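A hedged sketch of the corrected bounds check; my_fetch_pool is illustrative of the stack_pools[] indexing done in depot_fetch_stack():

#include <linux/bug.h>

static void *my_fetch_pool(void **pools, int pools_num, int pool_index)
{
        /* An index equal to pools_num is already one past the last valid pool. */
        if (pool_index >= pools_num) {
                WARN_ONCE(1, "pool index %d out of bounds (%d pools)",
                          pool_index, pools_num);
                return NULL;
        }

        return pools[pool_index];
}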
|
|
Revision tags: v6.8-rc5 |
|
# 4bedfb31 | 15-Feb-2024 | Oscar Salvador <[email protected]>
mm,page_owner: maintain own list of stack_records structs
page_owner needs to increment a stack_record refcount when a new allocation occurs, and decrement it on a free operation. In order to do that, we need to have a way to get a stack_record from a handle. Implement __stack_depot_get_stack_record() which just does that, and make it public so page_owner can use it.
Also, traversing all stackdepot buckets comes with its own complexity, plus we would have to implement a way to mark only those stack_records that originated from page_owner, as those are the ones we are interested in. For that reason, page_owner maintains its own list of stack_records, because traversing that list is faster than traversing all buckets while keeping complexity low at the same time.
For now, add to stack_list only the stack_records of dummy_handle and failure_handle, and set their refcount to 1.
Further patches will add code to increment or decrement stack_records count on allocation and free operation.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Reviewed-by: Marco Elver <[email protected]> Acked-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
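A hedged sketch of the bookkeeping described above: page_owner looks up the stack_record through the new accessor and keeps its own list plus a per-record refcount (my_* names and the exact field names are assumptions):

#include <linux/list.h>
#include <linux/refcount.h>
#include <linux/stackdepot.h>

static LIST_HEAD(my_stack_list);        /* page_owner-private list of stacks */

static void my_note_allocation(depot_stack_handle_t handle)
{
        struct stack_record *rec = __stack_depot_get_stack_record(handle);

        if (!rec)
                return;

        /* Assumed refcount field on stack_record; one count per outstanding page. */
        refcount_inc(&rec->count);

        /* ... link rec into my_stack_list the first time it is seen ... */
}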
|
# 8151c7a3 | 15-Feb-2024 | Oscar Salvador <[email protected]>
lib/stackdepot: move stack_record struct definition into the header
In order to move the heavy lifting into page_owner code, this one needs to have access to the stack_record structure, which right now sits in lib/stackdepot.c. Move it to the stackdepot.h header so page_owner can access stack_record's struct fields.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# 3ee34eab | 15-Feb-2024 | Oscar Salvador <[email protected]>
lib/stackdepot: fix first entry having a 0-handle
Patch series "page_owner: print stacks and their outstanding allocations", v10.
page_owner is a great debug functionality tool that lets us know ab
lib/stackdepot: fix first entry having a 0-handle
Patch series "page_owner: print stacks and their outstanding allocations", v10.
page_owner is a great debug functionality tool that lets us know about all pages that have been allocated/freed and their specific stacktrace. This comes very handy when debugging memory leaks, since with some scripting we can see the outstanding allocations, which might point to a memory leak.
In my experience, that is one of the most useful cases, but it can get really tedious to screen through all pages and try to reconstruct the stack <-> allocated/freed relationship, becoming most of the time a daunting and slow process when we have tons of allocation/free operations.
This patchset aims to ease that by adding new functionality to page_owner. It creates a new directory called 'page_owner_stacks' under '/sys/kernel/debug' with a read-only file called 'show_stacks', which prints out all the stacks followed by their outstanding number of allocations (that is, the number of times the stacktrace has allocated but not yet freed). This gives us a clear and quick overview of stacks <-> allocated/free.
We take advantage of the new refcount_t field that the stack_record struct gained, and increment/decrement the stack refcount on every __set_page_owner() (alloc operation) and __reset_page_owner() (free operation) call.
Unfortunately, we cannot use the new stackdepot api STACK_DEPOT_FLAG_GET because it does not fulfill page_owner needs, meaning we would have to special-case things, at which point it makes more sense for page_owner to do its own {dec,inc}rementing of the stacks. E.g.: using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0 the stack gets evicted, so page_owner would lose information.
This patchset also creates a new file called 'set_threshold' within the 'page_owner_stacks' directory; by writing a value to it, stacks whose refcount is below that value will be filtered out.
A PoC can be found below:
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks.txt # head -40 page_owner_full_stacks.txt prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 page_cache_ra_unbounded+0x96/0x180 filemap_get_pages+0xfd/0x590 filemap_read+0xcc/0x330 blkdev_read_iter+0xb8/0x150 vfs_read+0x285/0x320 ksys_read+0xa5/0xe0 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 521
prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 __filemap_get_folio+0x14a/0x490 ext4_write_begin+0xbd/0x4b0 [ext4] generic_perform_write+0xc1/0x1e0 ext4_buffered_write_iter+0x68/0xe0 [ext4] ext4_file_write_iter+0x70/0x740 [ext4] vfs_write+0x33d/0x420 ksys_write+0xa5/0xe0 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 4609 ... ...
# echo 5000 > /sys/kernel/debug/page_owner_stacks/set_threshold # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks_5000.txt # head -40 page_owner_full_stacks_5000.txt prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 __filemap_get_folio+0x14a/0x490 ext4_write_begin+0xbd/0x4b0 [ext4] generic_perform_write+0xc1/0x1e0 ext4_buffered_write_iter+0x68/0xe0 [ext4] ext4_file_write_iter+0x70/0x740 [ext4] vfs_write+0x33d/0x420 ksys_pwrite64+0x75/0x90 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 6781
prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 pcpu_populate_chunk+0xec/0x350 pcpu_balance_workfn+0x2d1/0x4a0 process_scheduled_works+0x84/0x380 worker_thread+0x12a/0x2a0 kthread+0xe3/0x110 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1b/0x30 stack_count: 8641
This patch (of 7):
The very first entry of stack_record gets a handle of 0, but this is wrong because stackdepot treats a 0-handle as invalid; see, e.g., the check in stack_depot_fetch().
Fix this by adding an offset of 1.
This bug has been lurking since the very beginning of stackdepot, but it seems no one really cared. Because of that, I am not adding a Fixes tag.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Co-developed-by: Marco Elver <[email protected]> Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Oscar Salvador <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
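An illustrative encode side of the fix: bias the stored pool index by one so the very first record (pool 0, offset 0) does not collapse into the invalid 0-handle. Bit layout and names are assumptions, not the actual union handle_parts:

#include <linux/types.h>

#define MY_OFFSET_BITS  16      /* assumed split between offset and pool index */

static u32 my_make_handle(u32 pool_index, u32 offset)
{
        /* Store pool_index + 1; the fetch side undoes the bias with "- 1". */
        return ((pool_index + 1) << MY_OFFSET_BITS) | offset;
}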
|
|
Revision tags: v6.8-rc4, v6.8-rc3 |
|
# 31639fd6 | 29-Jan-2024 | Marco Elver <[email protected]>
stackdepot: use variable size records for non-evictable entries
With the introduction of stack depot evictions, each stack record is now fixed size, so that future reuse after an eviction can safely store differently sized stack traces. In all cases that do not make use of evictions, this wastes lots of space.
Fix it by re-introducing variable size stack records (up to the max allowed size) for entries that will never be evicted. We know if an entry will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided, since a later stack_depot_put() attempt is undefined behavior.
With my current kernel config that enables KASAN and also SLUB owner tracking, I observe (after a kernel boot) a whopping reduction of 296 stack depot pools, which translates into 4736 KiB saved. The savings here are from SLUB owner tracking only, because KASAN generic mode still uses refcounting.
Before:
pools: 893 allocations: 29841 frees: 6524 in_use: 23317 freelist_size: 3454
After:
pools: 597 refcounted_allocations: 17547 refcounted_frees: 6477 refcounted_in_use: 11070 freelist_size: 3497 persistent_count: 12163 persistent_bytes: 1717008
[[email protected]: fix -Wstringop-overflow warning] Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Tested-by: Mikhail Gavrilov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.8-rc2, v6.8-rc1 |
|
# 4434a56e | 18-Jan-2024 | Marco Elver <[email protected]>
stackdepot: make fast paths lock-less again
With the introduction of the pool_rwlock (reader-writer lock), several fast paths end up taking the pool_rwlock as readers. Furthermore, stack_depot_put() unconditionally takes the pool_rwlock as a writer.
Despite allowing readers to make forward-progress concurrently, reader-writer locks have inherent cache contention issues, which does not scale well on systems with large CPU counts.
Rework the synchronization story of stack depot to again avoid taking any locks in the fast paths. This is done by relying on RCU-protected list traversal, and the NMI-safe subset of RCU to delay reuse of freed stack records. See code comments for more details.
Along with the performance issues, this also fixes incorrect nesting of rwlock within a raw_spinlock, given that stack depot should still be usable from anywhere:
| [ BUG: Invalid wait context ] | ----------------------------- | swapper/0/1 is trying to lock: | ffffffff89869be8 (pool_rwlock){..--}-{3:3}, at: stack_depot_save_flags | other info that might help us debug this: | context-{5:5} | 2 locks held by swapper/0/1: | #0: ffffffff89632440 (rcu_read_lock){....}-{1:3}, at: __queue_work | #1: ffff888100092018 (&pool->lock){-.-.}-{2:2}, at: __queue_work <-- raw_spin_lock
Stack depot usage stats are similar to the previous version after a KASAN kernel boot:
$ cat /sys/kernel/debug/stackdepot/stats pools: 838 allocations: 29865 frees: 6604 in_use: 23261 freelist_size: 1879
The number of pools is the same as previously. The freelist size is minimally larger, but this may also be due to variance across system boots. This shows that even though we do not eagerly wait for the next RCU grace period (such as with synchronize_rcu() or call_rcu()) after freeing a stack record - requiring depot_pop_free() to "poll" if an entry may be used - new allocations are very likely to happen in later RCU grace periods.
Link: https://lkml.kernel.org/r/[email protected] Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Reported-by: Andi Kleen <[email protected]> Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
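A minimal sketch of the lock-less lookup shape: hash-bucket traversal under RCU instead of a reader lock. Simplified; the real code additionally relies on the NMI-safe subset of RCU and acquire/release ordering when records are recycled:

#include <linux/rculist.h>
#include <linux/types.h>

struct my_record {
        struct list_head hash_link;
        u32 hash;
};

static struct my_record *my_lookup(struct list_head *bucket, u32 hash)
{
        struct my_record *rec, *found = NULL;

        rcu_read_lock();
        list_for_each_entry_rcu(rec, bucket, hash_link) {
                if (rec->hash == hash) {
                        found = rec;
                        break;
                }
        }
        rcu_read_unlock();

        return found;
}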
|
# c2a29254 | 18-Jan-2024 | Marco Elver <[email protected]>
stackdepot: add stats counters exported via debugfs
Add a few basic stats counters for stack depot that can be used to derive if stack depot is working as intended. This is a snapshot of the new stats after booting a system with a KASAN-enabled kernel:
$ cat /sys/kernel/debug/stackdepot/stats pools: 838 allocations: 29861 frees: 6561 in_use: 23300 freelist_size: 1840
Generally, "pools" should be well below the max; once the system is booted, "in_use" should remain relatively steady.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Alexander Potapenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7 |
|
# a914d8d6 | 19-Dec-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: add printk_deferred_enter/exit guards
Patch series "lib/stackdepot, kasan: fixes for stack eviction series", v3.
A few fixes for the stack depot eviction series ("stackdepot: allow
lib/stackdepot: add printk_deferred_enter/exit guards
Patch series "lib/stackdepot, kasan: fixes for stack eviction series", v3.
A few fixes for the stack depot eviction series ("stackdepot: allow evicting stack traces").
This patch (of 5):
Stack depot functions can be called from various contexts that do allocations, including with console locks taken. At the same time, stack depot functions might print WARNING's or refcount-related failures.
This can cause a deadlock on console locks.
Add printk_deferred_enter/exit guards to stack depot to avoid this.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/82092f9040d075a161d1264377d51e0bac847e8a.1703020707.git.andreyknvl@google.com Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Konovalov <[email protected]> Reported-by: Tetsuo Handa <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
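A hedged sketch of the guard placement: any WARN raised while the depot lock is held is deferred rather than printed immediately, so it cannot recurse into console locks (my_pool_lock and the function are stand-ins):

#include <linux/printk.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(my_pool_lock);

static void my_depot_update(void)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&my_pool_lock, flags);
        printk_deferred_enter();        /* WARNs below are queued, not printed now */

        /* ... manipulate pools / refcounts; may WARN on inconsistencies ... */

        printk_deferred_exit();
        raw_spin_unlock_irqrestore(&my_pool_lock, flags);
}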
|
|
Revision tags: v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
# bd9d9624 | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: adjust DEPOT_POOLS_CAP for KMSAN
KMSAN is frequently used in fuzzing scenarios and thus saves a lot of stack traces. As KMSAN does not support evicting stack traces from the stack depot, the stack depot capacity might be reached quickly with large stack records.
Adjust the maximum number of stack depot pools for this case.
The average size of a stack trace saved into the stack depot is ~16 frames. Thus, adjust the maximum pools number accordingly to keep the maximum number of stack traces that can be saved into the stack depot similar to the one that was allowed before the stack trace eviction changes.
Link: https://lkml.kernel.org/r/301a115cf7ce8ddb42ef6de9151c2bb76ba728fc.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# 108be8de | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: allow users to evict stack traces
Add stack_depot_put, a function that decrements the reference counter on a stack record and removes it from the stack depot once the counter reaches 0.
Internally, when removing a stack record, the function unlinks it from the hash table bucket and returns to the freelist.
With this change, the users of stack depot can call stack_depot_put when keeping a stack trace in the stack depot is not needed anymore. This allows avoiding polluting the stack depot with irrelevant stack traces, and thus having more space to store the relevant ones before the stack depot reaches its capacity.
Link: https://lkml.kernel.org/r/1d1ad5692ee43d4fc2b3fd9d221331d30b36123f.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
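The put-side shape described above, as a hedged sketch: drop a reference and, once it hits zero, unlink the record from its hash bucket and recycle it through the freelist (the my_* structures are illustrative):

#include <linux/list.h>
#include <linux/refcount.h>

struct my_record {
        struct list_head hash_link;     /* bucket membership while live */
        struct list_head free_link;     /* freelist membership once evicted */
        refcount_t count;
};

static void my_put(struct my_record *rec, struct list_head *freelist)
{
        if (!refcount_dec_and_test(&rec->count))
                return;

        list_del(&rec->hash_link);              /* no longer findable by lookups */
        list_add(&rec->free_link, freelist);    /* slot becomes reusable */
}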
|
# 410b764f | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: add refcount for records
Add a reference counter for how many times a stack record has been added to the stack depot.
Add a new STACK_DEPOT_FLAG_GET flag to stack_depot_save_flags that instructs the stack depot to increment the refcount.
Do not yet decrement the refcount; this is implemented in one of the following patches.
Do not yet enable any users to use the flag to avoid overflowing the refcount.
This is a preparatory patch for implementing the eviction of stack records from the stack depot.
Link: https://lkml.kernel.org/r/a3fc14a2359d019d2a008d4ff8b46a665371ffee.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# 022012dc | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot, kasan: add flags to __stack_depot_save and rename
Change the bool can_alloc argument of __stack_depot_save to a u32 argument that accepts a set of flags.
The following patch will add another flag to stack_depot_save_flags besides the existing STACK_DEPOT_FLAG_CAN_ALLOC.
Also rename the function to stack_depot_save_flags, as __stack_depot_save is a cryptic name.
Link: https://lkml.kernel.org/r/645fa15239621eebbd3a10331e5864b718839512.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
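An illustrative call site before and after the API change (stack_depot_save_flags and STACK_DEPOT_FLAG_CAN_ALLOC are the names this series introduces; my_save is a stand-in caller):

#include <linux/gfp.h>
#include <linux/stackdepot.h>

static depot_stack_handle_t my_save(unsigned long *entries, unsigned int nr)
{
        /* Before: __stack_depot_save(entries, nr, GFP_KERNEL, true); */
        return stack_depot_save_flags(entries, nr, GFP_KERNEL,
                                      STACK_DEPOT_FLAG_CAN_ALLOC);
}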
|
# 4805180b | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: use list_head for stack record links
Switch stack_record to use list_head for links in the hash table and in the freelist.
This will allow removing entries from the hash table buckets.
This is a preparatory patch for implementing the eviction of stack records from the stack depot.
Link: https://lkml.kernel.org/r/4787d9a584cd33433d9ee1846b17fa3d3e1987ad.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# a6cd9570 | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: use read/write lock
Currently, stack depot uses the following locking scheme:
1. Lock-free accesses when looking up a stack record, which allows multiple users to look up records in parallel;
2. Spinlock for protecting the stack depot pools and the hash table when adding a new record.
For implementing the eviction of stack traces from stack depot, the lock-free approach is not going to work anymore, as we will need to be able to also remove records from the hash table.
Convert the spinlock into a read/write lock, and drop the atomic accesses, as they are no longer required.
Looking up stack traces is now protected by the read lock, and adding new records by the write lock. One of the following patches will add a new function for evicting stack records, which will also be protected by the write lock.
With this change, multiple users can still look up records in parallel.
This is a preparatory patch for implementing the eviction of stack records from the stack depot.
Link: https://lkml.kernel.org/r/9f81ffcc4bb422ebb6326a65a770bf1918634cbb.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
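A minimal sketch of the interim scheme: lookups take the lock as readers, insertion (and later eviction) as writers (my_* names stand in for pool_rwlock and the bucket walk):

#include <linux/spinlock.h>
#include <linux/types.h>

static DEFINE_RWLOCK(my_pool_rwlock);

static void *my_lookup_record(u32 hash)
{
        void *rec = NULL;

        read_lock(&my_pool_rwlock);
        /* ... walk the hash bucket looking for a record with this hash ... */
        read_unlock(&my_pool_rwlock);

        return rec;
}

static void my_insert_record(void *rec)
{
        write_lock(&my_pool_rwlock);
        /* ... add rec to its bucket; eviction will later unlink records here too ... */
        write_unlock(&my_pool_rwlock);
}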
|
# b29d3188 | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: store free stack records in a freelist
Instead of using the global pool_offset variable to find a free slot when storing a new stack record, maintain a freelist of free slots within the allocated stack pools.
A global next_stack variable is used as the head of the freelist, and the next field in the stack_record struct is reused as a freelist link (when the record is not in the freelist, this field is used as a link in the hash table).
This is a preparatory patch for implementing the eviction of stack records from the stack depot.
Link: https://lkml.kernel.org/r/b9e4c79955c2121b69301778643b203d3fb09ccc.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
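The freelist idea in miniature (hedged; in the real code the head lives in a global next_stack variable and the record's next field doubles as hash-chain link and freelist link):

struct my_record {
        struct my_record *next; /* hash-chain link, or freelist link when unused */
};

static struct my_record *my_freelist;   /* plays the role of next_stack */

static struct my_record *my_freelist_pop(void)
{
        struct my_record *rec = my_freelist;

        if (rec)
                my_freelist = rec->next;        /* caller re-links rec into a bucket */

        return rec;
}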
|
# a5d21f71 | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: store next pool pointer in new_pool
Instead of using the last pointer in stack_pools for storing the pointer to a new pool (which does not yet store any stack records), use a new new_pool variable.
This is a purely code-readability change: it seems more logical to store the pointer to a pool with a special meaning in a dedicated variable.
Link: https://lkml.kernel.org/r/448bc18296c16bef95cb3167697be6583dcc8ce3.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# b6a353d3 | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: rename next_pool_required to new_pool_required
Rename next_pool_required to new_pool_required.
This is a purely code-readability change: the following patch will change stack depot to store the pointer to the new pool in a separate variable, and "new" seems like a more logical name.
Link: https://lkml.kernel.org/r/fd7cd6c6eb250c13ec5d2009d75bb4ddd1470db9.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# 94b7d328 | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: rework helpers for depot_alloc_stack
Split code in depot_alloc_stack and depot_init_pool into 3 functions:
1. depot_keep_next_pool that keeps preallocated memory for the next pool if required.
2. depot_update_pools that moves on to the next pool if there's no space left in the current pool, uses preallocated memory for the new current pool if required, and calls depot_keep_next_pool otherwise.
3. depot_alloc_stack that calls depot_update_pools and then allocates a stack record as before.
This makes it somewhat easier to follow the logic of depot_alloc_stack and also serves as a preparation for implementing the eviction of stack records from the stack depot.
Link: https://lkml.kernel.org/r/71fb144d42b701fcb46708d7f4be6801a4a8270e.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# fcccc41e | 20-Nov-2023 | Andrey Konovalov <[email protected]>
lib/stackdepot: fix and clean-up atomic annotations
Drop smp_load_acquire from next_pool_required in depot_init_pool, as both depot_init_pool and all the smp_store_release's to this variable are executed under the stack depot lock.
Also simplify and clean up comments accompanying the use of atomic accesses in the stack depot code.
Link: https://lkml.kernel.org/r/c118ef044d8db80248d9e1f14592c72e8429e9d9.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|