|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4 |
|
| #
75673fda |
| 24-Apr-2025 |
Brandon Kammerdiener <[email protected]> |
bpf: fix possible endless loop in BPF map iteration
The _safe variant used here gets the next element before running the callback, avoiding the endless loop condition.
Signed-off-by: Brandon Kammerdiener <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Hou Tao <[email protected]>
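For illustration, a minimal, self-contained userspace analogue of the _safe pattern described above (the struct and helper names are made up for this example; the kernel fix applies the same idea to the bucket's hlist_nulls list): the next pointer is captured before the callback runs, so a callback that unlinks or re-links the current element cannot send the walk back onto itself.

    #include <stdio.h>

    struct node { int val; struct node *next; };

    typedef void (*elem_cb)(struct node *n);

    /* Capture 'next' before invoking the callback, so a callback that
     * unlinks or re-links 'n' cannot redirect the rest of the walk. */
    static void for_each_elem_safe(struct node *head, elem_cb cb)
    {
        struct node *n = head, *next;

        while (n) {
            next = n->next;
            cb(n);
            n = next;
        }
    }

    static void print_cb(struct node *n) { printf("%d\n", n->val); }

    int main(void)
    {
        struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };

        for_each_elem_safe(&a, print_cb);
        return 0;
    }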
|
|
Revision tags: v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
4fa8d68a |
| 16-Mar-2025 |
Kumar Kartikeya Dwivedi <[email protected]> |
bpf: Convert hashtab.c to rqspinlock
Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary.
Closes: https://lore.kernel.org/bpf/[email protected] Closes: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
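A schematic sketch of what the conversion looks like at a bucket-lock call site (bucket and field names abridged; the names rqspinlock_t, raw_res_spin_lock_irqsave() and raw_res_spin_unlock_irqrestore() are assumed from the resilient queued spinlock series — the key difference from raw_spinlock_t being that the lock attempt can fail and must be error-checked):

    rqspinlock_t *lock = &b->raw_lock;   /* was: raw_spinlock_t */
    unsigned long flags;
    int ret;

    ret = raw_res_spin_lock_irqsave(lock, flags);
    if (ret)
        return ret;   /* deadlock/timeout detected by rqspinlock: bail out,
                       * no per-cpu map_locked counters needed any more */

    /* ... update the bucket ... */

    raw_res_spin_unlock_irqrestore(lock, flags);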
|
| #
bb2243f4 |
| 15-Mar-2025 |
Hou Tao <[email protected]> |
bpf: Check map->record at the beginning of check_and_free_fields()
When there are no special fields in the map value, there is no need to invoke bpf_obj_free_fields(). Therefore, check the validity of map->record in advance.
After the change, the benchmark result of the per-cpu update case in map_perf_test increased by 40% under a 16-CPU VM.
Signed-off-by: Hou Tao <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
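A minimal sketch of the early bail-out (the IS_ERR_OR_NULL() form of the test is an assumption; the point is simply that bpf_obj_free_fields() is skipped entirely when the map value carries no special fields):

    static void check_and_free_fields(struct bpf_htab *htab, struct htab_elem *elem)
    {
        /* No special fields (timer, workqueue, kptr, ...) in the value:
         * nothing to free, so return before touching per-cpu pointers. */
        if (IS_ERR_OR_NULL(htab->map.record))
            return;

        /* ... existing per-cpu / regular value freeing logic ... */
    }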
|
|
Revision tags: v6.14-rc6, v6.14-rc5 |
|
| #
11ba7ce0 |
| 24-Feb-2025 |
Yonghong Song <[email protected]> |
bpf: Fix kmemleak warning for percpu hashmap
Vlad Poenaru reported the following kmemleak issue:
unreferenced object 0x606fd7c44ac8 (size 32):
  backtrace (crc 0):
    pcpu_alloc_noprof+0x730/0xeb0
    bpf_map_alloc_percpu+0x69/0xc0
    prealloc_init+0x9d/0x1b0
    htab_map_alloc+0x363/0x510
    map_create+0x215/0x3a0
    __sys_bpf+0x16b/0x3e0
    __x64_sys_bpf+0x18/0x20
    do_syscall_64+0x7b/0x150
    entry_SYSCALL_64_after_hwframe+0x4b/0x53
Further investigation shows the reason is due to not 8-byte aligned store of percpu pointer in htab_elem_set_ptr(): *(void __percpu **)(l->key + key_size) = pptr;
Note that the whole htab_elem alignment is 8 (for x86_64). If the key_size is 4, that means pptr is stored in a location which is 4 byte aligned but not 8 byte aligned. In mm/kmemleak.c, scan_block() scans the memory based on 8 byte stride, so it won't detect above pptr, hence reporting the memory leak.
In htab_map_alloc(), we already have
    htab->elem_size = sizeof(struct htab_elem) +
                      round_up(htab->map.key_size, 8);
    if (percpu)
        htab->elem_size += sizeof(void *);
    else
        htab->elem_size += round_up(htab->map.value_size, 8);
So storing pptr with 8-byte alignment won't cause any problem and can fix kmemleak too.
The issue can be reproduced with bpf selftest as well:
  1. Enable CONFIG_DEBUG_KMEMLEAK config
  2. Add a getchar() before skel destroy in test_hash_map() in prog_tests/for_each.c. The purpose is to keep map available so kmemleak can be detected.
  3. run './test_progs -t for_each/hash_map &' and a kmemleak should be reported.
Reported-by: Vlad Poenaru <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
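A sketch of the aligned store and load (assuming the helpers keep their upstream shape): rounding key_size up to 8 keeps the percpu pointer on an 8-byte boundary, which kmemleak's 8-byte-stride scan can see, and the elem_size computation quoted above already reserves room for it.

    static void htab_elem_set_ptr(struct htab_elem *l, u32 key_size,
                                  void __percpu *pptr)
    {
        /* round_up(key_size, 8) => the pointer slot is 8-byte aligned */
        *(void __percpu **)(l->key + round_up(key_size, 8)) = pptr;
    }

    static void __percpu *htab_elem_get_ptr(struct htab_elem *l, u32 key_size)
    {
        return *(void __percpu **)(l->key + round_up(key_size, 8));
    }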
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13 |
|
| #
47363f15 |
| 17-Jan-2025 |
Hou Tao <[email protected]> |
bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
The freeing of special fields in map value may acquire a spin-lock (e.g., the freeing of bpf_timer), however, the lookup_and_delete_elem procedure has already held a raw-spin-lock, which violates the lockdep rule.
The running context of __htab_map_lookup_and_delete_elem() has already disabled the migration. Therefore, it is OK to invoke free_htab_elem() after unlocking the bucket lock.
Fix the potential problem by freeing element after unlocking bucket lock in __htab_map_lookup_and_delete_elem().
Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
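Schematically, the reordering looks like this (variable names abridged): the element is unlinked while the bucket lock is held, but free_htab_elem() — which may take further locks while freeing special fields such as bpf_timer — runs only after the bucket lock is dropped, with migration already disabled by the caller as noted above.

    hlist_nulls_del_rcu(&l->hash_node);          /* unlink under the bucket lock */
    htab_unlock_bucket(htab, b, hash, flags);

    /* Safe only here: freeing special fields may acquire other locks,
     * which is not allowed while the raw bucket lock is held. */
    free_htab_elem(htab, l);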
|
| #
588c6ead |
| 17-Jan-2025 |
Hou Tao <[email protected]> |
bpf: Bail out early in __htab_map_lookup_and_delete_elem()
Use a goto statement to bail out early when the target element is not found, instead of using a large else branch to handle the more likely case. This change doesn't affect functionality and simply makes the code cleaner.
Signed-off-by: Hou Tao <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
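The shape of the change, schematically (labels and error handling abridged): the unlikely "not found" case jumps straight to the unlock path, so the common case no longer sits inside a large else branch.

    l = lookup_elem_raw(head, hash, key, key_size);
    if (!l) {
        ret = -ENOENT;
        goto out_unlock;          /* bail out early on the unlikely path */
    }

    /* common case continues un-indented: copy the value, unlink, ... */

    out_unlock:
    htab_unlock_bucket(htab, b, hash, flags);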
|
| #
45dc92c3 |
| 17-Jan-2025 |
Hou Tao <[email protected]> |
bpf: Free special fields after unlock in htab_lru_map_delete_node()
When bpf_timer is used in LRU hash map, calling check_and_free_fields() in htab_lru_map_delete_node() will invoke bpf_timer_cancel_and_free() to free the bpf_timer. If the timer is running on other CPUs, hrtimer_cancel() will invoke hrtimer_cancel_wait_running() to spin on current CPU to wait for the completion of the hrtimer callback.
The deletion has already acquired a raw spin lock (the bucket lock). To reduce the time holding the bucket lock, move the invocation of check_and_free_fields() out of the bucket lock. However, because htab_lru_map_delete_node() is invoked with the LRU raw spin lock being held, the freeing of special fields still happens in a locked scope.
Signed-off-by: Hou Tao <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Revision tags: v6.13-rc7 |
|
| #
4b7e7cd1 |
| 08-Jan-2025 |
Hou Tao <[email protected]> |
bpf: Disable migration before calling ops->map_free()
The freeing of all map elements may invoke bpf_obj_free_fields() to free the special fields in the map value. Since these special fields may be allocated from bpf memory allocator, migrate_{disable|enable} pairs are necessary for the freeing of these special fields.
To simplify reasoning about when migrate_disable() is needed for the freeing of these special fields, let the caller guarantee that migration is disabled before invoking bpf_obj_free_fields(). Therefore, disable migration before calling ops->map_free() to simplify the freeing of map values or special fields allocated from the bpf memory allocator.
After disabling migration in bpf_map_free(), there is no need for additional migrate_{disable|enable} pairs in these ->map_free() callbacks. Remove these redundant invocations.
The migrate_{disable|enable} pairs in the underlying implementation of bpf_obj_free_fields() will be removed by the following patch.
Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
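A minimal sketch of the idea (surrounding teardown elided): migration is disabled once around the ->map_free() callback, so every bpf_obj_free_fields() call reached from it already runs with migration disabled and the per-callback pairs can be dropped.

    static void bpf_map_free(struct bpf_map *map)
    {
        /* The callback may free special fields allocated from the bpf
         * memory allocator; that requires migration to be disabled, so
         * guarantee it here once instead of in every ->map_free(). */
        migrate_disable();
        map->ops->map_free(map);
        migrate_enable();

        /* ... btf / btf_record teardown elided ... */
    }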
|
| #
53f2ba0b |
| 08-Jan-2025 |
Hou Tao <[email protected]> |
bpf: Remove migrate_{disable|enable} in htab_elem_free
htab_elem_free() has two call-sites: delete_all_elements(), which has already disabled migration, and free_htab_elem(), which is in turn invoked by four functions: __htab_map_lookup_and_delete_elem, __htab_map_lookup_and_delete_batch, htab_map_update_elem and htab_map_delete_elem.
The BPF syscall has already disabled migration before invoking the ->map_update_elem, ->map_delete_elem, and ->map_lookup_and_delete_elem callbacks for the hash map. __htab_map_lookup_and_delete_batch() also disables migration before invoking free_htab_elem(). ->map_update_elem() and ->map_delete_elem() of the hash map may be invoked by a BPF program, and the running context of a BPF program has already disabled migration. Therefore, it is safe to remove the migrate_{disable|enable} pair in htab_elem_free().
Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
ea5b2296 |
| 08-Jan-2025 |
Hou Tao <[email protected]> |
bpf: Remove migrate_{disable|enable} in ->map_for_each_callback
A BPF program may call bpf_for_each_map_elem(), which will call the ->map_for_each_callback callback of the related bpf map. Considering that the running context of a BPF program has already disabled migration, remove the unnecessary migrate_{disable|enable} pair in the implementations of ->map_for_each_callback. To ensure the guarantee will not be violated later, also add a cant_migrate() check in the implementations.
Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
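Schematically, the hash map's callback drops its own migrate_{disable|enable} pair and instead asserts the caller's guarantee (function body heavily abridged):

    static long bpf_for_each_hash_elem(struct bpf_map *map,
                                       bpf_callback_t callback_fn,
                                       void *callback_ctx, u64 flags)
    {
        /* BPF programs run with migration disabled, so the pair that used
         * to live here is redundant; assert the assumption instead. */
        cant_migrate();

        /* ... walk the buckets and invoke callback_fn on each element ... */
        return 0;
    }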
|
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7 |
|
| #
b9e9ed90 |
| 06-Nov-2024 |
Hou Tao <[email protected]> |
bpf: Call free_htab_elem() after htab_unlock_bucket()
For htab of maps, when the map is removed from the htab, it may hold the last reference of the map. bpf_map_fd_put_ptr() will invoke bpf_map_free_id() to free the id of the removed map element. However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock (spinlock_t), triggering the following lockdep warning:
=============================
[ BUG: Invalid wait context ]
6.11.0-rc4+ #49 Not tainted
-----------------------------
test_maps/4881 is trying to lock:
ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70
other info that might help us debug this:
context-{5:5}
2 locks held by test_maps/4881:
 #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270
 #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80
stack backtrace:
CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
Call Trace:
 <TASK>
 dump_stack_lvl+0x6e/0xb0
 dump_stack+0x10/0x20
 __lock_acquire+0x73e/0x36c0
 lock_acquire+0x182/0x450
 _raw_spin_lock_irqsave+0x43/0x70
 bpf_map_free_id.part.0+0x21/0x70
 bpf_map_put+0xcf/0x110
 bpf_map_fd_put_ptr+0x9a/0xb0
 free_htab_elem+0x69/0xe0
 htab_map_update_elem+0x50f/0xa80
 bpf_fd_htab_map_update_elem+0x131/0x270
 htab_map_update_elem+0x50f/0xa80
 bpf_fd_htab_map_update_elem+0x131/0x270
 bpf_map_update_value+0x266/0x380
 __sys_bpf+0x21bb/0x36b0
 __x64_sys_bpf+0x45/0x60
 x64_sys_call+0x1b2a/0x20d0
 do_syscall_64+0x5d/0x100
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
One way to fix the lockdep warning is using raw_spinlock_t for map_idr_lock as well. However, bpf_map_alloc_id() invokes idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a similar lockdep warning because the slab's lock (s->cpu_slab->lock) is still a spinlock.
Instead of changing map_idr_lock's type, fix the issue by invoking htab_put_fd_value() after htab_unlock_bucket(). However, only deferring the invocation of htab_put_fd_value() is not enough, because the old map pointers in an htab of maps can not be saved during batched deletion. Therefore, also defer the invocation of free_htab_elem(), so these to-be-freed elements can be linked together, similar to the lru map.
There are four callers for ->map_fd_put_ptr:
(1) alloc_htab_elem() (through htab_put_fd_value()) It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of htab_put_fd_value() can not simply be moved after htab_unlock_bucket(), because the old element has already been stashed in htab->extra_elems. It may be reused immediately after htab_unlock_bucket(), and invoking htab_put_fd_value() after htab_unlock_bucket() may release the newly-added element incorrectly. Therefore, save the map pointer of the old element for the htab of maps before unlocking the bucket, and release the map_ptr after unlock. Besides the map pointer in the old element, the same should be done for the special fields in the old element as well.
(2) free_htab_elem() (through htab_put_fd_value()) Its caller includes __htab_map_lookup_and_delete_elem(), htab_map_delete_elem() and __htab_map_lookup_and_delete_batch().
For htab_map_delete_elem(), simply invoke free_htab_elem() after htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just like the lru map, link the to-be-freed elements into a node_to_free list and invoke free_htab_elem() for these elements after unlock. It is safe to reuse batch_flink as the link for node_to_free, because these elements have been removed from the hash llist.
Because htab of maps doesn't support the lookup_and_delete operation, __htab_map_lookup_and_delete_elem() doesn't have the problem, so it is kept as is.
(3) fd_htab_map_free() It invokes ->map_fd_put_ptr without raw_spinlock_t.
(4) bpf_fd_htab_map_update_elem() It invokes ->map_fd_put_ptr without raw_spinlock_t.
After moving free_htab_elem() outside the htab bucket lock scope, use pcpu_freelist_push() instead of __pcpu_freelist_push() to disable IRQs before freeing elements, and protect the invocations of bpf_mem_cache_free() with a migrate_{disable|enable} pair.
Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]>
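A schematic of the deferred-free pattern for the batched path (not the literal diff; variable names follow the description above): elements are unlinked under the bucket lock and chained through batch_flink, and only after htab_unlock_bucket() are the map references and special fields released.

    struct htab_elem *node_to_free = NULL, *l;

    /* under htab_lock_bucket(): unlink and stash each victim element */
    hlist_nulls_del_rcu(&l->hash_node);
    l->batch_flink = node_to_free;
    node_to_free = l;

    htab_unlock_bucket(htab, b, batch, flags);

    /* outside the raw bucket lock: dropping the inner map reference may
     * reach bpf_map_free_id(), which takes the plain spinlock map_idr_lock */
    while (node_to_free) {
        l = node_to_free;
        node_to_free = node_to_free->batch_flink;
        free_htab_elem(htab, l);
    }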
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
1d244784 |
| 10-Sep-2024 |
Tao Chen <[email protected]> |
bpf: Check percpu map value size first
The percpu map is often used, but the map value size limit is often ignored, as in this issue: https://github.com/iovisor/bcc/issues/2519. Actually, the percpu map value size is bound by PCPU_MIN_UNIT_SIZE, so we can first check whether the value size exceeds PCPU_MIN_UNIT_SIZE, like the percpu map of local_storage does. The resulting error message is clearer than "cannot allocate memory".
Signed-off-by: Jinke Han <[email protected]> Signed-off-by: Tao Chen <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Jiri Olsa <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
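A sketch of the early check in the map-creation path (exact placement and error code are assumptions): an oversized per-cpu value is rejected with a clear error instead of failing later in the percpu allocator with "cannot allocate memory".

    /* during map-creation attribute checks (placement assumed): */
    if (percpu && round_up(attr->value_size, 8) > PCPU_MIN_UNIT_SIZE)
        return -E2BIG;    /* value too large for the percpu allocator */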
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3 |
|
| #
6d641ca5 |
| 11-Aug-2024 |
Uros Bizjak <[email protected]> |
bpf: Fix percpu address space issues
In arraymap.c:
In bpf_array_map_seq_start() and bpf_array_map_seq_next() cast return values from the __percpu address space to the generic address space via uintptr_t [1].
Correct the declaration of pptr pointer in __bpf_array_map_seq_show() to void __percpu * and cast the value from the generic address space to the __percpu address space via uintptr_t [1].
In hashtab.c:
Assign the return value from bpf_mem_cache_alloc() to void pointer and cast the value to void __percpu ** (void pointer to percpu void pointer) before dereferencing.
In memalloc.c:
Explicitly declare __percpu variables.
Cast obj to void __percpu **.
In helpers.c:
Cast ptr in BPF_CALL_1 and BPF_CALL_2 from generic address space to __percpu address space via const uintptr_t [1].
Found by GCC's named address space checks.
There were no changes in the resulting object files.
[1] https://sparse.docs.kernel.org/en/latest/annotations.html#address-space-name
Signed-off-by: Uros Bizjak <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Eduard Zingerman <[email protected]> Cc: Song Liu <[email protected]> Cc: Yonghong Song <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Stanislav Fomichev <[email protected]> Cc: Hao Luo <[email protected]> Cc: Jiri Olsa <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
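For the hashtab.c part, a sketch of the address-space-aware handling described above (allocation-path details simplified): sparse treats __percpu as a distinct address space, so the generic pointer returned by the allocator is only reinterpreted at the point of dereference.

    void *ptr;
    void __percpu *pptr;

    ptr = bpf_mem_cache_alloc(&htab->pcpu_ma);   /* generic address space */
    if (!ptr)
        return -ENOMEM;   /* allocation failure handling simplified */

    /* The allocation's first word holds the percpu pointer; cast the
     * generic pointer to "pointer to a __percpu pointer" only for the
     * dereference, keeping sparse's named address spaces consistent. */
    pptr = *(void __percpu **)ptr;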
|
|
Revision tags: v6.11-rc2, v6.11-rc1, v6.10 |
|
| #
df862de4 |
| 14-Jul-2024 |
Markus Elfring <[email protected]> |
bpf: Replace 8 seq_puts() calls by seq_putc() calls
Single line breaks are occasionally written into a seq_file sequence. Use the corresponding single-character function “seq_putc” for them.
This issue was transformed by using the Coccinelle software.
Signed-off-by: Markus Elfring <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
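Concretely, each of the eight sites changes along these lines (the seq_file handle name m is illustrative):

    seq_puts(m, "\n");    /* before: string variant for one character */
    seq_putc(m, '\n');    /* after: emit the single newline directly  */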
|
|
Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
a891711d |
| 30-Apr-2024 |
Benjamin Tissoires <[email protected]> |
bpf: Do not walk twice the hash map on free
If someone stores both a timer and a workqueue in a hash map, on free, we would walk it twice.
Add a check in htab_free_malloced_timers_or_wq and free the timers and workqueues if they are present.
Fixes: 246331e3f1ea ("bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps") Signed-off-by: Benjamin Tissoires <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
|
|
Revision tags: v6.9-rc6, v6.9-rc5 |
|
| #
246331e3 |
| 20-Apr-2024 |
Benjamin Tissoires <[email protected]> |
bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
Currently bpf_wq_cancel_and_free() is just a placeholder as there is no memory allocation for bpf_wq just yet.
Again, this duplicates the bpf_timer approach.
Signed-off-by: Benjamin Tissoires <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
a7de265c |
| 17-Apr-2024 |
Rafael Passos <[email protected]> |
bpf: Fix typos in comments
Found the following typos in comments, and fixed them:
s/unpriviledged/unprivileged/
s/reponsible/responsible/
s/possiblities/possibilities/
s/Divison/Division/
s/precsion/precision/
s/havea/have a/
s/reponsible/responsible/
s/responsibile/responsible/
s/tigher/tighter/
s/respecitve/respective/
Signed-off-by: Rafael Passos <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
|
|
Revision tags: v6.9-rc4, v6.9-rc3 |
|
| #
0b56e637 |
| 02-Apr-2024 |
Andrii Nakryiko <[email protected]> |
bpf: inline bpf_map_lookup_elem() helper for PERCPU_HASH map
Using the new per-CPU BPF instruction, partially inline the bpf_map_lookup_elem() helper for the per-CPU hashmap BPF map. Just like for the normal HASH map, we still generate a call into __htab_map_lookup_elem(), but after that we resolve the per-CPU element address using a new instruction, saving on extra function calls.
Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Revision tags: v6.9-rc2 |
|
| #
ee3bad03 |
| 27-Mar-2024 |
Yafang Shao <[email protected]> |
bpf: Mitigate latency spikes associated with freeing non-preallocated htab
Following the recent upgrade of one of our BPF programs, we encountered significant latency spikes affecting other applications running on the same host. After thorough investigation, we identified that these spikes were primarily caused by the prolonged duration required to free a non-preallocated htab with approximately 2 million keys.
Notably, our kernel configuration lacks the presence of CONFIG_PREEMPT. In scenarios where kernel execution extends excessively, other threads might be starved of CPU time, resulting in latency issues across the system. To mitigate this, we've adopted a proactive approach by incorporating cond_resched() calls within the kernel code. This ensures that during lengthy kernel operations, the scheduler is invoked periodically to provide opportunities for other threads to execute.
Signed-off-by: Yafang Shao <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
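A schematic of the mitigation (loop shape simplified): while walking every bucket of a large non-preallocated htab to free its elements, the code yields periodically so a !CONFIG_PREEMPT kernel does not starve other runnable threads.

    for (i = 0; i < htab->n_buckets; i++) {
        /* ... unlink and free every element hashed into bucket i ... */

        cond_resched();   /* give the scheduler a chance between buckets */
    }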
|
|
Revision tags: v6.9-rc1, v6.8 |
|
| #
6787d916 |
| 07-Mar-2024 |
Toke Høiland-Jørgensen <[email protected]> |
bpf: Fix hashtab overflow check on 32-bit arches
The hashtab code relies on roundup_pow_of_two() to compute the number of hash buckets, and contains an overflow check by checking if the resulting value is 0. However, on 32-bit arches, the roundup code itself can overflow by doing a 32-bit left-shift of an unsigned long value, which is undefined behaviour, so it is not guaranteed to truncate neatly. This was triggered by syzbot on the DEVMAP_HASH type, which contains the same check, copied from the hashtab code. So apply the same fix to hashtab, by moving the overflow check to before the roundup.
Fixes: daaf427c6ab3 ("bpf: fix arraymap NULL deref and missing overflow and zero size checks") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
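A self-contained userspace illustration of the ordering problem (not the kernel patch itself; the bound and helper are simplified): rounding up first can left-shift past the width of a 32-bit type, which is undefined behaviour, so the "result == 0" test after the fact is unreliable — checking the bound before rounding, as the fix does, avoids that.

    #include <stdio.h>

    /* Reject before the round-up can overflow a 32-bit count. */
    static int n_buckets_from_max_entries(unsigned int max_entries,
                                          unsigned int *n_buckets)
    {
        if (max_entries > (1U << 31))
            return -1;                /* would overflow the round-up below */

        unsigned int n = 1;
        while (n < max_entries)       /* portable round-up to a power of two */
            n <<= 1;
        *n_buckets = n;
        return 0;
    }

    int main(void)
    {
        unsigned int n;

        printf("%d\n", n_buckets_from_max_entries(3000000u, &n));    /* 0, n = 4194304 */
        printf("%d\n", n_buckets_from_max_entries(0xC0000001u, &n)); /* -1: rejected   */
        return 0;
    }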
|
|
Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
1e2f2d31 |
| 15-Dec-2023 |
Kent Overstreet <[email protected]> |
Kill sched.h dependency on rcupdate.h
By moving cond_resched_rcu() to rcupdate_wait.h, we can kill another big sched.h dependency.
Signed-off-by: Kent Overstreet <[email protected]>
|
| #
8f82583f |
| 14-Dec-2023 |
Hou Tao <[email protected]> |
bpf: Reduce the scope of rcu_read_lock when updating fd map
There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or ops->map_fd_put_ptr(), so don't use rcu_read_lock() for these two callbacks.
For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need rcu-read-lock because array->ptrs must still be allocated. For bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use rcu_read_lock() during the invocation of htab_map_update_elem().
Acked-by: Yonghong Song <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
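Schematically, bpf_fd_htab_map_update_elem() now looks like this (error handling abridged): the fd-to-pointer translation runs outside any RCU read-side section, and rcu_read_lock() wraps only the htab update, which needs it merely to satisfy its internal WARN_ON_ONCE() check.

    ptr = map->ops->map_fd_get_ptr(map, map_file, ufd);  /* no rcu_read_lock() needed */
    if (IS_ERR(ptr))
        return PTR_ERR(ptr);

    rcu_read_lock();
    ret = htab_map_update_elem(map, key, &ptr, map_flags);
    rcu_read_unlock();

    if (ret)
        map->ops->map_fd_put_ptr(map, ptr, false);        /* also lock-free */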
|
|
Revision tags: v6.7-rc5 |
|
| #
20c20bd1 |
| 04-Dec-2023 |
Hou Tao <[email protected]> |
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
map is the pointer of outer map, and need_defer needs some explanation. need_defer tells the implementation to defer the reference release of the passed element and ensure that the element is still alive before the bpf program, which may manipulate it, exits.
The following three cases will invoke map_fd_put_ptr() and different need_defer values will be passed to these callers:
1) release the reference of the old element in the map during map update or map deletion. The release must be deferred, otherwise the bpf program may incur a use-after-free problem, so need_defer needs to be true.
2) release the reference of the to-be-added element in the error path of map update. The to-be-added element is not visible to any bpf program, so it is OK to pass false for the need_defer parameter.
3) release the references of all elements in the map during map release. Any bpf program which has access to the map must have been exited and released, so need_defer=false will be OK.
These two parameters will be used by the following patches to fix the potential use-after-free problem for map-in-map.
Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
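A sketch of the extended callback and representative call sites (call-site details simplified; the cases refer to the list above):

    /* in struct bpf_map_ops */
    void (*map_fd_put_ptr)(struct bpf_map *map, void *ptr, bool need_defer);

    /* case 1): dropping the old element during update/deletion -- a running
     * BPF program may still reach it, so the release must be deferred */
    map->ops->map_fd_put_ptr(map, old_ptr, true);

    /* cases 2) and 3): update error path, or whole-map release -- nothing
     * can still observe the pointer, an immediate release is fine */
    map->ops->map_fd_put_ptr(map, ptr, false);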
|
|
Revision tags: v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6 |
|
| #
d35381aa |
| 12-Oct-2023 |
Song Liu <[email protected]> |
bpf: Fix unnecessary -EBUSY from htab_lock_bucket
htab_lock_bucket uses the following logic to avoid recursion:
1. preempt_disable();
2. check percpu counter htab->map_locked[hash] for recursion;
   2.1. if map_locked[hash] is already taken, return -EBUSY;
3. raw_spin_lock_irqsave();
However, if an IRQ hits between 2 and 3, BPF programs attached to the IRQ logic will not be able to access the same hash of the hashtab and will get -EBUSY.
This -EBUSY is not really necessary. Fix it by disabling IRQ before checking map_locked:
1. preempt_disable();
2. local_irq_save();
3. check percpu counter htab->map_locked[hash] for recursion;
   3.1. if map_locked[hash] is already taken, return -EBUSY;
4. raw_spin_lock().
Similarly, use raw_spin_unlock() and local_irq_restore() in htab_unlock_bucket().
Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked") Suggested-by: Tejun Heo <[email protected]> Signed-off-by: Song Liu <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
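The fixed ordering, schematically (abridged from the description above; field names such as map_locked and raw_lock follow the upstream code): interrupts are disabled before the recursion check, so an IRQ can no longer slip in between bumping map_locked and taking the raw spinlock.

    static inline int htab_lock_bucket(const struct bpf_htab *htab,
                                       struct bucket *b, u32 hash,
                                       unsigned long *pflags)
    {
        unsigned long flags;

        hash = hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);

        preempt_disable();
        local_irq_save(flags);                 /* moved before the check */
        if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
            __this_cpu_dec(*(htab->map_locked[hash]));
            local_irq_restore(flags);
            preempt_enable();
            return -EBUSY;                     /* genuine recursion only */
        }

        raw_spin_lock(&b->raw_lock);           /* IRQs are already off */
        *pflags = flags;
        return 0;
    }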
|
|
Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1 |
|
| #
9bc421b6 |
| 06-Jul-2023 |
Anton Protopopov <[email protected]> |
bpf: populate the per-cpu insertions/deletions counters for hashmaps
Initialize and utilize the per-cpu insertions/deletions counters for hash-based maps. Non-trivial changes only apply to the preallocated maps, for which the {inc,dec}_elem_count functions are not called, as there's no need to count elements to sustain proper map operations.
To increase/decrease percpu counters for preallocated maps we add raw calls to the bpf_map_{inc,dec}_elem_count functions so that the impact is minimal. For dynamically allocated maps we add corresponding calls to the existing {inc,dec}_elem_count functions.
Signed-off-by: Anton Protopopov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
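Schematically, for the preallocated case the raw helpers are called where elements leave and re-enter the per-map pool (placement per the description; surrounding code elided):

    /* element taken from the preallocated pool on update */
    bpf_map_inc_elem_count(&htab->map);

    /* element returned to the pool on delete */
    bpf_map_dec_elem_count(&htab->map);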
|