|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
c03bb2fa |
| 22-Mar-2025 |
Kohei Enju <[email protected]> |
bpf: Fix out-of-bounds read in check_atomic_load/store()
syzbot reported the following splat [0].
In check_atomic_load/store(), register validity is not checked before atomic_ptr_type_ok(). This causes the out-of-bounds read in is_ctx_reg() called from atomic_ptr_type_ok() when the register number is MAX_BPF_REG or greater.
Call check_load_mem()/check_store_reg() before atomic_ptr_type_ok() to avoid the OOB read.
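Schematically, the fixed ordering looks like this (a sketch based on this description, with the check_load_mem() arguments abridged; the store path mirrors it via check_store_reg()):

  static int check_atomic_load(struct bpf_verifier_env *env,
                               struct bpf_insn *insn)
  {
      int err;

      /* Validates insn->src_reg/dst_reg (and the load itself) first,
       * so atomic_ptr_type_ok() can no longer see an out-of-range
       * register number.
       */
      err = check_load_mem(env, insn, true, false, false, "atomic_load");
      if (err)
          return err;

      if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) {
          verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n",
                  insn->src_reg,
                  reg_type_str(env, reg_state(env, insn->src_reg)->type));
          return -EACCES;
      }
      return 0;
  }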
However, some tests introduced by commit ff3afe5da998 ("selftests/bpf: Add selftests for load-acquire and store-release instructions") assume that atomic_ptr_type_ok() is called before register validity is checked. Therefore, swapping the order unintentionally changes the verifier messages of these tests.
For example, in the test load_acquire_from_pkt_pointer() the expected message is 'BPF_ATOMIC loads from R2 pkt is not allowed', but the actual message differs:
validate_msgs:FAIL:754 expect_msg
VERIFIER LOG:
=============
Global function load_acquire_from_pkt_pointer() doesn't return scalar. Only those are supported.
0: R1=ctx() R10=fp0
; asm volatile ( @ verifier_load_acquire.c:140
0: (61) r2 = *(u32 *)(r1 +0)       ; R1=ctx() R2_w=pkt(r=0)
1: (d3) r0 = load_acquire((u8 *)(r2 +0))
invalid access to packet, off=0 size=1, R2(id=0,off=0,r=0)
R2 offset is outside of the packet
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
=============
EXPECTED SUBSTR: 'BPF_ATOMIC loads from R2 pkt is not allowed'
#505/19 verifier_load_acquire/load-acquire from pkt pointer:FAIL
This is because the instructions in the test don't pass check_load_mem() and therefore never reach the atomic_ptr_type_ok() path. In this case, we have to modify the instructions so that they pass check_load_mem() and trigger atomic_ptr_type_ok(). Similarly for the store-release tests, we need to modify instructions so that they pass check_store_reg().
Like load_acquire_from_pkt_pointer(), modify instructions in:
  load_acquire_from_sock_pointer()
  store_release_to_ctx_pointer()
  store_release_to_pkt_pointer()
Also, in store_release_to_sock_pointer(), check_store_reg() returns an error early and atomic_ptr_type_ok() is not triggered, since writing to a sock pointer is not possible in general. We might be able to remove the test, but for now let's leave it and just change the expected message.
[0] BUG: KASAN: slab-out-of-bounds in is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
BUG: KASAN: slab-out-of-bounds in atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
Read of size 4 at addr ffff888141b0d690 by task syz-executor143/5842
CPU: 1 UID: 0 PID: 5842 Comm: syz-executor143 Not tainted 6.14.0-rc3-syzkaller-gf28214603dc6 #0
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0x16e/0x5b0 mm/kasan/report.c:521
 kasan_report+0x143/0x180 mm/kasan/report.c:634
 is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
 atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
 check_atomic_store kernel/bpf/verifier.c:7804 [inline]
 check_atomic kernel/bpf/verifier.c:7841 [inline]
 do_check+0x89dd/0xedd0 kernel/bpf/verifier.c:19334
 do_check_common+0x1678/0x2080 kernel/bpf/verifier.c:22600
 do_check_main kernel/bpf/verifier.c:22691 [inline]
 bpf_check+0x165c8/0x1cca0 kernel/bpf/verifier.c:23821
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c
Tested-by: [email protected]
Fixes: e24bbad29a8d ("bpf: Introduce load-acquire and store-release instructions")
Fixes: ff3afe5da998 ("selftests/bpf: Add selftests for load-acquire and store-release instructions")
Signed-off-by: Kohei Enju <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
51d65049 |
| 19-Mar-2025 |
Juntong Deng <[email protected]> |
bpf: Add struct_ops context information to struct bpf_prog_aux
This patch adds struct_ops context information to struct bpf_prog_aux.
This context information will be used in the kfunc filter.
Currently the added context information includes struct_ops member offset and a pointer to struct bpf_struct_ops.
Signed-off-by: Juntong Deng <[email protected]>
Signed-off-by: Amery Hung <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Link: https://patch.msgid.link/[email protected]
|
|
Revision tags: v6.14-rc7 |
|
| #
ea21771c |
| 16-Mar-2025 |
Kumar Kartikeya Dwivedi <[email protected]> |
bpf: Maintain FIFO property for rqspinlock unlock
Since out-of-order unlocks are unsupported for rqspinlock, and irqsave variants enforce strict FIFO ordering anyway, make the same change for normal non-irqsave variants, such that FIFO ordering is enforced.
Two new verifier state fields (active_lock_id, active_lock_ptr) are used to denote the top of the stack, and prev_id and prev_ptr are ascertained whenever popping the topmost entry through an unlock.
Take special care to make these fields part of the state comparison in refsafe.
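For illustration, the enforced FIFO (stack) discipline at the program level; a sketch assuming the rqspinlock kfuncs from this series, with return-value checks elided:

  bpf_res_spin_lock(&lock_a);
  bpf_res_spin_lock(&lock_b);
  bpf_res_spin_unlock(&lock_b);  /* ok: topmost entry of the lock stack */
  bpf_res_spin_unlock(&lock_a);
  /* unlocking lock_a first would now be rejected, as it already is for
   * the irqsave variants */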
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
0de20461 |
| 16-Mar-2025 |
Kumar Kartikeya Dwivedi <[email protected]> |
bpf: Implement verifier support for rqspinlock
Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing the bpf_res_spin_lock type to be defined in map values and allocated objects, so the BTF side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate it.
An object cannot have both bpf_spin_lock and bpf_res_spin_lock; only one of them (and at most one per object, as before) may be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list.
The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction.
When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases.
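A usage sketch of that contract (kfunc names per this series; the lock placement in a map value is illustrative):

  struct value {
      struct bpf_res_spin_lock lock;
      int counter;
  };

  /* val points into a map value or allocated object */
  static int bump(struct value *val)
  {
      if (bpf_res_spin_lock(&val->lock))
          /* failure: ret in [-MAX_ERRNO, -1], lock NOT held; the
           * verifier never mixes this path with the held-lock path */
          return 0;
      val->counter++;    /* success: ret == 0, lock held */
      bpf_res_spin_unlock(&val->lock);
      return 0;
  }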
We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save.
With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
a2598045 |
| 18-Mar-2025 |
Andrea Terzolo <[email protected]> |
bpf: clarify a misleading verifier error message
The current verifier error message states that tail_calls are not allowed in non-JITed programs with BPF-to-BPF calls. While this is accurate, it is not the only scenario where this restriction applies. Some architectures do not support this feature combination even when programs are JITed. This update improves the error message to better reflect these limitations.
Suggested-by: Shung-Hsi Yu <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
Acked-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
cfe816d4 |
| 18-Mar-2025 |
Yafang Shao <[email protected]> |
bpf: Reject attaching fexit/fmod_ret to __noreturn functions
If we attach fexit/fmod_ret to __noreturn functions, it will cause an issue that the bpf trampoline image will be left over even if the bpf link has been destroyed. Take attaching do_exit() with fexit for example. The fexit works as follows,
bpf_trampoline
+ __bpf_tramp_enter
  + percpu_ref_get(&tr->pcref);
+ call do_exit()
+ __bpf_tramp_exit
  + percpu_ref_put(&tr->pcref);
Since do_exit() never returns, the refcnt of the trampoline image is never decremented, preventing it from being freed. That can be verified as follows:
$ bpftool link show    <<<< nothing output
$ grep "bpf_trampoline_[0-9]" /proc/kallsyms
ffffffffc04cb000 t bpf_trampoline_6442526459  [bpf]  <<<< leftover
In this patch, all functions annotated with __noreturn are rejected, except for the following cases:
- Functions that result in a system reboot, such as panic, machine_real_restart and rust_begin_unwind
- Functions that are never executed by tasks, such as rest_init and cpu_startup_entry
- Functions implemented in assembly, such as rewind_stack_and_make_dead and xen_cpu_bringup_again, lack an associated BTF ID.
With this change, attaching fexit probes to functions like do_exit() will be rejected.
$ ./fexit
libbpf: prog 'fexit': BPF program load failed: -EINVAL
libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG --
Attaching fexit/fmod_ret to __noreturn functions is rejected.
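The shape of the rejection, sketched with the kernel's BTF ID set machinery; the set member and exact call site are illustrative, only do_exit is named above:

  BTF_SET_START(noreturn_deny)
  BTF_ID(func, do_exit)
  BTF_SET_END(noreturn_deny)

  /* in the attach-target check, for fexit/fmod_ret programs: */
  if (btf_id_set_contains(&noreturn_deny, btf_id)) {
      bpf_log(log,
              "Attaching fexit/fmod_ret to __noreturn functions is rejected.\n");
      return -EINVAL;
  }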
Signed-off-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Revision tags: v6.14-rc6 |
|
| #
871ef8d5 |
| 05-Mar-2025 |
Eduard Zingerman <[email protected]> |
bpf: correct use/def for may_goto instruction
The may_goto instruction does not use any registers, but in compute_insn_live_regs() it was treated as a regular conditional jump of kind BPF_K with r0 as the source register, thus unnecessarily marking r0 as used.
Fixes: 14c8552db644 ("bpf: simple DFA-based live registers analysis")
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
0fb3cf61 |
| 04-Mar-2025 |
Eduard Zingerman <[email protected]> |
bpf: use register liveness information for func_states_equal
Liveness analysis DFA computes a set of registers live before each instruction. Leverage this information to skip comparison of dead registers in func_states_equal(). This helps with convergence of iterator processing loops, as bpf_reg_state->live marks can't be used when loops are processed.
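A sketch of the skip logic, assuming a per-instruction bitmask of live registers (illustrative names; regsafe() stands in for the verifier's register comparison):

  /* live_regs: registers live before the current insn, bit i <=> ri */
  static bool regs_equal_live(u16 live_regs,
                              struct bpf_reg_state *old,
                              struct bpf_reg_state *cur)
  {
      int i;

      for (i = 0; i < MAX_BPF_REG; i++) {
          if (!(live_regs & (1u << i)))
              continue;  /* dead before this insn: skip comparison */
          if (!regsafe(&old[i], &cur[i]))
              return false;
      }
      return true;
  }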
This has a certain performance impact for selftests; here is a veristat listing using `-f "insns_pct>5" -f "!insns<200"`:
selftests:
File                 Program                       States (A) States (B) States (DIFF)
-------------------- ----------------------------- ---------- ---------- --------------
arena_htab.bpf.o     arena_htab_llvm                       37         35    -2 (-5.41%)
arena_htab_asm.bpf.o arena_htab_asm                        37         33   -4 (-10.81%)
arena_list.bpf.o     arena_list_add                        37         22  -15 (-40.54%)
dynptr_success.bpf.o test_dynptr_copy                      22         16   -6 (-27.27%)
dynptr_success.bpf.o test_dynptr_copy_xdp                  68         58  -10 (-14.71%)
iters.bpf.o          checkpoint_states_deletion           918         40 -878 (-95.64%)
iters.bpf.o          clean_live_states                    136         66  -70 (-51.47%)
iters.bpf.o          iter_nested_deeply_iters              43         37   -6 (-13.95%)
iters.bpf.o          iter_nested_iters                     72         62  -10 (-13.89%)
iters.bpf.o          iter_pass_iter_ptr_to_subprog         30         26   -4 (-13.33%)
iters.bpf.o          iter_subprog_iters                    68         59   -9 (-13.24%)
iters.bpf.o          loop_state_deps2                      35         32    -3 (-8.57%)
iters_css.bpf.o      iter_css_for_each                     32         29    -3 (-9.38%)
pyperf600_iter.bpf.o on_event                             286        192  -94 (-32.87%)
Total progs: 3578
Old success: 2061
New success: 2061
States diff min: -95.64%
States diff max:   0.00%
-100 .. -90 %: 1
-55  .. -45 %: 3
-45  .. -35 %: 2
-35  .. -25 %: 5
-20  .. -10 %: 12
-10  .. 0   %: 6
sched_ext:
File              Program                States (A) States (B) States (DIFF)
----------------- ---------------------- ---------- ---------- ---------------
bpf.bpf.o         lavd_dispatch                8950       7065 -1885 (-21.06%)
bpf.bpf.o         lavd_init                     516        480    -36 (-6.98%)
bpf.bpf.o         layered_dispatch              662        501  -161 (-24.32%)
bpf.bpf.o         layered_dump                  298        237   -61 (-20.47%)
bpf.bpf.o         layered_init                  523        423  -100 (-19.12%)
bpf.bpf.o         layered_init_task              24         22     -2 (-8.33%)
bpf.bpf.o         layered_runnable              151        125   -26 (-17.22%)
bpf.bpf.o         p2dq_dispatch                  66         53   -13 (-19.70%)
bpf.bpf.o         p2dq_init                     170        142   -28 (-16.47%)
bpf.bpf.o         refresh_layer_cpumasks        120         78   -42 (-35.00%)
bpf.bpf.o         rustland_init                  37         34     -3 (-8.11%)
bpf.bpf.o         rustland_init                  37         34     -3 (-8.11%)
bpf.bpf.o         rusty_select_cpu              125        108   -17 (-13.60%)
scx_central.bpf.o central_dispatch               59         43   -16 (-27.12%)
scx_central.bpf.o central_init                   39         28   -11 (-28.21%)
scx_nest.bpf.o    nest_init                      58         51    -7 (-12.07%)
scx_pair.bpf.o    pair_dispatch                 142        111   -31 (-21.83%)
scx_qmap.bpf.o    qmap_dispatch                 174        141   -33 (-18.97%)
scx_qmap.bpf.o    qmap_init                     768        654  -114 (-14.84%)
Total progs: 216
Old success: 186
New success: 186
States diff min: -35.00%
States diff max:   0.00%
-35 .. -25 %: 3
-25 .. -20 %: 6
-20 .. -15 %: 6
-15 .. -5  %: 7
-5  .. 0   %: 6
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
14c8552d |
| 04-Mar-2025 |
Eduard Zingerman <[email protected]> |
bpf: simple DFA-based live registers analysis
Compute may-live registers before each instruction in the program. The register is live before the instruction I if it is read by I or some instruction S following I during program execution and is not overwritten between I and S.
This information would be used in the next patch as a hint in func_states_equal().
Use a simple algorithm described in [1] to compute this information:
- define the following:
  - I.use : a set of all registers read by instruction I;
  - I.def : a set of all registers written by instruction I;
  - I.in  : a set of all registers that may be alive before I execution;
  - I.out : a set of all registers that may be alive after I execution;
  - I.successors : a set of instructions S that might immediately follow I for some program execution;
- associate separate empty sets 'I.in' and 'I.out' with each instruction;
- visit each instruction in a postorder and update corresponding 'I.in' and 'I.out' sets as follows:
  I.out = U [S.in for S in I.successors]
  I.in  = (I.out / I.def) U I.use

  (where U stands for set union, / stands for set difference)
- repeat the computation while I.{in,out} changes for any instruction.
On the implementation side, keep things as simple as possible:
- check_cfg() already marks instructions EXPLORED in post-order, modify it to save the index of each EXPLORED instruction in a vector;
- represent I.{in,out,use,def} as bitmasks;
- don't split the program into basic blocks and don't maintain the work queue, instead:
  - do fixed-point computation by visiting each instruction;
  - maintain a simple 'changed' flag if I.{in,out} for any instruction change.
Measurements show that even such a simplistic implementation does not add measurable verification time overhead (for selftests, at least).
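A self-contained sketch of this fixed-point computation under the above simplifications (illustrative names, not the verifier's exact code):

  #include <stdbool.h>

  typedef unsigned short regmask;  /* bit i <=> register ri */

  struct insn_info {
      regmask use, def, in, out;
      int succ[2];   /* successor instruction indices */
      int nsucc;
  };

  static void compute_live_regs(struct insn_info *insns,
                                const int *postorder, int insn_cnt)
  {
      bool changed = true;

      while (changed) {
          changed = false;
          for (int p = 0; p < insn_cnt; p++) {
              struct insn_info *I = &insns[postorder[p]];
              regmask out = 0, in;

              /* I.out = U [S.in for S in I.successors] */
              for (int s = 0; s < I->nsucc; s++)
                  out |= insns[I->succ[s]].in;
              /* I.in = (I.out / I.def) U I.use */
              in = (regmask)((out & ~I->def) | I->use);
              changed |= in != I->in || out != I->out;
              I->in = in;
              I->out = out;
          }
      }
  }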
Note on the check_cfg() ex_insn_beg/ex_done change: to avoid out-of-bounds access to the env->cfg.insn_postorder array, it must be guaranteed that an instruction transitions to the EXPLORED state only once. Previously this was not the case for incorrect programs with direct calls to exception callbacks.
The 'align' selftest needs adjustment to skip computed insn/live registers printout. Otherwise it matches lines from the live registers printout.
[1] https://en.wikipedia.org/wiki/Live-variable_analysis
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
22f83974 |
| 04-Mar-2025 |
Eduard Zingerman <[email protected]> |
bpf: get_call_summary() utility function
Refactor mark_fastcall_pattern_for_call() to extract a utility function get_call_summary(). For a helper or kfunc call this function fills the following information: {num_params, is_void, fastcall}.
This function will be used in the next patch in order to get the number of parameters of a helper or kfunc call.
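The summary can be pictured as follows (a sketch; field names follow the description above, the signature is assumed):

  struct call_summary {
      u8 num_params;   /* number of parameters of the helper/kfunc */
      bool is_void;    /* callee has no return value */
      bool fastcall;   /* eligible for the fastcall pattern */
  };

  /* fills *cs for a BPF_CALL insn; returns false if the callee is unknown */
  static bool get_call_summary(struct bpf_verifier_env *env,
                               struct bpf_insn *call,
                               struct call_summary *cs);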
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
80ca3f1d |
| 04-Mar-2025 |
Eduard Zingerman <[email protected]> |
bpf: jmp_offset() and verbose_insn() utility functions
Extract two utility functions:
- One BPF jump instruction uses the .imm field to encode its jump offset, while the rest use .off. Encapsulate this detail as a jmp_offset() function.
- Avoid duplicating instruction printing callback definitions by defining a verbose_insn() function, which disassembles an instruction into the verifier log while hiding this detail.
These functions will be used in the next patch.
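For reference, a sketch of jmp_offset(); gotol (BPF_JMP32 | BPF_JA) is the jump that encodes its offset in .imm:

  static int jmp_offset(struct bpf_insn *insn)
  {
      u8 code = insn->code;

      if (code == (BPF_JMP32 | BPF_JA))
          return insn->imm;
      return insn->off;
  }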
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
88044230 |
| 04-Mar-2025 |
Peilin Ye <[email protected]> |
bpf: Introduce load-acquire and store-release instructions
Introduce BPF instructions with load-acquire and store-release semantics, as discussed in [1]. Define 2 new flags:
  #define BPF_LOAD_ACQ   0x100
  #define BPF_STORE_REL  0x110
A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_LOAD_ACQ (0x100).
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_STORE_REL (0x110).
Unlike existing atomic read-modify-write operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). As an exception, however, 64-bit load-acquires/store-releases are not supported on 32-bit architectures (to fix a build error reported by the kernel test robot).
An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDARH and friends.
Similar to existing atomic read-modify-write operations, misaligned load-acquires/store-releases are not allowed (even if BPF_F_ANY_ALIGNMENT is set).
As an example, consider the following 64-bit load-acquire BPF instruction (assuming little-endian):
db 10 00 00 00 01 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000100): BPF_LOAD_ACQ
Similarly, a 16-bit BPF store-release:
cb 21 00 00 10 01 00 00 store_release((u16 *)(r1 + 0x0), w2)
  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000110): BPF_STORE_REL
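A small userspace sketch decoding the two example opcodes with the UAPI macros from <linux/bpf.h> (the imm values appear as raw constants in the comments in case installed headers predate BPF_LOAD_ACQ/BPF_STORE_REL):

  #include <assert.h>
  #include <linux/bpf.h>

  int main(void)
  {
      unsigned char ld = 0xdb, st = 0xcb;  /* opcodes from the examples */

      /* 0xdb = BPF_ATOMIC | BPF_DW | BPF_STX: 64-bit atomic STX-class insn;
       * imm = 0x100 (BPF_LOAD_ACQ) selects load-acquire semantics. */
      assert(BPF_CLASS(ld) == BPF_STX && BPF_MODE(ld) == BPF_ATOMIC &&
             BPF_SIZE(ld) == BPF_DW);

      /* 0xcb = BPF_ATOMIC | BPF_H | BPF_STX: 16-bit variant;
       * imm = 0x110 (BPF_STORE_REL) selects store-release semantics. */
      assert(BPF_CLASS(st) == BPF_STX && BPF_MODE(st) == BPF_ATOMIC &&
             BPF_SIZE(st) == BPF_H);
      return 0;
  }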
In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new instructions, until the corresponding JIT compiler supports them in arena.
[1] https://lore.kernel.org/all/[email protected]/
Acked-by: Eduard Zingerman <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Peilin Ye <[email protected]>
Link: https://lore.kernel.org/r/a217f46f0e445fbd573a1a024be5c6bf1d5fe716.1741049567.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
e723608b |
| 04-Mar-2025 |
Kumar Kartikeya Dwivedi <[email protected]> |
bpf: Add verifier support for timed may_goto
Implement verifier support for replacing the may_goto implementation, moving from a counter-based approach to one that samples time on the local CPU to allow a bigger loop bound.
We implement it by maintaining 16 bytes per stack frame: 8 bytes for the count that amortizes time sampling, and 8 bytes for the starting timestamp. To minimize overhead, we need to avoid spilling and filling of registers around this sequence, so we push this cost into the time sampling function 'arch_bpf_timed_may_goto'. This is a JIT-specific wrapper around bpf_check_timed_may_goto which returns the count to store into the stack through BPF_REG_AX. All caller-saved registers (r0-r5) are guaranteed to remain untouched.
The loop can be broken by returning count as 0, otherwise we dispatch into the function when the count drops to 0, and the runtime chooses to refresh it (by returning count as BPF_MAX_TIMED_LOOPS) or returning 0 and aborting the loop on next iteration.
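A sketch of that refresh decision (layout and constants per this message: 16 bytes per frame for {count, timestamp}, a 250 ms budget; not the exact kernel code):

  struct bpf_timed_may_goto {
      u64 count;
      u64 timestamp;
  };

  u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
  {
      u64 time = ktime_get_mono_fast_ns();

      /* First call in this frame: start the clock, refresh the count. */
      if (!p->timestamp) {
          p->timestamp = time;
          return BPF_MAX_TIMED_LOOPS;
      }
      /* Time slice exhausted: return 0 so the cond_break fires. */
      if (time - p->timestamp >= NSEC_PER_SEC / 4)
          return 0;
      /* Otherwise refresh the count and keep looping. */
      return BPF_MAX_TIMED_LOOPS;
  }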
Since the check for 0 is done right after loading the count from the stack, all subsequent cond_break sequences should immediately break as well, of the same loop or subsequent loops in the program.
We pass in the stack_depth of the count (and thus the timestamp, by adding 8 to it) to the arch_bpf_timed_may_goto call so that it can be passed in to bpf_check_timed_may_goto as an argument after r1 is saved, by adding the offset to r10/fp. This adjustment will be arch specific, and the next patch will introduce support for x86.
Note that depending on loop complexity, time spent in the loop can be more than the current limit (250 ms), but imposing an upper bound on program runtime is an orthogonal problem which will be addressed when program cancellations are supported.
The current time afforded by cond_break may not be enough for cases where BPF programs want to implement locking algorithms inline, and use cond_break as a promise to the verifier that they will eventually terminate.
Below are some benchmarking numbers on the time taken per-iteration for an empty loop that counts the number of iterations until cond_break fires. For comparison, we compare it against bpf_for/bpf_repeat which is another way to achieve the same number of spins (BPF_MAX_LOOPS). The hardware used for benchmarking was a Sapphire Rapids Intel server with performance governor enabled, mitigations were enabled.
+-----------------------------+--------------+--------------+------------------+
| Loop type                   | Iterations   | Time (ms)    | Time/iter (ns)   |
+-----------------------------+--------------+--------------+------------------+
| may_goto                    | 8388608      | 3            | 0.36             |
| timed_may_goto (count=65535)| 589674932    | 250          | 0.42             |
| bpf_for                     | 8388608      | 10           | 1.19             |
+-----------------------------+--------------+--------------+------------------+
This gives a good approximation at low overhead while staying close to the current implementation.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
a752ba43 |
| 03-Mar-2025 |
Peilin Ye <[email protected]> |
bpf: Factor out check_load_mem() and check_store_reg()
Extract BPF_LDX and most non-ATOMIC BPF_STX instruction handling logic in do_check() into helper functions to be used later. While we are here, make that comment about "reserved fields" more specific.
Suggested-by: Eduard Zingerman <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Peilin Ye <[email protected]>
Link: https://lore.kernel.org/r/8b39c94eac2bb7389ff12392ca666f939124ec4f.1740978603.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
2626ffe9 |
| 03-Mar-2025 |
Peilin Ye <[email protected]> |
bpf: Factor out check_atomic_rmw()
Currently, check_atomic() only handles atomic read-modify-write (RMW) instructions. Since we are planning to introduce other types of atomic instructions (i.e., atomic load/store), extract the existing RMW handling logic into its own function named check_atomic_rmw().
Remove the @insn_idx parameter as it is not really necessary. Use 'env->insn_idx' instead, as in other places in verifier.c.
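After the refactor, check_atomic() reduces to a dispatcher along these lines (a sketch; the atomic load/store cases are added by later patches in this series):

  static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn)
  {
      switch (insn->imm) {
      case BPF_ADD: case BPF_ADD | BPF_FETCH:
      case BPF_AND: case BPF_AND | BPF_FETCH:
      case BPF_OR:  case BPF_OR | BPF_FETCH:
      case BPF_XOR: case BPF_XOR | BPF_FETCH:
      case BPF_XCHG:
      case BPF_CMPXCHG:
          return check_atomic_rmw(env, insn);
      default:
          verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n",
                  insn->imm);
          return -EINVAL;
      }
  }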
Signed-off-by: Peilin Ye <[email protected]>
Link: https://lore.kernel.org/r/6323ac8e73a10a1c8ee547c77ed68cf8eb6b90e1.1740978603.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
66faaea9 |
| 03-Mar-2025 |
Peilin Ye <[email protected]> |
bpf: Factor out atomic_ptr_type_ok()
Factor out atomic_ptr_type_ok() as a helper function to be used later.
Signed-off-by: Peilin Ye <[email protected]>
Link: https://lore.kernel.org/r/e5ef8b3116f3fffce78117a14060ddce05eba52a.1740978603.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Revision tags: v6.14-rc5 |
|
| #
e2d8f560 |
| 01-Mar-2025 |
Kumar Kartikeya Dwivedi <[email protected]> |
bpf: Summarize sleepable global subprogs
The verifier currently does not permit global subprog calls when a lock is held, preemption is disabled, or when IRQs are disabled. This is because we don't know whether the global subprog calls sleepable functions or not.
In case of locks, there's an additional reason: functions called by the global subprog may hold additional locks etc. The verifier won't know while verifying the global subprog whether it was called in context where a spin lock is already held by the program.
Perform summarization of the sleepable nature of a global subprog, just like changes_pkt_data, and then allow calls to non-sleepable global subprogs from atomic context.
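A sketch of what this newly permits, assuming the global subprog is summarized as non-sleepable (program section and kfunc declarations are illustrative and elided):

  /* no sleepable helpers/kfuncs inside: summarized as non-sleepable */
  __noinline int count_it(int v)
  {
      return v + 1;
  }

  SEC("tc")
  int prog(struct __sk_buff *skb)
  {
      int v;

      bpf_rcu_read_lock();
      v = count_it(1);       /* now allowed inside the RCU read section */
      bpf_rcu_read_unlock();
      return v;
  }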
While making this change, I noticed that RCU read sections had no protection against sleepable global subprog calls, include it in the checks and fix this while we're at it.
Care needs to be taken to not allow global subprog calls when a regular bpf_spin_lock is held. When a resilient spin lock is held, we want to potentially have this check relaxed, but not for now.
Also make sure extensions freplacing global functions cannot do so when the target is non-sleepable but the extension is sleepable. The other combination is ok.
Tests are included in the next patch to handle all special conditions.
Fixes: 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
01c7bc51 |
| 03-Mar-2025 |
Brian Gerst <[email protected]> |
x86/smp: Move cpu number to percpu hot section
No functional change.
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Uros Bizjak <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
18cdd90a |
| 27-Feb-2025 |
Brian Gerst <[email protected]> |
x86/bpf: Fix BPF percpu accesses
Due to this recent commit in the x86 tree:
9d7de2aa8b41 ("Use relative percpu offsets")
percpu addresses went from positive offsets from the GSBASE to negative kernel virtual addresses. The BPF verifier has an optimization for x86-64 that loads the address of cpu_number into a register, but was only doing a 32-bit load which truncates negative addresses.
Change it to a 64-bit load so that the address is properly sign-extended.
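A userspace illustration of the truncation; the address is hypothetical, the point is that a 32-bit immediate is zero-extended on use, corrupting a negative kernel virtual address:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t percpu_addr = 0xffffffffbfa00000ull; /* negative kva (example) */
      uint32_t imm32 = (uint32_t)percpu_addr;       /* what a 32-bit mov keeps */

      printf("full address:   %#llx\n", (unsigned long long)percpu_addr);
      printf("via 32-bit imm: %#llx\n", (unsigned long long)(uint64_t)imm32);
      /* prints 0xbfa00000: the upper bits are gone, hence the 64-bit load */
      return 0;
  }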
Fixes: 9d7de2aa8b41 ("Use relative percpu offsets")
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Uros Bizjak <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| #
d519594e |
| 25-Feb-2025 |
Amery Hung <[email protected]> |
bpf: Search and add kfuncs in struct_ops prologue and epilogue
Currently, add_kfunc_call() is only invoked once before the main verification loop. Therefore, the verifier could not find the bpf_kfunc_btf_tab of a new kfunc call that is not seen in user-defined struct_ops operators but is introduced in gen_prologue or gen_epilogue during do_misc_fixup(). Fix this by searching for kfuncs in the patching instruction buffer and adding them to prog->aux->kfunc_tab.
Signed-off-by: Amery Hung <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
f3c2d243 |
| 25-Feb-2025 |
Eduard Zingerman <[email protected]> |
bpf: abort verification if env->cur_state->loop_entry != NULL
In addition to the warning, abort verification with -EFAULT. If env->cur_state->loop_entry != NULL, something is irrecoverably buggy.
Fixes: bbbc02b7445e ("bpf: copy_verifier_state() should copy 'loop_entry' field")
Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Revision tags: v6.14-rc4 |
|
| #
201b62cc |
| 21-Feb-2025 |
Amery Hung <[email protected]> |
bpf: Refactor check_ctx_access()
Reduce the variable passing madness surrounding check_ctx_access(). Currently, check_mem_access() passes many pointers to local variables to check_ctx_access(). They are used to initialize "struct bpf_insn_access_aux info" in check_ctx_access() and then passed to is_valid_access(). Then, check_ctx_access() takes the data out of info and writes it back through the pointers to pass it back. This can be simplified by moving info up to check_mem_access().
No functional change.
Signed-off-by: Amery Hung <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
38f1e66a |
| 20-Feb-2025 |
Amery Hung <[email protected]> |
bpf: Do not allow tail call in struct_ops program with __ref argument
Reject struct_ops programs with refcounted kptr arguments (arguments tagged with __ref suffix) that tail call. Once a refcounted kptr is passed to a struct_ops program from the kernel, it can be freed or xchged into maps. As there is no guarantee a callee can get the same valid refcounted kptr in the ctx, we cannot allow such usage.
Signed-off-by: Amery Hung <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Revision tags: v6.14-rc3 |
|
| #
574078b0 |
| 15-Feb-2025 |
Eduard Zingerman <[email protected]> |
bpf: fix env->peak_states computation
Compute env->peak_states as the maximum of the sum of env->explored_states and env->free_list sizes.
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
| #
408fcf94 |
| 15-Feb-2025 |
Eduard Zingerman <[email protected]> |
bpf: free verifier states when they are no longer referenced
With the fixes from patches 1 and 3 applied, Patrick Somaru reported an increase in memory consumption for sched_ext iterator-based programs hitting the 1M instructions limit. For example, 2GB VMs ran out of memory while verifying a program. Similar behaviour could be reproduced on current bpf-next master.
Here is an example of such program:
  /* verification completes if given 16G of RAM,
   * final env->free_list size is 369,960 entries.
   */
  SEC("raw_tp")
  __flag(BPF_F_TEST_STATE_FREQ)
  __success
  int free_list_bomb(const void *ctx)
  {
      volatile char buf[48] = {};
      unsigned i, j;

      j = 0;
      bpf_for(i, 0, 10) {
          /* this forks verifier state:
           * - verification of current path continues and
           *   creates a checkpoint after 'if';
           * - verification of forked path hits the
           *   checkpoint and marks it as loop_entry.
           */
          if (bpf_get_prandom_u32())
              asm volatile ("");
          /* this marks 'j' as precise, thus any checkpoint
           * created on current iteration would not be matched
           * on the next iteration.
           */
          buf[j++] = 42;
          j %= ARRAY_SIZE(buf);
      }
      asm volatile (""::"r"(buf));
      return 0;
  }
Memory consumption increased due to more states being marked as loop entries and eventually added to env->free_list.
This commit introduces logic to free states from env->free_list during verification. A state in env->free_list can be freed if:
- it has no child states;
- it is not used as a loop_entry.
This commit:
- updates bpf_verifier_state->used_as_loop_entry to be a counter that tracks how many states use this one as a loop entry;
- adds a function maybe_free_verifier_state(), which:
  - frees a state if its ->branches and ->used_as_loop_entry counters are both zero;
  - if the state is freed, state->loop_entry->used_as_loop_entry is decremented, and an attempt is made to free state->loop_entry.
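A sketch of maybe_free_verifier_state() following this description (unlinking from env->free_list and other bookkeeping elided):

  static void maybe_free_verifier_state(struct bpf_verifier_state *st)
  {
      while (st && !st->branches && !st->used_as_loop_entry) {
          struct bpf_verifier_state *loop_entry = st->loop_entry;

          if (loop_entry)
              loop_entry->used_as_loop_entry--;
          free_verifier_state(st, true);
          /* the loop entry may have become freeable in turn */
          st = loop_entry;
      }
  }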
In the example above, this approach reduces the maximum number of states in the free list from 369,960 to 16,223.
However, this approach has its limitations. If the buf size in the example above is modified to 64, state caching overflows: the state for j=0 is evicted from the cache before it can be used to stop traversal. As a result, states in the free list accumulate because their branch counters do not reach zero.
The effect of this patch on the selftests looks as follows:
File                             Program                              Max free list (A) Max free list (B) Max free list (DIFF)
-------------------------------- ------------------------------------ ----------------- ----------------- --------------------
arena_list.bpf.o                 arena_list_add                                      17                 3        -14 (-82.35%)
bpf_iter_task_stack.bpf.o        dump_task_stack                                     39                 9        -30 (-76.92%)
iters.bpf.o                      checkpoint_states_deletion                         265                89       -176 (-66.42%)
iters.bpf.o                      clean_live_states                                   19                 0       -19 (-100.00%)
profiler2.bpf.o                  tracepoint__syscalls__sys_enter_kill               102                 1       -101 (-99.02%)
profiler3.bpf.o                  tracepoint__syscalls__sys_enter_kill               144                 0      -144 (-100.00%)
pyperf600_iter.bpf.o             on_event                                            15                 0       -15 (-100.00%)
pyperf600_nounroll.bpf.o         on_event                                          1170              1158         -12 (-1.03%)
setget_sockopt.bpf.o             skops_sockopt                                       18                 0       -18 (-100.00%)
strobemeta_nounroll1.bpf.o       on_event                                           147                83        -64 (-43.54%)
strobemeta_nounroll2.bpf.o       on_event                                           312               209       -103 (-33.01%)
strobemeta_subprogs.bpf.o        on_event                                           124                86        -38 (-30.65%)
test_cls_redirect_subprogs.bpf.o cls_redirect                                        15                 0       -15 (-100.00%)
timer.bpf.o                      test1                                               30                15        -15 (-50.00%)
Measured using "do-not-submit" patches from here: https://github.com/eddyz87/bpf/tree/get-loop-entry-hungup
Reported-by: Patrick Somaru <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|