Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12

# 7c8ce4ff | 12-Nov-2024 | Xu Kuohai <[email protected]>
bpf: Add kernel symbol for struct_ops trampoline
Without kernel symbols for struct_ops trampoline, the unwinder may produce unexpected stacktraces.
For example, the x86 ORC and FP unwinders check if an IP is in kernel text by verifying the presence of the IP's kernel symbol. When a struct_ops trampoline address is encountered, the unwinder stops due to the absence of a symbol, resulting in an incomplete stacktrace that consists only of the direct and indirect child functions called from the trampoline.
The arm64 unwinder is another example. While the arm64 unwinder can proceed across a struct_ops trampoline address, the corresponding symbol name is displayed as "unknown", which is confusing.
Thus, add a kernel symbol for the struct_ops trampoline. The name is bpf__<struct_ops_name>_<member_name>, where <struct_ops_name> is the type name of the struct_ops, and <member_name> is the name of the member that the trampoline is linked to.
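As an illustrative sketch (the helper below is hypothetical, not the kernel code), the symbol name can be composed like this:

#define MAX_KSYM_LEN 512

/* build "bpf__<struct_ops_name>_<member_name>" */
static void st_ops_tramp_ksym_name(char buf[MAX_KSYM_LEN],
				   const char *st_ops_name,
				   const char *member_name)
{
	snprintf(buf, MAX_KSYM_LEN, "bpf__%s_%s", st_ops_name, member_name);
}

E.g. st_ops_tramp_ksym_name(buf, "tcp_congestion_ops", "cong_avoid") yields "bpf__tcp_congestion_ops_cong_avoid", matching the "After" stacktrace below.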
Below is a comparison of stacktraces captured on x86 by perf record, before and after this patch.
Before:
    ffffffff8116545d __lock_acquire+0xad ([kernel.kallsyms])
    ffffffff81167fcc lock_acquire+0xcc ([kernel.kallsyms])
    ffffffff813088f4 __bpf_prog_enter+0x34 ([kernel.kallsyms])

After:
    ffffffff811656bd __lock_acquire+0x30d ([kernel.kallsyms])
    ffffffff81167fcc lock_acquire+0xcc ([kernel.kallsyms])
    ffffffff81309024 __bpf_prog_enter+0x34 ([kernel.kallsyms])
    ffffffffc000d7e9 bpf__tcp_congestion_ops_cong_avoid+0x3e ([kernel.kallsyms])
    ffffffff81f250a5 tcp_ack+0x10d5 ([kernel.kallsyms])
    ffffffff81f27c66 tcp_rcv_established+0x3b6 ([kernel.kallsyms])
    ffffffff81f3ad03 tcp_v4_do_rcv+0x193 ([kernel.kallsyms])
    ffffffff81d65a18 __release_sock+0xd8 ([kernel.kallsyms])
    ffffffff81d65af4 release_sock+0x34 ([kernel.kallsyms])
    ffffffff81f15c4b tcp_sendmsg+0x3b ([kernel.kallsyms])
    ffffffff81f663d7 inet_sendmsg+0x47 ([kernel.kallsyms])
    ffffffff81d5ab40 sock_write_iter+0x160 ([kernel.kallsyms])
    ffffffff8149c67b vfs_write+0x3fb ([kernel.kallsyms])
    ffffffff8149caf6 ksys_write+0xc6 ([kernel.kallsyms])
    ffffffff8149cb5d __x64_sys_write+0x1d ([kernel.kallsyms])
    ffffffff81009200 x64_sys_call+0x1d30 ([kernel.kallsyms])
    ffffffff82232d28 do_syscall_64+0x68 ([kernel.kallsyms])
    ffffffff8240012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])

Fixes: 85d33df357b6 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS")
Signed-off-by: Xu Kuohai <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

# 5bd36da1 | 12-Nov-2024 | Yonghong Song <[email protected]>
bpf: Support private stack for struct_ops progs
For struct_ops progs, whether a particular prog uses private stack depends on prog->aux->priv_stack_requested being set before the actual insn-level verification for that prog. One particular implementation is to piggyback on struct_ops->check_member(); the next patch has an example of this. The struct_ops->check_member() sets prog->aux->priv_stack_requested to true, which enables private stack usage.
The struct_ops prog follows the same rule as kprobe/tracing progs after the function bpf_enable_priv_stack(). For example, even if a struct_ops prog requests a private stack, it may still use the normal kernel stack if the stack size is small (< 64 bytes).
Similar to tracing progs, nested same-cpu same-prog runs will be skipped. A field (recursion_detected()) is added to the bpf_prog_aux structure. If bpf_prog->aux->recursion_detected is implemented by the struct_ops subsystem and a nested same-cpu/prog run happens, the function will be triggered to report an error, collect related info, etc.
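A hypothetical sketch of how a struct_ops subsystem could use these hooks (the ops names and exact signatures here are illustrative, not the actual patch):

static int example_check_member(const struct btf_type *t,
				const struct btf_member *member,
				struct bpf_prog *prog)
{
	/* opt in before insn-level verification runs */
	prog->aux->priv_stack_requested = true;
	return 0;
}

static void example_recursion_detected(struct bpf_prog *prog)
{
	/* invoked when a nested same-cpu run of this prog was skipped */
	pr_warn("bpf prog %u recursed\n", prog->aux->id);
}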
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4

# d6083f04 | 15-Oct-2024 | Leon Hwang <[email protected]>
bpf: Prevent tailcall infinite loop caused by freplace
There is a potential infinite loop issue that can occur when using a combination of tail calls and freplace.
In an upcoming selftest, the attach target for entry_freplace of tailcall_freplace.c is subprog_tc of tc_bpf2bpf.c, while the tail call in entry_freplace leads to entry_tc. This results in an infinite loop:
entry_tc -> subprog_tc -> entry_freplace --tailcall-> entry_tc.
The problem arises because the tail_call_cnt in entry_freplace resets to zero each time entry_freplace is executed, causing the tail call mechanism to never terminate, eventually leading to a kernel panic.
To fix this issue, the solution is twofold:
1. Prevent updating a program extended by an freplace program to a prog_array map.
2. Prevent extending a program that is already part of a prog_array map with an freplace program.

This ensures that:

* If a program or its subprogram has been extended by an freplace program, it can no longer be updated to a prog_array map.
* If a program has been added to a prog_array map, neither it nor its subprograms can be extended by an freplace program.
Moreover, an extension program should not be tailcalled. As such, return -EINVAL if the program has a type of BPF_PROG_TYPE_EXT when adding it to a prog_array map.
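A simplified sketch of that guard (function and argument names here are illustrative, not the exact kernel diff):

/* Reject extension (freplace) programs as prog_array values:
 * an extension prog must never be tailcalled. */
static int prog_array_check_compat(const struct bpf_prog *prog)
{
	if (prog->type == BPF_PROG_TYPE_EXT)
		return -EINVAL;
	return 0;
}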
Additionally, fix a minor code style issue by replacing eight spaces with a tab for proper formatting.
Reviewed-by: Eduard Zingerman <[email protected]>
Signed-off-by: Leon Hwang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7

# 1e52af7f | 28-Feb-2024 | Paul E. McKenney <[email protected]>
bpf: Choose RCU Tasks based on TASKS_RCU rather than PREEMPTION
The advent of CONFIG_PREEMPT_AUTO, AKA lazy preemption, will mean that even kernels built with CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY might see the occasional preemption, and that this preemption just might happen within a trampoline.
Therefore, update bpf_tramp_image_put() to choose call_rcu_tasks() based on CONFIG_TASKS_RCU instead of CONFIG_PREEMPTION.
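A minimal sketch of the resulting dispatch; the callback names are assumptions and the surrounding bpf_tramp_image_put() logic is elided:

/* the image may still be executed; wait for a Tasks RCU grace period
 * when the kernel tracks voluntary context switches */
if (IS_ENABLED(CONFIG_TASKS_RCU))
	call_rcu_tasks(&im->rcu, __bpf_tramp_image_put_rcu_tasks);
else
	call_rcu(&im->rcu, __bpf_tramp_image_put_rcu);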
This change might enable further simplifications, but the goal of this effort is to make the code safe, not necessarily optimal.
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Ankur Arora <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>

# ce09cbdd | 02-Apr-2024 | Jose Fernandez <[email protected]>
bpf: Improve program stats run-time calculation
This patch improves the run-time calculation for program stats by capturing the duration as soon as possible after the program returns.
Previously, the duration included u64_stats_t operations. While the instrumentation overhead is part of the total time spent when stats are enabled, distinguishing between the program's native execution time and the time spent due to instrumentation is crucial for accurate performance analysis.
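A sketch of the ordering described above (simplified frame; the stats field names are assumptions):

start = sched_clock();
ret = bpf_prog_run(prog, ctx);
duration = sched_clock() - start;	/* captured before any stats ops */

stats = this_cpu_ptr(prog->stats);
flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->cnt);
u64_stats_add(&stats->nsecs, duration);
u64_stats_update_end_irqrestore(&stats->syncp, flags);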
By making this change, the patch facilitates more precise optimization of BPF programs, enabling users to understand their performance in environments without stats enabled.
I used a virtualized environment to measure the run-time over one minute for a basic raw_tracepoint/sys_enter program, which just increments a local counter. Although the virtualization introduced some performance degradation that could affect the results, I observed approximately a 16% decrease in average run-time reported by stats with this change (310 -> 260 nsec).
Signed-off-by: Jose Fernandez <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

# c733239f | 16-Mar-2024 | Christophe Leroy <[email protected]>
bpf: Check return from set_memory_rox()
arch_protect_bpf_trampoline() and alloc_new_pack() call set_memory_rox() which can fail, leading to unprotected memory.
Take the return value of set_memory_rox() into account and add the __must_check attribute to arch_protect_bpf_trampoline().
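A sketch of the resulting fallback (simplified; the real helper may differ in detail):

int __must_check arch_protect_bpf_trampoline(void *image, unsigned int size)
{
	WARN_ON_ONCE(size > PAGE_SIZE);
	return set_memory_rox((long)image, 1);	/* propagate failure */
}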
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/fe1c163c83767fde5cab31d209a4a6be3ddb3a73.1710574353.git.christophe.leroy@csgroup.eu
Signed-off-by: Martin KaFai Lau <[email protected]>

# e3362acd | 16-Mar-2024 | Christophe Leroy <[email protected]>
bpf: Remove arch_unprotect_bpf_trampoline()
Last user of arch_unprotect_bpf_trampoline() was removed by commit 187e2af05abe ("bpf: struct_ops supports more than one page for trampolines.")
Remove arch_unprotect_bpf_trampoline().
Reported-by: Daniel Borkmann <[email protected]>
Fixes: 187e2af05abe ("bpf: struct_ops supports more than one page for trampolines.")
Signed-off-by: Christophe Leroy <[email protected]>
Link: https://lore.kernel.org/r/42c635bb54d3af91db0f9b85d724c7c290069f67.1710574353.git.christophe.leroy@csgroup.eu
Signed-off-by: Martin KaFai Lau <[email protected]>

# 66c84731 | 09-Mar-2024 | Andrii Nakryiko <[email protected]>
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
prog->aux->sleepable is checked very frequently as part of (some) BPF program run hot paths. So this extra aux indirection seems wasteful and on busy systems might cause unnecessary memory cache misses.
Let's move sleepable flag into prog itself to eliminate unnecessary pointer dereference.
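A sketch of the resulting layout and access pattern (field placement is illustrative):

struct bpf_prog {
	u16	pages;
	u8	sleepable:1;	/* moved here from bpf_prog_aux */
	/* ... */
	struct bpf_prog_aux *aux;
};

/* hot path: prog->aux->sleepable becomes prog->sleepable */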
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5

# 26ef208c | 06-Dec-2023 | Song Liu <[email protected]>
bpf: Use arch_bpf_trampoline_size
Instead of blindly allocating PAGE_SIZE for each trampoline, check the size of the trampoline with arch_bpf_trampoline_size(). This size is saved in bpf_tramp_image->size, and used for modmem charge/uncharge. The fallback arch_alloc_bpf_trampoline() still allocates a whole page because we need to use set_memory_* to protect the memory.
struct_ops trampoline still uses a whole page for multiple trampolines.
With this size check at caller (regular trampoline and struct_ops trampoline), remove arch_bpf_trampoline_size() from arch_prepare_bpf_trampoline() in archs.
Also, update bpf_image_ksym_add() to handle symbol of different sizes.
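A sketch of the resulting caller flow (argument lists simplified, error handling abridged):

size = arch_bpf_trampoline_size(m, flags, tlinks, func_addr);
if (size < 0)
	return size;

im->size = size;	/* used for modmem charge/uncharge */
image = arch_alloc_bpf_trampoline(size);
if (!image)
	return -ENOMEM;

err = arch_prepare_bpf_trampoline(im, image, image + size, m, flags,
				  tlinks, func_addr);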
Signed-off-by: Song Liu <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Tested-by: Ilya Leoshkevich <[email protected]> # on s390x
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Tested-by: Björn Töpel <[email protected]> # on riscv
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

# 96d1b7c0 | 06-Dec-2023 | Song Liu <[email protected]>
bpf: Add arch_bpf_trampoline_size()
This helper will be used to calculate the size of the trampoline before allocating the memory.
arch_prepare_bpf_trampoline() for arm64 and riscv64 can use arch_bpf_trampoline_size() to check that the trampoline fits in the image.
OTOH, arch_prepare_bpf_trampoline() for s390 has to call the JIT process twice, so it cannot use arch_bpf_trampoline_size().
Signed-off-by: Song Liu <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Tested-by: Ilya Leoshkevich <[email protected]> # on s390x
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Tested-by: Björn Töpel <[email protected]> # on riscv
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

# 82583daa | 06-Dec-2023 | Song Liu <[email protected]>
bpf: Add helpers for trampoline image management
As the BPF trampolines of different archs move from bpf_jit_[alloc|free]_exec() to bpf_prog_pack_[alloc|free](), we need to use different _alloc and _free functions for different archs during the transition. Add the following helpers for this transition:
void *arch_alloc_bpf_trampoline(unsigned int size);
void arch_free_bpf_trampoline(void *image, unsigned int size);
void arch_protect_bpf_trampoline(void *image, unsigned int size);
void arch_unprotect_bpf_trampoline(void *image, unsigned int size);
The fallback versions of these helpers require size <= PAGE_SIZE, but they are only called with size == PAGE_SIZE. They will be called with size < PAGE_SIZE when the arch_bpf_trampoline_size() helper is introduced later.
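Plausible fallback implementations for the alloc/free pair, under the size <= PAGE_SIZE assumption (details may differ from the actual fallbacks):

void * __weak arch_alloc_bpf_trampoline(unsigned int size)
{
	WARN_ON_ONCE(size > PAGE_SIZE);
	/* allocate a whole page so set_memory_* can protect it */
	return bpf_jit_alloc_exec(PAGE_SIZE);
}

void __weak arch_free_bpf_trampoline(void *image, unsigned int size)
{
	bpf_jit_free_exec(image);
}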
Signed-off-by: Song Liu <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Tested-by: Ilya Leoshkevich <[email protected]> # on s390x
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

# 7a3d9a15 | 06-Dec-2023 | Song Liu <[email protected]>
bpf: Adjust argument names of arch_prepare_bpf_trampoline()
We are using "im" for "struct bpf_tramp_image" and "tr" for "struct bpf_trampoline" in most of the code base. The only exception is the prototype and fallback version of arch_prepare_bpf_trampoline(). Update them to match the rest of the code base.
We mix "orig_call" and "func_addr" for the argument in different versions of arch_prepare_bpf_trampoline(). s/orig_call/func_addr/g so they match.
Signed-off-by: Song Liu <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Tested-by: Ilya Leoshkevich <[email protected]> # on s390x
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2

# 2b5dcb31 | 12-Sep-2023 | Leon Hwang <[email protected]>
bpf, x64: Fix tailcall infinite loop
From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), the tailcall on x64 works better than before.
From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program to other BPF programs"), BPF program is able to trace other BPF programs.
How about combining them all together?
1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.
As a result, a tailcall infinite loop comes up. And the loop would halt the machine.
As we know, in tail call context, the tail_call_cnt propagates via the stack and the rax register between BPF subprograms. The same must be done in trampolines.
Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Leon Hwang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.6-rc1

# 6764e767 | 30-Aug-2023 | Sebastian Andrzej Siewior <[email protected]>
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
__bpf_prog_enter_recur() assigns bpf_tramp_run_ctx::saved_run_ctx before performing the recursion check which means in case of a recursion __bpf_prog_exit_recur() uses the previously set bpf_tramp_run_ctx::saved_run_ctx value.
__bpf_prog_enter_sleepable_recur() assigns bpf_tramp_run_ctx::saved_run_ctx after the recursion check which means in case of a recursion __bpf_prog_exit_sleepable_recur() uses an uninitialized value. This does not look right. If I read the entry trampoline code right, then bpf_tramp_run_ctx isn't initialized upfront.
Align __bpf_prog_enter_sleepable_recur() with __bpf_prog_enter_recur() and set bpf_tramp_run_ctx::saved_run_ctx before the recursion check is made. Remove the assignment of saved_run_ctx in kern_sys_bpf() since it happens a few cycles later.
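A simplified sketch of the fixed ordering (close to, but not necessarily exactly, the final code):

static u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
					struct bpf_tramp_run_ctx *run_ctx)
{
	rcu_read_lock_trace();
	migrate_disable();
	might_fault();

	/* moved before the recursion check so the exit path always
	 * sees a valid saved_run_ctx */
	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);

	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
		bpf_prog_inc_misses_counter(prog);
		return 0;
	}
	return bpf_prog_start_time();
}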
Fixes: e384c7b7b46d0 ("bpf, x86: Create bpf_tramp_run_ctx on the caller thread's stack")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3

# 108598c3 | 15-May-2023 | Yafang Shao <[email protected]>
bpf: Fix memleak due to fentry attach failure
If it fails to attach fentry, the allocated bpf trampoline image will be left in the system. That can be verified by checking /proc/kallsyms.
This memleak can be verified by a simple bpf program as follows:

SEC("fentry/trap_init")
int fentry_run()
{
	return 0;
}
It will fail to attach trap_init because this function is freed after kernel init, and then we can find the trampoline image is left in the system by checking /proc/kallsyms.
$ tail /proc/kallsyms
ffffffffc0613000 t bpf_trampoline_6442453466_1 [bpf]
ffffffffc06c3000 t bpf_trampoline_6442453466_1 [bpf]

$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep "FUNC 'trap_init'"
[2522] FUNC 'trap_init' type_id=119 linkage=static

$ echo $((6442453466 & 0x7fffffff))
2522
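For reference, the trampoline key packs the BTF object id into the high bits and the BTF type id into the low 31 bits, which is why the masking above recovers the id. A sketch of the key derivation (simplified from bpf_trampoline_compute_key()):

static u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
				      struct btf *btf, u32 btf_id)
{
	if (tgt_prog)
		return ((u64)tgt_prog->aux->id << 32) | btf_id;
	return ((u64)btf_obj_id(btf) << 32) | 0x80000000 | btf_id;
}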
Note that there are two leftover bpf trampoline images; that is because libbpf falls back to raw tracepoint if -EINVAL is returned.
Fixes: e21aa341785c ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

# 47e79cbe | 15-May-2023 | Yafang Shao <[email protected]>
bpf: Remove bpf trampoline selector
After commit e21aa341785c ("bpf: Fix fexit trampoline."), the selector is only used to indicate how many times the bpf trampoline image has been updated, and it is displayed in the trampoline ksym name. After the trampoline is freed, the selector starts from 0 again. So the selector is a useless value to the user. We can remove it.
If the user wants to check whether the bpf trampoline image has been updated or not, the user can compare the address. Each time the trampoline image is updated, the address changes accordingly. Jiri also pointed out another issue: perf is still using the old name "bpf_trampoline_%lu", so this change also fixes that issue in perf.
Fixes: e21aa341785c ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/ZFvOOlrmHiY9AgXE@krava
Link: https://lore.kernel.org/bpf/[email protected]

Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4

# da8bdfbd | 21-Mar-2023 | Florent Revest <[email protected]>
ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIs
Now that the original _ftrace_direct APIs are gone, the "_multi" suffixes only add confusion.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Florent Revest <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

# 59495740 | 21-Mar-2023 | Florent Revest <[email protected]>
ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()
A common pattern when using the ftrace_direct_multi API is to unregister the ops and also immediately free its filter. We've noticed it's very easy for users to miss calling ftrace_free_filter().
This adds a "free_filters" argument to unregister_ftrace_direct_multi() to both remind the user they should free filters and also to make their life easier.
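A sketch of the resulting call pattern (the ops and address names here are illustrative):

/* detach the direct ops and free its filter in one call */
err = unregister_ftrace_direct_multi(&direct_ops, (unsigned long)my_tramp,
				     true /* free_filters */);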
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Florent Revest <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Revision tags: v6.3-rc3, v6.3-rc2

# 31bf1dbc | 10-Mar-2023 | Viktor Malik <[email protected]>
bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modules
This resolves two problems with attachment of fentry/fexit/fmod_ret/lsm to functions located in modules:
1. The verifier tries to find the address to attach to in kallsyms. This is always done by searching the entire kallsyms, not respecting the module in which the function is located. Such an approach causes an incorrect attachment address to be computed if the function to attach to is shadowed by a function of the same name located earlier in kallsyms.
2. If the address to attach to is located in a module, the module reference is only acquired in register_fentry. If the module is unloaded between the place where the address is found (bpf_check_attach_target in the verifier) and register_fentry, it is possible that another module is loaded to the same address which may lead to potential errors.
Since the attachment must contain the BTF of the program to attach to, we extract the module from it and search for the function address in the correct module (resolving problem no. 1). Then, the module reference is taken directly in bpf_check_attach_target and stored in the bpf program (in bpf_prog_aux). The reference is only released when the program is unloaded (resolving problem no. 2).
Signed-off-by: Viktor Malik <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Link: https://lore.kernel.org/r/3f6a9d8ae850532b5ef864ef16327b0f7a669063.1678432753.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1

# 9ed1d9ae | 24-Dec-2022 | Chuang Wang <[email protected]>
bpf: Fix panic due to wrong pageattr of im->image
In the scenario where livepatch and kretfunc coexist, the pageattr of im->image is rox after arch_prepare_bpf_trampoline in bpf_trampoline_update. When modify_fentry or register_fentry then returns -EAGAIN from bpf_tramp_ftrace_ops_func, the BPF_TRAMP_F_ORIG_STACK flag will be configured, and arch_prepare_bpf_trampoline will be re-executed.
At this time, because the pageattr of im->image is rox, arch_prepare_bpf_trampoline will read and write im->image, which causes a fault, as follows:

insmod livepatch-sample.ko   # samples/livepatch/livepatch-sample.c
bpftrace -e 'kretfunc:cmdline_proc_show {}'

BUG: unable to handle page fault for address: ffffffffa0206000
PGD 322d067 P4D 322d067 PUD 322e063 PMD 1297e067 PTE d428061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 2 PID: 270 Comm: bpftrace Tainted: G E K 6.1.0 #5
RIP: 0010:arch_prepare_bpf_trampoline+0xed/0x8c0
RSP: 0018:ffffc90001083ad8 EFLAGS: 00010202
RAX: ffffffffa0206000 RBX: 0000000000000020 RCX: 0000000000000000
RDX: ffffffffa0206001 RSI: ffffffffa0206000 RDI: 0000000000000030
RBP: ffffc90001083b70 R08: 0000000000000066 R09: ffff88800f51b400
R10: 000000002e72c6e5 R11: 00000000d0a15080 R12: ffff8880110a68c8
R13: 0000000000000000 R14: ffff88800f51b400 R15: ffffffff814fec10
FS: 00007f87bc0dc780(0000) GS:ffff88803e600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0206000 CR3: 0000000010b70000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 bpf_trampoline_update+0x25a/0x6b0
 __bpf_trampoline_link_prog+0x101/0x240
 bpf_trampoline_link_prog+0x2d/0x50
 bpf_tracing_prog_attach+0x24c/0x530
 bpf_raw_tp_link_attach+0x73/0x1d0
 __sys_bpf+0x100e/0x2570
 __x64_sys_bpf+0x1c/0x30
 do_syscall_64+0x5b/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
With this patch, when modify_fentry or register_fentry returns -EAGAIN from bpf_tramp_ftrace_ops_func, the pageattr of im->image will be reset to nx+rw.
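A simplified sketch of the -EAGAIN path described above (labels and the surrounding update logic elided):

if (err == -EAGAIN) {
	/* reset im->image to nx+rw so it can be rewritten by
	 * the re-executed arch_prepare_bpf_trampoline() */
	set_memory_nx((long)im->image, 1);
	set_memory_rw((long)im->image, 1);
	goto again;
}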
Cc: [email protected]
Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Signed-off-by: Chuang Wang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3

# d48567c9 | 26-Oct-2022 | Peter Zijlstra <[email protected]>
mm: Introduce set_memory_rox()
Because endlessly repeating:
	set_memory_ro()
	set_memory_x()
is getting tedious.
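A plausible sketch of the combined helper (the real one lives with the arch set_memory code and may differ in detail):

static inline int set_memory_rox(unsigned long addr, int numpages)
{
	int ret = set_memory_ro(addr, numpages);
	if (ret)
		return ret;
	return set_memory_x(addr, numpages);
}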
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

# 271de525 | 25-Oct-2022 | Martin KaFai Lau <[email protected]>
bpf: Remove prog->active check for bpf_lsm and bpf_iter
The commit 64696c40d03c ("bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline") removed the prog->active check for struct_ops progs. The bpf_lsm and bpf_iter progs are also using trampolines. Like struct_ops, bpf_lsm and bpf_iter have fixed hooks for the prog to attach to. The kernel does not call the same hook in a recursive way. This patch also removes the prog->active check for bpf_lsm and bpf_iter.
A later patch has a test to reproduce the recursion issue for a sleepable bpf_lsm program.
This patch appends the '_recur' naming to the existing enter and exit functions that track the prog->active counter. New __bpf_prog_{enter,exit}[_sleepable] functions are added to skip the prog->active tracking. The '_struct_ops' version is also removed.
It also moves the decision on picking the enter and exit function to the new bpf_trampoline_{enter,exit}(). It returns the '_recur' ones for all tracing progs to use. For bpf_lsm, bpf_iter, struct_ops (no prog->active tracking after 64696c40d03c), and bpf_lsm_cgroup (no prog->active tracking after 69fd337a975c7), it will return the functions that don't track the prog->active.
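A simplified sketch of the enter-side dispatch described above (the lsm_cgroup special case is elided):

bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog)
{
	bool sleepable = prog->aux->sleepable;

	if (bpf_prog_check_recur(prog))
		return sleepable ? __bpf_prog_enter_sleepable_recur :
				   __bpf_prog_enter_recur;
	return sleepable ? __bpf_prog_enter_sleepable :
			   __bpf_prog_enter;
}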
Signed-off-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.1-rc2, v6.1-rc1, v6.0

# 64696c40 | 29-Sep-2022 | Martin KaFai Lau <[email protected]>
bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
The struct_ops prog is to allow using bpf to implement the functions in a struct (e.g. a kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (i.e. the bpf prog) in a recursive way.
The struct_ops is sharing the tracing trampoline's enter/exit functions, which track prog->active to avoid recursion. That is needed for tracing progs. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skip running the struct_ops prog. E.g. the '.ssthresh' may run in_task() and then be interrupted by a softirq that runs the same '.ssthresh'. Skipping the '.ssthresh' run will end up returning a random value to the caller.
The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion.
One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches.
Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

# 5b0d1c7b | 26-Sep-2022 | Song Liu <[email protected]>
bpf: Enforce W^X for bpf trampoline
Mark the trampoline as RO+X after arch_prepare_bpf_trampoline, so that the trampoline follows the W^X rule strictly. This will turn off warnings like
CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ...
Also remove bpf_jit_alloc_exec_page(), since it is not used any more.
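A sketch of the resulting ordering (simplified; error handling elided):

err = arch_prepare_bpf_trampoline(im, image, image_end, m, flags,
				  tlinks, func_addr);
if (err >= 0) {
	/* flip to RO+X only after all writes to the image are done */
	set_memory_ro((long)im->image, 1);
	set_memory_x((long)im->image, 1);
}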
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Revision tags: v6.0-rc7, v6.0-rc6

# 05b24ff9 | 16-Sep-2022 | Jiri Olsa <[email protected]>
bpf: Prevent bpf program recursion for raw tracepoint probes
We got a report from syzbot [1] about warnings that were caused by a bpf program attached to the contention_begin raw tracepoint triggering the same tracepoint by using the bpf_trace_printk helper that takes the trace_printk_lock lock.

Call Trace:
 <TASK>
 ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90
 bpf_trace_printk+0x2b/0xe0
 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
 bpf_trace_run2+0x26/0x90
 native_queued_spin_lock_slowpath+0x1c6/0x2b0
 _raw_spin_lock_irqsave+0x44/0x50
 bpf_trace_printk+0x3f/0xe0
 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
 bpf_trace_run2+0x26/0x90
 native_queued_spin_lock_slowpath+0x1c6/0x2b0
 _raw_spin_lock_irqsave+0x44/0x50
 bpf_trace_printk+0x3f/0xe0
 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
 bpf_trace_run2+0x26/0x90
 native_queued_spin_lock_slowpath+0x1c6/0x2b0
 _raw_spin_lock_irqsave+0x44/0x50
 bpf_trace_printk+0x3f/0xe0
 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
 bpf_trace_run2+0x26/0x90
 native_queued_spin_lock_slowpath+0x1c6/0x2b0
 _raw_spin_lock_irqsave+0x44/0x50
 __unfreeze_partials+0x5b/0x160
 ...
This can be reproduced by attaching a bpf program as a raw tracepoint on the contention_begin tracepoint. The bpf prog calls the bpf_trace_printk helper. Then, by running perf bench, the spin lock code is forced to take the slow path and call the contention_begin tracepoint.
Fix this by skipping execution of the bpf program if it's already running, using the bpf prog 'active' field, which is currently used by trampoline programs for the same reason.
Move bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in only for the CONFIG_BPF_JIT option.
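A simplified sketch of the guard as used around the raw tracepoint prog invocation:

if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
	bpf_prog_inc_misses_counter(prog);
	goto out;
}
(void) bpf_prog_run(prog, args);
out:
	this_cpu_dec(*(prog->active));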
Reviewed-by: Stanislav Fomichev <[email protected]>
Reported-by: [email protected]
[1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t
Signed-off-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>