Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6

# d4dd9775 | 29-Aug-2024 | Andrii Nakryiko <[email protected]>
bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers
Add sleepable implementations of the bpf_get_stack() and bpf_get_task_stack() helpers and allow them to be used from sleepable BPF programs (e.g., sleepable uprobes).
Note, capturing the stack trace IPs itself is not sleepable (that would need to be a separate project); only the build ID fetching is sleepable and thus more reliable, as it will wait for data to be paged in, if necessary. For that we make use of the sleepable build_id_parse() implementation.
Now that the build ID related internals in kernel/bpf/stackmap.c can be used both in sleepable and non-sleepable contexts, we need to add additional rcu_read_lock()/rcu_read_unlock() protection around fetching the perf_callchain_entry, but with the refactoring in the previous commit it's now pretty straightforward. We make sure to do rcu_read_unlock() (in sleepable mode only) right before the stack_map_get_build_id_offset() call, which can sleep. By that time we have no further use for the perf_callchain_entry.
Note, bpf_get_task_stack() will fail for user mode if task != current, and for kernel mode build IDs are irrelevant, so in that sense adding a sleepable bpf_get_task_stack() implementation is a no-op. It feels right to wire this up for symmetry and completeness, but I'm open to just dropping it until we support the `user && crosstask` condition.
Reviewed-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
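
A minimal sketch of that control flow; the may_fault flag and the two helper names are illustrative stand-ins, not the kernel's exact code:

    static long get_stack_sketch(void *buf, u32 size, bool may_fault)
    {
            struct perf_callchain_entry *trace;

            if (may_fault)
                    rcu_read_lock();                /* trace is RCU-protected */

            trace = fetch_callchain_entry();        /* illustrative helper */
            copy_trace_ips(buf, size, trace);       /* consume trace under RCU */

            if (may_fault)
                    rcu_read_unlock();              /* no further use of trace */

            /* from here on we may sleep to page in file data for build IDs */
            return resolve_build_ids_sketch(buf, size, may_fault);
    }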

# 4f4c4fc0 | 29-Aug-2024 | Andrii Nakryiko <[email protected]>
bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entry
Change stack_map_get_build_id_offset(), which is used to convert stack trace IP addresses into build ID+offset pairs. Right now this function accepts an array of u64s as input, and uses an array of struct bpf_stack_build_id as output.
This is problematic because the u64 array comes from a perf_callchain_entry, which is (non-sleepable) RCU protected, so once we allow sleepable build ID fetching, this all breaks down.
But it's actually pretty easy to make stack_map_get_build_id_offset() work with an array of struct bpf_stack_build_id as both input and output, which is what this patch is doing, eliminating the dependency on perf_callchain_entry. We require the caller to fill out the bpf_stack_build_id.ip fields (all others can be left uninitialized), and update them in place as we do build ID resolution.
We make sure to READ_ONCE() and cache locally the current IP value, as we use it in a few places to find the matching VMA and so on. Given this data is directly accessible and modifiable by the user's BPF code, we should make sure to have a consistent view of it.
Reviewed-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
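
A sketch of the in-place loop: struct bpf_stack_build_id and its status values are real uapi, while build_id_parse_sketch() stands in for the actual parsing call:

    for (i = 0; i < trace_nr; i++) {
            /* ip is visible to (and writable by) the user's BPF code,
             * so read it once and work with the cached copy
             */
            u64 ip = READ_ONCE(id_offs[i].ip);
            struct vm_area_struct *vma = find_vma(current->mm, ip);

            if (!vma || build_id_parse_sketch(vma, id_offs[i].build_id) < 0) {
                    id_offs[i].status = BPF_STACK_BUILD_ID_IP;
                    continue;
            }
            /* update the same array in place: offset replaces ip */
            id_offs[i].offset = ip - vma->vm_start;
            id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
    }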

# 45b8fc30 | 29-Aug-2024 | Andrii Nakryiko <[email protected]>
lib/buildid: rename build_id_parse() into build_id_parse_nofault()
Make it clear that build_id_parse() assumes that it can take no page fault by renaming it and current few users to build_id_parse_nofault().
Also add build_id_parse() stub which for now falls back to non-sleepable implementation, but will be changed in subsequent patches to take advantage of sleepable context. PROCMAP_QUERY ioctl() on /proc/<pid>/maps file is using build_id_parse() and will automatically take advantage of more reliable sleepable context implementation.
Reviewed-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
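
The shape of the rename plus stub, as described (signatures simplified; the real functions live in lib/buildid.c):

    /* must not fault: safe in NMI and other non-sleepable contexts */
    int build_id_parse_nofault(struct vm_area_struct *vma,
                               unsigned char *build_id, __u32 *size);

    /* sleepable entry point; for now just falls back to nofault */
    int build_id_parse(struct vm_area_struct *vma,
                       unsigned char *build_id, __u32 *size)
    {
            return build_id_parse_nofault(vma, build_id, size);
    }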

Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8

# 7a4b2125 | 07-Mar-2024 | Toke Høiland-Jørgensen <[email protected]>
bpf: Fix stackmap overflow check on 32-bit arches
The stackmap code relies on roundup_pow_of_two() to compute the number of hash buckets, and contains an overflow check that tests whether the resulting value is 0. However, on 32-bit arches, the roundup code itself can overflow by doing a 32-bit left shift of an unsigned long value, which is undefined behaviour, so it is not guaranteed to truncate neatly. This was triggered by syzbot on the DEVMAP_HASH type, which contains the same check, copied from the hashtab code.
The commit in the Fixes tag actually attempted to fix this, but the fix did not account for the UB, so it only works on CPUs where an overflow does result in a neat truncation to zero, which is not guaranteed. Checking the value before rounding does not have this problem.
Fixes: 6183f4d3a0a2 ("bpf: Check for integer overflow when using roundup_pow_of_two()") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Reviewed-by: Bui Quang Minh <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
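
A sketch of checking before rounding; the exact bound lives in the patch, this shows the shape:

    /* roundup_pow_of_two() does 1UL << fls_long(n - 1); on 32-bit,
     * max_entries above 1UL << 31 would shift by 32 bits, which is UB
     */
    if (attr->max_entries > (1UL << 31))
            return ERR_PTR(-E2BIG);

    n_buckets = roundup_pow_of_two(attr->max_entries);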

Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1

# b8e3a87a | 08-Nov-2023 | Jordan Rome <[email protected]>
bpf: Add crosstask check to __bpf_get_stack
Currently get_perf_callchain() only supports user stack walking for the current task. Passing the correct *crosstask* param will return 0 frames if the task passed to __bpf_get_stack() isn't the current one, instead of a single incorrect frame/address. This change passes the correct *crosstask* param, and also does a preemptive check in __bpf_get_stack(), returning -EOPNOTSUPP if the task is not current.
This issue was found using bpf_get_task_stack() inside a BPF iterator ("iter/task"), which iterates over all tasks. bpf_get_task_stack() works fine for fetching kernel stacks, but because get_perf_callchain() relies on the caller to know whether the requested *task* is the current one (via *crosstask*), it was failing in a confusing way.
It might be possible to get user stacks for all tasks by utilizing something like access_process_vm(), but that requires the bpf program calling bpf_get_task_stack() to be sleepable and would therefore be a breaking change.
Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()") Signed-off-by: Jordan Rome <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
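
A sketch of the added check in __bpf_get_stack() (variable names follow the description above):

    bool crosstask = task && task != current;

    /* get_perf_callchain() cannot walk another task's user stack;
     * fail explicitly instead of returning an empty trace
     */
    if (crosstask && user) {
            err = -EOPNOTSUPP;
            goto clear;
    }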

Revision tags: v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5

# 84cb9cbd | 06-Oct-2023 | Kees Cook <[email protected]>
bpf: Annotate struct bpf_stack_map with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).
As found with Coccinelle [1], add __counted_by for struct bpf_stack_map.
Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Link: https://lore.kernel.org/bpf/[email protected]
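
The annotation in context, trimmed to the relevant members of the struct:

    struct bpf_stack_map {
            struct bpf_map map;
            /* ... */
            u32 n_buckets;
            struct stack_map_bucket *buckets[] __counted_by(n_buckets);
    };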

Revision tags: v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7

# 6c3eba1c | 13-Jun-2023 | Andrii Nakryiko <[email protected]>
bpf: Centralize permissions checks for all BPF map types
This allows us to make more centralized decisions later on, and generally makes it very explicit which maps are privileged and which are not (e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants, as opposed to unprivileged HASH and PERCPU_HASH; now this is explicit and easy to verify).
Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
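
A sketch of the centralized check's shape in map creation; the map-type subset here is illustrative, not the patch's full list:

    switch (attr->map_type) {
    case BPF_MAP_TYPE_HASH:
    case BPF_MAP_TYPE_PERCPU_HASH:
    case BPF_MAP_TYPE_ARRAY:
            break;                          /* unprivileged */
    case BPF_MAP_TYPE_LRU_HASH:
    case BPF_MAP_TYPE_LRU_PERCPU_HASH:
    case BPF_MAP_TYPE_STACK_TRACE:
            if (!bpf_capable())
                    return -EPERM;          /* privileged variants */
            break;
    default:
            return -EINVAL;
    }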

Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4

# d7ba4cc9 | 22-Mar-2023 | JP Kobryn <[email protected]>
bpf: return long from bpf_map_ops funcs
This patch changes the return types of bpf_map_ops functions to long, where previously int was returned. Using long allows bpf programs to maintain the sign bit in the absence of sign extension in situations where inlined bpf helper funcs call the bpf_map_ops funcs and a negative error is returned.
The definitions of the helper funcs are generated from comments in the bpf uapi header at `include/uapi/linux/bpf.h`. The return type of these helpers was previously changed from int to long in commit bdb7b79b4ce8. For any case where one of the map helpers calls a bpf_map_ops func that still returns a 32-bit int, a compiler might not include sign extension instructions to properly convert the 32-bit negative value to a 64-bit negative value.
For example, a bpf assembly excerpt of an inlined helper calling a kernel function and checking for a specific error:

    ; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
    ...
    46: call 0xffffffffe103291c        ; htab_map_update_elem
    ; if (err && err != -EEXIST) {
    4b: cmp $0xffffffffffffffef,%rax   ; cmp -EEXIST,%rax
And a kernel function assembly excerpt of the return value from `htab_map_update_elem`, which returns a 32-bit int:
    movl $0xffffffef, %r9d
    ...
    movl %r9d, %eax
...results in the comparison: cmp $0xffffffffffffffef, $0x00000000ffffffef
Fixes: bdb7b79b4ce8 ("bpf: Switch most helper return values from 32-bit int to 64-bit long") Tested-by: Eduard Zingerman <[email protected]> Signed-off-by: JP Kobryn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
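
The shape of the change, shown for one bpf_map_ops member (the patch applies it across the ops):

    /* before: a 32-bit negative return can reach a 64-bit consumer
     * without sign extension when the helper is inlined
     */
    int  (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);

    /* after */
    long (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);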

Revision tags: v6.3-rc3, v6.3-rc2, v6.3-rc1

# cbb9b606 | 05-Mar-2023 | Yafang Shao <[email protected]>
bpf: stackmap memory usage
A new helper is introduced to get stackmap memory usage. Some small memory allocations are ignored, as their size is quite small compared to the total usage.
The result is as follows:

- before
    16: stack_trace  name count_map  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1048576B

- after
    16: stack_trace  name count_map  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 2097472B
Signed-off-by: Yafang Shao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
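
A simplified sketch of the new callback; the real stack_map_mem_usage() also distinguishes the BPF_F_NO_PREALLOC case:

    static u64 stack_map_mem_usage(const struct bpf_map *map)
    {
            struct bpf_stack_map *smap =
                    container_of(map, struct bpf_stack_map, map);
            u64 usage = sizeof(*smap);

            usage += smap->n_buckets * sizeof(struct stack_map_bucket *);
            /* preallocated bucket contents dominate the total */
            usage += (u64)map->max_entries *
                     (sizeof(struct stack_map_bucket) + map->value_size);
            return usage;
    }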

Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5

# 16817ad7 | 08-Sep-2022 | Namhyung Kim <[email protected]>
perf/bpf: Always use perf callchains if exist
If the perf_event has PERF_SAMPLE_CALLCHAIN, BPF can use it for the stack trace. The problematic cases like PEBS and IBS are already handled in the PMU driver, and they fill the callchain info into the sample data. For others, we can call perf_callchain() before the BPF handler.
Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
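
A sketch of the resulting choice in the perf overflow path; the exact condition and surrounding code are illustrative:

    struct perf_callchain_entry *callchain;

    if (data->callchain)                    /* PMU driver (PEBS/IBS) filled it */
            callchain = data->callchain;
    else                                    /* unwind before the BPF handler */
            callchain = perf_callchain(event, regs);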

Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5

# c317ab71 | 25-Apr-2022 | Menglong Dong <[email protected]>
bpf: Compute map_btf_id during build time
For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map types is computed during vmlinux-btf init:
btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()
It looks up the btf_type according to the 'map_btf_name' field in 'struct bpf_map_ops'. This process can be done at build time, thanks to Jiri's resolve_btfids.
The map_ptr selftest passes:

    $96 map_ptr:OK
    Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Reported-by: kernel test robot <[email protected]> Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
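
The build-time pattern as it appears for the stackmap (resolve_btfids patches the ID into the object at link time):

    BTF_ID_LIST_SINGLE(stack_trace_map_btf_ids, struct, bpf_stack_map)

    const struct bpf_map_ops stack_trace_map_ops = {
            /* ... */
            .map_btf_id = &stack_trace_map_btf_ids[0],
    };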

Revision tags: v5.18-rc4, v5.18-rc3, v5.18-rc2

# b4504319 | 07-Apr-2022 | Yuntao Wang <[email protected]>
bpf: Fix excessive memory allocation in stack_map_alloc()
The 'n_buckets * (value_size + sizeof(struct stack_map_bucket))' part of the allocated memory for 'smap' is never used after the memlock accounting was removed, thus get rid of it.
[ Note, Daniel:
Commit b936ca643ade ("bpf: rework memlock-based memory accounting for maps") moved `cost += n_buckets * (value_size + sizeof(struct stack_map_bucket))` up and therefore before the bpf_map_area_alloc() allocation, sigh. In a later step, commit c85d69135a91 ("bpf: move memory size checks to bpf_map_charge_init()") moved the overflow checks of `cost >= U32_MAX - PAGE_SIZE` into bpf_map_charge_init(). And then commit 370868107bf6 ("bpf: Eliminate rlimit-based memory accounting for stackmap maps") finally removed the bpf_map_charge_init() call. Anyway, the original code did the allocation the same way as /after/ this fix. ]
Fixes: b936ca643ade ("bpf: rework memlock-based memory accounting for maps") Signed-off-by: Yuntao Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
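
After the fix, the allocation covers only the map struct and the bucket pointer array (a sketch of the shape):

    u64 cost = sizeof(*smap) + (u64)n_buckets * sizeof(struct stack_map_bucket *);

    /* bucket contents come from the separately pre-allocated element pool */
    smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr));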

Revision tags: v5.18-rc1

# 8eb943fc | 23-Mar-2022 | Yuntao Wang <[email protected]>
bpf: Remove redundant assignment to smap->map.value_size
The attr->value_size is already assigned to smap->map.value_size in bpf_map_init_from_attr(), so there is no need to do it again in stack_map_alloc().
Signed-off-by: Yuntao Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Joanne Koong <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]

Revision tags: v5.17

# ee2a0988 | 14-Mar-2022 | Namhyung Kim <[email protected]>
bpf: Adjust BPF stack helper functions to accommodate skip > 0
Let's say that the caller has storage for num_elem stack frames. Then, the BPF stack helper functions walk the stack for only num_elem frames. This means that if skip > 0, one keeps only 'num_elem - skip' frames.
This is because it sets init_nr in the perf_callchain_entry to the end of the buffer to save num_elem entries only. I believe it was because the perf callchain code unwound the stack frames until it reached the global max size (sysctl_perf_event_max_stack).
However, it now has perf_callchain_entry_ctx.max_stack to limit the iteration locally. This simplifies the code that handles init_nr in the BPF callstack entries and removes the confusion with the perf_event's __PERF_SAMPLE_CALLCHAIN_EARLY, which sets init_nr to 0.
Also change the comment on bpf_get_stack() in the header file to be more explicit about what the return value means.
Fixes: c195651e565a ("bpf: add bpf_get_stack helper") Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
Based-on-patch-by: Eugene Loh <[email protected]>
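
A sketch of the reworked depth handling (shape per the description above; init_nr is simply gone from the BPF side):

    /* walk num_elem + skip frames (capped by the global limit), then
     * drop the first 'skip' frames when copying out, keeping num_elem
     */
    u32 max_depth = num_elem + skip;

    if (max_depth > sysctl_perf_event_max_stack)
            max_depth = sysctl_perf_event_max_stack;

    trace = get_perf_callchain(regs, 0, kernel, user, max_depth, false, false);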

Revision tags: v5.17-rc8, v5.17-rc7, v5.17-rc6

# ceac059e | 24-Feb-2022 | Hao Luo <[email protected]>
bpf: Cache the last valid build_id
For binaries that are statically linked, consecutive stack frames are likely to be in the same VMA and therefore have the same build id.
On a real-world workload, we observed that 66% of CPU cycles in __bpf_get_stackid() were spent on build_id_parse() and find_vma().
As an optimization for this case, we can cache the previous frame's VMA: if the new frame has the same VMA as the previous one, reuse the previous one's build id.
We are holding the MM locks as reader across the entire loop, so we don't need to worry about VMA going away.
Tested through "stacktrace_build_id" and "stacktrace_build_id_nmi" in test_progs.
Suggested-by: Greg Thelen <[email protected]> Signed-off-by: Hao Luo <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
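
A sketch of the cache added to the resolution loop; range_in_vma() is the real helper, while the parsing step is elided:

    struct vm_area_struct *prev_vma = NULL;
    const unsigned char *prev_build_id = NULL;

    for (i = 0; i < trace_nr; i++) {
            /* consecutive frames often share a VMA: reuse its build id */
            if (prev_vma && range_in_vma(prev_vma, ips[i], ips[i])) {
                    memcpy(id_offs[i].build_id, prev_build_id,
                           BPF_BUILD_ID_SIZE);
                    continue;
            }
            vma = find_vma(current->mm, ips[i]);
            /* ... parse the build id for this VMA ... */
            prev_vma = vma;
            prev_build_id = id_offs[i].build_id;
    }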

Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16

# b992f01e | 06-Jan-2022 | Naveen N. Rao <[email protected]>
bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
task_pt_regs() can return NULL on powerpc for kernel threads. This is then used in __bpf_get_stack() to check for user mode, resulting in a kernel oops. Guard against this by checking the return value of task_pt_regs() before trying to obtain the call chain.
Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()") Cc: [email protected] # v5.9+ Signed-off-by: Naveen N. Rao <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/d5ef83c361cc255494afd15ff1b4fb02a36e1dcf.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
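
The guard's shape in bpf_get_task_stack() (a sketch; the stack pinning around it comes from the earlier refcount fix):

    long res = -EOPNOTSUPP;     /* fallback error value is illustrative */
    struct pt_regs *regs = task_pt_regs(task);

    /* powerpc kernel threads have no pt_regs; don't dereference NULL */
    if (regs)
            res = __bpf_get_stack(regs, task, NULL, buf, size, flags);

    put_task_stack(task);
    return res;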

Revision tags: v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1

# d19ddb47 | 12-Nov-2021 | Song Liu <[email protected]>
bpf: Introduce btf_tracing_ids
Similar to btf_sock_ids, btf_tracing_ids provides BTF IDs for task_struct, file, and vm_area_struct via an easy to understand format like btf_tracing_ids[BTF_TRACING_TYPE_{TASK,FILE,VMA}].
Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Song Liu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]

# 7c7e3d31 | 05-Nov-2021 | Song Liu <[email protected]>
bpf: Introduce helper bpf_find_vma
In some profiler use cases, it is necessary to map an address to the backing file, e.g., a shared library. The bpf_find_vma helper provides a flexible way to achieve this. bpf_find_vma maps an address of a task to the vma (vm_area_struct) for this address, and feeds the vma to a callback BPF function. The callback function is necessary here, as we need to ensure mmap_sem is unlocked.
It is necessary to lock mmap_sem for find_vma(). To lock and unlock mmap_sem safely when irqs are disabled, we use the same mechanism as stackmap with build_id. Specifically, when irqs are disabled, the unlock is postponed via an irq_work. Refactor stackmap.c so that the irq_work is shared among bpf_find_vma and the stackmap helpers.
Signed-off-by: Song Liu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Tested-by: Hengqi Chen <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
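
A hypothetical BPF-program-side sketch; the helper signature is from the patch, while the program, callback, and context names are made up:

    /* callback is invoked with the task's mmap lock held */
    static long vma_cb(struct task_struct *task,
                       struct vm_area_struct *vma, void *ctx)
    {
            /* e.g. record vma->vm_start or the backing file into ctx */
            return 0;
    }

    /* somewhere in a BPF program: resolve addr in task's address space */
    bpf_find_vma(task, addr, vma_cb, &my_ctx, 0);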

Revision tags: v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4

# 30e29a9a | 30-Sep-2021 | Tatsuhiko Yasumatsu <[email protected]>
bpf: Fix integer overflow in prealloc_elems_and_freelist()
In prealloc_elems_and_freelist(), the multiplication to calculate the size passed to bpf_map_area_alloc() could lead to an integer overflow. As a result, an out-of-bounds write could occur in pcpu_freelist_populate(), as reported by KASAN:

    [...]
    [   16.968613] BUG: KASAN: slab-out-of-bounds in pcpu_freelist_populate+0xd9/0x100
    [   16.969408] Write of size 8 at addr ffff888104fc6ea0 by task crash/78
    [   16.970038]
    [   16.970195] CPU: 0 PID: 78 Comm: crash Not tainted 5.15.0-rc2+ #1
    [   16.970878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    [   16.972026] Call Trace:
    [   16.972306]  dump_stack_lvl+0x34/0x44
    [   16.972687]  print_address_description.constprop.0+0x21/0x140
    [   16.973297]  ? pcpu_freelist_populate+0xd9/0x100
    [   16.973777]  ? pcpu_freelist_populate+0xd9/0x100
    [   16.974257]  kasan_report.cold+0x7f/0x11b
    [   16.974681]  ? pcpu_freelist_populate+0xd9/0x100
    [   16.975190]  pcpu_freelist_populate+0xd9/0x100
    [   16.975669]  stack_map_alloc+0x209/0x2a0
    [   16.976106]  __sys_bpf+0xd83/0x2ce0
    [...]
The possibility of this overflow was originally discussed in [0], but was overlooked.
Fix the integer overflow by changing elem_size to u64 from u32.
[0] https://lore.kernel.org/bpf/[email protected]/
Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation") Signed-off-by: Tatsuhiko Yasumatsu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
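
The fix's shape in prealloc_elems_and_freelist(); widening elem_size is the whole point:

    /* u64 keeps the multiplication from wrapping at 32 bits */
    u64 elem_size = sizeof(struct stack_map_bucket) +
                    (u64)smap->map.value_size;

    smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries,
                                     smap->map.numa_node);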

Revision tags: v5.15-rc3, v5.15-rc2, v5.15-rc1

# 2f1aaf3e | 09-Sep-2021 | Yonghong Song <[email protected]>
bpf, mm: Fix lockdep warning triggered by stack_map_get_build_id_offset()
Currently the bpf selftest "get_stack_raw_tp" triggered the warning:
    [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
    [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
    [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G W O 5.14.0+ #53
    [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
    [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
    [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
    [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
    [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
    [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
    [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
    [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
    [ 1411.304501] FS: 00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
    [ 1411.304504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
    [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 1411.304512] Call Trace:
    [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
    [ 1411.304528]  __bpf_get_stack+0x18f/0x230
    [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
    [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
    [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
    [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
    [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
    [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
    [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
    [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
    [ 1411.309891] FS: 00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
    [ 1411.309896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
    [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
    [ 1411.312291]  do_syscall_64+0x15/0x80
    [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
    [ 1411.315626] Call Trace:
    [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260
To reproduce, first build the `test_progs` binary:
make -C tools/testing/selftests/bpf -j60
and then run the binary at tools/testing/selftests/bpf directory:
./test_progs -t get_stack_raw_tp
The warning is due to commit 5b78ed24e8ec ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()"), which added mmap_assert_locked() in the find_vma() function. The mmap_assert_locked() function asserts that mm->mmap_lock needs to be held. But this is not the case for the bpf_get_stack() or bpf_get_stackid() helpers (kernel/bpf/stackmap.c), which use mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held in the bpf_get_stack[id]() use case, the above warning is emitted during the test run.
This patch fixes the issue by (1) using mmap_read_trylock() instead of mmap_read_trylock_non_owner() to satisfy the lockdep checking in find_vma(), and (2) dropping lockdep for mmap_lock right before the irq_work_queue(). The function mmap_read_trylock_non_owner() is also removed, since after this patch nobody calls it any more.
Fixes: 5b78ed24e8ec ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Luigi Rizzo <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
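
A sketch of change (2): giving up lockdep ownership before the deferred unlock, as described above (the surrounding irq_work plumbing is elided):

    if (irq_work_busy) {        /* illustrative condition */
            /* the irq_work will mmap_read_unlock() later, from another
             * context, so tell lockdep we no longer own the lock here
             */
            rwsem_release(&current->mm->mmap_lock.dep_map, _RET_IP_);
            irq_work_queue(&work->irq_work);
    }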

Revision tags: v5.14

# 33c5cb36 | 24-Aug-2021 | Daniel Xu <[email protected]>
bpf: Consolidate task_struct BTF_ID declarations
No need to have it defined 5 times. Once is enough.
Signed-off-by: Daniel Xu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz

Revision tags: v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6

# 06ab134c | 01-Apr-2021 | Dave Marchevsky <[email protected]>
bpf: Refcount task stack in bpf_get_task_stack
On x86, the struct pt_regs * grabbed by task_pt_regs() points at an offset of task->stack. The pt_regs are later dereferenced in __bpf_get_stack() (e.g. by the user_mode() check). This can cause a fault if the task in question exits while bpf_get_task_stack() is executing, as warned by task_stack_page's comment:

    * When accessing the stack of a non-current task that might exit, use
    * try_get_task_stack() instead. task_stack_page will return a pointer
    * that could get freed out from under you.

Take the comment's advice and use try_get_task_stack() and put_task_stack() to hold the task->stack refcount, or bail early if it's already 0. Incrementing stack_refcount ensures the task's stack sticks around while we're using its data.
I noticed this bug while testing a bpf task iter similar to bpf_iter_task_stack in selftests, except mine grabbed the user stack, and got intermittent crashes, which resulted in dumps like:

    BUG: unable to handle page fault for address: 0000000000003fe0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    RIP: 0010:__bpf_get_stack+0xd0/0x230
    <snip...>
    Call Trace:
     bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
     bpf_iter_run_prog+0x24/0x81
     __task_seq_show+0x58/0x80
     bpf_seq_read+0xf7/0x3d0
     vfs_read+0x91/0x140
     ksys_read+0x59/0xd0
     do_syscall_64+0x48/0x120
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()") Signed-off-by: Dave Marchevsky <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
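
The guard's shape (a sketch following the pattern the quoted comment prescribes):

    /* pin task->stack so it cannot be freed under us */
    if (!try_get_task_stack(task))
            return -EFAULT;     /* stack already gone; error value illustrative */

    regs = task_pt_regs(task);
    res = __bpf_get_stack(regs, task, NULL, buf, size, flags);

    put_task_stack(task);
    return res;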

Revision tags: v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6

# 6183f4d3 | 27-Jan-2021 | Bui Quang Minh <[email protected]>
bpf: Check for integer overflow when using roundup_pow_of_two()
On 32-bit architectures, roundup_pow_of_two() can return 0 when the argument has the uppermost bit set, due to the resulting 1UL << 32. Add a check for this case.
Fixes: d5a3b1f69186 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE") Signed-off-by: Bui Quang Minh <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
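
The check's shape (a sketch; note that commit 7a4b2125 further up this log later moves the check before the rounding, since the shift itself is UB on 32-bit):

    n_buckets = roundup_pow_of_two(attr->max_entries);
    if (!n_buckets)
            return ERR_PTR(-E2BIG);     /* 32-bit: 1UL << 32 wrapped to 0 */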

Revision tags: v5.11-rc5, v5.11-rc4

# 921f88fc | 14-Jan-2021 | Jiri Olsa <[email protected]>
bpf: Add size arg to build_id_parse function
It's possible to have other build id types (other than the default SHA1). Currently there's also ld support for the MD5 build id.
Add a size argument to the build_id_parse function that returns (if defined) the size of the parsed build id, so we can recognize the build id type.
Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
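
The resulting signature (per the description; that callers may pass a NULL size when they don't care is an assumption here):

    /* returns 0 on success; *size receives the parsed build id length,
     * e.g. 20 bytes for SHA1 or 16 bytes for MD5
     */
    int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
                       __u32 *size);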

# bd7525da | 14-Jan-2021 | Jiri Olsa <[email protected]>
bpf: Move stack_map_get_build_id into lib
Move stack_map_get_build_id into lib, with the declaration in the linux/buildid.h header:
int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);
This function returns the build id for the given struct vm_area_struct. There is no functional change to the stack_map_get_build_id function.
Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]