|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
7e74a7c0 |
| 29-Jan-2025 |
Sebastian Andrzej Siewior <[email protected]> |
kprobes: Use RCU in all users of __module_text_address().
__module_text_address() can be invoked within a RCU section, there is no requirement to have preemption disabled.
Replace the preempt_disab
kprobes: Use RCU in all users of __module_text_address().
__module_text_address() can be invoked within a RCU section, there is no requirement to have preemption disabled.
Replace the preempt_disable() section around __module_text_address() with RCU.
Cc: David S. Miller <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Naveen N Rao <[email protected]> Cc: [email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Petr Pavlu <[email protected]>
show more ...
|
| #
1751f872 |
| 28-Jan-2025 |
Joel Granados <[email protected]> |
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysc
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command: Spatch: virtual patch
@ depends on !(file in "net") disable optional_qualifier @
identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@
+ const struct ctl_table table_name [] = { ... };
sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c
Reviewed-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/ Reviewed-by: Martin K. Petersen <[email protected]> # SCSI Reviewed-by: Darrick J. Wong <[email protected]> # xfs Acked-by: Jani Nikula <[email protected]> Acked-by: Corey Minyard <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Ashutosh Dixit <[email protected]> Acked-by: Anna Schumaker <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
bef8e6af |
| 09-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
kprobes: Remove remaining gotos
Remove remaining gotos from kprobes.c to clean up the code. This does not use cleanup macros, but changes code flow for avoiding gotos.
Link: https://lore.kernel.org
kprobes: Remove remaining gotos
Remove remaining gotos from kprobes.c to clean up the code. This does not use cleanup macros, but changes code flow for avoiding gotos.
Link: https://lore.kernel.org/all/173371212474.480397.5684523564137819115.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
5e5b8b49 |
| 09-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
kprobes: Remove unneeded goto
Remove unneeded gotos. Since the labels referred by these gotos have only one reference for each, we can replace those gotos with the referred code.
Link: https://lore
kprobes: Remove unneeded goto
Remove unneeded gotos. Since the labels referred by these gotos have only one reference for each, we can replace those gotos with the referred code.
Link: https://lore.kernel.org/all/173371211203.480397.13988907319659165160.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
a35fb2bc |
| 09-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
kprobes: Use guard for rcu_read_lock
Use guard(rcu) for rcu_read_lock so that it can remove unneeded gotos and make it more structured.
Link: https://lore.kernel.org/all/173371209846.480397.3852648
kprobes: Use guard for rcu_read_lock
Use guard(rcu) for rcu_read_lock so that it can remove unneeded gotos and make it more structured.
Link: https://lore.kernel.org/all/173371209846.480397.3852648910271029695.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
54c79390 |
| 09-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
kprobes: Use guard() for external locks
Use guard() for text_mutex, cpu_read_lock, and jump_label_lock in the kprobes.
Link: https://lore.kernel.org/all/173371208663.480397.7535769878667655223.stgi
kprobes: Use guard() for external locks
Use guard() for text_mutex, cpu_read_lock, and jump_label_lock in the kprobes.
Link: https://lore.kernel.org/all/173371208663.480397.7535769878667655223.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2, v6.13-rc1 |
|
| #
587e8e6d |
| 29-Nov-2024 |
Masami Hiramatsu (Google) <[email protected]> |
kprobes: Adopt guard() and scoped_guard()
Use guard() or scoped_guard() for critical sections rather than discrete lock/unlock pairs.
Link: https://lore.kernel.org/all/173289887835.73724.6082232173
kprobes: Adopt guard() and scoped_guard()
Use guard() or scoped_guard() for critical sections rather than discrete lock/unlock pairs.
Link: https://lore.kernel.org/all/173289887835.73724.608223217359025939.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
1703abb4 |
| 08-Dec-2024 |
Thomas Weißschuh <[email protected]> |
kprobes: Reduce preempt disable scope in check_kprobe_access_safe()
Commit a189d0350f387 ("kprobes: disable preempt for module_text_address() and kernel_text_address()") introduced a preempt_disable
kprobes: Reduce preempt disable scope in check_kprobe_access_safe()
Commit a189d0350f387 ("kprobes: disable preempt for module_text_address() and kernel_text_address()") introduced a preempt_disable() region to protect against concurrent module unloading. However this region also includes the call to jump_label_text_reserved() which takes a long time; up to 400us, iterating over approx 6000 jump tables.
The scope protected by preempt_disable() is largen than necessary. core_kernel_text() does not need to be protected as it does not interact with module code at all. Only the scope from __module_text_address() to try_module_get() needs to be protected. By limiting the critical section to __module_text_address() and try_module_get() the function responsible for the latency spike remains preemptible.
This works fine even when !CONFIG_MODULES as in that case try_module_get() will always return true and that block can be optimized away.
Limit the critical section to __module_text_address() and try_module_get(). Use guard(preempt)() for easier error handling.
While at it also remove a spurious *probed_mod = NULL in an error path. On errors the output parameter is never inspected by the caller. Some error paths were clearing the parameters, some didn't. Align them for clarity.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Thomas Weißschuh <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
3fbff988 |
| 30-Oct-2024 |
Nathan Chancellor <[email protected]> |
kprobes: Use struct_size() in __get_insn_slot()
__get_insn_slot() allocates 'struct kprobe_insn_page' using a custom structure size calculation macro, KPROBE_INSN_PAGE_SIZE. Replace KPROBE_INSN_PAGE
kprobes: Use struct_size() in __get_insn_slot()
__get_insn_slot() allocates 'struct kprobe_insn_page' using a custom structure size calculation macro, KPROBE_INSN_PAGE_SIZE. Replace KPROBE_INSN_PAGE_SIZE with the struct_size() macro, which is the preferred way to calculate the size of flexible structures in the kernel because it handles overflow and makes it easier to change and audit how flexible structures are allocated across the entire tree.
Link: https://lore.kernel.org/all/20241030-kprobes-fix-counted-by-annotation-v1-2-8f266001fad0@kernel.org/ (Masami modofied this to be applicable without the 1st patch in the series.)
Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
da93dd93 |
| 13-Aug-2024 |
Jinjie Ruan <[email protected]> |
kprobes: Cleanup collect_one_slot() and __disable_kprobe()
If kip->nused is not zero, collect_one_slot() return false, otherwise do a lot of linked list operations, reverse the processing order to m
kprobes: Cleanup collect_one_slot() and __disable_kprobe()
If kip->nused is not zero, collect_one_slot() return false, otherwise do a lot of linked list operations, reverse the processing order to make the code if nesting more concise. __disable_kprobe() is the same as well.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
ce7f27dc |
| 13-Aug-2024 |
Jinjie Ruan <[email protected]> |
kprobes: Cleanup the config comment
The CONFIG_KPROBES_ON_FTRACE #if/#else/#endif section is small and doesn't nest additional #ifdefs so the comment is useless and should be removed, but the __ARCH
kprobes: Cleanup the config comment
The CONFIG_KPROBES_ON_FTRACE #if/#else/#endif section is small and doesn't nest additional #ifdefs so the comment is useless and should be removed, but the __ARCH_WANT_KPROBES_INSN_SLOT and CONFIG_OPTPROBES() nest is long, it is better to add comment for reading.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc3, v6.11-rc2 |
|
| #
8c8acb8f |
| 02-Aug-2024 |
Masami Hiramatsu (Google) <[email protected]> |
kprobes: Fix to check symbol prefixes correctly
Since str_has_prefix() takes the prefix as the 2nd argument and the string as the first, is_cfi_preamble_symbol() always fails to check the prefix. Fi
kprobes: Fix to check symbol prefixes correctly
Since str_has_prefix() takes the prefix as the 2nd argument and the string as the first, is_cfi_preamble_symbol() always fails to check the prefix. Fix the function parameter order so that it correctly check the prefix.
Link: https://lore.kernel.org/all/172260679559.362040.7360872132937227206.stgit@devnote2/
Fixes: de02f2ac5d8c ("kprobes: Prohibit probing on CFI preamble symbol") Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1 |
|
| #
78eb4ea2 |
| 24-Jul-2024 |
Joel Granados <[email protected]> |
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ct
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified.
This patch has been generated by the following coccinelle script:
``` virtual patch
@r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@ identifier func, ctl, write, buffer, lenp, ppos; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... }
@r3@ identifier func; @@
int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *);
@r4@ identifier func, ctl; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *);
@r5@ identifier func, write, buffer, lenp, ppos; @@
int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration.
Co-developed-by: Thomas Weißschuh <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]> Co-developed-by: Joel Granados <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1 |
|
| #
4b377b48 |
| 18-May-2024 |
Linus Torvalds <[email protected]> |
kprobe/ftrace: fix build error due to bad function definition
Commit 1a7d0890dd4a ("kprobe/ftrace: bail out if ftrace was killed") introduced a bad K&R function definition, which we haven't accepted
kprobe/ftrace: fix build error due to bad function definition
Commit 1a7d0890dd4a ("kprobe/ftrace: bail out if ftrace was killed") introduced a bad K&R function definition, which we haven't accepted in a long long time.
Gcc seems to let it slide, but clang notices with the appropriate error:
kernel/kprobes.c:1140:24: error: a function declaration without a prototype is deprecated in all > 1140 | void kprobe_ftrace_kill() | ^ | void
but this commit was apparently never in linux-next before it was sent upstream, so it didn't get the appropriate build test coverage.
Fixes: 1a7d0890dd4a kprobe/ftrace: bail out if ftrace was killed Cc: Stephen Brennan <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Guo Ren <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v6.9, v6.9-rc7 |
|
| #
1a7d0890 |
| 01-May-2024 |
Stephen Brennan <[email protected]> |
kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming kprobes. Eventually, the ftrace_ops associated with the kprobes will be freed, yet th
kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming kprobes. Eventually, the ftrace_ops associated with the kprobes will be freed, yet the kprobes will still be active, and when triggered, they will use the freed memory, likely resulting in a page fault and panic.
This behavior can be reproduced quite easily, by creating a kprobe and then triggering a ftrace_kill(). For simplicity, we can simulate an ftrace error with a kernel module like [1]:
[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
sudo perf probe --add commit_creds sudo perf trace -e probe:commit_creds # In another terminal make sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug # Back to perf terminal # ctrl-c sudo perf probe --del commit_creds
After a short period, a page fault and panic would occur as the kprobe continues to execute and uses the freed ftrace_ops. While ftrace_kill() is supposed to be used only in extreme circumstances, it is invoked in FTRACE_WARN_ON() and so there are many places where an unexpected bug could be triggered, yet the system may continue operating, possibly without the administrator noticing. If ftrace_kill() does not panic the system, then we should do everything we can to continue operating, rather than leave a ticking time bomb.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Stephen Brennan <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Guo Ren <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
7582b7be |
| 05-May-2024 |
Mike Rapoport (IBM) <[email protected]> |
kprobes: remove dependency on CONFIG_MODULES
kprobes depended on CONFIG_MODULES because it has to allocate memory for code.
Since code allocations are now implemented with execmem, kprobes can be e
kprobes: remove dependency on CONFIG_MODULES
kprobes depended on CONFIG_MODULES because it has to allocate memory for code.
Since code allocations are now implemented with execmem, kprobes can be enabled in non-modular kernels.
Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the dependency of CONFIG_KPROBES on CONFIG_MODULES.
Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> [mcgrof: rebase in light of NEED_TASKS_RCU ] Signed-off-by: Luis Chamberlain <[email protected]>
show more ...
|
| #
12af2b83 |
| 05-May-2024 |
Mike Rapoport (IBM) <[email protected]> |
mm: introduce execmem_alloc() and execmem_free()
module_alloc() is used everywhere as a mean to allocate memory for code.
Beside being semantically wrong, this unnecessarily ties all subsystems tha
mm: introduce execmem_alloc() and execmem_free()
module_alloc() is used everywhere as a mean to allocate memory for code.
Beside being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF to modules and puts the burden of code allocation to the modules code.
Several architectures override module_alloc() because of various constraints where the executable memory can be located and this causes additional obstacles for improvements of code allocation.
Start splitting code allocation from modules by introducing execmem_alloc() and execmem_free() APIs.
Initially, execmem_alloc() is a wrapper for module_alloc() and execmem_free() is a replacement of module_memfree() to allow updating all call sites to use the new APIs.
Since architectures define different restrictions on placement, permissions, alignment and other parameters for memory that can be used by different subsystems that allocate executable memory, execmem_alloc() takes a type argument, that will be used to identify the calling subsystem and to allow architectures define parameters for ranges suitable for that subsystem.
No functional changes.
Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1 |
|
| #
f884cd38 |
| 27-Jun-2023 |
Joel Granados <[email protected]> |
kprobes: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sent
kprobes: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)
Remove sentinel element from kprobe_sysclts
Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
| #
325f3fb5 |
| 10-Apr-2024 |
Zheng Yejian <[email protected]> |
kprobes: Fix possible use-after-free issue on kprobe registration
When unloading a module, its state is changing MODULE_STATE_LIVE -> MODULE_STATE_GOING -> MODULE_STATE_UNFORMED. Each change will t
kprobes: Fix possible use-after-free issue on kprobe registration
When unloading a module, its state is changing MODULE_STATE_LIVE -> MODULE_STATE_GOING -> MODULE_STATE_UNFORMED. Each change will take a time. `is_module_text_address()` and `__module_text_address()` works with MODULE_STATE_LIVE and MODULE_STATE_GOING. If we use `is_module_text_address()` and `__module_text_address()` separately, there is a chance that the first one is succeeded but the next one is failed because module->state becomes MODULE_STATE_UNFORMED between those operations.
In `check_kprobe_address_safe()`, if the second `__module_text_address()` is failed, that is ignored because it expected a kernel_text address. But it may have failed simply because module->state has been changed to MODULE_STATE_UNFORMED. In this case, arm_kprobe() will try to modify non-exist module text address (use-after-free).
To fix this problem, we should not use separated `is_module_text_address()` and `__module_text_address()`, but use only `__module_text_address()` once and do `try_module_get(module)` which is only available with MODULE_STATE_LIVE.
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 28f6c37a2910 ("kprobes: Forbid probing on trampoline and BPF code areas") Cc: [email protected] Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
9efd24ec |
| 19-Sep-2023 |
Li zeming <[email protected]> |
kprobes: Remove unnecessary initial values of variables
ri and sym is assigned first, so it does not need to initialize the assignment.
Link: https://lore.kernel.org/all/20230919012823.7815-1-zemin
kprobes: Remove unnecessary initial values of variables
ri and sym is assigned first, so it does not need to initialize the assignment.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Li zeming <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
d839a656 |
| 01-Dec-2023 |
JP Kobryn <[email protected]> |
kprobes: consistent rcu api usage for kretprobe holder
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is RCU-managed, based on the (non-rethook) implementation of get_kretpr
kprobes: consistent rcu api usage for kretprobe holder
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is RCU-managed, based on the (non-rethook) implementation of get_kretprobe(). The thought behind this patch is to make use of the RCU API where possible when accessing this pointer so that the needed barriers are always in place and to self-document the code.
The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes done to the "rp" pointer are changed to make use of the RCU macro for assignment. For the single read, the implementation of get_kretprobe() is simplified by making use of an RCU macro which accomplishes the same, but note that the log warning text will be more generic.
I did find that there is a difference in assembly generated between the usage of the RCU macros vs without. For example, on arm64, when using rcu_assign_pointer(), the corresponding store instruction is a store-release (STLR) which has an implicit barrier. When normal assignment is done, a regular store (STR) is found. In the macro case, this seems to be a result of rcu_assign_pointer() using smp_store_release() when the value to write is not NULL.
Link: https://lore.kernel.org/all/[email protected]/
Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash") Cc: [email protected] Signed-off-by: JP Kobryn <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
4bbd9345 |
| 17-Oct-2023 |
wuqiang.matt <[email protected]> |
kprobes: kretprobe scalability improvement
kretprobe is using freelist to manage return-instances, but freelist, as LIFO queue based on singly linked list, scales badly and reduces the overall throu
kprobes: kretprobe scalability improvement
kretprobe is using freelist to manage return-instances, but freelist, as LIFO queue based on singly linked list, scales badly and reduces the overall throughput of kretprobed routines, especially for high contention scenarios.
Here's a typical throughput test of sys_prctl (counts in 10 seconds, measured with perf stat -a -I 10000 -e syscalls:sys_enter_prctl):
OS: Debian 10 X86_64, Linux 6.5rc7 with freelist HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s
1T 2T 4T 8T 16T 24T 24150045 29317964 15446741 12494489 18287272 17708768 32T 48T 64T 72T 96T 128T 16200682 13737658 11645677 11269858 10470118 9931051
This patch introduces objpool to replace freelist. objpool is a high performance queue, which can bring near-linear scalability to kretprobed routines. Tests of kretprobe throughput show the biggest ratio as 159x of original freelist. Here's the result:
1T 2T 4T 8T 16T native: 41186213 82336866 164250978 328662645 658810299 freelist: 24150045 29317964 15446741 12494489 18287272 objpool: 23926730 48010314 96125218 191782984 385091769 32T 48T 64T 96T 128T native: 1330338351 1969957941 2512291791 2615754135 2671040914 freelist: 16200682 13737658 11645677 10470118 9931051 objpool: 764481096 1147149781 1456220214 1502109662 1579015050
Testings on 96-core ARM64 output similarly, but with the biggest ratio up to 448x:
OS: Debian 10 AARCH64, Linux 6.5rc7 HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s
1T 2T 4T 8T 16T native: . 30066096 63569843 126194076 257447289 505800181 freelist: 16152090 11064397 11124068 7215768 5663013 objpool: 13997541 28032100 55726624 110099926 221498787 24T 32T 48T 64T 96T native: 763305277 1015925192 1521075123 2033009392 3021013752 freelist: 5015810 4602893 3766792 3382478 2945292 objpool: 328192025 439439564 668534502 887401381 1319972072
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: wuqiang.matt <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
8865aea0 |
| 25-Jul-2023 |
Ruan Jinjie <[email protected]> |
kernel: kprobes: Use struct_size()
Use struct_size() instead of hand-writing it, when allocating a structure with a flex array.
This is less verbose.
Link: https://lore.kernel.org/all/202307251954
kernel: kprobes: Use struct_size()
Use struct_size() instead of hand-writing it, when allocating a structure with a flex array.
This is less verbose.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Ruan Jinjie <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|
| #
de02f2ac |
| 11-Jul-2023 |
Masami Hiramatsu (Google) <[email protected]> |
kprobes: Prohibit probing on CFI preamble symbol
Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those are used for CFI and not executed. Probing it will break the CFI.
Link:
kprobes: Prohibit probing on CFI preamble symbol
Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those are used for CFI and not executed. Probing it will break the CFI.
Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
ed9492df |
| 11-Jul-2023 |
Li zeming <[email protected]> |
kernel: kprobes: Remove unnecessary ‘0’ values
it is assigned first, so it does not need to initialize the assignment.
Link: https://lore.kernel.org/all/[email protected]/
kernel: kprobes: Remove unnecessary ‘0’ values
it is assigned first, so it does not need to initialize the assignment.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Li zeming <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
show more ...
|