|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
# fa6192ad | 12-Feb-2025 | Jiri Olsa <[email protected]>
uprobes/x86: Harden uretprobe syscall trampoline check
Jann reported a possible issue when trampoline_check_ip returns address near the bottom of the address space that is allowed to call into the syscall if uretprobes are not set up:
https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341b8fbc11737dacbcd29f0054413cbbf
Though the mmap minimum address restrictions will typically prevent creating mappings there, let's make sure uretprobe syscall checks for that.
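As a rough illustration of the kind of lower-bound sanity check described above, here is a minimal user-space sketch; the sentinel value, size constant and helper name are hypothetical and this is not the kernel's actual trampoline_check_ip():

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NO_TRAMPOLINE_VADDR ((uintptr_t)-1)	/* hypothetical "xol area not set up" sentinel */
#define TRAMPOLINE_SIZE     16			/* hypothetical trampoline length */

/*
 * Only accept an instruction pointer that falls inside a trampoline that was
 * actually set up; reject the sentinel so arithmetic on it cannot wrap around
 * to a low address near the bottom of the address space.
 */
static bool trampoline_ip_ok(uintptr_t tramp, uintptr_t ip)
{
	if (tramp == NO_TRAMPOLINE_VADDR)
		return false;
	return ip >= tramp && ip < tramp + TRAMPOLINE_SIZE;
}

int main(void)
{
	uintptr_t tramp = 0x100000;	/* pretend [uprobes] trampoline base */

	printf("%d\n", trampoline_ip_ok(NO_TRAMPOLINE_VADDR, 0x4));	/* 0: uretprobes not set up */
	printf("%d\n", trampoline_ip_ok(tramp, tramp + 4));		/* 1: genuine trampoline hit */
	printf("%d\n", trampoline_ip_ok(tramp, 0x4));			/* 0: ip outside the trampoline */
	return 0;
}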
Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Jann Horn <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.14-rc2, v6.14-rc1 |
|
# 83179cd6 | 24-Jan-2025 | Liao Chang <[email protected]>
uprobes: Remove the spinlock within handle_singlestep()
This patch introduces a flag to track whether TIF_SIGPENDING was temporarily suppressed during the uprobe single-step. Once the single-step has been handled and the flag is confirmed, TIF_SIGPENDING can be restored directly without acquiring the siglock in most cases, reducing contention and improving overall performance.
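To illustrate the pattern in isolation, here is a minimal single-threaded sketch in plain C; the names are hypothetical, and the real code manipulates TIF_SIGPENDING and per-task uprobe state rather than this toy struct:

#include <stdbool.h>
#include <stdio.h>

struct task {
	bool sigpending;	/* stand-in for TIF_SIGPENDING */
	bool signal_denied;	/* flag: we temporarily suppressed a pending signal */
};

/* Entering single-step: suppress a pending signal and remember that we did. */
static void singlestep_begin(struct task *t)
{
	if (t->sigpending) {
		t->sigpending = false;
		t->signal_denied = true;
	}
}

/*
 * Leaving single-step: if we were the ones who cleared the flag, restore it
 * directly -- no need to take the siglock and recompute the signal state.
 */
static void singlestep_end(struct task *t)
{
	if (t->signal_denied) {
		t->signal_denied = false;
		t->sigpending = true;	/* fast path, no lock taken */
	}
}

int main(void)
{
	struct task t = { .sigpending = true };

	singlestep_begin(&t);
	printf("during single-step: sigpending=%d\n", t.sigpending);	/* 0 */
	singlestep_end(&t);
	printf("after single-step:  sigpending=%d\n", t.sigpending);	/* 1 */
	return 0;
}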
I've used the script developed by Andrii in [1] to run the benchmark. The CPU used was Kunpeng916 (Hi1616), 4 NUMA nodes, 64 [email protected], running the kernel on the next tree plus the optimization for get_xol_insn_slot() [2].
before-opt
----------
uprobe-nop  ( 1 cpus):    0.907 ± 0.003M/s  (  0.907M/s/cpu)
uprobe-nop  ( 2 cpus):    1.676 ± 0.008M/s  (  0.838M/s/cpu)
uprobe-nop  ( 4 cpus):    3.210 ± 0.003M/s  (  0.802M/s/cpu)
uprobe-nop  ( 8 cpus):    4.457 ± 0.003M/s  (  0.557M/s/cpu)
uprobe-nop  (16 cpus):    3.724 ± 0.011M/s  (  0.233M/s/cpu)
uprobe-nop  (32 cpus):    2.761 ± 0.003M/s  (  0.086M/s/cpu)
uprobe-nop  (64 cpus):    1.293 ± 0.015M/s  (  0.020M/s/cpu)

uprobe-push ( 1 cpus):    0.883 ± 0.001M/s  (  0.883M/s/cpu)
uprobe-push ( 2 cpus):    1.642 ± 0.005M/s  (  0.821M/s/cpu)
uprobe-push ( 4 cpus):    3.086 ± 0.002M/s  (  0.771M/s/cpu)
uprobe-push ( 8 cpus):    3.390 ± 0.003M/s  (  0.424M/s/cpu)
uprobe-push (16 cpus):    2.652 ± 0.005M/s  (  0.166M/s/cpu)
uprobe-push (32 cpus):    2.713 ± 0.005M/s  (  0.085M/s/cpu)
uprobe-push (64 cpus):    1.313 ± 0.009M/s  (  0.021M/s/cpu)

uprobe-ret  ( 1 cpus):    1.774 ± 0.000M/s  (  1.774M/s/cpu)
uprobe-ret  ( 2 cpus):    3.350 ± 0.001M/s  (  1.675M/s/cpu)
uprobe-ret  ( 4 cpus):    6.604 ± 0.000M/s  (  1.651M/s/cpu)
uprobe-ret  ( 8 cpus):    6.706 ± 0.005M/s  (  0.838M/s/cpu)
uprobe-ret  (16 cpus):    5.231 ± 0.001M/s  (  0.327M/s/cpu)
uprobe-ret  (32 cpus):    5.743 ± 0.003M/s  (  0.179M/s/cpu)
uprobe-ret  (64 cpus):    4.726 ± 0.016M/s  (  0.074M/s/cpu)

after-opt
---------
uprobe-nop  ( 1 cpus):    0.985 ± 0.002M/s  (  0.985M/s/cpu)
uprobe-nop  ( 2 cpus):    1.773 ± 0.005M/s  (  0.887M/s/cpu)
uprobe-nop  ( 4 cpus):    3.304 ± 0.001M/s  (  0.826M/s/cpu)
uprobe-nop  ( 8 cpus):    5.328 ± 0.002M/s  (  0.666M/s/cpu)
uprobe-nop  (16 cpus):    6.475 ± 0.002M/s  (  0.405M/s/cpu)
uprobe-nop  (32 cpus):    4.831 ± 0.082M/s  (  0.151M/s/cpu)
uprobe-nop  (64 cpus):    2.564 ± 0.053M/s  (  0.040M/s/cpu)

uprobe-push ( 1 cpus):    0.964 ± 0.001M/s  (  0.964M/s/cpu)
uprobe-push ( 2 cpus):    1.766 ± 0.002M/s  (  0.883M/s/cpu)
uprobe-push ( 4 cpus):    3.290 ± 0.009M/s  (  0.823M/s/cpu)
uprobe-push ( 8 cpus):    4.670 ± 0.002M/s  (  0.584M/s/cpu)
uprobe-push (16 cpus):    5.197 ± 0.004M/s  (  0.325M/s/cpu)
uprobe-push (32 cpus):    5.068 ± 0.161M/s  (  0.158M/s/cpu)
uprobe-push (64 cpus):    2.605 ± 0.026M/s  (  0.041M/s/cpu)

uprobe-ret  ( 1 cpus):    1.833 ± 0.001M/s  (  1.833M/s/cpu)
uprobe-ret  ( 2 cpus):    3.384 ± 0.003M/s  (  1.692M/s/cpu)
uprobe-ret  ( 4 cpus):    6.677 ± 0.004M/s  (  1.669M/s/cpu)
uprobe-ret  ( 8 cpus):    6.854 ± 0.005M/s  (  0.857M/s/cpu)
uprobe-ret  (16 cpus):    6.508 ± 0.006M/s  (  0.407M/s/cpu)
uprobe-ret  (32 cpus):    5.793 ± 0.009M/s  (  0.181M/s/cpu)
uprobe-ret  (64 cpus):    4.743 ± 0.016M/s  (  0.074M/s/cpu)
The above benchmark results demonstrate an obvious improvement in the scalability of trig-uprobe-nop and trig-uprobe-push, whose peak throughput goes from 4.5M/s to 6.4M/s and from 3.3M/s to 5.1M/s respectively.
[1] https://lore.kernel.org/all/[email protected] [2] https://lore.kernel.org/all/[email protected]
Acked-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Liao Chang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2 |
|
# 8622e45b | 06-Dec-2024 | Andrii Nakryiko <[email protected]>
uprobes: Reuse return_instances between multiple uretprobes within task
Instead of constantly allocating and freeing very short-lived struct return_instance, reuse it as much as possible within current task. For that, store a linked list of reusable return_instances within current->utask.
The only complication is that ri_timer() might be still processing such return_instance. And so while the main uretprobe processing logic might be already done with return_instance and would be OK to immediately reuse it for the next uretprobe instance, it's not correct to unconditionally reuse it just like that.
Instead we make sure that ri_timer() can't possibly be processing it by using seqcount_t, with ri_timer() being "a writer", while free_ret_instance() being "a reader". If, after we unlink return instance from utask->return_instances list, we know that ri_timer() hasn't gotten to processing utask->return_instances yet, then we can be sure that immediate return_instance reuse is OK, and so we put it onto utask->ri_pool for future (potentially, almost immediate) reuse.
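A condensed user-space model of that seqcount handshake follows (C11 atomics; all names here are illustrative, not the kernel's seqcount API):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_uint ri_seq;	/* even: timer idle, odd: timer walking the list */

/* "Writer" side, called from the timer callback around its list walk. */
static void timer_walk_begin(void) { atomic_fetch_add(&ri_seq, 1); }	/* -> odd  */
static void timer_walk_end(void)   { atomic_fetch_add(&ri_seq, 1); }	/* -> even */

/*
 * "Reader" side, consulted after a return_instance is unlinked: reuse is safe
 * only if the timer was not mid-walk when we sampled the sequence and has not
 * started (or completed) a walk since.
 */
static bool safe_to_reuse(unsigned int seq_before)
{
	if (seq_before & 1)				/* timer was mid-walk */
		return false;
	return atomic_load(&ri_seq) == seq_before;	/* no walk happened since */
}

int main(void)
{
	unsigned int s = atomic_load(&ri_seq);

	printf("reuse ok: %d\n", safe_to_reuse(s));	/* 1: put it on the reuse pool */

	timer_walk_begin();
	printf("reuse ok: %d\n", safe_to_reuse(atomic_load(&ri_seq)));	/* 0: defer freeing instead */
	timer_walk_end();
	return 0;
}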
This change shows improvements both in single-CPU performance (by avoiding the relatively expensive kmalloc/free combo) and in terms of multi-CPU scalability, where you can see that per-CPU throughput doesn't decline as steeply with an increased number of CPUs (a decline that was previously attributed to kmalloc()/free() through profiling):
BASELINE (latest perf/core)
===========================
uretprobe-nop ( 1 cpus):    1.898 ± 0.002M/s  (  1.898M/s/cpu)
uretprobe-nop ( 2 cpus):    3.574 ± 0.011M/s  (  1.787M/s/cpu)
uretprobe-nop ( 3 cpus):    5.279 ± 0.066M/s  (  1.760M/s/cpu)
uretprobe-nop ( 4 cpus):    6.824 ± 0.047M/s  (  1.706M/s/cpu)
uretprobe-nop ( 5 cpus):    8.339 ± 0.060M/s  (  1.668M/s/cpu)
uretprobe-nop ( 6 cpus):    9.812 ± 0.047M/s  (  1.635M/s/cpu)
uretprobe-nop ( 7 cpus):   11.030 ± 0.048M/s  (  1.576M/s/cpu)
uretprobe-nop ( 8 cpus):   12.453 ± 0.126M/s  (  1.557M/s/cpu)
uretprobe-nop (10 cpus):   14.838 ± 0.044M/s  (  1.484M/s/cpu)
uretprobe-nop (12 cpus):   17.092 ± 0.115M/s  (  1.424M/s/cpu)
uretprobe-nop (14 cpus):   19.576 ± 0.022M/s  (  1.398M/s/cpu)
uretprobe-nop (16 cpus):   22.264 ± 0.015M/s  (  1.391M/s/cpu)
uretprobe-nop (24 cpus):   33.534 ± 0.078M/s  (  1.397M/s/cpu)
uretprobe-nop (32 cpus):   43.262 ± 0.127M/s  (  1.352M/s/cpu)
uretprobe-nop (40 cpus):   53.252 ± 0.080M/s  (  1.331M/s/cpu)
uretprobe-nop (48 cpus):   55.778 ± 0.045M/s  (  1.162M/s/cpu)
uretprobe-nop (56 cpus):   56.850 ± 0.227M/s  (  1.015M/s/cpu)
uretprobe-nop (64 cpus):   62.005 ± 0.077M/s  (  0.969M/s/cpu)
uretprobe-nop (72 cpus):   66.445 ± 0.236M/s  (  0.923M/s/cpu)
uretprobe-nop (80 cpus):   68.353 ± 0.180M/s  (  0.854M/s/cpu)

THIS PATCHSET (on top of latest perf/core)
==========================================
uretprobe-nop ( 1 cpus):    2.253 ± 0.004M/s  (  2.253M/s/cpu)
uretprobe-nop ( 2 cpus):    4.281 ± 0.003M/s  (  2.140M/s/cpu)
uretprobe-nop ( 3 cpus):    6.389 ± 0.027M/s  (  2.130M/s/cpu)
uretprobe-nop ( 4 cpus):    8.328 ± 0.005M/s  (  2.082M/s/cpu)
uretprobe-nop ( 5 cpus):   10.353 ± 0.001M/s  (  2.071M/s/cpu)
uretprobe-nop ( 6 cpus):   12.513 ± 0.010M/s  (  2.086M/s/cpu)
uretprobe-nop ( 7 cpus):   14.525 ± 0.017M/s  (  2.075M/s/cpu)
uretprobe-nop ( 8 cpus):   15.633 ± 0.013M/s  (  1.954M/s/cpu)
uretprobe-nop (10 cpus):   19.532 ± 0.011M/s  (  1.953M/s/cpu)
uretprobe-nop (12 cpus):   21.405 ± 0.009M/s  (  1.784M/s/cpu)
uretprobe-nop (14 cpus):   24.857 ± 0.020M/s  (  1.776M/s/cpu)
uretprobe-nop (16 cpus):   26.466 ± 0.018M/s  (  1.654M/s/cpu)
uretprobe-nop (24 cpus):   40.513 ± 0.222M/s  (  1.688M/s/cpu)
uretprobe-nop (32 cpus):   54.180 ± 0.074M/s  (  1.693M/s/cpu)
uretprobe-nop (40 cpus):   66.100 ± 0.082M/s  (  1.652M/s/cpu)
uretprobe-nop (48 cpus):   70.544 ± 0.068M/s  (  1.470M/s/cpu)
uretprobe-nop (56 cpus):   74.494 ± 0.055M/s  (  1.330M/s/cpu)
uretprobe-nop (64 cpus):   79.317 ± 0.029M/s  (  1.239M/s/cpu)
uretprobe-nop (72 cpus):   84.875 ± 0.020M/s  (  1.179M/s/cpu)
uretprobe-nop (80 cpus):   92.318 ± 0.224M/s  (  1.154M/s/cpu)
For reference, with uprobe-nop we hit the following throughput:
uprobe-nop (80 cpus): 143.485 ± 0.035M/s ( 1.794M/s/cpu)
So now uretprobe stays a bit closer to that performance.
Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# 2ff913ab | 06-Dec-2024 | Andrii Nakryiko <[email protected]>
uprobes: Simplify session consumer tracking
In practice, each return_instance will typically contain either zero or one return_consumer, depending on whether it has any uprobe session consumer attached or not. It's highly unlikely that more than one uprobe session consumer will be attached to any given uprobe, so there is no need to optimize for that case. But the way we currently do memory allocation and accounting is by pre-allocating the space for 4 session consumers in a contiguous block of memory next to the fixed part of struct return_instance. This is unnecessarily wasteful.
This patch changes this to keep struct return_instance fixed-sized with one pre-allocated return_consumer, while (in a highly unlikely scenario) allowing for more session consumers in a separate dynamically allocated and reallocated array.
We also simplify accounting a bit by not maintaining a separate temporary capacity for consumers array, and, instead, relying on krealloc() to be a no-op if underlying memory can accommodate a slightly bigger allocation (but again, it's very uncommon scenario to even have to do this reallocation).
All this gets rid of ri_size(), simplifies push_consumer() and removes the confusing ri->consumers_cnt re-assignment, while keeping this singular preallocated-consumer logic contained within a few simple preexisting helpers.
Having fixed-sized struct return_instance simplifies and speeds up return_instance reuse that we ultimately add later in this patch set, see follow up patches.
Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
# c554aa9c | 01-Nov-2024 | Christophe JAILLET <[email protected]>
uprobes: Re-order struct uprobe_task to save some space
On x86_64, with allmodconfig, struct uprobe_task is 72 bytes long, with a hole and some padding.
/* size: 72, cachelines: 2, members: 7 */
/* sum members: 64, holes: 1, sum holes: 4 */
/* padding: 4 */
/* forced alignments: 1, forced holes: 1, sum forced holes: 4 */
/* last cacheline: 8 bytes */
Reorder the structure to fill the hole and avoid the padding.
This way, the whole structure fits in a single cacheline and some memory is saved when it is allocated.
/* size: 64, cachelines: 1, members: 7 */
/* forced alignments: 1 */
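The same effect can be reproduced with a toy structure; below is a small self-contained sketch (a generic example, not the actual uprobe_task layout):

#include <stdio.h>

/* A 4-byte member sandwiched between pointer-sized members creates a hole. */
struct with_hole {
	void *a;
	int   b;	/* 4-byte hole follows, to realign 'c' */
	void *c;
	int   d;	/* 4 bytes of tail padding follow */
};

/* Grouping same-sized members lets the compiler pack them tightly. */
struct reordered {
	void *a;
	void *c;
	int   b;
	int   d;
};

int main(void)
{
	printf("with_hole: %zu bytes\n", sizeof(struct with_hole));	/* 32 on x86_64 */
	printf("reordered: %zu bytes\n", sizeof(struct reordered));	/* 24 on x86_64 */
	return 0;
}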
Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: "Masami Hiramatsu (Google)" <[email protected]> Link: https://lore.kernel.org/r/a9f541d0cedf421f765c77a1fb93d6a979778a88.1730495562.git.christophe.jaillet@wanadoo.fr
|
|
Revision tags: v6.12-rc5 |
|
# dd1a7567 | 24-Oct-2024 | Andrii Nakryiko <[email protected]>
uprobes: SRCU-protect uretprobe lifetime (with timeout)
Avoid taking refcount on uprobe in prepare_uretprobe(), instead take uretprobe-specific SRCU lock and keep it active as kernel transfers control back to user space.
Given we can't rely on user space returning from traced function within reasonable time period, we need to make sure not to keep SRCU lock active for too long, though. To that effect, we employ a timer callback which is meant to terminate SRCU lock region after predefined timeout (currently set to 100ms), and instead transfer underlying struct uprobe's lifetime protection to refcounting.
This fallback to less scalable refcounting after 100ms is a fine tradeoff from uretprobe's scalability and performance perspective, because uretprobing *long running* user functions inherently doesn't run into scalability issues (there is just not enough frequency to cause noticeable issues with either performance or scalability).
The overall trick is in ensuring synchronization between current thread and timer's callback fired on some other thread. To cope with that with minimal logic complications, we add hprobe wrapper which is used to contain all the synchronization related issues behind a small number of basic helpers: hprobe_expire() for "downgrading" uprobe from SRCU-protected state to refcounted state, and a hprobe_consume() and hprobe_finalize() pair of single-use consuming helpers. Other than that, whatever current thread's logic is there stays the same, as timer thread cannot modify return_instance state (or add new/remove old return_instances). It only takes care of SRCU unlock and uprobe refcounting, which is hidden from the higher-level uretprobe handling logic.
We use atomic xchg() in hprobe_consume(), which is called from performance critical handle_uretprobe_chain() function run in the current context. When uncontended, this xchg() doesn't seem to hurt performance as there are no other competing CPUs fighting for the same cache line. We also mark struct return_instance as ____cacheline_aligned to ensure no false sharing can happen.
One more technical detail: we need to make sure that the list of return instances can be safely traversed under RCU from the timer callback, so we delay return_instance freeing with kfree_rcu() and make sure that list modifications use RCU-aware operations.
Also, given SRCU lock survives transition from kernel to user space and back we need to use lower-level __srcu_read_lock() and __srcu_read_unlock() to avoid lockdep complaining.
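A simplified user-space model of that hand-off: a single atomic state word is exchanged by whichever side gets there first, so the SRCU lease is released exactly once. State names and helpers are hypothetical; this is not the kernel's hprobe implementation:

#include <stdatomic.h>
#include <stdio.h>

enum hstate {
	HPROBE_LEASED,		/* protected by the (timed) SRCU lease */
	HPROBE_STABLE,		/* timer downgraded it to a refcount   */
	HPROBE_GONE,		/* consumed / finalized                */
};

struct hprobe {
	atomic_int state;
};

/* Timer path: downgrade from SRCU lease to refcount, unless already consumed. */
static void hprobe_expire(struct hprobe *h)
{
	int expected = HPROBE_LEASED;

	if (atomic_compare_exchange_strong(&h->state, &expected, HPROBE_STABLE))
		printf("expire: took refcount, released SRCU lease\n");
	else
		printf("expire: nothing to do (state %d)\n", expected);
}

/* Uretprobe-hit path: claim the probe exactly once, whatever state it is in. */
static void hprobe_consume(struct hprobe *h)
{
	int old = atomic_exchange(&h->state, HPROBE_GONE);

	if (old == HPROBE_LEASED)
		printf("consume: still under SRCU lease, release it here\n");
	else if (old == HPROBE_STABLE)
		printf("consume: timer already downgraded, drop the refcount\n");
}

int main(void)
{
	struct hprobe h = { .state = HPROBE_LEASED };

	hprobe_expire(&h);	/* timer fired first            */
	hprobe_consume(&h);	/* traced function returns later */
	return 0;
}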
Just to give an impression of a kind of performance improvements this change brings, below are benchmarking results with and without these SRCU changes, assuming other uprobe optimizations (mainly RCU Tasks Trace for entry uprobes, lockless RB-tree lookup, and lockless VMA to uprobe lookup) are left intact:
WITHOUT SRCU for uretprobes
===========================
uretprobe-nop ( 1 cpus):    2.197 ± 0.002M/s  (  2.197M/s/cpu)
uretprobe-nop ( 2 cpus):    3.325 ± 0.001M/s  (  1.662M/s/cpu)
uretprobe-nop ( 3 cpus):    4.129 ± 0.002M/s  (  1.376M/s/cpu)
uretprobe-nop ( 4 cpus):    6.180 ± 0.003M/s  (  1.545M/s/cpu)
uretprobe-nop ( 8 cpus):    7.323 ± 0.005M/s  (  0.915M/s/cpu)
uretprobe-nop (16 cpus):    6.943 ± 0.005M/s  (  0.434M/s/cpu)
uretprobe-nop (32 cpus):    5.931 ± 0.014M/s  (  0.185M/s/cpu)
uretprobe-nop (64 cpus):    5.145 ± 0.003M/s  (  0.080M/s/cpu)
uretprobe-nop (80 cpus):    4.925 ± 0.005M/s  (  0.062M/s/cpu)

WITH SRCU for uretprobes
========================
uretprobe-nop ( 1 cpus):    1.968 ± 0.001M/s  (  1.968M/s/cpu)
uretprobe-nop ( 2 cpus):    3.739 ± 0.003M/s  (  1.869M/s/cpu)
uretprobe-nop ( 3 cpus):    5.616 ± 0.003M/s  (  1.872M/s/cpu)
uretprobe-nop ( 4 cpus):    7.286 ± 0.002M/s  (  1.822M/s/cpu)
uretprobe-nop ( 8 cpus):   13.657 ± 0.007M/s  (  1.707M/s/cpu)
uretprobe-nop (32 cpus):   45.305 ± 0.066M/s  (  1.416M/s/cpu)
uretprobe-nop (64 cpus):   42.390 ± 0.922M/s  (  0.662M/s/cpu)
uretprobe-nop (80 cpus):   47.554 ± 2.411M/s  (  0.594M/s/cpu)
Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.12-rc4 |
|
# 4d756095 | 18-Oct-2024 | Jiri Olsa <[email protected]>
uprobe: Add support for session consumer
This change allows the uprobe consumer to behave as session which means that 'handler' and 'ret_handler' callbacks are connected in a way that allows to:
- control execution of 'ret_handler' from 'handler' callback
- share data between 'handler' and 'ret_handler' callbacks
The session concept fits our common use case, where we do filtering in the entry uprobe and, based on the result, decide whether to run the return uprobe (or not).
It's also convenient to share the data between session callbacks.
To achieve this we are adding a new return value that the uprobe consumer can return from the 'handler' callback:
UPROBE_HANDLER_IGNORE - Ignore 'ret_handler' callback for this consumer.
We also store a cookie and pass it to 'ret_handler' when the consumer has both 'handler' and 'ret_handler' callbacks defined.
We store shared data in the return_consumer object array as part of the return_instance object. This way the handle_uretprobe_chain can find related return_consumer and its shared data.
We also store entry handler return value, for cases when there are multiple consumers on single uprobe and some of them are ignored and some of them not, in which case the return probe gets installed and we need to have a way to find out which consumer needs to be ignored.
The tricky part is when consumer is registered 'after' the uprobe entry handler is hit. In such case this consumer's 'ret_handler' gets executed as well, but it won't have the proper data pointer set, so we can filter it out.
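A compact user-space model of the session semantics described above (hypothetical types and names, not the uprobe consumer API itself):

#include <stdint.h>
#include <stdio.h>

#define HANDLER_IGNORE 1	/* toy stand-in for UPROBE_HANDLER_IGNORE */

struct session_consumer {
	int  (*handler)(uint64_t *cookie);	/* entry handler                  */
	void (*ret_handler)(uint64_t cookie);	/* return handler                 */
	uint64_t cookie;			/* data shared between the two    */
	int  entry_rc;				/* remembered entry return value  */
};

static int filter_entry(uint64_t *cookie)
{
	*cookie = 0xdeadbeef;	/* stash per-hit data for the return side */
	return 0;		/* 0: do run ret_handler */
}

static void report_return(uint64_t cookie)
{
	printf("ret_handler sees cookie 0x%llx\n", (unsigned long long)cookie);
}

static void handle_entry(struct session_consumer *sc)
{
	sc->entry_rc = sc->handler(&sc->cookie);
}

static void handle_return(struct session_consumer *sc)
{
	/* Skip consumers whose entry handler asked to be ignored. */
	if (sc->entry_rc == HANDLER_IGNORE)
		return;
	sc->ret_handler(sc->cookie);
}

int main(void)
{
	struct session_consumer sc = { filter_entry, report_return };

	handle_entry(&sc);	/* uprobe hit    */
	handle_return(&sc);	/* uretprobe hit */
	return 0;
}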
Suggested-by: Oleg Nesterov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# da09a9e0 | 18-Oct-2024 | Jiri Olsa <[email protected]>
uprobe: Add data pointer to consumer handlers
Add a data pointer to both entry and exit consumer handlers and to all their users. The functionality itself comes in the following change.
Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
# ed8d5b0c | 11-Sep-2024 | Oleg Nesterov <[email protected]>
Revert "uprobes: use vm_special_mapping close() functionality"
This reverts commit 08e28de1160a712724268fd33d77b32f1bc84d1c.
A malicious application can munmap() its "[uprobes]" vma and in this case xol_mapping.close == uprobe_clear_state() will free the memory which can be used by another thread, or the same thread when it hits the uprobe bp afterwards.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sven Schnelle <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.11-rc7 |
|
# 08e28de1 | 03-Sep-2024 | Sven Schnelle <[email protected]>
uprobes: use vm_special_mapping close() functionality
The following KASAN splat was shown:
[ 44.505448] ================================================================== 20:37:27 [3421/145075]
[ 44.505455] BUG: KASAN: slab-use-after-free in special_mapping_close+0x9c/0xc8
[ 44.505471] Read of size 8 at addr 00000000868dac48 by task sh/1384
[ 44.505479]
[ 44.505486] CPU: 51 UID: 0 PID: 1384 Comm: sh Not tainted 6.11.0-rc6-next-20240902-dirty #1496
[ 44.505503] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
[ 44.505508] Call Trace:
[ 44.505511] [<000b0324d2f78080>] dump_stack_lvl+0xd0/0x108
[ 44.505521] [<000b0324d2f5435c>] print_address_description.constprop.0+0x34/0x2e0
[ 44.505529] [<000b0324d2f5464c>] print_report+0x44/0x138
[ 44.505536] [<000b0324d1383192>] kasan_report+0xc2/0x140
[ 44.505543] [<000b0324d2f52904>] special_mapping_close+0x9c/0xc8
[ 44.505550] [<000b0324d12c7978>] remove_vma+0x78/0x120
[ 44.505557] [<000b0324d128a2c6>] exit_mmap+0x326/0x750
[ 44.505563] [<000b0324d0ba655a>] __mmput+0x9a/0x370
[ 44.505570] [<000b0324d0bbfbe0>] exit_mm+0x240/0x340
[ 44.505575] [<000b0324d0bc0228>] do_exit+0x548/0xd70
[ 44.505580] [<000b0324d0bc1102>] do_group_exit+0x132/0x390
[ 44.505586] [<000b0324d0bc13b6>] __s390x_sys_exit_group+0x56/0x60
[ 44.505592] [<000b0324d0adcbd6>] do_syscall+0x2f6/0x430
[ 44.505599] [<000b0324d2f78434>] __do_syscall+0xa4/0x170
[ 44.505606] [<000b0324d2f9454c>] system_call+0x74/0x98
[ 44.505614]
[ 44.505616] Allocated by task 1384:
[ 44.505621]  kasan_save_stack+0x40/0x70
[ 44.505630]  kasan_save_track+0x28/0x40
[ 44.505636]  __kasan_kmalloc+0xa0/0xc0
[ 44.505642]  __create_xol_area+0xfa/0x410
[ 44.505648]  get_xol_area+0xb0/0xf0
[ 44.505652]  uprobe_notify_resume+0x27a/0x470
[ 44.505657]  irqentry_exit_to_user_mode+0x15e/0x1d0
[ 44.505664]  pgm_check_handler+0x122/0x170
[ 44.505670]
[ 44.505672] Freed by task 1384:
[ 44.505676]  kasan_save_stack+0x40/0x70
[ 44.505682]  kasan_save_track+0x28/0x40
[ 44.505687]  kasan_save_free_info+0x4a/0x70
[ 44.505693]  __kasan_slab_free+0x5a/0x70
[ 44.505698]  kfree+0xe8/0x3f0
[ 44.505704]  __mmput+0x20/0x370
[ 44.505709]  exit_mm+0x240/0x340
[ 44.505713]  do_exit+0x548/0xd70
[ 44.505718]  do_group_exit+0x132/0x390
[ 44.505722]  __s390x_sys_exit_group+0x56/0x60
[ 44.505727]  do_syscall+0x2f6/0x430
[ 44.505732]  __do_syscall+0xa4/0x170
[ 44.505738]  system_call+0x74/0x98
The problem is that uprobe_clear_state() kfree's struct xol_area, which contains struct vm_special_mapping *xol_mapping. This one is passed to _install_special_mapping() in xol_add_vma(). __mmput() reads:

static inline void __mmput(struct mm_struct *mm)
{
	VM_BUG_ON(atomic_read(&mm->mm_users));

	uprobe_clear_state(mm);
	exit_aio(mm);
	ksm_exit(mm);
	khugepaged_exit(mm); /* must run before exit_mmap */
	exit_mmap(mm);
	...
}

So uprobe_clear_state() at the beginning frees the memory area containing the vm_special_mapping data, but exit_mmap() uses this address later via vma->vm_private_data (which was set in _install_special_mapping()).
Fix this by moving uprobe_clear_state() to uprobes.c and use it as close() callback.
[[email protected]: remove unneeded condition] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 223febc6e557 ("mm: add optional close() to struct vm_special_mapping") Signed-off-by: Sven Schnelle <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
# 04b01625 | 03-Sep-2024 | Peter Zijlstra <[email protected]>
perf/uprobe: split uprobe_unregister()
With uprobe_unregister() having grown a synchronize_srcu(), it becomes fairly slow to call. Esp. since both users of this API call it in a loop.
Peel off the sync_srcu() and do it once, after the loop.
We also need to add uprobe_unregister_sync() into uprobe_register()'s error handling path, as we need to be careful about returning to the caller before we have a guarantee that partially attached consumer won't be called anymore. This is an unlikely slow path and this should be totally fine to be slow in the case of a failed attach.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: "Peter Zijlstra (Intel)" <[email protected]> Co-developed-by: Andrii Nakryiko <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# cc01bd04 | 03-Sep-2024 | Andrii Nakryiko <[email protected]>
uprobes: traverse uprobe's consumer list locklessly under SRCU protection
uprobe->register_rwsem is one of a few big bottlenecks to scalability of uprobes, so we need to get rid of it to improve uprobe performance and multi-CPU scalability.
First, we turn uprobe's consumer list to a typical doubly-linked list and utilize existing RCU-aware helpers for traversing such lists, as well as adding and removing elements from it.
For entry uprobes we already have SRCU protection active since before uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe won't go away from under us, but we add SRCU protection around consumer list traversal.
Lastly, to keep handler_chain()'s UPROBE_HANDLER_REMOVE handling simple, we remember whether any removal was requested during handler calls, but then we double-check the decision under a proper register_rwsem using consumers' filter callbacks. Handler removal is very rare, so this extra lock won't hurt performance, overall, but we also avoid the need for any extra protection (e.g., seqcount locks).
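A rough user-space sketch of that two-phase scheme: a lock-free pass that only records whether removal was requested, followed by a rare locked pass that re-checks the decision via the filter callbacks. Names and types here are illustrative, not the kernel's:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define HANDLER_REMOVE 1	/* toy stand-in for UPROBE_HANDLER_REMOVE */

struct consumer {
	int  (*handler)(void);
	bool (*filter)(void);	/* re-checked under the lock */
	struct consumer *next;
};

static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;

static void handler_chain(struct consumer *head)
{
	bool removal_requested = false;

	/* Lock-free pass (RCU/SRCU-protected in the kernel): run handlers and
	 * just remember whether anyone asked for removal. */
	for (struct consumer *c = head; c; c = c->next)
		if (c->handler() == HANDLER_REMOVE)
			removal_requested = true;

	if (!removal_requested)
		return;

	/* Rare slow path: double-check under the register lock using the
	 * consumers' filter callbacks before actually unregistering anything. */
	pthread_mutex_lock(&register_lock);
	for (struct consumer *c = head; c; c = c->next)
		if (!c->filter())
			printf("would unregister consumer %p\n", (void *)c);
	pthread_mutex_unlock(&register_lock);
}

static int  noisy_handler(void)    { return HANDLER_REMOVE; }
static bool still_interested(void) { return false; }

int main(void)
{
	struct consumer c = { noisy_handler, still_interested, NULL };

	handler_chain(&c);
	return 0;
}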
Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# 59da880a | 03-Sep-2024 | Andrii Nakryiko <[email protected]>
uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks
It serves no purpose beyond adding an unnecessary argument passed to the filter callback. Just get rid of it, no one is actually using it.
Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
# 3c83a9ad | 01-Aug-2024 | Oleg Nesterov <[email protected]>
uprobes: make uprobe_register() return struct uprobe *
This way uprobe_unregister() and uprobe_apply() can use "struct uprobe *" rather than inode + offset. This simplifies the code and allows us to avoid the unnecessary find_uprobe() + put_uprobe() in these functions.
TODO: uprobe_unregister() still needs get_uprobe/put_uprobe to ensure that this uprobe can't be freed before up_write(&uprobe->register_rwsem).
Co-developed-by: Andrii Nakryiko <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# e04332eb | 01-Aug-2024 | Oleg Nesterov <[email protected]>
uprobes: kill uprobe_register_refctr()
It doesn't make any sense to have 2 versions of _register(). Note that trace_uprobe_enable(), the only user of uprobe_register(), doesn't need to check tu->ref_ctr_offset to decide which one should be used, it could safely pass ref_ctr_offset == 0 to uprobe_register_refctr().
Add this argument to uprobe_register(), update the callers, and kill uprobe_register_refctr().
Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
# cfa7f3d2 | 29-Jul-2024 | Andrii Nakryiko <[email protected]>
perf,x86: avoid missing caller address in stack traces captured in uprobe
When tracing user functions with uprobe functionality, it's common to install the probe (e.g., a BPF program) at the first instruction of the function. This is often going to be the `push %rbp` instruction in the function preamble, which means that within that function the frame pointer hasn't been established yet. This leads to consistently missing the actual caller of the traced function, because perf_callchain_user() only records the current IP (capturing the traced function) and then follows the frame pointer chain (which would be the caller's frame, containing the address of the caller's caller).
So when we have target_1 -> target_2 -> target_3 call chain and we are tracing an entry to target_3, captured stack trace will report target_1 -> target_3 call chain, which is wrong and confusing.
This patch proposes a x86-64-specific heuristic to detect `push %rbp` (`push %ebp` on 32-bit architecture) instruction being traced. Given entire kernel implementation of user space stack trace capturing works under assumption that user space code was compiled with frame pointer register (%rbp/%ebp) preservation, it seems pretty reasonable to use this instruction as a strong indicator that this is the entry to the function. In that case, return address is still pointed to by %rsp/%esp, so we fetch it and add to stack trace before proceeding to unwind the rest using frame pointer-based logic.
We also check for `endbr64` (for 64-bit modes) as another common pattern for function entry, as suggested by Josh Poimboeuf. Even if we get this wrong sometimes for uprobes attached not at the function entry, it's OK because stack trace will still be overall meaningful, just with one extra bogus entry. If we don't detect this, we end up with guaranteed to be missing caller function entry in the stack trace, which is worse overall.
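A small stand-alone sketch of that heuristic follows; the helper name is made up, and the kernel's actual code inspects the instruction bytes fetched for the uprobe before deciding to also read the return address from *%rsp:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Treat the probed address as a function entry if it starts with
 * "push %rbp" (0x55) or "endbr64" (f3 0f 1e fa).
 */
static bool looks_like_function_entry(const uint8_t *insn, size_t len)
{
	if (len >= 1 && insn[0] == 0x55)
		return true;					/* push %rbp */
	if (len >= 4 && insn[0] == 0xf3 && insn[1] == 0x0f &&
	    insn[2] == 0x1e && insn[3] == 0xfa)
		return true;					/* endbr64 */
	return false;
}

int main(void)
{
	const uint8_t prologue[] = { 0x55, 0x48, 0x89, 0xe5 };	/* push %rbp; mov %rsp,%rbp */
	const uint8_t ibt[]      = { 0xf3, 0x0f, 0x1e, 0xfa };	/* endbr64 */
	const uint8_t body[]     = { 0x48, 0x89, 0xc7 };	/* mov %rax,%rdi */

	printf("%d %d %d\n",
	       looks_like_function_entry(prologue, sizeof(prologue)),
	       looks_like_function_entry(ibt, sizeof(ibt)),
	       looks_like_function_entry(body, sizeof(body)));	/* 1 1 0 */
	return 0;
}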
Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4 |
|
# ff474a78 | 11-Jun-2024 | Jiri Olsa <[email protected]>
uprobe: Add uretprobe syscall to speed up return probe
Adding uretprobe syscall instead of trap to speed up return probe.
At the moment the uretprobe setup/path is:
- install entry uprobe
- when the uprobe is hit, it overwrites probed function's return address on stack with address of the trampoline that contains breakpoint instruction
- the breakpoint trap code handles the uretprobe consumers execution and jumps back to original return address
This patch replaces the above trampoline's breakpoint instruction with a new uretprobe syscall. This syscall does exactly the same job as the trap, with some extra work:
- syscall trampoline must save original value for rax/r11/rcx registers on stack - rax is set to syscall number and r11/rcx are changed and used by syscall instruction
- the syscall code reads the original values of those registers and restore those values in task's pt_regs area
- only caller from trampoline exposed in '[uprobes]' is allowed, the process will receive SIGILL signal otherwise
Even with some extra work, using the uretprobes syscall shows speed improvement (compared to using standard breakpoint):
On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
current:
uretprobe-nop  : 1.498 ± 0.000M/s
uretprobe-push : 1.448 ± 0.001M/s
uretprobe-ret  : 0.816 ± 0.001M/s

with the fix:
uretprobe-nop  : 1.969 ± 0.002M/s  < 31% speed up
uretprobe-push : 1.910 ± 0.000M/s  < 31% speed up
uretprobe-ret  : 0.934 ± 0.000M/s  < 14% speed up
On Amd (AMD Ryzen 7 5700U)
current:
uretprobe-nop  : 0.778 ± 0.001M/s
uretprobe-push : 0.744 ± 0.001M/s
uretprobe-ret  : 0.540 ± 0.001M/s

with the fix:
uretprobe-nop  : 0.860 ± 0.001M/s  < 10% speed up
uretprobe-push : 0.818 ± 0.001M/s  < 10% speed up
uretprobe-ret  : 0.578 ± 0.000M/s  <  7% speed up
The performance test spawns a thread that runs loop which triggers uprobe with attached bpf program that increments the counter that gets printed in results above.
The uprobe (and uretprobe) kind is determined by which instruction is being patched with breakpoint instruction. That's also important for uretprobes, because uprobe is installed for each uretprobe.
The performance test is part of bpf selftests: tools/testing/selftests/bpf/run_bench_uprobes.sh
Note at the moment uretprobe syscall is supported only for native 64-bit process, compat process still uses standard breakpoint.
Note that when shadow stack is enabled the uretprobe syscall returns via iret, which is slower than return via sysret, but won't cause the shadow stack violation.
Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Andrii Nakryiko <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
Revision tags: v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3 |
|
# 1a59d1b8 | 27-May-2019 | Thomas Gleixner <[email protected]>
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1334 file(s).
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Richard Fontana <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Revision tags: v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7 |
|
# aad42dd4 | 26-Apr-2019 | Nadav Amit <[email protected]>
uprobes: Initialize uprobes earlier
In order to have a separate address space for text poking, we need to duplicate init_mm early during start_kernel(). This, however, introduces a problem since uprobes functions are called from dup_mmap(), but uprobes is still not initialized in this early stage.
Since uprobes initialization is necessary for fork, and since all the dependent initialization has been done when fork is initialized (percpu and vmalloc), move uprobes initialization to fork_init(). It does not seem that uprobes introduces any security problem for the poking_mm.
Crash and burn if uprobes initialization fails, similarly to other early initializations. Change the init_probes() name to probes_init() to match the naming convention of other early initialization functions.
Reported-by: kernel test robot <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1 |
|
# 1cc33161 | 20-Aug-2018 | Ravi Bangoria <[email protected]>
uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc. have these markers embedded in them. These markers are added by the developer at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. If that overhead matters, its execution can be omitted by a runtime if() condition when no one is tracing the marker:

if (reference_counter > 0) {
	Execute marker instructions;
}
Default value of reference counter is 0. Tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing.
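For illustration, this is roughly what the application-side pattern looks like with a hand-rolled reference counter; it is a simplified stand-in for the sys/sdt.h semaphore machinery, not the actual macro expansion:

#include <stdio.h>

unsigned short my_probe_semaphore;	/* patched from 0 to >0 by the tracer */

static long expensive_arguments(void)
{
	return 42;	/* stands in for non-trivial marker-argument computation */
}

static void do_work(int i)
{
	if (my_probe_semaphore > 0) {
		long arg = expensive_arguments();
		/* the real marker expands to a nop plus an argument description */
		printf("probe hit: iteration=%d arg=%ld\n", i, arg);
	}
}

int main(void)
{
	do_work(1);			/* counter is 0: marker body skipped   */
	my_probe_semaphore = 1;		/* simulate a tracer attaching a probe */
	do_work(2);			/* marker arguments computed now       */
	return 0;
}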
Implement the reference counter logic in core uprobe. User will be able to use it from trace_uprobe as well as from kernel module. New trace_uprobe definition with reference counter will now be:
<path>:<offset>[(ref_ctr_offset)]
where ref_ctr_offset is an optional field. For kernel module, new variant of uprobe_register() has been introduced:
uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
No new variant for uprobe_unregister() because it's assumed to have only one reference counter for one uprobe.
[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
Note: the 'reference counter' is called a 'semaphore' in the original Dtrace (or Systemtap, bcc and even ELF) documentation and code. But the term 'semaphore' is misleading in this context. This is just a counter used to hold the number of tracers tracing a marker. It is not really used for any synchronization. So we are calling it a 'reference counter' in kernel / perf code.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Masami Hiramatsu <[email protected]> [Only trace_uprobe.c] Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Song Liu <[email protected]> Tested-by: Song Liu <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Revision tags: v4.18 |
|
# 6d43743e | 09-Aug-2018 | Ravi Bangoria <[email protected]>
Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add an additional argument, 'arch_uprobe', to uprobe_write_opcode(). We need this in a later set of patches.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Song Liu <[email protected]> Acked-by: Srikar Dronamraju <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Revision tags: v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3, v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3, v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8, v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3, v4.9-rc2, v4.9-rc1 |
|
# e8f4aa60 | 13-Oct-2016 | Allen Pais <[email protected]>
sparc64: Support User Probes for sparc
Signed-off-by: Eric Saint Etienne <[email protected]> Signed-off-by: Allen Pais <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v4.8, v4.8-rc8, v4.8-rc7, v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2, v4.8-rc1, v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3, v4.7-rc2, v4.7-rc1, v4.6, v4.6-rc7, v4.6-rc6, v4.6-rc5, v4.6-rc4, v4.6-rc3, v4.6-rc2, v4.6-rc1, v4.5, v4.5-rc7, v4.5-rc6, v4.5-rc5, v4.5-rc4, v4.5-rc3, v4.5-rc2, v4.5-rc1, v4.4, v4.4-rc8, v4.4-rc7, v4.4-rc6, v4.4-rc5, v4.4-rc4, v4.4-rc3, v4.4-rc2 |
|
# 90eec103 | 16-Nov-2015 | Peter Zijlstra <[email protected]>
treewide: Remove old email address
There were still a number of references to my old Red Hat email address in the kernel source. Remove these while keeping the Red Hat copyright notices intact.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v4.4-rc1, v4.3, v4.3-rc7, v4.3-rc6, v4.3-rc5, v4.3-rc4, v4.3-rc3, v4.3-rc2, v4.3-rc1, v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6, v4.2-rc5, v4.2-rc4 |
|
# db087ef6 | 21-Jul-2015 | Oleg Nesterov <[email protected]>
uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever
The previous change documents that cleanup_return_instances() can't always detect the dead frames, since the stack can grow. But there is one special case which is imho worth fixing: arch_uretprobe_is_alive() can return true when the stack didn't actually grow, but the next "call" insn uses the already invalidated frame.
Test-case:
#include <stdio.h>
#include <setjmp.h>

jmp_buf jmp;
int nr = 1024;

void func_2(void)
{
	if (--nr == 0)
		return;
	longjmp(jmp, 1);
}

void func_1(void)
{
	setjmp(jmp);
	func_2();
}

int main(void)
{
	func_1();
	return 0;
}
If you ret-probe func_1() and func_2() prepare_uretprobe() hits the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not reported.
When we know that the new call is not chained, we can do the more strict check. In this case "sp" points to the new ret-addr, so every frame which uses the same "sp" must be dead. The only complication is that arch_uretprobe_is_alive() needs to know was it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum and change prepare_uretprobe() to pass RP_CHECK_CALL only if !chained.
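A compact stand-alone model of that stricter check; the real arch_uretprobe_is_alive() compares regs->sp against the recorded stack value, roughly as sketched here with simplified names:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum rp_check { RP_CHECK_CALL, RP_CHECK_CHAIN_CALL, RP_CHECK_RET };

/*
 * A recorded frame is considered alive as long as the stack has not shrunk
 * past it.  For a non-chained call, "sp" already points at the new return
 * address, so any frame recorded at that very same "sp" must be dead --
 * hence the strict comparison in that case.
 */
static bool frame_is_alive(uintptr_t recorded_sp, uintptr_t sp, enum rp_check ctx)
{
	if (ctx == RP_CHECK_CALL)
		return sp < recorded_sp;
	return sp <= recorded_sp;
}

int main(void)
{
	uintptr_t recorded = 0xffffd000;

	/* Stack slot reused by a new, non-chained call: frame is dead. */
	printf("%d\n", frame_is_alive(recorded, recorded, RP_CHECK_CALL));		/* 0 */
	/* Chained call at the same depth must still be treated as alive. */
	printf("%d\n", frame_is_alive(recorded, recorded, RP_CHECK_CHAIN_CALL));	/* 1 */
	return 0;
}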
Note: arch_uretprobe_is_alive() could also re-read *sp and check if this word is still trampoline_vaddr. This could obviously improve the logic, but I would like to avoid another copy_from_user() especially in the case when we can't avoid the false "alive == T" positives.
Tested-by: Pratyush Anand <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Srikar Dronamraju <[email protected]> Acked-by: Anton Arapov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|
# 86dcb702 | 21-Jul-2015 | Oleg Nesterov <[email protected]>
uprobes: Add the "enum rp_check ctx" arg to arch_uretprobe_is_alive()
arch/x86 doesn't care (so far), but as Pratyush Anand pointed out, other architectures might want to know why arch_uretprobe_is_alive() was called and use different checks depending on the context. Add the new argument to distinguish the 2 callers.
Tested-by: Pratyush Anand <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Srikar Dronamraju <[email protected]> Acked-by: Anton Arapov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
|