|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
7f81f27b |
| 20-Mar-2025 |
Tengda Wu <[email protected]> |
tracing: Fix use-after-free in print_graph_function_flags during tracer switching
Kairui reported a UAF issue in print_graph_function_flags() during ftrace stress testing [1]. This issue can be repr
tracing: Fix use-after-free in print_graph_function_flags during tracer switching
Kairui reported a UAF issue in print_graph_function_flags() during ftrace stress testing [1]. This issue can be reproduced if puting a 'mdelay(10)' after 'mutex_unlock(&trace_types_lock)' in s_start(), and executing the following script:
$ echo function_graph > current_tracer $ cat trace > /dev/null & $ sleep 5 # Ensure the 'cat' reaches the 'mdelay(10)' point $ echo timerlat > current_tracer
The root cause lies in the two calls to print_graph_function_flags within print_trace_line during each s_show():
* One through 'iter->trace->print_line()'; * Another through 'event->funcs->trace()', which is hidden in print_trace_fmt() before print_trace_line returns.
Tracer switching only updates the former, while the latter continues to use the print_line function of the old tracer, which in the script above is print_graph_function_flags.
Moreover, when switching from the 'function_graph' tracer to the 'timerlat' tracer, s_start only calls graph_trace_close of the 'function_graph' tracer to free 'iter->private', but does not set it to NULL. This provides an opportunity for 'event->funcs->trace()' to use an invalid 'iter->private'.
To fix this issue, set 'iter->private' to NULL immediately after freeing it in graph_trace_close(), ensuring that an invalid pointer is not passed to other tracers. Additionally, clean up the unnecessary 'iter->private = NULL' during each 'cat trace' when using wakeup and irqsoff tracers.
[1] https://lore.kernel.org/all/[email protected]/
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Zheng Yejian <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: eecb91b9f98d ("tracing: Fix memleak due to race between current_tracer and trace") Closes: https://lore.kernel.org/all/CAMgjq7BW79KDSCyp+tZHjShSzHsScSiJxn5ffskp-QzVM06fxw@mail.gmail.com/ Reported-by: Kairui Song <[email protected]> Signed-off-by: Tengda Wu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
76fe0337 |
| 27-Feb-2025 |
Sven Schnelle <[email protected]> |
ftrace: Add arguments to function tracer
Wire up the code to print function arguments in the function tracer. This functionality can be enabled/disabled during runtime with options/func-args.
ftrace: Add arguments to function tracer
Wire up the code to print function arguments in the function tracer. This functionality can be enabled/disabled during runtime with options/func-args.
ping-689 [004] b.... 77.170220: dummy_xmit(skb = 0x82904800, dev = 0x882d0000) <-dev_hard_start_xmit
Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Guo Ren <[email protected]> Cc: Donglin Peng <[email protected]> Cc: Zheng Yejian <[email protected]> Link: https://lore.kernel.org/[email protected] Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Co-developed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
66611c04 |
| 22-Jan-2025 |
Steven Rostedt <[email protected]> |
fgraph: Remove calltime and rettime from generic operations
The function graph infrastructure is now generic so that kretprobes, fprobes and BPF can use it. But there is still some leftover logic th
fgraph: Remove calltime and rettime from generic operations
The function graph infrastructure is now generic so that kretprobes, fprobes and BPF can use it. But there is still some leftover logic that only the function graph tracer itself uses. This is the calculation of the calltime and return time of the functions. The calculation of the calltime has been moved into the function graph tracer and those users that need it so that it doesn't cause overhead to the other users. But the return function timestamp was still called.
Instead of just moving the taking of the timestamp into the function graph trace remove the calltime and rettime completely from the ftrace_graph_ret structure. Instead, move it into the function graph return entry event structure and this also moves all the calltime and rettime logic out of the generic fgraph.c code and into the tracing code that uses it.
This has been reported to decrease the overhead by ~27%.
Link: https://lore.kernel.org/all/Z3aSuql3fnXMVMoM@krava/ Link: https://lore.kernel.org/all/173665959558.1629214.16724136597211810729.stgit@devnote2/
Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Reported-by: Jiri Olsa <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13 |
|
| #
a485ea9e |
| 13-Jan-2025 |
Steven Rostedt <[email protected]> |
tracing: Fix irqsoff and wakeup latency tracers when using function graph
The function graph tracer has become generic so that kretprobes and BPF can use it along with function graph tracing itself.
tracing: Fix irqsoff and wakeup latency tracers when using function graph
The function graph tracer has become generic so that kretprobes and BPF can use it along with function graph tracing itself. Some of the infrastructure was specific for function graph tracing such as recording the calltime and return time of the functions. Calling the clock code on a high volume function does add overhead. The calculation of the calltime was removed from the generic code and placed into the function graph tracer itself so that the other users did not incur this overhead as they did not need that timestamp.
The calltime field was still kept in the generic return entry structure and the function graph return entry callback filled it as that structure was passed to other code.
But this broke both irqsoff and wakeup latency tracer as they still depended on the trace structure containing the calltime when the option display-graph is set as it used some of those same functions that the function graph tracer used. But now the calltime was not set and was just zero. This caused the calculation of the function time to be the absolute value of the return timestamp and not the length of the function.
# cd /sys/kernel/tracing # echo 1 > options/display-graph # echo irqsoff > current_tracer
The tracers went from:
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 4) <idle>-0 | d..1. | 0.000 us | irqentry_enter(); 3 us | 4) <idle>-0 | d..2. | | irq_enter_rcu() { 4 us | 4) <idle>-0 | d..2. | 0.431 us | preempt_count_add(); 5 us | 4) <idle>-0 | d.h2. | | tick_irq_enter() { 5 us | 4) <idle>-0 | d.h2. | 0.433 us | tick_check_oneshot_broadcast_this_cpu(); 6 us | 4) <idle>-0 | d.h2. | 2.426 us | ktime_get(); 9 us | 4) <idle>-0 | d.h2. | | tick_nohz_stop_idle() { 10 us | 4) <idle>-0 | d.h2. | 0.398 us | nr_iowait_cpu(); 11 us | 4) <idle>-0 | d.h1. | 1.903 us | } 11 us | 4) <idle>-0 | d.h2. | | tick_do_update_jiffies64() { 12 us | 4) <idle>-0 | d.h2. | | _raw_spin_lock() { 12 us | 4) <idle>-0 | d.h2. | 0.360 us | preempt_count_add(); 13 us | 4) <idle>-0 | d.h3. | 0.354 us | do_raw_spin_lock(); 14 us | 4) <idle>-0 | d.h2. | 2.207 us | } 15 us | 4) <idle>-0 | d.h3. | 0.428 us | calc_global_load(); 16 us | 4) <idle>-0 | d.h3. | | _raw_spin_unlock() { 16 us | 4) <idle>-0 | d.h3. | 0.380 us | do_raw_spin_unlock(); 17 us | 4) <idle>-0 | d.h3. | 0.334 us | preempt_count_sub(); 18 us | 4) <idle>-0 | d.h1. | 1.768 us | } 18 us | 4) <idle>-0 | d.h2. | | update_wall_time() { [..]
To:
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 5) <idle>-0 | d.s2. | 0.000 us | _raw_spin_lock_irqsave(); 0 us | 5) <idle>-0 | d.s3. | 312159583 us | preempt_count_add(); 2 us | 5) <idle>-0 | d.s4. | 312159585 us | do_raw_spin_lock(); 3 us | 5) <idle>-0 | d.s4. | | _raw_spin_unlock() { 3 us | 5) <idle>-0 | d.s4. | 312159586 us | do_raw_spin_unlock(); 4 us | 5) <idle>-0 | d.s4. | 312159587 us | preempt_count_sub(); 4 us | 5) <idle>-0 | d.s2. | 312159587 us | } 5 us | 5) <idle>-0 | d.s3. | | _raw_spin_lock() { 5 us | 5) <idle>-0 | d.s3. | 312159588 us | preempt_count_add(); 6 us | 5) <idle>-0 | d.s4. | 312159589 us | do_raw_spin_lock(); 7 us | 5) <idle>-0 | d.s3. | 312159590 us | } 8 us | 5) <idle>-0 | d.s4. | 312159591 us | calc_wheel_index(); 9 us | 5) <idle>-0 | d.s4. | | enqueue_timer() { 9 us | 5) <idle>-0 | d.s4. | | wake_up_nohz_cpu() { 11 us | 5) <idle>-0 | d.s4. | | native_smp_send_reschedule() { 11 us | 5) <idle>-0 | d.s4. | 312171987 us | default_send_IPI_single_phys(); 12408 us | 5) <idle>-0 | d.s3. | 312171990 us | } 12408 us | 5) <idle>-0 | d.s3. | 312171991 us | } 12409 us | 5) <idle>-0 | d.s3. | 312171991 us | }
Where the calculation of the time for each function was the return time minus zero and not the time of when the function returned.
Have these tracers also save the calltime in the fgraph data section and retrieve it again on the return to get the correct timings again.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Mark Rutland <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: f1f36e22bee9 ("ftrace: Have calltime be saved in the fgraph storage") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5 |
|
| #
2ca8c112 |
| 26-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
fgraph: Pass ftrace_regs to retfunc
Pass ftrace_regs to the fgraph_ops::retfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (includ
fgraph: Pass ftrace_regs to retfunc
Pass ftrace_regs to the fgraph_ops::retfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (including return address) via this ftrace_regs.
Cc: Alexei Starovoitov <[email protected]> Cc: Florent Revest <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: bpf <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alan Maguire <[email protected]> Cc: Mark Rutland <[email protected]> Link: https://lore.kernel.org/173518992972.391279.14055405490327765506.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
41705c42 |
| 26-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
fgraph: Pass ftrace_regs to entryfunc
Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (in
fgraph: Pass ftrace_regs to entryfunc
Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (including return address) via this ftrace_regs.
Note that the ftrace_regs can be NULL when the arch does NOT define: HAVE_DYNAMIC_FTRACE_WITH_ARGS or HAVE_DYNAMIC_FTRACE_WITH_REGS. More specifically, if HAVE_DYNAMIC_FTRACE_WITH_REGS is defined but not the HAVE_DYNAMIC_FTRACE_WITH_ARGS, and the ftrace ops used to register the function callback does not set FTRACE_OPS_FL_SAVE_REGS. In this case, ftrace_regs can be NULL in user callback.
Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Florent Revest <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: bpf <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alan Maguire <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Naveen N Rao <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/173518990044.391279.17406984900626078579.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
0a6c61bc |
| 10-Oct-2024 |
Masami Hiramatsu (Google) <[email protected]> |
fgraph: Simplify return address printing in function graph tracer
Simplify return address printing in the function graph tracer by removing fgraph_extras. Since this feature is only used by the func
fgraph: Simplify return address printing in function graph tracer
Simplify return address printing in the function graph tracer by removing fgraph_extras. Since this feature is only used by the function graph tracer and the feature flags can directly accessible from the function graph tracer, fgraph_extras can be removed from the fgraph callback.
Cc: Donglin Peng <[email protected]> Link: https://lore.kernel.org/172857234900.270774.15378354017601069781.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
21e92806 |
| 15-Sep-2024 |
Donglin Peng <[email protected]> |
function_graph: Support recording and printing the function return address
When using function_graph tracer to analyze the flow of kernel function execution, it is often necessary to quickly locate
function_graph: Support recording and printing the function return address
When using function_graph tracer to analyze the flow of kernel function execution, it is often necessary to quickly locate the exact line of code where the call occurs. While this may be easy at times, it can be more time-consuming when some functions are inlined or the flow is too long.
This feature aims to simplify the process by recording the return address of traced funcions and printing it when outputing trace logs.
To enhance human readability, the prefix 'ret=' is used for the kernel return value, while '<-' serves as the prefix for the return address in trace logs to make it look more like the function tracer.
A new trace option named 'funcgraph-retaddr' has been introduced, and the existing option 'sym-addr' can be used to control the format of the return address.
See below logs with both funcgraph-retval and funcgraph-retaddr enabled.
0) | load_elf_binary() { /* <-bprm_execve+0x249/0x600 */ 0) | load_elf_phdrs() { /* <-load_elf_binary+0x84/0x1730 */ 0) | __kmalloc_noprof() { /* <-load_elf_phdrs+0x4a/0xb0 */ 0) 3.657 us | __cond_resched(); /* <-__kmalloc_noprof+0x28c/0x390 ret=0x0 */ 0) + 24.335 us | } /* __kmalloc_noprof ret=0xffff8882007f3000 */ 0) | kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */ 0) | rw_verify_area() { /* <-kernel_read+0x2b/0x50 */ 0) | security_file_permission() { /* <-kernel_read+0x2b/0x50 */ 0) | selinux_file_permission() { /* <-security_file_permission+0x26/0x40 */ 0) | __inode_security_revalidate() { /* <-selinux_file_permission+0x6d/0x140 */ 0) 2.034 us | __cond_resched(); /* <-__inode_security_revalidate+0x5f/0x80 ret=0x0 */ 0) 6.602 us | } /* __inode_security_revalidate ret=0x0 */ 0) 2.214 us | avc_policy_seqno(); /* <-selinux_file_permission+0x107/0x140 ret=0x0 */ 0) + 16.670 us | } /* selinux_file_permission ret=0x0 */ 0) + 20.809 us | } /* security_file_permission ret=0x0 */ 0) + 25.217 us | } /* rw_verify_area ret=0x0 */ 0) | __kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */ 0) | ext4_file_read_iter() { /* <-__kernel_read+0x160/0x2e0 */
Then, we can use the faddr2line to locate the source code, for example:
$ ./scripts/faddr2line ./vmlinux load_elf_phdrs+0x6c/0xb0 load_elf_phdrs+0x6c/0xb0: elf_read at fs/binfmt_elf.c:471 (inlined by) load_elf_phdrs at fs/binfmt_elf.c:531
Link: https://lore.kernel.org/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Donglin Peng <[email protected]> [ Rebased to handle text_delta offsets ] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
12117f33 |
| 03-Jun-2024 |
Steven Rostedt (VMware) <[email protected]> |
function_graph: Move set_graph_function tests to shadow stack global var
The use of the task->trace_recursion for the logic used for the set_graph_function was a bit of an abuse of that variable. No
function_graph: Move set_graph_function tests to shadow stack global var
The use of the task->trace_recursion for the logic used for the set_graph_function was a bit of an abuse of that variable. Now that there exists global vars that are per stack for registered graph traces, use that instead.
Link: https://lore.kernel.org/linux-trace-kernel/171509105520.162236.10339831553995971290.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Florent Revest <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: bpf <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Alan Maguire <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Guo Ren <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
37238abe |
| 03-Jun-2024 |
Steven Rostedt (VMware) <[email protected]> |
ftrace/function_graph: Pass fgraph_ops to function graph callbacks
Pass the fgraph_ops structure to the function graph callbacks. This will allow callbacks to add a descriptor to a fgraph_ops privat
ftrace/function_graph: Pass fgraph_ops to function graph callbacks
Pass the fgraph_ops structure to the function graph callbacks. This will allow callbacks to add a descriptor to a fgraph_ops private field that wil be added in the future and use it for the callbacks. This will be useful when more than one callback can be registered to the function graph tracer.
Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509098588.162236.4787930115997357578.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Florent Revest <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: bpf <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Alan Maguire <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Guo Ren <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7 |
|
| #
eecb91b9 |
| 17-Aug-2023 |
Zheng Yejian <[email protected]> |
tracing: Fix memleak due to race between current_tracer and trace
Kmemleak report a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128): comm "cat", pid 204981, jiff
tracing: Fix memleak due to race between current_tracer and trace
Kmemleak report a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128): comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s) hex dump (first 32 bytes): e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}.......... f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e... backtrace: [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0 [<000000007df90faa>] graph_trace_open+0xb0/0x344 [<00000000737524cd>] __tracing_open+0x450/0xb10 [<0000000098043327>] tracing_open+0x1a0/0x2a0 [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0 [<000000004015bcd6>] vfs_open+0x98/0xd0 [<000000002b5f60c9>] do_open+0x520/0x8d0 [<00000000376c7820>] path_openat+0x1c0/0x3e0 [<00000000336a54b5>] do_filp_open+0x14c/0x324 [<000000002802df13>] do_sys_openat2+0x2c4/0x530 [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4 [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394 [<00000000313647bf>] do_el0_svc+0xac/0xec [<000000002ef1c651>] el0_svc+0x20/0x30 [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4 [<000000000c309c35>] el0_sync+0x160/0x180
The root cause is descripted as follows:
__tracing_open() { // 1. File 'trace' is being opened; ... *iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is // currently set; ... iter->trace->open(iter); // 3. Call graph_trace_open() here, // and memory are allocated in it; ... }
s_start() { // 4. The opened file is being read; ... *iter->trace = *tr->current_trace; // 5. If tracer is switched to // 'nop' or others, then memory // in step 3 are leaked!!! ... }
To fix it, in s_start(), close tracer before switching then reopen the new tracer after switching. And some tracers like 'wakeup' may not update 'iter->private' in some cases when reopen, then it should be cleared to avoid being mistakenly closed again.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants") Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6 |
|
| #
36590c50 |
| 25-Jan-2021 |
Sebastian Andrzej Siewior <[email protected]> |
tracing: Merge irqflags + preempt counter.
The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actua
tracing: Merge irqflags + preempt counter.
The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed.
The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases).
The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state.
Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording.
This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations.
As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update()
In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times.
A build with all tracers enabled on x86-64 with and without the patch:
text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new
text shrank by 379 bytes, data remained constant.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v5.11-rc5 |
|
| #
da7f84cd |
| 19-Jan-2021 |
Viktor Rosendahl <[email protected]> |
tracing: Use pause-on-trace with the latency tracers
Eaerlier, tracing was disabled when reading the trace file. This behavior was changed with:
commit 06e0a548bad0 ("tracing: Do not disable tracin
tracing: Use pause-on-trace with the latency tracers
Eaerlier, tracing was disabled when reading the trace file. This behavior was changed with:
commit 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file").
This doesn't seem to work with the latency tracers.
The above mentioned commit dit not only change the behavior but also added an option to emulate the old behavior. The idea with this patch is to enable this pause-on-trace option when the latency tracers are used.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected] Fixes: 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file") Signed-off-by: Viktor Rosendahl <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2 |
|
| #
d19ad077 |
| 28-Oct-2020 |
Steven Rostedt (VMware) <[email protected]> |
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
In preparation to have arguments of a function passed to callbacks attached to functions as default, change the default cal
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
In preparation to have arguments of a function passed to callbacks attached to functions as default, change the default callback prototype to receive a struct ftrace_regs as the forth parameter instead of a pt_regs.
For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they will now need to get the pt_regs via a ftrace_get_regs() helper call. If this is called by a callback that their ftrace_ops did not have a FL_SAVE_REGS flag set, it that helper function will return NULL.
This will allow the ftrace_regs to hold enough just to get the parameters and stack pointer, but without the worry that callbacks may have a pt_regs that is not completely filled.
Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6 |
|
| #
1c5eb448 |
| 09-Jan-2020 |
Steven Rostedt (VMware) <[email protected]> |
tracing: Rename trace_buffer to array_buffer
As we are working to remove the generic "ring_buffer" name that is used by both tracing and perf, the ring_buffer name for tracing will be renamed to tra
tracing: Rename trace_buffer to array_buffer
As we are working to remove the generic "ring_buffer" name that is used by both tracing and perf, the ring_buffer name for tracing will be renamed to trace_buffer, and perf's ring buffer will be renamed to perf_buffer.
As there already exists a trace_buffer that is used by the trace_arrays, it needs to be first renamed to array_buffer.
Link: https://lore.kernel.org/r/20191213153553.GE20583@krava
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7 |
|
| #
eeeb080b |
| 12-Feb-2019 |
Masami Hiramatsu <[email protected]> |
kprobes: Prohibit probing on hardirq tracers
Since kprobes breakpoint handling involves hardirq tracer, probing these functions cause breakpoint recursion problem.
Prohibit probing on those functio
kprobes: Prohibit probing on hardirq tracers
Since kprobes breakpoint handling involves hardirq tracer, probing these functions cause breakpoint recursion problem.
Prohibit probing on those functions.
Signed-off-by: Masami Hiramatsu <[email protected]> Acked-by: Steven Rostedt (VMware) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrea Righi <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/154998802073.31052.17255044712514564153.stgit@devbox Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1 |
|
| #
9acd8de6 |
| 01-Jan-2019 |
Changbin Du <[email protected]> |
function_graph: Support displaying relative timestamp
When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This
function_graph: Support displaying relative timestamp
When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff).
Instead of:
# tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64
Show:
# tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Changbin Du <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3 |
|
| #
688f7089 |
| 15-Nov-2018 |
Steven Rostedt (VMware) <[email protected]> |
fgraph: Add new fgraph_ops structure to enable function graph hooks
Currently the registering of function graph is to pass in a entry and return function. We need to have a way to associate those fu
fgraph: Add new fgraph_ops structure to enable function graph hooks
Currently the registering of function graph is to pass in a entry and return function. We need to have a way to associate those functions together where the entry can determine to run the return hook. Having a structure that contains both functions will facilitate the process of converting the code to be able to do such.
This is similar to the way function hooks are enabled (it passes in ftrace_ops). Instead of passing in the functions to use, a single structure is passed in to the registering function.
The unregister function is now passed in the fgraph_ops handle. When we allow more than one callback to the function graph hooks, this will let the system know which one to remove.
Reviewed-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
| #
5cf99a0f |
| 29-Nov-2018 |
Steven Rostedt (VMware) <[email protected]> |
tracing/fgraph: Fix set_graph_function from showing interrupts
The tracefs file set_graph_function is used to only function graph functions that are listed in that file (or all functions if the file
tracing/fgraph: Fix set_graph_function from showing interrupts
The tracefs file set_graph_function is used to only function graph functions that are listed in that file (or all functions if the file is empty). The way this is implemented is that the function graph tracer looks at every function, and if the current depth is zero and the function matches something in the file then it will trace that function. When other functions are called, the depth will be greater than zero (because the original function will be at depth zero), and all functions will be traced where the depth is greater than zero.
The issue is that when a function is first entered, and the handler that checks this logic is called, the depth is set to zero. If an interrupt comes in and a function in the interrupt handler is traced, its depth will be greater than zero and it will automatically be traced, even if the original function was not. But because the logic only looks at depth it may trace interrupts when it should not be.
The recent design change of the function graph tracer to fix other bugs caused the depth to be zero while the function graph callback handler is being called for a longer time, widening the race of this happening. This bug was actually there for a longer time, but because the race window was so small it seldom happened. The Fixes tag below is for the commit that widen the race window, because that commit belongs to a series that will also help fix the original bug.
Cc: [email protected] Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack") Reported-by: Joe Lawrence <[email protected]> Tested-by: Joe Lawrence <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1 |
|
| #
bcea3f96 |
| 16-Aug-2018 |
Steven Rostedt (VMware) <[email protected]> |
tracing: Add SPDX License format tags to tracing files
Add the SPDX License header to ease license compliance management.
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Revision tags: v4.18 |
|
| #
3f1756dc |
| 09-Aug-2018 |
Steven Rostedt (VMware) <[email protected]> |
tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
Joel Fernandes created a nice patch that cleaned up the duplicate hooks used by lockdep and irqsoff late
tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
Joel Fernandes created a nice patch that cleaned up the duplicate hooks used by lockdep and irqsoff latency tracer. It made both use tracepoints. But the latency tracer is triggering warnings when using tracepoints to call into the latency tracer's routines. Mainly, they can be called from NMI context. If that happens, then the SRCU may not work properly because on some architectures, SRCU is not safe to be called in both NMI and non-NMI context.
This is a partial revert of the clean up patch c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage") that adds back the direct calls into the latency tracer. It also only calls the trace events when not in NMI.
Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Joel Fernandes (Google) <[email protected]> Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
| #
f27107fa |
| 07-Aug-2018 |
Steven Rostedt (VMware) <[email protected]> |
tracing/irqsoff: Handle preempt_count for different configs
I was hitting the following warning:
WARNING: CPU: 0 PID: 1 at kernel/trace/trace_irqsoff.c:631 tracer_hardirqs_off+0x15/0x2a
Modules li
tracing/irqsoff: Handle preempt_count for different configs
I was hitting the following warning:
WARNING: CPU: 0 PID: 1 at kernel/trace/trace_irqsoff.c:631 tracer_hardirqs_off+0x15/0x2a
Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6-test+ #13 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tracer_hardirqs_off+0x15/0x2a Code: ff 85 c0 74 0e 8b 45 00 8b 50 04 8b 45 04 e8 35 ff ff ff 5d c3 55 64 a1 cc 37 51 c1 a9 ff ff ff 7f 89 e5 53 89 d3 89 ca 75 02 <0f> 0b e8 90 fc ff ff 85 c0 74 07 89 d8 e8 0c ff ff ff 5b 5d c3 55 EAX: 80000000 EBX: c04337f0 ECX: c04338e3 EDX: c04338e3 ESI: c04337f0 EDI: c04338e3 EBP: f2aa1d68 ESP: f2aa1d64 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210046 CR0: 80050033 CR2: 00000000 CR3: 01668000 CR4: 001406f0 Call Trace: trace_irq_disable_rcuidle+0x63/0x6c trace_hardirqs_off+0x26/0x30 default_send_IPI_mask_allbutself_logical+0x31/0x93 default_send_IPI_allbutself+0x37/0x48 native_send_call_func_ipi+0x4d/0x6a smp_call_function_many+0x165/0x19d ? add_nops+0x34/0x34 ? trace_hardirqs_on_caller+0x2d/0x2d ? add_nops+0x34/0x34 smp_call_function+0x1f/0x23 on_each_cpu+0x15/0x43 ? trace_hardirqs_on_caller+0x2d/0x2d ? trace_hardirqs_on_caller+0x2d/0x2d ? trace_irq_disable_rcuidle+0x1/0x6c text_poke_bp+0xa0/0xc2 ? trace_hardirqs_on_caller+0x2d/0x2d arch_jump_label_transform+0xa7/0xcb ? trace_irq_disable_rcuidle+0x5/0x6c __jump_label_update+0x3e/0x6d jump_label_update+0x7d/0x81 static_key_slow_inc_cpuslocked+0x58/0x6d static_key_slow_inc+0x19/0x20 tracepoint_probe_register_prio+0x19e/0x1d1 ? start_critical_timings+0x1c/0x1c tracepoint_probe_register+0xf/0x11 irqsoff_tracer_init+0x21/0xf2 tracer_init+0x16/0x1a trace_selftest_startup_irqsoff+0x25/0xc4 run_tracer_selftest+0xca/0x131 register_tracer+0xd5/0x172 ? trace_event_define_fields_preemptirq_template+0x45/0x45 init_irqsoff_tracer+0xd/0x11 do_one_initcall+0xab/0x1e8 ? rcu_read_lock_sched_held+0x3d/0x44 ? trace_initcall_level+0x52/0x86 kernel_init_freeable+0x195/0x21a ? rest_init+0xb4/0xb4 kernel_init+0xd/0xe4 ret_from_fork+0x2e/0x38
It is due to running a CONFIG_PREEMPT_VOLUNTARY kernel, which would trigger this warning every time:
WARN_ON_ONCE(preempt_count());
Because on CONFIG_PREEMPT_VOLUNTARY, preempt_count() is always zero.
This warning is to make sure preempt_count is set because tracer_hardirqs_on() does a preempt_enable_notrace() to make the preempt_trace() work properly, as being called by a trace event, the trace event code disables preemption, and the tracer wants to know what the preemption was before it was called.
Instead of enabling preemption like this, just record the preempt_count, subtract PREEMPT_DISABLE_OFFSET from it (which is zero with !CONFIG_PREEMPT set), and pass that value to the necessary functions, which should use the passed in parameter instead of calling preempt_count() directly.
Fixes: da5b3ebb45277 ("tracing: irqsoff: Account for additional preempt_disable") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
| #
da5b3ebb |
| 06-Aug-2018 |
Joel Fernandes (Google) <[email protected]> |
tracing: irqsoff: Account for additional preempt_disable
Recently we tried to make the preemptirqsoff tracer to use irqsoff tracepoint probes. However this causes issues as reported by Masami:
[2.2
tracing: irqsoff: Account for additional preempt_disable
Recently we tried to make the preemptirqsoff tracer to use irqsoff tracepoint probes. However this causes issues as reported by Masami:
[2.271078] Testing tracer preemptirqsoff: .. no entries found ..FAILED! [2.381015] WARNING: CPU: 0 PID: 1 at /home/mhiramat/ksrc/linux/kernel/ trace/trace.c:1512 run_tracer_selftest+0xf3/0x154
This is due to the tracepoint code increasing the preempt nesting count by calling an additional preempt_disable before calling into the preemptoff tracer which messes up the preempt_count() check in tracer_hardirqs_off.
To fix this, make the irqsoff tracer probes balance the additional outer preempt_disable with a preempt_enable_notrace.
The other way to fix this is to just use SRCU for all tracepoints. However we can't do that because we can't use NMIs from RCU context.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage") Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU") Reported-by: Masami Hiramatsu <[email protected]> Tested-by: Masami Hiramatsu <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v4.18-rc8 |
|
| #
c3bc8fd6 |
| 30-Jul-2018 |
Joel Fernandes (Google) <[email protected]> |
tracing: Centralize preemptirq tracepoints and unify their usage
This patch detaches the preemptirq tracepoints from the tracers and keeps it separate.
Advantages: * Lockdep and irqsoff event can n
tracing: Centralize preemptirq tracepoints and unify their usage
This patch detaches the preemptirq tracepoints from the tracers and keeps it separate.
Advantages: * Lockdep and irqsoff event can now run in parallel since they no longer have their own calls.
* This unifies the usecase of adding hooks to an irqsoff and irqson event, and a preemptoff and preempton event. 3 users of the events exist: - Lockdep - irqsoff and preemptoff tracers - irqs and preempt trace events
The unification cleans up several ifdefs and makes the code in preempt tracer and irqsoff tracers simpler. It gets rid of all the horrific ifdeferry around PROVE_LOCKING and makes configuration of the different users of the tracepoints more easy and understandable. It also gets rid of the time_* function calls from the lockdep hooks used to call into the preemptirq tracer which is not needed anymore. The negative delta in lines of code in this patch is quite large too.
In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS as a single point for registering probes onto the tracepoints. With this, the web of config options for preempt/irq toggle tracepoints and its users becomes:
PREEMPT_TRACER PREEMPTIRQ_EVENTS IRQSOFF_TRACER PROVE_LOCKING | | \ | | \ (selects) / \ \ (selects) / TRACE_PREEMPT_TOGGLE ----> TRACE_IRQFLAGS \ / \ (depends on) / PREEMPTIRQ_TRACEPOINTS
Other than the performance tests mentioned in the previous patch, I also ran the locking API test suite. I verified that all tests cases are passing.
I also injected issues by not registering lockdep probes onto the tracepoints and I see failures to confirm that the probes are indeed working.
This series + lockdep probes not registered (just to inject errors): [ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok | [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok | [ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok | [ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok | [ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok | [ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok | [ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok | [ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok | [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok | [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
With this series + lockdep probes registered, all locking tests pass:
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok | [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok | [ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok | [ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok | [ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok | [ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok | [ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok | [ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok | [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok | [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
Link: http://lkml.kernel.org/r/[email protected]
Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|
|
Revision tags: v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3 |
|
| #
2b27ece6 |
| 28-Jun-2018 |
Joel Fernandes (Google) <[email protected]> |
tracing/irqsoff: Split reset into separate functions
Split reset functions into seperate functions in preparation of future patches that need to do tracer specific reset.
Link: http://lkml.kernel.o
tracing/irqsoff: Split reset into separate functions
Split reset functions into seperate functions in preparation of future patches that need to do tracer specific reset.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
show more ...
|