|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5 |
|
| #
0a8f11f8 |
| 02-May-2025 |
Steven Rostedt <[email protected]> |
tracing: Do not take trace_event_sem in print_event_fields()
On some paths in print_event_fields() it takes the trace_event_sem for read, even though it should always be held when the function is ca
tracing: Do not take trace_event_sem in print_event_fields()
On some paths in print_event_fields() it takes the trace_event_sem for read, even though it should always be held when the function is called.
Remove the taking of that mutex and add a lockdep_assert_held_read() to make sure the trace_event_sem is held when print_event_fields() is called.
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 80a76994b2d88 ("tracing: Add "fields" option to show raw trace event fields") Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
35a380dd |
| 24-Mar-2025 |
Masami Hiramatsu (Google) <[email protected]> |
tracing: Show last module text symbols in the stacktrace
Since the previous boot trace buffer can include module text address in the stacktrace. As same as the kernel text address, convert the modul
tracing: Show last module text symbols in the stacktrace
Since the previous boot trace buffer can include module text address in the stacktrace. As same as the kernel text address, convert the module text address using the module address information.
Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/174282689201.356346.17647540360450727687.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14 |
|
| #
391dda1b |
| 23-Mar-2025 |
Sasha Levin <[email protected]> |
tracing: Use hashtable.h for event_hash
Convert the event_hash array in trace_output.c to use the generic hashtable implementation from hashtable.h instead of the manually implemented hash table.
T
tracing: Use hashtable.h for event_hash
Convert the event_hash array in trace_output.c to use the generic hashtable implementation from hashtable.h instead of the manually implemented hash table.
This simplifies the code and makes it more maintainable by using the standard hashtable API defined in hashtable.h.
Rename EVENT_HASHSIZE to EVENT_HASH_BITS to properly reflect its new meaning as the number of bits for the hashtable size.
Link: https://lore.kernel.org/[email protected] Link: https://lore.kernel.org/[email protected] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
76fe0337 |
| 27-Feb-2025 |
Sven Schnelle <[email protected]> |
ftrace: Add arguments to function tracer
Wire up the code to print function arguments in the function tracer. This functionality can be enabled/disabled during runtime with options/func-args.
ftrace: Add arguments to function tracer
Wire up the code to print function arguments in the function tracer. This functionality can be enabled/disabled during runtime with options/func-args.
ping-689 [004] b.... 77.170220: dummy_xmit(skb = 0x82904800, dev = 0x882d0000) <-dev_hard_start_xmit
Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Guo Ren <[email protected]> Cc: Donglin Peng <[email protected]> Cc: Zheng Yejian <[email protected]> Link: https://lore.kernel.org/[email protected] Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Co-developed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
533c20b0 |
| 27-Feb-2025 |
Sven Schnelle <[email protected]> |
ftrace: Add print_function_args()
Add a function to decode argument types with the help of BTF. Will be used to display arguments in the function and function graph tracer.
It can only handle simpl
ftrace: Add print_function_args()
Add a function to decode argument types with the help of BTF. Will be used to display arguments in the function and function graph tracer.
It can only handle simply arguments and up to FTRACE_REGS_MAX_ARGS number of arguments. When it hits a max, it will print ", ...":
page_to_skb(vi=0xffff8d53842dc980, rq=0xffff8d53843a0800, page=0xfffffc2e04337c00, offset=6160, len=64, truesize=1536, ...)
And if it hits an argument that is not recognized, it will print the raw value and the type of argument it is:
make_vfsuid(idmap=0xffffffff87f99db8, fs_userns=0xffffffff87e543c0, kuid=0x0 (STRUCT)) __pti_set_user_pgtbl(pgdp=0xffff8d5384ab47f8, pgd=0x110e74067 (STRUCT))
Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Guo Ren <[email protected]> Cc: Donglin Peng <[email protected]> Cc: Zheng Yejian <[email protected]> Link: https://lore.kernel.org/[email protected] Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Co-developed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
afd2627f |
| 17-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Check "%s" dereference via the field and not the TP_printk format
The TP_printk() portion of a trace event is executed at the time a event is read from the trace. This can happen seconds, m
tracing: Check "%s" dereference via the field and not the TP_printk format
The TP_printk() portion of a trace event is executed at the time a event is read from the trace. This can happen seconds, minutes, hours, days, months, years possibly later since the event was recorded. If the print format contains a dereference to a string via "%s", and that string was allocated, there's a chance that string could be freed before it is read by the trace file.
To protect against such bugs, there are two functions that verify the event. The first one is test_event_printk(), which is called when the event is created. It reads the TP_printk() format as well as its arguments to make sure nothing may be dereferencing a pointer that was not copied into the ring buffer along with the event. If it is, it will trigger a WARN_ON().
For strings that use "%s", it is not so easy. The string may not reside in the ring buffer but may still be valid. Strings that are static and part of the kernel proper which will not be freed for the life of the running system, are safe to dereference. But to know if it is a pointer to a static string or to something on the heap can not be determined until the event is triggered.
This brings us to the second function that tests for the bad dereferencing of strings, trace_check_vprintf(). It would walk through the printf format looking for "%s", and when it finds it, it would validate that the pointer is safe to read. If not, it would produces a WARN_ON() as well and write into the ring buffer "[UNSAFE-MEMORY]".
The problem with this is how it used va_list to have vsnprintf() handle all the cases that it didn't need to check. Instead of re-implementing vsnprintf(), it would make a copy of the format up to the %s part, and call vsnprintf() with the current va_list ap variable, where the ap would then be ready to point at the string in question.
For architectures that passed va_list by reference this was possible. For architectures that passed it by copy it was not. A test_can_verify() function was used to differentiate between the two, and if it wasn't possible, it would disable it.
Even for architectures where this was feasible, it was a stretch to rely on such a method that is undocumented, and could cause issues later on with new optimizations of the compiler.
Instead, the first function test_event_printk() was updated to look at "%s" as well. If the "%s" argument is a pointer outside the event in the ring buffer, it would find the field type of the event that is the problem and mark the structure with a new flag called "needs_test". The event itself will be marked by TRACE_EVENT_FL_TEST_STR to let it be known that this event has a field that needs to be verified before the event can be printed using the printf format.
When the event fields are created from the field type structure, the fields would copy the field type's "needs_test" value.
Finally, before being printed, a new function ignore_event() is called which will check if the event has the TEST_STR flag set (if not, it returns false). If the flag is set, it then iterates through the events fields looking for the ones that have the "needs_test" flag set.
Then it uses the offset field from the field structure to find the pointer in the ring buffer event. It runs the tests to make sure that pointer is safe to print and if not, it triggers the WARN_ON() and also adds to the trace output that the event in question has an unsafe memory access.
The ignore_event() makes the trace_check_vprintf() obsolete so it is removed.
Link: https://lore.kernel.org/all/CAHk-=wh3uOnqnZPpR0PeLZZtyWbZLboZ7cHLCKRWsocvs9Y7hQ@mail.gmail.com/
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
0172afef |
| 22-Nov-2024 |
Thomas Gleixner <[email protected]> |
tracing: Record task flag NEED_RESCHED_LAZY.
The scheduler added NEED_RESCHED_LAZY scheduling. Record this state as part of trace flags and expose it in the need_resched field.
Record and expose NE
tracing: Record task flag NEED_RESCHED_LAZY.
The scheduler added NEED_RESCHED_LAZY scheduling. Record this state as part of trace flags and expose it in the need_resched field.
Record and expose NEED_RESCHED_LAZY.
[bigeasy: Commit description, documentation bits.]
Cc: Peter Zijlstra <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Reviewed-by: Ankur Arora <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
6ce5a6f0 |
| 21-Oct-2024 |
Tatsuya S <[email protected]> |
tracing: Fix function name for trampoline
The issue that unrelated function name is shown on stack trace like following even though it should be trampoline code address is caused by the creation of
tracing: Fix function name for trampoline
The issue that unrelated function name is shown on stack trace like following even though it should be trampoline code address is caused by the creation of trampoline code in the area where .init.text section of module was freed after module is loaded.
bash-1344 [002] ..... 43.644608: <stack trace> => (MODULE INIT FUNCTION) => vfs_write => ksys_write => do_syscall_64 => entry_SYSCALL_64_after_hwframe
To resolve this, when function address of stack trace entry is in trampoline, output without looking up symbol name.
Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Tatsuya S <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
e9f0a363 |
| 22-Oct-2024 |
Sebastian Andrzej Siewior <[email protected]> |
tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT
It was possible to enable tracing with no IRQ tracing support. The tracing infrastructure would then record TRACE_FLAG_IRQS_NOSUPPORT as the only tracing fl
tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT
It was possible to enable tracing with no IRQ tracing support. The tracing infrastructure would then record TRACE_FLAG_IRQS_NOSUPPORT as the only tracing flag and show an 'X' in the output.
The last user of this feature was PPC32 which managed to implement it during PowerPC merge in 2009. Since then, it was unused and the PPC32 dependency was finally removed in commit 0ea5ee035133a ("tracing: Remove PPC32 wart from config TRACING_SUPPORT"). Since the PowerPC merge the code behind !CONFIG_TRACE_IRQFLAGS_SUPPORT with TRACING enabled can no longer be selected used and the 'X' is not displayed or recorded.
Remove the CONFIG_TRACE_IRQFLAGS_SUPPORT from the tracing code. Remove TRACE_FLAG_IRQS_NOSUPPORT.
Cc: Peter Zijlstra <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
9b7bdf6f |
| 23-Aug-2024 |
Steven Rostedt <[email protected]> |
tracing: Have trace_printk not use binary prints if boot buffer
If the persistent boot mapped ring buffer is used for trace_printk(), force it to not use the binary versions. trace_printk() by defau
tracing: Have trace_printk not use binary prints if boot buffer
If the persistent boot mapped ring buffer is used for trace_printk(), force it to not use the binary versions. trace_printk() by default uses bin_printf() that only saves the pointer to the format and not the format itself inside the ring buffer. But for a persistent buffer that is read after reboot, the pointers to the format strings may not be the same, or worse, not even exist! Instead, just force the more robust, but slower, version that does the formatting before saving into the ring buffer.
The boot mapped buffer can now be used for trace_printk and friends!
Using the trace_printk() and the persistent buffer was used to debug the issue with the osnoise tracer:
Link: https://lore.kernel.org/all/[email protected]/
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Vincent Donnefort <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vineeth Pillai <[email protected]> Cc: Beau Belgrave <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: David Howells <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Tony Luck <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kees Cook <[email protected]> Cc: Alexander Aring <[email protected]> Cc: "Luis Claudio R. Goncalves" <[email protected]> Cc: Tomas Glozar <[email protected]> Cc: John Kacur <[email protected]> Cc: Clark Williams <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: "Jonathan Corbet" <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4 |
|
| #
a62b4f6f |
| 12-Jun-2024 |
Steven Rostedt (Google) <[email protected]> |
tracing: Add last boot delta offset for stack traces
The addresses of a stack trace event are relative to the kallsyms. As that can change between boots, when printing the stack trace from a buffer
tracing: Add last boot delta offset for stack traces
The addresses of a stack trace event are relative to the kallsyms. As that can change between boots, when printing the stack trace from a buffer that was from the last boot, it needs all the addresses to be added to the "text_delta" that gives the delta between the addresses of the functions for the current boot compared to the address of the last boot. Then it can be passed to kallsyms to find the function name, otherwise it just shows a useless list of addresses.
Link: https://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Vincent Donnefort <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vineeth Pillai <[email protected]> Cc: Youssef Esmat <[email protected]> Cc: Beau Belgrave <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: David Howells <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Tony Luck <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
7cfeb903 |
| 12-Jun-2024 |
Steven Rostedt (Google) <[email protected]> |
tracing: Update function tracing output for previous boot buffer
For a persistent ring buffer that is saved across boots, if function tracing was performed in the previous boot, it only saves the ad
tracing: Update function tracing output for previous boot buffer
For a persistent ring buffer that is saved across boots, if function tracing was performed in the previous boot, it only saves the address of the functions and uses "%pS" to print their names. But the current boot, those functions may be in different locations. The persistent meta-data saves the text delta between the two boots and can be used to find the address of the saved function of where it is located in the current boot.
Link: https://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Vincent Donnefort <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vineeth Pillai <[email protected]> Cc: Youssef Esmat <[email protected]> Cc: Beau Belgrave <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: David Howells <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Tony Luck <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8 |
|
| #
5efd3e2a |
| 04-Mar-2024 |
Steven Rostedt (Google) <[email protected]> |
tracing: Remove precision vsnprintf() check from print event
This reverts 60be76eeabb3d ("tracing: Add size check when printing trace_marker output"). The only reason the precision check was added w
tracing: Remove precision vsnprintf() check from print event
This reverts 60be76eeabb3d ("tracing: Add size check when printing trace_marker output"). The only reason the precision check was added was because of a bug that miscalculated the write size of the string into the ring buffer and it truncated it removing the terminating nul byte. On reading the trace it crashed the kernel. But this was due to the bug in the code that happened during development and should never happen in practice. If anything, the precision can hide bugs where the string in the ring buffer isn't nul terminated and it will not be checked.
Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Linus Torvalds <[email protected]> Fixes: 60be76eeabb3d ("tracing: Add size check when printing trace_marker output") Reported-by: Sachin Sant <[email protected]> Tested-by: Sachin Sant <[email protected]> Reviewed-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
60be76ee |
| 12-Dec-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Add size check when printing trace_marker output
If for some reason the trace_marker write does not have a nul byte for the string, it will overflow the print:
trace_seq_printf(s, ": %s"
tracing: Add size check when printing trace_marker output
If for some reason the trace_marker write does not have a nul byte for the string, it will overflow the print:
trace_seq_printf(s, ": %s", field->buf);
The field->buf could be missing the nul byte. To prevent overflow, add the max size that the buf can be by using the event size and the field location.
int max = iter->ent_size - offsetof(struct print_entry, buf);
trace_seq_printf(s, ": %*.s", max, field->buf);
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6 |
|
| #
08582d67 |
| 09-Oct-2023 |
Amir Goldstein <[email protected]> |
fs: create helper file_user_path() for user displayed mapped file path
Overlayfs uses backing files with "fake" overlayfs f_path and "real" underlying f_inode, in order to use underlying inode aops
fs: create helper file_user_path() for user displayed mapped file path
Overlayfs uses backing files with "fake" overlayfs f_path and "real" underlying f_inode, in order to use underlying inode aops for mapped files and to display the overlayfs path in /proc/<pid>/maps.
In preparation for storing the overlayfs "fake" path instead of the underlying "real" path in struct backing_file, define a noop helper file_user_path() that returns f_path for now.
Use the new helper in procfs and kernel logs whenever a path of a mapped file is displayed to users.
Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6 |
|
| #
e88ed227 |
| 06-Jun-2023 |
Daniel Bristot de Oliveira <[email protected]> |
tracing/timerlat: Add user-space interface
Going a step further, we propose a way to use any user-space workload as the task waiting for the timerlat timer. This is done via a per-CPU file named osn
tracing/timerlat: Add user-space interface
Going a step further, we propose a way to use any user-space workload as the task waiting for the timerlat timer. This is done via a per-CPU file named osnoise/cpu$id/timerlat_fd file.
The tracef_fd allows a task to open at a time. When a task reads the file, the timerlat timer is armed for future osnoise/timerlat_period_us time. When the timer fires, it prints the IRQ latency and wakes up the user-space thread waiting in the timerlat_fd.
The thread then starts to run, executes the timerlat measurement, prints the thread scheduling latency and returns to user-space.
When the thread rereads the timerlat_fd, the tracer will print the user-ret(urn) latency, which is an additional metric.
This additional metric is also traced by the tracer and can be used, for example of measuring the context switch overhead from kernel-to-user and user-to-kernel, or the response time for an arbitrary execution in user-space.
The tracer supports one thread per CPU, the thread must be pinned to the CPU, and it cannot migrate while holding the timerlat_fd. The reason is that the tracer is per CPU (nothing prohibits the tracer from allowing migrations in the future). The tracer monitors the migration of the thread and disables the tracer if detected.
The timerlat_fd is only available for opening/reading when timerlat tracer is enabled, and NO_OSNOISE_WORKLOAD is set.
The simplest way to activate this feature from user-space is:
-------------------------------- %< ----------------------------------- int main(void) { char buffer[1024]; int timerlat_fd; int retval; long cpu = 0; /* place in CPU 0 */ cpu_set_t set;
CPU_ZERO(&set); CPU_SET(cpu, &set);
if (sched_setaffinity(gettid(), sizeof(set), &set) == -1) return 1;
snprintf(buffer, sizeof(buffer), "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd", cpu);
timerlat_fd = open(buffer, O_RDONLY); if (timerlat_fd < 0) { printf("error opening %s: %s\n", buffer, strerror(errno)); exit(1); }
for (;;) { retval = read(timerlat_fd, buffer, 1024); if (retval < 0) break; }
close(timerlat_fd); exit(0); } -------------------------------- >% -----------------------------------
When disabling timerlat, if there is a workload holding the timerlat_fd, the SIGKILL will be sent to the thread.
Link: https://lkml.kernel.org/r/69fe66a863d2792ff4c3a149bf9e32e26468bb3a.1686063934.git.bristot@kernel.org
Cc: Juri Lelli <[email protected]> Cc: William White <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc5, v6.4-rc4 |
|
| #
e70bb54d |
| 25-May-2023 |
sunliming <[email protected]> |
tracing: Modify print_fields() for fields output order
Now the print_fields() print trace event fields in reverse order. Modify it to the positive sequence.
Example outputs for a user event: test0
tracing: Modify print_fields() for fields output order
Now the print_fields() print trace event fields in reverse order. Modify it to the positive sequence.
Example outputs for a user event: test0 u32 count1; u32 count2
Output before: example-2547 [000] ..... 325.666387: test0: count2=0x2 (2) count1=0x1 (1)
Output after: example-2742 [002] ..... 429.769370: test0: count1=0x1 (1) count2=0x2 (2)
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Fixes: 80a76994b2d88 ("tracing: Add "fields" option to show raw trace event fields") Signed-off-by: sunliming <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc3, v6.4-rc2, v6.4-rc1 |
|
| #
adace440 |
| 26-Apr-2023 |
Ken Lin <[email protected]> |
tracing: Add missing spaces in trace_print_hex_seq()
If the buffer length is larger than 16 and concatenate is set to false, there would be missing spaces every 16 bytes. Example: Before: c5 11 10
tracing: Add missing spaces in trace_print_hex_seq()
If the buffer length is larger than 16 and concatenate is set to false, there would be missing spaces every 16 bytes. Example: Before: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 4000 40 00 After: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 40 00 40 00
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Signed-off-by: Ken Lin <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.3 |
|
| #
c7bdb079 |
| 19-Apr-2023 |
Beau Belgrave <[email protected]> |
tracing: Fix print_fields() for __dyn_loc/__rel_loc
Both print_fields() and print_array() do not handle if dynamic data ends at the last byte of the payload for both __dyn_loc and __rel_loc field ty
tracing: Fix print_fields() for __dyn_loc/__rel_loc
Both print_fields() and print_array() do not handle if dynamic data ends at the last byte of the payload for both __dyn_loc and __rel_loc field types. For __rel_loc, the offset was off by 4 bytes, leading to incorrect strings and data being printed out. In print_array() the buffer pos was missed from being advanced, which results in the first payload byte being used as the offset base instead of the field offset.
Advance __rel_loc offset by 4 to ensure correct offset and advance pos to the field offset to ensure correct data is displayed when printing arrays. Change >= to > when checking if data is in-bounds, since it's valid for dynamic data to include the last byte of the payload.
Example outputs for event format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1;
field:__rel_loc char text[]; offset:8; size:4; signed:1;
Output before: tp_rel_loc: text=<OVERFLOW>
Output after: tp_rel_loc: text=Test
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 80a76994b2d8 ("tracing: Add "fields" option to show raw trace event fields") Reported-by: Doug Cook <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5 |
|
| #
80a76994 |
| 28-Mar-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Add "fields" option to show raw trace event fields
The hex, raw and bin formats come from the old PREEMPT_RT patch set latency tracer. That actually gave real alternatives to reading the as
tracing: Add "fields" option to show raw trace event fields
The hex, raw and bin formats come from the old PREEMPT_RT patch set latency tracer. That actually gave real alternatives to reading the ascii buffer. But they have started to bit rot and they do not give a good representation of the tracing data.
Add "fields" option that will read the trace event fields and parse the data from how the fields are defined:
With "fields" = 0 (default)
echo 1 > events/sched/sched_switch/enable cat trace <idle>-0 [003] d..2. 540.078653: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/3:1 next_pid=83 next_prio=120 kworker/3:1-83 [003] d..2. 540.078860: sched_switch: prev_comm=kworker/3:1 prev_pid=83 prev_prio=120 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [003] d..2. 540.206423: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=807 next_prio=120 sshd-807 [003] d..2. 540.206531: sched_switch: prev_comm=sshd prev_pid=807 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [001] d..2. 540.206597: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206617: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206678: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206696: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206713: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120
echo 1 > options/fields <...>-998 [002] d..2. 538.643732: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/2 prev_state=0x20 (32) prev_prio=0x78 (120) prev_pid=0x3e6 (998) prev_comm=trace-cmd <idle>-0 [001] d..2. 538.643806: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/1 bash-830 [001] d..2. 538.644106: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644130: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644180: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644185: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644204: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/1 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash <idle>-0 [003] d..2. 538.644211: sched_switch: next_prio=0x78 (120) next_pid=0x327 (807) next_comm=sshd prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/3 sshd-807 [003] d..2. 538.644340: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/3 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x327 (807) prev_comm=sshd
It traces the data safely without using the trace print formatting.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3 |
|
| #
3bb06eb6 |
| 04-Jan-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Make sure trace_printk() can output as soon as it can be used
Currently trace_printk() can be used as soon as early_trace_init() is called from start_kernel(). But if a crash happens, and "
tracing: Make sure trace_printk() can output as soon as it can be used
Currently trace_printk() can be used as soon as early_trace_init() is called from start_kernel(). But if a crash happens, and "ftrace_dump_on_oops" is set on the kernel command line, all you get will be:
[ 0.456075] <idle>-0 0dN.2. 347519us : Unknown type 6 [ 0.456075] <idle>-0 0dN.2. 353141us : Unknown type 6 [ 0.456075] <idle>-0 0dN.2. 358684us : Unknown type 6
This is because the trace_printk() event (type 6) hasn't been registered yet. That gets done via an early_initcall(), which may be early, but not early enough.
Instead of registering the trace_printk() event (and other ftrace events, which are not trace events) via an early_initcall(), have them registered at the same time that trace_printk() can be used. This way, if there is a crash before early_initcall(), then the trace_printk()s will actually be useful.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Fixes: e725c731e3bb1 ("tracing: Split tracing initialization into two for early initialization") Reported-by: "Joel Fernandes (Google)" <[email protected]> Tested-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc2, v6.2-rc1, v6.1 |
|
| #
bfd5a5e8 |
| 05-Dec-2022 |
David Howells <[email protected]> |
tracing: Fix some checker warnings
Fix some checker warnings in the trace code by adding __printf attributes to a number of trace functions and their declarations.
Changes: ======== ver #2) - Drop
tracing: Fix some checker warnings
Fix some checker warnings in the trace code by adding __printf attributes to a number of trace functions and their declarations.
Changes: ======== ver #2) - Dropped the fix for the unconditional tracing_max_lat_fops decl[1].
Link: https://lore.kernel.org/r/[email protected]/ [1] Link: https://lore.kernel.org/r/166992525941.1716618.13740663757583361463.stgit@warthog.procyon.org.uk/ # v1 Link: https://lkml.kernel.org/r/167023571258.382307.15314866482834835192.stgit@warthog.procyon.org.uk
Signed-off-by: David Howells <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5 |
|
| #
96e6122c |
| 10-Nov-2022 |
Zheng Yejian <[email protected]> |
tracing: Optimize event type allocation with IDA
After commit 060fa5c83e67 ("tracing/events: reuse trace event ids after overflow"), trace events with dynamic type are linked up in list 'ftrace_eve
tracing: Optimize event type allocation with IDA
After commit 060fa5c83e67 ("tracing/events: reuse trace event ids after overflow"), trace events with dynamic type are linked up in list 'ftrace_event_list' through field 'trace_event.list'. Then when max event type number used up, it's possible to reuse type number of some freed one by traversing 'ftrace_event_list'.
As instead, using IDA to manage available type numbers can make codes simpler and then the field 'trace_event.list' can be dropped.
Since 'struct trace_event' is used in static tracepoints, drop 'trace_event.list' can make vmlinux smaller. Local test with about 2000 tracepoints, vmlinux reduced about 64KB: before:-rwxrwxr-x 1 root root 76669448 Nov 8 17:14 vmlinux after: -rwxrwxr-x 1 root root 76604176 Nov 8 17:15 vmlinux
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zheng Yejian <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5 |
|
| #
1da27a25 |
| 27-Apr-2022 |
Jakob Koschel <[email protected]> |
tracing: Remove usage of list iterator after the loop body
In preparation to limit the scope of the list iterator variable to the traversal loop, use a dedicated pointer to point to the found elemen
tracing: Remove usage of list iterator after the loop body
In preparation to limit the scope of the list iterator variable to the traversal loop, use a dedicated pointer to point to the found element [1].
Before, the code implicitly used the head when no element was found when using &pos->list. Since the new variable is only set if an element was found, the head needs to be used explicitly if the variable is NULL.
Link: https://lkml.kernel.org/r/[email protected]
Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc4 |
|
| #
4ee51101 |
| 24-Apr-2022 |
Guo Zhengkui <[email protected]> |
tracing: Use WARN instead of printk and WARN_ON
Use `WARN(cond, ...)` instead of `if (cond)` + `printk(...)` + `WARN_ON(1)`.
Link: https://lkml.kernel.org/r/[email protected]
tracing: Use WARN instead of printk and WARN_ON
Use `WARN(cond, ...)` instead of `if (cond)` + `printk(...)` + `WARN_ON(1)`.
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Guo Zhengkui <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|