|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
502d2e71 |
| 09-Mar-2025 |
Christophe JAILLET <[email protected]> |
tracing: Constify struct event_trigger_ops
'event_trigger_ops mwifiex_if_ops' are not modified in these drivers.
Constifying these structures moves some data to a read-only section, so increase ove
tracing: Constify struct event_trigger_ops
'event_trigger_ops mwifiex_if_ops' are not modified in these drivers.
Constifying these structures moves some data to a read-only section, so increase overall security, especially when the structure holds some function pointers.
On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 31368 9024 6200 46592 b600 kernel/trace/trace_events_trigger.o
After: ===== text data bss dec hex filename 31752 8608 6200 46560 b5e0 kernel/trace/trace_events_trigger.o
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/66e8f990e649678e4be37d4d1a19158ca0dea2f4.1741521295.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
0b4ffbe4 |
| 14-Mar-2025 |
Tengda Wu <[email protected]> |
tracing: Correct the refcount if the hist/hist_debug file fails to open
The function event_{hist,hist_debug}_open() maintains the refcount of 'file->tr' and 'file' through tracing_open_file_tr(). Ho
tracing: Correct the refcount if the hist/hist_debug file fails to open
The function event_{hist,hist_debug}_open() maintains the refcount of 'file->tr' and 'file' through tracing_open_file_tr(). However, it does not roll back these counts on subsequent failure paths, resulting in a refcount leak.
A very obvious case is that if the hist/hist_debug file belongs to a specific instance, the refcount leak will prevent the deletion of that instance, as it relies on the condition 'tr->ref == 1' within __remove_instance().
Fix this by calling tracing_release_file_tr() on all failure paths in event_{hist,hist_debug}_open() to correct the refcount.
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Zheng Yejian <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 1cc111b9cddc ("tracing: Fix uaf issue when open the hist or hist_debug file") Signed-off-by: Tengda Wu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5 |
|
| #
6f86bdea |
| 27-Feb-2025 |
Steven Rostedt <[email protected]> |
tracing: Fix bad hist from corrupting named_triggers list
The following commands causes a crash:
~# cd /sys/kernel/tracing/events/rcu/rcu_callback ~# echo 'hist:name=bad:keys=common_pid:onmax(bog
tracing: Fix bad hist from corrupting named_triggers list
The following commands causes a crash:
~# cd /sys/kernel/tracing/events/rcu/rcu_callback ~# echo 'hist:name=bad:keys=common_pid:onmax(bogus).save(common_pid)' > trigger bash: echo: write error: Invalid argument ~# echo 'hist:name=bad:keys=common_pid' > trigger
Because the following occurs:
event_trigger_write() { trigger_process_regex() { event_hist_trigger_parse() {
data = event_trigger_alloc(..);
event_trigger_register(.., data) { cmd_ops->reg(.., data, ..) [hist_register_trigger()] { data->ops->init() [event_hist_trigger_init()] { save_named_trigger(name, data) { list_add(&data->named_list, &named_triggers); } } } }
ret = create_actions(); (return -EINVAL) if (ret) goto out_unreg; [..] ret = hist_trigger_enable(data, ...) { list_add_tail_rcu(&data->list, &file->triggers); <<<---- SKIPPED!!! (this is important!) [..] out_unreg: event_hist_unregister(.., data) { cmd_ops->unreg(.., data, ..) [hist_unregister_trigger()] { list_for_each_entry(iter, &file->triggers, list) { if (!hist_trigger_match(data, iter, named_data, false)) <- never matches continue; [..] test = iter; } if (test && test->ops->free) <<<-- test is NULL
test->ops->free(test) [event_hist_trigger_free()] { [..] if (data->name) del_named_trigger(data) { list_del(&data->named_list); <<<<-- NEVER gets removed! } } } }
[..] kfree(data); <<<-- frees item but it is still on list
The next time a hist with name is registered, it causes an u-a-f bug and the kernel can crash.
Move the code around such that if event_trigger_register() succeeds, the next thing called is hist_trigger_enable() which adds it to the list.
A bunch of actions is called if get_named_trigger_data() returns false. But that doesn't need to be called after event_trigger_register(), so it can be moved up, allowing event_trigger_register() to be called just before hist_trigger_enable() keeping them together and allowing the file->triggers to be properly populated.
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers") Reported-by: Tomas Glozar <[email protected]> Tested-by: Tomas Glozar <[email protected]> Reviewed-by: Tom Zanussi <[email protected]> Closes: https://lore.kernel.org/all/CAP4=nvTsxjckSBTz=Oe_UYh8keD9_sZC4i++4h72mJLic4_W4A@mail.gmail.com/ Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5 |
|
| #
66fc6f52 |
| 27-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
tracing/hist: Support POLLPRI event for poll on histogram
Since POLLIN will not be flushed until the hist file is read, the user needs to repeatedly read() and poll() on the hist file for monitoring
tracing/hist: Support POLLPRI event for poll on histogram
Since POLLIN will not be flushed until the hist file is read, the user needs to repeatedly read() and poll() on the hist file for monitoring the event continuously. But the read() is somewhat redundant when the user is only monitoring for event updates.
Add POLLPRI poll event on the hist file so the event returns when a histogram is updated after open(), poll() or read(). Thus it is possible to wait for the next event without having to issue a read().
Cc: Shuah Khan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/173527248770.464571.2536902137325258133.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
1bd13edb |
| 27-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
tracing/hist: Add poll(POLLIN) support on hist file
Add poll syscall support on the `hist` file. The Waiter will be waken up when the histogram is updated with POLLIN.
Currently, there is no way to
tracing/hist: Add poll(POLLIN) support on hist file
Add poll syscall support on the `hist` file. The Waiter will be waken up when the histogram is updated with POLLIN.
Currently, there is no way to wait for a specific event in userspace. So user needs to peek the `trace` periodicaly, or wait on `trace_pipe`. But it is not a good idea to peek at the `trace` for an event that randomly happens. And `trace_pipe` is not coming back until a page is filled with events.
This allows a user to wait for a specific event on the `hist` file. User can set a histogram trigger on the event which they want to monitor and poll() on its `hist` file. Since this poll() returns POLLIN, the next poll() will return soon unless a read() happens on that hist file.
NOTE: To read the hist file again, you must set the file offset to 0, but just for monitoring the event, you may not need to read the histogram.
Cc: Shuah Khan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/173527247756.464571.14236296701625509931.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc4 |
|
| #
2b36a97a |
| 19-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Switch trace_events_hist.c code over to use guard()
There are a couple functions in trace_events_hist.c that have "goto out" or equivalent on error in order to release locks that were taken
tracing: Switch trace_events_hist.c code over to use guard()
There are a couple functions in trace_events_hist.c that have "goto out" or equivalent on error in order to release locks that were taken. This can be error prone or just simply make the code more complex.
Switch every location that ends with unlocking a mutex on error over to using the guard(mutex)() infrastructure to let the compiler worry about releasing locks. This makes the code easier to read and understand.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7 |
|
| #
6371b4bc |
| 07-Nov-2024 |
Colin Ian King <[email protected]> |
tracing: Remove redundant check on field->field in histograms
The check on field->field being true is handled as the first check on the cascaded if statement, so the later checks on field->field are
tracing: Remove redundant check on field->field in histograms
The check on field->field being true is handled as the first check on the cascaded if statement, so the later checks on field->field are redundant because this clause has already been handled. Since this later check is redundant, just remove it.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc6 |
|
| #
242b32d8 |
| 31-Oct-2024 |
Jinjie Ruan <[email protected]> |
tracing: Replace strncpy() with strscpy() when copying comm
Replace the depreciated[1] strncpy() calls with strscpy() when copying comm.
Link: https://github.com/KSPP/linux/issues/90 [1]
Cc: <mhir
tracing: Replace strncpy() with strscpy() when copying comm
Replace the depreciated[1] strncpy() calls with strscpy() when copying comm.
Link: https://github.com/KSPP/linux/issues/90 [1]
Cc: <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Jinjie Ruan <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
4a8840af |
| 08-Oct-2024 |
Josh Poimboeuf <[email protected]> |
tracepoints: Use new static branch API
The old static key API is deprecated. Switch to the new one.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]
tracepoints: Use new static branch API
The old static key API is deprecated. Switch to the new one.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Alice Ryhl <[email protected]> Link: https://lore.kernel.org/7a08dae3c5eddb14b13864923c1b58ac1f4af83c.1728414936.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
b1560408 |
| 30-Jul-2024 |
Steven Rostedt <[email protected]> |
tracing: Have format file honor EVENT_FILE_FL_FREED
When eventfs was introduced, special care had to be done to coordinate the freeing of the file meta data with the files that are exposed to user s
tracing: Have format file honor EVENT_FILE_FL_FREED
When eventfs was introduced, special care had to be done to coordinate the freeing of the file meta data with the files that are exposed to user space. The file meta data would have a ref count that is set when the file is created and would be decremented and freed after the last user that opened the file closed it. When the file meta data was to be freed, it would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed, and any new references made (like new opens or reads) would fail as it is marked freed. This allowed other meta data to be freed after this flag was set (under the event_mutex).
All the files that were dynamically created in the events directory had a pointer to the file meta data and would call event_release() when the last reference to the user space file was closed. This would be the time that it is safe to free the file meta data.
A shortcut was made for the "format" file. It's i_private would point to the "call" entry directly and not point to the file's meta data. This is because all format files are the same for the same "call", so it was thought there was no reason to differentiate them. The other files maintain state (like the "enable", "trigger", etc). But this meant if the file were to disappear, the "format" file would be unaware of it.
This caused a race that could be trigger via the user_events test (that would create dynamic events and free them), and running a loop that would read the user_events format files:
In one console run:
# cd tools/testing/selftests/user_events # while true; do ./ftrace_test; done
And in another console run:
# cd /sys/kernel/tracing/ # while true; do cat events/user_events/__test_event/format; done 2>/dev/null
With KASAN memory checking, it would trigger a use-after-free bug report (which was a real bug). This was because the format file was not checking the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the event that the file meta data pointed to after the event was freed.
After inspection, there are other locations that were found to not check the EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a new helper function: event_file_file() that will make sure that the event_mutex is held, and will return NULL if the trace_event_file has the EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file pointer use event_file_file() and check for NULL. Later uses can still use the event_file_data() helper function if the event_mutex is still held and was not released since the event_file_file() call.
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Ajay Kaher <[email protected]> Cc: Ilkka Naulapää <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Al Viro <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Beau Belgrave <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Alexey Makhalov <[email protected]> Cc: Vasavi Sirnapalli <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: b63db58e2fa5d ("eventfs/tracing: Add callback for release of an eventfs_inode") Reported-by: Mathias Krause <[email protected]> Tested-by: Mathias Krause <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1 |
|
| #
4f1991a9 |
| 08-Jan-2024 |
Steven Rostedt (Google) <[email protected]> |
tracing histograms: Simplify parse_actions() function
The parse_actions() function uses 'len = str_has_prefix()' to test which action is in the string being parsed. But then it goes and repeats the
tracing histograms: Simplify parse_actions() function
The parse_actions() function uses 'len = str_has_prefix()' to test which action is in the string being parsed. But then it goes and repeats the logic for each different action. This logic can be simplified and duplicate code can be removed as 'len' contains the length of the found prefix which should be used for all actions.
Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
1cc111b9 |
| 14-Dec-2023 |
Zheng Yejian <[email protected]> |
tracing: Fix uaf issue when open the hist or hist_debug file
KASAN report following issue. The root cause is when opening 'hist' file of an instance and accessing 'trace_event_file' in hist_show(),
tracing: Fix uaf issue when open the hist or hist_debug file
KASAN report following issue. The root cause is when opening 'hist' file of an instance and accessing 'trace_event_file' in hist_show(), but 'trace_event_file' has been freed due to the instance being removed. 'hist_debug' file has the same problem. To fix it, call tracing_{open,release}_file_tr() in file_operations callback to have the ref count and avoid 'trace_event_file' being freed.
BUG: KASAN: slab-use-after-free in hist_show+0x11e0/0x1278 Read of size 8 at addr ffff242541e336b8 by task head/190
CPU: 4 PID: 190 Comm: head Not tainted 6.7.0-rc5-g26aff849438c #133 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x98/0xf8 show_stack+0x1c/0x30 dump_stack_lvl+0x44/0x58 print_report+0xf0/0x5a0 kasan_report+0x80/0xc0 __asan_report_load8_noabort+0x1c/0x28 hist_show+0x11e0/0x1278 seq_read_iter+0x344/0xd78 seq_read+0x128/0x1c0 vfs_read+0x198/0x6c8 ksys_read+0xf4/0x1e0 __arm64_sys_read+0x70/0xa8 invoke_syscall+0x70/0x260 el0_svc_common.constprop.0+0xb0/0x280 do_el0_svc+0x44/0x60 el0_svc+0x34/0x68 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x168/0x170
Allocated by task 188: kasan_save_stack+0x28/0x50 kasan_set_track+0x28/0x38 kasan_save_alloc_info+0x20/0x30 __kasan_slab_alloc+0x6c/0x80 kmem_cache_alloc+0x15c/0x4a8 trace_create_new_event+0x84/0x348 __trace_add_new_event+0x18/0x88 event_trace_add_tracer+0xc4/0x1a0 trace_array_create_dir+0x6c/0x100 trace_array_create+0x2e8/0x568 instance_mkdir+0x48/0x80 tracefs_syscall_mkdir+0x90/0xe8 vfs_mkdir+0x3c4/0x610 do_mkdirat+0x144/0x200 __arm64_sys_mkdirat+0x8c/0xc0 invoke_syscall+0x70/0x260 el0_svc_common.constprop.0+0xb0/0x280 do_el0_svc+0x44/0x60 el0_svc+0x34/0x68 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x168/0x170
Freed by task 191: kasan_save_stack+0x28/0x50 kasan_set_track+0x28/0x38 kasan_save_free_info+0x34/0x58 __kasan_slab_free+0xe4/0x158 kmem_cache_free+0x19c/0x508 event_file_put+0xa0/0x120 remove_event_file_dir+0x180/0x320 event_trace_del_tracer+0xb0/0x180 __remove_instance+0x224/0x508 instance_rmdir+0x44/0x78 tracefs_syscall_rmdir+0xbc/0x140 vfs_rmdir+0x1cc/0x4c8 do_rmdir+0x220/0x2b8 __arm64_sys_unlinkat+0xc0/0x100 invoke_syscall+0x70/0x260 el0_svc_common.constprop.0+0xb0/0x280 do_el0_svc+0x44/0x60 el0_svc+0x34/0x68 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x168/0x170
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
545db7e2 |
| 21-Oct-2023 |
Christophe JAILLET <[email protected]> |
tracing/histograms: Simplify last_cmd_set()
Turn a kzalloc()+strcpy()+strncat() into an equivalent and less verbose kasprintf().
Link: https://lore.kernel.org/linux-trace-kernel/30b6fb04dadc10a03cc
tracing/histograms: Simplify last_cmd_set()
Turn a kzalloc()+strcpy()+strncat() into an equivalent and less verbose kasprintf().
Link: https://lore.kernel.org/linux-trace-kernel/30b6fb04dadc10a03cc1ad08f5d8a93ef623a167.1697899346.git.christophe.jaillet@wanadoo.fr
Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Mukesh ojha <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2 |
|
| #
4b8b3905 |
| 14-Jul-2023 |
Mohamed Khalfella <[email protected]> |
tracing/histograms: Return an error if we fail to add histogram to hist_vars list
Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables") added a ch
tracing/histograms: Return an error if we fail to add histogram to hist_vars list
Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables") added a check to fail histogram creation if save_hist_vars() failed to add histogram to hist_vars list. But the commit failed to set ret to failed return code before jumping to unregister histogram, fix it.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected] Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables") Signed-off-by: Mohamed Khalfella <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
6018b585 |
| 12-Jul-2023 |
Mohamed Khalfella <[email protected]> |
tracing/histograms: Add histograms to hist_vars if they have referenced variables
Hist triggers can have referenced variables without having direct variables fields. This can be the case if referenc
tracing/histograms: Add histograms to hist_vars if they have referenced variables
Hist triggers can have referenced variables without having direct variables fields. This can be the case if referenced variables are added for trigger actions. In this case the newly added references will not have field variables. Not taking such referenced variables into consideration can result in a bug where it would be possible to remove hist trigger with variables being refenced. This will result in a bug that is easily reproducable like so
$ cd /sys/kernel/tracing $ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events $ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger $ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger $ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
[ 100.263533] ================================================================== [ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180 [ 100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439 [ 100.266320] [ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4 [ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 [ 100.268561] Call Trace: [ 100.268902] <TASK> [ 100.269189] dump_stack_lvl+0x4c/0x70 [ 100.269680] print_report+0xc5/0x600 [ 100.270165] ? resolve_var_refs+0xc7/0x180 [ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0 [ 100.271389] ? resolve_var_refs+0xc7/0x180 [ 100.271913] kasan_report+0xbd/0x100 [ 100.272380] ? resolve_var_refs+0xc7/0x180 [ 100.272920] __asan_load8+0x71/0xa0 [ 100.273377] resolve_var_refs+0xc7/0x180 [ 100.273888] event_hist_trigger+0x749/0x860 [ 100.274505] ? kasan_save_stack+0x2a/0x50 [ 100.275024] ? kasan_set_track+0x29/0x40 [ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10 [ 100.276138] ? ksys_write+0xd1/0x170 [ 100.276607] ? do_syscall_64+0x3c/0x90 [ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 100.277771] ? destroy_hist_data+0x446/0x470 [ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860 [ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10 [ 100.279627] ? __kasan_check_write+0x18/0x20 [ 100.280177] ? mutex_unlock+0x85/0xd0 [ 100.280660] ? __pfx_mutex_unlock+0x10/0x10 [ 100.281200] ? kfree+0x7b/0x120 [ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0 [ 100.282197] ? event_trigger_write+0xac/0x100 [ 100.282764] ? __kasan_slab_free+0x16/0x20 [ 100.283293] ? __kmem_cache_free+0x153/0x2f0 [ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250 [ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10 [ 100.285221] ? event_trigger_write+0xbc/0x100 [ 100.285781] ? __kasan_check_read+0x15/0x20 [ 100.286321] ? __bitmap_weight+0x66/0xa0 [ 100.286833] ? _find_next_bit+0x46/0xe0 [ 100.287334] ? task_mm_cid_work+0x37f/0x450 [ 100.287872] event_triggers_call+0x84/0x150 [ 100.288408] trace_event_buffer_commit+0x339/0x430 [ 100.289073] ? ring_buffer_event_data+0x3f/0x60 [ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0 [ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0 [ 100.298653] syscall_enter_from_user_mode+0x32/0x40 [ 100.301808] do_syscall_64+0x1a/0x90 [ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 100.307775] RIP: 0033:0x7f686c75c1cb [ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48 [ 100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021 [ 100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb [ 100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a [ 100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a [ 100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009 [ 100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007 [ 100.338381] </TASK>
We hit the bug because when second hist trigger has was created has_hist_vars() returned false because hist trigger did not have variables. As a result of that save_hist_vars() was not called to add the trigger to trace_array->hist_vars. Later on when we attempted to remove the first histogram find_any_var_ref() failed to detect it is being used because it did not find the second trigger in hist_vars list.
With this change we wait until trigger actions are created so we can take into consideration if hist trigger has variable references. Also, now we check the return value of save_hist_vars() and fail trigger creation if save_hist_vars() fails.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected] Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers") Signed-off-by: Mohamed Khalfella <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4 |
|
| #
4b512860 |
| 24-May-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Rename stacktrace field to common_stacktrace
The histogram and synthetic events can use a pseudo event called "stacktrace" that will create a stacktrace at the time of the event and use it
tracing: Rename stacktrace field to common_stacktrace
The histogram and synthetic events can use a pseudo event called "stacktrace" that will create a stacktrace at the time of the event and use it just like it was a normal field. We have other pseudo events such as "common_cpu" and "common_timestamp". To stay consistent with that, convert "stacktrace" to "common_stacktrace". As this was used in older kernels, to keep backward compatibility, this will act just like "common_cpu" did with "cpu". That is, "cpu" will be the same as "common_cpu" unless the event has a "cpu" field. In which case, the event's field is used. The same is true with "stacktrace".
Also update the documentation to reflect this change.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
e30fbc61 |
| 24-May-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing/histograms: Allow variables to have some modifiers
Modifiers are used to change the behavior of keys. For instance, they can grouped into buckets, converted to syscall names (from the syscal
tracing/histograms: Allow variables to have some modifiers
Modifiers are used to change the behavior of keys. For instance, they can grouped into buckets, converted to syscall names (from the syscall identifier), show task->comm of the current pid, be an array of longs that represent a stacktrace, and more.
It was found that nothing stopped a value from taking a modifier. As values are simple counters. If this happened, it would call code that was not expecting a modifier and crash the kernel. This was fixed by having the ___create_val_field() function test if a modifier was present and fail if one was. This fixed the crash.
Now there's a problem with variables. Variables are used to pass fields from one event to another. Variables are allowed to have some modifiers, as the processing may need to happen at the time of the event (like stacktraces and comm names of the current pid). The issue is that it too uses __create_val_field(). Now that fails on modifiers, variables can no longer use them (this is a regression).
As not all modifiers are for variables, have them use a separate check.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Mark Rutland <[email protected]> Fixes: e0213434fe3e4 ("tracing: Do not let histogram values have some modifiers") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1 |
|
| #
9f116f76 |
| 02-Mar-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Check field value in hist_field_name()
The function hist_field_name() cannot handle being passed a NULL field parameter. It should never be NULL, but due to a previous bug, NULL was passed
tracing: Check field value in hist_field_name()
The function hist_field_name() cannot handle being passed a NULL field parameter. It should never be NULL, but due to a previous bug, NULL was passed to the function and the kernel crashed due to a NULL dereference. Mark Rutland reported this to me on IRC.
The bug was fixed, but to prevent future bugs from crashing the kernel, check the field and add a WARN_ON() if it is NULL.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reported-by: Mark Rutland <[email protected]> Fixes: c6afad49d127f ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers") Tested-by: Mark Rutland <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
e0213434 |
| 02-Mar-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Do not let histogram values have some modifiers
Histogram values can not be strings, stacktraces, graphs, symbols, syscalls, or grouped in buckets or log. Give an error if a value is set to
tracing: Do not let histogram values have some modifiers
Histogram values can not be strings, stacktraces, graphs, symbols, syscalls, or grouped in buckets or log. Give an error if a value is set to do so.
Note, the histogram code was not prepared to handle these modifiers for histograms and caused a bug.
Mark Rutland reported:
# echo 'p:copy_to_user __arch_copy_to_user n=$arg2' >> /sys/kernel/tracing/kprobe_events # echo 'hist:keys=n:vals=hitcount.buckets=8:sort=hitcount' > /sys/kernel/tracing/events/kprobes/copy_to_user/trigger # cat /sys/kernel/tracing/events/kprobes/copy_to_user/hist [ 143.694628] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 143.695190] Mem abort info: [ 143.695362] ESR = 0x0000000096000004 [ 143.695604] EC = 0x25: DABT (current EL), IL = 32 bits [ 143.695889] SET = 0, FnV = 0 [ 143.696077] EA = 0, S1PTW = 0 [ 143.696302] FSC = 0x04: level 0 translation fault [ 143.702381] Data abort info: [ 143.702614] ISV = 0, ISS = 0x00000004 [ 143.702832] CM = 0, WnR = 0 [ 143.703087] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000448f9000 [ 143.703407] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 [ 143.704137] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 143.704714] Modules linked in: [ 143.705273] CPU: 0 PID: 133 Comm: cat Not tainted 6.2.0-00003-g6fc512c10a7c #3 [ 143.706138] Hardware name: linux,dummy-virt (DT) [ 143.706723] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 143.707120] pc : hist_field_name.part.0+0x14/0x140 [ 143.707504] lr : hist_field_name.part.0+0x104/0x140 [ 143.707774] sp : ffff800008333a30 [ 143.707952] x29: ffff800008333a30 x28: 0000000000000001 x27: 0000000000400cc0 [ 143.708429] x26: ffffd7a653b20260 x25: 0000000000000000 x24: ffff10d303ee5800 [ 143.708776] x23: ffffd7a6539b27b0 x22: ffff10d303fb8c00 x21: 0000000000000001 [ 143.709127] x20: ffff10d303ec2000 x19: 0000000000000000 x18: 0000000000000000 [ 143.709478] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 143.709824] x14: 0000000000000000 x13: 203a6f666e692072 x12: 6567676972742023 [ 143.710179] x11: 0a230a6d6172676f x10: 000000000000002c x9 : ffffd7a6521e018c [ 143.710584] x8 : 000000000000002c x7 : 7f7f7f7f7f7f7f7f x6 : 000000000000002c [ 143.710915] x5 : ffff10d303b0103e x4 : ffffd7a653b20261 x3 : 000000000000003d [ 143.711239] x2 : 0000000000020001 x1 : 0000000000000001 x0 : 0000000000000000 [ 143.711746] Call trace: [ 143.712115] hist_field_name.part.0+0x14/0x140 [ 143.712642] hist_field_name.part.0+0x104/0x140 [ 143.712925] hist_field_print+0x28/0x140 [ 143.713125] event_hist_trigger_print+0x174/0x4d0 [ 143.713348] hist_show+0xf8/0x980 [ 143.713521] seq_read_iter+0x1bc/0x4b0 [ 143.713711] seq_read+0x8c/0xc4 [ 143.713876] vfs_read+0xc8/0x2a4 [ 143.714043] ksys_read+0x70/0xfc [ 143.714218] __arm64_sys_read+0x24/0x30 [ 143.714400] invoke_syscall+0x50/0x120 [ 143.714587] el0_svc_common.constprop.0+0x4c/0x100 [ 143.714807] do_el0_svc+0x44/0xd0 [ 143.714970] el0_svc+0x2c/0x84 [ 143.715134] el0t_64_sync_handler+0xbc/0x140 [ 143.715334] el0t_64_sync+0x190/0x194 [ 143.715742] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (f9400000) [ 143.716510] ---[ end trace 0000000000000000 ]--- Segmentation fault
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: c6afad49d127f ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers") Reported-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.2, v6.2-rc8 |
|
| #
f5914b30 |
| 10-Feb-2023 |
Tom Zanussi <[email protected]> |
tracing/histogram: Fix stacktrace key
The current code will always use the current stacktrace as a key even if a stacktrace contained in a specific event field was specified.
For example, we expect
tracing/histogram: Fix stacktrace key
The current code will always use the current stacktrace as a key even if a stacktrace contained in a specific event field was specified.
For example, we expect to use the 'unsigned long[] stack' field in the below event in the histogram:
# echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > /sys/kernel/debug/tracing/dynamic_events # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
But in fact, when we type out the trigger, we see that it's using the plain old global 'stacktrace' as the key, which is just the stacktrace when the event was hit and not the stacktrace contained in the event, which is what we want:
# cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
And in fact, there's no code to actually retrieve it from the event, so we need to add HIST_FIELD_FN_STACK and hist_field_stack() to get it and hook it into the trigger code. For now, since the stack is just using dynamic strings, this could just use the dynamic string function, but it seems cleaner to have a dedicated function an be able to tweak independently as necessary.
Link: https://lkml.kernel.org/r/11aa614c82976adbfa4ea763dbe885b5fb01d59c.1676063532.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <[email protected]> [ Fixed 32bit build warning reported by kernel test robot <[email protected]> ] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
2bacfd9f |
| 10-Feb-2023 |
Tom Zanussi <[email protected]> |
tracing/histogram: Fix a few problems with stacktrace variable printing
Currently, there are a few problems when printing hist triggers and trace output when using stacktrace variables. This fixes
tracing/histogram: Fix a few problems with stacktrace variable printing
Currently, there are a few problems when printing hist triggers and trace output when using stacktrace variables. This fixes the problems seen below:
# echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger # cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
# echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger hist:keys=next_pid:vals=hitcount:ts=common_timestamp.usecs,st=stacktrace.stacktrace:sort=hitcount:size=2048:clock=global if prev_state == 2 [active]
and also in the trace output (should be stack.stacktrace):
{ delta: ~ 100-199, stacktrace __schedule+0xa19/0x1520
Link: https://lkml.kernel.org/r/60bebd4e546728e012a7a2bcbf58716d48ba6edb.1676063532.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
8261ef2e |
| 16-Feb-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
The max string length for a histogram variable is 256 bytes. The max depth of a stacktrace is 16. With 8byte words, that's 16 * 8 = 1
tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
The max string length for a histogram variable is 256 bytes. The max depth of a stacktrace is 16. With 8byte words, that's 16 * 8 = 128. Which can easily fit in the string variable. The histogram stacktrace is being stored in the string value (with the given max length), with the assumption it will fit. To make sure that this is always the case (in the case that the stack trace depth increases), add a BUILD_BUG_ON() to test this.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
fc1a9dc1 |
| 10-Feb-2023 |
Tom Zanussi <[email protected]> |
tracing/histogram: Don't use strlen to find length of stacktrace variables
Because stacktraces are saved in dynamic strings, trace_event_raw_event_synth() uses strlen to determine the length of the
tracing/histogram: Don't use strlen to find length of stacktrace variables
Because stacktraces are saved in dynamic strings, trace_event_raw_event_synth() uses strlen to determine the length of the stack. Stacktraces may contain 0-bytes, though, in the saved addresses, so the length found and passed to reserve() will be too small.
Fix this by using the first unsigned long in the stack variables to store the actual number of elements in the stack and have trace_event_raw_event_synth() use that to determine the length of the stack.
Link: https://lkml.kernel.org/r/1ed6906cd9d6477ef2bd8e63c61de20a9ffe64d7.1676063532.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc7, v6.2-rc6, v6.2-rc5 |
|
| #
cc5fc8bf |
| 17-Jan-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing/histogram: Add stacktrace type
Now that stacktraces can be part of synthetic events, allow a key to be typed as a stacktrace.
# cd /sys/kernel/tracing # echo 's:block_lat u64 delta; uns
tracing/histogram: Add stacktrace type
Now that stacktraces can be part of synthetic events, allow a key to be typed as a stacktrace.
# cd /sys/kernel/tracing # echo 's:block_lat u64 delta; unsigned long stack[];' >> dynamic_events # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,st2=$st:onmatch(sched.sched_switch).trace(block_lat,$delta,$st2)' >> events/sched/sched_switch/trigger # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger
# cat events/synthetic/block_lat/hist
# event histogram # # trigger info: hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active] #
{ delta: ~ 0-99, stacktrace: event_hist_trigger+0x464/0x480 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x250 trace_event_raw_event_sched_switch+0xfc/0x150 __traceiter_sched_switch+0x41/0x60 __schedule+0x448/0x7b0 schedule_idle+0x26/0x40 cpu_startup_entry+0x19/0x20 start_secondary+0xed/0xf0 secondary_startup_64_no_verify+0xe0/0xeb } hitcount: 6 { delta: ~ 0-99, stacktrace: event_hist_trigger+0x464/0x480 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x250 trace_event_raw_event_sched_switch+0xfc/0x150 __traceiter_sched_switch+0x41/0x60 __schedule+0x448/0x7b0 schedule_idle+0x26/0x40 cpu_startup_entry+0x19/0x20 __pfx_kernel_init+0x0/0x10 arch_call_rest_init+0xa/0x24 start_kernel+0x964/0x98d secondary_startup_64_no_verify+0xe0/0xeb } hitcount: 3 { delta: ~ 0-99, stacktrace: event_hist_trigger+0x464/0x480 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x250 trace_event_raw_event_sched_switch+0xfc/0x150 __traceiter_sched_switch+0x41/0x60 __schedule+0x448/0x7b0 schedule+0x5a/0xb0 worker_thread+0xaf/0x380 kthread+0xe9/0x110 ret_from_fork+0x2c/0x50 } hitcount: 1 { delta: ~ 100-199, stacktrace: event_hist_trigger+0x464/0x480 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x250 trace_event_raw_event_sched_switch+0xfc/0x150 __traceiter_sched_switch+0x41/0x60 __schedule+0x448/0x7b0 schedule_idle+0x26/0x40 cpu_startup_entry+0x19/0x20 start_secondary+0xed/0xf0 secondary_startup_64_no_verify+0xe0/0xeb } hitcount: 15 [..] { delta: ~ 8500-8599, stacktrace: event_hist_trigger+0x464/0x480 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x250 trace_event_raw_event_sched_switch+0xfc/0x150 __traceiter_sched_switch+0x41/0x60 __schedule+0x448/0x7b0 schedule_idle+0x26/0x40 cpu_startup_entry+0x19/0x20 start_secondary+0xed/0xf0 secondary_startup_64_no_verify+0xe0/0xeb } hitcount: 1
Totals: Hits: 89 Entries: 11 Dropped: 0
Link: https://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Ching-lin Yu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
00cf3d67 |
| 17-Jan-2023 |
Steven Rostedt (Google) <[email protected]> |
tracing: Allow synthetic events to pass around stacktraces
Allow a stacktrace from one event to be displayed by the end event of a synthetic event. This is very useful when looking for the longest l
tracing: Allow synthetic events to pass around stacktraces
Allow a stacktrace from one event to be displayed by the end event of a synthetic event. This is very useful when looking for the longest latency of a sleep or something blocked on I/O.
# cd /sys/kernel/tracing/ # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' > events/sched/sched_switch/trigger # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
The above creates a "block_lat" synthetic event that take the stacktrace of when a task schedules out in either the interruptible or uninterruptible states, and on a new per process max $delta (the time it was scheduled out), will print the process id and the stacktrace.
# echo 1 > events/synthetic/block_lat/enable # cat trace # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | kworker/u16:0-767 [006] d..4. 560.645045: block_lat: pid=767 delta=66 stack=STACK: => __schedule => schedule => pipe_read => vfs_read => ksys_read => do_syscall_64 => 0x966000aa
<idle>-0 [003] d..4. 561.132117: block_lat: pid=0 delta=413787 stack=STACK: => __schedule => schedule => schedule_hrtimeout_range_clock => do_sys_poll => __x64_sys_poll => do_syscall_64 => 0x966000aa
<...>-153 [006] d..4. 562.068407: block_lat: pid=153 delta=54 stack=STACK: => __schedule => schedule => io_schedule => rq_qos_wait => wbt_wait => __rq_qos_throttle => blk_mq_submit_bio => submit_bio_noacct_nocheck => ext4_bio_write_page => mpage_submit_page => mpage_process_page_bufs => mpage_prepare_extent_to_map => ext4_do_writepages => ext4_writepages => do_writepages => __writeback_single_inode
Link: https://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Ching-lin Yu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|