Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2 |
# 833fd800 | 11-Jul-2023 | Petr Pavlu <[email protected]>
x86/retpoline,kprobes: Skip optprobe check for indirect jumps with retpolines and IBT
The kprobes optimization check can_optimize() calls insn_is_indirect_jump() to detect indirect jump instructions in a target function. If any is found, creating an optprobe is disallowed in the function because the jump could be from a jump table and could potentially land in the middle of the target optprobe.
With retpolines, insn_is_indirect_jump() additionally looks for calls to indirect thunks which the compiler potentially used to replace original jumps. This extra check is however unnecessary because jump tables are disabled when the kernel is built with retpolines. The same is currently the case with IBT.
Based on this observation, remove the logic to look for calls to indirect thunks and skip the check for indirect jumps altogether if the kernel is built with retpolines or IBT. Remove subsequently the symbols __indirect_thunk_start and __indirect_thunk_end which are no longer needed.
Dropping this logic indirectly fixes a problem where the range [__indirect_thunk_start, __indirect_thunk_end] also wrongly included the return thunk. As a result, machines that used the return thunk as a mitigation, and didn't have it patched out by any alternative, ended up unable to use optprobes in any regular function.
Fixes: 0b53c374b9ef ("x86/retpoline: Use -mfunction-return") Suggested-by: Peter Zijlstra (Intel) <[email protected]> Suggested-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Petr Pavlu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
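For illustration, a minimal sketch of the simplified check described above (not the actual arch/x86/kernel/kprobes/opt.c code; the helper and the compile-time switches below are stand-ins):

#include <stdbool.h>

/* Stand-ins for the real Kconfig symbols. */
#define BUILT_WITH_RETPOLINE 1
#define BUILT_WITH_IBT       0

/* Assumed helper: scan [start, start + size) for indirect jump opcodes. */
static bool range_has_indirect_jump(unsigned long start, unsigned long size)
{
        (void)start; (void)size;
        return false;   /* placeholder for a real instruction decoder */
}

/*
 * With retpolines or IBT the compiler does not emit jump tables, so no
 * indirect jump can land in the middle of an optprobe and the scan can
 * be skipped entirely.
 */
static bool range_is_optimizable(unsigned long start, unsigned long size)
{
        if (BUILT_WITH_RETPOLINE || BUILT_WITH_IBT)
                return true;
        return !range_has_indirect_jump(start, size);
}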
Revision tags: v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6 |
# ee84a303 | 08-Jun-2023 | Ian Rogers <[email protected]>
perf thread: Add accessor functions for thread
Using accessors will make it easier to add reference count checking in later patches.
Committer notes:
thread->nsinfo wasn't wrapped, as it is used together with nsinfo__zput(), which does a trick to set the field whose refcount is being dropped to NULL; that doesn't work well with thread__nsinfo(thread), which loses the &thread->nsinfo pointer.
When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
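For illustration, a sketch of the accessor pattern this change introduces (trimmed-down struct and assumed accessors, not the actual tools/perf/util/thread.h):

#include <sys/types.h>

struct maps;
struct nsinfo;

struct thread {
        struct maps *maps;
        pid_t pid_;
        pid_t tid;
        struct nsinfo *nsinfo;
};

/*
 * Accessors keep field access in one place, so reference-count checking
 * (e.g. hiding the struct behind a wrapper) can be added later without
 * touching every caller.
 */
static inline struct maps *thread__maps(struct thread *thread)
{
        return thread->maps;
}

static inline pid_t thread__tid(struct thread *thread)
{
        return thread->tid;
}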
Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4 |
# 5ab6d715 | 20-Mar-2023 | Ian Rogers <[email protected]>
perf maps: Add functions to access maps
Introduce functions to access struct maps. These functions reduce the number of places where reference counting is necessary. While tidying the APIs, do some small const-ification, in particular to unwind_libunwind_ops.
Committer notes:
Fixed up tools/perf/util/unwind-libunwind.c:
-       return ops->get_entries(cb, arg, thread, data, max_stack);
+       return ops->get_entries(cb, arg, thread, data, max_stack, best_effort);
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
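Similarly, a sketch of a 'struct maps' accessor with the kind of const-ification mentioned above (illustrative struct layout, not the real tools/perf/util/maps.h):

struct maps {
        void *entries;          /* tree/list of struct map, elided here */
        unsigned int nr_maps;
        int refcnt;
};

/* Taking a const pointer documents that the accessor does not modify maps. */
static inline unsigned int maps__nr_maps(const struct maps *maps)
{
        return maps->nr_maps;
}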
Revision tags: v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4 |
# 3749e0bb | 29-Apr-2020 | Adrian Hunter <[email protected]>
perf thread-stack: Add thread_stack__br_sample_late()
Add a thread stack function to create a branch stack for hardware events where the sample records get created some time after the event occurred.
Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
# 86d67180 | 29-Apr-2020 | Adrian Hunter <[email protected]>
perf thread-stack: Add branch stack support
Intel PT already has support for creating branch stacks for each context (per-cpu or per-thread). In the more common per-cpu case, the branch stack is not separated for different threads, instead being cleared in between each sample.
That approach will not work very well for adding branch stacks to regular events. The branch stacks really need to be accumulated separately for each thread.
As a start to accomplishing that, this patch adds support for putting branch stack support into the thread-stack. The advantages are:
1. the branches are accumulated separately for each thread
2. the branch stack is cleared only in between continuous traces
This helps pave the way for adding branch stacks to regular events, not just synthesized events as at present.
Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
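A rough sketch of per-thread branch-stack accumulation as described above (a simplified stand-in, not the actual tools/perf/util/thread-stack.c code):

#include <stdint.h>
#include <stdlib.h>

struct branch_entry {
        uint64_t from;
        uint64_t to;
};

/* One of these hangs off each thread, so branches from different threads
 * are never mixed together. */
struct thread_branch_stack {
        struct branch_entry *entries;
        size_t sz;      /* capacity */
        size_t pos;     /* next slot, treated as a ring buffer */
};

static int thread_branch_stack__init(struct thread_branch_stack *bs, size_t sz)
{
        bs->entries = calloc(sz, sizeof(*bs->entries));
        if (!bs->entries)
                return -1;
        bs->sz = sz;
        bs->pos = 0;
        return 0;
}

static void thread_branch_stack__add(struct thread_branch_stack *bs,
                                     uint64_t from, uint64_t to)
{
        bs->entries[bs->pos].from = from;
        bs->entries[bs->pos].to = to;
        bs->pos = (bs->pos + 1) % bs->sz;       /* newest overwrites oldest */
}

/* Cleared only between discontinuous traces, not between samples. */
static void thread_branch_stack__reset(struct thread_branch_stack *bs)
{
        bs->pos = 0;
}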
Revision tags: v5.7-rc3, v5.7-rc2, v5.7-rc1 |
# 4fef41bf | 01-Apr-2020 | Adrian Hunter <[email protected]>
perf thread-stack: Add thread_stack__sample_late()
Add a thread stack function to create a call chain for hardware events where the sample records get created some time after the event occurred.
Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Revision tags: v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1 |
# fe87797d | 26-Nov-2019 | Arnaldo Carvalho de Melo <[email protected]>
perf thread: Rename thread->mg to thread->maps
One more step on the merge of 'struct maps' with 'struct map_groups'.
Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Revision tags: v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7 |
# 8520a98d | 29-Aug-2019 | Arnaldo Carvalho de Melo <[email protected]>
perf debug: Remove needless include directives from debug.h
All we need there is a forward declaration for 'union perf_event', so remove it from there and add missing header directives in places using things from this indirect include.
Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
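The idea, sketched (the prototype below is a hypothetical stand-in, not a real debug.h declaration):

/* Before: debug.h pulled in the full event definitions, e.g.
 *      #include "event.h"
 * After: a forward declaration is enough for pointer parameters, and files
 * that actually use the contents of 'union perf_event' include event.h
 * themselves. */
union perf_event;

int debug__print_event(union perf_event *event);       /* hypothetical */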
Revision tags: v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2 |
# 7f7c536f | 04-Jul-2019 | Arnaldo Carvalho de Melo <[email protected]>
tools lib: Adopt zalloc()/zfree() from tools/perf
Eroding the tools/perf/util/util.h hodgepodge header a bit more.
Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
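For reference, the helpers are small; a self-contained sketch equivalent in spirit to what was moved into tools/lib (not a verbatim copy):

#include <stdlib.h>

/* Allocate zero-initialized memory. */
static inline void *zalloc(size_t size)
{
        return calloc(1, size);
}

/* Free *ptr and set it to NULL, so a stale pointer cannot be reused. */
#define zfree(ptr) ({ free(*(ptr)); *(ptr) = NULL; })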
# 215a0d30 | 04-Jul-2019 | Arnaldo Carvalho de Melo <[email protected]>
perf tools: Add missing headers, mostly stdlib.h
As part of the erosion of util/util.h, which will stop including stdlib.h, we need to add that include to the places that need it but were getting it indirectly.
Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Revision tags: v5.2-rc7, v5.2-rc6 |
# eb5d8544 | 19-Jun-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Eliminate code duplicating thread_stack__pop_ks()
Use new function thread_stack__pop_ks() in place of equivalent code.
Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
# 97860b48 | 19-Jun-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Fix thread stack return from kernel for kernel-only case
Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol") had the side-effect of introducing more stack entries before return from kernel space.
When user space is also traced, those entries are popped before entry to user space, but when user space is not traced, they get stuck at the bottom of the stack, making the stack grow progressively larger.
Fix this by detecting a return-from-kernel branch type and popping kernel addresses off the stack at that point.
Note, the problem and fix affect the exported Call Graph / Tree but not the callindent option used by "perf script --call-trace".
Example:
perf-with-kcore record example -e intel_pt//k -- ls
perf-with-kcore script example --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db
Menu option: Reports -> Context-Sensitive Call Graph
Before: (showing Call Path column only)
Call Path
▶ perf
▼ ls
▼ 12111:12111
▶ setup_new_exec
▶ __task_pid_nr_ns
▶ perf_event_pid_type
▶ perf_event_comm_output
▶ perf_iterate_ctx
▶ perf_iterate_sb
▶ perf_event_comm
▶ __set_task_comm
▶ load_elf_binary
▶ search_binary_handler
▶ __do_execve_file.isra.41
▶ __x64_sys_execve
▶ do_syscall_64
▼ entry_SYSCALL_64_after_hwframe
▼ swapgs_restore_regs_and_return_to_usermode
▼ native_iret
▶ error_entry
▶ do_page_fault
▼ error_exit
▼ retint_user
▶ prepare_exit_to_usermode
▼ native_iret
▶ error_entry
▶ do_page_fault
▼ error_exit
▼ retint_user
▶ prepare_exit_to_usermode
▼ native_iret
▶ error_entry
▶ do_page_fault
▼ error_exit
▼ retint_user
▶ prepare_exit_to_usermode
▶ native_iret
After: (showing Call Path column only)
Call Path
▶ perf
▼ ls
▼ 12111:12111
▶ setup_new_exec
▶ __task_pid_nr_ns
▶ perf_event_pid_type
▶ perf_event_comm_output
▶ perf_iterate_ctx
▶ perf_iterate_sb
▶ perf_event_comm
▶ __set_task_comm
▶ load_elf_binary
▶ search_binary_handler
▶ __do_execve_file.isra.41
▶ __x64_sys_execve
▶ do_syscall_64
▶ entry_SYSCALL_64_after_hwframe
▶ page_fault
▼ entry_SYSCALL_64
▼ do_syscall_64
▶ __x64_sys_brk
▶ __x64_sys_access
▶ __x64_sys_openat
▶ __x64_sys_newfstat
▶ __x64_sys_mmap
▶ __x64_sys_close
▶ __x64_sys_read
▶ __x64_sys_mprotect
▶ __x64_sys_arch_prctl
▶ __x64_sys_munmap
▶ exit_to_usermode_loop
▶ __x64_sys_set_tid_address
▶ __x64_sys_set_robust_list
▶ __x64_sys_rt_sigaction
▶ __x64_sys_rt_sigprocmask
▶ __x64_sys_prlimit64
▶ __x64_sys_statfs
▶ __x64_sys_ioctl
▶ __x64_sys_getdents64
▶ __x64_sys_write
▶ __x64_sys_exit_group
Committer notes:
The first arg to perf-with-kcore needs to be the same for the 'record' and 'script' lines, otherwise we'd record the perf.data file and kcore_dir/ files in one directory ('example') and then try to use them from the 'bep' directory; the instructions above have been fixed so that both use 'example'.
Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
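A sketch of the fix's core idea (types and flag values are trimmed stand-ins, not the actual thread-stack.c code):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_FLAG_RETURN      (1u << 0)       /* stand-in flag bits */
#define SAMPLE_FLAG_SYSCALLRET  (1u << 1)

struct thread_stack {
        uint64_t *stack;        /* pushed return addresses */
        size_t cnt;
        uint64_t kernel_start;
};

static bool in_kernel(const struct thread_stack *ts, uint64_t addr)
{
        return addr >= ts->kernel_start;
}

/*
 * On a return-from-kernel branch, pop any kernel addresses left on the
 * stack so they do not accumulate when user space is not being traced.
 */
static void thread_stack__pop_kernel(struct thread_stack *ts, unsigned int flags)
{
        if (!(flags & (SAMPLE_FLAG_RETURN | SAMPLE_FLAG_SYSCALLRET)))
                return;
        while (ts->cnt && in_kernel(ts, ts->stack[ts->cnt - 1]))
                ts->cnt--;
}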
Revision tags: v5.2-rc5, v5.2-rc4, v5.2-rc3 |
# 2025cf9e | 29-May-2019 | Thomas Gleixner <[email protected]>
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 263 file(s).
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Revision tags: v5.2-rc2 |
# 003ccdc7 | 20-May-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Accumulate IPC information
Cycle and instruction counts are added to the stack. The IPC of a function, including all the functions it calls, is also recorded.
Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
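A simplified sketch of the bookkeeping described above (not the real thread-stack code; the accumulation scheme is condensed for illustration):

#include <stddef.h>
#include <stdint.h>

struct thread_stack_entry {
        uint64_t ret_addr;
        uint64_t insn_count;    /* instructions since this function was entered */
        uint64_t cyc_count;     /* cycles since this function was entered */
};

/* Each new delta is added to every function currently on the stack, so a
 * parent's totals include everything its callees executed. */
static void stack__add_counts(struct thread_stack_entry *stack, size_t cnt,
                              uint64_t insns, uint64_t cycs)
{
        for (size_t i = 0; i < cnt; i++) {
                stack[i].insn_count += insns;
                stack[i].cyc_count += cycs;
        }
}

/* IPC of a function and everything it called. */
static double stack_entry__ipc(const struct thread_stack_entry *e)
{
        return e->cyc_count ? (double)e->insn_count / (double)e->cyc_count : 0.0;
}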
Revision tags: v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0 |
# f435887e | 28-Feb-2019 | Adrian Hunter <[email protected]>
perf db-export: Add calls parent_id to enable creation of call trees
The call_path can be used to find the parent symbol for a call but not the exact parent call. To do that add parent_id to the call_return export. This enables the creation of a call tree from the exported data.
Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
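Sketched, the shape of the exported record (field names are illustrative, not the exact db-export structures):

#include <stdint.h>

struct exported_call_return {
        uint64_t db_id;         /* id of this call/return pair */
        uint64_t call_path_id;  /* symbol-level path: gives the parent symbol */
        uint64_t parent_db_id;  /* id of the exact parent call/return pair;
                                 * this is what lets a full call tree, not
                                 * just a call graph, be rebuilt from the
                                 * exported data */
        uint64_t call_time;
        uint64_t return_time;
};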
Revision tags: v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2 |
# 3c0cd952 | 09-Jan-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Hide x86 retpolines
x86 retpoline functions pollute the call graph by showing up everywhere there is an indirect branch, but they do not really mean anything. Make changes so that the default retpoline functions will no longer appear in the call graph. Note this only affects the call graph, since all the original branches are left unchanged.
This does not handle function return thunks, nor is there any improvement for the handling of inline thunks or extern thunks.
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
        return -1;
}

int foo(void)
{
        return bar() + 1;
}

__attribute__((indirect_branch("thunk"))) int main()
{
        int (*volatile fn)(void) = foo;

        fn();
        return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
    1040:       48 83 ec 18             sub    $0x18,%rsp
    1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
    104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
    1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
    1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
    105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
    105f:       48 83 c4 18             add    $0x18,%rsp
    1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
    1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    1165:       c3                      retq
<SNIP>
0000000000001170 <foo>:
    1170:       e8 eb ff ff ff          callq  1160 <bar>
    1175:       83 c0 01                add    $0x1,%eax
    1178:       c3                      retq
0000000000001179 <__x86_indirect_thunk_rax>:
    1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
    117e:       f3 90                   pause
    1180:       0f ae e8                lfence
    1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
    1185:       48 89 04 24             mov    %rax,(%rsp)
    1189:       c3                      retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main -> __x86_indirect_thunk_rax -> __x86_indirect_thunk_rax -> foo -> bar
After:
main -> foo -> bar
Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Remove (sym->name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
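A sketch of how the default retpoline thunks can be recognized so they are elided from the call graph (illustrative; the real check lives in thread-stack.c):

#include <stdbool.h>
#include <string.h>

static bool is_default_retpoline_sym(const char *sym_name)
{
        static const char prefix[] = "__x86_indirect_thunk_";

        return sym_name && strncmp(sym_name, prefix, sizeof(prefix) - 1) == 0;
}

When a call into such a symbol is seen, the thread-stack can wait for the following branch and attribute the call directly to the real target, so only the original call/return structure remains visible.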
# 1f35cd65 | 09-Jan-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Improve thread_stack__no_call_return()
Improve thread_stack__no_call_return() to better handle 'returns' that do not match the stack i.e. 'no call'. See code comments for details. The example below shows how retpolines are affected:
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
        return -1;
}

int foo(void)
{
        return bar() + 1;
}

__attribute__((indirect_branch("thunk"))) int main()
{
        int (*volatile fn)(void) = foo;

        fn();
        return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
    1040:       48 83 ec 18             sub    $0x18,%rsp
    1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
    104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
    1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
    1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
    105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
    105f:       48 83 c4 18             add    $0x18,%rsp
    1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
    1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    1165:       c3                      retq
<SNIP>
0000000000001170 <foo>:
    1170:       e8 eb ff ff ff          callq  1160 <bar>
    1175:       83 c0 01                add    $0x1,%eax
    1178:       c3                      retq
0000000000001179 <__x86_indirect_thunk_rax>:
    1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
    117e:       f3 90                   pause
    1180:       0f ae e8                lfence
    1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
    1185:       48 89 04 24             mov    %rax,(%rsp)
    1189:       c3                      retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main -> __x86_indirect_thunk_rax -> __x86_indirect_thunk_rax -> __x86_indirect_thunk_rax -> bar
After:
main -> __x86_indirect_thunk_rax -> __x86_indirect_thunk_rax -> foo -> bar
Committer testing:
Chose "Reports", Then "Context-Sensitive Call Graph" and then go on expanding:
Before:
simple-retpolin
  PID:PID
    _start
      _start
        __libc_start_main
          main
            __x86_indirect_thunk_rax
              __x86_indirect_thunk_rax
                bar
After:
Remove the "simple.retpoline.db" file, run again the 'perf script' line to regenerate the .db file and run the exported-sql-viewer.py again to get the same all the way to 'main', then, from there, including 'main':
main
  __x86_indirect_thunk_rax
    __x86_indirect_thunk_rax
      foo
        bar
Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
# f08046cb | 09-Jan-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Represent jmps to the start of a different symbol
The compiler might optimize a call/ret combination by making it a jmp. However the thread-stack does not presently cater for that, so that such control flow is not visible in the call graph. Make it visible by recording on the stack a branch to the start of a different symbol. Note, that means when a ret pops the stack, all jmps must be popped off first.
Example:
$ cat jmp-to-fn.c
__attribute__((noinline)) int bar(void)
{
        return -1;
}

__attribute__((noinline)) int foo(void)
{
        return bar() + 1;
}

int main()
{
        return foo();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
$ objdump -d jmp-to-fn
<SNIP>
0000000000001040 <main>:
    1040:       31 c0                   xor    %eax,%eax
    1042:       e9 09 01 00 00          jmpq   1150 <foo>
<SNIP>
0000000000001140 <bar>:
    1140:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    1145:       c3                      retq
<SNIP>
0000000000001150 <foo>:
    1150:       31 c0                   xor    %eax,%eax
    1152:       e8 e9 ff ff ff          callq  1140 <bar>
    1157:       83 c0 01                add    $0x1,%eax
    115a:       c3                      retq
<SNIP>
$ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
$ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
2019-01-08 13:24:58.783069 Creating database...
2019-01-08 13:24:58.794650 Writing records...
2019-01-08 13:24:59.008050 Adding indexes
2019-01-08 13:24:59.015802 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db
Before:
main -> bar
After:
main -> foo -> bar
Committer testing:
Install the python2-pyside package, then select these menu options on the GUI:
"Reports" "Context sensitive callgraphs"
Then go on expanding the symbols, to get the full picture when doing this on a fedora:29 system with gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC):
jmp-to-fn
  PID:TID
    _start (ld-2.28.so)
      __libc_start_main
        main
          foo
            bar
To verify that indeed, this fixes the problem.
Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
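The condition at the heart of this change, sketched (illustrative types; not the exact thread_stack__event() code):

#include <stdbool.h>
#include <stdint.h>

struct symbol {
        uint64_t start;
        uint64_t end;
};

/*
 * A branch (not a call) that lands exactly at the start of a different
 * symbol is recorded on the stack like a call, so call/ret pairs that the
 * compiler turned into jmps stay visible in the call graph.  When a ret
 * later pops the stack, all such jmp entries must be popped off first.
 */
static bool jump_acts_like_call(const struct symbol *from_sym,
                                const struct symbol *to_sym, uint64_t to_ip)
{
        return to_sym && to_sym != from_sym && to_ip == to_sym->start;
}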
# 90c2cda7 | 09-Jan-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
Make thread_stack__no_call_return() more readable by adding more local variables.
Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
# e7a3a055 | 09-Jan-2019 | Adrian Hunter <[email protected]>
perf thread-stack: Tidy thread_stack__push_cp() usage
If 'cp' is checked in thread_stack__push_cp() a number of error checks can be removed, reducing code size and improving readability.
Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Revision tags: v5.0-rc1, v4.20 |
# 256d92bc | 21-Dec-2018 | Adrian Hunter <[email protected]>
perf thread-stack: Fix thread stack processing for the idle task
perf creates a single 'struct thread' to represent the idle task. That is because threads are identified by PID and TID, and the idle task always has PID == TID == 0.
However, there are actually separate idle tasks for each CPU. That creates a problem for thread stack processing which assumes that each thread has a single stack, not one stack per CPU.
Fix that by passing through the CPU number, and in the case of the idle "thread", pick the thread stack from an array based on the CPU number.
Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
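A sketch of the per-CPU selection described above (simplified stand-in, not the actual thread-stack.c code):

#include <stdbool.h>
#include <stddef.h>

struct thread_stack {
        size_t cnt;     /* details elided */
};

struct thread {
        int pid_;                       /* PID == TID == 0 identifies the idle task */
        int tid;
        struct thread_stack *ts;        /* now an array; one slot per CPU for idle */
        unsigned int ts_cnt;            /* number of allocated slots */
};

static bool thread__is_idle(const struct thread *thread)
{
        return thread->pid_ == 0 && thread->tid == 0;
}

/*
 * Regular threads always use slot 0; the idle "thread" gets one stack per
 * CPU so per-CPU idle traces do not corrupt each other's call stacks.
 */
static struct thread_stack *thread__stack(struct thread *thread, int cpu)
{
        if (!thread || !thread->ts)
                return NULL;
        if (thread__is_idle(thread) && cpu > 0 && (unsigned int)cpu < thread->ts_cnt)
                return &thread->ts[cpu];
        return &thread->ts[0];
}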
# 139f42f3 | 21-Dec-2018 | Adrian Hunter <[email protected]>
perf thread-stack: Allocate an array of thread stacks
In preparation for fixing thread stack processing for the idle task, allocate an array of thread stacks.
Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ No need to check for NULL when calling zfree(), noticed by Jiri Olsa ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
# 2e9e8688 | 21-Dec-2018 | Adrian Hunter <[email protected]>
perf thread-stack: Factor out thread_stack__init()
In preparation for fixing thread stack processing for the idle task, factor out thread_stack__init().
Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
# f6060ac6 | 21-Dec-2018 | Adrian Hunter <[email protected]>
perf thread-stack: Allow for a thread stack array
In preparation for fixing thread stack processing for the idle task, allow for a thread stack array.
Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
# bd8e68ac | 21-Dec-2018 | Adrian Hunter <[email protected]>
perf thread-stack: Avoid direct reference to the thread's stack
In preparation for fixing thread stack processing for the idle task, avoid direct reference to the thread's stack. The thread stack will change to an array of thread stacks, at which point the meaning of the direct reference will change.
Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Rename thread_stack__ts() to thread__stack() since this operates on a 'thread' struct ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>