|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
70351029 |
| 19-Mar-2025 |
Ian Rogers <[email protected]> |
perf thread: Add support for reading the e_machine type for a thread
First try to read the e_machine from the dsos associated with the thread's maps. If live use the executable from /proc/pid/exe an
perf thread: Add support for reading the e_machine type for a thread
First try to read the e_machine from the dsos associated with the thread's maps. If live use the executable from /proc/pid/exe and read the e_machine from the ELF header. On failure use EM_HOST. Change builtin-trace syscall functions to pass e_machine from the thread rather than EM_HOST, so that in later patches when syscalltbl can use the e_machine the system calls are specific to the architecture.
Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Namhyung Kim <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3 |
|
| #
599c1939 |
| 08-Aug-2024 |
Ian Rogers <[email protected]> |
perf callchain: Fix stitch LBR memory leaks
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps and map members are reference counted. Ensure these values use a _get routine to i
perf callchain: Fix stitch LBR memory leaks
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps and map members are reference counted. Ensure these values use a _get routine to increment the reference counts and use map_symbol__exit() to release the reference counts.
Do similar for 'struct thread's prev_lbr_cursor, but save the size of the prev_lbr_cursor array so that it may be iterated.
Ensure that when stitch_nodes are placed on the free list the map_symbols are exited.
Fix resolve_lbr_callchain_sample() by replacing list_replace_init() to list_splice_init(), so the whole list is moved and nodes aren't leaked.
A reproduction of the memory leaks is possible with a leak sanitizer build in the perf report command of:
``` $ perf record -e cycles --call-graph lbr perf test -w thloop $ perf report --stitch-lbr ```
Reviewed-by: Kan Liang <[email protected]> Fixes: ff165628d72644e3 ("perf callchain: Stitch LBR call stack") Signed-off-by: Ian Rogers <[email protected]> [ Basic tests after applying the patch, repeating the example above ] Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Changbin Du <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7 |
|
| #
d436f90a |
| 01-Mar-2024 |
Ian Rogers <[email protected]> |
perf machine: Move machine's threads into its own abstraction
Move thread_rb_node into the machine.c file. This hides the implementation of threads from the rest of the code allowing for it to be re
perf machine: Move machine's threads into its own abstraction
Move thread_rb_node into the machine.c file. This hides the implementation of threads from the rest of the code allowing for it to be refactored.
Locking discipline is tightened up in this change. As the lock is now encapsulated in threads, the findnew function requires holding it (as it already did in machine). Rather than do conditionals with locks based on whether the thread should be created (which could potentially be error prone with a read lock match with a write unlock), have a separate threads__find that won't create the thread and only holds the read lock. This effectively duplicates the findnew logic, with the existing findnew logic only operating under a write lock assuming creation is necessary as a previous find failed. The creation may still fail with the write lock due to another thread. The duplication is removed in a later next patch that delegates the implementation to hashtable.
Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.8-rc6, v6.8-rc5 |
|
| #
8f0ec15f |
| 17-Feb-2024 |
Changbin Du <[email protected]> |
perf: util: use capstone disasm engine to show assembly instructions
Currently, the instructions of samples are shown as raw hex strings which are hard to read. x86 has a special option '--xed' to d
perf: util: use capstone disasm engine to show assembly instructions
Currently, the instructions of samples are shown as raw hex strings which are hard to read. x86 has a special option '--xed' to disassemble the hex string via intel XED tool.
Here we use capstone as our disassembler engine to give more friendly instructions. We select libcapstone because capstone can provide more insn details. Perf will fallback to raw instructions if libcapstone is not available.
The advantages compared to XED tool: * Support arm, arm64, x86-32, x86_64 (more could be supported), xed only for x86_64. * Immediate address operands are shown as symbol+offs.
Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1 |
|
| #
9ffa6c75 |
| 02-Nov-2023 |
Ian Rogers <[email protected]> |
perf machine thread: Remove exited threads by default
'struct thread' values hold onto references to mmaps, DSOs, etc. When a thread exits it is necessary to clean all of this memory up by removing
perf machine thread: Remove exited threads by default
'struct thread' values hold onto references to mmaps, DSOs, etc. When a thread exits it is necessary to clean all of this memory up by removing the thread from the machine's threads. Some tools require this doesn't happen, such as auxtrace events, 'perf report' if offcpu events exist or if a task list is being generated, so add a 'struct symbol_conf' member to make the behavior optional. When an exited thread is left in the machine's threads, mark it as exited.
This change relates to commit 40826c45eb0b8856 ("perf thread: Remove notion of dead threads") . Dead threads were removed as they had a reference count of 0 and were difficult to reason about with the reference count checker. Here a thread is removed from threads when it exits, unless via symbol_conf the exited thread isn't remove and is marked as exited. Reference counting behaves as it normally does.
Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Changbin Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: German Gomez <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Ming Wang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Vincent Whitchurch <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Yang Jihong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3 |
|
| #
04cb4fc4 |
| 19-Jul-2023 |
Arnaldo Carvalho de Melo <[email protected]> |
perf thread: Allow tools to register a thread->priv destructor
So that when thread__delete() runs it can be called and free stuff tools stashed into thread->priv, like 'perf trace' does and will use
perf thread: Allow tools to register a thread->priv destructor
So that when thread__delete() runs it can be called and free stuff tools stashed into thread->priv, like 'perf trace' does and will use this new facility to plug some leaks.
Added an assert(thread__priv_destructor == NULL) as suggested in Ian's review.
Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/CAP-5=fV3Er=Ek8=iE=bSGbEBmM56_PJffMWot1g_5Bh8B5hO7A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6 |
|
| #
f6005caf |
| 08-Jun-2023 |
Ian Rogers <[email protected]> |
perf thread: Add reference count checking
Modify struct declaration and accessor functions for the reference count checkers additional layer of indirection. Make sure pid_cmp in builtin-sched.c uses
perf thread: Add reference count checking
Modify struct declaration and accessor functions for the reference count checkers additional layer of indirection. Make sure pid_cmp in builtin-sched.c uses the underlying/original struct in pointer arithmetic, and not the temporary get/put indirection.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
ee84a303 |
| 08-Jun-2023 |
Ian Rogers <[email protected]> |
perf thread: Add accessor functions for thread
Using accessors will make it easier to add reference count checking in later patches.
Committer notes:
thread->nsinfo wasn't wrapped as it is used to
perf thread: Add accessor functions for thread
Using accessors will make it easier to add reference count checking in later patches.
Committer notes:
thread->nsinfo wasn't wrapped as it is used together with nsinfo__zput(), where does a trick to set the field with a refcount being dropped to NULL, and that doesn't work well with using thread__nsinfo(thread), that loses the &thread->nsinfo pointer.
When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
7ee227f6 |
| 08-Jun-2023 |
Ian Rogers <[email protected]> |
perf thread: Make threads rbtree non-invasive
Separate the rbtree out of thread and into a new struct thread_rb_node. The refcnt is in thread and the rbtree is responsible for a single count.
Signe
perf thread: Make threads rbtree non-invasive
Separate the rbtree out of thread and into a new struct thread_rb_node. The refcnt is in thread and the rbtree is responsible for a single count.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
40826c45 |
| 08-Jun-2023 |
Ian Rogers <[email protected]> |
perf thread: Remove notion of dead threads
The dead thread list is best effort. Threads live on it until the reference count hits zero and they are removed. With correct reference counting this shou
perf thread: Remove notion of dead threads
The dead thread list is best effort. Threads live on it until the reference count hits zero and they are removed. With correct reference counting this should never happen. It is, however, part of the 'perf sched' output that is now removed. If this is an issue we should implement tracking of dead threads in a robust not best-effort way.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3 |
|
| #
cde56712 |
| 27-Oct-2022 |
Arnaldo Carvalho de Melo <[email protected]> |
perf thread: Move thread__resolve() from event.h
Its a thread method, so move it to thread.h, this way some places that were using event.h just to get this prototype may stop doing so and speed up b
perf thread: Move thread__resolve() from event.h
Its a thread method, so move it to thread.h, this way some places that were using event.h just to get this prototype may stop doing so and speed up building and disentanble the header dependency graph.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7 |
|
| #
797efbc5 |
| 11-Jul-2022 |
Adrian Hunter <[email protected]> |
perf tools: Add guest_cpu to hypervisor threads
It is possible to know which guest machine was running at a point in time based on the PID of the currently running host thread. That is, perf identif
perf tools: Add guest_cpu to hypervisor threads
It is possible to know which guest machine was running at a point in time based on the PID of the currently running host thread. That is, perf identifies guest machines by the PID of the hypervisor.
To determine the guest CPU, put it on the hypervisor (QEMU) thread for that VCPU.
This is done when processing the id_index which provides the necessary information.
Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7 |
|
| #
ff165628 |
| 19-Mar-2020 |
Kan Liang <[email protected]> |
perf callchain: Stitch LBR call stack
In LBR call stack mode, the depth of reconstructed LBR call stack limits to the number of LBR registers.
For example, on skylake, the depth of reconstructed
perf callchain: Stitch LBR call stack
In LBR call stack mode, the depth of reconstructed LBR call stack limits to the number of LBR registers.
For example, on skylake, the depth of reconstructed LBR call stack is always <= 32.
# To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles' # Event count (approx.): 6487119731 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. # ................................
99.97% 99.97% tchain_edit tchain_edit [.] f43 | --99.64%--f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 f23 f24 f25 f26 f27 f28 f29 f30 f31 f32 f33 f34 f35 f36 f37 f38 f39 f40 f41 f42 f43
For a call stack which is deeper than LBR limit, HW will overwrite the LBR register with oldest branch. Only partial call stacks can be reconstructed.
However, the overwritten LBRs may still be retrieved from previous sample. At that moment, HW hasn't overwritten the LBR registers yet. Perf tools can stitch those overwritten LBRs on current call stacks to get a more complete call stack.
To determine if LBRs can be stitched, perf tools need to compare current sample with previous sample.
- They should have identical LBR records (Same from, to and flags values, and the same physical index of LBR registers).
- The searching starts from the base-of-stack of current sample.
Once perf determines to stitch the previous LBRs, the corresponding LBR cursor nodes will be copied to 'lists'. The 'lists' is to track the LBR cursor nodes which are going to be stitched.
When the stitching is over, the nodes will not be freed immediately. They will be moved to 'free_lists'. Next stitching may reuse the space. Both 'lists' and 'free_lists' will be freed when all samples are processed.
Committer notes:
Fix the intel-pt.c initialization of the union with 'struct branch_flags', that breaks the build with its unnamed union on older gcc versions.
Uninline thread__free_stitch_list(), as it grew big and started dragging includes to thread.h, so move it to thread.c where what it needs in terms of headers are already there.
This fixes the build in several systems such as debian:experimental when cross building to the MIPS32 architecture, i.e. in the other cases what was needed was being included by sheer luck.
In file included from builtin-sched.c:11: util/thread.h: In function 'thread__free_stitch_list': util/thread.h:169:3: error: implicit declaration of function 'free' [-Werror=implicit-function-declaration] 169 | free(pos); | ^~~~ util/thread.h:169:3: error: incompatible implicit declaration of built-in function 'free' [-Werror] util/thread.h:19:1: note: include '<stdlib.h>' or provide a declaration of 'free' 18 | #include "callchain.h" +++ |+#include <stdlib.h> 19 | util/thread.h:174:3: error: incompatible implicit declaration of built-in function 'free' [-Werror] 174 | free(pos); | ^~~~ util/thread.h:174:3: note: include '<stdlib.h>' or provide a declaration of 'free'
Signed-off-by: Kan Liang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pavel Gerasimov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vitaly Slobodskoy <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
7f1d3931 |
| 19-Mar-2020 |
Kan Liang <[email protected]> |
perf callchain: Save previous cursor nodes for LBR stitching approach
The cursor nodes which generates from sample are eventually added into callchain. To avoid generating cursor nodes from previous
perf callchain: Save previous cursor nodes for LBR stitching approach
The cursor nodes which generates from sample are eventually added into callchain. To avoid generating cursor nodes from previous samples again, the previous cursor nodes are also saved for LBR stitching approach.
Some option, e.g. hide-unresolved, may hide some LBRs. Add a variable 'valid' in struct callchain_cursor_node to indicate this case. The LBR stitching approach will only append the valid cursor nodes from previous samples later.
Signed-off-by: Kan Liang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pavel Gerasimov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vitaly Slobodskoy <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] [ Use zfree() instead of open coded equivalent, and use it when freeing members of structs ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
9c6c3f47 |
| 19-Mar-2020 |
Kan Liang <[email protected]> |
perf thread: Save previous sample for LBR stitching approach
To retrieve the overwritten LBRs from previous sample for LBR stitching approach, perf has to save the previous sample.
Only allocate th
perf thread: Save previous sample for LBR stitching approach
To retrieve the overwritten LBRs from previous sample for LBR stitching approach, perf has to save the previous sample.
Only allocate the struct lbr_stitch once, when LBR stitching approach is enabled and kernel supports hw_idx.
Signed-off-by: Kan Liang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pavel Gerasimov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vitaly Slobodskoy <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] [ Use zalloc()/zfree() for thread->lbr_stitch ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
771fd155 |
| 19-Mar-2020 |
Kan Liang <[email protected]> |
perf thread: Add a knob for LBR stitch approach
The LBR stitch approach should be disabled by default. Because
- The stitching approach base on LBR call stack technology. The known limitations of
perf thread: Add a knob for LBR stitch approach
The LBR stitch approach should be disabled by default. Because
- The stitching approach base on LBR call stack technology. The known limitations of LBR call stack technology still apply to the approach, e.g. Exception handing such as setjmp/longjmp will have calls/returns not match.
- This approach is not foolproof. There can be cases where it creates incorrect call stacks from incorrect matches. There is no attempt to validate any matches in another way.
The 'lbr_stitch_enable' is used to indicate whether enable LBR stitch approach, which is disabled by default. The following patch will introduce a new option for each tools to enable the LBR stitch approach.
Signed-off-by: Kan Liang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pavel Gerasimov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vitaly Slobodskoy <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1 |
|
| #
fe87797d |
| 26-Nov-2019 |
Arnaldo Carvalho de Melo <[email protected]> |
perf thread: Rename thread->mg to thread->maps
One more step on the merge of 'struct maps' with 'struct map_groups'.
Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]>
perf thread: Rename thread->mg to thread->maps
One more step on the merge of 'struct maps' with 'struct map_groups'.
Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
79b6bb73 |
| 26-Nov-2019 |
Arnaldo Carvalho de Melo <[email protected]> |
perf maps: Merge 'struct maps' with 'struct map_groups'
And pick the shortest name: 'struct maps'.
The split existed because we used to have two groups of maps, one for functions and one for variab
perf maps: Merge 'struct maps' with 'struct map_groups'
And pick the shortest name: 'struct maps'.
The split existed because we used to have two groups of maps, one for functions and one for variables, but that only complicated things, sometimes we needed to figure out what was at some address and then had to first try it on the functions group and if that failed, fall back to the variables one.
That split is long gone, so for quite a while we had only one struct maps per struct map_groups, simplify things by combining those structs.
First patch is the minimum needed to merge both, follow up patches will rename 'thread->mg' to 'thread->maps', etc.
Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7 |
|
| #
69d81f09 |
| 26-Aug-2019 |
Arnaldo Carvalho de Melo <[email protected]> |
libperf: Rename the PERF_RECORD_ structs to have a "perf" suffix
Even more, to have a "perf_record_" prefix, so that they match the PERF_RECORD_ enum they map to.
Cc: Adrian Hunter <adrian.hunter@i
libperf: Rename the PERF_RECORD_ structs to have a "perf" suffix
Even more, to have a "perf_record_" prefix, so that they match the PERF_RECORD_ enum they map to.
Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v5.3-rc6, v5.3-rc5 |
|
| #
e8ba2906 |
| 15-Aug-2019 |
John Keeping <[email protected]> |
perf unwind: Fix libunwind when tid != pid
Commit e5adfc3e7e77 ("perf map: Synthesize maps only for thread group leader") changed the recording side so that we no longer get mmap events for threads
perf unwind: Fix libunwind when tid != pid
Commit e5adfc3e7e77 ("perf map: Synthesize maps only for thread group leader") changed the recording side so that we no longer get mmap events for threads other than the thread group leader (when synthesising these events for threads which exist before perf is started).
When a file recorded after this change is loaded, the lack of mmap records mean that unwinding is not set up for any other threads.
This can be seen in a simple record/report scenario:
perf record --call-graph=dwarf -t $TID perf report
If $TID is a process ID then the report will show call graphs, but if $TID is a secondary thread the output is as if --call-graph=none was specified.
Following the rationale in that commit, move the libunwind fields into struct map_groups and update the libunwind functions to take this instead of the struct thread. This is only required for unwind__finish_access which must now be called from map_groups__delete and the others are changed for symmetry.
Note that unwind__get_entries keeps the thread argument since it is required for symbol lookup and the libdw unwind provider uses the thread ID.
Signed-off-by: John Keeping <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: e5adfc3e7e77 ("perf map: Synthesize maps only for thread group leader") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3 |
|
| #
7cb10a08 |
| 27-May-2019 |
Namhyung Kim <[email protected]> |
perf tools: Remove const from thread read accessors
The namespaces and comm fields of a thread are protected by rwsem and require write access for it. So it ended up using a cast to remove the cons
perf tools: Remove const from thread read accessors
The namespaces and comm fields of a thread are protected by rwsem and require write access for it. So it ended up using a cast to remove the const qualifier. Let's get rid of the const then.
Signed-off-by: Namhyung Kim <[email protected]> Suggested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krister Johansen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1 |
|
| #
15325938 |
| 06-Mar-2019 |
Andi Kleen <[email protected]> |
perf thread: Generalize function to copy from thread addr space from intel-bts code
Add a utility function to fetch executable code. Convert one user over to it. There are more places doing that, bu
perf thread: Generalize function to copy from thread addr space from intel-bts code
Add a utility function to fetch executable code. Convert one user over to it. There are more places doing that, but they do significantly different actions, so they are not easy to fit into a single library function.
Committer changes:
. No need to cast around, make 'buf' be a void pointer.
. Rename it to thread__memcpy() to reflect the fact it is about copying a chunk of memory from a thread, i.e. from its address space.
. No need to have it in a separate object file, move it to thread.[ch]
. Check the return of map__load(), the original code didn't do it, but since we're moving this around, check that as well.
Signed-off-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4 |
|
| #
e22c1c75 |
| 27-Jan-2019 |
Arnaldo Carvalho de Melo <[email protected]> |
perf thread: Don't include symbol.h, symbol_conf.h is enough
Also add stdio.h to get the FILE definition.
Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung K
perf thread: Don't include symbol.h, symbol_conf.h is enough
Also add stdio.h to get the FILE definition.
Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
4fed0726 |
| 27-Jan-2019 |
Arnaldo Carvalho de Melo <[email protected]> |
perf srccode: Move struct definition from map.h to srccode.h
To reduce the header dependencies, since we already have a srccode.h header, then there is where the 'struct srccode_state' should be, an
perf srccode: Move struct definition from map.h to srccode.h
To reduce the header dependencies, since we already have a srccode.h header, then there is where the 'struct srccode_state' should be, and map.h, that is more widely used should have just a forward declaraion of 'struct srccode_state'.
Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
40f3b2d2 |
| 22-Jan-2019 |
Arnaldo Carvalho de Melo <[email protected]> |
perf namespaces: Remove namespaces.h from .h headers
There we need just forward declarations, so remove it and add it just on the .c files that actually touch the struct definitions.
Cc: Adrian Hun
perf namespaces: Remove namespaces.h from .h headers
There we need just forward declarations, so remove it and add it just on the .c files that actually touch the struct definitions.
Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|