|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
bbf006d6 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Add --code-with-type option.
This option is to show data type info in the regular (code) annotation. It tries to find data type for each (memory) instruction in the function. It'd be
perf annotate: Add --code-with-type option.
This option is to show data type info in the regular (code) annotation. It tries to find data type for each (memory) instruction in the function. It'd be useful to see function-level memory access pattern and also to debug the data type profiling result.
The output would be added at the end of the line and have "# data-type:" prefix.
For now, it only works with --stdio mode for simplicity. I can work on enabling it for TUI later.
$ perf annotate --stdio --code-with-type Percent | Source code & Disassembly of vmlinux for cpu/mem-loads/ppk (253 samples, percent: local period) --------------------------------------------------------------------------------------------------------------- : 0 0xffffffff81baa000 <check_preemption_disabled>: 0.00 : ffffffff81baa000: pushq %r12 # data-type: (stack operation) 0.00 : ffffffff81baa002: pushq %rbp # data-type: (stack operation) 0.00 : ffffffff81baa003: pushq %rbx # data-type: (stack operation) 0.00 : ffffffff81baa004: subq $0x8, %rsp 18.00 : ffffffff81baa008: movl %gs:0x7e48893d(%rip), %ebx # 0x3294c <pcpu_hot+0xc> # data-type: struct pcpu_hot +0xc (cpu_number) 12.58 : ffffffff81baa00f: movl %gs:0x7e488932(%rip), %eax # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa016: testl $0x7fffffff, %eax 0.00 : ffffffff81baa01b: je 0xffffffff81baa02c <check_preemption_disabled+0x2c> 0.00 : ffffffff81baa01d: addq $0x8, %rsp 0.00 : ffffffff81baa021: movl %ebx, %eax 14.19 : ffffffff81baa023: popq %rbx # data-type: (stack operation) 18.86 : ffffffff81baa024: popq %rbp # data-type: (stack operation) 12.10 : ffffffff81baa025: popq %r12 # data-type: (stack operation) 17.78 : ffffffff81baa027: jmp 0xffffffff81bc1170 <__x86_return_thunk> 6.49 : ffffffff81baa02c: callq *0xc9139e(%rip) # 0xffffffff8283b3d0 <pv_ops+0xf0> # data-type: (stack operation) 0.00 : ffffffff81baa032: testb $0x2, %ah 0.00 : ffffffff81baa035: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa037: movq %rdi, %rbp 0.00 : ffffffff81baa03a: movq %gs:0x32940, %rax # data-type: struct pcpu_hot +0 (current_task) 0.00 : ffffffff81baa043: testb $0x4, 0x2f(%rax) # data-type: struct task_struct +0x2f (flags) 0.00 : ffffffff81baa047: je 0xffffffff81baa052 <check_preemption_disabled+0x52> 0.00 : ffffffff81baa049: cmpl $0x1, 0x3d0(%rax) # data-type: struct task_struct +0x3d0 (nr_cpus_allowed) 0.00 : ffffffff81baa050: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa052: movq %gs:0x32940, %r12 # data-type: struct pcpu_hot +0 (current_task) 0.00 : ffffffff81baa05b: cmpw $0x0, 0x7f0(%r12) # data-type: struct task_struct +0x7f0 (migration_disabled) 0.00 : ffffffff81baa065: movq %rsi, (%rsp) 0.00 : ffffffff81baa069: jne 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa06b: movl 0xe8dd13(%rip), %eax # 0xffffffff82a37d84 <system_state> # data-type: enum system_states +0 0.00 : ffffffff81baa071: testl %eax, %eax 0.00 : ffffffff81baa073: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa075: incl %gs:0x7e4888cc(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa07c: movq $-0x7e14a100, %rdi 0.00 : ffffffff81baa083: callq 0xffffffff81148c40 <__printk_ratelimit> # data-type: (stack operation) 0.00 : ffffffff81baa088: testl %eax, %eax 0.00 : ffffffff81baa08a: je 0xffffffff81baa0d5 <check_preemption_disabled+0xd5> 0.00 : ffffffff81baa08c: movl 0x958(%r12), %r9d # data-type: struct task_struct +0x958 (pid) 0.00 : ffffffff81baa094: movq (%rsp), %rdx # data-type: char* +0 0.00 : ffffffff81baa098: movq %rbp, %rsi 0.00 : ffffffff81baa09b: leaq 0xb88(%r12), %r8 # data-type: struct task_struct +0xb88 (comm) 0.00 : ffffffff81baa0a3: movl %gs:0x7e48889e(%rip), %ecx # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa0aa: andl $0x7fffffff, %ecx 0.00 : ffffffff81baa0b0: movq $-0x7dd3cdf0, %rdi 0.00 : ffffffff81baa0b7: subl $0x1, %ecx 0.00 : ffffffff81baa0ba: callq 0xffffffff81149340 <_printk> # data-type: (stack operation) 0.00 : ffffffff81baa0bf: movq 0x20(%rsp), %rsi 0.00 : ffffffff81baa0c4: movq $-0x7ddb8c7e, %rdi 0.00 : ffffffff81baa0cb: callq 0xffffffff81149340 <_printk> # data-type: (stack operation) 0.00 : ffffffff81baa0d0: callq 0xffffffff81b7ab60 <dump_stack> # data-type: (stack operation) 0.00 : ffffffff81baa0d5: decl %gs:0x7e48886c(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa0dc: jmp 0xffffffff81baa01d <check_preemption_disabled+0x1d>
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
fe8da669 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Pass hist_entry to annotate functions
It's a prepartion to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_
perf annotate: Pass hist_entry to annotate functions
It's a prepartion to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_entry so it needs to be passed deeper.
Also rename a function with the same name in the builtin-annotate.c to hist_entry__stdio_annotate since it matches better to the command line option. And change the condition inside to be simpler.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2 |
|
| #
02b5ed8a |
| 06-Dec-2024 |
Ian Rogers <[email protected]> |
perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS
libperf exposes MAX_NR_CPUS via tools/lib/perf/include/internal/cpumap.h which is internal.
The preferred dependency should be the
perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS
libperf exposes MAX_NR_CPUS via tools/lib/perf/include/internal/cpumap.h which is internal.
The preferred dependency should be the definition in tools/perf/perf.h.
Add the includes of perf.h so that MAX_NR_CPUS can be hidden in libperf.
Reviewed-by: Leo Yan <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kyle Meyer <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4 |
|
| #
8838abf6 |
| 17-Oct-2024 |
Ian Rogers <[email protected]> |
perf build: Rename HAVE_DWARF_SUPPORT to HAVE_LIBDW_SUPPORT
In Makefile.config for unwinding the name dwarf implies either libunwind or libdw. Make it clearer that HAVE_DWARF_SUPPORT is really just
perf build: Rename HAVE_DWARF_SUPPORT to HAVE_LIBDW_SUPPORT
In Makefile.config for unwinding the name dwarf implies either libunwind or libdw. Make it clearer that HAVE_DWARF_SUPPORT is really just defined when libdw is present by renaming to HAVE_LIBDW_SUPPORT.
Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: Anup Patel <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: David S. Miller <[email protected]> Cc: Albert Ou <[email protected]> Cc: Shenlin Liang <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Guilherme Amadio <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Changbin Du <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Przemek Kitszel <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Guo Ren <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Will Deacon <[email protected]> Cc: James Clark <[email protected]> Cc: Mike Leach <[email protected]> Cc: Chen Pei <[email protected]> Cc: Leo Yan <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Aditya Gupta <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Bibo Mao <[email protected]> Cc: John Garry <[email protected]> Cc: Atish Patra <[email protected]> Cc: Dima Kogan <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Dr. David Alan Gilbert <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
5eb22425 |
| 17-Oct-2024 |
Ian Rogers <[email protected]> |
perf libdw: Remove unnecessary defines
As HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT always match HAVE_DWARF_SUPPORT remove the macros and use HAVE_DWARF_SUPPORT. If building the fil
perf libdw: Remove unnecessary defines
As HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT always match HAVE_DWARF_SUPPORT remove the macros and use HAVE_DWARF_SUPPORT. If building the file is guarded by CONFIG_DWARF then remove all ifs.
Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Anup Patel <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: David S. Miller <[email protected]> Cc: Albert Ou <[email protected]> Cc: Shenlin Liang <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Guilherme Amadio <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Changbin Du <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Przemek Kitszel <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Guo Ren <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Will Deacon <[email protected]> Cc: James Clark <[email protected]> Cc: Mike Leach <[email protected]> Cc: Chen Pei <[email protected]> Cc: Leo Yan <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Aditya Gupta <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Bibo Mao <[email protected]> Cc: John Garry <[email protected]> Cc: Atish Patra <[email protected]> Cc: Dima Kogan <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Dr. David Alan Gilbert <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
e6952dce |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }', 4000 Hz, Event count (approx.): f3 /home/sdp/test/tchain_edit [Percent: local period] Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%) │ 0000000000401755 <f3>: 0.00 0.00 │ endbr64 │ push %rbp │ mov %rsp,%rbp │ movl $0x0,-0x4(%rbp) 0.00 0.00 │1.33 3 |A |- | ↓ jmp 25 11.03 11.03 │ 11: mov -0x4(%rbp),%eax │ and $0x1,%eax │ test %eax,%eax 17.13 17.13 │2.41 1 |A |- | ↓ je 21 │ addl $0x1,-0x4(%rbp) 21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp) 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp) 11.03 11.03 │0.61 3 |A |- | ↑ jle 11 │ nop │ pop %rbp 0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
1f2b7fbb |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available f
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters are automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64 space.
Because the saturation of a branch counter is small, e.g., for Intel Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter once saturated. That can be used to indicate a potential event lost because of the saturation.
Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
d48940ca |
| 12-Aug-2024 |
Ian Rogers <[email protected]> |
perf annotate: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@goo
perf annotate: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
30f29bae |
| 12-Aug-2024 |
Ian Rogers <[email protected]> |
perf tool: Constify tool pointers
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be co
perf tool: Constify tool pointers
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be const to lower the possibilities of what could happen with a tool.
Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Adrian Hunter <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc3 |
|
| #
336989d0 |
| 07-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Fix --group behavior when leader has no samples
When --group option is used, it should display all events together. But the current logic only checks if the first (leader) event has
perf annotate: Fix --group behavior when leader has no samples
When --group option is used, it should display all events together. But the current logic only checks if the first (leader) event has samples or not. Let's check the member events as well.
Also it missed to put the linked samples from member evsels to the output RB-tree so that it can be displayed in the output.
For example, take a look at this example.
$ ./perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u
It has three events but 'path_put' function has samples only for mem-stores (second) event.
$ sudo ./perf annotate --stdio -f path_put Percent | Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period) ---------------------------------------------------------------------------------------------------------- : 0 0xffffffffae600020 <path_put>: 0.00 : ffffffffae600020: endbr64 0.00 : ffffffffae600024: nopl (%rax, %rax) 91.22 : ffffffffae600029: pushq %rbx 0.00 : ffffffffae60002a: movq %rdi, %rbx 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi 8.78 : ffffffffae600031: callq 0xffffffffae614aa0 0.00 : ffffffffae600036: movq (%rbx), %rdi 0.00 : ffffffffae600039: popq %rbx 0.00 : ffffffffae60003a: jmp 0xffffffffae620670 0.00 : ffffffffae60003f: nop
Therefore, it didn't show up when --group option is used since the leader ("mem-loads") event has no samples. But now it checks both events.
Before: $ sudo ./perf annotate --stdio -f --group path_put (no output)
After: $ sudo ./perf annotate --stdio -f --group path_put Percent | Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period) ------------------------------------------------------------------------------------------------------------------------------------------------------------- : 0 0xffffffffae600020 <path_put>: 0.00 0.00 0.00 : ffffffffae600020: endbr64 0.00 0.00 0.00 : ffffffffae600024: nopl (%rax, %rax) 0.00 91.22 0.00 : ffffffffae600029: pushq %rbx 0.00 0.00 0.00 : ffffffffae60002a: movq %rdi, %rbx 0.00 0.00 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi 0.00 8.78 0.00 : ffffffffae600031: callq 0xffffffffae614aa0 0.00 0.00 0.00 : ffffffffae600036: movq (%rbx), %rdi 0.00 0.00 0.00 : ffffffffae600039: popq %rbx 0.00 0.00 0.00 : ffffffffae60003a: jmp 0xffffffffae620670 0.00 0.00 0.00 : ffffffffae60003f: nop
Committer testing:
Before:
root@number:~# perf annotate --group --stdio2 clear_page_erms root@number:~#
After:
root@number:~# perf annotate --group --stdio2 clear_page_erms Samples: 125 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period] clear_page_erms() /proc/kcore Percent 0xffffffff990c6cc0 <clear_page_erms>: endbr64 movl $0x1000,%ecx xorl %eax,%eax 0.00 100.00 0.00 rep stosb %al, (%rdi) ← retq int3 int3 int3 int3 nop nop root@number:~#
Reported-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2 |
|
| #
ce533c9b |
| 03-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Add --skip-empty option
Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report.
For example, the
perf annotate: Add --skip-empty option
Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report.
For example, the following command would use 3 events including dummy.
$ perf mem record -a -- perf test -w noploop
$ perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u
Just using perf annotate with --group will show the all 3 events.
$ perf annotate --group --stdio | head Percent | Source code & Disassembly of ... -------------------------------------------------------------- : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 0.00 : e060: pushq %rbp 0.00 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 0.00 : e064: pushq %r15 0.00 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 0.00 : e069: pushq %r14 0.00 0.00 0.00 : e06b: pushq %r13 0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head Percent | Source code & Disassembly of ... ------------------------------------------------------ : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 : e060: pushq %rbp 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 : e064: pushq %r15 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 : e069: pushq %r14 0.00 0.00 : e06b: pushq %r13 0.00 0.00 : e06d: movl %edx, %r13d
Committer testing:
root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~#
Before:
root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 0.00 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~#
After:
root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~#
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1 |
|
| #
2c9db747 |
| 18-Jul-2024 |
Athira Rajeev <[email protected]> |
perf annotate: Set instruction name to be used with insn-stat when using raw instruction
Since the "ins.name" is not set while using raw instruction, 'perf annotate' with insn-stat gives wrong data:
perf annotate: Set instruction name to be used with insn-stat when using raw instruction
Since the "ins.name" is not set while using raw instruction, 'perf annotate' with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats total 615, ok 419 (68.1%), bad 196 (31.9%)
Name : Good Bad ----------------------------------------------------------- : 419 196
This patch sets "dl->ins.name" in arch specific function "check_ppc_insn" while initialising "struct disasm_line".
Also update "ins_find" function to pass "struct disasm_line" as a parameter so as to set its name field in arch specific call.
With the patch changes:
Annotate Instruction stats total 609, ok 446 (73.2%), bad 163 (26.8%)
Name/opcode : Good Bad ----------------------------------------------------------- 58 : 323 80 32 : 49 43 34 : 33 11 OP_31_XOP_LDX : 8 20 40 : 23 0 OP_31_XOP_LWARX : 5 1 OP_31_XOP_LWZX : 2 3 OP_31_XOP_LDARX : 3 0 33 : 0 2 OP_31_XOP_LBZX : 0 1 OP_31_XOP_LWAX : 0 1 OP_31_XOP_LHZX : 0 1
Reviewed-by: Kajol Jain <[email protected]> Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Athira Rajeev <[email protected]> Tested-by: Kajol Jain <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Akanksha J N <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Disha Goel <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Segher Boessenkool <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
411ee135 |
| 07-Jun-2024 |
Namhyung Kim <[email protected]> |
perf hist: Add symbol_conf.skip_empty
Add the skip_empty flag to symbol_conf and set the value from the report command to preserve the existing behavior. This makes the code simpler and will be nee
perf hist: Add symbol_conf.skip_empty
Add the skip_empty flag to symbol_conf and set the value from the report command to preserve the existing behavior. This makes the code simpler and will be needed other code which is hard to add a new argument.
Tested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
ee756ef7 |
| 04-May-2024 |
Ian Rogers <[email protected]> |
perf dso: Add reference count checking and accessor functions
Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCES
perf dso: Add reference count checking and accessor functions
Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCESS everywhere, add accessor functions for the variables in struct dso.
The majority of the change is mechanical in nature and not easy to split up.
Committer testing:
'perf test' up to this patch shows no regressions.
But:
util/symbol.c: In function ‘dso__load_bfd_symbols’: util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’ 1683 | dso__set_adjust_symbols(dso); | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from util/symbol.c:21: util/dso.h:268:20: note: declared here 268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val) | ^~~~~~~~~~~~~~~~~~~~~~~ make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1 MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/ make[6]: *** Waiting for unfinished jobs....
This was updated:
- symbols__fixup_end(&dso->symbols, false); - symbols__fixup_duplicate(&dso->symbols); - dso->adjust_symbols = 1; + symbols__fixup_end(dso__symbols(dso), false); + symbols__fixup_duplicate(dso__symbols(dso)); + dso__set_adjust_symbols(dso);
But not build tested with BUILD_NONDISTRO and libbfd devel files installed (binutils-devel on fedora).
Add the missing argument:
symbols__fixup_end(dso__symbols(dso), false); symbols__fixup_duplicate(dso__symbols(dso)); - dso__set_adjust_symbols(dso); + dso__set_adjust_symbols(dso, true);
Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahelenia Ziemiańska <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Changbin Du <[email protected]> Cc: Chengen Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dima Kogan <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Tiezhu Yang <[email protected]> Cc: Yanteng Si <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6 |
|
| #
2b87383c |
| 23-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Fix data type profiling on stdio
The loop in hists__find_annotations() never set the 'nd' pointer to NULL and it makes stdio output repeating the last element forever. I think it doe
perf annotate: Fix data type profiling on stdio
The loop in hists__find_annotations() never set the 'nd' pointer to NULL and it makes stdio output repeating the last element forever. I think it doesn't set to NULL for TUI to prevent it from exiting unexpectedly. But it should just set on stdio mode.
Fixes: d001c7a7f4736743 ("perf annotate-data: Add hist_entry__annotate_data_tui()") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc5, v6.9-rc4 |
|
| #
d001c7a7 |
| 11-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate-data: Add hist_entry__annotate_data_tui()
Support data type profiling output on TUI.
Testing from Arnaldo:
First make sure that the debug information for your workload binaries in em
perf annotate-data: Add hist_entry__annotate_data_tui()
Support data type profiling output on TUI.
Testing from Arnaldo:
First make sure that the debug information for your workload binaries in embedded in them by building it with '-g' or install the debuginfo packages, since our workload is 'find':
root@number:~# type find find is hashed (/usr/bin/find) root@number:~# rpm -qf /usr/bin/find findutils-4.9.0-5.fc39.x86_64 root@number:~# dnf debuginfo-install findutils <SNIP> root@number:~#
Then collect some data:
root@number:~# echo 1 > /proc/sys/vm/drop_caches root@number:~# perf mem record find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ] root@number:~#
Finally do data-type annotation with the following command, that will default, as 'perf report' to the --tui mode, with lines colored to highlight the hotspots, etc.
root@number:~# perf annotate --data-type Annotate type: 'struct predicate' (58 samples) Percent Offset Size Field 100.00 0 312 struct predicate { 0.00 0 8 PRED_FUNC pred_func; 0.00 8 8 char* p_name; 0.00 16 4 enum predicate_type p_type; 0.00 20 4 enum predicate_precedence p_prec; 0.00 24 1 _Bool side_effects; 0.00 25 1 _Bool no_default_print; 0.00 26 1 _Bool need_stat; 0.00 27 1 _Bool need_type; 0.00 28 1 _Bool need_inum; 0.00 32 4 enum EvaluationCost p_cost; 0.00 36 4 float est_success_rate; 0.00 40 1 _Bool literal_control_chars; 0.00 41 1 _Bool artificial; 0.00 48 8 char* arg_text; <SNIP>
Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
9b561be1 |
| 11-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate-data: Add hist_entry__annotate_data_tty()
And move the related code into util/annotate-data.c file.
Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <namhyung@
perf annotate-data: Add hist_entry__annotate_data_tty()
And move the related code into util/annotate-data.c file.
Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
d9aedc12 |
| 11-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Show progress of sample processing
Like 'perf report', it can take a while to process samples.
Show a progress window to inform users how that it is not stuck.
Reviewed-by: Ian Roge
perf annotate: Show progress of sample processing
Like 'perf report', it can take a while to process samples.
Show a progress window to inform users how that it is not stuck.
Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
bdeaf6ff |
| 22-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Honor output options with --data-type
For data type profiling output, it should be in sync with normal output so make it display percentage for each field. Also use coloring scheme f
perf annotate: Honor output options with --data-type
For data type profiling output, it should be in sync with normal output so make it display percentage for each field. Also use coloring scheme for users to identify fields with big overhead easily.
Users can use --show-total-period or --show-nr-samples to change the output style like in the normal perf annotate output.
Before:
$ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ samples offset size field 34 0 9792 struct task_struct { 2 0 24 struct thread_info thread_info { 0 0 8 long unsigned int flags; 1 8 8 long unsigned int syscall_work; 0 16 4 u32 status; 1 20 4 u32 cpu; };
After:
$ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ Percent offset size field 100.00 0 9792 struct task_struct { 3.55 0 24 struct thread_info thread_info { 0.00 0 8 long unsigned int flags; 1.63 8 8 long unsigned int syscall_work; 0.00 16 4 u32 status; 1.91 20 4 u32 cpu; };
Committer testing:
First collect a suitable perf.data file for use with 'perf annotate --data-type':
root@number:~# perf mem record -a sleep 1s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ] root@number:~#
Then, before:
root@number:~# perf annotate --data-type Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ samples offset size field 6 0 40 union { 6 0 40 struct __pthread_mutex_s __data { 2 0 4 int __lock; 0 4 4 unsigned int __count; 0 8 4 int __owner; 1 12 4 unsigned int __nusers; 2 16 4 int __kind; 1 20 2 short int __spins; 0 22 2 short int __elision; 0 24 16 __pthread_list_t __list { 0 24 8 struct __pthread_internal_list* __prev; 0 32 8 struct __pthread_internal_list* __next; }; }; 0 0 0 char* __size; 2 0 8 long int __align; }; <SNIP>
And after:
Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ Percent offset size field 100.00 0 40 union { 100.00 0 40 struct __pthread_mutex_s __data { 31.27 0 4 int __lock; 0.00 4 4 unsigned int __count; 0.00 8 4 int __owner; 7.67 12 4 unsigned int __nusers; 53.10 16 4 int __kind; 7.96 20 2 short int __spins; 0.00 22 2 short int __elision; 0.00 24 16 __pthread_list_t __list { 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0 0 char* __size; 31.27 0 8 long int __align; }; <SNIP>
The lines with percentages >= 7.67 have its percentages red colored.
Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
374af9f1 |
| 22-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Get rid of duplicate --group option item
The options array in cmd_annotate() has duplicate --group options. It only needs one and let's get rid of the other.
$ perf annotate -h 2>
perf annotate: Get rid of duplicate --group option item
The options array in cmd_annotate() has duplicate --group options. It only needs one and let's get rid of the other.
$ perf annotate -h 2>&1 | grep group --group Show event group information together --group Show event group information together
Fixes: 7ebaf4890f63eb90 ("perf annotate: Support '--group' option") Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
eb8a55e0 |
| 19-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate-data: Implement instruction tracking
If it failed to find a variable for the location directly, it might be due to a missing variable in the source code. For example, accessing pointe
perf annotate-data: Implement instruction tracking
If it failed to find a variable for the location directly, it might be due to a missing variable in the source code. For example, accessing pointer variables in a chain can result in the case like below:
struct foo *foo = ...;
int i = foo->bar->baz;
The DWARF debug information is created for each variable so it'd have one for 'foo'. But there's no variable for 'foo->bar' and then it cannot know the type of 'bar' and 'baz'.
The above source code can be compiled to the follow x86 instructions:
mov 0x8(%rax), %rcx mov 0x4(%rcx), %rdx <=== PMU sample mov %rdx, -4(%rbp)
Let's say 'foo' is located in the %rax and it has a pointer to struct foo. But perf sample is captured in the second instruction and there is no variable or type info for the %rcx.
It'd be great if compiler could generate debug info for %rcx, but we should handle it on our side. So this patch implements the logic to iterate instructions and update the type table for each location.
As it already collected a list of scopes including the target instruction, we can use it to construct the type table smartly.
+---------------- scope[0] subprogram | | +-------------- scope[1] lexical_block | | | | +------------ scope[2] inlined_subroutine | | | | | | +---------- scope[3] inlined_subroutine | | | | | | | | +-------- scope[4] lexical_block | | | | | | | | | | *** target instruction ...
Image the target instruction has 5 scopes, each scope will have its own variables and parameters. Then it can start with the innermost scope (4). So it'd search the shortest path from the start of scope[4] to the target address and build a list of basic blocks. Then it iterates the basic blocks with the variables in the scope and update the table. If it finds a type at the target instruction, then returns it.
Otherwise, it moves to the upper scope[3]. Now it'd search the shortest path from the start of scope[3] to the start of scope[4]. Then connect it to the existing basic block list. Then it'd iterate the blocks with variables for both scopes. It can repeat this until it finds a type at the target instruction or reaches to the top scope[0].
As the basic blocks contain the shortest path, it won't worry about branches and can update the table simply.
The final check will be done by find_matching_type() in the next patch.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
58824fa0 |
| 13-Dec-2023 |
Namhyung Kim <[email protected]> |
perf annotate: Add --insn-stat option for debugging
This is for a debugging purpose. It'd be useful to see per-instrucion level success/failure stats.
$ perf annotate --data-type --insn-stat A
perf annotate: Add --insn-stat option for debugging
This is for a debugging purpose. It'd be useful to see per-instrucion level success/failure stats.
$ perf annotate --data-type --insn-stat Annotate Instruction stats total 264, ok 143 (54.2%), bad 121 (45.8%)
Name : Good Bad ----------------------------------------------------------- movq : 45 31 movl : 22 11 popq : 0 19 cmpl : 16 3 addq : 8 7 cmpq : 11 3 cmpxchgl : 3 7 cmpxchgq : 8 0 incl : 3 3 movzbl : 4 2 incq : 4 2 decl : 6 0 ...
Committer notes:
So these are about being able to find the type for accesses from these instructions, we should improve the naming, but it is for debugging, we can improve this later:
@@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he) continue;
mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset); + if (mem_type) + istat->good++; + else + istat->bad++;
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
61a9741e |
| 13-Dec-2023 |
Namhyung Kim <[email protected]> |
perf annotate: Add --type-stat option for debugging
The --type-stat option is to be used with --data-type and to print detailed failure reasons for the data type annotation.
$ perf annotate --dat
perf annotate: Add --type-stat option for debugging
The --type-stat option is to be used with --data-type and to print detailed failure reasons for the data type annotation.
$ perf annotate --data-type --type-stat Annotate data type stats: total 294, ok 116 (39.5%), bad 178 (60.5%) ----------------------------------------------------------- 30 : no_sym 40 : no_insn_ops 33 : no_mem_ops 63 : no_var 4 : no_typeinfo 8 : bad_offset
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
227ad323 |
| 13-Dec-2023 |
Namhyung Kim <[email protected]> |
perf annotate: Support event group display
When events are grouped together, it'd be natural to show them at once like in other mode. Handle group leaders with members to collect the number of samp
perf annotate: Support event group display
When events are grouped together, it'd be natural to show them at once like in other mode. Handle group leaders with members to collect the number of samples together and display like below:
$ perf annotate --data-type --group ... Annotate type: 'struct page' in vmlinux (1 samples): event[0] = cpu/mem-loads,ldlat=30/P event[1] = cpu/mem-stores/P event[2] = dummy:u ============================================================================ samples offset size field 1 0 0 0 64 struct page { 0 0 0 0 8 long unsigned int flags; 0 0 0 8 40 union { 0 0 0 8 40 struct { 0 0 0 8 16 union { 0 0 0 8 16 struct list_head lru { 0 0 0 8 8 struct list_head* next; 0 0 0 16 8 struct list_head* prev; }; 0 0 0 8 16 struct { 0 0 0 8 8 void* __filler; 0 0 0 16 4 unsigned int mlock_count; }; 0 0 0 8 16 struct list_head buddy_list { 0 0 0 8 8 struct list_head* next; 0 0 0 16 8 struct list_head* prev; };
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
263925bf |
| 13-Dec-2023 |
Namhyung Kim <[email protected]> |
perf annotate: Add --data-type option
Support data type annotation with new --data-type option. It internally uses type sort key to collect sample histogram for the type and display every members l
perf annotate: Add --data-type option
Support data type annotation with new --data-type option. It internally uses type sort key to collect sample histogram for the type and display every members like below.
$ perf annotate --data-type ... Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples): ============================================================================ samples offset size field 13 0 640 struct cfs_rq { 2 0 16 struct load_weight load { 2 0 8 unsigned long weight; 0 8 4 u32 inv_weight; }; 0 16 8 unsigned long runnable_weight; 0 24 4 unsigned int nr_running; 1 28 4 unsigned int h_nr_running; ...
For simplicity it prints the number of samples per field for now. But it should be easy to show the overhead percentage instead.
The number at the outer struct is a sum of the numbers of the inner members. For example, struct cfs_rq got total 13 samples, and 2 came from the load (struct load_weight) and 1 from h_nr_running. Similarly, the struct load_weight got total 2 samples and they all came from the weight field.
I've added two new flags in the symbol_conf for this. The annotate_data_member is to get the members of the type. This is also needed for perf report with typeoff sort key. The annotate_data_sample is to update sample stats for each offset and used only in annotate.
Currently it only support stdio output mode, TUI support can be added later.
Committer testing:
With the perf.data from the previous csets, a very simple, short duration one:
# perf annotate --data-type Annotate type: 'struct list_head' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 16 struct list_head { 0 0 8 struct list_head* next; 1 8 8 struct list_head* prev; };
Annotate type: 'char' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 1 char ;
#
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|