|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
30c5a394 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Implement code + data type annotation
Sometimes it's useful to see both instructions and their data type together. Let's extend the annotate code to use data type profiling functions
perf annotate: Implement code + data type annotation
Sometimes it's useful to see both instructions and their data type together. Let's extend the annotate code to use data type profiling functions.
To make it easy to pass more argument, introduce a struct to carry necessary information together. Also add a new annotation_option called 'code_with_type' to control the behavior. This is not enabled yet but it'll be set later from the command line.
For simplicity, this is implemented for --stdio only.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
236ee256 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Factor out __hist_entry__get_data_type()
So that it can only handle a single disasm_linme and hopefully make the code simpler. This is also a preparation to be called from different
perf annotate: Factor out __hist_entry__get_data_type()
So that it can only handle a single disasm_linme and hopefully make the code simpler. This is also a preparation to be called from different places later.
The NO_TYPE macro was added to distinguish when it failed or needs retry.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
fe8da669 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Pass hist_entry to annotate functions
It's a prepartion to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_
perf annotate: Pass hist_entry to annotate functions
It's a prepartion to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_entry so it needs to be passed deeper.
Also rename a function with the same name in the builtin-annotate.c to hist_entry__stdio_annotate since it matches better to the command line option. And change the condition inside to be simpler.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
9aa3cbbf |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Pass annotation_options to annotation_line__print()
The annotation_line__print() has many arguments. But min_percent, max_lines and percent_type are from struct annotaion_options. S
perf annotate: Pass annotation_options to annotation_line__print()
The annotation_line__print() has many arguments. But min_percent, max_lines and percent_type are from struct annotaion_options. So let's pass a pointer to the option instead of passing them separately to reduce the number of function arguments.
Actually it has a recursive call if 'queue' is set. Add a new option instance to pass different values for the case.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
1f284082 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Remove unused len parameter from annotation_line__print()
It's not used anywhere, let's get rid of it.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/20
perf annotate: Remove unused len parameter from annotation_line__print()
It's not used anywhere, let's get rid of it.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
bde4ccfd |
| 24-Jan-2025 |
Ian Rogers <[email protected]> |
perf annotate: Use an array for the disassembler preference
Prior to this change a string was used which could cause issues with an unrecognized disassembler in symbol__disassembler. Change to initi
perf annotate: Use an array for the disassembler preference
Prior to this change a string was used which could cause issues with an unrecognized disassembler in symbol__disassembler. Change to initializing an array of perf_disassembler enum values. If a value already exists then adding it a second time is ignored to avoid array out of bounds problems present in the previous code, it also allows a statically sized array and removes memory allocation needs. Errors in the disassembler string are reported when the config is parsed during perf annotate or perf top start up. If the array is uninitialized after processing the config file the default llvm, capstone then objdump values are added but without a need to parse a string.
Fixes: a6e8a58de629 ("perf disasm: Allow configuring what disassemblers to use") Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/ Signed-off-by: Ian Rogers <[email protected]> Tested-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.13 |
|
| #
035f0c27 |
| 17-Jan-2025 |
Ian Rogers <[email protected]> |
perf annotate: Prefer passing evsel to evsel->core.idx
An evsel idx may not be stable due to sorting, evlist removal, etc. Try to reduce it being part of APIs by explicitly passing the evsel in anno
perf annotate: Prefer passing evsel to evsel->core.idx
An evsel idx may not be stable due to sorting, evlist removal, etc. Try to reduce it being part of APIs by explicitly passing the evsel in annotate code. Internally the code just reads evsel->core.idx so behavior is unchanged.
Signed-off-by: Ian Rogers <[email protected]> Cc: Chen Ni <[email protected]> Cc: Athira Rajeev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12 |
|
| #
a6e8a58d |
| 11-Nov-2024 |
Arnaldo Carvalho de Melo <[email protected]> |
perf disasm: Allow configuring what disassemblers to use
The perf tools annotation code used for a long time parsing the output of binutils's objdump (or its reimplementations, like llvm's) to then
perf disasm: Allow configuring what disassemblers to use
The perf tools annotation code used for a long time parsing the output of binutils's objdump (or its reimplementations, like llvm's) to then parse and augment it with samples, allow navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not parsing the output of tools using those libraries to mimic binutils's objdump output) were introduced.
So when all those methods are available, there is a static preference for a series of attempts of disassembling a binary, with the 'llvm, capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf config' 'annotate.disassemblers' entry which and in what order disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this flexibility is useful for users and developers alike, elliminating the requirement to rebuild the tool with some specific set of libraries to see how the output of disassembling would be for one of these methods.
root@x1:~# rm -f ~/.perfconfig root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> symbol__disassemble: filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, sym=update_load_avg, start=0xffffffffb6148fe0, en> annotating [0x6ff7170] /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : [0x7407ca0] update_load_avg Disassembled with llvm annotate.disassemblers=llvm,capstone,objdump Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = capstone root@x1:~# root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> Disassembled with capstone annotate.disassemblers=capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=objdump,capstone root@x1:~# perf config annotate.disassemblers annotate.disassemblers=objdump,capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump,capstone root@x1:~# perf annotate -v --stdio2 update_load_avg Executing: objdump --start-address=0xffffffff81148fe0 \ --stop-address=0xffffffff811497aa \ -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump,capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 #define DO_DETACH 0x8
/* Update task and its cfs_rq load average */ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) { 1.61 push %r15 push %r14 1.00 push %r13 mov %edx,%r13d 1.90 push %r12 push %rbp mov %rsi,%rbp push %rbx mov %rdi,%rbx sub $0x18,%rsp }
/* rq->task_clock normalized against any time this cfs_rq has spent throttled */ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { if (unlikely(cfs_rq->throttle_count)) 15.14 mov 0x1a4(%rdi),%eax root@x1:~#
After adding a way to select the disassembler from the command line a 'perf test' comparing the output of the various diassemblers should be introduced, to test these codebases.
Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc7 |
|
| #
9fc4489a |
| 08-Nov-2024 |
Ian Rogers <[email protected]> |
perf dwarf-regs: Pass accurate disassembly machine to get_dwarf_regnum
Rather than pass 0/EM_NONE, use the value computed in the disasm struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_
perf dwarf-regs: Pass accurate disassembly machine to get_dwarf_regnum
Rather than pass 0/EM_NONE, use the value computed in the disasm struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_NONE if it were passed to get_dwarf_regnum. Pass a flags value as architectures like csky need the flags to determine the ABI variant.
Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Anup Patel <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: David S. Miller <[email protected]> Cc: Albert Ou <[email protected]> Cc: Shenlin Liang <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Guilherme Amadio <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Changbin Du <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Przemek Kitszel <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Guo Ren <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Will Deacon <[email protected]> Cc: James Clark <[email protected]> Cc: Mike Leach <[email protected]> Cc: Chen Pei <[email protected]> Cc: Leo Yan <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Aditya Gupta <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Bibo Mao <[email protected]> Cc: John Garry <[email protected]> Cc: Atish Patra <[email protected]> Cc: Dima Kogan <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Dr. David Alan Gilbert <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
edf3ce0e |
| 09-Sep-2024 |
Kan Liang <[email protected]> |
perf env: Find correct branch counter info on hybrid
No event is printed in the "Branch Counter" column on hybrid machines.
For example,
$ perf record -e "{cpu_core/branch-instructions/pp,cpu_co
perf env: Find correct branch counter info on hybrid
No event is printed in the "Branch Counter" column on hybrid machines.
For example,
$ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter $ perf report --total-cycles
# Branch counter abbr list: # cpu_core/branch-instructions/pp = A # cpu_core/branches/ = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter # ............... .............. ........... .......... .............. 44.54% 727.1K 0.00% 1 |+ |+ | 36.31% 592.7K 0.00% 2 |+ |+ | 17.83% 291.1K 0.00% 1 |+ |+ |
The branch counter information (br_cntr_width and br_cntr_nr) in the perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS is not available on hybrid machines. Without the width information, the number of occurrences of an event cannot be calculated.
For a hybrid machine, the caps information should be retrieved from the PMU_CAPS, and stored in the perf_env->pmu_caps.
Add a perf_env__find_br_cntr_info() to return the correct branch counter information from the corresponding fields.
Committer notes:
While testing I couldn't s ee those "Branch counter" columns enabled by pressing 'B' on the TUI, after reporting it to the list Kan explained the situation:
<quote Kan Liang> For a hybrid client, the "Branch Counter" feature is only supported starting from the just released Lunar Lake. Perf falls back to only "ANY" on your Raptor Lake.
The "The branch counter is not available" message is expected.
Here is the 'perf evlist' result from my Lunar Lake machine,
# perf evlist -v cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS # </quote>
Fixes: 6f9d8d1de2c61288 ("perf script: Add branch counters") Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
c8b93587 |
| 09-Sep-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Treat 'call' instruction as stack operation
I found some portion of mem-store events sampled on CALL instruction which has no memory access. But it actually saves a return address in
perf annotate: Treat 'call' instruction as stack operation
I found some portion of mem-store events sampled on CALL instruction which has no memory access. But it actually saves a return address into stack. It should be considered as a stack operation like RET instruction.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
e6952dce |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }', 4000 Hz, Event count (approx.): f3 /home/sdp/test/tchain_edit [Percent: local period] Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%) │ 0000000000401755 <f3>: 0.00 0.00 │ endbr64 │ push %rbp │ mov %rsp,%rbp │ movl $0x0,-0x4(%rbp) 0.00 0.00 │1.33 3 |A |- | ↓ jmp 25 11.03 11.03 │ 11: mov -0x4(%rbp),%eax │ and $0x1,%eax │ test %eax,%eax 17.13 17.13 │2.41 1 |A |- | ↓ je 21 │ addl $0x1,-0x4(%rbp) 21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp) 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp) 11.03 11.03 │0.61 3 |A |- | ↑ jle 11 │ nop │ pop %rbp 0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
20d6f555 |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf report: Display the branch counter histogram
Reusing the existing --total-cycles option to display the branch counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display the logged bra
perf report: Display the branch counter histogram
Reusing the existing --total-cycles option to display the branch counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display the logged branch counter events. They are shown right after all the cycle-related annotations.
Extend the 'struct block_info' to store and pass the branch counter related information.
The annotation_br_cntr_entry() is to print the histogram of each branch counter event. If the number of logged events is less than 4, the exact number of the abbr name is printed. Otherwise, using '+' to stands for more than 3 events.
Assume the number of logged events is less than 4.
The annotation_br_cntr_abbr_list() prints the branch counter's abbreviation list. Press 'B' to display the list in the TUI mode.
$ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter $ perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }' # Event count (approx.): 1610046 # # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 |A |- | ... 25.27% 1.1M 0.00% 2 |AA |- | ... 15.61% 667.2K 0.00% 1 |A |- | ... 0.16% 6.9K 0.81% 575 |A |- | ... 0.16% 6.8K 1.38% 977 |AA |- | ... 0.16% 6.8K 0.04% 28 |AA |B | ... 0.15% 6.6K 1.33% 946 |A |- | ... 0.11% 4.5K 0.06% 46 |AAA+|- | ... 0.10% 4.4K 0.88% 624 |A |- | ... 0.09% 3.7K 0.74% 524 |AAA+|B | ...
With -v applied,
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 A=1 ,B=- ... 25.27% 1.1M 0.00% 2 A=2 ,B=- ... 15.61% 667.2K 0.00% 1 A=1 ,B=- ... 0.16% 6.9K 0.81% 575 A=1 ,B=- ... 0.16% 6.8K 1.38% 977 A=2 ,B=- ... 0.16% 6.8K 0.04% 28 A=2 ,B=1 ... 0.15% 6.6K 1.33% 946 A=1 ,B=- ... 0.11% 4.5K 0.06% 46 A=3+,B=- ... 0.10% 4.4K 0.88% 624 A=1 ,B=- ... 0.09% 3.7K 0.74% 524 A=3+,B=1 ...
Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
1f2b7fbb |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available f
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters are automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64 space.
Because the saturation of a branch counter is small, e.g., for Intel Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter once saturated. That can be used to indicate a potential event lost because of the saturation.
Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc3 |
|
| #
037f1b67 |
| 05-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Cache debuginfo for data type profiling
In find_data_type(), it creates and deletes a debug info whenver it tries to find data type for a sample. This is inefficient and it most like
perf annotate: Cache debuginfo for data type profiling
In find_data_type(), it creates and deletes a debug info whenver it tries to find data type for a sample. This is inefficient and it most likely accesses the same binary again and again.
Let's add a single entry cache the debug info structure for the last DSO. Depending on sample data, it usually gives me 2~3x (and sometimes more) speed ups.
Note that this will introduce a little difference in the output due to the order of checking stack operations. It used to check the stack ops before checking the availability of debug info but I moved it after the symbol check. So it'll report stack operations in DSOs without debug info as unknown. But I think it's ok and better to have the checking near the caching logic.
Committer testing:
root@x1:~# perf mem record -a sleep 5s root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# diff -u before after --- before 2024-08-08 09:33:53.880780784 -0300 +++ after 2024-08-08 09:35:13.917325041 -0300 @@ -81,8 +81,8 @@ # Overhead Data Type # ........ ......... # - 55.43% (unknown) - 11.61% (stack operation) + 55.56% (unknown) + 11.48% (stack operation) 4.93% struct pcpu_hot 3.26% unsigned int 2.48% struct
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2 |
|
| #
ce533c9b |
| 03-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Add --skip-empty option
Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report.
For example, the
perf annotate: Add --skip-empty option
Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report.
For example, the following command would use 3 events including dummy.
$ perf mem record -a -- perf test -w noploop
$ perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u
Just using perf annotate with --group will show the all 3 events.
$ perf annotate --group --stdio | head Percent | Source code & Disassembly of ... -------------------------------------------------------------- : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 0.00 : e060: pushq %rbp 0.00 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 0.00 : e064: pushq %r15 0.00 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 0.00 : e069: pushq %r14 0.00 0.00 0.00 : e06b: pushq %r13 0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head Percent | Source code & Disassembly of ... ------------------------------------------------------ : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 : e060: pushq %rbp 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 : e064: pushq %r15 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 : e069: pushq %r14 0.00 0.00 : e06b: pushq %r13 0.00 0.00 : e06d: movl %edx, %r13d
Committer testing:
root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~#
Before:
root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 0.00 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~#
After:
root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~#
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
b00e4d0d |
| 03-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Use annotation__pcnt_width() consistently
The annotation__pcnt_width() calculates the screen width for the overhead (percent) area considering event groups properly. Use this functio
perf annotate: Use annotation__pcnt_width() consistently
The annotation__pcnt_width() calculates the screen width for the overhead (percent) area considering event groups properly. Use this function consistently so that we can make sure it has similar output in different modes. But there's a difference in stdio and tui output: stdio uses 8 and tui uses 7 for a percent.
Let's use 8 and adjust the print width in __annotation_line__write() properly.
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
cb1e8bfc |
| 03-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Set notes->src->nr_events early
We want to use it in different places so make sure it sets properly in symbol__annotate() before creating the disasm lines.
Signed-off-by: Namhyung Ki
perf annotate: Set notes->src->nr_events early
We want to use it in different places so make sure it sets properly in symbol__annotate() before creating the disasm lines.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
2dc02c26 |
| 03-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Use al->data_nr if possible
The data_nr keeps the number of entries in al->data[] so it should use it when it iterates the array. The notes->src->nr_events should have the same numbe
perf annotate: Use al->data_nr if possible
The data_nr keeps the number of entries in al->data[] so it should use it when it iterates the array. The notes->src->nr_events should have the same number but it'd be natural to use al->data_nr.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1 |
|
| #
2c9db747 |
| 18-Jul-2024 |
Athira Rajeev <[email protected]> |
perf annotate: Set instruction name to be used with insn-stat when using raw instruction
Since the "ins.name" is not set while using raw instruction, 'perf annotate' with insn-stat gives wrong data:
perf annotate: Set instruction name to be used with insn-stat when using raw instruction
Since the "ins.name" is not set while using raw instruction, 'perf annotate' with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats total 615, ok 419 (68.1%), bad 196 (31.9%)
Name : Good Bad ----------------------------------------------------------- : 419 196
This patch sets "dl->ins.name" in arch specific function "check_ppc_insn" while initialising "struct disasm_line".
Also update "ins_find" function to pass "struct disasm_line" as a parameter so as to set its name field in arch specific call.
With the patch changes:
Annotate Instruction stats total 609, ok 446 (73.2%), bad 163 (26.8%)
Name/opcode : Good Bad ----------------------------------------------------------- 58 : 323 80 32 : 49 43 34 : 33 11 OP_31_XOP_LDX : 8 20 40 : 23 0 OP_31_XOP_LWARX : 5 1 OP_31_XOP_LWZX : 2 3 OP_31_XOP_LDARX : 3 0 33 : 0 2 OP_31_XOP_LBZX : 0 1 OP_31_XOP_LWAX : 0 1 OP_31_XOP_LHZX : 0 1
Reviewed-by: Kajol Jain <[email protected]> Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Athira Rajeev <[email protected]> Tested-by: Kajol Jain <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Akanksha J N <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Disha Goel <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Segher Boessenkool <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
1b4406d2 |
| 18-Jul-2024 |
Athira Rajeev <[email protected]> |
perf annotate: Update parameters for reg extract functions to use raw instruction on powerpc
Use the raw instruction code and macros to identify memory instructions, extract register fields and also
perf annotate: Update parameters for reg extract functions to use raw instruction on powerpc
Use the raw instruction code and macros to identify memory instructions, extract register fields and also offset.
The implementation addresses the D-form, X-form, DS-form instructions.
Adds "mem_ref" field to check whether source/target has memory reference.
Add function "get_powerpc_regs" which will set these fields: reg1, reg2, offset depending of where it is source or target ops.
Update "parse" callback for "struct ins_ops" to also pass "struct disasm_line" as argument. This is needed in parse functions where opcode is used to determine whether to set multi_regs and other fields
Reviewed-by: Kajol Jain <[email protected]> Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Athira Rajeev <[email protected]> Tested-by: Kajol Jain <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Akanksha J N <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Disha Goel <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Segher Boessenkool <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9 |
|
| #
9ef30265 |
| 10-May-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Fix segfault on sample histogram
A symbol can have no samples, then accessing the annotated_source->samples hashmap will result in a segfault.
Fixes: a3f7768bcf48281d ("perf annotate
perf annotate: Fix segfault on sample histogram
A symbol can have no samples, then accessing the annotated_source->samples hashmap will result in a segfault.
Fixes: a3f7768bcf48281d ("perf annotate: Fix memory leak in annotated_source") Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
a3f7768b |
| 07-May-2024 |
Ian Rogers <[email protected]> |
perf annotate: Fix memory leak in annotated_source
Freeing hash map doesn't free the entries added to the hashmap, add the missing free().
Fixes: d3e7cad6f36d9e80 ("perf annotate: Add a hashmap for
perf annotate: Fix memory leak in annotated_source
Freeing hash map doesn't free the entries added to the hashmap, add the missing free().
Fixes: d3e7cad6f36d9e80 ("perf annotate: Add a hashmap for symbol histogram") Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
36e8aa90 |
| 06-May-2024 |
Athira Rajeev <[email protected]> |
perf annotate: Fix a comment about multi_regs in extract_reg_offset function
Fix a comment in function which explains how multi_regs field gets set for an instruction. In the example, "mov %rsi, 8(
perf annotate: Fix a comment about multi_regs in extract_reg_offset function
Fix a comment in function which explains how multi_regs field gets set for an instruction. In the example, "mov %rsi, 8(%rbx,%rcx,4)", the comment mistakenly referred to "dst_multi_regs = 0". Correct it to use "src_multi_regs = 0"
Signed-off-by: Athira Rajeev <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Akanksha J N <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Disha Goel <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Segher Boessenkool <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
69fb6eab |
| 07-May-2024 |
Arnaldo Carvalho de Melo <[email protected]> |
perf annotate: Use zfree() to avoid possibly accessing dangling pointers
When freeing a->b it is good practice to set a->b to NULL using zfree(&a->b) so that when we have a bug where a reference to
perf annotate: Use zfree() to avoid possibly accessing dangling pointers
When freeing a->b it is good practice to set a->b to NULL using zfree(&a->b) so that when we have a bug where a reference to a freed 'a' pointer is kept somewhere, we can more quickly cause a segfault if some code tries to use a->b.
This is mostly done but some new cases were introduced recently, convert them to zfree().
Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/ZjmbHHrjIm5YRIBv@x1 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|