|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
30c5a394 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Implement code + data type annotation
Sometimes it's useful to see both instructions and their data type together. Let's extend the annotate code to use data type profiling functions
perf annotate: Implement code + data type annotation
Sometimes it's useful to see both instructions and their data type together. Let's extend the annotate code to use data type profiling functions.
To make it easy to pass more argument, introduce a struct to carry necessary information together. Also add a new annotation_option called 'code_with_type' to control the behavior. This is not enabled yet but it'll be set later from the command line.
For simplicity, this is implemented for --stdio only.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
fe8da669 |
| 10-Mar-2025 |
Namhyung Kim <[email protected]> |
perf annotate: Pass hist_entry to annotate functions
It's a prepartion to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_
perf annotate: Pass hist_entry to annotate functions
It's a prepartion to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_entry so it needs to be passed deeper.
Also rename a function with the same name in the builtin-annotate.c to hist_entry__stdio_annotate since it matches better to the command line option. And change the condition inside to be simpler.
Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc6 |
|
| #
dab8c32e |
| 04-Mar-2025 |
Athira Rajeev <[email protected]> |
perf annotate: Add annotation_options.disassembler_used
When doing "perf annotate", perf tool provides option to use specific disassembler like llvm/objdump/capstone. The order picked is to use llvm
perf annotate: Add annotation_options.disassembler_used
When doing "perf annotate", perf tool provides option to use specific disassembler like llvm/objdump/capstone. The order picked is to use llvm first and if that fails fallback to objdump ie to use PERF_DISASM_LLVM, PERF_DISASM_CAPSTONE and PERF_DISASM_OBJDUMP
In powerpc, when using "data type" sort keys, first preferred approach is to read the raw instruction from the DSO. In objdump is specified in "--objdump" option, it picks the symbol disassemble using objdump. Currently disasm_line__parse_powerpc() function uses length of the "line" to determine if objdump is used. But there are few cases, where if objdump doesn't recognise the instruction, the disassembled string will be empty.
Example:
134cdc: c4 05 82 41 beq 1352a0 <getcwd+0x6e0> 134ce0: ac 00 8e 40 bne cr3,134d8c <getcwd+0x1cc> 134ce4: 0f 00 10 04 pld r9,1028308 ====>134ce8: d4 b0 20 e5 134cec: 16 00 40 39 li r10,22 134cf0: 48 01 21 ea ld r17,328(r1)
So depending on length of line will give bad results.
Add a new filed to annotation options structure, "struct annotation_options" to save the disassembler used. Use this info to determine if disassembly is done while parsing the disasm line.
Reported-by: Tejas Manhas <[email protected]> Signed-off-by: Athira Rajeev <[email protected]> Tested-By: Venkat Rao Bagalkote <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
bde4ccfd |
| 24-Jan-2025 |
Ian Rogers <[email protected]> |
perf annotate: Use an array for the disassembler preference
Prior to this change a string was used which could cause issues with an unrecognized disassembler in symbol__disassembler. Change to initi
perf annotate: Use an array for the disassembler preference
Prior to this change a string was used which could cause issues with an unrecognized disassembler in symbol__disassembler. Change to initializing an array of perf_disassembler enum values. If a value already exists then adding it a second time is ignored to avoid array out of bounds problems present in the previous code, it also allows a statically sized array and removes memory allocation needs. Errors in the disassembler string are reported when the config is parsed during perf annotate or perf top start up. If the array is uninitialized after processing the config file the default llvm, capstone then objdump values are added but without a need to parse a string.
Fixes: a6e8a58de629 ("perf disasm: Allow configuring what disassemblers to use") Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/ Signed-off-by: Ian Rogers <[email protected]> Tested-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.13 |
|
| #
035f0c27 |
| 17-Jan-2025 |
Ian Rogers <[email protected]> |
perf annotate: Prefer passing evsel to evsel->core.idx
An evsel idx may not be stable due to sorting, evlist removal, etc. Try to reduce it being part of APIs by explicitly passing the evsel in anno
perf annotate: Prefer passing evsel to evsel->core.idx
An evsel idx may not be stable due to sorting, evlist removal, etc. Try to reduce it being part of APIs by explicitly passing the evsel in annotate code. Internally the code just reads evsel->core.idx so behavior is unchanged.
Signed-off-by: Ian Rogers <[email protected]> Cc: Chen Ni <[email protected]> Cc: Athira Rajeev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
b2b95a2d |
| 29-Nov-2024 |
Arnaldo Carvalho de Melo <[email protected]> |
perf disasm: Return a proper error when not determining the file type
Before:
⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sy
perf disasm: Return a proper error when not determining the file type
Before:
⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Internal error: Invalid -1 error code ⬢ [acme@toolbox a]$
After:
⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Couldn't determine the file /tmp/perf-3308868.map type. ⬢ [acme@toolbox a]$
Reported-by: Francesco Nigro <[email protected]> Reported-by: Ilan Green <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/lkml/Z092D9-r_iOgwIWM@x1 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.12 |
|
| #
a6e8a58d |
| 11-Nov-2024 |
Arnaldo Carvalho de Melo <[email protected]> |
perf disasm: Allow configuring what disassemblers to use
The perf tools annotation code used for a long time parsing the output of binutils's objdump (or its reimplementations, like llvm's) to then
perf disasm: Allow configuring what disassemblers to use
The perf tools annotation code used for a long time parsing the output of binutils's objdump (or its reimplementations, like llvm's) to then parse and augment it with samples, allow navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not parsing the output of tools using those libraries to mimic binutils's objdump output) were introduced.
So when all those methods are available, there is a static preference for a series of attempts of disassembling a binary, with the 'llvm, capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf config' 'annotate.disassemblers' entry which and in what order disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this flexibility is useful for users and developers alike, elliminating the requirement to rebuild the tool with some specific set of libraries to see how the output of disassembling would be for one of these methods.
root@x1:~# rm -f ~/.perfconfig root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> symbol__disassemble: filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, sym=update_load_avg, start=0xffffffffb6148fe0, en> annotating [0x6ff7170] /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : [0x7407ca0] update_load_avg Disassembled with llvm annotate.disassemblers=llvm,capstone,objdump Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = capstone root@x1:~# root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> Disassembled with capstone annotate.disassemblers=capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=objdump,capstone root@x1:~# perf config annotate.disassemblers annotate.disassemblers=objdump,capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump,capstone root@x1:~# perf annotate -v --stdio2 update_load_avg Executing: objdump --start-address=0xffffffff81148fe0 \ --stop-address=0xffffffff811497aa \ -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump,capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 #define DO_DETACH 0x8
/* Update task and its cfs_rq load average */ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) { 1.61 push %r15 push %r14 1.00 push %r13 mov %edx,%r13d 1.90 push %r12 push %rbp mov %rsi,%rbp push %rbx mov %rdi,%rbx sub $0x18,%rsp }
/* rq->task_clock normalized against any time this cfs_rq has spent throttled */ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { if (unlikely(cfs_rq->throttle_count)) 15.14 mov 0x1a4(%rdi),%eax root@x1:~#
After adding a way to select the disassembler from the command line a 'perf test' comparing the output of the various diassemblers should be introduced, to test these codebases.
Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
e6952dce |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }', 4000 Hz, Event count (approx.): f3 /home/sdp/test/tchain_edit [Percent: local period] Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%) │ 0000000000401755 <f3>: 0.00 0.00 │ endbr64 │ push %rbp │ mov %rsp,%rbp │ movl $0x0,-0x4(%rbp) 0.00 0.00 │1.33 3 |A |- | ↓ jmp 25 11.03 11.03 │ 11: mov -0x4(%rbp),%eax │ and $0x1,%eax │ test %eax,%eax 17.13 17.13 │2.41 1 |A |- | ↓ je 21 │ addl $0x1,-0x4(%rbp) 21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp) 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp) 11.03 11.03 │0.61 3 |A |- | ↑ jle 11 │ nop │ pop %rbp 0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
20d6f555 |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf report: Display the branch counter histogram
Reusing the existing --total-cycles option to display the branch counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display the logged bra
perf report: Display the branch counter histogram
Reusing the existing --total-cycles option to display the branch counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display the logged branch counter events. They are shown right after all the cycle-related annotations.
Extend the 'struct block_info' to store and pass the branch counter related information.
The annotation_br_cntr_entry() is to print the histogram of each branch counter event. If the number of logged events is less than 4, the exact number of the abbr name is printed. Otherwise, using '+' to stands for more than 3 events.
Assume the number of logged events is less than 4.
The annotation_br_cntr_abbr_list() prints the branch counter's abbreviation list. Press 'B' to display the list in the TUI mode.
$ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter $ perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }' # Event count (approx.): 1610046 # # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 |A |- | ... 25.27% 1.1M 0.00% 2 |AA |- | ... 15.61% 667.2K 0.00% 1 |A |- | ... 0.16% 6.9K 0.81% 575 |A |- | ... 0.16% 6.8K 1.38% 977 |AA |- | ... 0.16% 6.8K 0.04% 28 |AA |B | ... 0.15% 6.6K 1.33% 946 |A |- | ... 0.11% 4.5K 0.06% 46 |AAA+|- | ... 0.10% 4.4K 0.88% 624 |A |- | ... 0.09% 3.7K 0.74% 524 |AAA+|B | ...
With -v applied,
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 A=1 ,B=- ... 25.27% 1.1M 0.00% 2 A=2 ,B=- ... 15.61% 667.2K 0.00% 1 A=1 ,B=- ... 0.16% 6.9K 0.81% 575 A=1 ,B=- ... 0.16% 6.8K 1.38% 977 A=2 ,B=- ... 0.16% 6.8K 0.04% 28 A=2 ,B=1 ... 0.15% 6.6K 1.33% 946 A=1 ,B=- ... 0.11% 4.5K 0.06% 46 A=3+,B=- ... 0.10% 4.4K 0.88% 624 A=1 ,B=- ... 0.09% 3.7K 0.74% 524 A=3+,B=1 ...
Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
1f2b7fbb |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available f
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters are automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64 space.
Because the saturation of a branch counter is small, e.g., for Intel Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter once saturated. That can be used to indicate a potential event lost because of the saturation.
Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc3 |
|
| #
037f1b67 |
| 05-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Cache debuginfo for data type profiling
In find_data_type(), it creates and deletes a debug info whenver it tries to find data type for a sample. This is inefficient and it most like
perf annotate: Cache debuginfo for data type profiling
In find_data_type(), it creates and deletes a debug info whenver it tries to find data type for a sample. This is inefficient and it most likely accesses the same binary again and again.
Let's add a single entry cache the debug info structure for the last DSO. Depending on sample data, it usually gives me 2~3x (and sometimes more) speed ups.
Note that this will introduce a little difference in the output due to the order of checking stack operations. It used to check the stack ops before checking the availability of debug info but I moved it after the symbol check. So it'll report stack operations in DSOs without debug info as unknown. But I think it's ok and better to have the checking near the caching logic.
Committer testing:
root@x1:~# perf mem record -a sleep 5s root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# diff -u before after --- before 2024-08-08 09:33:53.880780784 -0300 +++ after 2024-08-08 09:35:13.917325041 -0300 @@ -81,8 +81,8 @@ # Overhead Data Type # ........ ......... # - 55.43% (unknown) - 11.61% (stack operation) + 55.56% (unknown) + 11.48% (stack operation) 4.93% struct pcpu_hot 3.26% unsigned int 2.48% struct
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2 |
|
| #
b00e4d0d |
| 03-Aug-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Use annotation__pcnt_width() consistently
The annotation__pcnt_width() calculates the screen width for the overhead (percent) area considering event groups properly. Use this functio
perf annotate: Use annotation__pcnt_width() consistently
The annotation__pcnt_width() calculates the screen width for the overhead (percent) area considering event groups properly. Use this function consistently so that we can make sure it has similar output in different modes. But there's a difference in stdio and tui output: stdio uses 8 and tui uses 7 for a percent.
Let's use 8 and adjust the print width in __annotation_line__write() properly.
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1 |
|
| #
06dd4c5a |
| 18-Jul-2024 |
Athira Rajeev <[email protected]> |
perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc
Currently, the perf tool infrastructure uses the disasm_line__parse function to parse disassembled line.
Example snippet
perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc
Currently, the perf tool infrastructure uses the disasm_line__parse function to parse disassembled line.
Example snippet from objdump:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name, registers names and offset.
In powerpc, the approach for data type profiling uses raw instruction instead of result from objdump to identify the instruction category and extract the source/target registers.
Example: 38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. Add function "disasm_line__parse_powerpc" to handle parsing of raw instruction. Also update "struct disasm_line" to save the binary code/ With the change, function captures:
line -> "38 01 81 e8 ld r4,312(r1)" raw instruction "38 01 81 e8"
Raw instruction is used later to extract the reg/offset fields. Macros are added to extract opcode and register fields. "struct disasm_line" is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw code (raw).
Function "disasm_line__parse_powerpc fills the raw instruction hex value and can use macros to get opcode. There is no changes in existing code paths, which parses the disassembled code. The size of raw instruction depends on architecture.
In case of powerpc, the parsing the disasm line needs to handle cases for reading binary code directly from DSO as well as parsing the objdump result. Hence adding the logic into separate function instead of updating "disasm_line__parse". The architecture using the instruction name and present approach is not altered. Since this approach targets powerpc, the macro implementation is added for powerpc as of now.
Since the disasm_line__parse is used in other cases (perf annotate) and not only data tye profiling, the powerpc callback includes changes to work with binary code as well as mnemonic representation.
Also in case if the DSO read fails and libcapstone is not supported, the approach fallback to use objdump as option. Hence as option, patch has changes to ensure objdump option also works well.
Reviewed-by: Kajol Jain <[email protected]> Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Athira Rajeev <[email protected]> Tested-by: Kajol Jain <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Akanksha J N <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Disha Goel <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Segher Boessenkool <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] [ Add check for strndup() result ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3 |
|
| #
8c004c7a |
| 04-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Move 'start' field struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol
perf annotate: Move 'start' field struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation').
Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Ian Rogers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: LKML <[email protected]> Cc: <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
6f94a72d |
| 04-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Move nr_events struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('st
perf annotate: Move nr_events struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation').
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
f6b18aba |
| 04-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Move 'max_jump_sources' struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every sy
perf annotate: Move 'max_jump_sources' struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation').
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
a46acc45 |
| 04-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Move 'widths' struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('str
perf annotate: Move 'widths' struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation').
Also move the 'max_line_len' field into it as it's related.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
cee9b860 |
| 04-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Get rid of offsets array
The struct annotated_source.offsets[] is to save pointers to annotation_line at each offset. We can use annotated_source__get_line() helper instead so let's
perf annotate: Get rid of offsets array
The struct annotated_source.offsets[] is to save pointers to annotation_line at each offset. We can use annotated_source__get_line() helper instead so let's get rid of the array.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
6f157d9a |
| 04-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Introduce annotated_source__get_line()
It's a helper function to get annotation_line at the given offset without using the offsets array. The goal is to get rid of the offsets array
perf annotate: Introduce annotated_source__get_line()
It's a helper function to get annotation_line at the given offset without using the offsets array. The goal is to get rid of the offsets array altogether. It just does the linear search but I think it's better to save memory as it won't be called in a hot path.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
bfd98ceb |
| 04-Apr-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Staticize some local functions
I found annotation__mark_jump_targets(), annotation__set_offsets() and annotation__init_column_widths() are only used in the same file. Let's make them
perf annotate: Staticize some local functions
I found annotation__mark_jump_targets(), annotation__set_offsets() and annotation__init_column_widths() are only used in the same file. Let's make them static.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc2 |
|
| #
98f69a57 |
| 29-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Split out util/disasm.c
The util/annotate.c code has both disassembly and sample annotation related codes. Factor out the disasm part so that it can be handled more easily.
No funct
perf annotate: Split out util/disasm.c
The util/annotate.c code has both disassembly and sample annotation related codes. Factor out the disasm part so that it can be handled more easily.
No functional changes intended.
Committer notes:
Add missing include env.h, util.h, bpf-event.h and bpf-util.h to disasm.c, to fix things like:
util/disasm.c: In function ‘symbol__disassemble_bpf’: util/disasm.c:1203:9: error: implicit declaration of function ‘perf_exe’ [-Werror=implicit-function-declaration] 1203 | perf_exe(tpath, sizeof(tpath)); | ^~~~~~~~
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
10adbf77 |
| 29-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Add and use ins__is_nop()
Likewise, add ins__is_nop() to check if the current instruction is NOP.
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Ian Rogers <irogers@goo
perf annotate: Add and use ins__is_nop()
Likewise, add ins__is_nop() to check if the current instruction is NOP.
Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1 |
|
| #
cbaf89a8 |
| 19-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Parse x86 segment register location
Add a segment field in the struct annotated_insn_loc and save it for the segment based addressing like %gs:0x28. For simplicity it now handles %gs
perf annotate: Parse x86 segment register location
Add a segment field in the struct annotated_insn_loc and save it for the segment based addressing like %gs:0x28. For simplicity it now handles %gs register only.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
5cdd3fd7 |
| 19-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Add annotate_get_basic_blocks()
The annotate_get_basic_blocks() is to find a list of basic blocks from the source instruction to the destination instruction in a function.
It'll be u
perf annotate: Add annotate_get_basic_blocks()
The annotate_get_basic_blocks() is to find a list of basic blocks from the source instruction to the destination instruction in a function.
It'll be used to find variables in a scope. Use BFS (Breadth First Search) to find a shortest path to carry the variable/register state minimally.
Also change find_disasm_line() to be used in annotate_get_basic_blocks() and add 'allow_update' argument to control if it can update the IP.
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.8 |
|
| #
0f66dfe7 |
| 04-Mar-2024 |
Namhyung Kim <[email protected]> |
perf annotate: Add comments in the data structures
Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <acme@redh
perf annotate: Add comments in the data structures
Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|