History log of /linux-6.15/tools/perf/util/annotate.h (Results 1 – 25 of 222)
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# 30c5a394 10-Mar-2025 Namhyung Kim <[email protected]>

perf annotate: Implement code + data type annotation

Sometimes it's useful to see both instructions and their data type
together. Let's extend the annotate code to use data type profiling
functions.

To make it easy to pass more arguments, introduce a struct to carry
necessary information together. Also add a new annotation_option called
'code_with_type' to control the behavior. This is not enabled yet but
it'll be set later from the command line.

For simplicity, this is implemented for --stdio only.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>


# fe8da669 10-Mar-2025 Namhyung Kim <[email protected]>

perf annotate: Pass hist_entry to annotate functions

It's a preparation to support code annotation and data type
annotation at the same time. Data type annotation needs more
information in the hist_entry so it needs to be passed deeper.

Also rename a function with the same name in the builtin-annotate.c
to hist_entry__stdio_annotate since it matches better to the command
line option. And change the condition inside to be simpler.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>


Revision tags: v6.14-rc6
# dab8c32e 04-Mar-2025 Athira Rajeev <[email protected]>

perf annotate: Add annotation_options.disassembler_used

When doing "perf annotate", the perf tool provides an option to
use a specific disassembler such as llvm, objdump or capstone. The
default order is to try llvm first and fall back to objdump if that
fails, i.e. PERF_DISASM_LLVM, PERF_DISASM_CAPSTONE and then
PERF_DISASM_OBJDUMP.

On powerpc, when using "data type" sort keys, the preferred
approach is to read the raw instruction from the DSO. If objdump
is specified with the "--objdump" option, the symbol is disassembled
using objdump. Currently the disasm_line__parse_powerpc() function
uses the length of the "line" to determine whether objdump was used.
But there are a few cases where objdump doesn't recognise the
instruction and the disassembled string is empty.

Example:

134cdc: c4 05 82 41 beq 1352a0 <getcwd+0x6e0>
134ce0: ac 00 8e 40 bne cr3,134d8c <getcwd+0x1cc>
134ce4: 0f 00 10 04 pld r9,1028308
====>134ce8: d4 b0 20 e5
134cec: 16 00 40 39 li r10,22
134cf0: 48 01 21 ea ld r17,328(r1)

So relying on the length of the line gives bad results.

Add a new field to the annotation options structure,
"struct annotation_options", to save the disassembler used.
Use this info to determine which disassembler produced the line
when parsing the disasm line.
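
The difference between the two approaches can be sketched as follows. This is an illustration only, not the perf code: the enum reuses the PERF_DISASM_* names from the message above with assumed values, while `struct annotation_options_sketch`, `guessed_objdump_by_length()` and the length threshold are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Names follow the PERF_DISASM_* constants mentioned above; values are assumed. */
enum perf_disassembler {
	PERF_DISASM_UNKNOWN = 0,
	PERF_DISASM_LLVM,
	PERF_DISASM_CAPSTONE,
	PERF_DISASM_OBJDUMP,
};

/* Hypothetical stand-in for the relevant part of struct annotation_options. */
struct annotation_options_sketch {
	enum perf_disassembler disassembler_used;
};

/* "xx xx xx xx" is 11 characters; anything longer is assumed to carry a mnemonic. */
#define RAW_INSN_TEXT_LEN 11

/*
 * Length heuristic: a line with a mnemonic is taken as objdump output.
 * It misfires on objdump lines such as "d4 b0 20 e5" where no mnemonic
 * was printed.
 */
static bool guessed_objdump_by_length(const char *line)
{
	return strlen(line) > RAW_INSN_TEXT_LEN;
}

/* Recorded choice: no guessing, the caller stored which disassembler ran. */
static bool used_objdump(const struct annotation_options_sketch *opts)
{
	return opts->disassembler_used == PERF_DISASM_OBJDUMP;
}
```

With the recorded value, the unrecognised instruction at 134ce8 is still attributed to objdump even though its text is only raw bytes.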

Reported-by: Tejas Manhas <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-By: Venkat Rao Bagalkote <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>


Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1
# bde4ccfd 24-Jan-2025 Ian Rogers <[email protected]>

perf annotate: Use an array for the disassembler preference

Prior to this change a string was used which could cause issues with
an unrecognized disassembler in symbol__disassembler. Change to
initializing an array of perf_disassembler enum values. If a value
already exists then adding it a second time is ignored to avoid array
out of bounds problems present in the previous code, it also allows a
statically sized array and removes memory allocation needs. Errors in
the disassembler string are reported when the config is parsed during
perf annotate or perf top start up. If the array is uninitialized
after processing the config file the default llvm, capstone then
objdump values are added but without a need to parse a string.
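
A minimal sketch of such a statically sized, duplicate-free preference array follows. It assumes that a zero entry means "unset"; `struct disasm_pref`, `disasm_pref_add()` and `disasm_pref_set_defaults()` are hypothetical names, not the functions in the perf sources.

```c
#include <stdint.h>

/* Names follow the perf_disassembler enum mentioned above; values are assumed. */
enum perf_disassembler {
	PERF_DISASM_UNKNOWN = 0,
	PERF_DISASM_LLVM,
	PERF_DISASM_CAPSTONE,
	PERF_DISASM_OBJDUMP,
};

#define MAX_DISASSEMBLERS (PERF_DISASM_OBJDUMP + 1)

/* Hypothetical container: a fixed array, so no allocation is needed. */
struct disasm_pref {
	uint8_t disassemblers[MAX_DISASSEMBLERS];
};

/*
 * Append d unless it is already present. Because duplicates are ignored,
 * the array can never hold more entries than there are enum values, so
 * writes stay in bounds. Returns 0 on success, -1 on an invalid value.
 */
static int disasm_pref_add(struct disasm_pref *pref, enum perf_disassembler d)
{
	if (d == PERF_DISASM_UNKNOWN)
		return -1;

	for (int i = 0; i < MAX_DISASSEMBLERS; i++) {
		if (pref->disassemblers[i] == d)
			return 0;	/* already listed, ignore */
		if (pref->disassemblers[i] == PERF_DISASM_UNKNOWN) {
			pref->disassemblers[i] = (uint8_t)d;
			return 0;
		}
	}
	return -1;	/* not reachable while duplicates are ignored */
}

/* Fill in llvm, capstone, objdump only if nothing was configured. */
static void disasm_pref_set_defaults(struct disasm_pref *pref)
{
	if (pref->disassemblers[0] != PERF_DISASM_UNKNOWN)
		return;
	disasm_pref_add(pref, PERF_DISASM_LLVM);
	disasm_pref_add(pref, PERF_DISASM_CAPSTONE);
	disasm_pref_add(pref, PERF_DISASM_OBJDUMP);
}
```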

Fixes: a6e8a58de629 ("perf disasm: Allow configuring what disassemblers to use")
Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>


Revision tags: v6.13
# 035f0c27 17-Jan-2025 Ian Rogers <[email protected]>

perf annotate: Prefer passing evsel to evsel->core.idx

An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in annotate code. Internally the code just reads evsel->core.idx
so behavior is unchanged.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Chen Ni <[email protected]>
Cc: Athira Rajeev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>


Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1
# b2b95a2d 29-Nov-2024 Arnaldo Carvalho de Melo <[email protected]>

perf disasm: Return a proper error when not determining the file type

Before:

⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)'
Error:
Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int):
Internal error: Invalid -1 error code
⬢ [acme@toolbox a]$

After:

⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)'
Error:
Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int):
Couldn't determine the file /tmp/perf-3308868.map type.
⬢ [acme@toolbox a]$

Reported-by: Francesco Nigro <[email protected]>
Reported-by: Ilan Green <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonatan Goldschmidt <[email protected]>
Link: https://lore.kernel.org/lkml/Z092D9-r_iOgwIWM@x1
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


Revision tags: v6.12
# a6e8a58d 11-Nov-2024 Arnaldo Carvalho de Melo <[email protected]>

perf disasm: Allow configuring what disassemblers to use

The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then
parse and augment it with samples, allow navigation, etc.

More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.

So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.

This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.
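
How a value such as 'objdump,capstone' could be turned into an ordered list is sketched below. This is a hedged illustration, not the parser in the perf sources: `disasm_from_name()` and `parse_disassemblers()` are hypothetical, the enum values are assumed, and whitespace around names is deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/* Assumed enum; the three names match the disassemblers listed above. */
enum perf_disassembler {
	PERF_DISASM_UNKNOWN = 0,
	PERF_DISASM_LLVM,
	PERF_DISASM_CAPSTONE,
	PERF_DISASM_OBJDUMP,
};

/* Map one name of the given length to its enum value. */
static enum perf_disassembler disasm_from_name(const char *s, size_t len)
{
	static const char *const names[] = {
		[PERF_DISASM_LLVM]     = "llvm",
		[PERF_DISASM_CAPSTONE] = "capstone",
		[PERF_DISASM_OBJDUMP]  = "objdump",
	};

	for (int d = PERF_DISASM_LLVM; d <= PERF_DISASM_OBJDUMP; d++) {
		if (strlen(names[d]) == len && !strncmp(names[d], s, len))
			return (enum perf_disassembler)d;
	}
	return PERF_DISASM_UNKNOWN;
}

/*
 * Parse a comma-separated list into out[], preserving order.
 * Returns the number of entries, or -1 on an unknown or empty name
 * or when more than max entries are given.
 */
static int parse_disassemblers(const char *str, enum perf_disassembler *out, int max)
{
	int n = 0;

	while (*str) {
		size_t len = strcspn(str, ",");
		enum perf_disassembler d = disasm_from_name(str, len);

		if (d == PERF_DISASM_UNKNOWN || n == max)
			return -1;
		out[n++] = d;
		str += len;
		if (*str == ',')
			str++;
	}
	return n;
}
```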

As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, eliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.

root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax

root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent

Disassembly of section .text:

ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4

ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8

/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}

/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#

After adding a way to select the disassembler from the command line a
'perf test' comparing the output of the various disassemblers should be
introduced, to test these codebases.

Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4
# e6952dce 13-Aug-2024 Kan Liang <[email protected]>

perf annotate: Display the branch counter histogram

Display the branch counter histogram in the annotation view.

Press 'B' to display the branch counter's abbreviation list as well.

Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3 /home/sdp/test/tchain_edit [Percent: local period]
Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
│ 0000000000401755 <f3>:
0.00 0.00 │ endbr64
│ push %rbp
│ mov %rsp,%rbp
│ movl $0x0,-0x4(%rbp)
0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
11.03 11.03 │ 11: mov -0x4(%rbp),%eax
│ and $0x1,%eax
│ test %eax,%eax
17.13 17.13 │2.41 1 |A |- | ↓ je 21
│ addl $0x1,-0x4(%rbp)
21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
11.03 11.03 │0.61 3 |A |- | ↑ jle 11
│ nop
│ pop %rbp
0.00 0.00 │0.24 20 |AA |B | ← ret

Originally-by: Tinghao Zhang <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# 20d6f555 13-Aug-2024 Kan Liang <[email protected]>

perf report: Display the branch counter histogram

Reusing the existing --total-cycles option to display the branch
counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
the logged branch counter events. They are shown right after all the
cycle-related annotations.

Extend the 'struct block_info' to store and pass the branch counter
related information.

The annotation_br_cntr_entry() prints the histogram of each branch
counter event. If the number of logged events is less than 4, the abbr
name is printed that many times. Otherwise, '+' stands for
more than 3 events.

Assume the number of logged events is less than 4.

The annotation_br_cntr_abbr_list() prints the branch counter's
abbreviation list. Press 'B' to display the list in the TUI mode.
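
The cell format described above can be sketched as follows. This is an assumption-laden illustration, not annotation_br_cntr_entry() itself: `br_cntr_cell()` is a hypothetical helper, the 4-character column width is inferred from cells like '|AAA+|', and the '+' is driven purely by the count here.

```c
#include <stdio.h>
#include <string.h>

/* Assumed cell width: up to three abbreviation letters plus an optional '+'. */
#define BR_CNTR_COLS 4u

/*
 * Format one histogram cell for a counter whose abbreviation is abbr.
 *   0 events   -> "|-   "
 *   1..3       -> the letter repeated that many times
 *   4 or more  -> three letters followed by '+'
 * buf must hold at least BR_CNTR_COLS + 2 bytes. Returns buf.
 */
static char *br_cntr_cell(char *buf, size_t size, char abbr, unsigned int count)
{
	char cell[BR_CNTR_COLS + 1];
	unsigned int n = count < BR_CNTR_COLS ? count : BR_CNTR_COLS - 1;

	memset(cell, ' ', BR_CNTR_COLS);
	cell[BR_CNTR_COLS] = '\0';

	if (count == 0) {
		cell[0] = '-';
	} else {
		for (unsigned int i = 0; i < n; i++)
			cell[i] = abbr;
		if (count >= BR_CNTR_COLS)
			cell[n] = '+';
	}

	snprintf(buf, size, "|%s", cell);
	return buf;
}
```

Calling it once per counter and appending a closing '|' gives a row in the style of the non-verbose output, for example two 'A' events and no 'B' event.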

$ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
$ perf report --total-cycles --stdio

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
# Event count (approx.): 1610046
#
# Branch counter abbr list:
# branch-instructions:ppp = A
# branch-misses = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 |A |- | ...
25.27% 1.1M 0.00% 2 |AA |- | ...
15.61% 667.2K 0.00% 1 |A |- | ...
0.16% 6.9K 0.81% 575 |A |- | ...
0.16% 6.8K 1.38% 977 |AA |- | ...
0.16% 6.8K 0.04% 28 |AA |B | ...
0.15% 6.6K 1.33% 946 |A |- | ...
0.11% 4.5K 0.06% 46 |AAA+|- | ...
0.10% 4.4K 0.88% 624 |A |- | ...
0.09% 3.7K 0.74% 524 |AAA+|B | ...

With -v applied,

# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 A=1 ,B=- ...
25.27% 1.1M 0.00% 2 A=2 ,B=- ...
15.61% 667.2K 0.00% 1 A=1 ,B=- ...
0.16% 6.9K 0.81% 575 A=1 ,B=- ...
0.16% 6.8K 1.38% 977 A=2 ,B=- ...
0.16% 6.8K 0.04% 28 A=2 ,B=1 ...
0.15% 6.6K 1.33% 946 A=1 ,B=- ...
0.11% 4.5K 0.06% 46 A=3+,B=- ...
0.10% 4.4K 0.88% 624 A=1 ,B=- ...
0.09% 3.7K 0.74% 524 A=3+,B=1 ...

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# 1f2b7fbb 13-Aug-2024 Kan Liang <[email protected]>

perf annotate: Save branch counters for each block

When annotating a basic block, it's useful to display the occurrences
of other events in the block.

The branch counter feature is only available for newer Intel platforms.

So a dedicated option to display the branch counters is not introduced.

Reuse the existing --total-cycles option, which triggers the annotation
of a basic block and displays the cycle-related annotation.

When the branch counters information is available, the branch counters
are automatically appended after all the cycle-related annotation.

Account the branch counters as well when accounting the cycles in
hist__account_cycles().

In 'struct annotated_branch', introduce a br_cntr array to save the
accumulation of each branch counter.

In a sample, all the branch counters for a branch are saved in a u64
space.

The saturation value of a branch counter is small, e.g., for Intel
Sierra Forest, the saturation is only 3.

Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate that a branch counter
saturated at least once. That can be used to indicate a potential event
loss because of the saturation.
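
A minimal sketch of unpacking and accumulating such counters follows. The layout (counter 0 in the lowest bits, equal widths) and the flag value are assumptions made for illustration; `BR_CNTR_SATURATED_FLAG` and `br_cntr_account()` are hypothetical stand-ins rather than the perf definitions.

```c
#include <stdint.h>

/* Assumed: the top bit of each accumulator marks "saturated at least once". */
#define BR_CNTR_SATURATED_FLAG (1ULL << 63)

/*
 * Add the counters packed in one sample to the per-branch accumulators.
 *   acc    - array of nr accumulators
 *   packed - nr counters of width bits each, counter 0 in the lowest bits
 *   width  - bits per counter, must be between 1 and 63
 * A counter that reads all ones has hit its ceiling, so events may
 * have been lost; remember that in the flag bit.
 */
static void br_cntr_account(uint64_t *acc, uint64_t packed,
			    unsigned int nr, unsigned int width)
{
	uint64_t mask = (1ULL << width) - 1;

	for (unsigned int i = 0; i < nr; i++) {
		uint64_t val = (packed >> (i * width)) & mask;

		acc[i] += val;
		if (val == mask)
			acc[i] |= BR_CNTR_SATURATED_FLAG;
	}
}
```

With a 2-bit width the ceiling is 3, which matches the Sierra Forest figure quoted above.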

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


Revision tags: v6.11-rc3
# 037f1b67 05-Aug-2024 Namhyung Kim <[email protected]>

perf annotate: Cache debuginfo for data type profiling

find_data_type() creates and deletes a debug info whenever it
tries to find the data type for a sample. This is inefficient, as it most
likely accesses the same binary again and again.

Let's add a single-entry cache of the debug info structure for the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.
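
The single-entry cache can be sketched like this. Everything here is hypothetical: the real code caches a debuginfo handle per DSO, whereas this sketch keys on a path string and uses `struct debuginfo_sketch` and an open counter purely to make the behaviour visible.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an expensive-to-create debug info handle. */
struct debuginfo_sketch {
	char path[256];
};

static int nr_opens;				/* how many expensive opens happened */
static struct debuginfo_sketch *cached;		/* the one cached entry, or NULL */

/* Pretend to open debug info; in reality this is the costly step. */
static struct debuginfo_sketch *debuginfo_sketch_open(const char *path)
{
	struct debuginfo_sketch *di = calloc(1, sizeof(*di));

	if (!di)
		return NULL;
	/* calloc zeroed the buffer, so the copy stays NUL-terminated. */
	strncpy(di->path, path, sizeof(di->path) - 1);
	nr_opens++;
	return di;
}

/*
 * Return debug info for path, reusing the previous one when the same
 * binary is requested again. A different path evicts the old entry.
 */
static struct debuginfo_sketch *debuginfo_cache_get(const char *path)
{
	if (cached && strcmp(cached->path, path) == 0)
		return cached;

	free(cached);
	cached = debuginfo_sketch_open(path);
	return cached;
}
```

Consecutive samples from the same binary hit the cache, which is where the speed-up comes from; alternating between two binaries would defeat it.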

Note that this will introduce a little difference in the output due to
the order of checking stack operations. It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check. So it'll report stack operations in DSOs without debug
info as unknown. But I think it's ok and better to have the checking
near the caching logic.

Committer testing:

root@x1:~# perf mem record -a sleep 5s
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~# diff -u before after
--- before 2024-08-08 09:33:53.880780784 -0300
+++ after 2024-08-08 09:35:13.917325041 -0300
@@ -81,8 +81,8 @@
# Overhead Data Type
# ........ .........
#
- 55.43% (unknown)
- 11.61% (stack operation)
+ 55.56% (unknown)
+ 11.48% (stack operation)
4.93% struct pcpu_hot
3.26% unsigned int
2.48% struct

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


Revision tags: v6.11-rc2
# b00e4d0d 03-Aug-2024 Namhyung Kim <[email protected]>

perf annotate: Use annotation__pcnt_width() consistently

The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly. Use this
function consistently so that we can make sure it has similar output
in different modes. But there's a difference in stdio and tui output:
stdio uses 8 and tui uses 7 for a percent.

Let's use 8 and adjust the print width in __annotation_line__write()
properly.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


Revision tags: v6.11-rc1
# 06dd4c5a 18-Jul-2024 Athira Rajeev <[email protected]>

perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc

Currently, the perf tool infrastructure uses the disasm_line__parse
function to parse disassembled line.

Example snippet from objdump:

objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>

c0000000010224b4: lwz r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset.

In powerpc, the approach for data type profiling uses raw instruction
instead of result from objdump to identify the instruction category and
extract the source/target registers.

Example: 38 01 81 e8 ld r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of raw instruction.
Also update "struct disasm_line" to save the binary code.
With the change, the function captures:

line -> "38 01 81 e8 ld r4,312(r1)"
raw instruction "38 01 81 e8"

Raw instruction is used later to extract the reg/offset fields. Macros
are added to extract opcode and register fields. "struct disasm_line"
is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw
code (raw).

Function "disasm_line__parse_powerpc" fills the raw instruction hex value
and can use macros to get opcode. There are no changes in existing code
paths, which parses the disassembled code. The size of raw instruction
depends on architecture.
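
The raw-instruction handling can be sketched with the example line from above. This is an illustration under stated assumptions: the macro names and `parse_raw_insn()` are hypothetical, the bytes are taken to be printed in little-endian order (as the "38 01 81 e8" example implies), and only the DS-form fields used by "ld" are decoded.

```c
#include <stdint.h>
#include <stdio.h>

/* Field extraction from a 32-bit PowerPC instruction word (illustrative names). */
#define PPC_OP(insn)	(((insn) >> 26) & 0x3f)		/* primary opcode */
#define PPC_RT(insn)	(((insn) >> 21) & 0x1f)		/* target register */
#define PPC_RA(insn)	(((insn) >> 16) & 0x1f)		/* base register */
/* DS-form displacement: low two bits are a sub-opcode, value is signed. */
#define PPC_DS(insn)	((int16_t)((insn) & 0xfffc))

/*
 * Read the four leading hex bytes of a line such as
 * "38 01 81 e8 ld r4,312(r1)" and assemble the instruction word,
 * treating the first byte as the least significant one.
 * Returns 0 on success, -1 if the line does not start with four bytes.
 */
static int parse_raw_insn(const char *line, uint32_t *insn)
{
	unsigned int b[4];

	if (sscanf(line, "%2x %2x %2x %2x", &b[0], &b[1], &b[2], &b[3]) != 4)
		return -1;

	*insn = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
		((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
	return 0;
}
```

For the example line this yields the word 0xe8810138, whose fields decode to primary opcode 58, RT 4, RA 1 and displacement 312, matching "ld r4,312(r1)".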

In case of powerpc, parsing the disasm line needs to handle cases
for reading binary code directly from DSO as well as parsing the objdump
result. Hence the logic is added in a separate function instead of
updating "disasm_line__parse". The architecture using the instruction
name and present approach is not altered. Since this approach targets
powerpc, the macro implementation is added for powerpc as of now.

Since disasm_line__parse is used in other cases (perf annotate) and
not only data type profiling, the powerpc callback includes changes to
work with binary code as well as mnemonic representation.

Also, if the DSO read fails and libcapstone is not supported, the
approach falls back to using objdump. Hence the patch has
changes to ensure the objdump option also works well.

Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
[ Add check for strndup() result ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3
# 8c004c7a 04-Apr-2024 Namhyung Kim <[email protected]>

perf annotate: Move 'start' field struct to 'struct annotated_source'

It's only used in 'perf annotate' output which means functions with actual
samples. No need to consume memory for every symbol ('struct annotation').
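
The memory argument can be illustrated with a small sketch. The struct layouts are hypothetical and much smaller than the real ones; the point is only that the per-symbol structure keeps a pointer, while output-only fields such as 'start' live in a structure allocated on demand for symbols that actually have samples.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: allocated only for symbols that get annotated. */
struct annotated_source_sketch {
	uint64_t start;		/* only needed when producing annotate output */
	int nr_events;
	int max_jump_sources;
};

/* Hypothetical: exists for every symbol, so it should stay small. */
struct annotation_sketch {
	struct annotated_source_sketch *src;	/* NULL until first needed */
};

/* Allocate the output-only part lazily, the first time it is needed. */
static struct annotated_source_sketch *
annotation_sketch_get_src(struct annotation_sketch *notes)
{
	if (!notes->src)
		notes->src = calloc(1, sizeof(*notes->src));
	return notes->src;
}
```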

Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Ian Rogers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: LKML <[email protected]>
Cc: <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# 6f94a72d 04-Apr-2024 Namhyung Kim <[email protected]>

perf annotate: Move nr_events struct to 'struct annotated_source'

It's only used in 'perf annotate' output which means functions with actual
samples. No need to consume memory for every symbol ('struct annotation').

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# f6b18aba 04-Apr-2024 Namhyung Kim <[email protected]>

perf annotate: Move 'max_jump_sources' struct to 'struct annotated_source'

It's only used in 'perf annotate' output which means functions with actual
samples. No need to consume memory for every symbol ('struct annotation').

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# a46acc45 04-Apr-2024 Namhyung Kim <[email protected]>

perf annotate: Move 'widths' struct to 'struct annotated_source'

It's only used in 'perf annotate' output which means functions with
actual samples. No need to consume memory for every symbol
('struct annotation').

Also move the 'max_line_len' field into it as it's related.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# cee9b860 04-Apr-2024 Namhyung Kim <[email protected]>

perf annotate: Get rid of offsets array

The struct annotated_source.offsets[] is to save pointers to
annotation_line at each offset. We can use annotated_source__get_line()
helper instead so let's get rid of the array.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# 6f157d9a 04-Apr-2024 Namhyung Kim <[email protected]>

perf annotate: Introduce annotated_source__get_line()

It's a helper function to get annotation_line at the given offset
without using the offsets array. The goal is to get rid of the
offsets array altogether. It just does the linear search but I
think it's better to save memory as it won't be called in a hot
path.
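
The linear search can be sketched as follows. The list layout is an assumption: `struct annotation_line_sketch` with a plain `next` pointer and `get_line_at()` are hypothetical, chosen only to show the lookup without a per-offset pointer array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified line record kept on a singly linked list. */
struct annotation_line_sketch {
	int64_t offset;				/* offset of the line within the symbol */
	struct annotation_line_sketch *next;
};

/*
 * Walk the list and return the line at the given offset, or NULL.
 * This costs O(number of lines) per lookup instead of O(1) with an
 * offsets[] array, but needs no extra memory per symbol.
 */
static struct annotation_line_sketch *
get_line_at(struct annotation_line_sketch *head, int64_t offset)
{
	for (struct annotation_line_sketch *al = head; al; al = al->next) {
		if (al->offset == offset)
			return al;
	}
	return NULL;
}
```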

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# bfd98ceb 04-Apr-2024 Namhyung Kim <[email protected]>

perf annotate: Staticize some local functions

I found annotation__mark_jump_targets(), annotation__set_offsets()
and annotation__init_column_widths() are only used in the same file.
Let's make them static.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


Revision tags: v6.9-rc2
# 98f69a57 29-Mar-2024 Namhyung Kim <[email protected]>

perf annotate: Split out util/disasm.c

util/annotate.c contains both disassembly code and sample annotation
code. Factor out the disassembly part so that it can be handled
more easily.

No functional changes intended.

Committer notes:

Add missing include env.h, util.h, bpf-event.h and bpf-util.h to
disasm.c, to fix things like:

  util/disasm.c: In function ‘symbol__disassemble_bpf’:
  util/disasm.c:1203:9: error: implicit declaration of function ‘perf_exe’ [-Werror=implicit-function-declaration]
   1203 |         perf_exe(tpath, sizeof(tpath));
        |         ^~~~~~~~

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>


# 10adbf77 29-Mar-2024 Namhyung Kim <[email protected]>

perf annotate: Add and use ins__is_nop()

Likewise, add ins__is_nop() to check if the current instruction is NOP.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
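
A predicate like the one added in 10adbf77 could be sketched as follows. The commit message does not say how the check is done, so this assumes each instruction points at a table describing its kind and that "is NOP" is a pointer comparison against the NOP table. All names here (`ins_is_nop()`, `nop_ops`, `jump_ops`, the struct layouts) are hypothetical and for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-kind operations table; real tables would hold callbacks. */
struct ins_ops {
	const char *kind;
};

/* Hypothetical instruction record: a mnemonic plus a pointer to its kind. */
struct ins {
	const char *name;
	const struct ins_ops *ops;
};

const struct ins_ops nop_ops = { "nop" };
const struct ins_ops jump_ops = { "jump" };

/* An instruction is a NOP when it is bound to the NOP table (assumption). */
bool ins_is_nop(const struct ins *ins)
{
	assert(ins != NULL);
	return ins->ops == &nop_ops;
}
```

Comparing table pointers avoids string comparisons on the mnemonic and keeps the check correct for every spelling that was mapped to the same table.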


Revision tags: v6.9-rc1
# cbaf89a8 19-Mar-2024 Namhyung Kim <[email protected]>

perf annotate: Parse x86 segment register location

Add a segment field to struct annotated_insn_loc and save it for
segment-based addressing like %gs:0x28. For simplicity, only the
%gs register is handled for now.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
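
To make the %gs:0x28 case from cbaf89a8 concrete, here is a minimal sketch of parsing such an operand. It is not the perf parser: `struct operand_loc`, `enum segment` and `parse_segment_operand()` are hypothetical names, and, like the commit, the sketch recognizes only the %gs prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum segment { SEG_NONE = 0, SEG_GS };

/* Hypothetical result record: which segment was used and the offset in it. */
struct operand_loc {
	enum segment segment;
	long offset;
};

/*
 * Parse an operand of the form "%gs:<number>". Returns true and fills *loc
 * on success; returns false (with *loc reset) for anything else.
 */
bool parse_segment_operand(const char *op, struct operand_loc *loc)
{
	const char *num;
	char *end;

	assert(op != NULL && loc != NULL);
	loc->segment = SEG_NONE;
	loc->offset = 0;

	if (strncmp(op, "%gs:", 4) != 0)
		return false;

	num = op + 4;
	/* Base 0 lets strtol() accept both decimal and 0x-prefixed hex. */
	loc->offset = strtol(num, &end, 0);
	if (end == num) {
		loc->offset = 0;
		return false; /* prefix present but no number after it */
	}

	loc->segment = SEG_GS;
	return true;
}
```

For the operand "%gs:0x28" this records the %gs segment with offset 40 (0x28); an operand such as "%fs:0x28" is rejected because only %gs is handled.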


# 5cdd3fd7 19-Mar-2024 Namhyung Kim <[email protected]>

perf annotate: Add annotate_get_basic_blocks()

annotate_get_basic_blocks() finds the list of basic blocks from
the source instruction to the destination instruction in a function.

It'll be used to find variables in a scope. It uses BFS (Breadth First
Search) to find a shortest path, so that as little variable/register
state as possible has to be carried along.

Also change find_disasm_line() so that it can be used in
annotate_get_basic_blocks(), and add an 'allow_update' argument to
control whether it can update the IP.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
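
The BFS idea in 5cdd3fd7 can be illustrated on a toy control-flow graph. This is a sketch under stated assumptions, not the perf implementation: blocks are array indices, each block has at most two successors (fall-through and branch target), and `struct block`, `MAX_BLOCKS`, `NO_BLOCK` and `shortest_block_path()` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16
#define NO_BLOCK   (-1)

/* Hypothetical basic block: indices of up to two successor blocks. */
struct block {
	int succ[2];
};

/*
 * Breadth-first search from src to dst. On success, writes the block indices
 * of a shortest path (src first, dst last) into path[] and returns its length.
 * Returns 0 if dst cannot be reached from src.
 */
int shortest_block_path(const struct block *blocks, int nr_blocks,
			int src, int dst, int *path)
{
	int prev[MAX_BLOCKS];
	int queue[MAX_BLOCKS];
	int visited[MAX_BLOCKS] = { 0 };
	int head = 0, tail = 0, len = 0;

	assert(nr_blocks > 0 && nr_blocks <= MAX_BLOCKS);
	assert(src >= 0 && src < nr_blocks && dst >= 0 && dst < nr_blocks);

	visited[src] = 1;
	prev[src] = NO_BLOCK;
	queue[tail++] = src;

	while (head < tail) {
		int cur = queue[head++];

		if (cur == dst)
			break;
		for (int i = 0; i < 2; i++) {
			int next = blocks[cur].succ[i];

			if (next == NO_BLOCK || visited[next])
				continue;
			visited[next] = 1;
			prev[next] = cur; /* remember how we got here */
			queue[tail++] = next;
		}
	}

	if (!visited[dst])
		return 0;

	/* Count the path length by walking predecessors back from dst. */
	for (int b = dst; b != NO_BLOCK; b = prev[b])
		len++;
	/* Fill path[] back to front so it reads src..dst. */
	for (int b = dst, i = len - 1; b != NO_BLOCK; b = prev[b], i--)
		path[i] = b;

	return len;
}
```

Because BFS visits blocks in order of distance from the source, the first time the destination is reached the recorded predecessor chain is a shortest path, which keeps the number of blocks whose state must be tracked as small as possible.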


Revision tags: v6.8
# 0f66dfe7 04-Mar-2024 Namhyung Kim <[email protected]>

perf annotate: Add comments in the data structures

Reviewed-by: Ian Rogers <[email protected]>
Reviewed-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

