History log of /linux-6.15/tools/perf/util/annotate.c (Results 1 – 25 of 468)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# 30c5a394 10-Mar-2025 Namhyung Kim <[email protected]>

perf annotate: Implement code + data type annotation

Sometimes it's useful to see both instructions and their data type
together. Let's extend the annotate code to use data type profiling
functions

perf annotate: Implement code + data type annotation

Sometimes it's useful to see both instructions and their data type
together. Let's extend the annotate code to use data type profiling
functions.

To make it easy to pass more argument, introduce a struct to carry
necessary information together. Also add a new annotation_option called
'code_with_type' to control the behavior. This is not enabled yet but
it'll be set later from the command line.

For simplicity, this is implemented for --stdio only.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


# 236ee256 10-Mar-2025 Namhyung Kim <[email protected]>

perf annotate: Factor out __hist_entry__get_data_type()

So that it can only handle a single disasm_linme and hopefully make the
code simpler. This is also a preparation to be called from different

perf annotate: Factor out __hist_entry__get_data_type()

So that it can only handle a single disasm_linme and hopefully make the
code simpler. This is also a preparation to be called from different
places later.

The NO_TYPE macro was added to distinguish when it failed or needs retry.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


# fe8da669 10-Mar-2025 Namhyung Kim <[email protected]>

perf annotate: Pass hist_entry to annotate functions

It's a prepartion to support code annotation and data type
annotation at the same time. Data type annotation needs more
information in the hist_

perf annotate: Pass hist_entry to annotate functions

It's a prepartion to support code annotation and data type
annotation at the same time. Data type annotation needs more
information in the hist_entry so it needs to be passed deeper.

Also rename a function with the same name in the builtin-annotate.c
to hist_entry__stdio_annotate since it matches better to the command
line option. And change the condition inside to be simpler.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


# 9aa3cbbf 10-Mar-2025 Namhyung Kim <[email protected]>

perf annotate: Pass annotation_options to annotation_line__print()

The annotation_line__print() has many arguments. But min_percent,
max_lines and percent_type are from struct annotaion_options. S

perf annotate: Pass annotation_options to annotation_line__print()

The annotation_line__print() has many arguments. But min_percent,
max_lines and percent_type are from struct annotaion_options. So let's
pass a pointer to the option instead of passing them separately to
reduce the number of function arguments.

Actually it has a recursive call if 'queue' is set. Add a new option
instance to pass different values for the case.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


# 1f284082 10-Mar-2025 Namhyung Kim <[email protected]>

perf annotate: Remove unused len parameter from annotation_line__print()

It's not used anywhere, let's get rid of it.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/20

perf annotate: Remove unused len parameter from annotation_line__print()

It's not used anywhere, let's get rid of it.

Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1
# bde4ccfd 24-Jan-2025 Ian Rogers <[email protected]>

perf annotate: Use an array for the disassembler preference

Prior to this change a string was used which could cause issues with
an unrecognized disassembler in symbol__disassembler. Change to
initi

perf annotate: Use an array for the disassembler preference

Prior to this change a string was used which could cause issues with
an unrecognized disassembler in symbol__disassembler. Change to
initializing an array of perf_disassembler enum values. If a value
already exists then adding it a second time is ignored to avoid array
out of bounds problems present in the previous code, it also allows a
statically sized array and removes memory allocation needs. Errors in
the disassembler string are reported when the config is parsed during
perf annotate or perf top start up. If the array is uninitialized
after processing the config file the default llvm, capstone then
objdump values are added but without a need to parse a string.

Fixes: a6e8a58de629 ("perf disasm: Allow configuring what disassemblers to use")
Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


Revision tags: v6.13
# 035f0c27 17-Jan-2025 Ian Rogers <[email protected]>

perf annotate: Prefer passing evsel to evsel->core.idx

An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in anno

perf annotate: Prefer passing evsel to evsel->core.idx

An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in annotate code. Internally the code just reads evsel->core.idx
so behavior is unchanged.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Chen Ni <[email protected]>
Cc: Athira Rajeev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12
# a6e8a58d 11-Nov-2024 Arnaldo Carvalho de Melo <[email protected]>

perf disasm: Allow configuring what disassemblers to use

The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then

perf disasm: Allow configuring what disassemblers to use

The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then
parse and augment it with samples, allow navigation, etc.

More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.

So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.

This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.

As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, elliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.

root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax

root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent

Disassembly of section .text:

ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4

ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8

/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}

/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#

After adding a way to select the disassembler from the command line a
'perf test' comparing the output of the various diassemblers should be
introduced, to test these codebases.

Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


Revision tags: v6.12-rc7
# 9fc4489a 08-Nov-2024 Ian Rogers <[email protected]>

perf dwarf-regs: Pass accurate disassembly machine to get_dwarf_regnum

Rather than pass 0/EM_NONE, use the value computed in the disasm
struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_

perf dwarf-regs: Pass accurate disassembly machine to get_dwarf_regnum

Rather than pass 0/EM_NONE, use the value computed in the disasm
struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_NONE if
it were passed to get_dwarf_regnum. Pass a flags value as
architectures like csky need the flags to determine the ABI variant.

Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Shenlin Liang <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Guilherme Amadio <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Alexander Lobakin <[email protected]>
Cc: Przemek Kitszel <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: James Clark <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Chen Pei <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Aditya Gupta <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Bibo Mao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Dima Kogan <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Dr. David Alan Gilbert <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>

show more ...


Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11
# edf3ce0e 09-Sep-2024 Kan Liang <[email protected]>

perf env: Find correct branch counter info on hybrid

No event is printed in the "Branch Counter" column on hybrid machines.

For example,

$ perf record -e "{cpu_core/branch-instructions/pp,cpu_co

perf env: Find correct branch counter info on hybrid

No event is printed in the "Branch Counter" column on hybrid machines.

For example,

$ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter
$ perf report --total-cycles

# Branch counter abbr list:
# cpu_core/branch-instructions/pp = A
# cpu_core/branches/ = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter
# ............... .............. ........... .......... ..............
44.54% 727.1K 0.00% 1 |+ |+ |
36.31% 592.7K 0.00% 2 |+ |+ |
17.83% 291.1K 0.00% 1 |+ |+ |

The branch counter information (br_cntr_width and br_cntr_nr) in the
perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS
is not available on hybrid machines. Without the width information, the
number of occurrences of an event cannot be calculated.

For a hybrid machine, the caps information should be retrieved from the
PMU_CAPS, and stored in the perf_env->pmu_caps.

Add a perf_env__find_br_cntr_info() to return the correct branch counter
information from the corresponding fields.

Committer notes:

While testing I couldn't s ee those "Branch counter" columns enabled by
pressing 'B' on the TUI, after reporting it to the list Kan explained
the situation:

<quote Kan Liang>
For a hybrid client, the "Branch Counter" feature is only supported
starting from the just released Lunar Lake. Perf falls back to only
"ANY" on your Raptor Lake.

The "The branch counter is not available" message is expected.

Here is the 'perf evlist' result from my Lunar Lake machine,

# perf evlist -v
cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS
#
</quote>

Fixes: 6f9d8d1de2c61288 ("perf script: Add branch counters")
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# c8b93587 09-Sep-2024 Namhyung Kim <[email protected]>

perf annotate: Treat 'call' instruction as stack operation

I found some portion of mem-store events sampled on CALL instruction
which has no memory access. But it actually saves a return address
in

perf annotate: Treat 'call' instruction as stack operation

I found some portion of mem-store events sampled on CALL instruction
which has no memory access. But it actually saves a return address
into stack. It should be considered as a stack operation like RET
instruction.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4
# e6952dce 13-Aug-2024 Kan Liang <[email protected]>

perf annotate: Display the branch counter histogram

Display the branch counter histogram in the annotation view.

Press 'B' to display the branch counter's abbreviation list as well.

Samples: 1M

perf annotate: Display the branch counter histogram

Display the branch counter histogram in the annotation view.

Press 'B' to display the branch counter's abbreviation list as well.

Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3 /home/sdp/test/tchain_edit [Percent: local period]
Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
│ 0000000000401755 <f3>:
0.00 0.00 │ endbr64
│ push %rbp
│ mov %rsp,%rbp
│ movl $0x0,-0x4(%rbp)
0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
11.03 11.03 │ 11: mov -0x4(%rbp),%eax
│ and $0x1,%eax
│ test %eax,%eax
17.13 17.13 │2.41 1 |A |- | ↓ je 21
│ addl $0x1,-0x4(%rbp)
21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
11.03 11.03 │0.61 3 |A |- | ↑ jle 11
│ nop
│ pop %rbp
0.00 0.00 │0.24 20 |AA |B | ← ret

Originally-by: Tinghao Zhang <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# 20d6f555 13-Aug-2024 Kan Liang <[email protected]>

perf report: Display the branch counter histogram

Reusing the existing --total-cycles option to display the branch
counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
the logged bra

perf report: Display the branch counter histogram

Reusing the existing --total-cycles option to display the branch
counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
the logged branch counter events. They are shown right after all the
cycle-related annotations.

Extend the 'struct block_info' to store and pass the branch counter
related information.

The annotation_br_cntr_entry() is to print the histogram of each branch
counter event. If the number of logged events is less than 4, the exact
number of the abbr name is printed. Otherwise, using '+' to stands for
more than 3 events.

Assume the number of logged events is less than 4.

The annotation_br_cntr_abbr_list() prints the branch counter's
abbreviation list. Press 'B' to display the list in the TUI mode.

$ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
$ perf report --total-cycles --stdio

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
# Event count (approx.): 1610046
#
# Branch counter abbr list:
# branch-instructions:ppp = A
# branch-misses = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 |A |- | ...
25.27% 1.1M 0.00% 2 |AA |- | ...
15.61% 667.2K 0.00% 1 |A |- | ...
0.16% 6.9K 0.81% 575 |A |- | ...
0.16% 6.8K 1.38% 977 |AA |- | ...
0.16% 6.8K 0.04% 28 |AA |B | ...
0.15% 6.6K 1.33% 946 |A |- | ...
0.11% 4.5K 0.06% 46 |AAA+|- | ...
0.10% 4.4K 0.88% 624 |A |- | ...
0.09% 3.7K 0.74% 524 |AAA+|B | ...

With -v applied,

# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 A=1 ,B=- ...
25.27% 1.1M 0.00% 2 A=2 ,B=- ...
15.61% 667.2K 0.00% 1 A=1 ,B=- ...
0.16% 6.9K 0.81% 575 A=1 ,B=- ...
0.16% 6.8K 1.38% 977 A=2 ,B=- ...
0.16% 6.8K 0.04% 28 A=2 ,B=1 ...
0.15% 6.6K 1.33% 946 A=1 ,B=- ...
0.11% 4.5K 0.06% 46 A=3+,B=- ...
0.10% 4.4K 0.88% 624 A=1 ,B=- ...
0.09% 3.7K 0.74% 524 A=3+,B=1 ...

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# 1f2b7fbb 13-Aug-2024 Kan Liang <[email protected]>

perf annotate: Save branch counters for each block

When annotating a basic block, it's useful to display the occurrences
of other events in the block.

The branch counter feature is only available f

perf annotate: Save branch counters for each block

When annotating a basic block, it's useful to display the occurrences
of other events in the block.

The branch counter feature is only available for newer Intel platforms.

So a dedicated option to display the branch counters is not introduced.

Reuse the existing --total-cycles option, which triggers the annotation
of a basic block and displays the cycle-related annotation.

When the branch counters information is available, the branch counters
are automatically appended after all the cycle-related annotation.

Accounting the branch counters as well when accounting the cycles in
hist__account_cycles().

In 'struct annotated_branch', introduce a br_cntr array to save the
accumulation of each branch counter.

In a sample, all the branch counters for a branch are saved in a u64
space.

Because the saturation of a branch counter is small, e.g., for Intel
Sierra Forest, the saturation is only 3.

Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter
once saturated. That can be used to indicate a potential event lost
because of the saturation.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


Revision tags: v6.11-rc3
# 037f1b67 05-Aug-2024 Namhyung Kim <[email protected]>

perf annotate: Cache debuginfo for data type profiling

In find_data_type(), it creates and deletes a debug info whenver it
tries to find data type for a sample. This is inefficient and it most
like

perf annotate: Cache debuginfo for data type profiling

In find_data_type(), it creates and deletes a debug info whenver it
tries to find data type for a sample. This is inefficient and it most
likely accesses the same binary again and again.

Let's add a single entry cache the debug info structure for the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.

Note that this will introduce a little difference in the output due to
the order of checking stack operations. It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check. So it'll report stack operations in DSOs without debug
info as unknown. But I think it's ok and better to have the checking
near the caching logic.

Committer testing:

root@x1:~# perf mem record -a sleep 5s
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~# diff -u before after
--- before 2024-08-08 09:33:53.880780784 -0300
+++ after 2024-08-08 09:35:13.917325041 -0300
@@ -81,8 +81,8 @@
# Overhead Data Type
# ........ .........
#
- 55.43% (unknown)
- 11.61% (stack operation)
+ 55.56% (unknown)
+ 11.48% (stack operation)
4.93% struct pcpu_hot
3.26% unsigned int
2.48% struct

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


Revision tags: v6.11-rc2
# ce533c9b 03-Aug-2024 Namhyung Kim <[email protected]>

perf annotate: Add --skip-empty option

Like in 'perf report', we want to hide empty events in the 'perf annotate'
output. This is consistent when the option is set in perf report.

For example, the

perf annotate: Add --skip-empty option

Like in 'perf report', we want to hide empty events in the 'perf annotate'
output. This is consistent when the option is set in perf report.

For example, the following command would use 3 events including dummy.

$ perf mem record -a -- perf test -w noploop

$ perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u

Just using perf annotate with --group will show the all 3 events.

$ perf annotate --group --stdio | head
Percent | Source code & Disassembly of ...
--------------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 0.00 : e060: pushq %rbp
0.00 0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 0.00 : e064: pushq %r15
0.00 0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 0.00 : e069: pushq %r14
0.00 0.00 0.00 : e06b: pushq %r13
0.00 0.00 0.00 : e06d: movl %edx, %r13d

Now with --skip-empty, it'll hide the last dummy event.

$ perf annotate --group --stdio --skip-empty | head
Percent | Source code & Disassembly of ...
------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 : e060: pushq %rbp
0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 : e064: pushq %r15
0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 : e069: pushq %r14
0.00 0.00 : e06b: pushq %r13
0.00 0.00 : e06d: movl %edx, %r13d

Committer testing:

root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~#

Before:

root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 0.00 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#

After:

root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# b00e4d0d 03-Aug-2024 Namhyung Kim <[email protected]>

perf annotate: Use annotation__pcnt_width() consistently

The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly. Use this
functio

perf annotate: Use annotation__pcnt_width() consistently

The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly. Use this
function consistently so that we can make sure it has similar output
in different modes. But there's a difference in stdio and tui output:
stdio uses 8 and tui uses 7 for a percent.

Let's use 8 and adjust the print width in __annotation_line__write()
properly.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# cb1e8bfc 03-Aug-2024 Namhyung Kim <[email protected]>

perf annotate: Set notes->src->nr_events early

We want to use it in different places so make sure it sets properly
in symbol__annotate() before creating the disasm lines.

Signed-off-by: Namhyung Ki

perf annotate: Set notes->src->nr_events early

We want to use it in different places so make sure it sets properly
in symbol__annotate() before creating the disasm lines.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# 2dc02c26 03-Aug-2024 Namhyung Kim <[email protected]>

perf annotate: Use al->data_nr if possible

The data_nr keeps the number of entries in al->data[] so it should use
it when it iterates the array. The notes->src->nr_events should have
the same numbe

perf annotate: Use al->data_nr if possible

The data_nr keeps the number of entries in al->data[] so it should use
it when it iterates the array. The notes->src->nr_events should have
the same number but it'd be natural to use al->data_nr.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


Revision tags: v6.11-rc1
# 2c9db747 18-Jul-2024 Athira Rajeev <[email protected]>

perf annotate: Set instruction name to be used with insn-stat when using raw instruction

Since the "ins.name" is not set while using raw instruction,
'perf annotate' with insn-stat gives wrong data:

perf annotate: Set instruction name to be used with insn-stat when using raw instruction

Since the "ins.name" is not set while using raw instruction,
'perf annotate' with insn-stat gives wrong data:

Result from "./perf annotate --data-type --insn-stat":

Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)

Name : Good Bad
-----------------------------------------------------------
: 419 196

This patch sets "dl->ins.name" in arch specific function
"check_ppc_insn" while initialising "struct disasm_line".

Also update "ins_find" function to pass "struct disasm_line" as a
parameter so as to set its name field in arch specific call.

With the patch changes:

Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)

Name/opcode : Good Bad
-----------------------------------------------------------
58 : 323 80
32 : 49 43
34 : 33 11
OP_31_XOP_LDX : 8 20
40 : 23 0
OP_31_XOP_LWARX : 5 1
OP_31_XOP_LWZX : 2 3
OP_31_XOP_LDARX : 3 0
33 : 0 2
OP_31_XOP_LBZX : 0 1
OP_31_XOP_LWAX : 0 1
OP_31_XOP_LHZX : 0 1

Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# 1b4406d2 18-Jul-2024 Athira Rajeev <[email protected]>

perf annotate: Update parameters for reg extract functions to use raw instruction on powerpc

Use the raw instruction code and macros to identify memory instructions,
extract register fields and also

perf annotate: Update parameters for reg extract functions to use raw instruction on powerpc

Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset.

The implementation addresses the D-form, X-form, DS-form instructions.

Adds "mem_ref" field to check whether source/target has memory
reference.

Add function "get_powerpc_regs" which will set these fields: reg1, reg2,
offset depending of where it is source or target ops.

Update "parse" callback for "struct ins_ops" to also pass "struct
disasm_line" as argument. This is needed in parse functions where opcode
is used to determine whether to set multi_regs and other fields

Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9
# 9ef30265 10-May-2024 Namhyung Kim <[email protected]>

perf annotate: Fix segfault on sample histogram

A symbol can have no samples, then accessing the annotated_source->samples
hashmap will result in a segfault.

Fixes: a3f7768bcf48281d ("perf annotate

perf annotate: Fix segfault on sample histogram

A symbol can have no samples, then accessing the annotated_source->samples
hashmap will result in a segfault.

Fixes: a3f7768bcf48281d ("perf annotate: Fix memory leak in annotated_source")
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# a3f7768b 07-May-2024 Ian Rogers <[email protected]>

perf annotate: Fix memory leak in annotated_source

Freeing hash map doesn't free the entries added to the hashmap, add
the missing free().

Fixes: d3e7cad6f36d9e80 ("perf annotate: Add a hashmap for

perf annotate: Fix memory leak in annotated_source

Freeing hash map doesn't free the entries added to the hashmap, add
the missing free().

Fixes: d3e7cad6f36d9e80 ("perf annotate: Add a hashmap for symbol histogram")
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ben Gainey <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Li Dong <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# 36e8aa90 06-May-2024 Athira Rajeev <[email protected]>

perf annotate: Fix a comment about multi_regs in extract_reg_offset function

Fix a comment in function which explains how multi_regs field gets set
for an instruction. In the example, "mov %rsi, 8(

perf annotate: Fix a comment about multi_regs in extract_reg_offset function

Fix a comment in function which explains how multi_regs field gets set
for an instruction. In the example, "mov %rsi, 8(%rbx,%rcx,4)", the
comment mistakenly referred to "dst_multi_regs = 0". Correct it to use
"src_multi_regs = 0"

Signed-off-by: Athira Rajeev <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


# 69fb6eab 07-May-2024 Arnaldo Carvalho de Melo <[email protected]>

perf annotate: Use zfree() to avoid possibly accessing dangling pointers

When freeing a->b it is good practice to set a->b to NULL using
zfree(&a->b) so that when we have a bug where a reference to

perf annotate: Use zfree() to avoid possibly accessing dangling pointers

When freeing a->b it is good practice to set a->b to NULL using
zfree(&a->b) so that when we have a bug where a reference to a freed 'a'
pointer is kept somewhere, we can more quickly cause a segfault if some
code tries to use a->b.

This is mostly done but some new cases were introduced recently, convert
them to zfree().

Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/ZjmbHHrjIm5YRIBv@x1
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

show more ...


12345678910>>...19