|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
b0920abe |
| 05-Mar-2025 |
Namhyung Kim <[email protected]> |
perf report: Do not process non-JIT BPF ksymbol events
The length of PERF_RECORD_KSYMBOL for BPF is a size of JITed code so it'd be 0 when it's not JITed. The ksymbol is needed to symbolize the cod
perf report: Do not process non-JIT BPF ksymbol events
The length of PERF_RECORD_KSYMBOL for BPF is a size of JITed code so it'd be 0 when it's not JITed. The ksymbol is needed to symbolize the code when it gets samples in the region but non-JITed code cannot get samples. Thus it'd be ok to ignore them.
Actually it caused a performance issue in the perf tools on old ARM kernels where it can refuse to JIT some BPF codes. It ended up splitting the existing kernel map (kallsyms). And later lookup for a kernel symbol would create a new kernel map from kallsyms and then split it again and again. :(
Probably there's a bug in the kernel map/symbol handling in perf tools. But I think we need to fix this anyway.
Reported-by: Kevin Nomura <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5 |
|
| #
41453107 |
| 28-Feb-2025 |
Namhyung Kim <[email protected]> |
perf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel maps
This was detected at the end of a 'perf record' session when build-id collection was enabled and thus the BPF programs put in p
perf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel maps
This was detected at the end of a 'perf record' session when build-id collection was enabled and thus the BPF programs put in place while the session was running, some even put in place by perf itself were processed and inserted, with some overlaps related to BPF trampolines and programs took place.
Using maps__fixup_overlap_and_insert() instead of maps__insert() "fixes" the problem, in the sense that overlaps will be dealt with and then the consistency will be kept, but it would be interesting to fully understand why such overlaps take place and how to deal with them when doing symbol resolution.
Reported-by: Arnaldo Carvalho de Melo <[email protected]> Suggested-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/lkml/CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
f7a46e02 |
| 28-Feb-2025 |
Namhyung Kim <[email protected]> |
perf machine: Fixup kernel maps ends after adding extra maps
I just noticed it would add extra kernel maps after modules. I think it should fixup end address of the kernel maps after adding all map
perf machine: Fixup kernel maps ends after adding extra maps
I just noticed it would add extra kernel maps after modules. I think it should fixup end address of the kernel maps after adding all maps first.
Fixes: 876e80cf83d10585 ("perf tools: Fixup end address of modules") Reported-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/lkml/Z7TvZGjVix2asYWI@x1 Link: https://lore.kernel.org/lkml/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4 |
|
| #
e7af1946 |
| 22-Feb-2025 |
Ian Rogers <[email protected]> |
perf machine: Reuse module path buffer
Rather than copying the path and appending the directory entry in a fresh path buffer, append to the path at the end of where it is for the recursion level. Th
perf machine: Reuse module path buffer
Rather than copying the path and appending the directory entry in a fresh path buffer, append to the path at the end of where it is for the recursion level. This saves a PATH_MAX buffer per recursion level and some unnecessary copying.
Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
f7cada5f |
| 22-Feb-2025 |
Ian Rogers <[email protected]> |
perf maps: Switch modules tree walk to io_dir__readdir
Compared to glibc's opendir/readdir this lowers the max RSS of perf record by 1.8MB on a Debian machine.
Acked-by: Namhyung Kim <namhyung@kern
perf maps: Switch modules tree walk to io_dir__readdir
Compared to glibc's opendir/readdir this lowers the max RSS of perf record by 1.8MB on a Debian machine.
Acked-by: Namhyung Kim <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc3, v6.14-rc2 |
|
| #
4bac7fb5 |
| 06-Feb-2025 |
Krzysztof Łopatowski <[email protected]> |
perf tools: Improve startup time by reducing unnecessary stat() calls
When testing perf trace on NixOS, I noticed significant startup delays: - `ls`: ~2ms - `strace ls`: ~10ms - `perf trace ls`: ~55
perf tools: Improve startup time by reducing unnecessary stat() calls
When testing perf trace on NixOS, I noticed significant startup delays: - `ls`: ~2ms - `strace ls`: ~10ms - `perf trace ls`: ~550ms
Profiling showed that 51% of the time is spent reading files, 26% in loading BPF programs, and 11% in `newfstatat`.
This patch optimizes module path exploration by avoiding `stat()` calls unless necessary. For filesystems that do not implement `d_type` (DT_UNKNOWN), it falls back to the old behavior. See `readdir(3)` for details.
This reduces `perf trace ls` time to ~500ms.
A more thorough startup optimization based on command parameters would be ideal, but that is a larger effort.
Signed-off-by: Krzysztof Łopatowski <[email protected]> Acked-by: Howard Chu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
f13bc61b |
| 13-Feb-2025 |
Dmitry Vyukov <[email protected]> |
perf report: Add machine parallelism
Add calculation of the current parallelism level (number of threads actively running on CPUs). The parallelism level can be shown in reports on its own, and to c
perf report: Add machine parallelism
Add calculation of the current parallelism level (number of threads actively running on CPUs). The parallelism level can be shown in reports on its own, and to calculate latency overheads.
Signed-off-by: Dmitry Vyukov <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Link: https://lore.kernel.org/r/0f8c1b8eb12619029e31b3d5c0346f4616a5aeda.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1, v6.13 |
|
| #
dc6d2bc2 |
| 13-Jan-2025 |
Ian Rogers <[email protected]> |
perf sample: Make user_regs and intr_regs optional
The struct dump_regs contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. In
perf sample: Make user_regs and intr_regs optional
The struct dump_regs contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. Initializing this much memory has a cost reported by Tavian Barnes <[email protected]> as about 2.5% when running `perf script --itrace=i0`: https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/
Adrian Hunter <[email protected]> replied that the zero initialization was necessary and couldn't simply be removed.
This patch aims to strike a middle ground of still zeroing the perf_sample, but removing 79% of its size by make user_regs and intr_regs optional pointers to zalloc-ed memory. To support the allocation accessors are created for user_regs and intr_regs. To support correct cleanup perf_sample__init and perf_sample__exit functions are created and added throughout the code base.
Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
1df4b33f |
| 04-Feb-2025 |
Dr. David Alan Gilbert <[email protected]> |
perf tools: Deadcode removal
The last use of machine__fprintf_vmlinux_path() was removed in 2011 by commit ab81f3fd350c ("perf top: Reuse the 'report' hist_entry/hists classes")
mmap_cpu_mask__dupl
perf tools: Deadcode removal
The last use of machine__fprintf_vmlinux_path() was removed in 2011 by commit ab81f3fd350c ("perf top: Reuse the 'report' hist_entry/hists classes")
mmap_cpu_mask__duplicate() was added in 2021 by commit 6bd006c6eb7f ("perf mmap: Introduce mmap_cpu_mask__duplicate()") but hasn't been used since.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <[email protected]> Tested-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
876e80cf |
| 18-Dec-2024 |
Namhyung Kim <[email protected]> |
perf tools: Fixup end address of modules
In machine__create_module(), it reads /proc/modules to get a list of modules in the system. The file shows the start address (of text) and the size of the m
perf tools: Fixup end address of modules
In machine__create_module(), it reads /proc/modules to get a list of modules in the system. The file shows the start address (of text) and the size of the module so it uses the info to reconstruct system memory maps for symbol resolution.
But module memory consists of multiple segments and they can be scaterred. Currently perf tools assume they are contiguous and see some overlaps. This can confuse the tool when it finds a map containing a given address.
As we mostly care about the function symbols in the text segment, it can fixup the size or end address of modules when there's an overlap. We can use maps__fixup_end() which updates the end address using the start address of the next map.
Ideally it should be able to track other segments (like data/rodata), but that would require some changes in /proc/modules IMHO.
Reported-by: Blake Jones <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Daniel Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Pavlu <[email protected]> Cc: Sami Tolvanen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
7a93786c |
| 08-Jan-2025 |
Christophe Leroy <[email protected]> |
perf machine: Don't ignore _etext when not a text symbol
Depending on how vmlinux.lds is written, _etext might be the very first data symbol instead of the very last text symbol.
Don't require it t
perf machine: Don't ignore _etext when not a text symbol
Depending on how vmlinux.lds is written, _etext might be the very first data symbol instead of the very last text symbol.
Don't require it to be a text symbol, accept any symbol type.
Comitter notes:
See the first Link for further discussion, but it all boils down to this:
--- # grep -e _stext -e _etext -e _edata /proc/kallsyms c0000000 T _stext c08b8000 D _etext
So there is no _edata and _etext is not text
$ ppc-linux-objdump -x vmlinux | grep -e _stext -e _etext -e _edata c0000000 g .head.text 00000000 _stext c08b8000 g .rodata 00000000 _etext c1378000 g .sbss 00000000 _edata ---
Fixes: ed9adb2035b5be58 ("perf machine: Read also the end of the kernel") Signed-off-by: Christophe Leroy <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Link: https://lore.kernel.org/r/b3ee1994d95257cb7f2de037c5030ba7d1bed404.1736327613.git.christophe.leroy@csgroup.eu Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
88a6e2f6 |
| 26-Nov-2024 |
Arnaldo Carvalho de Melo <[email protected]> |
perf machine: Initialize machine->env to address a segfault
Its used from trace__run(), for the 'perf trace' live mode, i.e. its strace-like, non-perf.data file processing mode, the most common one.
perf machine: Initialize machine->env to address a segfault
Its used from trace__run(), for the 'perf trace' live mode, i.e. its strace-like, non-perf.data file processing mode, the most common one.
The trace__run() function will set trace->host using machine__new_host() that is supposed to give a machine instance representing the running machine, and since we'll use perf_env__arch_strerrno() to get the right errno -> string table, we need to use machine->env, so initialize it in machine__new_host().
Before the patch:
(gdb) run trace --errno-summary -a sleep 1 <SNIP> Summary of events:
gvfs-afc-volume (3187), 2 events, 0.0%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ pselect6 1 0 0.000 0.000 0.000 0.000 0.00%
GUsbEventThread (3519), 2 events, 0.0%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 1 0 0.000 0.000 0.000 0.000 0.00% <SNIP> Program received signal SIGSEGV, Segmentation fault. 0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 478 if (env->arch_strerrno == NULL) (gdb) bt #0 0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 #1 0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673 #2 0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708 #3 0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747 #4 0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456 #5 0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487 #6 0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351 #7 0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404 #8 0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448 #9 0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560 (gdb)
After:
root@number:~# perf trace -a --errno-summary sleep 1 <SNIP> pw-data-loop (2685), 1410 events, 16.0%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 188 0 983.428 0.000 5.231 15.595 8.68% ioctl 94 0 0.811 0.004 0.009 0.016 2.82% read 188 0 0.322 0.001 0.002 0.006 5.15% write 141 0 0.280 0.001 0.002 0.018 8.39% timerfd_settime 94 0 0.138 0.001 0.001 0.007 6.47%
gnome-control-c (179406), 1848 events, 20.9%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 222 0 959.577 0.000 4.322 21.414 11.40% recvmsg 150 0 0.539 0.001 0.004 0.013 5.12% write 300 0 0.442 0.001 0.001 0.007 3.29% read 150 0 0.183 0.001 0.001 0.009 5.53% getpid 102 0 0.101 0.000 0.001 0.008 7.82%
root@number:~#
Fixes: 54373b5d53c1f6aa ("perf env: Introduce perf_env__arch_strerrno()") Reported-by: Veronika Molnarova <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Veronika Molnarova <[email protected]> Acked-by: Michael Petlan <[email protected]> Tested-by: Michael Petlan <[email protected]> Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1 Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
05a62936 |
| 10-Oct-2024 |
Veronika Molnarova <[email protected]> |
perf dso: Fix symtab_type for kmod compression
During the rework of the dso structure in patch ee756ef7491eafd an increment was forgotten for the symtab_type in case the data for the kernel module a
perf dso: Fix symtab_type for kmod compression
During the rework of the dso structure in patch ee756ef7491eafd an increment was forgotten for the symtab_type in case the data for the kernel module are compressed. This affects the probing of the kernel modules, which fails if the data are not already cached.
Increment the value of the symtab_type to its compressed variant so the data could be recovered successfully.
Fixes: ee756ef7491eafd7 ("perf dso: Add reference count checking and accessor functions") Signed-off-by: Veronika Molnarova <[email protected]> Acked-by: Michael Petlan <[email protected]> Acked-by: Namhyung Kim <[email protected]> Tested-by: Michael Petlan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
02b27050 |
| 09-Sep-2024 |
Ian Rogers <[email protected]> |
perf callchain: Allow symbols to be optional when resolving a callchain
In uses like 'perf inject' it is not necessary to gather the symbol for each call chain location, the map for the sample IP is
perf callchain: Allow symbols to be optional when resolving a callchain
In uses like 'perf inject' it is not necessary to gather the symbol for each call chain location, the map for the sample IP is wanted so that build IDs and the like can be injected. Make gathering the symbol in the callchain_cursor optional.
For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to 29.6MB by avoiding loading symbols.
Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Casey Chen <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
1a5474a7 |
| 20-Aug-2024 |
Namhyung Kim <[email protected]> |
perf tools: Print lost samples due to BPF filter
Print the actual dropped sample count in the event stat.
$ sudo perf record -o- -e cycles --filter 'period < 10000' \ -e instructions --filt
perf tools: Print lost samples due to BPF filter
Print the actual dropped sample count in the event stat.
$ sudo perf record -o- -e cycles --filter 'period < 10000' \ -e instructions --filter 'ip > 0x8000000000000000' perf test -w noploop | \ perf report --stat -i- [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.058 MB - ]
Aggregated stats: TOTAL events: 469 MMAP events: 268 (57.1%) COMM events: 2 ( 0.4%) EXIT events: 1 ( 0.2%) SAMPLE events: 16 ( 3.4%) MMAP2 events: 22 ( 4.7%) LOST_SAMPLES events: 2 ( 0.4%) KSYMBOL events: 89 (19.0%) BPF_EVENT events: 39 ( 8.3%) ATTR events: 2 ( 0.4%) FINISHED_ROUND events: 1 ( 0.2%) ID_INDEX events: 1 ( 0.2%) THREAD_MAP events: 1 ( 0.2%) CPU_MAP events: 1 ( 0.2%) EVENT_UPDATE events: 2 ( 0.4%) TIME_CONV events: 1 ( 0.2%) FEATURE events: 20 ( 4.3%) FINISHED_INIT events: 1 ( 0.2%) cycles stats: SAMPLE events: 2 LOST_SAMPLES (BPF) events: 4010 instructions stats: SAMPLE events: 14 LOST_SAMPLES (BPF) events: 3990
Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: KP Singh <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4 |
|
| #
e4bb4caa |
| 17-Aug-2024 |
Ian Rogers <[email protected]> |
perf dso: Constify dso_id
The passed dso_id is copied and so is never an out argument. Remove its mutability.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]
perf dso: Constify dso_id
The passed dso_id is copied and so is never an out argument. Remove its mutability.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
1f2b7fbb |
| 13-Aug-2024 |
Kan Liang <[email protected]> |
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available f
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters are automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64 space.
Because the saturation of a branch counter is small, e.g., for Intel Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter once saturated. That can be used to indicate a potential event lost because of the saturation.
Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc3 |
|
| #
599c1939 |
| 08-Aug-2024 |
Ian Rogers <[email protected]> |
perf callchain: Fix stitch LBR memory leaks
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps and map members are reference counted. Ensure these values use a _get routine to i
perf callchain: Fix stitch LBR memory leaks
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps and map members are reference counted. Ensure these values use a _get routine to increment the reference counts and use map_symbol__exit() to release the reference counts.
Do similar for 'struct thread's prev_lbr_cursor, but save the size of the prev_lbr_cursor array so that it may be iterated.
Ensure that when stitch_nodes are placed on the free list the map_symbols are exited.
Fix resolve_lbr_callchain_sample() by replacing list_replace_init() to list_splice_init(), so the whole list is moved and nodes aren't leaked.
A reproduction of the memory leaks is possible with a leak sanitizer build in the perf report command of:
``` $ perf record -e cycles --call-graph lbr perf test -w thloop $ perf report --stitch-lbr ```
Reviewed-by: Kan Liang <[email protected]> Fixes: ff165628d72644e3 ("perf callchain: Stitch LBR call stack") Signed-off-by: Ian Rogers <[email protected]> [ Basic tests after applying the patch, repeating the example above ] Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Changbin Du <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9 |
|
| #
1a8c2e01 |
| 07-May-2024 |
Ian Rogers <[email protected]> |
perf mem-info: Add reference count checking
Add reference count checking and switch 'struct mem_info' usage to use accessor functions.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunt
perf mem-info: Add reference count checking
Add reference count checking and switch 'struct mem_info' usage to use accessor functions.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
ad3003a6 |
| 07-May-2024 |
Ian Rogers <[email protected]> |
perf mem-info: Move mem-info out of mem-events and symbol
Move mem-info to its own header rather than having it split between mem-events and symbol.
Signed-off-by: Ian Rogers <[email protected]> C
perf mem-info: Move mem-info out of mem-events and symbol
Move mem-info to its own header rather than having it split between mem-events and symbol.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
ee5061f8 |
| 06-May-2024 |
Ian Rogers <[email protected]> |
perf symbol-elf: Ensure dso__put() in machine__process_ksymbol_register()
The dso__put() after the map creation causes a use after put in dso__set_loaded().
To ensure there is a +1 reference count
perf symbol-elf: Ensure dso__put() in machine__process_ksymbol_register()
The dso__put() after the map creation causes a use after put in dso__set_loaded().
To ensure there is a +1 reference count on both sides of the if-else, do a dso__get() on the found map's dso.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Changbin Du <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tiezhu Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc7 |
|
| #
ee756ef7 |
| 04-May-2024 |
Ian Rogers <[email protected]> |
perf dso: Add reference count checking and accessor functions
Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCES
perf dso: Add reference count checking and accessor functions
Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCESS everywhere, add accessor functions for the variables in struct dso.
The majority of the change is mechanical in nature and not easy to split up.
Committer testing:
'perf test' up to this patch shows no regressions.
But:
util/symbol.c: In function ‘dso__load_bfd_symbols’: util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’ 1683 | dso__set_adjust_symbols(dso); | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from util/symbol.c:21: util/dso.h:268:20: note: declared here 268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val) | ^~~~~~~~~~~~~~~~~~~~~~~ make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1 MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/ make[6]: *** Waiting for unfinished jobs....
This was updated:
- symbols__fixup_end(&dso->symbols, false); - symbols__fixup_duplicate(&dso->symbols); - dso->adjust_symbols = 1; + symbols__fixup_end(dso__symbols(dso), false); + symbols__fixup_duplicate(dso__symbols(dso)); + dso__set_adjust_symbols(dso);
But not build tested with BUILD_NONDISTRO and libbfd devel files installed (binutils-devel on fedora).
Add the missing argument:
symbols__fixup_end(dso__symbols(dso), false); symbols__fixup_duplicate(dso__symbols(dso)); - dso__set_adjust_symbols(dso); + dso__set_adjust_symbols(dso, true);
Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahelenia Ziemiańska <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Changbin Du <[email protected]> Cc: Chengen Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dima Kogan <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Tiezhu Yang <[email protected]> Cc: Yanteng Si <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
0ffc8fca |
| 10-Apr-2024 |
Ian Rogers <[email protected]> |
perf dsos: Switch more loops to dsos__for_each_dso()
Switch loops within dsos.c, add a version that isn't locked. Switch some unlocked loops to hold the read lock.
Signed-off-by: Ian Rogers <iroger
perf dsos: Switch more loops to dsos__for_each_dso()
Switch loops within dsos.c, add a version that isn't locked. Switch some unlocked loops to hold the read lock.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Changbin Du <[email protected]> Cc: Chengen Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Markus Elfring <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
73f3fea2 |
| 10-Apr-2024 |
Ian Rogers <[email protected]> |
perf dsos: Introduce dsos__for_each_dso()
To better abstract the dsos internals, introduce dsos__for_each_dso that does a callback on each dso.
This also means the read lock can be correctly held.
perf dsos: Introduce dsos__for_each_dso()
To better abstract the dsos internals, introduce dsos__for_each_dso that does a callback on each dso.
This also means the read lock can be correctly held.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Changbin Du <[email protected]> Cc: Chengen Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Markus Elfring <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
| #
f649ed80 |
| 10-Apr-2024 |
Ian Rogers <[email protected]> |
perf dsos: Tidy reference counting and locking
Move more functionality into dsos.c generally from machine.c, renaming functions to match their new usage.
The find function is made to always "get" b
perf dsos: Tidy reference counting and locking
Move more functionality into dsos.c generally from machine.c, renaming functions to match their new usage.
The find function is made to always "get" before returning a dso.
Reduce the scope of locks in vdso to match the locking paradigm.
Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Changbin Du <[email protected]> Cc: Chengen Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Markus Elfring <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|