|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
45a86d01 |
| 04-Mar-2025 |
Namhyung Kim <[email protected]> |
perf test: Add --metric-only to perf stat output tests
Add a test case for --metric-only for std, csv, json output mode using shadow IPC metric from instructions and cycles events. It should produc
perf test: Add --metric-only to perf stat output tests
Add a test case for --metric-only for std, csv, json output mode using shadow IPC metric from instructions and cycles events. It should produce 'insn per cycle' metric.
But currently JSON output has (none) 'GHz' as well. It looks like a bug but I don't have enough time to debug it for now so I made it pass. :(
$ perf stat --metric-only -e instructions,cycles true
Performance counter stats for 'true':
0.56
0.002127319 seconds time elapsed
0.002077000 seconds user 0.000000000 seconds sys
$ perf stat -x, --metric-only -e instructions,cycles true
0.55,,
$ perf stat -j --metric-only -e instructions,cycles true {"insn per cycle" : "0.53", "GHz" : "none"}
$ perf test output -v 5: Test data source output : Ok 31: Sort output of hist entries : Ok 88: perf stat CSV output linter : Ok 90: perf stat JSON output linter : Ok 92: perf stat STD output linter : Ok
Tested-by: Thomas Falcon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Suggested-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
625f4de2 |
| 29-Oct-2024 |
Veronika Molnarova <[email protected]> |
perf test: Parse 'perf stat' Topdown events for aarch64
The 'perf stat' output on aarch64 machines with topdown events wasn't counted for in the 'perf stat STD output linter' test case. Add the topd
perf test: Parse 'perf stat' Topdown events for aarch64
The 'perf stat' output on aarch64 machines with topdown events wasn't counted for in the 'perf stat STD output linter' test case. Add the topdown metric to the skip_metric list as it is done for topdown events on other systems.
The Topdown events are also disabled on aarch64 KVM guests because the value of caps/slots is set to 0 due to the part of the system register being a stub.
This prevents the metric for the topdown events from being computed, leaving the 'perf stat' topdown metric without any value at all.
Add the "TopdownL1" to the skip_metric list as well to handle this possibility.
Before aarch64:
100: perf stat STD output linter: --- start --- test child forked, pid 403305 Checking STD output: no args Unknown event name in TopdownL1 # 4.3 percent of slots slots_lost_misspeculation_fraction ---- end(-1) ---- 100: perf stat STD output linter : FAILED!
Before aarch64 KVM:
100: perf stat STD output linter: --- start --- test child forked, pid 404671 Checking STD output: no args Unknown event name in TopdownL1 ---- end(-1) ---- 100: perf stat STD output linter : FAILED!
After:
100: perf stat STD output linter: --- start --- test child forked, pid 404777 Checking STD output: no args [Success] Checking STD output: system wide [Success] Checking STD output: interval [Success] Checking STD output: per thread [Success] Checking STD output: per node [Success] Checking STD output: system wide no aggregation [Success] Checking STD output: per core [Success] Checking STD output: per cache instance [Success] Checking STD output: per cluster [Success] Checking STD output: per die [Success] Checking STD output: per socket [Success] ---- end(0) ---- 100: perf stat STD output linter : Ok
Signed-off-by: Veronika Molnarova <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4 |
|
| #
cbc917a1 |
| 08-Feb-2024 |
Yicong Yang <[email protected]> |
perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2 cache (for Intel Ja
perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2 cache (for Intel Jacobsville). Currently parsing and building cluster topology have been supported since [1].
perf stat has already supported aggregation for other topologies like die or socket, etc. It'll be useful to aggregate per-cluster to find problems like L3T bandwidth contention.
This patch add support for "--per-cluster" option for per-cluster aggregation. Also update the docs and related test. The output will be like:
[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
Performance counter stats for 'system wide':
S56-D0-CLS158 4 1,321,521,570 LLC-load S56-D0-CLS594 4 794,211,453 LLC-load S56-D0-CLS1030 4 41,623 LLC-load S56-D0-CLS1466 4 41,646 LLC-load S56-D0-CLS1902 4 16,863 LLC-load S56-D0-CLS2338 4 15,721 LLC-load S56-D0-CLS2774 4 22,671 LLC-load [...]
On a legacy system without cluster or cluster support, the output will be look like: [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
Performance counter stats for 'system wide':
S56-D0-CLS0 64 18,011,485 cycles S7182-D0-CLS0 64 16,548,835 cycles
Note that this patch doesn't mix the cluster information in the outputs of --per-core to avoid breaking any tools/scripts using it.
Note that perf recently supports "--per-cache" aggregation, but it's not the same with the cluster although cluster CPUs may share some cache resources. For example on my machine all clusters within a die share the same L3 cache: $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list 0-31 $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list 0-3
[1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")
Tested-by: Jie Zhan <[email protected]> Reviewed-by: Tim Chen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
5f70c6c5 |
| 07-Feb-2024 |
Yicong Yang <[email protected]> |
perf test: Skip metric w/o event name on arm64 in stat STD output linter
stat+std_output.sh test fails on my arm64 machine: [root@localhost shell]# ./stat+std_output.sh Checking STD output: no args
perf test: Skip metric w/o event name on arm64 in stat STD output linter
stat+std_output.sh test fails on my arm64 machine: [root@localhost shell]# ./stat+std_output.sh Checking STD output: no args Unknown event name in TopDownL1 # 0.18 retiring [root@localhost shell]# ./stat+std_output.sh Checking STD output: no args [Success] Checking STD output: system wide [Success] Checking STD output: interval [Success] Checking STD output: per thread Unknown event name in tmux: server-1114960 # 0.41 frontend_bound
When no args specified `perf stat` will add TopdownL1 metric group and the output will be like: [root@localhost shell]# perf stat -- stress-ng --vm 1 --timeout 1 stress-ng: info: [3351733] setting to a 1 second run per stressor stress-ng: info: [3351733] dispatching hogs: 1 vm stress-ng: info: [3351733] successful run completed in 1.02s
Performance counter stats for 'stress-ng --vm 1 --timeout 1':
1,037.71 msec task-clock # 1.000 CPUs utilized 13 context-switches # 12.528 /sec 1 cpu-migrations # 0.964 /sec 67,544 page-faults # 65.090 K/sec 2,691,932,561 cycles # 2.594 GHz (74.56%) 6,571,333,653 instructions # 2.44 insn per cycle (74.92%) 521,863,142 branches # 502.901 M/sec (75.21%) 425,879 branch-misses # 0.08% of all branches (87.57%) TopDownL1 # 0.61 retiring (87.67%) # 0.03 frontend_bound (87.67%) # 0.02 bad_speculation (87.67%) # 0.34 backend_bound (74.61%)
1.038138390 seconds time elapsed
0.844849000 seconds user 0.189053000 seconds sys
Metrics in group TopDownL1 don't have event name on arm64 but are not listed in the $skip_metric list which they should be listed. Add them to the skip list as what does for x86 platforms in [1].
[1] commit 4d60e83dfcee ("perf test: Skip metrics w/o event name in stat STD output linter")
Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1 |
|
| #
35de80c7 |
| 07-Sep-2023 |
Athira Rajeev <[email protected]> |
tests/shell: Fix shellcheck SC1090 to handle the location of sourced files
Running shellcheck on some of the shell scripts throws below error:
In tests/shell/coresight/unroll_loop_thread_10.sh lin
tests/shell: Fix shellcheck SC1090 to handle the location of sourced files
Running shellcheck on some of the shell scripts throws below error:
In tests/shell/coresight/unroll_loop_thread_10.sh line 8: . "$(dirname $0)"/../lib/coresight.sh ^-- SC1090: Can't follow non-constant source. Use a directive to specify location.
This happens on shellcheck version "0.6.0". Fix shellcheck warning for SC1090 using "shellcheck source="i option to mention the location of sourced files.
Signed-off-by: Athira Rajeev <[email protected]> Tested-by: Ian Rogers <[email protected]> Reviewed-by: Kajol Jain <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1 |
|
| #
35578a55 |
| 09-Jul-2023 |
Athira Rajeev <[email protected]> |
perf tests stat+std_output: Fix shellcheck warnings about word splitting/quoting and local variables
Running shellcheck on stat_std_output testcase throws below warning:
In tests/shell/stat+std_
perf tests stat+std_output: Fix shellcheck warnings about word splitting/quoting and local variables
Running shellcheck on stat_std_output testcase throws below warning:
In tests/shell/stat+std_output.sh line 9: . $(dirname $0)/lib/stat_output.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting.
In tests/shell/stat+std_output.sh line 32: local -i cnt=0 ^-^ SC2034 (warning): cnt appears unused. Verify use (or export if used externally).
Fixed the warning by adding quotes to avoid word splitting and removed unused variable "cnt" at line 32.
Signed-off-by: Athira Rajeev <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Disha Goel <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|
|
Revision tags: v6.4 |
|
| #
4d60e83d |
| 23-Jun-2023 |
Namhyung Kim <[email protected]> |
perf test: Skip metrics w/o event name in stat STD output linter
This test checks if the output of perf stat to match event names and metrics. So it wants the output lines to have both event name a
perf test: Skip metrics w/o event name in stat STD output linter
This test checks if the output of perf stat to match event names and metrics. So it wants the output lines to have both event name and metric. Otherwise it should skip the line.
On AMD machines, the instruction event has two metrics and they are printed in separate lines. It makes the line without event name like below:
# perf stat -a sleep 1
Performance counter stats for 'system wide':
64,383.34 msec cpu-clock # 64.048 CPUs utilized 14,526 context-switches # 225.617 /sec 112 cpu-migrations # 1.740 /sec 190 page-faults # 2.951 /sec 807,558,652 cycles # 0.013 GHz (83.30%) 69,809,799 stalled-cycles-frontend # 8.64% frontend cycles idle (83.30%) 196,983,266 stalled-cycles-backend # 24.39% backend cycles idle (83.30%) 424,876,008 instructions # 0.53 insn per cycle (here) ---> # 0.46 stalled cycles per insn (83.30%) 97,788,321 branches # 1.519 M/sec (83.34%) 4,147,377 branch-misses # 4.24% of all branches (83.46%)
1.005241409 seconds time elapsed
Also modern Intel machines have TopDown metrics which also don't have event names.
# perf stat -a sleep 1
Performance counter stats for 'system wide':
8,015.39 msec cpu-clock # 7.996 CPUs utilized 5,823 context-switches # 726.477 /sec 189 cpu-migrations # 23.580 /sec 139 page-faults # 17.342 /sec 435,139,308 cycles # 0.054 GHz 193,891,345 instructions # 0.45 insn per cycle 42,773,028 branches # 5.336 M/sec 2,298,113 branch-misses # 5.37% of all branches TopdownL1 # 25.5 % tma_backend_bound /--> # 7.9 % tma_bad_speculation (here) --+ # 55.7 % tma_frontend_bound \--> # 10.9 % tma_retiring
1.002395924 seconds time elapsed
There is a check to skip TopdownL1 and TopdownL2 specifically but it does not cover every affected lines.
So there is another check to skip the line if it has nothing on the left side of # sign. Well.. it seems ok but that's not enough too.
When aggregation mode (like --per-socket or --per-thread) is used, it adds some prefix (e.g. CPU socket, task name and PID) in the output line. So the test code ignores them to normalize result.
A problem can happen for per-thread mode when task name contains one or more spaces. It'd only ignore the first part of the task name, and it thinks there's something more in the line so it would not skip.
# perf stat -a --perf-thread sleep 1 ... perf-21276 # 70.2 % tma_backend_bound perf-21276 # 3.9 % tma_bad_speculation perf-21276 # 10.5 % tma_frontend_bound perf-21276 # 15.3 % tma_retiring ^^^^^^^^^^ (ignored)
my task-21328 # 70.2 % tma_backend_bound my task-21328 # 3.9 % tma_bad_speculation my task-21328 # 10.5 % tma_frontend_bound my task-21328 # 15.3 % tma_retiring ^^ (ignored)
So I think it should look at the metric names instead. Add skip_metric to hold the list of names to skip. It would contain 'stalled cycles per insn' and metrics started by 'tma_'.
Fixes: 99a04a48f225 ("perf test: Add test case for the standard 'perf stat' output") Acked-by: Ian Rogers <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
| #
8d3df7c3 |
| 23-Jun-2023 |
Namhyung Kim <[email protected]> |
perf test: Reorder event name checks in stat STD output linter
On AMD machines, the perf stat STD output test failed like below:
$ sudo ./perf test -v 98 98: perf stat STD output linter
perf test: Reorder event name checks in stat STD output linter
On AMD machines, the perf stat STD output test failed like below:
$ sudo ./perf test -v 98 98: perf stat STD output linter : --- start --- test child forked, pid 1841901 Checking STD output: no argswrong event metric. expected 'GHz' in 108,121 stalled-cycles-frontend # 10.88% frontend cycles idle test child finished with -1 ---- end ---- perf stat STD output linter: FAILED!
This is because there are stalled-cycles-{frontend,backend} events are used by default. The current logic checks the event_name array to find which event it's running. But 'cycles' event comes before those stalled cycles event and it matches first. So it tries to find 'GHz' metric in the output (which is for the 'cycles') and fails.
Move the stalled-cycles-{frontend,backend} events before 'cycles' so that it can find the stalled cycles events first.
Also add a space after 'no args' test name for consistency.
Fixes: 99a04a48f225 ("perf test: Add test case for the standard 'perf stat' output") Acked-by: Ian Rogers <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc7 |
|
| #
99a04a48 |
| 16-Jun-2023 |
Kan Liang <[email protected]> |
perf test: Add test case for the standard 'perf stat' output
Add a new test case to verify the standard 'perf stat' output with different options.
Reviewed-by: Ian Rogers <[email protected]> Signe
perf test: Add test case for the standard 'perf stat' output
Add a new test case to verify the standard 'perf stat' output with different options.
Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahmad Yasin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
show more ...
|