|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
c53e14f1 |
| 10-Mar-2025 |
Kan Liang <[email protected]> |
perf: Extend per event callchain limit to branch stack
The commit 97c79a38cd45 ("perf core: Per event callchain limit") introduced a per-event term to allow finer tuning of the depth of callchains t
perf: Extend per event callchain limit to branch stack
The commit 97c79a38cd45 ("perf core: Per event callchain limit") introduced a per-event term to allow finer tuning of the depth of callchains to save space.
It should be applied to the branch stack as well. For example, autoFDO collections require maximum LBR entries. In the meantime, other system-wide LBR users may only be interested in the latest a few number of LBRs. A per-event LBR depth would save the perf output buffer.
The patch simply drops the uninterested branches, but HW still collects the maximum branches. There may be a model-specific optimization that can reduce the HW depth for some cases to reduce the overhead further. But it isn't included in the patch set. Because it's not useful for all cases. For example, ARCH LBR can utilize the PEBS and XSAVE to collect LBRs. The depth should have less impact on the collecting overhead. The model-specific optimization may be implemented later separately.
Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
18d92bb5 |
| 22-Oct-2024 |
Adrian Hunter <[email protected]> |
perf/core: Add aux_pause, aux_resume, aux_start_paused
Hardware traces, such as instruction traces, can produce a vast amount of trace data, so being able to reduce tracing to more specific circumst
perf/core: Add aux_pause, aux_resume, aux_start_paused
Hardware traces, such as instruction traces, can produce a vast amount of trace data, so being able to reduce tracing to more specific circumstances can be useful.
The ability to pause or resume tracing when another event happens, can do that.
Add ability for an event to "pause" or "resume" AUX area tracing.
Add aux_pause bit to perf_event_attr to indicate that, if the event happens, the associated AUX area tracing should be paused. Ditto aux_resume. Do not allow aux_pause and aux_resume to be set together.
Add aux_start_paused bit to perf_event_attr to indicate to an AUX area event that it should start in a "paused" state.
Add aux_paused to struct hw_perf_event for AUX area events to keep track of the "paused" state. aux_paused is initialized to aux_start_paused.
Add PERF_EF_PAUSE and PERF_EF_RESUME modes for ->stop() and ->start() callbacks. Call as needed, during __perf_event_output(). Add aux_in_pause_resume to struct perf_buffer to prevent races with the NMI handler. Pause/resume in NMI context will miss out if it coincides with another pause/resume.
To use aux_pause or aux_resume, an event must be in a group with the AUX area event as the group leader.
Example (requires Intel PT and tools patches also):
$ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.043 MB perf.data ] $ perf script --call-trace uname 30805 [000] 24001.058782799: name: 0x7ffc9c1865b0 uname 30805 [000] 24001.058784424: psb offs: 0 uname 30805 [000] 24001.058784424: cbr: 39 freq: 3904 MHz (139%) uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __x64_sys_newuname uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) down_read uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __cond_resched uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) in_lock_functions uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_sub uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) up_read uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) in_lock_functions uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) preempt_count_sub uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) _copy_to_user uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_to_user_mode uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_work uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) perf_syscall_exit uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_alloc uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_get_recursion_context uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_tp_event uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_update uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) tracing_gen_ctx_irq_test uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_event uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __perf_event_account_interrupt uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __this_cpu_preempt_check uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_output_forward uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_aux_pause uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) ring_buffer_get uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_lock uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_unlock uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) pt_event_stop uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) native_write_msr uname 30805 [000] 24001.058785463: ([kernel.kallsyms]) native_write_msr uname 30805 [000] 24001.058785639: 0x0
Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: James Clark <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6 |
|
| #
608f6976 |
| 26-Jun-2024 |
Kan Liang <[email protected]> |
perf/x86/intel: Support new data source for Lunar Lake
A new PEBS data source format is introduced for the p-core of Lunar Lake. The data source field is extended to 8 bits with new encodings.
A ne
perf/x86/intel: Support new data source for Lunar Lake
A new PEBS data source format is introduced for the p-core of Lunar Lake. The data source field is extended to 8 bits with new encodings.
A new layout is introduced into the union intel_x86_pebs_dse. Introduce the lnl_latency_data() to parse the new format. Enlarge the pebs_data_source[] accordingly to include new encodings.
Only the mem load and the mem store events can generate the data source. Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and INTEL_HYBRID_STLAT_CONSTRAINT to mark them.
Add two new bits for the new cache-related data src, L2_MHB and MSC. The L2_MHB is short for L2 Miss Handling Buffer, which is similar to LFB (Line Fill Buffer), but to track the L2 Cache misses. The MSC stands for the memory-side cache.
Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6 |
|
| #
33744916 |
| 25-Oct-2023 |
Kan Liang <[email protected]> |
perf/x86/intel: Support branch counters logging
The branch counters logging (A.K.A LBR event logging) introduces a per-counter indication of precise event occurrences in LBRs. It can provide a means
perf/x86/intel: Support branch counters logging
The branch counters logging (A.K.A LBR event logging) introduces a per-counter indication of precise event occurrences in LBRs. It can provide a means to attribute exposed retirement latency to combinations of events across a block of instructions. It also provides a means of attributing Timed LBR latencies to events.
The feature is first introduced on SRF/GRR. It is an enhancement of the ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences of events on the GP counters. The information is displayed by the order of counters.
The design proposed in this patch requires that the events which are logged must be in a group with the event that has LBR. If there are more than one LBR group, the counters logging information only from the current group (overflowed) are stored for the perf tool, otherwise the perf tool cannot know which and when other groups are scheduled especially when multiplexing is triggered. The user can ensure it uses the maximum number of counters that support LBR info (4 by now) by making the group large enough.
The HW only logs events by the order of counters. The order may be different from the order of enabling which the perf tool can understand. When parsing the information of each branch entry, convert the counter order to the enabled order, and store the enabled order in the extension space.
Unconditionally reset LBRs for an LBR event group when it's deleted. The logged counter information is only valid for the current LBR group. If another LBR group is scheduled later, the information from the stale LBRs would be otherwise wrongly interpreted.
Add a sanity check in intel_pmu_hw_config(). Disable the feature if other counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode is enabled. (For the LBR call stack mode, we cannot simply flush the LBR, since it will break the call stack. Also, there is no obvious usage with the call stack mode for now.)
Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch stack setup.
Expose the maximum number of supported counters and the width of the counters into the sysfs. The perf tool can use the information to parse the logged counters in each branch.
Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
| #
571d91dc |
| 25-Oct-2023 |
Kan Liang <[email protected]> |
perf: Add branch stack counters
Currently, the additional information of a branch entry is stored in a u64 space. With more and more information added, the space is running out. For example, the inf
perf: Add branch stack counters
Currently, the additional information of a branch entry is stored in a u64 space. With more and more information added, the space is running out. For example, the information of occurrences of events will be added for each branch.
Two places were suggested to append the counters. https://lore.kernel.org/lkml/[email protected]/ One place is right after the flags of each branch entry. It changes the existing struct perf_branch_entry. The later ARCH specific implementation has to be really careful to consistently pick the right struct. The other place is right after the entire struct perf_branch_stack. The disadvantage is that the pointer of the extra space has to be recorded. The common interface perf_sample_save_brstack() has to be updated.
The latter is much straightforward, and should be easily understood and maintained. It is implemented in the patch.
Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate the event which is recorded in the branch info.
The "u64 counters" may store the occurrences of several events. The information regarding the number of events/counters and the width of each counter should be exposed via sysfs as a reference for the perf tool. Define the branch_counter_nr and branch_counter_width ABI here. The support will be implemented later in the Intel-specific patch.
Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4 |
|
| #
526fffab |
| 25-Jul-2023 |
Ravi Bangoria <[email protected]> |
perf/mem: Introduce PERF_MEM_LVLNUM_UNC
Older API PERF_MEM_LVL_UNC can be replaced by PERF_MEM_LVLNUM_UNC.
Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel)
perf/mem: Introduce PERF_MEM_LVLNUM_UNC
Older API PERF_MEM_LVL_UNC can be replaced by PERF_MEM_LVLNUM_UNC.
Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4 |
|
| #
09519ec3 |
| 09-Jan-2023 |
Rob Herring <[email protected]> |
perf: Add perf_event_attr::config3
Arm SPEv1.2 adds another 64-bits of event filtering control. As the existing perf_event_attr::configN fields are all used up for SPE PMU, an additional field is ne
perf: Add perf_event_attr::config3
Arm SPEv1.2 adds another 64-bits of event filtering control. As the existing perf_event_attr::configN fields are all used up for SPE PMU, an additional field is needed. Add a new 'config3' field.
Tested-by: James Clark <[email protected]> Signed-off-by: Rob Herring <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0 |
|
| #
cb6c18b5 |
| 01-Oct-2022 |
Ravi Bangoria <[email protected]> |
perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's bit ambiguous name and also not generic enough to cover cxl.cac
perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's bit ambiguous name and also not generic enough to cover cxl.cache and cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.
Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
| #
cfef80ba |
| 28-Sep-2022 |
Ravi Bangoria <[email protected]> |
perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
PERF_MEM_SNOOPX_PEER is defined only in tools uapi header. Although it's used only by perf tool, not defining it in kernel header can cre
perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
PERF_MEM_SNOOPX_PEER is defined only in tools uapi header. Although it's used only by perf tool, not defining it in kernel header can create problems in future.
Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
| #
ee3e88df |
| 28-Sep-2022 |
Ravi Bangoria <[email protected]> |
perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO accesses but it
perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO accesses but it can not distinguish between local and remote IO. Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.0-rc7, v6.0-rc6, v6.0-rc5 |
|
| #
b4e12b2d |
| 08-Sep-2022 |
Namhyung Kim <[email protected]> |
perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY
There's no in-tree user anymore. Let's get rid of it.
Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <peterz@infra
perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY
There's no in-tree user anymore. Let's get rid of it.
Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.0-rc4, v6.0-rc3 |
|
| #
f4054e52 |
| 24-Aug-2022 |
Anshuman Khandual <[email protected]> |
perf: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platform
BRBE captured branch types will overflow perf_branch_entry.type and generic branch types in perf_branch_entry.new_type. So override each
perf: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platform
BRBE captured branch types will overflow perf_branch_entry.type and generic branch types in perf_branch_entry.new_type. So override each available arch specific branch type in the following manner to comprehensively process all reported branch types in BRBE.
PERF_BR_ARM64_FIQ PERF_BR_NEW_ARCH_1 PERF_BR_ARM64_DEBUG_HALT PERF_BR_NEW_ARCH_2 PERF_BR_ARM64_DEBUG_EXIT PERF_BR_NEW_ARCH_3 PERF_BR_ARM64_DEBUG_INST PERF_BR_NEW_ARCH_4 PERF_BR_ARM64_DEBUG_DATA PERF_BR_NEW_ARCH_5
Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: James Clark <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
| #
5402d25a |
| 24-Aug-2022 |
Anshuman Khandual <[email protected]> |
perf: Capture branch privilege information
Platforms like arm64 could capture privilege level information for all the branch records. Hence this adds a new element in the struct branch_entry to reco
perf: Capture branch privilege information
Platforms like arm64 could capture privilege level information for all the branch records. Hence this adds a new element in the struct branch_entry to record the privilege level information, which could be requested through a new event.attr.branch_sample_type based flag PERF_SAMPLE_BRANCH_PRIV_SAVE. This flag helps user choose whether privilege information is captured.
Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: James Clark <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
| #
b190bc4a |
| 24-Aug-2022 |
Anshuman Khandual <[email protected]> |
perf: Extend branch type classification
branch_entry.type now has ran out of space to accommodate more branch types classification. This will prevent perf branch stack implementation on arm64 (via B
perf: Extend branch type classification
branch_entry.type now has ran out of space to accommodate more branch types classification. This will prevent perf branch stack implementation on arm64 (via BRBE) to capture all available branch types. Extending this bit field i.e branch_entry.type [4 bits] is not an option as it will break user space ABI both for little and big endian perf tools.
Extend branch classification with a new field branch_entry.new_type via a new branch type PERF_BR_EXTEND_ABI in branch_entry.type. Perf tools which could decode PERF_BR_EXTEND_ABI, will then parse branch_entry.new_type as well.
branch_entry.new_type is a 4 bit field which can hold upto 16 branch types. The first three branch types will hold various generic page faults followed by five architecture specific branch types, which can be overridden by the platform for specific use cases. These architecture specific branch types gets overridden on arm64 platform for BRBE implementation.
New generic branch types
- PERF_BR_NEW_FAULT_ALGN - PERF_BR_NEW_FAULT_DATA - PERF_BR_NEW_FAULT_INST
New arch specific branch types
- PERF_BR_NEW_ARCH_1 - PERF_BR_NEW_ARCH_2 - PERF_BR_NEW_ARCH_3 - PERF_BR_NEW_ARCH_4 - PERF_BR_NEW_ARCH_5
Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: James Clark <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
| #
a724ec82 |
| 24-Aug-2022 |
Anshuman Khandual <[email protected]> |
perf: Add system error and not in transaction branch types
This expands generic branch type classification by adding two more entries there in i.e system error and not in transaction. This also upda
perf: Add system error and not in transaction branch types
This expands generic branch type classification by adding two more entries there in i.e system error and not in transaction. This also updates the x86 implementation to process X86_BR_NO_TX records as appropriate. This changes branch types reported to user space on x86 platform but it should not be a problem. The possible scenarios and impacts are enumerated here.
-------------------------------------------------------------------------- | kernel | perf tool | Impact | -------------------------------------------------------------------------- | old | old | Works as before | -------------------------------------------------------------------------- | old | new | PERF_BR_UNKNOWN is processed | -------------------------------------------------------------------------- | new | old | PERF_BR_NO_TX is blocked via old PERF_BR_MAX | -------------------------------------------------------------------------- | new | new | PERF_BR_NO_TX is recognized | --------------------------------------------------------------------------
When PERF_BR_NO_TX is blocked via old PERF_BR_MAX (new kernel with old perf tool) the user space might throw up an warning complaining about an unrecognized branch types being reported, but it's expected. PERF_BR_SERROR & PERF_BR_NO_TX branch types will be used for BRBE implementation on arm64 platform.
PERF_BR_NO_TX complements 'abort' and 'in_tx' elements in perf_branch_entry which represent other transaction states for a given branch record. Because this completes the transaction state classification.
Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: James Clark <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.0-rc2, v6.0-rc1 |
|
| #
93315e46 |
| 11-Aug-2022 |
Sandipan Das <[email protected]> |
perf/core: Add speculation info to branch entries
Add a new "spec" bitfield to branch entries for providing speculation information. This will be populated using hints provided by branch sampling fe
perf/core: Add speculation info to branch entries
Add a new "spec" bitfield to branch entries for providing speculation information. This will be populated using hints provided by branch sampling features on supported hardware. The following cases are covered:
* No branch speculation information is available * Branch is speculative but taken on the wrong path * Branch is non-speculative but taken on the correct path * Branch is speculative and taken on the correct path
Suggested-by: Stephane Eranian <[email protected]> Signed-off-by: Sandipan Das <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/834088c302faf21c7b665031dd111f424e509a64.1660211399.git.sandipan.das@amd.com
show more ...
|
|
Revision tags: v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2 |
|
| #
94dfc73e |
| 07-Apr-2022 |
Gustavo A. R. Silva <[email protected]> |
treewide: uapi: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a stru
treewide: uapi: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].
This code was transformed with the help of Coccinelle: (linux-5.19-rc2$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)
@@ identifier S, member, array; type T1, T2; @@
struct S { ... T1 member; T2 array[ - 0 ]; };
-fstrict-flex-arrays=3 is coming and we need to land these changes to prevent issues like these in the short future:
../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source] strcpy(de3->name, "."); ^
Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If this breaks anything, we can use a union with a new member name.
[1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/78 Build-tested-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/lkml/62b675ec.wKX6AOZ6cbE71vtF%[email protected]/ Acked-by: Dan Williams <[email protected]> # For ndctl.h Signed-off-by: Gustavo A. R. Silva <[email protected]>
show more ...
|
| #
119a784c |
| 16-Jun-2022 |
Namhyung Kim <[email protected]> |
perf/core: Add a new read format to get a number of lost samples
Sometimes we want to know an accurate number of samples even if it's lost. Currenlty PERF_RECORD_LOST is generated for a ring-buffer
perf/core: Add a new read format to get a number of lost samples
Sometimes we want to know an accurate number of samples even if it's lost. Currenlty PERF_RECORD_LOST is generated for a ring-buffer which might be shared with other events. So it's hard to know per-event lost count.
Add event->lost_samples field and PERF_FORMAT_LOST to retrieve it from userspace.
Original-patch-by: Jiri Olsa <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6 |
|
| #
cedd3614 |
| 24-Feb-2022 |
Anshuman Khandual <[email protected]> |
perf: Add irq and exception return branch types
This expands generic branch type classification by adding two more entries there in i.e irq and exception return. Also updates the x86 implementation
perf: Add irq and exception return branch types
This expands generic branch type classification by adding two more entries there in i.e irq and exception return. Also updates the x86 implementation to process X86_BR_IRET and X86_BR_IRQ records as appropriate. This changes branch types reported to user space on x86 platform but it should not be a problem. The possible scenarios and impacts are enumerated here.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3 |
|
| #
ddecd228 |
| 31-Jan-2022 |
Marco Elver <[email protected]> |
perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
Due to the alignment requirements of siginfo_t, as described in 3ddb3fd8cdb0 ("signal, perf: Fix siginfo_t by avoidi
perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
Due to the alignment requirements of siginfo_t, as described in 3ddb3fd8cdb0 ("signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures"), siginfo_t::si_perf_data is limited to an unsigned long.
However, perf_event_attr::sig_data is an u64, to avoid having to deal with compat conversions. Due to being an u64, it may not immediately be clear to users that sig_data is truncated on 32 bit architectures.
Add a comment to explicitly point this out, and hopefully help some users save time by not having to deduce themselves what's happening.
Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5 |
|
| #
cb1c4aba |
| 06-Dec-2021 |
Kajol Jain <[email protected]> |
perf: Add new macros for mem_hops field
Add new macros for mem_hops field which can be used to represent remote-node, socket and board level details.
Currently the code had macro for HOPS_0, which
perf: Add new macros for mem_hops field
Add new macros for mem_hops field which can be used to represent remote-node, socket and board level details.
Currently the code had macro for HOPS_0, which corresponds to data coming from another core but same node. Add new macros for HOPS_1 to HOPS_3 to represent remote-node, socket and board level data.
For ex: Encodings for mem_hops fields with L2 cache:
L2 - local L2 L2 | REMOTE | HOPS_0 - remote core, same node L2 L2 | REMOTE | HOPS_1 - remote node, same socket L2 L2 | REMOTE | HOPS_2 - remote socket, same board L2 L2 | REMOTE | HOPS_3 - remote board L2
Signed-off-by: Kajol Jain <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5 |
|
| #
fec9cc61 |
| 06-Oct-2021 |
Kajol Jain <[email protected]> |
perf: Add mem_hops field in perf_mem_data_src structure
Going forward, future generation systems can have more hierarchy within the node/package level but currently we don't have any data source enc
perf: Add mem_hops field in perf_mem_data_src structure
Going forward, future generation systems can have more hierarchy within the node/package level but currently we don't have any data source encoding field in perf, which can be used to represent this level of data.
Add a new field called 'mem_hops' in the perf_mem_data_src structure which can be used to represent intra-node/package or inter-node/off-package details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value can be used to present different hop levels data.
Also add corresponding macros to define mem_hop field values and shift value.
Currently we define macro for HOPS_0 which corresponds to data coming from another core but same node.
For ex: Encodings for mem_hops fields with L2 cache:
L2 - local L2 L2 | REMOTE | HOPS_0 - remote core, same node L2
Signed-off-by: Kajol Jain <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
f4c6217f |
| 06-Oct-2021 |
Kajol Jain <[email protected]> |
perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
Add a comment about PERF_MEM_LVL_* namespace being depricated to some extent in favour of added PERF_MEM_{L
perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
Add a comment about PERF_MEM_LVL_* namespace being depricated to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
Remove an extra line present in perf_mem__lvl_scnprintf function.
Signed-off-by: Kajol Jain <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1 |
|
| #
8b8ff8cc |
| 07-Sep-2021 |
Adrian Hunter <[email protected]> |
perf/x86: Add new event for AUX output counter index
PEBS-via-PT records contain a mask of applicable counters. To identify which event belongs to which counter, a side-band event is needed. Until n
perf/x86: Add new event for AUX output counter index
PEBS-via-PT records contain a mask of applicable counters. To identify which event belongs to which counter, a side-band event is needed. Until now, there has been no side-band event, and consequently users were limited to using a single event.
Add such a side-band event. Note the event is optimised to output only when the counter index changes for an event. That works only so long as all PEBS-via-PT events are scheduled together, which they are for a recording session because they are in a single group.
Also no attribute bit is used to select the new event, so a new kernel is not compatible with older perf tools. The assumption being that PEBS-via-PT is sufficiently esoteric that users will not be troubled by this.
Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1 |
|
| #
0683b531 |
| 02-May-2021 |
Eric W. Biederman <[email protected]> |
signal: Deliver all of the siginfo perf data in _perf
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t.
Note: The data field in the perf data structures in a u64 t
signal: Deliver all of the siginfo perf data in _perf
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t.
Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports.
v1: https://lkml.kernel.org/r/[email protected] v2: https://lkml.kernel.org/r/[email protected] v3: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: Marco Elver <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|