|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
1e7dcbfa |
| 05-Mar-2025 |
Oliver Upton <[email protected]> |
KVM: arm64: Remap PMUv3 events onto hardware
Map PMUv3 event IDs onto hardware, if the driver exposes such a helper. This is expected to be quite rare, and only useful for non-PMUv3 hardware.
Tested-by: Janne Grunau <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
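As a rough illustration of the mechanism, the remapping can be thought of as an optional per-driver hook that KVM consults before programming an event; the struct, field and function names below are assumptions made for the sketch, not the patch's actual code:

  #include <linux/errno.h>

  /* Sketch only: an optional driver hook translating PMUv3 event IDs. */
  struct sketch_arm_pmu {
          /* Returns a hardware event ID, or a negative errno if unmapped. */
          int (*map_pmuv3_event)(unsigned int eventsel);
          /* ... remaining arm_pmu members elided ... */
  };

  static int sketch_kvm_map_pmu_event(struct sketch_arm_pmu *pmu,
                                      unsigned int eventsel)
  {
          /* Most PMUs are PMUv3: no helper means the ID is used as-is. */
          if (!pmu->map_pmuv3_event)
                  return eventsel;

          return pmu->map_pmuv3_event(eventsel);
  }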
|
|
Revision tags: v6.14-rc5, v6.14-rc4 |
|
| #
dc4d58a7 |
| 18-Feb-2025 |
Mark Rutland <[email protected]> |
perf: arm_pmu: Move PMUv3-specific data
A few fields in struct arm_pmu are only used with PMUv3, and soon we will need to add more for BRBE. Group the fields together so that we have a logical place to add more data in future.
At the same time, remove the comment for reg_pmmir as it doesn't convey anything useful.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Rob Herring (Arm) <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Tested-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
d8226d8c |
| 31-Jul-2024 |
Rob Herring (Arm) <[email protected]> |
perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter
Armv9.4/8.9 PMU adds optional support for a fixed instruction counter similar to the fixed cycle counter. Support for the feature is indicated in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not accessible in AArch32.
Existing userspace using direct counter access won't know how to handle the fixed instruction counter, so we have to avoid using the counter when user access is requested.
Acked-by: Mark Rutland <[email protected]> Signed-off-by: Rob Herring (Arm) <[email protected]> Tested-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
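Very roughly, the feature check described above could look like the following; read_cpuid() and cpuid_feature_extract_unsigned_field() are the usual arm64 helpers, while the PMICNTR shift constant is written here as an assumption about the generated sysreg header, not copied from the patch:

  #include <asm/cpufeature.h>
  #include <asm/cputype.h>

  /* Sketch: probe for the fixed instruction counter (PMICNTR != 0). */
  static bool sketch_pmuv3_has_instr_counter(void)
  {
          u64 dfr1 = read_cpuid(ID_AA64DFR1_EL1);

          /* ID_AA64DFR1_EL1_PMICNTR_SHIFT is assumed for illustration. */
          return cpuid_feature_extract_unsigned_field(dfr1,
                                  ID_AA64DFR1_EL1_PMICNTR_SHIFT) != 0;
  }

When userspace counter access is requested, the driver would then simply leave this counter out of the set it exposes, as described above.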
|
| #
bf5ffc8c |
| 31-Jul-2024 |
Rob Herring (Arm) <[email protected]> |
perf: arm_pmu: Remove event index to counter remapping
Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters starting at 1 and had 1:1 event index to counter numbering. On Armv7 and later, this changed the cycle counter to 31 and event counters start at 0. The drivers for Armv7 and PMUv3 kept the old event index numbering and introduced an event index to counter conversion. The conversion uses masking to convert from event index to a counter number. This operation relies on having at most 32 counters so that the cycle counter index 0 can be transformed to counter number 31.
Armv9.4 adds support for an additional fixed function counter (instructions) which increases possible counters to more than 32, and the conversion won't work anymore as a simple subtract and mask. The primary reason for the translation (other than history) seems to be to have a contiguous mask of counters 0-N. Keeping that would result in more complicated index to counter conversions. Instead, store a mask of available counters rather than just number of events. That provides more information in addition to the number of events.
No (intended) functional changes.
Acked-by: Mark Rutland <[email protected]> Signed-off-by: Rob Herring (Arm) <[email protected]> Tested-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
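The change can be pictured as replacing a subtract-and-mask translation with a plain bitmap walk; the macros and the bitmap below are illustrative stand-ins, not the driver's definitions:

  #include <linux/bitmap.h>
  #include <linux/bitops.h>

  /* Old scheme (sketch): cycle counter index 0 remapped to counter 31. */
  #define SKETCH_IDX_COUNTER0     1
  #define SKETCH_COUNTER_MASK     0x1f
  #define SKETCH_IDX_TO_COUNTER(x)        \
          (((x) - SKETCH_IDX_COUNTER0) & SKETCH_COUNTER_MASK)

  /* New scheme (sketch): a mask of implemented counters, used directly. */
  static DECLARE_BITMAP(sketch_cntr_mask, 64);

  static void sketch_for_each_counter(void (*fn)(unsigned int counter))
  {
          unsigned int i;

          for_each_set_bit(i, sketch_cntr_mask, 64)
                  fn(i);          /* no index-to-counter translation needed */
  }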
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
f6da8696 |
| 11-Dec-2023 |
James Clark <[email protected]> |
arm: pmu: Share user ABI format mechanism with SPE
This mechanism makes it much easier to define and read new attributes so move it to the arm_pmu.h header so that it can be shared. At the same time update the existing format attributes to use it.
GENMASK has to be changed to GENMASK_ULL because the config fields are 64 bits even on arm32 where this will also be used now.
Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
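As a small illustration of the 64-bit requirement, a format attribute plus a GENMASK_ULL-based extractor might look like this; the "threshold" field is invented for the example and is not one of the real attributes:

  #include <linux/bitfield.h>
  #include <linux/bits.h>
  #include <linux/perf_event.h>

  /* Sketch: an invented config1 field occupying bits 0-5. */
  PMU_FORMAT_ATTR(threshold, "config1:0-5");

  static u64 sketch_get_threshold(struct perf_event *event)
  {
          /* GENMASK_ULL, not GENMASK: config fields are 64-bit on arm32 too. */
          return FIELD_GET(GENMASK_ULL(5, 0), event->attr.config1);
  }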
|
|
Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2 |
|
| #
118eb89b |
| 15-Nov-2023 |
Anshuman Khandual <[email protected]> |
drivers: perf: arm_pmu: Drop 'pmu_lock' element from 'struct pmu_hw_events'
As 'pmu_lock' element is not being used in any ARM PMU implementation, just drop this from 'struct pmu_hw_events'.
Cc: Will Deacon <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
|
|
Revision tags: v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7 |
|
| #
1aa3d027 |
| 17-Aug-2023 |
Anshuman Khandual <[email protected]> |
arm_pmu: acpi: Add a representative platform device for TRBE
ACPI TRBE does not have a HID for identification, which would be needed to create and add a platform device to the platform bus. Also, without a platform device, it cannot be probed and bound to a platform driver.
This creates a dummy platform device for TRBE after ascertaining that ACPI provides required interrupts uniformly across all cpus on the system. This device gets created inside drivers/perf/arm_pmu_acpi.c to accommodate TRBE being built as a module.
Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
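The registration itself is small; a simplified sketch (the device name string is a placeholder, and the real code first checks that ACPI reports a usable TRBE interrupt on every CPU):

  #include <linux/platform_device.h>
  #include <linux/err.h>

  /* Sketch: a dummy platform device so a TRBE platform driver can bind. */
  static int sketch_trbe_acpi_register(void)
  {
          struct platform_device *pdev;

          /* "arm,trbe-acpi" is a placeholder name for illustration. */
          pdev = platform_device_register_simple("arm,trbe-acpi",
                                                 PLATFORM_DEVID_NONE, NULL, 0);
          return PTR_ERR_OR_ZERO(pdev);
  }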
|
|
Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3 |
|
| #
d7a0fe9e |
| 19-May-2023 |
Douglas Anderson <[email protected]> |
arm64: enable perf events based hard lockup detector
With the recent feature added to enable perf events to use pseudo NMIs as interrupts on platforms which support GICv3 or later, it is now possible to enable the hard lockup detector (or NMI watchdog) on arm64 platforms. So enable the corresponding support.
One thing to note here is that the lockup detector is normally initialized just after the early initcalls, but the PMU on arm64 comes up much later as a device_initcall(). To cope with that, override arch_perf_nmi_is_available() to let the watchdog framework know the PMU is not ready yet, and inform the framework to re-initialize lockup detection once the PMU has been initialized.
[[email protected]: only HAVE_HARDLOCKUP_DETECTOR_PERF if the PMU config is enabled] Link: https://lkml.kernel.org/r/20230523073952.1.I60217a63acc35621e13f10be16c0cd7c363caf8c@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid Co-developed-by: Sumit Garg <[email protected]> Signed-off-by: Sumit Garg <[email protected]> Co-developed-by: Pingfan Liu <[email protected]> Signed-off-by: Pingfan Liu <[email protected]> Signed-off-by: Lecopzer Chen <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen-Yu Tsai <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Colin Cross <[email protected]> Cc: Daniel Thompson <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masayoshi Mizuma <[email protected]> Cc: Matthias Kaehlcke <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: "Ravi V. Shankar" <[email protected]> Cc: Ricardo Neri <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Tzung-Bi Shih <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
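The arch hook named above amounts to asking the Arm PMU driver whether its interrupt really is an NMI; a minimal sketch, assuming a query helper exists in the PMU driver:

  #include <linux/init.h>
  #include <linux/perf/arm_pmu.h>

  /* Sketch: only claim NMI support once the PMU driver knows for sure. */
  bool __init arch_perf_nmi_is_available(void)
  {
          /*
           * The PMU probes from a device_initcall(), so the watchdog core
           * re-checks this after being told to retry its initialisation.
           */
          return arm_pmu_irq_is_nmi();
  }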
|
| #
8be3593b |
| 28-May-2023 |
Marc Zyngier <[email protected]> |
drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
Sidharth reports that on M2, the PMU never generates any interrupt when using 'perf record', which is annoying as you get no sample. I'm tempted to say "no sample, no problem", but others may have a different opinion.
Upon investigation, it appears that the counters on M2 are significantly different from the ones on M1, as they count on 64 bits instead of 48. Which of course, in the fine M1 tradition, means that we can only use 63 bits, as the top bit is used to signal the interrupt...
This results in having to introduce yet another flag to indicate yet another odd counter width. Who knows what the next crazy implementation will do...
With this, perf can work out the correct offset, and 'perf record' works as intended.
Tested on M2 and M2-Pro CPUs.
Cc: Janne Grunau <[email protected]> Cc: Hector Martin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Fixes: 7d0bfb7c9977 ("drivers/perf: apple_m1: Add Apple M2 support") Reported-by: Sidharth Kshatriya <[email protected]> Tested-by: Sidharth Kshatriya <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
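In practice the new flag only changes which overflow mask the core framework applies; very roughly (the enum and helper are illustrative, not the driver's code):

  #include <linux/bits.h>

  /* Sketch: usable counter bits per supported width. */
  enum sketch_cnt_width { SKETCH_CNT_32BIT, SKETCH_CNT_47BIT, SKETCH_CNT_63BIT };

  static u64 sketch_counter_mask(enum sketch_cnt_width w)
  {
          switch (w) {
          case SKETCH_CNT_47BIT:
                  return GENMASK_ULL(46, 0);
          case SKETCH_CNT_63BIT:
                  return GENMASK_ULL(62, 0);      /* bit 63 signals the IRQ */
          default:
                  return GENMASK_ULL(31, 0);
          }
  }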
|
|
Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2 |
|
| #
61d03862 |
| 16-Feb-2023 |
Mark Rutland <[email protected]> |
arm_pmu: fix event CPU filtering
Janne reports that perf has been broken on Apple M1 as of commit:
bd27568117664b8b ("perf: Rewrite core context handling")
That commit replaced the pmu::filter_match() callback with pmu::filter(), whose return value has the opposite polarity, with true implying events should be ignored rather than scheduled. While an attempt was made to update the logic in armv8pmu_filter() and armpmu_filter() accordingly, the return value remains inverted in a couple of cases:
* If the arm_pmu does not have an arm_pmu::filter() callback, armpmu_filter() will always return whether the CPU is supported rather than whether the CPU is not supported.
As a result, the perf core will not schedule events on supported CPUs, resulting in a loss of events. Additionally, the perf core will attempt to schedule events on unsupported CPUs, but this will be rejected by armpmu_add(), which may result in a loss of events from other PMUs on those unsupported CPUs.
* If the arm_pmu does have an arm_pmu::filter() callback, and armpmu_filter() is called on a CPU which is not supported by the arm_pmu, armpmu_filter() will return false rather than true.
As a result, the perf core will attempt to schedule events on unsupported CPUs, but this will be rejected by armpmu_add(), which may result in a loss of events from other PMUs on those unsupported CPUs.
This means a loss of events can be seen with any arm_pmu driver, but with the ARMv8 PMUv3 driver (which is the only arm_pmu driver with an arm_pmu::filter() callback) the event loss will be more limited and may go unnoticed, which is how this issue evaded testing so far.
Fix the CPU filtering by performing this consistently in armpmu_filter(), and remove the redundant arm_pmu::filter() callback and armv8pmu_filter() implementation.
Commit bd2756811766 also silently removed the CHAIN event filtering from armv8pmu_filter(), which will be addressed by a separate patch without using the filter callback.
Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: Janne Grunau <[email protected]> Link: https://lore.kernel.org/asahi/[email protected]/ Signed-off-by: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Asahi Lina <[email protected]> Cc: Eric Curtin <[email protected]> Tested-by: Janne Grunau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
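The corrected polarity is easy to state: pmu::filter() returns true when the event should be ignored on that CPU. A minimal sketch, assuming the usual supported_cpus mask in struct arm_pmu:

  #include <linux/cpumask.h>
  #include <linux/perf/arm_pmu.h>

  /* Sketch: true means "ignore this event on this CPU". */
  static bool sketch_armpmu_filter(struct arm_pmu *armpmu, int cpu)
  {
          return !cpumask_test_cpu(cpu, &armpmu->supported_cpus);
  }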
|
|
Revision tags: v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0 |
|
| #
fe40ffdb |
| 30-Sep-2022 |
Mark Rutland <[email protected]> |
arm_pmu: rework ACPI probing
The current ACPI PMU probing logic tries to associate PMUs with CPUs when the CPU is first brought online, in order to handle late hotplug, though PMUs are only registered during early boot, and so for late hotplugged CPUs this can only associate the CPU with an existing PMU.
We tried to be clever and have the arm_pmu_acpi_cpu_starting() callback allocate a struct arm_pmu when no matching instance is found, in order to avoid duplication of logic. However, as above this doesn't do anything useful for late hotplugged CPUs, and this requires us to allocate memory in an atomic context, which is especially problematic for PREEMPT_RT, as reported by Valentin and Pierre.
This patch reworks the probing to detect PMUs for all online CPUs in the arm_pmu_acpi_probe() function, which is more aligned with how DT probing works. The arm_pmu_acpi_cpu_starting() callback only tries to associate CPUs with an existing arm_pmu instance, avoiding the problem of allocating in atomic context.
Note that as we didn't previously register PMUs for late-hotplugged CPUs, this change doesn't result in a loss of existing functionality, though we will now warn when we cannot associate a CPU with a PMU.
This change allows us to pull the hotplug callback registration into the arm_pmu_acpi_probe() function, as we no longer need the callbacks to be invoked shortly after probing the boot CPUs, and can register it without invoking the calls.
For the moment the arm_pmu_acpi_init() initcall remains to register the SPE PMU, though in future this should probably be moved elsewhere (e.g. the arm64 ACPI init code), since this doesn't need to be tied to the regular CPU PMU code.
Signed-off-by: Mark Rutland <[email protected]> Reported-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ Reported-by: Pierre Gondois <[email protected]> Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/ Cc: Pierre Gondois <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Will Deacon <[email protected]> Reviewed-and-tested-by: Pierre Gondois <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
|
| #
bd275681 |
| 08-Oct-2022 |
Peter Zijlstra <[email protected]> |
perf: Rewrite core context handling
There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged).
Notably:
 - HW breakpoint PMU
 - ARM big.little PMU / Intel ADL PMU
 - Intel Branch Monitoring PMU
 - AMD IBS PMU
 - S390 cpum_cf PMU
 - PowerPC trace_imc PMU
*Current design:*
Currently we have a per task and per cpu perf_event_contexts:
  task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
       ^                                 |   ^     |           ^
       `---------------------------------'   |     `--> pmu ---'
                                             v           ^
                                        perf_event ------'
Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task.
Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU.
The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc.
Each perf_event is then associated with its PMU and one perf_event_context.
*Proposed design:*
The new design proposed by this patch reduces this to a single task context and a single CPU context, but adds some intermediate data-structures:
  task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
       ^                           |   ^ ^
       `---------------------------'   | |
                                       | |    perf_cpu_pmu_context <--.
                                       | `----.    ^                  |
                                       |      |    |                  |
                                       |      v    v                  |
                                       | ,--> perf_event_pmu_context  |
                                       | |                            |
                                       | |                            |
                                       v v                            |
                                  perf_event ---> pmu ----------------'
With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key:
{cpu, pmu, cgroup, group_index}
Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc.
Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information.
Each perf_event is associated with its pmu, perf_event_context and perf_event_pmu_context.
Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events.
Much thanks to Ravi for picking this up and pushing it towards completion.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Co-developed-by: Ravi Bangoria <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v6.0-rc7, v6.0-rc6, v6.0-rc5 |
|
| #
91207f62 |
| 07-Sep-2022 |
Anshuman Khandual <[email protected]> |
arm64/perf: Assert all platform event flags are within PERF_EVENT_FLAG_ARCH
Ensure all platform specific event flags are within PERF_EVENT_FLAG_ARCH.
Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: James Clark <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
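Such a constraint lends itself to a build-time check; a sketch using one existing arm_pmu hardware flag as an example (the exact assertions in the patch may differ):

  #include <linux/build_bug.h>
  #include <linux/perf_event.h>
  #include <linux/perf/arm_pmu.h>

  /* Sketch: fail the build if a platform flag escapes PERF_EVENT_FLAG_ARCH. */
  static_assert((ARMPMU_EVT_47BIT & ~PERF_EVENT_FLAG_ARCH) == 0);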
|
|
Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4 |
|
| #
1280f12f |
| 08-Feb-2022 |
Marc Zyngier <[email protected]> |
drivers/perf: arm_pmu: Handle 47 bit counters
The current ARM PMU framework can only deal with 32 or 64bit counters. Teach it about a 47bit flavour.
Yes, this is odd.
Reviewed-by: Hector Martin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Will Deacon <[email protected]>
|
|
Revision tags: v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2 |
|
| #
e840f42a |
| 19-Sep-2021 |
Marc Zyngier <[email protected]> |
KVM: arm64: Fix PMU probe ordering
Russell reported that since 5.13, KVM's probing of the PMU has started to fail on his HW. As it turns out, there is an implicit ordering dependency between the architectural PMU probing code and KVM's own probing. If, due to probe ordering reasons, KVM probes before the PMU driver, it will fail to detect the PMU and prevent it from being advertised to guests as well as the VMM.
Obviously, this is one probing too many, and we should be able to deal with any ordering.
Add a callback from the PMU code into KVM to advertise the registration of a host CPU PMU, allowing for any probing order.
Fixes: 5421db1be3b1 ("KVM: arm64: Divorce the perf code from oprofile helpers") Reported-by: "Russell King (Oracle)" <[email protected]> Tested-by: Russell King (Oracle) <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected]
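Conceptually, the callback just lets the PMU driver flip a KVM-side "a host PMU exists" flag at registration time, so the result no longer depends on probe order; a sketch with illustrative names:

  #include <linux/jump_label.h>
  #include <linux/perf/arm_pmu.h>

  /* Sketch: set by the PMU driver when a host CPU PMU is registered. */
  DEFINE_STATIC_KEY_FALSE(sketch_kvm_arm_pmu_available);

  void sketch_kvm_host_pmu_init(struct arm_pmu *pmu)
  {
          /* Called from PMU registration, whichever side probes first. */
          static_branch_enable(&sketch_kvm_arm_pmu_available);
  }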
|
|
Revision tags: v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4 |
|
| #
b90d72a6 |
| 12-Jan-2021 |
Will Deacon <[email protected]> |
Revert "arm64: Enable perf events based hard lockup detector"
This reverts commit 367c820ef08082e68df8a3bc12e62393af21e4b5.
lockup_detector_init() makes heavy use of per-cpu variables and must be called with preemption disabled. Usually, it's handled early during boot in kernel_init_freeable(), before SMP has been initialised.
Since we do not know whether or not our PMU interrupt can be signalled as an NMI until considerably later in the boot process, the Arm PMU driver attempts to re-initialise the lockup detector off the back of a device_initcall(). Unfortunately, this is called from preemptible context and results in the following splat:
  | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
  | caller is debug_smp_processor_id+0x20/0x2c
  | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x3c0
  |  show_stack+0x20/0x6c
  |  dump_stack+0x2f0/0x42c
  |  check_preemption_disabled+0x1cc/0x1dc
  |  debug_smp_processor_id+0x20/0x2c
  |  hardlockup_detector_event_create+0x34/0x18c
  |  hardlockup_detector_perf_init+0x2c/0x134
  |  watchdog_nmi_probe+0x18/0x24
  |  lockup_detector_init+0x44/0xa8
  |  armv8_pmu_driver_init+0x54/0x78
  |  do_one_initcall+0x184/0x43c
  |  kernel_init_freeable+0x368/0x380
  |  kernel_init+0x1c/0x1cc
  |  ret_from_fork+0x10/0x30
Rather than bodge this with raw_smp_processor_id() or randomly disabling preemption, simply revert the culprit for now until we figure out how to do this properly.
Reported-by: Lecopzer Chen <[email protected]> Signed-off-by: Will Deacon <[email protected]> Acked-by: Mark Rutland <[email protected]> Cc: Sumit Garg <[email protected]> Cc: Alexandru Elisei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
|
|
Revision tags: v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9 |
|
| #
367c820e |
| 07-Oct-2020 |
Sumit Garg <[email protected]> |
arm64: Enable perf events based hard lockup detector
With the recent feature added to enable perf events to use pseudo NMIs as interrupts on platforms which support GICv3 or later, it is now possible to enable the hard lockup detector (or NMI watchdog) on arm64 platforms. So enable the corresponding support.
One thing to note here is that the lockup detector is normally initialized just after the early initcalls, but the PMU on arm64 comes up much later as a device_initcall(). So we need to re-initialize lockup detection once the PMU has been initialized.
Signed-off-by: Sumit Garg <[email protected]> Acked-by: Alexandru Elisei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
|
|
Revision tags: v5.9-rc8, v5.9-rc7 |
|
| #
f5be3a61 |
| 22-Sep-2020 |
Shaokun Zhang <[email protected]> |
arm64: perf: Add support caps under sysfs
ARMv8.4-PMU introduces the PMMIR_EL1 registers and some new PMU events, like STALL_SLOT etc, are related to it. Let's add a caps directory to /sys/bus/event_source/devices/armv8_pmuv3_0/ and support slots from PMMIR_EL1 registers in this entry. The user programs can get the slots from sysfs directly.
/sys/bus/event_source/devices/armv8_pmuv3_0/caps/slots is exposed under sysfs. If both ARMv8.4-PMU and the STALL_SLOT event are implemented, it returns the slots from PMMIR_EL1; otherwise it returns 0.
Signed-off-by: Shaokun Zhang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
|
|
Revision tags: v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5 |
|
| #
8673e02e |
| 02-Mar-2020 |
Andrew Murray <[email protected]> |
arm64: perf: Add support for ARMv8.5-PMU 64-bit counters
At present ARMv8 event counters are limited to 32-bits, though by using the CHAIN event it's possible to combine adjacent counters to achieve 64-bits. The perf config1:0 bit can be set to use such a configuration.
With the introduction of ARMv8.5-PMU support, all event counters can now be used as 64-bit counters.
Let's enable 64-bit event counters where support exists. Unless the user sets config1:0 we will adjust the counter value such that it overflows upon 32-bit overflow. This follows the same behaviour as the cycle counter which has always been (and remains) 64-bits.
Signed-off-by: Andrew Murray <[email protected]> Reviewed-by: Suzuki K Poulose <[email protected]> [Mark: fix ID field names, compare with 8.5 value] Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Will Deacon <[email protected]>
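The user-visible switch is perf's config1 bit 0; a sketch of the check and the resulting behaviour (the helper name is illustrative):

  #include <linux/perf_event.h>

  /* Sketch: config1:0 asks for a full 64-bit event counter. */
  static bool sketch_event_wants_64bit(struct perf_event *event)
  {
          return event->attr.config1 & 0x1;
  }

  /*
   * Without config1:0, a 64-bit capable counter is still programmed so it
   * overflows at 32 bits, preserving the pre-ARMv8.5 behaviour.
   */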
|
|
Revision tags: v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7 |
|
| #
d24a0c70 |
| 26-Jun-2019 |
Jeremy Linton <[email protected]> |
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI 6.3 adds additional fields to the MADT GICC structure to describe SPE PPIs. We pick these out of the cached reference to the madt_gicc structure similarly to the core PMU code. We then create a platform device referring to the IRQ and let the user/module loader decide whether to load the SPE driver.
Tested-by: Hanjun Guo <[email protected]> Reviewed-by: Sudeep Holla <[email protected]> Reviewed-by: Lorenzo Pieralisi <[email protected]> Signed-off-by: Jeremy Linton <[email protected]> Signed-off-by: Will Deacon <[email protected]>
|
|
Revision tags: v5.2-rc6, v5.2-rc5, v5.2-rc4 |
|
| #
d2912cb1 |
| 04-Jun-2019 |
Thomas Gleixner <[email protected]> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Enrico Weigelt <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Revision tags: v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7 |
|
| #
342e53bd |
| 05-Oct-2018 |
Will Deacon <[email protected]> |
arm64: perf: Add support for Armv8.1 PMCEID register format
Armv8.1 allocated the upper 32-bits of the PMCEID registers to describe the common architectural and microarchitecture events beginning at 0x4000.
Add support for these registers to our probing code, so that we can advertise the SPE events when they are supported by the CPU.
Signed-off-by: Will Deacon <[email protected]>
|
| #
ca2b4972 |
| 05-Oct-2018 |
Will Deacon <[email protected]> |
arm64: perf: Reject stand-alone CHAIN events for PMUv3
It doesn't make sense for a perf event to be configured as a CHAIN event in isolation, so extend the arm_pmu structure with a ->filter_match() function to allow the backend PMU implementation to reject CHAIN events early.
Cc: <[email protected]> Reviewed-by: Suzuki K Poulose <[email protected]> Signed-off-by: Will Deacon <[email protected]>
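The backend check only needs to refuse an event whose event code is CHAIN; a sketch using the architectural constants from the arm64 PMUv3 definitions (the function name is illustrative):

  #include <linux/perf_event.h>

  /* Sketch: a stand-alone CHAIN event makes no sense, so reject it. */
  static int sketch_pmuv3_filter_match(struct perf_event *event)
  {
          unsigned long evtype = event->hw.config_base & ARMV8_PMU_EVTYPE_EVENT;

          return evtype != ARMV8_PMUV3_PERFCTR_CHAIN;
  }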
|
|
Revision tags: v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5 |
|
| #
e2da97d3 |
| 10-Jul-2018 |
Suzuki K Poulose <[email protected]> |
arm_pmu: Add support for 64bit event counters
Each PMU has a set of 32bit event counters. But in some special cases, the events could be counted using counters which are effectively 64bit wide.
e.g, Arm V8 PMUv3 has a 64 bit cycle counter which can count only the CPU cycles. Also, the PMU can chain the event counters to effectively count as a 64bit counter.
Add support for tracking the events that uses 64bit counters. This only affects the periods set for each counter in the core driver.
Cc: Will Deacon <[email protected]> Reviewed-by: Julien Thierry <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Signed-off-by: Will Deacon <[email protected]>
|
| #
3a95200d |
| 10-Jul-2018 |
Suzuki K Poulose <[email protected]> |
arm_pmu: Change API to support 64bit counter values
Convert the {read/write}_counter APIs to handle 64bit values to enable supporting chained event counters. The backends still use 32bit values and we pass them 32bit values only. So in effect there are no functional changes.
Cc: Will Deacon <[email protected]> Acked-by: Mark Rutland <[email protected]> Reviewed-by: Julien Thierry <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Signed-off-by: Will Deacon <[email protected]>
|