Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
f15b6b29 | 12-Jul-2022 | David Sherwood <[email protected]>
[AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met:
1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences.
Currently the default option is "disabled", but this will be changed in a later patch.
I've added new tests to show the options behave as expected here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
Differential Revision: https://reviews.llvm.org/D129560
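As a rough illustration only (hypothetical names, not the actual LLVM interface), the option-gated decision described above could be sketched in C++ along these lines:

    // Hypothetical sketch of an "sve-tail-folding" style gate for the
    // predicate-over-epilogue decision; TailFoldingOpts and LoopTraits are
    // illustrative stand-ins, not LLVM types.
    enum class TailFoldingOpts { Disabled, All, AllNoReductions, AllNoRecurrences };

    struct LoopTraits {
      bool HasReductions = false;
      bool HasFirstOrderRecurrences = false;
    };

    static bool preferPredicateOverEpilogueSketch(bool HasSVE, TailFoldingOpts Opt,
                                                  const LoopTraits &L) {
      if (!HasSVE)
        return false;
      switch (Opt) {
      case TailFoldingOpts::All:
        return true;
      case TailFoldingOpts::AllNoReductions:
        return !L.HasReductions;
      case TailFoldingOpts::AllNoRecurrences:
        return !L.HasFirstOrderRecurrences;
      case TailFoldingOpts::Disabled:
        return false;
      }
      return false;
    }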
7c3cda55 | 08-Jul-2022 | Cullen Rhodes <[email protected]>
[AArch64][SVE] Prefer SIMD&FP variant of clast[ab]
The scalar variant with GPR source/dest has considerably higher latency than the SIMD&FP scalar variant across a variety of micro-architectures:
Core Scalar SIMD&FP -------------------------------- Neoverse V1 9 cyc 3 cyc Neoverse N2 8 cyc 3 cyc Cortex A510 8 cyc 4 cyc A64FX 29 cyc 6 cyc
Revision tags: llvmorg-14.0.6 |
a83aa33d | 16-Jun-2022 | Bradley Smith <[email protected]>
[IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundamental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace.
Differential Revision: https://reviews.llvm.org/D127976
fb4d3d23 | 21-Jun-2022 | David Green <[email protected]>
[AArch64] Remove unnecessary funnel shift sve costs.
D127680 added some unnecessary funnel shift costs for AArch64 to "match the legacy behaviour". The default costs are closer to the correct values and line up with the scalar/neon costs better. Remove the lines again to clean up the code, they can be added back at a later date with better values if needed.
db85345f | 20-Jun-2022 | Philip Reames <[email protected]>
[BasicTTI] Allow generic handling of scalable vector fshr/fshl
This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9aba8, when sinking an unconditional bailout for all intrinsics into selected cases. It's not clear if the bailout was originally unneeded, or if our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue.
Note that the RISC-V cost model changes here aren't particularly interesting. They do probably better match the current lowering, but the main point is to have coverage of the BasicTTI path and simply show lack of crashing.
AArch64 costing was changed to preserve legacy behavior. There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change not being particularly familiar with the target.
Differential Revision: https://reviews.llvm.org/D127680
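For context, a funnel shift concatenates its two operands and shifts across them; a minimal scalar reference of llvm.fshl's behaviour for a 32-bit width (my own sketch, not code from this patch) is:

    #include <cstdint>

    // fshl(Hi, Lo, Shift): shift the 64-bit concatenation Hi:Lo left by
    // (Shift mod 32) and keep the upper 32 bits.
    static uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Shift) {
      Shift %= 32;
      return Shift == 0 ? Hi : (Hi << Shift) | (Lo >> (32 - Shift));
    }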
b329156f | 17-Jun-2022 | Tiehu Zhang <[email protected]>
[AArch64][LV] AArch64 does not prefer vectorized addressing
TTI::prefersVectorizedAddressing() tries to vectorize the addresses that lead to loads. For AArch64, only gather/scatter (supported by SVE) can deal with vectors of addresses. This patch specializes the hook for AArch64 to return true only when SVE is enabled.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D124612
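In outline (a simplified sketch, not the verbatim hook), the specialization amounts to:

    // Simplified sketch: vectorized addressing is only worthwhile on AArch64
    // when SVE gather/scatter is available to consume a vector of addresses.
    struct AArch64TTISketch {
      bool HasSVE = false;
      bool prefersVectorizedAddressing() const { return HasSVE; }
    };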
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
bb82f746 | 23-May-2022 | Jingu Kang <[email protected]>
Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.
The commmit from https://reviews.llvm.org/D125918 has fix
Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.
The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure.
Differential Revision: https://reviews.llvm.org/D118979
5f4541fe | 06-May-2022 | Bradley Smith <[email protected]>
[AArch64][SVE] Convert SRSHL to LSL when fed from an ABS intrinsic
Differential Revision: https://reviews.llvm.org/D125233
17a73992 | 10-May-2022 | Florian Hahn <[email protected]>
[AArch64] Remove redundant f{min,max}nm intrinsics.
The patch extends AArch64TTIImpl::instCombineIntrinsic to simplify llvm.aarch64.neon.f{min,max}nm(a, a) -> a.
This helps with simplifying code written using the ACLE, e.g. see https://godbolt.org/z/jYxsoc89c
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D125234
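For example (using the standard ACLE NEON intrinsics; the exact source behind the godbolt link may differ), the fold removes patterns like:

    #include <arm_neon.h>

    // vmaxnmq_f32(x, x) has no effect, so after the fold only x remains.
    float32x4_t redundant_maxnm(float32x4_t x) {
      return vmaxnmq_f32(x, x);
    }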
dccc69a3 | 06-May-2022 | David Green <[email protected]>
[AArch64] Add extra reverse costs.
This adds some extra costs for reverse shuffles under AArch64, filling in the i16/f16/i8 gaps in the cost model.
Differential Revision: https://reviews.llvm.org/D124786
2dcb2d85 | 02-May-2022 | David Green <[email protected]>
[AArch64] Cost modelling for fptoi_sat
This builds on top of the target-independent cost model added in D124269 to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics. For many common types they will be legal instructions as the AArch64 instructions will saturate naturally. For unsupported pairs of integer and floating point types, an additional min/max clamp is needed.
Differential Revision: https://reviews.llvm.org/D124357
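For reference, a scalar sketch of what llvm.fptosi.sat computes for the f32 to i32 case (my own illustration, not part of the patch):

    #include <cmath>
    #include <cstdint>

    // Saturating float-to-signed conversion: NaN maps to 0, out-of-range
    // values clamp to the limits of the destination type.
    static int32_t fptosi_sat_f32_i32(float X) {
      if (std::isnan(X))
        return 0;
      if (X <= static_cast<float>(INT32_MIN))
        return INT32_MIN;
      if (X >= 2147483648.0f) // 2^31, just above INT32_MAX
        return INT32_MAX;
      return static_cast<int32_t>(X);
    }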
6918a15f | 29-Apr-2022 | David Kreitzer <[email protected]>
Test commit. Fixed a typo in a comment.
Revision tags: llvmorg-14.0.3 |
46cef9a8 | 27-Apr-2022 | David Green <[email protected]>
[AArch64] Attempt to fix bots by ensuring legalized type is a vector
8e2a0e61 | 27-Apr-2022 | David Green <[email protected]>
[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost
Given a larger-than-legal shuffle mask, the final codegen will split into multiple sub-vectors. This attempts to model that in AArch64TTIImpl::getShuffleCost, splitting masks up according to the size of the legalized vectors. If the sub-masks have at most 2 input sources we can call getShuffleCost on them and sum the costs, to get a more accurate final cost for the entire shuffle. The call to improveShuffleKindFromMask helps to improve the shuffle kind for the sub-mask cost call.
Differential Revision: https://reviews.llvm.org/D123414
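Schematically (a rough sketch with a stand-in cost query, not the actual getShuffleCost code), the splitting looks like:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Stand-in for the per-legal-shuffle cost query used on each chunk.
    static int subShuffleCost(const std::vector<int> &SubMask) {
      (void)SubMask;
      return 1;
    }

    // Split a wider-than-legal mask into chunks of the legalized vector
    // width and sum the per-chunk costs.
    static int wideShuffleCost(const std::vector<int> &Mask, std::size_t LegalNumElts) {
      int Cost = 0;
      for (std::size_t B = 0; B < Mask.size(); B += LegalNumElts) {
        std::size_t E = std::min(B + LegalNumElts, Mask.size());
        std::vector<int> SubMask(Mask.begin() + B, Mask.begin() + E);
        Cost += subShuffleCost(SubMask);
      }
      return Cost;
    }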
d6327050 | 27-Apr-2022 | David Green <[email protected]>
[AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost
Given a shuffle with 4 elements size 16 or 32, we can use the costs directly from the PerfectShuffle tables to get a slightly more accurate cost for the resulting shuffle.
Differential Revision: https://reviews.llvm.org/D123409
fa8a9fea | 26-Apr-2022 | Vasileios Porpodas <[email protected]>
Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.
Code review: https://reviews.llvm.or
Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.
Code review: https://reviews.llvm.org/D124202
6a9bbd9f | 26-Apr-2022 | Vasileios Porpodas <[email protected]>
Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f217fd0defed2592ff7b74b79b2c1f0.
Revision tags: llvmorg-14.0.2 |
55ce296d | 21-Apr-2022 | Vasileios Porpodas <[email protected]>
[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcast's arguments by SLP. This patch changes this. `Args` is now used for passing the operands of the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
4e971efa | 22-Apr-2022 | Vasileios Porpodas <[email protected]>
Recommit "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7052a0ad689b990265ec79bd2b0a7d6e8c131bfe.
7052a0ad | 22-Apr-2022 | Vasileios Porpodas <[email protected]>
Revert "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7ba702644bac6df166a02bbd692c1599a95a7c8b.
7ba70264 | 12-Apr-2022 | Vasileios Porpodas <[email protected]>
[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64
The original patch (https://reviews.llvm.org/D121354) targets x86 and adjusts the lookahead score of splat loads, as they can be done by the `movddup` instruction that combines the load and the broadcast and is cheap to execute.
A similar issue shows up on AArch64. The `ld1r` instruction performs a broadcast load and is cheap to execute.
This patch implements the TargetTransformInfo hooks for AArch64.
Differential Revision: https://reviews.llvm.org/D123638
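As a concrete illustration (standard ACLE intrinsics, not code from the patch), a splat load is the pattern below, which AArch64 can usually lower to a single ld1r:

    #include <arm_neon.h>

    // Load one element and broadcast it to all lanes; typically selected
    // as "ld1r { v0.4s }, [x0]" on AArch64.
    int32x4_t splat_load(const int32_t *P) {
      return vdupq_n_s32(*P);
    }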
42ebfa82 | 12-Apr-2022 | Muhammad Omair Javaid <[email protected]>
Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.
This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:
ht
Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.
This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:
https://lab.llvm.org/buildbot/#/builders/176/builds/1515
llvm-tblgen crashes after applying this patch.
Revision tags: llvmorg-14.0.1 |
fa784f63 | 07-Apr-2022 | David Green <[email protected]>
[AArch64] Insert subvector costs
An insert subvector under aarch64 can often be done as a single lane mov operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so long as the index is a multiple of 4. This teaches the cost model that, using code copied over from the X86 backend.
Some of the costs (v16i16_4_0) are still high because they get matched as an SK_Select, not an SK_InsertSubvector. D120879 has some codegen tests for inserting subvectors, which were added as llvm/test/CodeGen/AArch64/insert-subvector.ll.
Differential Revision: https://reviews.llvm.org/D120880
64b6192e | 05-Apr-2022 | Jingu Kang <[email protected]>
[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF for AArch64 to 128 divided by the size of the smallest type in the loop.
Differential Revision: https://reviews.llvm.org/D118979
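For example, under this rule a loop whose smallest element type is i8 would be allowed a maximum VF of 128 / 8 = 16 lanes, while one whose narrowest type is i32 would be capped at 128 / 32 = 4.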
750bf358 | 04-Apr-2022 | David Green <[email protected]>
[AArch64] Increase cost of v2i64 multiplies
The cost of a v2i64 multiply was special cased in D92208 as scalarized into 4*extract + 2*insert + 2*mul. Scalarizing to/from GPR registers is expensive though, and the cost wasn't high enough to prevent vectorizing in places where it can be detrimental for performance. This patch raises the cost of copying to/from GPRs to 2 each, bringing the total cost to 14. So long as umull/smull are handled correctly (as in D123006) this seems to lead to better vectorization factors and better performance.
Differential Revision: https://reviews.llvm.org/D123007
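Assuming the two scalar multiplies keep a cost of 1 each (my reading of the numbers above), the new total works out as 4 extracts * 2 + 2 inserts * 2 + 2 muls * 1 = 8 + 4 + 2 = 14.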