Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
f15b6b29 | 12-Jul-2022 | David Sherwood <[email protected]>
[AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met:
1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences.
Currently the default option is "disabled", but this will be changed in a later patch.
I've added new tests to show the options behave as expected here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
Differential Revision: https://reviews.llvm.org/D129560
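As a rough illustration only (hypothetical names, not the actual LLVM interface), the option-gated decision described above could be sketched in C++ along these lines:

    // Hypothetical sketch of an "sve-tail-folding" style gate for the
    // predicate-over-epilogue decision; TailFoldingOpts and LoopTraits are
    // illustrative stand-ins, not LLVM types.
    enum class TailFoldingOpts { Disabled, All, AllNoReductions, AllNoRecurrences };

    struct LoopTraits {
      bool HasReductions = false;
      bool HasFirstOrderRecurrences = false;
    };

    static bool preferPredicateOverEpilogueSketch(bool HasSVE, TailFoldingOpts Opt,
                                                  const LoopTraits &L) {
      if (!HasSVE)
        return false;
      switch (Opt) {
      case TailFoldingOpts::All:
        return true;
      case TailFoldingOpts::AllNoReductions:
        return !L.HasReductions;
      case TailFoldingOpts::AllNoRecurrences:
        return !L.HasFirstOrderRecurrences;
      case TailFoldingOpts::Disabled:
        return false;
      }
      return false;
    }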
7c3cda55 | 08-Jul-2022 | Cullen Rhodes <[email protected]>
[AArch64][SVE] Prefer SIMD&FP variant of clast[ab]
The scalar variant with GPR source/dest has considerably higher latency than the SIMD&FP scalar variant across a variety of micro-architectures:
Core Scalar SIMD&FP -------------------------------- Neoverse V1 9 cyc 3 cyc Neoverse N2 8 cyc 3 cyc Cortex A510 8 cyc 4 cyc A64FX 29 cyc 6 cyc
Revision tags: llvmorg-14.0.6 |
a83aa33d | 16-Jun-2022 | Bradley Smith <[email protected]>
[IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundamental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace.
Differential Revision: https://reviews.llvm.org/D127976
fb4d3d23 | 21-Jun-2022 | David Green <[email protected]>
[AArch64] Remove unnecessary funnel shift sve costs.
D127680 added some unnecessary funnel shift costs for AArch64 to "match the legacy behaviour". The default costs are closer to the correct values and line up with the scalar/neon costs better. Remove the lines again to clean up the code, they can be added back at a later date with better values if needed.
db85345f | 20-Jun-2022 | Philip Reames <[email protected]>
[BasicTTI] Allow generic handling of scalable vector fshr/fshl
This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9aba8, when sinking an unconditional bailout for all intrinsics into selected cases. It's not clear if the bailout was originally unneeded, or if our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue.
Note that the RISC-V cost model changes here aren't particularly interesting. They do probably better match the current lowering, but the main point is to have coverage of the BasicTTI path and simply show lack of crashing.
AArch64 costing was changed to preserve legacy behavior. There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change not being particularly familiar with the target.
Differential Revision: https://reviews.llvm.org/D127680
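For context, a funnel shift concatenates its two operands and shifts across them; a minimal scalar reference of llvm.fshl's behaviour for a 32-bit width (my own sketch, not code from this patch) is:

    #include <cstdint>

    // fshl(Hi, Lo, Shift): shift the 64-bit concatenation Hi:Lo left by
    // (Shift mod 32) and keep the upper 32 bits.
    static uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Shift) {
      Shift %= 32;
      return Shift == 0 ? Hi : (Hi << Shift) | (Lo >> (32 - Shift));
    }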
b329156f | 17-Jun-2022 | Tiehu Zhang <[email protected]>
[AArch64][LV] AArch64 does not prefer vectorized addressing
TTI::prefersVectorizedAddressing() tries to vectorize the addresses that lead to loads. For AArch64, only gather/scatter (supported by SVE) can deal with vectors of addresses. This patch specializes the hook for AArch64 to return true only when SVE is enabled.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D124612
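In outline (a simplified sketch, not the verbatim hook), the specialization amounts to:

    // Simplified sketch: vectorized addressing is only worthwhile on AArch64
    // when SVE gather/scatter is available to consume a vector of addresses.
    struct AArch64TTISketch {
      bool HasSVE = false;
      bool prefersVectorizedAddressing() const { return HasSVE; }
    };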
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
bb82f746 | 23-May-2022 | Jingu Kang <[email protected]>
Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.
The commmit from https://reviews.llvm.org/D125918 has fix
Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.
The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure.
Differential Revision: https://reviews.llvm.org/D118979
5f4541fe | 06-May-2022 | Bradley Smith <[email protected]>
[AArch64][SVE] Convert SRSHL to LSL when fed from an ABS intrinsic
Differential Revision: https://reviews.llvm.org/D125233
17a73992 | 10-May-2022 | Florian Hahn <[email protected]>
[AArch64] Remove redundant f{min,max}nm intrinsics.
The patch extends AArch64TTIImpl::instCombineIntrinsic to simplify llvm.aarch64.neon.f{min,max}nm(a, a) -> a.
This helps with simplifying code written using the ACLE, e.g. see https://godbolt.org/z/jYxsoc89c
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D125234
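For example (using the standard ACLE NEON intrinsics; the exact source behind the godbolt link may differ), the fold removes patterns like:

    #include <arm_neon.h>

    // vmaxnmq_f32(x, x) has no effect, so after the fold only x remains.
    float32x4_t redundant_maxnm(float32x4_t x) {
      return vmaxnmq_f32(x, x);
    }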
dccc69a3 | 06-May-2022 | David Green <[email protected]>
[AArch64] Add extra reverse costs.
This adds some extra costs for reverse shuffles under AArch64, filling in the i16/f16/i8 gaps in the cost model.
Differential Revision: https://reviews.llvm.org/D124786
2dcb2d85 | 02-May-2022 | David Green <[email protected]>
[AArch64] Cost modelling for fptoi_sat
This builds on top of the target-independent cost model added in D124269 to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics. For many common types they will be legal instructions as the AArch64 instructions will saturate naturally. For unsupported pairs of integer and floating point types, an additional min/max clamp is needed.
Differential Revision: https://reviews.llvm.org/D124357
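For reference, a scalar sketch of what llvm.fptosi.sat computes for the f32 to i32 case (my own illustration, not part of the patch):

    #include <cmath>
    #include <cstdint>

    // Saturating float-to-signed conversion: NaN maps to 0, out-of-range
    // values clamp to the limits of the destination type.
    static int32_t fptosi_sat_f32_i32(float X) {
      if (std::isnan(X))
        return 0;
      if (X <= static_cast<float>(INT32_MIN))
        return INT32_MIN;
      if (X >= 2147483648.0f) // 2^31, just above INT32_MAX
        return INT32_MAX;
      return static_cast<int32_t>(X);
    }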
6918a15f | 29-Apr-2022 | David Kreitzer <[email protected]>
Test commit. Fixed a typo in a comment.
Revision tags: llvmorg-14.0.3 |
46cef9a8 | 27-Apr-2022 | David Green <[email protected]>
[AArch64] Attempt to fix bots by ensuring legalized type is a vector
8e2a0e61 | 27-Apr-2022 | David Green <[email protected]>
[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost
Given a larger-than-legal shuffle mask, the final codegen will split into multiple sub-vectors. This attempts to model that in AArch64TTIImpl::getShuffleCost, splitting masks up according to the size of the legalized vectors. If the sub-masks have at most 2 input sources we can call getShuffleCost on them and sum the costs, to get a more accurate final cost for the entire shuffle. The call to improveShuffleKindFromMask helps to improve the shuffle kind for the sub-mask cost call.
Differential Revision: https://reviews.llvm.org/D123414
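Schematically (a rough sketch with a stand-in cost query, not the actual getShuffleCost code), the splitting looks like:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Stand-in for the per-legal-shuffle cost query used on each chunk.
    static int subShuffleCost(const std::vector<int> &SubMask) {
      (void)SubMask;
      return 1;
    }

    // Split a wider-than-legal mask into chunks of the legalized vector
    // width and sum the per-chunk costs.
    static int wideShuffleCost(const std::vector<int> &Mask, std::size_t LegalNumElts) {
      int Cost = 0;
      for (std::size_t B = 0; B < Mask.size(); B += LegalNumElts) {
        std::size_t E = std::min(B + LegalNumElts, Mask.size());
        std::vector<int> SubMask(Mask.begin() + B, Mask.begin() + E);
        Cost += subShuffleCost(SubMask);
      }
      return Cost;
    }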
d6327050 | 27-Apr-2022 | David Green <[email protected]>
[AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost
Given a shuffle with 4 elements size 16 or 32, we can use the costs directly from the PerfectShuffle tables to get a slightly more accurate cost for the resulting shuffle.
Differential Revision: https://reviews.llvm.org/D123409
fa8a9fea | 26-Apr-2022 | Vasileios Porpodas <[email protected]>
Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.
Code review: https://reviews.llvm.or
Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.
Code review: https://reviews.llvm.org/D124202
6a9bbd9f | 26-Apr-2022 | Vasileios Porpodas <[email protected]>
Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f217fd0defed2592ff7b74b79b2c1f0.
Revision tags: llvmorg-14.0.2 |
55ce296d | 21-Apr-2022 | Vasileios Porpodas <[email protected]>
[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcast's arguments by SLP. This patch changes this. `Args` is now used for passing the operands of the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
4e971efa | 22-Apr-2022 | Vasileios Porpodas <[email protected]>
Recommit "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7052a0ad689b990265ec79bd2b0a7d6e8c131bfe.
7052a0ad | 22-Apr-2022 | Vasileios Porpodas <[email protected]>
Revert "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7ba702644bac6df166a02bbd692c1599a95a7c8b.
7ba70264 | 12-Apr-2022 | Vasileios Porpodas <[email protected]>
[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64
The original patch (https://reviews.llvm.org/D121354) targets x86 and adjusts the lookahead score of splat loads, as they can be done by the `movddup` instruction that combines the load and the broadcast and is cheap to execute.
A similar issue shows up on AArch64. The `ld1r` instruction performs a broadcast load and is cheap to execute.
This patch implements the TargetTransformInfo hooks for AArch64.
Differential Revision: https://reviews.llvm.org/D123638
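As a concrete illustration (standard ACLE intrinsics, not code from the patch), a splat load is the pattern below, which AArch64 can usually lower to a single ld1r:

    #include <arm_neon.h>

    // Load one element and broadcast it to all lanes; typically selected
    // as "ld1r { v0.4s }, [x0]" on AArch64.
    int32x4_t splat_load(const int32_t *P) {
      return vdupq_n_s32(*P);
    }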
42ebfa82 | 12-Apr-2022 | Muhammad Omair Javaid <[email protected]>
Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.
This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:
ht
Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.
This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:
https://lab.llvm.org/buildbot/#/builders/176/builds/1515
llvm-tblgen crashes after applying this patch.
Revision tags: llvmorg-14.0.1 |
fa784f63 | 07-Apr-2022 | David Green <[email protected]>
[AArch64] Insert subvector costs
An insert subvector under aarch64 can often be done as a single lane mov operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so long as the index is a multiple of 4. This teaches the cost model that, using code copied over from the X86 backend.
Some of the costs (v16i16_4_0) are still high because they get matched as an SK_Select, not an SK_InsertSubvector. D120879 has some codegen tests for inserting subvectors, which were added as llvm/test/CodeGen/AArch64/insert-subvector.ll.
Differential Revision: https://reviews.llvm.org/D120880
64b6192e | 05-Apr-2022 | Jingu Kang <[email protected]>
[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF for AArch64 to 128 divided by the size of the smallest type in the loop.
Differential Revision: https://reviews.llvm.org/D118979
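For example, under this rule a loop whose smallest element type is i8 would be allowed a maximum VF of 128 / 8 = 16 lanes, while one whose narrowest type is i32 would be capped at 128 / 32 = 4.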
750bf358 | 04-Apr-2022 | David Green <[email protected]>
[AArch64] Increase cost of v2i64 multiplies
The cost of a v2i64 multiply was special cased in D92208 as scalarized into 4*extract + 2*insert + 2*mul. Scalarizing to/from GPR registers is expensive though, and the cost wasn't high enough to prevent vectorizing in places where it can be detrimental for performance. This patch raises the cost of copying to/from GPRs to 2 each, bringing the total cost to 14. So long as umull/smull are handled correctly (as in D123006) this seems to lead to better vectorization factors and better performance.
Differential Revision: https://reviews.llvm.org/D123007
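Assuming the two scalar multiplies keep a cost of 1 each (my reading of the numbers above), the new total works out as 4 extracts * 2 + 2 inserts * 2 + 2 muls * 1 = 8 + 4 + 2 = 14.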