History log of /llvm-project-15.0.7/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (Results 1 – 25 of 283)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# f15b6b29 12-Jul-2022 David Sherwood <[email protected]>

[AArch64] Add target hook for preferPredicateOverEpilogue

This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following co

[AArch64] Add target hook for preferPredicateOverEpilogue

This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560

show more ...


# 7c3cda55 08-Jul-2022 Cullen Rhodes <[email protected]>

[AArch64][SVE] Prefer SIMD&FP variant of clast[ab]

The scalar variant with GPR source/dest has considerably higher latency
than the SIMD&FP scalar variant across a variety of micro-architectures:

[AArch64][SVE] Prefer SIMD&FP variant of clast[ab]

The scalar variant with GPR source/dest has considerably higher latency
than the SIMD&FP scalar variant across a variety of micro-architectures:

Core Scalar SIMD&FP
--------------------------------
Neoverse V1 9 cyc 3 cyc
Neoverse N2 8 cyc 3 cyc
Cortex A510 8 cyc 4 cyc
A64FX 29 cyc 6 cyc

show more ...


Revision tags: llvmorg-14.0.6
# a83aa33d 16-Jun-2022 Bradley Smith <[email protected]>

[IR] Move vector.insert/vector.extract out of experimental namespace

These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of

[IR] Move vector.insert/vector.extract out of experimental namespace

These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976

show more ...


# fb4d3d23 21-Jun-2022 David Green <[email protected]>

[AArch64] Remove unnecessary funnel shift sve costs.

D127680 added some unnecessary funnel shift costs for AArch64 to "match
the legacy behaviour". The default costs are closer to the correct
values

[AArch64] Remove unnecessary funnel shift sve costs.

D127680 added some unnecessary funnel shift costs for AArch64 to "match
the legacy behaviour". The default costs are closer to the correct
values and line up with the scalar/neon costs better. Remove the lines
again to clean up the code, they can be added back at a later date with
better values if needed.

show more ...


# db85345f 20-Jun-2022 Philip Reames <[email protected]>

[BasicTTI] Allow generic handling of scalable vector fshr/fshl

This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9aba8, when sinking a unco

[BasicTTI] Allow generic handling of scalable vector fshr/fshl

This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9aba8, when sinking a unconditional bailout for all intrinsics into selected cases. Its not clear if the bailout was originally unneeded, or if our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue.

Note that the RISC-V cost model changes here aren't particularly interesting. They do probably better match the current lowering, but the main point is to have coverage of the BasicTTI path and simply show lack of crashing.

AArch64 costing was changed to preserve legacy behavior. There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change not being particularly familiar with the target.

Differential Revision: https://reviews.llvm.org/D127680

show more ...


# b329156f 17-Jun-2022 Tiehu Zhang <[email protected]>

[AArch64][LV] AArch64 does not prefer vectorized addressing

TTI::prefersVectorizedAddressing() try to vectorize the addresses that lead to loads.
For aarch64, only gather/scatter (supported by SVE)

[AArch64][LV] AArch64 does not prefer vectorized addressing

TTI::prefersVectorizedAddressing() try to vectorize the addresses that lead to loads.
For aarch64, only gather/scatter (supported by SVE) can deal with vectors of addresses.
This patch specializes the hook for AArch64, to return true only when we enable SVE.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D124612

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# bb82f746 23-May-2022 Jingu Kang <[email protected]>

Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""

This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.

The commmit from https://reviews.llvm.org/D125918 has fix

Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""

This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.

The commmit from https://reviews.llvm.org/D125918 has fixed the stage 2 build
failure.

Differential Revision: https://reviews.llvm.org/D118979

show more ...


# 5f4541fe 06-May-2022 Bradley Smith <[email protected]>

[AArch64][SVE] Convert SRSHL to LSL when the fed from an ABS intrinsic

Differential Revision: https://reviews.llvm.org/D125233


# 17a73992 10-May-2022 Florian Hahn <[email protected]>

[AArch64] Remove redundant f{min,max}nm intrinsics.

The patch extends AArch64TTIImpl::instCombineIntrinsic to simplify
llvm.aarch64.neon.f{min,max}nm(a, a) -> a.

This helps with simplifying code wr

[AArch64] Remove redundant f{min,max}nm intrinsics.

The patch extends AArch64TTIImpl::instCombineIntrinsic to simplify
llvm.aarch64.neon.f{min,max}nm(a, a) -> a.

This helps with simplifying code written using the ACLE, e.g.
see https://godbolt.org/z/jYxsoc89c

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125234

show more ...


# dccc69a3 06-May-2022 David Green <[email protected]>

[AArch64] Add extra reverse costs.

This adds some extra costs for reverse shuffles under AArch64, filling
in the i16/f16/i8 gaps in the cost model.

Differential Revision: https://reviews.llvm.org/D

[AArch64] Add extra reverse costs.

This adds some extra costs for reverse shuffles under AArch64, filling
in the i16/f16/i8 gaps in the cost model.

Differential Revision: https://reviews.llvm.org/D124786

show more ...


# 2dcb2d85 02-May-2022 David Green <[email protected]>

[AArch64] Cost modelling for fptoi_sat

This builds on top of the target-independent cost model added in D124269
to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics.
For many commo

[AArch64] Cost modelling for fptoi_sat

This builds on top of the target-independent cost model added in D124269
to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics.
For many common types they will be legal instructions as the AArch64
instructions will saturate naturally. For unsupported pairs of integer
and floating point types, an additional min/max clamp is needed.

Differential Revision: https://reviews.llvm.org/D124357

show more ...


# 6918a15f 29-Apr-2022 David Kreitzer <[email protected]>

Test commit. Fixed a typo in a comment.


Revision tags: llvmorg-14.0.3
# 46cef9a8 27-Apr-2022 David Green <[email protected]>

[AArch64] Attempt to fix bots by ensuring legalized type is a vector


# 8e2a0e61 27-Apr-2022 David Green <[email protected]>

[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost

Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that

[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost

Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that in
AArch64TTIImpl::getShuffleCost, splitting masks up according to the size
of the legalized vectors. If the sub-masks have at most 2 input sources
we can call getShuffleCost on them and sum the costs, to get a more
accurate final cost for the entire shuffle. The call to
improveShuffleKindFromMask helps to improve the shuffle kind for the
sub-mask cost call.

Differential Revision: https://reviews.llvm.org/D123414

show more ...


# d6327050 27-Apr-2022 David Green <[email protected]>

[AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost

Given a shuffle with 4 elements size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly mor

[AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost

Given a shuffle with 4 elements size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.

Differential Revision: https://reviews.llvm.org/D123409

show more ...


# fa8a9fea 26-Apr-2022 Vasileios Porpodas <[email protected]>

Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"

This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.

Code review: https://reviews.llvm.or

Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"

This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.

Code review: https://reviews.llvm.org/D124202

show more ...


# 6a9bbd9f 26-Apr-2022 Vasileios Porpodas <[email protected]>

Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"

This reverts commit 55ce296d6f217fd0defed2592ff7b74b79b2c1f0.


Revision tags: llvmorg-14.0.2
# 55ce296d 21-Apr-2022 Vasileios Porpodas <[email protected]>

[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`

Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is no

[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`

Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202

show more ...


# 4e971efa 22-Apr-2022 Vasileios Porpodas <[email protected]>

Recommit "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"

This reverts commit 7052a0ad689b990265ec79bd2b0a7d6e8c131bfe.


# 7052a0ad 22-Apr-2022 Vasileios Porpodas <[email protected]>

Revert "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"

This reverts commit 7ba702644bac6df166a02bbd692c1599a95a7c8b.


# 7ba70264 12-Apr-2022 Vasileios Porpodas <[email protected]>

[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64

The original patch (https://reviews.llvm.org/D121354) targets x86 and adjusts
the lookahead score of splat load

[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64

The original patch (https://reviews.llvm.org/D121354) targets x86 and adjusts
the lookahead score of splat loads ad they can be done by the `movddup`
instruction that combines the load and the broadcast and is cheap to execute.

A similar issue shows up on AArch64. The `ld1r` instruction performs a broadcast
load and is cheap to execute.

This patch implements the TargetTransformInfo hooks for AArch64.

Differential Revision: https://reviews.llvm.org/D123638

show more ...


# 42ebfa82 12-Apr-2022 Muhammad Omair Javaid <[email protected]>

Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"

This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

ht

Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"

This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.

show more ...


Revision tags: llvmorg-14.0.1
# fa784f63 07-Apr-2022 David Green <[email protected]>

[AArch64] Insert subvector costs

An insert subvector under aarch64 can often be done as a single lane mov
operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so
long as the index is

[AArch64] Insert subvector costs

An insert subvector under aarch64 can often be done as a single lane mov
operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so
long as the index is a multiple of 4. This teaches the cost model that,
using code copied over from the X86 backend.

Some of the costs (v16i16_4_0) are still high because they get matched
as a SK_Select, not an SK_InsertSubvector. D120879 has some codegen
tests for inserting subvectors, which I were added as
llvm/test/CodeGen/AArch64/insert-subvector.ll.

Differential Revision: https://reviews.llvm.org/D120880

show more ...


# 64b6192e 05-Apr-2022 Jingu Kang <[email protected]>

[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth

Set the maximum VF of AArch64 with 128 / the size of smallest type in loop.

Differential Revision: https://reviews.llvm.org/D118979


# 750bf358 04-Apr-2022 David Green <[email protected]>

[AArch64] Increase cost of v2i64 multiplies

The cost of a v2i64 multiply was special cased in D92208 as scalarized
into 4*extract + 2*insert + 2*mul. Scalarizing to/from gpr registers are
expensive

[AArch64] Increase cost of v2i64 multiplies

The cost of a v2i64 multiply was special cased in D92208 as scalarized
into 4*extract + 2*insert + 2*mul. Scalarizing to/from gpr registers are
expensive though, and the cost wasn't high enough to prevent vectorizing
in places where it can be detrimental for performance. This increases it
so that the costs of copying to/from GPRs is increased to 2 each, with
the total cost increasing to 14. So long as umull/smull are handled
correctly (as in D123006) this seems to lead to better vectorization
factors and better performance.

Differential Revision: https://reviews.llvm.org/D123007

show more ...


12345678910>>...12