|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
4b7913c3 |
| 16-Jul-2022 |
David Green <[email protected]> |
[VectorCombine] Only consider shuffle uses with the same type.
The backend getShuffleCosts do not currently handle shuffles that change size very well. Limit the shuffles we collect to the same type to make sure they do not cause issues as reported in D128732.
|
| #
519d7876 |
| 07-Jul-2022 |
Sander de Smalen <[email protected]> |
[VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector.
This addresses https://github.com/llvm/llvm-project/issues/56377
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D129136
|
| #
5493f8fc |
| 05-Jul-2022 |
David Green <[email protected]> |
[VectorCombine] Improve shuffle select shuffle-of-shuffles
This is an extension to the code added in D123911, which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask
This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as in:
  %x = shuffle %i1, %i2
  %y = shuffle %x
The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help produce lower-cost input shuffles by pushing complex shuffles further down the tree.
This is a recommit with some additional checks for supported forms and out-of-bounds mask elements, with some extra tests.
Differential Revision: https://reviews.llvm.org/D128732
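As a rough illustration of why dependent shuffles can be looked through, the sketch below folds two masks into one. It assumes the outer shuffle has a single source and that an undef lane is encoded as -1; the helper names `composeMasks` and `identityMask` are hypothetical and are not taken from the patch.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: Inner produces %x from the original inputs and Outer
// produces %y from %x alone. The result is a single mask that produces %y
// directly from the original inputs. Undef lanes (-1) stay undef.
// Assumes every non-negative entry of Outer is a valid index into Inner.
inline std::vector<int> composeMasks(const std::vector<int> &Inner,
                                     const std::vector<int> &Outer) {
  std::vector<int> Combined;
  Combined.reserve(Outer.size());
  for (int Idx : Outer)
    Combined.push_back(Idx < 0 ? -1 : Inner[Idx]);
  return Combined;
}

// A missing input shuffle can be modelled as the identity mask 0, 1, ..., N-1.
inline std::vector<int> identityMask(int N) {
  std::vector<int> Mask(N);
  std::iota(Mask.begin(), Mask.end(), 0);
  return Mask;
}
```

Composing with `identityMask(N)` leaves the outer mask unchanged, which is one way to read "treated like identity shuffles".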
|
| #
b69c75d5 |
| 05-Jul-2022 |
Nikita Popov <[email protected]> |
Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles"
This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d.
Clang crashes while linking bullet from llvm-test-suite in ReleaseLTO-g cmake configuration.
|
| #
19a1e20b |
| 04-Jul-2022 |
David Green <[email protected]> |
[VectorCombine] Improve shuffle select shuffle-of-shuffles
This is an extension to the code added in D123911, which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask
This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as in:
  %x = shuffle %i1, %i2
  %y = shuffle %x
The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help produce lower-cost input shuffles by pushing complex shuffles further down the tree.
Differential Revision: https://reviews.llvm.org/D128732
|
| #
bdba8278 |
| 29-Jun-2022 |
Nikita Popov <[email protected]> |
[VectorCombine] Avoid ConstantExpr::get() (NFC)
Use IRBuilder APIs instead, which will still constant fold.
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
c399b3a6 |
| 18-Jun-2022 |
Kazu Hirata <[email protected]> |
[Vectorize] Use llvm::is_contained (NFC)
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
6f9e1ea0 |
| 08-May-2022 |
David Green <[email protected]> |
[VectorCombine] Attempt to fold select shuffles from reductions
Given a commutative reduction leading from a shuffle, the order of the lanes of the shuffle is not important for the result. This means we can reorder the shuffle to something simpler, in which we try shuffling the first vector lanes first. This was D123494.
The new shuffle may not be profitable though, and if it is not we can try the folding of select shuffles from D123911. This, with some adjustment as the output lane ordering is now unimportant, can allow the final shuffle to simplify given the inputs to the patterns from D123911. Whereas each transformation on its own is not profitable, the combination is.
We can only support a single shuffle when called from reductions, but we are able to sort the ReconstructMask, potentially allowing it to simplify to an identity or concat mask.
Differential Revision: https://reviews.llvm.org/D125086
|
| #
100cb9a2 |
| 06-May-2022 |
David Green <[email protected]> |
[VectorCombine] Fold shuffle select pattern
This patch adds a combine to attempt to reduce the costs of certain select-shuffle patterns. The form of code it attempts to detect is:
  %x = shuffle ...
  %y = shuffle ...
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask
A classic select-mask will pick items from each lane of a or b. These do not always have a great lowering on many architectures. This patch attempts to pack a and b into the lower elements, creating a differently ordered shuffle for reconstructing the original, which may be better than the select mask. This can be better for performance, especially if fewer elements of a and b need to be computed and the input shuffles are cheaper.
Because select-masks are just one form of shuffle, we generalize to any mask. So long as the backend has a decent cost model for the shuffles, this can generally improve things when they come up. For more basic cost models the folds do not appear to be profitable, not getting past the cost checks.
Differential Revision: https://reviews.llvm.org/D123911
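The packing idea can be sketched as follows. This is a simplified model under stated assumptions, not the code from the patch: the mask is a two-source shuffle mask over a and b (indices below N pick from a, the rest from b), undef lanes are -1, and the names `PackedSelect` and `packSelectMask` are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical result type: which lanes of a and b are actually needed, in
// packed order, plus the mask that rebuilds the original result from the
// packed vectors.
struct PackedSelect {
  std::vector<int> LanesA;      // lanes of a that are used, packed to the front
  std::vector<int> LanesB;      // lanes of b that are used, packed to the front
  std::vector<int> Reconstruct; // two-source mask over (packed a, packed b)
};

inline PackedSelect packSelectMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  PackedSelect P;
  // Returns the packed position of Lane, appending it on first use.
  auto Slot = [](std::vector<int> &Lanes, int Lane) {
    auto It = std::find(Lanes.begin(), Lanes.end(), Lane);
    if (It == Lanes.end()) {
      Lanes.push_back(Lane);
      return static_cast<int>(Lanes.size()) - 1;
    }
    return static_cast<int>(It - Lanes.begin());
  };
  for (int Idx : Mask) {
    if (Idx < 0)
      P.Reconstruct.push_back(-1); // undef lanes stay undef
    else if (Idx < N)
      P.Reconstruct.push_back(Slot(P.LanesA, Idx));
    else
      P.Reconstruct.push_back(N + Slot(P.LanesB, Idx - N));
  }
  return P;
}
```

For the select mask `{0, 5, 2, 7}` on 4-lane vectors, only lanes 0 and 2 of a and lanes 1 and 3 of b are needed, so the binops only have to produce two meaningful lanes each. Whether the reordered form is actually cheaper is a cost-model question that this sketch does not address.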
|
| #
70306542 |
| 03-May-2022 |
serge-sans-paille <[email protected]> |
[iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few regressions, fixing them.
Differential Revision: https://reviews.llvm.org/D124847
|
| #
34f97a37 |
| 01-May-2022 |
Simon Pilgrim <[email protected]> |
[VectorCombine] Merge isa<>/cast<> into dyn_cast<>. NFC.
We want to handle the assert in VectorCombine so avoid the repeated isa/cast code.
|
| #
7047c479 |
| 29-Apr-2022 |
David Green <[email protected]> |
[VecCombine] Fix sort comparator logic in foldShuffleFromReductions
I think this sort comparator was overly complex, and the windows expensive check bot agreed, failing as it was not giving a strict weak ordering. Change it to use the comparison of the mask values as unsigned integers. This should sort the undef elements to the end whilst keeping X<Y otherwise.
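A minimal sketch of the comparator described above, assuming undef mask elements are encoded as -1. The function name `sortMaskUndefLast` is hypothetical and the body is not copied from foldShuffleFromReductions.

```cpp
#include <algorithm>
#include <vector>

// Sorts mask elements by comparing them as unsigned integers. An undef lane
// (-1) converts to the largest unsigned value, so it sorts after every real
// lane index, while non-negative indices keep their usual X < Y order.
// Unsigned less-than is a strict weak ordering, which std::sort requires.
inline std::vector<int> sortMaskUndefLast(std::vector<int> Mask) {
  std::sort(Mask.begin(), Mask.end(), [](int X, int Y) {
    return static_cast<unsigned>(X) < static_cast<unsigned>(Y);
  });
  return Mask;
}
```

The appeal of this form is that a single built-in comparison handles the undef case, so there is no special-case branch that could break the ordering guarantees.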
|
|
Revision tags: llvmorg-14.0.3 |
|
| #
ded8187e |
| 28-Apr-2022 |
David Green <[email protected]> |
[VectorCombine] Try to reduce shuffle cost for commutative reduction operands
Given a shuffle feeding a commutative reduction, the lane ordering of the shuffle will not alter the result. This is also true if there are a number of operations between the reduction and the shuffle, providing they only operate lane-wise. This patch searches for cases like that in Vector Combine, allowing us to check the cost of the shuffle vs an in-order identity shuffle and replace the order if possible. This only handles a single shuffle at the moment to keep things simple, and is able to ignore splats that produce results where every result is the same.
This is a more powerful version of a combine that already happens in instrcombine, capable of optimizing more cases by looking through more instructions and being able to cost the shuffle.
Differential Revision: https://reviews.llvm.org/D123494
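The claim that lane order does not affect a commutative reduction can be checked with a small model. The sketch below uses integer addition as the reduction and plain `std::vector` lanes; the helper names are hypothetical and nothing here models the cost comparison the patch performs.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Single-source shuffle: Out[i] = In[Mask[i]]. Assumes all indices are valid.
inline std::vector<int> shuffleLanes(const std::vector<int> &In,
                                     const std::vector<int> &Mask) {
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int Lane : Mask)
    Out.push_back(In[Lane]);
  return Out;
}

// Integer add reduction over all lanes.
inline int reduceAdd(const std::vector<int> &V) {
  return std::accumulate(V.begin(), V.end(), 0);
}

// Same reduction, but with the mask sorted into ascending lane order first.
// Sorting only permutes the selected lanes, so the sum is unchanged.
inline int reduceAddSortedMask(const std::vector<int> &In,
                               std::vector<int> Mask) {
  std::sort(Mask.begin(), Mask.end());
  return reduceAdd(shuffleLanes(In, Mask));
}
```

A reversed mask such as `{3, 2, 1, 0}` sorts to the identity mask, which is the case where the shuffle can disappear entirely.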
|
|
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
2e44b787 |
| 16-Mar-2022 |
Fraser Cormack <[email protected]> |
[VectorCombine] Insert addrspacecast when crossing address space boundaries
We cannot bitcast pointers across different address spaces. This was previously fixed in D89577, but then in D93229 an enhancement was added which peeks further through the pointer operand, opening up the possibility that address-space violations could be introduced.
Instead of bailing as the previous fix did, simply insert an addrspacecast instruction.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D121787
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
c141d158 |
| 19-Feb-2022 |
Florian Hahn <[email protected]> |
[VectorCombine] Remove redundant checks (NFC).
The removed conditions are already checked by the if above.
Fixes #53761.
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
0edf9995 |
| 28-Dec-2021 |
Sanjay Patel <[email protected]> |
[Analysis] allow caller to choose signed/unsigned when computing constant range
We should not lose analysis precision if an 'add' has both no-wrap flags (nsw and nuw) compared to just one or the other.
This patch is modeled on a similar construct that was added with D59386.
I don't think it is possible to expose a problem with an unsigned compare because of the way this was coded (nuw is handled first).
InstCombine has an assert that fires with the example from: https://github.com/llvm/llvm-project/issues/52884 ...because it was expecting InstSimplify to handle this kind of pattern with an smax.
Fixes #52884
Differential Revision: https://reviews.llvm.org/D116322
|
| #
5a81a603 |
| 09-Dec-2021 |
Arthur Eubanks <[email protected]> |
[NFC] Remove more calls to getAlignment()
These are deprecated and should be replaced with getAlign().
Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
66d22b4d |
| 21-Oct-2021 |
Sanjay Patel <[email protected]> |
[VectorCombine] fold shuffle-of-binops with common operand
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)
This is motivated by an example in D111800 (although that patch avoids the problem for that particular example).
The pattern is shown in reduced form with: https://llvm.org/PR52178 https://alive2.llvm.org/ce/z/d8zB4D
There is no difference on the PhaseOrdering test from D111800 because the aarch64 cost model says that the shuffle cost is 3 while the fadd cost is 2.
Differential Revision: https://reviews.llvm.org/D111901
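The equivalence `shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)` can be checked on a small model. The sketch below assumes lane-wise integer addition as the binop and ignores undef mask elements; `shuffle2`, `beforeFold` and `afterFold` are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Two-source shuffle: indices [0, N) pick from A, indices [N, 2N) pick from B.
inline Vec shuffle2(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  const int N = static_cast<int>(A.size());
  Vec Out;
  for (int Idx : Mask)
    Out.push_back(Idx < N ? A[Idx] : B[Idx - N]);
  return Out;
}

// Lane-wise binop; addition stands in for "bo". Assumes equal lengths.
inline Vec addLanes(const Vec &A, const Vec &B) {
  Vec Out;
  for (std::size_t I = 0; I < A.size(); ++I)
    Out.push_back(A[I] + B[I]);
  return Out;
}

// Original form: shuf (bo X, Y), (bo X, W)
inline Vec beforeFold(const Vec &X, const Vec &Y, const Vec &W,
                      const std::vector<int> &Mask) {
  return shuffle2(addLanes(X, Y), addLanes(X, W), Mask);
}

// Folded form: bo (shuf X), (shuf Y, W). The shuffle of X is unary, so
// indices that pointed at the second operand are wrapped back into [0, N).
inline Vec afterFold(const Vec &X, const Vec &Y, const Vec &W,
                     const std::vector<int> &Mask) {
  const int N = static_cast<int>(X.size());
  std::vector<int> UnaryMask;
  for (int Idx : Mask)
    UnaryMask.push_back(Idx < N ? Idx : Idx - N);
  return addLanes(shuffle2(X, X, UnaryMask), shuffle2(Y, W, Mask));
}
```

The folded form trades two binops and one shuffle for one binop and two shuffles, which is why the message notes that a target where shuffles cost more than the binop sees no change.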
|
| #
4a1d63d7 |
| 15-Oct-2021 |
Florian Hahn <[email protected]> |
[VectorCombine] Add option to only run scalarization transforms.
This patch adds a pass option to only run transforms that scalarize vector operations and do not create new vector instructions.
When running VectorCombine early in the pipeline introducing new vector operations can have negative effects, like blocking loop or SLP vectorization. To avoid regressions, restrict the early VectorCombine run (when using -enable-matrix) to only perform scalarization and not introduce new vector operations.
This is done as an option to the pass directly, which is then set when adding the pass to the pipeline. This is done for the new pass manager only.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D111800
|
| #
098a0d8f |
| 30-Sep-2021 |
Hongtao Yu <[email protected]> |
[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.
Unlike DbgIntrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simplification
- Allow empty loop deletion by treating probe intrinsic isDroppable
- Some refactoring.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110847
|
| #
0dcd2b40 |
| 06-Oct-2021 |
Simon Pilgrim <[email protected]> |
[TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost
We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument.
This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known.
Differential Revision: https://reviews.llvm.org/D111024
|
| #
e2f6290e |
| 28-Sep-2021 |
Florian Hahn <[email protected]> |
[VectorCombine] Discard ScalarizationResult state in early exit.
ScalarizationResult's destructor makes sure ToFreeze is not ignored if set. Currently, scalarizeLoadExtract has an early exit if the index is not safe directly. But when it is SafeWithFreeze, we need to discard the state first, otherwise we hit the assert in the destructor.
Fixes PR51992.
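The failure mode can be modelled with a small stand-in class. Everything below is a hypothetical sketch (`PendingFreezeResult` and `tryScalarize` are invented names), meant only to show why an early exit has to discard a pending freeze before the object is destroyed.

```cpp
#include <cassert>

// Hypothetical stand-in for a result object that may carry a pending freeze.
class PendingFreezeResult {
public:
  enum class Status { Unsafe, Safe, SafeWithFreeze };

  explicit PendingFreezeResult(Status S) : State(S) {}

  // The destructor checks that a pending freeze was not silently dropped.
  ~PendingFreezeResult() {
    assert(State != Status::SafeWithFreeze && "pending freeze was ignored");
  }

  bool needsFreeze() const { return State == Status::SafeWithFreeze; }

  // Marks the freeze as handled (a real pass would emit the instruction here).
  void freeze() {
    assert(needsFreeze());
    State = Status::Safe;
  }

  // Drops the pending state on paths that bail out without transforming.
  void discard() { State = Status::Unsafe; }

private:
  Status State;
};

// Early-exit path: the transform is abandoned, so the state is discarded
// first. Without the discard() call the destructor assert would fire.
inline bool tryScalarize(PendingFreezeResult &Result, bool Profitable) {
  if (!Profitable) {
    Result.discard();
    return false;
  }
  if (Result.needsFreeze())
    Result.freeze();
  return true;
}
```

The destructor assert turns a forgotten freeze into an immediate debug-build failure, at the price of making every bail-out path responsible for an explicit discard.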
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
300870a9 |
| 22-Sep-2021 |
Florian Hahn <[email protected]> |
[VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines.
Suggested in D100302.
The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization.
Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions.
Compile-time impact looks neutral:
  NewPM-O3: +0.02%
  NewPM-ReleaseThinLTO: -0.00%
  NewPM-ReleaseLTO-g: -0.02%
http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D110171
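The general shape of a worklist-driven combine loop can be sketched as below. This is a toy model with integer items and a hypothetical `runWorklist` function; it does not reflect the data structures VectorCombine actually uses.

```cpp
#include <deque>
#include <functional>
#include <vector>

// Processes items until the worklist is empty. TryCombine returns true when
// it changed something and appends any items that should be revisited, so one
// successful combine can enable further combines on the affected items.
inline int runWorklist(
    std::deque<int> Worklist,
    const std::function<bool(int, std::vector<int> &)> &TryCombine) {
  int NumCombines = 0;
  while (!Worklist.empty()) {
    int Item = Worklist.front();
    Worklist.pop_front();
    std::vector<int> Revisit;
    if (TryCombine(Item, Revisit)) {
      ++NumCombines;
      Worklist.insert(Worklist.end(), Revisit.begin(), Revisit.end());
    }
  }
  return NumCombines;
}
```

Only items reported by a successful combine are revisited, which matches the message's point that unrelated instructions are no longer simplified as a side effect.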
|
| #
5131037e |
| 21-Sep-2021 |
Florian Hahn <[email protected]> |
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree.
The use in VectorCombine is updated to pass through the DT to enable additional scalarization.
Note that similar APIs like computeKnownBits already accept optional dominator tree arguments.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110175
|
|
Revision tags: llvmorg-13.0.0-rc3 |
|
| #
c24fc37e |
| 13-Sep-2021 |
Florian Hahn <[email protected]> |
[VectorCombine] Support AND/UREM indices that require freezing.
38b098be6605 limited scalarization to indices that are known non-poison. For certain patterns that restrict the range of an index, we can insert a freeze of the original value, to prevent propagation of poison.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D107580
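The range argument behind AND/UREM indices can be illustrated in plain C++. Poison and freeze have no C++ counterpart, so the sketch below only covers the "restricts the range of an index" part; the function names are hypothetical and the checks are a simplification, not the analysis the pass runs.

```cpp
#include <cstdint>

// Idx & Mask can never exceed Mask, so the masked index is in range for a
// vector of NumElts elements whenever Mask < NumElts.
constexpr bool andIndexInRange(std::uint64_t Mask, std::uint64_t NumElts) {
  return Mask < NumElts;
}

// Idx % Divisor is always smaller than Divisor, so the index is in range
// whenever 0 < Divisor <= NumElts.
constexpr bool uremIndexInRange(std::uint64_t Divisor, std::uint64_t NumElts) {
  return Divisor != 0 && Divisor <= NumElts;
}
```

In IR the bound only holds if the original index is not poison, which is where freezing the original value comes in, as the message describes.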
|