|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2 |
|
| #
5f620d00 |
| 23-Sep-2022 |
Florian Hahn <[email protected]> |
[LV] Update handling of scalable pointer inductions after b73d2c8.
The dependent code has been changed quite a lot since 151c144, which b73d2c8 effectively reverts. Now we run into a case where lowering no longer expects/supports the pre-151c144 behavior.
Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization.
Fixes #57912.
(cherry picked from commit 2c692d891ed639779b1c4b504ca63037bbacc0e8)
|
|
Revision tags: llvmorg-15.0.1 |
|
| #
b73d2c8c |
| 19-Sep-2022 |
Florian Hahn <[email protected]> |
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases.
At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening.
This can lead to widened phis with incorrect start values being created in the epilogue vector body.
This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before 151c144. This effectively reverts 151c144, but the long-term fix is to properly support widened inductions during epilogue vectorization.
Fixes #57712.
|
|
Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
| #
d945a2c9 |
| 09-Aug-2022 |
Dinar Temirbulatov <[email protected]> |
[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding.
After D121595 was committed, I noticed regressions associated with vectorising small trip count loops by tail folding with scalable vectors. As a solution for those issues I propose to introduce a minimal trip count threshold value.
Differential Revision: https://reviews.llvm.org/D130755
(cherry picked from commit cab6cd68340255be241b7cf169c67a1899ced115)
|
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
95a932fb |
| 25-Jul-2022 |
Kazu Hirata <[email protected]> |
Remove redundant override specifiers (NFC)
Identified with modernize-use-override.
|
| #
2d2e2e7e |
| 23-Jul-2022 |
Kazu Hirata <[email protected]> |
[Vectorize] Remove isConsecutiveLoadOrStore (NFC)
The last use was removed on Jan 4, 2022 in commit 95a93722db2d10753f8887cf6a61380936d32f1c.
|
| #
b5c72136 |
| 22-Jul-2022 |
Philip Reames <[email protected]> |
[LV] Use early return to simplify code structure
|
| #
5a445395 |
| 22-Jul-2022 |
Benjamin Kramer <[email protected]> |
[LV] Remove unused variable. NFC.
|
| #
d7bf81fd |
| 22-Jul-2022 |
Philip Reames <[email protected]> |
[LV] Rework widening cost of uniform memory ops for clarity [nfc]
Reorganize the code to make it clear what is and isn't handled, and why. Restructure the bailout to remove the (false and confusing) dependence on CM_Scalarize; just return an invalid cost and propagate it, that's what it is for.
|
| #
bd753501 |
| 21-Jul-2022 |
Philip Reames <[email protected]> |
[LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst
This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the common name, these are different.
* LV's notion means that only the first lane *of each unrolled part* is required. That is, lanes within a single unroll factor are considered uniform. This allows e.g. widenable memory ops to be considered uses of uniform computations.
* LVL and LAI's notion refers to all lanes across all unrollings.
IsUniformMem is in turn defined in terms of LAI's notion. Thus a UniformMemOp is a memory operation with a loop-invariant address. This means the same address is accessed in every iteration.
The tweaked piece of code was trying to match a uniform mem op (i.e. fully loop invariant address), but instead checked for LV's notion of uniformity. In theory, this meant with UF > 1, we could speculate a load which wasn't safe to execute.
This ends up being mostly silent in current code as it is nearly impossible to create the case where this difference is visible. The closest I've come is the test case from 54cb87, but even then, the incorrect result is only visible in the vplan debug output; before this change we sink the unsafely speculated load back into the user's predicate blocks before emitting IR. Both before and after IR are correct so the differences aren't "interesting".
The other test changes are uninteresting. They're cases where LV's uniform analysis is slightly weaker than SCEV isLoopInvariant.
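To illustrate LAI's notion, a uniform memory op is simply a load or store whose address is loop invariant. A minimal C sketch (the function name is made up for illustration):

```c
#include <assert.h>

/* A "uniform memory op" in LAI's sense: the address `p` is loop
 * invariant, so the same location is read in every iteration, and
 * hence by every lane across all unrolled parts. */
long sum_uniform_load(const long *p, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += *p; /* the address never changes */
    return sum;
}
```

Speculating such a load out of a predicated block is only safe if the address is dereferenceable on every path, which is exactly the property the uniformity check above was meant to guard.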
|
| #
f15b6b29 |
| 12-Jul-2022 |
David Sherwood <[email protected]> |
[AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met:
1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, or
3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences.
Currently the default option is "disabled", but this will be changed in a later patch.
I've added new tests to show the options behave as expected here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
Differential Revision: https://reviews.llvm.org/D129560
|
| #
523a526a |
| 19-Jul-2022 |
Philip Reames <[email protected]> |
[LV] Fix miscompile due to srem/sdiv speculation safety condition
An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block.
Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose.
Differential Revision: https://reviews.llvm.org/D130106
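For context, the two undefined-behavior cases can be sketched in C. The commit doesn't name the utility it switched to, so the helper below is purely illustrative, not LLVM's API:

```c
#include <assert.h>

/* Illustrative helper (not LLVM's actual utility): an sdiv/srem has
 * TWO undefined-behavior cases, not one:
 *   1. the divisor is zero, and
 *   2. the dividend is the minimum signed value while the divisor
 *      is -1 (the quotient overflows).
 * Speculating the instruction is only safe when neither can occur. */
int sdiv_may_trap(long long divisor, int dividend_may_be_int_min) {
    if (divisor == 0)
        return 1; /* division by zero */
    if (divisor == -1 && dividend_may_be_int_min)
        return 1; /* e.g. srem i64 %v, -1 with %v == INT64_MIN */
    return 0;
}
```

The pre-fix code only accounted for the divide-by-zero case, which is why `srem i64 %v, -1` in a conditional block was miscompiled.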
|
| #
a75760a2 |
| 19-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Remove unnecessary cast in widenCallInstruction. (NFC)
|
| #
30e53b8c |
| 18-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Sink module variable and use State to set it in widenCall. (NFC)
Limits the lifetime of the variable and makes it independent of CallInst.
|
| #
105032f5 |
| 18-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Use PHI recipe instead of PredRecipe for subsequent uses.
At the moment, the VPPredInstPHIRecipe is not used in subsequent uses of the predicate recipe. This incorrectly models the def-use chains, as all later uses should use the phi recipe. Fix that by delaying recording of the recipe.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D129436
|
| #
cc0ee179 |
| 17-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Move VPPredInstPHIRecipe::execute to VPlanRecipes.cpp (NFC)
|
| #
6813b41d |
| 16-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Avoid creating a new run-time VF expression for each runtime check.
At the moment, the cost of runtime checks for scalable vectors is overestimated due to creating separate vscale * VF expressions for each check. Instead, re-use the first expression.
|
| #
aa00fb02 |
| 15-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Use umax(VF * UF, MinProfTC) for scalable vectors.
For scalable vectors, it is not sufficient to only check MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because this property may not hold for larger values of vscale. In those cases, compute umax(VF * UF, MinProfTC) instead.
This should fix https://lab.llvm.org/buildbot/#/builders/197/builds/2262
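The arithmetic behind the fix can be sketched in C. Names are illustrative; in LLVM the value is built as an IR expression at runtime, not computed on the host:

```c
#include <assert.h>

/* Sketch of the minimum-iteration entry check with a runtime vscale.
 * The vector loop should only be entered when the trip count is at
 * least umax(VF * UF, MinProfTC): VF * UF guarantees one full vector
 * iteration, MinProfTC guarantees profitability. Comparing only
 * KnownMinVF * UF against MinProfTC at compile time is insufficient,
 * because the real VF * UF grows with vscale. */
unsigned long min_iters_needed(unsigned long vscale,
                               unsigned long known_min_vf,
                               unsigned long uf,
                               unsigned long min_prof_tc) {
    unsigned long vf_times_uf = vscale * known_min_vf * uf;
    return vf_times_uf > min_prof_tc ? vf_times_uf : min_prof_tc;
}
```

For example, with KnownMinVF = 4, UF = 2 and MinProfTC = 16, the threshold is 16 when vscale = 1 but must rise to 32 when vscale = 4.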
|
| #
bd404fbc |
| 15-Jul-2022 |
Mel Chen <[email protected]> |
[LV][NFC] Fix the condition for printing debug messages
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D128523
|
| #
611ffcf4 |
| 14-Jul-2022 |
Kazu Hirata <[email protected]> |
[llvm] Use value instead of getValue (NFC)
|
| #
6f7347b8 |
| 14-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Use PredRecipe directly instead of getOrAddVPValue (NFC).
There is no need to look up the VPValue for Instr, PredRecipe can be used directly.
|
| #
225e3ec6 |
| 13-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Move VPBranchOnMaskRecipe::execute to VPlanRecipes.cpp (NFC).
|
| #
307ace7f |
| 11-Jul-2022 |
David Sherwood <[email protected]> |
[LoopVectorize] Ensure the VPReductionRecipe is placed after all its inputs
When vectorising ordered reductions we call a function LoopVectorizationPlanner::adjustRecipesForReductions to replace the existing VPWidenRecipe for the fadd instruction with a new VPReductionRecipe. We attempt to insert the new recipe in the same place, but this is wrong because createBlockInMask may have generated new recipes that VPReductionRecipe now depends upon. I have changed the insertion code to append the recipe to the VPBasicBlock instead.
Added a new RUN with tail-folding enabled to the existing test:
Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
Differential Revision: https://reviews.llvm.org/D129550
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
6b694d60 |
| 21-Jun-2022 |
David Sherwood <[email protected]> |
[LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF
When calculating the cost of Instruction::Br in getInstructionCost we query PredicatedBBsAfterVectorization to see if there is a scalar predicated block. However, this meant that the decisions being made for a given fixed-width VF were affecting the cost for a scalable VF. As a result we were returning InstructionCost::Invalid pointlessly for a scalable VF that should have a low cost. I encountered this for some loops when enabling tail-folding for scalable VFs.
Test added here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
Differential Revision: https://reviews.llvm.org/D128272
|
| #
5d135041 |
| 11-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC).
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
03fee671 |
| 10-May-2022 |
David Sherwood <[email protected]> |
[LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop.
This patch adds support for using the active lane mask for control flow by:
1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set.
2. Extracting the first bit of this mask and using it for the conditional branch.
I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region.
Differential Revision: https://reviews.llvm.org/D125301
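A scalar C model of the control-flow scheme described above. The fixed VF and the function name are illustrative; the real transform operates on vector IR using the get.active.lane.mask intrinsic:

```c
#include <assert.h>

#define VF 4 /* illustrative fixed VF; the real case is scalable */

/* Scalar model of using the active-lane mask for control flow:
 * compute the mask for the NEXT iteration and branch on its first
 * bit. If any iterations remain, lane 0 of that mask is set, so no
 * separate comparison of the induction variable against the trip
 * count is needed. */
void masked_copy(int *dst, const int *src, int n) {
    int base = 0;
    int lane0_active = base < n; /* mask for the first iteration */
    while (lane0_active) {
        for (int l = 0; l < VF; ++l)
            if (base + l < n) /* per-lane predicate */
                dst[base + l] = src[base + l];
        base += VF;
        /* first bit of get.active.lane.mask(base, n) for the next
         * iteration; the loop exits when even lane 0 is inactive */
        lane0_active = base < n;
    }
}
```

In the model, the final partial block (n not a multiple of VF) is handled by the per-lane predicate rather than a scalar epilogue, which is the point of tail folding.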
|