History log of /llvm-project-15.0.7/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (Results 1 – 25 of 1687)
Revision Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2
# 5f620d00 23-Sep-2022 Florian Hahn <[email protected]>

[LV] Update handling of scalable pointer inductions after b73d2c8.

The dependent code has been changed quite a lot since 151c144 which
b73d2c8 effectively reverts. Now we run into a case where lowering
no longer expects/supports the pre-151c144 behavior.

Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.

Fixes #57912.

(cherry picked from commit 2c692d891ed639779b1c4b504ca63037bbacc0e8)



Revision tags: llvmorg-15.0.1
# b73d2c8c 19-Sep-2022 Florian Hahn <[email protected]>

[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.

Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization.

Fixes #57712.
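
A minimal sketch of the idea (the class and member names are simplified stand-ins, not the actual VPlan definitions):

```c++
// Hypothetical, simplified sketch: record the cost-model decision when the
// recipe is created, so later queries (e.g. while setting up the epilogue
// vector loop) see the same answer instead of recomputing it under different
// assumptions and widening a phi that should stay scalar.
struct WidenPointerInductionRecipeSketch {
  bool IsScalarAfterVectorization; // stored cost-model decision

  explicit WidenPointerInductionRecipeSketch(bool ScalarAfterVec)
      : IsScalarAfterVectorization(ScalarAfterVec) {}

  // When the stored decision says the induction stays scalar, no widened
  // vector phi (and hence no widened start value) is generated.
  bool onlyScalarsGenerated() const { return IsScalarAfterVectorization; }
};
```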



Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3
# d945a2c9 09-Aug-2022 Dinar Temirbulatov <[email protected]>

[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding.

After D121595 was committed, I noticed regressions associated with
vectorisation of small trip count loops by tail folding with scalable vectors.
As a solution for those issues I propose to introduce a minimal trip count
threshold value.

Differential Revision: https://reviews.llvm.org/D130755

(cherry picked from commit cab6cd68340255be241b7cf169c67a1899ced115)
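
A standalone sketch of the kind of guard this adds; the names and the threshold value are illustrative, not the actual option introduced by the patch:

```c++
#include <cstdint>

// Illustrative only: when the trip count is known and small, tail-folding a
// scalable vector loop is unlikely to pay off, so fall back to the default
// (non-tail-folded) strategy below a minimum threshold.
constexpr uint64_t MinTripCountForTailFolding = 5; // hypothetical value

bool preferTailFolding(uint64_t KnownTripCount) {
  // A trip count of 0 here stands for "unknown at compile time".
  return KnownTripCount == 0 || KnownTripCount >= MinTripCountForTailFolding;
}
```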



Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 95a932fb 25-Jul-2022 Kazu Hirata <[email protected]>

Remove redundant override specifiers (NFC)

Identified with modernize-use-override.


# 2d2e2e7e 23-Jul-2022 Kazu Hirata <[email protected]>

[Vectorize] Remove isConsecutiveLoadOrStore (NFC)

The last use was removed on Jan 4, 2022 in commit
95a93722db2d10753f8887cf6a61380936d32f1c.


# b5c72136 22-Jul-2022 Philip Reames <[email protected]>

[LV] Use early return to simplify code structure


# 5a445395 22-Jul-2022 Benjamin Kramer <[email protected]>

[LV] Remove unused variable. NFC.


# d7bf81fd 22-Jul-2022 Philip Reames <[email protected]>

[LV] Rework widening cost of uniform memory ops for clarity [nfc]

Reorganize the code to make it clear what is and isn't handled, and why.
Restructure bailout to remove (false and confusing) dependence on
CM_Scalarize; just return invalid cost and propagate, that's what it
is for.



# bd753501 21-Jul-2022 Philip Reames <[email protected]>

[LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst

This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the
common name, these are different.
* LV's notion means that only the first lane *of each unrolled part* is
required. That is, lanes within a single unroll factor are considered
uniform. This allows e.g. widenable memory ops to be considered
uses of uniform computations.
* LVL and LAI's notion refers to all lanes across all unrollings.

IsUniformMem is in turn defined in terms of LAI's notion. Thus a
UniformMemOp is a memory operation with a loop invariant address.
This means the same address is accessed in every iteration.

The tweaked piece of code was trying to match a uniform mem op (i.e.
fully loop invariant address), but instead checked for LV's notion of
uniformity. In theory, this meant with UF > 1, we could speculate
a load which wasn't safe to execute.

This ends up being mostly silent in current code as it is nearly
impossible to create the case where this difference is visible. The
closest I've come is the test case from 54cb87, but even then, the
incorrect result is only visible in the vplan debug output; before this
change we sink the unsafely speculated load back into the user's predicate
blocks before emitting IR. Both before and after IR are correct so the
differences aren't "interesting".

The other test changes are uninteresting. They're cases where LV's uniform
analysis is slightly weaker than SCEV isLoopInvariant.
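
An illustrative C++ example (not from the patch) of the distinction: only the first loop below has a uniform memory op in LAI's sense, i.e. a fully loop-invariant address; the second loop's address is identical for neighbouring iterations but still changes over the loop, so it must not be speculated as if it were invariant.

```c++
int sumInvariantAddress(const int *P, int N) {
  int S = 0;
  for (int I = 0; I < N; ++I)
    S += *P;           // same address on every iteration: loop-invariant load
  return S;
}

int sumGroupedAddress(const int *Base, int N) {
  int S = 0;
  for (int I = 0; I < N; ++I)
    S += Base[I / 4];  // shared by a few consecutive iterations, not invariant
  return S;
}
```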



# f15b6b29 12-Jul-2022 David Sherwood <[email protected]>

[AArch64] Add target hook for preferPredicateOverEpilogue

This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
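
A hypothetical, simplified model of the decision described above; the real hook lives in the AArch64 TTI implementation and consults loop analyses rather than booleans:

```c++
enum class TailFoldingMode { Disabled, All, AllNoReductions, AllNoRecurrences };

// Sketch only: map the "sve-tail-folding" modes described above onto a yes/no
// answer for preferring a predicated (tail-folded) loop over an epilogue.
bool preferPredicateOverEpilogueSketch(TailFoldingMode Mode, bool HasSVE,
                                       bool LoopHasReductions,
                                       bool LoopHasFirstOrderRecurrences) {
  if (!HasSVE)
    return false;
  switch (Mode) {
  case TailFoldingMode::Disabled:
    return false;
  case TailFoldingMode::All:
    return true;
  case TailFoldingMode::AllNoReductions:
    return !LoopHasReductions;
  case TailFoldingMode::AllNoRecurrences:
    return !LoopHasFirstOrderRecurrences;
  }
  return false;
}
```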



# 523a526a 19-Jul-2022 Philip Reames <[email protected]>

[LV] Fix miscompile due to srem/sdiv speculation safety condition

An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block.

Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose.

Differential Revision: https://reviews.llvm.org/D130106
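
The two cases in question are a zero divisor and the overflowing INT_MIN / -1 division; a small standalone illustration (the helper name is made up):

```c++
#include <climits>

// Both conditions must be excluded before hoisting an sdiv/srem out of the
// predicated block that guards it: dividing by zero, and dividing the
// minimum signed value by -1 (the result overflows and is undefined).
bool isSafeToSpeculateSignedDivRem(int Dividend, int Divisor) {
  if (Divisor == 0)
    return false;                              // case 1: division by zero
  if (Dividend == INT_MIN && Divisor == -1)
    return false;                              // case 2: signed overflow
  return true;
}
```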



# a75760a2 19-Jul-2022 Florian Hahn <[email protected]>

[LV] Remove unnecessary cast in widenCallInstruction. (NFC)


# 30e53b8c 18-Jul-2022 Florian Hahn <[email protected]>

[LV] Sink module variable and use State to set it in widenCall. (NFC)

Limits the lifetime of the variable and makes it independent of
CallInst.


# 105032f5 18-Jul-2022 Florian Hahn <[email protected]>

[LV] Use PHI recipe instead of PredRecipe for subsequent uses.

At the moment, the VPPredInstPHIRecipe is not used in subsequent uses of
the predicate recipe. This incorrectly models the def-use chains, as all
later uses should use the phi recipe. Fix that by delaying recording of
the recipe.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D129436



# cc0ee179 17-Jul-2022 Florian Hahn <[email protected]>

[LV] Move VPPredInstPHIRecipe::execute to VPlanRecipes.cpp (NFC)


# 6813b41d 16-Jul-2022 Florian Hahn <[email protected]>

[LV] Avoid creating new run-time VF expression for each runtime check.

At the moment, the cost of runtime checks for scalable vectors is
overestimated due to creating separate vscale * VF expressions for each
check. Instead re-use the first expression.



# aa00fb02 15-Jul-2022 Florian Hahn <[email protected]>

[LV] Use umax(VF * UF, MinProfTC) for scalable vectors.

For scalable vectors, it is not sufficient to only check
MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because
this property may not hold for larger values of vscale. In those
cases, compute umax(VF * UF, MinProfTC) instead.

This should fix
https://lab.llvm.org/buildbot/#/builders/197/builds/2262
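
A scalar sketch of the check (illustrative names; the real patch builds IR for the minimum-iterations check):

```c++
#include <algorithm>
#include <cstdint>

// For scalable vectors the runtime VF is vscale * KnownMinVF, which may be
// larger than the statically known minimum, so comparing
// MinProfitableTripCount only against KnownMinVF * UF is not enough. The
// entry check therefore requires umax(VF * UF, MinProfTC) iterations.
uint64_t requiredIterations(uint64_t Vscale, uint64_t KnownMinVF, uint64_t UF,
                            uint64_t MinProfitableTripCount) {
  uint64_t RuntimeVFTimesUF = Vscale * KnownMinVF * UF;
  return std::max(RuntimeVFTimesUF, MinProfitableTripCount);
}
```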



# bd404fbc 15-Jul-2022 Mel Chen <[email protected]>

[LV][NFC] Fix the condition for printing debug messages

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D128523


# 611ffcf4 14-Jul-2022 Kazu Hirata <[email protected]>

[llvm] Use value instead of getValue (NFC)


# 6f7347b8 14-Jul-2022 Florian Hahn <[email protected]>

[LV] Use PredRecipe directly instead of getOrAddVPValue (NFC).

There is no need to look up the VPValue for Instr, PredRecipe can be
used directly.


# 225e3ec6 13-Jul-2022 Florian Hahn <[email protected]>

[LV] Move VPBranchOnMaskRecipe::execute to VPlanRecipes.cpp (NFC).


# 307ace7f 11-Jul-2022 David Sherwood <[email protected]>

[LoopVectorize] Ensure the VPReductionRecipe is placed after all its inputs

When vectorising ordered reductions we call a function
LoopVectorizationPlanner::adjustRecipesForReductions to replace the
existing VPWidenRecipe for the fadd instruction with a new
VPReductionRecipe. We attempt to insert the new recipe in the same
place, but this is wrong because createBlockInMask may have
generated new recipes that VPReductionRecipe now depends upon. I
have changed the insertion code to append the recipe to the
VPBasicBlock instead.

Added a new RUN with tail-folding enabled to the existing test:

Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D129550
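
A generic illustration of the ordering problem (not VPlan code): appending the replacement keeps it after any recipes created for the block mask in the meantime, whereas re-inserting at the old position could place it before its new operands.

```c++
#include <string>
#include <vector>

// Sketch: 'Block' is an ordered list of recipe names. Appending the new
// reduction recipe guarantees every definition it uses appears earlier in
// the block, even if mask recipes were added after the old position.
void replaceWithReduction(std::vector<std::string> &Block, size_t OldPos,
                          const std::string &ReductionRecipe) {
  Block.erase(Block.begin() + OldPos); // remove the widened fadd recipe
  Block.push_back(ReductionRecipe);    // append after newly created recipes
}
```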



Revision tags: llvmorg-14.0.6
# 6b694d60 21-Jun-2022 David Sherwood <[email protected]>

[LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF

When calculating the cost of Instruction::Br in getInstructionCost
we query PredicatedBBsAfterVectorization to see if there is a
scalar predicated block. However, this meant that the decisions
being made for a given fixed-width VF were affecting the cost for a
scalable VF. As a result we were returning InstructionCost::Invalid
pointlessly for a scalable VF that should have a low cost. I
encountered this for some loops when enabling tail-folding for
scalable VFs.

Test added here:

Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll

Differential Revision: https://reviews.llvm.org/D128272
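
A simplified, hypothetical sketch of the data-structure change: keying the set of predicated blocks by the vectorization factor so that a decision recorded for a fixed-width VF cannot leak into the cost computed for a scalable VF.

```c++
#include <map>
#include <set>
#include <tuple>

struct VFKeySketch {
  unsigned KnownMinLanes;
  bool Scalable;
  bool operator<(const VFKeySketch &O) const {
    return std::tie(Scalable, KnownMinLanes) <
           std::tie(O.Scalable, O.KnownMinLanes);
  }
};

using BlockId = unsigned; // stand-in for a basic-block pointer

// One set of "scalar predicated blocks" per VF instead of a single shared set.
std::map<VFKeySketch, std::set<BlockId>> PredicatedBBsPerVF;

bool hasScalarPredicatedBlock(const VFKeySketch &VF, BlockId BB) {
  auto It = PredicatedBBsPerVF.find(VF);
  return It != PredicatedBBsPerVF.end() && It->second.count(BB) != 0;
}
```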



# 5d135041 11-Jul-2022 Florian Hahn <[email protected]>

[LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC).


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# 03fee671 10-May-2022 David Sherwood <[email protected]>

[LoopVectorize] Add option to use active lane mask for loop control flow

Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
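
A conceptual scalar model of the new control flow (not the generated IR; function and variable names are illustrative): the body is predicated by the mask for the current iteration, and the back-edge is taken as long as the first lane of the mask computed for the next iteration is set.

```c++
#include <cstdint>
#include <vector>

void maskedCopy(const int *Src, int *Dst, uint64_t N, uint64_t VF) {
  if (N == 0)
    return;
  uint64_t Base = 0;
  std::vector<bool> Mask(VF);
  for (uint64_t L = 0; L < VF; ++L)
    Mask[L] = Base + L < N;               // get.active.lane.mask(Base, N)
  for (;;) {
    for (uint64_t L = 0; L < VF; ++L)
      if (Mask[L])
        Dst[Base + L] = Src[Base + L];    // predicated vector operation
    Base += VF;
    for (uint64_t L = 0; L < VF; ++L)
      Mask[L] = Base + L < N;             // mask for the *next* iteration
    if (!Mask[0])                         // extract first bit, branch on it
      break;
  }
}
```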


