Revision Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 02d6950d 01-Jul-2022 David Sherwood <[email protected]>

[LoopVectorize][NFC] Add optional Name parameter to VPInstruction

This patch is a simple piece of refactoring that now permits users
to create VPInstructions and specify the name of the value being
generated. This is useful for creating more readable/meaningful
names in IR.

Differential Revision: https://reviews.llvm.org/D128982


# bc19b7c3 07-Jul-2022 Florian Hahn <[email protected]>

[LV] Remove collectTriviallyDeadInstructions, already handled by VP DCE.

Now that removeDeadRecipes can remove most dead recipes across a whole
VPlan, there is no need to first collect some dead instructions.
Instead removeDeadRecipes can simply clean them up.

Depends D127580.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D128408


# 644a965c 04-Jul-2022 Florian Hahn <[email protected]>

[LV] Vectorize cases with larger number of RT checks, execute only if profitable.

This patch replaces the tight hard cut-off for the number of runtime
checks with a more accurate cost-driven approach.

The new approach allows vectorization with a larger number of runtime
checks in general, but only executes the vector loop (and runtime checks) if
considered profitable at runtime. Profitable here means that the cost-model
indicates that the runtime check cost + vector loop cost < scalar loop cost.

To do that, LV computes the minimum trip count for which runtime check cost
+ vector-loop-cost < scalar loop cost.
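
As an illustration of that inequality, here is a minimal sketch of how such a minimum trip count could be derived. It is not the code from the patch: the `LoopCosts` struct, its field names, and `minProfitableTripCount` are hypothetical, the scalar remainder loop is ignored, and C++17 is assumed for `std::optional`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical cost summary; the names are illustrative, not LLVM's.
struct LoopCosts {
  uint64_t ScalarIterCost;   // cost of one scalar iteration
  uint64_t VectorIterCost;   // cost of one vector iteration (covers VF elements)
  uint64_t RuntimeCheckCost; // one-off cost of all runtime checks
  uint64_t VF;               // elements processed per vector iteration
};

// Smallest trip count TC satisfying
//   RuntimeCheckCost + VectorIterCost * TC / VF < ScalarIterCost * TC.
// Returns std::nullopt if the vector loop is never cheaper per element.
std::optional<uint64_t> minProfitableTripCount(const LoopCosts &C) {
  // Multiply the inequality by VF to stay in integer arithmetic:
  //   RuntimeCheckCost * VF < TC * (ScalarIterCost * VF - VectorIterCost)
  uint64_t ScalarPerVF = C.ScalarIterCost * C.VF;
  if (ScalarPerVF <= C.VectorIterCost)
    return std::nullopt;
  uint64_t Saving = ScalarPerVF - C.VectorIterCost;
  // The inequality is strict, so take the floor of the quotient and add one.
  return (C.RuntimeCheckCost * C.VF) / Saving + 1;
}
```

Under these assumptions, a loop whose trip count is below the returned value would fall back to the scalar loop, since the runtime checks would cost more than vectorization saves.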

Note that there is still a hard cut-off to avoid excessive compile-time/code-size
increases, but it is much larger than the original limit.

The performance impact on standard test-suites like SPEC2006/SPEC2006/MultiSource
is mostly neutral, but the new approach can give substantial gains in cases where
we failed to vectorize before due to the over-aggressive cut-offs.

On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%)
and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.

```
CFP2006/447.dealII/447.dealII -1.9%
CINT2017rate/525.x264_r/525.x264_r -2.2%
ASC_Sequoia/IRSmk/IRSmk -9.2%
Rodinia/srad/srad -36.1%
```

`size` regressions on AArch64 with -O3 are

```
MultiSource/Applications/hbd/hbd 90256.00 106768.00 18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000 240676.00 257268.00 6.9%
MultiSourc...enchmarks/mafft/pairlocalalign 472603.00 489131.00 3.5%
External/S...2017rate/525.x264_r/525.x264_r 613831.00 630343.00 2.7%
External/S...NT2006/464.h264ref/464.h264ref 818920.00 835448.00 2.0%
External/S...te/538.imagick_r/538.imagick_r 1994730.00 2027754.00 1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 1236471.00 1253015.00 1.3%
MultiSource/Applications/oggenc/oggenc 2108147.00 2124675.00 0.8%
External/S.../CFP2006/447.dealII/447.dealII 4742999.00 4759559.00 0.3%
External/S...rate/510.parest_r/510.parest_r 14206377.00 14239433.00 0.2%
```

Reviewed By: lebedev.ri, ebrevnov, dmgreen

Differential Revision: https://reviews.llvm.org/D109368


# 0dddf04c 01-Jul-2022 Florian Hahn <[email protected]>

[LV] Don't optimize exit cond during epilogue vectorization.

At the moment, the same VPlan can be used for code generation of both the
main vector and epilogue vector loop. This can lead to wrong results, if
the plan is optimized based on the VF of the main vector loop and then
re-used for the epilogue loop.

One example where this is problematic is if the scalar loops need to
execute at least one iteration, e.g. due to interleave groups.

To prevent mis-compiles in the short-term, disable optimizing exit
conditions for VPlans when using epilogue vectorization. The proper fix
is to avoid re-using the same plan for both loops, which will require
support for cloning plans first.

Fixes #56319.


# cb69ba4f 24-Jun-2022 Florian Hahn <[email protected]>

[LV] Create RT checks once VF/IC are selected, track scalar cost.

This patch updates LV to generate runtime checks after the VF & IC are selected. It
allows deciding whether to vectorize with runtime checks or not based on
their cost compared to the vector loop.

It also updates VectorizationFactor to include the scalar cost.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D75981


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# 3ed9f603 19-May-2022 Tiehu Zhang <[email protected]>

[LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold

The runtime check threshold should also restrict interleave count.
Otherwise, too many runtime checks will be generated for some cases.

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D122126


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 1b89c832 21-Mar-2022 serge-sans-paille <[email protected]>

Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122181


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1
# 3a3cb929 07-Feb-2022 Kazu Hirata <[email protected]>

[llvm] Use = default (NFC)


Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 5b362e4c 20-Dec-2021 Florian Hahn <[email protected]>

[VPlan] Add Debugloc to VPInstruction.

Upcoming changes require attaching debug locations to VPInstructions,
e.g. adding induction increment recipes in D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115123


# e90630e5 13-Dec-2021 Florian Hahn <[email protected]>

[VPlan] Remove unused createNaryOp (NFC).


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# c42bb30b 07-Sep-2021 David Sherwood <[email protected]>

[LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies

At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
we bail out if the main vector loop uses a scalable VF. This patch adds
support for generating epilogue vector loops using a fixed-width VF when the
main vector loop uses a scalable VF.

I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor
so that we convert the scalable VF into a fixed-width VF and do profitability
checks on that instead. In addition, since the scalable and fixed-width VFs
live in different VPlans that means I had to change the calls to
LVP.hasPlanWithVFs so that we only pass in the fixed-width VF.

New tests added here:

Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll

Differential Revision: https://reviews.llvm.org/D109432


# 3d706c20 04-Oct-2021 David Sherwood <[email protected]>

[NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor

I have removed LoopVectorizationPlanner::setBestPlan, since this
function is quite aggressive because it deletes all other plans
except the one containing the <VF,UF> pair required. The code is
currently written to assume that all <VF,UF> pairs will live in the
same vplan. This is overly restrictive, since scalable VFs live in
different plans to fixed-width VFs. When we add support for
vectorising epilogue loops when the main loop uses scalable vectors
then the vplan for the main loop will be different to the
epilogue.

Instead I have added a new function called

LoopVectorizationPlanner::getBestPlanFor

that returns the best vplan for the <VF,UF> pair requested and leaves
all the vplans untouched. We then pass this best vplan to

LoopVectorizationPlanner::executePlan

which now takes an additional VPlanPtr argument.

Differential revision: https://reviews.llvm.org/D111125


Revision tags: llvmorg-13.0.0-rc2
# a00aafc3 07-Aug-2021 Florian Hahn <[email protected]>

[VPlan] Iterate over phi recipes to detect reductions to fix.

After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100102


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4
# f9967256 28-Jun-2021 Kerry McLaughlin <[email protected]>

[LoopVectorize] Fix strict reductions where VF = 1

Currently we will allow loops with a fixed width VF of 1 to vectorize
if the -enable-strict-reductions flag is set. However, the loop vectorizer
will not use ordered reductions if `VF.isScalar()` and the resulting
vectorized loop will be out of order.

This patch removes `VF.isVector()` when checking if ordered reductions
should be used. Also, instead of converting the FAdds to reductions if the
VF = 1, operands of the FAdds are changed such that the order is preserved.
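
Why ordering matters here: floating-point addition is not associative, so a reduction that splits the sum across several accumulators can produce a different result from the in-order scalar loop. The sketch below is a standalone illustration of that effect, not code from the patch; the function names and the `Sample` data are made up for the example, and it assumes IEEE single-precision arithmetic without fast-math flags.

```cpp
// Illustrative input chosen so that rounding makes the two sums differ.
static const float Sample[4] = {1e8f, 1.0f, -1e8f, 1.0f};

// Strict (in-order) reduction: adds elements in source order.
float orderedSum(const float *X, int N) {
  float Acc = 0.0f;
  for (int I = 0; I < N; ++I)
    Acc = Acc + X[I];
  return Acc;
}

// Reassociated reduction with two interleaved partial sums, similar in
// spirit to an unordered reduction with two accumulators.
float reassociatedSum(const float *X, int N) {
  float Even = 0.0f, Odd = 0.0f;
  for (int I = 0; I < N; ++I) {
    if (I % 2 == 0)
      Even = Even + X[I];
    else
      Odd = Odd + X[I];
  }
  return Even + Odd;
}
```

Calling both functions on `Sample` gives different results, because the in-order sum loses the first `1.0f` when it is added to `1e8f`, while the interleaved version adds the two small values together first.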

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D104533


Revision tags: llvmorg-12.0.1-rc3
# cc5ee857 25-Jun-2021 Florian Hahn <[email protected]>

[LV] Doxygenize VectorizationFactor member comments (NFC).

Minor cleanup for follow-up patch.


Revision tags: llvmorg-12.0.1-rc2
# aa00b1d7 31-May-2021 Florian Hahn <[email protected]>

[LV] Try to sink users recursively for first-order recurrences.

Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.

Fixes PR44286 (and another PR or two).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84951


Revision tags: llvmorg-12.0.1-rc1
# 81fdc73e 18-May-2021 Sander de Smalen <[email protected]>

[LV] Return both fixed and scalable Max VF from computeMaxVF.

This patch introduces a new class, MaxVFCandidates, that holds the
maximum vectorization factors that have been computed for both scalable
and fixed-width vectors.

This patch is intended to be NFC for fixed-width vectors, although
considering a scalable max VF (which is disabled by default) pessimises
tail-loop elimination, since it can no longer determine if any chosen VF
(less than fixed/scalable MaxVFs) is guaranteed to handle all vector
iterations if the trip-count is known. This issue will be addressed in
a future patch.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D98721


# 86729538 19-Apr-2021 Sander de Smalen <[email protected]>

[LV] Let selectVectorizationFactor reason directly on VectorizationFactor.

Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
same cost is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.
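
To illustrate the precision issue, one way to compare per-lane costs without floating point is to cross-multiply. The sketch below contrasts that with a truncating comparison. It is a simplified stand-in rather than the patch's code: the `Candidate` struct and both function names are hypothetical, widths are assumed to be non-zero, and overflow is ignored.

```cpp
#include <cstdint>

// Hypothetical (width, cost) pair; not LLVM's VectorizationFactor.
struct Candidate {
  uint64_t Width; // number of lanes, assumed non-zero
  uint64_t Cost;  // total cost of one vector iteration
};

// Exact comparison of per-lane cost using cross-multiplication:
//   A.Cost / A.Width < B.Cost / B.Width
//   <=>  A.Cost * B.Width < B.Cost * A.Width
bool isMoreProfitable(const Candidate &A, const Candidate &B) {
  return A.Cost * B.Width < B.Cost * A.Width;
}

// Truncating comparison: integer division drops the fractional part of
// each per-lane cost before comparing, so close candidates can tie.
bool isMoreProfitableTruncated(const Candidate &A, const Candidate &B) {
  return A.Cost / A.Width < B.Cost / B.Width;
}
```

For example, a width-4 candidate with cost 9 (2.25 per lane) beats a width-2 candidate with cost 5 (2.5 per lane) under the exact comparison, while the truncating version sees both as 2 per lane and reports no difference.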

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100121


# 49999d43 15-Apr-2021 Florian Hahn <[email protected]>

[VPlan] Replace a few unnecessary includes with forward decls.


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# c773d0f9 29-Mar-2021 Florian Hahn <[email protected]>

Recommit "[LV] Move runtime pointer size check to LVP::plan()."

Re-apply 25fbe803d4db, with a small update to emit the right remark
class.

Original message:
[LV] Move runtime pointer size check to LVP::plan().

This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.

A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime checks to consider their cost more accurately.

Reviewed By: lebedev.ri


# 485c8ce7 29-Mar-2021 Florian Hahn <[email protected]>

Revert "[LV] Move runtime pointer size check to LVP::plan()."

This reverts commit 25fbe803d4dbcf8ff3a3a9ca161f5b9a68353ed0.

This breaks a clang test which filters for the wrong remark type.


# 25fbe803 29-Mar-2021 Florian Hahn <[email protected]>

[LV] Move runtime pointer size check to LVP::plan().

This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.

A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime checks to consider their cost more accurately.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98634


# 92205cb2 19-Mar-2021 Andrei Elovikov <[email protected]>

[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)"

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D98897


# 93a9d2de 18-Mar-2021 Andrei Elovikov <[email protected]>

[VPlan] Add plain text (not DOT's digraph) dumps

I foresee two uses for this:
1) It's easier to use those in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
LIT tests would become too obscure. I can imagine that we'd want to CHECK
against VPlan dumps after multiple transformations instead. That would be
easier with plain text dumps than with DOT format.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96628


# 3614df35 18-Mar-2021 Mehdi Amini <[email protected]>

Revert "[VPlan] Add plain text (not DOT's digraph) dumps"

This reverts commit 6b053c9867a3ede32e51cef3ed972d5ce5b38bc0.
The build is broken:

ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a

show more ...


123