|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
02d6950d |
| 01-Jul-2022 |
David Sherwood <[email protected]> |
[LoopVectorize][NFC] Add optional Name parameter to VPInstruction
This patch is a simple piece of refactoring that now permits users to create VPInstructions and specify the name of the value being generated. This is useful for creating more readable/meaningful names in IR.
Differential Revision: https://reviews.llvm.org/D128982
|
| #
bc19b7c3 |
| 07-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Remove collectTriviallyDeadInstructions, already handled by VP DCE.
Now that removeDeadRecipes can remove most dead recipes across a whole VPlan, there is no need to first collect some dead instructions. Instead removeDeadRecipes can simply clean them up.
Depends D127580.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D128408
|
| #
644a965c |
| 04-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Vectorize cases with larger number of RT checks, execute only if profitable.
This patch replaces the tight hard cut-off for the number of runtime checks with a more accurate cost-driven approach.
The new approach allows vectorization with a larger number of runtime checks in general, but only executes the vector loop (and runtime checks) if considered profitable at runtime. Profitable here means that the cost-model indicates that the runtime check cost + vector loop cost < scalar loop cost.
To do that, LV computes the minimum trip count for which runtime check cost + vector-loop-cost < scalar loop cost.
Note that there is still a hard cut-off to avoid excessive compile-time/code-size increases, but it is much larger than the original limit.
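The break-even condition described above can be worked through with a small C++ sketch. It assumes a simplified model that ignores remainder iterations and any epilogue cost. The function `minProfitableTripCount` and its parameters are hypothetical names chosen for illustration, not LLVM's API. Requires C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of LLVM). Under the simplified model
//   RtCheckCost + (TC / VF) * VectorIterCost < TC * ScalarIterCost
// it returns the smallest trip count TC for which the vector loop plus its
// runtime checks is cheaper than the scalar loop, or std::nullopt if the
// vector loop can never win.
std::optional<uint64_t> minProfitableTripCount(uint64_t RtCheckCost,
                                               uint64_t ScalarIterCost,
                                               uint64_t VectorIterCost,
                                               uint64_t VF) {
  if (VF == 0)
    return std::nullopt;
  // Cost of the VF scalar iterations that one vector iteration replaces.
  uint64_t ScalarPerVectorIter = ScalarIterCost * VF;
  if (ScalarPerVectorIter <= VectorIterCost)
    return std::nullopt; // No per-iteration saving to amortise the checks.
  uint64_t Saving = ScalarPerVectorIter - VectorIterCost;
  // Multiplying the inequality by VF gives TC * Saving > RtCheckCost * VF.
  // The smallest integer TC satisfying a strict inequality is floor(...) + 1.
  return (RtCheckCost * VF) / Saving + 1;
}
```

For example, with a runtime check cost of 20, a scalar iteration cost of 4, a vector iteration cost of 8 and VF = 4, the saving per vector iteration is 8, so the checks pay for themselves from a trip count of 11 onwards.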
The performance impact on standard test-suites like SPEC2006/SPEC2017/MultiSource is mostly neutral, but the new approach can give substantial gains in cases where we failed to vectorize before due to the over-aggressive cut-offs.
On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%) and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.
```
CFP2006/447.dealII/447.dealII         -1.9%
CINT2017rate/525.x264_r/525.x264_r    -2.2%
ASC_Sequoia/IRSmk/IRSmk               -9.2%
Rodinia/srad/srad                    -36.1%
```
`size` regressions on AArch64 with -O3 are:
```
MultiSource/Applications/hbd/hbd                   90256.00    106768.00  18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000       240676.00    257268.00   6.9%
MultiSourc...enchmarks/mafft/pairlocalalign       472603.00    489131.00   3.5%
External/S...2017rate/525.x264_r/525.x264_r       613831.00    630343.00   2.7%
External/S...NT2006/464.h264ref/464.h264ref       818920.00    835448.00   2.0%
External/S...te/538.imagick_r/538.imagick_r      1994730.00   2027754.00   1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4      1236471.00   1253015.00   1.3%
MultiSource/Applications/oggenc/oggenc           2108147.00   2124675.00   0.8%
External/S.../CFP2006/447.dealII/447.dealII      4742999.00   4759559.00   0.3%
External/S...rate/510.parest_r/510.parest_r     14206377.00  14239433.00   0.2%
```
Reviewed By: lebedev.ri, ebrevnov, dmgreen
Differential Revision: https://reviews.llvm.org/D109368
|
| #
0dddf04c |
| 01-Jul-2022 |
Florian Hahn <[email protected]> |
[LV] Don't optimize exit cond during epilogue vectorization.
At the moment, the same VPlan can be used for code generation of both the main vector and epilogue vector loop. This can lead to wrong results if the plan is optimized based on the VF of the main vector loop and then re-used for the epilogue loop.
One example where this is problematic is if the scalar loops need to execute at least one iteration, e.g. due to interleave groups.
To prevent mis-compiles in the short-term, disable optimizing exit conditions for VPlans when using epilogue vectorization. The proper fix is to avoid re-using the same plan for both loops, which will require support for cloning plans first.
Fixes #56319.
|
| #
cb69ba4f |
| 24-Jun-2022 |
Florian Hahn <[email protected]> |
[LV] Create RT checks once VF/IC are selected, track scalar cost.
This patch updates LV to generate runtime checks after the VF & IC are selected. It allows deciding whether to vectorize with runtime checks or not based on their cost compared to the vector loop.
It also updates VectorizationFactor to include the scalar cost.
Reviewed By: lebedev.ri, dmgreen
Differential Revision: https://reviews.llvm.org/D75981
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
3ed9f603 |
| 19-May-2022 |
Tiehu Zhang <[email protected]> |
[LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold
The runtime check threshold should also restrict interleave count. Otherwise, too many runtime checks will be generated for some cases.
Reviewed By: fhahn, dmgreen
Differential Revision: https://reviews.llvm.org/D122126
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
1b89c832 |
| 21-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122181
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1 |
|
| #
3a3cb929 |
| 07-Feb-2022 |
Kazu Hirata <[email protected]> |
[llvm] Use = default (NFC)
|
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
5b362e4c |
| 20-Dec-2021 |
Florian Hahn <[email protected]> |
[VPlan] Add Debugloc to VPInstruction.
Upcoming changes require attaching debug locations to VPInstructions, e.g. adding induction increment recipes in D113223.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D115123
|
| #
e90630e5 |
| 13-Dec-2021 |
Florian Hahn <[email protected]> |
[VPlan] Remove unused createNaryOp (NFC).
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
c42bb30b |
| 07-Sep-2021 |
David Sherwood <[email protected]> |
[LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies
At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor we bail out if the main vector loop uses a scalable VF. This patch adds support for generating epilogue vector loops using a fixed-width VF when the main vector loop uses a scalable VF.
I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor so that we convert the scalable VF into a fixed-width VF and do profitability checks on that instead. In addition, since the scalable and fixed-width VFs live in different VPlans that means I had to change the calls to LVP.hasPlanWithVFs so that we only pass in the fixed-width VF.
New tests added here:
Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll
Differential Revision: https://reviews.llvm.org/D109432
|
| #
3d706c20 |
| 04-Oct-2021 |
David Sherwood <[email protected]> |
[NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor
I have removed LoopVectorizationPlanner::setBestPlan, since this function is quite aggressive: it deletes all plans except the one containing the required <VF,UF> pair. The code is currently written to assume that all <VF,UF> pairs live in the same vplan. This is overly restrictive, since scalable VFs live in different plans to fixed-width VFs. When we add support for vectorising epilogue loops when the main loop uses scalable vectors, the vplan for the main loop will be different to that of the epilogue.
Instead I have added a new function called LoopVectorizationPlanner::getBestPlanFor that returns the best vplan for the <VF,UF> pair requested and leaves all the vplans untouched. We then pass this best vplan to LoopVectorizationPlanner::executePlan, which now takes an additional VPlanPtr argument.
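The difference between a destructive "set" and a non-destructive lookup can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: `Plan` and `Planner` are hypothetical stand-ins, the lookup is keyed on a plain integer VF rather than a <VF,UF> pair, and a missing plan is reported as `nullptr`, which may not match how the real function behaves.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPlan: it only records which VFs it covers.
struct Plan {
  std::vector<unsigned> VFs;
  bool hasVF(unsigned VF) const {
    return std::find(VFs.begin(), VFs.end(), VF) != VFs.end();
  }
};

// Hypothetical stand-in for the planner, which owns every candidate plan.
class Planner {
  std::vector<std::unique_ptr<Plan>> Plans;

public:
  void addPlan(std::vector<unsigned> VFs) {
    Plans.push_back(std::make_unique<Plan>(Plan{std::move(VFs)}));
  }

  // Non-destructive lookup: returns a non-owning pointer to the first plan
  // covering VF, or nullptr if there is none. No plan is deleted, so a later
  // query for a different VF (e.g. for an epilogue loop) still succeeds.
  Plan *getBestPlanFor(unsigned VF) const {
    for (const auto &P : Plans)
      if (P->hasVF(VF))
        return P.get();
    return nullptr;
  }

  std::size_t numPlans() const { return Plans.size(); }
};
```

Because the lookup leaves the plan list intact, a caller can ask for the main-loop plan and then for a separate epilogue plan from the same planner object.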
Differential revision: https://reviews.llvm.org/D111125
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
a00aafc3 |
| 07-Aug-2021 |
Florian Hahn <[email protected]> |
[VPlan] Iterate over phi recipes to detect reductions to fix.
After refactoring the phi recipes, we can now iterate over all header phis in a VPlan to detect reductions when it comes to fixing them up when tail folding.
This reduces the coupling with the cost model & legal by using the information directly available in VPlan. It also removes a call to getOrAddVPValue, which references the original IR value which may become outdated after VPlan transformations.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100102
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4 |
|
| #
f9967256 |
| 28-Jun-2021 |
Kerry McLaughlin <[email protected]> |
[LoopVectorize] Fix strict reductions where VF = 1
Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order.
This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved.
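Why the order of the FAdds matters can be seen from a small C++ sketch. It does not reproduce the vectorizer's code; it only assumes that an interleaved loop with VF = 1 behaves like a scalar loop with several independent accumulators, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Strict (in-order) reduction: every addition consumes the previous result,
// exactly as the original scalar loop would.
float sumInOrder(const std::vector<float> &Vals) {
  float Sum = 0.0f;
  for (float V : Vals)
    Sum += V;
  return Sum;
}

// Out-of-order reduction, modelling interleaving by 2 with VF = 1: two
// independent partial sums that are only combined after the loop.
float sumTwoAccumulators(const std::vector<float> &Vals) {
  float Part0 = 0.0f, Part1 = 0.0f;
  std::size_t I = 0;
  for (; I + 1 < Vals.size(); I += 2) {
    Part0 += Vals[I];
    Part1 += Vals[I + 1];
  }
  if (I < Vals.size())
    Part0 += Vals[I]; // Leftover element when the size is odd.
  return Part0 + Part1;
}
```

For the input `{1e30f, 1.0f, -1e30f, 1.0f}` the in-order sum is 1 (the first `1.0f` is absorbed by the large value), while the two-accumulator version returns 2, because floating-point addition is not associative. Chaining the additions keeps the first behaviour even when the loop is interleaved.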
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D104533
|
|
Revision tags: llvmorg-12.0.1-rc3 |
|
| #
cc5ee857 |
| 25-Jun-2021 |
Florian Hahn <[email protected]> |
[LV] Doxygenize VectorizationFactor member comments (NFC).
Minor cleanup for follow-up patch.
|
|
Revision tags: llvmorg-12.0.1-rc2 |
|
| #
aa00b1d7 |
| 31-May-2021 |
Florian Hahn <[email protected]> |
[LV] Try to sink users recursively for first-order recurrences.
Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction.
Fixes PR44286 (and another PR or two).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84951
|
|
Revision tags: llvmorg-12.0.1-rc1 |
|
| #
81fdc73e |
| 18-May-2021 |
Sander de Smalen <[email protected]> |
[LV] Return both fixed and scalable Max VF from computeMaxVF.
This patch introduces a new class, MaxVFCandidates, that holds the maximum vectorization factors that have been computed for both scalable and fixed-width vectors.
This patch is intended to be NFC for fixed-width vectors, although considering a scalable max VF (which is disabled by default) pessimises tail-loop elimination, since it can no longer determine if any chosen VF (less than fixed/scalable MaxVFs) is guaranteed to handle all vector iterations if the trip-count is known. This issue will be addressed in a future patch.
Reviewed By: fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D98721
|
| #
86729538 |
| 19-Apr-2021 |
Sander de Smalen <[email protected]> |
[LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float.
This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the same cost is more profitable than a fixed VF of the same cost).
The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision.
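The precision issue can be illustrated with a short C++ sketch. The commit message does not say how the comparison is implemented, so the cross-multiplication below is an assumed technique, and `VFCandidate` is a hypothetical stand-in for VectorizationFactor holding a total cost and a width.

```cpp
#include <cstdint>

// Hypothetical stand-in for VectorizationFactor: total loop cost at a width.
struct VFCandidate {
  uint64_t Cost;
  uint64_t Width;
};

// Per-lane cost with integer division, as a truncating comparison would see
// it. Fractional parts are lost, so nearby candidates can look identical.
uint64_t truncatedPerLaneCost(const VFCandidate &C) {
  return C.Cost / C.Width;
}

// Compares per-lane cost (Cost / Width) exactly without any division:
//   A.Cost / A.Width < B.Cost / B.Width  <=>  A.Cost * B.Width < B.Cost * A.Width
// This assumes non-zero widths and that the products do not overflow.
bool isMoreProfitable(const VFCandidate &A, const VFCandidate &B) {
  return A.Cost * B.Width < B.Cost * A.Width;
}
```

With `A = {10, 4}` and `B = {11, 4}` the truncated per-lane costs are both 2, so a truncating comparison cannot separate them, whereas the cross-multiplied form correctly reports A as cheaper (2.5 versus 2.75 per lane).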
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D100121
|
| #
49999d43 |
| 15-Apr-2021 |
Florian Hahn <[email protected]> |
[VPlan] Replace a few unnecessary includes with forward decls.
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
c773d0f9 |
| 29-Mar-2021 |
Florian Hahn <[email protected]> |
Recommit "[LV] Move runtime pointer size check to LVP::plan()."
Re-apply 25fbe803d4db, with a small update to emit the right remark class.
Original message: [LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner.
A subsequent patch will adjust the logic used to decide whether to vectorize with runtime checks to consider their cost more accurately.
Reviewed By: lebedev.ri
|
| #
485c8ce7 |
| 29-Mar-2021 |
Florian Hahn <[email protected]> |
Revert "[LV] Move runtime pointer size check to LVP::plan()."
This reverts commit 25fbe803d4dbcf8ff3a3a9ca161f5b9a68353ed0.
This breaks a clang test which filters for the wrong remark type.
|
| #
25fbe803 |
| 29-Mar-2021 |
Florian Hahn <[email protected]> |
[LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner.
A subsequent patch will adjust the logic used to decide whether to vectorize with runtime checks to consider their cost more accurately.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D98634
|
| #
92205cb2 |
| 19-Mar-2021 |
Andrei Elovikov <[email protected]> |
[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)"
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98897
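As a rough illustration of the guard named in the title, the following C++ sketch compiles a print routine only when assertions are enabled or `LLVM_ENABLE_DUMP` is defined. The `Recipe` type and `describe` helper are hypothetical and exist only to show the pattern, including how a caller has to be guarded as well.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type with a debug-only print routine.
struct Recipe {
  std::string Name;

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  // Only present in builds with assertions or with dumping explicitly enabled.
  void print(std::ostream &OS) const { OS << "RECIPE " << Name << "\n"; }
#endif
};

// Callers must use the same guard, otherwise release builds fail to compile
// or link because print() does not exist there.
std::string describe(const Recipe &R) {
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  std::ostringstream OS;
  R.print(OS);
  return OS.str();
#else
  (void)R;
  return {};
#endif
}
```

In a build without `NDEBUG`, `describe(Recipe{"x"})` yields the printed text; in a release build without `LLVM_ENABLE_DUMP` it yields an empty string and the print code is not compiled at all.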
|
| #
93a9d2de |
| 18-Mar-2021 |
Andrei Elovikov <[email protected]> |
[VPlan] Add plain text (not DOT's digraph) dumps
I foresee two uses for this: 1) It's easier to use those in a debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT tests would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96628
|
| #
3614df35 |
| 18-Mar-2021 |
Mehdi Amini <[email protected]> |
Revert "[VPlan] Add plain text (not DOT's digraph) dumps"
This reverts commit 6b053c9867a3ede32e51cef3ed972d5ce5b38bc0. The build is broken:
ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
|