|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
8d85e945 |
| 16-Nov-2021 |
Philip Reames <[email protected]> |
[SCEV] Canonicalize X - urem X, Y patterns
There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could
[SCEV] Canonicalize X - urem X, Y patterns
There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version.
The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert the the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency.
The ARM loop hardware interaction in the test diff is opague to me, but the comments in the review from others knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions.
Differential Revision: https://reviews.llvm.org/D114018
show more ...
|
| #
37ead201 |
| 12-Nov-2021 |
Philip Reames <[email protected]> |
[runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the
[runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.
Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.)
Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.
* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
show more ...
|
| #
de2fed61 |
| 12-Nov-2021 |
Philip Reames <[email protected]> |
[unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is
[unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.
With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.
However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
8fdd7c2f |
| 05-Jun-2021 |
Nikita Popov <[email protected]> |
[LoopUnroll] Clamp unroll count to MaxTripCount
Unrolling with more iterations than MaxTripCount is pointless, as those iterations can never be executed. As such, we clamp ULO.Count to MaxTripCount
[LoopUnroll] Clamp unroll count to MaxTripCount
Unrolling with more iterations than MaxTripCount is pointless, as those iterations can never be executed. As such, we clamp ULO.Count to MaxTripCount if it is known. This means we no longer need to consider iterations after MaxTripCount for exit folding, and the CompletelyUnroll flag becomes independent of ULO.TripCount.
Differential Revision: https://reviews.llvm.org/D103748
show more ...
|
| #
92ce29ee |
| 05-Jun-2021 |
Nikita Popov <[email protected]> |
[LoopUnroll] Regenerate test checks (NFC)
|
|
Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
| #
a95796a3 |
| 26-Jun-2020 |
Arthur Eubanks <[email protected]> |
[NewPM][LoopUnroll] Rename unroll* to loop-unroll*
The legacy pass is called "loop-unroll", but in the new PM it's called "unroll". Also applied to unroll-and-jam and unroll-full.
Fixes various che
[NewPM][LoopUnroll] Rename unroll* to loop-unroll*
The legacy pass is called "loop-unroll", but in the new PM it's called "unroll". Also applied to unroll-and-jam and unroll-full.
Fixes various check-llvm tests when NPM is turned on.
Reviewed By: Whitney, dmgreen
Differential Revision: https://reviews.llvm.org/D82590
show more ...
|
|
Revision tags: llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
| #
cee313d2 |
| 17-Apr-2019 |
Eric Christopher <[email protected]> |
Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.
Will be re-reverting again.
llvm-svn: 358552
|
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2 |
|
| #
ecd90131 |
| 02-Aug-2017 |
Teresa Johnson <[email protected]> |
[PM] Split LoopUnrollPass and make partial unroller a function pass
Summary: This is largely NFC*, in preparation for utilizing ProfileSummaryInfo and BranchFrequencyInfo analyses. In this patch I a
[PM] Split LoopUnrollPass and make partial unroller a function pass
Summary: This is largely NFC*, in preparation for utilizing ProfileSummaryInfo and BranchFrequencyInfo analyses. In this patch I am only doing the splitting for the New PM, but I can do the same for the legacy PM as a follow-on if this looks good.
*Not NFC since for partial unrolling we lose the updates done to the loop traversal (adding new sibling and child loops) - according to Chandler this is not very useful for partial unrolling, but it also means that the debugging flag -unroll-revisit-child-loops no longer works for partial unrolling.
Reviewers: chandlerc
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36157
llvm-svn: 309886
show more ...
|
|
Revision tags: llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4 |
|
| #
21bef2cb |
| 02-Mar-2017 |
Evgeny Stupachenko <[email protected]> |
The patch turns on epilogue unroll for loops with constant recurency start. Summary:
Set unroll remainder to epilog if a loop contains a phi with constant parameter:
loop: pn = phi [Const, PreH
The patch turns on epilogue unroll for loops with constant recurency start. Summary:
Set unroll remainder to epilog if a loop contains a phi with constant parameter:
loop: pn = phi [Const, PreHeader], [pn.next, Latch] ...
Reviewer: hfinkel
Differential Revision: http://reviews.llvm.org/D27004
From: Evgeny Stupachenko <[email protected]> llvm-svn: 296770
show more ...
|
|
Revision tags: llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2 |
|
| #
ce40fa13 |
| 25-Jan-2017 |
Chandler Carruth <[email protected]> |
[PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls loops.
We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details thro
[PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls loops.
We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure.
I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane.
Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure.
Differential Revision: https://reviews.llvm.org/D28848
llvm-svn: 293011
show more ...
|
|
Revision tags: llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2 |
|
| #
b2738e41 |
| 02-Aug-2016 |
Michael Zolotukhin <[email protected]> |
[LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its original value.
As agreed in post-commit review of r265388, I'm switching the flag to its original value until the 90% run
[LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its original value.
As agreed in post-commit review of r265388, I'm switching the flag to its original value until the 90% runtime performance regression on SingleSource/Benchmarks/Stanford/Bubblesort is addressed.
llvm-svn: 277524
show more ...
|
|
Revision tags: llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
| #
188de5ae |
| 05-Apr-2016 |
David L Kreitzer <[email protected]> |
Adds the ability to use an epilog remainder loop during loop unrolling and makes this the default behavior.
Patch by Evgeny Stupachenko ([email protected]).
Differential Revision: http://reviews.l
Adds the ability to use an epilog remainder loop during loop unrolling and makes this the default behavior.
Patch by Evgeny Stupachenko ([email protected]).
Differential Revision: http://reviews.llvm.org/D18158
llvm-svn: 265388
show more ...
|
|
Revision tags: llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1, llvmorg-3.6.1, llvmorg-3.6.1-rc1 |
|
| #
71190fec |
| 12-Apr-2015 |
Sanjoy Das <[email protected]> |
[LoopUnrollRuntime] Clean up a predicate.
Clean up a predicate I added in r229731, fix the relevant comment and add a test case. The earlier version is confusing to read and was also buggy (probabl
[LoopUnrollRuntime] Clean up a predicate.
Clean up a predicate I added in r229731, fix the relevant comment and add a test case. The earlier version is confusing to read and was also buggy (probably not a coincidence) till Alexey fixed it in r233881.
llvm-svn: 234701
show more ...
|