|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
81c648a3 |
| 18-May-2022 |
Nikita Popov <[email protected]> |
[LoopUnroll] Freeze tripcount rather than condition
This is a followup to D125754. We introduce two branches, one before the unrolled loop and one before the epilogue (and similar for the prologue case). The previous patch only froze the condition on the first branch.
Rather than independently freezing the second condition, this patch instead freezes TripCount and bases BECount on it. These are the two quantities involved in the conditions, and this ensures that both work on a consistent, non-poisonous trip count.
Differential Revision: https://reviews.llvm.org/D125896
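The shape of the transform can be sketched in Python. This is only an illustration, not the LLVM implementation: Python has no poison or freeze, the function name `sum_runtime_unrolled` and its parameters are hypothetical, and the guard conditions are simplified. It shows the two guards described above (one before the unrolled loop, one before the epilogue), both derived from a single trip count value that is computed once.

```python
def sum_runtime_unrolled(values, count=4):
    """Sum `values` with the loop unrolled by `count`, plus an epilogue loop."""
    # Hypothetical stand-in for the frozen trip count: computed once, and every
    # guard below is derived from this single value.
    trip_count = len(values)
    extra_iters = trip_count % count          # iterations left for the epilogue
    main_iters = trip_count - extra_iters     # iterations run by the unrolled loop

    total = 0
    i = 0
    # Guard 1: skip the unrolled loop when there is no full group of `count`.
    if main_iters != 0:
        while i < main_iters:
            for k in range(count):            # stands in for `count` cloned bodies
                total += values[i + k]
            i += count
    # Guard 2: enter the epilogue only if iterations remain.
    if extra_iters != 0:
        while i < trip_count:
            total += values[i]
            i += 1
    return total
```

Because both guards read values derived from the same `trip_count`, they cannot disagree about how many iterations exist, which is the consistency property the commit message is after.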
|
| #
323514de |
| 17-May-2022 |
Nikita Popov <[email protected]> |
[LoopUnroll] Avoid branch on poison for runtime unroll with multiple exits
When performing runtime unrolling with multiple exits, one of the earlier (non-latch) exits may exit the loop on the first iteration, such that we never branch on the latch exit condition. As such, we need to freeze the condition of the new branch that is introduced before the loop, as it now executes unconditionally.
Differential Revision: https://reviews.llvm.org/D125754
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
72031407 |
| 03-Jan-2022 |
Philip Reames <[email protected]> |
Revert "[unroll] Prune all but first copy of invariant exit"
This reverts commit 9bd22595bad36cd19f5e7ae18ccd9f41cba29dc5.
Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stabilize.
e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933
|
| #
9bd22595 |
| 03-Jan-2022 |
Philip Reames <[email protected]> |
[unroll] Prune all but first copy of invariant exit
If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead.
The change itself is pretty straightforward, but let me add two points of context:
* I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying.
* I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure.
Differential Revision: https://reviews.llvm.org/D116496
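The pruning argument can be checked on a toy loop. The sketch below is a hand-written Python model, not unroller output: the function names and the `stop_early` flag are hypothetical. The loop has one exit controlled by a loop-invariant condition that dominates the latch, and the version unrolled by 2 keeps that test only in the first copy.

```python
def run_original(n, stop_early):
    """Reference loop: the exit test on `stop_early` is loop invariant."""
    trace = []
    i = 0
    while True:
        if stop_early:        # invariant exit, dominates the latch
            break
        trace.append(i)
        i += 1
        if i >= n:            # latch exit
            break
    return trace


def run_unrolled_pruned(n, stop_early):
    """Unrolled by 2; the invariant exit is kept only in the first copy."""
    trace = []
    i = 0
    while True:
        # Copy 1: keeps the invariant exit test.
        if stop_early:
            break
        trace.append(i)
        i += 1
        if i >= n:
            break
        # Copy 2: invariant exit test pruned. Reaching this point means the
        # test in copy 1 was false, and the condition cannot have changed.
        trace.append(i)
        i += 1
        if i >= n:
            break
    return trace
```

Both functions produce the same trace for every input tried, which matches the claim that all copies of the invariant exit after the first are dead.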
|
| #
8906a0fe |
| 29-Nov-2021 |
Philip Reames <[email protected]> |
[SCEVExpander] Drop poison generating flags when reusing instructions
The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.
This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid.
In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled.
On the test changes, most are pretty straightforward. We lose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul; previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.
The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.
Differential Revision: https://reviews.llvm.org/D112734
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
da327e72 |
| 15-Nov-2021 |
Philip Reames <[email protected]> |
Fix a misleading FIXME in an unroll test
|
| #
37ead201 |
| 12-Nov-2021 |
Philip Reames <[email protected]> |
[runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.
Why does this matter? A couple of reasons:
* SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B, and any flags invalidated by that rewrite are dropped. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.)
Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.
* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
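The point about flags lost when a subtract is rewritten as an add can be seen with plain modular arithmetic. The following is a minimal sketch assuming 8-bit values; the helper names are hypothetical and none of this is SCEV code. It picks one pair of operands where the subtraction does not wrap in the unsigned sense, but the equivalent A + -1 * B form does.

```python
BITS = 8
MOD = 1 << BITS


def sub_wraps_unsigned(a, b):
    """True if the i8 subtraction a - b wraps in the unsigned sense."""
    return a < b


def add_wraps_unsigned(a, b):
    """True if the i8 addition a + b wraps in the unsigned sense."""
    return a + b >= MOD


def neg(b):
    """-1 * b in i8 two's complement arithmetic."""
    return (-b) % MOD


a, b = 5, 3
direct = (a - b) % MOD              # A - B
rewritten = (a + neg(b)) % MOD      # A + -1 * B, same bits as `direct`
```

Here `direct` and `rewritten` are the same value, yet only the subtract form is free of unsigned wrap, so a no-unsigned-wrap fact known for the subtraction cannot be carried over to the add form.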
|
| #
de2fed61 |
| 12-Nov-2021 |
Philip Reames <[email protected]> |
[unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.
With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.
However, the real motivation for this change isn't performance. It's readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
|
| #
e01c91f2 |
| 12-Nov-2021 |
Philip Reames <[email protected]> |
[tests] Add coverage for cases we can prune exits when runtime unrolling
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
fa82a3d0 |
| 02-Sep-2021 |
Philip Reames <[email protected]> |
[runtimeunroll] Support epilogue unrolling with a parent loop
This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop. When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.
Differential Revision: https://reviews.llvm.org/D108476
|
| #
b604fcb7 |
| 31-Aug-2021 |
Philip Reames <[email protected]> |
[runtime] Move prolog/epilog block to a post-simplify strategy
The runtime unroller will try to produce a non-loop if the unroll count is 2 and thus the prolog/epilog loop would only run at most one iteration. The old implementation did this by avoiding loop construction entirely. This patch instead constructs the trivial loop and then explicitly breaks the backedge and simplifies. This does result in some additional code churn when triggered, but a) results in better quality code and b) removes a codepath which didn't work properly for multiple exit epilogs.
One oddity that I want to draw to reviewer attention is that this somehow changes revisit order. The new order looks equivalent to me, but I don't understand how creating and erasing an extra loop here creates this effect.
Differential Revision: https://reviews.llvm.org/D108521
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
17b9cb18 |
| 19-Aug-2021 |
Philip Reames <[email protected]> |
[runtimeunroll] Support multiple exits to latch exit w/prolog loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.
This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch. I rolled in the analogous fix here as it seemed obvious, and not worth re-review.
As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic. (Alternatively, maybe we should be peeling these cases more aggressively?)
Differential Revision: https://reviews.llvm.org/D108262
|
| #
94d09142 |
| 18-Aug-2021 |
Philip Reames <[email protected]> |
[runtimeunroll] Support multiple exits to latch exit w/epilogue loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.
I decided to restrict this to the epilogue case. Given the changes ended up being pretty generic, we may be able to unblock the prolog case too, but I want to do that in a separate change to reduce the amount of code we all have to understand at one time.
Differential Revision: https://reviews.llvm.org/D107381
|
| #
54934923 |
| 18-Aug-2021 |
Philip Reames <[email protected]> |
[test] Remove a redundant test line
This was made redundant when I removed -instcombine from output in 70ffd65c, but I didn't notice. nikic pointed that out in review of D107381
|
| #
911991d2 |
| 03-Aug-2021 |
Philip Reames <[email protected]> |
[tests] Autogen an unroll test for ease of update
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
95848ea1 |
| 26-Aug-2020 |
Roman Lebedev <[email protected]> |
[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks
As FIXME said, they really should be checking for a single user, not use, so let's do that. It is not *that* unusual to have the same value as incoming value in a PHI node, not unlike how a PHI may have the same incoming basic block more than once.
There isn't a nice way to do that: Value::users() isn't uniquified, and Value only tracks its uses, not Users, so the check is potentially costly since it may involve traversing the entire use list of a value.
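The use/user distinction is easy to see in a toy model. The sketch below is not LLVM's `Value` class: the `Value` and `Phi` classes and their method names are hypothetical stand-ins, written only to show why a value that is incoming to one PHI twice has two uses but a single user, and why the one-user check has to look at the whole use list.

```python
class Value:
    """Toy IR value that records each use as a (user, operand_index) pair."""

    def __init__(self, name):
        self.name = name
        self.uses = []          # one entry per use; a user may appear repeatedly

    def has_one_use(self):
        return len(self.uses) == 1

    def has_one_user(self):
        # Walks the whole use list: this is the potentially costly part.
        return len({id(user) for user, _ in self.uses}) == 1


class Phi:
    """Toy PHI node: registers a use for every incoming value."""

    def __init__(self, incoming):
        self.incoming = list(incoming)
        for index, value in enumerate(self.incoming):
            value.uses.append((self, index))


x = Value("x")
phi = Phi([x, x])               # the same value is incoming from two edges
```

With this setup a one-use check on `x` fails even though `phi` is its only user, which is the situation the commit message describes.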
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
| #
c3b8bd1e |
| 04-Jul-2020 |
Roman Lebedev <[email protected]> |
[InstCombine] Always try to invert non-canonical predicate of an icmp
Summary: The actual transform i was going after was: https://rise4fun.com/Alive/Tp9H
```
Name: zz
Pre: isPowerOf2(C0) && isPowerOf2(C1) && C1 == C0
%t0 = and i8 %x, C0
%r = icmp eq i8 %t0, C1
  =>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1

Name: zz
Pre: isPowerOf2(C0)
%t0 = and i8 %x, C0
%r = icmp ne i8 %t0, 0
  =>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1
```
but as it can be seen from the current tests, we already canonicalize most of it, and we are only missing handling multi-use non-canonical icmp predicates.
If we have both `!=0` and `==0`, even though we can CSE them, we end up being stuck with them. We should canonicalize to the `==0`.
I believe this is one of the cleanup steps i'll need after `-scalarizer` if i end up proceeding with my WIP alloca promotion helper pass.
Reviewers: spatel, jdoerfert, nikic
Reviewed By: nikic
Subscribers: zzheng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83139
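The two Alive transforms in the summary can be cross-checked by brute force over all i8 inputs. This is a minimal sketch under that assumption, with hypothetical helper names; it only compares the boolean results of both sides and says nothing about how InstCombine implements the fold.

```python
def is_power_of_2(c):
    return c != 0 and (c & (c - 1)) == 0


def check_folds():
    """Exhaustively compare both sides of each transform for i8 values."""
    for c0 in range(256):
        if not is_power_of_2(c0):
            continue
        c1 = c0                                  # Pre: C1 == C0
        for x in range(256):
            t0 = x & c0
            canonical = not (t0 == 0)            # xor (icmp eq %t0, 0), -1
            if (t0 == c1) != canonical:          # first transform
                return False
            if (t0 != 0) != canonical:           # second transform
                return False
    return True
```

Both source patterns agree with the inverted `== 0` form on every input, which is why canonicalizing toward `== 0` loses nothing.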
|
| #
17a15c32 |
| 03-Jul-2020 |
Roman Lebedev <[email protected]> |
[NFCI][LoopUnroll] s/%tmp/%i/ in one test to silence update script warning
|
|
Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
1badf7c3 |
| 06-Mar-2020 |
Roman Lebedev <[email protected]> |
[InstCombine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant
Summary: This is potentially more friendly for further optimizations, analyses, e.g.: https://godbolt.org/z/G24anE
This resolves a phase-ordering bug that was introduced in D75145 for https://godbolt.org/z/2gBwF2 https://godbolt.org/z/XvgSua
Reviewers: spatel, nikic, dmgreen, xbolva00
Reviewed By: nikic, xbolva00
Subscribers: hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75757
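The identity in the title, `(X - (X & Y)) --> (X & ~Y)`, can be confirmed by brute force. The sketch below assumes 8-bit wrapping arithmetic and uses hypothetical function names; it checks the arithmetic identity only, not the one-use logic the patch changes.

```python
MASK = 0xFF


def lhs(x, y):
    """X - (X & Y), wrapped to 8 bits."""
    return (x - (x & y)) & MASK


def rhs(x, y):
    """X & ~Y, with the complement taken in 8 bits."""
    return x & (~y & MASK)


def fold_holds_for_all_i8():
    return all(lhs(x, y) == rhs(x, y) for x in range(256) for y in range(256))
```

The subtraction never borrows, because `X & Y` only contains bits that are already set in `X`, so removing them is the same as masking them off.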
|
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2 |
|
| #
0f22e783 |
| 02-Dec-2019 |
Roman Lebedev <[email protected]> |
[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100)
rL341831 moved one-use check higher up, restricting a few folds that produced a single instruction from two instructions to the case where the inner instruction would go away.
Original commit message:
> InstCombine: move hasOneUse check to the top of foldICmpAddConstant
>
> There were two combines not covered by the check before now,
> neither of which actually differed from normal in the benefit analysis.
>
> The most recent seems to be because it was just added at the top of the
> function (naturally). The older is from way back in 2008 (r46687)
> when we just didn't put those checks in so routinely, and has been
> diligently maintained since.
From the commit message alone, there doesn't seem to be a deeper motivation, or a deeper problem that it was trying to solve, other than 'fixing the wrong one-use check'.
As i have briefly discussed in IRC with Tim, the original motivation can no longer be recovered, too much time has passed.
However i believe that the original fold was doing the right thing, we should be performing such a transformation even if the inner `add` will not go away - that will still unchain the comparison from `add`, it will no longer need to wait for `add` to compute.
Doing so doesn't seem to break any particular idioms, at least as far as i can see.
References https://bugs.llvm.org/show_bug.cgi?id=44100
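The "unchaining" idea can be illustrated for the simplest case, an equality compare. The sketch below is an assumption-laden model: it covers only `icmp eq` with wrapping arithmetic at a small hypothetical bit width, and the function names are invented. The other predicates handled by foldICmpAddConstant have extra conditions that are not modelled here.

```python
WIDTH = 5                     # small width keeps the exhaustive check cheap
MODULUS = 1 << WIDTH


def cmp_through_add(x, c1, c2):
    """icmp eq (add x, c1), c2 : the compare waits on the add."""
    return (x + c1) % MODULUS == c2


def cmp_folded(x, c1, c2):
    """icmp eq x, (c2 - c1) : constants folded, no dependency on the add."""
    return x == (c2 - c1) % MODULUS


def equality_fold_holds():
    values = range(MODULUS)
    return all(
        cmp_through_add(x, c1, c2) == cmp_folded(x, c1, c2)
        for x in values
        for c1 in values
        for c2 in values
    )
```

Even when the `add` has other users and must stay, the folded compare reads `x` directly, so it no longer sits behind the `add` in the dependency chain.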
|
|
Revision tags: llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3 |
|
| #
5a89ba73 |
| 24-Jun-2019 |
Matt Arsenault <[email protected]> |
InstCombine: Preserve nuw when reassociating nuw ops [1/3]
Alive says this is OK.
llvm-svn: 364233
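The commit message does not spell out the pattern, so the following is a sketch under an assumption: that the reassociation in question has the shape `(x +nuw c1) +nuw c2 -> x +nuw (c1 + c2)`. Under that assumption, a brute-force check at a small hypothetical bit width shows why keeping `nuw` is sound. The function name and width are invented for illustration.

```python
NBITS = 4
LIMIT = 1 << NBITS


def nuw_survives_reassociation():
    """Check (x +nuw c1) +nuw c2  ->  x +nuw (c1 + c2) over all 4-bit inputs."""
    for x in range(LIMIT):
        for c1 in range(LIMIT):
            for c2 in range(LIMIT):
                inner_ok = x + c1 < LIMIT                 # first add does not wrap
                outer_ok = inner_ok and (x + c1) + c2 < LIMIT
                if not outer_ok:
                    continue                              # original is poison; skip
                if c1 + c2 >= LIMIT:                      # folded constant would wrap
                    return False
                if x + (c1 + c2) >= LIMIT:                # reassociated add would wrap
                    return False
    return True
```

Whenever neither original add wraps, the unbounded sum `x + c1 + c2` is below the limit, so neither the folded constant nor the reassociated add can wrap either.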
|
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
| #
cee313d2 |
| 17-Apr-2019 |
Eric Christopher <[email protected]> |
Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.
Will be re-reverting again.
llvm-svn: 358552
|
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
| #
bae11e79 |
| 28-Dec-2018 |
Anna Thomas <[email protected]> |
[UnrollRuntime] NFC: Updated exiting tests and added more tests
Added more tests for multiple exiting blocks to the LatchExit. Today these cases are not supported. Patch to follow soon.
llvm-svn: 350135
|
| #
98743fa7 |
| 28-Dec-2018 |
Anna Thomas <[email protected]> |
[UnrollRuntime] NFC: Add comment and verify LCSSA
Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog.
llvm-svn: 350131
|
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1 |
|
| #
512dde77 |
| 15-Sep-2017 |
Anna Thomas <[email protected]> |
[RuntimeUnrolling] Populate the VMap entry correctly when default generated through lookup
During runtime unrolling on loops with multiple exits, we update the exit blocks with the correct phi values from both the original and the remainder loop. In this process, we look up the VMap for the mapped incoming phi values, but did not update the VMap if a default entry was generated in the VMap during the lookup. This default value is generated when constants or values outside the current loop are looked up. This patch fixes the assertion failure when null entries are present in the VMap because of this lookup. Added a testcase that showcases the problem.
llvm-svn: 313358
|