|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
716ca2e3 |
| 20-Jul-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Pre-sink IR input for some tests
Edit the IR input for some codegen tests to simulate what the IR code sinking pass would do to it. This makes the tests immune to the presence or absence of
[AMDGPU] Pre-sink IR input for some tests
Edit the IR input for some codegen tests to simulate what the IR code sinking pass would do to it. This makes the tests immune to the presence or absence of the code sinking pass in the codegen pass pipeline, which does not belong there.
Differential Revision: https://reviews.llvm.org/D130169
show more ...
|
| #
c945d88d |
| 14-Jul-2022 |
Brendon Cahoon <[email protected]> |
Revert "[StructurizeCFG] Improve basic block ordering"
This reverts commit f1b05a0a2bbbea160002be709f8a1c59de366761.
Need to revert to due to issues identified with testing. The transformation is i
Revert "[StructurizeCFG] Improve basic block ordering"
This reverts commit f1b05a0a2bbbea160002be709f8a1c59de366761.
Need to revert to due to issues identified with testing. The transformation is incorrect for blocks that contain convergent instructions.
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
f1b05a0a |
| 06-Apr-2022 |
Brendon Cahoon <[email protected]> |
[StructurizeCFG] Improve basic block ordering
StructurizeCFG linearizes the successors of branching basic block by adding Flow blocks to record the true/false path for branches and back edges. This
[StructurizeCFG] Improve basic block ordering
StructurizeCFG linearizes the successors of branching basic block by adding Flow blocks to record the true/false path for branches and back edges. This patch reduces the number of Phi values needed to capture the control flow path by improving the basic block ordering.
Previously, StructurizeCFG adds loop exit blocks outside of the loop. StructurizeCFG sets a boolean value to indicate the path taken, and all exit block live values extend to after the loop. For loops with a large number of exits blocks, this creates a huge number of values that are maintained, which increases compilation time and register pressure. This is problem especially with ASAN, which adds early exits to blocks with unreachable instructions for each instrumented check in the loop.
In specific cases, this patch reduces the number of values needed after the loop by moving the exit block into the loop. This is done for blocks that have a single predecessor and single successor by moving the block to appear just after the predecessor.
Differential Revision: https://reviews.llvm.org/D123231
show more ...
|
| #
24e16e4a |
| 27-May-2022 |
Serguei Katkov <[email protected]> |
[SSAUpdaterImpl] Do not generate phi node with all the same incoming values
If all available vals to basic block are the same - do not build new phi node and just use this value.
Reviewed By: samee
[SSAUpdaterImpl] Do not generate phi node with all the same incoming values
If all available vals to basic block are the same - do not build new phi node and just use this value.
Reviewed By: sameerds Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D126525
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
565af157 |
| 25-Feb-2022 |
Carl Ritson <[email protected]> |
[AMDGPU] Extend pre-emit peephole to redundantly masked VCC
Extend pre-emit peephole for S_CBRANCH_VCC[N]Z to eliminate redundant S_AND operations against EXEC for V_CMP results in VCC. These occur
[AMDGPU] Extend pre-emit peephole to redundantly masked VCC
Extend pre-emit peephole for S_CBRANCH_VCC[N]Z to eliminate redundant S_AND operations against EXEC for V_CMP results in VCC. These occur after after register allocation when VCC has been selected as the comparison destination.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D120202
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
b9cf52bc |
| 03-Feb-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst
Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruc
[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst
Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruction, because AMDGPUInstrInfo::isUniformMMO can already detect that various non-instruction pointers are uniform.
Most of the test case churn is from tests that used undef as a pointer, which AMDGPUInstrInfo::isUniformMMO treats as uniform.
Differential Revision: https://reviews.llvm.org/D118909
show more ...
|
|
Revision tags: llvmorg-15-init |
|
| #
d2e5d351 |
| 31-Jan-2022 |
Jay Foad <[email protected]> |
[StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get awa
[StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get away with modifying an existing compare instruction instead. (StructurizeCFG is generally run late in the pipeline so instcombine does not clean them up for us.)
Differential Revision: https://reviews.llvm.org/D118623
show more ...
|
| #
8faad296 |
| 31-Jan-2022 |
Jay Foad <[email protected]> |
Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.
Apparently it is not safe to modify the condition even if it passe
Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.
Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.
show more ...
|
| #
a6b54dda |
| 28-Jan-2022 |
Jay Foad <[email protected]> |
[Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and it since it generally runs late in the pi
[Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and it since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern.
Differential Revision: https://reviews.llvm.org/D118478
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
18f93512 |
| 19-Nov-2021 |
RamNalamothu <[email protected]> |
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slow
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
| #
2f499b9a |
| 19-Dec-2020 |
Tony <[email protected]> |
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant cache
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches upto the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
show more ...
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
7ecf1969 |
| 17-Nov-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways:
1. Fix the case where the def
[AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways:
1. Fix the case where the def of vcc was in a previous basic block, by pessimistically assuming that vccz might be incorrect at a basic block boundary.
2. Fix the handling of pre-existing waitcnt instructions by calling generateWaitcntInstBefore before examining ScoreBrackets to determine whether there's an outstanding smem read operation.
Differential Revision: https://reviews.llvm.org/D91636
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
| #
d8f651d3 |
| 09-Jun-2020 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] Enable structurizer workarounds by default
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D81211
|
|
Revision tags: llvmorg-10.0.1-rc1 |
|
| #
17e13da2 |
| 07-May-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Re-auto-generate test checks
|
| #
72e87549 |
| 06-Apr-2020 |
Konstantin Pyzhov <[email protected]> |
[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard
Differential Revision: https://reviews.llvm.org/D77228
|
| #
51dc0283 |
| 06-Apr-2020 |
Konstantin Pyzhov <[email protected]> |
Revert e1730cfeb3588f20dcf4a96b181ad52761666e52
|
| #
e1730cfe |
| 06-Apr-2020 |
Konstantin Pyzhov <[email protected]> |
[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard
Differential Revision: https://reviews.llvm.org/D77228
|
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
42febbab |
| 05-Mar-2020 |
Sameer Sahasrabuddhe <[email protected]> |
StructurizeCFG: simplify phi nodes when possible
After structurization, some phi nodes can have a single incoming edge and can be simplified away. This change runs a simplify query on all phis that
StructurizeCFG: simplify phi nodes when possible
After structurization, some phi nodes can have a single incoming edge and can be simplified away. This change runs a simplify query on all phis that are either modified or added by the structurizer. This also moves some phis closer to their use as a side benefit.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D75500
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc3 |
|
| #
534d8866 |
| 28-Feb-2020 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] add generated checks for some LIT tests
This is in prepration for further changes that affect these tests.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D75403
|
|
Revision tags: llvmorg-10.0.0-rc2 |
|
| #
00b22df7 |
| 03-Feb-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix extra type mangling on llvm.amdgcn.if.break
These have to be the same mask type.
|
|
Revision tags: llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3 |
|
| #
68a2fef9 |
| 13-Jun-2019 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301
llvm-svn: 363339
|
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
| #
28212cc6 |
| 31-Oct-2018 |
Nicolai Haehnle <[email protected]> |
AMDGPU: Remove PHI loop condition optimization
Summary: The optimization to early break out of loops if all threads are dead was never fully implemented.
But the PHI node analyzing is actually caus
AMDGPU: Remove PHI loop condition optimization
Summary: The optimization to early break out of loops if all threads are dead was never fully implemented.
But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it.
(This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai)
Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <[email protected]>
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53359
llvm-svn: 345718
show more ...
|
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1 |
|
| #
dfd14ade |
| 20-Jun-2018 |
Alina Sbirlea <[email protected]> |
Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. l
Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?)
2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken
After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken
Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges.
2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll
3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement.
Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D48202
llvm-svn: 335183
show more ...
|
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2 |
|
| #
5f915461 |
| 23-May-2018 |
Changpeng Fang <[email protected]> |
StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly
Summary: StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get
StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly
Summary: StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order. The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops. To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited before moving on to outer loop.
However, we found a problem for a SubRegion which is a loop itself:
--> BB1 --> BB2 --> BB3 -->
In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead BB2 to be placed in the wrong order.
In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth to guard the sorting.
Reviewers: arsenm, jlebar
Differential Revision: https://reviews.llvm.org/D46912
llvm-svn: 333111
show more ...
|
| #
391bcf88 |
| 17-May-2018 |
Changpeng Fang <[email protected]> |
AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops.
Summary: The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes
AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops.
Summary: The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working.
In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops. This will make CFG with infinite loops be structurized.
Reviewer: nhaehnle
Differential Revision: https://reviews.llvm.org/D46340
llvm-svn: 332625
show more ...
|