Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 18f93512 | 19-Nov-2021 | RamNalamothu <[email protected]>
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273

Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# c80d8a8c | 23-Jul-2021 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] MachineLICM cannot hoist VALU
MachineLoop::isLoopInvariant() returns false for all VALU because of the exec use. Check TII::isIgnorableUse() to allow hoisting.
That unfortunately results in higher register consumption, since MachineLICM does not adequately estimate register pressure. Therefore I think it should only be enabled after D107677, even though it does not depend on it.
Differential Revision: https://reviews.llvm.org/D107859

Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 2f499b9a | 19-Dec-2020 | Tony <[email protected]>
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This ensures that all relevant caches are bypassed.
A volatile atomic is not changed and still only bypasses caches up to the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214

# 4b806473 | 01-Jan-2021 | Roman Lebedev <[email protected]>
[AMDGPU][SimplifyCFG] Teach AMDGPUUnifyDivergentExitNodes to preserve {,Post}DomTree
This is a (last big?) part of the patch series to make SimplifyCFG preserve DomTree. Currently, it still does not actually preserve it, even though it is pretty much fully updated to preserve it.
Once the default is flipped, a valid DomTree must be passed into simplifyCFG, which means that whatever pass calls simplifyCFG should also be smart about DomTrees.
As far as I can see from `check-llvm` with the default flipped, this is the last LLVM test batch (other than bugpoint tests) that needed fixes to not break with the default flipped.
The changes here are boringly identical to the ones I did over 42+ times/commits recently already, so while AMDGPU is outside of my normal ecosystem, I'm going to go for post-commit review here, like in all the other 42+ changes.
Note that while the pass is taught to preserve {,Post}DomTree, it still doesn't do that by default, because simplifycfg still doesn't do that by default, and flipping the default in this pass will implicitly flip the default for simplifycfg. That will happen, but not right now.

# b23b1bcc | 01-Jan-2021 | Roman Lebedev <[email protected]>
[NFC][CodeGen][Tests] Mark all tests that fail to preserve DomTree for SimplifyCFG as such
These tests start to fail when SimplifyCFG's default regarding DomTree updating is switched on, so mark them as needing changes.

Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# 7ecf1969 | 17-Nov-2020 | Jay Foad <[email protected]>
[AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways:
1. Fix the case where the def of vcc was in a previous basic block, by pessimistically assuming that vccz might be incorrect at a basic block boundary.
2. Fix the handling of pre-existing waitcnt instructions by calling generateWaitcntInstBefore before examining ScoreBrackets to determine whether there's an outstanding smem read operation.
Differential Revision: https://reviews.llvm.org/D91636

Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1
# 3a186657 | 17-Jul-2020 | Carl Ritson <[email protected]>
[AMDGPU] Translate s_and/s_andn2 to s_mov in vcc optimisation
When SCC is dead but VCC is required, replace s_and/s_andn2 with an s_mov into VCC when the mask value is 0 or -1.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D83850

Revision tags: llvmorg-12-init
# 67422612 | 15-Jul-2020 | Carl Ritson <[email protected]>
[AMDGPU] Apply pre-emit s_cbranch_vcc optimation to more patterns
Add handling of s_andn2 and mask of 0. This eliminates redundant instructions from uniform control flow.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D83641

Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# d8f651d3 | 09-Jun-2020 | Sameer Sahasrabuddhe <[email protected]>
[AMDGPU] Enable structurizer workarounds by default
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D81211

Revision tags: llvmorg-10.0.1-rc1
# 17e13da2 | 07-May-2020 | Jay Foad <[email protected]>
[AMDGPU] Re-auto-generate test checks

# 72e87549 | 06-Apr-2020 | Konstantin Pyzhov <[email protected]>
[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard
Differential Revision: https://reviews.llvm.org/D77228

# 51dc0283 | 06-Apr-2020 | Konstantin Pyzhov <[email protected]>
Revert e1730cfeb3588f20dcf4a96b181ad52761666e52

# e1730cfe | 06-Apr-2020 | Konstantin Pyzhov <[email protected]>
[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard
Differential Revision: https://reviews.llvm.org/D77228

Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# 534d8866 | 28-Feb-2020 | Sameer Sahasrabuddhe <[email protected]>
[AMDGPU] add generated checks for some LIT tests
This is in preparation for further changes that affect these tests.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D75403

Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3
# 6934202d | 05-Dec-2018 | Craig Topper <[email protected]>
[MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM
It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM, we end up setting PhysRegDef and then visiting the register again later in the same loop. At that point we see that PhysRegDef was set by the earlier iteration, so we set PhysRegClobber.
This patch splits the loop so that one loop uses the previous value of PhysRegDef to update PhysRegClobber and a second loop updates PhysRegDef.
The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at the end of the loops. And in the branch-relaxation test I think there is now an "and vcc, exec, -1" instruction that wasn't there before.
Differential Revision: https://reviews.llvm.org/D55102
llvm-svn: 348330

# e86c8d33 | 12-Nov-2018 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z
Sometimes after basic block placement we end up with code like:
    sreg = s_mov_b64 -1
    vcc = s_and_b64 exec, sreg
    s_cbranch_vccz
This happens when a block that assigns -1 to a saved mask is joined with another block that consumes that saved mask with s_and_b64 and a branch.
This is essentially a single s_cbranch_execz instruction when moved into a single new basic block.
Differential Revision: https://reviews.llvm.org/D54164
llvm-svn: 346690

Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1
# 20d4795d | 29-Jun-2018 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Enable LICM in the BE pipeline
This allows hoisting the portion of code that computes the reciprocal of a loop-invariant denominator in integer division after codegen prepare expansion.
Differential Revision: https://reviews.llvm.org/D48604
llvm-svn: 335988

Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2
# 391bcf88 | 17-May-2018 | Changpeng Fang <[email protected]>
AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops.
Summary: The current StructurizeCFG pass only works for CFGs with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks into one exit block so that the Structurizer can work. However, an infinite loop is another kind of special "exit", and if we don't handle it, the resulting multiple exits will prevent the structurizer from working.
In this work, for each infinite loop, we add a dummy edge to the "return" block, so the AMDGPUUnifyDivergentExitNodes pass works with infinite loops. This allows CFGs with infinite loops to be structurized.
Reviewer: nhaehnle
Differential Revision: https://reviews.llvm.org/D46340
llvm-svn: 332625

Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3
# 70359ac6 | 02-Jun-2017 | Mark Searles <[email protected]>
[AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.
-enable-si-insert-waitcnts=1 becomes the default; use -enable-si-insert-waitcnts=0 to use the old pass.
Differential Revision: https://reviews.llvm.org/D33730
llvm-svn: 304551

Revision tags: llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1
# 3dbeefa9 | 21-Mar-2017 | Matt Arsenault <[email protected]>
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently, functions with the default C calling convention are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (undoing it in the one place that actually wanted a non-kernel).
llvm-svn: 298444

Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2
# 7aad8fd8 | 24-Jan-2017 | Matt Arsenault <[email protected]>
Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <[email protected]>
llvm-svn: 292982

Revision tags: llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1, llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1
# 45bb48ea | 13-Jun-2015 | Tom Stellard <[email protected]>
R600 -> AMDGPU rename
llvm-svn: 239657