Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init

fd64a857 | 29-Jun-2022 | Thomas Symalla
[AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of
    s_or_saveexec s_o, s_i
    s_xor exec, exec, s_o
into a single
    s_andn2_saveexec s_o, s_i
instruction. This patch also cleans up the SIOptimizeExecMasking pass a bit.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D129073
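The combine above is sound because, bit for bit, (s_i | exec) ^ exec equals s_i & ~exec. The sketch below checks that identity on random masks. It is a hedged model, not code from the pass: it assumes the usual SOP1 semantics (s_or_saveexec: s_o = exec, exec = s_i | exec; s_andn2_saveexec: s_o = exec, exec = s_i & ~exec) and uses 64-bit (wave64) masks, although the identity itself does not depend on the width. The function names are illustrative only.

```python
"""Bit-level model of the s_or_saveexec + s_xor -> s_andn2_saveexec combine.

Assumed semantics (64-bit masks):
  s_or_saveexec    s_o, s_i        : s_o = exec; exec = s_i | exec
  s_xor            exec, exec, s_o : exec = exec ^ s_o
  s_andn2_saveexec s_o, s_i        : s_o = exec; exec = s_i & ~exec
"""
import random

MASK64 = (1 << 64) - 1


def or_saveexec_then_xor(exec_mask: int, s_i: int) -> tuple[int, int]:
    """Run the original two-instruction sequence; return (s_o, new exec)."""
    s_o = exec_mask                           # s_or_saveexec saves the old exec
    exec_mask = (s_i | exec_mask) & MASK64    # ... and ORs s_i into exec
    exec_mask = exec_mask ^ s_o               # s_xor exec, exec, s_o
    return s_o, exec_mask


def andn2_saveexec(exec_mask: int, s_i: int) -> tuple[int, int]:
    """Run the combined instruction; return (s_o, new exec)."""
    s_o = exec_mask
    exec_mask = s_i & ~exec_mask & MASK64     # lanes in s_i that were inactive
    return s_o, exec_mask


def check(trials: int = 10_000, seed: int = 0) -> bool:
    """Compare both forms on random (exec, s_i) pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        exec_mask = rng.getrandbits(64)
        s_i = rng.getrandbits(64)
        if or_saveexec_then_xor(exec_mask, s_i) != andn2_saveexec(exec_mask, s_i):
            return False
    return True


if __name__ == "__main__":
    print(check())
```

Both forms save the same old exec into s_o, so only the new exec value needs comparing, and the combined form does it in one instruction.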

851a5efe | 23-Jun-2022 | Nico Weber
Revert "[fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b. Breaks a few things, see comments on https://reviews.llvm.org/D128437. There's disagreement about the best fix, so let's keep HEAD green while discussions are happening.

719658d0 | 23-Jun-2022 | Luo, Yuanke
[fastalloc] Support allocating specific register class in fastalloc
The base RA supports infrastructure that allows only a specific register class to be allocated in an RA pass. Since greedy RA and basic RA derive from the base RA, they both allow allocating a specific register class. Fast RA doesn't support allocating registers for a specific register class. This patch enables ShouldAllocateClass in fast RA so that it can allocate registers for a specific register class.
Differential Revision: https://reviews.llvm.org/D126771

Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2

56a5d788 | 07-Jan-2022 | Christudasan Devadasan
[AMDGPU] Disable optimizeEndCf at -O0
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116819

Revision tags: llvmorg-13.0.1-rc1

18f93512 | 19-Nov-2021 | RamNalamothu
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273

Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1

208332de | 19-Apr-2021 | Ruiling Song
[AMDGPU] Add Optimize VGPR LiveRange Pass.
This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example:
    def(a)
    if(cond)
      use(a)
      ... // A
    else
      use(a)
As AMDGPU accesses VGPRs with respect to the active mask, we can mark `a` as dead in region A. For details, please refer to the comments in the implementation file.
The pass is enabled by default; the frontend can disable it through "-amdgpu-opt-vgpr-liverange=false".
Differential Revision: https://reviews.llvm.org/D102212

Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1

60b1967c | 21-Jan-2020 | Scott Linder
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to remove the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138

c262b69d | 13-Mar-2020 | Stanislav Mekhanoshin
[AMDGPU] Fix endcf collapse
Only collapse inner endcf if the outer one belongs to SI_IF. If it belongs to SI_ELSE, the mask being restored is in fact a partial inverse of what we need.
Differential Revision: https://reviews.llvm.org/D76154

32e90cbc | 13-Mar-2020 | Stanislav Mekhanoshin
[AMDGPU] Disable endcf collapse
There are some functional regressions and I suspect our scopes are not as perfectly enclosed as I expected. Disable it for now.
Differential Revision: https://reviews.llvm.org/D76148

a7352864 | 12-Mar-2020 | Stanislav Mekhanoshin
[AMDGPU] Simplify exec copies
The patch removes late endcf handling and only leaves the related portion with redundant exec mask copy elimination.
Differential Revision: https://reviews.llvm.org/D76095

360aff04 | 11-Mar-2020 | Stanislav Mekhanoshin
[AMDGPU] Simplify nested SI_END_CF
This is to replace the optimization from the SIOptimizeExecMaskingPreRA. We have fewer opportunities in the control flow lowering because many VGPR copies are still in place and will be removed later, but we know for sure that an instruction is SI_END_CF and not just an arbitrary S_OR_B64 with EXEC.
The subsequent change needs to convert s_and_saveexec into s_and and address the new TODO lines in tests; then the code block guarded by the -amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer will be removed.
Differential Revision: https://reviews.llvm.org/D76033

9801e546 | 10-Mar-2020 | Stanislav Mekhanoshin
[AMDGPU] Disable nested endcf collapse
The assumption is that conditional regions are perfectly nested and a mask restored at the exit from the inner block will be completely covered by a mask restored in the outer.
It turns out with our current structurizer this is not always the case.
Disable the optimization for now, but I want to keep it around for a while to either try after further structurizer changes or to move it into control flow lowering where we have more info and reuse the test.
Differential Revision: https://reviews.llvm.org/D75958
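The nesting assumption described above can be illustrated with plain bitmasks. The sketch below is a simplified model, not the pass: it assumes SI_END_CF restores lanes by OR-ing a saved mask into exec (the 360aff04 entry notes it behaves like an S_OR_B64 with EXEC), and it treats "perfectly nested" as "the inner saved mask is a subset of the outer one". Under that assumption the inner restore is redundant; when nesting breaks, dropping it loses lanes. All function names are hypothetical.

```python
"""Why collapsing a nested SI_END_CF pair relies on perfect nesting.

Simplified model (an assumption, not the pass itself): SI_END_CF
re-enables the lanes recorded in a saved mask by OR-ing it into exec,
i.e. exec = exec | saved. Lane masks are plain Python ints.
"""
import random


def end_cf(exec_mask: int, saved: int) -> int:
    """Model of SI_END_CF: turn the lanes in `saved` back on."""
    return exec_mask | saved


def collapse_is_safe(exec_mask: int, inner_saved: int, outer_saved: int) -> bool:
    """True if dropping the inner restore leaves the final exec unchanged."""
    both = end_cf(end_cf(exec_mask, inner_saved), outer_saved)
    outer_only = end_cf(exec_mask, outer_saved)
    return both == outer_only


def random_nested_case(rng: random.Random) -> tuple[int, int, int]:
    """Perfect nesting: the inner saved mask is a subset of the outer one."""
    outer_saved = rng.getrandbits(64)
    inner_saved = outer_saved & rng.getrandbits(64)
    exec_mask = inner_saved & rng.getrandbits(64)
    return exec_mask, inner_saved, outer_saved


if __name__ == "__main__":
    rng = random.Random(0)
    # Perfectly nested regions: the inner restore is always redundant.
    print(all(collapse_is_safe(*random_nested_case(rng)) for _ in range(10_000)))
    # Not nested: lane 2 is re-enabled only by the inner SI_END_CF.
    print(collapse_is_safe(0b0000, 0b0100, 0b0011))
```

The first check holds for every subset case, while the second case returns False because the outer mask does not cover lane 2, which is the situation the structurizer was found to produce.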

e53a9d96 | 22-Jan-2020 | cdevadas
Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a side effect and the skip branch is really necessary.
This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092

a80291ce | 21-Jan-2020 | Nicolai Hähnle
Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.

Revision tags: llvmorg-11-init

0dc6c249 | 10-Jan-2020 | cdevadas
[AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a side effect and the skip branch is really necessary.
This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092

Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3

f9f81289 | 29-Aug-2019 | Jordan Rupprecht
Revert [MBP] Disable aggressive loop rotate in plain mode
This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a)
It causes many benchmark regressions, internally and in llvm's benchmark suite.
llvm-svn: 370398

51f48295 | 22-Aug-2019 | Guozhi Wei
[MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
To be conservative, this patch restores the original layout algorithm in plain mode. But users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 369664

Revision tags: llvmorg-9.0.0-rc2

a45f301f | 12-Aug-2019 | Hans Wennborg
Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"
It caused assertions to fire when building Chromium:
lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.
See https://crbug.com/992871#c3 for how to reproduce.
> Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
>
> To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
>
> Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368579

80347c3a | 08-Aug-2019 | Guozhi Wei
[MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
To be conservative, this patch restores the original layout algorithm in plain mode. But users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368339

Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3

d2210af3 | 14-Jun-2019 | Guozhi Wei
[MBP] Move a latch block with conditional exit and multi predecessors to top of loop
Currently, findBestLoopTop can find and move one kind of block to the top: a latch block that has one successor. Another common case is:
* a latch block
* it has two successors, one is the loop header, the other is an exit
* it has more than one predecessor
If it is below one of its predecessors P, only P can fall through to it; all other predecessors need a jump to it, and it needs another conditional jump to the loop header. If it is moved before the loop header, all its predecessors jump to it, then fall through to the loop header. So all its predecessors except P can save one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
llvm-svn: 363471
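The branch arithmetic in this commit message can be made concrete with a toy count. This is a hypothetical model, not MachineBlockPlacement: it assumes a latch L with n predecessors, one of which (P) sits directly above L, that each predecessor reaches L once, and that the loop keeps iterating, so L always continues to the header. Only taken branches are counted; the function names are illustrative.

```python
"""Toy count of taken branches for the latch-to-top layout change.

Hypothetical model, not MachineBlockPlacement itself: latch block L has
`num_preds` predecessors, one of which (P) is laid out directly above L.
Each predecessor reaches L once and the loop keeps iterating, so control
always goes from L back to the header. Only taken branches are counted.
"""


def taken_latch_below_pred(num_preds: int) -> int:
    """Layout A: P falls through into L; the header is elsewhere."""
    jumps_into_latch = num_preds - 1   # P falls through; the rest jump to L
    back_edges = num_preds             # L takes its conditional jump to the header
    return jumps_into_latch + back_edges


def taken_latch_on_top(num_preds: int) -> int:
    """Layout B: L is moved directly above the loop header."""
    jumps_into_latch = num_preds       # every predecessor, P included, jumps to L
    back_edges = 0                     # L falls through into the header
    return jumps_into_latch + back_edges


def taken_branches_saved(num_preds: int) -> int:
    return taken_latch_below_pred(num_preds) - taken_latch_on_top(num_preds)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, taken_branches_saved(n))
```

Layout A costs 2n - 1 taken branches and layout B costs n, so the saving is n - 1, one per predecessor other than P, matching the message. With a single predecessor there is no gain, which is why the change targets latches with more than one predecessor.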

Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1

c2814e12 | 17-Apr-2019 | Rhys Perry
AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.
Reviewers: arsen, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60824
llvm-svn: 358592

4d47ac3b | 27-Mar-2019 | Matt Arsenault
AMDGPU: Add additional MIR tests for exec mask optimizations
Also includes one example of how this transform is unsound. This isn't verifying that the copies are used in the control flow intrinsic patterns.
Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed.
llvm-svn: 357091

b008b37b | 25-Mar-2019 | Matt Arsenault
AMDGPU: Make collapse-endcf test more useful
Without a VALU instruction in the return block, these were mostly testing the path to delete exec mask code before s_endpgm rather than the end cf handling.
llvm-svn: 356955

Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1

20d4795d | 29-Jun-2018 | Stanislav Mekhanoshin
[AMDGPU] Enable LICM in the BE pipeline
This allows hoisting the code that computes the reciprocal of a loop-invariant denominator in integer division after codegen prepare expansion.
Differential Revision: https://reviews.llvm.org/D48604
llvm-svn: 335988

Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2

2a22c5de | 02-Feb-2018 | Yaxun Liu
[AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.
Differential Revision: https://reviews.llvm.org/D40955
llvm-svn: 324101