|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
# 851a5efe | 23-Jun-2022 | Nico Weber <[email protected]>
Revert "[fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b. Breaks a few things, see comments on https://reviews.llvm.o
Revert "[fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b. Breaks a few things; see the comments on https://reviews.llvm.org/D128437. There's disagreement about the best fix, so let's keep HEAD green while discussions are happening.
|
# 719658d0 | 23-Jun-2022 | Luo, Yuanke <[email protected]>
[fastalloc] Support allocating specific register class in fastalloc
The base RA provides infrastructure that allows only a specific register class to be allocated in an RA pass. Since greedy RA and basic RA derive from the base RA, they both allow allocating a specific register class. Fast RA does not support allocating registers for a specific register class. This patch enables ShouldAllocateClass in fast RA, so that it can support allocating registers for a specific register class.
Differential Revision: https://reviews.llvm.org/D126771
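As a rough illustration of what such a register-class filter buys the allocator, here is a minimal C++ sketch; the types, names, and the ShouldAllocateFn callback are illustrative inventions for this example, not the actual LLVM interface:

    // Hypothetical sketch: a predicate tells the allocator which register
    // classes it is responsible for; everything else is left untouched so a
    // later allocation run can handle it.
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    struct RegClass { std::string Name; };

    using ShouldAllocateFn = std::function<bool(const RegClass &)>;

    void allocate(const std::vector<RegClass> &VRegClasses,
                  const ShouldAllocateFn &ShouldAllocate) {
      for (const RegClass &RC : VRegClasses) {
        if (ShouldAllocate && !ShouldAllocate(RC)) {
          std::cout << "skipping class " << RC.Name << "\n"; // left for a later pass
          continue;
        }
        std::cout << "allocating in class " << RC.Name << "\n";
      }
    }

    int main() {
      std::vector<RegClass> Classes = {{"SGPR_32"}, {"VGPR_32"}};
      // Only handle the scalar class in this run (class names are illustrative).
      allocate(Classes, [](const RegClass &RC) { return RC.Name == "SGPR_32"; });
    }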
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
# f510045d | 14-Jan-2022 | Jay Foad <[email protected]>
[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].
Differential Revision: https://reviews.llvm.org/D117298
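For illustration, the simplification is a purely textual rewrite of check lines such as the following (a hypothetical example, not a line taken from the patch):

    Before: ; CHECK: s_load_dwordx2 s{{\[}}0:1{{\]}}
    After:  ; CHECK: s_load_dwordx2 s[0:1]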
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
# 18f93512 | 19-Nov-2021 | RamNalamothu <[email protected]>
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object and slows down the loader, and the symbols are only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
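A typical invocation would look like the line below, where foo.o is a placeholder name for a compiled code object:

    llvm-objdump -d --symbolize-operands foo.o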
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
# c46d99e4 | 07-Jul-2021 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Refine -O0 and -O1 passes.
Differential Revision: https://reviews.llvm.org/D105579
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
# f9a8c6a0 | 12-Apr-2021 | Sebastian Neubauer <[email protected]>
[AMDGPU] Save VGPR of whole wave when spilling
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot determine if a VGPR is used in other lanes or not, so we need to save all lanes of the VGPR. We even need to save the VGPR if it is marked as dead.
The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- Can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but every time the temporary VGPR is written to memory, we
- write the VGPR to memory
- flip exec (s_not exec, exec)
- write the VGPR again (previously inactive lanes)
Surprisingly often, we are able to scavenge an SGPR, even though we are at the brink of running out of SGPRs. Scavenging a VGPR does not have a great effect (it saves three instructions if no SGPR was scavenged), but we need to know whether the VGPR we use was live beforehand, otherwise the machine verifier complains.
Differential Revision: https://reviews.llvm.org/D96336
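The scavenged-SGPR sequence can be mirrored in a short, plain C++ simulation. Everything below (wave size, the memory map, function and variable names) is an illustrative assumption, and the stores are not actually masked by the exec value; it only reproduces the order of operations listed above.

    #include <array>
    #include <cstdint>
    #include <map>

    constexpr unsigned WaveSize = 64;
    using VGPR = std::array<uint32_t, WaveSize>; // one 32-bit value per lane

    struct Wave {
      uint64_t Exec = ~0ull;   // execution mask
      VGPR TempV{};            // temporary VGPR used for the spill
      std::map<int, VGPR> Mem; // scratch memory, one slot per frame index
    };

    void spillSGPR(Wave &W, uint32_t SGPRValue, int SaveSlot, int TargetSlot) {
      uint64_t SavedExec = W.Exec; // save EXEC into a scavenged SGPR
      W.Exec = 0x1;                // set the needed lane mask
      W.Mem[SaveSlot] = W.TempV;   // save the temporary VGPR
      W.TempV[0] = SGPRValue;      // write the spilled SGPR into a VGPR lane
      W.Mem[TargetSlot] = W.TempV; // save the VGPR to the target stack slot
      W.TempV = W.Mem[SaveSlot];   // restore the temporary VGPR
      W.Exec = SavedExec;          // restore EXEC
    }

    int main() {
      Wave W;
      spillSGPR(W, 42u, /*SaveSlot=*/0, /*TargetSlot=*/1);
    }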
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
# be2afbd0 | 20-Oct-2020 | Carl Ritson <[email protected]>
[AMDGPU] Remove fix up operand from SI_ELSE
Remove the immediate operand from SI_ELSE which indicates whether EXEC has been modified. Instead, always emit code that handles EXEC and remove unnecessary instructions during pre-RA optimisation.
This makes it easier for passes (e.g. SIWholeQuadMode) to add exec mask manipulation after control flow lowering, and passes that run before control flow lowering do not need to be aware of SI_ELSE handling.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D89644
|
# 3fdf3b15 | 14-Oct-2020 | Konstantin Zhuravlyov <[email protected]>
AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4 |
|
# 89baeaef | 22-Sep-2020 | Matt Arsenault <[email protected]>
Reapply "RegAllocFast: Rewrite and improve"
This reverts commit 73a6a164b84a8195defbb8f5eeb6faecfc478ad4.
|
|
Revision tags: llvmorg-11.0.0-rc3 |
|
# 73a6a164 | 22-Sep-2020 | Muhammad Omair Javaid <[email protected]>
Revert "Reapply Revert "RegAllocFast: Rewrite and improve""
This reverts commit 55f9f87da2c2ad791b9e62cccb1c035e037444fa.
Breaks following buildbots: http://lab.llvm.org:8011/builders/lldb-arm-ubun
Revert "Reapply Revert "RegAllocFast: Rewrite and improve""
This reverts commit 55f9f87da2c2ad791b9e62cccb1c035e037444fa.
Breaks the following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306
http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154
|
# 55f9f87d | 21-Sep-2020 | Matt Arsenault <[email protected]>
Reapply Revert "RegAllocFast: Rewrite and improve"
This reverts commit dbd53a1f0c939a55e7719c39d08179468f9ad3dc.
Needed lldb test updates
|
# dbd53a1f | 19-Sep-2020 | Eric Christopher <[email protected]>
Temporarily Revert "RegAllocFast: Rewrite and improve" as it's breaking a few tests in the lldb test suite.
Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio
This reverts commit c8757ff3aa7dd7a25a6343f6ef74a70c7be04325.
|
# c8757ff3 | 14-Sep-2020 | Matt Arsenault <[email protected]>
RegAllocFast: Rewrite and improve
This rewrites big parts of the fast register allocator. The basic strategy of doing block-local allocation hasn't changed but I tweaked several details:
- Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases.
- Process basic blocks in reverse order: definitions are known to end register lifetimes when walking backwards (whereas when walking forward, uses may or may not be a kill, so we need heuristics).
- Check register mask operands (calls) instead of conservatively assuming everything is clobbered.
- Enhance heuristics to detect killing uses: in the case of a small number of defs/uses, check whether they are all in the same basic block; if so, the last one is a killing use.
- Enhance the heuristic for copy-coalescing through hinting: we check the first k defs of a register for COPYs rather than relying on there just being a single definition.

When testing this on the full llvm test-suite including SPEC externals I measured:
- an average 5.1% reduction in code size for X86 and a 4.9% reduction in code size on AArch64 (ranging between 0% and 20% depending on the test)
- 0.5% faster compile time (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count)
Also adds a few testcases that were broken without this patch, in particular bug 47278.
Patch mostly by Matthias Braun
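To see why the reverse walk simplifies kill detection, here is a minimal C++ sketch; the data structures and the allocation "decisions" are illustrative stand-ins, not the actual RegAllocFast code. Walking backwards, the first time a register appears as a use it is the last use in program order, and a definition ends its live range, so no kill-flag heuristics are needed.

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    struct Instr {
      std::vector<std::string> Defs; // virtual registers defined
      std::vector<std::string> Uses; // virtual registers used
    };

    void allocateBlockReverse(const std::vector<Instr> &Block) {
      std::set<std::string> Live; // vregs currently needing a register
      for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
        // A def seen while walking backwards ends the live range above it,
        // so its register can be freed immediately.
        for (const std::string &D : It->Defs) {
          Live.erase(D);
          std::cout << "def of " << D << ": register freed\n";
        }
        // A use seen for the first time (walking backwards) is the last use
        // in program order, i.e. a kill; assign a register here.
        for (const std::string &U : It->Uses) {
          if (Live.insert(U).second)
            std::cout << "last use of " << U << ": allocate register\n";
        }
      }
    }

    int main() {
      std::vector<Instr> Block = {
          {{"v0"}, {}},     // v0 = ...
          {{"v1"}, {"v0"}}, // v1 = f(v0)
          {{}, {"v1"}},     // use v1
      };
      allocateBlockReverse(Block);
    }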
|
# 3c2a7bd2 | 02-Sep-2020 | Matt Arsenault <[email protected]>
AMDGPU: Remove code to handle tied si_else operands
This has not used tied operands for a long time.
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
# da33c96d | 03-Jun-2020 | Carl Ritson <[email protected]>
[AMDGPU] Make SGPR spills exec mask agnostic
Explicitly set the exec mask for SGPR spills and reloads. This fixes a bug where SGPR spills to memory could be incorrect if the exec mask was 0 (or differed between spill and reload).
Additionally pack scalar subregisters (up to 16/32 per VGPR), so that the majority of scalar types can be spilled or reloaded with a simple memory access. This should amortize some of the additional overhead of manipulating the exec mask.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D80282
|
|
Revision tags: llvmorg-10.0.1-rc1 |
|
# 72e87549 | 06-Apr-2020 | Konstantin Pyzhov <[email protected]>
[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard
Differential Revision: https://reviews.llvm.org/D77228
|
# 51dc0283 | 06-Apr-2020 | Konstantin Pyzhov <[email protected]>
Revert e1730cfeb3588f20dcf4a96b181ad52761666e52
|
# e1730cfe | 06-Apr-2020 | Konstantin Pyzhov <[email protected]>
[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard
Differential Revision: https://reviews.llvm.org/D77228
|
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
# 60b1967c | 21-Jan-2020 | Scott Linder <[email protected]>
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to remove the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
|
# e53a9d96 | 22-Jan-2020 | cdevadas <[email protected]>
Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a side effect and the skip branch is really necessary.
This new pass will replace the handling of skip insertion in the existing SIInsertSkip pass.
Differential revision: https://reviews.llvm.org/D68092
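Conceptually, the upfront s_cbranch_execz is an early-out around a divergent region: if no lanes are active, the wave branches over the code instead of stepping through it fully masked. The C++ analogy below shows only that control-flow shape (it is not compiler code, and the mask values are illustrative):

    #include <cstdint>
    #include <iostream>

    void runRegion(uint64_t ExecMask) {
      if (ExecMask == 0) { // s_cbranch_execz: skip when no lane is active
        std::cout << "region skipped\n";
        return;
      }
      std::cout << "region executed for active lanes\n";
    }

    int main() {
      runRegion(0x0);  // whole region skipped
      runRegion(0xFF); // executed
    }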
|
# a80291ce | 21-Jan-2020 | Nicolai Hähnle <[email protected]>
Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Me
Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.
|
|
Revision tags: llvmorg-11-init |
|
# 0dc6c249 | 10-Jan-2020 | cdevadas <[email protected]>
[AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a side effect and the skip branch is really necessary.
This new pass will replace the handling of skip insertion in the existing SIInsertSkip pass.
Differential revision: https://reviews.llvm.org/D68092
|
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
# c4d256a5 | 14-Oct-2019 | Alexander Timofeev <[email protected]>
[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'
Detailed description:
After the https://reviews.llvm.org/D59990 submit, several issues were discovered. Changes in common code were preserved, but the AMDGPU-specific part was reverted to keep the backend working correctly.
Discovered issues were addressed in the following commits:
https://reviews.llvm.org/D67662
https://reviews.llvm.org/D67101
https://reviews.llvm.org/D63953
https://reviews.llvm.org/D63731
This change brings back AMDGPU specific changes.
Reviewed by: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D68635
llvm-svn: 374767
|
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2 |
|
# 37bd9bd1 | 06-Jun-2019 | Alexander Timofeev <[email protected]>
[AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd
"Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before.
This change reverts AMDGPU specific changes and keeps common part unaffected.
llvm-svn: 362749
|
# ba447bae | 26-May-2019 | Alexander Timofeev <[email protected]>
[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence-driven, it is necessary to assign the correct register classes to the cross-block values beforehand. For divergent targets, the same value type requires different register classes depending on the value's divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was previously reverted because of a build failure. The reason was a malformed patch. The build failure has been fixed.
llvm-svn: 361741
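The core rule can be sketched in a few lines of C++; the enum and function below are illustrative, not LLVM's actual interfaces. A divergent value differs per lane and therefore needs a vector register class, while a uniform value can use a scalar class:

    #include <iostream>

    enum class RegClass { SGPR, VGPR };

    // Pick a register class for a cross-block value based on its divergence.
    RegClass classForValue(bool IsDivergent) {
      return IsDivergent ? RegClass::VGPR : RegClass::SGPR;
    }

    int main() {
      std::cout << "uniform   -> " << (classForValue(false) == RegClass::SGPR ? "SGPR" : "VGPR") << "\n";
      std::cout << "divergent -> " << (classForValue(true) == RegClass::SGPR ? "SGPR" : "VGPR") << "\n";
    }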
|