History log of /llvm-project-15.0.7/llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll (Results 1 – 25 of 47)
Revision    Date    Author    Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 851a5efe 23-Jun-2022 Nico Weber

Revert "[fastalloc] Support allocating specific register class in fastalloc"

This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b.
Breaks a few things, see comments on https://reviews.llvm.o

Revert "[fastalloc] Support allocating specific register class in fastalloc"

This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b.
Breaks a few things, see comments on https://reviews.llvm.org/D128437
There's disagreement about the best fix.
So let's keep HEAD green while discussions are happening.


# 719658d0 23-Jun-2022 Luo, Yuanke

[fastalloc] Support allocating specific register class in fastalloc

The base RA provides the support infrastructure that allows only a specific
register class to be allocated in the RA pass. Since greedy RA and basic RA
derive from the base RA, they both allow allocating a specific register
class. Fast RA does not support allocating registers for a specific register
class. This patch enables ShouldAllocateClass in fast RA, so that it can
allocate registers for a specific register class.

Differential Revision: https://reviews.llvm.org/D126771


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# f510045d 14-Jan-2022 Jay Foad

[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.

Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].

Differential Revision: https://reviews.llvm.org/D117298
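
As an illustration of the simplification, a typical check line changes roughly as follows (the check prefix and operands here are made up for this note, not copied from control-flow-fastregalloc.ll):

Before: ; GCN: buffer_store_dword v0, off, s{{\[}}0:3{{\]}}, 0
After:  ; GCN: buffer_store_dword v0, off, s[0:3], 0

With D117117 in place the plain brackets are matched literally, so the simplified pattern matches the same assembly.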


Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 18f93512 19-Nov-2021 RamNalamothu

[AMDGPU] Do not generate ELF symbols for the local branch target labels

The compiler was generating symbols in the final code object for local
branch target labels. This bloats the code object, slows down the loader,
and is only used to simplify disassembly.

Use '--symbolize-operands' with llvm-objdump to improve readability of the
branch target operands in disassembly.

Fixes: SWDEV-312223

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D114273


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# c46d99e4 07-Jul-2021 Stanislav Mekhanoshin

[AMDGPU] Refine -O0 and -O1 passes.

Differential Revision: https://reviews.llvm.org/D105579


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# f9a8c6a0 12-Apr-2021 Sebastian Neubauer

[AMDGPU] Save VGPR of whole wave when spilling

Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.

The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but
every time the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)

Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know whether the VGPR we use is
live beforehand, otherwise the machine verifier complains.

Differential Revision: https://reviews.llvm.org/D96336


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# be2afbd0 20-Oct-2020 Carl Ritson

[AMDGPU] Remove fix up operand from SI_ELSE

Remove immediate operand from SI_ELSE which indicates if EXEC has
been modified. Instead always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.

This facilitates passes (e.g. SIWholeQuadMode) adding exec mask
manipulation after control flow lowering, and passes that run before
control flow lowering do not need to be aware of SI_ELSE handling.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D89644


# 3fdf3b15 14-Oct-2020 Konstantin Zhuravlyov

AMDGPU: Update AMDHSA code object version handling

Differential Revision: https://reviews.llvm.org/D89076


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4
# 89baeaef 22-Sep-2020 Matt Arsenault

Reapply "RegAllocFast: Rewrite and improve"

This reverts commit 73a6a164b84a8195defbb8f5eeb6faecfc478ad4.


Revision tags: llvmorg-11.0.0-rc3
# 73a6a164 22-Sep-2020 Muhammad Omair Javaid

Revert "Reapply Revert "RegAllocFast: Rewrite and improve""

This reverts commit 55f9f87da2c2ad791b9e62cccb1c035e037444fa.

Breaks following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubun

Revert "Reapply Revert "RegAllocFast: Rewrite and improve""

This reverts commit 55f9f87da2c2ad791b9e62cccb1c035e037444fa.

Breaks following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306
http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154


# 55f9f87d 21-Sep-2020 Matt Arsenault

Reapply Revert "RegAllocFast: Rewrite and improve"

This reverts commit dbd53a1f0c939a55e7719c39d08179468f9ad3dc.

Needed lldb test updates


# dbd53a1f 19-Sep-2020 Eric Christopher

Temporarily Revert "RegAllocFast: Rewrite and improve"
as it's breaking a few tests in the lldb test suite.

Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio

This reverts commit c8757ff3aa7dd7a25a6343f6ef74a70c7be04325.


# c8757ff3 14-Sep-2020 Matt Arsenault

RegAllocFast: Rewrite and improve

This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:

Track register state on register units instead of physical
registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: definitions are known to end
register lifetimes when walking backwards (whereas when walking
forward, a use may or may not be a kill, so heuristics are needed).

Check register mask operands (calls) instead of conservatively
assuming everything is clobbered. Enhance heuristics to detect
killing uses: in the case of a small number of defs/uses, check whether they
are all in the same basic block; if so, the last one is a killing use.
Enhance the heuristic for copy-coalescing through hinting: we check the
first k defs of a register for COPYs rather than relying on there just
being a single definition. When testing this on the full llvm
test-suite including SPEC externals I measured:

an average 5.1% reduction in code size for X86 and a 4.9% reduction in code
size on aarch64 (ranging between 0% and 20% depending on the test), and 0.5%
faster compile time (some analysis suggests the pass is slightly slower
than before, but we more than make up for it because later passes are
faster with the reduced instruction count).

Also adds a few testcases that were broken without this patch, in
particular bug 47278.

Patch mostly by Matthias Braun


# 3c2a7bd2 02-Sep-2020 Matt Arsenault

AMDGPU: Remove code to handle tied si_else operands

This has not used tied operands for a long time.


Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# da33c96d 03-Jun-2020 Carl Ritson

[AMDGPU] Make SGPR spills exec mask agnostic

Explicitly set the exec mask for SGPR spills and reloads.
This fixes a bug where SGPR spills to memory could be incorrect
if the exec mask was 0 (or differed between spill and reload).

Additionally, pack scalar subregisters (up to 16/32 per VGPR),
so that the majority of scalar types can be spilled or reloaded
with a simple memory access. This should amortize some of the
additional overhead of manipulating the exec mask.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D80282


Revision tags: llvmorg-10.0.1-rc1
# 72e87549 06-Apr-2020 Konstantin Pyzhov

[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.

Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228


# 51dc0283 06-Apr-2020 Konstantin Pyzhov

Revert e1730cfeb3588f20dcf4a96b181ad52761666e52


# e1730cfe 06-Apr-2020 Konstantin Pyzhov

[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.

Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1
# 60b1967c 21-Jan-2020 Scott Linder

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to remove the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138


# e53a9d96 22-Jan-2020 cdevadas

Resubmit: [AMDGPU] Invert the handling of skip insertion.

The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a side effect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092


# a80291ce 21-Jan-2020 Nicolai Hähnle

Revert "[AMDGPU] Invert the handling of skip insertion."

This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Me

Revert "[AMDGPU] Invert the handling of skip insertion."

This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.


Revision tags: llvmorg-11-init
# 0dc6c249 10-Jan-2020 cdevadas

[AMDGPU] Invert the handling of skip insertion.

The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a side effect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092


Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# c4d256a5 14-Oct-2019 Alexander Timofeev

[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'

Detailed description:

After the https://reviews.llvm.org/D59990 submission, several issues were discovered.
The changes in common code were preserved, but the AMDGPU-specific part was reverted to keep the backend working correctly.

Discovered issues were addressed in the following commits:

https://reviews.llvm.org/D67662
https://reviews.llvm.org/D67101
https://reviews.llvm.org/D63953
https://reviews.llvm.org/D63731

This change brings back AMDGPU specific changes.

Reviewed by: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D68635

llvm-svn: 374767


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2
# 37bd9bd1 06-Jun-2019 Alexander Timofeev

[AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd

"Divergence driven ISel. Assign register class for cross block values
according to the divergence."
which exposed a design flaw leading to several issues that
needed to be solved first.

This change reverts the AMDGPU-specific changes and keeps the common part
unaffected.

llvm-svn: 362749


# ba447bae 26-May-2019 Alexander Timofeev

[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.

Details: To make instruction selection truly divergence-driven, it is necessary to assign
the correct register classes to the cross-block values beforehand. For divergent targets,
the same value type requires different register classes depending on the value's divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

This commit was reverted because of a build failure.
The reason was a malformed patch.
The build failure has been fixed.

llvm-svn: 361741

