History log of /llvm-project-15.0.7/llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (Results 1 – 25 of 30)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# fd64a857 29-Jun-2022 Thomas Symalla <[email protected]>

[AMDGPU] Combine s_or_saveexec, s_xor instructions.

This patch merges a consecutive sequence of

s_or_saveexec s_o, s_i
s_xor exec, exec, s_o

into a single

s_andn2_saveexec s_o, s_i instruction.
T

[AMDGPU] Combine s_or_saveexec, s_xor instructions.

This patch merges a consecutive sequence of

s_or_saveexec s_o, s_i
s_xor exec, exec, s_o

into a single

s_andn2_saveexec s_o, s_i instruction.
This patch also cleans up the SIOptimizeExecMasking pass a bit.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D129073

show more ...


# 851a5efe 23-Jun-2022 Nico Weber <[email protected]>

Revert "[fastalloc] Support allocating specific register class in fastalloc"

This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b.
Breaks a few things, see comments on https://reviews.llvm.o

Revert "[fastalloc] Support allocating specific register class in fastalloc"

This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b.
Breaks a few things, see comments on https://reviews.llvm.org/D128437
There's disagreement about the best fix.
So let's keep HEAD green while discussions are happening.

show more ...


# 719658d0 23-Jun-2022 Luo, Yuanke <[email protected]>

[fastalloc] Support allocating specific register class in fastalloc

The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA der

[fastalloc] Support allocating specific register class in fastalloc

The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.

Differential Revision: https://reviews.llvm.org/D126771

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 56a5d788 07-Jan-2022 Christudasan Devadasan <[email protected]>

[AMDGPU] Disable optimizeEndCf at -O0

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116819


Revision tags: llvmorg-13.0.1-rc1
# 18f93512 19-Nov-2021 RamNalamothu <[email protected]>

[AMDGPU] Do not generate ELF symbols for the local branch target labels

The compiler was generating symbols in the final code object for local
branch target labels. This bloats the code object, slow

[AMDGPU] Do not generate ELF symbols for the local branch target labels

The compiler was generating symbols in the final code object for local
branch target labels. This bloats the code object, slows down the loader,
and is only used to simplify disassembly.

Use '--symbolize-operands' with llvm-objdump to improve readability of the
branch target operands in disassembly.

Fixes: SWDEV-312223

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D114273

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 208332de 19-Apr-2021 Ruiling Song <[email protected]>

[AMDGPU] Add Optimize VGPR LiveRange Pass.

This pass aims to optimize VGPR live-range in a typical divergent if-else
control flow. For example:

def(a)
if(cond)
use(a)
... // A
else
use(a)

As

[AMDGPU] Add Optimize VGPR LiveRange Pass.

This pass aims to optimize VGPR live-range in a typical divergent if-else
control flow. For example:

def(a)
if(cond)
use(a)
... // A
else
use(a)

As AMDGPU access vgpr with respect to active-mask, we can mark `a` as
dead in region A. For details, please refer to the comments in
implementation file.

The pass is enabled by default, the frontend can disable it through
"-amdgpu-opt-vgpr-liverange=false".

Differential Revision: https://reviews.llvm.org/D102212

show more ...


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1
# 60b1967c 21-Jan-2020 Scott Linder <[email protected]>

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us t

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138

show more ...


# c262b69d 13-Mar-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Fix endcf collapse

Only collapse inner endcf if the outer one belongs to SI_IF.
If it does belong to SI_ELSE then mask being restored in fact
a partial inverse of what we need.

Differentia

[AMDGPU] Fix endcf collapse

Only collapse inner endcf if the outer one belongs to SI_IF.
If it does belong to SI_ELSE then mask being restored in fact
a partial inverse of what we need.

Differential Revision: https://reviews.llvm.org/D76154

show more ...


# 32e90cbc 13-Mar-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Disable endcf collapse

There are some functional regressions and I suspect our
scopes are not as perfectly enclosed as I expected.
Disable it for now.

Differential Revision: https://review

[AMDGPU] Disable endcf collapse

There are some functional regressions and I suspect our
scopes are not as perfectly enclosed as I expected.
Disable it for now.

Differential Revision: https://reviews.llvm.org/D76148

show more ...


# a7352864 12-Mar-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Simplify exec copies

The patch removes late endcf handling and only leaves the
related portion with redundant exec mask copy elimination.

Differential Revision: https://reviews.llvm.org/D7

[AMDGPU] Simplify exec copies

The patch removes late endcf handling and only leaves the
related portion with redundant exec mask copy elimination.

Differential Revision: https://reviews.llvm.org/D76095

show more ...


# 360aff04 11-Mar-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Simplify nested SI_END_CF

This is to replace the optimization from the SIOptimizeExecMaskingPreRA.
We have less opportunities in the control flow lowering because many
VGPR copies are still

[AMDGPU] Simplify nested SI_END_CF

This is to replace the optimization from the SIOptimizeExecMaskingPreRA.
We have less opportunities in the control flow lowering because many
VGPR copies are still in place and will be removed later, but we know
for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64
with EXEC.

The subsequent change needs to convert s_and_saveexec into s_and and
address new TODO lines in tests, then code block guarded by the
-amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer
will be removed.

Differential Revision: https://reviews.llvm.org/D76033

show more ...


# 9801e546 10-Mar-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Disable nested endcf collapse

The assumption is that conditional regions are perfectly nested
and a mask restored at the exit from the inner block will be
completely covered by a mask resto

[AMDGPU] Disable nested endcf collapse

The assumption is that conditional regions are perfectly nested
and a mask restored at the exit from the inner block will be
completely covered by a mask restored in the outer.

It turns out with our current structurizer this is not always
the case.

Disable the optimization for now, but I want to keep it around
for a while to either try after further structurizer changes or
to move it into control flow lowering where we have more info
and reuse the test.

Differential Revision: https://reviews.llvm.org/D75958

show more ...


# e53a9d96 22-Jan-2020 cdevadas <[email protected]>

Resubmit: [AMDGPU] Invert the handling of skip insertion.

The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
h

Resubmit: [AMDGPU] Invert the handling of skip insertion.

The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092

show more ...


# a80291ce 21-Jan-2020 Nicolai Hähnle <[email protected]>

Revert "[AMDGPU] Invert the handling of skip insertion."

This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Me

Revert "[AMDGPU] Invert the handling of skip insertion."

This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.

show more ...


Revision tags: llvmorg-11-init
# 0dc6c249 10-Jan-2020 cdevadas <[email protected]>

[AMDGPU] Invert the handling of skip insertion.

The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an opt

[AMDGPU] Invert the handling of skip insertion.

The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092

show more ...


Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# f9f81289 29-Aug-2019 Jordan Rupprecht <[email protected]>

Revert [MBP] Disable aggressive loop rotate in plain mode

This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a)

It causes many benchmark regressions, internally and in llvm's b

Revert [MBP] Disable aggressive loop rotate in plain mode

This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a)

It causes many benchmark regressions, internally and in llvm's benchmark suite.

llvm-svn: 370398

show more ...


# 51f48295 22-Aug-2019 Guozhi Wei <[email protected]>

[MBP] Disable aggressive loop rotate in plain mode

Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile informat

[MBP] Disable aggressive loop rotate in plain mode

Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.

To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.

Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 369664

show more ...


Revision tags: llvmorg-9.0.0-rc2
# a45f301f 12-Aug-2019 Hans Wennborg <[email protected]>

Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"

It caused assertions to fire when building Chromium:

lib/CodeGen/LiveDebugValues.cpp:331: bool
{anonymous}::LiveDebugValues::

Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"

It caused assertions to fire when building Chromium:

lib/CodeGen/LiveDebugValues.cpp:331: bool
{anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion
`Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.

See https://crbug.com/992871#c3 for how to reproduce.

> Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
>
> To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
>
> Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 368579

show more ...


# 80347c3a 08-Aug-2019 Guozhi Wei <[email protected]>

[MBP] Disable aggressive loop rotate in plain mode

Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile informat

[MBP] Disable aggressive loop rotate in plain mode

Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.

To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.

Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 368339

show more ...


Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3
# d2210af3 14-Jun-2019 Guozhi Wei <[email protected]>

[MBP] Move a latch block with conditional exit and multi predecessors to top of loop

Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another comm

[MBP] Move a latch block with conditional exit and multi predecessors to top of loop

Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:

* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors

If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.

Differential Revision: https://reviews.llvm.org/D43256

llvm-svn: 363471

show more ...


Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# c2814e12 17-Apr-2019 Rhys Perry <[email protected]>

AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions

Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less co

AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions

Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.

Reviewers: arsen, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60824

llvm-svn: 358592

show more ...


# 4d47ac3b 27-Mar-2019 Matt Arsenault <[email protected]>

AMDGPU: Add additional MIR tests for exec mask optimizations

Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patte

AMDGPU: Add additional MIR tests for exec mask optimizations

Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.

Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.

llvm-svn: 357091

show more ...


# b008b37b 25-Mar-2019 Matt Arsenault <[email protected]>

AMDGPU: Make collapse-endcf test more useful

Without a VALU instruction in the return block, these were mostly
testing the path to delete exec mask code before s_endpgm rather than
the end cf handli

AMDGPU: Make collapse-endcf test more useful

Without a VALU instruction in the return block, these were mostly
testing the path to delete exec mask code before s_endpgm rather than
the end cf handling.

llvm-svn: 356955

show more ...


Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1
# 20d4795d 29-Jun-2018 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Enable LICM in the BE pipeline

This allows to hoist code portion to compute reciprocal of loop
invariant denominator in integer division after codegen prepare
expansion.

Differential Revis

[AMDGPU] Enable LICM in the BE pipeline

This allows to hoist code portion to compute reciprocal of loop
invariant denominator in integer division after codegen prepare
expansion.

Differential Revision: https://reviews.llvm.org/D48604

llvm-svn: 335988

show more ...


Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2
# 2a22c5de 02-Feb-2018 Yaxun Liu <[email protected]>

[AMDGPU] Switch to the new addr space mapping by default

This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101


12