History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp (Results 1 – 25 of 81)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 732eed40 09-Mar-2022 Ruiling Song <[email protected]>

[AMDGPU] Mark GFX11 dual source blend export as strict-wqm

The instructions that generate the source of dual source blend export
should run in strict-wqm. That is if any lane in a quad is active,
we

[AMDGPU] Mark GFX11 dual source blend export as strict-wqm

The instructions that generate the source of dual source blend export
should run in strict-wqm. That is if any lane in a quad is active,
we need to enable all four lanes of that quad to make the shuffling
operation before exporting to dual source blend target work correctly.

Differential Revision: https://reviews.llvm.org/D127981

show more ...


Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 29621c13 11-Mar-2021 Piotr Sobczak <[email protected]>

[AMDGPU] Tag GFX11 LDS loads as using strict_wqm

LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).

Tag

[AMDGPU] Tag GFX11 LDS loads as using strict_wqm

LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).

Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm
to enforce this and avoid lane clobbering issues.
Note that only the instruction itself is tagged.
The implicit uses of these do not need to be set WQM.
The reduces unnecessary WQM calculation of M0.

Differential Revision: https://reviews.llvm.org/D127977

show more ...


# 4271a1ff 18-Jun-2022 Kazu Hirata <[email protected]>

[llvm] Call *set::insert without checking membership first (NFC)


# 37b37838 16-Mar-2022 Shengchen Kan <[email protected]>

[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments


# 98dd3905 11-Jan-2022 Ruiling Song <[email protected]>

AMDGPU: Use removeAllRegUnitsForPhysReg()

I met the issue here when working on something else.
Actually we have already reserved EXEC, but it looks
like the register coalescer is causing the sub-reg

AMDGPU: Use removeAllRegUnitsForPhysReg()

I met the issue here when working on something else.
Actually we have already reserved EXEC, but it looks
like the register coalescer is causing the sub-register
of EXEC appears in LiveIntervals. I have not looked
deeper why register coalscer have such behavior, but
removeAllRegUnitsForPhysReg() is the right way.

Reviewed By: critson, foad, arsenm

Differential Revision: https://reviews.llvm.org/D117014

show more ...


# 6527b2a4 18-Feb-2022 Sebastian Neubauer <[email protected]>

[AMDGPU][NFC] Fix typos

Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235


# aa418b91 25-Jan-2022 Konstantina <[email protected]>

[AMDGPU][SIWholeQuadMode] Use the right VCC register to activate the correct lanes.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D118096


# f78c1b07 17-Dec-2021 Kazu Hirata <[email protected]>

[Target] Use range-based for loops (NFC)


# 976f3b3c 24-Nov-2021 Carl Ritson <[email protected]>

[AMDGPU] Only allow implicit WQM in pixel shaders

Implicit derivatives are only valid in pixel shaders,
hence only implicitly enable WQM for pixel shaders.
This avoids unintended WQM in other shader

[AMDGPU] Only allow implicit WQM in pixel shaders

Implicit derivatives are only valid in pixel shaders,
hence only implicitly enable WQM for pixel shaders.
This avoids unintended WQM in other shader types (e.g. compute)
when image sampling instructions are used.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114414

show more ...


# d45cb1d7 23-Nov-2021 Kazu Hirata <[email protected]>

[llvm] Use range-based for loops (NFC)


# 2c4ba3e9 05-Nov-2021 Kazu Hirata <[email protected]>

[Target] Use make_early_inc_range (NFC)


# be1a8f88 27-Oct-2021 Jay Foad <[email protected]>

[AMDGPU] Really preserve LiveVariables in SILowerControlFlow

https://bugs.llvm.org/show_bug.cgi?id=52204

Differential Revision: https://reviews.llvm.org/D112731


# 36deb9a6 15-Oct-2021 Jay Foad <[email protected]>

Add new MachineFunction property FailsVerification

TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this

Add new MachineFunction property FailsVerification

TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.

This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.

Differential Revision: https://reviews.llvm.org/D111397

show more ...


# 67cfefeb 06-May-2021 Carl Ritson <[email protected]>

[AMDGPU] Fix WQM failure with single block inactive demote

Instruction test for inactive kill/demote needs to be based on
actual opcode not whether instruction would be lowered to demote.

Reviewed

[AMDGPU] Fix WQM failure with single block inactive demote

Instruction test for inactive kill/demote needs to be based on
actual opcode not whether instruction would be lowered to demote.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D101966

show more ...


# 79cb3ba0 21-Apr-2021 Jay Foad <[email protected]>

[AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands

STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so
there is no need to add another implicit use of $exec when

[AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands

STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so
there is no need to add another implicit use of $exec when lowering them
to V_MOV_B32 instructions.

Differential Revision: https://reviews.llvm.org/D100969

show more ...


# 1a4bc3ab 18-Mar-2021 Carl Ritson <[email protected]>

[AMDGPU] Avoid unnecessary graph visits during WQM marking

Avoid revisiting nodes with the same set of defined lanes by
using a unified visited set which integrates lanes into the key.
This retains

[AMDGPU] Avoid unnecessary graph visits during WQM marking

Avoid revisiting nodes with the same set of defined lanes by
using a unified visited set which integrates lanes into the key.
This retains the intent of the original code by still revisiting
a subgraph if a different set of lanes is defined and hence
marking might progress differently.

Note: default size of the visited set has been confirmed to
cover >99% of invocations in large array of test shaders.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D98772

show more ...


# 13877db2 15-Mar-2021 Carl Ritson <[email protected]>

[AMDGPU] Fix shortfalls in WQM marking

When tracking defined lanes through phi nodes in the live range
graph each branch of the phi must be handled independently.
Also rewrite the marking algorithm

[AMDGPU] Fix shortfalls in WQM marking

When tracking defined lanes through phi nodes in the live range
graph each branch of the phi must be handled independently.
Also rewrite the marking algorithm to reduce unnecessary
operations.

Previously a shared set of defined lanes was used which caused
marking to stop prematurely. This was observable in existing lit
tests, but test patterns did not cover this detail.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D98614

show more ...


Revision tags: llvmorg-12.0.0-rc3
# 4672bac1 03-Mar-2021 Piotr Sobczak <[email protected]>

[AMDGPU] Introduce Strict WQM mode

* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference t

[AMDGPU] Introduce Strict WQM mode

* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96258

show more ...


# c3ce7bae 02-Mar-2021 Piotr Sobczak <[email protected]>

[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm

* Introduce the new intrinsic amdgcn_strict_wwm
* Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix

[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm

* Introduce the new intrinsic amdgcn_strict_wwm
* Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257

show more ...


Revision tags: llvmorg-12.0.0-rc2
# 8181dcd3 19-Feb-2021 Carl Ritson <[email protected]>

[AMDGPU] WQM/WWM: Fix marking of partial definitions

Track lanes when processing definitions for marking WQM/WWM.
If all lanes have been defined then marking can stop.
This prevents marking unnecess

[AMDGPU] WQM/WWM: Fix marking of partial definitions

Track lanes when processing definitions for marking WQM/WWM.
If all lanes have been defined then marking can stop.
This prevents marking unnecessary instructions as WQM/WWM.

In particular this fixes a bug where values passing through
V_SET_INACTIVE would me marked as requiring WWM.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95503

show more ...


# a8d9d507 17-Feb-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] gfx90a support

Differential Revision: https://reviews.llvm.org/D96906


# aef781b4 14-Feb-2021 Carl Ritson <[email protected]>

[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic

Add intrinsic which demotes all active lanes to helper lanes.
This is used to implement demote to helper Vulkan extension.

In practice demoting a lane

[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic

Add intrinsic which demotes all active lanes to helper lanes.
This is used to implement demote to helper Vulkan extension.

In practice demoting a lane to helper simply means removing it
from the mask of live lanes used for WQM/WWM/Exact mode.
Where the shader does not use WQM, demotes just become kills.

Additionally add llvm.amdgcn.live.mask intrinsic to complement
demote operations. In theory llvm.amdgcn.ps.live can be used
to detect helper lanes; however, ps.live can be moved by LICM.
The movement of ps.live cannot be remedied without changing
its type signature and such a change would require ps.live
users to update as well.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94747

show more ...


# c16f7760 10-Feb-2021 Carl Ritson <[email protected]>

[AMDGPU] Move kill lowering to WQM pass and add live mask tracking

Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Us

[AMDGPU] Move kill lowering to WQM pass and add live mask tracking

Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Use live lane tracking to enable early termination of shader
at any point in control flow.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94746

show more ...


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1
# 0824694d 27-Jan-2021 Carl Ritson <[email protected]>

[AMDGPU] Fix WMM Entry SCC preservation

SCC was not correctly preserved when entering WWM.
Current lit test was unable to detect this as entry block is
handled differently.
Additionally fix an issue

[AMDGPU] Fix WMM Entry SCC preservation

SCC was not correctly preserved when entering WWM.
Current lit test was unable to detect this as entry block is
handled differently.
Additionally fix an issue where SCC was unnecessarily preserved
when exiting from WWM to Exact mode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D95500

show more ...


Revision tags: llvmorg-13-init, llvmorg-11.1.0-rc2
# 560d7e04 20-Jan-2021 dfukalov <[email protected]>

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036


1234