|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
732eed40 |
| 09-Mar-2022 |
Ruiling Song <[email protected]> |
[AMDGPU] Mark GFX11 dual source blend export as strict-wqm
The instructions that generate the source of a dual source blend export should run in strict-wqm. That is, if any lane in a quad is active, all four lanes of that quad must be enabled so that the shuffling operation performed before exporting to the dual source blend target works correctly.
Differential Revision: https://reviews.llvm.org/D127981
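The quad rule described in this commit can be illustrated with a small mask computation. This is only a sketch and not code from the pass: it assumes a 64-lane wave with one bit per lane (lane 0 in bit 0), and `wholeQuadMask` is a hypothetical helper name.

```cpp
#include <cstdint>

// Expand an execution mask so that every quad (four adjacent lanes) with at
// least one active lane becomes fully active.
// Assumption: 64 lanes, one bit per lane, lane 0 in bit 0.
std::uint64_t wholeQuadMask(std::uint64_t Exec) {
  std::uint64_t Q = Exec;
  Q |= Q >> 1;                  // bit i now ORs lanes i..i+1
  Q |= Q >> 2;                  // bit i now ORs lanes i..i+3
  Q &= 0x1111111111111111ULL;   // keep one "any lane active" bit per quad
  return Q * 0xF;               // replicate that bit across the whole quad
}
```

For example, a mask with only lane 5 active (0x20) expands to lanes 4..7 (0xF0).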
|
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
29621c13 |
| 11-Mar-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Tag GFX11 LDS loads as using strict_wqm
LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad (if any pixel is enabled in the quad, data is written to all 4 pixels/threads in the quad).
Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm to enforce this and avoid lane clobbering issues. Note that only the instruction itself is tagged. The implicit uses of these instructions do not need to be set to WQM. This reduces unnecessary WQM calculation of M0.
Differential Revision: https://reviews.llvm.org/D127977
|
| #
4271a1ff |
| 18-Jun-2022 |
Kazu Hirata <[email protected]> |
[llvm] Call *set::insert without checking membership first (NFC)
|
| #
37b37838 |
| 16-Mar-2022 |
Shengchen Kan <[email protected]> |
[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
|
| #
98dd3905 |
| 11-Jan-2022 |
Ruiling Song <[email protected]> |
AMDGPU: Use removeAllRegUnitsForPhysReg()
I hit this issue while working on something else. EXEC is already reserved, but the register coalescer appears to cause a sub-register of EXEC to show up in LiveIntervals. I have not investigated why the register coalescer behaves this way, but removeAllRegUnitsForPhysReg() is the right way to handle it.
Reviewed By: critson, foad, arsenm
Differential Revision: https://reviews.llvm.org/D117014
|
| #
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
| #
aa418b91 |
| 25-Jan-2022 |
Konstantina <[email protected]> |
[AMDGPU][SIWholeQuadMode] Use the right VCC register to activate the correct lanes.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D118096
|
| #
f78c1b07 |
| 17-Dec-2021 |
Kazu Hirata <[email protected]> |
[Target] Use range-based for loops (NFC)
|
| #
976f3b3c |
| 24-Nov-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Only allow implicit WQM in pixel shaders
Implicit derivatives are only valid in pixel shaders, hence only implicitly enable WQM for pixel shaders. This avoids unintended WQM in other shader types (e.g. compute) when image sampling instructions are used.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114414
|
| #
d45cb1d7 |
| 23-Nov-2021 |
Kazu Hirata <[email protected]> |
[llvm] Use range-based for loops (NFC)
|
| #
2c4ba3e9 |
| 05-Nov-2021 |
Kazu Hirata <[email protected]> |
[Target] Use make_early_inc_range (NFC)
|
| #
be1a8f88 |
| 27-Oct-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Really preserve LiveVariables in SILowerControlFlow
https://bugs.llvm.org/show_bug.cgi?id=52204
Differential Revision: https://reviews.llvm.org/D112731
|
| #
36deb9a6 |
| 15-Oct-2021 |
Jay Foad <[email protected]> |
Add new MachineFunction property FailsVerification
TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets you skip machine verification after a particular pass. Unfortunately this is used in generic code in TargetPassConfig itself to skip verification after a generic pass, only because some previous target-specific pass damaged the MIR on that specific target. This is bad because problems in one target cause a lack of verification for all targets.
This patch replaces that mechanism with a new MachineFunction property called "FailsVerification" which can be set by (usually target-specific) passes that are known to introduce problems. Later passes can reset it again if they are known to clean up the previous problems.
Differential Revision: https://reviews.llvm.org/D111397
|
| #
67cfefeb |
| 06-May-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Fix WQM failure with single block inactive demote
The instruction test for inactive kill/demote needs to be based on the actual opcode, not on whether the instruction would be lowered to a demote.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D101966
|
| #
79cb3ba0 |
| 21-Apr-2021 |
Jay Foad <[email protected]> |
[AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands
STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so there is no need to add another implicit use of $exec when lowering them to V_MOV_B32 instructions.
Differential Revision: https://reviews.llvm.org/D100969
|
| #
1a4bc3ab |
| 18-Mar-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Avoid unnecessary graph visits during WQM marking
Avoid revisiting nodes with the same set of defined lanes by using a unified visited set which integrates lanes into the key. This retains the intent of the original code by still revisiting a subgraph if a different set of lanes is defined and hence marking might progress differently.
Note: the default size of the visited set has been confirmed to cover >99% of invocations in a large array of test shaders.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D98772
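The idea of folding the defined lanes into the visited key might look roughly like the sketch below. It is not the data structure used in the pass: `NodeId`, `LaneMask`, and `VisitedSet` are hypothetical names, and a plain `std::set` stands in for whatever small-size-optimized container the real code uses.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical stand-ins: a node id in the live range graph and a bitmask of
// the lanes already defined on the path that reached it.
using NodeId = unsigned;
using LaneMask = std::uint32_t;

class VisitedSet {
  std::set<std::pair<NodeId, LaneMask>> Seen;

public:
  // Returns true the first time a (node, defined lanes) pair is seen, so the
  // caller continues marking from that node. Returns false on a repeat.
  bool shouldVisit(NodeId Node, LaneMask Defined) {
    return Seen.insert({Node, Defined}).second;
  }
};
```

Reaching the same node twice with the same defined lanes is skipped, while reaching it with a different set of lanes is still explored, which matches the intent stated in the commit message.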
|
| #
13877db2 |
| 15-Mar-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Fix shortfalls in WQM marking
When tracking defined lanes through phi nodes in the live range graph each branch of the phi must be handled independently. Also rewrite the marking algorithm to reduce unnecessary operations.
Previously a shared set of defined lanes was used which caused marking to stop prematurely. This was observable in existing lit tests, but test patterns did not cover this detail.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D98614
|
|
Revision tags: llvmorg-12.0.0-rc3 |
|
| #
4672bac1 |
| 03-Mar-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantics are similar to amdgcn_strict_wwm, with the notable difference that not all threads are forcibly enabled during the computation of the intrinsic's argument, only the threads in quads that have at least one active thread.
* The difference between amdgcn_wqm and amdgcn_strict_wqm is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96258
|
| #
c3ce7bae |
| 02-Mar-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm
* Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency, as the "strict" prefix will become an important distinguishing factor between amdgcn_wqm and amdgcn_strict_wqm in the future.
The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions.
The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96257
|
|
Revision tags: llvmorg-12.0.0-rc2 |
|
| #
8181dcd3 |
| 19-Feb-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] WQM/WWM: Fix marking of partial definitions
Track lanes when processing definitions for marking WQM/WWM. If all lanes have been defined then marking can stop. This prevents marking unnecessary instructions as WQM/WWM.
In particular, this fixes a bug where values passing through V_SET_INACTIVE would be marked as requiring WWM.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D95503
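The stopping rule for partial definitions can be sketched as follows. This is a simplified model under stated assumptions, not the pass's algorithm: `Def` and `defsToMark` are hypothetical names, definitions are given as a flat list ordered from nearest to farthest from the use, and lanes are a plain bitmask.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record: an instruction id and the lanes it defines.
struct Def {
  unsigned Id;
  std::uint32_t Lanes;
};

// Walk definitions from nearest to farthest and collect the ones to mark.
// Marking stops as soon as every used lane has been defined.
std::vector<unsigned> defsToMark(const std::vector<Def> &Defs,
                                 std::uint32_t UsedLanes) {
  std::vector<unsigned> Marked;
  std::uint32_t Defined = 0;
  for (const Def &D : Defs) {
    if ((Defined & UsedLanes) == UsedLanes)
      break; // all used lanes are covered, so earlier defs are irrelevant
    Marked.push_back(D.Id);
    Defined |= D.Lanes;
  }
  return Marked;
}
```

With definitions covering lanes 0x1, 0x2, and 0x3 in that order and a use of lanes 0x3, only the first two are marked; the third is never reached.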
|
| #
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
| #
aef781b4 |
| 14-Feb-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic
Add an intrinsic which demotes all active lanes to helper lanes. This is used to implement the demote-to-helper Vulkan extension.
In practice demoting a lane to helper simply means removing it from the mask of live lanes used for WQM/WWM/Exact mode. Where the shader does not use WQM, demotes just become kills.
Additionally add llvm.amdgcn.live.mask intrinsic to complement demote operations. In theory llvm.amdgcn.ps.live can be used to detect helper lanes; however, ps.live can be moved by LICM. The movement of ps.live cannot be remedied without changing its type signature and such a change would require ps.live users to update as well.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D94747
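A minimal model of the demote behaviour described in this commit is sketched below. It assumes a 64-bit lane mask and uses hypothetical names (`LaneState`, `demote`, `isHelper`); the real lowering operates on machine instructions and EXEC, not on a struct like this.

```cpp
#include <cstdint>

// Hypothetical model of per-wave lane state.
struct LaneState {
  std::uint64_t Live; // lanes whose results are still observable
  std::uint64_t Exec; // lanes currently executing
};

// Demote the given lanes: remove them from the live mask. If the shader does
// not use WQM there is nothing for helper lanes to do, so the demote behaves
// like a kill and the lanes stop executing as well.
void demote(LaneState &S, std::uint64_t Lanes, bool ShaderUsesWQM) {
  S.Live &= ~Lanes;
  if (!ShaderUsesWQM)
    S.Exec &= ~Lanes;
}

// A helper lane still executes but is no longer live.
bool isHelper(const LaneState &S, unsigned Lane) {
  std::uint64_t Bit = std::uint64_t{1} << Lane;
  return (S.Exec & Bit) != 0 && (S.Live & Bit) == 0;
}
```

In this model a demoted lane in a WQM shader keeps executing as a helper, whereas the same demote in a shader without WQM removes the lane from execution entirely.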
|
| #
c16f7760 |
| 10-Feb-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D94746
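Live mask tracking with early termination can be pictured with the sketch below. It is an illustration only: `killLanes` is a hypothetical function, the live mask is modelled as a 64-bit integer passed by reference, and the returned flag stands in for the branch to an early-exit block that the real lowering would emit.

```cpp
#include <cstdint>

// Remove killed lanes from the stored live mask.
// Returns true when no live lanes remain, meaning the shader could terminate
// at this point regardless of where it is in control flow.
bool killLanes(std::uint64_t &LiveMask, std::uint64_t Killed) {
  LiveMask &= ~Killed;
  return LiveMask == 0;
}
```

Because the live mask is kept separately from the current execution mask, the check works even inside divergent control flow where only some lanes are executing.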
show more ...
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1 |
|
| #
0824694d |
| 27-Jan-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Fix WMM Entry SCC preservation
SCC was not correctly preserved when entering WWM. The current lit test was unable to detect this because the entry block is handled differently. Additionally, fix an issue where SCC was unnecessarily preserved when exiting from WWM to Exact mode.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D95500
|
|
Revision tags: llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
560d7e04 |
| 20-Jan-2021 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|