|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
2e29b013 |
| 21-Jun-2022 |
Alexander Timofeev <[email protected]> |
[AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable.
Since the divergence-driven instruction selection has been enabled for AMDGPU, all the uniform instructions are expected
[AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable.
Since the divergence-driven instruction selection has been enabled for AMDGPU, all the uniform instructions are expected to be selected to SALU form, except those not having one. VGPR to SGPR copies appear in MIR to connect values producers and consumers. This change implements an algorithm that evolves a reasonable tradeoff between the profit achieved from keeping the uniform instructions in SALU form and overhead introduced by the data transfer between the VGPRs and SGPRs.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D128252
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
387927bb |
| 27-Nov-2021 |
Kazu Hirata <[email protected]> |
[Target] Use range-based for loops (NFC)
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
654c89d8 |
| 06-Sep-2021 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to
[AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc.
Also, added the missing AV register classes from 192b to 1024b.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D109300
show more ...
|
| #
1a332946 |
| 16-Sep-2021 |
alex-t <[email protected]> |
[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC
Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* b
[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC
Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa.
VALU operations only process the active lanes of the VGPR and ignore inactive ones. Active lanes correspond to 1 bit in the EXEC mask register. SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively. SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs
To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit. To return back to the SALU context we need to do the opposite.
To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D109900
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
ed0f4415 |
| 15-Jul-2021 |
alex-t <[email protected]> |
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode dive
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3 |
|
| #
4672bac1 |
| 03-Mar-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference t
[AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96258
show more ...
|
| #
c3ce7bae |
| 02-Mar-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency as the "strict" prefix
[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strictwqm in the future.
The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions.
The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96257
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
560d7e04 |
| 20-Jan-2021 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|
|
Revision tags: llvmorg-11.1.0-rc1 |
|
| #
314e29ed |
| 07-Jan-2021 |
Joe Nash <[email protected]> |
[AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only avail
[AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D94341
Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
show more ...
|
| #
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
| #
0e219b64 |
| 03-Jan-2021 |
Kazu Hirata <[email protected]> |
[Target] Construct SmallVector with iterator ranges (NFC)
|
| #
29ed846d |
| 21-Dec-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix assert when checking for implicit operand legality
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
31a0b283 |
| 29-Oct-2020 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into three blocks. So the basic block that we iterate over must be updated.
This failed ass
[AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into three blocks. So the basic block that we iterate over must be updated.
This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for divergent calls in branches before.
Differential Revision: https://reviews.llvm.org/D90596
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
8d7fd73c |
| 17-Sep-2020 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Fix merging m0 inits
Fix incorrect merges of m0 inits in loops.
It was assumed that if a clobbering instruction appears in the same block as an init and the clobbering instruction does not
[AMDGPU] Fix merging m0 inits
Fix incorrect merges of m0 inits in loops.
It was assumed that if a clobbering instruction appears in the same block as an init and the clobbering instruction does not dominate the init then it does not interfere with init.
This does not work in the presence of loops, where in this scenario, the clobbering instruction does interfere with the init in another iteration.
To fix this, do not check for block equality and defer the decision to the predecessor check.
Differential Revision: https://reviews.llvm.org/D87882
show more ...
|
| #
45eeb8c2 |
| 26-Aug-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Remove unused variable introduced in r251860
|
| #
98de0d22 |
| 21-Aug-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy
|
| #
34978602 |
| 20-Aug-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
| #
b726d071 |
| 22-May-2020 |
alex-t <[email protected]> |
[AMDGPU] Reject moving PHI to VALU if the only VGPR input originated from move immediate
Summary: PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence.
[AMDGPU] Reject moving PHI to VALU if the only VGPR input originated from move immediate
Summary: PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence. In some cases uniform PHI need to be converted to return VGPR to prevent the oddnumber of moves values from VGPR to SGPR and back. PHI should certainly return VGPR if it has at least one VGPR input. This change adds the exception. We don't want to convert uniform PHI to VGPRs in case the only VGPR input is a VGPR to SGPR COPY and definition od the source VGPR in this COPY is move immediate.
bb.0:
%0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec %2:sreg_32 = .....
bb.1: %3:sreg_32 = PHI %1, %bb.3, %2, %bb.1 S_BRANCH %bb.3
bb.3: %1:sreg_32 = COPY %0 S_BRANCH %bb.2
Reviewers: rampitec
Reviewed By: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80434
show more ...
|
|
Revision tags: llvmorg-10.0.1-rc1 |
|
| #
04627950 |
| 02-Apr-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Propagate AGPR RC from PHI to its PHI operands
We can fix register class of PHI based on its all AGPR uses. That leaves behind all PHIs which were already processed earlier. Propagate RC ba
[AMDGPU] Propagate AGPR RC from PHI to its PHI operands
We can fix register class of PHI based on its all AGPR uses. That leaves behind all PHIs which were already processed earlier. Propagate RC back to PHI operands of a PHI.
Differential Revision: https://reviews.llvm.org/D77344
show more ...
|
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
| #
d9e8b2cb |
| 23-Dec-2019 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Skip DAG hack passes on selected functions
The way fallback to SelectionDAG works is somewhat surprising to me. When the fallback path is enabled, the entire set of SelectionDAG s
AMDGPU/GlobalISel: Skip DAG hack passes on selected functions
The way fallback to SelectionDAG works is somewhat surprising to me. When the fallback path is enabled, the entire set of SelectionDAG selector passes is added to the pass pipeline, and each one needs to check if the function was selected. This results in the surprising behavior of running SIFixSGPRCopies for example, but only if -global-isel-abort=2 is used.
SIAddIMGInitPass is also added in addInstSelector, but I'm not sure why we have this pass or if it should be added somewhere else for GlobalISel.
show more ...
|
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
05da2fe5 |
| 13-Nov-2019 |
Reid Kleckner <[email protected]> |
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of reco
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
show more ...
|
| #
0fab220e |
| 18-Oct-2019 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] move PHI nodes to AGPR class
If all uses of a PHI are in AGPR register class we should avoid unneeded copies via VGPRs.
Differential Revision: https://reviews.llvm.org/D69200
llvm-svn: 37
[AMDGPU] move PHI nodes to AGPR class
If all uses of a PHI are in AGPR register class we should avoid unneeded copies via VGPRs.
Differential Revision: https://reviews.llvm.org/D69200
llvm-svn: 375297
show more ...
|
| #
2d6a2303 |
| 16-Oct-2019 |
David Stuttard <[email protected]> |
[AMDGPU] Fix-up cases where writelane has 2 SGPR operands
Summary: Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand const
[AMDGPU] Fix-up cases where writelane has 2 SGPR operands
Summary: Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint
Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky.
This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions.
The algorithm used is to check for trivial copy prop of suitable constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access).
Reviewers: rampitec, tpr, arsenm, nhaehnle
Reviewed By: rampitec, arsenm, nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D51932
Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea llvm-svn: 375004
show more ...
|
| #
527e9f9a |
| 15-Oct-2019 |
Austin Kerbow <[email protected]> |
AMDGPU: Fix infinite searches in SIFixSGPRCopies
Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies.
The first condition involves a REG_SEQUENCE that
AMDGPU: Fix infinite searches in SIFixSGPRCopies
Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies.
The first condition involves a REG_SEQUENCE that uses registers defined by both a PHI and a COPY.
The second condition arises when a physical register is copied to a virtual register which is then used in a PHI node. If the same virtual register is copied to the same physical register, the result is an endless loop.
%0:sgpr_64 = COPY $sgpr0_sgpr1 %2 = PHI %0, %bb.0, %1, %bb.1 $sgpr0_sgpr1 = COPY %0
Reviewers: alex-t, rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68970
llvm-svn: 374944
show more ...
|
| #
c4d256a5 |
| 14-Oct-2019 |
Alexander Timofeev <[email protected]> |
[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'
Detailed description:
After https://reviews.llvm.org/D59990 submit several issues
[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'
Detailed description:
After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly.
Discovered issues were addressed in the following commits:
https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731
This change brings back AMDGPU specific changes.
Reviewed by: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D68635
llvm-svn: 374767
show more ...
|