|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
b28bb8cc | 15-Jul-2022 | Joe Nash <[email protected]>
[AMDGPU] Remove old operand from VOPC DPP
For most DPP instructions, the old operand stores the value that was in the current lane before the DPP operation, and is tied to the destination. For VOPC DPP, this is unnecessary and incorrect.
There appears to have been a latent bug related to D122737 in SIInstrInfo::isOperandLegal: when checking whether a register operand was legal in a position where the InstructionDesc expected an immediate, it reported that the operand was valid. The fix is necessary for, and tested in, this patch.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D130040
|
8d0383eb | 24-Jun-2022 | Matt Arsenault <[email protected]>
CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for whether a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query up front. This should have the same net benefit, but has the possible disadvantage of making this AA query non-lazy.
Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.
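A minimal sketch of the check described above, assuming a simplified model: the flags are plain booleans in a hypothetical `MemAccessFlags` struct (not LLVM's `MachineMemOperand`), and `flagsForLoad` is a hypothetical helper standing in for setting the flags from the AA query up front. The rematerialization test itself then needs no alias-analysis query.

```cpp
#include <cassert>

// Hypothetical stand-in for the flags carried by a memory operand;
// not LLVM's MachineMemOperand.
struct MemAccessFlags {
  bool IsLoad = false;
  bool IsInvariant = false;       // value does not change during the function
  bool IsDereferenceable = false; // re-executing the access cannot trap
};

// A load may be recomputed at another point only if re-executing it is safe
// (dereferenceable) and yields the same value (invariant). No alias analysis
// is consulted here; that information is assumed to be folded into the flags.
bool isRematerializableLoad(const MemAccessFlags &F) {
  return F.IsLoad && F.IsInvariant && F.IsDereferenceable;
}

// Hypothetical helper: set the flags once, up front. Following the behavior
// the message says is preserved for now, pointing to constant memory implies
// both invariant and dereferenceable.
MemAccessFlags flagsForLoad(bool PointsToConstantMemory) {
  MemAccessFlags F;
  F.IsLoad = true;
  if (PointsToConstantMemory) {
    F.IsInvariant = true;
    F.IsDereferenceable = true;
  }
  return F;
}
```

Splitting "compute flags" from "test flags" is what makes the query non-lazy: `flagsForLoad` runs for every load, whether or not rematerialization is ever considered.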
|
432cbd78 | 18-Jul-2022 | Ivan Kosarev <[email protected]>
[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129381
|
e45aa230 | 13-Jul-2022 | Jay Foad <[email protected]>
[AMDGPU] Update LiveVariables after killing an immediate def
D114999 added code to kill an immediate def if it was folded into its only use by convertToThreeAddress. This patch updates LiveVariables when that happens in order to fix verification failures exposed by D129213.
Differential Revision: https://reviews.llvm.org/D129661
|
d1af09ad | 23-Jun-2022 | Joe Nash <[email protected]>
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
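The pairing constraint from this commit can be sketched as a simple predicate. This assumes, purely for illustration, that a VGPR's bank is its register number modulo 4 (`NumBanks` and `bankOf` are hypothetical); the actual gfx11 bank rules may differ per operand slot, so treat this as a shape of the check, not the pass's logic.

```cpp
#include <cassert>

// Assumption for illustration only: bank = VGPR number mod NumBanks.
constexpr unsigned NumBanks = 4;

unsigned bankOf(unsigned VGPR) { return VGPR % NumBanks; }

// Two component instructions (X and Y) can be combined into one VOPD only if
// each matching operand pair (src0x/src0y, src1x/src1y) uses different banks.
bool canFormVOPD(unsigned Src0X, unsigned Src1X, unsigned Src0Y,
                 unsigned Src1Y) {
  return bankOf(Src0X) != bankOf(Src0Y) && bankOf(Src1X) != bankOf(Src1Y);
}
```

For example, `canFormVOPD(0, 1, 2, 3)` holds under this model, while pairing v0 with v4 as the two src0 operands fails because both map to bank 0.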
|
4874838a | 28-Jun-2022 | Piotr Sobczak <[email protected]>
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D128756
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
d342d130 | 06-Jun-2022 | Matt Arsenault <[email protected]>
AMDGPU: Use isMeta flags on pseudoinstructions
|
21895c6b | 28-Jun-2022 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Relax verification of soffset in scalar stores
Only GFX8 requires soffset to be m0. Later chips can use any SGPR.
Differential Revision: https://reviews.llvm.org/D128765
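The relaxed rule amounts to a generation-dependent predicate. The sketch below uses hypothetical `Gen` and `RegKind` enums rather than LLVM's subtarget and register classes, and it assumes m0 stays acceptable on later chips, which the message does not state explicitly.

```cpp
#include <cassert>

// Hypothetical models of the subtarget generation and the operand's register.
enum class Gen { GFX8, GFX9, GFX10 };
enum class RegKind { M0, SGPR, VGPR };

bool isLegalScalarStoreSOffset(Gen G, RegKind R) {
  if (G == Gen::GFX8)
    return R == RegKind::M0; // GFX8: soffset must be m0
  // Later chips: any SGPR. Still accepting m0 here is an assumption.
  return R == RegKind::SGPR || R == RegKind::M0;
}
```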
|
f1cfaa95 | 20-Jun-2022 | Joe Nash <[email protected]>
[AMDGPU] Use GFX11 S_PACK_HL instruction in more cases
Differential Revision: https://reviews.llvm.org/D128527
|
|
Revision tags: llvmorg-14.0.4 |
|
bd9eed3a | 22-May-2022 | Austin Kerbow <[email protected]>
[AMDGPU] Add isMFMA helper function. NFC
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D127124
|
cb9ae937 | 10-Jun-2022 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+, the null register can be used as both a 32-bit and a 64-bit operand. Define a 64-bit version of the register to use during codegen.
Differential Revision: https://reviews.llvm.org/D127527
|
0f818306 | 10-Jun-2022 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Make temp vgpr selection stable in indirectCopyToAGPR
This uses the rotating remainder of division by 3 to select a different temp VGPR for each successive copy in a sequence of several AGPR copies. Therefore, temp VGPR selection depends on the generated AGPR number, which could change with any unrelated change to the register definitions.
Stabilize the selection by using the real AGPR number instead.
Differential Revision: https://reviews.llvm.org/D127524
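The instability can be illustrated with a small sketch. The temp VGPR numbers, `agprEnumValue`, and `pickTempVGPR` are all hypothetical: the point is only that a `% 3` round-robin keyed on a generated enum value shifts whenever the enum base moves, while keying it on the AGPR's own index (a0 -> 0, a1 -> 1, ...) does not.

```cpp
#include <array>
#include <cassert>

// Hypothetical pool of three scratch VGPR numbers (illustrative values).
constexpr std::array<unsigned, 3> TempVGPRs = {32, 33, 34};

// Hypothetical: in a generated enumeration, AGPRs start at a base that depends
// on every register defined before them.
unsigned agprEnumValue(unsigned EnumBase, unsigned AGPRIndex) {
  return EnumBase + AGPRIndex;
}

// Round-robin choice of temporary: remainder of division by 3 of the key.
unsigned pickTempVGPR(unsigned Key) {
  return TempVGPRs[Key % TempVGPRs.size()];
}
```

With an enum base of 100, a0 picks VGPR 33; adding one unrelated register (base 101) makes a0 pick VGPR 34. Calling `pickTempVGPR(0)` for a0 always yields 32, regardless of the enumeration layout.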
|
|
Revision tags: llvmorg-14.0.3 |
|
0e1c71e4 | 27-Apr-2022 | Matt Arsenault <[email protected]>
CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine
Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function.
Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.
|
5df6669d | 18-May-2022 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Enforce alignment of image vaddr on gfx90a
Even though single-address image instructions use only a single VGPR, the hardware accesses 4 or 5, which creates an alignment requirement.
Fixes: SWDEV-316648
Differential Revision: https://reviews.llvm.org/D126009
|
78ec59e6 | 20-May-2022 | Jay Foad <[email protected]>
[AMDGPU] Handle mandatory literals in isOperandLegal
Extend SIInstrInfo::isOperandLegal to enforce a limit on the number of literal operands for all VALU instructions, not just VOP3. In particular, it now handles VOP2 instructions with a mandatory literal operand like V_FMAAK_F32.
Differential Revision: https://reviews.llvm.org/D126064
|
5b18ef72 | 20-May-2022 | Jay Foad <[email protected]>
[AMDGPU] Add verification for mandatory literals
Extend the literal operand checking in SIInstrInfo::verifyInstruction to check VOP2 instructions like V_FMAAK_F32 which have a mandatory literal operand. The rule is that src0 can also be a literal, but only if it is the same literal value.
AMDGPUAsmParser::validateConstantBusLimitations already handles this correctly.
Differential Revision: https://reviews.llvm.org/D126063
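The rule "src0 can also be a literal, but only if it is the same literal value" can be sketched as "all literal operands share one value". The `Operand` struct and `literalsAreLegal` below are hypothetical simplifications, not LLVM's `MachineOperand` or the verifier's code; they treat the mandatory K of an FMAAK-style instruction as just another literal operand.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical operand model: a 32-bit literal, or anything else
// (register, inline constant).
struct Operand {
  bool IsLiteral;
  uint32_t Value; // meaningful only when IsLiteral is true
};

// Legal if every literal operand, including a mandatory one such as the K of
// V_FMAAK_F32, carries the same value, so one literal dword can encode them all.
bool literalsAreLegal(const std::vector<Operand> &Ops) {
  std::optional<uint32_t> Seen;
  for (const Operand &Op : Ops) {
    if (!Op.IsLiteral)
      continue;
    if (Seen && *Seen != Op.Value)
      return false; // a second, different literal is not encodable
    Seen = Op.Value;
  }
  return true;
}
```

For example, `{literal 0x3f800000, v1, literal 0x3f800000}` is accepted, while `{literal 1, v1, literal 2}` is rejected.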
|
d14f2a63 | 19-May-2022 | Jay Foad <[email protected]>
[AMDGPU] Allow multiple uses of the same literal in SOP2/SOPC
AMDGPUAsmParser::validateSOPLiteral already knew about this but SIInstrInfo::verifyInstruction did not.
Differential Revision: https://reviews.llvm.org/D125976
|
dee31902 | 17-May-2022 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Add llvm.amdgcn.global.load.lds intrinsic
Differential Revision: https://reviews.llvm.org/D125279
|
791ec1c6 | 13-May-2022 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds
Differential Revision: https://reviews.llvm.org/D124884
|
|
Revision tags: llvmorg-14.0.2 |
|
c7025940 | 19-Apr-2022 | Joe Nash <[email protected]>
[AMDGPU] gfx11 BUF Instructions
Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits.
Contributors: Petar Avramovic <[email protected]> Dmitry Preobrazhensky <[email protected]>
Depends on D125316
Patch 6/N for upstreaming of AMDGPU gfx11 architecture.
Reviewed By: dp, Petar.Avramovic
Differential Revision: https://reviews.llvm.org/D125319
|
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
dfb006c0 | 21-Feb-2022 | Jay Foad <[email protected]>
[AMDGPU] Extract SIInstrInfo::removeModOperands. NFC.
Make this an externally callable function for use in a future patch.
Differential Revision: https://reviews.llvm.org/D125565
|
2db70021 | 25-Mar-2022 | Austin Kerbow <[email protected]>
[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic
Adds an intrinsic/builtin that can be used to fine-tune scheduler behavior. If highly optimized codegen is needed and kernel developers know about inter-wave runtime behavior that is unknown to the compiler, this builtin can be used to tune scheduling.
This intrinsic creates a barrier between scheduling regions. The immediate parameter is a mask to determine the types of instructions that should be prevented from crossing the sched_barrier. In this initial patch, there are only two variations. A mask of 0 means that no instructions may be scheduled across the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing instructions may cross the sched_barrier.
Note that this intrinsic is only meant to work with the scheduling passes. Any other transformations that may move code will not be impacted in the ways described above.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124700
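The two mask values described for this initial patch can be expressed as a small predicate. `InstrInfo` and `mayCrossSchedBarrier` are hypothetical names for illustration, not the scheduler's real interface, and treating any other mask value conservatively is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of an instruction, as far as the barrier cares.
struct InstrInfo {
  bool MayLoadOrStore;
  bool HasSideEffects;
};

// Mask semantics from the initial patch:
//   0 -> no instruction may be scheduled across the sched_barrier
//   1 -> non-memory, side-effect-free instructions may cross it
bool mayCrossSchedBarrier(uint32_t Mask, const InstrInfo &MI) {
  if (Mask == 1)
    return !MI.MayLoadOrStore && !MI.HasSideEffects;
  return false; // mask 0, and (assumption) any other value: nothing crosses
}
```

As the message notes, this only constrains the scheduling passes; other transformations that move code are unaffected.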
|
18ed279a | 14-Apr-2022 | Joe Nash <[email protected]>
[AMDGPU] gfx11 subtarget features & early tests
Tablegen definitions for subtarget features and cpp predicate functions to access the features. New Sub-TargetProcessors and common latencies. Simple changes to MIR codegen tests which pass on gfx11 because they have the same output as previous subtargets or operate on pseudo instructions which are reused from previous subtargets.
Contributors: Jay Foad <[email protected]> Petar Avramovic <[email protected]>
Patch 4/N for upstreaming of AMDGPU gfx11 architecture
Depends on D124538
Reviewed By: Petar.Avramovic, foad
Differential Revision: https://reviews.llvm.org/D125261
|
88f04bdb | 10-May-2022 | Ivan Kosarev <[email protected]>
[AMDGPU][GFX10] Support base+soffset+offset SMEM loads.
Also makes a step towards resolving https://github.com/llvm/llvm-project/issues/38652
Reviewed By: foad, dp
Differential Revision: https://reviews.llvm.org/D125117
|
879ac410 | 30-Mar-2022 | Jay Foad <[email protected]>
[AMDGPU] Fix crash in SIOptimizeExecMaskingPreRA
When folding a COPY of exec into another COPY, the call to TII->isOperandLegal would crash because COPYs don't have defined register classes for their operands.
Differential Revision: https://reviews.llvm.org/D122737
|