|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
432cbd78 |
| 18-Jul-2022 |
Ivan Kosarev <[email protected]> |
[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129381
|
| #
4874838a |
| 28-Jun-2022 |
Piotr Sobczak <[email protected]> |
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D1287
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D128756
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
20d20156 |
| 09-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 VINTERP intrinsics and ISel support
Depends on D127664
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D127756
|
| #
2d43de13 |
| 15-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 new dot instruction codegen support
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D127904
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
7b9f620e |
| 06-Apr-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Work around GFX11 flat scratch SVS swizzling bug
Differential Revision: https://reviews.llvm.org/D127635
|
| #
5df2893a |
| 28-Apr-2022 |
Nicolai Hähnle <[email protected]> |
AMDGPU: Add G_AMDGPU_MAD_64_32 instructions
These generic instructions are trivially selected to V_MAD_[IU]64_[IU]32 instructions when run on the VALU.
When at least both factors are scalar, it is
AMDGPU: Add G_AMDGPU_MAD_64_32 instructions
These generic instructions are trivially selected to V_MAD_[IU]64_[IU]32 instructions when run on the VALU.
When at least both factors are scalar, it is usually better to execute some or all of the instruction on the SALU. To this end, we lower the instruction to simpler instructions that are supported on the SALU when applying the register bank mapping.
Differential Revision: https://reviews.llvm.org/D124843
show more ...
|
| #
dee31902 |
| 17-May-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add llvm.amdgcn.global.load.lds intrinsic
Differential Revision: https://reviews.llvm.org/D125279
|
| #
791ec1c6 |
| 13-May-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds
Differential Revision: https://reviews.llvm.org/D124884
|
| #
6e3e14f6 |
| 21-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Support gfx940 smfmac instructions
Differential Revision: https://reviews.llvm.org/D122191
|
| #
f59cb41b |
| 15-Mar-2022 |
Abinav Puthan Purayil <[email protected]> |
[AMDGPU] Select buffer_atomic_cmpswap* in tblgen
This change replaces the manual selection of buffer_atomic_cmpswap* instructions in SelectionDAG and GlobalISel with a tblgen based selection in BUFI
[AMDGPU] Select buffer_atomic_cmpswap* in tblgen
This change replaces the manual selection of buffer_atomic_cmpswap* instructions in SelectionDAG and GlobalISel with a tblgen based selection in BUFInstructions.td. This allows us to select the return and no-return variants in tblgen.
Differential Revision: https://reviews.llvm.org/D121770
show more ...
|
| #
c4500de2 |
| 14-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
36fe3f13 |
| 08-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.
Differential Revision: https://reviews.llvm.org/D121254
|
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
7f26a102 |
| 12-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences
Arbitrary stack pointers are accessed using MUBUF instructions with the voffset field, which is interpreted as the swizzled address.
AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences
Arbitrary stack pointers are accessed using MUBUF instructions with the voffset field, which is interpreted as the swizzled address. We want to fold fold into the MUBUF form to use the SP in the SGPR offset, and previously we were special casing the interpretation of the pointer value if the access memory operand said it was relative to the stack pointer.
690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved the DAG path to special casing copies from SGPRs. This is not an entirely sound approach, since it's still changing the interpretation of pointer values based the context.
Introduce a new pseudo which corresponds to the wave-to-vector address transform. This way the memory instruction has consistent semantics where the incoming pointer is always interpreted as a vector address, and we're not obligated to optimize into the MUBUF offset-only addressing mode. The DAG should probably have an equivalent pseudo.
This should fix some correctness issues, and folding this into addressing modes will be a future optimization patch.
show more ...
|
| #
41bfac6a |
| 02-Jan-2022 |
Kazu Hirata <[email protected]> |
[Target] Remove unused forward declarations (NFC)
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
078da26b |
| 08-Nov-2021 |
Abinav Puthan Purayil <[email protected]> |
[AMDGPU] Check for unneeded shift mask in shift PatFrags.
The existing constrained shift PatFrags only dealt with masked shift from OpenCL front-ends. This change copies the X86DAGToDAGISel::isUnnee
[AMDGPU] Check for unneeded shift mask in shift PatFrags.
The existing constrained shift PatFrags only dealt with masked shift from OpenCL front-ends. This change copies the X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in the shift PatFrag predicates.
Differential Revision: https://reviews.llvm.org/D113448
show more ...
|
| #
aee86f9b |
| 07-Nov-2021 |
Kazu Hirata <[email protected]> |
[AMDGPU] Remove unused declaration selectSMRD (NFC)
The function body proper was removed on Feb 20, 2019 in commit 79b5c3842b684f873d1ffad502336e973616ea51.
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
48958d02 |
| 23-Aug-2021 |
Daniil Fukalov <[email protected]> |
[NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `G
[NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108596
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
f9f5d415 |
| 30-Apr-2021 |
Brendon Cahoon <[email protected]> |
[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bit
[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands.
The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register.
There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used.
Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently.
Differential Revision: https://reviews.llvm.org/D100149
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
cc7add52 |
| 30-Mar-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundl
[AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds.
Fixed version of D99587, which does not rely on the address space.
Differential Revision: https://reviews.llvm.org/D99743
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc3 |
|
| #
5d0e9ddf |
| 01-Mar-2021 |
Jay Foad <[email protected]> |
[AMDGPU][GlobalISel] Add support for global atomicrmw fadd
This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously i
[AMDGPU][GlobalISel] Add support for global atomicrmw fadd
This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously implemented for selecting from the llvm.amdgcn.global.atomic.fadd intrinsic.
Differential Revision: https://reviews.llvm.org/D97767
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc2 |
|
| #
3bffb1cd |
| 09-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amoun
[AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway.
Additional advantage that parser will accept these flags in any order unlike now.
Differential Revision: https://reviews.llvm.org/D96469
show more ...
|
| #
8a316045 |
| 25-Feb-2021 |
Amara Emerson <[email protected]> |
[AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and
[AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass.
Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize.
Differential Revision: https://reviews.llvm.org/D97732
show more ...
|
| #
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
7e9ceed9 |
| 27-Aug-2020 |
Jay Foad <[email protected]> |
[TableGen][GlobalISel] Allow duplicate RendererFns
Allow different GICustomOperandRenderers to use the same RendererFn. This avoids the need for targets to define a bunch of identical C++ renderer f
[TableGen][GlobalISel] Allow duplicate RendererFns
Allow different GICustomOperandRenderers to use the same RendererFn. This avoids the need for targets to define a bunch of identical C++ renderer functions with different names.
Without this fix TableGen would have emitted code that tried to define the GICR enumeration with duplicate enumerators.
Differential Revision: https://reviews.llvm.org/D96587
show more ...
|
| #
6bde0853 |
| 27-Jan-2021 |
Kazu Hirata <[email protected]> |
[AMDGPU] Forward-declare TargetRegisterClass (NFC)
AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a forward declaration of TargetRegisterClass in InstructionSelector.h. This pat
[AMDGPU] Forward-declare TargetRegisterClass (NFC)
AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a forward declaration of TargetRegisterClass in InstructionSelector.h. This patch adds a forward declaration right in AMDGPUInstructionSelector.h.
While we are at it, this patch removes the one in InstructionSelector.h, where it is unnecessary.
show more ...
|