History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h (Results 1 – 25 of 142)
Revision Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 432cbd78 18-Jul-2022 Ivan Kosarev

[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129381


# 4874838a 28-Jun-2022 Piotr Sobczak

[AMDGPU] gfx11 WMMA instruction support

gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# 20d20156 09-Jun-2022 Joe Nash

[AMDGPU] gfx11 VINTERP intrinsics and ISel support

Depends on D127664

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127756


# 2d43de13 15-Jun-2022 Joe Nash

[AMDGPU] gfx11 new dot instruction codegen support

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 7b9f620e 06-Apr-2022 Jay Foad

[AMDGPU] Work around GFX11 flat scratch SVS swizzling bug

Differential Revision: https://reviews.llvm.org/D127635


# 5df2893a 28-Apr-2022 Nicolai Hähnle

AMDGPU: Add G_AMDGPU_MAD_64_32 instructions

These generic instructions are trivially selected to
V_MAD_[IU]64_[IU]32 instructions when run on the VALU.

When at least both factors are scalar, it is usually better to execute
some or all of the instruction on the SALU. To this end, we lower the
instruction to simpler instructions that are supported on the SALU
when applying the register bank mapping.

Differential Revision: https://reviews.llvm.org/D124843


# dee31902 17-May-2022 Stanislav Mekhanoshin

[AMDGPU] Add llvm.amdgcn.global.load.lds intrinsic

Differential Revision: https://reviews.llvm.org/D125279


# 791ec1c6 13-May-2022 Stanislav Mekhanoshin

[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds

Differential Revision: https://reviews.llvm.org/D124884


# 6e3e14f6 21-Mar-2022 Stanislav Mekhanoshin

[AMDGPU] Support gfx940 smfmac instructions

Differential Revision: https://reviews.llvm.org/D122191


# f59cb41b 15-Mar-2022 Abinav Puthan Purayil

[AMDGPU] Select buffer_atomic_cmpswap* in tblgen

This change replaces the manual selection of buffer_atomic_cmpswap*
instructions in SelectionDAG and GlobalISel with a tblgen based
selection in BUFInstructions.td. This allows us to select the return and
no-return variants in tblgen.

Differential Revision: https://reviews.llvm.org/D121770


# c4500de2 14-Mar-2022 Stanislav Mekhanoshin

[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions

Differential Revision: https://reviews.llvm.org/D121634


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 36fe3f13 08-Mar-2022 Stanislav Mekhanoshin

[AMDGPU] flat scratch SVS addressing mode for gfx940

Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254


Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 7f26a102 12-Jan-2022 Matt Arsenault

AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences

Arbitrary stack pointers are accessed using MUBUF instructions with
the voffset field, which is interpreted as the swizzled address. We
want to fold into the MUBUF form to use the SP in the SGPR
offset, and previously we were special casing the interpretation of
the pointer value if the access memory operand said it was relative to
the stack pointer.

690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved
the DAG path to special casing copies from SGPRs. This is not an
entirely sound approach, since it's still changing the interpretation
of pointer values based on the context.

Introduce a new pseudo which corresponds to the wave-to-vector address
transform. This way the memory instruction has consistent semantics
where the incoming pointer is always interpreted as a vector address,
and we're not obligated to optimize into the MUBUF offset-only
addressing mode. The DAG should probably have an equivalent pseudo.

This should fix some correctness issues, and folding this into
addressing modes will be a future optimization patch.


# 41bfac6a 02-Jan-2022 Kazu Hirata

[Target] Remove unused forward declarations (NFC)


Revision tags: llvmorg-13.0.1-rc1
# 078da26b 08-Nov-2021 Abinav Puthan Purayil

[AMDGPU] Check for unneeded shift mask in shift PatFrags.

The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.

Differential Revision: https://reviews.llvm.org/D113448
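
As an illustration of the idea behind the predicate described above, here is a minimal sketch, not the code from the patch. It assumes the shift only reads the low `shiftAmountBits` bits of its amount operand, so an AND whose constant mask keeps all of those bits can be dropped. The helper name `isShiftMaskUnneeded` and its parameters are hypothetical, and the real `isUnneededShiftMask()` may perform additional analysis.

```cpp
#include <cstdint>

// Hypothetical helper (not the LLVM function): returns true when `mask`
// preserves every bit the shift actually reads from its amount operand.
// `shiftAmountBits` is log2 of the shifted value's bit width, e.g. 5 for a
// 32-bit shift and 6 for a 64-bit shift. Assumes shiftAmountBits < 64.
constexpr bool isShiftMaskUnneeded(std::uint64_t mask,
                                   unsigned shiftAmountBits) {
  const std::uint64_t needed = (std::uint64_t{1} << shiftAmountBits) - 1;
  // The AND is redundant if it clears none of the bits the shift consumes.
  return (mask & needed) == needed;
}
```

Under these assumptions, `x << (y & 31)` on a 32-bit value could be selected as a plain shift by `y`, while `x << (y & 15)` could not.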


# aee86f9b 07-Nov-2021 Kazu Hirata

[AMDGPU] Remove unused declaration selectSMRD (NFC)

The function body proper was removed on Feb 20, 2019 in commit
79b5c3842b684f873d1ffad502336e973616ea51.


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 48958d02 23-Aug-2021 Daniil Fukalov

[NFC][AMDGPU] Reduce includes dependencies.

1. Split out some parts of the R600 target into separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed an issue where the overrides `GCNTargetMachine::getSubtargetImpl()`
and `R600TargetMachine::getSubtargetImpl()` had a different return type
than the base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# f9f5d415 30-Apr-2021 Brendon Cahoon

[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX

Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow either constant or register
values for the offset and width operands.

The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.

There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.

Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.

Differential Revision: https://reviews.llvm.org/D100149
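
For readers unfamiliar with bitfield extracts, the following is a rough sketch of the arithmetic that G_UBFX (zero-extending) and G_SBFX (sign-extending) describe, written for 32-bit values. It is not taken from the patch; the function names `ubfx32` and `sbfx32` are hypothetical, and the sketch assumes `width >= 1` and `offset + width <= 32`.

```cpp
#include <cstdint>

// Hypothetical reference model of an unsigned bitfield extract:
// take `width` bits of `value` starting at bit `offset`, zero-extended.
constexpr std::uint32_t ubfx32(std::uint32_t value, unsigned offset,
                               unsigned width) {
  const std::uint32_t mask = width >= 32 ? ~0u : ((1u << width) - 1u);
  return (value >> offset) & mask;
}

// Hypothetical reference model of a signed bitfield extract:
// same field, but the top bit of the field is treated as a sign bit.
constexpr std::int32_t sbfx32(std::uint32_t value, unsigned offset,
                              unsigned width) {
  const std::uint32_t field = ubfx32(value, offset, width);
  const std::uint32_t sign = 1u << (width - 1);
  // Standard sign-extension trick: flip the sign bit, then subtract it.
  return static_cast<std::int32_t>((field ^ sign) - sign);
}
```

For example, extracting 4 bits at offset 12 from `0x0000F000` gives 15 with `ubfx32` and -1 with `sbfx32`.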


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# cc7add52 30-Mar-2021 Sebastian Neubauer

[AMDGPU] Use SIInstrFlags for flat variants. NFC

Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743


Revision tags: llvmorg-12.0.0-rc3
# 5d0e9ddf 01-Mar-2021 Jay Foad

[AMDGPU][GlobalISel] Add support for global atomicrmw fadd

This includes gfx908 which only has a no-return version of the
global_atomic_add_f32 instruction, using the same hack that was
previously implemented for selecting from the
llvm.amdgcn.global.atomic.fadd intrinsic.

Differential Revision: https://reviews.llvm.org/D97767


Revision tags: llvmorg-12.0.0-rc2
# 3bffb1cd 09-Feb-2021 Stanislav Mekhanoshin

[AMDGPU] Use single cache policy operand

Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

An additional advantage is that the parser will accept these flags in any
order, unlike now.

Differential Revision: https://reviews.llvm.org/D96469
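
The general technique of folding several boolean operands into one bitmask can be sketched as follows. This is an illustration only: the `sketch` namespace, the helper names, and the specific bit positions are assumptions for this example and are not taken from the patch.

```cpp
#include <cstdint>

namespace sketch {

// Illustrative bit assignments (assumed, not quoted from LLVM).
enum CachePolicy : std::uint32_t {
  GLC = 1u << 0,
  SLC = 1u << 1,
  DLC = 1u << 2,
};

// Pack three separate flag operands into a single immediate.
constexpr std::uint32_t packCachePolicy(bool glc, bool slc, bool dlc) {
  return (glc ? GLC : 0u) | (slc ? SLC : 0u) | (dlc ? DLC : 0u);
}

// Query one flag from the packed immediate.
constexpr bool hasFlag(std::uint32_t policy, CachePolicy flag) {
  return (policy & flag) != 0;
}

} // namespace sketch
```

Because each flag owns a distinct bit, the packed value does not depend on the order in which flags are written, which matches the order-independence the commit message mentions for the parser.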


# 8a316045 25-Feb-2021 Amara Emerson

[AArch64][GlobalISel] Enable use of the optsize predicate in the selector.

To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732


# a8d9d507 17-Feb-2021 Stanislav Mekhanoshin

[AMDGPU] gfx90a support

Differential Revision: https://reviews.llvm.org/D96906


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 7e9ceed9 27-Aug-2020 Jay Foad

[TableGen][GlobalISel] Allow duplicate RendererFns

Allow different GICustomOperandRenderers to use the same RendererFn.
This avoids the need for targets to define a bunch of identical C++
renderer functions with different names.

Without this fix TableGen would have emitted code that tried to define
the GICR enumeration with duplicate enumerators.

Differential Revision: https://reviews.llvm.org/D96587


# 6bde0853 27-Jan-2021 Kazu Hirata

[AMDGPU] Forward-declare TargetRegisterClass (NFC)

AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a
forward declaration of TargetRegisterClass in InstructionSelector.h.
This patch adds a forward declaration right in
AMDGPUInstructionSelector.h.

While we are at it, this patch removes the one in
InstructionSelector.h, where it is unnecessary.
