History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (Results 1 – 25 of 535)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 0387da6f 20-Jul-2022 Kazu Hirata <[email protected]>

Use value instead of getValue (NFC)


# 41ae78ea 20-Jul-2022 Kazu Hirata <[email protected]>

Use has_value instead of hasValue (NFC)


# 3a205977 19-Jul-2022 Jon Chesterfield <[email protected]>

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS varia

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060

show more ...


Revision tags: llvmorg-14.0.6
# ae72fee7 20-Jun-2022 Joe Nash <[email protected]>

[AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type

Reviewed By: #amdgpu, foad, rampitec

Differential Revision: https://reviews.llvm.org/D128205


# 6cae753b 20-Jun-2022 Mirko Brkusanin <[email protected]>

[AMDGPU][GlobalISel] Legalize G_FSUB for s16

Differential Revision: https://reviews.llvm.org/D128066


# c155a944 14-Jun-2022 Jay Foad <[email protected]>

[AMDGPU] GFX11 CodeGen support for MIMG instructions

This includes:
- New llvm.amdgcn.image.msaa.load.* intrinsics
- NSA changes, because MIMG-NSA is now limited to 3 dwords
- Split CD forms of IMAG

[AMDGPU] GFX11 CodeGen support for MIMG instructions

This includes:
- New llvm.amdgcn.image.msaa.load.* intrinsics
- NSA changes, because MIMG-NSA is now limited to 3 dwords
- Split CD forms of IMAGE_SAMPLE instructions out into separate
test files since they are no longer supported in GFX11

Differential Revision: https://reviews.llvm.org/D127837

show more ...


Revision tags: llvmorg-14.0.5
# 0abb472f 09-Jun-2022 Benjamin Kramer <[email protected]>

AMDGPU/GISel: Remove unused variable. NFC.


Revision tags: llvmorg-14.0.4
# 264d1136 29-Apr-2022 Nicolai Hähnle <[email protected]>

AMDGPU/GISel: Introduce custom legalization of G_MUL

The generic legalizer framework is still used to reduce the problem
to scalar multiplication with the bit size a multiple of 32.

Generating opti

AMDGPU/GISel: Introduce custom legalization of G_MUL

The generic legalizer framework is still used to reduce the problem
to scalar multiplication with the bit size a multiple of 32.

Generating optimal code sequences for big integer multiplication is
somewhat tricky and has a number of target-specific intricacies:

- The target has V_MAD_U64_U32 instructions that multiply two 32-bit
factors and add a 64-bit accumulator. Most partial products should
use this instruction.
- The accumulator is mapped to consecutive 32-bit GPRs, and partial-
product multiply-adds can feed the accumulator into each other
directly. (The register allocator's support for that is somewhat
limited, but that only matters for 128-bit integers and larger.)
- OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator
to be stored in an even-aligned pair of GPRs. To avoid excessive
register copies, it makes sense to compute odd partial products
separately from even partial products (where a partial product
src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both
halves together as a final step.
- We can combine G_MUL+G_ADD into a single cascade of multiply-adds.
- The target can keep many carry-bits in flight simultaneously, so
combining carries using G_UADDE is preferable over G_ZEXT + G_ADD.
- Not addressed by this patch: When the factors are sign-extended,
the V_MAD_I64_I32 instruction (signed version!) can be used.

It is difficult to address these points generically:

1) Finding matching pairs of G_MUL and G_UMULH to find a wide
multiply is expensive. We could add a G_UMUL_LOHI generic instruction
and conditionally use that in the generic legalizer, but by itself
this wouldn't allow us to use the accumulation capability of
V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE
post-legalization, but this is also expensive.

2) Similarly, making sense of the legalization outcome of a wide
pre-legalization G_MUL+G_ADD pair is extremely expensive.

3) How could the generic legalizer possibly deal with the
particular idiosyncracy of "odd" vs. "even" partial products.

All this points in the direction of directly emitting an ideal code
sequence during legalization, but the generic legalizer should not
be burdened with such overly target-specific concerns. Hence, a
custom legalization.

Note that the implemented approach is different from that used by
SelectionDAG because narrowing of scalars works differently in
general. SelectionDAG iteratively cuts wide scalars into low and
high halves until a legal size is reached. By contrast, GlobalISel
does the narrowing in a single shot, which should be better for
compile-time and for the quality of the generated code.

This patch leaves three gaps open:

1. When the factors are uniform, we should execute the multiplication on
the SALU. Register bank mapping already ensures this.

However, the resulting code sequence is not optimal because it doesn't
fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32
doesn't have a carry-in.) It is very difficult to fix this after the
fact, so we should really use a different legalization sequence in
this case. Unfortunately, we don't have a divergence analysis and so
cannot make that choice.

(This only matters for 128-bit integers and larger.)

2. Avoid unnecessary multiplies when sources are known to be zero- or
sign-extended. The challenge is that the legalizer does not currently
have access to GISelKnownBits.

3. When the G_MUL is followed by a G_ADD, we should consider combining
the two instructions into a single multiply-add sequence, to utilize
the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has
multiple uses and the implied duplication of the multiply is an
overall negative). However, this is also not true when the factors
are uniform: in that case, it is generally better to *not* combine
the two operations, so that the multiply can be done on the SALU.

Again, we don't have a divergence analysis available and so cannot
make an informed choice.

Differential Revision: https://reviews.llvm.org/D124844

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# 794a0bb5 15-Apr-2022 Matt Arsenault <[email protected]>

AMDGPU: Directly implement computeKnownBits for workitem intrinsics

Currently metadata is inserted in a late pass which is lowered
to an AssertZext. The metadata would be more useful if it was
inser

AMDGPU: Directly implement computeKnownBits for workitem intrinsics

Currently metadata is inserted in a late pass which is lowered
to an AssertZext. The metadata would be more useful if it was
inserted earlier after inlining, but before codegen.

Probably shouldn't change anything now. Just replacing the
late metadata annotation needs more work, since we lose
out on optimizations after these are lowered to CopyFromReg.

Seems to be slightly better than relying on the AssertZext from the
metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using
-start-before=amdgpu-isel instead of running the usual codegen
pipeline.

show more ...


Revision tags: llvmorg-14.0.1
# 538c7717 06-Apr-2022 Jay Foad <[email protected]>

[AMDGPU] Fix unused variable warning after D117484


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 54c525fc 15-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Handle legacy grid ID intrinsics

Handle the llvm.r600.* intrinsics which are still in use in libclc. I
thought it would be possible to switch it to using
llvm.amdgcn.implicitarg.p

AMDGPU/GlobalISel: Handle legacy grid ID intrinsics

Handle the llvm.r600.* intrinsics which are still in use in libclc. I
thought it would be possible to switch it to using
llvm.amdgcn.implicitarg.ptr already, but it turns out the implicit
arguments are currently split into a piece before and after the
explicit kernel arguments.

show more ...


# 898d5776 31-Mar-2022 Abinav Puthan Purayil <[email protected]>

[AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer

Differential Revision: https://reviews.llvm.org/D122803


# acf83abc 31-Mar-2022 Abinav Puthan Purayil <[email protected]>

[AMDGPU][GlobalISel] Remove unused variable. NFC.


# dd5895cc 17-Mar-2022 Changpeng Fang <[email protected]>

AMDGPU: Use the implicit kernargs for code object version 5

Summary:
Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg

AMDGPU: Use the implicit kernargs for code object version 5

Summary:
Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265

show more ...


# 37b37838 16-Mar-2022 Shengchen Kan <[email protected]>

[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments


# 932f6281 03-Mar-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] new gfx940 fp atomics

Differential Revision: https://reviews.llvm.org/D121028


# 6527b2a4 18-Feb-2022 Sebastian Neubauer <[email protected]>

[AMDGPU][NFC] Fix typos

Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235


# dcb2da13 11-Feb-2022 Julien Pages <[email protected]>

[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode

Add a new llvm.fptrunc.round intrinsic to precisely control
the rounding mode when converting from f32 to f16.

Differential Revision:

[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode

Add a new llvm.fptrunc.round intrinsic to precisely control
the rounding mode when converting from f32 to f16.

Differential Revision: https://reviews.llvm.org/D110579

show more ...


# 8b8b4913 27-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Fix assertions on invalid addrspacecasts

Fixes some assert on invalid situations and starts directly emitting
the error.


# 1194b9cd 01-Feb-2022 Changpeng Fang <[email protected]>

AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args

Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for t

AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args

Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2

Reviewers:
t-tye, b-sumner, arsenm, and bcahoon

Fixes:
SWDEV-307188, SWDEV-307189

Differential Revision:
https://reviews.llvm.org/D118272

show more ...


Revision tags: llvmorg-13.0.1-rc2
# ae2f9c8b 21-Dec-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Remove lz and nomip combine from codegen

These combines have been moved into the IR combiner in D116042.

Differential Revision: https://reviews.llvm.org/D116116


# 0530fdbb 20-Dec-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Fix LOD bias in A16 combine

As the codegen fix in D111754, the LOD bias needs to be converted to 16
bits. Fix this in the combine.

Differential Revision: https://reviews.llvm.org/D116038


# 2e49e0cf 19-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics

Emit an error if the return value is used on subtargets that do not
support them. Previously we were falling back to the DAG on s

AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics

Emit an error if the return value is used on subtargets that do not
support them. Previously we were falling back to the DAG on selection
failure, where it would emit this error and then fail again.

show more ...


# be7e938e 19-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd

This code is not structured to handle the legacy buffer intrinsics and
was miscompiling them.


# 8ff3c9e0 19-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics

The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn

AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics

The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn't handle at all). We also are not diagnosing the
use of the return value on gfx908. These will be addressed separately.

show more ...


12345678910>>...22