|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
0387da6f |
| 20-Jul-2022 |
Kazu Hirata <[email protected]> |
Use value instead of getValue (NFC)
|
| #
41ae78ea |
| 20-Jul-2022 |
Kazu Hirata <[email protected]> |
Use has_value instead of hasValue (NFC)
|
| #
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
ae72fee7 |
| 20-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type
Reviewed By: #amdgpu, foad, rampitec
Differential Revision: https://reviews.llvm.org/D128205
|
| #
6cae753b |
| 20-Jun-2022 |
Mirko Brkusanin <[email protected]> |
[AMDGPU][GlobalISel] Legalize G_FSUB for s16
Differential Revision: https://reviews.llvm.org/D128066
|
| #
c155a944 |
| 14-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] GFX11 CodeGen support for MIMG instructions
This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAG
[AMDGPU] GFX11 CodeGen support for MIMG instructions
This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAGE_SAMPLE instructions out into separate test files since they are no longer supported in GFX11
Differential Revision: https://reviews.llvm.org/D127837
show more ...
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
0abb472f |
| 09-Jun-2022 |
Benjamin Kramer <[email protected]> |
AMDGPU/GISel: Remove unused variable. NFC.
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
264d1136 |
| 29-Apr-2022 |
Nicolai Hähnle <[email protected]> |
AMDGPU/GISel: Introduce custom legalization of G_MUL
The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32.
Generating opti
AMDGPU/GISel: Introduce custom legalization of G_MUL
The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32.
Generating optimal code sequences for big integer multiplication is somewhat tricky and has a number of target-specific intricacies:
- The target has V_MAD_U64_U32 instructions that multiply two 32-bit factors and add a 64-bit accumulator. Most partial products should use this instruction. - The accumulator is mapped to consecutive 32-bit GPRs, and partial- product multiply-adds can feed the accumulator into each other directly. (The register allocator's support for that is somewhat limited, but that only matters for 128-bit integers and larger.) - OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator to be stored in an even-aligned pair of GPRs. To avoid excessive register copies, it makes sense to compute odd partial products separately from even partial products (where a partial product src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both halves together as a final step. - We can combine G_MUL+G_ADD into a single cascade of multiply-adds. - The target can keep many carry-bits in flight simultaneously, so combining carries using G_UADDE is preferable over G_ZEXT + G_ADD. - Not addressed by this patch: When the factors are sign-extended, the V_MAD_I64_I32 instruction (signed version!) can be used.
It is difficult to address these points generically:
1) Finding matching pairs of G_MUL and G_UMULH to find a wide multiply is expensive. We could add a G_UMUL_LOHI generic instruction and conditionally use that in the generic legalizer, but by itself this wouldn't allow us to use the accumulation capability of V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE post-legalization, but this is also expensive.
2) Similarly, making sense of the legalization outcome of a wide pre-legalization G_MUL+G_ADD pair is extremely expensive.
3) How could the generic legalizer possibly deal with the particular idiosyncracy of "odd" vs. "even" partial products.
All this points in the direction of directly emitting an ideal code sequence during legalization, but the generic legalizer should not be burdened with such overly target-specific concerns. Hence, a custom legalization.
Note that the implemented approach is different from that used by SelectionDAG because narrowing of scalars works differently in general. SelectionDAG iteratively cuts wide scalars into low and high halves until a legal size is reached. By contrast, GlobalISel does the narrowing in a single shot, which should be better for compile-time and for the quality of the generated code.
This patch leaves three gaps open:
1. When the factors are uniform, we should execute the multiplication on the SALU. Register bank mapping already ensures this.
However, the resulting code sequence is not optimal because it doesn't fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32 doesn't have a carry-in.) It is very difficult to fix this after the fact, so we should really use a different legalization sequence in this case. Unfortunately, we don't have a divergence analysis and so cannot make that choice.
(This only matters for 128-bit integers and larger.)
2. Avoid unnecessary multiplies when sources are known to be zero- or sign-extended. The challenge is that the legalizer does not currently have access to GISelKnownBits.
3. When the G_MUL is followed by a G_ADD, we should consider combining the two instructions into a single multiply-add sequence, to utilize the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has multiple uses and the implied duplication of the multiply is an overall negative). However, this is also not true when the factors are uniform: in that case, it is generally better to *not* combine the two operations, so that the multiply can be done on the SALU.
Again, we don't have a divergence analysis available and so cannot make an informed choice.
Differential Revision: https://reviews.llvm.org/D124844
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
794a0bb5 |
| 15-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Directly implement computeKnownBits for workitem intrinsics
Currently metadata is inserted in a late pass which is lowered to an AssertZext. The metadata would be more useful if it was inser
AMDGPU: Directly implement computeKnownBits for workitem intrinsics
Currently metadata is inserted in a late pass which is lowered to an AssertZext. The metadata would be more useful if it was inserted earlier after inlining, but before codegen.
Probably shouldn't change anything now. Just replacing the late metadata annotation needs more work, since we lose out on optimizations after these are lowered to CopyFromReg.
Seems to be slightly better than relying on the AssertZext from the metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using -start-before=amdgpu-isel instead of running the usual codegen pipeline.
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
538c7717 |
| 06-Apr-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Fix unused variable warning after D117484
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
54c525fc |
| 15-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Handle legacy grid ID intrinsics
Handle the llvm.r600.* intrinsics which are still in use in libclc. I thought it would be possible to switch it to using llvm.amdgcn.implicitarg.p
AMDGPU/GlobalISel: Handle legacy grid ID intrinsics
Handle the llvm.r600.* intrinsics which are still in use in libclc. I thought it would be possible to switch it to using llvm.amdgcn.implicitarg.ptr already, but it turns out the implicit arguments are currently split into a piece before and after the explicit kernel arguments.
show more ...
|
| #
898d5776 |
| 31-Mar-2022 |
Abinav Puthan Purayil <[email protected]> |
[AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer
Differential Revision: https://reviews.llvm.org/D122803
|
| #
acf83abc |
| 31-Mar-2022 |
Abinav Puthan Purayil <[email protected]> |
[AMDGPU][GlobalISel] Remove unused variable. NFC.
|
| #
dd5895cc |
| 17-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
Reviewers: arsenm, sameerds, yaxunl
Differential Revision: https://reviews.llvm.org/D120265
show more ...
|
| #
37b37838 |
| 16-Mar-2022 |
Shengchen Kan <[email protected]> |
[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
|
| #
932f6281 |
| 03-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] new gfx940 fp atomics
Differential Revision: https://reviews.llvm.org/D121028
|
| #
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
| #
dcb2da13 |
| 11-Feb-2022 |
Julien Pages <[email protected]> |
[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode
Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16.
Differential Revision:
[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode
Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16.
Differential Revision: https://reviews.llvm.org/D110579
show more ...
|
| #
8b8b4913 |
| 27-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Fix assertions on invalid addrspacecasts
Fixes some assert on invalid situations and starts directly emitting the error.
|
| #
1194b9cd |
| 01-Feb-2022 |
Changpeng Fang <[email protected]> |
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for t
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2
Reviewers: t-tye, b-sumner, arsenm, and bcahoon
Fixes: SWDEV-307188, SWDEV-307189
Differential Revision: https://reviews.llvm.org/D118272
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
ae2f9c8b |
| 21-Dec-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Remove lz and nomip combine from codegen
These combines have been moved into the IR combiner in D116042.
Differential Revision: https://reviews.llvm.org/D116116
|
| #
0530fdbb |
| 20-Dec-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Fix LOD bias in A16 combine
As the codegen fix in D111754, the LOD bias needs to be converted to 16 bits. Fix this in the combine.
Differential Revision: https://reviews.llvm.org/D116038
|
| #
2e49e0cf |
| 19-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics
Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on s
AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics
Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on selection failure, where it would emit this error and then fail again.
show more ...
|
| #
be7e938e |
| 19-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd
This code is not structured to handle the legacy buffer intrinsics and was miscompiling them.
|
| #
8ff3c9e0 |
| 19-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics
The struct/raw forms for the buffer atomics now work as expected. However, we're incorrectly handling the legacy form (which we probably shouldn
AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics
The struct/raw forms for the buffer atomics now work as expected. However, we're incorrectly handling the legacy form (which we probably shouldn't handle at all). We also are not diagnosing the use of the return value on gfx908. These will be addressed separately.
show more ...
|