AMDGPULegalizerInfo.cpp - OpenGrok history log for /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 0387da6f	20-Jul-2022	Kazu Hirata <[email protected]>	Use value instead of getValue (NFC)
# 41ae78ea	20-Jul-2022	Kazu Hirata <[email protected]>	Use has_value instead of hasValue (NFC)
# 3a205977	19-Jul-2022	Jon Chesterfield <[email protected]>	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia [amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060 show more ...
Revision tags: llvmorg-14.0.6
# ae72fee7	20-Jun-2022	Joe Nash <[email protected]>	[AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type Reviewed By: #amdgpu, foad, rampitec Differential Revision: https://reviews.llvm.org/D128205
# 6cae753b	20-Jun-2022	Mirko Brkusanin <[email protected]>	[AMDGPU][GlobalISel] Legalize G_FSUB for s16 Differential Revision: https://reviews.llvm.org/D128066
# c155a944	14-Jun-2022	Jay Foad <[email protected]>	[AMDGPU] GFX11 CodeGen support for MIMG instructions This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAG [AMDGPU] GFX11 CodeGen support for MIMG instructions This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAGE_SAMPLE instructions out into separate test files since they are no longer supported in GFX11 Differential Revision: https://reviews.llvm.org/D127837 show more ...
Revision tags: llvmorg-14.0.5
# 0abb472f	09-Jun-2022	Benjamin Kramer <[email protected]>	AMDGPU/GISel: Remove unused variable. NFC.
Revision tags: llvmorg-14.0.4
# 264d1136	29-Apr-2022	Nicolai Hähnle <[email protected]>	AMDGPU/GISel: Introduce custom legalization of G_MUL The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32. Generating opti AMDGPU/GISel: Introduce custom legalization of G_MUL The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32. Generating optimal code sequences for big integer multiplication is somewhat tricky and has a number of target-specific intricacies: - The target has V_MAD_U64_U32 instructions that multiply two 32-bit factors and add a 64-bit accumulator. Most partial products should use this instruction. - The accumulator is mapped to consecutive 32-bit GPRs, and partial- product multiply-adds can feed the accumulator into each other directly. (The register allocator's support for that is somewhat limited, but that only matters for 128-bit integers and larger.) - OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator to be stored in an even-aligned pair of GPRs. To avoid excessive register copies, it makes sense to compute odd partial products separately from even partial products (where a partial product src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both halves together as a final step. - We can combine G_MUL+G_ADD into a single cascade of multiply-adds. - The target can keep many carry-bits in flight simultaneously, so combining carries using G_UADDE is preferable over G_ZEXT + G_ADD. - Not addressed by this patch: When the factors are sign-extended, the V_MAD_I64_I32 instruction (signed version!) can be used. It is difficult to address these points generically: 1) Finding matching pairs of G_MUL and G_UMULH to find a wide multiply is expensive. We could add a G_UMUL_LOHI generic instruction and conditionally use that in the generic legalizer, but by itself this wouldn't allow us to use the accumulation capability of V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE post-legalization, but this is also expensive. 2) Similarly, making sense of the legalization outcome of a wide pre-legalization G_MUL+G_ADD pair is extremely expensive. 3) How could the generic legalizer possibly deal with the particular idiosyncracy of "odd" vs. "even" partial products. All this points in the direction of directly emitting an ideal code sequence during legalization, but the generic legalizer should not be burdened with such overly target-specific concerns. Hence, a custom legalization. Note that the implemented approach is different from that used by SelectionDAG because narrowing of scalars works differently in general. SelectionDAG iteratively cuts wide scalars into low and high halves until a legal size is reached. By contrast, GlobalISel does the narrowing in a single shot, which should be better for compile-time and for the quality of the generated code. This patch leaves three gaps open: 1. When the factors are uniform, we should execute the multiplication on the SALU. Register bank mapping already ensures this. However, the resulting code sequence is not optimal because it doesn't fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32 doesn't have a carry-in.) It is very difficult to fix this after the fact, so we should really use a different legalization sequence in this case. Unfortunately, we don't have a divergence analysis and so cannot make that choice. (This only matters for 128-bit integers and larger.) 2. Avoid unnecessary multiplies when sources are known to be zero- or sign-extended. The challenge is that the legalizer does not currently have access to GISelKnownBits. 3. When the G_MUL is followed by a G_ADD, we should consider combining the two instructions into a single multiply-add sequence, to utilize the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has multiple uses and the implied duplication of the multiply is an overall negative). However, this is also not true when the factors are uniform: in that case, it is generally better to not combine the two operations, so that the multiply can be done on the SALU. Again, we don't have a divergence analysis available and so cannot make an informed choice. Differential Revision: https://reviews.llvm.org/D124844 show more ...
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# 794a0bb5	15-Apr-2022	Matt Arsenault <[email protected]>	AMDGPU: Directly implement computeKnownBits for workitem intrinsics Currently metadata is inserted in a late pass which is lowered to an AssertZext. The metadata would be more useful if it was inser AMDGPU: Directly implement computeKnownBits for workitem intrinsics Currently metadata is inserted in a late pass which is lowered to an AssertZext. The metadata would be more useful if it was inserted earlier after inlining, but before codegen. Probably shouldn't change anything now. Just replacing the late metadata annotation needs more work, since we lose out on optimizations after these are lowered to CopyFromReg. Seems to be slightly better than relying on the AssertZext from the metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using -start-before=amdgpu-isel instead of running the usual codegen pipeline. show more ...
Revision tags: llvmorg-14.0.1
# 538c7717	06-Apr-2022	Jay Foad <[email protected]>	[AMDGPU] Fix unused variable warning after D117484
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 54c525fc	15-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Handle legacy grid ID intrinsics Handle the llvm.r600.* intrinsics which are still in use in libclc. I thought it would be possible to switch it to using llvm.amdgcn.implicitarg.p AMDGPU/GlobalISel: Handle legacy grid ID intrinsics Handle the llvm.r600.* intrinsics which are still in use in libclc. I thought it would be possible to switch it to using llvm.amdgcn.implicitarg.ptr already, but it turns out the implicit arguments are currently split into a piece before and after the explicit kernel arguments. show more ...
# 898d5776	31-Mar-2022	Abinav Puthan Purayil <[email protected]>	[AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer Differential Revision: https://reviews.llvm.org/D122803
# acf83abc	31-Mar-2022	Abinav Puthan Purayil <[email protected]>	[AMDGPU][GlobalISel] Remove unused variable. NFC.
# dd5895cc	17-Mar-2022	Changpeng Fang <[email protected]>	AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265 show more ...
# 37b37838	16-Mar-2022	Shengchen Kan <[email protected]>	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
# 932f6281	03-Mar-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] new gfx940 fp atomics Differential Revision: https://reviews.llvm.org/D121028
# 6527b2a4	18-Feb-2022	Sebastian Neubauer <[email protected]>	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235
# dcb2da13	11-Feb-2022	Julien Pages <[email protected]>	[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16. Differential Revision: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16. Differential Revision: https://reviews.llvm.org/D110579 show more ...
# 8b8b4913	27-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Fix assertions on invalid addrspacecasts Fixes some assert on invalid situations and starts directly emitting the error.
# 1194b9cd	01-Feb-2022	Changpeng Fang <[email protected]>	AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for t AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2 Reviewers: t-tye, b-sumner, arsenm, and bcahoon Fixes: SWDEV-307188, SWDEV-307189 Differential Revision: https://reviews.llvm.org/D118272 show more ...
Revision tags: llvmorg-13.0.1-rc2
# ae2f9c8b	21-Dec-2021	Sebastian Neubauer <[email protected]>	[AMDGPU] Remove lz and nomip combine from codegen These combines have been moved into the IR combiner in D116042. Differential Revision: https://reviews.llvm.org/D116116
# 0530fdbb	20-Dec-2021	Sebastian Neubauer <[email protected]>	[AMDGPU] Fix LOD bias in A16 combine As the codegen fix in D111754, the LOD bias needs to be converted to 16 bits. Fix this in the combine. Differential Revision: https://reviews.llvm.org/D116038
# 2e49e0cf	19-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on s AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on selection failure, where it would emit this error and then fail again. show more ...
# be7e938e	19-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd This code is not structured to handle the legacy buffer intrinsics and was miscompiling them.
# 8ff3c9e0	19-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics The struct/raw forms for the buffer atomics now work as expected. However, we're incorrectly handling the legacy form (which we probably shouldn AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics The struct/raw forms for the buffer atomics now work as expected. However, we're incorrectly handling the legacy form (which we probably shouldn't handle at all). We also are not diagnosing the use of the return value on gfx908. These will be addressed separately. show more ...
12 3 4 5 6 7 8 9 10 >>...22