|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
| #
dca7f087 |
| 09-Aug-2022 |
David Stuttard <[email protected]> |
AMDGPU: mbcnt allow for non-zero src1 for known-bits
Src1 for mbcnt can be a non-zero literal or register. Take this into account when calculating known bits.
Differential Revision: https://reviews.llvm.org/D131478
(cherry picked from commit 1d1cc05539e275ae7666fc4b44bf725ec335078a)
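For reference, a minimal IRBuilder sketch of the canonical lane-id idiom, where mbcnt.hi's src1 is the non-zero result of mbcnt.lo — exactly the case this patch models in the known-bits calculation (the helper name is hypothetical, not from the patch):

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/IntrinsicsAMDGPU.h"
    using namespace llvm;

    // Hypothetical helper: mbcnt.hi's src1 here is the mbcnt.lo result,
    // a non-zero register value that known bits can now account for.
    Value *emitLaneId(IRBuilder<> &B) {
      Value *AllOnes = B.getInt32(-1);
      Value *Lo = B.CreateIntrinsic(Intrinsic::amdgcn_mbcnt_lo, {},
                                    {AllOnes, B.getInt32(0)});
      return B.CreateIntrinsic(Intrinsic::amdgcn_mbcnt_hi, {}, {AllOnes, Lo});
    }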
|
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
1023ddaf |
| 06-Jul-2022 |
Shilei Tian <[email protected]> |
[LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds support for the `fmax` and `fmin` operations in the `atomicrmw` instruction. For now (at least in this patch), the instruction is expanded to a CAS loop. A couple of targets already support the feature; I'll create another patch (or patches) to enable them accordingly.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127041
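As a rough, self-contained C++ analogue of that expansion (not the actual AtomicExpand output; the real lowering compares the value as an integer bit pattern):

    #include <atomic>
    #include <cmath>

    // Shape of the CAS loop that `atomicrmw fmax` expands to on targets
    // without a native instruction. std::fmax mirrors llvm.maxnum
    // semantics: a NaN operand is ignored in favor of the other operand.
    float atomicFMax(std::atomic<float> &Addr, float Val) {
      float Old = Addr.load(std::memory_order_relaxed);
      while (!Addr.compare_exchange_weak(Old, std::fmax(Old, Val)))
        ; // on failure, Old is refreshed with the current value; retry
      return Old;
    }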
|
| #
bd675af2 |
| 30-Jun-2022 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Make v16i16/v16f16 legal
There are upcoming intrinsics to use the new types.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D128865
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
f1255186 |
| 18-Jun-2022 |
Guillaume Chatelet <[email protected]> |
[NFC][Alignment] Remove max functions between Align and MaybeAlign
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code more opaque.
Differential Revision: https://reviews.llvm.org/D128121
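The explicit call-site replacement presumably looks like this (a sketch; valueOrOne() treats an unset MaybeAlign as align 1):

    #include "llvm/Support/Alignment.h"
    #include <algorithm>
    using namespace llvm;

    // Replacement for the removed llvm::max(Align, MaybeAlign) overloads:
    // spell out how the unset case participates in the comparison.
    Align maxAlign(Align A, MaybeAlign MA) {
      return std::max(A, MA.valueOrOne());
    }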
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
07881861 |
| 03-Jun-2022 |
Guillaume Chatelet <[email protected]> |
[Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
25af3afa |
| 18-May-2022 |
Shao-Ce SUN <[email protected]> |
[NFC][AMDGPU][CodeGen] Use ArrayRef in TargetLowering functions
Based on D123467.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124508
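Inside a target's TargetLowering constructor, the resulting call-site shape is roughly as follows (illustrative opcodes and types, not the actual AMDGPU lists):

    // Before: one call per operation/type pair.
    setOperationAction(ISD::FADD, MVT::v4f16, Custom);
    setOperationAction(ISD::FMUL, MVT::v4f16, Custom);
    // After: ArrayRef overloads accept brace-initialized lists.
    setOperationAction({ISD::FADD, ISD::FMUL}, MVT::v4f16, Custom);
    setOperationAction(ISD::FADD, {MVT::v2f16, MVT::v4f16}, Custom);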
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
794a0bb5 |
| 15-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Directly implement computeKnownBits for workitem intrinsics
Currently, metadata is inserted in a late pass and lowered to an AssertZext. The metadata would be more useful if it were inserted earlier, after inlining but before codegen.
Probably shouldn't change anything now. Just replacing the late metadata annotation needs more work, since we lose out on optimizations after these are lowered to CopyFromReg.
Seems to be slightly better than relying on the AssertZext from the metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using -start-before=amdgpu-isel instead of running the usual codegen pipeline.
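A sketch of the underlying idea, with illustrative names rather than the patch's actual code: a workitem id can never exceed the maximum group size, so its high bits are known zero:

    #include "llvm/Support/KnownBits.h"
    #include "llvm/Support/MathExtras.h"
    using namespace llvm;

    // Illustrative only: given an upper bound on a 32-bit workitem id
    // (e.g. 1023 for a 1024-wide group), mark the high bits known zero.
    void setWorkitemIdKnownBits(KnownBits &Known, uint32_t MaxID) {
      Known.Zero.setHighBits(countLeadingZeros(MaxID));
    }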
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
4df4922d |
| 08-Apr-2022 |
Nicolai Hähnle <[email protected]> |
AMDGPU/SDAG: Custom SETCC (i.e. ballot) is always uniform
The AMDGPUISD::SETCC node is like ISD::SETCC, but returns a lane mask instead of a per-lane boolean. The lane mask is uniform.
This improves instruction selection for code patterns like ctpop(ballot(x)), which can now use an S_BCNT1_* instruction instead of V_BCNT_*.
GlobalISel already selects scalar instructions (an earlier commit added a test case).
Differential Revision: https://reviews.llvm.org/D123432
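A sketch of the pattern in question, built with IRBuilder (hypothetical helper; wave64 assumed, hence the i64 ballot):

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/IntrinsicsAMDGPU.h"
    using namespace llvm;

    // ctpop(ballot(x)) counts the active lanes where x is true. The ballot
    // result is a uniform lane mask, so the popcount can now select the
    // scalar S_BCNT1_* instead of the per-lane V_BCNT_*.
    Value *emitActiveLaneCount(IRBuilder<> &B, Value *X /* i1 */) {
      Value *Ballot =
          B.CreateIntrinsic(Intrinsic::amdgcn_ballot, {B.getInt64Ty()}, {X});
      return B.CreateUnaryIntrinsic(Intrinsic::ctpop, Ballot);
    }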
|
| #
e66f0edb |
| 07-Apr-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Split unaligned LDS access instead of scalarizing
There is no need to fully scalarize an unaligned operation in some cases; it is enough to split it to alignment.
Differential Revision: https://reviews.llvm.org/D123330
|
| #
662b9fa0 |
| 28-Mar-2022 |
Shao-Ce SUN <[email protected]> |
[NFC][CodeGen] Add a setTargetDAGCombine overload that takes an ArrayRef
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122557
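The call-site change this enables, inside a TargetLowering constructor (illustrative opcodes):

    // Before: one call per opcode.
    setTargetDAGCombine(ISD::ADD);
    setTargetDAGCombine(ISD::FADD);
    // After: a single call taking an ArrayRef of node types.
    setTargetDAGCombine({ISD::ADD, ISD::FADD});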
|
| #
dd5895cc |
| 17-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
Reviewers: arsenm, sameerds, yaxunl
Differential Revision: https://reviews.llvm.org/D120265
|
| #
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
a278250b |
| 10-Mar-2022 |
Nico Weber <[email protected]> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https:/
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
|
| #
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
| #
04fff547 |
| 07-Mar-2022 |
Venkata Ramanaiah Nalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as live-ins on the function entry to preserve their values when we have calls, so that they get saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame, and the above approach makes it difficult to track the return address when the CFI information is emitted during frame lowering, because it requires understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
Doing the above poses an issue: the return instruction now uses the undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use behind the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until `SI_RETURN` is lowered to `S_SETPC_B64_return` in `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
dcb2da13 |
| 11-Feb-2022 |
Julien Pages <[email protected]> |
[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode
Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16.
Differential Revision: https://reviews.llvm.org/D110579
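A hedged IRBuilder sketch of emitting the new intrinsic (the helper name is hypothetical; the rounding mode travels as a metadata string, in the style of the constrained-FP intrinsics):

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    // Hypothetical helper: f32 -> f16 truncation with an explicit
    // round-toward-zero mode instead of the default nearest-even.
    Value *emitFPTruncRound(IRBuilder<> &B, Value *X /* float */) {
      LLVMContext &Ctx = B.getContext();
      Value *Mode =
          MetadataAsValue::get(Ctx, MDString::get(Ctx, "round.towardzero"));
      return B.CreateIntrinsic(Intrinsic::fptrunc_round,
                               {B.getHalfTy(), X->getType()}, {X, Mode});
    }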
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
fd2bb51f |
| 08-Feb-2022 |
Simon Pilgrim <[email protected]> |
[ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.
This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.
I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these.
Differential Revision: https://reviews.llvm.org/D119019
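A call-site sketch of the combined query (the wrapper is illustrative; the out-parameter names follow the review):

    #include "llvm/Support/MathExtras.h"
    #include <cstdint>

    // One call now yields both the mask's position and its length, instead
    // of separate countTrailingZeros/popcount steps after isShiftedMask.
    bool describeMask(uint64_t V, unsigned &MaskIdx, unsigned &MaskLen) {
      // For V = 0xFF0: MaskIdx == 4 (lowest set bit), MaskLen == 8.
      return llvm::isShiftedMask_64(V, MaskIdx, MaskLen);
    }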
|
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
bb1fe369 |
| 19-Jan-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Make v8i16/v8f16 legal
Differential Revision: https://reviews.llvm.org/D117721
|
| #
454256ef |
| 14-Jan-2022 |
Craig Topper <[email protected]> |
[AMDGPU] Correct the known bits calculation for MUL_I24.
I'm not entirely sure, but based on how ComputeNumSignBits handles ISD::MUL, I believe this code was miscounting the number of sign bits.
As an example of an incorrect result, say countMinSignBits returned 1 for the left-hand side and 24 for the right-hand side. LHSValBits would be 23, RHSValBits would be 0, and the sum would be 23. This would cause the code to set 9 high bits as zero/one. Now suppose the real value of the left side is 0x800000 and of the right side is 0xffffff. The product is 0x00800000, which has 8 sign bits, not 9.
The number of valid bits for the left and right operands is now the number of non-sign bits + 1. If the sum of the valid bits of the left and right sides exceeds 32, then the result may overflow and we can't say anything about the sign of the result. If the sum is 32 or less then it won't overflow and we know the result has at least 1 sign bit.
For the previous example, the code will now calculate the left side valid bits as 24 and the right side as 1. The sum will be 25, and the sign bits will be 32 - 25 + 1, which is 8, the correct value.
Differential Revision: https://reviews.llvm.org/D116469
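In code form, the corrected rule reads roughly as follows (illustrative names, mirroring the arithmetic above):

    // Lower bound on the sign bits of a mul_i24 result, given the minimum
    // sign-bit counts of its 24-bit operands; 0 means nothing is known.
    unsigned mulI24MinSignBits(unsigned LHSSignBits, unsigned RHSSignBits) {
      unsigned LHSValBits = 24 - LHSSignBits + 1; // non-sign bits + 1
      unsigned RHSValBits = 24 - RHSSignBits + 1;
      unsigned Sum = LHSValBits + RHSValBits;
      // e.g. LHSSignBits = 1, RHSSignBits = 24 -> Sum = 25 -> 8 sign bits.
      return Sum <= 32 ? 32 - Sum + 1 : 0;
    }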
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
ff971873 |
| 07-Jan-2022 |
Jay Foad <[email protected]> |
[GlobalISel] Fix legality checks for G_UBFX combines
1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality with the correct types for the G_UBFX that it builds.
2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to match the legality rules: result and first operand can be s32 or s64 but the "shift amount" operands are always s32.
3. Add AMDGPU tests where the post-legalizer combiner would create illegal MIR without the above fixes.
Differential Revision: https://reviews.llvm.org/D116802
|
| #
3f3fe4a5 |
| 07-Jan-2022 |
Jay Foad <[email protected]> |
[GlobalISel] Fix typo Extact to Extract in function name. NFC.
|
| #
cbcbbd6a |
| 03-Jan-2022 |
Craig Topper <[email protected]> |
[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC
This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits.
Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up.
Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.
Reviewed By: lebedev.ri, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D116522
|
| #
09b53296 |
| 22-Dec-2021 |
Ron Lieberman <[email protected]> |
Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.
Failed amdgpu runtime buildbot # 3514
|
| #
9075009d |
| 22-Dec-2021 |
RamNalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as live-ins on the function entry to preserve their values when we have calls, so that they get saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame, and the above approach makes it difficult to track the return address when the CFI information is emitted during frame lowering, because it requires understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
Doing the above poses an issue: the return instruction now uses the undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use behind the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until `SI_RETURN` is lowered to `S_SETPC_B64_return` in `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
d7e03df7 |
| 12-Nov-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32
Select SelectionDAG ops smul_lohi/umul_lohi to v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0. v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions, so it is better to use one instruction rather than two.
Further improvements are possible to make better use of the addend operand, but this is already a strict improvement over what we have now.
Differential Revision: https://reviews.llvm.org/D113986
|