History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (Results 1 – 25 of 475)
Revision Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3
# dca7f087 09-Aug-2022 David Stuttard <[email protected]>

AMDGPU: mbcnt allow for non-zero src1 for known-bits

Src1 for mbcnt can be a non-zero literal or register. Take this into account
when calculating known bits.

Differential Revision: https://reviews.llvm.org/D131478

(cherry picked from commit 1d1cc05539e275ae7666fc4b44bf725ec335078a)

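A minimal C++ sketch of the corrected reasoning, assuming LLVM's KnownBits utilities and an illustrative helper name (this is not the verbatim AMDGPUISelLowering.cpp code): mbcnt.lo produces src1 plus a lane count in [0, 31], so the result's known bits must be combined with src1's known bits rather than assuming src1 == 0.

```cpp
// Illustrative sketch, not the actual backend code.
#include "llvm/Support/KnownBits.h"

using namespace llvm;

KnownBits mbcntLoKnownBits(const KnownBits &Src1) {
  // mbcnt.lo adds src1 to the number of set mask bits below the
  // current lane within the low 32 lanes, a value in [0, 31] (5 bits).
  KnownBits Count(32);
  Count.Zero.setBitsFrom(5);
  // Result = Count + src1. Previously src1 was assumed to be zero;
  // now its known bits participate in the addition.
  return KnownBits::computeForAddSub(/*Add=*/true, /*NSW=*/false,
                                     Count, Src1);
}
```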


Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 1023ddaf 06-Jul-2022 Shilei Tian <[email protected]>

[LLVM] Add the support for fmax and fmin in atomicrmw instruction

This patch adds support for `fmax` and `fmin` operations in the `atomicrmw`
instruction. For now (at least in this patch), the instruction is expanded
to a CAS loop. A couple of targets already support the feature; I'll
create follow-up patches to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041

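As a rough illustration of the expansion, here is a C++ analogue of the CAS-loop shape that `atomicrmw fmax` lowers to when no native instruction is used (illustrative only, not the IR-level expansion code):

```cpp
#include <atomic>
#include <cmath>

float atomic_fmax(std::atomic<float> &Addr, float Val) {
  float Old = Addr.load();
  // Retry until no other thread modified the value in between; on
  // failure, compare_exchange_weak reloads Old with the current value.
  while (!Addr.compare_exchange_weak(Old, std::fmax(Old, Val))) {
  }
  return Old; // atomicrmw yields the value before the update.
}
```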


# bd675af2 30-Jun-2022 Piotr Sobczak <[email protected]>

[AMDGPU] Make v16i16/v16f16 legal

There are upcoming intrinsics to use the new types.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128865


Revision tags: llvmorg-14.0.6
# f1255186 18-Jun-2022 Guillaume Chatelet <[email protected]>

[NFC][Alignment] Remove max functions between Align and MaybeAlign

`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.

Differential Revision: https://reviews.llvm.org/D128121



Revision tags: llvmorg-14.0.5
# 07881861 03-Jun-2022 Guillaume Chatelet <[email protected]>

[Alignment][NFC] Remove usage of MemSDNode::getAlignment

I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910



Revision tags: llvmorg-14.0.4
# 25af3afa 18-May-2022 Shao-Ce SUN <[email protected]>

[NFC][AMDGPU][CodeGen] Use ArrayRef in TargetLowering functions

Based on D123467.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D124508


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# 794a0bb5 15-Apr-2022 Matt Arsenault <[email protected]>

AMDGPU: Directly implement computeKnownBits for workitem intrinsics

Currently metadata is inserted in a late pass which is lowered
to an AssertZext. The metadata would be more useful if it was
inserted earlier after inlining, but before codegen.

Probably shouldn't change anything now. Just replacing the
late metadata annotation needs more work, since we lose
out on optimizations after these are lowered to CopyFromReg.

Seems to be slightly better than relying on the AssertZext from the
metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using
-start-before=amdgpu-isel instead of running the usual codegen
pipeline.

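A minimal sketch of the idea, with an illustrative helper name (not the actual backend code): since the target knows the maximum flat workgroup size, the high bits of a workitem ID are provably zero without relying on !range metadata or AssertZext.

```cpp
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"

using namespace llvm;

KnownBits workitemIdKnownBits(unsigned MaxWorkGroupSize) {
  KnownBits Known(32);
  // IDs lie in [0, MaxWorkGroupSize - 1], so every bit at or above
  // ceil(log2(MaxWorkGroupSize)) is known to be zero.
  Known.Zero.setBitsFrom(Log2_32_Ceil(MaxWorkGroupSize));
  return Known;
}
```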


Revision tags: llvmorg-14.0.1
# 4df4922d 08-Apr-2022 Nicolai Hähnle <[email protected]>

AMDGPU/SDAG: Custom SETCC (i.e. ballot) is always uniform

The AMDGPUISD::SETCC node is like ISD::SETCC, but returns a lane mask
instead of a per-lane boolean. The lane mask is uniform.

This improves instruction selection for code patterns like
ctpop(ballot(x)), which can now use an S_BCNT1_* instruction instead
of V_BCNT_*.

GlobalISel already selects scalar instructions (an earlier commit
added a test case).

Differential Revision: https://reviews.llvm.org/D123432

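For example, in HIP-flavored C++ (illustrative; `__ballot` and `__popcll` are HIP builtins, not part of this commit):

```cpp
// Count how many lanes in the wavefront satisfy a condition. The
// ballot mask is uniform across the wave, so the popcount of a
// ctpop(ballot(x)) pattern can select to a scalar S_BCNT1_*
// instruction instead of V_BCNT_*.
__device__ int countMatchingLanes(bool X) {
  unsigned long long Mask = __ballot(X); // uniform lane mask
  return __popcll(Mask);                 // ctpop(ballot(x))
}
```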


# e66f0edb 07-Apr-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Split unaligned LDS access instead of scalarizing

There is no need to fully scalarize an unaligned operation in
some cases; just split it to the alignment.

Differential Revision: https://reviews.llvm.org/D123330



# 662b9fa0 28-Mar-2022 Shao-Ce SUN <[email protected]>

[NFC][CodeGen] Add a setTargetDAGCombine variant that uses ArrayRef

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122557


# dd5895cc 17-Mar-2022 Changpeng Fang <[email protected]>

AMDGPU: Use the implicit kernargs for code object version 5

Summary:
Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265

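A sketch of the clang-side idea for the workgroup-size builtins, with an illustrative offset (see the AMDGPU ABI documentation for the authoritative code object v5 layout):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative only: __builtin_amdgcn_workgroup_size_x is lowered to a
// 16-bit load at a fixed offset from the implicit kernarg pointer.
uint16_t workgroupSizeX(const uint8_t *ImplicitArgPtr) {
  // ImplicitArgPtr stands in for __builtin_amdgcn_implicitarg_ptr().
  constexpr size_t GroupSizeXOffset = 12; // illustrative offset
  uint16_t SizeX;
  std::memcpy(&SizeX, ImplicitArgPtr + GroupSizeXOffset, sizeof(SizeX));
  return SizeX;
}
```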


# 989f1c72 15-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681



Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# a278250b 10-Mar-2022 Nico Weber <[email protected]>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169



# 7f230fee 07-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


# 04fff547 07-Mar-2022 Venkata Ramanaiah Nalamothu <[email protected]>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve their value when we have calls, so that they get saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame, and the above approach makes it difficult to track the
return address when the CFI information is emitted during frame lowering,
because doing so requires understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

Doing the above poses an issue: the return instruction now uses the undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction behind the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to
`S_SETPC_B64_return` during `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652



Revision tags: llvmorg-14.0.0-rc2
# dcb2da13 11-Feb-2022 Julien Pages <[email protected]>

[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode

Add a new llvm.fptrunc.round intrinsic to precisely control
the rounding mode when converting from f32 to f16.

Differential Revision: https://reviews.llvm.org/D110579



Revision tags: llvmorg-14.0.0-rc1
# fd2bb51f 08-Feb-2022 Simon Pilgrim <[email protected]>

[ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length

In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.

This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.

I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these.

Differential Revision: https://reviews.llvm.org/D119019

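A usage sketch of the new variant (parameter names are illustrative; check MathExtras.h for the exact signature):

```cpp
#include "llvm/Support/MathExtras.h"
#include <cassert>

void shiftedMaskExample() {
  unsigned MaskIdx = 0, MaskLen = 0;
  // 0x00ff0000 is a contiguous run of set bits, shifted left by 16.
  if (llvm::isShiftedMask_32(0x00ff0000u, MaskIdx, MaskLen)) {
    assert(MaskIdx == 16 && "position of the lowest set bit");
    assert(MaskLen == 8 && "length of the contiguous run");
  }
}
```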


Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# bb1fe369 19-Jan-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Make v8i16/v8f16 legal

Differential Revision: https://reviews.llvm.org/D117721


# 454256ef 14-Jan-2022 Craig Topper <[email protected]>

[AMDGPU] Correct the known bits calculation for MUL_I24.

I'm not entirely sure, but based on how ComputeNumSignBits handles
ISD::MUL, I believe this code was miscounting the number of sign
bits.

As an example of an incorrect result, let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0, and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real value for the left side is 0x800000 and for the right
hand side is 0xffffff. The product is 0x00800000, which has 8 sign bits,
not 9.

The number of valid bits for the left and right operands is now
the number of non-sign bits + 1. If the sum of the valid bits of
the left and right sides exceeds 32, then the result may overflow and we
can't say anything about the sign of the result. If the sum is 32
or less then it won't overflow and we know the result has at least
1 sign bit.

For the previous example, the code will now calculate the left
side valid bits as 24 and the right side as 1. The sum will be 25
and the sign bits will be 32 - 25 + 1 which is 8, the correct value.

Differential Revision: https://reviews.llvm.org/D116469

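A minimal C++ sketch of the corrected computation described above (illustrative names, not the verbatim code):

```cpp
// Sign bits of MUL_I24 given the operands' minimum sign-bit counts.
unsigned mulI24NumSignBits(unsigned LHSSignBits, unsigned RHSSignBits) {
  // Valid bits = non-sign bits + 1 (24-bit operands).
  unsigned LHSValBits = 24 - LHSSignBits + 1;
  unsigned RHSValBits = 24 - RHSSignBits + 1;
  unsigned ValBits = LHSValBits + RHSValBits;
  if (ValBits > 32)
    return 1; // Product may overflow: only the trivial 1 sign bit.
  // E.g. sign bits 1 and 24 -> valid bits 24 + 1 = 25 -> 8 sign bits.
  return 32 - ValBits + 1;
}
```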


Revision tags: llvmorg-13.0.1-rc2
# ff971873 07-Jan-2022 Jay Foad <[email protected]>

[GlobalISel] Fix legality checks for G_UBFX combines

1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality
with the correct types for the G_UBFX that it builds.
2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to
match the legality rules: result and first operand can be s32 or s64
but the "shift amount" operands are always s32.
3. Add AMDGPU tests where the post-legalizer combiner would create
illegal MIR without the above fixes.

Differential Revision: https://reviews.llvm.org/D116802



# 3f3fe4a5 07-Jan-2022 Jay Foad <[email protected]>

[GlobalISel] Fix typo Extact to Extract in function name. NFC.


# cbcbbd6a 03-Jan-2022 Craig Topper <[email protected]>

[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC

This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.

Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.

Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.

Reviewed By: lebedev.ri, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D116522



# 09b53296 22-Dec-2021 Ron Lieberman <[email protected]>

Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"

This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

Failed amdgpu runtime buildbot # 3514


# 9075009d 22-Dec-2021 RamNalamothu <[email protected]>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve their value when we have calls, so that they get saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame, and the above approach makes it difficult to track the
return address when the CFI information is emitted during frame lowering,
because doing so requires understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

Doing the above poses an issue: the return instruction now uses the undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction behind the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to
`S_SETPC_B64_return` during `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652



Revision tags: llvmorg-13.0.1-rc1
# d7e03df7 12-Nov-2021 Jay Foad <[email protected]>

[AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32

Select SelectionDAG ops smul_lohi/umul_lohi to
v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0.
v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions
so it is better to use one instruction than two.

Further improvements are possible to make better use of the addend
operand, but this is already a strict improvement over what we have
now.

Differential Revision: https://reviews.llvm.org/D113986

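As a rough illustration of the pattern (a C++ analogue; the selection itself happens on SelectionDAG nodes):

```cpp
#include <cstdint>

// Both halves of a 32x32->64 product. On AMDGPU the whole 64-bit
// multiply-add can select to a single quarter-rate
//   v_mad_u64_u32 dst, a, b, 0
// instead of separate v_mul_lo_u32 and v_mul_hi_u32 instructions.
void umulLoHi(uint32_t A, uint32_t B, uint32_t &Lo, uint32_t &Hi) {
  uint64_t Prod = static_cast<uint64_t>(A) * B; // addend of 0
  Lo = static_cast<uint32_t>(Prod);
  Hi = static_cast<uint32_t>(Prod >> 32);
}
```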

