History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (Results 1 – 25 of 28)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# 445a483b 13-Jun-2022 Jay Foad <[email protected]>

[AMDGPU] Add new GFX11 intrinsic llvm.amdgcn.exp.row

Differential Revision: https://reviews.llvm.org/D127671


# bfcfd53b 13-Jun-2022 Jay Foad <[email protected]>

[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic

Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were a

[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic

Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.

Also use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D127662

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# 2417de27 25-Apr-2022 Mariusz Sikora <[email protected]>

[AMDGPU] Use d16 flag for image.sample instructions

Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.

This patch adds new pattern in InstCombine

[AMDGPU] Use d16 flag for image.sample instructions

Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.

This patch adds new pattern in InstCombine to detect if output of
image.sample is used later only by fptrunc which converts the type
from float to half. If pattern is detected then fptrunc and image.sample
are combined to single image.sample which is returning half type.
Later in Lowering part d16 flag is added to image sample intrinsic.

Differential Revision: https://reviews.llvm.org/D124232

show more ...


# c6afbdb5 25-Apr-2022 Piotr Sobczak <[email protected]>

Revert "[AMDGPU] Use d16 flag for image.sample instructions"

This reverts commit d1762fc454c0d7ee0bcffe87e798f67b6c43c1d2.

Reverting D124232 as the buildbot reported some errors in sanitizers.


# d1762fc4 25-Apr-2022 Mariusz Sikora <[email protected]>

[AMDGPU] Use d16 flag for image.sample instructions

Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.

This patch adds new pattern in InstCombine

[AMDGPU] Use d16 flag for image.sample instructions

Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.

This patch adds new pattern in InstCombine to detect if output of
image.sample is used later only by fptrunc which converts the type
from float to half. If pattern is detected then fptrunc and image.sample
are combined to single image.sample which is returning half type.
Later in Lowering part d16 flag is added to image sample intrinsic.

Differential Revision: https://reviews.llvm.org/D124232

show more ...


Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init
# 4ed7c6ee 24-Jan-2022 Sebastian Neubauer <[email protected]>

[AMDGPU] Only match correct type for a16

Addresses are floats when a sampler is present and unsigned integers
when no sampler is present.

Therefore, only zext instructions, not sext instructions sh

[AMDGPU] Only match correct type for a16

Addresses are floats when a sampler is present and unsigned integers
when no sampler is present.

Therefore, only zext instructions, not sext instructions should match.

Also match integer constants that can be truncated.

Differential Revision: https://reviews.llvm.org/D118043

show more ...


# 80532ebb 24-Jan-2022 Sebastian Neubauer <[email protected]>

[AMDGPU][InstCombine] Remove zero image offset

Remove the offset parameter if it is zero.

Differential Revision: https://reviews.llvm.org/D117876


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 603d1803 21-Dec-2021 Sebastian Neubauer <[email protected]>

[AMDGPU][InstCombine] Remove zero LOD bias

If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Rev

[AMDGPU][InstCombine] Remove zero LOD bias

If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Revision: https://reviews.llvm.org/D116042

show more ...


# 0530fdbb 20-Dec-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Fix LOD bias in A16 combine

As the codegen fix in D111754, the LOD bias needs to be converted to 16
bits. Fix this in the combine.

Differential Revision: https://reviews.llvm.org/D116038


# 45f16eab 13-Dec-2021 Matt Arsenault <[email protected]>

AMDGPU: Combine is.shared/is.private of null/undef


Revision tags: llvmorg-13.0.1-rc1
# f631173d 30-Sep-2021 Kazu Hirata <[email protected]>

[llvm] Migrate from arg_operands to args (NFC)

Note that arg_operands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# dc6e8dfd 20-Sep-2021 Jacob Lambert <[email protected]>

[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.


Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 48958d02 23-Aug-2021 Daniil Fukalov <[email protected]>

[NFC][AMDGPU] Reduce includes dependencies.

1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `G

[NFC][AMDGPU] Reduce includes dependencies.

1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
and `R600TargetMachine::getSubtargetImpl()` had different return value type
than base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596

show more ...


# 3f4d00bc 18-Aug-2021 Arthur Eubanks <[email protected]>

[NFC] More get/removeAttribute() cleanup


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# 560d7e04 20-Jan-2021 dfukalov <[email protected]>

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036


Revision tags: llvmorg-11.1.0-rc1
# 6a87e9b0 25-Dec-2020 dfukalov <[email protected]>

[NFC][AMDGPU] Reduce include files dependency.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813


# 0e219b64 03-Jan-2021 Kazu Hirata <[email protected]>

[Target] Construct SmallVector with iterator ranges (NFC)


# 9b296102 29-Dec-2020 Juneyoung Lee <[email protected]>

Use unary CreateShuffleVector if possible

As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleV

Use unary CreateShuffleVector if possible

As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# c7afb698 16-Dec-2020 Piotr Sobczak <[email protected]>

[AMDGPU] Avoid calling copyFastMathFlags in wrong context

Calling Instruction::copyFastMathFlags() assumes the caller is
FPMathOperator. Avoid calling the function for instructions
that are not inst

[AMDGPU] Avoid calling copyFastMathFlags in wrong context

Calling Instruction::copyFastMathFlags() assumes the caller is
FPMathOperator. Avoid calling the function for instructions
that are not instances of FPMathOperator.

show more ...


Revision tags: llvmorg-11.0.1-rc1
# 958130df 23-Oct-2020 Jay Foad <[email protected]>

[AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy

This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.

Differ

[AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy

This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.

Differential Revision: https://reviews.llvm.org/D90028

show more ...


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6
# 86a480e9 06-Oct-2020 Jay Foad <[email protected]>

[AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy

Differential Revision: https://reviews.llvm.org/D88955


Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4
# 20e9c36c 26-Sep-2020 Fangrui Song <[email protected]>

Internalize functions from various tools. NFC

And internalize some classes if I noticed them:)


Revision tags: llvmorg-11.0.0-rc3
# f0268121 17-Sep-2020 Simon Pilgrim <[email protected]>

InstCombiner.h - remove unnecessary KnownBits.h include. NFCI.

Move the include down to cpp files with an implicit dependency.


Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1
# 833b3b0d 23-Jul-2020 Sebastian Neubauer <[email protected]>

[AMDGPU] Add v3f16/v3i16 support to SDag

Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel

[AMDGPU] Add v3f16/v3i16 support to SDag

Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420

show more ...


Revision tags: llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# b8d19947 04-Jun-2020 Sebastian Neubauer <[email protected]>

[AMDGPU] Add A16/G16 to InstCombine

When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target

[AMDGPU] Add A16/G16 to InstCombine

When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.

An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.

Differential Revision: https://reviews.llvm.org/D85887

show more ...


12