Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6

445a483b | 13-Jun-2022 | Jay Foad
[AMDGPU] Add new GFX11 intrinsic llvm.amdgcn.exp.row
Differential Revision: https://reviews.llvm.org/D127671

bfcfd53b | 13-Jun-2022 | Jay Foad
[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane.
Also use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D127662
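The lane movement can be modelled in a few lines of C++. This is a sketch, not LLVM code: it assumes a fully active wave64 and the GFX11 behaviour of swapping the lower and upper 32 lanes (lane i reads lane i ^ 32). `permlane64` and `addAcrossHalves` are hypothetical helper names; the second one only illustrates the kind of cross-half step a reduction such as the atomic optimizer's can use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A wave64 register: one 32-bit value per lane (all lanes assumed active).
constexpr int kWaveSize = 64;
using Wave = std::array<uint32_t, kWaveSize>;

// Hypothetical model of permlane64: lane i receives the value held by lane
// i ^ 32, i.e. the two 32-lane halves are swapped. The model has no exec
// mask, so the BC, FI and OLD operands that permlane16 takes have no
// counterpart here, and every lane is written.
Wave permlane64(const Wave &src) {
  Wave dst{};
  for (int lane = 0; lane < kWaveSize; ++lane)
    dst[lane] = src[lane ^ 32];
  return dst;
}

// Hypothetical final reduction step: once each half holds its own partial
// sum, adding a lane's value to its permlane64 partner gives the
// whole-wave sum in every lane.
uint32_t addAcrossHalves(const Wave &halfSums, int lane) {
  const Wave swapped = permlane64(halfSums);
  return halfSums[lane] + swapped[lane];
}
```

Applying the permutation twice restores the original wave, which is a quick sanity check on the model.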

Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2

2417de27 | 25-Apr-2022 | Mariusz Sikora
[AMDGPU] Use d16 flag for image.sample instructions
The image.sample instruction can be forced to return half instead of float when the d16 flag is enabled.
This patch adds a new pattern to InstCombine that detects when the output of image.sample is used only by fptrunc instructions converting float to half. If the pattern is detected, the fptrunc and image.sample are combined into a single image.sample that returns half. Later, during lowering, the d16 flag is added to the image sample intrinsic.
Differential Revision: https://reviews.llvm.org/D124232
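The match condition can be sketched as follows. `Opcode`, `Type` and `User` are hypothetical stand-ins for LLVM's IR classes, so this shows only the shape of the check, assuming (as the message states) that every use of the float result must be an fptrunc to half.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the users of an
// image.sample result; these are not LLVM classes.
enum class Opcode { FPTrunc, FAdd, Store };
enum class Type { Float, Half };

struct User {
  Opcode op;
  Type resultType;
};

// The fold applies only when every use of the float result is an fptrunc
// to half; a single other use would still need the full-precision value.
bool canFoldToD16(const std::vector<User> &users) {
  return std::all_of(users.begin(), users.end(), [](const User &u) {
    return u.op == Opcode::FPTrunc && u.resultType == Type::Half;
  });
}
```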

c6afbdb5 | 25-Apr-2022 | Piotr Sobczak
Revert "[AMDGPU] Use d16 flag for image.sample instructions"
This reverts commit d1762fc454c0d7ee0bcffe87e798f67b6c43c1d2.
Reverting D124232 as the buildbot reported some errors in sanitizers.

d1762fc4 | 25-Apr-2022 | Mariusz Sikora
[AMDGPU] Use d16 flag for image.sample instructions
The image.sample instruction can be forced to return half instead of float when the d16 flag is enabled.
This patch adds a new pattern to InstCombine that detects when the output of image.sample is used only by fptrunc instructions converting float to half. If the pattern is detected, the fptrunc and image.sample are combined into a single image.sample that returns half. Later, during lowering, the d16 flag is added to the image sample intrinsic.
Differential Revision: https://reviews.llvm.org/D124232

Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init

4ed7c6ee | 24-Jan-2022 | Sebastian Neubauer
[AMDGPU] Only match correct type for a16
Addresses are floats when a sampler is present and unsigned integers when no sampler is present.
Therefore, only zext instructions should match, not sext instructions.
Also match integer constants that can be truncated.
Differential Revision: https://reviews.llvm.org/D118043
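A small C++ sketch of why sext must not match. It assumes that, without a sampler, a narrowed 16-bit address is read back as an unsigned value; replacing an extension by its 16-bit source is then only safe when that unsigned reading reproduces the extended value. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling 16-bit address operands (not LLVM code).
uint32_t zext16(int16_t x) { return static_cast<uint16_t>(x); }
uint32_t sext16(int16_t x) {
  return static_cast<uint32_t>(static_cast<int32_t>(x));
}

// Assumption: an a16 integer address is read as an unsigned 16-bit value,
// so the 32-bit address it stands for is its zero extension.
uint32_t readA16(uint16_t operand) { return operand; }

// An extended address may be narrowed to its 16-bit source only if reading
// that source back as an a16 operand reproduces the extended value.
bool safeToNarrow(uint32_t extended, int16_t source) {
  return readA16(static_cast<uint16_t>(source)) == extended;
}

// Integer constants qualify when they fit in 16 unsigned bits.
bool constantFitsA16(uint32_t c) { return c <= UINT16_MAX; }
```

For a negative source such as -1, sext produces 0xFFFFFFFF while the 16-bit operand reads back as 65535, so only the zext form round-trips.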

80532ebb | 24-Jan-2022 | Sebastian Neubauer
[AMDGPU][InstCombine] Remove zero image offset
Remove the offset parameter if it is zero.
Differential Revision: https://reviews.llvm.org/D117876

Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2

603d1803 | 21-Dec-2021 | Sebastian Neubauer
[AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction. Also copy other image optimizations (l->lz, mip->nomip) to IR combines.
Differential Revision: https://reviews.llvm.org/D116042
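The three simplifications can be pictured as a rewrite on a toy call descriptor. `ImageCall` and `simplify` are hypothetical; the opcode strings only loosely mirror the `llvm.amdgcn.image.*` suffixes (`sample.b`, `sample.l`/`sample.lz`, `load.mip`/`load`), and treating any constant equal to zero (either sign for floats) as removable is an assumption of this sketch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical descriptor of an image intrinsic call: an opcode string plus
// the constant value of the optional operand that opcode carries.
struct ImageCall {
  std::string opcode;           // "sample", "sample_b", "sample_l", "sample_lz", "load", "load_mip"
  std::optional<float> bias;    // used by sample_b
  std::optional<float> lod;     // used by sample_l
  std::optional<unsigned> mip;  // used by load_mip
};

// Drop an operand that is known to be zero by switching to the variant of
// the opcode that does not take it.
ImageCall simplify(ImageCall c) {
  if (c.opcode == "sample_b" && c.bias && *c.bias == 0.0f) {
    c.opcode = "sample";     // zero LOD bias: plain sample
    c.bias.reset();
  } else if (c.opcode == "sample_l" && c.lod && *c.lod == 0.0f) {
    c.opcode = "sample_lz";  // explicit LOD of zero: l -> lz
    c.lod.reset();
  } else if (c.opcode == "load_mip" && c.mip && *c.mip == 0) {
    c.opcode = "load";       // mip level zero: mip -> nomip
    c.mip.reset();
  }
  return c;
}
```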

0530fdbb | 20-Dec-2021 | Sebastian Neubauer
[AMDGPU] Fix LOD bias in A16 combine
As with the codegen fix in D111754, the LOD bias needs to be converted to 16 bits. Fix this in the combine.
Differential Revision: https://reviews.llvm.org/D116038

45f16eab | 13-Dec-2021 | Matt Arsenault
AMDGPU: Combine is.shared/is.private of null/undef

Revision tags: llvmorg-13.0.1-rc1

f631173d | 30-Sep-2021 | Kazu Hirata
[llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.

Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4

dc6e8dfd | 20-Sep-2021 | Jacob Lambert
[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.

Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2

48958d02 | 23-Aug-2021 | Daniil Fukalov
[NFC][AMDGPU] Reduce includes dependencies.
1. Split out some parts of the R600 target into separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed an issue where the overrides `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had a different return type than the base class.
4. Minor forward-declaration cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108596

3f4d00bc | 18-Aug-2021 | Arthur Eubanks
[NFC] More get/removeAttribute() cleanup

Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2

560d7e04 | 20-Jan-2021 | dfukalov
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce header dependencies.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036

Revision tags: llvmorg-11.1.0-rc1

6a87e9b0 | 25-Dec-2020 | dfukalov
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813

0e219b64 | 03-Jan-2021 | Kazu Hirata
[Target] Construct SmallVector with iterator ranges (NFC)

9b296102 | 29-Dec-2020 | Juneyoung Lee
Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them.
Actually, it would have been more natural if the patches had been made in this order: (1) make these call sites use unary CreateShuffleVector first, then (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793).
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923

Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2

c7afb698 | 16-Dec-2020 | Piotr Sobczak
[AMDGPU] Avoid calling copyFastMathFlags in wrong context
Calling Instruction::copyFastMathFlags() assumes the instruction is an FPMathOperator. Avoid calling the function for instructions that are not instances of FPMathOperator.

Revision tags: llvmorg-11.0.1-rc1

958130df | 23-Oct-2020 | Jay Foad
[AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy
This follows on from D89558 which added the new intrinsic and D88955 which added similar combines for llvm.amdgcn.fmul.legacy.
Differential Revision: https://reviews.llvm.org/D90028

Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6

86a480e9 | 06-Oct-2020 | Jay Foad
[AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy
Differential Revision: https://reviews.llvm.org/D88955
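The legacy multiply rule (+/-0.0 times anything, including infinity and NaN, gives +0.0) can be modelled next to an ordinary IEEE multiply. `fmulLegacy` is a hypothetical scalar model for illustration, not the combine itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of legacy multiply semantics: if either operand is
// +/-0.0 the result is +0.0, even when the other operand is infinity or
// NaN; otherwise it is an ordinary IEEE multiply.
float fmulLegacy(float a, float b) {
  if (a == 0.0f || b == 0.0f)  // true for both +0.0 and -0.0
    return 0.0f;
  return a * b;
}
```

The model differs from `a * b` only when an operand is zero: it returns +0.0 where IEEE arithmetic gives NaN (zero times infinity or NaN) or -0.0. A combine that turns the intrinsic into a plain fmul therefore has to rule out at least the NaN case, for instance by knowing that neither operand is infinity or NaN.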

Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4

20e9c36c | 26-Sep-2020 | Fangrui Song
Internalize functions from various tools. NFC
And internalize some classes if I noticed them:)

Revision tags: llvmorg-11.0.0-rc3

f0268121 | 17-Sep-2020 | Simon Pilgrim
InstCombiner.h - remove unnecessary KnownBits.h include. NFCI.
Move the include down to cpp files with an implicit dependency.

Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1

833b3b0d | 23-Jul-2020 | Sebastian Neubauer
[AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types and enable InstCombine to emit them.
This patch only implements it for SelectionDAG. The GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.
Differential Revision: https://reviews.llvm.org/D84420

Revision tags: llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2

b8d19947 | 04-Jun-2020 | Sebastian Neubauer
[AMDGPU] Add A16/G16 to InstCombine
When sampling from images with coordinates that have only 16-bit accuracy, convert the image intrinsic call to use a16 or g16. This only happens if the target hardware supports it.
An alternative would be to always apply this combine, independent of the target hardware, and extend 16-bit arguments to 32-bit arguments during legalization. To me, this sounds like an unnecessary round trip that could prevent some further InstCombine optimizations.
Differential Revision: https://reviews.llvm.org/D85887