|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use when lowering LDS variables to different addresses in different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsics already, so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
In the general case it is necessary to put variables at different addresses so that they can be compactly allocated, and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which the kernel writes an integer (in this implementation, based on metadata associated with that kernel), which is then passed on to indirect call sites, is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
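For illustration only, a minimal sketch (not from the patch; the table name and layout are assumed) of how code reached through an indirect call could use the kernel id to resolve a variable's per-kernel LDS address:

```cpp
// Minimal sketch, not from the patch: assume the compiler emits a constant
// table mapping kernel id -> LDS offset for each variable that needed
// per-kernel placement. The name __lds_offset_table is hypothetical.
extern const unsigned __lds_offset_table[];

unsigned lookupLDSOffset(unsigned KernelId) {
  // The kernel prologue writes the kernel's id into a reserved SGPR;
  // llvm.amdgcn.lds.kernel.id reads it back, so code reached through an
  // indirect call can index the table to find where the variable lives.
  return __lds_offset_table[KernelId];
}
```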
|
| #
d1af09ad |
| 23-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
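As a rough illustration of the bank constraint, a minimal sketch assuming a scheme in which a VGPR's bank is its register number modulo 4 (the pass implements the authoritative rules, which cover more operand kinds):

```cpp
// Minimal sketch of the VOPD source-bank constraint. The modulo-4 bank
// assignment is an assumption for illustration only.
constexpr unsigned bankOf(unsigned VGPRNum) { return VGPRNum % 4; }

bool srcBanksCompatible(unsigned Src0X, unsigned Src0Y) {
  // Matching operands of the X and Y components must read from different
  // banks for the combined instruction to be a legal VOPD.
  return bankOf(Src0X) != bankOf(Src0Y);
}
```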
|
| #
0f94d2b3 |
| 30-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs.
Differential Revision: https://reviews.llvm.org/D128442
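A hedged sketch of the insertion point (the message immediate is a placeholder; the real pass uses the proper MSG_DEALLOC_VGPRS encoding):

```cpp
#include "llvm/CodeGen/MachineInstrBuilder.h"

// Sketch only: send a "deallocate VGPRs" message just before s_endpgm.
// DeallocVGPRsImm is a placeholder for the real MSG_DEALLOC_VGPRS encoding.
void insertVGPRDealloc(llvm::MachineBasicBlock &MBB,
                       const llvm::SIInstrInfo *TII, int64_t DeallocVGPRsImm) {
  for (llvm::MachineInstr &MI : MBB) {
    if (MI.getOpcode() == llvm::AMDGPU::S_ENDPGM) {
      llvm::BuildMI(MBB, MI, MI.getDebugLoc(),
                    TII->get(llvm::AMDGPU::S_SENDMSG))
          .addImm(DeallocVGPRsImm);
      break;
    }
  }
}
```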
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
cfb7ffde |
| 21-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] New AMDGPUInsertDelayAlu pass
Differential Revision: https://reviews.llvm.org/D128270
|
| #
b5818e4e |
| 24-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Cluster stores as well as loads for GFX11
Differential Revision: https://reviews.llvm.org/D128517
|
| #
7a47ee51 |
| 21-Jun-2022 |
Kazu Hirata <[email protected]> |
[llvm] Don't use Optional::getValue (NFC)
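For context, the migration looks like this (illustrative; llvm::Optional of that era had both spellings, with getValue() on its way out in favor of the std::optional-style API):

```cpp
// Illustrative only: llvm::Optional::getValue() was being retired in favor
// of value() / operator*, mirroring std::optional.
llvm::Optional<unsigned> MaybeReg = findReg(); // findReg() is hypothetical
unsigned Old = MaybeReg.getValue(); // old spelling, removed by this cleanup
unsigned New = *MaybeReg;           // preferred spelling
```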
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
48ebc1af |
| 03-Jun-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Add more expressive sched_barrier controls
The sched_barrier builtin allows the scheduler's behavior to be shaped by users when very specific codegen is needed in order to create highly optimized code. This patch adds more granular control over the types of instructions that are allowed to be reordered with respect to one or multiple sched_barriers. A mask is used to specify groups of instructions that should be allowed to be scheduled around a sched_barrier. Details of how this mask may be used can be found in llvm/include/llvm/IR/IntrinsicsAMDGPU.td.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D127123
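For illustration, one way the builtin is used (the mask semantics shown are assumptions; IntrinsicsAMDGPU.td is authoritative):

```cpp
// Illustrative use; mask bit meanings live in IntrinsicsAMDGPU.td. A mask
// of 0 is assumed to mean "no instructions may be scheduled across".
void kernelBody() {
  // ... instructions the scheduler may rearrange freely ...
  __builtin_amdgcn_sched_barrier(0); // assumed: block all reordering here
  // ... instructions that must stay on this side of the barrier ...
}
```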
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3 |
|
| #
0e1c71e4 |
| 27-Apr-2022 |
Matt Arsenault <[email protected]> |
CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine
Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function.
Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.
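A simplified sketch of the relocated hook (class name assumed):

```cpp
#include "llvm/Target/TargetMachine.h"

// Sketch, simplified: the query now lives on TargetMachine, so answering it
// no longer requires a subtarget or TargetInstrInfo.
class MyTargetMachine : public llvm::LLVMTargetMachine {
public:
  unsigned getAddressSpaceForPseudoSourceKind(unsigned Kind) const override {
    // Map pseudo-source kinds (stack, constant pool, ...) to the address
    // space the target uses for them; 0 is a placeholder default.
    return 0;
  }
};
```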
|
| #
2e61dfb1 |
| 16-May-2022 |
jeff <[email protected]> |
[AMDGPU] Instruction Type Pipeline
This patch implements a DAG mutation which adds edges between different groups of instructions. The purpose is to try to generate code that conforms to a pipeline (groupA instructions occur before groupB, groupB before groupC, and so on). Currently the pipeline order is hardcoded as VMEM->DSRead->MFMA->DSWrite, but the patch was designed to be easily extensible. Alias analysis is problematic for pipelining, as memory instructions usually cannot be reordered with respect to one another.
Differential Revision: https://reviews.llvm.org/D125997
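A compressed sketch of the mutation's shape (helper names assumed; the real pass classifies instructions and adds artificial edges between the stage groups):

```cpp
#include "llvm/CodeGen/ScheduleDAGInstrs.h"

// Sketch with assumed helpers: order the DAG into pipeline stages by adding
// artificial edges from every instruction of one stage to every instruction
// of the next (VMEM -> DSRead -> MFMA -> DSWrite in the patch).
struct PipelineOrderMutation : llvm::ScheduleDAGMutation {
  int stageOf(const llvm::SUnit &SU); // assumed: maps an SU to its stage

  void apply(llvm::ScheduleDAGInstrs *DAG) override {
    for (llvm::SUnit &Earlier : DAG->SUnits)
      for (llvm::SUnit &Later : DAG->SUnits)
        if (stageOf(Earlier) + 1 == stageOf(Later))
          DAG->addEdge(&Later, llvm::SDep(&Earlier, llvm::SDep::Artificial));
  }
};
```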
|
| #
f822db76 |
| 26-Apr-2022 |
jeff <[email protected]> |
[AMDGPU] Allow for MFMA Inst Clustering
This patch adds cluster edges between independent MFMA instructions. Additionally, it propagates all predecessors of clustered instructions to the root of the cluster(s), and all successors to the leaf (or leaves) of the cluster(s) -- this is done to remove the possibility that those instructions will be interspersed within the cluster.
Reviewed By: kerbowa
Differential Revision: https://reviews.llvm.org/D124678
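Conceptually (names assumed; this is not the patch itself), clustering chains consecutive independent MFMAs with Cluster edges:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/ScheduleDAGInstrs.h"

// Sketch with assumed inputs: chain independent MFMA SUnits with cluster
// edges so the scheduler tries to keep them adjacent.
void clusterMFMAs(llvm::ScheduleDAGInstrs *DAG,
                  llvm::ArrayRef<llvm::SUnit *> MFMAs) {
  for (size_t I = 1; I < MFMAs.size(); ++I)
    if (!MFMAs[I]->isPred(MFMAs[I - 1])) // only cluster independent insts
      DAG->addEdge(MFMAs[I], llvm::SDep(MFMAs[I - 1], llvm::SDep::Cluster));
}
```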
|
| #
3ff8ee24 |
| 28-Apr-2022 |
jeff <[email protected]> |
[NFC] Fix typo
Reviewed By: kerbowa
Differential Revision: https://reviews.llvm.org/D124647
|
| #
6ddf2a82 |
| 27-Apr-2022 |
Ivan Kosarev <[email protected]> |
[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.
As older waves execute long sequences of VALU instructions, they may prevent younger waves from calculating addresses and issuing their VMEM loads, which in turn leaves the VALU unit idle. This patch tries to prevent this by temporarily raising the wave's priority.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D124246
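The mechanism, sketched here with assumed priority immediates, brackets the address-calculation and VMEM-issue region with priority changes:

```cpp
#include "llvm/CodeGen/MachineInstrBuilder.h"

// Sketch (priority immediates assumed): raise the wave's priority while it
// computes addresses and issues VMEM loads, then drop back to the default.
void bracketWithPriority(llvm::MachineBasicBlock &MBB,
                         llvm::MachineBasicBlock::iterator Begin,
                         llvm::MachineBasicBlock::iterator End,
                         const llvm::SIInstrInfo *TII) {
  llvm::BuildMI(MBB, Begin, llvm::DebugLoc(),
                TII->get(llvm::AMDGPU::S_SETPRIO)).addImm(3); // assumed max
  llvm::BuildMI(MBB, End, llvm::DebugLoc(),
                TII->get(llvm::AMDGPU::S_SETPRIO)).addImm(0); // assumed default
}
```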
|
|
Revision tags: llvmorg-14.0.2 |
|
| #
987df725 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Serialize VGPRForAGPRCopy
|
| #
5cd17f9d |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Serialize WWM registers
|
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
203a1e36 |
| 17-Dec-2021 |
Matt Arsenault <[email protected]> |
Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"
This reverts commit 8a85be807bd453eb9c88d0126c75fd5ea393f60d.
The unrelated failure this exposed was fixed.
|
| #
1235aaef |
| 06-Apr-2022 |
Craig Topper <[email protected]> |
[AArch64][AMDGPU][WebAssembly] Use static_cast instead of a reinterpret_cast to downcast in parseMachineFunctionInfo. NFC
static_cast is a little safer here since the compiler will ensure we're casting to a class derived from yaml::MachineFunctionInfo.
I believe this first appeared on AMDGPU and was copied to the other two targets.
Spotted when it was being copied to RISCV in D123178.
Differential Revision: https://reviews.llvm.org/D123260
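The pattern in question, illustrated with AMDGPU's derived YAML type (the surrounding function is simplified):

```cpp
// Illustrative: the downcast in parseMachineFunctionInfo. static_cast is
// checked at compile time to require a class derived from
// yaml::MachineFunctionInfo; reinterpret_cast would accept any type.
bool parse(const llvm::yaml::MachineFunctionInfo &MFI) {
  const auto &YamlMFI =
      static_cast<const llvm::yaml::SIMachineFunctionInfo &>(MFI);
  (void)YamlMFI; // ... initialize the target's MFI from the YAML fields ...
  return false;
}
```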
|
| #
991dc4b4 |
| 15-Mar-2022 |
Pavel Labath <[email protected]> |
Remove a top-level "using namespace" in TargetTransformInfoImpl.h
Avoids polluting the namespace of all files including the header.
|
| #
ed98c1b3 |
| 09-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
|
| #
c4b1a63a |
| 25-Feb-2022 |
Jameson Nash <[email protected]> |
mark getTargetTransformInfo and getTargetIRAnalysis as const
Seems like this can be const, since Passes shouldn't modify it.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D120518
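The change is mechanical; a simplified sketch of the signature involved:

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"

// Simplified sketch: passes only read the TargetMachine through these
// accessors, so they can be const-qualified.
class SketchTargetMachine {
public:
  llvm::TargetTransformInfo
  getTargetTransformInfo(const llvm::Function &F) const; // was non-const
};
```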
|
| #
8a85be80 |
| 16-Dec-2021 |
Ron Lieberman <[email protected]> |
Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"
Offload abort in Nekbone
This reverts commit 2b4876157562bc76e86f193d371348993905bc61.
|
| #
2b487615 |
| 14-Dec-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove AMDGPUFixFunctionBitcasts pass
This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls are supposed to work now, so drop the hack.
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
06b90175 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove fixed function ABI option
|
| #
729bf9b2 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway.
Also have the option stop implying all inputs need to be passed. This will now rely on the amdgpu-no-* attributes to avoid passing unnecessary values.
|
| #
bf225939 |
| 15-Oct-2021 |
Michael Liao <[email protected]> |
[InferAddressSpaces] Support assumed addrspaces from addrspace predicates.
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, that breaks portability between Clang and NVCC.
- This change proposes to infer a pointer's address space from assumptions built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,
```
foo(float *p) {
  __builtin_assume(__isGlobal(p));
  // From there, we could assume p is a global pointer instead of a
  // generic one.
}
```
This makes the code portable without introducing implementation-specific features.
Note that NVCC starts to support __builtin_assume from version 11.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112041
|
| #
58e7ec47 |
| 22-Oct-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Run SIShrinkInstructions before post-RA scheduling
Run post-RA SIShrinkInstructions just before post-RA scheduling, instead of afterwards. After the fixes in D112305 and D112317 this seems to make no difference, but it paves the way for scheduler tweaks that are sensitive to the e32 vs e64 encoding of VALU instructions.
Differential Revision: https://reviews.llvm.org/D112341
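In pass-pipeline terms, a simplified sketch of the reordering (hook per the usual GCNPassConfig layout; the pass ID name is assumed):

```cpp
// Simplified sketch: run the shrink pass before post-RA scheduling so the
// scheduler sees the final e32/e64 encodings of VALU instructions.
void GCNPassConfig::addPreSched2() {
  addPass(&SIShrinkInstructionsID); // assumed legacy pass ID name
}
```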
|