|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
d1af09ad |
| 23-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
show more ...
|
| #
0f94d2b3 |
| 30-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader
[AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs.
Differential Revision: https://reviews.llvm.org/D128442
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
cfb7ffde |
| 21-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] New AMDGPUInsertDelayAlu pass
Differential Revision: https://reviews.llvm.org/D128270
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3 |
|
| #
6ddf2a82 |
| 27-Apr-2022 |
Ivan Kosarev <[email protected]> |
[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.
As older waves execute long sequences of VALU instructions, this may prevent younger waves from address calculation an
[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.
As older waves execute long sequences of VALU instructions, this may prevent younger waves from address calculation and then issuing their VMEM loads, which in turn leads the VALU unit to idle. This patch tries to prevent this by temporarily raising the wave's priority.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D124246
show more ...
|
|
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
203a1e36 |
| 17-Dec-2021 |
Matt Arsenault <[email protected]> |
Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"
This reverts commit 8a85be807bd453eb9c88d0126c75fd5ea393f60d.
The unrelated failure this exposed was fixed.
|
| #
e188aae4 |
| 31-Jan-2022 |
serge-sans-paille <[email protected]> |
Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avo
Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden ehader dependencies, something the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest change below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l before: 6400831 after: 6189948
200k lines less to process is no that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
show more ...
|
| #
8a85be80 |
| 16-Dec-2021 |
Ron Lieberman <[email protected]> |
Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"
Offload abort in Nekbone
This reverts commit 2b4876157562bc76e86f193d371348993905bc61.
|
| #
2b487615 |
| 14-Dec-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove AMDGPUFixFunctionBitcasts pass
This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls
AMDGPU: Remove AMDGPUFixFunctionBitcasts pass
This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls are supposed to work now, so drop the hack.
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
9cf995be |
| 08-Oct-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Promote generic pointer kernel arguments into global
The new pass walks kernel's pointer arguments, then loads from them. If a loaded value is a pointer and loaded pointer is unmodified in
[AMDGPU] Promote generic pointer kernel arguments into global
The new pass walks kernel's pointer arguments, then loads from them. If a loaded value is a pointer and loaded pointer is unmodified in the kernel before the load, then promote loaded pointer to global. Then recursively continue.
Differential Revision: https://reviews.llvm.org/D111464
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
4c1023b4 |
| 14-Sep-2021 |
Jacob Lambert <[email protected]> |
[AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files
Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files. Testing workflow as a new contribu
[AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files
Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files. Testing workflow as a new contributor.
Differential Revision: https://reviews.llvm.org/D109733
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
48958d02 |
| 23-Aug-2021 |
Daniil Fukalov <[email protected]> |
[NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `G
[NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108596
show more ...
|
| #
5173854f |
| 06-Aug-2021 |
Reshabh Sharma <[email protected]> |
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer w
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
show more ...
|
| #
dce35ef1 |
| 04-Aug-2021 |
Reshabh Sharma <[email protected]> |
Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"
This reverts commit d42e70b3d315645e37f3b1455d39e68678e69525.
|
| #
d42e70b3 |
| 04-Aug-2021 |
Reshabh Sharma <[email protected]> |
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer w
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4 |
|
| #
96709823 |
| 27-Jun-2021 |
Kuter Dinel <[email protected]> |
[AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: htt
[AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: https://reviews.llvm.org/D104997
show more ...
|
| #
2b08f6af |
| 19-Jul-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls.
[AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls.
This is more accurate than guessing the maximum register usage without looking at the actual usage.
As before, assume that indirect calls will hit a function in the current module.
Differential Revision: https://reviews.llvm.org/D105839
show more ...
|
| #
381ded34 |
| 28-Jun-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants
This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1 like now RA cannot
[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants
This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1 like now RA cannot rematerizalize a 64 bit register.
This gives 10-20% uplift in a set of huge apps heavily using double precession math.
Fixes: SWDEV-292645
Differential Revision: https://reviews.llvm.org/D104874
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
208332de |
| 19-Apr-2021 |
Ruiling Song <[email protected]> |
[AMDGPU] Add Optimize VGPR LiveRange Pass.
This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example:
def(a) if(cond) use(a) ... // A else use(a)
As
[AMDGPU] Add Optimize VGPR LiveRange Pass.
This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example:
def(a) if(cond) use(a) ... // A else use(a)
As AMDGPU access vgpr with respect to active-mask, we can mark `a` as dead in region A. For details, please refer to the comments in implementation file.
The pass is enabled by default, the frontend can disable it through "-amdgpu-opt-vgpr-liverange=false".
Differential Revision: https://reviews.llvm.org/D102212
show more ...
|
| #
80fd5fa5 |
| 21-Jun-2021 |
hsmahesha <[email protected]> |
[AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
The main motivation behind pointer replacement of LDS use within non-kernel functions is - to *avoid* subsequent LDS lowering pa
[AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
The main motivation behind pointer replacement of LDS use within non-kernel functions is - to *avoid* subsequent LDS lowering pass from directly packing LDS (assume large LDS) into a struct type which would otherwise cause allocating huge memory for struct instance within every kernel.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103225
show more ...
|
| #
caf1294d |
| 26-Apr-2021 |
Baptiste Saleil <[email protected]> |
[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pas
[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pass and its associated test cases from the tree.
Differential Revision: https://reviews.llvm.org/D101313
Change-Id: I0599169a7609c19a887f8d847a71e664030cc141
show more ...
|
| #
d5d412f2 |
| 07-Apr-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Split GCNRegBankReassign
Allow pass to work separately with SGPR, VGPR registers or both. This is NFC now but will be needed to split RA for separate SGPR and VGPR passes.
Differential Rev
[AMDGPU] Split GCNRegBankReassign
Allow pass to work separately with SGPR, VGPR registers or both. This is NFC now but will be needed to split RA for separate SGPR and VGPR passes.
Differential Revision: https://reviews.llvm.org/D100063
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5 |
|
| #
fdc4f19e |
| 01-Apr-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Remove SIAddIMGInit pass which is now unused
Differential Revision: https://reviews.llvm.org/D99748
|
|
Revision tags: llvmorg-12.0.0-rc4 |
|
| #
fe5f4c39 |
| 20-Mar-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Rename SIInsertSkips Pass
Pass no longer handles skips. Pass now removes unnecessary unconditional branches and lowers early termination branches. Hence rename to SILateBranchLowering.
Mo
[AMDGPU] Rename SIInsertSkips Pass
Pass no longer handles skips. Pass now removes unnecessary unconditional branches and lowers early termination branches. Hence rename to SILateBranchLowering.
Move code to handle returns to epilog from SIPreEmitPeephole into SILateBranchLowering. This means SIPreEmitPeephole only contains optional optimisations, and all required transforms are in SILateBranchLowering.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D98915
show more ...
|
| #
5df2af8b |
| 20-Mar-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole
SIRemoveShortExecBranches is an optimisation so fits well in the context of SIPreEmitPeephole.
Test changes relate to early terminati
[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole
SIRemoveShortExecBranches is an optimisation so fits well in the context of SIPreEmitPeephole.
Test changes relate to early termination from kills which have now been lowered prior to considering branches for removal. As these use s_cbranch the execz skips are now retained instead. Currently either behaviour is valid as kill with EXEC=0 is a nop; however, if early termination is used differently in future then the new behaviour is the correct one.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D98917
show more ...
|
| #
13e49dce |
| 15-Mar-2021 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lower function LDS pass
[amdgpu] Implement lower function LDS pass
Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kern
[amdgpu] Implement lower function LDS pass
[amdgpu] Implement lower function LDS pass
Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset.
Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer.
This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite.
This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D94648
show more ...
|