AMDGPU.h - OpenGrok history log for /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPU.h

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# d1af09ad	23-Jun-2022	Joe Nash <[email protected]>	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a [AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656 show more ...
# 0f94d2b3	30-Jun-2022	Jay Foad <[email protected]>	[AMDGPU] GFX11: automatically release VGPRs at the end of the shader GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader [AMDGPU] GFX11: automatically release VGPRs at the end of the shader GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs. Differential Revision: https://reviews.llvm.org/D128442 show more ...
Revision tags: llvmorg-14.0.6
# cfb7ffde	21-Jun-2022	Jay Foad <[email protected]>	[AMDGPU] New AMDGPUInsertDelayAlu pass Differential Revision: https://reviews.llvm.org/D128270
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3
# 6ddf2a82	27-Apr-2022	Ivan Kosarev <[email protected]>	[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling. As older waves execute long sequences of VALU instructions, this may prevent younger waves from address calculation an [AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling. As older waves execute long sequences of VALU instructions, this may prevent younger waves from address calculation and then issuing their VMEM loads, which in turn leads the VALU unit to idle. This patch tries to prevent this by temporarily raising the wave's priority. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D124246 show more ...
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 203a1e36	17-Dec-2021	Matt Arsenault <[email protected]>	Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" This reverts commit 8a85be807bd453eb9c88d0126c75fd5ea393f60d. The unrelated failure this exposed was fixed.
# e188aae4	31-Jan-2022	serge-sans-paille <[email protected]>	Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avo Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden ehader dependencies, something the LLVM codebase doesn't do that well :-/ I've tried to summarize the biggest change below: - llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l before: 6400831 after: 6189948 200k lines less to process is no that bad ;-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D118652 show more ...
# 8a85be80	16-Dec-2021	Ron Lieberman <[email protected]>	Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" Offload abort in Nekbone This reverts commit 2b4876157562bc76e86f193d371348993905bc61.
# 2b487615	14-Dec-2021	Matt Arsenault <[email protected]>	AMDGPU: Remove AMDGPUFixFunctionBitcasts pass This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls AMDGPU: Remove AMDGPUFixFunctionBitcasts pass This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls are supposed to work now, so drop the hack. show more ...
Revision tags: llvmorg-13.0.1-rc1
# 9cf995be	08-Oct-2021	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Promote generic pointer kernel arguments into global The new pass walks kernel's pointer arguments, then loads from them. If a loaded value is a pointer and loaded pointer is unmodified in [AMDGPU] Promote generic pointer kernel arguments into global The new pass walks kernel's pointer arguments, then loads from them. If a loaded value is a pointer and loaded pointer is unmodified in the kernel before the load, then promote loaded pointer to global. Then recursively continue. Differential Revision: https://reviews.llvm.org/D111464 show more ...
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 4c1023b4	14-Sep-2021	Jacob Lambert <[email protected]>	[AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files. Testing workflow as a new contribu [AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files. Testing workflow as a new contributor. Differential Revision: https://reviews.llvm.org/D109733 show more ...
Revision tags: llvmorg-13.0.0-rc2
# 48958d02	23-Aug-2021	Daniil Fukalov <[email protected]>	[NFC][AMDGPU] Reduce includes dependencies. 1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `G [NFC][AMDGPU] Reduce includes dependencies. 1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596 show more ...
# 5173854f	06-Aug-2021	Reshabh Sharma <[email protected]>	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer w [AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682 show more ...
# dce35ef1	04-Aug-2021	Reshabh Sharma <[email protected]>	Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list" This reverts commit d42e70b3d315645e37f3b1455d39e68678e69525.
# d42e70b3	04-Aug-2021	Reshabh Sharma <[email protected]>	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer w [AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682 show more ...
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4
# 96709823	27-Jun-2021	Kuter Dinel <[email protected]>	[AMDGPU] Deduce attributes with the Attributor This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: htt [AMDGPU] Deduce attributes with the Attributor This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: https://reviews.llvm.org/D104997 show more ...
# 2b08f6af	19-Jul-2021	Sebastian Neubauer <[email protected]>	[AMDGPU] Improve register computation for indirect calls First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls. [AMDGPU] Improve register computation for indirect calls First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls. This is more accurate than guessing the maximum register usage without looking at the actual usage. As before, assume that indirect calls will hit a function in the current module. Differential Revision: https://reviews.llvm.org/D105839 show more ...
# 381ded34	28-Jun-2021	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1 like now RA cannot [AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1 like now RA cannot rematerizalize a 64 bit register. This gives 10-20% uplift in a set of huge apps heavily using double precession math. Fixes: SWDEV-292645 Differential Revision: https://reviews.llvm.org/D104874 show more ...
Revision tags: llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 208332de	19-Apr-2021	Ruiling Song <[email protected]>	[AMDGPU] Add Optimize VGPR LiveRange Pass. This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example: def(a) if(cond) use(a) ... // A else use(a) As [AMDGPU] Add Optimize VGPR LiveRange Pass. This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example: def(a) if(cond) use(a) ... // A else use(a) As AMDGPU access vgpr with respect to active-mask, we can mark `a` as dead in region A. For details, please refer to the comments in implementation file. The pass is enabled by default, the frontend can disable it through "-amdgpu-opt-vgpr-liverange=false". Differential Revision: https://reviews.llvm.org/D102212 show more ...
# 80fd5fa5	21-Jun-2021	hsmahesha <[email protected]>	[AMDGPU] Replace non-kernel function uses of LDS globals by pointers. The main motivation behind pointer replacement of LDS use within non-kernel functions is - to avoid subsequent LDS lowering pa [AMDGPU] Replace non-kernel function uses of LDS globals by pointers. The main motivation behind pointer replacement of LDS use within non-kernel functions is - to avoid subsequent LDS lowering pass from directly packing LDS (assume large LDS) into a struct type which would otherwise cause allocating huge memory for struct instance within every kernel. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103225 show more ...
# caf1294d	26-Apr-2021	Baptiste Saleil <[email protected]>	[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pas [AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pass and its associated test cases from the tree. Differential Revision: https://reviews.llvm.org/D101313 Change-Id: I0599169a7609c19a887f8d847a71e664030cc141 show more ...
# d5d412f2	07-Apr-2021	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Split GCNRegBankReassign Allow pass to work separately with SGPR, VGPR registers or both. This is NFC now but will be needed to split RA for separate SGPR and VGPR passes. Differential Rev [AMDGPU] Split GCNRegBankReassign Allow pass to work separately with SGPR, VGPR registers or both. This is NFC now but will be needed to split RA for separate SGPR and VGPR passes. Differential Revision: https://reviews.llvm.org/D100063 show more ...
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5
# fdc4f19e	01-Apr-2021	Jay Foad <[email protected]>	[AMDGPU] Remove SIAddIMGInit pass which is now unused Differential Revision: https://reviews.llvm.org/D99748
Revision tags: llvmorg-12.0.0-rc4
# fe5f4c39	20-Mar-2021	Carl Ritson <[email protected]>	[AMDGPU] Rename SIInsertSkips Pass Pass no longer handles skips. Pass now removes unnecessary unconditional branches and lowers early termination branches. Hence rename to SILateBranchLowering. Mo [AMDGPU] Rename SIInsertSkips Pass Pass no longer handles skips. Pass now removes unnecessary unconditional branches and lowers early termination branches. Hence rename to SILateBranchLowering. Move code to handle returns to epilog from SIPreEmitPeephole into SILateBranchLowering. This means SIPreEmitPeephole only contains optional optimisations, and all required transforms are in SILateBranchLowering. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98915 show more ...
# 5df2af8b	20-Mar-2021	Carl Ritson <[email protected]>	[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole SIRemoveShortExecBranches is an optimisation so fits well in the context of SIPreEmitPeephole. Test changes relate to early terminati [AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole SIRemoveShortExecBranches is an optimisation so fits well in the context of SIPreEmitPeephole. Test changes relate to early termination from kills which have now been lowered prior to considering branches for removal. As these use s_cbranch the execz skips are now retained instead. Currently either behaviour is valid as kill with EXEC=0 is a nop; however, if early termination is used differently in future then the new behaviour is the correct one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D98917 show more ...
# 13e49dce	15-Mar-2021	Jon Chesterfield <[email protected]>	[amdgpu] Implement lower function LDS pass [amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kern [amdgpu] Implement lower function LDS pass [amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648 show more ...
12 3 4 5 6 7