AMDGPUCallLowering.cpp - OpenGrok history log for /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 0387da6f	20-Jul-2022	Kazu Hirata <[email protected]>	Use value instead of getValue (NFC)
# 41ae78ea	20-Jul-2022	Kazu Hirata <[email protected]>	Use has_value instead of hasValue (NFC)
# 3a205977	19-Jul-2022	Jon Chesterfield <[email protected]>	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060
Revision tags: llvmorg-14.0.6
# d154d0ac	20-Jun-2022	Guillaume Chatelet <[email protected]>	[NFC] Simplify code
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# bc78c099	04-May-2022	Jon Chesterfield <[email protected]>	[amdgpu] Elide module lds allocation in kernels with no callees Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block as before if the attribute is missing or has its default value of true. Patch uses the new attribute to detect the simplest possible instance of this, where a kernel makes no calls and thus cannot call any functions that use LDS. Tests updated to match, coverage was already good. The interesting case is in lower-module-lds-offsets, where annotating the kernel allows the backend to pick a different (in this case better) variable ordering than previously. A later patch will avoid moving kernel variables into module.lds when the kernel can have this attribute, allowing optimal ordering and locally unused variable elimination. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122091
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# bcbd4cf1	20-Mar-2022	Jon Chesterfield <[email protected]>	Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.
# 98e474c1	19-Mar-2022	Jon Chesterfield <[email protected]>	[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal
# 989f1c72	15-Mar-2022	serge-sans-paille <[email protected]>	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# a278250b	10-Mar-2022	Nico Weber <[email protected]>	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
# 7f230fee	07-Mar-2022	serge-sans-paille <[email protected]>	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169
# 0f20a35b	09-Mar-2022	Changpeng Fang <[email protected]>	AMDGPU: Set up User SGPRs for queue_ptr only when necessary Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In the current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if the target supports getDoorbellID, queue_ptr is also not necessary. Further, code object version 5 introduces a new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, so we remove queue_ptr support for llvm.debugtrap in the backend. Reviewers: sameerds, arsenm Fixes: SWDEV-307189 Differential Revision: https://reviews.llvm.org/D119762
# 04fff547	07-Mar-2022	Venkata Ramanaiah Nalamothu <[email protected]>	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvement of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# f482e869	18-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders I don't think this is actually defined for mesa, but this is what we were doing on the DAG path.
# 99e8e173	14-Jan-2022	Matt Arsenault <[email protected]>	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit a97e20a3a8a58be751f023e610758310d5664562.
Revision tags: llvmorg-13.0.1-rc2
# 7f26a102	12-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences Arbitrary stack pointers are accessed using MUBUF instructions with the voffset field, which is interpreted as the swizzled address. We want to fold into the MUBUF form to use the SP in the SGPR offset, and previously we were special casing the interpretation of the pointer value if the access memory operand said it was relative to the stack pointer. 690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved the DAG path to special casing copies from SGPRs. This is not an entirely sound approach, since it's still changing the interpretation of pointer values based on the context. Introduce a new pseudo which corresponds to the wave-to-vector address transform. This way the memory instruction has consistent semantics where the incoming pointer is always interpreted as a vector address, and we're not obligated to optimize into the MUBUF offset-only addressing mode. The DAG should probably have an equivalent pseudo. This should fix some correctness issues, and folding this into addressing modes will be a future optimization patch.
# dc2457c8	15-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU: Fix crashing on calls to C functions from graphics contexts If we had one of the shader calling conventions calling a default calling convention callee, this would crash when the caller did not have anything to pass to the workitem ID. This is illegal, but we still need to produce something sensible. llvm-reduce likes to replace calls to intrinsics with calls to null or undef, so this does appear and is helpful to avoid hard erroring. Pass undef in this case, as already happened for the other implicit arguments. It might make sense to define the behavior here and pass null for the pointers, and -1 for the workitem ID. We do have extra bits in the workitem ID, so this wouldn't conflict with a valid value.
# a6f49423	09-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size If we know we aren't using a component from the kernel, we can save a few bit packing instructions. We're still enabling the VGPR input to the kernel though.
# 09b53296	22-Dec-2021	Ron Lieberman <[email protected]>	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3. Failed amdgpu runtime buildbot # 3514
# 9075009d	22-Dec-2021	RamNalamothu <[email protected]>	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvement of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652
# 26924b57	13-Dec-2021	Neubauer, Sebastian <[email protected]>	[AMDGPU] Ignore special ABI registers for graphics Fixed ABI arguments are compute specific and should not be added to graphics shaders or functions, so do not try to add them. Differential Revision: https://reviews.llvm.org/D115344
# d395befa	11-Dec-2021	Kazu Hirata <[email protected]>	[llvm] Use range-based for loops (NFC)
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 06b90175	14-Aug-2021	Matt Arsenault <[email protected]>	AMDGPU: Remove fixed function ABI option
# 729bf9b2	14-Aug-2021	Matt Arsenault <[email protected]>	AMDGPU: Enable fixed function ABI by default Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway. Also have the option stop implying all inputs need to be passed. This will now rely on the amdgpu-no-* attributes to avoid passing unnecessary values.
# 76cbe622	25-Oct-2021	Thomas Symalla <[email protected]>	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unnecessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637
# fd1cfc90	28-Oct-2021	Sebastian Neubauer <[email protected]>	[AMDGPU][GlobalISel] Fix waterfall loops - Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks. Differential Revision: https://reviews.llvm.org/D109052