|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
0387da6f |
| 20-Jul-2022 |
Kazu Hirata <[email protected]> |
Use value instead of getValue (NFC)
|
| #
41ae78ea |
| 20-Jul-2022 |
Kazu Hirata <[email protected]> |
Use has_value instead of hasValue (NFC)
|
| #
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
d154d0ac |
| 20-Jun-2022 |
Guillaume Chatelet <[email protected]> |
[NFC] Simplify code
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
bc78c099 |
| 04-May-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Elide module lds allocation in kernels with no callees
Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block
[amdgpu] Elide module lds allocation in kernels with no callees
Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block as before if the attribute is missing or has its default value of true.
Patch uses the new attribute to detect the simplest possible instance of this, where a kernel makes no calls and thus cannot call any functions that use LDS.
Tests updated to match, coverage was already good. Interesting cases is in lower-module-lds-offsets where annotating the kernel allows the backend to pick a different (in this case better) variable ordering than previously. A later patch will avoid moving kernel variables into module.lds when the kernel can have this attribute, allowing optimal ordering and locally unused variable elimination.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122091
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
bcbd4cf1 |
| 20-Mar-2022 |
Jon Chesterfield <[email protected]> |
Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e
Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.
show more ...
|
| #
98e474c1 |
| 19-Mar-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal
|
| #
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
a278250b |
| 10-Mar-2022 |
Nico Weber <[email protected]> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https:/
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
show more ...
|
| #
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
| #
0f20a35b |
| 09-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In curr
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
show more ...
|
| #
04fff547 |
| 07-Mar-2022 |
Venkata Ramanaiah Nalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
f482e869 |
| 18-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders
I don't think this is actually defined for mesa, but this is what we were doing on the DAG path.
|
| #
99e8e173 |
| 14-Jan-2022 |
Matt Arsenault <[email protected]> |
Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction"
This reverts commit a97e20a3a8a58be751f023e610758310d5664562.
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
7f26a102 |
| 12-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences
Arbitrary stack pointers are accessed using MUBUF instructions with the voffset field, which is interpreted as the swizzled address.
AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences
Arbitrary stack pointers are accessed using MUBUF instructions with the voffset field, which is interpreted as the swizzled address. We want to fold fold into the MUBUF form to use the SP in the SGPR offset, and previously we were special casing the interpretation of the pointer value if the access memory operand said it was relative to the stack pointer.
690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved the DAG path to special casing copies from SGPRs. This is not an entirely sound approach, since it's still changing the interpretation of pointer values based the context.
Introduce a new pseudo which corresponds to the wave-to-vector address transform. This way the memory instruction has consistent semantics where the incoming pointer is always interpreted as a vector address, and we're not obligated to optimize into the MUBUF offset-only addressing mode. The DAG should probably have an equivalent pseudo.
This should fix some correctness issues, and folding this into addressing modes will be a future optimization patch.
show more ...
|
| #
dc2457c8 |
| 15-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix crashing on calls to C functions from graphics contexts
If we had one of the shader calling conventions calling a default calling convention callee, this would crash when the caller did
AMDGPU: Fix crashing on calls to C functions from graphics contexts
If we had one of the shader calling conventions calling a default calling convention callee, this would crash when the caller did not have anything to pass to the workitem ID.
This is illegal, but we still need to produce something sensible. llvm-reduce likes to replace calls to intrinsics with calls to null or undef, so this does appear and is helpful to avoid hard erroring.
Pass undef in this case, as already happened for the other implicit arguments. It might make sense to define the behavior here and pass null for the pointers, and -1 for the workitem ID. We do have extra bits in the workitem ID, so this wouldn't conflict with a valid value.
show more ...
|
| #
a6f49423 |
| 09-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size
If we know we we aren't using a component from the kernel, we can save a few bit packing instructions.
We're still enabling the V
AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size
If we know we we aren't using a component from the kernel, we can save a few bit packing instructions.
We're still enabling the VGPR input to the kernel though.
show more ...
|
| #
09b53296 |
| 22-Dec-2021 |
Ron Lieberman <[email protected]> |
Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.
Failed amdgpu runtime buildbot # 3514
|
| #
9075009d |
| 22-Dec-2021 |
RamNalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
| #
26924b57 |
| 13-Dec-2021 |
Neubauer, Sebastian <[email protected]> |
[AMDGPU] Ignore special ABI registers for graphics
Fixed ABI arguments are compute specific and should not be added to graphics shaders or functions, so do not try to add them.
Differential Revisio
[AMDGPU] Ignore special ABI registers for graphics
Fixed ABI arguments are compute specific and should not be added to graphics shaders or functions, so do not try to add them.
Differential Revision: https://reviews.llvm.org/D115344
show more ...
|
| #
d395befa |
| 11-Dec-2021 |
Kazu Hirata <[email protected]> |
[llvm] Use range-based for loops (NFC)
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
06b90175 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove fixed function ABI option
|
| #
729bf9b2 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway.
Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.
show more ...
|
| #
76cbe622 |
| 25-Oct-2021 |
Thomas Symalla <[email protected]> |
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not goin
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D111637
show more ...
|
| #
fd1cfc90 |
| 28-Oct-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][GlobalISel] Fix waterfall loops
- Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benef
[AMDGPU][GlobalISel] Fix waterfall loops
- Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls
To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks.
Differential Revision: https://reviews.llvm.org/D109052
show more ...
|