History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp (Results 1 – 25 of 138)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 0387da6f 20-Jul-2022 Kazu Hirata <[email protected]>

Use value instead of getValue (NFC)


# 41ae78ea 20-Jul-2022 Kazu Hirata <[email protected]>

Use has_value instead of hasValue (NFC)


# 3a205977 19-Jul-2022 Jon Chesterfield <[email protected]>

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS varia

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060

show more ...


Revision tags: llvmorg-14.0.6
# d154d0ac 20-Jun-2022 Guillaume Chatelet <[email protected]>

[NFC] Simplify code


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# bc78c099 04-May-2022 Jon Chesterfield <[email protected]>

[amdgpu] Elide module lds allocation in kernels with no callees

Introduces a string attribute, amdgpu-requires-module-lds, to allow
eliding the module.lds block from kernels. Will allocate the block

[amdgpu] Elide module lds allocation in kernels with no callees

Introduces a string attribute, amdgpu-requires-module-lds, to allow
eliding the module.lds block from kernels. Will allocate the block as before
if the attribute is missing or has its default value of true.

Patch uses the new attribute to detect the simplest possible instance of this,
where a kernel makes no calls and thus cannot call any functions that use LDS.

Tests updated to match, coverage was already good. Interesting cases is in
lower-module-lds-offsets where annotating the kernel allows the backend to pick
a different (in this case better) variable ordering than previously. A later
patch will avoid moving kernel variables into module.lds when the kernel can
have this attribute, allowing optimal ordering and locally unused variable
elimination.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122091

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# bcbd4cf1 20-Mar-2022 Jon Chesterfield <[email protected]>

Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal"
Reconsidered, better to handle per-function state in the constructor as before.
This reverts commit 98e474c1b3210d90e

Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal"
Reconsidered, better to handle per-function state in the constructor as before.
This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.

show more ...


# 98e474c1 19-Mar-2022 Jon Chesterfield <[email protected]>

[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal


# 989f1c72 15-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# a278250b 10-Mar-2022 Nico Weber <[email protected]>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169

show more ...


# 7f230fee 07-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


# 0f20a35b 09-Mar-2022 Changpeng Fang <[email protected]>

AMDGPU: Set up User SGPRs for queue_ptr only when necessary

Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In curr

AMDGPU: Set up User SGPRs for queue_ptr only when necessary

Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.

Reviewers: sameerds, arsenm

Fixes: SWDEV-307189

Differential Revision: https://reviews.llvm.org/D119762

show more ...


# 04fff547 07-Mar-2022 Venkata Ramanaiah Nalamothu <[email protected]>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added a

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652

show more ...


Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# f482e869 18-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders

I don't think this is actually defined for mesa, but this is what we
were doing on the DAG path.


# 99e8e173 14-Jan-2022 Matt Arsenault <[email protected]>

Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction"

This reverts commit a97e20a3a8a58be751f023e610758310d5664562.


Revision tags: llvmorg-13.0.1-rc2
# 7f26a102 12-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences

Arbitrary stack pointers are accessed using MUBUF instructions with
the voffset field, which is interpreted as the swizzled address.

AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences

Arbitrary stack pointers are accessed using MUBUF instructions with
the voffset field, which is interpreted as the swizzled address. We
want to fold fold into the MUBUF form to use the SP in the SGPR
offset, and previously we were special casing the interpretation of
the pointer value if the access memory operand said it was relative to
the stack pointer.

690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved
the DAG path to special casing copies from SGPRs. This is not an
entirely sound approach, since it's still changing the interpretation
of pointer values based the context.

Introduce a new pseudo which corresponds to the wave-to-vector address
transform. This way the memory instruction has consistent semantics
where the incoming pointer is always interpreted as a vector address,
and we're not obligated to optimize into the MUBUF offset-only
addressing mode. The DAG should probably have an equivalent pseudo.

This should fix some correctness issues, and folding this into
addressing modes will be a future optimization patch.

show more ...


# dc2457c8 15-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU: Fix crashing on calls to C functions from graphics contexts

If we had one of the shader calling conventions calling a default
calling convention callee, this would crash when the caller did

AMDGPU: Fix crashing on calls to C functions from graphics contexts

If we had one of the shader calling conventions calling a default
calling convention callee, this would crash when the caller did not
have anything to pass to the workitem ID.

This is illegal, but we still need to produce something
sensible. llvm-reduce likes to replace calls to intrinsics with calls
to null or undef, so this does appear and is helpful to avoid hard
erroring.

Pass undef in this case, as already happened for the other implicit
arguments. It might make sense to define the behavior here and pass
null for the pointers, and -1 for the workitem ID. We do have extra
bits in the workitem ID, so this wouldn't conflict with a valid value.

show more ...


# a6f49423 09-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size

If we know we we aren't using a component from the kernel, we can save
a few bit packing instructions.

We're still enabling the V

AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size

If we know we we aren't using a component from the kernel, we can save
a few bit packing instructions.

We're still enabling the VGPR input to the kernel though.

show more ...


# 09b53296 22-Dec-2021 Ron Lieberman <[email protected]>

Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"

This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

Failed amdgpu runtime buildbot # 3514


# 9075009d 22-Dec-2021 RamNalamothu <[email protected]>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added a

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652

show more ...


# 26924b57 13-Dec-2021 Neubauer, Sebastian <[email protected]>

[AMDGPU] Ignore special ABI registers for graphics

Fixed ABI arguments are compute specific and should not be added to
graphics shaders or functions, so do not try to add them.

Differential Revisio

[AMDGPU] Ignore special ABI registers for graphics

Fixed ABI arguments are compute specific and should not be added to
graphics shaders or functions, so do not try to add them.

Differential Revision: https://reviews.llvm.org/D115344

show more ...


# d395befa 11-Dec-2021 Kazu Hirata <[email protected]>

[llvm] Use range-based for loops (NFC)


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 06b90175 14-Aug-2021 Matt Arsenault <[email protected]>

AMDGPU: Remove fixed function ABI option


# 729bf9b2 14-Aug-2021 Matt Arsenault <[email protected]>

AMDGPU: Enable fixed function ABI by default

Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on

AMDGPU: Enable fixed function ABI by default

Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will no rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.

show more ...


# 76cbe622 25-Oct-2021 Thomas Symalla <[email protected]>

[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not goin

[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.

This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D111637

show more ...


# fd1cfc90 28-Oct-2021 Sebastian Neubauer <[email protected]>

[AMDGPU][GlobalISel] Fix waterfall loops

- Move the `s_and exec` to its correct position before the content of
the waterfall loop
- Use the SI_WATERFALL pseudo instruction, like for sdag, to benef

[AMDGPU][GlobalISel] Fix waterfall loops

- Move the `s_and exec` to its correct position before the content of
the waterfall loop
- Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit
from optimizations
- Add support for indirect function calls

To support indirect calls, add a G_SI_CALL instruction without register
class restrictions and insert a waterfall loop when applying register
banks.

Differential Revision: https://reviews.llvm.org/D109052

show more ...


123456