History log of /llvm-project-15.0.7/llvm/test/CodeGen/AMDGPU/ipra.ll (Results 1 – 16 of 16)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 04fff547 07-Mar-2022 Venkata Ramanaiah Nalamothu <[email protected]>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added a

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652

show more ...


Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 4622afa9 17-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass

This is more precise in the face of indirect calls and aliases, still
assuming the call target is defined somewhere in the current module

AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass

This is more precise in the face of indirect calls and aliases, still
assuming the call target is defined somewhere in the current module.

This sometimes changes the order the functions are printed, and also
changes the point where context errors are printed relative to
stdout. This also likely has negative consequences for compile time
and memory usage.

show more ...


# 0bcd9f60 13-Jan-2022 Venkata Ramanaiah Nalamothu <[email protected]>

[AMDGPU] Ensure return address is save/restored around the call with IPRA enabled/disabled

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D117243


Revision tags: llvmorg-13.0.1-rc2
# 09b53296 22-Dec-2021 Ron Lieberman <[email protected]>

Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"

This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

Failed amdgpu runtime buildbot # 3514


# 9075009d 22-Dec-2021 RamNalamothu <[email protected]>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added a

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652

show more ...


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 729bf9b2 14-Aug-2021 Matt Arsenault <[email protected]>

AMDGPU: Enable fixed function ABI by default

Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on

AMDGPU: Enable fixed function ABI by default

Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will no rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 5682ae2f 25-Mar-2021 madhur13490 <[email protected]>

[AMDGPU] Set implicit arg attributes for indirect calls

This patch adds attributes corresponding to
implicits to functions/kernels if
1. it has an indirect call OR
2. it's address is taken.

Once su

[AMDGPU] Set implicit arg attributes for indirect calls

This patch adds attributes corresponding to
implicits to functions/kernels if
1. it has an indirect call OR
2. it's address is taken.

Once such attributes are set, rest of the codegen would work
out-of-box for indirect calls. This patch eliminates
the potential overhead -fixed-abi imposes even though indirect functions
calls are not used.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D99347

show more ...


Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# 3c297a25 10-Feb-2021 madhur13490 <[email protected]>

Make fixed-abi default for AMD HSA OS

fixed-abi uses pre-defined and predictable
SGPR/VGPRs for passing arguments. This patch makes
this scheme default when HSA OS is specified in triple.

Reviewed

Make fixed-abi default for AMD HSA OS

fixed-abi uses pre-defined and predictable
SGPR/VGPRs for passing arguments. This patch makes
this scheme default when HSA OS is specified in triple.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96340

show more ...


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1
# 60b1967c 21-Jan-2020 Scott Linder <[email protected]>

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us t

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138

show more ...


Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4
# 71dfb7ec 08-Jul-2019 Matt Arsenault <[email protected]>

AMDGPU: Make s34 the FP register

Make the FP register callee saved.

This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame r

AMDGPU: Make s34 the FP register

Make the FP register callee saved.

This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame register
used throughout the rest of the function. I don't like how this
bypassess the standard mechanism for CSR spills just to get the
correct insert point. I may look for a better solution, since all CSR
VGPRs may also need to have all lanes activated. Another option might
be to make getFrameIndexReference change the base register if the
frame index is a CSR, and then try to figure out the right insertion
point in emitProlog.

If there is a free VGPR lane available for SGPR spilling, try to use
it for the FP. If that would require intrtoducing a new VGPR spill,
try to use a free call clobbered SGPR. Only fallback to introducing a
new VGPR spill as a last resort.

This also doesn't attempt to handle SGPR spilling with scalar stores.

llvm-svn: 365372

show more ...


Revision tags: llvmorg-8.0.1-rc3
# 22e3dc60 21-Jun-2019 Matt Arsenault <[email protected]>

AMDGPU: Fix not using s33 for scratch wave offset in kernels

Fixes missing piece from r363990.

llvm-svn: 364099


Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2
# afc24ed2 01-Feb-2019 Scott Linder <[email protected]>

[AMDGPU] Mark test functions with hidden visibility

Prepare for future patch which affects codegen for calls to preemptible
functions.

Differential Revision: https://reviews.llvm.org/D57605

llvm-s

[AMDGPU] Mark test functions with hidden visibility

Prepare for future patch which affects codegen for calls to preemptible
functions.

Differential Revision: https://reviews.llvm.org/D57605

llvm-svn: 352920

show more ...


Revision tags: llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3
# 607a7566 28-Nov-2017 Matt Arsenault <[email protected]>

AMDGPU: Enable IPRA

llvm-svn: 319256


Revision tags: llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1
# 6efd082c 14-Sep-2017 Matt Arsenault <[email protected]>

AMDGPU: Make frame register caller preserved

Using SplitCSR for the frame register was very broken. Often
the copies in the prolog and epilog were optimized out, in addition
to them being inserted a

AMDGPU: Make frame register caller preserved

Using SplitCSR for the frame register was very broken. Often
the copies in the prolog and epilog were optimized out, in addition
to them being inserted after the true prolog where the FP
was clobbered.

I have a hacky solution which works that continues to use
split CSR, but for now this is simpler and will get to working
programs.

llvm-svn: 313274

show more ...


Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3
# d664315a 24-Aug-2017 Matt Arsenault <[email protected]>

IPRA: Don't assume called function is first call operand

Fixes not finding the called global for AMDGPU
call pseudoinstructions, which prevented IPRA
from doing much.

llvm-svn: 311637


Revision tags: llvmorg-5.0.0-rc2
# b94972cb 05-Aug-2017 Matt Arsenault <[email protected]>

IPRA: Don't crash on null getCallPreservedMask

Kernels aren't callable, so they don't have a call preserved mask.

llvm-svn: 310172