|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
cef65864 |
| 18-Jun-2022 |
Guillaume Chatelet <[email protected]> |
[Alignment] Use Align for MaxKernArgAlign
Differential Revision: https://reviews.llvm.org/D128118
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
cc5a1b3d |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
llvm-reduce: Add cloning of target MachineFunctionInfo
MIR support is totally unusable for AMDGPU without this, since the set of reserved registers is set from fields here.
Add a clone method to Ma
llvm-reduce: Add cloning of target MachineFunctionInfo
MIR support is totally unusable for AMDGPU without this, since the set of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of the copy constructor that is required if there are any MIR constructs that use pointers. Specifically, at minimum fields that reference MachineBasicBlocks or the MachineFunction need to be adjusted to the values in the new function.
show more ...
|
| #
cfe51684 |
| 18-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Make PSV instances static members
|
| #
dd7e407d |
| 02-Jun-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Move SpilledReg from MFI to SIRegisterInfo
This isn't the most natural place for it, but it avoids a circular include dependency in an out of tree patch.
|
| #
5bd87350 |
| 21-Apr-2022 |
hsmahesha <[email protected]> |
[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve highest available VGPR for AGPR copy before RA. After RA, shift it to lowest unus
[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve highest available VGPR for AGPR copy before RA. After RA, shift it to lowest unused VGPR if the one exist.
Fixes SWDEV-330006.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123525
show more ...
|
| #
987df725 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Serialize VGPRForAGPRCopy
|
| #
b5ec1312 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix allocating GDS globals to LDS offsets
These don't seem to be very well used or tested, but try to make the behavior a bit more consistent with LDS globals.
I'm not sure what the definit
AMDGPU: Fix allocating GDS globals to LDS offsets
These don't seem to be very well used or tested, but try to make the behavior a bit more consistent with LDS globals.
I'm not sure what the definition for amdgpu-gds-size is supposed to mean. For now I assumed it's allocating a static size at the beginning of the allocation, and any known globals are allocated after it.
show more ...
|
| #
378bb801 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Serialize a few more MachineFunctionInfo fields in MIR
|
| #
f90f4884 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Serialize gds size in MIR
|
| #
5cd17f9d |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Serialize WWM registers
|
| #
e0d585d7 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Defer creation of WWM VGPR spill slots
There's no reason to create these immediately. They can be created in the prolog/epilog code like CSR spills. There's probably a cleaner way to do this
AMDGPU: Defer creation of WWM VGPR spill slots
There's no reason to create these immediately. They can be created in the prolog/epilog code like CSR spills. There's probably a cleaner way to do this by utilizing the CSR spill code.
This makes the frame index used transient state for PrologEpilogInserter, and thus makes serialization easier. Really this doesn't need to be saved here but there isn't really a better place for it.
show more ...
|
| #
ea47373a |
| 14-Apr-2022 |
hsmahesha <[email protected]> |
[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy.
This is an NFC patch in preparation to fix a bug related to always reserving VGPR32 for AGPR copy.
Reviewed By: rampitec
Differen
[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy.
This is an NFC patch in preparation to fix a bug related to always reserving VGPR32 for AGPR copy.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123651
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
8384ced9 |
| 28-Mar-2022 |
Changpeng Fang <[email protected]> |
[AMDGPU][NFC]: Remove unnecessary MFI functions
Summary: hasHostcallPtr() and hasHeapPtr() are only used in metadata emit. However, we can use the corresponding function attributes directly instea
[AMDGPU][NFC]: Remove unnecessary MFI functions
Summary: hasHostcallPtr() and hasHeapPtr() are only used in metadata emit. However, we can use the corresponding function attributes directly instead introducing the functions.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D122600
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
ca62b1db |
| 25-Feb-2022 |
Changpeng Fang <[email protected]> |
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://reviews.llvm.org/D119027
show more ...
|
| #
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
| #
d8f99bb6 |
| 11-Feb-2022 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now r
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
aeaf85b9 |
| 13-Jan-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Select VGPR versions of MFMA if possible
We can select _vgprcd versions of MAI instructions and have no AGPRs with the whole budget left for VGPRs if:
1. This is a kernel; 2. It has no cal
[AMDGPU] Select VGPR versions of MFMA if possible
We can select _vgprcd versions of MAI instructions and have no AGPRs with the whole budget left for VGPRs if:
1. This is a kernel; 2. It has no calls; 3. It runs at least on 2 waves thus having not more that 256 VGPRs. 4. There is no inline asm requesting AGPRs.
Differential Revision: https://reviews.llvm.org/D117253
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
d6fdbbca |
| 24-Nov-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Add second emergency slot for SGPR to vmem for large frames
In a future change, we will sometimes use a VGPR offset for doing spills to memory, in which case we need 2 free VGPRs to do the S
AMDGPU: Add second emergency slot for SGPR to vmem for large frames
In a future change, we will sometimes use a VGPR offset for doing spills to memory, in which case we need 2 free VGPRs to do the SGPR spill. In most cases we could spill the VGPR along with the SGPR being spilled, but we don't have any free lanes for SGPR_1024 in wave32 so we could still potentially need a second scavenging slot.
show more ...
|
| #
de1600a1 |
| 10-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Avoid enabling kernel workitem IDs with reqd_work_group_size
|
| #
8470bf2b |
| 12-Jan-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Do not reserve any VGPR for SGPR spills
After the split register allocation changes in eebe841a47cb it is no longer necessary to reserve a VGPR before RA. This can also create bugs when IPR
[AMDGPU] Do not reserve any VGPR for SGPR spills
After the split register allocation changes in eebe841a47cb it is no longer necessary to reserve a VGPR before RA. This can also create bugs when IPRA is enabled since we cannot predict that a called function may not reserve any register if it does not have any SGPR spills. If that happens those functions may override reserved registers that are normally callee saved. Added a test to show this.
Fixes: SWDEV-309900
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D115551
show more ...
|
| #
d45a2479 |
| 18-Dec-2021 |
Brendon Cahoon <[email protected]> |
[AMDGPU] Don't remove VGPR to AGPR dead spills from frame info
Removing dead frame indices for VGPR to AGPR spills is incorrect when the frame index is shared by multiple objects, which may occur du
[AMDGPU] Don't remove VGPR to AGPR dead spills from frame info
Removing dead frame indices for VGPR to AGPR spills is incorrect when the frame index is shared by multiple objects, which may occur due to stack slot coloring. The problem is that subsequent code that processes the other object will assert because the stack frame index is marked dead.
Removing dead frame indices is needed prior to stack slot coloring, which is what happens with SGPR to VGPR spills. These spills are lowered prior to stack slot coloring, but the VGPR to AGPR spills are processed afterwards during the Prolog/Epilog Inserter pass. This patch marks the VGPR to AGPR spill slot as dead if the slot is not used by another object.
Differential Revision: https://reviews.llvm.org/D115996
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
06b90175 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove fixed function ABI option
|
| #
729bf9b2 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway.
Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.
show more ...
|
| #
e5340ed3 |
| 27-Oct-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Fix global isel for kernels using agprs on gfx90a
With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this c
[AMDGPU] Fix global isel for kernels using agprs on gfx90a
With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this case.
Differential Revision: https://reviews.llvm.org/D112644
show more ...
|