|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
86bd7e20 |
| 04-Jul-2022 |
Thomas Symalla <[email protected]> |
[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.
This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D129086
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
cb9ae937 |
| 10-Jun-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen.
Differential Revision: https://reviews.llvm.org/D127527
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
dd7e407d |
| 02-Jun-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Move SpilledReg from MFI to SIRegisterInfo
This isn't the most natural place for it, but it avoids a circular include dependency in an out of tree patch.
|
| #
2d43955c |
| 25-May-2022 |
Scott Linder <[email protected]> |
[AMDGPU][NFC] Refactor AMDGPUCallingConv.td
Rename CalleeSavedRegs defs to avoid being overly specific:
* CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs
* CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_105 => CSR_AMDGPU_SGPRs
* CSR_AMDGPU_SI_Gfx_SGPRs_4_29 + CSR_AMDGPU_SI_Gfx_SGPRs_64_105 => CSR_AMDGPU_SI_Gfx_SGPRs
* CSR_AMDGPU_HighRegs => CSR_AMDGPU
* CSR_AMDGPU_HighRegs_With_AGPRs => CSR_AMDGPU_GFX90AInsts
* CSR_AMDGPU_SI_Gfx_With_AGPRs => CSR_AMDGPU_SI_Gfx_GFX90AInsts
Introduce a class RegMask to mark the cases where we use the CalleeSavedRegs class purely as an expedient way to produce a mask. Update the names of these masks to not mention "CSR". Other targets also seem to do this, so a reasonable alternative is to actually update table-gen to include a new class to do this explicitly, but the current approach seems harmless so I opted to just make it more explicit.
Reviewed By: arsenm, sebastian-ne
Differential Revision: https://reviews.llvm.org/D109008
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
e0d585d7 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Defer creation of WWM VGPR spill slots
There's no reason to create these immediately. They can be created in the prolog/epilog code like CSR spills. There's probably a cleaner way to do this by utilizing the CSR spill code.
This makes the frame index used transient state for PrologEpilogInserter, and thus makes serialization easier. Really this doesn't need to be saved here but there isn't really a better place for it.
|
| #
ea47373a |
| 14-Apr-2022 |
hsmahesha <[email protected]> |
[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy.
This is an NFC patch in preparation to fix a bug related to always reserving VGPR32 for AGPR copy.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123651
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
463bc93e |
| 09-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Remove unused parameter
|
| #
f014303e |
| 23-Mar-2022 |
hsmahesha <[email protected]> |
[AMDGPU] [NFC]: Organize the code around reserving registers.
First, add code to reserve all required special purpose registers, followed by code to reserve SGPRs, followed by code to reserve VGPRs/AGPRs.
This patch is prepared as a pre-requisite to fix an issue related to GFX90A hardware.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122219
|
| #
7636c9a9 |
| 22-Mar-2022 |
alex-t <[email protected]> |
[AMDGPU] use scalar shift for SALU users in frame index elimination
In the frame index lowering we have to insert shift and add instructions to adjust stack object access. We need to take care of the stack object user kind and use scalar shift/add for scalar users.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121524
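The shift-and-add adjustment described in this message can be sketched as plain arithmetic. This is only an illustration and not the code from the patch: it assumes the frame register holds a wave-scaled byte offset that is shifted right by log2 of the wavefront size before the object offset is added. The function name `laneRelativeAddress` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models only the arithmetic of the adjustment, not the
// machine instructions that the backend emits. The assumption is that the
// frame register is scaled by the wavefront size, so it is shifted right
// before the per-object offset is added.
uint32_t laneRelativeAddress(uint32_t WaveScaledFrameReg,
                             uint32_t WavefrontSizeLog2,
                             uint32_t ObjectOffset) {
  assert(WavefrontSizeLog2 < 32 && "shift amount must be below register width");
  return (WaveScaledFrameReg >> WavefrontSizeLog2) + ObjectOffset;
}
```

The same computation applies to scalar and vector users; the commit is about choosing scalar shift/add instructions when the user of the result is a scalar (SALU) instruction.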
|
|
Revision tags: llvmorg-14.0.0 |
|
| #
0a488cba |
| 12-Mar-2022 |
alex-t <[email protected]> |
[AMDGPU] use scalar shift for SALU users in frame index elimination
In the frame index lowering we have to insert shift and add instructions to adjust stack object access. We need to take care of the stack object user kind and use scalar shift/add for scalar users.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121524
|
| #
37b37838 |
| 16-Mar-2022 |
Shengchen Kan <[email protected]> |
[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
|
| #
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
|
|
Revision tags: llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
36fe3f13 |
| 08-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.
Differential Revision: https://reviews.llvm.org/D121254
|
| #
72a9e5f8 |
| 11-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Restrict machine copy propagation from creating unaligned classes
Fixes: SWDEV-326366
Differential Revision: https://reviews.llvm.org/D121491
|
| #
a278250b |
| 10-Mar-2022 |
Nico Weber <[email protected]> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
|
| #
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
bf60a1c5 |
| 25-Feb-2022 |
Aakanksha <[email protected]> |
Avoid comparisons between types of different widths in a loop condition to prevent the loop from behaving unexpectedly
This change fixes the code violations flagged in AMD compute CodeQL scan - Query Description: "Comparisons between types of different widths in a loop condition can cause the loop to behave unexpectedly."
Differential Revision: https://reviews.llvm.org/D120355
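A minimal sketch of the class of problem that this query flags, not the code touched by the patch. The function names `countIterations` and `incrementNarrow` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Problematic pattern, shown only as a comment so the example terminates:
//   for (uint16_t I = 0; I < Limit; ++I) { ... }   // Limit is uint32_t
// If Limit exceeds 65535, the 16-bit counter wraps to 0 before reaching it
// and the loop never ends.

// Fixed pattern: the induction variable has the same width as the bound.
uint64_t countIterations(uint32_t Limit) {
  uint64_t Count = 0;
  for (uint32_t I = 0; I < Limit; ++I)
    ++Count;
  return Count;
}

// Shows the wrap-around that makes the narrow counter unsafe.
uint16_t incrementNarrow(uint16_t Value) {
  return static_cast<uint16_t>(Value + 1);
}
```

With matching widths, `countIterations(70000)` runs exactly 70000 iterations, whereas a 16-bit counter would wrap at 65535, as `incrementNarrow(65535)` returning 0 shows.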
|
| #
3884cb92 |
| 14-Feb-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Always reserve VGPR for AGPR copies on gfx908
Just because there aren't AGPRs in the original program doesn't mean the register allocator can't choose to use them (unless we were to forcibly reserve all AGPRs if there weren't any uses). This happens in high pressure situations and introduces copies to avoid spills.
In this test, the allocator ends up introducing a copy from SGPR to AGPR which requires an intermediate VGPR. I don't believe it would introduce a copy from AGPR to AGPR in this situation, since it would be trying to use an intermediate with a different class.
Theoretically this is also broken on gfx90a, but I have been unable to come up with a testcase.
|
| #
898dc8a4 |
| 15-Feb-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Use subtarget in class instead of querying function
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
f2c99ea4 |
| 04-Feb-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Use reserved VGPR for AGPR spills to memory
Previously would reuse the VGPR used for large frame offsets with the one needed for copying from the AGPR. Fix this by reusing the register we already reserved for handling AGPR to AGPR copies.
|
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
8b2ca766 |
| 15-Dec-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908
We need to guarantee cheap copies between AGPRs, and unfortunately gfx908 cannot directly do this. Theoretically we could set the scavenger up with an emergency spill slot, but it also feels unreasonable to pay that cost for what was assumed to be a simple and cheap copy. Pick a register that doesn't conflict with any ABI registers.
This does not address the same issue when copying from SGPR to AGPR for gfx90a (this coincidentally fixes it for gfx908), but that's less interesting since the register allocator shouldn't be proactively introducing such copies.
One edge case I'm worried about is respecting the VGPR budget implied by amdgpu-waves-per-eu. If the theoretical upper bound of a function is 32 VGPRs, this will force the actual count to be 33.
This is also broken if inline assembly uses/defs something in v32. The coalescer will eliminate the intermediate vreg between the def and use, and the introduced copy will clobber the user value.
(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)
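The VGPR budget concern above comes down to simple counting, sketched here under the assumption that the reported VGPR count is the highest used register index plus one. The helpers `vgprCount` and `fitsBudget` are hypothetical and do not correspond to backend functions.

```cpp
#include <cassert>

// Hypothetical model: registers are numbered from v0, so using v<N> implies a
// count of at least N + 1.
unsigned vgprCount(unsigned HighestUsedIndex) { return HighestUsedIndex + 1; }

// Returns true if a function whose highest used VGPR is v<HighestUsedIndex>
// stays within the given budget.
bool fitsBudget(unsigned HighestUsedIndex, unsigned Budget) {
  return vgprCount(HighestUsedIndex) <= Budget;
}
```

A function limited to v0 through v31 fits a 32-register budget, but once v32 is reserved and used the count becomes 33 and the budget is exceeded.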
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
31973062 |
| 01-Nov-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos
If we had a large offset which required materializing in a register, we would emit an s_add_i32, clobbering SCC. Start checking if SCC is live, and instead use a VGPR offset. For MUBUF, we switch to using offen. We would do this anyway in a normal load/store with a frame index, but not for spills.
The same problem still exists in other contexts where we expand frame indices.
The nasty edge case is when SGPRs are spilled to memory at a large frame offset where SCC is also clobbered. This requires a second scavenging index, and also required several patches in the scavenger to correctly handle multiple recursive scavenge indexes.
An even nastier edge case we still don't support is if we don't have any free SGPRs. If SCC is live and we don't have any free SGPRs to save exec, we have no way of flipping exec back and forth without also clobbering SCC.
Fixes: SWDEV-309419
|
| #
245e25f9 |
| 02-Feb-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Implement isAsmClobberable
Warn on inline assembly clobbering reserved registers. It should also warn on at least some reserved register defs, but that isn't happening right now. If you have a def and re-use of a register we reserve, the register coalescer will eliminate the intermediate virtual register. When the reserved reg def is introduced later by the backend, it will end up clobbering the value the register coalescer assumed was live through the range.
There is also isInlineAsmReadOnlyReg, although I don't understand what the distinction really is. It's called in SelectionDAGBuilder, long before the set of reserved registers is frozen so I'm not sure how that can possibly work reliably.
Unfortunately this is also using the ugly tablegenerated names for the registers.
|
| #
cf58b9ce |
| 09-Dec-2021 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Add AV class spill pseudo instructions
While enabling vector superclasses with D109301, the AV spills are converted into VGPR spills by introducing appropriate copies. The whole thing ended up adding two instructions per spill (a copy + vgpr spill pseudo) and caused an incorrect liverange update during inline spiller.
This patch adds the pseudo instructions for all AV spills from 32b to 1024b and handles them in the way all other spills are lowered.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D115439
|
| #
017ef785 |
| 08-Dec-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Mark scc defs dead in SGPR to VMEM path for no free SGPRs
This introduces verifier errors into this broken situation which we do not handle correctly, which is better than being silently miscompiled. For the emergency stack slot, the scavenger likes to move the restore instruction as late as possible, which ends up separating the SCC def from the conditional branch.
|