SIRegisterInfo.cpp - OpenGrok history log for /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 86bd7e20	04-Jul-2022	Thomas Symalla <[email protected]>	[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass. This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D129086
Revision tags: llvmorg-14.0.6
# cb9ae937	10-Jun-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Define SGPR_NULL64 register. NFCI. On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen. Differential Revision: https://reviews.llvm.org/D127527
Revision tags: llvmorg-14.0.5
# dd7e407d	02-Jun-2022	Matt Arsenault <[email protected]>	AMDGPU: Move SpilledReg from MFI to SIRegisterInfo This isn't the most natural place for it, but it avoids a circular include dependency in an out of tree patch.
# 2d43955c	25-May-2022	Scott Linder <[email protected]>	[AMDGPU][NFC] Refactor AMDGPUCallingConv.td Rename CalleeSavedRegs defs to avoid being overly specific: * CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs * CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_105 => CSR_AMDGPU_SGPRs * CSR_AMDGPU_SI_Gfx_SGPRs_4_29 + CSR_AMDGPU_SI_Gfx_SGPRs_64_105 => CSR_AMDGPU_SI_Gfx_SGPRs * CSR_AMDGPU_HighRegs => CSR_AMDGPU * CSR_AMDGPU_HighRegs_With_AGPRs => CSR_AMDGPU_GFX90AInsts * CSR_AMDGPU_SI_Gfx_With_AGPRs => CSR_AMDGPU_SI_Gfx_GFX90AInsts Introduce a class RegMask to mark the cases where we use the CalleeSavedRegs class purely as an expedient way to produce a mask. Update the names of these masks to not mention "CSR". Other targets also seem to do this, so a reasonable alternative is to actually update table-gen to include a new class to do this explicitly, but the current approach seems harmless so I opted to just make it more explicit. Reviewed By: arsenm, sebastian-ne Differential Revision: https://reviews.llvm.org/D109008
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# e0d585d7	16-Apr-2022	Matt Arsenault <[email protected]>	AMDGPU: Defer creation of WWM VGPR spill slots There's no reason to create these immediately. They can be created in the prolog/epilog code like CSR spills. There's probably a cleaner way to do this by utilizing the CSR spill code. This makes the frame index used transient state for PrologEpilogInserter, and thus makes serialization easier. Really this doesn't need to be saved here but there isn't really a better place for it.
# ea47373a	14-Apr-2022	hsmahesha <[email protected]>	[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy. This is an NFC patch in preparation to fix a bug related to always reserving VGPR32 for AGPR copy. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D123651
Revision tags: llvmorg-14.0.1
# 463bc93e	09-Apr-2022	Matt Arsenault <[email protected]>	AMDGPU/GlobalISel: Remove unused parameter
# f014303e	23-Mar-2022	hsmahesha <[email protected]>	[AMDGPU] [NFC]: Organize the code around reserving registers. First, add code to reserve all required special purpose registers, followed by code to reserve SGPRs, followed by code to reserve VGPRs/AGPRs. This patch is prepared as a pre-requisite to fix an issue related to GFX90A hardware. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122219
# 7636c9a9	22-Mar-2022	alex-t <[email protected]>	[AMDGPU] use scalar shift for SALU users in frame index elimination In the frame index lowering we have to insert shift and add instructions to adjust stack object access. We need to take care of the stack object user kind and use scalar shift/add for scalar users. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121524
Revision tags: llvmorg-14.0.0
# 0a488cba	12-Mar-2022	alex-t <[email protected]>	[AMDGPU] use scalar shift for SALU users in frame index elimination In the frame index lowering we have to insert shift and add instructions to adjust stack object access. We need to take care of the stack object user kind and use scalar shift/add for scalar users. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121524
# 37b37838	16-Mar-2022	Shengchen Kan <[email protected]>	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
# 989f1c72	15-Mar-2022	serge-sans-paille <[email protected]>	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
Revision tags: llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 36fe3f13	08-Mar-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254
# 72a9e5f8	11-Mar-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Restrict machine copy propagation from creating unaligned classes Fixes: SWDEV-326366 Differential Revision: https://reviews.llvm.org/D121491
# a278250b	10-Mar-2022	Nico Weber <[email protected]>	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
# 7f230fee	07-Mar-2022	serge-sans-paille <[email protected]>	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169
Revision tags: llvmorg-14.0.0-rc2
# bf60a1c5	25-Feb-2022	Aakanksha <[email protected]>	Avoid comparisons between types of different widths in a loop condition to prevent the loop from behaving unexpectedly This change fixes the code violations flagged in AMD compute CodeQL scan - Query Description: "Comparisons between types of different widths in a loop condition can cause the loop to behave unexpectedly." Differential Revision: https://reviews.llvm.org/D120355
# 3884cb92	14-Feb-2022	Matt Arsenault <[email protected]>	AMDGPU: Always reserve VGPR for AGPR copies on gfx908 Just because there aren't AGPRs in the original program doesn't mean the register allocator can't choose to use them (unless we were to forcibly reserve all AGPRs if there weren't any uses). This happens in high pressure situations and introduces copies to avoid spills. In this test, the allocator ends up introducing a copy from SGPR to AGPR which requires an intermediate VGPR. I don't believe it would introduce a copy from AGPR to AGPR in this situation, since it would be trying to use an intermediate with a different class. Theoretically this is also broken on gfx90a, but I have been unable to come up with a testcase.
# 898dc8a4	15-Feb-2022	Matt Arsenault <[email protected]>	AMDGPU: Use subtarget in class instead of querying function
Revision tags: llvmorg-14.0.0-rc1
# f2c99ea4	04-Feb-2022	Matt Arsenault <[email protected]>	AMDGPU: Use reserved VGPR for AGPR spills to memory Previously would reuse the VGPR used for large frame offsets with the one needed for copying from the AGPR. Fix this by reusing the register we already reserved for handling AGPR to AGPR copies.
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 8b2ca766	15-Dec-2021	Matt Arsenault <[email protected]>	AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908 We need to guarantee cheap copies between AGPRs, and unfortunately gfx908 cannot directly do this. Theoretically we could set the scavenger up with an emergency spill slot, but it also feels unreasonable to pay that cost for what was assumed to be a simple and cheap copy. Pick a register that doesn't conflict with any ABI registers. This does not address the same issue when copying from SGPR to AGPR for gfx90a (this coincidentally fixes it for gfx908), but that's less interesting since the register allocator shouldn't be proactively introducing such copies. One edge case I'm worried about is respecting the VGPR budget implied by amdgpu-waves-per-eu. If the theoretical upper bound of a function is 32 VGPRs, this will force the actual count to be 33. This is also broken if inline assembly uses/defs something in v32. The coalescer will eliminate the intermediate vreg between the def and use, and the introduced copy will clobber the user value. (cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)
Revision tags: llvmorg-13.0.1-rc1
# 31973062	01-Nov-2021	Matt Arsenault <[email protected]>	AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos If we had a large offset which required materializing in a register, we would emit an s_add_i32, clobbering SCC. Start checking if SCC is live, and instead use a VGPR offset. For MUBUF, we switch to using offen. We would do this anyway in a normal load/store with a frame index, but not for spills. The same problem still exists in other contexts where we expand frame indices. The nasty edge case is when SGPRs are spilled to memory at a large frame offset where SCC is also clobbered. This requires a second scavenging index, and also required several patches in the scavenger to correctly handle multiple recursive scavenge indexes. An even nastier edge case we still don't support is if we don't have any free SGPRs. If SCC is live and we don't have any free SGPRs to save exec, we have no way of flipping exec back and forth without also clobbering SCC. Fixes: SWDEV-309419
# 245e25f9	02-Feb-2022	Matt Arsenault <[email protected]>	AMDGPU: Implement isAsmClobberable Warn on inline assembly clobbering reserved registers. It should also warn on at least some reserved register defs, but that isn't happening right now. If you have a def and re-use of a register we reserve, the register coalescer will eliminate the intermediate virtual register. When the reserved reg def is introduced later by the backend, it will end up clobbering the value the register coalescer assumed was live through the range. There is also isInlineAsmReadOnlyReg, although I don't understand what the distinction really is. It's called in SelectionDAGBuilder, long before the set of reserved registers is frozen so I'm not sure how that can possibly work reliably. Unfortunately this is also using the ugly tablegenerated names for the registers.
# cf58b9ce	09-Dec-2021	Christudasan Devadasan <[email protected]>	[AMDGPU] Add AV class spill pseudo instructions While enabling vector superclasses with D109301, the AV spills are converted into VGPR spills by introducing appropriate copies. The whole thing ended up adding two instructions per spill (a copy + vgpr spill pseudo) and caused an incorrect liverange update during inline spiller. This patch adds the pseudo instructions for all AV spills from 32b to 1024b and handles them in the way all other spills are lowered. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115439
# 017ef785	08-Dec-2021	Matt Arsenault <[email protected]>	AMDGPU: Mark scc defs dead in SGPR to VMEM path for no free SGPRs This introduces verifier errors into this broken situation which we do not handle correctly, which is better than being silently miscompiled. For the emergency stack slot, the scavenger likes to move the restore instruction as late as possible, which ends up separating the SCC def from the conditional branch.