Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |

# dd7e407d | 02-Jun-2022 | Matt Arsenault <[email protected]>
AMDGPU: Move SpilledReg from MFI to SIRegisterInfo
This isn't the most natural place for it, but it avoids a circular include dependency in an out-of-tree patch.
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |

# 5bd87350 | 21-Apr-2022 | hsmahesha <[email protected]>
[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve the highest available VGPR for AGPR copy before RA. After RA, shift it to the lowest unused VGPR if one exists.
Fixes SWDEV-330006.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123525

# e0d585d7 | 16-Apr-2022 | Matt Arsenault <[email protected]>
AMDGPU: Defer creation of WWM VGPR spill slots
There's no reason to create these immediately. They can be created in the prolog/epilog code like CSR spills. There's probably a cleaner way to do this by utilizing the CSR spill code.
This makes the frame index used transient state for PrologEpilogInserter, and thus makes serialization easier. Really this doesn't need to be saved here but there isn't really a better place for it.

# 34a68037 | 13-Apr-2022 | Christudasan Devadasan <[email protected]>
[AMDGPU][SIFrameLowering] Refactor custom SGPR spills (NFC).
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D123666
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |

# 04fff547 | 07-Mar-2022 | Venkata Ramanaiah Nalamothu <[email protected]>
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call-clobbered register range, are added as live-ins on function entry to preserve their value when we have calls, so that they get saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame, and the above approach makes it difficult to track the return address when the CFI information is emitted during frame lowering, since that would require understanding the control flow.
This patch moves the return address ABI registers s[30:31] into the callee-saved register range and stops adding the return address registers as live-ins, so that the CFI machinery will know where the return address resides when the CSR save/restore happens during frame lowering.
Doing the above poses an issue: the return instruction now uses the undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction behind the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to `S_SETPC_B64_return` during `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
Revision tags: llvmorg-14.0.0-rc2 |

# 6527b2a4 | 18-Feb-2022 | Sebastian Neubauer <[email protected]>
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |

# d6fdbbca | 24-Nov-2021 | Matt Arsenault <[email protected]>
AMDGPU: Add second emergency slot for SGPR to vmem for large frames
In a future change, we will sometimes use a VGPR offset for doing spills to memory, in which case we need 2 free VGPRs to do the SGPR spill. In most cases we could spill the VGPR along with the SGPR being spilled, but we don't have any free lanes for SGPR_1024 in wave32 so we could still potentially need a second scavenging slot.

# 18aabae8 | 20-Jan-2022 | Matt Arsenault <[email protected]>
AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills
These have negative / out of bounds frame index values and would assert when trying to set the BitVector. Fixed stack objects can't be colored away so ignore them.

# 8470bf2b | 12-Jan-2022 | Austin Kerbow <[email protected]>
[AMDGPU] Do not reserve any VGPR for SGPR spills
After the split register allocation changes in eebe841a47cb, it is no longer necessary to reserve a VGPR before RA. Reserving one can also create bugs when IPRA is enabled, since we cannot predict that a called function may not reserve any register if it does not have any SGPR spills. If that happens, those functions may override reserved registers that are normally callee-saved. A test is added to show this.
Fixes: SWDEV-309900
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D115551

# d45a2479 | 18-Dec-2021 | Brendon Cahoon <[email protected]>
[AMDGPU] Don't remove VGPR to AGPR dead spills from frame info
Removing dead frame indices for VGPR to AGPR spills is incorrect when the frame index is shared by multiple objects, which may occur due to stack slot coloring. The problem is that subsequent code that processes the other object will assert because the stack frame index is marked dead.
Removing dead frame indices is needed prior to stack slot coloring, which is what happens with SGPR to VGPR spills. These spills are lowered prior to stack slot coloring, but the VGPR to AGPR spills are processed afterwards during the Prolog/Epilog Inserter pass. This patch marks the VGPR to AGPR spill slot as dead if the slot is not used by another object.
Differential Revision: https://reviews.llvm.org/D115996

# 273a0c8b | 04-Nov-2021 | Matt Arsenault <[email protected]>
PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but it can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt in rather than only opt out.

# 659887b4 | 13-Nov-2021 | Matt Arsenault <[email protected]>
AMDGPU: Mark prolog/epilog SCC defs as dead
A future change will add SCC liveness checks. Since we are still relying on forward register scavenging, add dead flags to avoid spuriously detecting SCC as live.

# 476ab0f8 | 11-Nov-2021 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Fixed stack pointer init with architected flat scratch
Even if wave offset is not present we still need to do the rest of the initialization. The mov into s32 was missing in the kernels.
Fixes: SWDEV-310935
Differential Revision: https://reviews.llvm.org/D113628

# 539f500e | 03-Nov-2021 | RamNalamothu <[email protected]>
[AMDGPU] Do not add debug locations to the code inside prologue
There is no real source location for code inside the prologue, as it is generated by the compiler, but source locations are being added to it as a side effect of https://reviews.llvm.org/D99269, because buildSpillLoadStore() uses the source location of the real instruction in the basic block, if any.
Fixes: SWDEV-307590
Reviewed By: scott.linder, sebastian-ne
Differential Revision: https://reviews.llvm.org/D113100

# 4bef0304 | 03-Nov-2021 | Kazu Hirata <[email protected]>
[AArch64, AMDGPU] Use make_early_inc_range (NFC)

# bd4dad87 | 07-Oct-2021 | Jack Andersen <[email protected]>
[MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand
Based on the reasoning of D53903, register operands of DBG_VALUE are invariably treated as RegState::Debug operands. This change enforces this invariant as part of MachineInstr::addOperand so that all passes emit this flag consistently.
RegState::Debug is inconsistently set on DBG_VALUE registers throughout LLVM. This runs the risk that a filtering iterator like MachineRegisterInfo::reg_nodbg_iterator will process these operands erroneously when they are not parsed from MIR sources.
This issue was observed in the development of the llvm-mos fork which adds a backend that relies on physical register operands much more than existing targets. Physical RegUnit 0 has the same numeric encoding as $noreg (indicating an undef for DBG_VALUE). Allowing debug operands into the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision of register numbers with different zero semantics). Eventually, this causes an assert where DBG_VALUE instructions are prohibited from participating in live register ranges.
Reviewed By: MatzeB, StephenTozer
Differential Revision: https://reviews.llvm.org/D110105
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |

# 11b7ee97 | 03-Aug-2021 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Avoid assert for saved FP
With spilling into AGPRs enabled we cannot reliably predict whether we need to save the FP or not. We may end up spilling everything into AGPRs and never touch the stack. In this case we may still save the FP. This is a deficiency but not an error, so avoid the assert.
Differential Revision: https://reviews.llvm.org/D107404
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |

# 1a8c5717 | 28-Jul-2021 | RamNalamothu <[email protected]>
[AMDGPU] We would need FP if there is call and caller save VGPR spills
Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs to consider caller-save VGPR spills as well when anticipating whether we require an FP.
Fixes: SWDEV-295978
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D106758

# 4359b870 | 14-Jul-2021 | Sebastian Neubauer <[email protected]>
[AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to initialize the flat scratch hardware register.
Differential Revision: https://reviews.llvm.org/D105920
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |

# eebe841a | 26-Sep-2018 | Matt Arsenault <[email protected]>
RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes were handled at the same time, this was problematic. We don't know ahead of time how many registers will need to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point.
Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run.
One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work.
Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQP is not currently supported, so this also prevents using the unhandled allocator.

# 96e1fcb1 | 07-Jun-2021 | Sebastian Neubauer <[email protected]>
[AMDGPU] Use s_add_i32 for address additions
This allows converting the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differential Revision: https://reviews.llvm.org/D103322

# 82f92e35 | 21-Apr-2021 | Andy Wingo <[email protected]>
[WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory.
For the WebAssembly target, IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, which has value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc. instructions via tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
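
A minimal, hypothetical LLVM IR sketch of the mechanism described above (not taken from the patch itself; the function and value names are illustrative, and it assumes only what the message states, i.e. that WASM_ADDRESS_SPACE_WASM_VAR is address space 1):

    ; An alloca placed in addrspace(1); if it survives to instruction selection,
    ; it is assigned TargetStackID::WasmLocal and lowered to a WebAssembly local
    ; rather than a slot in linear memory.
    define void @f() {
      %x = alloca i32, align 4, addrspace(1)
      store i32 42, ptr addrspace(1) %x
      ret void
    }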

# bc1ad6e3 | 31-May-2021 | Andy Wingo <[email protected]>
Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables"
This reverts commit bf35f4af51cddd743435bb6b94a45592c967891a. There was an error in a shared-library build.

# bf35f4af | 21-Apr-2021 | Andy Wingo <[email protected]>
[WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory.
For the WebAssembly target, IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, which has value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc. instructions via tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140

# e0c82654 | 28-May-2021 | Nemanja Ivanovic <[email protected]>
Revert "Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI."
Since ca5f07f8c4bc96d16ed1992b810aa3897df157f2 already reverted the cause for this warning, this commit now causes warnings about a default label in a switch that covers the enum.
This reverts commit cf2eeb114c59cfc3a80133e96c585188fa16cc98.