Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |

# dd7e407d | 02-Jun-2022 | Matt Arsenault <[email protected]>
AMDGPU: Move SpilledReg from MFI to SIRegisterInfo
This isn't the most natural place for it, but it avoids a circular include dependency in an out-of-tree patch.
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |

# 5bd87350 | 21-Apr-2022 | hsmahesha <[email protected]>
[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve the highest available VGPR for AGPR copy before RA. After RA, shift it to the lowest unused VGPR if one exists.
Fixes SWDEV-330006.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123525

# e0d585d7 | 16-Apr-2022 | Matt Arsenault <[email protected]>
AMDGPU: Defer creation of WWM VGPR spill slots
There's no reason to create these immediately. They can be created in the prolog/epilog code like CSR spills. There's probably a cleaner way to do this by utilizing the CSR spill code.
This makes the frame index used transient state for PrologEpilogInserter, and thus makes serialization easier. Really this doesn't need to be saved here but there isn't really a better place for it.

# 34a68037 | 13-Apr-2022 | Christudasan Devadasan <[email protected]>
[AMDGPU][SIFrameLowering] Refactor custom SGPR spills (NFC).
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D123666
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |

# 04fff547 | 07-Mar-2022 | Venkata Ramanaiah Nalamothu <[email protected]>
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call-clobbered register range, are added as live-ins on function entry to preserve their value when we have calls, so that they get saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame, and the above approach makes it difficult to track the return address when the CFI information is emitted during frame lowering, since that would require understanding the control flow.
This patch moves the return address ABI registers s[30:31] into the callee-saved register range and stops adding the return address registers as live-ins, so that the CFI machinery will know where the return address resides when the CSR save/restore happens during frame lowering.
Doing the above poses an issue: the return instruction now uses the undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction behind the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to `S_SETPC_B64_return` during `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
Revision tags: llvmorg-14.0.0-rc2 |

# 6527b2a4 | 18-Feb-2022 | Sebastian Neubauer <[email protected]>
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |

# d6fdbbca | 24-Nov-2021 | Matt Arsenault <[email protected]>
AMDGPU: Add second emergency slot for SGPR to vmem for large frames
In a future change, we will sometimes use a VGPR offset for doing spills to memory, in which case we need 2 free VGPRs to do the SGPR spill. In most cases we could spill the VGPR along with the SGPR being spilled, but we don't have any free lanes for SGPR_1024 in wave32 so we could still potentially need a second scavenging slot.

# 18aabae8 | 20-Jan-2022 | Matt Arsenault <[email protected]>
AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills
These have negative / out of bounds frame index values and would assert when trying to set the BitVector. Fixed stack objects can't be colored away so ignore them.

# 8470bf2b | 12-Jan-2022 | Austin Kerbow <[email protected]>
[AMDGPU] Do not reserve any VGPR for SGPR spills
After the split register allocation changes in eebe841a47cb, it is no longer necessary to reserve a VGPR before RA. Reserving one can also create bugs when IPRA is enabled, since we cannot predict that a called function may not reserve any register if it does not have any SGPR spills. If that happens, those functions may override reserved registers that are normally callee-saved. A test is added to show this.
Fixes: SWDEV-309900
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D115551

# d45a2479 | 18-Dec-2021 | Brendon Cahoon <[email protected]>
[AMDGPU] Don't remove VGPR to AGPR dead spills from frame info
Removing dead frame indices for VGPR to AGPR spills is incorrect when the frame index is shared by multiple objects, which may occur due to stack slot coloring. The problem is that subsequent code that processes the other object will assert because the stack frame index is marked dead.
Removing dead frame indices is needed prior to stack slot coloring, which is what happens with SGPR to VGPR spills. These spills are lowered prior to stack slot coloring, but the VGPR to AGPR spills are processed afterwards during the Prolog/Epilog Inserter pass. This patch marks the VGPR to AGPR spill slot as dead if the slot is not used by another object.
Differential Revision: https://reviews.llvm.org/D115996

# 273a0c8b | 04-Nov-2021 | Matt Arsenault <[email protected]>
PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but it can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt in rather than only opt out.

# 659887b4 | 13-Nov-2021 | Matt Arsenault <[email protected]>
AMDGPU: Mark prolog/epilog SCC defs as dead
A future change will add SCC liveness checks. Since we are still relying on forward register scavenging, add dead flags to avoid spuriously detecting SCC as live.

# 476ab0f8 | 11-Nov-2021 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Fixed stack pointer init with architected flat scratch
Even if wave offset is not present we still need to do the rest of the initialization. The mov into s32 was missing in the kernels.
Fixes: SWDEV-310935
Differential Revision: https://reviews.llvm.org/D113628

# 539f500e | 03-Nov-2021 | RamNalamothu <[email protected]>
[AMDGPU] Do not add debug locations to the code inside prologue
There is no real source location for code inside the prologue, as it is generated by the compiler, but source locations are being added to it as a side effect of https://reviews.llvm.org/D99269, because buildSpillLoadStore() uses the source location of the real instruction in the basic block, if any.
Fixes: SWDEV-307590
Reviewed By: scott.linder, sebastian-ne
Differential Revision: https://reviews.llvm.org/D113100

# 4bef0304 | 03-Nov-2021 | Kazu Hirata <[email protected]>
[AArch64, AMDGPU] Use make_early_inc_range (NFC)

# bd4dad87 | 07-Oct-2021 | Jack Andersen <[email protected]>
[MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand
Based on the reasoning of D53903, register operands of DBG_VALUE are invariably treated as RegState::Debug operands. This change enforces this invariant as part of MachineInstr::addOperand so that all passes emit this flag consistently.
RegState::Debug is inconsistently set on DBG_VALUE registers throughout LLVM. This runs the risk that a filtering iterator like MachineRegisterInfo::reg_nodbg_iterator will process these operands erroneously when they are not parsed from MIR sources.
This issue was observed in the development of the llvm-mos fork which adds a backend that relies on physical register operands much more than existing targets. Physical RegUnit 0 has the same numeric encoding as $noreg (indicating an undef for DBG_VALUE). Allowing debug operands into the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision of register numbers with different zero semantics). Eventually, this causes an assert where DBG_VALUE instructions are prohibited from participating in live register ranges.
Reviewed By: MatzeB, StephenTozer
Differential Revision: https://reviews.llvm.org/D110105
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |

# 11b7ee97 | 03-Aug-2021 | Stanislav Mekhanoshin <[email protected]>
[AMDGPU] Avoid assert for saved FP
With spilling into AGPRs enabled we cannot reliably predict whether we need to save the FP or not. We may end up spilling everything into AGPRs and never touch the stack. In this case we may still save the FP. This is a deficiency but not an error, so avoid the assert.
Differential Revision: https://reviews.llvm.org/D107404
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |

# 1a8c5717 | 28-Jul-2021 | RamNalamothu <[email protected]>
[AMDGPU] We would need FP if there is call and caller save VGPR spills
Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs to consider caller-save VGPR spills as well when anticipating whether we require an FP.
Fixes: SWDEV-295978
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D106758

# 4359b870 | 14-Jul-2021 | Sebastian Neubauer <[email protected]>
[AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to initialize the flat scratch hardware register.
Differential Revision: https://reviews.llvm.org/D105920
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |

# eebe841a | 26-Sep-2018 | Matt Arsenault <[email protected]>
RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes were handled at the same time, this was problematic. We don't know ahead of time how many registers will need to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point.
Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run.
One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work.
Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQP is not currently supported, so this also prevents using the unhandled allocator.

# 96e1fcb1 | 07-Jun-2021 | Sebastian Neubauer <[email protected]>
[AMDGPU] Use s_add_i32 for address additions
This allows converting the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differential Revision: https://reviews.llvm.org/D103322

# 82f92e35 | 21-Apr-2021 | Andy Wingo <[email protected]>
[WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory.
For the WebAssembly target, IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, which has value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc. instructions via tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
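
A minimal, hypothetical LLVM IR sketch of the mechanism described above (not taken from the patch itself; the function and value names are illustrative, and it assumes only what the message states, i.e. that WASM_ADDRESS_SPACE_WASM_VAR is address space 1):

    ; An alloca placed in addrspace(1); if it survives to instruction selection,
    ; it is assigned TargetStackID::WasmLocal and lowered to a WebAssembly local
    ; rather than a slot in linear memory.
    define void @f() {
      %x = alloca i32, align 4, addrspace(1)
      store i32 42, ptr addrspace(1) %x
      ret void
    }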

# bc1ad6e3 | 31-May-2021 | Andy Wingo <[email protected]>
Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables"
This reverts commit bf35f4af51cddd743435bb6b94a45592c967891a. There was an error in a shared-library build.

# bf35f4af | 21-Apr-2021 | Andy Wingo <[email protected]>
[WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory.
For the WebAssembly target, IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, which has value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc. instructions via tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140

# e0c82654 | 28-May-2021 | Nemanja Ivanovic <[email protected]>
Revert "Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI."
Since ca5f07f8c4bc96d16ed1992b810aa3897df157f2 already reverted the cause for this warning, this commit now causes warnings about a default label in a switch that covers the enum.
This reverts commit cf2eeb114c59cfc3a80133e96c585188fa16cc98.