|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
2d43955c |
| 25-May-2022 |
Scott Linder <[email protected]> |
[AMDGPU][NFC] Refactor AMDGPUCallingConv.td
Rename CalleeSavedRegs defs to avoid being overly specific:
* CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs * CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_
[AMDGPU][NFC] Refactor AMDGPUCallingConv.td
Rename CalleeSavedRegs defs to avoid being overly specific:
* CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs * CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_105 => CSR_AMDGPU_SGPRs * CSR_AMDGPU_SI_Gfx_SGPRs_4_29 + CSR_AMDGPU_SI_Gfx_SGPRs_64_105 => CSR_AMDGPU_SI_Gfx_SGPRs * CSR_AMDGPU_HighRegs => CSR_AMDGPU * CSR_AMDGPU_HighRegs_With_AGPRs => CSR_AMDGPU_GFX90AInsts * CSR_AMDGPU_SI_Gfx_With_AGPRs => CSR_AMDGPU_SI_Gfx_GFX90AInsts
Introduce a class RegMask to mark the cases where we use the CalleeSavedRegs class purely as an expedient way to produce a mask. Update the names of these masks to not mention "CSR". Other targets also seem to do this, so a reasonable alternative is to actually update table-gen to include a new class to do this explicitly, but the current approach seems harmless so I opted to just make it more explicit.
Reviewed By: arsenm, sebastian-ne
Differential Revision: https://reviews.llvm.org/D109008
show more ...
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
04fff547 |
| 07-Mar-2022 |
Venkata Ramanaiah Nalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
09b53296 |
| 22-Dec-2021 |
Ron Lieberman <[email protected]> |
Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.
Failed amdgpu runtime buildbot # 3514
|
| #
9075009d |
| 22-Dec-2021 |
RamNalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
76cbe622 |
| 25-Oct-2021 |
Thomas Symalla <[email protected]> |
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not goin
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D111637
show more ...
|
| #
eb16570a |
| 26-Oct-2021 |
Neubauer, Sebastian <[email protected]> |
[AMDGPU] Remove unused CSR defs
CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used anywhere, so remove them.
Differential Revision: https://reviews.llvm.org/D112535
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
fb44c322 |
| 12-Jul-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Promote signext/zeroext i16 shader returns
This makes them consistent with all the other return convention handling. If we don't do this, we lose the sext/zext flag if treated as a full assi
AMDGPU: Promote signext/zeroext i16 shader returns
This makes them consistent with all the other return convention handling. If we don't do this, we lose the sext/zext flag if treated as a full assignment, which complicates a future GlobalISel patch.
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5 |
|
| #
7842e172 |
| 01-Apr-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Fix large return values with amdgpu_gfx
Returning in memory is not supported, so fall back to sret. Also, extend i1 and i16 to i32. Otherwise, they would be passed through memory.
Differen
[AMDGPU] Fix large return values with amdgpu_gfx
Returning in memory is not supported, so fall back to sret. Also, extend i1 and i16 to i32. Otherwise, they would be passed through memory.
Differential Revision: https://reviews.llvm.org/D100543
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
| #
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
| #
bcf723b2 |
| 07-Feb-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop adding stack passed wide arguments to call conv handler
The generated calling convention code shouldn't see these types since we split large types into 32-bit chunks before the calling
AMDGPU: Stop adding stack passed wide arguments to call conv handler
The generated calling convention code shouldn't see these types since we split large types into 32-bit chunks before the calling convention code is triggered.
GlobalISel ends up directly calls the generated CC code before checking for the register count breakdown. Arguably this difference is a bug, but this was dead code for the DAG anyway.
show more ...
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
97f51f04 |
| 15-Dec-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove redundant CCAction for i1
|
|
Revision tags: llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
a022b1cc |
| 16-Sep-2020 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls within graphics shaders. For the moment, this uses the same calling convention as other
[AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls within graphics shaders. For the moment, this uses the same calling convention as other calls in amdgpu, with registers excluded for return address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
show more ...
|
| #
d3bcfe2a |
| 15-Oct-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Implement getNoPreservedMask
We don't support funclets for exception handling and I hit this when manually reducing MIR.
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
| #
375cec4b |
| 27-Mar-2020 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Introduce more scratch registers in the ABI.
The AMDGPU target has a convention that defined all VGPRs (execept the initial 32 argument registers) as callee-saved. This convention is not ef
[AMDGPU] Introduce more scratch registers in the ABI.
The AMDGPU target has a convention that defined all VGPRs (execept the initial 32 argument registers) as callee-saved. This convention is not efficient always, esp. when the callee requiring more registers, ended up emitting a large number of spills, even though its caller requires only a few.
This patch revises the ABI by introducing more scratch registers that a callee can freely use. The 256 vgpr registers now become: 32 argument registers 112 scratch registers and 112 callee saved registers. The scratch registers and the CSRs are intermixed at regular intervals (a split boundary of 8) to obtain a better occupancy.
Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr
Reviewed By: arsenm, t-tye
Differential Revision: https://reviews.llvm.org/D76356
show more ...
|
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
| #
2214bc81 |
| 26-Jan-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Allow i16 shader arguments
Not allowing this just creates unnecessary complications when writing simple tests.
|
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
| #
3b1459ed |
| 28-Aug-2019 |
Ryan Taylor <[email protected]> |
[AMDGPU] Adjust number of SGPRs available in Calling Convention
This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved availa
[AMDGPU] Adjust number of SGPRs available in Calling Convention
This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved available for the calling convention.
Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1 llvm-svn: 370215
show more ...
|
|
Revision tags: llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1 |
|
| #
1022c0df |
| 19-Jul-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit.
This should also help avoid
AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit.
This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel.
llvm-svn: 366578
show more ...
|
|
Revision tags: llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
| #
5b0922fe |
| 03-Jul-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Add pass to lower SGPR spills
This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining Li
AMDGPU: Add pass to lower SGPR spills
This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining LiveIntervals).
This simplifies making the frame pointer register callee saved. As it is now, the code to determine callee saves needs to predict all the possible SGPR spills and how many callee saved VGPRs are needed. By handling this before PrologEpilogInserter, it's possible to just check the spill objects that already exist.
Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04 llvm-svn: 365095
show more ...
|
|
Revision tags: llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2 |
|
| #
60ba03e2 |
| 21-May-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix not marking new gfx10 SGPRs as CSRs
llvm-svn: 361330
|
|
Revision tags: llvmorg-8.0.1-rc1 |
|
| #
29257eb7 |
| 15-May-2019 |
Ryan Taylor <[email protected]> |
[AMDGPU] Increases available SGPR for Calling Convention
Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count availalbe to CC to 105.
[AMDGPU] Increases available SGPR for Calling Convention
Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count availalbe to CC to 105.
Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61261
llvm-svn: 360778
show more ...
|
| #
033f99a2 |
| 22-Mar-2019 |
Tim Renouf <[email protected]> |
[AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords.
Differential Revision: https://r
[AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords.
Differential Revision: https://reviews.llvm.org/D58903
Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
show more ...
|
| #
361b5b21 |
| 21-Mar-2019 |
Tim Renouf <[email protected]> |
[AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled there
[AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58902
Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
show more ...
|
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
| #
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <[email protected]> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1 |
|
| #
55ab9213 |
| 01-Aug-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Partially fix handling of packed amdgpu_ps arguments
Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values.
llvm
AMDGPU: Partially fix handling of packed amdgpu_ps arguments
Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values.
llvm-svn: 338618
show more ...
|
| #
5bfbae5c |
| 11-Jul-2018 |
Tom Stellard <[email protected]> |
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Me
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
show more ...
|