|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
04fff547 |
| 07-Mar-2022 |
Venkata Ramanaiah Nalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame, and the above approach makes that difficult when the CFI information is emitted during frame lowering, because it requires understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
|
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
09b53296 |
| 22-Dec-2021 |
Ron Lieberman <[email protected]> |
Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.
Failed amdgpu runtime buildbot # 3514
|
| #
9075009d |
| 22-Dec-2021 |
RamNalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame, and the above approach makes that difficult when the CFI information is emitted during frame lowering, because it requires understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
76cbe622 |
| 25-Oct-2021 |
Thomas Symalla <[email protected]> |
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unnecessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D111637
|
| #
9bd5cfeb |
| 23-Oct-2021 |
Abinav Puthan Purayil <[email protected]> |
[AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics.
These intrinsics map to the 24-bit v_mul_hi instructions.
This change also fixes an incorrect assumption on the associativity of 24-bit mulhi in its SDNode record in tblgen.
Differential Revision: https://reviews.llvm.org/D112394
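As a rough illustration of what a 24-bit "mul hi" computes, the sketch below models the operation in plain Python. It assumes the result is the 48-bit product of the low 24 bits of each operand, shifted right by 32; that reading is an assumption about the instructions, not something the commit message states. The helper names `mulhi_u24`, `mulhi_i24` and `_sext24` are hypothetical.

```python
MASK24 = (1 << 24) - 1  # low 24 bits of an operand


def _sext24(x: int) -> int:
    """Interpret the low 24 bits of x as a signed two's-complement value."""
    x &= MASK24
    return x - (1 << 24) if x & (1 << 23) else x


def mulhi_u24(a: int, b: int) -> int:
    """Assumed model: bits above bit 31 of the unsigned 24x24-bit product."""
    return ((a & MASK24) * (b & MASK24)) >> 32


def mulhi_i24(a: int, b: int) -> int:
    """Assumed model: signed 24x24-bit product, arithmetically shifted by 32."""
    return (_sext24(a) * _sext24(b)) >> 32
```

Under these assumptions, `mulhi_u24(0xFFFFFF, 0xFFFFFF)` evaluates to `0xFFFF`, since the largest 48-bit product leaves only 16 significant bits after the shift.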
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
48958d02 |
| 23-Aug-2021 |
Daniil Fukalov <[email protected]> |
[NFC][AMDGPU] Reduce includes dependencies.
1. Split out some parts of the R600 target into separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed an issue where the overrides `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had a different return type than the base class.
4. Minor forward declarations cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108596
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
a7786bad |
| 15-Jun-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Move zeroed FP high bits optimization to patterns
|
|
Revision tags: llvmorg-12.0.1-rc1 |
|
| #
c714d037 |
| 06-May-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32
Differential Revision: https://reviews.llvm.org/D102022
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
| #
f89f6d1e |
| 02-Feb-2021 |
Thomas Symalla <[email protected]> |
[AMDGPU]: Fixes an invalid clamp selection pattern.
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to invalid logic (which matches even if it shouldn't) and likely missing variable initialisation.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95878
|
| #
d49efdc9 |
| 03-Feb-2021 |
Sebastian Neubauer <[email protected]> |
Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path."
This reverts commits 62af0305b7cc..677a3529d3e6 from D93708. They cause failures in the sanitizer builds because of uninitialized values.
A fix is in D95878, but it might take some time until this is pushed, so reverting the changes for now.
|
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
6604d81e |
| 13-Jan-2021 |
Thomas Symalla <[email protected]> |
Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review.
|
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
| #
c5c58fd6 |
| 17-Jun-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove intermediate DAG node for trig_preop intrinsic
We weren't doing anything with this, and keeping it would just add more boilerplate for GlobalISel.
|
|
Revision tags: llvmorg-10.0.1-rc1 |
|
| #
97f3f0ba |
| 06-May-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Add intrinsic for s_setreg
This will be more useful with fenv access implemented.
|
| #
0312b9f5 |
| 23-Apr-2020 |
Kazuaki Ishizaki <[email protected]> |
[llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
|
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4 |
|
| #
f4637925 |
| 06-Sep-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove custom node for RSQ_LEGACY
Directly select from the intrinsic. This wasn't getting much value from the custom node.
|
| #
e87ec667 |
| 05-Apr-2020 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll
|
| #
4c1c9422 |
| 18-Feb-2020 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Select llvm.amdgcn.fdot2
I'm slightly worried about the generated checks, since they won't catch incorrect modifiers being added at the end of the line.
|
| #
fab4cdea |
| 20-Feb-2020 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Select llvm.amdgcn.fmul.legacy
|
| #
361f2a78 |
| 10-Feb-2020 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Handle sbfe/ubfe intrinsic
Try to handle arbitrary scalar BFEs by packing the operands. The DAG gives up on non-constant arguments. We're still missing any constant folding, so we end up with pretty ugly code most of the time. Also handle the 64-bit scalar case, which the DAG doesn't try to do.
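To make "packing the operands" concrete, here is a minimal Python sketch of a scalar bitfield extract whose offset and width travel in a single 32-bit operand. The field layout (offset in bits [4:0], width in bits [22:16]) is an assumption about the scalar BFE encoding and is not stated in the commit message. The helper names are hypothetical, and the model only covers the case where offset + width <= 32.

```python
def pack_bfe_operand(offset: int, width: int) -> int:
    """Pack offset and width into one operand (assumed layout, see above)."""
    return (offset & 0x1F) | ((width & 0x7F) << 16)


def ubfe32(src: int, offset: int, width: int) -> int:
    """Unsigned extract of `width` bits starting at bit `offset`."""
    field = (src & 0xFFFFFFFF) >> (offset & 0x1F)
    return field & ((1 << width) - 1)


def sbfe32(src: int, offset: int, width: int) -> int:
    """Signed extract: same field, sign-extended from its top bit."""
    value = ubfe32(src, offset, width)
    if width and value & (1 << (width - 1)):
        value -= 1 << width
    return value
```

With this layout, an offset of 8 and a width of 4 pack to `0x00040008`. When both values are only known at run time, the packed operand has to be built with shift and OR instructions rather than folded into an immediate.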
|
| #
d1b393d9 |
| 07-Feb-2020 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Select G_CTTZ_ZERO_UNDEF
Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.
|
| #
045a8921 |
| 07-Feb-2020 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Select G_CTLZ_ZERO_UNDEF
Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.
|
| #
00115d76 |
| 06-Jan-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove dead kill handling
At one point a custom node was used for kill handling, but now the intrinsic is directly selected. Remove leftover pattern machinery.
|
| #
68b102b9 |
| 01-Jan-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16
Manually select this as a tablegen workaround. Both SelectionDAG and GlobalISel end up misplacing the copy to m0 when both instructions in the output need it. Neither considers that both output instructions depend on m0. I don't know of any other pattern where we need to handle this case, so it's less effort to just work around this for now.
|
| #
eef92f25 |
| 05-Jan-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove custom node for exports
I'm mildly worried about potentially reordering exp/exp_done with IntrWriteMem on the intrinsic.
Requires hacking out the illegal type on SI, so manually select that case during lowering.
|
| #
7fa0bfe7 |
| 08-Sep-2019 |
Matt Arsenault <[email protected]> |
AMDGPU/GlobalISel: Select mul24 intrinsics
|