|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
79e77a9f |
| 23-Jun-2022 |
Baptiste Saleil <[email protected]> |
[AMDGPU] Flush the vmcnt counter in loop preheaders when necessary
waitcnt vmcnt instructions are currently generated in loop bodies before using values loaded outside of the loop. In some cases, it
[AMDGPU] Flush the vmcnt counter in loop preheaders when necessary
waitcnt vmcnt instructions are currently generated in loop bodies before using values loaded outside of the loop. In some cases, it is better to flush the vmcnt counter in a loop preheader before entering the loop body. This patch detects these cases and generates waitcnt instructions to flush the counter.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D115747
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
2a683647 |
| 14-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 waitcnt support for VINTERP and LDSDIR instructions
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D127781
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
6c372daa |
| 08-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn
Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions.
Differential Revision: https://reviews.llv
[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn
Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions.
Differential Revision: https://reviews.llvm.org/D127315
show more ...
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
d21b9b49 |
| 21-Apr-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn.
Contributors: Jay Foad <[email protected]>
Patch
[AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn.
Contributors: Jay Foad <[email protected]>
Patch 7/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125319
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D125498
show more ...
|
| #
791ec1c6 |
| 13-May-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds
Differential Revision: https://reviews.llvm.org/D124884
|
| #
51e02409 |
| 27-Apr-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Produce waitcounts for LDS DMA
MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS written can be accessed. A load from LDS to VMEM does not need a wait.
Differential Revisio
[AMDGPU] Produce waitcounts for LDS DMA
MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS written can be accessed. A load from LDS to VMEM does not need a wait.
Differential Revision: https://reviews.llvm.org/D124626
show more ...
|
| #
7f97ac94 |
| 19-Apr-2022 |
Austin Kerbow <[email protected]> |
Revert "[AMDGPU] Omit unnecessary waitcnt before barriers"
This reverts commit 8d0c34fd4fb66ea0d19563154a59658e4b7f35d4.
|
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
04fff547 |
| 07-Mar-2022 |
Venkata Ramanaiah Nalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
8d0c34fd |
| 25-Feb-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Omit unnecessary waitcnt before barriers
It is not necessary to wait for all outstanding memory operations before barriers on hardware that can back off of the barrier in the event of an ex
[AMDGPU] Omit unnecessary waitcnt before barriers
It is not necessary to wait for all outstanding memory operations before barriers on hardware that can back off of the barrier in the event of an exception when traps are enabled. Add a new subtarget feature which tracks which HW has this ability.
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D120544
show more ...
|
| #
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
| #
c87c61c5 |
| 14-Feb-2022 |
Joe Nash <[email protected]> |
[AMDGPU] Fix AGPR offset for waitcnt
An enum value stores the offset between AGPR ranges and VGPR ranges in the internal storage of SIInsertWaitcnts. It said 226 when it should say 256, causing some
[AMDGPU] Fix AGPR offset for waitcnt
An enum value stores the offset between AGPR ranges and VGPR ranges in the internal storage of SIInsertWaitcnts. It said 226 when it should say 256, causing some portion of the ranges to overlap. That in turn causes 'aliasing' between the registers, potentially inserting waitcnts that are not required.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D119749
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
f1e36474 |
| 24-Jan-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][NFC] Fix debug prints
Print the instructions instead of pointers.
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
8dfb417e |
| 17-Jan-2022 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Fix missing waitcnt issue
Ignore out of order counters when merging brackets. The fact that there was a pending event in the old state does not guarantee that the waitcnt was generated, so
[AMDGPU] Fix missing waitcnt issue
Ignore out of order counters when merging brackets. The fact that there was a pending event in the old state does not guarantee that the waitcnt was generated, so we still need to conservatively re-process the block.
The patch fixes a correctness issue where the block was not re-processed and the waitcnt not inserted in consequence.
Differential Revision: https://reviews.llvm.org/D117544
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
09b53296 |
| 22-Dec-2021 |
Ron Lieberman <[email protected]> |
Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.
Failed amdgpu runtime buildbot # 3514
|
| #
9075009d |
| 22-Dec-2021 |
RamNalamothu <[email protected]> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
| #
1e93f389 |
| 18-Dec-2021 |
Jakub Kuderski <[email protected]> |
[AMDGPU] Use enum_seq to iterator over InstCounterTypes. NFC.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D115900
|
| #
d9ae852f |
| 18-Dec-2021 |
Jakub Kuderski <[email protected]> |
[AMDGPU] Fix data race in SIInsertWaitcnts
The race condition happened when two pass managers ran on two different modules but modified/read the global variables.
To address this, I considered usin
[AMDGPU] Fix data race in SIInsertWaitcnts
The race condition happened when two pass managers ran on two different modules but modified/read the global variables.
To address this, I considered using singletons and freestanding functions to allow getting/setting `HardwareLimits` and `RegisterEncoding`, or making it local to the pass. I chose the latter and made it a member of `WaitcntsBrackets`, to minimizes the amount of global state.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D115896
show more ...
|
| #
6a7db0dc |
| 15-Dec-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Skip some work on subtargets without scalar stores. NFC.
|
| #
0e8590f0 |
| 30-Nov-2021 |
David Stuttard <[email protected]> |
[AMDGPU] Add support for in-order bvh in waitcnt pass
bvh should be handled separately from vmem and vmem with sampler instructions for waitcnt handling.
Differential Revision: https://reviews.llvm
[AMDGPU] Add support for in-order bvh in waitcnt pass
bvh should be handled separately from vmem and vmem with sampler instructions for waitcnt handling.
Differential Revision: https://reviews.llvm.org/D114794
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
ee0133dc |
| 16-Nov-2021 |
Kazu Hirata <[email protected]> |
[llvm] Use range-for loops (NFC)
|
| #
d1f45ed5 |
| 11-Nov-2021 |
Neubauer, Sebastian <[email protected]> |
[AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
|
| #
76cbe622 |
| 25-Oct-2021 |
Thomas Symalla <[email protected]> |
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not goin
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D111637
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
a4db7025 |
| 12-May-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Remove assert
Remove assert introduced in D101177, following post-commit feedback.
|
| #
68137ef5 |
| 12-May-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Skip invariant loads when avoiding WAR conflicts
No need to handle invariant loads when avoiding WAR conflicts, as there cannot be a vector store to the same memory location.
Reviewed By:
[AMDGPU] Skip invariant loads when avoiding WAR conflicts
No need to handle invariant loads when avoiding WAR conflicts, as there cannot be a vector store to the same memory location.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D101177
show more ...
|
| #
4433f460 |
| 11-May-2021 |
Austin Kerbow <[email protected]> |
[AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2
The waitcnt pass would increment the number of vmem events for some buffer invalidates that were not handled by the pass.
Reviewed By: rampi
[AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2
The waitcnt pass would increment the number of vmem events for some buffer invalidates that were not handled by the pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D102252
show more ...
|