|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
4874838a |
| 28-Jun-2022 |
Piotr Sobczak <[email protected]> |
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D128756
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
13107c27 |
| 16-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Add support for GFX11 LDSDIR hazards
Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to allow concurrent issue of LDS direct with VALU execution.
Also detect LDS direct versus VMEM source VGPR hazards and insert vm_vsrc=0 waits using s_waitcnt_depctr.
Differential Revision: https://reviews.llvm.org/D127963
|
| #
9dff14be |
| 15-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Add support for GFX11 hazards
Add support for partial stall over EXEC hazard and trans use hazard.
Differential Revision: https://reviews.llvm.org/D127872
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
bd9eed3a |
| 22-May-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Add isMFMA helper function. NFC
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D127124
|
| #
5c974d08 |
| 08-Jun-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Fix hazard handling of v_cmpx to permlane
- VOP3 and SDWA forms of V_CMPX were not handled
- Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane.
Differential Revision: https://reviews.llvm.org/D127344
|
|
Revision tags: llvmorg-14.0.3 |
|
| #
63f21f4c |
| 27-Apr-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Handle LDS DMA and LDS_DIRECT hazards
There shall be 1 wait state between M0 write and LDS DMA/LDS_DIRECT use.
Differential Revision: https://reviews.llvm.org/D124550
|
|
Revision tags: llvmorg-14.0.2 |
|
| #
d951d937 |
| 13-Apr-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Increase hazard for store dwordx3/4 to 2 waitstates on gfx940
Fixes: SWDEV-327053
Differential Revision: https://reviews.llvm.org/D123687
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
f311f934 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx940 VALU hazard recognizer
Differential Revision: https://reviews.llvm.org/D122339
|
| #
64838ba3 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use GenericTable to classify DGEMM
Since a table has been introduced for MAI instructions, extend it to also classify DGEMM.
Differential Revision: https://reviews.llvm.org/D122337
|
| #
cad9de71 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx940 MAI hazard recognizer
Differential Revision: https://reviews.llvm.org/D122263
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
1e15adba |
| 04-Mar-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Add s_nop WaitStates between neighboring mfma
In some cases padding bubbles between sequential MFMA instructions may lead to increased inter-wave performance. Add option to request to pad some portion of these stall cycles with s_nops.
Fixes: SWDEV-326925
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121437
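As a rough illustration of padding only a portion of the stall cycles, the count of s_nop wait states could be derived from a percentage. This is a minimal sketch, not the code from the patch: the helper name `paddedNops`, the `paddingPercent` parameter, and the round-down behaviour are all assumptions made for the example.

```cpp
// Hypothetical helper (not from the patch): given the number of stall cycles
// between two neighboring MFMA instructions, return how many of them to fill
// with s_nop wait states.
// paddingPercent is an assumed knob in the range 0..100; values above 100 are
// clamped. Rounding down is also an assumption of this sketch.
inline unsigned paddedNops(unsigned stallCycles, unsigned paddingPercent) {
  if (paddingPercent > 100)
    paddingPercent = 100;
  return stallCycles * paddingPercent / 100;
}
```

With a setting of 0 nothing is padded, and with 100 every stall cycle is covered by s_nops.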
|
| #
e9a49c64 |
| 17-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx940 basic speed model
This is incomplete and will handle more instructions as they are added.
Differential Revision: https://reviews.llvm.org/D121966
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
380ff31d |
| 22-Feb-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] Fix typo in comment [NFC]
This replaces "V_MOB_B32" with "V_MOV_B32" in some comment.
|
| #
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
dbf278b9 |
| 21-Jan-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Prevent aliasing of SrcC and Dst in MAI
From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoided, as a new value could be written to the accumulator input before it gets read.
Note that this inevitably increases register pressure to the point where some programs will become uncompilable.
This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst.
Fixes: SWDEV-318900
Differential Revision: https://reviews.llvm.org/D117844
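The aliasing rule quoted from the MAI spec can be stated as a small predicate over register ranges. The sketch below is illustrative only: `RegRange`, `overlaps`, and `isLegalSrcCDstPair` are hypothetical names, and the patch itself enforces the rule through tied operands and earlyclobber constraints rather than a runtime check like this.

```cpp
#include <cstdint>

// Hypothetical stand-in for a contiguous run of VGPRs: [first, first + count).
struct RegRange {
  uint32_t first;
  uint32_t count;
};

// True when the two ranges share at least one register.
inline bool overlaps(RegRange a, RegRange b) {
  return a.first < b.first + b.count && b.first < a.first + a.count;
}

// True when the Src_C / vDst pairing satisfies the quoted rule: the ranges
// are either exactly the same registers or completely separate. A partial
// overlap is rejected.
inline bool isLegalSrcCDstPair(RegRange srcC, RegRange dst) {
  bool identical = srcC.first == dst.first && srcC.count == dst.count;
  return identical || !overlaps(srcC, dst);
}
```

For example, Src_C in v[0:3] with vDst in v[0:3] or v[4:7] passes, while vDst in v[2:5] does not.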
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
d6b07348 |
| 19-Jan-2022 |
Jim Lin <[email protected]> |
[NFC] Use Register instead of unsigned
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
661a232e |
| 19-Nov-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Remove a no-op check in the gfx90a hazard recognizer
Also rename helper function accordingly.
Differential Revision: https://reviews.llvm.org/D114289
|
| #
d1f45ed5 |
| 11-Nov-2021 |
Neubauer, Sebastian <[email protected]> |
[AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
4f5ba46e |
| 17-Aug-2021 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Set wait state for meta instructions to zero
It looked more reasonable to set the wait state to zero for all non-instructions. With that we can avoid the special handling for them in `getWaitStatesSince` and `AdvanceCycle`. This NFC patch makes the handling more generic.
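The effect of giving meta instructions a wait state of zero can be sketched with a simplified backward scan. This is an assumption-laden model, not the LLVM implementation: `Instr`, `waitStatesSince`, and `nopsNeeded` are hypothetical names, and the real `getWaitStatesSince` operates on machine instructions and basic blocks.

```cpp
#include <functional>
#include <limits>
#include <vector>

// Hypothetical simplified instruction record.
struct Instr {
  int opcode;
  unsigned waitStates; // 0 for meta instructions, so they need no special case
};

// Walk backwards over already emitted instructions and return the number of
// wait states that have elapsed since the most recent instruction matching
// isHazard. Returns the maximum unsigned value if no match is found before
// `limit` wait states have elapsed.
inline unsigned waitStatesSince(const std::vector<Instr> &emitted,
                                const std::function<bool(const Instr &)> &isHazard,
                                unsigned limit) {
  unsigned elapsed = 0;
  for (auto it = emitted.rbegin(); it != emitted.rend(); ++it) {
    if (isHazard(*it))
      return elapsed;
    elapsed += it->waitStates; // meta instructions contribute nothing
    if (elapsed >= limit)
      break;
  }
  return std::numeric_limits<unsigned>::max();
}

// Number of nop wait states still required to satisfy a hazard.
inline unsigned nopsNeeded(unsigned required, unsigned elapsed) {
  return elapsed >= required ? 0 : required - elapsed;
}
```

Because a meta instruction simply adds zero to the running total, the scan does not need a separate branch to skip it.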
|
| #
68660767 |
| 13-Aug-2021 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE are only placeholders to prevent certain unwanted transformations and will get discarded during assembly emission. They should not be counted during nop insertion.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D108022
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
e0c382a9 |
| 14-Jun-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Limit runs of fixLdsBranchVmemWARHazard
The code in fixLdsBranchVmemWARHazard looks for patterns of a vmem/lds access followed by a branch, followed by an lds/vmem access.
The handling of the hazard requires an arbitrary number of instructions to process. In the worst case where a function has a vmem access, but no lds accesses, all instructions are examined only to conclude that the hazard cannot occur.
Add a pre-processing stage that detects whether both lds and vmem accesses are present in the function, and only then run the more costly search.
This patch significantly improves compilation time in cases where the hazard cannot happen. In one pathological case I looked at, IsHazardInst was needlessly called 88.6 million times.
The numbers could also be improved by introducing a map around the inner calls to ::getWaitStatesSince in fixLdsBranchVmemWARHazard, but nothing will beat not running fixLdsBranchVmemWARHazard at all in the cases detected by shouldRunLdsBranchVmemWARHazardFixup().
Differential Revision: https://reviews.llvm.org/D104219
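The pre-processing idea described in this commit amounts to a single linear scan that gates the expensive search. The following is a minimal sketch under simplifying assumptions: `Kind` and `shouldRunFixup` are hypothetical names standing in for the real instruction classification and for shouldRunLdsBranchVmemWARHazardFixup(), whose actual logic may differ.

```cpp
#include <vector>

// Hypothetical coarse classification of the instructions in a function.
enum class Kind { LDS, VMEM, Branch, Other };

// Return true only if the function contains at least one LDS access and at
// least one VMEM access. When this is false the hazard cannot occur, so the
// costly per-instruction search can be skipped entirely.
inline bool shouldRunFixup(const std::vector<Kind> &function) {
  bool hasLds = false;
  bool hasVmem = false;
  for (Kind k : function) {
    hasLds = hasLds || k == Kind::LDS;
    hasVmem = hasVmem || k == Kind::VMEM;
    if (hasLds && hasVmem)
      return true; // both kinds seen, no need to scan further
  }
  return false;
}
```

The scan is linear in the number of instructions and would typically be run once per function, which is what makes it cheaper than repeating the backward search for every candidate instruction.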
|
|
Revision tags: llvmorg-12.0.1-rc1 |
|
| #
f251379a |
| 30-Apr-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Simplify getWaitStatesSince. NFC.
|
| #
424f1f6f |
| 30-Apr-2021 |
Carl Ritson <[email protected]> |
[AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn
Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the instruction can never be null.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D101430
|
| #
749702fc |
| 29-Apr-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Remove dead early-out in GCNHazardRecognizer
Remove an early-out in wait state counting which can never be taken.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D101520
|
| #
12011b52 |
| 27-Apr-2021 |
Jay Foad <[email protected]> |
[AMDGPU] GCNHazardRecognizer: ignore all meta instructions
This is hopefully NFC, but should be more robust in ignoring all instructions that should be ignored, instead of just some of them.
Differential Revision: https://reviews.llvm.org/D101372
|