|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
5cae8816 |
| 06-Jul-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10.
[AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10.
Differential Revision: https://reviews.llvm.org/D129295
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
77851cc1 |
| 15-Jun-2022 |
David Stuttard <[email protected]> |
[AMDGPU] Change use null for dead sdst to be gfx1030+
Pre gfx1030 null for sdst is different. c97436f8b6e2 [AMDGPU] Use null for dead sdst operand - requires a change to make it not apply to pre gfx
[AMDGPU] Change use null for dead sdst to be gfx1030+
Pre gfx1030 null for sdst is different. c97436f8b6e2 [AMDGPU] Use null for dead sdst operand - requires a change to make it not apply to pre gfx1030
Differential Revision: https://reviews.llvm.org/D127869
show more ...
|
| #
c97436f8 |
| 10-Jun-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use null for dead sdst operand
Differential Revision: https://reviews.llvm.org/D127542
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
23db8e4b |
| 06-Jun-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use v_mad_u64_u32 for IMAD32
Nic Curtis done the experiments to prove it is faster than a separate mul and add.
Fixes: SWDEV-332806
Differential Revision: https://reviews.llvm.org/D127253
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
2b754384 |
| 29-Mar-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Generate checks in atomic_optimizations_*.ll
This had already been done for some of these files but not all.
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
f510045d |
| 14-Jan-2022 |
Jay Foad <[email protected]> |
[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].
Differential Revision: https://reviews.llvm.org/D117298
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
5df52f77 |
| 19-Mar-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Remove weird target triples from tests. NFC.
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2 |
|
| #
52bc2e75 |
| 03-Aug-2020 |
Nicolai Hähnle <[email protected]> |
[AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24
Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when possible. This significantly improves some game cases b
[AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24
Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when possible. This significantly improves some game cases by eliminating v_readfirstlane instructions when the result feeds into a scalar operation, like the address calculation for a scalar load or store.
Since isDivergent is only an approximation of whether a value is in SGPRs, it can potentially regress some situations where a uniform value ends up in a VGPR. These should be rare in real code, although the test changes do contain a number of examples.
Most of the test changes are just using s_mul instead of v_mul/mad which is generally better for both register pressure and latency (at least on GFX10 where sgpr pressure doesn't affect occupancy and vector ALU instructions have significantly longer latency than scalar ALU). Some R600 tests now use MULLO_INT instead of MUL_UINT24.
GlobalISel appears to handle more scenarios in the desirable way, although it can also be thrown off and fails to select the 24-bit multiplies in some cases.
Alternative solution considered and rejected was to allow selecting MUL_[UI]24 to S_MUL_I32. I've rejected this because the definition of those SD operations works is don't-care on the most significant 8 bits, and this fact is used in some combines via SimplifyDemandedBits.
Based on a patch by Nicolai Hähnle.
Differential Revision: https://reviews.llvm.org/D97063
show more ...
|
| #
7a880ab3 |
| 27-Oct-2020 |
Carl Ritson <[email protected]> |
[AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode barriers to instruction scheduling. Move the entire pass after the machine instruction scheduler and mak
[AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode barriers to instruction scheduling. Move the entire pass after the machine instruction scheduler and make changes so pass is correct for non-SSA operation. These changes should leave the pass still usable pre-scheduler, although tests have be updated to reflect post-scheduler results.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D88081
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
5d3a69fe |
| 05-Mar-2020 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its
[AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes.
This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period.
Use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D65088
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
| #
eac23862 |
| 23-Aug-2019 |
Jay Foad <[email protected]> |
[AMDGPU] gfx10 atomic optimizer changes.
Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, crit
[AMDGPU] gfx10 atomic optimizer changes.
Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65644
llvm-svn: 369745
show more ...
|
|
Revision tags: llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init |
|
| #
27ec195f |
| 12-Jul-2019 |
Jay Foad <[email protected]> |
[AMDGPU] Fix DPP combiner check for exec modification
Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it check
[AMDGPU] Fix DPP combiner check for exec modification
Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer.
Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned.
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64393
llvm-svn: 365910
show more ...
|
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
| #
0a30f33c |
| 01-Apr-2019 |
Neil Henning <[email protected]> |
[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactiv
[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes.
Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it).
This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!).
Differential Revision: https://reviews.llvm.org/D59295
llvm-svn: 357400
show more ...
|
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4 |
|
| #
9e3f7d8a |
| 05-Mar-2019 |
Carl Ritson <[email protected]> |
[AMDGPU] Fix DPP operand order in atomic optimizer
Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.
Change-Id: I631d050e1c00a3b4bc7c11a9
[AMDGPU] Fix DPP operand order in atomic optimizer
Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.
Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30
Reviewers: sheredom, tpr
Reviewed By: sheredom
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58900
llvm-svn: 355394
show more ...
|
|
Revision tags: llvmorg-8.0.0-rc3 |
|
| #
8c10fa1a |
| 11-Feb-2019 |
Neil Henning <[email protected]> |
[AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug
[AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug by using a ballot instead.
Differential Revision: https://reviews.llvm.org/D57737
llvm-svn: 353703
show more ...
|
|
Revision tags: llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
| #
66416574 |
| 08-Oct-2018 |
Neil Henning <[email protected]> |
[AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by:
- Running through a function and finding a
[AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by:
- Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result.
Differential Revision: https://reviews.llvm.org/D51969
llvm-svn: 343973
show more ...
|