|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
fd64a857 |
| 29-Jun-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction. T
[AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction. This patch also cleans up the SIOptimizeExecMasking pass a bit.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D129073
show more ...
|
| #
86bd7e20 |
| 04-Jul-2022 |
Thomas Symalla <[email protected]> |
[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.
This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass.
Reviewed By: foad
Differe
[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.
This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D129086
show more ...
|
| #
7736ce1c |
| 24-Jun-2022 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
It was causing bad machine code for several blender scenes: *** Bad machine code: Using an undefined physical register *** - func
AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
It was causing bad machine code for several blender scenes: *** Bad machine code: Using an undefined physical register *** - function: kernel_holdout_emission_blurring_pathtermination_ao - basic block: %bb.28 if.end40.i (0x7f84861a2320) - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec - operand 1: $vgpr3
Differential Revision: https://reviews.llvm.org/D127768
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
6d97ca69 |
| 08-Apr-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] Increase detection range for s_mov, v_cmpx transformation.
We found that it might be beneficial to have the SIOptimizeExecMasking pass detect more cases where v_cmp, s_and_saveexec patterns
[AMDGPU] Increase detection range for s_mov, v_cmpx transformation.
We found that it might be beneficial to have the SIOptimizeExecMasking pass detect more cases where v_cmp, s_and_saveexec patterns can be transformed to s_mov, v_cmpx patterns. Currently, the search range for finding a fitting v_cmp instruction is 5, however, this is doubled to 10 here.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D123367
show more ...
|
| #
1a6aa8b1 |
| 31-Mar-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] Add missing use check in SIOptimizeExecMasking pass.
Whenever a v_cmp, s_and_saveexec instruction sequence shall be transformed to an equivalent s_mov, v_cmpx sequence, it needs to be detec
[AMDGPU] Add missing use check in SIOptimizeExecMasking pass.
Whenever a v_cmp, s_and_saveexec instruction sequence shall be transformed to an equivalent s_mov, v_cmpx sequence, it needs to be detected if the v_cmp target register is used between the two instructions as the v_cmp result gets omitted by using the v_cmpx instruction, resulting in invalid code.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D122797
show more ...
|
| #
3bd15c03 |
| 25-Mar-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] Fix adding modifiers when creating v_cmpx instructions.
Revision https://reviews.llvm.org/D122332 added a pattern transformation where v_cmpx instructions are introduced. However, the modif
[AMDGPU] Fix adding modifiers when creating v_cmpx instructions.
Revision https://reviews.llvm.org/D122332 added a pattern transformation where v_cmpx instructions are introduced. However, the modifiers are not correctly inherited from the original operands. The patch adds the source modifiers, if they are exist, or sets them to 0.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D122489
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
718aec20 |
| 01-Feb-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR.
An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison.
This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence.
It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets.
Same as D119696 including a buildbot and MIR test fix.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D122332
show more ...
|
| #
7de6107d |
| 21-Mar-2022 |
Thomas Symalla <[email protected]> |
Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64.
Differential Revision: https://reviews.
Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64.
Differential Revision: https://reviews.llvm.org/D122117
show more ...
|
| #
e725e2af |
| 21-Mar-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] [NFC] Fix missing include.
|
| #
011c6419 |
| 01-Feb-2022 |
Thomas Symalla <[email protected]> |
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR.
An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison.
This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence.
It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets.
Reviewed By: sebastian-ne, critson
Differential Revision: https://reviews.llvm.org/D119696
show more ...
|
| #
d86dcb7e |
| 16-Feb-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Return better Changed status from SIOptimizeExecMasking
Differential Revision: https://reviews.llvm.org/D120024
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
| #
c16f7760 |
| 10-Feb-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Us
[AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D94746
show more ...
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
560d7e04 |
| 20-Jan-2021 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|
|
Revision tags: llvmorg-11.1.0-rc1 |
|
| #
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
fde83517 |
| 10-Nov-2020 |
Carl Ritson <[email protected]> |
[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it cannot be lowered to a COPY.
Reviewed By: arsenm
Differential Revision: https://reviews.
[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it cannot be lowered to a COPY.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D90451
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
0576f436 |
| 10-Sep-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes not emit the or to exec at the beginning of the block, where
AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes not emit the or to exec at the beginning of the block, where it really has to be. If there is an instruction that defines one of the source operands, split the block and turn the si_end_cf into a terminator.
This avoids regressions when regalloc fast is switched to inserting reloads at the beginning of the block, instead of spills at the end of the block.
In a future change, this should always split the block.
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1 |
|
| #
44211f20 |
| 23-Jul-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Optimize copies to exec with other insts after exec def
It's possible to have terminator instructions after a write to exec, so skip over them to find it.
|
| #
b3e63aa8 |
| 23-Jul-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Don't assume there is only one terminator copy
This would stop on the first in reverse order, failing the verifier if there were more earlier in the block.
|
|
Revision tags: llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
| #
9acd9544 |
| 27-Dec-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Use Register
|
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
05da2fe5 |
| 13-Nov-2019 |
Reid Kleckner <[email protected]> |
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of reco
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
show more ...
|
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
| #
4b7fc85c |
| 20-Aug-2019 |
Matt Arsenault <[email protected]> |
Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"
This reverts r367500 and r369203. This is causing various test failures.
llvm-svn: 369417
|
| #
0c476111 |
| 15-Aug-2019 |
Daniel Sanders <[email protected]> |
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Re
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
show more ...
|
|
Revision tags: llvmorg-9.0.0-rc2 |
|
| #
d48324ff |
| 01-Aug-2019 |
Matt Arsenault <[email protected]> |
Reapply "AMDGPU: Split block for si_end_cf"
This reverts commit r359363, reapplying r357634
llvm-svn: 367500
|
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init |
|
| #
e67cc380 |
| 11-Jul-2019 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx908 mfma support
Differential Revision: https://reviews.llvm.org/D64584
llvm-svn: 365824
|
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3 |
|
| #
52500216 |
| 16-Jun-2019 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx10 conditional registers handling
This is cpp source part of wave32 support, excluding overriden getRegClass().
Differential Revision: https://reviews.llvm.org/D63351
llvm-svn: 363513
|