|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
bfcfd53b |
| 13-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were a
[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane.
Also use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D127662
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
dc6e8dfd |
| 20-Sep-2021 |
Jacob Lambert <[email protected]> |
[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.
|
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
9d08f276 |
| 19-Mar-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Use reductions instead of scans in the atomic optimizer
If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a sca
[AMDGPU] Use reductions instead of scans in the atomic optimizer
If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a scan. Do this for GFX10, where the permlanex16 instruction makes it viable. For wave64 this saves a couple of dpp operations. For wave32 it saves one readlane (which are generally bad for performance) and one dpp operation.
Differential Revision: https://reviews.llvm.org/D98953
show more ...
|
| #
5a5a5312 |
| 19-Mar-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Remove some redundant code. NFC.
This is redundant because we have already checked that we can't handle divergent 64-bit atomic operands.
|
| #
5dd5ddcb |
| 18-Mar-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Skip building some IR if it won't be used. NFC.
|
| #
c96dfe0d |
| 18-Mar-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC.
|
|
Revision tags: llvmorg-12.0.0-rc3 |
|
| #
c3ce7bae |
| 02-Mar-2021 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency as the "strict" prefix
[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strictwqm in the future.
The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions.
The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96257
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
560d7e04 |
| 20-Jan-2021 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|
|
Revision tags: llvmorg-11.1.0-rc1 |
|
| #
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5 |
|
| #
0249df33 |
| 29-Sep-2020 |
Mirko Brkusanin <[email protected]> |
[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic instructions in order to avoid making unnecessary instructions when
[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic instructions in order to avoid making unnecessary instructions when -amdgpu-atomic-optimizer is present.
Differential Revision: https://reviews.llvm.org/D88315
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
| #
aad93654 |
| 30-May-2020 |
Christopher Tetreault <[email protected]> |
[SVE] Eliminate calls to default-false VectorType::get() from AMDGPU
Reviewers: efriedma, david-arm, fpetrogalli, arsenm
Reviewed By: david-arm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehn
[SVE] Eliminate calls to default-false VectorType::get() from AMDGPU
Reviewers: efriedma, david-arm, fpetrogalli, arsenm
Reviewed By: david-arm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, tschuett, hiraditya, rkruppe, psnobl, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80328
show more ...
|
|
Revision tags: llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
5d3a69fe |
| 05-Mar-2020 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its
[AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes.
This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period.
Use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D65088
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
05da2fe5 |
| 13-Nov-2019 |
Reid Kleckner <[email protected]> |
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of reco
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
show more ...
|
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
| #
eac23862 |
| 23-Aug-2019 |
Jay Foad <[email protected]> |
[AMDGPU] gfx10 atomic optimizer changes.
Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, crit
[AMDGPU] gfx10 atomic optimizer changes.
Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65644
llvm-svn: 369745
show more ...
|
|
Revision tags: llvmorg-9.0.0-rc2 |
|
| #
dcb75324 |
| 29-Jul-2019 |
Jay Foad <[email protected]> |
[DivergenceAnalysis] Add methods for querying divergence at use
Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is unif
[DivergenceAnalysis] Add methods for querying divergence at use
Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is uniform at its definition, a use of it in another basic block can be divergent because of divergent control flow between the def and the use.
This patch adds new isDivergent(Use) methods to DivergenceAnalysis, LegacyDivergenceAnalysis and GPUDivergenceAnalysis.
This might allow D63953 or other similar workarounds to be removed.
Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue
Reviewed By: nhaehnle
Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65141
llvm-svn: 367218
show more ...
|
|
Revision tags: llvmorg-9.0.0-rc1 |
|
| #
298500ae |
| 22-Jul-2019 |
Jay Foad <[email protected]> |
[AMDGPU] Save some work when an atomic op has no uses
Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an
[AMDGPU] Save some work when an atomic op has no uses
Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC.
Reviewers: arsenm, dstuttard, tpr
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64981
llvm-svn: 366667
show more ...
|
| #
7d06ffff |
| 19-Jul-2019 |
Jay Foad <[email protected]> |
[AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3
[AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because:
1. It simplifies the compiler a little. 2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c. 3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction.
Because of #2 and #3 the end result is improved from this:
v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
To this:
v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
I.e. two fewer computational instructions, one extra nop where we could schedule something else.
Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64411
llvm-svn: 366543
show more ...
|
|
Revision tags: llvmorg-10-init |
|
| #
70235c64 |
| 17-Jul-2019 |
Jay Foad <[email protected]> |
[AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.
Reviewers: arsenm, sheredom
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t
[AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.
Reviewers: arsenm, sheredom
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64809
llvm-svn: 366323
show more ...
|
| #
17060f0a |
| 16-Jul-2019 |
Jay Foad <[email protected]> |
[AMDGPU] Optimize atomic max/min
Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract.
Reviewers: arsenm, sheredom, critson, rampit
[AMDGPU] Optimize atomic max/min
Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64328
llvm-svn: 366235
show more ...
|
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
| #
38902350 |
| 08-Jul-2019 |
Jay Foad <[email protected]> |
[AMDGPU] Use a named predicate instead of a magic number.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, ll
[AMDGPU] Use a named predicate instead of a magic number.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64201
llvm-svn: 365294
show more ...
|
|
Revision tags: llvmorg-8.0.1-rc3 |
|
| #
68a2fef9 |
| 13-Jun-2019 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301
llvm-svn: 363339
|
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0 |
|
| #
3a31b3f6 |
| 14-Mar-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Don't add unnecessary convergent attributes
These are redundant with the intrinsic declaration.
llvm-svn: 356143
|
|
Revision tags: llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4 |
|
| #
9e3f7d8a |
| 05-Mar-2019 |
Carl Ritson <[email protected]> |
[AMDGPU] Fix DPP operand order in atomic optimizer
Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.
Change-Id: I631d050e1c00a3b4bc7c11a9
[AMDGPU] Fix DPP operand order in atomic optimizer
Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.
Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30
Reviewers: sheredom, tpr
Reviewed By: sheredom
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58900
llvm-svn: 355394
show more ...
|
|
Revision tags: llvmorg-8.0.0-rc3 |
|
| #
582c1601 |
| 11-Feb-2019 |
Benjamin Kramer <[email protected]> |
[AMDGPU] Remove unused variable
llvm-svn: 353704
|
| #
8c10fa1a |
| 11-Feb-2019 |
Neil Henning <[email protected]> |
[AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug
[AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug by using a ballot instead.
Differential Revision: https://reviews.llvm.org/D57737
llvm-svn: 353703
show more ...
|