|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
2dfe4194 |
| 02-Jun-2022 |
Julien Pages <[email protected]> |
[AMDGPU] Improve codegen of extractelement/insertelement in some cases
This patch improves the codegen of extractelement and insertelement for vectors containing 8 elements. Previously, a DAG combine transformation generated a sequence of 8 select/cmp operations. This patch changes the upper limit for this transformation so that the movrel instruction is eventually used instead. Extractelement/insertelement for vectors containing fewer than 8 elements are unchanged.
Differential Revision: https://reviews.llvm.org/D126389
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
f510045d |
| 14-Jan-2022 |
Jay Foad <[email protected]> |
[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].
Differential Revision: https://reviews.llvm.org/D117298
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
c0581f7d |
| 05-Jan-2022 |
David Salinas <[email protected]> |
Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused a performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup.
Change-Id: Ifc167b3c2dae7a65920676f22a97ba76485f3456
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D116686
Change-Id: I1abf49b74a7e2ba0e0205f747a4154a468b9d7f2
|
| #
085f0783 |
| 05-Jan-2022 |
Nico Weber <[email protected]> |
Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`.""
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62. The change contained many unrelated changes and, for example, restored unit test files for the old lld port.
|
| #
859ebca7 |
| 23-Dec-2021 |
David Salinas <[email protected]> |
Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused a performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup.
Change-Id: Ibf8e397df94001f248fba609f072088a46abae08
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D115960
Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
fce5a567 |
| 02-Nov-2021 |
Jay Foad <[email protected]> |
[AMDGPU] More robust checks in extract_vector_dynelt.ll
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
640beb38 |
| 30-Aug-2021 |
Michael Liao <[email protected]> |
[amdgpu] Enable selection of `s_cselect_b64`.
Differential Revision: https://reviews.llvm.org/D109159
|
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
ed0f4415 |
| 15-Jul-2021 |
alex-t <[email protected]> |
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form depending on the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
0045786f |
| 04-Mar-2020 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Select s_cselect
Summary: Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Re-commit D81925 with a bugfix D82370.
Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
|
| #
778351df |
| 24-Jun-2020 |
Matt Arsenault <[email protected]> |
Revert "[AMDGPU] Enable compare operations to be selected by divergence"
This reverts commit 521ac0b5cea02f629d035f807460affbb65ae7ad.
Reported to break thousands of piglit tests.
|
| #
521ac0b5 |
| 19-Jun-2020 |
alex-t <[email protected]> |
[AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82194
|
| #
6d9565d6 |
| 19-Jun-2020 |
Piotr Sobczak <[email protected]> |
Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with expensive checks enabled.
This reverts commit 4067de569f119a81419fbf2e79d5f3307dfdda5b.
|
| #
4067de56 |
| 04-Mar-2020 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Select s_cselect
Summary: Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81925
|
| #
1dfd1b3e |
| 20-May-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Tune threshold for cmp/select vector lowering
The threshold was expressed as total vector size, while the idea was to limit the number of instructions. Now that the lowering also works with doubles, the thresholds need to be updated.
Differential Revision: https://reviews.llvm.org/D80322
|
| #
4eecf171 |
| 15-May-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Always expand ext/insertelement with divergent idx
Even though a series of cmp/cndmask can produce quite a lot of code, that is still better than a loop. In the case of doubles we would even produce two loops.
Differential Revision: https://reviews.llvm.org/D80032
|
| #
9d4cf5bd |
| 14-May-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Make v16f64/v16i64 legal
This allows indirect VGPR addressing to work.
Differential Revision: https://reviews.llvm.org/D79960
|
| #
591b029f |
| 13-May-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Optimized indirect multi-VGPR addressing
SelectMOVRELOffset prevents peeling of a constant from an index if the final base could be negative. isBaseWithConstantOffset() succeeds if a value is an "add" or "or" operator. In the case of "or" it must be an add-like "or", which never changes the sign of the sum given a non-negative offset. That is, we can safely allow peeling if the operator is an "or".
Differential Revision: https://reviews.llvm.org/D79898
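The add-like "or" argument in this commit message can be checked with a small sketch. This is illustrative Python, not LLVM code: the helper names `is_add_like_or`, `sign_bit`, and `or_preserves_sum_and_sign` are hypothetical, and 32-bit two's-complement indices are an assumption. It shows that when the base and a non-negative offset share no set bits, `base | offset` equals `base + offset` and has the same sign bit as the base.

```python
MASK32 = 0xFFFFFFFF   # assumed index width: 32 bits
SIGN32 = 0x80000000   # sign bit of a 32-bit two's-complement value


def is_add_like_or(base: int, offset: int) -> bool:
    """True when base and offset share no set bits, so `or` acts like `add`."""
    return (base & offset) == 0


def sign_bit(value: int) -> int:
    """Return the 32-bit sign bit (0 or 1) of value."""
    return (value & MASK32) >> 31


def or_preserves_sum_and_sign(base: int, offset: int) -> bool:
    """Check the two properties the commit message relies on."""
    base &= MASK32
    if not (0 <= offset < SIGN32) or not is_add_like_or(base, offset):
        raise ValueError("offset must be non-negative and disjoint from base")
    ored = base | offset
    added = (base + offset) & MASK32
    # No carries occur with disjoint bits, so the results match, and the
    # offset cannot touch the sign bit because its own sign bit is clear.
    return ored == added and sign_bit(ored) == sign_bit(base)
```

Under these assumptions, a non-negative index of the form `base | offset` implies a non-negative base, which is the condition the peeling logic cares about.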
|
| #
71ed66d9 |
| 12-May-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Make v4i64/v4f64/v8i64/v8f64 legal
We can produce such vectors in the Promote Alloca pass, but we are unable to use movrel to operate on them and lower them via scratch instead. Making them legal makes SI_INDIRECT patterns work.
There is more work to do in subsequent changes:
1. We initialize m0 twice to access each dword. It should be possible to do it only once and increment the base register number instead.
2. We also need v16i64/v16f64, but these first need to be added to tablegen.
Differential Revision: https://reviews.llvm.org/D79808
|
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
31479d86 |
| 11-Nov-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Change boolean content type to 0 or 1
The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 is to some degree the path of least resistance, since that is what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.
|
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
| #
bcb34ac2 |
| 13-Nov-2018 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] combine extractelement into several selects
An extractelement with a non-constant index will be lowered either to scratch or to a movrel loop in most cases. This patch converts such an instruction into a set of selects if the vector size is not too big.
Differential Revision: https://reviews.llvm.org/D54351
llvm-svn: 346800
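The idea of replacing a dynamically indexed extractelement with a chain of selects can be sketched as follows. This is illustrative Python rather than the actual DAG combine: the function name `extract_via_selects` is hypothetical, and returning the first element for an out-of-range index is an assumption of this sketch only.

```python
def extract_via_selects(vec, idx):
    """Pick vec[idx] using one compare and one select per element.

    No indexed memory access or loop over lanes is needed; every element
    is visited once and kept only if its position matches idx.
    """
    result = vec[0]
    for i in range(1, len(vec)):
        # Equivalent to: result = select(idx == i, vec[i], result)
        result = vec[i] if idx == i else result
    return result
```

The cost grows linearly with the number of elements, which is why the commit applies the conversion only when the vector is not too big.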
|
| #
35de877e |
| 13-Nov-2018 |
Stanislav Mekhanoshin <[email protected]> |
Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling
Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed.
Differential Revision: https://reviews.llvm.org/D54440
llvm-svn: 346795
|