|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
f510045d |
| 14-Jan-2022 |
Jay Foad <[email protected]> |
[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].
Differential Revision: https://reviews.llvm.org/D117298
|
| #
33b45ee4 |
| 27-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Handle addrspacecast of constant 32-bit to flat
I accidentally made this work on the GlobalISel path, and there's no real reason not to handle this.
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
c0581f7d |
| 05-Jan-2022 |
David Salinas <[email protected]> |
Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused performance degradtion in Quicksilver test Q
Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup
Change-Id: Ifc167b3c2dae7a65920676f22a97ba76485f3456
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D116686
Change-Id: I1abf49b74a7e2ba0e0205f747a4154a468b9d7f2
show more ...
|
| #
085f0783 |
| 05-Jan-2022 |
Nico Weber <[email protected]> |
Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`.""
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62. The change contained many unrelated changes and e.g. restored un
Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`.""
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62. The change contained many unrelated changes and e.g. restored unit test failes for the old lld port.
show more ...
|
| #
859ebca7 |
| 23-Dec-2021 |
David Salinas <[email protected]> |
Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused performance degradtion in Quicksilver test QS:sGPU an
Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup
Change-Id: Ibf8e397df94001f248fba609f072088a46abae08
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D115960
Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
729bf9b2 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway.
Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.
show more ...
|
| #
640beb38 |
| 30-Aug-2021 |
Michael Liao <[email protected]> |
[amdgpu] Enable selection of `s_cselect_b64`.
Differential Revision: https://reviews.llvm.org/D109159
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
ed0f4415 |
| 15-Jul-2021 |
alex-t <[email protected]> |
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode dive
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
show more ...
|
| #
4a3b0556 |
| 09-Jul-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Fix flags of V_MOV_B64_PSEUDO
In particular it was not rematerializable.
Differential Revision: https://reviews.llvm.org/D105724
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
96e1fcb1 |
| 07-Jun-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differe
[AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differential Revision: https://reviews.llvm.org/D103322
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
5682ae2f |
| 25-Mar-2021 |
madhur13490 <[email protected]> |
[AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken.
Once su
[AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken.
Once such attributes are set, rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect functions calls are not used.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D99347
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
| #
3c297a25 |
| 10-Feb-2021 |
madhur13490 <[email protected]> |
Make fixed-abi default for AMD HSA OS
fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple.
Reviewed
Make fixed-abi default for AMD HSA OS
fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D96340
show more ...
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
d2e52eec |
| 10-Nov-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.
|
| #
5ab17021 |
| 10-Nov-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
|
| #
3fdf3b15 |
| 14-Oct-2020 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2 |
|
| #
62d8b8a2 |
| 30-Jul-2020 |
Piotr Sobczak <[email protected]> |
Fix 64-bit copy to SCC
Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64.
Before introd
Fix 64-bit copy to SCC
Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64.
Before introducing the S_CSELECT pattern with explicit SCC (0045786f146e78afee49eee053dc29ebc842fee1), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64).
The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered.
The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64).
Differential Revision: https://reviews.llvm.org/D85207
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
0045786f |
| 04-Mar-2020 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Select s_cselect
Summary: Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work.
Subscribers: arsenm
[AMDGPU] Select s_cselect
Summary: Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Re-commit D81925 with a bugfix D82370.
Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370
show more ...
|
| #
778351df |
| 24-Jun-2020 |
Matt Arsenault <[email protected]> |
Revert "[AMDGPU] Enable compare operations to be selected by divergence"
This reverts commit 521ac0b5cea02f629d035f807460affbb65ae7ad.
Reported to break thousands of piglit tests.
|
| #
521ac0b5 |
| 19-Jun-2020 |
alex-t <[email protected]> |
[AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.
Reviewers: rampitec, arsenm
[AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82194
show more ...
|
| #
6d9565d6 |
| 19-Jun-2020 |
Piotr Sobczak <[email protected]> |
Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with expensive checks enabled.
This reverts commit 4067de569f119a81419fbf2e79d5f3307dfdda5b.
|
| #
4067de56 |
| 04-Mar-2020 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Select s_cselect
Summary: Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work.
Subscribers: arsenm
[AMDGPU] Select s_cselect
Summary: Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81925
show more ...
|
| #
bb10fa3a |
| 24-May-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix wrong null value for private address space
I'm guessing this was a holdover from when 0 was an invalid stack pointer, but surprised nobody has discovered this before.
Also don't allow o
AMDGPU: Fix wrong null value for private address space
I'm guessing this was a holdover from when 0 was an invalid stack pointer, but surprised nobody has discovered this before.
Also don't allow offset folding for -1 pointers, since it looks weird to partially fold this.
show more ...
|
| #
074b8026 |
| 13-May-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix DAG divergence for implicit function arguments
This should be directly implied from the register class, and there's no need to special case live ins here. This was getting the wrong answ
AMDGPU: Fix DAG divergence for implicit function arguments
This should be directly implied from the register class, and there's no need to special case live ins here. This was getting the wrong answer for the queue ptr argument in callable functions, since it's not an explicit IR argument and is always uniform.
Fixes not using scalar loads for the aperture in addrspacecast lowering, and any other places that use implicit SGPR arguments.
show more ...
|
| #
fda0c8df |
| 07-May-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Lower addrspacecast to 32-bit constant
Somehow this was missing from the DAG path, but not global isel.
|
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
| #
60b1967c |
| 21-Jan-2020 |
Scott Linder <[email protected]> |
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us t
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
show more ...
|