|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
96e1fcb1 |
| 07-Jun-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differe
[AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differential Revision: https://reviews.llvm.org/D103322
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5 |
|
| #
04419628 |
| 04-Apr-2021 |
Thomas Preud'homme <[email protected]> |
[AMDGPU, test] Fix use of undef FileCheck var
Test CodeGen/AMDGPU/amdgpu.private-memory.ll and CodeGen/AMDGPU/private-memory-r600.ll have a block of CHECK directives whose prefix is inconsistent: R6
[AMDGPU, test] Fix use of undef FileCheck var
Test CodeGen/AMDGPU/amdgpu.private-memory.ll and CodeGen/AMDGPU/private-memory-r600.ll have a block of CHECK directives whose prefix is inconsistent: R600-CHECK Vs R600. This leads to a R600-NOT directive using an undefined CHAN variable due to R600-CHECK directives never being considered by FileCheck. Fixing the prefix leads to the testcase failing. As per https://reviews.llvm.org/D99865#2675235 this commit removes the directives instead since it is not possible to write a reliable check.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D99865
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
830ed64c |
| 11-Nov-2020 |
Jay Foad <[email protected]> |
Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103c8d8e624b19fad5a5006e7a783ecb7.
The underlying problems were fixed by D90607.
|
| #
3fdf3b15 |
| 14-Oct-2020 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4 |
|
| #
16778b19 |
| 25-Sep-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Make bfe patterns divergence-aware
This tends to increase code size but more importantly it reduces vgpr usage, and could avoid costly readfirstlanes if the result needs to be in an sgpr.
[AMDGPU] Make bfe patterns divergence-aware
This tends to increase code size but more importantly it reduces vgpr usage, and could avoid costly readfirstlanes if the result needs to be in an sgpr.
Differential Revision: https://reviews.llvm.org/D88580
show more ...
|
| #
8b08fa01 |
| 29-Sep-2020 |
Mirko Brkusanin <[email protected]> |
Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"
This reverts commit f5cd7ec9f3fc969ff5e1feed961996844333de3b.
Certain rocPRIM/rocThrust/hipCUB tests were failing because of
Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"
This reverts commit f5cd7ec9f3fc969ff5e1feed961996844333de3b.
Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc3 |
|
| #
f5cd7ec9 |
| 21-Aug-2020 |
Mirko Brkusanin <[email protected]> |
[AMDGPU] Reorganize GCN subtarget features for unaligned access
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessM
[AMDGPU] Reorganize GCN subtarget features for unaligned access
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both.
Differential Revision: https://reviews.llvm.org/D84522
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
| #
60b1967c |
| 21-Jan-2020 |
Scott Linder <[email protected]> |
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us t
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
show more ...
|
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
| #
4b472139 |
| 27-Aug-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Switch backend default max workgroup size to 1024
Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires langu
AMDGPU: Switch backend default max workgroup size to 1024
Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum.
I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
show more ...
|
|
Revision tags: llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
| #
a25e0524 |
| 15-Nov-2018 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Enable code object v3 for AMDHSA only
Differential Revision: https://reviews.llvm.org/D54186
llvm-svn: 346923
|
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
| #
2d22d24a |
| 30-Oct-2018 |
Konstantin Zhuravlyov <[email protected]> |
Revert r345542: AMDGPU: Enable code object v3 by default
It breaks mesa.
llvm-svn: 345662
|
| #
5cb95020 |
| 29-Oct-2018 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Enable code object v3 by default
Differential Revision: https://reviews.llvm.org/D53525
llvm-svn: 345542
|
|
Revision tags: llvmorg-7.0.0 |
|
| #
db7ee766 |
| 11-Sep-2018 |
Alexander Timofeev <[email protected]> |
[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec
llvm-svn: 341
[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec
llvm-svn: 341928
show more ...
|
|
Revision tags: llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3 |
|
| #
9224c00d |
| 05-Jun-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Use more custom insert/extract_vector_elt lowering
Apply to i8 vectors.
llvm-svn: 334044
|
|
Revision tags: llvmorg-6.0.1-rc2 |
|
| #
869cbedc |
| 08-May-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix broken dynamic vector indexing for packed types
The intention of this was to multiply by 16, not shift by 16.
llvm-svn: 331793
|
|
Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3 |
|
| #
ba92059c |
| 16-Feb-2018 |
Changpeng Fang <[email protected]> |
AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements
Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an o
AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements
Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D33559
llvm-svn: 325372
show more ...
|
| #
0124b548 |
| 13-Feb-2018 |
Yaxun Liu <[email protected]> |
[AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170
llvm-svn: 325030
|
|
Revision tags: llvmorg-6.0.0-rc2 |
|
| #
2a22c5de |
| 02-Feb-2018 |
Yaxun Liu <[email protected]> |
[AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.
Differential Revision: https://reviews.llvm.org/D40955
llvm-svn: 324101
|
|
Revision tags: llvmorg-6.0.0-rc1 |
|
| #
e4f067eb |
| 19-Dec-2017 |
Mark Searles <[email protected]> |
[AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for
[AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed.
Differential Revision: https://reviews.llvm.org/D41377
llvm-svn: 321100
show more ...
|
|
Revision tags: llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2 |
|
| #
db77e57e |
| 27-Nov-2017 |
Nirav Dave <[email protected]> |
[DAG] Do MergeConsecutiveStores again before Instruction Selection
Summary:
Now that store-merge is only generates type-safe stores, do a second pass just before instruction selection to allow lowe
[DAG] Do MergeConsecutiveStores again before Instruction Selection
Summary:
Now that store-merge is only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well.
Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33675
llvm-svn: 319036
show more ...
|
| #
a0342dc9 |
| 20-Nov-2017 |
Dmitry Preobrazhensky <[email protected]> |
[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765
Reviewers: tamazov, SamWot, arsenm, vpykhtin
Di
[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765
Reviewers: tamazov, SamWot, arsenm, vpykhtin
Differential Revision: https://reviews.llvm.org/D40088
llvm-svn: 318675
show more ...
|
| #
45b98189 |
| 15-Nov-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Don't use MUBUF vaddr if address may overflow
Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive.
llvm-svn: 318240
|
|
Revision tags: llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1 |
|
| #
c90347d7 |
| 12-Apr-2017 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Generate range metadata for workitem id
If workgroup size is known inform llvm about range returned by local id and local size queries.
Differential Revision: https://reviews.llvm.org/D31
[AMDGPU] Generate range metadata for workitem id
If workgroup size is known inform llvm about range returned by local id and local size queries.
Differential Revision: https://reviews.llvm.org/D31804
llvm-svn: 300102
show more ...
|
| #
3dbeefa9 |
| 21-Mar-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default ca
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
llvm-svn: 298444
show more ...
|
| #
8e45acfc |
| 17-Mar-2017 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add address space based alias analysis pass
This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed.
Differential Revision: https://reviews.llvm.org/D31103
llvm
[AMDGPU] Add address space based alias analysis pass
This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed.
Differential Revision: https://reviews.llvm.org/D31103
llvm-svn: 298172
show more ...
|