|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
bb1fe369 |
| 19-Jan-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Make v8i16/v8f16 legal
Differential Revision: https://reviews.llvm.org/D117721
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
| #
98f48723 |
| 24-Jun-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6
[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D104622
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
2291bd13 |
| 30-Nov-2020 |
Austin Kerbow <[email protected]> |
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC. - Any: Preference is unspecified. Use conservative settings that can run anywhere. - Off: Request support for XNACK/SRAMECC Off - On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
show more ...
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
20c43d6b |
| 20-Nov-2020 |
Matt Arsenault <[email protected]> |
OpaquePtr: Bulk update tests to use typed sret
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4 |
|
| #
a343b9b0 |
| 23-Sep-2020 |
Sebastian Neubauer <[email protected]> |
Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6.
According to michel.daenzer, > This completely broke the Mesa radeonsi drive
Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6.
According to michel.daenzer, > This completely broke the Mesa radeonsi driver on Navi 14. Xorg + > xterm come up with major corruption & psychedelic colours.
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc3 |
|
| #
ca907bfb |
| 04-Sep-2020 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished
[AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished. Calls need some time to be executed, so if the callee inserts the waitcnt, filling the instruction buffer and waiting for memory will be interleaved, hiding some latency. This comes at the cost of having a waitcnt inside functions that may not be needed as no memory operations are outstanding.
For function calls, this is already implemented. The same principal applies to returns: If the caller inserts a waitcnt after the call, the callee does not have to wait and the return and memory operation can be run in parallel.
This commit implements waiting in the caller after returning from a function call.
Differential Revision: https://reviews.llvm.org/D87674
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
| #
60b1967c |
| 21-Jan-2020 |
Scott Linder <[email protected]> |
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us t
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
show more ...
|
| #
156a1b59 |
| 24-Feb-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Make signext/zeroext behave more sensibly over > i32
Interpret these as extending to the next multiple of 32-bits. This had no effect with i48 for example, which is really split into {i32, i
AMDGPU: Make signext/zeroext behave more sensibly over > i32
Interpret these as extending to the next multiple of 32-bits. This had no effect with i48 for example, which is really split into {i32, i16}, which should extend the high part.
show more ...
|
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1 |
|
| #
870f49e6 |
| 19-Jul-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Add some function return test cases
llvm-svn: 366591
|
|
Revision tags: llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
| #
71dfb7ec |
| 08-Jul-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Make s34 the FP register
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame r
AMDGPU: Make s34 the FP register
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.
If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort.
This also doesn't attempt to handle SGPR spilling with scalar stores.
llvm-svn: 365372
show more ...
|
|
Revision tags: llvmorg-8.0.1-rc3 |
|
| #
d88db6d7 |
| 20-Jun-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Always use s33 for global scratch wave offset
Every called function could possibly need this to calculate the absolute address of stack objectst, and this avoids inserting a copy around ever
AMDGPU: Always use s33 for global scratch wave offset
Every called function could possibly need this to calculate the absolute address of stack objectst, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR.
llvm-svn: 363990
show more ...
|
|
Revision tags: llvmorg-8.0.1-rc2 |
|
| #
5c714cbd |
| 23-May-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack allocation than is possible:
https://github.com/RadeonOpenCompute/ROCR-Runtim
AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack allocation than is possible:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70
Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made this configurable.
llvm-svn: 361541
show more ...
|
|
Revision tags: llvmorg-8.0.1-rc1 |
|
| #
361b5b21 |
| 21-Mar-2019 |
Tim Renouf <[email protected]> |
[AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled there
[AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58902
Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
show more ...
|
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3 |
|
| #
57b5966d |
| 10-Sep-2018 |
Matt Arsenault <[email protected]> |
DAG: Handle odd vector sizes in calling conv splitting
This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces.
Fixes not splitti
DAG: Handle odd vector sizes in calling conv splitting
This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces.
Fixes not splitting 3i16/v3f16 into two registers for AMDGPU.
This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets.
llvm-svn: 341801
show more ...
|
| #
0da6350d |
| 31-Aug-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove remnants of old address space mapping
llvm-svn: 341165
|
|
Revision tags: llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1 |
|
| #
8f9dde94 |
| 28-Jul-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop wasting argument registers with v3i32/v3f32
SelectionDAGBuilder widens v3i32/v3f32 arguments to to v4i32/v4f32 which consume an additional register. In addition to wasting argument spac
AMDGPU: Stop wasting argument registers with v3i32/v3f32
SelectionDAGBuilder widens v3i32/v3f32 arguments to to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines.
llvm-svn: 338197
show more ...
|
| #
02dc7e19 |
| 15-Jun-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working around them not being legal.
llvm-svn: 334835
|
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3 |
|
| #
f1c868ef |
| 07-Jun-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix not including v2f64 in SReg_128
Fixes assertion with calls returning v2f64.
llvm-svn: 334189
|
| #
9224c00d |
| 05-Jun-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Use more custom insert/extract_vector_elt lowering
Apply to i8 vectors.
llvm-svn: 334044
|
|
Revision tags: llvmorg-6.0.1-rc2 |
|
| #
762d4988 |
| 09-May-2018 |
Matt Arsenault <[email protected]> |
AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector, we can use the original source value.
This helps with some combine orde
AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector, we can use the original source value.
This helps with some combine ordering issues after operations are lowered to integer operations between bitcasts of build_vector. In particular it stops unnecessarily materializing the unused top half of a vector in some cases.
llvm-svn: 331909
show more ...
|
|
Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3 |
|
| #
0124b548 |
| 13-Feb-2018 |
Yaxun Liu <[email protected]> |
[AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170
llvm-svn: 325030
|
|
Revision tags: llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3 |
|
| #
30e4608c |
| 03-Dec-2017 |
Yaxun Liu <[email protected]> |
CodeGen: Fix SelectionDAGISel::LowerArguments for sret addr space
SelectionDAGISel::LowerArguments assumes sret addr space is 0, which is not true for amdgcn---amdgiz target.
This patch fixes that.
CodeGen: Fix SelectionDAGISel::LowerArguments for sret addr space
SelectionDAGISel::LowerArguments assumes sret addr space is 0, which is not true for amdgcn---amdgiz target.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D40255
llvm-svn: 319630
show more ...
|
| #
caf0ed4d |
| 30-Nov-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Allow negative MUBUF vaddr for gfx9
GFX9 does not enable bounds checking for the resource descriptors used for private access, so it should be OK to use vaddr with a potentially negative val
AMDGPU: Allow negative MUBUF vaddr for gfx9
GFX9 does not enable bounds checking for the resource descriptors used for private access, so it should be OK to use vaddr with a potentially negative value.
llvm-svn: 319393
show more ...
|
|
Revision tags: llvmorg-5.0.1-rc2 |
|
| #
a0342dc9 |
| 20-Nov-2017 |
Dmitry Preobrazhensky <[email protected]> |
[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765
Reviewers: tamazov, SamWot, arsenm, vpykhtin
Di
[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765
Reviewers: tamazov, SamWot, arsenm, vpykhtin
Differential Revision: https://reviews.llvm.org/D40088
llvm-svn: 318675
show more ...
|
| #
45b98189 |
| 15-Nov-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Don't use MUBUF vaddr if address may overflow
Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive.
llvm-svn: 318240
|