|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
082e22f3 |
| 24-Sep-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch
With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in memory violation.
This is not needed on GFX10 and later with architected flat scratch, though, because there the special SGPRs do not carve space out of the normal SGPRs.
Differential Revision: https://reviews.llvm.org/D110376
|
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
f4ace637 |
| 24-Mar-2021 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Add target id and code object v4 support
- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id)
- Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object)
- Add kernarg_size to kernel descriptor
- Change trap handler ABI to no longer move queue pointer into s[0:1]
- Cleanup ELF definitions
- Add V2, V3, V4 suffixes to make a clear distinction for code object version
- Consolidate note names
Differential Revision: https://reviews.llvm.org/D95638
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
2291bd13 |
| 30-Nov-2020 |
Austin Kerbow <[email protected]> |
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
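The selection rule described in this commit message can be sketched roughly as follows. This is an illustration only, not the GCNSubtarget code: the names `TargetIDSetting` and `resolve_setting` are hypothetical, and it assumes a comma-separated feature string in which a later `+feature`/`-feature` entry overrides an earlier one.

```python
from enum import Enum


class TargetIDSetting(Enum):
    # The four settings listed in the commit message.
    UNSUPPORTED = "Unsupported"
    ANY = "Any"
    OFF = "Off"
    ON = "On"


def resolve_setting(hardware_supports: bool, feature_string: str, name: str) -> TargetIDSetting:
    """Pick the setting for one dynamic feature (e.g. "xnack" or "sramecc").

    hardware_supports and the last-entry-wins rule are assumptions made
    for this sketch; they are not taken from the LLVM sources.
    """
    if not hardware_supports:
        # The GPU cannot do XNACK/SRAMECC at all.
        return TargetIDSetting.UNSUPPORTED
    # Nothing requested: stay conservative so the code can run anywhere.
    setting = TargetIDSetting.ANY
    for feature in (f.strip() for f in feature_string.split(",")):
        if feature == f"+{name}":
            setting = TargetIDSetting.ON
        elif feature == f"-{name}":
            setting = TargetIDSetting.OFF
    return setting
```

With an empty feature string on supporting hardware the sketch yields `ANY`, which corresponds to the conservative default the message describes.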
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
3fdf3b15 |
| 14-Oct-2020 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
| #
a25e0524 |
| 15-Nov-2018 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Enable code object v3 for AMDHSA only
Differential Revision: https://reviews.llvm.org/D54186
llvm-svn: 346923
|
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
| #
2d22d24a |
| 30-Oct-2018 |
Konstantin Zhuravlyov <[email protected]> |
Revert r345542: AMDGPU: Enable code object v3 by default
It breaks mesa.
llvm-svn: 345662
|
| #
5cb95020 |
| 29-Oct-2018 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Enable code object v3 by default
Differential Revision: https://reviews.llvm.org/D53525
llvm-svn: 345542
|
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3 |
|
| #
3c7581bb |
| 08-Jun-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Use correct register names in inline assembly
Fixes using physical registers in inline asm from clang.
llvm-svn: 305004
|
|
Revision tags: llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1 |
|
| #
a3566f21 |
| 17-Apr-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Use MachineRegisterInfo to find max used register
Avoid looping through program to determine register counts. This avoids needing to look at regmask operands.
Also fixes some counting errors with flat_scr when there are no stack objects.
llvm-svn: 300482
|
| #
3dbeefa9 |
| 21-Mar-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
llvm-svn: 298444
|
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2 |
|
| #
7aad8fd8 |
| 24-Jan-2017 |
Matt Arsenault <[email protected]> |
Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <[email protected]>
llvm-svn: 292982
|
|
Revision tags: llvmorg-4.0.0-rc1 |
|
| #
0f55fbae |
| 09-Dec-2016 |
Marek Olsak <[email protected]> |
AMDGPU/SI: Don't reserve XNACK when it's disabled
Summary: This frees 2 additional scalar registers.
These are results from all of my 3 patches combined:
Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %)
Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %); Spilled VGPRs: 100 -> 84 (-16.00 %)
Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27151
llvm-svn: 289262
|
| #
693e9be9 |
| 09-Dec-2016 |
Marek Olsak <[email protected]> |
AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects
Summary: This frees 2 scalar registers.
Reviewers: tstellarAMD
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27150
llvm-svn: 289261
|
|
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1 |
|
| #
5e832e86 |
| 06-Sep-2016 |
Wei Ding <[email protected]> |
AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276
llvm-svn: 280742
|
|
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1 |
|
| #
07f65718 |
| 26-Jul-2016 |
Matt Arsenault <[email protected]> |
AMDGPU: Add missing tests for xnack option for HSA
llvm-svn: 276765
|
|
Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1, llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1 |
|
| #
3c05d6d3 |
| 07-Jan-2016 |
Nicolai Haehnle <[email protected]> |
AMDGPU/SI: xnack_mask is always reserved on VI
Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used.
I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]).
This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this.
It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.)
Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15898
llvm-svn: 257073
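The aliasing described in this commit message can be reproduced with a little arithmetic. The sketch below assumes, based only on the two examples in the message, that on VI the special registers occupy the top three SGPR pairs of the allocation (vcc highest, then xnack_mask, then flat_scratch); the function name `special_sgpr_aliases` is hypothetical.

```python
def special_sgpr_aliases(total_sgprs: int) -> dict:
    """Map each special register to the (low, high) SGPR pair it aliases.

    Assumes the layout implied by the commit message: the special
    registers sit at the top of the allocated SGPR range, in the
    order flat_scratch < xnack_mask < vcc.
    """
    return {
        "vcc": (total_sgprs - 2, total_sgprs - 1),
        "xnack_mask": (total_sgprs - 4, total_sgprs - 3),
        "flat_scratch": (total_sgprs - 6, total_sgprs - 5),
    }


# Tonga case from the message: the SGPR count is fixed to 80.
tonga = special_sgpr_aliases(80)

# Older case from the message: a total of 32 SGPRs.
small = special_sgpr_aliases(32)
```

For 80 SGPRs this gives s[74:75] for flat_scratch, s[76:77] for xnack_mask and s[78:79] for vcc, matching the Tonga numbers in the message. For 32 SGPRs, xnack_mask lands on s[28:29], which is the accidental overlap the last paragraph mentions.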
|
| #
5b504976 |
| 04-Jan-2016 |
Nicolai Haehnle <[email protected]> |
AMDGPU: add +xnack feature
Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically.
The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test).
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15869
llvm-svn: 256794
|
| #
caaa3aa0 |
| 17-Dec-2015 |
Tom Stellard <[email protected]> |
AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15583
Patch by: Changpeng Fang
llvm-svn: 255908
|