|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
d1af09ad |
| 23-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
show more ...
|
| #
0f94d2b3 |
| 30-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader
[AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs.
Differential Revision: https://reviews.llvm.org/D128442
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
cfb7ffde |
| 21-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] New AMDGPUInsertDelayAlu pass
Differential Revision: https://reviews.llvm.org/D128270
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
7b9f620e |
| 06-Apr-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Work around GFX11 flat scratch SVS swizzling bug
Differential Revision: https://reviews.llvm.org/D127635
|
| #
438c312c |
| 13-Jun-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Add GFX11 flat scratch test coverage
|
| #
e2926501 |
| 16-May-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Aggressively fold immediates in SIShrinkInstructions
Fold immediates regardless of how many uses they have. This is expected to increase overall code size, but decrease register usage.
Dif
[AMDGPU] Aggressively fold immediates in SIShrinkInstructions
Fold immediates regardless of how many uses they have. This is expected to increase overall code size, but decrease register usage.
Differential Revision: https://reviews.llvm.org/D114644
show more ...
|
| #
3eb2281b |
| 16-May-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Aggressively fold immediates in SIFoldOperands
Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by ad
[AMDGPU] Aggressively fold immediates in SIFoldOperands
Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions.
This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size.
Differential Revision: https://reviews.llvm.org/D114643
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
36fe3f13 |
| 08-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.
Differential Revision: https://reviews.llvm.org/D121254
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
a5d4f82b |
| 11-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and
[AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and this doesn't work well with command line options.
Differential Revision: https://reviews.llvm.org/D119425
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
89c447e4 |
| 16-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal
This was inheriting the mesa behavior, and as far as I know nobody is using opencl kernels with amdpal. The isMesaKernel check was
AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal
This was inheriting the mesa behavior, and as far as I know nobody is using opencl kernels with amdpal. The isMesaKernel check was irrelevant because this property needs to be held for all functions.
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
8a52bd82 |
| 19-Nov-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Only select VOP3 forms of VOP2 instructions
Change VOP_PAT_GEN to default to not generating an instruction selection pattern for the VOP2 (e32) form of an instruction, only for the VOP3 (e6
[AMDGPU] Only select VOP3 forms of VOP2 instructions
Change VOP_PAT_GEN to default to not generating an instruction selection pattern for the VOP2 (e32) form of an instruction, only for the VOP3 (e64) form. This allows SIFoldOperands maximum freedom to fold copies into the operands of an instruction, before SIShrinkInstructions tries to shrink it back to the smaller encoding.
This affects the following VOP2 instructions: v_min_i32 v_max_i32 v_min_u32 v_max_u32 v_and_b32 v_or_b32 v_xor_b32 v_lshr_b32 v_ashr_i32 v_lshl_b32
A further cleanup could simplify or remove VOP_PAT_GEN, since its optional second argument is never used.
Differential Revision: https://reviews.llvm.org/D114252
show more ...
|
| #
273a0c8b |
| 04-Nov-2021 |
Matt Arsenault <[email protected]> |
PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the both stack is indexed in the same direction as stack growth (up). We therefore always need the em
PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the both stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt-in rather than opt-out only.
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
3ce1b963 |
| 08-Sep-2021 |
Joe Nash <[email protected]> |
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D1095
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
f3fe44fa |
| 19-Aug-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Fix too many constants with flat scratch
Prevent SIFoldOperands from creating SALU instructions with a constant and a frame index. Previously, only one operand was checked to be a frame ind
[AMDGPU] Fix too many constants with flat scratch
Prevent SIFoldOperands from creating SALU instructions with a constant and a frame index. Previously, only one operand was checked to be a frame index, leading to too many constants when flat scratch is enabled and stack offsets are large.
Differential Revision: https://reviews.llvm.org/D108368
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
96e1fcb1 |
| 07-Jun-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differe
[AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.
Differential Revision: https://reviews.llvm.org/D103322
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc1 |
|
| #
13c03162 |
| 03-May-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with aligned negative offsets, but not with unaligned negative offsets.
This is slightly more
[AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with aligned negative offsets, but not with unaligned negative offsets.
This is slightly more conservative than needed, gfx9 does support negative offsets when a VGPR address is used and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch.
Differential Revision: https://reviews.llvm.org/D101292
show more ...
|
| #
f5199d7a |
| 09-Apr-2021 |
Austin Kerbow <[email protected]> |
[AMDGPU] Revise handling of preexisting waitcnt
Preexisting waitcnt may not update the scoreboard if the instruction being examined needed to wait on fewer counters than what was encoded in the old
[AMDGPU] Revise handling of preexisting waitcnt
Preexisting waitcnt may not update the scoreboard if the instruction being examined needed to wait on fewer counters than what was encoded in the old waitcnt instruction. Fixing this results in the elimination of some redudnat waitcnt.
These changes also enable combining consecutive waitcnt into a single S_WAITCNT or S_WAITCNT_VSCNT instruction.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D100281
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
2291bd13 |
| 30-Nov-2020 |
Austin Kerbow <[email protected]> |
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC. - Any: Preference is unspecified. Use conservative settings that can run anywhere. - Off: Request support for XNACK/SRAMECC Off - On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
show more ...
|
| #
ca904b81 |
| 22-Jan-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Fix FP materialization/resolve with flat scratch
Differential Revision: https://reviews.llvm.org/D95266
|
| #
2f499b9a |
| 19-Dec-2020 |
Tony <[email protected]> |
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant cache
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches upto the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
show more ...
|
| #
cdfd4c5c |
| 05-Jan-2021 |
Mircea Trofin <[email protected]> |
[NFC] Removed unused prefixes in test/CodeGen/AMDGPU
More patches to follow. This covers the pertinent tests starting with e, f, and g.
Differential Revision: https://reviews.llvm.org/D94124
|
| #
d28624a2 |
| 01-Dec-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.
Differential Revision: https://reviews.llvm.org/D92400
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
7a18bdb3 |
| 15-Oct-2020 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Implement flat scratch init for pal
Extract the scratch offset from the scratch buffer descriptor that is stored in the global table.
Differential Revision: https://reviews.llvm.org/D91701
|
| #
cf6565f6 |
| 12-Nov-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
|
| #
d5a46586 |
| 06-Nov-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
|