|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
| #
8a12f20e |
| 13-Jul-2022 |
jeff <[email protected]> |
[AMDGPU] Update the mechanism used to check for cycles and add eges in power-sched mutation
|
| #
6817031d |
| 06-Jul-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Disable FillMFMAShadowMutation by default
Disable amdgpu mfma power sched.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D129172
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
f1255186 |
| 18-Jun-2022 |
Guillaume Chatelet <[email protected]> |
[NFC][Alignment] Remove max functions between Align and MaybeAlign
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code
[NFC][Alignment] Remove max functions between Align and MaybeAlign
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code more opaque.
Differential Revision: https://reviews.llvm.org/D128121
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
3732cd59 |
| 16-May-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 vop3 and inherited vop instructions
This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an
[AMDGPU] gfx11 vop3 and inherited vop instructions
This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using the AssemblerPredicate = isGFX10Plus are also enabled. That predicate will be changed to isGFX10Only in a later patch.
Patch 15/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D126468
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D126475
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
8a53b25e |
| 12-Apr-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Use default member initializers in Subtarget classes
Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtar
[AMDGPU] Use default member initializers in Subtarget classes
Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtarget.h but forgetting to initialize it to false in AMDGPUSubtarget.cpp.
This was mostly autogenerated by: clang-tidy -checks=-*,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/*Subtarget.cpp
Differential Revision: https://reviews.llvm.org/D123613
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
6733590d |
| 07-Apr-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5
Summary: If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise set it to 256 bytes for
AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5
Summary: If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise set it to 256 bytes for code object version 5 (and beyond).
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D123262
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
0c0636f7 |
| 07-Mar-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Fix uninitialized value after 8d0c34fd4f
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
35ec58d8 |
| 01-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx940 removes all image instructions
Differential Revision: https://reviews.llvm.org/D120763
|
| #
2e2e64df |
| 28-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add gfx940 target
This is target definition only.
Differential Revision: https://reviews.llvm.org/D120688
|
| #
a5d4f82b |
| 11-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and
[AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and this doesn't work well with command line options.
Differential Revision: https://reviews.llvm.org/D119425
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
4e077c0a |
| 25-Jan-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Remove feature register-banking
Since RegBankReassign pass was removed this feature is not use for anything.
Differential Revision: https://reviews.llvm.org/D118195
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
0b1140e8 |
| 14-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch
This was approximating the entry point logic for flat_scratch_init, which is not really the point. We need to account for whether we need to r
AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch
This was approximating the entry point logic for flat_scratch_init, which is not really the point. We need to account for whether we need to reserve the SGPR pair used for flat_scratch, not whether we needed the initialization kernel argument. If this was an arbitrary function, we would end up over-reporting the number of potentially free SGPRs. The logic for architected flat scratch also only applies to the initialization in the kernel, not the reserved registers at the end.
Avoids compile failures in a future patch from allocating more SGPRs than the subtarget supports.
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
4db74227 |
| 14-Dec-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes
Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have t
[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes
Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the same zeroing behaviour as on GFX8.
Differential Revision: https://reviews.llvm.org/D115729
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
2959e082 |
| 26-Oct-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default
Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm
AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default
Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm.amdgcn.implicitarg.ptr. This violates the principle of being correct by default. Some OpenMP testcases were broken recently since it wasn't correctly setting this attribute, and no known frontends are setting this to anything other than the maximum.
Most of the test changes are from load widening of argument loads since there now more implied dereferenceable bytes.
show more ...
|
| #
ae0ba7de |
| 25-Oct-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Optimize out implicit kernarg argument allocation if unused
We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arg
AMDGPU: Optimize out implicit kernarg argument allocation if unused
We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arguments if unneeded.
show more ...
|
| #
90ff1487 |
| 28-Oct-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Account for implicit argument alignment for kernarg segment
If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For
AMDGPU: Account for implicit argument alignment for kernarg segment
If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI.
The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters.
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
8d4b74ac |
| 11-Sep-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set
It should be semantically identical if it was set to the same value as the default. Also improve the documentation.
|
| #
3f34f75a |
| 22-Oct-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32
As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as imp
[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32
As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as implicit pseudo operands. Fix this by setting the correct latency from the SchedModel after addPhysRegDataDeps wrongly set it to 0.
Differential Revision: https://reviews.llvm.org/D112317
show more ...
|
| #
6fe949c4 |
| 22-Oct-2021 |
Kazu Hirata <[email protected]> |
[Target, Transforms] Use StringRef::contains (NFC)
|
| #
082e22f3 |
| 24-Sep-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch
With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all
[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch
With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in memory violation.
This is not needed since GFX10 with architected flat scratch though since special SGPRs are not carving space from normal SGPRs.
Differential Revision: https://reviews.llvm.org/D110376
show more ...
|
| #
ec55dced |
| 16-Sep-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query
Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgp
AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query
Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.
show more ...
|
| #
dc6e8dfd |
| 20-Sep-2021 |
Jacob Lambert <[email protected]> |
[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.
|
| #
3ce1b963 |
| 08-Sep-2021 |
Joe Nash <[email protected]> |
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D1095
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
48958d02 |
| 23-Aug-2021 |
Daniil Fukalov <[email protected]> |
[NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `G
[NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108596
show more ...
|