|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
7094ab4e |
| 18-Jul-2022 |
Kazu Hirata <[email protected]> |
[llvm] Modernize bool literals (NFC)
Identified with modernize-use-bool-literals.
|
| #
d1af09ad |
| 23-Jun-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
show more ...
|
| #
4874838a |
| 28-Jun-2022 |
Piotr Sobczak <[email protected]> |
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D1287
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D128756
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
07b7fada |
| 25-May-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only.
A VOPD instruction is constituted of an
[AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only.
A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D128218
show more ...
|
| #
485e8b4f |
| 20-Jun-2022 |
Dmitry Preobrazhensky <[email protected]> |
[AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes
Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065.
Differential
[AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes
Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065.
Differential Revision: https://reviews.llvm.org/D128075
show more ...
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
7050d5b9 |
| 17-Feb-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Limit GFX11 to using 128 VGPRs
This is a temporary measure to avoid generating incorrect code until the compiler understands the new way that GFX11 encodes 16-bit operands in VOP instructio
[AMDGPU] Limit GFX11 to using 128 VGPRs
This is a temporary measure to avoid generating incorrect code until the compiler understands the new way that GFX11 encodes 16-bit operands in VOP instructions.
Differential Revision: https://reviews.llvm.org/D128054
show more ...
|
| #
cb9ae937 |
| 10-Jun-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen.
Differential Revision:
[AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen.
Differential Revision: https://reviews.llvm.org/D127527
show more ...
|
| #
95a13425 |
| 05-Jun-2022 |
Fangrui Song <[email protected]> |
Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options
|
| #
835e09c4 |
| 10-May-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 FLAT Instructions
MachineCode Support for FLAT type instructions
Contributors: Sebastian Neubauer <[email protected]>
Patch 12/N for upstreaming of AMDGPU gfx11 architectur
[AMDGPU] gfx11 FLAT Instructions
MachineCode Support for FLAT type instructions
Contributors: Sebastian Neubauer <[email protected]>
Patch 12/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125989
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D125992
show more ...
|
| #
1a51ab76 |
| 25-Apr-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 export instructions
Contributors: Jay Foad <[email protected]> Dmitry Preobrazhensky <[email protected]>
Patch 10/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125822
Revi
[AMDGPU] gfx11 export instructions
Contributors: Jay Foad <[email protected]> Dmitry Preobrazhensky <[email protected]>
Patch 10/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125822
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D125824
show more ...
|
| #
d21b9b49 |
| 21-Apr-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn.
Contributors: Jay Foad <[email protected]>
Patch
[AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn.
Contributors: Jay Foad <[email protected]>
Patch 7/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125319
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D125498
show more ...
|
| #
c7025940 |
| 19-Apr-2022 |
Joe Nash <[email protected]> |
[AMDGPU] gfx11 BUF Instructions
Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits.
[AMDGPU] gfx11 BUF Instructions
Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits.
Contributors: Petar Avramovic <[email protected]> Dmitry Preobrazhensky <[email protected]>
Depends on D125316
Patch 6/N for upstreaming of AMDGPU gfx11 architecture.
Reviewed By: dp, Petar.Avramovic
Differential Revision: https://reviews.llvm.org/D125319
show more ...
|
| #
e01dbabd |
| 19-Apr-2022 |
Dmitry Preobrazhensky <[email protected]> |
[AMDGPU][MC] Corrected error message "image data size does not match dmask and tfe"
Differential Revision: https://reviews.llvm.org/D123929
|
| #
8edaf259 |
| 12-Apr-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset t
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.
Reviewers: arsenm, sameerds, b-sumner and foad
Differential Revision: https://reviews.llvm.org/D123548
show more ...
|
| #
1f6aa903 |
| 07-Apr-2022 |
Dmitry Preobrazhensky <[email protected]> |
[AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand
Added the following helpers:
depctr_hold_cnt(...) depctr_sa_sdst(...) depctr_va_vdst(...) depctr_va_sdst(...)
[AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand
Added the following helpers:
depctr_hold_cnt(...) depctr_sa_sdst(...) depctr_va_vdst(...) depctr_va_sdst(...) depctr_va_ssrc(...) depctr_va_vcc(...) depctr_vm_vsrc(...)
Differential Revision: https://reviews.llvm.org/D123022
show more ...
|
| #
64838ba3 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use GenericTable to classify DGEMM
Since there is a table introduced for MAI instructions extend it to use for DGEMM classification.
Differential Revision: https://reviews.llvm.org/D122337
|
| #
cad9de71 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx940 MAI hazard recognizer
Differential Revision: https://reviews.llvm.org/D122263
|
| #
1d817a14 |
| 21-Mar-2022 |
Dmitry Preobrazhensky <[email protected]> |
[AMDGPU][MC][NFC] Refactored sendmsg(...) handling
Differential Revision: https://reviews.llvm.org/D121995
|
| #
dd5895cc |
| 17-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
Reviewers: arsenm, sameerds, yaxunl
Differential Revision: https://reviews.llvm.org/D120265
show more ...
|
| #
5977dfba |
| 16-Mar-2022 |
Dmitry Preobrazhensky <[email protected]> |
[AMDGPU][MC][NFC] Refactored custom operands handling
The original design of custom operands support assumed that most GPUs have the same or very similar operand names end encodings. This is no long
[AMDGPU][MC][NFC] Refactored custom operands handling
The original design of custom operands support assumed that most GPUs have the same or very similar operand names end encodings. This is no longer the case. As a result the support code becomes over-complicated and difficult to maintain.
This change implements a different design with the following benefits:
- support of aliases; - support of operands with overlapped encodings; - identification of defined but unsupported operands.
Differential Revision: https://reviews.llvm.org/D121696
show more ...
|
| #
8dd3d1cf |
| 10-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add symbolic names for gfx940 HWREGs
The namespaces of HWREGs is now overlapping with gfx10. Thus the patch is longer than necessary to just support new names. It also need to handle proper
[AMDGPU] Add symbolic names for gfx940 HWREGs
The namespaces of HWREGs is now overlapping with gfx10. Thus the patch is longer than necessary to just support new names. It also need to handle proper error messages, i.e. to issue a "specified hardware register is not supported on this GPU" message.
This may need a major refactoring in the future.
Differential Revision: https://reviews.llvm.org/D121418
show more ...
|
| #
0f20a35b |
| 09-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In curr
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
show more ...
|
| #
8992b50e |
| 02-Mar-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx940 uses new names for coherency bits
Differential Revision: https://reviews.llvm.org/D120855
|
| #
ca62b1db |
| 25-Feb-2022 |
Changpeng Fang <[email protected]> |
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://reviews.llvm.org/D119027
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
74702444 |
| 02-Feb-2022 |
Jacob Lambert <[email protected]> |
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "a
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "agpr_count" key to report the number of AGPRs set for supported devices (gfx90a, gfx908, as determined by hasMAIInsts()). This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL. The AsmParser also now recognizes ".kernel.agpr_count" for supported devices.
Differential Revision: https://reviews.llvm.org/D116140
show more ...
|