|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
d96361d7 |
| 17-Jun-2022 |
Abinav Puthan Purayil <[email protected]> |
[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map
This change introduces the dynamic stack boolean field to code-object-v3 and above under the code prope
[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map
This change introduces the dynamic stack boolean field to code-object-v3 and above under the code properties of the kernel descriptor and under the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to the is_dynamic_callstack field of amd_kernel_code_t.
Differential Revision: https://reviews.llvm.org/D128344
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
8edaf259 |
| 12-Apr-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset t
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.
Reviewers: arsenm, sameerds, b-sumner and foad
Differential Revision: https://reviews.llvm.org/D123548
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
7f9868f9 |
| 11-Apr-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5
Summary: In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads to align the implicit
AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5
Summary: In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads to align the implicit kernel argument segment to 8 byte boundary. In this work, we simply force this alignment through the first implicit argument. In addition, we don't emit metadata for any implicit kernel argument if none of them is actually used.
Reviewers: arsenm, b-sumner
Differential Revision: https://reviews.llvm.org/D123346
show more ...
|
| #
09f33a43 |
| 05-Apr-2022 |
Scott Linder <[email protected]> |
[AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic
The diagnostic is unreliable, and triggers even for dead uses of hostcall that may exist when linking the device-libs at lower optimization l
[AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic
The diagnostic is unreliable, and triggers even for dead uses of hostcall that may exist when linking the device-libs at lower optimization levels.
Eliminate the diagnostic, and directly document the limitation for OpenCL before code object V5.
Make some NFC changes to clarify the related code in the MetadataStreamer.
Add a clang test to tie OCL sources containing printf to the backend IR tests for this situation.
Reviewed By: sameerds, arsenm, yaxunl
Differential Revision: https://reviews.llvm.org/D121951
show more ...
|
| #
8384ced9 |
| 28-Mar-2022 |
Changpeng Fang <[email protected]> |
[AMDGPU][NFC]: Remove unnecessary MFI functions
Summary: hasHostcallPtr() and hasHeapPtr() are only used in metadata emit. However, we can use the corresponding function attributes directly instea
[AMDGPU][NFC]: Remove unnecessary MFI functions
Summary: hasHostcallPtr() and hasHeapPtr() are only used in metadata emit. However, we can use the corresponding function attributes directly instead introducing the functions.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D122600
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
0f20a35b |
| 09-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In curr
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
ca62b1db |
| 25-Feb-2022 |
Changpeng Fang <[email protected]> |
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://reviews.llvm.org/D119027
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
74702444 |
| 02-Feb-2022 |
Jacob Lambert <[email protected]> |
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "a
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "agpr_count" key to report the number of AGPRs set for supported devices (gfx90a, gfx908, as determined by hasMAIInsts()). This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL. The AsmParser also now recognizes ".kernel.agpr_count" for supported devices.
Differential Revision: https://reviews.llvm.org/D116140
show more ...
|
| #
d8f99bb6 |
| 11-Feb-2022 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now r
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
show more ...
|
| #
1194b9cd |
| 01-Feb-2022 |
Changpeng Fang <[email protected]> |
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for t
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2
Reviewers: t-tye, b-sumner, arsenm, and bcahoon
Fixes: SWDEV-307188, SWDEV-307189
Differential Revision: https://reviews.llvm.org/D118272
show more ...
|
| #
a5e324e3 |
| 26-Jan-2022 |
Nikita Popov <[email protected]> |
[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers
AMDGPUHSAMetadataStreamer currently assumes that pointer arguments without align attribute have ABI alignment of the pointee type
[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers
AMDGPUHSAMetadataStreamer currently assumes that pointer arguments without align attribute have ABI alignment of the pointee type. This is incompatible with opaque pointers, but also plain incorrect: Pointer arguments without explicit alignment have alignment 1. It is the responsibility of the frontent to add correct align annotations.
Differential Revision: https://reviews.llvm.org/D118229
show more ...
|
| #
aa97bc11 |
| 21-Jan-2022 |
Nikita Popov <[email protected]> |
[NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecatin
[NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
ae0ba7de |
| 25-Oct-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Optimize out implicit kernarg argument allocation if unused
We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arg
AMDGPU: Optimize out implicit kernarg argument allocation if unused
We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arguments if unneeded.
show more ...
|
| #
90ff1487 |
| 28-Oct-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Account for implicit argument alignment for kernarg segment
If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For
AMDGPU: Account for implicit argument alignment for kernarg segment
If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI.
The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters.
show more ...
|
| #
6fe949c4 |
| 22-Oct-2021 |
Kazu Hirata <[email protected]> |
[Target, Transforms] Use StringRef::contains (NFC)
|
| #
095c48fd |
| 05-Oct-2021 |
kpyzhov <[email protected]> |
[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration. The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module
[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration. The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument. This change adds a new module flag: hostcall that can be used to detect whether GPU functions use host calls for printf.
Differential revision: https://reviews.llvm.org/D110337
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
5173854f |
| 06-Aug-2021 |
Reshabh Sharma <[email protected]> |
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer w
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
show more ...
|
| #
dce35ef1 |
| 04-Aug-2021 |
Reshabh Sharma <[email protected]> |
Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"
This reverts commit d42e70b3d315645e37f3b1455d39e68678e69525.
|
| #
d42e70b3 |
| 04-Aug-2021 |
Reshabh Sharma <[email protected]> |
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer w
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
f4ace637 |
| 24-Mar-2021 |
Konstantin Zhuravlyov <[email protected]> |
AMDGPU: Add target id and code object v4 support
- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDG
AMDGPU: Add target id and code object v4 support
- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object) - Add kernarg_size to kernel descriptor - Change trap handler ABI to no longer move queue pointer into s[0:1] - Cleanup ELF definitions - Add V2, V3, V4 suffixes to make a clear distinction for code object version - Consolidate note names
Differential Revision: https://reviews.llvm.org/D95638
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init |
|
| #
f82cff31 |
| 26-Jan-2021 |
Simon Pilgrim <[email protected]> |
[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI.
Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions
|
|
Revision tags: llvmorg-11.1.0-rc2 |
|
| #
560d7e04 |
| 20-Jan-2021 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|
|
Revision tags: llvmorg-11.1.0-rc1 |
|
| #
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
| #
1168119c |
| 07-May-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Start interpreting byref on kernel arguments
These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%st
AMDGPU: Start interpreting byref on kernel arguments
These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should produce the same offsets and argument metadata.
This handles all 3 kernel ABI implementations, and the two HSA metadata emission paths.
show more ...
|
| #
31f4e43f |
| 29-Jun-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove .value_type from kernel metadata
This doesn't appear used for anything, and is emitted incorrectly based on the description. This also depends on the IR type, and pointee element type.
|