History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp (Results 1 – 25 of 48)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# d96361d7 17-Jun-2022 Abinav Puthan Purayil <[email protected]>

[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map

This change introduces the dynamic stack boolean field to code-object-v3
and above under the code prope

[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map

This change introduces the dynamic stack boolean field to code-object-v3
and above under the code properties of the kernel descriptor and under
the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to
the is_dynamic_callstack field of amd_kernel_code_t.

Differential Revision: https://reviews.llvm.org/D128344

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# 8edaf259 12-Apr-2022 Changpeng Fang <[email protected]>

AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally

Summary:
Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default.
We use implicitarg_ptr + offset t

AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally

Summary:
Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default.
We use implicitarg_ptr + offset to check whether the multigrid synchronization
pointer is used. If yes, we remove this attribute and also remove
amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg
only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.

Reviewers: arsenm, sameerds, b-sumner and foad

Differential Revision: https://reviews.llvm.org/D123548

show more ...


Revision tags: llvmorg-14.0.1
# 7f9868f9 11-Apr-2022 Changpeng Fang <[email protected]>

AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5

Summary:
In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads
to align the implicit

AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5

Summary:
In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads
to align the implicit kernel argument segment to 8 byte boundary. In this work, we simply force
this alignment through the first implicit argument.
In addition, we don't emit metadata for any implicit kernel argument if none of them is actually used.

Reviewers: arsenm, b-sumner

Differential Revision: https://reviews.llvm.org/D123346

show more ...


# 09f33a43 05-Apr-2022 Scott Linder <[email protected]>

[AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic

The diagnostic is unreliable, and triggers even for dead uses of
hostcall that may exist when linking the device-libs at lower
optimization l

[AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic

The diagnostic is unreliable, and triggers even for dead uses of
hostcall that may exist when linking the device-libs at lower
optimization levels.

Eliminate the diagnostic, and directly document the limitation for
OpenCL before code object V5.

Make some NFC changes to clarify the related code in the
MetadataStreamer.

Add a clang test to tie OCL sources containing printf to the backend IR
tests for this situation.

Reviewed By: sameerds, arsenm, yaxunl

Differential Revision: https://reviews.llvm.org/D121951

show more ...


# 8384ced9 28-Mar-2022 Changpeng Fang <[email protected]>

[AMDGPU][NFC]: Remove unnecessary MFI functions

Summary:
hasHostcallPtr() and hasHeapPtr() are only used in metadata emit.
However, we can use the corresponding function attributes directly
instea

[AMDGPU][NFC]: Remove unnecessary MFI functions

Summary:
hasHostcallPtr() and hasHeapPtr() are only used in metadata emit.
However, we can use the corresponding function attributes directly
instead introducing the functions.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D122600

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 0f20a35b 09-Mar-2022 Changpeng Fang <[email protected]>

AMDGPU: Set up User SGPRs for queue_ptr only when necessary

Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In curr

AMDGPU: Set up User SGPRs for queue_ptr only when necessary

Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.

Reviewers: sameerds, arsenm

Fixes: SWDEV-307189

Differential Revision: https://reviews.llvm.org/D119762

show more ...


Revision tags: llvmorg-14.0.0-rc2
# ca62b1db 25-Feb-2022 Changpeng Fang <[email protected]>

[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg

Summary:
Emit metadata for hidden_heap_v1 kernarg

Reviewers:
sameerds, b-sumner

Fixes:
SWDEV-307188

Differential Revision:
https://

[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg

Summary:
Emit metadata for hidden_heap_v1 kernarg

Reviewers:
sameerds, b-sumner

Fixes:
SWDEV-307188

Differential Revision:
https://reviews.llvm.org/D119027

show more ...


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 74702444 02-Feb-2022 Jacob Lambert <[email protected]>

[AMDGPU] Add agpr_count to metadata and AsmParser

gfx90a allows the number of ACC registers (AGPRs) to be set
independently to the VGPR registers. For both HSA and PAL metadata, we
now include an "a

[AMDGPU] Add agpr_count to metadata and AsmParser

gfx90a allows the number of ACC registers (AGPRs) to be set
independently to the VGPR registers. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.

Differential Revision: https://reviews.llvm.org/D116140

show more ...


# d8f99bb6 11-Feb-2022 Sameer Sahasrabuddhe <[email protected]>

[AMDGPU] replace hostcall module flag with function attribute

The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
r

[AMDGPU] replace hostcall module flag with function attribute

The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216

show more ...


# 1194b9cd 01-Feb-2022 Changpeng Fang <[email protected]>

AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args

Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for t

AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args

Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2

Reviewers:
t-tye, b-sumner, arsenm, and bcahoon

Fixes:
SWDEV-307188, SWDEV-307189

Differential Revision:
https://reviews.llvm.org/D118272

show more ...


# a5e324e3 26-Jan-2022 Nikita Popov <[email protected]>

[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers

AMDGPUHSAMetadataStreamer currently assumes that pointer arguments
without align attribute have ABI alignment of the pointee type

[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers

AMDGPUHSAMetadataStreamer currently assumes that pointer arguments
without align attribute have ABI alignment of the pointee type.
This is incompatible with opaque pointers, but also plain incorrect:
Pointer arguments without explicit alignment have alignment 1. It is
the responsibility of the frontent to add correct align annotations.

Differential Revision: https://reviews.llvm.org/D118229

show more ...


# aa97bc11 21-Jan-2022 Nikita Popov <[email protected]>

[NFC] Remove uses of PointerType::getElementType()

Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecatin

[NFC] Remove uses of PointerType::getElementType()

Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# ae0ba7de 25-Oct-2021 Matt Arsenault <[email protected]>

AMDGPU: Optimize out implicit kernarg argument allocation if unused

We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be
unused. Start using it to avoid allocating the implicit arg

AMDGPU: Optimize out implicit kernarg argument allocation if unused

We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be
unused. Start using it to avoid allocating the implicit arguments if
unneeded.

show more ...


# 90ff1487 28-Oct-2021 Matt Arsenault <[email protected]>

AMDGPU: Account for implicit argument alignment for kernarg segment

If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For

AMDGPU: Account for implicit argument alignment for kernarg segment

If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For
some reason we require an 8-byte alignment for this, even though
there's no real advantage and I don't see where this is documented in
the ABI.

The code object header code also claims the minimum alignment is 16,
which is what I thought you always got at runtime anyway so I don't
know why this matters.

show more ...


# 6fe949c4 22-Oct-2021 Kazu Hirata <[email protected]>

[Target, Transforms] Use StringRef::contains (NFC)


# 095c48fd 05-Oct-2021 kpyzhov <[email protected]>

[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration.
The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module

[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration.
The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag: hostcall that can be used to detect whether GPU functions use host calls for printf.

Differential revision: https://reviews.llvm.org/D110337

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 5173854f 06-Aug-2021 Reshabh Sharma <[email protected]>

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer w

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682

show more ...


# dce35ef1 04-Aug-2021 Reshabh Sharma <[email protected]>

Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"

This reverts commit d42e70b3d315645e37f3b1455d39e68678e69525.


# d42e70b3 04-Aug-2021 Reshabh Sharma <[email protected]>

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer w

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# f4ace637 24-Mar-2021 Konstantin Zhuravlyov <[email protected]>

AMDGPU: Add target id and code object v4 support

- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id)
- Add code object v4 support (https://llvm.org/docs/AMDG

AMDGPU: Add target id and code object v4 support

- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id)
- Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object)
- Add kernarg_size to kernel descriptor
- Change trap handler ABI to no longer move queue pointer into s[0:1]
- Cleanup ELF definitions
- Add V2, V3, V4 suffixes to make a clear distinction for code object version
- Consolidate note names

Differential Revision: https://reviews.llvm.org/D95638

show more ...


Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init
# f82cff31 26-Jan-2021 Simon Pilgrim <[email protected]>

[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI.

Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions


Revision tags: llvmorg-11.1.0-rc2
# 560d7e04 20-Jan-2021 dfukalov <[email protected]>

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036


Revision tags: llvmorg-11.1.0-rc1
# 6a87e9b0 25-Dec-2020 dfukalov <[email protected]>

[NFC][AMDGPU] Reduce include files dependency.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# 1168119c 07-May-2020 Matt Arsenault <[email protected]>

AMDGPU: Start interpreting byref on kernel arguments

These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%st

AMDGPU: Start interpreting byref on kernel arguments

These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should
produce the same offsets and argument metadata.

This handles all 3 kernel ABI implementations, and the two HSA
metadata emission paths.

show more ...


# 31f4e43f 29-Jun-2020 Matt Arsenault <[email protected]>

AMDGPU: Remove .value_type from kernel metadata

This doesn't appear used for anything, and is emitted incorrectly
based on the description. This also depends on the IR type, and
pointee element type.


12