|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
8edaf259 |
| 12-Apr-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset t
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.
Reviewers: arsenm, sameerds, b-sumner and foad
Differential Revision: https://reviews.llvm.org/D123548
show more ...
|
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
ca62b1db |
| 25-Feb-2022 |
Changpeng Fang <[email protected]> |
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://reviews.llvm.org/D119027
show more ...
|
| #
d8f99bb6 |
| 11-Feb-2022 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now r
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
722b8e0e |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Invert ABI attribute handling
Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary.
AMDGPU: Invert ABI attribute handling
Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated.
This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit imputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass.
However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit.
At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.
show more ...
|
| #
088cc636 |
| 11-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert their meanings.
This now diverges more from the old AMDGPUAnnotateKernelFeatures, but this isn't u
AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert their meanings.
This now diverges more from the old AMDGPUAnnotateKernelFeatures, but this isn't used yet anyway.
show more ...
|
| #
46d82e73 |
| 13-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically the regular transforms were already run at this point. Touching undefined behavior breaks
AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically the regular transforms were already run at this point. Touching undefined behavior breaks a lot of tests when this is enabled by default, many of which are expecting to test handling of undef operations.
show more ...
|
| #
cf32d61a |
| 11-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as attributes, and are currently a hacky way of passing an analysis into the
AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as attributes, and are currently a hacky way of passing an analysis into the DAG. These don't really belong in the IR, and don't really fit in with the other attributes. Remove these to facilitate inverting the pass.
I don't exactly understand the indirect call test changes. These tests are using calls which are trivially replacable with a direct call, so I'm not sure what the point is.
show more ...
|
| #
98d7aa43 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer support using it outside of kernels.
|
| #
a77ae4aa |
| 12-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop attributor adding attributes to intrinsic declarations
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
2aaf038e |
| 27-Jul-2021 |
Johannes Doerfert <[email protected]> |
[Attributor] Update check lines for all AMDGPU attributor tests
I thought there was only one when I pushed cdb4cfe8b3ce2b0c50d4855ec260eab07fe63611, these should be all (in the CodeGen/AMDGPU folder
[Attributor] Update check lines for all AMDGPU attributor tests
I thought there was only one when I pushed cdb4cfe8b3ce2b0c50d4855ec260eab07fe63611, these should be all (in the CodeGen/AMDGPU folder).
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4 |
|
| #
96709823 |
| 27-Jun-2021 |
Kuter Dinel <[email protected]> |
[AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: htt
[AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: https://reviews.llvm.org/D104997
show more ...
|
| #
a7749c3f |
| 13-Jul-2021 |
Kuter Dinel <[email protected]> |
[AMDGPU] Use update_test_checks.py script for annotate kernel features tests.
This patch makes the annotate kernel features tests use the update_tests_checks.py script. Which makes it easy to update
[AMDGPU] Use update_test_checks.py script for annotate kernel features tests.
This patch makes the annotate kernel features tests use the update_tests_checks.py script. Which makes it easy to update the tests.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D105864
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
| #
f0abefaf |
| 28-May-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Add IntrWillReturn to intrinsic definitions
This should probably be implied for all the speculatable ones. I think the only ones where this plausibly doesn't apply is s_sendmsghalt and maybe
AMDGPU: Add IntrWillReturn to intrinsic definitions
This should probably be implied for all the speculatable ones. I think the only ones where this plausibly doesn't apply is s_sendmsghalt and maybe kill.
show more ...
|
|
Revision tags: llvmorg-10.0.1-rc1 |
|
| #
21d2884a |
| 19-May-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Annotate functions that have stack objects
Relying on any MachineFunction state in the MachineFunctionInfo constructor is hazardous, because the construction time is unclear and determined b
AMDGPU: Annotate functions that have stack objects
Relying on any MachineFunction state in the MachineFunctionInfo constructor is hazardous, because the construction time is unclear and determined by the first use. The function may be only partially constructed, which is part of why we have many of these hacky string attributes to track what we need for ABI lowering.
For SelectionDAG, all stack objects are created up-front before calling convention lowering so stack objects are visible at construction time. For GlobalISel, none of the IR function has been visited yet and the allocas haven't been added to the MachineFrameInfo yet. This should fix failing to set flat_scratch_init in GlobalISel when needed.
This pass really needs to be turned into some kind of analysis, but I haven't found a nice way use one here.
show more ...
|
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4 |
|
| #
f581d575 |
| 05-Sep-2019 |
Matt Arsenault <[email protected]> |
AMDGPU: Add intrinsics for address space identification
The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture.
llvm-svn: 371009
|
|
Revision tags: llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3 |
|
| #
0124b548 |
| 13-Feb-2018 |
Yaxun Liu <[email protected]> |
[AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170
llvm-svn: 325030
|
|
Revision tags: llvmorg-6.0.0-rc2 |
|
| #
2a22c5de |
| 02-Feb-2018 |
Yaxun Liu <[email protected]> |
[AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.
Differential Revision: https://reviews.llvm.org/D40955
llvm-svn: 324101
|
|
Revision tags: llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1 |
|
| #
23e4df6a |
| 14-Jul-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Detect kernarg segment pointer
This is necessary to pass the kernarg segment pointer to callee functions. Also don't unconditionally enable for kernels.
llvm-svn: 307978
|
|
Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2 |
|
| #
7b82b4bd |
| 02-May-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Make intrinsics speculatable
llvm-svn: 301937
|
|
Revision tags: llvmorg-4.0.1-rc1 |
|
| #
3dbeefa9 |
| 21-Mar-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default ca
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
llvm-svn: 298444
show more ...
|
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
| #
99c14524 |
| 25-Apr-2016 |
Matt Arsenault <[email protected]> |
AMDGPU: Implement addrspacecast
llvm-svn: 267452
|
| #
48ab526f |
| 25-Apr-2016 |
Matt Arsenault <[email protected]> |
AMDGPU: Add queue ptr intrinsic
llvm-svn: 267451
|
|
Revision tags: llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2 |
|
| #
d0799df7 |
| 30-Jan-2016 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get the workgroup size, and the read.local.size intrinsics do not work
AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get the workgroup size, and the read.local.size intrinsics do not work correctly.
llvm-svn: 259296
show more ...
|