|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
bf789b19 |
| 21-Jun-2022 |
Johannes Doerfert <[email protected]> |
[Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many
[Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons: - We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`. - We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it. - Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication.
This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in 6555558a80589d1c5a1154b92cc3af9495f8f86c.
show more ...
|
| #
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
3be3b401 |
| 15-Apr-2022 |
Johannes Doerfert <[email protected]> |
[Attributor][NFCI] Introduce AttributorConfig to bundle all options
Instead of lengthy constructors we can now set the members of a read-only struct before the Attributor is created. Should make it
[Attributor][NFCI] Introduce AttributorConfig to bundle all options
Instead of lengthy constructors we can now set the members of a read-only struct before the Attributor is created. Should make it clearer what is configurable and also help introducing new options in the future. This actually added IsModulePass and avoids deduction through the Function set size. No functional change was intended.
show more ...
|
| #
8edaf259 |
| 12-Apr-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset t
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.
Reviewers: arsenm, sameerds, b-sumner and foad
Differential Revision: https://reviews.llvm.org/D123548
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
dd5895cc |
| 17-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
Reviewers: arsenm, sameerds, yaxunl
Differential Revision: https://reviews.llvm.org/D120265
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
0f20a35b |
| 09-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In curr
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
ca62b1db |
| 25-Feb-2022 |
Changpeng Fang <[email protected]> |
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://reviews.llvm.org/D119027
show more ...
|
| #
d8f99bb6 |
| 11-Feb-2022 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now r
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
c6a6b579 |
| 09-Feb-2022 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] [NFC] Fix incorrect use of bitwise operator.
Differential Revision: https://reviews.llvm.org/D119308
|
| #
02a2e46f |
| 08-Feb-2022 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] [NFC] refactor the AMDGPU attributor
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D119087
|
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
4132dc91 |
| 16-Dec-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Return result from indicatePessimisticFixpoint
I don't think this fixes anything.
|
| #
6bcf1f91 |
| 11-Dec-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Indicate pessimistic fixpoint for entry functions
There aren't going to be any callers for these, so avoid running through the machinery to look at the callers.
|
| #
0eebe2e3 |
| 02-Dec-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Sanitized functions require implicit arguments
Do not infer no-amdgpu-implicitarg-ptr for sanitized functions. If a function is explicitly marked amdgpu-no-implicitarg-ptr and sanitize_addre
AMDGPU: Sanitized functions require implicit arguments
Do not infer no-amdgpu-implicitarg-ptr for sanitized functions. If a function is explicitly marked amdgpu-no-implicitarg-ptr and sanitize_address, infer that it is required.
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
9b8b1645 |
| 07-Nov-2021 |
Benjamin Kramer <[email protected]> |
Put implementation details into anonymous namespaces. NFCI.
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
ec57b375 |
| 09-Sep-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size
This can merge the acceptable ranges based on the call graph, rather than the simple application of the attribute. Remove the handling
AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size
This can merge the acceptable ranges based on the call graph, rather than the simple application of the attribute. Remove the handling from the old pass.
show more ...
|
| #
f1217420 |
| 09-Sep-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Rename attributor class for uniform-work-group-size
This isn't really an AMDGPU specific attribute and could be moved to generic code. It's also important to include the word uniform in the
AMDGPU: Rename attributor class for uniform-work-group-size
This isn't really an AMDGPU specific attribute and could be moved to generic code. It's also important to include the word uniform in the name.
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
088cc636 |
| 11-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert their meanings.
This now diverges more from the old AMDGPUAnnotateKernelFeatures, but this isn't u
AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert their meanings.
This now diverges more from the old AMDGPUAnnotateKernelFeatures, but this isn't used yet anyway.
show more ...
|
| #
46d82e73 |
| 13-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically the regular transforms were already run at this point. Touching undefined behavior breaks
AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically the regular transforms were already run at this point. Touching undefined behavior breaks a lot of tests when this is enabled by default, many of which are expecting to test handling of undef operations.
show more ...
|
| #
cf32d61a |
| 11-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as attributes, and are currently a hacky way of passing an analysis into the
AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as attributes, and are currently a hacky way of passing an analysis into the DAG. These don't really belong in the IR, and don't really fit in with the other attributes. Remove these to facilitate inverting the pass.
I don't exactly understand the indirect call test changes. These tests are using calls which are trivially replacable with a direct call, so I'm not sure what the point is.
show more ...
|
| #
98d7aa43 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer support using it outside of kernels.
|
| #
a77ae4aa |
| 12-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Stop attributor adding attributes to intrinsic declarations
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
edb05d55 |
| 24-Jul-2021 |
Alexander Belyaev <[email protected]> |
[llvm] Inline getAssociatedFunction() in LLVM_DEBUG.
Function* F is used only inside LLVM_DEBUG, so that it causes unused variable warning.
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4 |
|
| #
96709823 |
| 27-Jun-2021 |
Kuter Dinel <[email protected]> |
[AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: htt
[AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: https://reviews.llvm.org/D104997
show more ...
|