|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
bf789b19 |
| 21-Jun-2022 |
Johannes Doerfert <[email protected]> |
[Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication.
This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in 6555558a80589d1c5a1154b92cc3af9495f8f86c.
|
| #
f6e0c05e |
| 08-Jul-2022 |
Johannes Doerfert <[email protected]> |
Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues"
This reverts commit f17639ea0cd30f52ac853ba2eb25518426cc3bb8 as three AMDGPU tests haven't been updated. Will need to verify the changes are not regressions we should avoid.
|
| #
f17639ea |
| 21-Jun-2022 |
Johannes Doerfert <[email protected]> |
[Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication.
This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in 6555558a80589d1c5a1154b92cc3af9495f8f86c.
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
6555558a |
| 09-Jun-2022 |
Johannes Doerfert <[email protected]> |
Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues"
This reverts commit da50dab1ae111e9e6cb0248a47a038b17f798705.
Patch broke AMD GPU OpenMP offload buildbots. https://lab.llvm.org/buildbot/#/builders/193/builds/13246
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
da50dab1 |
| 10-May-2022 |
Johannes Doerfert <[email protected]> |
[Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication.
This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes, which is good.
Fixes: https://github.com/llvm/llvm-project/issues/54981
|
| #
dbffa407 |
| 18-May-2022 |
Joseph Huber <[email protected]> |
[NVVM] Update intrinsic definitions to include the `nocallback` attribute
This patch adds the `nocallback` attribute to the NVVM intrinsics that did not use the `DefaultAttrsIntrinsic` method that includes it already. The `nocallback` attribute states that the intrinsic function cannot enter back into the caller's translation unit. This allows us to determine that a function calling a `nocallback` function can have the `norecurse` attribute. This should be safe for all the NVVM intrinsics because they do not call other functions within the translation unit.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D125937
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
e90bce8f |
| 31-Mar-2022 |
Augie Fackler <[email protected]> |
CallBase: fix getFnAttr so it also checks the function
Prior to this change, CallBase::hasFnAttr checked the called function to see if it had an attribute if it wasn't set on the CallBase, but getFnAttr didn't do the same delegation, which led to very confusing behavior. This patch fixes the issue by making CallBase::getFnAttr also check the function under the same circumstances.
Test changes look (to me) like they're cleaning up redundant attributes which no longer get specified both on the callee and call. We also clean up the one ad-hoc implementation of this getter over in InlineCost.cpp.
Differential Revision: https://reviews.llvm.org/D122821
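The delegation described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyFunction`, `ToyCall`, and the attribute strings are hypothetical stand-ins, not the LLVM classes, and the real `CallBase` API has different signatures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT the LLVM classes.
using AttrMap = std::map<std::string, std::string>;

struct ToyFunction {
  AttrMap FnAttrs; // attributes declared on the function itself
};

struct ToyCall {
  AttrMap FnAttrs;                     // attributes set on the call site
  const ToyFunction *Callee = nullptr; // null models an indirect call

  // Look up Kind on the call site first, then fall back to the callee.
  // Returns a pointer to the value, or nullptr if neither has it.
  const std::string *getFnAttr(const std::string &Kind) const {
    AttrMap::const_iterator It = FnAttrs.find(Kind);
    if (It != FnAttrs.end())
      return &It->second;
    if (Callee) {
      It = Callee->FnAttrs.find(Kind);
      if (It != Callee->FnAttrs.end())
        return &It->second;
    }
    return nullptr;
  }

  // Built on the same lookup, so the query and the getter cannot disagree.
  bool hasFnAttr(const std::string &Kind) const {
    return getFnAttr(Kind) != nullptr;
  }
};
```

Routing both accessors through one lookup is the design point: a caller that checks `hasFnAttr` and then calls `getFnAttr` always sees a consistent answer, whether the attribute lives on the call site or on the callee.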
|
| #
a81fff8a |
| 24-Mar-2022 |
Johannes Doerfert <[email protected]> |
Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes"
This reverts commit c5f789050daab25aad6770790987e2b7c0395936 and reapplies 7aea3ea8c3b33c9bb338d5d6c0e4832be1d09ac3 with additional test changes.
|
| #
c5f78905 |
| 24-Mar-2022 |
Johannes Doerfert <[email protected]> |
Revert "[Intrinsics] Add `nocallback` to the default intrinsic attributes"
This reverts commit 7aea3ea8c3b33c9bb338d5d6c0e4832be1d09ac3 as it breaks the buildbots.
I didn't see these failures in the pre-merge checks, looking into it.
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
7aea3ea8 |
| 01-Feb-2022 |
Johannes Doerfert <[email protected]> |
[Intrinsics] Add `nocallback` to the default intrinsic attributes
Most intrinsics, especially "default" ones, will not call back into the IR module. `nocallback` encodes this nicely. As it was not used before, this patch also makes use of `nocallback` in the Attributor which results in many more `norecurse` deductions.
Tablegen part is mechanical, test updates by script.
Differential Revision: https://reviews.llvm.org/D118680
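To illustrate why `nocallback` helps `norecurse` deduction, here is a deliberately conservative sketch. It is not the Attributor's logic: `CallSite` and `canDeduceNoRecurse` are hypothetical names, and the rule only handles functions whose callees are all opaque (declarations or intrinsics).

```cpp
#include <cassert>
#include <vector>

// Hypothetical model for illustration; not the Attributor's data structures.
struct CallSite {
  bool CalleeIsOpaque;   // declaration or intrinsic: body not visible
  bool CalleeNoCallback; // callee is marked nocallback
};

// Conservative check: may a function with these call sites be marked
// norecurse? Only the "all callees are opaque" case is handled here.
bool canDeduceNoRecurse(const std::vector<CallSite> &Calls) {
  for (const CallSite &CS : Calls) {
    // Visible callees would need a real call-graph analysis; give up.
    if (!CS.CalleeIsOpaque)
      return false;
    // An opaque callee could re-enter the module unless it is nocallback.
    if (!CS.CalleeNoCallback)
      return false;
  }
  // No call can lead back into the module, so the function cannot recurse.
  return true;
}
```

Before intrinsics carried `nocallback`, every intrinsic call would take the second `return false` in a model like this one, which matches the commit's observation that the attribute enables many more `norecurse` deductions.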
|
| #
4166738c |
| 18-Mar-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Do not crash when kernels are debug wrapper functions
With debug information enabled (-g) Clang will wrap the actual target region into a new function which is called from the "kernel". The problem is that the "kernel" is now basically a wrapper without all the things we expect. More importantly, if we end up asking for an AAKernelInfo for the "target region function" we might try to turn it into SPMD mode. That used to cause an assertion as that function doesn't have an appropriately named `_exec_mode` global. While the global is going away soon we still need to make sure to properly handle this case, e.g., perform optimizations reliably.
Differential Revision: https://reviews.llvm.org/D122043
|
| #
e92891f8 |
| 11-Mar-2022 |
Johannes Doerfert <[email protected]> |
[Attributor] Allow not to default initialize AAs for live internal functions
Outside users of the Attributor, e.g., OpenMP-opt, want to seed AAs themselves. We should not seed all default AAs once an internal function becomes live. That said, there should be a callback such that they can do lazy seeding as well.
Differential Revision: https://reviews.llvm.org/D121489
|
| #
192a34dd |
| 25-Feb-2022 |
Johannes Doerfert <[email protected]> |
[Attributor][OpenMPOpt][FIX] Register simplification callbacks
Heap-2-stack and heap-2-shared can replace an allocation call with something else. To avoid us deriving information from the allocator implementation we register a simplification callback now that will force us to stop at the call site. We probably should create the replacement memory eagerly and return that instead though.
|
| #
0136a440 |
| 17-Feb-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Add an option to limit shared memory usage in OpenMPOpt
One of the optimizations performed in OpenMPOpt pushes globalized variables to static shared memory. This is preferable to keeping the runtime call in all cases; however, if too many variables are pushed to shared memory the kernel will crash. Since this is an optimization and not something the user specified explicitly, there should be an option to limit this optimization in those cases. This patch introduces the `-openmp-opt-shared-limit=` option to limit the number of bytes that will be placed in shared memory from HeapToShared.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120079
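A byte budget like the one this option introduces could be applied as in the sketch below. The function name, the greedy in-order policy, and the data layout are assumptions made for illustration; the actual HeapToShared implementation may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch; the name and the greedy policy are assumptions.
// For each allocation (in order), decide whether it would be moved to
// static shared memory or keep its runtime call.
std::vector<bool> selectForSharedMemory(const std::vector<uint64_t> &AllocSizes,
                                        uint64_t SharedLimitBytes) {
  std::vector<bool> Moved;
  Moved.reserve(AllocSizes.size());
  uint64_t UsedBytes = 0; // invariant: UsedBytes <= SharedLimitBytes
  for (uint64_t Size : AllocSizes) {
    // Written as a subtraction so the comparison cannot overflow.
    bool Fits = Size <= SharedLimitBytes - UsedBytes;
    if (Fits)
      UsedBytes += Size;
    Moved.push_back(Fits); // false: keep the runtime allocation call
  }
  return Moved;
}
```

With sizes `{16, 64, 8}` and a limit of 32 bytes, the first and third allocations fit (24 bytes total) while the 64-byte one keeps its runtime call.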
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
ac3ec22d |
| 20-Sep-2021 |
Johannes Doerfert <[email protected]> |
[Attributor] Use AAFunctionReachability to determine AANoRecurse
We missed out on AANoRecurse in the module pass because we had no call graph. With AAFunctionReachability we can simply ask if the function may reach itself.
Differential Revision: https://reviews.llvm.org/D110099
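The question "may the function reach itself" is a graph reachability query. The sketch below answers it on a toy call graph; `CallGraph` and `mayReachItself` are hypothetical names, and it assumes every call edge is known, which the real analysis cannot assume.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> names of functions it calls.
// Assumption for this sketch: all call edges are known and listed.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if Fn can reach itself through one or more call edges,
// i.e. Fn may recurse directly or indirectly.
bool mayReachItself(const CallGraph &CG, const std::string &Fn) {
  CallGraph::const_iterator It = CG.find(Fn);
  if (It == CG.end())
    return false; // no known callees
  std::set<std::string> Visited;
  std::vector<std::string> Worklist = It->second; // start at direct callees
  while (!Worklist.empty()) {
    std::string Cur = Worklist.back();
    Worklist.pop_back();
    if (Cur == Fn)
      return true; // found a call path back to Fn
    if (!Visited.insert(Cur).second)
      continue; // already explored
    CallGraph::const_iterator CurIt = CG.find(Cur);
    if (CurIt == CG.end())
      continue;
    for (const std::string &Callee : CurIt->second)
      Worklist.push_back(Callee);
  }
  return false;
}
```

In this model a function would qualify for `norecurse` exactly when `mayReachItself` returns false. A function that merely calls into a cycle without being part of it is not reported as recursive.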
|
| #
a1db0e52 |
| 01-Feb-2022 |
Johannes Doerfert <[email protected]> |
[Attributor][FIX] Liveness handling in the isAssumedDead helpers
This fixes a conceptual problem with our AAIsDead usage which conflated call site liveness with call site return value liveness. Without the fix tests would obviously miscompile as we make genericValueTraversal more powerful (in a follow up). The effects on the tests are mixed but mostly marginal. The most prominent one is the lack of `noreturn` for functions. The reason is that we make entire blocks live at the same time (for time reasons). Now that we actually look at the block liveness, which we need to do, the return instructions are live and will survive. As an example, `noreturn_async.ll` has been modified to retain the `noreturn` even with block granularity. We could address this easily but there is little need in practice.
show more ...
|
| #
5eb49009 |
| 24-Jan-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Add more identifier to created shared globals
Currently we push some variables to a global constant containing shared memory as an optimization. This generated constant had internal linkage and should not have collided with any known identifiers in the translation unit. However, there have been observed cases of this optimization unintentionally colliding with undocumented PTX identifiers. This patch adds a suffix to the created globals to hopefully bypass this.
Depends on D118059
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D118068
show more ...
|
| #
6e220296 |
| 27-Dec-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Use alignment information in HeapToShared
This patch uses the return alignment attribute now present in the `__kmpc_alloc_shared` runtime call to set the alignment of the shared memory global created to replace it.
Depends on D115971
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D116319
show more ...
|
| #
c690c1c9 |
| 17-Sep-2021 |
Johannes Doerfert <[email protected]> |
[NVVM] Update intrinsic definitions to include more attributes
A lot of NVVM intrinsics can use the default intrinsic attributes (e.g., nosync, nofree, ...) as well as `speculatable`. The latter is important if we want to recompute intrinsics results instead of communicating them via memory.
I did use default attributes for almost all `readnone` attributes but speculatable only where I had reasonable confidence they cannot experience UB. That said, someone should double check.
TODO: There seem to be various intrinsics marked `Commutative` which should not be, e.g., fma and div.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D109987
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc3 |
|
| #
57822c3f |
| 08-Sep-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP][NFC] Repair test that contained nested kernels
The benchmark contained (partially) nested kernels, something we do not generate nor support.
|
| #
423d34f7 |
| 22-Sep-2021 |
Shilei Tian <[email protected]> |
[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses a bitset to indicate the execution mode. This patch makes the corresponding changes in the function calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110279
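The move from a `bool` to an `int8_t` bitset can be pictured as follows. The enumerator names and bit values are assumptions chosen for illustration, not copied from the OpenMP device runtime.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values; names and encoding are assumptions for this
// sketch, not the definitions used by the OpenMP runtime.
enum ExecModeFlags : int8_t {
  ExecModeGeneric = 1 << 0,
  ExecModeSPMD = 1 << 1,
  ExecModeGenericSPMD = ExecModeGeneric | ExecModeSPMD,
};

// A bool could only say "SPMD or not"; a bitset lets each mode be queried
// independently and leaves room for combined or future modes.
inline bool isSPMD(int8_t Mode) { return (Mode & ExecModeSPMD) != 0; }
inline bool isGeneric(int8_t Mode) { return (Mode & ExecModeGeneric) != 0; }
```

A caller that previously passed `IsSPMD = true` would, in this sketch, pass `ExecModeSPMD` instead, and code that only cares about the SPMD bit keeps working through `isSPMD`.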
show more ...
|
| #
fec2927e |
| 17-Sep-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Add NoSync attributes to alloc / free shared RTL calls
This patch adds the `nosync` attribute to the `__kmpc_alloc_shared` and `__kmpc_free_shared` runtime library calls. This allows code analysis to know that these functions don't contain any barriers. This will help optimizations reason about the CFG of blocks containing these calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109995
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
29a3e3dd |
| 04-Aug-2021 |
Giorgis Georgakoudis <[email protected]> |
[OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code also executes in a parallel region and thus must not be guarded. This check is implemented using the ParallelLevels AA.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D106892
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
754eb1c2 |
| 21-Jul-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106496
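One way a size argument on the free call can simplify an allocator is shown below with a toy stack allocator. `ToySharedStack` and its members are hypothetical, and the LIFO pairing of frees with allocations is an assumption of this sketch rather than a documented property of the runtime.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bump allocator over a fixed buffer; not the OpenMP runtime.
class ToySharedStack {
  static constexpr std::size_t Capacity = 256;
  alignas(16) unsigned char Buffer[Capacity];
  std::size_t Top = 0; // offset of the first free byte

public:
  // Returns nullptr when the request does not fit in the remaining space.
  void *allocShared(std::size_t Bytes) {
    if (Bytes > Capacity - Top)
      return nullptr;
    void *Ptr = Buffer + Top;
    Top += Bytes;
    return Ptr;
  }

  // Because the caller passes the size, no per-allocation header is needed
  // to find out how far to pop. Assumes frees mirror allocations (LIFO).
  void freeShared(void *Ptr, std::size_t Bytes) {
    assert(Bytes <= Top && Ptr == Buffer + (Top - Bytes) &&
           "frees must mirror allocations in LIFO order");
    (void)Ptr; // only used by the assertion above
    Top -= Bytes;
  }

  std::size_t used() const { return Top; }
};
```

Without the size parameter, an allocator of this shape would have to store each allocation's size next to the data or in a side table, which costs memory and an extra load on every free.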
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
d9659bf6 |
| 20-May-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions.
The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
Differential Revision: https://reviews.llvm.org/D101977
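The idea of a specialized state machine can be sketched on the host as one dispatch step of a worker loop. Everything here is hypothetical (`regionA`, `regionB`, `runWorkItem`); the real transformation operates on LLVM IR for device kernels.

```cpp
#include <cassert>

using ParallelRegionFn = void (*)(int &);

// Two hypothetical outlined parallel regions known to the analysis.
void regionA(int &Counter) { Counter += 1; }
void regionB(int &Counter) { Counter += 10; }

// One step of a specialized worker state machine: known regions are
// compared by address and called directly (so they can be inlined);
// anything else falls back to an indirect call.
// Returns true if the indirect fallback was taken.
bool runWorkItem(ParallelRegionFn Work, int &Counter) {
  if (Work == regionA) {
    regionA(Counter);
    return false;
  }
  if (Work == regionB) {
    regionB(Counter);
    return false;
  }
  Work(Counter); // unknown region: indirect call
  return true;
}
```

As the commit message notes, the fallback branch is only needed when there might be unknown outermost parallel regions; if the analysis proves that all regions are known, a state machine of this shape could drop the indirect call entirely.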
show more ...
|