|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1 |
|
| #
fd8fd9e5 |
| 27-Jul-2022 |
Joseph Huber |
Revert "[OpenMP] Remove noinline attributes in the device runtime"
The behaviour of this patch is not great, but it has some side-effects that are required for OpenMPOpt to work. The problem is that when we use `-mlink-builtin-bitcode` we only import used symbols from the runtime. Then OpenMPOpt will insert calls to symbols that were not previously included. This patch removed this implicit behaviour as these functions were kept alive by the `noinline` simply because it kept calls to them in the module. This caused regressions in some tests that relied on some OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but will try to fix it more correctly on main.
This reverts commit d61d72dae604c3258e25c00622b1a85861450303.
Fixes #56752
(cherry picked from commit b08369f7f288b6efb0897953da42ed54e60cfc0b)
|
|
Revision tags: llvmorg-16-init |
|
| #
d61d72da |
| 25-Jul-2022 |
Joseph Huber |
[OpenMP] Remove noinline attributes in the device runtime
We previously used the `noinline` attributes to specify some definitions which should be kept alive in the runtime. These were then stripped immediately in the OpenMPOpt module pass. However, since the changes in D130298, we now explicitly state which functions will have external visibility in the bitcode library. Additionally, the OpenMPOpt module pass should run before the inliner pass, so this shouldn't make a difference in whether or not the functions will be alive for the initial pass of OpenMPOpt. This should simplify the interface, and additionally save time spent on scanning function names for noinline.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130368
|
| #
d1501526 |
| 18-Jul-2022 |
Johannes Doerfert |
[OpenMP] Introduce more fine-grained control over the thread state use
We can help optimizations by making sure we use the team state whenever it is clear there is no thread state. To this end we introduce a new state flag (`state::HasThreadState`) and explicit control for the `state::ValueRAII` helpers, including a dedicated "assert equal".
Differential Revision: https://reviews.llvm.org/D130113
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
3351ae61 |
| 22-Jun-2022 |
Joseph Huber |
[Libomptarget] Remove duplicate data environment exit
Summary: This patch removes a duplicated exit from the OpenMP data environment. We already have an RAII method that guards this environment so it is unnecessary.
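To illustrate why an explicit exit is redundant next to a guard object, here is a minimal sketch of a scope guard that sets a value on entry and restores it on exit. `ScopedValue`, `Level`, and `runNested` are hypothetical names chosen for this example; this is not the runtime's actual helper.

```cpp
#include <cassert>

// Hypothetical scope guard (not the runtime's real helper): it installs a
// new value on construction and restores the previous one on destruction.
template <typename T> class ScopedValue {
  T &Target;
  T Old;

public:
  ScopedValue(T &Target, T New) : Target(Target), Old(Target) {
    this->Target = New;
  }
  ~ScopedValue() { Target = Old; }

  // A guard owns exactly one restore, so copying is disabled.
  ScopedValue(const ScopedValue &) = delete;
  ScopedValue &operator=(const ScopedValue &) = delete;
};

// Hypothetical piece of state standing in for a nesting level.
int Level = 0;

// Returns the level seen inside the guarded scope. The guard restores
// `Level` when it goes out of scope, so no manual reset is needed here.
int runNested() {
  ScopedValue<int> Guard(Level, Level + 1);
  return Level;
}
```

Calling `runNested()` observes the incremented level inside the scope, and `Level` is back at its old value afterwards. Adding a second, manual restore after the guard would undo the state twice, which is the kind of duplication the commit removes.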
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
ce0caf41 |
| 10-May-2022 |
Joseph Huber |
[Libomptarget] Address existing warnings in the device runtime library
This patch attempts to address the current warnings in the OpenMP offloading device runtime. Previously we did not see these because we compiled the runtime without the standard warning flags enabled. However, these warnings are used when we now build the static library version of this runtime. This became extremely noisy when coupled with the fact that we compile each file roughly 32 times when all the architectures are considered. So it would be ideal to not have all these warnings show up when building.
Most of these warnings were simply implicit switch-case fallthroughs, which can be addressed using C++17's fallthrough attribute. Additionally there was a volatile variable that was being cast away. This is most likely safe to remove because we cast it away before it's even used and it didn't seem to affect anything in testing.
Depends on D125260
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D125339
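As a small illustration of the fallthrough fix described above, the sketch below marks intentional fallthroughs with the C++17 `[[fallthrough]]` attribute so that warnings such as `-Wimplicit-fallthrough` stay quiet. The function `cleanupSteps` and its levels are hypothetical and not taken from the device runtime.

```cpp
#include <cassert>

// Hypothetical helper (not from the device runtime): counts how many
// cleanup steps run for a given level. Higher levels include all lower
// ones, so each case deliberately falls through to the next.
int cleanupSteps(int Level) {
  assert(Level >= 0 && "Level must be non-negative");
  int Steps = 0;
  switch (Level) {
  default: // Levels above 2 behave like level 2.
  case 2:
    ++Steps;
    [[fallthrough]]; // Intentional: level 2 implies level 1.
  case 1:
    ++Steps;
    [[fallthrough]]; // Intentional: level 1 implies level 0.
  case 0:
    ++Steps;
    break;
  }
  return Steps;
}
```

Without the attribute, the same code is still valid C++, but compilers with fallthrough warnings enabled would flag each case that runs into the next one.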
|
| #
b4f8443d |
| 09-May-2022 |
Joseph Huber |
[Libomptarget] Allow the device runtime to be compiled for the host
Currently the OpenMP offloading device runtime is only expected to be compiled for the specific architecture it's targeting. This is problematic if we want to make compiling the device runtime more general via the standard `clang` driver rather than invoking the clang front-end directly. This patch addresses this by primarily changing the declare type to `nohost` so the host will not contain any of this code. Additionally we forward declare the functions that are defined via variants, otherwise these would cause problems on the host.
Reviewed By: jdoerfert, tianshilei1992
Differential Revision: https://reviews.llvm.org/D125260
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
a619072c |
| 21-Mar-2022 |
Joseph Huber |
[OpenMP] Manually unroll the argument copy loop
The unroll pragma did not work properly because the loop bound was not known when we optimized the runtime, and we then added an "unroll disable" metadata which prevented unrolling later when the bounds were known. For now we manually unroll to make sure up to 16 elements are handled nicely. This helps optimizations to look through the argument passing.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109164
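One common way to unroll such a copy by hand is a switch whose cases fall through from the highest count down to zero. The sketch below shows that technique under stated assumptions: `copyArgs` and `MaxUnrolledArgs` are hypothetical names, and the limit is 4 rather than the 16 mentioned in the commit to keep the example short. It is not claimed to match the runtime's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's argument copy. The name and the
// unroll limit of 4 are assumptions; the commit mentions up to 16 elements.
constexpr uint32_t MaxUnrolledArgs = 4;

void copyArgs(void **Dst, void *const *Src, uint32_t NumArgs) {
  assert((NumArgs == 0 || (Dst && Src)) && "null buffer with arguments");
  switch (NumArgs) {
  default:
    // More arguments than the unrolled cases cover: copy the tail in a loop,
    // then fall into the unrolled part for the first MaxUnrolledArgs slots.
    for (uint32_t I = MaxUnrolledArgs; I < NumArgs; ++I)
      Dst[I] = Src[I];
    [[fallthrough]];
  case 4:
    Dst[3] = Src[3];
    [[fallthrough]];
  case 3:
    Dst[2] = Src[2];
    [[fallthrough]];
  case 2:
    Dst[1] = Src[1];
    [[fallthrough]];
  case 1:
    Dst[0] = Src[0];
    [[fallthrough]];
  case 0:
    break;
  }
}
```

Because every small count maps to straight-line stores with constant indices, an optimizer that later learns `NumArgs` can drop the switch and see exactly which slots are written.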
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
1e121568 |
| 27-Jan-2022 |
Johannes Doerfert |
[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions
IdentTy objects are useful for debugging and profiling so we want to keep them around in more places, especially those that have a large impact on performance, e.g., everything related to state.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112494
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
bc9c4d72 |
| 09-Dec-2021 |
Joseph Huber |
[OpenMP][FIX] Pass the num_threads value directly to parallel_51
The problem with the old scheme is that we would need to keep track of the "next region" and reset the num_threads value after it. The new RT doesn't do it and an assertion is triggered. The old RT doesn't do it either; I haven't tested it, but I assume a num_threads clause might impact multiple parallel regions "accidentally". Further, in SPMD mode num_threads was simply ignored, for some reason beyond me.
In any case, parallel_51 is designed to take the clause value directly, so let's do that instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D113623
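The difference between the two schemes can be sketched as follows. Everything here is hypothetical and simplified: `PendingNumThreads`, `parallelSticky`, `parallelDirect`, the default of 128 threads, and the use of a non-positive value to mean "no clause" are assumptions made for illustration, not the runtime's interface.

```cpp
#include <cassert>
#include <cstdint>

// Assumed default team size, chosen only for this example.
constexpr uint32_t DefaultThreads = 128;

// Old scheme (sketch): the clause value is stored in state ahead of the
// region. Nothing resets it, so it leaks into later parallel regions.
uint32_t PendingNumThreads = 0;

uint32_t parallelSticky() {
  return PendingNumThreads ? PendingNumThreads : DefaultThreads;
}

// New scheme (sketch): the clause value is an argument of the call itself,
// so it cannot outlive the region. A non-positive value is assumed to mean
// that no num_threads clause was present.
uint32_t parallelDirect(int32_t NumThreadsClause) {
  return NumThreadsClause > 0 ? static_cast<uint32_t>(NumThreadsClause)
                              : DefaultThreads;
}
```

With the sticky variant, setting `PendingNumThreads = 4` affects every following call to `parallelSticky()` until someone resets it. With the direct variant, `parallelDirect(4)` uses 4 threads for that region only, and the next `parallelDirect(-1)` falls back to the default.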
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
025f5492 |
| 30-Oct-2021 |
Shilei Tian |
[OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3
The synchronization at the end of a parallel region cannot make sure all threads exit the scope. As a result, the assertions right after it might be hit, and further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may not hold as well. We either add a synchronization right after the parallel region, or remove the assertions and assumptions. Here we choose the first one as those assertions and assumptions can help optimizations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112861
|
| #
74f91741 |
| 18-Oct-2021 |
Joseph Huber |
[OpenMP] Use function tracing RAII for runtime functions.
This patch adds support for using function tracing features to track the execution of runtime functions in the device runtime library. This is enabled by first compiling the new runtime with `-fopenmp-target-debug=3` and running with `LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and thread 0 so there isn't much output when using a generic region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112002
|
| #
48877525 |
| 20-Oct-2021 |
Johannes Doerfert |
[OpenMP] Remove obsolete external interface for device RT
We do not generate _serialized_parallel calls in device mode, no need for an external API.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D112145
|
| #
5102c3c6 |
| 20-Oct-2021 |
Johannes Doerfert |
[OpenMP][FIX] Do not adjust the level after the environment was popped
Exiting a data environment will reset all values, so it is wrong to adjust them afterwards.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112144
|
| #
b16aadf0 |
| 20-Oct-2021 |
Johannes Doerfert |
[OpenMP] Introduce aligned synchronization into the new device RT
We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112153
|
| #
44710940 |
| 08-Oct-2021 |
Johannes Doerfert |
[OpenMP][FIX] Data race in the SPMD execution of the new runtime
We need to synchronize the threads *before* we destroy the RAII objects that hold the old values and not after to avoid threads executing the parallel region but seeing an inconsistent state.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D111369
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
d83ca624 |
| 23-Sep-2021 |
Joseph Huber |
[OpenMP] Fix data-race in new device RTL
This patch fixes a data-race observed when using the new device runtime library. The internal control variable for the parallel level is read in the `__kmpc_parallel_51` function while it could potentially be written by other threads. This causes data corruption and will cause nondeterministic behaviour in the runtime. This patch fixes this by adding an explicit synchronization before the region starts.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110366
|
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
e3ee7624 |
| 27-Jul-2021 |
Joseph Huber |
[Libomptarget] Revert new variable sharing to use the old method
The new method of sharing variables introduces a `__kmpc_alloc_shared` call that cannot be removed in the middle end because of its non-constant argument and unconnected free. This patch reverts this to the old method that used a static amount of shared memory for sharing variables.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106905
|
| #
67ab875f |
| 25-Jul-2021 |
Johannes Doerfert |
[OpenMP] Prototype opt-in new GPU device RTL
The "old" OpenMP GPU device runtime (D14254) has served us well for many years but modernizing it has caused some pain recently. This patch introduces an alternative which is mostly written from scratch embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. This new runtime is opt-in through a clang flag (D106793). The new runtime is currently only built for nvptx and has "-new" in its name.
The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old one, state is organized in a simple manner rather than a "smart" one. While this can induce costs it helps optimizations. Our expectation is that the majority of codes can be optimized and a "simple" design is therefore preferable. The new runtime also avoids making users pay for things they do not use, especially with respect to memory. The unlikely case of nested parallelism is supported but costly, so that the more likely case uses fewer resources.
The worksharing and reduction implementations have been taken from the old runtime and will be rewritten in the future if necessary.
Documentation and debug features are still mostly missing and will be added over time.
All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols.
Differential Revision: https://reviews.llvm.org/D106803
|