|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1 |
|
| #
fd8fd9e5 |
| 27-Jul-2022 |
Joseph Huber <[email protected]> |
Revert "[OpenMP] Remove noinline attributes in the device runtime"
The behaviour of this patch is not great, but it has some side-effects that are required for OpenMPOpt to work. The problem is that when we use `-mlink-builtin-bitcode` we only import used symbols from the runtime. Then OpenMPOpt will insert calls to symbols that were not previously included. This patch removed this implicit behaviour as these functions were kept alive by the `noinline` simply because it kept calls to them in the module. This caused regressions in some tests that relied on some OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but will try to fix it more correctly on main.
This reverts commit d61d72dae604c3258e25c00622b1a85861450303.
Fixes #56752
(cherry picked from commit b08369f7f288b6efb0897953da42ed54e60cfc0b)
|
|
Revision tags: llvmorg-16-init |
|
| #
d61d72da |
| 25-Jul-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Remove noinline attributes in the device runtime
We previously used the `noinline` attributes to specify some definitions which should be kept alive in the runtime. These were then stripped immediately in the OpenMPOpt module pass. However, since the changes in D130298, we now explicitly state which functions will have external visibility in the bitcode library. Additionally the OpenMPOpt module pass should run before the inliner pass, so this shouldn't make a difference in whether or not the functions will be alive for the initial pass of OpenMPOpt. This should simplify the interface, and additionally save time spent on scanning function names for noinline.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130368
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
b4f8443d |
| 09-May-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Allow the device runtime to be compiled for the host
Currently the OpenMP offloading device runtime is only expected to be compiled for the specific architecture it's targeting. This is problematic if we want to make compiling the device runtime more general via the standard `clang` driver rather than invoking the clang front-end directly. This patch addresses this by primarily changing the declare type to `nohost` so the host will not contain any of this code. Additionally we forward declare the functions that are defined via variants, otherwise these would cause problems on the host.
Reviewed By: jdoerfert, tianshilei1992
Differential Revision: https://reviews.llvm.org/D125260
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
f52927c1 |
| 01-Feb-2022 |
Jon Chesterfield <[email protected]> |
Revert "[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned"
This seems to be the root cause of hangs on amdgpu. Reverting while investigating. This reverts commit 7b9844cc8dd0045f5251450ba2980d6d6ac48ef9.
|
| #
7b9844cc |
| 26-Jan-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned
Due to num_threads (probably also other reasons) we cannot assume explicit barriers are always executed by all threads in an aligned fashion. We can optimize them if that property can be proven but that is different.
|
| #
619f44b0 |
| 28-Jan-2022 |
Ron Lieberman <[email protected]> |
Revert "[OpenMP] Ensure broken assumptions print once, not thousands of times."
This reverts commit 27c799ecc9e9e3bfb8232c93fd500f45ca0cb345.
|
| #
27c799ec |
| 27-Jan-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Ensure broken assumptions print once, not thousands of times.
If we have a broken assumption we want to print a message to the user. If the assumption is broken by many threads in many teams this can become a problem. To avoid it we use a hash that tracks if a broken assumption has (likely) been printed and avoid printing it again. This is not foolproof and has some caveats that might cause problems in the future (see comment) but it should improve the situation considerably for now.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D112156
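The print-once idea in this commit message can be illustrated with a minimal host-side C++ sketch. This is not the code from D112156: the `sketch` namespace, the table size, the FNV-1a hash, and the names `hashLocation`, `shouldPrint`, and `reportBrokenAssumption` are all hypothetical choices made for illustration. It also shows why such a scheme is only "likely" correct: two different source locations that hash to the same slot will suppress each other.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>

namespace sketch { // hypothetical namespace, not part of the DeviceRTL

constexpr uint32_t NumSlots = 64; // assumed table size

// Static storage duration, so every flag starts out false.
std::atomic<bool> Printed[NumSlots];

// FNV-1a over the file name, then mixed with the line number.
inline uint32_t hashLocation(const char *File, uint32_t Line) {
  assert(File && "file name must not be null");
  uint32_t H = 2166136261u;
  for (const char *C = File; *C; ++C) {
    H ^= static_cast<unsigned char>(*C);
    H *= 16777619u;
  }
  H ^= Line;
  H *= 16777619u;
  return H;
}

// True only for the first caller that claims the slot for this location.
// Distinct locations sharing a slot suppress each other (not foolproof).
inline bool shouldPrint(const char *File, uint32_t Line) {
  uint32_t Slot = hashLocation(File, Line) % NumSlots;
  return !Printed[Slot].exchange(true, std::memory_order_relaxed);
}

// Prints the message at most once per slot; returns whether it printed.
inline bool reportBrokenAssumption(const char *File, uint32_t Line,
                                   const char *Msg) {
  if (!shouldPrint(File, Line))
    return false;
  std::fprintf(stderr, "%s:%u: assumption violated: %s\n", File,
               static_cast<unsigned>(Line), Msg);
  return true;
}

} // namespace sketch
```

The atomic `exchange` lets many threads race on the same slot while exactly one of them observes the old value `false` and prints.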
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
73720c80 |
| 31-Oct-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Introduce and use a simple generic-mode barrier
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was OK to be used in the custom state machine. Now that SPMD barriers are assumed to be aligned we need to use a "generic" barrier in places that are not aligned.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112893
|
| #
74f91741 |
| 18-Oct-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Use function tracing RAII for runtime functions.
This patch adds support for using function tracing features to track the execution of runtime functions in the device runtime library. This is enabled by first compiling the new runtime with `-fopenmp-target-debug=3` and running with `LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and thread 0 so there isn't much output when using a generic region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112002
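As a rough illustration of the RAII tracing pattern this commit describes, the following host-side C++ sketch records a message when a function is entered and another when it returns. It is not the code from D112002: `FunctionTracer`, `traceLog`, and `tracedAdd` are hypothetical names, and the `Enabled` flag merely stands in for whatever check the runtime performs (debug level, team 0, thread 0).

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch { // hypothetical namespace, not part of the DeviceRTL

// Hypothetical log sink; a device runtime would print instead.
inline std::vector<std::string> &traceLog() {
  static std::vector<std::string> Log;
  return Log;
}

// Logs on construction (function entry) and destruction (function exit).
class FunctionTracer {
  const char *Name;
  bool Enabled;

public:
  FunctionTracer(const char *FnName, bool IsEnabled)
      : Name(FnName), Enabled(IsEnabled) {
    assert(Name && "function name must not be null");
    if (Enabled)
      traceLog().push_back(std::string("enter ") + Name);
  }
  ~FunctionTracer() {
    if (Enabled)
      traceLog().push_back(std::string("exit ") + Name);
  }
  FunctionTracer(const FunctionTracer &) = delete;
  FunctionTracer &operator=(const FunctionTracer &) = delete;
};

// Example of a traced function: one declaration covers every return path.
inline int tracedAdd(int A, int B, bool Trace) {
  FunctionTracer Tracer("tracedAdd", Trace);
  return A + B;
}

} // namespace sketch
```

Because the exit message is emitted from the destructor, early returns need no extra bookkeeping.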
|
| #
4d50803c |
| 28-Oct-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987. CI is showing a different set of pass/fails to local, committing this without the tests enabled by default while debugging that difference.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112227
|
| #
22bd75be |
| 28-Oct-2021 |
Jon Chesterfield <[email protected]> |
[openmp] Fix a git misfire in cf37a94c1e42ce
|
| #
6c7b203d |
| 28-Oct-2021 |
Jon Chesterfield <[email protected]> |
Revert "[libomptarget] Build DeviceRTL for amdgpu" - more tests failing on CI than failed locally when writing this patch
This reverts commit 33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2.
|
| #
cf37a94c |
| 27-Oct-2021 |
Jon Chesterfield <[email protected]> |
[openmp] Add amdgpu impl missed from D112153
|
| #
33427fdb |
| 27-Oct-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112227
|
| #
b16aadf0 |
| 20-Oct-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP] Introduce aligned synchronization into the new device RT
We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112153
|
| #
7272982e |
| 19-Oct-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget] Refactor DeviceRTL prior to AMDGPU bringup
Subset of D111993. Fix typos, rename read to load.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D111999
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
21d91a8e |
| 18-Aug-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget][devicertl] Replace lanemask with uint64 at interface
Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining.
Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependent typedef interface.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108317
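The zext/trunc pair mentioned in this commit message can be pictured with a small C++ sketch. It assumes a 32-lane (wave32) target behind a uniform 64-bit interface type. The names `LaneMaskTy`, `widenWave32`, and `toWave32` are hypothetical and do not come from D108317.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch { // hypothetical namespace, not part of the DeviceRTL

// One interface type for every architecture, regardless of wavefront size.
using LaneMaskTy = uint64_t;

// Zero-extend a 32-lane hardware mask to the interface type (the "zext").
inline LaneMaskTy widenWave32(uint32_t HardwareMask) {
  return static_cast<LaneMaskTy>(HardwareMask);
}

// Truncate an interface mask back to 32 lanes (the "trunc").
// On a wave32 target the upper 32 bits are expected to be zero.
inline uint32_t toWave32(LaneMaskTy Mask) {
  assert((Mask >> 32) == 0 && "wave32 mask must fit in 32 bits");
  return static_cast<uint32_t>(Mask);
}

} // namespace sketch
```

When both conversions are visible in the same function after inlining, a compiler can typically cancel the round trip, which is the folding the message refers to.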
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
67ab875f |
| 25-Jul-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP] Prototype opt-in new GPU device RTL
The "old" OpenMP GPU device runtime (D14254) has served us well for many years but modernizing it has caused some pain recently. This patch introduces an alternative which is mostly written from scratch embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. This new runtime is opt-in through a clang flag (D106793). The new runtime is currently only built for nvptx and has "-new" in its name.
The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old one, state is organized in a simple manner rather than a "smart" one. While this can induce costs it helps optimizations. Our expectation is that the majority of codes can be optimized and a "simple" design is therefore preferable. The new runtime also avoids making users pay for things they do not use, especially with respect to memory. The unlikely case of nested parallelism is supported but costly, so that the more likely case uses fewer resources.
The worksharing and reduction implementations have been taken from the old runtime and will be rewritten in the future if necessary.
Documentation and debug features are still mostly missing and will be added over time.
All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols.
Differential Revision: https://reviews.llvm.org/D106803
|