History log of /llvm-project-15.0.7/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp (Results 1 – 18 of 18)
Revision Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1
# fd8fd9e5 27-Jul-2022 Joseph Huber <[email protected]>

Revert "[OpenMP] Remove noinline attributes in the device runtime"

The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regressions in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM 15 release, but
will try to fix it more correctly on main.

This reverts commit d61d72dae604c3258e25c00622b1a85861450303.

Fixes #56752

(cherry picked from commit b08369f7f288b6efb0897953da42ed54e60cfc0b)


Revision tags: llvmorg-16-init
# d61d72da 25-Jul-2022 Joseph Huber <[email protected]>

[OpenMP] Remove noinline attributes in the device runtime

We previously used the `noinline` attributes to specify some definitions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, since the changes in
D130298, we now explicitly state which functions will have external
visibility in the bitcode library. Additionally, the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spent on scanning function names for noinline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130368


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# b4f8443d 09-May-2022 Joseph Huber <[email protected]>

[Libomptarget] Allow the device runtime to be compiled for the host

Currently the OpenMP offloading device runtime is only expected to be
compiled for the specific architecture it's targeting. This is
problematic if we want to make compiling the device runtime more general
via the standard `clang` driver rather than invoking the clang front-end
directly. This patch addresses this by primarily changing the declare
type to `nohost` so the host will not contain any of this code.
Additionally we forward declare the functions that are defined via
variants, otherwise these would cause problems on the host.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D125260


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init
# f52927c1 01-Feb-2022 Jon Chesterfield <[email protected]>

Revert "[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned"

This seems to be the root cause of hangs on amdgpu. Reverting while investigating.
This reverts commit 7b9844cc8dd0045f5251450ba2980d6d6ac48ef9.


# 7b9844cc 26-Jan-2022 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned

Due to num_threads (probably also other reasons) we cannot assume
explicit barriers are always executed by all threads in an aligned
fashion. We can optimize them if that property can be proven but
that is different.


# 619f44b0 28-Jan-2022 Ron Lieberman <[email protected]>

Revert "[OpenMP] Ensure broken assumptions print once, not thousands of times."

This reverts commit 27c799ecc9e9e3bfb8232c93fd500f45ca0cb345.


# 27c799ec 27-Jan-2022 Joseph Huber <[email protected]>

[OpenMP] Ensure broken assumptions print once, not thousands of times.

If we have a broken assumption we want to print a message to the user.
If the assumption is broken by many threads in many teams this can
become a problem. To avoid it we use a hash that tracks if a broken
assumption has (likely) been printed and avoid printing it again. This
is not foolproof and has some caveats that might cause problems in
the future (see comment) but it should improve the situation
considerably for now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112156
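
The "print once" idea in the message above can be illustrated with a minimal host-side sketch. This is not the code from D112156: the names `hashLocation`, `shouldPrint`, `Printed`, and `NumSlots` are hypothetical, and the choice of an FNV-1a hash over a small flag table is an assumption made for illustration. It shows why such a scheme is only "likely" correct: two different locations that hash to the same slot suppress each other.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical table size; a real runtime would size this differently.
constexpr std::size_t NumSlots = 64;

// One flag per hash bucket, all initially false.
static std::atomic<bool> Printed[NumSlots] = {};

// FNV-1a hash over the file name, mixed with the line number.
inline std::uint32_t hashLocation(const char *File, std::uint32_t Line) {
  std::uint32_t H = 2166136261u;
  for (const char *P = File; *P; ++P) {
    H ^= static_cast<unsigned char>(*P);
    H *= 16777619u;
  }
  H ^= Line;
  H *= 16777619u;
  return H;
}

// Returns true only for the first caller that claims the slot for this
// location. Later callers, including callers from a different location
// that collides in the table, get false and should stay silent.
inline bool shouldPrint(const char *File, std::uint32_t Line) {
  std::size_t Slot = hashLocation(File, Line) % NumSlots;
  return !Printed[Slot].exchange(true, std::memory_order_relaxed);
}

} // namespace sketch
```

A caller would guard its message with something like `if (sketch::shouldPrint(__FILE__, __LINE__)) { /* print */ }`. The atomic exchange ensures that, among many threads hitting the same broken assumption, exactly one gets `true`.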


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 73720c80 31-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Introduce and use a simple generic-mode barrier

Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112893


# 74f91741 18-Oct-2021 Joseph Huber <[email protected]>

[OpenMP] Use function tracing RAII for runtime functions.

This patch adds support for using function tracing features to track the
execution of runtime functions in the device runtime library. This is
enabled by first compiling the new runtime with
`-fopenmp-target-debug=3` and running with
`LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and
thread 0 so there isn't much output when using a generic region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112002
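
As a rough illustration of the RAII tracing pattern named in the message above, the following host-side sketch logs on scope entry and exit. It is not the device runtime's implementation: `FunctionTracer`, `DebugLevel`, `FunctionTracingBit`, `TraceLog`, and `tracedAdd` are hypothetical names, the bit value is an assumption of this sketch, and the real runtime prints from the device (for team 0 and thread 0 only) rather than appending to a vector.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-ins for the runtime's debug configuration and output.
static unsigned DebugLevel = 0;
constexpr unsigned FunctionTracingBit = 1u << 1; // assumed bit, for illustration
static std::vector<std::string> TraceLog;

// RAII tracer: the constructor records entry, the destructor records exit,
// so every return path of the traced function is covered automatically.
class FunctionTracer {
  const char *Name;
  bool Active;

public:
  explicit FunctionTracer(const char *Name)
      : Name(Name), Active((DebugLevel & FunctionTracingBit) != 0) {
    if (Active)
      TraceLog.push_back(std::string("Enter: ") + Name);
  }
  ~FunctionTracer() {
    if (Active)
      TraceLog.push_back(std::string("Exit: ") + Name);
  }
  FunctionTracer(const FunctionTracer &) = delete;
  FunctionTracer &operator=(const FunctionTracer &) = delete;
};

// Example of a traced function: one local object at the top is enough.
inline int tracedAdd(int A, int B) {
  FunctionTracer Trace("tracedAdd");
  return A + B;
}

} // namespace sketch
```

With `sketch::DebugLevel` left at 0 the tracer does nothing beyond one branch per call. Setting it to 3 enables the assumed tracing bit, and each call to `tracedAdd` appends an entry and an exit record.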


# 4d50803c 28-Oct-2021 Jon Chesterfield <[email protected]>

[libomptarget] Build DeviceRTL for amdgpu

Passes same tests as the current deviceRTL. Includes cmake change from D111987.
CI is showing a different set of pass/fails to local, committing this
without the tests enabled by default while debugging that difference.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227


# 22bd75be 28-Oct-2021 Jon Chesterfield <[email protected]>

[openmp] Fix a git misfire in cf37a94c1e42ce


# 6c7b203d 28-Oct-2021 Jon Chesterfield <[email protected]>

Revert "[libomptarget] Build DeviceRTL for amdgpu"
- more tests failing on CI than failed locally when writing this patch

This reverts commit 33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2.


# cf37a94c 27-Oct-2021 Jon Chesterfield <[email protected]>

[openmp] Add amdgpu impl missed from D112153


# 33427fdb 27-Oct-2021 Jon Chesterfield <[email protected]>

[libomptarget] Build DeviceRTL for amdgpu

Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227


# b16aadf0 20-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP] Introduce aligned synchronization into the new device RT

We will later use the fact that a barrier is aligned to reason about
thread divergence. For now we introduce the assumption and some more
documentation.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112153


# 7272982e 19-Oct-2021 Jon Chesterfield <[email protected]>

[libomptarget] Refactor DeviceRTL prior to AMDGPU bringup

Subset of D111993. Fix typos, rename read to load.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111999


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 21d91a8e 18-Aug-2021 Jon Chesterfield <[email protected]>

[libomptarget][devicertl] Replace lanemask with uint64 at interface

Use uint64_t for lanemask on all GPU architectures at the interface
with clang. Updates tests. The deviceRTL is always linked as IR so the zext
and trunc introduced for wave32 architectures will fold after inlining.

Simplification partly motivated by amdgpu gfx10 which will be wave32 and
is awkward to express in the current arch-dependent typedef interface.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108317
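
The widening described in the message above can be sketched as follows. This is an illustrative assumption, not the runtime's code: `LaneMaskTy`, `nativeLowestLane32`, and `lowestLane` are hypothetical names, and the "native" operation is a stand-in for whatever 32-bit intrinsic a wave32 target would provide. The point is only that the interface always uses 64 bits, while a 32-lane implementation truncates on entry and zero-extends on return.

```cpp
#include <cstdint>

namespace sketch {

// Interface type: 64 bits on every architecture.
using LaneMaskTy = std::uint64_t;

// Hypothetical 32-lane native operation: isolate the lowest set lane bit.
inline std::uint32_t nativeLowestLane32(std::uint32_t Mask) {
  return Mask & (0u - Mask);
}

// Interface function for a 32-lane target. The casts correspond to the
// trunc and zext mentioned in the commit message; once the function is
// inlined into its caller, the optimizer can fold them away.
inline LaneMaskTy lowestLane(LaneMaskTy Mask) {
  std::uint32_t Narrow = static_cast<std::uint32_t>(Mask);     // trunc
  return static_cast<LaneMaskTy>(nativeLowestLane32(Narrow));  // zext
}

} // namespace sketch
```

On such a target the upper 32 bits of the mask carry no lanes, so `lowestLane(0xFFFFFFFF00000000)` yields 0 in this sketch.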


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init
# 67ab875f 25-Jul-2021 Johannes Doerfert <[email protected]>

[OpenMP] Prototype opt-in new GPU device RTL

The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only built for nvptx and has "-new" in its
name.

The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime also avoids making users pay for things they do not use,
especially with respect to memory. The unlikely case of nested parallelism is
supported, but it is made costly so that the more likely case uses fewer resources.

The worksharing and reduction implementation have been taken from the
old runtime and will be rewritten in the future if necessary.

Documentation and debug features are still mostly missing and will be
added over time.

All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.

Differential Revision: https://reviews.llvm.org/D106803
