History log of /llvm-project-15.0.7/openmp/libomptarget/DeviceRTL/src/Parallelism.cpp (Results 1 – 18 of 18)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1
# fd8fd9e5 27-Jul-2022 Joseph Huber <[email protected]>

Revert "[OpenMP] Remove noinline attributes in the device runtime"

The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that

Revert "[OpenMP] Remove noinline attributes in the device runtime"

The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regression in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but
will try to fix it more correctly on main.

This reverts commit d61d72dae604c3258e25c00622b1a85861450303.

Fixes #56752

(cherry picked from commit b08369f7f288b6efb0897953da42ed54e60cfc0b)

show more ...


Revision tags: llvmorg-16-init
# d61d72da 25-Jul-2022 Joseph Huber <[email protected]>

[OpenMP] Remove noinline attributes in the device runtime

We previously used the `noinline` attributes to specify some defintions
which should be kept alive in the runtime. These were then stripped

[OpenMP] Remove noinline attributes in the device runtime

We previously used the `noinline` attributes to specify some defintions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, Since the changes in
D130298, we not explicitly state which functions will have external
visiblity in the bitcode library. Additionally the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spend on scanning funciton names for noinline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130368

show more ...


# d1501526 18-Jul-2022 Johannes Doerfert <[email protected]>

[OpenMP] Introduce more fine-grained control over the thread state use

We can help optimizations by making sure we use the team state whenever
it is clear there is no thread state. To this end we in

[OpenMP] Introduce more fine-grained control over the thread state use

We can help optimizations by making sure we use the team state whenever
it is clear there is no thread state. To this end we introduce a new
state flag (`state::HasThreadState`) and explicit control for the
`state::ValueRAII` helpers, including a dedicated "assert equal".

Differential Revision: https://reviews.llvm.org/D130113

show more ...


Revision tags: llvmorg-14.0.6
# 3351ae61 22-Jun-2022 Joseph Huber <[email protected]>

[Libomptarget] Remove duplicate data environment exit

Summary:
This patch removes a duplicated exit from the OpenMP data envrionment.
We already have an RAII method that guards this environment so i

[Libomptarget] Remove duplicate data environment exit

Summary:
This patch removes a duplicated exit from the OpenMP data envrionment.
We already have an RAII method that guards this environment so it is
unnecessary.

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# ce0caf41 10-May-2022 Joseph Huber <[email protected]>

[Libomptarget] Address existing warnings in the device runtime library

This patche attemps to address the current warnings in the OpenMP
offloading device runtime. Previously we did not see these be

[Libomptarget] Address existing warnings in the device runtime library

This patche attemps to address the current warnings in the OpenMP
offloading device runtime. Previously we did not see these because we
compiled the runtime without the standard warning flags enabled.
However, these warnings are used when we now build the static library
version of this runtime. This became extremely noisy when coupled with
the fact the we compile each file roughly 32 times when all the
architectures are considered. So it would be ideal to not have all these
warnings show up when building.

Most of these errors were simply implicit switch-case fallthroughs,
which can be addressed using C++17's fallthrough attribute. Additionally
there was a volatile variable that was being casted away. This is most
likely safe to remove because we cast it away before its even used and
didn't seem to affect anything in testing.

Depends on D125260

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D125339

show more ...


# b4f8443d 09-May-2022 Joseph Huber <[email protected]>

[Libomptarget] Allow the device runtime to be compiled for the host

Currently the OpenMP offloading device runtime is only expected to be
compiled for the specific architecture it's targeting. This

[Libomptarget] Allow the device runtime to be compiled for the host

Currently the OpenMP offloading device runtime is only expected to be
compiled for the specific architecture it's targeting. This is
problematic if we want to make compiling the device runtime more general
via the standar `clang` driver rather than invoking the clang front-end
directly. This patch addresses this by primarily changing the declare
type to `nohost` so the host will not contain any of this code.
Additionally we forward declare the functions that are defined via
variants, otherwise these would cause problems on the host.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D125260

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# a619072c 21-Mar-2022 Joseph Huber <[email protected]>

[OpenMP] Manually unroll the argument copy loop

The unroll pragma did not properly work as the loop bound was not known
when we optimize the runtime and we then added a "unroll disable"
metadata whi

[OpenMP] Manually unroll the argument copy loop

The unroll pragma did not properly work as the loop bound was not known
when we optimize the runtime and we then added a "unroll disable"
metadata which prevented unrolling later when the bounds were known.
For now we manually unroll to make sure up to 16 elements are handled
nicely. This helps optimizations to look through the argument passing.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109164

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init
# 1e121568 27-Jan-2022 Johannes Doerfert <[email protected]>

[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions

IdentTy objects are useful for debugging and profiling so we want to
keep them around in more places, especially those that have

[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions

IdentTy objects are useful for debugging and profiling so we want to
keep them around in more places, especially those that have a large
impact on performance, e.g., everything related to state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112494

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# bc9c4d72 09-Dec-2021 Joseph Huber <[email protected]>

[OpenMP][FIX] Pass the num_threads value directly to parallel_51

The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. Th

[OpenMP][FIX] Pass the num_threads value directly to parallel_51

The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.

In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D113623

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 025f5492 30-Oct-2021 Shilei Tian <[email protected]>

[OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3

The synchronization at the end of parallel region cannot make sure all threads
exit the scope. As a result, the assertions right after it m

[OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3

The synchronization at the end of parallel region cannot make sure all threads
exit the scope. As a result, the assertions right after it might be hit, and
further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may
not hold as well. We either add a synchronization right after the parallel region,
or remove the assertions and assuptions. Here we choose the first one as those
assertions and assumptions can help optimizations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112861

show more ...


# 74f91741 18-Oct-2021 Joseph Huber <[email protected]>

[OpenMP] Use function tracing RAII for runtime functions.

This patch adds support for using function tracing features to track the
executino of runtime functions in the device runtime library. This

[OpenMP] Use function tracing RAII for runtime functions.

This patch adds support for using function tracing features to track the
executino of runtime functions in the device runtime library. This is
enabled by first compiling the new runtime with
`-fopenmp-target-debug=3` and running with
`LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and
thread 0 so there isn't much output when using a generic region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112002

show more ...


# 48877525 20-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP] Remove obsolete external interface for device RT

We do not generate _serialized_parallel calls in device mode, no
need for an external API.

Reviewed By: JonChesterfield

Differential Revis

[OpenMP] Remove obsolete external interface for device RT

We do not generate _serialized_parallel calls in device mode, no
need for an external API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112145

show more ...


# 5102c3c6 20-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Do not adjust the level after the environment was popped

Exiting a data environment will reset all values, it is wrong to adjust
them afterwards.

Reviewed By: tianshilei1992

Differen

[OpenMP][FIX] Do not adjust the level after the environment was popped

Exiting a data environment will reset all values, it is wrong to adjust
them afterwards.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112144

show more ...


# b16aadf0 20-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP] Introduce aligned synchronization into the new device RT

We will later use the fact that a barrier is aligned to reason about
thread divergence. For now we introduce the assumption and some

[OpenMP] Introduce aligned synchronization into the new device RT

We will later use the fact that a barrier is aligned to reason about
thread divergence. For now we introduce the assumption and some more
documentation.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112153

show more ...


# 44710940 08-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Data race in the SPMD execution of the new runtime

We need to synchronize the threads *before* we destroy the RAII objects
that hold the old values and not after to avoid threads execu

[OpenMP][FIX] Data race in the SPMD execution of the new runtime

We need to synchronize the threads *before* we destroy the RAII objects
that hold the old values and not after to avoid threads executing the
parallel region but seeing an inconsistent state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111369

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# d83ca624 23-Sep-2021 Joseph Huber <[email protected]>

[OpenMP] Fix data-race in new device RTL

This patch fixes a data-race observed when using the new device runtime
library. The Internal control variable for the parallel level is read in
the `__kmpc_

[OpenMP] Fix data-race in new device RTL

This patch fixes a data-race observed when using the new device runtime
library. The Internal control variable for the parallel level is read in
the `__kmpc_parallel_51` function while it could potentially be written
by other threads. This causes data corruption and will cause
nondetermistic behaviour in the runtime. This patch fixes this by adding
an explicit synchronization before the region starts.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110366

show more ...


Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# e3ee7624 27-Jul-2021 Joseph Huber <[email protected]>

[Libomptarget] Revert new variable sharing to use the old method

The new method of sharing variables introduces a `__kmpc_alloc_shared` call
that cannot be removed in the middle end because of its n

[Libomptarget] Revert new variable sharing to use the old method

The new method of sharing variables introduces a `__kmpc_alloc_shared` call
that cannot be removed in the middle end because of its non-constant argument
and unconnected free. This patch reverts this to the old method that used a
static amount of shared memory for sharing variables.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106905

show more ...


# 67ab875f 25-Jul-2021 Johannes Doerfert <[email protected]>

[OpenMP] Prototype opt-in new GPU device RTL

The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an

[OpenMP] Prototype opt-in new GPU device RTL

The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only build for nvptx and has "-new" in its
name.

The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime does also avoid users to pay for things they do not use,
especially wrt. memory. The unlikely case of nested parallelism is
supported but costly to make the more likely case use less resources.

The worksharing and reduction implementation have been taken from the
old runtime and will be rewritten in the future if necessary.

Documentation and debug features are still mostly missing and will be
added over time.

All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.

Differential Revision: https://reviews.llvm.org/D106803

show more ...