History log of /llvm-project-15.0.7/openmp/libomptarget/DeviceRTL/src/State.cpp (Results 1 – 25 of 28)
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1
# fd8fd9e5 27-Jul-2022 Joseph Huber <[email protected]>

Revert "[OpenMP] Remove noinline attributes in the device runtime"

The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regressions in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but
will try to fix it more correctly on main.

This reverts commit d61d72dae604c3258e25c00622b1a85861450303.

Fixes #56752

(cherry picked from commit b08369f7f288b6efb0897953da42ed54e60cfc0b)



Revision tags: llvmorg-16-init
# d61d72da 25-Jul-2022 Joseph Huber <[email protected]>

[OpenMP] Remove noinline attributes in the device runtime

We previously used the `noinline` attributes to specify some definitions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, since the changes in
D130298, we now explicitly state which functions will have external
visibility in the bitcode library. Additionally, the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spent on scanning function names for noinline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130368



# 1da6ae4b 22-Jul-2022 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Ensure thread and team state are defined properly

The namespaces were missing, causing the symbols to have "C" mangling.
To avoid this in the future, we now fully qualify the names.


# d1501526 18-Jul-2022 Johannes Doerfert <[email protected]>

[OpenMP] Introduce more fine-grained control over the thread state use

We can help optimizations by making sure we use the team state whenever
it is clear there is no thread state. To this end we introduce a new
state flag (`state::HasThreadState`) and explicit control for the
`state::ValueRAII` helpers, including a dedicated "assert equal".
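
To make the idea of an explicitly controlled RAII value helper with an "assert equal" check concrete, here is a minimal host-side sketch. It is an assumption for illustration only: `SketchValueRAII` and its parameters are hypothetical and simplified, and do not reproduce the runtime's actual `state::ValueRAII` interface.

```cpp
#include <cassert>

// Hypothetical, simplified RAII helper: when active, it checks the current
// value against the expected old value, installs a new value, and restores
// the old value on scope exit. When inactive, it does nothing at all.
template <typename T> class SketchValueRAII {
  T *Ptr;
  T Old;
  bool Active;

public:
  SketchValueRAII(T &Ref, T NewValue, T ExpectedOld, bool IsActive)
      : Ptr(&Ref), Old(ExpectedOld), Active(IsActive) {
    if (!Active)
      return;
    // The "assert equal" step: the value must match what the caller expects.
    assert(Ref == ExpectedOld && "unexpected previous value");
    Ref = NewValue;
  }

  ~SketchValueRAII() {
    if (Active)
      *Ptr = Old;
  }

  // Copying would restore the value twice, so it is disabled.
  SketchValueRAII(const SketchValueRAII &) = delete;
  SketchValueRAII &operator=(const SketchValueRAII &) = delete;
};
```

Passing the active flag explicitly lets the caller skip the update entirely when it already knows no per-thread state is involved, which is the kind of control the message describes.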

Differential Revision: https://reviews.llvm.org/D130113



# a42361dc 19-Jul-2022 Johannes Doerfert <[email protected]>

[OpenMP] Expose the state in the header to allow non-lto optimizations

We used to inline the `lookup` calls such that the runtime had "known"
access offsets when it was shipped. With the new static library build it
doesn't as the lookup is an indirection we cannot look through. This
should help us optimize the code better until we can do LTO for the
runtime again.

Differential Revision: https://reviews.llvm.org/D130111



Revision tags: llvmorg-14.0.6
# 616dd9ae 22-Jun-2022 Jose M Monsalve Diaz <[email protected]>

[OpenMP] Implementing omp_get_device_num()

This patch implements omp_get_device_num() in the host and the device.

On the device, it uses the existing getDeviceNum from the device config.
On the host, it uses omp_get_num_devices().

Two simple tests are added.

Differential Revision: https://reviews.llvm.org/D128347



Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# 20ec4161 20-May-2022 Joseph Huber <[email protected]>

[Libomptarget] Add branch prediction intrinsic to state check

Summary:
We usually use the `OMP_LIKELY` and `OMP_UNLIKELY` macros to add branch
prediction intrinsics to help the optimizer ignore unlikely loops. This
wasn't applied to this one loop, so add that in.
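
As a rough illustration of the technique, branch-prediction macros of this kind are commonly built on the GCC/Clang `__builtin_expect` builtin. The sketch below rests on that assumption: `SKETCH_LIKELY`, `SKETCH_UNLIKELY`, and `findFirstNull` are hypothetical names and are not taken from the runtime.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for likely/unlikely macros; requires GCC or Clang.
#define SKETCH_LIKELY(EXPR) __builtin_expect(static_cast<bool>(EXPR), true)
#define SKETCH_UNLIKELY(EXPR) __builtin_expect(static_cast<bool>(EXPR), false)

// Returns the index of the first null entry, or Size if there is none.
// The hint marks "entry is null" as the rare case, so the optimizer can
// lay out the loop for the common path where every entry is valid.
inline std::size_t findFirstNull(void *const *Entries, std::size_t Size) {
  assert((Entries != nullptr || Size == 0) && "null array with nonzero size");
  for (std::size_t I = 0; I < Size; ++I)
    if (SKETCH_UNLIKELY(Entries[I] == nullptr))
      return I;
  return Size;
}
```

The hint changes only code layout and optimization decisions, not the result of the condition.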



# ce0caf41 10-May-2022 Joseph Huber <[email protected]>

[Libomptarget] Address existing warnings in the device runtime library

This patch attempts to address the current warnings in the OpenMP
offloading device runtime. Previously we did not see these because we
compiled the runtime without the standard warning flags enabled.
However, these warnings are used when we now build the static library
version of this runtime. This became extremely noisy when coupled with
the fact that we compile each file roughly 32 times when all the
architectures are considered. So it would be ideal to not have all these
warnings show up when building.

Most of these warnings were simply implicit switch-case fallthroughs,
which can be addressed using C++17's fallthrough attribute. Additionally
there was a volatile variable that was being cast away. This is most
likely safe to remove because we cast it away before it's even used and
it didn't seem to affect anything in testing.
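
For readers unfamiliar with the attribute, the following minimal sketch shows how C++17's `[[fallthrough]]` marks an intentional fallthrough so that implicit-fallthrough warnings are not raised. The function `cleanupStepsFrom` and its stages are hypothetical and unrelated to the runtime's actual switch statements.

```cpp
#include <cassert>

// Hypothetical example: counts how many cleanup steps run when starting at a
// given stage. Each stage intentionally continues into the next one.
inline int cleanupStepsFrom(int Stage) {
  assert(Stage >= 0 && "stage must be non-negative");
  int Steps = 0;
  switch (Stage) {
  case 0:
    ++Steps;
    [[fallthrough]]; // Intentional: stage 0 also performs the stage 1 work.
  case 1:
    ++Steps;
    [[fallthrough]]; // Intentional: stage 1 also performs the stage 2 work.
  case 2:
    ++Steps;
    break;
  default:
    break; // Unknown stages perform no cleanup.
  }
  return Steps;
}
```

The attribute must appear as its own statement immediately before the next `case` label.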

Depends on D125260

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D125339



# b4f8443d 09-May-2022 Joseph Huber <[email protected]>

[Libomptarget] Allow the device runtime to be compiled for the host

Currently the OpenMP offloading device runtime is only expected to be
compiled for the specific architecture it's targeting. This is
problematic if we want to make compiling the device runtime more general
via the standard `clang` driver rather than invoking the clang front-end
directly. This patch addresses this by primarily changing the declare
type to `nohost` so the host will not contain any of this code.
Additionally we forward declare the functions that are defined via
variants, otherwise these would cause problems on the host.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D125260



Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# a3f423cf 06-Apr-2022 Joseph Huber <[email protected]>

[OpenMP] Add dynamic memory function to omp.h and add documentation

This patch adds the `llvm_omp_target_dynamic_shared_alloc` function to
the `omp.h` header file so users can access it by default. Also changed
the name to keep it consistent with the other target allocators. Added
some documentation so users know how to use it. Didn't add the interface
for Fortran since there's no way to test it right now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D123246



Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# 5dd0c396 23-Feb-2022 Joseph Huber <[email protected]>

[Libomptarget][NFC] Fix missing newline in error message


# 0870a4f5 18-Feb-2022 Joseph Huber <[email protected]>

[OpenMP] Add flag for disabling thread state in runtime

The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it
is not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120106



# 57b4c526 14-Feb-2022 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Eliminate race on the IsSPMD global

The `IsSPMD` global can only be read by threads other than the main
thread *after* initialization is complete. To allow usage of
`mapping::getBlockSize` before initialization is done, we can pass the
`IsSPMD` state explicitly. This is similar to other APIs that take
`IsSPMD` explicitly to avoid such a race, e.g.,
`mapping::isInitialThreadInLevel0(IsSPMD)`

Fixes https://github.com/llvm/llvm-project/issues/53857



Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# fd5853da 31-Jan-2022 Joseph Huber <[email protected]>

[Libomptarget] Reduce shared memory stack size to 512 and a message when it is exceeded

Reduces the shared memory size used for globalization to 512 bytes from
2048 to reduce the pressure on shared memory. This patch also adds a
debug message to indicate when the shared memory was insufficient.
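
The behavior described here can be modeled on the host as a small fixed-size stack that reports when a request does not fit and then falls back to the heap. This is a sketch under stated assumptions, not the runtime's implementation: `SketchStack`, its members, the message wording, and the `malloc` fallback are hypothetical; only the 512-byte capacity comes from the message above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical host-side model of a fixed-size stack with a heap fallback.
// No alignment rounding is performed in this sketch.
struct SketchStack {
  static constexpr std::size_t Capacity = 512; // Size named in the message.
  alignas(16) unsigned char Buffer[Capacity];
  std::size_t Used = 0;
  std::size_t Fallbacks = 0;

  // Returns stack storage when the request fits; otherwise prints a debug
  // message and serves the request from the heap.
  void *push(std::size_t Bytes) {
    if (Bytes <= Capacity - Used) {
      void *Ptr = Buffer + Used;
      Used += Bytes;
      return Ptr;
    }
    ++Fallbacks;
    std::fprintf(stderr, "sketch: stack exhausted, using heap for %zu bytes\n",
                 Bytes);
    return std::malloc(Bytes);
  }

  // True if Ptr points into the fixed buffer rather than the heap.
  bool owns(const void *Ptr) const {
    auto Addr = reinterpret_cast<std::uintptr_t>(Ptr);
    auto Begin = reinterpret_cast<std::uintptr_t>(&Buffer[0]);
    return Addr >= Begin && Addr < Begin + Capacity;
  }

  // Releases an allocation; stack allocations must be popped in reverse order.
  void pop(void *Ptr, std::size_t Bytes) {
    if (owns(Ptr))
      Used -= Bytes;
    else
      std::free(Ptr);
  }
};
```

A smaller capacity means the fallback path is taken sooner, which is why a message on exhaustion is useful when diagnosing performance.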

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118625



# 1e121568 27-Jan-2022 Johannes Doerfert <[email protected]>

[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions

IdentTy objects are useful for debugging and profiling so we want to
keep them around in more places, especially those that have a large
impact on performance, e.g., everything related to state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112494



Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 7cdaa5a9 17-Dec-2021 Joseph Huber <[email protected]>

[OpenMP][FIX] Change globalization alignment to 16

This patch changes the default alignment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increased. This reduces the amount of space available in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.
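
To see why a larger default alignment reduces the usable space, consider rounding every request up to the alignment. The sketch below is illustrative only: `alignUp` and `stackBytesUsed` are hypothetical helpers, and the request sizes are made-up values rather than measurements from the runtime.

```cpp
#include <cassert>
#include <cstddef>

// Rounds Bytes up to the next multiple of Alignment (must be a power of two).
constexpr std::size_t alignUp(std::size_t Bytes, std::size_t Alignment) {
  return (Bytes + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical helper: total bytes consumed when each request is rounded up
// to the given alignment before being placed on the stack.
inline std::size_t stackBytesUsed(const std::size_t *Requests,
                                  std::size_t Count, std::size_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  std::size_t Used = 0;
  for (std::size_t I = 0; I < Count; ++I)
    Used += alignUp(Requests[I], Alignment);
  return Used;
}
```

For requests of 4, 12, and 20 bytes, rounding to 8 consumes 48 bytes, while rounding to 16 consumes 64 bytes, so the same stack holds fewer allocations at the larger alignment.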

Depends on D115888

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115971



Revision tags: llvmorg-13.0.1-rc1
# 374cd0fb 16-Nov-2021 Joseph Huber <[email protected]>

[OpenMP] Fix initializer not working on AMDGPU

The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D113963



# ccb5d272 30-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Avoid a race between initialization and first state reads

When we pick state 0 to initialize state but thread N is going to be the
"main thread", in generic mode, we would require extra synchronization.
Instead, we should pick the main thread to initialize state in generic
mode and any thread in SPMD mode.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112874



# 6dd791bc 18-Oct-2021 Joseph Huber <[email protected]>

[OpenMP] Check output of malloc in the device for debug

A common problem is the device running out of global heap memory and
crashing due to a nullptr dereference when using the data sharing stack.
This explicitly checks that a nullptr was not returned by malloc when
debugging field 1 is enabled.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112005



# 74f91741 18-Oct-2021 Joseph Huber <[email protected]>

[OpenMP] Use function tracing RAII for runtime functions.

This patch adds support for using function tracing features to track the
execution of runtime functions in the device runtime library. This is
enabled by first compiling the new runtime with
`-fopenmp-target-debug=3` and running with
`LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and
thread 0 so there isn't much output when using a generic region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112002



# b16aadf0 20-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP] Introduce aligned synchronization into the new device RT

We will later use the fact that a barrier is aligned to reason about
thread divergence. For now we introduce the assumption and some more
documentation.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112153



# 4c88341d 16-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Do check the level before return team size

The team size could/should be an ICV but since we know it is either 1 or
a value we can leave it in the team state for now. However, we still
need to determine if the current level is nested before we use it.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D111949



# dc729609 16-Oct-2021 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Do not dereference a potential nullptr

The first thread state in the new GPU runtime doesn't have a previous
one and we should not dereference the nullptr placeholder.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111946



# 208f9005 01-Oct-2021 Joseph Huber <[email protected]>

[Libomptarget] Add an external interface to dynamic shared memory

This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
`llvm_omp_get_dynamic_shared`. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110957



Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# f1c821fa 17-Sep-2021 Joseph Huber <[email protected]>

[OpenMP] Add support for dynamic shared memory in new RTL

This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006


