|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1 |
|
| #
fd8fd9e5 |
| 27-Jul-2022 |
Joseph Huber <[email protected]> |
Revert "[OpenMP] Remove noinline attributes in the device runtime"
The behaviour of this patch is not great, but it has some side-effects that are required for OpenMPOpt to work. The problem is that when we use `-mlink-builtin-bitcode` we only import used symbols from the runtime. Then OpenMPOpt will insert calls to symbols that were not previously included. This patch removed this implicit behaviour as these functions were kept alive by the `noinline` simply because it kept calls to them in the module. This caused regressions in some tests that relied on some OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but will try to fix it more correctly on main.
This reverts commit d61d72dae604c3258e25c00622b1a85861450303.
Fixes #56752
(cherry picked from commit b08369f7f288b6efb0897953da42ed54e60cfc0b)
|
|
Revision tags: llvmorg-16-init |
|
| #
d61d72da |
| 25-Jul-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Remove noinline attributes in the device runtime
We previously used the `noinline` attributes to specify some definitions which should be kept alive in the runtime. These were then stripped immediately in the OpenMPOpt module pass. However, since the changes in D130298, we now explicitly state which functions will have external visibility in the bitcode library. Additionally, the OpenMPOpt module pass should run before the inliner pass, so this shouldn't make a difference in whether or not the functions will be alive for the initial pass of OpenMPOpt. This should simplify the interface, and additionally save time spent on scanning function names for noinline.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130368
|
| #
1da6ae4b |
| 22-Jul-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Ensure thread and team state are defined properly
The namespaces were missing causing the symbols to have "C" mangling. To avoid this in the future we qualify the names now fully.
|
| #
d1501526 |
| 18-Jul-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP] Introduce more fine-grained control over the thread state use
We can help optimizations by making sure we use the team state whenever it is clear there is no thread state. To this end we introduce a new state flag (`state::HasThreadState`) and explicit control for the `state::ValueRAII` helpers, including a dedicated "assert equal".
Differential Revision: https://reviews.llvm.org/D130113
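The "explicit control" idea for an RAII value helper can be illustrated with a small, standalone sketch. This is not the runtime's actual `state::ValueRAII` signature; the class name `ScopedValue` and its `Active` parameter are hypothetical and only show how a caller might switch the save-and-restore behaviour off when it knows no override is needed.

```cpp
// Hypothetical sketch: a scoped override that can be disabled by the caller.
// Names (ScopedValue, Active) are illustrative, not taken from the runtime.
template <typename T> class ScopedValue {
public:
  // When Active is false the object does nothing, so the compiler can drop
  // the store and the restore entirely after inlining.
  ScopedValue(T &Ref, T NewValue, bool Active)
      : Ptr(Active ? &Ref : nullptr), Old(Ref) {
    if (Ptr)
      *Ptr = NewValue;
  }

  // Restore the previous value only if the override was applied.
  ~ScopedValue() {
    if (Ptr)
      *Ptr = Old;
  }

  ScopedValue(const ScopedValue &) = delete;
  ScopedValue &operator=(const ScopedValue &) = delete;

private:
  T *Ptr; // Null when the override is inactive.
  T Old;  // Value to restore on scope exit.
};
```

With `Active` set to `true` the referenced value is replaced for the lifetime of the guard and restored afterwards; with `false` the value is never touched.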
|
| #
a42361dc |
| 19-Jul-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP] Expose the state in the header to allow non-lto optimizations
We used to inline the `lookup` calls such that the runtime had "known" access offsets when it was shipped. With the new static library build it doesn't as the lookup is an indirection we cannot look through. This should help us optimize the code better until we can do LTO for the runtime again.
Differential Revision: https://reviews.llvm.org/D130111
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
616dd9ae |
| 22-Jun-2022 |
Jose M Monsalve Diaz <[email protected]> |
[OpenMP] Implementing omp_get_device_num()
This patch implements omp_get_device_num() in the host and the device.
It uses the already existing getDeviceNum in the device config for the device. And in the host it uses the omp_get_num_devices().
Two simple tests added
Differential Revision: https://reviews.llvm.org/D128347
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
20ec4161 |
| 20-May-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Add branch prediction intrinsic to state check
Summary: We usually used the `OMP_LIKELY` and `OMP_UNLIKELY` macros to add branch prediction intrinsics to help the optimizer ignore unlikely loops. This wasn't applied to this one loop so add that in.
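As a rough illustration of what such branch prediction macros typically look like, the sketch below wraps the `__builtin_expect` builtin available in Clang and GCC. The macro names `SKETCH_LIKELY` and `SKETCH_UNLIKELY` and the function `lookupValue` are hypothetical; the definitions of `OMP_LIKELY` and `OMP_UNLIKELY` in the runtime are assumed to follow the same idiom but are not reproduced here.

```cpp
// Hypothetical macro names modeled on the common __builtin_expect idiom.
#define SKETCH_LIKELY(Expr) __builtin_expect(static_cast<bool>(Expr), true)
#define SKETCH_UNLIKELY(Expr) __builtin_expect(static_cast<bool>(Expr), false)

// Returns the value from a rarely populated per-thread slot if one exists,
// otherwise the shared team value. The hint tells the optimizer that the
// per-thread path is the cold one.
inline int lookupValue(const int *ThreadSlot, int TeamValue) {
  if (SKETCH_UNLIKELY(ThreadSlot != nullptr))
    return *ThreadSlot;
  return TeamValue;
}
```

The hint does not change the result of the check, only how the compiler lays out and optimizes the two paths.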
|
| #
ce0caf41 |
| 10-May-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Address existing warnings in the device runtime library
This patch attempts to address the current warnings in the OpenMP offloading device runtime. Previously we did not see these because we compiled the runtime without the standard warning flags enabled. However, these warnings are used when we now build the static library version of this runtime. This became extremely noisy when coupled with the fact that we compile each file roughly 32 times when all the architectures are considered. So it would be ideal to not have all these warnings show up when building.
Most of these errors were simply implicit switch-case fallthroughs, which can be addressed using C++17's fallthrough attribute. Additionally there was a volatile variable that was being cast away. This is most likely safe to remove because we cast it away before it's even used and didn't seem to affect anything in testing.
Depends on D125260
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D125339
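For readers unfamiliar with the C++17 `[[fallthrough]]` attribute mentioned above, here is a minimal, self-contained sketch. The function `foldTail` is a made-up example and does not come from the runtime; it only shows how a deliberate fallthrough is annotated so that `-Wimplicit-fallthrough` stays quiet.

```cpp
#include <cstdint>

// Hypothetical example: fold the last 0-3 bytes of a buffer into one word,
// relying on deliberate fallthrough between the cases.
inline std::uint32_t foldTail(const unsigned char *Data, unsigned Remaining) {
  std::uint32_t Acc = 0;
  switch (Remaining) {
  case 3:
    Acc |= static_cast<std::uint32_t>(Data[2]) << 16;
    [[fallthrough]]; // Intentional: bytes 1 and 0 are handled below.
  case 2:
    Acc |= static_cast<std::uint32_t>(Data[1]) << 8;
    [[fallthrough]]; // Intentional: byte 0 is handled below.
  case 1:
    Acc |= static_cast<std::uint32_t>(Data[0]);
    break;
  default:
    break;
  }
  return Acc;
}
```

Without the attribute, each of the two annotated transitions would trigger an implicit-fallthrough warning when that warning is enabled.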
|
| #
b4f8443d |
| 09-May-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Allow the device runtime to be compiled for the host
Currently the OpenMP offloading device runtime is only expected to be compiled for the specific architecture it's targeting. This is problematic if we want to make compiling the device runtime more general via the standard `clang` driver rather than invoking the clang front-end directly. This patch addresses this by primarily changing the declare type to `nohost` so the host will not contain any of this code. Additionally we forward declare the functions that are defined via variants, otherwise these would cause problems on the host.
Reviewed By: jdoerfert, tianshilei1992
Differential Revision: https://reviews.llvm.org/D125260
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
a3f423cf |
| 06-Apr-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Add dynamic memory function to omp.h and add documentation
This patch adds the `llvm_omp_target_dynamic_shared_alloc` function to the `omp.h` header file so users can access it by default. Also changed the name to keep it consistent with the other target allocators. Added some documentation so users know how to use it. Didn't add the interface for Fortran since there's no way to test it right now.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D123246
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
5dd0c396 |
| 23-Feb-2022 |
Joseph Huber <[email protected]> |
[Libomptarget][NFC] Fix missing newline in error message
|
| #
0870a4f5 |
| 18-Feb-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Add flag for disabling thread state in runtime
The runtime uses thread state values to indicate when we use an ICV or are in nested parallelism. This is done for OpenMP correctness, but it is not needed in the majority of cases. The new flag added is `-fopenmp-assume-no-thread-state`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120106
|
| #
57b4c526 |
| 14-Feb-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Eliminate race on the IsSPMD global
The `IsSPMD` global can only be read by threads other than the main thread *after* initialization is complete. To allow usage of `mapping::getBlockSize` before initialization is done, we can pass the `IsSPMD` state explicitly. This is similar to other APIs that take `IsSPMD` explicitly to avoid such a race, e.g., `mapping::isInitialThreadInLevel0(IsSPMD)`
Fixes https://github.com/llvm/llvm-project/issues/53857
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
fd5853da |
| 31-Jan-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Reduce shared memory stack size to 512 and a message when it is exceeded
Reduces the shared memory size used for globalization to 512 bytes from 2048 to reduce the pressure on shared memory. This patch also adds a debug message to indicate when the shared memory was insufficient.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D118625
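The behaviour described here, a small fixed-size stack that falls back to the heap once it is exhausted, can be sketched on the host as follows. The class `SharedStackSketch` and its members are hypothetical and do not mirror the runtime's implementation; only the 512-byte capacity is taken from the commit message, and a counter stands in for the debug message.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical host-side model of a fixed-size stack with heap fallback.
class SharedStackSketch {
public:
  static constexpr std::size_t Capacity = 512; // Size named in the commit.

  // Serve the request from the fixed buffer if it fits, else from the heap.
  void *push(std::size_t Bytes) {
    if (Bytes <= Capacity - Top) {
      void *Ptr = &Buffer[Top];
      Top += Bytes;
      return Ptr;
    }
    ++FallbackCount; // A real runtime might emit a debug message here.
    return std::malloc(Bytes);
  }

  // Release in reverse order of allocation; heap blocks are freed directly.
  void pop(void *Ptr, std::size_t Bytes) {
    if (owns(Ptr))
      Top -= Bytes;
    else
      std::free(Ptr);
  }

  // True if Ptr points into the fixed buffer.
  bool owns(const void *Ptr) const {
    const unsigned char *P = static_cast<const unsigned char *>(Ptr);
    std::less<const unsigned char *> Less;
    return !Less(P, Buffer) && Less(P, Buffer + Capacity);
  }

  unsigned fallbacks() const { return FallbackCount; }

private:
  alignas(16) unsigned char Buffer[Capacity];
  std::size_t Top = 0;
  unsigned FallbackCount = 0;
};
```

A smaller buffer leaves more fast memory for user code at the cost of more frequent fallbacks, which is why reporting when the fallback is taken is useful for tuning.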
|
| #
1e121568 |
| 27-Jan-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions
IdentTy objects are useful for debugging and profiling so we want to keep them around in more places, especially those that have a large impact on performance, e.g., everything related to state.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112494
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
7cdaa5a9 |
| 17-Dec-2021 |
Joseph Huber <[email protected]> |
[OpenMP][FIX] Change globalization alignment to 16
This patch changes the default alignment from 8 to 16, and encodes this information in the `__kmpc_alloc_shared` runtime call to communicate it to the HeapToStack pass. The previous alignment of 8 was not sufficient for the maximum size of primitive types on 64-bit systems, and needs to be increased. This reduces the amount of space available in the data sharing stack, so this implementation will need to be improved later to include the alignment requirements in the allocation call, and use it properly in the data sharing stack in the runtime.
Depends on D115888
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D115971
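The space cost of the larger alignment can be made concrete with a little arithmetic. The helpers below, `alignUp` and `slotsThatFit`, are hypothetical names, and the 1024-byte stack size used in the usage note is an arbitrary illustrative figure rather than the runtime's actual configuration.

```cpp
#include <cstddef>

// Round Size up to the next multiple of Alignment (must be a power of two).
constexpr std::size_t alignUp(std::size_t Size, std::size_t Alignment) {
  return (Size + Alignment - 1) & ~(Alignment - 1);
}

// Number of same-sized allocations that fit into StackBytes when every
// allocation is padded to Alignment.
constexpr std::size_t slotsThatFit(std::size_t StackBytes, std::size_t Size,
                                   std::size_t Alignment) {
  return StackBytes / alignUp(Size, Alignment);
}
```

For example, with a 1024-byte stack and 4-byte allocations, `slotsThatFit(1024, 4, 8)` is 128 while `slotsThatFit(1024, 4, 16)` is 64, so small allocations consume twice as much stack once every slot is padded to 16 bytes.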
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
374cd0fb |
| 16-Nov-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Fix initializer not working on AMDGPU
The RAII class used for debugging RTL entry used a shared variable to keep track of the current depth. This used a global initializer, which isn't supported on AMDGPU. This patch removes the initializer and instead sets it to zero when the state is initialized in the runtime.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D113963
|
| #
ccb5d272 |
| 30-Oct-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Avoid a race between initialization and first state reads
When we pick state 0 to initialize state but thread N is going to be the "main thread", in generic mode, we would require extra synchronization. Instead, we should pick the main thread to initialize state in generic mode and any thread in SPMD mode.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112874
|
| #
6dd791bc |
| 18-Oct-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Check output of malloc in the device for debug
A common problem is the device running out of global heap memory and crashing due to a nullptr dereference when using the data sharing stack. This explicitly checks that a nullptr was not returned by malloc when debugging field 1 is enabled.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112005
|
| #
74f91741 |
| 18-Oct-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Use function tracing RAII for runtime functions.
This patch adds support for using function tracing features to track the execution of runtime functions in the device runtime library. This is enabled by first compiling the new runtime with `-fopenmp-target-debug=3` and running with `LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and thread 0 so there isn't much output when using a generic region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112002
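A function tracing RAII object of this kind can be sketched in plain C++ as shown below. The types `TraceLog` and `FunctionTrace` are hypothetical stand-ins; the sketch assumes a depth counter is used for indentation and records lines into a vector instead of printing from the device.

```cpp
#include <string>
#include <vector>

// Hypothetical log: current nesting depth plus the recorded trace lines.
struct TraceLog {
  unsigned Depth = 0;
  std::vector<std::string> Lines;
};

// Records an indented "enter" line on construction and a matching "exit"
// line on destruction, so nesting follows the call structure.
class FunctionTrace {
public:
  FunctionTrace(TraceLog &Log, const char *Name) : Out(Log), Name(Name) {
    Out.Lines.push_back(std::string(2 * Out.Depth, ' ') + "enter " + Name);
    ++Out.Depth;
  }

  ~FunctionTrace() {
    --Out.Depth;
    Out.Lines.push_back(std::string(2 * Out.Depth, ' ') + "exit " + Name);
  }

  FunctionTrace(const FunctionTrace &) = delete;
  FunctionTrace &operator=(const FunctionTrace &) = delete;

private:
  TraceLog &Out;
  const char *Name;
};

// Two nested example functions that place a tracer at their entry.
inline void inner(TraceLog &Log) { FunctionTrace Trace(Log, "inner"); }

inline void outer(TraceLog &Log) {
  FunctionTrace Trace(Log, "outer");
  inner(Log);
}
```

Calling `outer` on an empty log produces four lines, with the `inner` pair indented by two spaces, and leaves the depth back at zero.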
|
| #
b16aadf0 |
| 20-Oct-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP] Introduce aligned synchronization into the new device RT
We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112153
|
| #
4c88341d |
| 16-Oct-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Do check the level before return team size
The team size could/should be an ICV but since we know it is either 1 or a value we can leave it in the team state for now. However, we still need to determine if the current level is nested before we use it.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D111949
|
| #
dc729609 |
| 16-Oct-2021 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Do not dereference a potential nullptr
The first thread state in the new GPU runtime doesn't have a previous one and we should not dereference the nullptr placeholder.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D111946
|
| #
208f9005 |
| 01-Oct-2021 |
Joseph Huber <[email protected]> |
[Libomptarget] Add an external interface to dynamic shared memory
This patch adds an external interface to access the dynamic shared memory buffer in the device runtime. The function introduced is `llvm_omp_get_dynamic_shared`. This includes a host-side definition that only returns a null pointer so that it can be used when host-fallback is enabled without crashing. Support for dynamic shared memory was also ported to the old device runtime.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D110957
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
f1c821fa |
| 17-Sep-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new device runtime. The new function `__kmpc_get_dynamic_shared` will return a pointer to the buffer of dynamic shared memory. Currently the amount of memory allocated is set by an environment variable.
In the future this amount will be added to the amount used for the smart stack which will be configured in a similar way.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110006
|