|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
| #
0b12f770 |
| 10-Aug-2022 |
Joseph Huber <[email protected]> |
[Libomptarget][CUDA] Check CUDA compatibilty correctly
We recently added support for multi-architecture binaries in libomptarget. This is done by extracting the architecture from the embedded image
[Libomptarget][CUDA] Check CUDA compatibilty correctly
We recently added support for multi-architecture binaries in libomptarget. This is done by extracting the architecture from the embedded image and comparing it with the major and minor version supported by the current CUDA installation. Previously we just compared these directly, which was not correct for binary compatibility. The CUDA documentation states that we can consider any image with an equivalent major or a greater or equal to minor compatible with the current image. Change the check to use this new logic in the CUDA plugin.
Fixes #57049
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D131567
(cherry picked from commit fdbb15355e7977b914cbd7e753b5e909d735ad83)
show more ...
|
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
cfa6e79d |
| 22-Jul-2022 |
Joel E. Denny <[email protected]> |
[Libomptarget] Don't report lack of CUDA devices
Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics about a lack of CUDA devices before an application runs:
``` $ clang -fopenmp -f
[Libomptarget] Don't report lack of CUDA devices
Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics about a lack of CUDA devices before an application runs:
``` $ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c $ ./a.out CUDA error: Error returned from cuInit CUDA error: no CUDA-capable device is detected Hello World: 4 ```
This can happen when the CUDA plugin was built but all CUDA devices are currently disabled in some manner, perhaps because `CUDA_VISIBLE_DEVICES` is set to the empty string. As shown in the above example, it can even happen when we haven't compiled the application for offloading to CUDA.
The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp` appears to be intended to handle this case, and it chooses not to write a diagnostic to stderr unless debugging is enabled:
``` if (NumberOfDevices == 0) { DP("There are no devices supporting CUDA.\n"); return; } ```
The problem is that the above code is never reached because the earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`. This patch handles that `cuInit` case in the same manner as the above code handles the `NumberOfDevices == 0` case.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130371
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
e01ce4e8 |
| 10-Jun-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Add checks for CUDA subarchitecture using new info
This patch extends the `is_valid_binary` routine to also check if the binary's architecture string matches the one parsed from the r
[Libomptarget] Add checks for CUDA subarchitecture using new info
This patch extends the `is_valid_binary` routine to also check if the binary's architecture string matches the one parsed from the runtime. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for CUDA.
Depends on D127432
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D127505
show more ...
|
| #
1f940b69 |
| 15-Jul-2022 |
Joseph Huber <[email protected]> |
[Libomptarget][NFC] Fix signed comparison warnings
Summary: Non-functional change, just fixing some sign comparison warnings by making both match.
|
| #
d27d0a67 |
| 01-Jul-2022 |
Joseph Huber <[email protected]> |
[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention
Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly la
[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention
Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly large clash between the naming conventions used. This patch fixes most of the variable names that did not confrom to the LLVM standard, that is `VariableName` for variables and `functionName` for functions.
This patch was primarily done using my editor's linting messages, if there are any issues I missed arising from the automation let me know.
Reviewed By: saiislam
Differential Revision: https://reviews.llvm.org/D128997
show more ...
|
| #
696bca9b |
| 01-Jul-2022 |
Shilei Tian <[email protected]> |
[NFC][OpenMP][CUDA] Remove unnecessary default label
|
| #
2695e23a |
| 28-Jun-2022 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work
This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In a
[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work
This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122764
show more ...
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
a3a42c3c |
| 09-Apr-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Ensure to set the context for wait events if necessary
Differential Revision: https://reviews.llvm.org/D123445
|
| #
ba93e4e3 |
| 25-Mar-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][NFC] Add missing virtual destructor to silence warning
|
| #
545fcc3d |
| 26-Mar-2022 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Fix potential program crash caused by double free resources
As we mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical res
[OpenMP][CUDA] Fix potential program crash caused by double free resources
As we mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical resources on the two sides of `Next` mark. It is usually not an issue, unless the following case: 1. Some resources are not returned. 2. We need to iterate the pool and free the element.
That will cause double free, which is the case for event pool. Since we don't release events hold by the data map, it can happen that the `Next` mark is not reset, and we have two identical items in the pool. When the pool is destroyed, we will call `cuEventDestroy` twice on the same event. In the best case, we can only observe CUDA errors. In the worst case, it can cause internal failures in CUDART and further crash.
This patch fixes the issue by tracking all resources that have been given using an `unordered_set`. We don't remove it when a resource is returned. When the pool is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make sure that the set contains all resources allocated from the device. We just need to iterate the set and free the resource accordingly.
For now, only event pool is set to use it. Stream pool is not because we can make sure all streams are returned when the plugin is destroyed.
Someone might be wondering, why don't we release all events hold in the data map. That is because, plugins are determined to be destroyed *before* `libomptarget`. If we can somehow make the plugin outlast `libomptarget`, life will be much easier.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122014
show more ...
|
| #
6c2be885 |
| 25-Mar-2022 |
Johannes Doerfert <[email protected]> |
Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning"
This reverts commit b9fd8f34ae547674ac0b5f5fbc5bb66d2bc0fedb as it accidentally contained a unit test change that is not fini
Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning"
This reverts commit b9fd8f34ae547674ac0b5f5fbc5bb66d2bc0fedb as it accidentally contained a unit test change that is not finished (and unrelated).
show more ...
|
| #
b9fd8f34 |
| 25-Mar-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][NFC] Add missing virtual destructor to silence warning
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
f6639a42 |
| 09-Mar-2022 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Fix the check of `setContext`
|
| #
39d3283a |
| 09-Mar-2022 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly
Currently we set ccontext everywhere accordingly, but that causes many unnecessary function calls. For example, in the resource pool, if we
[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly
Currently we set ccontext everywhere accordingly, but that causes many unnecessary function calls. For example, in the resource pool, if we need to resize the pool, we need to get from allocator. Each call to allocate sets the current context once, which is unnecessary. In this patch, we set the context only in the entry interface functions, if needed. Actually in the best way this should be implemented via RAII, but since `cuCtxSetCurrent` could return error, and we don't use exception, we can't stop the execution if RAII fails.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121322
show more ...
|
| #
5105c7cd |
| 09-Mar-2022 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten
This patch fixes the issue introduced in 14de0820e87f and D120089, that if dynamic libraries are used, the `CUmodule` ar
[OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten
This patch fixes the issue introduced in 14de0820e87f and D120089, that if dynamic libraries are used, the `CUmodule` array could be overwritten.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121308
show more ...
|
| #
14de0820 |
| 09-Mar-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][FIX] Ensure the modules vector is filled as others are
The modules vector was for some reason special which could lead to it not being of the same size (=num devices). Easiest solution is t
[OpenMP][FIX] Ensure the modules vector is filled as others are
The modules vector was for some reason special which could lead to it not being of the same size (=num devices). Easiest solution is to treat it like we do all the other vectors.
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
1660288b |
| 18-Feb-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP][CUDA] Use one event pool per device
An event pool, similar to the stream pool, needs to be kept per device. For one, events are associated with cuda contexts which means we cannot destroy t
[OpenMP][CUDA] Use one event pool per device
An event pool, similar to the stream pool, needs to be kept per device. For one, events are associated with cuda contexts which means we cannot destroy the former after the latter. Also, CUDA documentation states streams and events need to be associated with the same context, which we did not ensure at all.
Differential Revision: https://reviews.llvm.org/D120142
show more ...
|
| #
10aa83ff |
| 17-Feb-2022 |
Johannes Doerfert <[email protected]> |
[OpenMP] Allow to explicitly deinitialize device resources
There are two problems this patch tries to address: 1) We currently free resources in a random order wrt. plugin and libomptarget destru
[OpenMP] Allow to explicitly deinitialize device resources
There are two problems this patch tries to address: 1) We currently free resources in a random order wrt. plugin and libomptarget destruction. This patch should ensure the CUDA plugin is less fragile if something during the deinitialization goes wrong. 2) We need to support (hard) pause runtime calls eventually. This patch allows us to free all associated resources, though we cannot reinitialize the device yet.
Follow up patch will associate one event pool per device/context.
Differential Revision: https://reviews.llvm.org/D120089
show more ...
|
| #
aca33b0b |
| 10-Feb-2022 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Remove the hard team limit
Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as
[OpenMP][CUDA] Remove the hard team limit
Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as it is larger than that hard limit, the final number to launch the kernel will always be that hard limit. It is way less than the actual hardware limit. For example, my workstation has GTX2080, and the hardware limit of grid size is 2147483647, which is exactly the largest number a `int32_t` can represent. There is no limitation mentioned in the spec. This patch simply removes it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D119313
show more ...
|
| #
f6685f77 |
| 10-Feb-2022 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Refine the logic to determine grid size
This patch refines the logic to determine grid size as previous method can escape the check of whether `CudaBlocksPerGrid` could be greater tha
[OpenMP][CUDA] Refine the logic to determine grid size
This patch refines the logic to determine grid size as previous method can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual hardware limit.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D119311
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
f44e41af |
| 27-Jan-2022 |
Sri Hari Krishna Narayanan <[email protected]> |
Runtime for Interop directive
This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works onl
Runtime for Interop directive
This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works only for GPUs and has several TODOs that should be addressed going forward.
Reviewed By: RaviNarayanaswamy
Differential Revision: https://reviews.llvm.org/D106674
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
943d1d83 |
| 28-Dec-2021 |
Shilei Tian <[email protected]> |
[OpenMP][CUDA] Add resource pool for CUevent
Following D111954, this patch adds the resource pool for CUevent.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D116315
|
| #
357c8031 |
| 28-Dec-2021 |
Shilei Tian <[email protected]> |
[OpenMP][Plugin] Minor adjustments to ResourcePool
This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0 which can avoid assertion. - Add a new i
[OpenMP][Plugin] Minor adjustments to ResourcePool
This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0 which can avoid assertion. - Add a new interface function `clear` to release all hold resources. - If initial size is 0, resize to 1 when the first request is encountered.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D116340
show more ...
|
| #
a697a0a4 |
| 27-Dec-2021 |
Shilei Tian <[email protected]> |
[OpenMP][Plugin] Introduce generic resource pool
Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now we have the need that some resources, such as CUDA stream and event,
[OpenMP][Plugin] Introduce generic resource pool
Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now we have the need that some resources, such as CUDA stream and event, will be hold by `libomptarget`. It is always good to buffer those resources. What's more important, given the way that `libomptarget` and plugins are connected, we cannot make sure whether plugins are still alive when `libomptarget` is destroyed. That leads to an issue that those resouces hold by `libomptarget` might not be released correctly. As a result, we need an unified management of all the resources that can be shared between `libomptarget` and plugins.
`ResourcePoolTy` is designed to manage the type of resource for one device. It has to work with an allocator which is supposed to provide `create` and `destroy`. In this way, when the plugin is destroyed, we can make sure that all resources allocated from native runtime library will be released correctly, no matter whether `libomptarget` starts its destroy.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D111954
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
b1ce4549 |
| 19-Oct-2021 |
Joseph Huber <[email protected]> |
[OpenMP] Remove macro guards for device debugging
The plugin currently uses a macro to check if this is a debug built before assigning the debug kind variable to the device environment struct. This
[OpenMP] Remove macro guards for device debugging
The plugin currently uses a macro to check if this is a debug built before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds and should always be availible.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112083
show more ...
|