rtl.cpp - OpenGrok history log for /llvm-project-15.0.7/openmp/libomptarget/plugins/cuda/src/rtl.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3
# 0b12f770	10-Aug-2022	Joseph Huber <[email protected]>	[Libomptarget][CUDA] Check CUDA compatibilty correctly We recently added support for multi-architecture binaries in libomptarget. This is done by extracting the architecture from the embedded image [Libomptarget][CUDA] Check CUDA compatibilty correctly We recently added support for multi-architecture binaries in libomptarget. This is done by extracting the architecture from the embedded image and comparing it with the major and minor version supported by the current CUDA installation. Previously we just compared these directly, which was not correct for binary compatibility. The CUDA documentation states that we can consider any image with an equivalent major or a greater or equal to minor compatible with the current image. Change the check to use this new logic in the CUDA plugin. Fixes #57049 Reviewed By: jdoerfert, ye-luo Differential Revision: https://reviews.llvm.org/D131567 (cherry picked from commit fdbb15355e7977b914cbd7e753b5e909d735ad83) show more ...
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# cfa6e79d	22-Jul-2022	Joel E. Denny <[email protected]>	[Libomptarget] Don't report lack of CUDA devices Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics about a lack of CUDA devices before an application runs: ``` $ clang -fopenmp -f [Libomptarget] Don't report lack of CUDA devices Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics about a lack of CUDA devices before an application runs: ``` $ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c $ ./a.out CUDA error: Error returned from cuInit CUDA error: no CUDA-capable device is detected Hello World: 4 ``` This can happen when the CUDA plugin was built but all CUDA devices are currently disabled in some manner, perhaps because `CUDA_VISIBLE_DEVICES` is set to the empty string. As shown in the above example, it can even happen when we haven't compiled the application for offloading to CUDA. The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp` appears to be intended to handle this case, and it chooses not to write a diagnostic to stderr unless debugging is enabled: ``` if (NumberOfDevices == 0) { DP("There are no devices supporting CUDA.\n"); return; } ``` The problem is that the above code is never reached because the earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`. This patch handles that `cuInit` case in the same manner as the above code handles the `NumberOfDevices == 0` case. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D130371 show more ...
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# e01ce4e8	10-Jun-2022	Joseph Huber <[email protected]>	[Libomptarget] Add checks for CUDA subarchitecture using new info This patch extends the `is_valid_binary` routine to also check if the binary's architecture string matches the one parsed from the r [Libomptarget] Add checks for CUDA subarchitecture using new info This patch extends the `is_valid_binary` routine to also check if the binary's architecture string matches the one parsed from the runtime. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for CUDA. Depends on D127432 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D127505 show more ...
# 1f940b69	15-Jul-2022	Joseph Huber <[email protected]>	[Libomptarget][NFC] Fix signed comparison warnings Summary: Non-functional change, just fixing some sign comparison warnings by making both match.
# d27d0a67	01-Jul-2022	Joseph Huber <[email protected]>	[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly la [Libomptarget][NFC] Make Libomptarget use the LLVM naming convention Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly large clash between the naming conventions used. This patch fixes most of the variable names that did not confrom to the LLVM standard, that is `VariableName` for variables and `functionName` for functions. This patch was primarily done using my editor's linting messages, if there are any issues I missed arising from the automation let me know. Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D128997 show more ...
# 696bca9b	01-Jul-2022	Shilei Tian <[email protected]>	[NFC][OpenMP][CUDA] Remove unnecessary default label
# 2695e23a	28-Jun-2022	Shilei Tian <[email protected]>	[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In a [OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122764 show more ...
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# a3a42c3c	09-Apr-2022	Johannes Doerfert <[email protected]>	[OpenMP][FIX] Ensure to set the context for wait events if necessary Differential Revision: https://reviews.llvm.org/D123445
# ba93e4e3	25-Mar-2022	Johannes Doerfert <[email protected]>	[OpenMP][NFC] Add missing virtual destructor to silence warning
# 545fcc3d	26-Mar-2022	Shilei Tian <[email protected]>	[OpenMP][CUDA] Fix potential program crash caused by double free resources As we mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical res [OpenMP][CUDA] Fix potential program crash caused by double free resources As we mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical resources on the two sides of `Next` mark. It is usually not an issue, unless the following case: 1. Some resources are not returned. 2. We need to iterate the pool and free the element. That will cause double free, which is the case for event pool. Since we don't release events hold by the data map, it can happen that the `Next` mark is not reset, and we have two identical items in the pool. When the pool is destroyed, we will call `cuEventDestroy` twice on the same event. In the best case, we can only observe CUDA errors. In the worst case, it can cause internal failures in CUDART and further crash. This patch fixes the issue by tracking all resources that have been given using an `unordered_set`. We don't remove it when a resource is returned. When the pool is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make sure that the set contains all resources allocated from the device. We just need to iterate the set and free the resource accordingly. For now, only event pool is set to use it. Stream pool is not because we can make sure all streams are returned when the plugin is destroyed. Someone might be wondering, why don't we release all events hold in the data map. That is because, plugins are determined to be destroyed before `libomptarget`. If we can somehow make the plugin outlast `libomptarget`, life will be much easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122014 show more ...
# 6c2be885	25-Mar-2022	Johannes Doerfert <[email protected]>	Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning" This reverts commit b9fd8f34ae547674ac0b5f5fbc5bb66d2bc0fedb as it accidentally contained a unit test change that is not fini Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning" This reverts commit b9fd8f34ae547674ac0b5f5fbc5bb66d2bc0fedb as it accidentally contained a unit test change that is not finished (and unrelated). show more ...
# b9fd8f34	25-Mar-2022	Johannes Doerfert <[email protected]>	[OpenMP][NFC] Add missing virtual destructor to silence warning
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# f6639a42	09-Mar-2022	Shilei Tian <[email protected]>	[OpenMP][CUDA] Fix the check of `setContext`
# 39d3283a	09-Mar-2022	Shilei Tian <[email protected]>	[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly Currently we set ccontext everywhere accordingly, but that causes many unnecessary function calls. For example, in the resource pool, if we [OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly Currently we set ccontext everywhere accordingly, but that causes many unnecessary function calls. For example, in the resource pool, if we need to resize the pool, we need to get from allocator. Each call to allocate sets the current context once, which is unnecessary. In this patch, we set the context only in the entry interface functions, if needed. Actually in the best way this should be implemented via RAII, but since `cuCtxSetCurrent` could return error, and we don't use exception, we can't stop the execution if RAII fails. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121322 show more ...
# 5105c7cd	09-Mar-2022	Shilei Tian <[email protected]>	[OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten This patch fixes the issue introduced in 14de0820e87f and D120089, that if dynamic libraries are used, the `CUmodule` ar [OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten This patch fixes the issue introduced in 14de0820e87f and D120089, that if dynamic libraries are used, the `CUmodule` array could be overwritten. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121308 show more ...
# 14de0820	09-Mar-2022	Johannes Doerfert <[email protected]>	[OpenMP][FIX] Ensure the modules vector is filled as others are The modules vector was for some reason special which could lead to it not being of the same size (=num devices). Easiest solution is t [OpenMP][FIX] Ensure the modules vector is filled as others are The modules vector was for some reason special which could lead to it not being of the same size (=num devices). Easiest solution is to treat it like we do all the other vectors. show more ...
Revision tags: llvmorg-14.0.0-rc2
# 1660288b	18-Feb-2022	Johannes Doerfert <[email protected]>	[OpenMP][CUDA] Use one event pool per device An event pool, similar to the stream pool, needs to be kept per device. For one, events are associated with cuda contexts which means we cannot destroy t [OpenMP][CUDA] Use one event pool per device An event pool, similar to the stream pool, needs to be kept per device. For one, events are associated with cuda contexts which means we cannot destroy the former after the latter. Also, CUDA documentation states streams and events need to be associated with the same context, which we did not ensure at all. Differential Revision: https://reviews.llvm.org/D120142 show more ...
# 10aa83ff	17-Feb-2022	Johannes Doerfert <[email protected]>	[OpenMP] Allow to explicitly deinitialize device resources There are two problems this patch tries to address: 1) We currently free resources in a random order wrt. plugin and libomptarget destru [OpenMP] Allow to explicitly deinitialize device resources There are two problems this patch tries to address: 1) We currently free resources in a random order wrt. plugin and libomptarget destruction. This patch should ensure the CUDA plugin is less fragile if something during the deinitialization goes wrong. 2) We need to support (hard) pause runtime calls eventually. This patch allows us to free all associated resources, though we cannot reinitialize the device yet. Follow up patch will associate one event pool per device/context. Differential Revision: https://reviews.llvm.org/D120089 show more ...
# aca33b0b	10-Feb-2022	Shilei Tian <[email protected]>	[OpenMP][CUDA] Remove the hard team limit Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as [OpenMP][CUDA] Remove the hard team limit Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as it is larger than that hard limit, the final number to launch the kernel will always be that hard limit. It is way less than the actual hardware limit. For example, my workstation has GTX2080, and the hardware limit of grid size is 2147483647, which is exactly the largest number a `int32_t` can represent. There is no limitation mentioned in the spec. This patch simply removes it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119313 show more ...
# f6685f77	10-Feb-2022	Shilei Tian <[email protected]>	[OpenMP][CUDA] Refine the logic to determine grid size This patch refines the logic to determine grid size as previous method can escape the check of whether `CudaBlocksPerGrid` could be greater tha [OpenMP][CUDA] Refine the logic to determine grid size This patch refines the logic to determine grid size as previous method can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual hardware limit. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119311 show more ...
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# f44e41af	27-Jan-2022	Sri Hari Krishna Narayanan <[email protected]>	Runtime for Interop directive This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works onl Runtime for Interop directive This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works only for GPUs and has several TODOs that should be addressed going forward. Reviewed By: RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D106674 show more ...
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 943d1d83	28-Dec-2021	Shilei Tian <[email protected]>	[OpenMP][CUDA] Add resource pool for CUevent Following D111954, this patch adds the resource pool for CUevent. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D116315
# 357c8031	28-Dec-2021	Shilei Tian <[email protected]>	[OpenMP][Plugin] Minor adjustments to ResourcePool This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0 which can avoid assertion. - Add a new i [OpenMP][Plugin] Minor adjustments to ResourcePool This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0 which can avoid assertion. - Add a new interface function `clear` to release all hold resources. - If initial size is 0, resize to 1 when the first request is encountered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116340 show more ...
# a697a0a4	27-Dec-2021	Shilei Tian <[email protected]>	[OpenMP][Plugin] Introduce generic resource pool Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now we have the need that some resources, such as CUDA stream and event, [OpenMP][Plugin] Introduce generic resource pool Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now we have the need that some resources, such as CUDA stream and event, will be hold by `libomptarget`. It is always good to buffer those resources. What's more important, given the way that `libomptarget` and plugins are connected, we cannot make sure whether plugins are still alive when `libomptarget` is destroyed. That leads to an issue that those resouces hold by `libomptarget` might not be released correctly. As a result, we need an unified management of all the resources that can be shared between `libomptarget` and plugins. `ResourcePoolTy` is designed to manage the type of resource for one device. It has to work with an allocator which is supposed to provide `create` and `destroy`. In this way, when the plugin is destroyed, we can make sure that all resources allocated from native runtime library will be released correctly, no matter whether `libomptarget` starts its destroy. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111954 show more ...
Revision tags: llvmorg-13.0.1-rc1
# b1ce4549	19-Oct-2021	Joseph Huber <[email protected]>	[OpenMP] Remove macro guards for device debugging The plugin currently uses a macro to check if this is a debug built before assigning the debug kind variable to the device environment struct. This [OpenMP] Remove macro guards for device debugging The plugin currently uses a macro to check if this is a debug built before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds and should always be availible. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083 show more ...
12 3 4