History log of /llvm-project-15.0.7/openmp/libomptarget/plugins/cuda/src/rtl.cpp (Results 1 – 25 of 89)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3
# 0b12f770 10-Aug-2022 Joseph Huber <[email protected]>

[Libomptarget][CUDA] Check CUDA compatibilty correctly

We recently added support for multi-architecture binaries in
libomptarget. This is done by extracting the architecture from the
embedded image

[Libomptarget][CUDA] Check CUDA compatibilty correctly

We recently added support for multi-architecture binaries in
libomptarget. This is done by extracting the architecture from the
embedded image and comparing it with the major and minor version
supported by the current CUDA installation. Previously we just compared
these directly, which was not correct for binary compatibility. The CUDA
documentation states that we can consider any image with an equivalent
major or a greater or equal to minor compatible with the current image.
Change the check to use this new logic in the CUDA plugin.

Fixes #57049

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D131567

(cherry picked from commit fdbb15355e7977b914cbd7e753b5e909d735ad83)

show more ...


Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# cfa6e79d 22-Jul-2022 Joel E. Denny <[email protected]>

[Libomptarget] Don't report lack of CUDA devices

Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics
about a lack of CUDA devices before an application runs:

```
$ clang -fopenmp -f

[Libomptarget] Don't report lack of CUDA devices

Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics
about a lack of CUDA devices before an application runs:

```
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c
$ ./a.out
CUDA error: Error returned from cuInit
CUDA error: no CUDA-capable device is detected
Hello World: 4
```

This can happen when the CUDA plugin was built but all CUDA devices
are currently disabled in some manner, perhaps because
`CUDA_VISIBLE_DEVICES` is set to the empty string. As shown in the
above example, it can even happen when we haven't compiled the
application for offloading to CUDA.

The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp`
appears to be intended to handle this case, and it chooses not to
write a diagnostic to stderr unless debugging is enabled:

```
if (NumberOfDevices == 0) {
DP("There are no devices supporting CUDA.\n");
return;
}
```

The problem is that the above code is never reached because the
earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`. This patch handles
that `cuInit` case in the same manner as the above code handles the
`NumberOfDevices == 0` case.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130371

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# e01ce4e8 10-Jun-2022 Joseph Huber <[email protected]>

[Libomptarget] Add checks for CUDA subarchitecture using new info

This patch extends the `is_valid_binary` routine to also check if the
binary's architecture string matches the one parsed from the r

[Libomptarget] Add checks for CUDA subarchitecture using new info

This patch extends the `is_valid_binary` routine to also check if the
binary's architecture string matches the one parsed from the runtime.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
CUDA.

Depends on D127432

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D127505

show more ...


# 1f940b69 15-Jul-2022 Joseph Huber <[email protected]>

[Libomptarget][NFC] Fix signed comparison warnings

Summary:
Non-functional change, just fixing some sign comparison warnings by
making both match.


# d27d0a67 01-Jul-2022 Joseph Huber <[email protected]>

[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention

Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly la

[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention

Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly large clash
between the naming conventions used. This patch fixes most of the
variable names that did not confrom to the LLVM standard, that is
`VariableName` for variables and `functionName` for functions.

This patch was primarily done using my editor's linting messages, if
there are any issues I missed arising from the automation let me know.

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D128997

show more ...


# 696bca9b 01-Jul-2022 Shilei Tian <[email protected]>

[NFC][OpenMP][CUDA] Remove unnecessary default label


# 2695e23a 28-Jun-2022 Shilei Tian <[email protected]>

[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work

This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In a

[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work

This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122764

show more ...


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# a3a42c3c 09-Apr-2022 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Ensure to set the context for wait events if necessary

Differential Revision: https://reviews.llvm.org/D123445


# ba93e4e3 25-Mar-2022 Johannes Doerfert <[email protected]>

[OpenMP][NFC] Add missing virtual destructor to silence warning


# 545fcc3d 26-Mar-2022 Shilei Tian <[email protected]>

[OpenMP][CUDA] Fix potential program crash caused by double free resources

As we mentioned in the code comments for function `ResourcePoolTy::release`,
at some point there could be two identical res

[OpenMP][CUDA] Fix potential program crash caused by double free resources

As we mentioned in the code comments for function `ResourcePoolTy::release`,
at some point there could be two identical resources on the two sides of `Next`
mark. It is usually not an issue, unless the following case:
1. Some resources are not returned.
2. We need to iterate the pool and free the element.

That will cause double free, which is the case for event pool. Since we don't release
events hold by the data map, it can happen that the `Next` mark is not reset, and
we have two identical items in the pool. When the pool is destroyed, we will call
`cuEventDestroy` twice on the same event. In the best case, we can only observe
CUDA errors. In the worst case, it can cause internal failures in CUDART and further
crash.

This patch fixes the issue by tracking all resources that have been given using
an `unordered_set`. We don't remove it when a resource is returned. When the pool
is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make
sure that the set contains all resources allocated from the device. We just need
to iterate the set and free the resource accordingly.

For now, only event pool is set to use it. Stream pool is not because we can make
sure all streams are returned when the plugin is destroyed.

Someone might be wondering, why don't we release all events hold in the data map.
That is because, plugins are determined to be destroyed *before* `libomptarget`.
If we can somehow make the plugin outlast `libomptarget`, life will be much
easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122014

show more ...


# 6c2be885 25-Mar-2022 Johannes Doerfert <[email protected]>

Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning"

This reverts commit b9fd8f34ae547674ac0b5f5fbc5bb66d2bc0fedb as it
accidentally contained a unit test change that is not fini

Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning"

This reverts commit b9fd8f34ae547674ac0b5f5fbc5bb66d2bc0fedb as it
accidentally contained a unit test change that is not finished (and
unrelated).

show more ...


# b9fd8f34 25-Mar-2022 Johannes Doerfert <[email protected]>

[OpenMP][NFC] Add missing virtual destructor to silence warning


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# f6639a42 09-Mar-2022 Shilei Tian <[email protected]>

[OpenMP][CUDA] Fix the check of `setContext`


# 39d3283a 09-Mar-2022 Shilei Tian <[email protected]>

[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly

Currently we set ccontext everywhere accordingly, but that causes many
unnecessary function calls. For example, in the resource pool, if we

[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly

Currently we set ccontext everywhere accordingly, but that causes many
unnecessary function calls. For example, in the resource pool, if we need to
resize the pool, we need to get from allocator. Each call to allocate sets the
current context once, which is unnecessary. In this patch, we set the context
only in the entry interface functions, if needed. Actually in the best way this
should be implemented via RAII, but since `cuCtxSetCurrent` could return error,
and we don't use exception, we can't stop the execution if RAII fails.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121322

show more ...


# 5105c7cd 09-Mar-2022 Shilei Tian <[email protected]>

[OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten

This patch fixes the issue introduced in 14de0820e87f and D120089, that
if dynamic libraries are used, the `CUmodule` ar

[OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten

This patch fixes the issue introduced in 14de0820e87f and D120089, that
if dynamic libraries are used, the `CUmodule` array could be overwritten.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121308

show more ...


# 14de0820 09-Mar-2022 Johannes Doerfert <[email protected]>

[OpenMP][FIX] Ensure the modules vector is filled as others are

The modules vector was for some reason special which could lead to it
not being of the same size (=num devices). Easiest solution is t

[OpenMP][FIX] Ensure the modules vector is filled as others are

The modules vector was for some reason special which could lead to it
not being of the same size (=num devices). Easiest solution is to treat
it like we do all the other vectors.

show more ...


Revision tags: llvmorg-14.0.0-rc2
# 1660288b 18-Feb-2022 Johannes Doerfert <[email protected]>

[OpenMP][CUDA] Use one event pool per device

An event pool, similar to the stream pool, needs to be kept per device.
For one, events are associated with cuda contexts which means we cannot
destroy t

[OpenMP][CUDA] Use one event pool per device

An event pool, similar to the stream pool, needs to be kept per device.
For one, events are associated with cuda contexts which means we cannot
destroy the former after the latter. Also, CUDA documentation states
streams and events need to be associated with the same context, which
we did not ensure at all.

Differential Revision: https://reviews.llvm.org/D120142

show more ...


# 10aa83ff 17-Feb-2022 Johannes Doerfert <[email protected]>

[OpenMP] Allow to explicitly deinitialize device resources

There are two problems this patch tries to address:
1) We currently free resources in a random order wrt. plugin and
libomptarget destru

[OpenMP] Allow to explicitly deinitialize device resources

There are two problems this patch tries to address:
1) We currently free resources in a random order wrt. plugin and
libomptarget destruction. This patch should ensure the CUDA plugin
is less fragile if something during the deinitialization goes wrong.
2) We need to support (hard) pause runtime calls eventually. This patch
allows us to free all associated resources, though we cannot
reinitialize the device yet.

Follow up patch will associate one event pool per device/context.

Differential Revision: https://reviews.llvm.org/D120089

show more ...


# aca33b0b 10-Feb-2022 Shilei Tian <[email protected]>

[OpenMP][CUDA] Remove the hard team limit

Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as

[OpenMP][CUDA] Remove the hard team limit

Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as it is larger than that hard limit, the final number to launch the kernel will always be that hard limit. It is way less than the actual hardware limit. For example, my workstation has GTX2080, and the hardware limit of grid size is 2147483647, which is exactly the largest number a `int32_t` can represent. There is no limitation mentioned in the spec. This patch simply removes it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119313

show more ...


# f6685f77 10-Feb-2022 Shilei Tian <[email protected]>

[OpenMP][CUDA] Refine the logic to determine grid size

This patch refines the logic to determine grid size as previous method
can escape the check of whether `CudaBlocksPerGrid` could be greater tha

[OpenMP][CUDA] Refine the logic to determine grid size

This patch refines the logic to determine grid size as previous method
can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual
hardware limit.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119311

show more ...


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# f44e41af 27-Jan-2022 Sri Hari Krishna Narayanan <[email protected]>

Runtime for Interop directive

This implements the runtime portion of the interop directive.
It expects the frontend and IRBuilder portions to be in place
for proper execution. It currently works onl

Runtime for Interop directive

This implements the runtime portion of the interop directive.
It expects the frontend and IRBuilder portions to be in place
for proper execution. It currently works only for GPUs
and has several TODOs that should be addressed going forward.

Reviewed By: RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D106674

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 943d1d83 28-Dec-2021 Shilei Tian <[email protected]>

[OpenMP][CUDA] Add resource pool for CUevent

Following D111954, this patch adds the resource pool for CUevent.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D116315


# 357c8031 28-Dec-2021 Shilei Tian <[email protected]>

[OpenMP][Plugin] Minor adjustments to ResourcePool

This patch makes some minor adjustments to `ResourcePool`:
- Don't initialize the resources if `Size` is 0 which can avoid assertion.
- Add a new i

[OpenMP][Plugin] Minor adjustments to ResourcePool

This patch makes some minor adjustments to `ResourcePool`:
- Don't initialize the resources if `Size` is 0 which can avoid assertion.
- Add a new interface function `clear` to release all hold resources.
- If initial size is 0, resize to 1 when the first request is encountered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D116340

show more ...


# a697a0a4 27-Dec-2021 Shilei Tian <[email protected]>

[OpenMP][Plugin] Introduce generic resource pool

Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now
we have the need that some resources, such as CUDA stream and event,

[OpenMP][Plugin] Introduce generic resource pool

Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now
we have the need that some resources, such as CUDA stream and event, will be
hold by `libomptarget`. It is always good to buffer those resources. What's more
important, given the way that `libomptarget` and plugins are connected, we cannot
make sure whether plugins are still alive when `libomptarget` is destroyed. That
leads to an issue that those resouces hold by `libomptarget` might not be
released correctly. As a result, we need an unified management of all the resources
that can be shared between `libomptarget` and plugins.

`ResourcePoolTy` is designed to manage the type of resource for one device.
It has to work with an allocator which is supposed to provide `create` and
`destroy`. In this way, when the plugin is destroyed, we can make sure that
all resources allocated from native runtime library will be released correctly,
no matter whether `libomptarget` starts its destroy.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111954

show more ...


Revision tags: llvmorg-13.0.1-rc1
# b1ce4549 19-Oct-2021 Joseph Huber <[email protected]>

[OpenMP] Remove macro guards for device debugging

The plugin currently uses a macro to check if this is a debug built
before assigning the debug kind variable to the device environment
struct. This

[OpenMP] Remove macro guards for device debugging

The plugin currently uses a macro to check if this is a debug built
before assigning the debug kind variable to the device environment
struct. This is being deprecated because the new device runtime does not
maintain separate debug builds and should always be availible.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112083

show more ...


1234