History log of /llvm-project-15.0.7/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp (Results 1 – 25 of 118)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1
# 046d5b91 15-Sep-2022 Joseph Huber <[email protected]>

[Libomptarget] Revert changes to AMDGPU plugin destructors

These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
sta

[Libomptarget] Revert changes to AMDGPU plugin destructors

These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
state. This will prevent us from using device destructors but should
remove some new bugs. In the future this interface should be changed
once these problems are addressed more correctly.

This reverts commit ed0f21811544320f829124efbb6a38ee12eb9155.

This reverts commit 2b7203a35972e98b8521f92d2791043dc539ae88.

Fixes #57536

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133997

show more ...


Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2
# 207f96e8 02-Aug-2022 Joseph Huber <[email protected]>

[Libomptarget] Deinitialize AMDGPU global state more intentionally

A previous patch made the destruction of the HSA plugin more
deterministic. However, there were still other global values that are

[Libomptarget] Deinitialize AMDGPU global state more intentionally

A previous patch made the destruction of the HSA plugin more
deterministic. However, there were still other global values that are not
handled this way. When attempting to call a destructor kernel, the
device would have already been uninitialized and we could not find the
appropriate kernel to call. This is because they were stored in global
containers that had their destructors called already. Merges this global
state into the rest of the info state by putting those global values
inside of the global pointer already allocated and deallocated by the
constructor and destructor. This should allow the AMDGPU plugin to
correctly identify the destructors if we were to run them.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131011

(cherry picked from commit 2b7203a35972e98b8521f92d2791043dc539ae88)

show more ...


Revision tags: llvmorg-15.0.0-rc1
# cb24013b 28-Jul-2022 Jon Chesterfield <[email protected]>

[openmp][amdgpu] Tear down amdgpu plugin accurately

Moves DeviceInfo global to heap to accurately control lifetime.
Moves calls from libomptarget to deinit_plugin later, plugins need to stay
alive u

[openmp][amdgpu] Tear down amdgpu plugin accurately

Moves DeviceInfo global to heap to accurately control lifetime.
Moves calls from libomptarget to deinit_plugin later, plugins need to stay
alive until very shortly before libomptarget is destructed.

Leaving the deinit_plugin calls where initially inserted hits use after
free from the dynamic_module.c offloading test (verified with valgrind
that the new location is sound with respect to this)

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130714

(cherry picked from commit ed0f21811544320f829124efbb6a38ee12eb9155)

show more ...


# 087d9bb2 28-Jul-2022 Jon Chesterfield <[email protected]>

[amdgpu][openmp][nfc] Restore stb_local on DeviceInfo symbol

(cherry picked from commit c214cb6a689581c1b7f3b702b5da6d68de6eaf3f)


# b5151c32 28-Jul-2022 Jon Chesterfield <[email protected]>

[openmp][amdgpu] Move global DeviceInfo behind call syntax prior to using D130712

(cherry picked from commit 75aa52106452a1d15ca487af7b408a812012e133)


# 410bfa00 28-Jul-2022 Jon Chesterfield <[email protected]>

[openmp] Introduce optional plugin init/deinit functions

Will allow plugins to migrate away from using global variables to
manage lifetime, which will fix a segfault discovered in relation to D12743

[openmp] Introduce optional plugin init/deinit functions

Will allow plugins to migrate away from using global variables to
manage lifetime, which will fix a segfault discovered in relation to D127432

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D130712

(cherry picked from commit 1f9d3974e444f95ddb600a6964ed14ded559e89c)

show more ...


Revision tags: llvmorg-16-init
# 4075a811 25-Jul-2022 Saiyedul Islam <[email protected]>

[Libomptarget] Add checks for AMDGPU TargetID using new image info

This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's run

[Libomptarget] Add checks for AMDGPU TargetID using new image info

This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.

Depends on D127432

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127769

show more ...


# 4cf30c51 25-Jul-2022 Saiyedul Islam <[email protected]>

Revert "Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"""

This reverts commit 281eb9223cf2e9366b5356fafab275abf0ea1d2b.


# 281eb922 25-Jul-2022 Saiyedul Islam <[email protected]>

Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info""

This reverts commit 8cbf4a386b6740180fe48aaebbd1ca9f8ee14367.


# 8cbf4a38 25-Jul-2022 Saiyedul Islam <[email protected]>

Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"

This reverts commit 471f2abc62d96b3ef97e13f4f7be2d386fc9f75f.


# 471f2abc 22-Jul-2022 Saiyedul Islam <[email protected]>

[Libomptarget] Add checks for AMDGPU TargetID using new image info

This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's run

[Libomptarget] Add checks for AMDGPU TargetID using new image info

This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.

Depends on D127432

Differential Revision: https://reviews.llvm.org/D127769

show more ...


# 1f940b69 15-Jul-2022 Joseph Huber <[email protected]>

[Libomptarget][NFC] Fix signed comparison warnings

Summary:
Non-functional change, just fixing some sign comparison warnings by
making both match.


# d27d0a67 01-Jul-2022 Joseph Huber <[email protected]>

[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention

Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly la

[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention

Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly large clash
between the naming conventions used. This patch fixes most of the
variable names that did not confrom to the LLVM standard, that is
`VariableName` for variables and `functionName` for functions.

This patch was primarily done using my editor's linting messages, if
there are any issues I missed arising from the automation let me know.

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D128997

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# 15ed5c0a 01-Jun-2022 Jose Manuel Monsalve Diaz <[email protected]>

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
This is a generic-elf-64bit device

Device (1):
This is a generic-elf-64bit device

Device (2):
This is a generic-elf-64bit device

Device (3):
This is a generic-elf-64bit device

Device (4):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 0
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (5):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 1
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (6):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 2
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (7):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 3
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
```

Differential Revision: https://reviews.llvm.org/D126836

show more ...


# 84e020a0 09-Jun-2022 Jose Manuel Monsalve Diaz <[email protected]>

Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"

This reverts commit d16a0877d8ac12a49fc75ae651247f338d46fead.


# d16a0877 01-Jun-2022 Jose Manuel Monsalve Diaz <[email protected]>

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
This is a generic-elf-64bit device

Device (1):
This is a generic-elf-64bit device

Device (2):
This is a generic-elf-64bit device

Device (3):
This is a generic-elf-64bit device

Device (4):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 0
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (5):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 1
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (6):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 2
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (7):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 3
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
```

Differential Revision: https://reviews.llvm.org/D126836

show more ...


Revision tags: llvmorg-14.0.4
# f4f23de1 09-May-2022 Joseph Huber <[email protected]>

[Libomptarget] Add basic support for dynamic shared memory on AMDGPU

This patchs adds the arguments necessary to allocate the size of the
dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SI

[Libomptarget] Add basic support for dynamic shared memory on AMDGPU

This patchs adds the arguments necessary to allocate the size of the
dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SIZE`
environment variable. This patch only allocates the memory, AMDGPU has a
limitation that shared memory can only be accessed from the kernel
directly. So this will currently only work with optimizations to inline
the accessor function.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D125252

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# 7086a1db 14-Apr-2022 Dhruva Chakrabarti <[email protected]>

[libomptarget] [amdgpu] Hostcall offset check should consider implicit args

Fixed hostcall offset check to compare against kernarg segment size
and implicit arguments. Improved the corresponding deb

[libomptarget] [amdgpu] Hostcall offset check should consider implicit args

Fixed hostcall offset check to compare against kernarg segment size
and implicit arguments. Improved the corresponding debug print.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123827

show more ...


Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# a74826d3 10-Jan-2022 Jon Chesterfield <[email protected]>

[openmp][amdgpu] Replace unsigned long with uint64_t

Some types need to be 64 bit. Unsigned long is a hazard there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D116963


# 91dfb32f 17-Dec-2021 Jon Chesterfield <[email protected]>

[openmp][amdgpu][nfc] Mark all external functions extern C to get type checking


# d3abb04e 17-Dec-2021 Carlo Bertolli <[email protected]>

[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter
I missed the async info parameter in the first version of this API.

Reviewed By: JonChesterfield

Differe

[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter
I missed the async info parameter in the first version of this API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115887

show more ...


# d83dc4c6 15-Dec-2021 Carlo Bertolli <[email protected]>

[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a singl

[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115771

show more ...


# 28309c54 10-Dec-2021 Carlo Bertolli <[email protected]>

[OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous
and synchronous kernel launch implementations into a single
synchronous version. This patch prepares the plugin for asynchronous

[OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous
and synchronous kernel launch implementations into a single
synchronous version. This patch prepares the plugin for asynchronous
implementation by:

Privatizing actual kernel launch code (valid in both cases) into
an anonymous namespace base function (submitted at D115267)

- Separating the control flow path of asynchronous and synchronous
kernel launch functions** (this diff)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115273

show more ...


# cc8dc5e2 08-Dec-2021 Carlo Bertolli <[email protected]>

[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version

Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Movi

[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version

Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279

show more ...


# 14ff611f 08-Dec-2021 Jon Chesterfield <[email protected]>

Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version"

This reverts commit 6de698bf10996b532632bb9dfa9fd420c5af62af.
It didn't build in the dynamic_hsa configuration


12345