History log of /llvm-project-15.0.7/openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa.cpp (Results 1 – 8 of 8)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5
# 15ed5c0a 01-Jun-2022 Jose Manuel Monsalve Diaz <[email protected]>

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
This is a generic-elf-64bit device

Device (1):
This is a generic-elf-64bit device

Device (2):
This is a generic-elf-64bit device

Device (3):
This is a generic-elf-64bit device

Device (4):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 0
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (5):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 1
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (6):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 2
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (7):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 3
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
```

Differential Revision: https://reviews.llvm.org/D126836

show more ...


# 84e020a0 09-Jun-2022 Jose Manuel Monsalve Diaz <[email protected]>

Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"

This reverts commit d16a0877d8ac12a49fc75ae651247f338d46fead.


# d16a0877 01-Jun-2022 Jose Manuel Monsalve Diaz <[email protected]>

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line

[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
This is a generic-elf-64bit device

Device (1):
This is a generic-elf-64bit device

Device (2):
This is a generic-elf-64bit device

Device (3):
This is a generic-elf-64bit device

Device (4):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 0
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (5):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 1
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (6):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 2
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE

Device (7):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 3
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
```

Differential Revision: https://reviews.llvm.org/D126836

show more ...


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 38af5b4f 17-Dec-2021 Jon Chesterfield <[email protected]>

[libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches


# cc8dc5e2 08-Dec-2021 Carlo Bertolli <[email protected]>

[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version

Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Movi

[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version

Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 8cf93a35 26-Sep-2021 Jon Chesterfield <[email protected]>

[libomptarget][amdgpu] Destruct HSA queues

Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.

Reviewe

[libomptarget][amdgpu] Destruct HSA queues

Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109511

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1
# a90da62a 29-Jul-2021 Jon Chesterfield <[email protected]>

[libomptarget][amdgpu] Update printed plugin name


Revision tags: llvmorg-14-init
# 1a965706 22-Jul-2021 Jon Chesterfield <[email protected]>

[libomptarget][amdgpu] Implement dlopen of libhsa

AMDGPU plugin equivalent of D95155, build without HSA installed locally

Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file

[libomptarget][amdgpu] Implement dlopen of libhsa

AMDGPU plugin equivalent of D95155, build without HSA installed locally

Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that
exposes the same symbols that the plugin presently uses from hsa. The object
file contains dlopen of hsa and cached dlsym calls. Also provides header files
corresponding to the subset that is used.

This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off.
That allows developers to build against the dlopen/dlsym implementation, e.g.
while testing this mode.

Enabling by default will cause this plugin to build on a wider variety of
machines than it does at present so may break some CI builds. That risk can
be minimised by reviewing the header dependencies of the library and ensuring
it doesn't use any libraries that are not already used by libomptarget.

Separating the implementation from enabling by default in case the latter needs
to be rolled back after wider CI results.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106559

show more ...