|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
15ed5c0a |
| 01-Jun-2022 |
Jose Manuel Monsalve Diaz <[email protected]> |
[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line
[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool.
This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added
Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device
Device (1): This is a generic-elf-64bit device
Device (2): This is a generic-elf-64bit device
Device (3): This is a generic-elf-64bit device
Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE
Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE
Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE
Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ```
Differential Revision: https://reviews.llvm.org/D126836
show more ...
|
| #
84e020a0 |
| 09-Jun-2022 |
Jose Manuel Monsalve Diaz <[email protected]> |
Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"
This reverts commit d16a0877d8ac12a49fc75ae651247f338d46fead.
|
| #
d16a0877 |
| 01-Jun-2022 |
Jose Manuel Monsalve Diaz <[email protected]> |
[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line
[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool.
This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added
Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device
Device (1): This is a generic-elf-64bit device
Device (2): This is a generic-elf-64bit device
Device (3): This is a generic-elf-64bit device
Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE
Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE
Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE
Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ```
Differential Revision: https://reviews.llvm.org/D126836
show more ...
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
38af5b4f |
| 17-Dec-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches
|
| #
cc8dc5e2 |
| 08-Dec-2021 |
Carlo Bertolli <[email protected]> |
[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Movi
[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115279
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
8cf93a35 |
| 26-Sep-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.
Reviewe
[libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D109511
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1 |
|
| #
a90da62a |
| 29-Jul-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget][amdgpu] Update printed plugin name
|
|
Revision tags: llvmorg-14-init |
|
| #
1a965706 |
| 22-Jul-2021 |
Jon Chesterfield <[email protected]> |
[libomptarget][amdgpu] Implement dlopen of libhsa
AMDGPU plugin equivalent of D95155, build without HSA installed locally
Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file
[libomptarget][amdgpu] Implement dlopen of libhsa
AMDGPU plugin equivalent of D95155, build without HSA installed locally
Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that exposes the same symbols that the plugin presently uses from hsa. The object file contains dlopen of hsa and cached dlsym calls. Also provides header files corresponding to the subset that is used.
This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off. That allows developers to build against the dlopen/dlsym implementation, e.g. while testing this mode.
Enabling by default will cause this plugin to build on a wider variety of machines than it does at present so may break some CI builds. That risk can be minimised by reviewing the header dependencies of the library and ensuring it doesn't use any libraries that are not already used by libomptarget.
Separating the implementation from enabling by default in case the latter needs to be rolled back after wider CI results.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106559
show more ...
|