|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1 |
|
| #
046d5b91 |
| 15-Sep-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Revert changes to AMDGPU plugin destructors
These patches exposed a lot of problems in the AMD toolchain. Rather than keep it broken we should revert it to its old semi-functional state. This will prevent us from using device destructors but should remove some new bugs. In the future this interface should be changed once these problems are addressed more correctly.
This reverts commit ed0f21811544320f829124efbb6a38ee12eb9155.
This reverts commit 2b7203a35972e98b8521f92d2791043dc539ae88.
Fixes #57536
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133997
|
|
Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
| #
207f96e8 |
| 02-Aug-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Deinitialize AMDGPU global state more intentionally
A previous patch made the destruction of the HSA plugin more deterministic. However, there were still other global values that are not handled this way. When attempting to call a destructor kernel, the device would have already been uninitialized and we could not find the appropriate kernel to call. This is because they were stored in global containers that had their destructors called already. Merges this global state into the rest of the info state by putting those global values inside of the global pointer already allocated and deallocated by the constructor and destructor. This should allow the AMDGPU plugin to correctly identify the destructors if we were to run them.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131011
(cherry picked from commit 2b7203a35972e98b8521f92d2791043dc539ae88)
|
|
Revision tags: llvmorg-15.0.0-rc1 |
|
| #
cb24013b |
| 28-Jul-2022 |
Jon Chesterfield <[email protected]> |
[openmp][amdgpu] Tear down amdgpu plugin accurately
Moves the DeviceInfo global to the heap to accurately control its lifetime. Moves the calls from libomptarget to deinit_plugin later, since plugins need to stay alive until very shortly before libomptarget is destructed.
Leaving the deinit_plugin calls where they were initially inserted triggers a use-after-free in the dynamic_module.c offloading test (verified with valgrind that the new location is sound in this respect).
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130714
(cherry picked from commit ed0f21811544320f829124efbb6a38ee12eb9155)
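The lifetime pattern this commit describes can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `PluginState`, `getState`, `initPlugin` and `deinitPlugin` are hypothetical names, and the real `DeviceInfo` type holds far more than a list of kernel names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the plugin's per-process state.
struct PluginState {
  std::vector<std::string> LoadedKernels;
};

// A raw pointer has no destructor, so nothing is torn down implicitly during
// static destruction; the two functions below control the lifetime.
static PluginState *State = nullptr;

// Accessor behind call syntax: callers never touch the pointer directly.
PluginState &getState() {
  assert(State && "plugin used before init or after deinit");
  return *State;
}

// Returns 0 on success, 1 if the state already exists.
int initPlugin() {
  if (State)
    return 1;
  State = new PluginState();
  return 0;
}

// Returns 0 on success, 1 if there is nothing to tear down.
int deinitPlugin() {
  if (!State)
    return 1;
  delete State;
  State = nullptr;
  return 0;
}
```

With this shape, the point at which the state dies is whatever point the runtime chooses to call `deinitPlugin`, rather than an unspecified position in the static-destructor order.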
|
| #
087d9bb2 |
| 28-Jul-2022 |
Jon Chesterfield <[email protected]> |
[amdgpu][openmp][nfc] Restore stb_local on DeviceInfo symbol
(cherry picked from commit c214cb6a689581c1b7f3b702b5da6d68de6eaf3f)
|
| #
b5151c32 |
| 28-Jul-2022 |
Jon Chesterfield <[email protected]> |
[openmp][amdgpu] Move global DeviceInfo behind call syntax prior to using D130712
(cherry picked from commit 75aa52106452a1d15ca487af7b408a812012e133)
|
| #
410bfa00 |
| 28-Jul-2022 |
Jon Chesterfield <[email protected]> |
[openmp] Introduce optional plugin init/deinit functions
Will allow plugins to migrate away from using global variables to manage lifetime, which will fix a segfault discovered in relation to D127432
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D130712
(cherry picked from commit 1f9d3974e444f95ddb600a6964ed14ded559e89c)
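One way to picture "optional" entry points is a table of function pointers in which missing hooks stay null. The sketch below assumes such a table; `PluginEntryPoints`, `callOptional`, `loadPlugin` and `unloadPlugin` are illustrative names and not libomptarget's real interface, which resolves symbols from a shared library.

```cpp
#include <cassert>

// Hypothetical table of entry points, as a loader might fill it in after
// looking up symbols. Optional entries stay null when a plugin does not
// export them.
struct PluginEntryPoints {
  int (*InitPlugin)() = nullptr;      // optional
  int (*DeinitPlugin)() = nullptr;    // optional
  int (*NumberOfDevices)() = nullptr; // required
};

constexpr int Success = 0;

// Call an optional hook only if the plugin provides it.
int callOptional(int (*Hook)()) { return Hook ? Hook() : Success; }

// Returns the device count, or -1 if the optional init hook reports failure.
int loadPlugin(const PluginEntryPoints &P) {
  assert(P.NumberOfDevices && "required entry point missing");
  if (callOptional(P.InitPlugin) != Success)
    return -1;
  return P.NumberOfDevices();
}

void unloadPlugin(const PluginEntryPoints &P) { callOptional(P.DeinitPlugin); }
```

Plugins that still rely on global constructors and destructors keep working unchanged, because a null hook is treated as a successful no-op.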
|
|
Revision tags: llvmorg-16-init |
|
| #
4075a811 |
| 25-Jul-2022 |
Saiyedul Islam <[email protected]> |
[Libomptarget] Add checks for AMDGPU TargetID using new image info
This patch extends the is_valid_binary routine to also check if the binary's target ID matches the one parsed from the system's runtime environment. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for AMDGPU. It also handles compatibility testing of target IDs of the image and the environment.
Depends on D127432
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D127769
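A compatibility test between an image target ID and an environment target ID might look like the sketch below. It assumes target IDs of the form `processor:feature+:feature-` (for example `gfx90a:sramecc+:xnack-`) and assumes the rule that a feature left unspecified in the image matches any setting; `TargetID`, `parseTargetID` and `isCompatible` are hypothetical helpers, not the routine the patch adds.

```cpp
#include <map>
#include <sstream>
#include <string>

struct TargetID {
  std::string Processor;
  std::map<std::string, char> Features; // feature name -> '+' or '-'
};

// Split "gfx90a:sramecc+:xnack-" into a processor and its feature settings.
TargetID parseTargetID(const std::string &Text) {
  TargetID ID;
  std::istringstream In(Text);
  std::getline(In, ID.Processor, ':');
  std::string Part;
  while (std::getline(In, Part, ':')) {
    if (Part.size() < 2)
      continue; // this sketch ignores malformed fragments
    ID.Features[Part.substr(0, Part.size() - 1)] = Part.back();
  }
  return ID;
}

// Assumed rule: processors must be equal, and every feature the image pins
// down must have the same setting in the environment.
bool isCompatible(const std::string &Image, const std::string &Env) {
  const TargetID Img = parseTargetID(Image);
  const TargetID Dev = parseTargetID(Env);
  if (Img.Processor != Dev.Processor)
    return false;
  for (const auto &Feature : Img.Features) {
    auto It = Dev.Features.find(Feature.first);
    if (It == Dev.Features.end() || It->second != Feature.second)
      return false;
  }
  return true;
}
```

Under these assumptions, an image built for plain `gfx90a` would be accepted on a `gfx90a:sramecc+:xnack-` device, while one built for `gfx90a:xnack+` would be rejected.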
|
| #
4cf30c51 |
| 25-Jul-2022 |
Saiyedul Islam <[email protected]> |
Revert "Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"""
This reverts commit 281eb9223cf2e9366b5356fafab275abf0ea1d2b.
|
| #
281eb922 |
| 25-Jul-2022 |
Saiyedul Islam <[email protected]> |
Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info""
This reverts commit 8cbf4a386b6740180fe48aaebbd1ca9f8ee14367.
|
| #
8cbf4a38 |
| 25-Jul-2022 |
Saiyedul Islam <[email protected]> |
Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"
This reverts commit 471f2abc62d96b3ef97e13f4f7be2d386fc9f75f.
|
| #
471f2abc |
| 22-Jul-2022 |
Saiyedul Islam <[email protected]> |
[Libomptarget] Add checks for AMDGPU TargetID using new image info
This patch extends the is_valid_binary routine to also check if the binary's target ID matches the one parsed from the system's runtime environment. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for AMDGPU. It also handles compatibility testing of target IDs of the image and the environment.
Depends on D127432
Differential Revision: https://reviews.llvm.org/D127769
|
| #
1f940b69 |
| 15-Jul-2022 |
Joseph Huber <[email protected]> |
[Libomptarget][NFC] Fix signed comparison warnings
Summary: Non-functional change, just fixing some sign comparison warnings by making both match.
|
| #
d27d0a67 |
| 01-Jul-2022 |
Joseph Huber <[email protected]> |
[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention
Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly large clash between the naming conventions used. This patch fixes most of the variable names that did not conform to the LLVM standard, that is `VariableName` for variables and `functionName` for functions.
This patch was primarily done using my editor's linting messages; if there are any issues I missed arising from the automation, let me know.
Reviewed By: saiislam
Differential Revision: https://reviews.llvm.org/D128997
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
15ed5c0a |
| 01-Jun-2022 |
Jose Manuel Monsalve Diaz <[email protected]> |
[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool.
This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added
Example of an output:
```
llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device
Device (1):
    This is a generic-elf-64bit device
Device (2):
    This is a generic-elf-64bit device
Device (3):
    This is a generic-elf-64bit device
Device (4):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 0
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
Device (5):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 1
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
Device (6):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 2
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
Device (7):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 3
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
```
Differential Revision: https://reviews.llvm.org/D126836
|
| #
84e020a0 |
| 09-Jun-2022 |
Jose Manuel Monsalve Diaz <[email protected]> |
Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"
This reverts commit d16a0877d8ac12a49fc75ae651247f338d46fead.
|
| #
d16a0877 |
| 01-Jun-2022 |
Jose Manuel Monsalve Diaz <[email protected]> |
[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool.
This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added
Example of an output:
```
llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device
Device (1):
    This is a generic-elf-64bit device
Device (2):
    This is a generic-elf-64bit device
Device (3):
    This is a generic-elf-64bit device
Device (4):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 0
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
Device (5):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 1
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
Device (6):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 2
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
Device (7):
    HSA Runtime Version: 1.1
    HSA OpenMP Device Number: 3
    Device Name: gfx906
    Vendor Name: AMD
    Device Type: GPU
    Max Queues: 128
    Queue Min Size: 64
    Queue Max Size: 131072
    Cache:
      L0: 16384 bytes
      L1: 8388608 bytes
    Cacheline Size: 64
    Max Clock Freq(MHz): 1725
    Compute Units: 60
    SIMD per CU: 4
    Fast F16 Operation: TRUE
    Wavefront Size: 64
    Workgroup Max Size: 1024
    Workgroup Max Size per Dimension:
      x: 1024
      y: 1024
      z: 1024
    Max Waves Per CU: 40
    Max Work-item Per CU: 2560
    Grid Max Size: 4294967295
    Grid Max Size per Dimension:
      x: 4294967295
      y: 4294967295
      z: 4294967295
    Max fbarriers/Workgrp: 32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size: 34342961152 bytes
        Allocatable: TRUE
        Runtime Alloc Granule: 4096 bytes
        Runtime Alloc alignment: 4096 bytes
        Accessable by all: FALSE
      Pool GROUP:
        Size: 65536 bytes
        Allocatable: FALSE
        Runtime Alloc Granule: 0 bytes
        Runtime Alloc alignment: 0 bytes
        Accessable by all: FALSE
```
Differential Revision: https://reviews.llvm.org/D126836
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
f4f23de1 |
| 09-May-2022 |
Joseph Huber <[email protected]> |
[Libomptarget] Add basic support for dynamic shared memory on AMDGPU
This patch adds the arguments necessary to allocate dynamic shared memory, with the size taken from the `LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable. This patch only allocates the memory; AMDGPU has a limitation that shared memory can only be accessed from the kernel directly, so this will currently only work with optimizations that inline the accessor function.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D125252
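Reading a size from an environment variable such as `LIBOMPTARGET_SHARED_MEMORY_SIZE` could be sketched as below. `readSizeFromEnv` is a hypothetical helper written for illustration; the default of zero and the decision to reject unparsable or negative values are assumptions, not documented plugin behavior.

```cpp
#include <cstdint>
#include <cstdlib>

// Parse a non-negative decimal byte count from the named environment
// variable. Falls back to Default when the variable is unset, empty,
// negative, or not a plain number.
uint64_t readSizeFromEnv(const char *Name, uint64_t Default = 0) {
  const char *Raw = std::getenv(Name);
  if (!Raw || *Raw == '\0' || *Raw == '-')
    return Default;
  char *End = nullptr;
  const unsigned long long Value = std::strtoull(Raw, &End, 10);
  // Require the whole string to be consumed, so "64k" is rejected.
  return (*End == '\0') ? static_cast<uint64_t>(Value) : Default;
}
```

A launch path would then call something like `readSizeFromEnv("LIBOMPTARGET_SHARED_MEMORY_SIZE")` once and pass the result along as the dynamic shared memory size for each kernel.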
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
7086a1db |
| 14-Apr-2022 |
Dhruva Chakrabarti <[email protected]> |
[libomptarget] [amdgpu] Hostcall offset check should consider implicit args
Fixed hostcall offset check to compare against kernarg segment size and implicit arguments. Improved the corresponding debug print.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D123827
|
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
a74826d3 |
| 10-Jan-2022 |
Jon Chesterfield <[email protected]> |
[openmp][amdgpu] Replace unsigned long with uint64_t
Some types need to be 64 bit. Unsigned long is a hazard there.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D116963
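The hazard is that `unsigned long` is 64 bits wide on LP64 platforms such as Linux but only 32 bits on LLP64 platforms such as 64-bit Windows, whereas `uint64_t` is 64 bits everywhere it exists. A small sketch, using a hypothetical `packHandle` helper that is not taken from the plugin:

```cpp
#include <climits>
#include <cstdint>

// Holds on every platform that provides uint64_t.
static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t must be 64 bits");

// Combine two 32-bit halves into one 64-bit value. Casting to uint64_t before
// shifting keeps the shift well defined; with a 32-bit unsigned long, a shift
// by 32 would be undefined behavior.
constexpr uint64_t packHandle(uint32_t High, uint32_t Low) {
  return (static_cast<uint64_t>(High) << 32) | Low;
}
```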
|
| #
91dfb32f |
| 17-Dec-2021 |
Jon Chesterfield <[email protected]> |
[openmp][amdgpu][nfc] Mark all external functions extern C to get type checking
|
| #
d3abb04e |
| 17-Dec-2021 |
Carlo Bertolli <[email protected]> |
[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter
I missed the async info parameter in the first version of this API.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115887
|
| #
d83dc4c6 |
| 15-Dec-2021 |
Carlo Bertolli <[email protected]> |
[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queues per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables multiple threads to submit kernel launches to the same GPU concurrently.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115771
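How several queues per device let threads submit work side by side can be sketched as below. The commit message only states that there are four queues; the round-robin selection, the `Queue` placeholder and the `DeviceQueues` class are assumptions made for this illustration and do not use the HSA API.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for an HSA queue handle.
struct Queue {
  int Id = 0;
};

class DeviceQueues {
public:
  static constexpr std::size_t NumQueues = 4;

  DeviceQueues() {
    for (std::size_t I = 0; I < NumQueues; ++I)
      Queues[I].Id = static_cast<int>(I);
  }

  // Hand out queues in rotation. The atomic counter makes this safe to call
  // from many host threads without a lock.
  Queue &next() {
    const std::size_t Slot =
        Next.fetch_add(1, std::memory_order_relaxed) % NumQueues;
    return Queues[Slot];
  }

private:
  std::array<Queue, NumQueues> Queues{};
  std::atomic<std::size_t> Next{0};
};
```

Each launching thread would call `next()` and submit its packet to the returned queue, so two threads rarely contend for the same queue.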
show more ...
|
| #
28309c54 |
| 10-Dec-2021 |
Carlo Bertolli <[email protected]> |
[OpenMP] Part 2 of
At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for asynchronous implementation by:
- Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function (submitted at D115267)
- Separating the control flow path of asynchronous and synchronous kernel launch functions (this diff)
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115273
|
| #
cc8dc5e2 |
| 08-Dec-2021 |
Carlo Bertolli <[email protected]> |
[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare the amdgpu plugin for an asynchronous implementation. This patch switches to using the HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that the plugin is responsible for locking/unlocking host memory pointers.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115279
|
| #
14ff611f |
| 08-Dec-2021 |
Jon Chesterfield <[email protected]> |
Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version"
This reverts commit 6de698bf10996b532632bb9dfa9fd420c5af62af. It didn't build in the dynamic_hsa configuration
|