|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
b370be37 |
| 13-Jul-2022 |
Joseph Huber <[email protected]> |
[CUDA] Allow the new driver to compile CUDA in non-RDC mode
The new driver primarily allows us to support RDC-mode compilations with proper linking. This is not needed for non-RDC mode compilation,
[CUDA] Allow the new driver to compile CUDA in non-RDC mode
The new driver primarily allows us to support RDC-mode compilations with proper linking. This is not needed for non-RDC mode compilation, but we still would like the new driver to be able to handle this mode so we can transition away from the old driver in the future. This patch adds the necessary code to support creating a fatbinary for CUDA code generation as well as removing old assumptions and errors about RDC-mode with the new driver.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D129655
show more ...
|
| #
e88d53d2 |
| 29-Jun-2022 |
Joseph Huber <[email protected]> |
[HIP] Generate offloading entries for HIP with the new driver.
This patch adds the small change required to output offloading entried for HIP instead of CUDA. These should be placed in different sec
[HIP] Generate offloading entries for HIP with the new driver.
This patch adds the small change required to output offloading entried for HIP instead of CUDA. These should be placed in different sections so because they need to be distinct to the offloading toolchain, otherwise we'd have HIP trying to register CUDA kernels or vice-versa. This patch will precede support for HIP in the linker wrapper.
Reviewed By: yaxunl, tra
Differential Revision: https://reviews.llvm.org/D128850
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
1bae02b7 |
| 18-May-2022 |
Joseph Huber <[email protected]> |
[Cuda] Use fallback method to mangle externalized decls if no CUID given
CUDA requires that static variables be visible to the host when offloading. However, The standard semantics of a stiatc varia
[Cuda] Use fallback method to mangle externalized decls if no CUID given
CUDA requires that static variables be visible to the host when offloading. However, The standard semantics of a stiatc variable dictate that it should not be visible outside of the current file. In order to access it from the host we need to perform "externalization" on the static variable on the device. This requires generating a semi-unique name that can be affixed to the variable as to not cause linker errors.
This is currently done using the CUID functionality, an MD5 hash value set up by the clang driver. This allows us to achieve is mostly unique ID that is unique even between multiple compilations of the same file. However, this is not always availible. Instead, this patch uses the unique ID from the file to generate a unique symbol name. This will create a unique name that is consistent between the host and device side compilations without requiring the CUID to be entered by the driver. The one downside to this is that we are no longer stable under multiple compilations of the same file. However, this is a very niche use-case and is not supported by Nvidia's CUDA compiler so it likely to be good enough.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D125904
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
0035f715 |
| 15-Mar-2022 |
Joseph Huber <[email protected]> |
[CUDA] Create offloading entries when using the new driver
The changes made in D123460 generalized the code generation for OpenMP's offloading entries. We can use the same scheme to register globals
[CUDA] Create offloading entries when using the new driver
The changes made in D123460 generalized the code generation for OpenMP's offloading entries. We can use the same scheme to register globals for CUDA code. This patch adds the code generation to create these offloading entries when compiling using the new offloading driver mode. The offloading entries are simple structs that contain the information necessary to register the global. The struct used is as follows:
``` Type struct __tgt_offload_entry { void *addr; // Pointer to the offload entry info. // (function or global) char *name; // Name of the function or global. size_t size; // Size of the entry info (0 if it a function). int32_t flags; int32_t reserved; }; ```
Currently CUDA handles RDC code generation by deferring the registration of globals in the current TU to a callback function containing the modules ID. Later all the module IDs will be used to register all of the globals at once. Rather than mimic this, offloading entries allow us to mimic the way OpenMP registers globals. That is, we create a simple global struct for each device global to be registered. These are placed at a special section `cuda_offloading_entires`. Because this section is a valid C-identifier, the linker will profide a `__start` and `__stop` pointer that we can use to iterate and register all globals at runtime.
the registration requires a flag variable to indicate which registration function to use. I have assigned the flags somewhat arbitrarily, but these use the following values.
Kernel: 0 Variable: 0 Managed: 1 Surface: 2 Texture: 3
Depends on D120272
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123471
show more ...
|
| #
62501bc4 |
| 03-May-2022 |
Yaxun (Sam) Liu <[email protected]> |
[NFC][CUDA][HIP] rework mangling number for aux target
CUDA/HIP needs to mangle for aux target. When mangling for aux target, the mangler should use mangling number for aux target. Previously in htt
[NFC][CUDA][HIP] rework mangling number for aux target
CUDA/HIP needs to mangle for aux target. When mangling for aux target, the mangler should use mangling number for aux target. Previously in https://reviews.llvm.org/D122734 a state was introduced in ASTContext to let the mangler get mangling number for aux target from ASTContext. This patch removes that state from ASTConext and add an IsAux member to MangleContext to indicate that the mangle context is for aux target. This reflects the reality that the mangle context is created for mangling aux target and makes ASTContext cleaner.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D124842
show more ...
|
| #
11d3e31c |
| 30-Mar-2022 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] Fix mangling number for local struct
MSVC and Itanium mangling use different mangling numbers for function-scope structs, which causes inconsistent mangled kernel names in device and hos
[CUDA][HIP] Fix mangling number for local struct
MSVC and Itanium mangling use different mangling numbers for function-scope structs, which causes inconsistent mangled kernel names in device and host compilations.
This patch uses Itanium mangling number for structs in for mangling device side names in CUDA/HIP host compilation on Windows to fix this issue.
A state is added to ASTContext to indicate whether the current name mangling is for device side names in host compilation. Device and host mangling number are encoded/decoded as upper and lower half of 32 bit unsigned integer to fit into the original mangling number field for AST. Diagnostic will be emitted if a manglining number exceeds limit.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D122734
Fixes: SWDEV-328515
show more ...
|
| #
4ea1d435 |
| 08-Apr-2022 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] Externalize kernels in anonymous name space
kernels in anonymous name space needs to have unique name to avoid duplicate symbols.
Fixes: https://github.com/llvm/llvm-project/issues/5456
[CUDA][HIP] Externalize kernels in anonymous name space
kernels in anonymous name space needs to have unique name to avoid duplicate symbols.
Fixes: https://github.com/llvm/llvm-project/issues/54560
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D123353
show more ...
|
| #
e4903d8b |
| 08-Apr-2022 |
Jonas Hahnfeld <[email protected]> |
[CUDA/HIP] Remove argument from module ctor/dtor signatures
In theory, constructors can take arguments when called via .init_array where at least glibc passes in (argc, argv, envp). This isn't used
[CUDA/HIP] Remove argument from module ctor/dtor signatures
In theory, constructors can take arguments when called via .init_array where at least glibc passes in (argc, argv, envp). This isn't used in the generated code and if it was, the first argument should be an integer, not a pointer. For destructors registered via atexit, the function should never take an argument.
Differential Revision: https://reviews.llvm.org/D123370
show more ...
|
| #
b8f0e128 |
| 22-Mar-2022 |
Nikita Popov <[email protected]> |
[CodeGen] Remove some uses of deprecated Address constructor
Remove two stray uses in CodeGenModule and CGCUDANV.
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
9d899d8f |
| 22-Feb-2022 |
Yaxun (Sam) Liu <[email protected]> |
[HIP] Support `-fgpu-default-stream`
Introduce -fgpu-default-stream={legacy|per-thread} option to support per-thread default stream for HIP runtime.
When -fgpu-default-stream=per-thread, HIP kernel
[HIP] Support `-fgpu-default-stream`
Introduce -fgpu-default-stream={legacy|per-thread} option to support per-thread default stream for HIP runtime.
When -fgpu-default-stream=per-thread, HIP kernels are launched through hipLaunchKernel_spt instead of hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1 is defined by the preprocessor to enable other per-thread stream API's.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D120298
show more ...
|
| #
50650766 |
| 16-Feb-2022 |
Nikita Popov <[email protected]> |
[CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to ensure that no new uses are introduced, rename it to Address::deprecated().
While d
[CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to ensure that no new uses are introduced, rename it to Address::deprecated().
While doing the rename, I've filled in element types in cases where it was relatively obvious, but we're still left with 135 calls to the deprecated constructor.
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
3b172f60 |
| 02-Dec-2021 |
Yaxun (Sam) Liu <[email protected]> |
[HIP] Fix -fgpu-rdc for Windows
This patch fixes issues for -fgpu-rdc for Windows MSVC toolchain:
Fix COFF specific section flags and remove section types in llvm-mc input file for Windows.
Escape
[HIP] Fix -fgpu-rdc for Windows
This patch fixes issues for -fgpu-rdc for Windows MSVC toolchain:
Fix COFF specific section flags and remove section types in llvm-mc input file for Windows.
Escape fatbin path in llvm-mc input file.
Add -triple option to llvm-mc.
Put __hip_gpubin_handle in comdat when it has linkonce_odr linkage.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D115039
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
80072fde |
| 04-Nov-2021 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TU's with linkonce_odr linkage without causing duplicate symbols in linker. MSVC also r
[CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TU's with linkonce_odr linkage without causing duplicate symbols in linker. MSVC also requires these symbols be in comdat sections. Linux does not require the symbols in comdat sections to be merged by linker but by default clang puts them in comdat sections.
If a template kernel is instantiated identically in two TU's. MSVC requires that them to be in comdat sections, otherwise MSVC linker will diagnose them as duplicate symbols. However, currently clang does not put instantiated template kernels in comdat sections, which causes link error for MSVC.
This patch allows putting instantiated template kernels into comdat sections.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D112492
show more ...
|
| #
1b758925 |
| 29-Oct-2021 |
Jay Foad <[email protected]> |
[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around ConstantExpr::getAsInstruction, which also inserted the new instruction into
[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around ConstantExpr::getAsInstruction, which also inserted the new instruction into a basic block. Implement this directly in getAsInstruction by adding an InsertBefore parameter and change all callers to use it. NFC.
A follow-up patch will remove createReplacementInstr.
Differential Revision: https://reviews.llvm.org/D112791
show more ...
|
| #
9fb9c6b9 |
| 19-Oct-2021 |
Uday Bondhugula <[email protected]> |
[Clang][NFC] Clang CUDA codegen clean-up
Update an instance of dyn_cast -> cast and other NFC clang-tidy fixes for Clang CUDA codegen.
Differential Revision: https://reviews.llvm.org/D112284
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
6d3e7c78 |
| 17-Jul-2021 |
Nikita Popov <[email protected]> |
[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type
Remove uses of to-be-deprecated API. I've fallen back to calling getPointerElementType() in some cases where the correct type was
[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type
Remove uses of to-be-deprecated API. I've fallen back to calling getPointerElementType() in some cases where the correct type wasn't immediately obvious to me.
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
eba69b59 |
| 23-Apr-2021 |
Erich Keane <[email protected]> |
Reimplement __builtin_unique_stable_name-
The original version of this was reverted, and @rjmcall provided some advice to architect a new solution. This is that solution.
This implements a builtin
Reimplement __builtin_unique_stable_name-
The original version of this was reverted, and @rjmcall provided some advice to architect a new solution. This is that solution.
This implements a builtin to provide a unique name that is stable across compilations of this TU for the purposes of implementing the library component of the unnamed kernel feature of SYCL. It does this by running the Itanium mangler with a few modifications.
Because it is somewhat common to wrap non-kernel-related lambdas in macros that aren't present on the device (such as for logging), this uniquely generates an ID for all lambdas involved in the naming of a kernel. It uses the lambda-mangling number to do this, except replaces this with its own number (starting at 10000 for readabililty reasons) for lambdas used to name a kernel.
Additionally, this implements itself as constexpr with a slight catch: if a name would be invalidated by the use of this lambda in a later kernel invocation, it is diagnosed as an error (see the Sema tests).
Differential Revision: https://reviews.llvm.org/D103112
show more ...
|
| #
98575708 |
| 11-May-2021 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] Fix device template variables
Currently clang does not emit device template variables instantiated only in host functions, however, nvcc is able to do that:
https://godbolt.org/z/fneEff
[CUDA][HIP] Fix device template variables
Currently clang does not emit device template variables instantiated only in host functions, however, nvcc is able to do that:
https://godbolt.org/z/fneEfferY
This patch fixes this issue by refactoring and extending the existing mechanism for emitting static device var ODR-used by host only. Basically clang records device variables ODR-used by host code and force them to be emitted in device compilation. The existing mechanism makes sure these device variables ODR-used by host code are added to llvm.compiler-used, therefore they are guaranteed not to be deleted.
It also fixes non-ODR-use of static device variable by host code causing static device variable to be emitted and registered, which should not.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D102237
show more ...
|
| #
6a72ed23 |
| 19-Apr-2021 |
Jan Svoboda <[email protected]> |
[clang] NFC: Fix range-based for loop warnings related to decl lookup
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
d5c0f00e |
| 17-Mar-2021 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] Mark device var used by host only
Add device variables to llvm.compiler.used if they are ODR-used by either host or device functions.
This is necessary to prevent them from being elimin
[CUDA][HIP] Mark device var used by host only
Add device variables to llvm.compiler.used if they are ODR-used by either host or device functions.
This is necessary to prevent them from being eliminated by whole-program optimization where the compiler has no way to know a device variable is used by some host code.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D98814
show more ...
|
| #
2901dc75 |
| 06-Apr-2021 |
Simon Pilgrim <[email protected]> |
Don't directly dereference getAs<> casts to avoid potential null dereferences. NFCI.
Replace with castAs<> which asserts the cast is valid.
Fixes a number of static analyzer warnings.
|
| #
cc947716 |
| 24-Mar-2021 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] add __builtin_get_device_side_mangled_name
Add builtin function __builtin_get_device_side_mangled_name to get device side manged name for functions and global variables, which can be use
[CUDA][HIP] add __builtin_get_device_side_mangled_name
Add builtin function __builtin_get_device_side_mangled_name to get device side manged name for functions and global variables, which can be used to get symbol address of kernels or variables by mangled name in dynamically loaded bundled code objects at run time.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D99301
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
| #
5cf2a37f |
| 08-Feb-2021 |
Yaxun (Sam) Liu <[email protected]> |
[HIP] Emit kernel symbol
Currently clang uses stub function to launch kernel. This is inconvenient to interop with C++ programs since the stub function has different name as kernel, which is require
[HIP] Emit kernel symbol
Currently clang uses stub function to launch kernel. This is inconvenient to interop with C++ programs since the stub function has different name as kernel, which is required by ROCm debugger.
This patch emits a variable symbol which has the same name as the kernel and uses it to register and launch the kernel. This allows C++ program to launch a kernel by using the original kernel name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D86376
show more ...
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
47acdec1 |
| 19-Jan-2021 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc
For -fgpu-rdc mode, static device vars in different TU's may have the same name. To support accessing file-scope stati
[CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc
For -fgpu-rdc mode, static device vars in different TU's may have the same name. To support accessing file-scope static device variables in host code, we need to give them a distinct name and external linkage. This can be done by postfixing each static device variable with a distinct CUID (Compilation Unit ID) hash.
Since the static device variables have different name across compilation units, now we let them have external linkage so that they can be looked up by the runtime.
Reviewed by: Artem Belevich, and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
show more ...
|
| #
a3ce7f5c |
| 05-Feb-2021 |
Yaxun (Sam) Liu <[email protected]> |
[HIP] Fix managed variable linkage
Currently managed variables are emitted as undefined symbols, which causes difficulty for diagnosing undefined symbols for non-managed variables.
This patch trans
[HIP] Fix managed variable linkage
Currently managed variables are emitted as undefined symbols, which causes difficulty for diagnosing undefined symbols for non-managed variables.
This patch transforms managed variables in device compilation so that they can be emitted as normal variables.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D96195
show more ...
|