History log of /llvm-project-15.0.7/clang/lib/CodeGen/CGCUDANV.cpp (Results 1 – 25 of 93)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# b370be37 13-Jul-2022 Joseph Huber <[email protected]>

[CUDA] Allow the new driver to compile CUDA in non-RDC mode

The new driver primarily allows us to support RDC-mode compilations with
proper linking. This is not needed for non-RDC mode compilation,

[CUDA] Allow the new driver to compile CUDA in non-RDC mode

The new driver primarily allows us to support RDC-mode compilations with
proper linking. This is not needed for non-RDC mode compilation, but we
still would like the new driver to be able to handle this mode so we can
transition away from the old driver in the future. This patch adds the
necessary code to support creating a fatbinary for CUDA code generation
as well as removing old assumptions and errors about RDC-mode with the
new driver.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D129655

show more ...


# e88d53d2 29-Jun-2022 Joseph Huber <[email protected]>

[HIP] Generate offloading entries for HIP with the new driver.

This patch adds the small change required to output offloading entried
for HIP instead of CUDA. These should be placed in different sec

[HIP] Generate offloading entries for HIP with the new driver.

This patch adds the small change required to output offloading entried
for HIP instead of CUDA. These should be placed in different sections so
because they need to be distinct to the offloading toolchain, otherwise
we'd have HIP trying to register CUDA kernels or vice-versa. This patch will
precede support for HIP in the linker wrapper.

Reviewed By: yaxunl, tra

Differential Revision: https://reviews.llvm.org/D128850

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# 1bae02b7 18-May-2022 Joseph Huber <[email protected]>

[Cuda] Use fallback method to mangle externalized decls if no CUID given

CUDA requires that static variables be visible to the host when
offloading. However, The standard semantics of a stiatc varia

[Cuda] Use fallback method to mangle externalized decls if no CUID given

CUDA requires that static variables be visible to the host when
offloading. However, The standard semantics of a stiatc variable dictate
that it should not be visible outside of the current file. In order to
access it from the host we need to perform "externalization" on the
static variable on the device. This requires generating a semi-unique
name that can be affixed to the variable as to not cause linker errors.

This is currently done using the CUID functionality, an MD5 hash value
set up by the clang driver. This allows us to achieve is mostly unique
ID that is unique even between multiple compilations of the same file.
However, this is not always availible. Instead, this patch uses the
unique ID from the file to generate a unique symbol name. This will
create a unique name that is consistent between the host and device side
compilations without requiring the CUID to be entered by the driver. The
one downside to this is that we are no longer stable under multiple
compilations of the same file. However, this is a very niche use-case
and is not supported by Nvidia's CUDA compiler so it likely to be good
enough.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D125904

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 0035f715 15-Mar-2022 Joseph Huber <[email protected]>

[CUDA] Create offloading entries when using the new driver

The changes made in D123460 generalized the code generation for OpenMP's
offloading entries. We can use the same scheme to register globals

[CUDA] Create offloading entries when using the new driver

The changes made in D123460 generalized the code generation for OpenMP's
offloading entries. We can use the same scheme to register globals for
CUDA code. This patch adds the code generation to create these
offloading entries when compiling using the new offloading driver mode.
The offloading entries are simple structs that contain the information
necessary to register the global. The struct used is as follows:

```
Type struct __tgt_offload_entry {
void *addr; // Pointer to the offload entry info.
// (function or global)
char *name; // Name of the function or global.
size_t size; // Size of the entry info (0 if it a function).
int32_t flags;
int32_t reserved;
};
```

Currently CUDA handles RDC code generation by deferring the registration
of globals in the current TU to a callback function containing the
modules ID. Later all the module IDs will be used to register all of the
globals at once. Rather than mimic this, offloading entries allow us to
mimic the way OpenMP registers globals. That is, we create a simple
global struct for each device global to be registered. These are placed
at a special section `cuda_offloading_entires`. Because this section is
a valid C-identifier, the linker will profide a `__start` and `__stop`
pointer that we can use to iterate and register all globals at runtime.

the registration requires a flag variable to indicate which registration
function to use. I have assigned the flags somewhat arbitrarily, but
these use the following values.

Kernel: 0
Variable: 0
Managed: 1
Surface: 2
Texture: 3

Depends on D120272

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123471

show more ...


# 62501bc4 03-May-2022 Yaxun (Sam) Liu <[email protected]>

[NFC][CUDA][HIP] rework mangling number for aux target

CUDA/HIP needs to mangle for aux target. When mangling for aux target,
the mangler should use mangling number for aux target. Previously
in htt

[NFC][CUDA][HIP] rework mangling number for aux target

CUDA/HIP needs to mangle for aux target. When mangling for aux target,
the mangler should use mangling number for aux target. Previously
in https://reviews.llvm.org/D122734 a state was introduced in
ASTContext to let the mangler get mangling number for aux target
from ASTContext. This patch removes that state from ASTConext
and add an IsAux member to MangleContext to indicate that
the mangle context is for aux target. This reflects the reality that
the mangle context is created for mangling aux target and makes
ASTContext cleaner.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D124842

show more ...


# 11d3e31c 30-Mar-2022 Yaxun (Sam) Liu <[email protected]>

[CUDA][HIP] Fix mangling number for local struct

MSVC and Itanium mangling use different mangling numbers
for function-scope structs, which causes inconsistent
mangled kernel names in device and hos

[CUDA][HIP] Fix mangling number for local struct

MSVC and Itanium mangling use different mangling numbers
for function-scope structs, which causes inconsistent
mangled kernel names in device and host compilations.

This patch uses Itanium mangling number for structs
in for mangling device side names in CUDA/HIP host
compilation on Windows to fix this issue.

A state is added to ASTContext to indicate whether the
current name mangling is for device side names in host
compilation. Device and host mangling number
are encoded/decoded as upper and lower half of 32 bit
unsigned integer to fit into the original mangling number
field for AST. Diagnostic will be emitted if a manglining
number exceeds limit.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D122734

Fixes: SWDEV-328515

show more ...


# 4ea1d435 08-Apr-2022 Yaxun (Sam) Liu <[email protected]>

[CUDA][HIP] Externalize kernels in anonymous name space

kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: https://github.com/llvm/llvm-project/issues/5456

[CUDA][HIP] Externalize kernels in anonymous name space

kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: https://github.com/llvm/llvm-project/issues/54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353

show more ...


# e4903d8b 08-Apr-2022 Jonas Hahnfeld <[email protected]>

[CUDA/HIP] Remove argument from module ctor/dtor signatures

In theory, constructors can take arguments when called via .init_array
where at least glibc passes in (argc, argv, envp). This isn't used

[CUDA/HIP] Remove argument from module ctor/dtor signatures

In theory, constructors can take arguments when called via .init_array
where at least glibc passes in (argc, argv, envp). This isn't used in
the generated code and if it was, the first argument should be an
integer, not a pointer. For destructors registered via atexit, the
function should never take an argument.

Differential Revision: https://reviews.llvm.org/D123370

show more ...


# b8f0e128 22-Mar-2022 Nikita Popov <[email protected]>

[CodeGen] Remove some uses of deprecated Address constructor

Remove two stray uses in CodeGenModule and CGCUDANV.


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# 9d899d8f 22-Feb-2022 Yaxun (Sam) Liu <[email protected]>

[HIP] Support `-fgpu-default-stream`

Introduce -fgpu-default-stream={legacy|per-thread} option to
support per-thread default stream for HIP runtime.

When -fgpu-default-stream=per-thread, HIP kernel

[HIP] Support `-fgpu-default-stream`

Introduce -fgpu-default-stream={legacy|per-thread} option to
support per-thread default stream for HIP runtime.

When -fgpu-default-stream=per-thread, HIP kernels are
launched through hipLaunchKernel_spt instead of
hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1
is defined by the preprocessor to enable other per-thread stream
API's.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D120298

show more ...


# 50650766 16-Feb-2022 Nikita Popov <[email protected]>

[CodeGen] Rename deprecated Address constructor

To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().

While d

[CodeGen] Rename deprecated Address constructor

To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().

While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.

show more ...


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 3b172f60 02-Dec-2021 Yaxun (Sam) Liu <[email protected]>

[HIP] Fix -fgpu-rdc for Windows

This patch fixes issues for -fgpu-rdc for Windows MSVC
toolchain:

Fix COFF specific section flags and remove section types
in llvm-mc input file for Windows.

Escape

[HIP] Fix -fgpu-rdc for Windows

This patch fixes issues for -fgpu-rdc for Windows MSVC
toolchain:

Fix COFF specific section flags and remove section types
in llvm-mc input file for Windows.

Escape fatbin path in llvm-mc input file.

Add -triple option to llvm-mc.

Put __hip_gpubin_handle in comdat when it has linkonce_odr
linkage.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D115039

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 80072fde 04-Nov-2021 Yaxun (Sam) Liu <[email protected]>

[CUDA][HIP] Allow comdat for kernels

Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also r

[CUDA][HIP] Allow comdat for kernels

Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also requires these symbols be in comdat sections. Linux does not require
the symbols in comdat sections to be merged by linker but by default
clang puts them in comdat sections.

If a template kernel is instantiated identically in two TU's. MSVC requires
that them to be in comdat sections, otherwise MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes link error for MSVC.

This patch allows putting instantiated template kernels into comdat sections.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D112492

show more ...


# 1b758925 29-Oct-2021 Jay Foad <[email protected]>

[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction

createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into

[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction

createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.

Differential Revision: https://reviews.llvm.org/D112791

show more ...


# 9fb9c6b9 19-Oct-2021 Uday Bondhugula <[email protected]>

[Clang][NFC] Clang CUDA codegen clean-up

Update an instance of dyn_cast -> cast and other NFC clang-tidy fixes
for Clang CUDA codegen.

Differential Revision: https://reviews.llvm.org/D112284


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# 6d3e7c78 17-Jul-2021 Nikita Popov <[email protected]>

[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type

Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type was

[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type

Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type wasn't
immediately obvious to me.

show more ...


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# eba69b59 23-Apr-2021 Erich Keane <[email protected]>

Reimplement __builtin_unique_stable_name-

The original version of this was reverted, and @rjmcall provided some
advice to architect a new solution. This is that solution.

This implements a builtin

Reimplement __builtin_unique_stable_name-

The original version of this was reverted, and @rjmcall provided some
advice to architect a new solution. This is that solution.

This implements a builtin to provide a unique name that is stable across
compilations of this TU for the purposes of implementing the library
component of the unnamed kernel feature of SYCL. It does this by
running the Itanium mangler with a few modifications.

Because it is somewhat common to wrap non-kernel-related lambdas in
macros that aren't present on the device (such as for logging), this
uniquely generates an ID for all lambdas involved in the naming of a
kernel. It uses the lambda-mangling number to do this, except replaces
this with its own number (starting at 10000 for readabililty reasons)
for lambdas used to name a kernel.

Additionally, this implements itself as constexpr with a slight catch:
if a name would be invalidated by the use of this lambda in a later
kernel invocation, it is diagnosed as an error (see the Sema tests).

Differential Revision: https://reviews.llvm.org/D103112

show more ...


# 98575708 11-May-2021 Yaxun (Sam) Liu <[email protected]>

[CUDA][HIP] Fix device template variables

Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:

https://godbolt.org/z/fneEff

[CUDA][HIP] Fix device template variables

Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:

https://godbolt.org/z/fneEfferY

This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.

It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102237

show more ...


# 6a72ed23 19-Apr-2021 Jan Svoboda <[email protected]>

[clang] NFC: Fix range-based for loop warnings related to decl lookup


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# d5c0f00e 17-Mar-2021 Yaxun (Sam) Liu <[email protected]>

[CUDA][HIP] Mark device var used by host only

Add device variables to llvm.compiler.used if they are
ODR-used by either host or device functions.

This is necessary to prevent them from being
elimin

[CUDA][HIP] Mark device var used by host only

Add device variables to llvm.compiler.used if they are
ODR-used by either host or device functions.

This is necessary to prevent them from being
eliminated by whole-program optimization
where the compiler has no way to know a device
variable is used by some host code.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D98814

show more ...


# 2901dc75 06-Apr-2021 Simon Pilgrim <[email protected]>

Don't directly dereference getAs<> casts to avoid potential null dereferences. NFCI.

Replace with castAs<> which asserts the cast is valid.

Fixes a number of static analyzer warnings.


# cc947716 24-Mar-2021 Yaxun (Sam) Liu <[email protected]>

[CUDA][HIP] add __builtin_get_device_side_mangled_name

Add builtin function __builtin_get_device_side_mangled_name
to get device side manged name for functions and global
variables, which can be use

[CUDA][HIP] add __builtin_get_device_side_mangled_name

Add builtin function __builtin_get_device_side_mangled_name
to get device side manged name for functions and global
variables, which can be used to get symbol address of kernels
or variables by mangled name in dynamically loaded
bundled code objects at run time.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99301

show more ...


Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# 5cf2a37f 08-Feb-2021 Yaxun (Sam) Liu <[email protected]>

[HIP] Emit kernel symbol

Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is require

[HIP] Emit kernel symbol

Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is required by ROCm debugger.

This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows C++ program to
launch a kernel by using the original kernel name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D86376

show more ...


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# 47acdec1 19-Jan-2021 Yaxun (Sam) Liu <[email protected]>

[CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc

For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope stati

[CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc

For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.

Since the static device variables have different name across compilation units, now we let
them have external linkage so that they can be looked up by the runtime.

Reviewed by: Artem Belevich, and Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D85223

show more ...


# a3ce7f5c 05-Feb-2021 Yaxun (Sam) Liu <[email protected]>

[HIP] Fix managed variable linkage

Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.

This patch trans

[HIP] Fix managed variable linkage

Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.

This patch transforms managed variables in device compilation so that
they can be emitted as normal variables.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D96195

show more ...


1234