| #
806bbc49 |
| 21-Feb-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Try to embed offloading objects after codegen
Currently we use the `-fembed-offload-object` option to embed a binary file into the host as a named section. This is currently only used as a codegen action, meaning we only handle this option correctly when the input is a bitcode file. This patch adds the same handling to embed an offloading object after we complete code generation. This allows us to embed the object correctly if the input file is source or bitcode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120270
|
| #
dc152659 |
| 10-Mar-2022 |
Erich Keane <[email protected]> |
Have cpu-specific variants set 'tune-cpu' as an optimization hint
Due to various implementation constraints, despite the programmer choosing a 'processor', cpu_dispatch/cpu_specific needs to use the 'feature' list of a processor to identify it. As a result, the processor named in the source code is not propagated to the optimizer, which therefore cannot tune for it.
This patch changes to use the actual cpu as written for tune-cpu so that opt can make decisions based on the cpu-as-spelled, which should better match the behavior expected by the programmer.
Note that the 'valid' list of processors for x86 is in llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list contains only Intel processors, but other vendors may wish to add their own entries as 'alias'es (or with different feature lists!).
If this is not done, there are two potential performance issues with the patch, but I believe them to be worth it in light of the improvements to behavior and performance.
1- In the event that the user spelled "ProcessorB", but we only have the features available to test for "ProcessorA" (where A is B minus features), AND there is an optimization opportunity for "B" that negatively affects "A", the optimizer will likely choose to do so.
2- In the event that the user spelled VendorI's processor, and the feature list allows it to run on VendorA's processor of similar features, AND there is an optimization opportunity for VendorI's processor that negatively affects VendorA's, the optimizer will likely choose to do so. This can be fixed by adding an alias to X86TargetParser.def.
Differential Revision: https://reviews.llvm.org/D121410
|
| #
f3480390 |
| 29-Jan-2022 |
Itay Bookstein <[email protected]> |
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers
The purpose of this change is to fix the following codegen bug:
```
// main.c
__attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;}
int main() { return *foo() = 5; }

// other.c
__attribute__((cpu_dispatch(generic))) int *foo(void);

// run: clang main.c other.c -o main; ./main
```
This will segfault prior to the change, and return the correct exit code 5 after the change.
The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text.
The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function.
I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`.
Previous discussion in: https://reviews.llvm.org/D112349
Signed-off-by: Itay Bookstein <[email protected]>
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D120266
|
| #
50650766 |
| 16-Feb-2022 |
Nikita Popov <[email protected]> |
[CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to ensure that no new uses are introduced, rename it to Address::deprecated().
While doing the rename, I've filled in element types in cases where it was relatively obvious, but we're still left with 135 calls to the deprecated constructor.
|
| #
6398903a |
| 14-Feb-2022 |
Momchil Velikov <[email protected]> |
Extend the `uwtable` attribute with unwind table kind
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or asynchronous unwind tables (2)`. However, this is encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information whether to generate just some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd estimate something like 80-90% more, as the difference is adding roughly the same number of CFI directives as for prologues, only a bit simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even more, if you consider tail duplication of epilogue blocks. Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of *not* having a finite number of `SP` adjustments is on AArch64 when untagging the stack (MTE): in some cases the compiler can modify `SP` in a loop). Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done. They need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the size of the unwind tables.
That is to say, async unwind tables impose a non-negligible overhead, yet for the most common use cases (like C++ exceptions), they are not even needed.
This patch extends the `uwtable` attribute with an optional value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables
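As a rough illustration of the three spellings listed above, the sketch below maps the `-funwind-tables` values 0, 1 and 2 to a textual attribute and back. The names `UnwindTableKind`, `uwtableAttr` and `parseUwtable` are hypothetical and chosen for this example only; they are not claimed to match the identifiers used in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for this sketch; values mirror -funwind-tables=0|1|2.
enum class UnwindTableKind { None = 0, Sync = 1, Async = 2 };

// Returns the textual attribute for a given kind, or "" when no tables are requested.
inline std::string uwtableAttr(UnwindTableKind K) {
  switch (K) {
  case UnwindTableKind::None:
    return "";
  case UnwindTableKind::Sync:
    return "uwtable(sync)";
  case UnwindTableKind::Async:
    return "uwtable(async)";
  }
  return "";
}

// Parses the textual form; a bare `uwtable` defaults to async, as in the list above.
inline UnwindTableKind parseUwtable(const std::string &Attr) {
  if (Attr == "uwtable(sync)")
    return UnwindTableKind::Sync;
  if (Attr == "uwtable" || Attr == "uwtable(async)")
    return UnwindTableKind::Async;
  return UnwindTableKind::None;
}
```

With this encoding the kind survives a round trip through the attribute text, which is the information the presence-only encoding could not carry.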
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
|
| #
87dd3d35 |
| 11-Feb-2022 |
Arthur Eubanks <[email protected]> |
[clang][OpaquePtr] Remove call to getPointerElementType() in CodeGenModule::GetAddrOfGlobalTemporary()
|
| #
d8f99bb6 |
| 11-Feb-2022 |
Sameer Sahasrabuddhe <[email protected]> |
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implicitarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
|
| #
1d97cb1f |
| 04-Feb-2022 |
Yaxun (Sam) Liu <[email protected]> |
[HIP] Emit amdgpu_code_object_version module flag
The code object version determines the ABI, so code built for different versions should not be mixed.
This patch emits amdgpu_code_object_version module flag in LLVM IR based on code object version (default 4).
The amdgpu_code_object_version value is code object version times 100.
LLVM IR modules with different amdgpu_code_object_version module flag values cannot be linked.
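The encoding and the link restriction can be sketched in a few lines. Both helpers below, `codeObjectVersionFlag` and `canLink`, are hypothetical names for this illustration; the real check is performed by the IR linker on the module flag, not by user code.

```cpp
#include <cassert>

// Hypothetical helper: the module flag value is the code object version times 100.
constexpr unsigned codeObjectVersionFlag(unsigned Version) {
  return Version * 100;
}

// In this sketch two modules may be linked only when their flag values match.
constexpr bool canLink(unsigned FlagA, unsigned FlagB) {
  return FlagA == FlagB;
}
```

For the default version 4 the flag value is 400, and a module carrying 400 would be rejected against one carrying 300.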
The -cc1 option -mcode-object-version=none is for ROCm device library use only, which supports multiple ABIs.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D119026
|
| #
171da443 |
| 04-Feb-2022 |
Yaxun (Sam) Liu <[email protected]> |
[HIPSPV] Fix literals are mapped to Generic address space
This issue is an oversight in D108621.
Literals in HIP are emitted as global constant variables with default address space which maps to Generic address space for HIPSPV. In SPIR-V such variables translate to OpVariable instructions with Generic storage class which are not legal. Fix by mapping literals to CrossWorkGroup address space.
The literals are not mapped to UniformConstant because the "flat" pointers in HIP may reference them and "flat" pointers are modeled as Generic pointers in SPIR-V. In SPIR-V/OpenCL UniformConstant pointers may not be cast to Generic.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D118876
|
| #
853e0aa4 |
| 04-Feb-2022 |
Hans Wennborg <[email protected]> |
Don't dllexport reference temporaries
Even if the reference itself is dllexport, the temporary should not be. In fact, we're already giving it internal linkage, so dllexporting it is not just wasteful, but will fail to link, as in the example below:
  $ cat /tmp/a.cc
  void _DllMainCRTStartup() {}
  const int __declspec(dllexport) &foo = 42;

  $ clang-cl -fuse-ld=lld /tmp/a.cc /Zl /link /dll /out:a.dll
  lld-link: error: <root>: undefined symbol: int const &foo::$RT1
Differential revision: https://reviews.llvm.org/D118980
|
| #
1f08b086 |
| 28-Jan-2022 |
Amilendra Kodithuwakku <[email protected]> |
[clang][ARM] Emit warnings when PACBTI-M is used with unsupported architectures
Branch protection in M-class is supported by
- Armv8.1-M.Main
- Armv8-M.Main
- Armv7-M
Attempting to enable this for other architectures, either by command-line (e.g. -mbranch-protection=bti) or by target attribute in source code (e.g. __attribute__((target("branch-protection=...")))) will generate a warning.
In both cases function attributes related to branch protection will not be emitted. Regardless of the warning, module level attributes related to branch protection will be emitted when it is enabled via the command-line.
The following people also contributed to this patch:
- Victor Campos
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D115501
|
| #
82af9502 |
| 21-Jan-2022 |
Joao Moreira <[email protected]> |
[X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in function prologues. Because this is a security feature, it is desirable that ENDBRs are emitted only for functions that are actually targeted by indirect branches. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end up being reachable through PLT entries, which use an indirect branch. Because this cannot be determined at compile time, the compiler currently emits ENDBRs in every non-local-linkage function.
Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the best-fitting ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs with NOPs directly in the binary. The discussion of this feature can be seen in [1].
This diff enables the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs in address-taken functions, ignoring non-address-taken functions that don't have local linkage.
A comparison between LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.
The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.
[1] - https://lkml.org/lkml/2021/11/22/591
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D116070
|
| #
85c2bd2a |
| 19-Jan-2022 |
Yaxun (Sam) Liu <[email protected]> |
Prevent adding module flag amdgpu_hostcall multiple times
A HIP program with a printf call fails to compile with the -fsanitize=address option because the amdgpu_hostcall module flag is appended twice, once for printf and once for the sanitizer option. This patch fixes that issue.
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Roman Lebedev
Differential Revision: https://reviews.llvm.org/D116216
|
| #
c63a3175 |
| 15-Jan-2022 |
Nikita Popov <[email protected]> |
[AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.
|
| #
2bcba21c |
| 14-Jan-2022 |
Erich Keane <[email protected]> |
[CPU-Dispatch] Make sure Dispatch names get updated if previously mangled
Cases where a cpu-dispatch/cpu-specific function is mangled before the function becomes 'multiversion' (such as a member function) cause the wrong name to be emitted for one of the variants/resolver, since the name is cached. Make sure we invalidate the cache in cpu-dispatch/cpu-specific modes, like we previously did for just target multiversioning.
|
| #
b699e8b1 |
| 13-Jan-2022 |
Erich Keane <[email protected]> |
Add another assert to cpu-dispatch emission to help track down a tough to repro error.
As mentioned yesterday, I've got a problem that I can only reproduce on Godbolt (none of the build configs on my local machine!), so this is at least somewhat usable until I figure out a cause.
|
| #
6e77ad11 |
| 12-Jan-2022 |
Erich Keane <[email protected]> |
Add an assert in cpudispatch emit to try to track down an error.
I'm attempting to debug an issue that I can only get to happen on godbolt, where the cpu-dispatch resolver for an out of line member function is generated with the wrong name, causing a link failure.
|
| #
d2cc6c2d |
| 03-Jan-2022 |
Serge Guelton <[email protected]> |
Use a sorted array instead of a map to store AttrBuilder string attributes
Using a std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies an extra allocation/copy step.
Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by
https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions
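The general technique, keeping key/value pairs in a vector ordered by key and locating entries with a binary search, can be sketched as follows. The class `SortedStringAttrs` is a hypothetical stand-in written with `std::string` for readability; it is not the data structure from the patch, which works on LLVM's own attribute types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for this sketch: string attributes kept sorted by key.
class SortedStringAttrs {
  using Entry = std::pair<std::string, std::string>;
  std::vector<Entry> Attrs;

  // Binary search for the first entry whose key is not less than Key.
  std::vector<Entry>::iterator lowerBound(const std::string &Key) {
    return std::lower_bound(
        Attrs.begin(), Attrs.end(), Key,
        [](const Entry &E, const std::string &K) { return E.first < K; });
  }

public:
  // Inserts a new attribute or overwrites the value of an existing one.
  void set(const std::string &Key, const std::string &Value) {
    auto It = lowerBound(Key);
    if (It != Attrs.end() && It->first == Key)
      It->second = Value;
    else
      Attrs.emplace(It, Key, Value); // Inserting here keeps the vector sorted.
  }

  // Returns a pointer to the stored value, or nullptr if the key is absent.
  const std::string *get(const std::string &Key) {
    auto It = lowerBound(Key);
    return (It != Attrs.end() && It->first == Key) ? &It->second : nullptr;
  }

  std::size_t size() const { return Attrs.size(); }
};
```

Insertion is linear in the number of entries because later elements shift, but with only a handful of attributes per builder the contiguous storage and the absence of per-node allocation tend to outweigh that cost.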
Differential Revision: https://reviews.llvm.org/D116599
|
| #
40446663 |
| 09-Jan-2022 |
Kazu Hirata <[email protected]> |
[clang] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
|
| #
9290ccc3 |
| 04-Jan-2022 |
serge-sans-paille <[email protected]> |
Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of attributes to be removed from an AttrBuilder. Previously AttrBuilder was used both for building and removing, which introduced odd situations like the creation of an Attribute with a dummy value because the only relevant part was the attribute kind.
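The separation of concerns can be illustrated with a small standalone model. Everything below (`AttrKind`, `KindMask`, `Builder`) is hypothetical and greatly simplified; it only shows why a removal mask needs kinds but no values, and does not reproduce LLVM's actual class interfaces.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical attribute kinds for this sketch (not LLVM's real enum).
enum AttrKind { NoInline, NoUnwind, ReadOnly, NumAttrKinds };

// A mask only records which kinds to remove; it carries no attribute values.
class KindMask {
  std::bitset<NumAttrKinds> Kinds;

public:
  KindMask &add(AttrKind K) {
    Kinds.set(K);
    return *this;
  }
  bool contains(AttrKind K) const { return Kinds.test(K); }
};

// A builder that holds attributes (here just presence bits) and can drop a mask.
class Builder {
  std::bitset<NumAttrKinds> Present;

public:
  Builder &add(AttrKind K) {
    Present.set(K);
    return *this;
  }
  // Removes every kind named in the mask; kinds not present are ignored.
  Builder &remove(const KindMask &M) {
    for (int K = 0; K < NumAttrKinds; ++K)
      if (M.contains(static_cast<AttrKind>(K)))
        Present.reset(K);
    return *this;
  }
  bool has(AttrKind K) const { return Present.test(K); }
};
```

Because the mask never stores a value, no dummy attribute has to be constructed just to name a kind for removal.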
Differential Revision: https://reviews.llvm.org/D116110
|
| #
ec2e26ea |
| 10-Aug-2021 |
Sami Tolvanen <[email protected]> |
[Clang] Add __builtin_function_start
Control-Flow Integrity (CFI) replaces references to address-taken functions with pointers to the CFI jump table. This is a problem for low-level code, such as operating system kernels, which may need the address of an actual function body without the jump table indirection.
This change adds the __builtin_function_start() builtin, which accepts an argument that can be constant-evaluated to a function, and returns the address of the function body.
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Depends on D108478
Reviewed By: pcc, rjmccall
Differential Revision: https://reviews.llvm.org/D108479
|
| #
c3b624a1 |
| 15-Dec-2021 |
Nikita Popov <[email protected]> |
[CodeGen] Avoid deprecated ConstantAddress constructor
Change all uses of the deprecated constructor to pass the element type explicitly and drop it.
For cases where the correct element type was not immediately obvious to me or would require a slightly larger change I'm falling back to explicitly calling getPointerElementType() for now.
|
| #
0a14674f |
| 03-Dec-2021 |
Peter Collingbourne <[email protected]> |
CodeGen: Strip exception specifications from function types in CFI type names.
With C++17 the exception specification has been made part of the function type, and therefore part of mangled type names.
However, it's valid to convert function pointers with an exception specification to function pointers with the same argument and return types but without an exception specification, which means that e.g. a function of type "void () noexcept" can be called through a pointer of type "void ()". We must therefore consider the two types to be compatible for CFI purposes.
We can do this by stripping the exception specification before mangling the type name, which is what this patch does.
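The language rule that motivates the change can be seen in a few lines of standard C++. The function names below are made up for this illustration; the snippet only demonstrates the implicit pointer conversion and does not involve CFI itself.

```cpp
#include <cassert>
#include <type_traits>

// In C++17 and later, the type of this function includes noexcept.
inline int answer() noexcept { return 42; }

// The implicit conversion drops noexcept: Plain has type "int (*)()",
// yet it legitimately points at a noexcept function.
inline int callThroughPlainPointer() {
  int (*Plain)() = answer;
  return Plain();
}

// Reports whether the compiler treats the two pointer types as distinct,
// which is the case when compiling as C++17 or newer.
inline bool typesDiffer() {
  return !std::is_same<decltype(&answer), int (*)()>::value;
}
```

A CFI scheme that keyed its checks on the full mangled type would treat the call in `callThroughPlainPointer` as a mismatch under C++17, which is why the exception specification is stripped before mangling.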
Differential Revision: https://reviews.llvm.org/D115015
|
| #
e3b2f022 |
| 01-Dec-2021 |
Ties Stuij <[email protected]> |
[clang][ARM] PACBTI-M frontend support
Handle branch protection option on the commandline as well as a function attribute. One patch for both mechanisms, as they use the same underlying parsing mechanism.
These are recorded in a set of LLVM IR module-level attributes like we do for AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):
- command-line options are "translated" to module-level LLVM IR attributes (metadata).
- functions have PAC/BTI specific attributes iff the __attribute__((target("branch-protection=..."))) was used in the function declaration.
- command-line option -mbranch-protection to armclang targeting Arm, following this grammar:
  branch-protection ::= "-mbranch-protection=" <protection>
  protection ::= "none" | "standard" | "bti" [ "+" <pac-ret-clause> ] | <pac-ret-clause> [ "+" "bti"]
  pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
  pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
b-key is simply a placeholder to make it consistent with AArch64's version. In Arm, however, it triggers a warning informing that b-key is unsupported and a-key will be selected instead.
- Handle __attribute__((target("branch-protection=..."))) for AArch32 with the same grammar as the command-line options.
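To make the grammar above concrete, here is a minimal standalone validator for the `<protection>` part, i.e. the text after `-mbranch-protection=`. The functions `splitPlus`, `eatPacRet` and `isValidProtection` are hypothetical helpers written for this sketch; Clang's real parser lives elsewhere and also emits the b-key warning mentioned above, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits "a+b+c" into {"a", "b", "c"}; empty pieces are kept so they fail to match.
inline std::vector<std::string> splitPlus(const std::string &S) {
  std::vector<std::string> Parts;
  std::size_t Start = 0;
  while (true) {
    std::size_t Pos = S.find('+', Start);
    if (Pos == std::string::npos) {
      Parts.push_back(S.substr(Start));
      return Parts;
    }
    Parts.push_back(S.substr(Start, Pos - Start));
    Start = Pos + 1;
  }
}

// Consumes a pac-ret-clause starting at index I; returns false if none is there.
inline bool eatPacRet(const std::vector<std::string> &T, std::size_t &I) {
  if (I >= T.size() || T[I] != "pac-ret")
    return false;
  ++I;
  // Optional pac-ret-option: "leaf" and "b-key" in either order, each at most once.
  if (I < T.size() && (T[I] == "leaf" || T[I] == "b-key")) {
    const std::string First = T[I++];
    const std::string Other = (First == "leaf") ? "b-key" : "leaf";
    if (I < T.size() && T[I] == Other)
      ++I;
  }
  return true;
}

// Hypothetical validator for the <protection> production of the grammar.
inline bool isValidProtection(const std::string &S) {
  const std::vector<std::string> T = splitPlus(S);
  if (T.size() == 1 && (T[0] == "none" || T[0] == "standard"))
    return true;
  std::size_t I = 0;
  if (T[0] == "bti") {
    // "bti" optionally followed by a pac-ret-clause.
    I = 1;
    if (I < T.size() && !eatPacRet(T, I))
      return false;
  } else {
    // A pac-ret-clause optionally followed by "bti".
    if (!eatPacRet(T, I))
      return false;
    if (I < T.size() && T[I] == "bti")
      ++I;
  }
  // Valid only if every token was consumed.
  return I == T.size();
}
```

Under this reading of the grammar, `pac-ret+b-key+leaf+bti` is accepted while `pac-ret+bti+leaf` is rejected, because the options must directly follow `pac-ret`.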
This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Victor Campos
- Ties Stuij
Reviewed By: vhscampos
Differential Revision: https://reviews.llvm.org/D112421
|
| #
fc53eb69 |
| 29-Nov-2021 |
Erich Keane <[email protected]> |
Reapply 'Implement target_clones multiversioning'
See discussion in D51650, this change was a little aggressive in an error while doing a 'while we were here', so this removes that error condition, as it is apparently useful.
This reverts commit bb4934601d731465e01e2e22c80ce2dbe687d73f.
|