| #
de34a940 |
| 18-Nov-2021 |
Phoebe Wang <[email protected]> |
[X86] Add -mskip-rax-setup support to align with GCC
The AMD64 ABI requires the caller to specify the number of SSE registers used when passing variable arguments. GCC provides the option -mskip-rax-setup to skip the setup of rax when SSE is disabled. This helps reduce code size; see pr23258.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112413
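The ABI rule described above applies to any call of a variadic function. The following is a minimal sketch of such a function, not code from the patch; the name `sum_doubles` is hypothetical. On x86-64 System V targets, the caller of a function like this places an upper bound on the number of vector registers used for the arguments in %al before the call. When SSE is disabled no vector registers can carry arguments, which is the setup that -mskip-rax-setup lets the compiler omit.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical example function: sums `count` variadic double arguments.
// The doubles are the kind of argument that travels in SSE registers
// under the AMD64 calling convention when SSE is enabled.
double sum_doubles(int count, ...) {
    assert(count >= 0 && "count must not be negative");
    va_list args;
    va_start(args, count);
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, double);  // read the next double argument
    va_end(args);
    return total;
}
```

A call such as `sum_doubles(3, 1.0, 2.0, 3.5)` passes three floating-point values, so the caller would normally record that before the call.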
|
| #
d0ac215d |
| 14-Nov-2021 |
Kazu Hirata <[email protected]> |
[clang] Use isa instead of dyn_cast (NFC)
|
| #
bb493460 |
| 12-Nov-2021 |
Adrian Kuegel <[email protected]> |
Revert "Implement target_clones multiversioning"
This reverts commit 9deab60ae710f8c4cc810cd680edfb64c803f42d. There is a possibly unintended semantic change.
|
| #
9deab60a |
| 05-Nov-2021 |
Erich Keane <[email protected]> |
Implement target_clones multiversioning
As discussed here: https://lwn.net/Articles/691932/
GCC6.0 adds target_clones multiversioning. This functionality is an odd cross between the cpu_dispatch and 'target' MV, but is compatible with neither.
This attribute allows you to list all options, then emits a separately optimized version of each function per-option (similar to the cpu_specific attribute). It automatically generates a resolver, just like the other two.
The mangling however, is... ODD to say the least. The mangling format is: <normal_mangling>.<option string>.<option ordinal>.
Differential Revision: https://reviews.llvm.org/D51650
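The mangling format quoted above can be illustrated with a small string helper. This is only a sketch of the stated format `<normal_mangling>.<option string>.<option ordinal>`; the function `cloneSymbolName` is hypothetical and is not part of clang, and the ordinal value passed in the usage note is an assumption, since the message does not say how ordinals are numbered.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not clang's API): joins the three parts of a
// target_clones symbol name with dots, following the format in the
// commit message.
std::string cloneSymbolName(const std::string &normalMangling,
                            const std::string &option,
                            unsigned ordinal) {
    assert(!normalMangling.empty() && "need a base symbol name");
    return normalMangling + "." + option + "." + std::to_string(ordinal);
}
```

For example, `cloneSymbolName("_Z3foov", "avx2", 0)` yields `_Z3foov.avx2.0`.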
|
| #
4b3881e9 |
| 10-Nov-2021 |
Yaxun (Sam) Liu <[email protected]> |
Emit hidden hostcall argument for sanitized kernels
The patch https://reviews.llvm.org/D110337 changed how the hidden hostcall argument is emitted for printf. Sanitized kernels also use the hostcall buffer to report an error on invalid memory access, which that patch does not handle, and this leads to a vdi runtime error:
Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D112820
|
| #
80072fde |
| 04-Nov-2021 |
Yaxun (Sam) Liu <[email protected]> |
[CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TUs with linkonce_odr linkage without causing duplicate symbols in the linker. MSVC also requires these symbols to be in comdat sections. Linux does not require the symbols to be in comdat sections for the linker to merge them, but by default clang puts them in comdat sections.
If a template kernel is instantiated identically in two TUs, MSVC requires the instantiations to be in comdat sections; otherwise the MSVC linker diagnoses them as duplicate symbols. However, clang currently does not put instantiated template kernels in comdat sections, which causes link errors with MSVC.
This patch allows putting instantiated template kernels into comdat sections.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D112492
|
| #
9efce0ba |
| 06-Nov-2021 |
Itay Bookstein <[email protected]> |
[clang] Run LLVM Verifier in modes without CodeGen too
Previously, the Backend_Emit{Nothing,BC,LL} modes did not run the LLVM verifier since it is usually added via the TargetMachine::addPassesToEmitFile method according to the DisableVerify parameter. This is called from EmitAssemblyHelper::AddEmitPasses, which is only relevant for BackendAction-s that require CodeGen.
Note:
* In these particular situations the verifier is added to the optimization pipeline rather than the codegen pipeline so that it runs prior to the BC/LL emission pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously, this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that the clang/test/CodeGen/ifunc.c test would pass: the emitIFuncDefinition incorrectly passed the GlobalDecl of the IFunc itself to the call to GetOrCreateLLVMFunction for creating the resolver.
Signed-off-by: Itay Bookstein <[email protected]>
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D113352
|
| #
3b1fd193 |
| 30-Oct-2021 |
Itay Bookstein <[email protected]> |
[CodeGen] Diagnose and reject non-function ifunc resolvers
Signed-off-by: Itay Bookstein <[email protected]>
Reviewed By: MaskRay, erichkeane
Differential Revision: https://reviews.llvm.org/D112868
|
| #
737c4a26 |
| 09-Nov-2021 |
Atmn Patel <[email protected]> |
[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
|
| #
ef717f38 |
| 09-Nov-2021 |
Atmn Patel <[email protected]> |
Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files"
This reverts commit 81a7cad2ffc18f15b732f69d991c8398c979c5ca.
|
| #
81a7cad2 |
| 09-Nov-2021 |
Atmn Patel <[email protected]> |
[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
|
| #
8adb6d6d |
| 07-Nov-2021 |
Benjamin Kramer <[email protected]> |
[clang] Use llvm::reverse. NFCI.
|
| #
848812a5 |
| 01-Nov-2021 |
Itay Bookstein <[email protected]> |
[Verifier] Add verification logic for GlobalIFuncs
Verify that the resolver exists, that it is a defined Function, and that its return type matches the ifunc's type. Add corresponding check to BitcodeReader, change clang to emit the correct type, and fix tests to comply.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112349
|
| #
aad244df |
| 21-Oct-2021 |
Aaron Ballman <[email protected]> |
Revert "AddGlobalAnnotations for function with or without function body."
This reverts commit 121b2252de0eed68f2ddf5f09e924a6c35423d47.
The following code causes a crash in some circumstances:
struct k { ~k() __attribute__((annotate(""))) {} }; void m() { k(); }
|
| #
08ed2160 |
| 20-Oct-2021 |
Itay Bookstein <[email protected]> |
[IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html
The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else. In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases.
This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR.
I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108872
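The hierarchy change described above can be pictured with a toy model. The classes below borrow LLVM's names for readability but are hypothetical, heavily simplified stand-ins and not LLVM's actual API. The sketch assumes only what the message states: after the refactoring a GlobalIFunc is a GlobalObject, getAliaseeObject() on a non-alias returns the object itself, and an alias whose aliasee is an ifunc resolves to that ifunc.

```cpp
#include <cassert>

struct GlobalObject;

// Toy stand-in for the common base of all global values.
struct GlobalValue {
    virtual ~GlobalValue() = default;
    // Returns the object ultimately referred to, looking through aliases.
    virtual const GlobalObject *getAliaseeObject() const = 0;
};

// Non-alias globals refer to themselves.
struct GlobalObject : GlobalValue {
    const GlobalObject *getAliaseeObject() const override { return this; }
};

struct Function : GlobalObject {};

// In this model the ifunc is a GlobalObject that carries a resolver.
struct GlobalIFunc : GlobalObject {
    const Function *resolver;
    explicit GlobalIFunc(const Function *r) : resolver(r) {
        assert(r && "an ifunc needs a resolver");
    }
};

// An alias forwards to whatever its aliasee resolves to.
struct GlobalAlias : GlobalValue {
    const GlobalValue *aliasee;
    explicit GlobalAlias(const GlobalValue *a) : aliasee(a) {
        assert(a && "an alias needs an aliasee");
    }
    const GlobalObject *getAliaseeObject() const override {
        return aliasee->getAliaseeObject();
    }
};
```

With this shape, an alias of an ifunc and the ifunc itself both report the same GlobalIFunc object, which is the alias-to-ifunc case the message says could not be modeled before.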
|
| #
d245f2e8 |
| 17-Oct-2021 |
Kazu Hirata <[email protected]> |
[clang] Use llvm::erase_if (NFC)
|
| #
121b2252 |
| 11-Oct-2021 |
Chris Bieneman <[email protected]> |
AddGlobalAnnotations for function with or without function body.
When AnnotateAttr is on a function, AddGlobalAnnotations is only called in CodeGenModule::EmitGlobalFunctionDefinition, which means an AnnotateAttr on a function declaration without a function body is ignored. This patch moves AddGlobalAnnotations to CodeGenModule::SetFunctionAttributes, so the AnnotateAttr is emitted for a function with or without a function body.
This helps when an AnnotateAttr is on an external function and the AnnotateAttr is consumed at the IR level.
For example, a pass that collects the number of uses of functions marked __attribute((annotate("count_use"))) after optimizations should count any function carrying __attribute((annotate("count_use"))), with or without a function body.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D111109
Patch by: python3kgae (Xiang Li)
|
| #
05392466 |
| 24-Sep-2021 |
Arthur Eubanks <[email protected]> |
Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
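The bit counts in this message can be checked with a little arithmetic. The sketch below assumes an encoding in which 0 means "no alignment recorded" and any other value is log2(alignment) + 1; that encoding and the helper names `encodeAlign` and `maxAlignForBits` are assumptions made for illustration, not taken from the patch. Under that assumption a 5-bit field tops out at 2^30 (1GB), a 4GB alignment encodes as 33 and so needs a sixth bit, and 6 bits could reach 2^62.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: 0 = unspecified, otherwise log2(align) + 1.
unsigned encodeAlign(std::uint64_t align) {
    if (align == 0)
        return 0;
    assert((align & (align - 1)) == 0 && "alignment must be a power of two");
    unsigned log2 = 0;
    while ((align >> log2) != 1)
        ++log2;
    return log2 + 1;
}

// Largest alignment representable in a field of `bits` bits under the
// assumed encoding: the largest field value is 2^bits - 1, which stands
// for an exponent of 2^bits - 2.
std::uint64_t maxAlignForBits(unsigned bits) {
    assert(bits >= 1 && bits <= 6 && "result must fit in 64 bits");
    unsigned maxEncoded = (1u << bits) - 1;
    return std::uint64_t{1} << (maxEncoded - 1);
}
```

Here `maxAlignForBits(5)` is 2^30 and `maxAlignForBits(6)` is 2^62, while `encodeAlign` of 4GB is 33, one more than the 31 a 5-bit field can hold plus one to spare.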
|
| #
569346f2 |
| 06-Oct-2021 |
Arthur Eubanks <[email protected]> |
Revert "Reland [IR] Increase max alignment to 4GB"
This reverts commit 8d64314ffea55f2ad94c1b489586daa8ce30f451.
|
| #
8d64314f |
| 24-Sep-2021 |
Arthur Eubanks <[email protected]> |
Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
|
| #
72cf8b60 |
| 06-Oct-2021 |
Arthur Eubanks <[email protected]> |
Revert "[IR] Increase max alignment to 4GB"
This reverts commit df84c1fe78130a86445d57563dea742e1b85156a.
Breaks some bots
|
| #
df84c1fe |
| 24-Sep-2021 |
Arthur Eubanks <[email protected]> |
[IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
|
| #
aa53785f |
| 23-Sep-2021 |
Arthur Eubanks <[email protected]> |
Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attrib
Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information we need to report diagnostics from within the IR (aside from access to the original source).
One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols.
Previous revisions didn't properly declare the new dependencies.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
|
| #
7833d20f |
| 28-Sep-2021 |
Arthur Eubanks <[email protected]> |
Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2ee0c7f31f34062a44d12aeb0e3a66fd.
Breaks bots
|
| #
2943071e |
| 23-Sep-2021 |
Arthur Eubanks <[email protected]> |
[clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute val
[clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information we need to report diagnostics from within the IR (aside from access to the original source).
One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
|