|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
d0ea42a7 |
| 12-Apr-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill
Differential Revision: https://reviews.llvm.org/D112330
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
50a97aac |
| 24-Mar-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` informat
[AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:
``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ```
after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
85c53c70 |
| 04-Mar-2022 |
Hans Wennborg <[email protected]> |
Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:
(StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64
Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:
(StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.
when targeting iOS. See comment on the code review for reproducer.
> This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411
This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
show more ...
|
| #
63c9aca1 |
| 02-Mar-2022 |
Momchil Velikov <[email protected]> |
Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.
It causes test failures that look like infinite loop in asan/hwasan unwinding.
|
| #
74319d67 |
| 02-Mar-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues.
Reviewed By: MaskRay
[AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112330
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
32e8b550 |
| 28-Feb-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state
[AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:
``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ```
after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
adec9223 |
| 09-Oct-2021 |
David Green <[email protected]> |
[AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting
[AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others. A blend of the performance options hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model.
The idea is that in-order cpu's require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of singlecore and multicore workloads, all running on a Cortex-A55 cluster.
On an out-of-order cpu the results are a lot more noisy but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78 there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise, the cycle count is more noisy with a 0.15% decrease not being significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general.
When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does, only changing the schedule used.
A lot of existing tests have updated. This is a summary of the important differences: - Most changes are the same instructions in a different order. - Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function. - misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now. - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me. - Some SVE tests do not always remove movprfx where they did before due to different register allocation giving different destructive forms. - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDR where they previously produced an LDP due to store-pair-suppress kicking in. - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD. - Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling. - In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly if I switch this to use any other scheduled even less is outlined.
Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen, places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average.
Differential Revision: https://reviews.llvm.org/D110830
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
0d8cd4e2 |
| 03-Aug-2021 |
Jason Molenda <[email protected]> |
[AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value, add x9, x0, #291, lsl #12 ; =1191936 but not when the immediate value is
[AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value, add x9, x0, #291, lsl #12 ; =1191936 but not when the immediate value is unshifted, subs x9, x0, #256 ; =256 when the comment adds nothing additional to the reader.
Differential Revision: https://reviews.llvm.org/D107196
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
033138ea |
| 21-May-2021 |
Nick Desaulniers <[email protected]> |
[IR] make stack-protector-guard-* flags into module attrs
D88631 added initial support for:
- -mstack-protector-guard= - -mstack-protector-guard-reg= - -mstack-protector-guard-offset=
flags, and D
[IR] make stack-protector-guard-* flags into module attrs
D88631 added initial support for:
- -mstack-protector-guard= - -mstack-protector-guard-reg= - -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags aren't retained for LTO. Make them module attributes rather than TargetOptions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
show more ...
|
| #
0f417789 |
| 17-May-2021 |
Nick Desaulniers <[email protected]> |
[AArch64] Support customizing stack protector guard
Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags:
1. -mstack-protector-guard=sysreg 2. -mstack-protector-guard-r
[AArch64] Support customizing stack protector guard
Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags:
1. -mstack-protector-guard=sysreg 2. -mstack-protector-guard-reg=sp_el0 3. -mstack-protector-guard-offset=0
to use the system register sp_el0 for the stack canary, enabling the kernel to have a unique stack canary per task (like a thread, but not limited to userspace as the kernel can preempt itself).
Address pr/47341 for aarch64.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/289 Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen
Differential Revision: https://reviews.llvm.org/D100919
show more ...
|