|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
a83aa33d |
| 16-Jun-2022 |
Bradley Smith <[email protected]> |
[IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been present for a year and a half, hence move them out of
[IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace.
Differential Revision: https://reviews.llvm.org/D127976
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
d0ea42a7 |
| 12-Apr-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill
Differential Revision: https://reviews.llvm.org/D112330
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
62a983eb |
| 06-Apr-2022 |
Daniil Kovalev <[email protected]> |
Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if"
This reverts commit 83a798d4b0e17ac41d5430f1290d3661343eee1e.
As discussed in D120714 with @thakis, the patch added unneed
Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if"
This reverts commit 83a798d4b0e17ac41d5430f1290d3661343eee1e.
As discussed in D120714 with @thakis, the patch added unneeded complexity without noticeable benefits.
show more ...
|
| #
83a798d4 |
| 06-Apr-2022 |
Daniil Kovalev <[email protected]> |
[CodeGen] Place SDNode debug ID declaration under appropriate #if
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed.
Differential
[CodeGen] Place SDNode debug ID declaration under appropriate #if
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed.
Differential Revision: https://reviews.llvm.org/D120714
show more ...
|
| #
50a97aac |
| 24-Mar-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` informat
[AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:
``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ```
after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
85c53c70 |
| 04-Mar-2022 |
Hans Wennborg <[email protected]> |
Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:
(StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64
Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:
(StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.
when targeting iOS. See comment on the code review for reproducer.
> This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411
This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
show more ...
|
| #
63c9aca1 |
| 02-Mar-2022 |
Momchil Velikov <[email protected]> |
Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.
It causes test failures that look like infinite loop in asan/hwasan unwinding.
|
| #
74319d67 |
| 02-Mar-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues.
Reviewed By: MaskRay
[AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112330
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
32e8b550 |
| 28-Feb-2022 |
Momchil Velikov <[email protected]> |
[AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state
[AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:
``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ```
after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
2e585dd9 |
| 07-Dec-2021 |
Matt Devereau <[email protected]> |
[AArch64][SVE] Lower vector.insert to predicated merged MOV
Use predicated SEL for vector.insert instead of going through memory
Differential Revision: https://reviews.llvm.org/D115259
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
adec9223 |
| 09-Oct-2021 |
David Green <[email protected]> |
[AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting
[AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others. A blend of the performance options hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model.
The idea is that in-order cpu's require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of singlecore and multicore workloads, all running on a Cortex-A55 cluster.
On an out-of-order cpu the results are a lot more noisy but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78 there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise, the cycle count is more noisy with a 0.15% decrease not being significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general.
When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does, only changing the schedule used.
A lot of existing tests have updated. This is a summary of the important differences: - Most changes are the same instructions in a different order. - Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function. - misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now. - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me. - Some SVE tests do not always remove movprfx where they did before due to different register allocation giving different destructive forms. - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDR where they previously produced an LDP due to store-pair-suppress kicking in. - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD. - Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling. - In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly if I switch this to use any other scheduled even less is outlined.
Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen, places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average.
Differential Revision: https://reviews.llvm.org/D110830
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
0d8cd4e2 |
| 03-Aug-2021 |
Jason Molenda <[email protected]> |
[AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value, add x9, x0, #291, lsl #12 ; =1191936 but not when the immediate value is
[AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value, add x9, x0, #291, lsl #12 ; =1191936 but not when the immediate value is unshifted, subs x9, x0, #256 ; =256 when the comment adds nothing additional to the reader.
Differential Revision: https://reviews.llvm.org/D107196
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4 |
|
| #
00291150 |
| 28-Jun-2021 |
Bradley Smith <[email protected]> |
[TargetLowering][AArch64][SVE] Take into account accessed type when clamping address
When clamping the index for a memory access to a stacked vector we must take into account the entire type being a
[TargetLowering][AArch64][SVE] Take into account accessed type when clamping address
When clamping the index for a memory access to a stacked vector we must take into account the entire type being accessed, not just assume that we are accessing only a single element.
Differential Revision: https://reviews.llvm.org/D105016
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
83f5fa51 |
| 16-Apr-2021 |
David Sherwood <[email protected]> |
[CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can test if the index is less than the minimu
[CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can test if the index is less than the minimum number of elements in the vector. If so, we can simply return the index because we know it is guaranteed to fit inside the vector.
Differential Revision: https://reviews.llvm.org/D100639
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
| #
ad85e396 |
| 12-Jan-2021 |
Cullen Rhodes <[email protected]> |
[SVE] Add ISel pattern for addvl
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D94504
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
d863a0dd |
| 09-Dec-2020 |
Joe Ellis <[email protected]> |
[SelectionDAG] Implement SplitVecOp_INSERT_SUBVECTOR
This function is needed for when it is necessary to split the subvector operand of an llvm.experimental.vector.insert call. Splitting the subvect
[SelectionDAG] Implement SplitVecOp_INSERT_SUBVECTOR
This function is needed for when it is necessary to split the subvector operand of an llvm.experimental.vector.insert call. Splitting the subvector operand means performing two insertions: one inserting the lower part of the split subvector into the destination vector, and another for inserting the upper part.
Through experimenting, it seems quite rare to need split the subvector operand, but this is necessary to avoid assertion errors.
Differential Revision: https://reviews.llvm.org/D92760
show more ...
|