split-vector-insert.ll - OpenGrok history log for /llvm-project-15.0.7/llvm/test/CodeGen/AArch64/split-vector-insert.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# a83aa33d	16-Jun-2022	Bradley Smith	[IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundamental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace.
Differential Revision: https://reviews.llvm.org/D127976
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# d0ea42a7	12-Apr-2022	Momchil Velikov	[AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill
Differential Revision: https://reviews.llvm.org/D112330
Revision tags: llvmorg-14.0.1
# 62a983eb	06-Apr-2022	Daniil Kovalev	Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if"
This reverts commit 83a798d4b0e17ac41d5430f1290d3661343eee1e.
As discussed in D120714 with @thakis, the patch added unneeded complexity without noticeable benefits.
# 83a798d4	06-Apr-2022	Daniil Kovalev	[CodeGen] Place SDNode debug ID declaration under appropriate #if
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed.
Differential Revision: https://reviews.llvm.org/D120714
# 50a97aac	24-Mar-2022	Momchil Velikov	[AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
After the `stp` is executed, the (initial) rule for the CFA still says the CFA is `sp`, even though `sp` has already moved down by 16 bytes.
Correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (this patch is just a step towards it) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
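To see why the unpatched prologue is wrong only in the middle, the following Python sketch replays the commit's example prologue and checks, at each instruction boundary, whether the CFI rule then in effect recovers the CFA (the value `sp` had on entry). It is a toy model, not LLVM's implementation: the `(operation, argument)` encoding, the operation names and `cfa_correct_at_each_pc` are all hypothetical, and only the instructions and directives from the example are modeled.

```python
from typing import List, Tuple

# Hypothetical toy encoding of the commit's example: (operation, argument).
Item = Tuple[str, int]

ORIGINAL_PROLOGUE: List[Item] = [
    ("stp_pre", 16),         # stp x29, x30, [sp, #-16]!
    ("mov_fp_sp", 0),        # mov x29, sp
    ("cfi_def_cfa_fp", 16),  # .cfi_def_cfa w29, 16
]

FIXED_PROLOGUE: List[Item] = [
    ("stp_pre", 16),             # stp x29, x30, [sp, #-16]!
    ("cfi_def_cfa_offset", 16),  # .cfi_def_cfa_offset 16
    ("mov_fp_sp", 0),            # mov x29, sp
    ("cfi_def_cfa_fp", 16),      # .cfi_def_cfa w29, 16
]


def cfa_correct_at_each_pc(seq: List[Item], entry_sp: int = 0x1000) -> List[bool]:
    """For each instruction boundary, report whether the CFI rule in effect
    there recovers the CFA (the value sp had on function entry)."""
    regs = {"sp": entry_sp, "x29": 0}
    rule = ("sp", 0)  # initial AArch64 CFA rule: CFA = sp + 0
    results: List[bool] = []

    def record() -> None:
        reg, off = rule
        results.append(regs[reg] + off == entry_sp)

    for op, arg in seq:
        if op == "stp_pre":
            record()                  # PC is at this instruction
            regs["sp"] -= arg         # pre-decrement sp, then store the pair
        elif op == "mov_fp_sp":
            record()
            regs["x29"] = regs["sp"]
        elif op == "cfi_def_cfa_offset":
            rule = (rule[0], arg)     # same register, new offset
        elif op == "cfi_def_cfa_fp":
            rule = ("x29", arg)       # CFA = x29 + arg
        else:
            raise ValueError(f"unmodeled operation: {op}")
    record()                          # boundary after the last instruction
    return results


print(cfa_correct_at_each_pc(ORIGINAL_PROLOGUE))  # [True, False, True]
print(cfa_correct_at_each_pc(FIXED_PROLOGUE))     # [True, True, True]
```

The single `False` is the boundary right after the `stp`: the CFA is then `sp + 16`, but without `.cfi_def_cfa_offset 16` the rule still reads `sp + 0`.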
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 85c53c70	04-Mar-2022	Hans Wennborg	Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:
(StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.
when targeting iOS. See comment on the code review for a reproducer.
> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411
This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
# 63c9aca1	02-Mar-2022	Momchil Velikov	Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.
It causes test failures that look like an infinite loop in asan/hwasan unwinding.
# 74319d67	02-Mar-2022	Momchil Velikov	[AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411, this change makes the unwind information instruction-precise in function epilogues.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112330
Revision tags: llvmorg-14.0.0-rc2
# 32e8b550	28-Feb-2022	Momchil Velikov	[AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
After the `stp` is executed, the (initial) rule for the CFA still says the CFA is `sp`, even though `sp` has already moved down by 16 bytes.
Correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (this patch is just a step towards it) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 2e585dd9	07-Dec-2021	Matt Devereau	[AArch64][SVE] Lower vector.insert to predicated merged MOV
Use predicated SEL for vector.insert instead of going through memory.
Differential Revision: https://reviews.llvm.org/D115259
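In SVE, the merging predicated `MOV Zd.T, Pg/M, Zn.T` is an alias of `SEL Zd.T, Pg, Zn.T, Zd.T`, which is why this entry uses both names. The Python sketch below models the idea of inserting without a stack round trip: build a predicate covering the destination lanes and select lane-wise. Restricting it to index 0, assuming the subvector already occupies the low lanes of a second register, and the helper names are assumptions for illustration, not details taken from the patch.

```python
from typing import List, Sequence


def ptrue_vl(num_lanes: int, active: int) -> List[bool]:
    """Predicate with only the first `active` lanes set, similar in effect
    to an SVE PTRUE with a VL<n> pattern."""
    return [lane < active for lane in range(num_lanes)]


def sel(pred: Sequence[bool], zn: Sequence[int], zm: Sequence[int]) -> List[int]:
    """Lane-wise SEL Zd, Pg, Zn, Zm: take Zn where Pg is set, else Zm."""
    return [a if p else b for p, a, b in zip(pred, zn, zm)]


def insert_at_zero_via_sel(dest: Sequence[int], sub_reg: Sequence[int],
                           sub_len: int) -> List[int]:
    """Hypothetical model: insert the low `sub_len` lanes of `sub_reg` into
    `dest` at index 0 with one predicated select instead of store/reload."""
    pred = ptrue_vl(len(dest), sub_len)
    return sel(pred, sub_reg, dest)


# A 4-lane destination (e.g. nxv2i64 at vscale = 2) and a 2-lane subvector
# held in the low lanes of another register (its upper lanes are don't-care).
print(insert_at_zero_via_sel([10, 11, 12, 13], [1, 2, 0, 0], 2))  # [1, 2, 12, 13]
```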
Revision tags: llvmorg-13.0.1-rc1
# adec9223	09-Oct-2021	David Green	[AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others: a blend of the performance options, hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model.
The idea is that in-order CPUs require the most help in instruction scheduling, whereas out-of-order CPUs can, for the most part, schedule around different codegen themselves. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of single-core and multicore workloads, all running on a Cortex-A55 cluster.
On an out-of-order CPU the results are a lot noisier but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78, there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise; the cycle count is noisier, and a 0.15% decrease is not significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count, leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general.
When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so it should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does; it only changes the schedule used.
A lot of existing tests have been updated. This is a summary of the important differences:
- Most changes are the same instructions in a different order.
- Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function.
- misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now.
- neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me.
- Some SVE tests do not always remove movprfx where they did before, due to different register allocation giving different destructive forms.
- The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDRs where they previously produced an LDP, due to store-pair-suppress kicking in.
- arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LDP.
- Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling.
- In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly, if I switch this to use any other schedule, even less is outlined.
Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen and places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average.
Differential Revision: https://reviews.llvm.org/D110830
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 0d8cd4e2	03-Aug-2021	Jason Molenda	[AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value,
    add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
    subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.
Differential Revision: https://reviews.llvm.org/D107196
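The rule in this entry comes down to one shift: `#291, lsl #12` encodes 291 × 2^12 = 1191936. The following Python sketch restates the printing rule; it is not the C++ in `AArch64InstPrinter`, and `add_sub_imm_comment` and `format_add_sub_imm` are hypothetical names.

```python
from typing import Optional


def add_sub_imm_comment(imm: int, shift: int) -> Optional[str]:
    """Comment for an ADD/SUB immediate: emitted only when the immediate is
    shifted, since for an unshifted one it would repeat the printed value."""
    if shift == 0:
        return None
    return f"={imm << shift}"


def format_add_sub_imm(imm: int, shift: int) -> str:
    """Render the immediate operand plus its optional trailing comment."""
    text = f"#{imm}" if shift == 0 else f"#{imm}, lsl #{shift}"
    comment = add_sub_imm_comment(imm, shift)
    return text if comment is None else f"{text} ; {comment}"


print(format_add_sub_imm(291, 12))  # #291, lsl #12 ; =1191936
print(format_add_sub_imm(256, 0))   # #256
```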
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4
# 00291150	28-Jun-2021	Bradley Smith	[TargetLowering][AArch64][SVE] Take into account accessed type when clamping address
When clamping the index for a memory access to a stacked vector we must take into account the entire type being accessed, not just assume that we are accessing only a single element.
Differential Revision: https://reviews.llvm.org/D105016
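A Python sketch of the bounds arithmetic behind this fix, assuming a non-negative index and an access no wider than the vector: clamping to `num_elts - 1` only protects a single-element access, whereas a multi-element access must start no later than `num_elts - access_elts`. The function names are hypothetical, and the real code works with a runtime element count for scalable vectors and then turns the clamped index into a byte address.

```python
def clamp_single_element(idx: int, num_elts: int) -> int:
    """Clamp assuming only the one element at `idx` is accessed."""
    return min(idx, num_elts - 1)


def clamp_for_access(idx: int, num_elts: int, access_elts: int) -> int:
    """Clamp so the whole window [idx, idx + access_elts) stays in bounds.
    Assumes 1 <= access_elts <= num_elts."""
    return min(idx, num_elts - access_elts)


def fits(idx: int, num_elts: int, access_elts: int) -> bool:
    """True if the accessed window lies inside the vector."""
    return 0 <= idx and idx + access_elts <= num_elts


# An 8-element stacked vector, accessing a 4-element subvector at index 6.
old = clamp_single_element(6, 8)  # 6: window 6..9 runs past the end
new = clamp_for_access(6, 8, 4)   # 4: window 4..7 stays in bounds
print(old, fits(old, 8, 4))       # 6 False
print(new, fits(new, 8, 4))       # 4 True
```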
Revision tags: llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 83f5fa51	16-Apr-2021	David Sherwood	[CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can test if the index is less than the minimum number of elements in the vector. If so, we can simply return the index because we know it is guaranteed to fit inside the vector.
Differential Revision: https://reviews.llvm.org/D100639
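The shortcut works because a scalable vector always has at least its minimum element count (vscale ≥ 1). The Python sketch below separates the compile-time decision from the runtime fallback; `clamp_constant_index` is hypothetical, and modeling vscale as an ordinary argument is only a stand-in for a value the compiler cannot see.

```python
from typing import Tuple


def clamp_constant_index(idx: int, min_elts: int, vscale: int) -> Tuple[int, bool]:
    """Return (clamped index, whether a runtime clamp was needed).

    `idx` and `min_elts` are compile-time constants; `vscale` stands in for
    the runtime multiplier (>= 1) of the vector's element count.
    """
    if idx < min_elts:
        # Every instance of the vector has at least min_elts elements,
        # so the index is in range whatever vscale turns out to be.
        return idx, False
    # Otherwise a clamp against the actual element count is still required.
    num_elts = vscale * min_elts
    return min(idx, num_elts - 1), True


print(clamp_constant_index(1, 2, 4))  # (1, False)
print(clamp_constant_index(5, 2, 2))  # (3, True)
```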
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# ad85e396	12-Jan-2021	Cullen Rhodes	[SVE] Add ISel pattern for addvl
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D94504
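This entry gives no detail about the pattern, so the following is only a sketch based on the architectural definition of `ADDVL`: `ADDVL Xd, Xn, #imm` adds `imm` times the SVE vector length in bytes, with `imm` a signed 6-bit value (-32 to 31). In LLVM's AArch64 model one vscale unit is 128 bits (16 bytes), so an add of `vscale * C` bytes is a candidate when `C` is a multiple of 16 and the quotient fits. `match_addvl` is a hypothetical matcher; the patch's actual ISel pattern may differ.

```python
from typing import Optional

BYTES_PER_VSCALE = 16  # one vscale unit is a 128-bit SVE granule


def addvl(xn: int, imm: int, vscale: int) -> int:
    """ADDVL Xd, Xn, #imm: Xn plus imm times the vector length in bytes."""
    if not -32 <= imm <= 31:
        raise ValueError("ADDVL immediate must fit in a signed 6-bit field")
    return xn + imm * BYTES_PER_VSCALE * vscale


def match_addvl(bytes_per_vscale_unit: int) -> Optional[int]:
    """Hypothetical matcher: for an add of (vscale * C) bytes, return the
    ADDVL immediate if C is a whole number of vectors that fits, else None."""
    if bytes_per_vscale_unit % BYTES_PER_VSCALE != 0:
        return None
    imm = bytes_per_vscale_unit // BYTES_PER_VSCALE
    return imm if -32 <= imm <= 31 else None


print(match_addvl(32), match_addvl(24))  # 2 None
```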
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# d863a0dd	09-Dec-2020	Joe Ellis	[SelectionDAG] Implement SplitVecOp_INSERT_SUBVECTOR
This function is needed when it is necessary to split the subvector operand of an llvm.experimental.vector.insert call. Splitting the subvector operand means performing two insertions: one inserting the lower part of the split subvector into the destination vector, and another inserting the upper part.
Through experimenting, it seems quite rare to need to split the subvector operand, but this is necessary to avoid assertion errors.
Differential Revision: https://reviews.llvm.org/D92760
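The two-insertion scheme described in this entry can be checked on plain Python lists: inserting the lower half at `idx` and the upper half at `idx` plus the lower half's length gives the same result as one insertion of the whole subvector. This is a sketch with fixed-length lists and hypothetical helper names; for scalable subvectors the upper half's offset is itself a multiple of vscale, which the list model ignores.

```python
from typing import List, Sequence


def insert_subvector(dest: Sequence[int], sub: Sequence[int], idx: int) -> List[int]:
    """Model of vector.insert: overwrite dest[idx:idx+len(sub)] with sub."""
    if idx < 0 or idx + len(sub) > len(dest):
        raise ValueError("subvector does not fit at this index")
    out = list(dest)
    out[idx:idx + len(sub)] = sub
    return out


def split_insert_subvector(dest: Sequence[int], sub: Sequence[int], idx: int) -> List[int]:
    """Insert `sub` as two parts: the lower part at idx, the upper right after."""
    half = len(sub) // 2
    lo, hi = sub[:half], sub[half:]
    tmp = insert_subvector(dest, lo, idx)
    return insert_subvector(tmp, hi, idx + len(lo))


dest = list(range(8))
print(split_insert_subvector(dest, [100, 101, 102, 103], 2))
# [0, 1, 100, 101, 102, 103, 6, 7]
```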