Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# ddc9e886 | 28-Jun-2022 | Guozhi Wei <[email protected]>

[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency

Add a new pattern A - (B + C) ==> (A - B) - C to give the machine combiner a chance to evaluate which instruction sequence has lower latency.

Differential Revision: https://reviews.llvm.org/D124564
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# 163c77b2 | 08-Jun-2022 | Serguei Katkov <[email protected]>

[AARCH64 folding] Do not fold any copy with NZCV

There is no instruction to fold NZCV, so just do not do it. Without the fix, the added test case crashes with the assert "Mismatched register size in non subreg COPY".

Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294
# 129b531c | 19-Jun-2022 | Kazu Hirata <[email protected]>
[llvm] Use value_or instead of getValueOr (NFC)
# c42a2255 | 13-Jun-2022 | zhongyunde <[email protected]>
[MachineScheduler] Order more stores by ascending address

Following D125377, we order STP Q stores by ascending address. On some targets, paired 128-bit loads and stores are slow, so the STP will be split into STRQ and STUR; these stores should also be ordered. Also add the subtarget feature ascend-store-address to control the aggressive ordering.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D126700
# 0ff51d5d | 10-Jun-2022 | Eli Friedman <[email protected]>
Fix interaction of CFI instructions with MachineOutliner.

1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through.
2. Make sure copied CFI directives refer to the correct instruction.
Fixes https://github.com/llvm/llvm-project/issues/55842
Differential Revision: https://reviews.llvm.org/D126930
# 3b9707db | 05-Jun-2022 | Kazu Hirata <[email protected]>
[llvm] Convert for_each to range-based for loops (NFC)
Revision tags: llvmorg-14.0.4

# 9c38fc11 | 18-May-2022 | Sander de Smalen <[email protected]>
[AArch64] Remove references to Streaming SVE from target features.

Following discussion on D120261 and D121208, it seems better to remove the concept of Streaming SVE from the subtarget/assembler predicates and instead reason about 'SVE' and 'SME' as the higher-level features, rather than trying to model this runtime mode through explicit feature flags.
This patch is largely NFC.
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D125977
# 5cb14dc5 | 31-May-2022 | David Green <[email protected]>
[AArch64] Look through copy in MachineCombiner FMUL patterns.

This is a small addition to D99662, which added machine combiner patterns for FMUL(DUP(..)). Due to the way these are generated from ISel, they may also appear as FMUL(COPY(DUP(..))); this patch makes the patterns look through the no-op COPY.
Differential Revision: https://reviews.llvm.org/D126632
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2

# de07cde6 | 22-Apr-2022 | Daniel Kiss <[email protected]>
[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.

The autiasp and autibsp instructions are the counterparts of the paciasp/pacibsp instructions, so emit .cfi_negate_ra_state for them too. With the Armv8.3 instruction set, retaa/retab perform the return and the authentication in one step; there we cannot emit .cfi_negate_ra_state, because it would point after the ret* instruction.
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D111780
# d0ea42a7 | 12-Apr-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill
Differential Revision: https://reviews.llvm.org/D112330
Revision tags: llvmorg-14.0.1

# 50a97aac | 24-Mar-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function prologues

Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05

This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says the CFA is in `sp`, even though `sp` is already offset by 16 bytes.

A correct unwind info could look like:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (this patch is just a step towards it) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
# 37b37838 | 16-Mar-2022 | Shengchen Kan <[email protected]>
[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3

# 85c53c70 | 04-Mar-2022 | Hans Wennborg <[email protected]>
Revert "[AArch64] Async unwind - function prologues"

It caused builds to assert with:

    (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See the comment on the code review for a reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
>
> ```
> stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
# e4fa8291 | 03-Mar-2022 | Cullen Rhodes <[email protected]>
[AArch64] Allow copying of SVE registers in Streaming SVE
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D118562
# 63c9aca1 | 02-Mar-2022 | Momchil Velikov <[email protected]>
Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.
It causes test failures that look like an infinite loop in asan/hwasan unwinding.
# 74319d67 | 02-Mar-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function epilogues

As a counterpart of https://reviews.llvm.org/D111411, this change makes the unwind information instruction-precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
Revision tags: llvmorg-14.0.0-rc2

# 32e8b550 | 28-Feb-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function prologues

This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says the CFA is in `sp`, even though `sp` is already offset by 16 bytes.

A correct unwind info could look like:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (this patch is just a step towards it) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
# 25e92920 | 24-Feb-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - helper functions to decide on CFI emission
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D112327
# fd7e59f0 | 24-Feb-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - do not schedule frame setup/destroy

The PostRA scheduler can reorder non-CFI instructions in a way that makes the unwind info not instruction-precise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112326
# 68c718c8 | 23-Feb-2022 | Jessica Paquette <[email protected]>
Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""

This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781.

This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)
# d97f997e | 16-Feb-2022 | Jessica Paquette <[email protected]>
[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner.

The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates.

Imagine a case like this:

```
bb:
  i1
  i2
  i3
  ...
  i123456
```
Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block.
The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries.
This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live.
To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win.
By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate.
Doing all of this is about a 16% compile time improvement on the case.
This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.
# c69af70f | 19-Feb-2022 | Micah Weston <[email protected]>
[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.

Implements the ADDS/SUBS 24-bit immediate optimization using the MIPeepholeOpt pass. This follows the pattern:

Optimize ([adds|subs] r, imm) -> ([ADDS|SUBS] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1 and both imm0 and imm1 are non-zero 12-bit unsigned integers.

Optimize ([adds|subs] r, imm) -> ([SUBS|ADDS] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1 and both imm0 and imm1 are non-zero 12-bit unsigned integers.

The SplitAndOpcFunc type had to change its return type to an Opcode pair so that the first add/sub is the regular instruction and the second is the flag-setting instruction. This required updating the code in the AND case.
Testing:

I ran a two-stage bootstrap with this code. Using the second-stage compiler, I verified that the negation of an ADDS to SUBS (or vice versa) is a valid optimization, e.g. V == -0x111111.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118663
# fc1b2122 | 17-Feb-2022 | Kerry McLaughlin <[email protected]>
[AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

Currently, loading from or storing to a stack location with a structured load or store crashes in isAArch64FrameOffsetLegal as the opcodes are not handled by getMemOpInfo. This patch adds the opcodes for structured load/store instructions with an immediate index to getMemOpInfo & getLoadStoreImmIdx, setting appropriate values for the scale, width & min/max offsets.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D119338
# f3809b20 | 17-Feb-2022 | Pavel Kosov <[email protected]>
[AArch64][SchedModels] Handle virtual registers in FP/NEON predicates

The current implementation of the Check[HSDQ]Form predicates doesn't handle virtual registers and therefore isn't useful for pre-RA scheduling. This patch fixes that by implementing two function predicates: CheckQForm, which checks that an instruction writes a 128-bit NEON register, and CheckFpOrNEON, which checks that an instruction writes an FP register (of any width). The latter supersedes the Check[HSD]Form predicates, which are not used individually.
OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D114642
# 6d58f4ab | 16-Feb-2022 | Jessica Paquette <[email protected]>
[MachineOutliner] NFC: Hide LRU-related stuff behind helper functions

It's not particularly user-friendly to have to call `initLRU` everywhere. Also, it wasn't particularly great that the LRU for registers used in a sequence was also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`
This allows the user to avoid calling `initLRU` explicitly. Also, it allows us to separate initializing the used-in-sequence LRU from the main LRU.
Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this refactor requires that we de-const the Candidate there.
Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`
This is a preparatory commit for a larger compile time related change for the AArch64 outliner.