Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# ddc9e886 | 28-Jun-2022 | Guozhi Wei <[email protected]>

[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency

Add a new pattern A - (B + C) ==> (A - B) - C to give the machine combiner a chance to evaluate which instruction sequence has lower latency.

Differential Revision: https://reviews.llvm.org/D124564
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# 163c77b2 | 08-Jun-2022 | Serguei Katkov <[email protected]>

[AARCH64 folding] Do not fold any copy with NZCV

There is no instruction to fold NZCV, so just do not do it. Without the fix, the added test case crashes with the assert "Mismatched register size in non subreg COPY".

Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294
# 129b531c | 19-Jun-2022 | Kazu Hirata <[email protected]>
[llvm] Use value_or instead of getValueOr (NFC)
# c42a2255 | 13-Jun-2022 | zhongyunde <[email protected]>
[MachineScheduler] Order more stores by ascending address

Following D125377, we order STP Q stores by ascending address. On some targets, paired 128-bit loads and stores are slow, so the STP will be split into STRQ and STUR; these stores should also be ordered. Also add the subtarget feature ascend-store-address to control the aggressive ordering.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D126700
# 0ff51d5d | 10-Jun-2022 | Eli Friedman <[email protected]>
Fix interaction of CFI instructions with MachineOutliner.

1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through.
2. Make sure copied CFI directives refer to the correct instruction.
Fixes https://github.com/llvm/llvm-project/issues/55842
Differential Revision: https://reviews.llvm.org/D126930
# 3b9707db | 05-Jun-2022 | Kazu Hirata <[email protected]>
[llvm] Convert for_each to range-based for loops (NFC)
Revision tags: llvmorg-14.0.4

# 9c38fc11 | 18-May-2022 | Sander de Smalen <[email protected]>
[AArch64] Remove references to Streaming SVE from target features.

Following discussion on D120261 and D121208, it seems better to remove the concept of Streaming SVE from the subtarget/assembler predicates and instead reason about 'SVE' and 'SME' as the higher-level features, rather than trying to model this runtime mode through explicit feature flags.
This patch is largely NFC.
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D125977
# 5cb14dc5 | 31-May-2022 | David Green <[email protected]>
[AArch64] Look through copy in MachineCombiner FMUL patterns.

This is a small addition to D99662, which added machine combiner patterns for FMUL(DUP(..)). Due to the way these are generated from ISel, they may also appear as FMUL(COPY(DUP(..))); this patch makes the patterns look through the no-op COPY.
Differential Revision: https://reviews.llvm.org/D126632
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2

# de07cde6 | 22-Apr-2022 | Daniel Kiss <[email protected]>
[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.

The autiasp and autibsp instructions are the counterparts of the paciasp/pacibsp instructions, so emit .cfi_negate_ra_state for them too. With the Armv8.3 instruction set, retaa/retab perform the return and the authentication in one step; there we cannot emit .cfi_negate_ra_state, because it would point after the ret* instruction.
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D111780
# d0ea42a7 | 12-Apr-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill
Differential Revision: https://reviews.llvm.org/D112330
Revision tags: llvmorg-14.0.1

# 50a97aac | 24-Mar-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function prologues

Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05

This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says the CFA is in `sp`, even though `sp` is already offset by 16 bytes.

A correct unwind info could look like:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (this patch is just a step towards it) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
# 37b37838 | 16-Mar-2022 | Shengchen Kan <[email protected]>
[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3

# 85c53c70 | 04-Mar-2022 | Hans Wennborg <[email protected]>
Revert "[AArch64] Async unwind - function prologues"

It caused builds to assert with:

    (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See the comment on the code review for a reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
>
> ```
> stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
# e4fa8291 | 03-Mar-2022 | Cullen Rhodes <[email protected]>
[AArch64] Allow copying of SVE registers in Streaming SVE
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D118562
# 63c9aca1 | 02-Mar-2022 | Momchil Velikov <[email protected]>
Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.
It causes test failures that look like an infinite loop in asan/hwasan unwinding.
# 74319d67 | 02-Mar-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function epilogues

As a counterpart of https://reviews.llvm.org/D111411, this change makes the unwind information instruction-precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
Revision tags: llvmorg-14.0.0-rc2

# 32e8b550 | 28-Feb-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - function prologues

This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says the CFA is in `sp`, even though `sp` is already offset by 16 bytes.

A correct unwind info could look like:

```
stp x29, x30, [sp, #-16]!  // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (this patch is just a step towards it) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
# 25e92920 | 24-Feb-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - helper functions to decide on CFI emission
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D112327
# fd7e59f0 | 24-Feb-2022 | Momchil Velikov <[email protected]>
[AArch64] Async unwind - do not schedule frame setup/destroy

The PostRA scheduler can reorder non-CFI instructions in a way that makes the unwind info not instruction-precise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112326
# 68c718c8 | 23-Feb-2022 | Jessica Paquette <[email protected]>
Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""

This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781.

This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)
# d97f997e | 16-Feb-2022 | Jessica Paquette <[email protected]>
[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner.

The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates.

Imagine a case like this:

```
bb:
  i1
  i2
  i3
  ...
  i123456
```
Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block.
The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries.
This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live.
To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win.
By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate.
Doing all of this is about a 16% compile time improvement on the case.
This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.
# c69af70f | 19-Feb-2022 | Micah Weston <[email protected]>
[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.

Implements the ADDS/SUBS 24-bit immediate optimization using the MIPeepholeOpt pass. This follows the pattern:

Optimize ([adds|subs] r, imm) -> ([ADDS|SUBS] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1 and both imm0 and imm1 are non-zero 12-bit unsigned integers.

Optimize ([adds|subs] r, imm) -> ([SUBS|ADDS] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1 and both imm0 and imm1 are non-zero 12-bit unsigned integers.

The SplitAndOpcFunc type had to change its return type to an Opcode pair so that the first add/sub is the regular instruction and the second is the flag-setting instruction. This required updating the code in the AND case.
Testing:

I ran a two-stage bootstrap with this code. Using the second-stage compiler, I verified that the negation of an ADDS to SUBS (or vice versa) is a valid optimization, e.g. V == -0x111111.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118663
# fc1b2122 | 17-Feb-2022 | Kerry McLaughlin <[email protected]>
[AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

Currently, loading from or storing to a stack location with a structured load or store crashes in isAArch64FrameOffsetLegal as the opcodes are not handled by getMemOpInfo. This patch adds the opcodes for structured load/store instructions with an immediate index to getMemOpInfo & getLoadStoreImmIdx, setting appropriate values for the scale, width & min/max offsets.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D119338
# f3809b20 | 17-Feb-2022 | Pavel Kosov <[email protected]>
[AArch64][SchedModels] Handle virtual registers in FP/NEON predicates

The current implementation of the Check[HSDQ]Form predicates doesn't handle virtual registers and therefore isn't useful for pre-RA scheduling. This patch fixes that by implementing two function predicates: CheckQForm, which checks that an instruction writes a 128-bit NEON register, and CheckFpOrNEON, which checks that an instruction writes an FP register (of any width). The latter supersedes the Check[HSD]Form predicates, which are not used individually.
OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D114642
# 6d58f4ab | 16-Feb-2022 | Jessica Paquette <[email protected]>
[MachineOutliner] NFC: Hide LRU-related stuff behind helper functions

It's not particularly user-friendly to have to call `initLRU` everywhere. Also, it wasn't particularly great that the LRU for registers used in a sequence was also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`
This allows the user to avoid calling `initLRU` explicitly. Also, it allows us to separate initializing the used-in-sequence LRU from the main LRU.
Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this refactor requires that we de-const the Candidate there.
Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`
This is a preparatory commit for a larger compile time related change for the AArch64 outliner.