History log of /llvm-project-15.0.7/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (Results 1 – 25 of 465)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# ddc9e886 28-Jun-2022 Guozhi Wei <[email protected]>

[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency

Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance
to evaluate which instruction sequen

[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency

Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance
to evaluate which instruction sequence has lower latency.

Differential Revision: https://reviews.llvm.org/D124564

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# 163c77b2 08-Jun-2022 Serguei Katkov <[email protected]>

[AARCH64 folding] Do not fold any copy with NZCV

There is no instruction to fold NZCV, so, just do not do it.

Without the fix the added test case crashes with an assert
"Mismatched register size in

[AARCH64 folding] Do not fold any copy with NZCV

There is no instruction to fold NZCV, so, just do not do it.

Without the fix the added test case crashes with an assert
"Mismatched register size in non subreg COPY"

Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294

show more ...


# 129b531c 19-Jun-2022 Kazu Hirata <[email protected]>

[llvm] Use value_or instead of getValueOr (NFC)


# c42a2255 13-Jun-2022 zhongyunde <[email protected]>

[MachineScheduler] Order more stores by ascending address

According D125377, we order STP Q's by ascending address. While on some
targets, paired 128 bit loads and stores are slow, so the STP will s

[MachineScheduler] Order more stores by ascending address

According D125377, we order STP Q's by ascending address. While on some
targets, paired 128 bit loads and stores are slow, so the STP will split
into STRQ and STUR, so I hope these stores will also be ordered.
Also add subtarget feature ascend-store-address to control the aggressive order.

Reviewed By: dmgreen, fhahn

Differential Revision: https://reviews.llvm.org/D126700

show more ...


# 0ff51d5d 10-Jun-2022 Eli Friedman <[email protected]>

Fix interaction of CFI instructions with MachineOutliner.

1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
throu

Fix interaction of CFI instructions with MachineOutliner.

1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.

Fixes https://github.com/llvm/llvm-project/issues/55842

Differential Revision: https://reviews.llvm.org/D126930

show more ...


# 3b9707db 05-Jun-2022 Kazu Hirata <[email protected]>

[llvm] Convert for_each to range-based for loops (NFC)


Revision tags: llvmorg-14.0.4
# 9c38fc11 18-May-2022 Sander de Smalen <[email protected]>

[AArch64] Remove references to Streaming SVE from target features.

Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler pre

[AArch64] Remove references to Streaming SVE from target features.

Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as its higher level features, rather
than trying to model this runtime mode through explicit feature flags.

This patch is largely NFC.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D125977

show more ...


# 5cb14dc5 31-May-2022 David Green <[email protected]>

[AArch64] Look through copy in MachineCombiner FMUL patterns.

This is a small addition to D99662, which added machine combiner
patterns for FMUL(DUP(..)). Due to the way these are generated from
ISe

[AArch64] Look through copy in MachineCombiner FMUL patterns.

This is a small addition to D99662, which added machine combiner
patterns for FMUL(DUP(..)). Due to the way these are generated from
ISel, they may also be FMUL(COPY(DUP(..))), which this patch now
ignores the no-op COPY in.

Differential Revision: https://reviews.llvm.org/D126632

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# de07cde6 22-Apr-2022 Daniel Kiss <[email protected]>

[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.

autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions
therefore let's emit .cfi_negate_ra_state for these t

[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.

autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions
therefore let's emit .cfi_negate_ra_state for these too.
In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication
in one step here we can't emit the . cfi_negate_ra_state because that would be point after
the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780

show more ...


# d0ea42a7 12-Apr-2022 Momchil Velikov <[email protected]>

[AArch64] Async unwind - function epilogues

Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D112330


Revision tags: llvmorg-14.0.1
# 50a97aac 24-Mar-2022 Momchil Velikov <[email protected]>

[AArch64] Async unwind - function prologues

Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` informat

[AArch64] Async unwind - function prologues

Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411

show more ...


# 37b37838 16-Mar-2022 Shengchen Kan <[email protected]>

[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 85c53c70 04-Mar-2022 Hans Wennborg <[email protected]>

Revert "[AArch64] Async unwind - function prologues"

It caused builds to assert with:

(StackSize == 0 && "We already have the CFA offset!"),
function generateCompactUnwindEncoding, file AArch64

Revert "[AArch64] Async unwind - function prologues"

It caused builds to assert with:

(StackSize == 0 && "We already have the CFA offset!"),
function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.

show more ...


# e4fa8291 03-Mar-2022 Cullen Rhodes <[email protected]>

[AArch64] Allow copying of SVE registers in Streaming SVE

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118562


# 63c9aca1 02-Mar-2022 Momchil Velikov <[email protected]>

Revert "[AArch64] Async unwind - function epilogues"

This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.

It causes test failures that look like infinite loop in asan/hwasan
unwinding.


# 74319d67 02-Mar-2022 Momchil Velikov <[email protected]>

[AArch64] Async unwind - function epilogues

Counterpart of https://reviews.llvm.org/D111411 this change makes the
unwind information instruction precise in function epilogues.

Reviewed By: MaskRay

[AArch64] Async unwind - function epilogues

Counterpart of https://reviews.llvm.org/D111411 this change makes the
unwind information instruction precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330

show more ...


Revision tags: llvmorg-14.0.0-rc2
# 32e8b550 28-Feb-2022 Momchil Velikov <[email protected]>

[AArch64] Async unwind - function prologues

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state

[AArch64] Async unwind - function prologues

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411

show more ...


# 25e92920 24-Feb-2022 Momchil Velikov <[email protected]>

[AArch64] Async unwind - helper functions to decide on CFI emission

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112327


# fd7e59f0 24-Feb-2022 Momchil Velikov <[email protected]>

[AArch64] Async unwind - do not schedule frame setup/destroy

The PostRA scheduler can reorder non-CFI instructions in a way that
makes the unwind info not instruction precise.

Reviewed By: efriedm

[AArch64] Async unwind - do not schedule frame setup/destroy

The PostRA scheduler can reorder non-CFI instructions in a way that
makes the unwind info not instruction precise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112326

show more ...


# 68c718c8 23-Feb-2022 Jessica Paquette <[email protected]>

Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""

This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781.

This commit was not NFC.

(See: https://reviews.llvm.org/r

Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""

This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781.

This commit was not NFC.

(See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)

show more ...


# d97f997e 16-Feb-2022 Jessica Paquette <[email protected]>

[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

We found a case in the Swift benchmarks where the MachineOutliner introduces
about a 20% compile time overhead in comparison to bu

[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

We found a case in the Swift benchmarks where the MachineOutliner introduces
about a 20% compile time overhead in comparison to building without the
MachineOutliner.

The origin of this slowdown is that the benchmark has long blocks which incur
lots of LRU checks for lots of candidates.

Imagine a case like this:

```
bb:
i1
i2
i3
...
i123456
```

Now imagine that all of the outlining candidates appear early in the block, and
that something like, say, NZCV is defined at the end of the block.

The outliner has to check liveness for certain registers across all candidates,
because outlining from areas where those registers are used is unsafe at call
boundaries.

This is fairly wasteful because in the previously-described case, the outlining
candidates will never appear in an area where those registers are live.

To avoid this, precalculate areas where we will consider outlining from.
Anything outside of these areas is mapped to illegal and not included in the
outlining search space. This allows us to reduce the size of the outliner's
suffix tree as well, giving us a potential memory win.

By precalculating areas, we can also optimize other checks too, like whether
or not LR is live across an outlining candidate.

Doing all of this is about a 16% compile time improvement on the case.

This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now,
this only implements the AArch64 path. The original "is the MBB safe" method
still works as before.

show more ...


# c69af70f 19-Feb-2022 Micah Weston <[email protected]>

[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.

Implements ADDS/SUBS 24-bit immediate optimization using the
MIPeepholeOpt pass. This follows the pattern:

Optimize ([adds|subs] r, i

[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.

Implements ADDS/SUBS 24-bit immediate optimization using the
MIPeepholeOpt pass. This follows the pattern:

Optimize ([adds|subs] r, imm) -> ([ADDS|SUBS] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([adds|subs] r, imm) -> ([SUBS|ADDS] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The SplitAndOpcFunc type had to change the return type to an Opcode pair so that
the first add/sub is the regular instruction and the second is the flag setting
instruction. This required updating the code in the AND case.

Testing:

I ran a two stage bootstrap with this code.
Using the second stage compiler, I verified that the negation of an ADDS to SUBS
or vice versa is a valid optimization. Example V == -0x111111.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118663

show more ...


# fc1b2122 17-Feb-2022 Kerry McLaughlin <[email protected]>

[AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

Currently, loading from or storing to a stack location with a structured load
or store crashes in isAArch64FrameOffsetLegal as the op

[AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

Currently, loading from or storing to a stack location with a structured load
or store crashes in isAArch64FrameOffsetLegal as the opcodes are not handled by
getMemOpInfo. This patch adds the opcodes for structured load/store instructions
with an immediate index to getMemOpInfo & getLoadStoreImmIdx, setting appropriate
values for the scale, width & min/max offsets.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D119338

show more ...


# f3809b20 17-Feb-2022 Pavel Kosov <[email protected]>

[AArch64][SchedModels] Handle virtual registers in FP/NEON predicates

Current implementation of Check[HSDQ]Form predicates doesn’t handle virtual registers and therefore isn’t useful for pre-RA sche

[AArch64][SchedModels] Handle virtual registers in FP/NEON predicates

Current implementation of Check[HSDQ]Form predicates doesn’t handle virtual registers and therefore isn’t useful for pre-RA scheduling. Patch fixes this implementing two function predicates: CheckQForm for checking that instruction writes 128-bit NEON register and CheckFpOrNEON which checks that instruction writes FP register (any width). The latter supersedes Check[HSD]Form predicates which are not used individually.

OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D114642

show more ...


# 6d58f4ab 16-Feb-2022 Jessica Paquette <[email protected]>

[MachineOutliner] NFC: Hide LRU-related stuff behind helper functions

It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for re

[MachineOutliner] NFC: Hide LRU-related stuff behind helper functions

It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`

This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.

Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.

Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`

This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.

show more ...


12345678910>>...19