|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
8d0383eb |
| 24-Jun-2022 |
Matt Arsenault <[email protected]> |
CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for whether a load instruction is rematerializable. I also don't think this was entirely correct, since it implicitly assumed that constant loads are also dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.
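For illustration, a minimal sketch (not the code from this commit) of a rematerialization check that relies only on the memory-operand flags, as described above:

```cpp
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"

// Decide whether a load may be rematerialized using only the flags carried
// by its MachineMemOperand, with no AliasAnalysis query at this point.
static bool isRemattableLoadByFlags(const llvm::MachineInstr &MI) {
  if (!MI.hasOneMemOperand())
    return false;
  const llvm::MachineMemOperand *MMO = *MI.memoperands_begin();
  // Both flags are assumed to have been set up front, e.g. from an AA
  // query when the instruction was created.
  return MMO->isInvariant() && MMO->isDereferenceable();
}
```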
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
621f58e7 |
| 18-Jun-2022 |
Kazu Hirata <[email protected]> |
[Target, CodeGen] Use isImm(), isReg(), etc (NFC)
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
3b9707db |
| 05-Jun-2022 |
Kazu Hirata <[email protected]> |
[llvm] Convert for_each to range-based for loops (NFC)
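A tiny illustration of the NFC conversion (hypothetical example, not taken from the patch):

```cpp
#include "llvm/ADT/STLExtras.h"
#include <vector>

void bumpAll(std::vector<int> &V) {
  // Before: llvm::for_each(V, [](int &X) { ++X; });
  // After (range-based for loop):
  for (int &X : V)
    ++X;
}
```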
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
d8e67c1c |
| 26-Nov-2021 |
Martin Storsjö <[email protected]> |
[ARM] Add SEH opcodes in frame lowering
Skip inserting regular CFI instructions if using WinCFI.
This is based in large part on the corresponding ARM64 implementation, but instead of inserting the SEH opcodes one by one where we generate the other prolog/epilog instructions, we walk over the whole prolog/epilog range and insert them there. This is done because, in many cases, the exact number of instructions inserted is abstracted away at a deeper level.
For some cases, we manually insert specific SEH opcodes directly where instructions are generated, where the automatic mapping of instructions to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes).
Skip Thumb2SizeReduction for SEH prologs/epilogs, and force tail calls to wide instructions (just like on MachO), to make sure that the unwind info actually matches the width of the final instructions, without heuristics about what later passes will do.
Mark SEH instructions as scheduling boundaries, to make sure that they aren't reordered away from the instruction they describe by PostRAScheduler.
Mark the SEH instructions with the NoMerge flag, to avoid doing tail merging of functions that have multiple epilogs that all end with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue".
Differential Revision: https://reviews.llvm.org/D125648
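As one concrete illustration of the scheduling-boundary point, a minimal sketch; the helper name isSEHOpcode is an assumption here, not necessarily what the patch uses:

```cpp
// Keep SEH pseudo-instructions glued to the instruction they describe by
// refusing to let the post-RA scheduler move across them.
bool ARMBaseInstrInfo::isSchedulingBoundary(const MachineInstr &MI,
                                            const MachineBasicBlock *MBB,
                                            const MachineFunction &MF) const {
  if (isSEHOpcode(MI.getOpcode())) // assumed helper matching ARM::SEH_* pseudos
    return true;
  return TargetInstrInfo::isSchedulingBoundary(MI, MBB, MF);
}
```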
|
| #
917dc074 |
| 28-Mar-2022 |
David Penry <[email protected]> |
[ARM] Recognize t2LoopEnd for software pipelining
- Add t2LoopEnd to TargetInstrInfo::analyzeBranch and related functions. As analyzing a branch has many side effects, only do so if software pipelining is enabled, to maintain the previous behavior when pipelining is not desired.
- Make sure that t2LoopEndDec is immediately followed by a t2B when it is synthesized from a t2LoopEnd. This is done because the t2LoopEnd might have acquired a fall-through path, but IfConversion assumes that fall-throughs are only possible on analyzable branches.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D126322
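A sketch of the first point, not the patch's actual code; the operand layout of t2LoopEnd (counter register, then target block) and the SWPEnabled flag are assumptions for illustration:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineInstr.h"

// Recognize t2LoopEnd as an analyzable conditional branch only when software
// pipelining is requested, so behavior is unchanged otherwise.
static bool analyzeLoopEndBranch(const llvm::MachineInstr &MI, bool SWPEnabled,
                                 llvm::MachineBasicBlock *&TBB,
                                 llvm::SmallVectorImpl<llvm::MachineOperand> &Cond) {
  if (MI.getOpcode() != llvm::ARM::t2LoopEnd || !SWPEnabled)
    return false; // leave it unanalyzed: previous behavior is preserved
  TBB = MI.getOperand(1).getMBB();  // assumed: branch destination operand
  Cond.push_back(MI.getOperand(0)); // assumed: loop-counter operand drives the condition
  return true;
}
```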
|
| #
dcb77643 |
| 29-Mar-2022 |
David Penry <[email protected]> |
Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM
Fixed "private field is not used" warning when compiled with clang.
original commit: 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f reverted in: fa49021c68ef7a7adcdf7b8a44b9006506523191
------
This patch permits Swing Modulo Scheduling for ARM targets and turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch.
MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body.
The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues.
It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In addition, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D122672
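The opt-in point is the subtarget pipeliner hook; a sketch under the assumption that isCortexM7() (or an equivalent CPU check) is available, with the exact predicate in the real patch possibly differing:

```cpp
// Opt in to the MachinePipeliner (Swing Modulo Scheduling) only when an
// instruction-level scheduling model exists and, per this patch, by default
// only on Cortex-M7.
bool ARMSubtarget::enableMachinePipeliner() const {
  return getSchedModel().hasInstrSchedModel() && isCortexM7();
}
```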
|
| #
fa49021c |
| 28-Apr-2022 |
David Penry <[email protected]> |
Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f while I investigate a buildbot failure.
|
| #
28d09bbb |
| 29-Mar-2022 |
David Penry <[email protected]> |
[CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets and turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch.
MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body.
The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues.
It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In addition, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D122672
|
| #
37b37838 |
| 16-Mar-2022 |
Shengchen Kan <[email protected]> |
[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
|
| #
6d58f4ab |
| 16-Feb-2022 |
Jessica Paquette <[email protected]> |
[MachineOutliner] NFC: Hide LRU-related stuff behind helper functions
It's not particularly user-friendly to have to call `initLRU` everywhere. Also, it wasn't particularly great that the LRU for registers used in a sequence was also initialized by `initLRU`.
This patch hides this stuff behind some helper functions:
* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`
This allows the user to avoid calling `initLRU` explicitly. Also, it allows us to separate initializing the used-in-sequence LRU from the main LRU.
Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this refactor requires that we de-const the Candidate there.
Some other quality-of-code improvements:
* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`
This is a preparatory commit for a larger compile time related change for the AArch64 outliner.
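A minimal sketch of how a target might use the new helpers rather than calling `initLRU` itself; the parameter types are assumed from the description above:

```cpp
#include "llvm/CodeGen/MachineOutliner.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

// Reg (e.g. LR) must be free both across the candidate and inside the
// outlined sequence before an outlined call may clobber it.
static bool canClobberInOutlinedCall(llvm::outliner::Candidate &C,
                                     llvm::Register Reg,
                                     const llvm::TargetRegisterInfo &TRI) {
  return C.isAvailableAcrossAndOutOfSeq(Reg, TRI) &&
         C.isAvailableInsideSeq(Reg, TRI);
}
```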
|
| #
762f0b54 |
| 23-Jan-2022 |
tyb0807 <[email protected]> |
[ARM] Make getInstSizeInBytes() use instruction size from InstrInfo.td
Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded instruction size for some pseudo-instructions, while this information should ideally be found in ARMInstrInfo.td, ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence, the .td files should be updated and no hard-coded instruction sizes should be used by getInstSizeInBytes() anymore.
Differential Revision: https://reviews.llvm.org/D118009
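A sketch of the direction this takes, assuming the sizes are now encoded in the .td files; pseudos whose size genuinely cannot be described statically would still need special cases:

```cpp
// Read the instruction size from the TableGen-generated MCInstrDesc instead
// of a hard-coded table.
unsigned ARMBaseInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
  if (unsigned Size = MI.getDesc().getSize())
    return Size;
  // Fall back for inline asm and size-0 pseudos (details omitted here).
  return 0;
}
```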
|
| #
f15014ff |
| 26-Jan-2022 |
Benjamin Kramer <[email protected]> |
Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.
- It conflicts with the existing llvm::size in STLEx
Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.
- It conflicts with the existing llvm::size in STLExtras, which will now never be called.
- Calling it without llvm:: breaks C++17 compat.
|
| #
ef820632 |
| 26-Jan-2022 |
serge-sans-paille <[email protected]> |
Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a consequence, move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h, so no build breakage is expected).
|
| #
319e7759 |
| 01-Jan-2022 |
David Green <[email protected]> |
[ARM] Verify addressing immediates
This adds an extra check to ARMBaseInstrInfo::verifyInstruction to verify the offsets used in addressing-mode immediates using isLegalAddressImm. Some tests needed fixing up as a result, adjusting the opcode created from CMSE stack adjustments.
Differential Revision: https://reviews.llvm.org/D114939
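A sketch of the kind of check added, not the exact code; which operand holds the addressing-mode immediate is assumed here for illustration:

```cpp
// Reject addressing-mode immediates that isLegalAddressImm() would not accept.
bool ARMBaseInstrInfo::verifyInstruction(const MachineInstr &MI,
                                         StringRef &ErrInfo) const {
  if (MI.getNumOperands() > 2 && MI.getOperand(2).isImm() &&
      !isLegalAddressImm(MI.getOpcode(), MI.getOperand(2).getImm(), this)) {
    ErrInfo = "Illegal addressing-mode immediate";
    return false;
  }
  return true;
}
```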
|
| #
48349967 |
| 12-Dec-2021 |
Kazu Hirata <[email protected]> |
[Target] Use llvm::reverse (NFC)
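A tiny illustration of the idiom (hypothetical example, not taken from the patch):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

// Walk the container back-to-front with llvm::reverse instead of spelling
// out rbegin()/rend().
int lastNonZero(const llvm::SmallVectorImpl<int> &V) {
  for (int X : llvm::reverse(V))
    if (X != 0)
      return X;
  return 0;
}
```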
|
| #
63eb7ff4 |
| 07-Dec-2021 |
Ties Stuij <[email protected]> |
[ARM] Implement PAC return address signing mechanism for PACBTI-M
This patch implements PAC return address signing for armv8-m. This patch roughly accomplishes the following things:
- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them inwards to cover only part of a function.
- The auth code generated by PAC is saved across subroutine calls so that AUT can find it again to check.
- PAC is emitted before stacking registers (so that the SP it signs is the one on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data.
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT validates the corresponding value of SP.
- Emit correct unwind information when PAC is replaced by PACBTI.
- Handle tail calls correctly.
Some notes:
We make the assembler accept the `.save {ra_auth_code}` directive that is emitted by the compiler when it saves a register that contains a return address authentication code.
For EHABI we need to have the `FrameSetup` flag on the instruction and handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit `.save {ra_auth_code}`, instead of `.save {r12}`.
For PACBTI-M, the instruction which computes the return address PAC should use the SP value before the adjustment for the argument-register save area (used for variadic functions and when a parameter is split between stack and register), but at the same time it should come after the instruction that saves FPCXT when compiling a CMSE entry function.
This patch moves the varargs SP adjustment after the FPCXT save (they are never enabled at the same time), so in a following patch handling of the `PAC` instruction can be placed between them.
Epilogue emission code adjusted in a similar manner.
PACBTI-M code generation should not emit any instructions for the v6-m and v8-m.base architectures, or for A- and R-class cores. A diagnostic message for such cases is handled separately by a future ticket.
note on tail calls:
If the called function has four arguments that occupy registers `r0`-`r3`, the only option for holding the function pointer itself is `r12`, but this register is used to keep the PAC during the function prologue/epilogue and clobbers the function pointer.
When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to keep six values - the four function arguments, the function pointer and the PAC, which is obviously impossible.
One option would be to authenticate the return address before all callee-saved registers are restored, so we have a scratch register to temporarily keep the value of `r12`. The issue with this approach is that it violates a fundamental invariant that PAC is computed using CFA as a modifier. It would also mean using separate instructions to pop `lr` and the rest of the callee-saved registers, which would offset the advantages of doing a tail call.
Instead, this patch disables indirect tail calls when the called function takes four or more arguments and return address signing and authentication is enabled for the caller function, conservatively assuming the caller function would spill LR.
This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov - Ties Stuij
Reviewed By: danielkiss
Differential Revision: https://reviews.llvm.org/D112429
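A minimal sketch of one of the points above, emitting PAC as a FrameSetup instruction at the very top of the prologue before any registers are stacked; this is not the actual frame-lowering code, and MBB/TII are assumed to be the usual frame-lowering state:

```cpp
// Emit PAC first so the SP it signs is the one on function entry, and tag it
// as FrameSetup so shrink-wrapping and unwind emission treat it as prologue.
MachineBasicBlock::iterator MBBI = MBB.begin();
BuildMI(MBB, MBBI, DebugLoc(), TII.get(ARM::t2PAC))
    .setMIFlags(MachineInstr::FrameSetup);
```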
|
| #
b8f1ccb0 |
| 02-Dec-2021 |
David Green <[email protected]> |
[ARM] Introduce i8neg and i8pos addressing modes
Some instructions with i8 immediate ranges can only hold negative values (like t2LDRHi8), only hold positive values (like t2STRT), or hold +/- depending on the U bit (like the pre/post-inc instructions, e.g. t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8, AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.
This allows us to get the offset ranges of t2LDRHi8 correct in the load/store optimizer, fixing issues where we could end up creating instructions with positive offsets (which may then be encoded as ldrht).
Differential Revision: https://reviews.llvm.org/D114638
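Illustrative only: one way the load/store optimizer could derive the legal offset range from the split addressing modes; the exact inclusive/exclusive boundaries are glossed over here:

```cpp
// Map the (split) Thumb2 i8 addressing modes to approximate offset ranges.
static void getT2i8OffsetRange(unsigned AddrMode, int &Min, int &Max) {
  switch (AddrMode) {
  case ARMII::AddrModeT2_i8:    Min = -255; Max = 255; break; // +/- via U bit
  case ARMII::AddrModeT2_i8pos: Min = 0;    Max = 255; break; // e.g. t2STRT
  case ARMII::AddrModeT2_i8neg: Min = -255; Max = 0;   break; // e.g. t2LDRHi8
  default:                      Min = 0;    Max = 0;   break;
  }
}
```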
|
| #
f5f28d5b |
| 01-Dec-2021 |
Ties Stuij <[email protected]> |
[ARM] Implement BTI placement pass for PACBTI-M
This patch implements a new MachineFunction pass in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass.
BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads
Each BTI instruction is placed at the beginning of a BB, after the so-called meta instructions (e.g. exception handler labels).
Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI.
The cost mode of the outliner is adjusted to account for the extra BTI instructions in the outlined function.
The ARM Constant Islands pass will maintain the count of the jump tables that reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero.
PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions).
The ARM Constant Island pass is adjusted to handle BTI instructions correctly.
Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers, and thus their entry points need to be marked with BTI or PACBTI.
The changes are tested using "LLVM IR -> assembly" tests, and jump tables also have a MIR test. Unfortunately it is not possible to add MIR tests for exception handling and computed gotos because of MIR parser limitations.
This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Mikhail Maltsev - Momchil Velikov - Ties Stuij
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D112426
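A sketch of the insertion step only; the opcode name and helper shape are assumptions based on the description above:

```cpp
// Insert a BTI at the start of a block, after any meta instructions such as
// EH labels, so it is the first "real" instruction at the branch target.
static void insertBTI(MachineBasicBlock &MBB, const ARMBaseInstrInfo &TII) {
  MachineBasicBlock::iterator MBBI = MBB.begin();
  while (MBBI != MBB.end() && MBBI->isMetaInstruction())
    ++MBBI;
  BuildMI(MBB, MBBI, DebugLoc(), TII.get(ARM::t2BTI));
}
```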
|
| #
89453ed6 |
| 30-Nov-2021 |
Nick Desaulniers <[email protected]> |
[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards
We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the stack guard for PIC code that references the GOT, since arm-pseudo may expand this to the narrow tLDRpci rather than the wider t2LDRpci.
Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci.
Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361
Reviewed By: ardb
Differential Revision: https://reviews.llvm.org/D114762
|
| #
562356d6 |
| 26-Nov-2021 |
Kazu Hirata <[email protected]> |
[Target] Use range-based for loops (NFC)
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
3264e959 |
| 09-Nov-2021 |
Jay Foad <[email protected]> |
[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D113493
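A minimal sketch of the bookkeeping a target now performs itself after building the three-address replacement; the repair steps in the real targets are more involved:

```cpp
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineInstr.h"

// Re-point the slot index from the two-address instruction to its
// replacement so the existing live ranges stay valid.
static void updateLiveIntervals(llvm::LiveIntervals *LIS,
                                llvm::MachineInstr &OldMI,
                                llvm::MachineInstr &NewMI) {
  if (!LIS)
    return;
  LIS->ReplaceMachineInstrInMaps(OldMI, NewMI);
}
```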
|
| #
a19da876 |
| 09-Nov-2021 |
Ard Biesheuvel <[email protected]> |
[ARM] implement support for TLS register based stack protector
Implement support for loading the stack canary from a memory location held in the TLS register, with an optional offset applied. This is used by the Linux kernel to implement per-task stack canaries, which is impossible on SMP systems when using a global variable for the stack canary.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112768
|
| #
2caf85ad |
| 08-Nov-2021 |
Ard Biesheuvel <[email protected]> |
[ARM] implement LOAD_STACK_GUARD for remaining targets
Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register.
So let's implement LOAD_STACK_GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue.
This code was split off from
D112768: [ARM] implement support for TLS register based stack protector
for which it is a prerequisite.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112811
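A sketch of the hook involved; the real patch may retain target-specific exceptions:

```cpp
// Returning true makes ISel emit the LOAD_STACK_GUARD pseudo, which the
// register allocator can rematerialize instead of spilling the canary.
bool ARMTargetLowering::useLoadStackGuardNode() const {
  return true; // previously limited to Mach-O targets
}
```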
|
| #
41ef3187 |
| 07-Nov-2021 |
Kazu Hirata <[email protected]> |
[ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC)
|
| #
082fa568 |
| 13-Oct-2021 |
John Brawn <[email protected]> |
[ARM] Fix MOVCC peephole to not use an incorrect register class
The MOVCC peephole eliminates a MOVCC by making one of its inputs a conditional instruction, but when doing this it should use both inputs of the MOVCC to decide on the register class, as otherwise we can get an error when using -verify-machineinstrs.
Differential Revision: https://reviews.llvm.org/D111714
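An illustrative fragment, with variable names assumed, of constraining the result to a register class acceptable to both MOVCC inputs before folding:

```cpp
// Pick a class valid for both inputs; give up rather than create a def with
// an incompatible register class.
const TargetRegisterClass *TrueRC = MRI.getRegClass(TrueReg);
const TargetRegisterClass *FalseRC = MRI.getRegClass(FalseReg);
const TargetRegisterClass *CommonRC = TRI->getCommonSubClass(TrueRC, FalseRC);
if (!CommonRC)
  return nullptr;
MRI.constrainRegClass(DestReg, CommonRC);
```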
|