|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
f55dbfbd |
| 22-Jul-2022 |
Shubham Narlawar <[email protected]> |
[AArch64] Move SeparateConstOffsetFromGEPPass before LSR and enable EnableGEPOpt by default.
GEPs across basic blocks were not getting split because EnableGEPOpt was turned off by default. Hence, EarlyCSE missed the opportunity to eliminate the common part of GEPs. This can be achieved by simply turning the GEP pass on.
- This patch moves SeparateConstOffsetFromGEPPass() just before LSR.
- It enables EnableGEPOpt by default.
Resolves - https://github.com/llvm/llvm-project/issues/50528
Added a unit test.
Differential Revision: https://reviews.llvm.org/D128582
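As a rough illustration, here is a minimal C++ sketch of the kind of source pattern this change targets (the type, field, and function names are hypothetical; the comments describe the intended effect on the generated GEPs, not exact IR):
```
struct Packet { int payload[64]; };  // hypothetical type for illustration

int pick(Packet *p, int idx, bool flag) {
  // Each branch indexes the same array at a small constant distance from idx.
  // Without splitting, each block carries one opaque GEP (base + idx + const),
  // so EarlyCSE cannot spot the shared "&p->payload[idx]" computation.
  if (flag)
    return p->payload[idx + 1];
  return p->payload[idx + 2];
  // With SeparateConstOffsetFromGEP, each address becomes a common variable
  // part plus a per-block constant offset, and the common part can be CSE'd.
}
```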
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
7a47ee51 |
| 21-Jun-2022 |
Kazu Hirata <[email protected]> |
[llvm] Don't use Optional::getValue (NFC)
|
| #
e0e687a6 |
| 20-Jun-2022 |
Kazu Hirata <[email protected]> |
[llvm] Don't use Optional::hasValue (NFC)
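For context, the idiom these two cleanups move toward can be sketched with `std::optional`, whose interface `llvm::Optional` mirrors (illustrative only, not the patch itself):
```
#include <optional>

int firstOrZero(std::optional<int> v) {
  // Older style spelled these v.hasValue() and v.getValue(); the preferred
  // spelling uses the std::optional-compatible interface instead.
  if (v.has_value())  // or simply: if (v)
    return *v;        // instead of v.getValue()
  return 0;
}
```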
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
7c13ae64 |
| 04-May-2022 |
Adrian Tong <[email protected]> |
Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP.
This is then used in AArch64 to remove copy instructions after taildup has run in machine block placement.
Differential Revision: https://reviews.llvm.org/D125335
|
| #
572fc7d2 |
| 23-May-2022 |
Andre Vieira <[email protected]> |
[AArch64] Order STP Q's by ascending address
This patch adds an AArch64-specific PostRA MachineScheduler to try to schedule STP Q's to the same base address in ascending order of offsets. We have found this to improve performance on Neoverse N1, and it should not hurt other AArch64 cores.
Differential Revision: https://reviews.llvm.org/D125377
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
b4ad28da |
| 11-Apr-2022 |
Momchil Velikov <[email protected]> |
[CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the inconsistency of the call-frame information caused by the linear (non-CFG aware) nature of the unwind tables.
Unlike the `CFIInstrInserter` pass, this one almost always emits only `.cfi_remember_state`/`.cfi_restore_state`, which results in smaller unwind tables and also transparently handles custom unwind info extensions like CFA offset adjustment and save locations of SVE registers.
This pass takes advantage of the constraints that LLVM imposes on the placement of save/restore points (cf. `ShrinkWrap.cpp`):
* there is a single basic block, containing the function prologue
* possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions) are not split across two or more blocks.
* prologue and epilogue blocks are outside of any loops
Thus, during execution, at the beginning and at the end of each basic block the function can be in one of two states:
- "has a call frame", if the function has executed the prologue, or has not executed any epilogue
- "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue
These properties can be computed for each basic block by a single RPO traversal.
From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order.
Where these states differ, we insert compensating CFI instructions, which come in two flavours:
- CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be:
```
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...
```
where `<rN>` are the callee-saved registers.
- CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence:
```
.cfi_restore_state
.cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the last CFI instruction in the function prologue.
Reviewed By: MaskRay, danielkiss, chill
Differential Revision: https://reviews.llvm.org/D114545
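A small self-contained C++ sketch of the block-state bookkeeping described above (illustrative only; the real pass works on MachineBasicBlocks and emits target-specific CFI instructions):
```
#include <cstddef>
#include <vector>

// Each block is summarized by whether a call frame exists on entry and on exit.
struct BlockState {
  bool HasFrameOnEntry;
  bool HasFrameOnExit;
};

// Walk blocks in layout order; wherever a block's incoming state differs from
// the outgoing state of its layout predecessor, a compensating CFI sequence
// (e.g. .cfi_remember_state / .cfi_restore_state) must be inserted.
std::vector<std::size_t> blocksNeedingCFIFixup(const std::vector<BlockState> &Blocks) {
  std::vector<std::size_t> Fixups;
  for (std::size_t I = 1; I < Blocks.size(); ++I)
    if (Blocks[I - 1].HasFrameOnExit != Blocks[I].HasFrameOnEntry)
      Fixups.push_back(I);
  return Fixups;
}
```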
|
| #
1235aaef |
| 06-Apr-2022 |
Craig Topper <[email protected]> |
[AArch64][AMDGPU][WebAssembly] Use static_cast instead of a reinterpret_cast to downcast in parseMachineFunctionInfo. NFC
static_cast is a little safer here since the compiler will ensure we're casting to a class derived from yaml::MachineFunctionInfo.
I believe this first appeared on AMDGPU and was copied to the other two targets.
Spotted when it was being copied to RISCV in D123178.
Differential Revision: https://reviews.llvm.org/D123260
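A minimal sketch of why `static_cast` is the safer choice for this downcast (the class names are illustrative, not LLVM's actual `yaml::MachineFunctionInfo` hierarchy):
```
struct MachineFunctionInfoBase {
  virtual ~MachineFunctionInfoBase() = default;
};
struct TargetFunctionInfo : MachineFunctionInfoBase {
  int CustomFlag = 0;
};

int readFlag(MachineFunctionInfoBase *MFI) {
  // static_cast compiles only because TargetFunctionInfo derives from the base
  // class; reinterpret_cast would silently accept an unrelated pointer type.
  auto *TFI = static_cast<TargetFunctionInfo *>(MFI);
  return TFI->CustomFlag;
}
```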
|
| #
0320115c |
| 05-Apr-2022 |
Muhammad Omair Javaid <[email protected]> |
Revert "[CodeGen] Async unwind - add a pass to fix CFI information"
This reverts commit 980c3e6dd223a8e628367144b8180117950bb364.
This commit had failing tests with clang crashing across various AA
Revert "[CodeGen] Async unwind - add a pass to fix CFI information"
This reverts commit 980c3e6dd223a8e628367144b8180117950bb364.
This commit had failing tests with clang crashing across various AArch64/Linux buildbots.
https://lab.llvm.org/buildbot/#/builders/179/builds/3346
Differential Revision: https://reviews.llvm.org/D114545
|
| #
980c3e6d |
| 04-Apr-2022 |
Momchil Velikov <[email protected]> |
[CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the inconsistency of the call-frame information caused by linear (non-CFG aware) nature of the unwind tables.
Unlike the `CFIInstrInserter` pass, this one almost always emits only `.cfi_remember_state`/`.cfi_restore_state`, which results in smaller unwind tables and also transparently handles custom unwind info extensions like CFA offset adjustment and save locations of SVE registers.
This pass takes advantage of the constraints that LLVM imposes on the placement of save/restore points (cf. `ShrinkWrap.cpp`):
* there is a single basic block, containing the function prologue
* possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions) are not split across two or more blocks.
* prologue and epilogue blocks are outside of any loops
Thus, during execution, at the beginning and at the end of each basic block the function can be in one of two states:
- "has a call frame", if the function has executed the prologue, or has not executed any epilogue
- "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue
These properties can be computed for each basic block by a single RPO traversal.
In order to accommodate backends which do not generate unwind info in epilogues we compute an additional property "strong no call frame on entry" which is set for the entry point of the function and for every block reachable from the entry along a path that does not execute the prologue. If this property holds, it takes precedence over the "has a call frame" property.
From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order.
Where these states differ, we insert compensating CFI instructions, which come in two flavours:
- CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be:
```
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...
```
where `<rN>` are the callee-saved registers.
- CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence:
```
.cfi_restore_state
.cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the last CFI instruction in the function prologue.
Reviewed By: MaskRay, danielkiss, chill
Differential Revision: https://reviews.llvm.org/D114545
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
ed98c1b3 |
| 09-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
c4b1a63a |
| 25-Feb-2022 |
Jameson Nash <[email protected]> |
mark getTargetTransformInfo and getTargetIRAnalysis as const
Seems like this can be const, since Passes shouldn't modify it.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D120518
|
| #
371fcb72 |
| 17-Feb-2022 |
Roman Lebedev <[email protected]> |
[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP
That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574
This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG.
Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP.
Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D119854
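To make the "lossy" point concrete, here is a hand-written C++ approximation of the lowering being deferred (not taken from the patch; `handle` is a hypothetical callee):
```
int handle(int x) { return 10 * x; }  // hypothetical callee

int beforeLowering(int x) {
  switch (x) {  // later passes (e.g. IPSCCP) still see which values reach handle()
  case 3:
  case 4:
  case 5:
    return handle(x);
  default:
    return 0;
  }
}

int afterLowering(int x) {
  // The contiguous cases collapse into a single range comparison and branch;
  // the per-case value information is no longer visible to later passes.
  if (static_cast<unsigned>(x - 3) <= 2)
    return handle(x);
  return 0;
}
```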
|
| #
f9270214 |
| 10-Feb-2022 |
Yuanfang Chen <[email protected]> |
Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de084a540cfa38b72e609b25ea0569bb7.
Restrict the tests to Windows only since the flag symbol hash depends on system-dependent pat
Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de084a540cfa38b72e609b25ea0569bb7.
Restrict the tests to Windows only since the flag symbol hash depends on system-dependent path normalization.
|
| #
b380a31d |
| 10-Feb-2022 |
Yuanfang Chen <[email protected]> |
Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683f80d174ea9c97000db3ec3276bc022.
Break bots: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x6
Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683f80d174ea9c97000db3ec3276bc022.
Break bots: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview
|
| #
bd3a1de6 |
| 10-Feb-2022 |
Yuanfang Chen <[email protected]> |
[clang-cl] Support the /JMC flag
The introduction and some examples are on this page: https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/
The `/JMC` flag enables these instrumentations:
- Insert, at the beginning of every function immediately after the prologue, a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`. The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean global variable (initialized to 1) with the name convention `__<hash>_<filename>`. All such global variables are placed in the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping with a directory path. MSVC uses some unknown hashing function; here, DJB is used.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add the `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link option via the ".drectve" section. This prevents link failures in case `__CheckForDebuggerJustMyCode` is not provided during linking.
Implementation: All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before the CodeGenPrepare pass, so as not to interfere with mid-end optimizations and to keep the instrumentation target-independent (I'm still working on an ELF port in a separate patch).
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D118428
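A hand-written C++ approximation of what the instrumentation amounts to (names and the hash are hypothetical; the real pass emits this at the IR level, uses the `__fastcall` convention, and places the flag global in the `.msvcjmc` section):
```
// Stand-in playing the role of __JustMyCode_Default so the sketch links on its own.
extern "C" void __CheckForDebuggerJustMyCode(unsigned char * /*JMC_flag*/) {}

// One flag global per source file, named __<hash>_<filename> and initialized to 1.
static unsigned char __A1B2C3_example_cpp = 1;  // hypothetical hash and name

int userFunction(int x) {
  // Inserted immediately after the prologue of every instrumented function.
  __CheckForDebuggerJustMyCode(&__A1B2C3_example_cpp);
  return x + 1;
}
```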
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
52faad83 |
| 10-Dec-2021 |
Archibald Elliott <[email protected]> |
[AArch64] Use Feature for A53 Erratum 835769 Fix
When this pass was originally implemented, the fix pass was enabled using an LLVM command-line flag. This works fine, except in the case of LTO, where the flag is not passed into the linker plugin in order to enable the function pass in the LTO backend.
Now that LTO exists, the expectation is to use target features rather than command-line arguments to control code generation, as this ensures that different command-line arguments in different files are correctly represented, and target features always reach the LTO plugin because they are encoded into LLVM IR.
The fallout of this change is that the fix pass always has to be added to the backend pass pipeline, so it now makes no changes if the function does not have the right target feature to enable it. This should make a minimal difference to compile time.
One advantage is it's now much easier to enable when compiling for a Cortex-A53, as CPUs imply their own individual sets of target-features, in a more fine-grained way. I haven't done this yet, but it is an option, if the fix should be enabled in more places.
Existing tests of the user interface are unaffected, the changes are to reflect that the argument is now turned into a target feature.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D114703
|
| #
0395e015 |
| 07-Dec-2021 |
Cullen Rhodes <[email protected]> |
[IR] Split vscale_range interface
Interface is split from:
std::pair<unsigned, unsigned> getVScaleRangeArgs()
into separate functions for min/max:
unsigned getVScaleRangeMin();
Optional<unsigned> getVScaleRangeMax();
Reviewed By: sdesmalen, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D114075
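A rough mock of the shape of the split interface (using `std::optional` in place of `llvm::Optional`; the class is illustrative, not LLVM's actual attribute API):
```
#include <optional>

struct VScaleRangeMock {
  // New, split accessors: the maximum may be absent (i.e. unbounded).
  unsigned getVScaleRangeMin() const { return 1; }
  std::optional<unsigned> getVScaleRangeMax() const { return 16; }
};

bool hasFixedVScale(const VScaleRangeMock &A) {
  // A fixed vscale is implied when the maximum is known and equals the minimum.
  std::optional<unsigned> Max = A.getVScaleRangeMax();
  return Max && *Max == A.getVScaleRangeMin();
}
```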
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
dc84770d |
| 15-Nov-2021 |
Amara Emerson <[email protected]> |
[GlobalISel] Add a store-merging optimization pass and enable for AArch64.
This is a first attempt at a constant value consecutive store merging pass, a counterpart to the DAGCombiner's store merging optimization.
The high level goals of this pass:
* Have a simple and efficient algorithm. As close to linear time as we can get. Thus, prioritizing scalability of the algorithm over merging every corner case we can find. The DAGCombiner's store merging code has been the source of compile time and complexity issues in the past and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR, we don't have the concept of chains like we do in the DAG, and the instruction order is stricter than enforcing ordering with graph edges. Although I considered adding something similar, I couldn't justify the overhead.
The pass is currently split into 3 main parts. The main store merging code focuses on identifying candidate stores and managing the candidate group that's under consideration for merging. Analyzing addressing of stores is a potentially complex part and for now there's just a basic implementation to identify easy cases. Finally, the other main bit of complexity is the alias analysis, which tries to follow the same logic as the DAG's AA.
Currently this implementation only supports merging of constant stores. Stores of arbitrary variables are technically possible with a very small change, but the DAG chooses not to do this. Doing so here makes most code worse since there's extra overhead in merging values into wider registers.
On AArch64 -Os, this optimization results in very minor savings on CTMark.
Differential Revision: https://reviews.llvm.org/D109131
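A tiny C++ illustration of the pattern the pass looks for (illustrative source, not MIR; whether the stores are actually merged depends on alignment and aliasing checks):
```
#include <cstdint>

void initHeader(uint8_t *p) {
  // Four adjacent constant stores to consecutive addresses...
  p[0] = 0x01;
  p[1] = 0x02;
  p[2] = 0x03;
  p[3] = 0x04;
  // ...are a candidate for merging into a single 32-bit store of a combined
  // constant (0x04030201 on a little-endian target such as AArch64).
}
```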
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
607fb1bb |
| 22-Sep-2021 |
David Sherwood <[email protected]> |
[AArch64] Always add -tune-cpu argument to -cc1 driver
This patch ensures that we always tune for a given CPU on AArch64 targets when the user specifies the "-mtune=xyz" flag. In the AArch64Subtarget, if the tune flag is unset, we use the CPU value instead.
I've updated the release notes here:
llvm/docs/ReleaseNotes.rst
and added tests here:
clang/test/Driver/aarch64-mtune.c
Differential Revision: https://reviews.llvm.org/D110258
|
| #
89b57061 |
| 08-Oct-2021 |
Reid Kleckner <[email protected]> |
Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
|
| #
30caca39 |
| 07-Oct-2021 |
Jingu Kang <[email protected]> |
Third Recommit "[AArch64] Split bitmask immediate of bitwise AND operation"
This reverts the revert commit fc36fb4d23a5e419cf33002c87c0082f682cb77b with bug fixes.
Differential Revision: https://re
Third Recommit "[AArch64] Split bitmask immediate of bitwise AND operation"
This reverts the revert commit fc36fb4d23a5e419cf33002c87c0082f682cb77b with bug fixes.
Differential Revision: https://reviews.llvm.org/D109963
|
| #
fc36fb4d |
| 06-Oct-2021 |
David Spickett <[email protected]> |
Revert "Second Recommit "[AArch64] Split bitmask immediate of bitwise AND operation""
This reverts commit 13f3c39f3658fa28cb008eb56a58d8e34697cd5d.
Due to test failures in stage 2 clang tests on AA
Revert "Second Recommit "[AArch64] Split bitmask immediate of bitwise AND operation""
This reverts commit 13f3c39f3658fa28cb008eb56a58d8e34697cd5d.
Due to test failures in stage 2 clang tests on AArch64 bots.
|
| #
13f3c39f |
| 29-Sep-2021 |
Jingu Kang <[email protected]> |
Second Recommit "[AArch64] Split bitmask immediate of bitwise AND operation"
This reverts the revert commit c07f7099690e8607d119227db1f80ee21eff3a3b with bug fixes.
Differential Revision: https://r
Second Recommit "[AArch64] Split bitmask immediate of bitwise AND operation"
This reverts the revert commit c07f7099690e8607d119227db1f80ee21eff3a3b with bug fixes.
Differential Revision: https://reviews.llvm.org/D109963
|
| #
8a645fc4 |
| 29-Sep-2021 |
David Green <[email protected]> |
[AArch64] Enable type promotion for AArch64
This enables the type promotion pass for AArch64, which acts as a CodeGenPrepare pass to promote illegal integers to legal ones, especially useful for removing extends that would otherwise require cross-basic-block analysis.
I have enabled this generally, for both ISel and GlobalISel. In some quick experiments it appeared to help GlobalISel remove extra extends in places too, but that might just be missing optimizations that are better left for later. We can disable it again if required.
In my experiments, this can improve performance in some cases, and code size showed a small improvement. SPEC was a very small improvement, within the noise. Some of the test cases show extends being moved out of loops, often when the extend would be part of a cmp operand, which should reduce the latency of the instruction in the loop on many CPUs. The signed-truncation-check tests are increasing as they are no longer matching specific DAG combines.
We also hope to add some additional improvements to the pass in the near future, to capture more cases of promoting extends through phis that have come up in a few places lately.
Differential Revision: https://reviews.llvm.org/D110239
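An illustrative C++ loop of the kind the pass helps with (hypothetical example; the sub-word compare operands would otherwise be extended wherever they are used, potentially in several blocks):
```
#include <cstdint>

int countBelow(const uint16_t *p, int n, uint16_t limit) {
  int count = 0;
  for (int i = 0; i < n; ++i)
    // The 16-bit compare operands are illegal integer types on AArch64; the
    // pass promotes them to a legal width (i32) once instead of re-extending.
    if (p[i] < limit)
      ++count;
  return count;
}
```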
|
| #
c07f7099 |
| 29-Sep-2021 |
Sterling Augustine <[email protected]> |
Revert "Recommit "[AArch64] Split bitmask immediate of bitwise AND operation""
This reverts commit 73a196a11c0e6fe7bbf33055cc2c96ce3c61ff0d.
Causes crashes as reported in https://reviews.llvm.org/D
Revert "Recommit "[AArch64] Split bitmask immediate of bitwise AND operation""
This reverts commit 73a196a11c0e6fe7bbf33055cc2c96ce3c61ff0d.
Causes crashes as reported in https://reviews.llvm.org/D109963
|