|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
4704da13 |
| 20-Jul-2022 |
David Green <[email protected]> |
[ARM] Fix Thumb2 compare being emitted ExpandCMP_SWAP
Given a patch like D129506, using instructions not valid for the current target feature set becomes an error. This fixes an issue in ARMExpandPs
[ARM] Fix Thumb2 compare being emitted ExpandCMP_SWAP
Given a patch like D129506, using instructions not valid for the current target feature set becomes an error. This fixes an issue in ARMExpandPseudo::ExpandCMP_SWAP where Thumb2 compares were used in Thumb1Only code, such as thumbv8m.baseline targets.
Differential Revision: https://reviews.llvm.org/D129695
show more ...
|
| #
cb806ce2 |
| 17-Jul-2022 |
David Green <[email protected]> |
[ARM] Guard VMOVH and VINS patterns.
These instructions are only available when fp is available, so cannot be used with just +mve. Add predicates to ensure we fall-back under the right circumstances.
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
07881861 |
| 03-Jun-2022 |
Guillaume Chatelet <[email protected]> |
[Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with
[Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
show more ...
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
440c4b70 |
| 21-Feb-2022 |
Craig Topper <[email protected]> |
[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).
By placi
[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).
By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw.
Some X86 tests for Z - abs(X) seem to have improved as well.
Other targets look to be a wash.
I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch.
This is an alternative to D119099 which was focused on RISCV only.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D119171
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
d6b07348 |
| 19-Jan-2022 |
Jim Lin <[email protected]> |
[NFC] Use Register instead of unsigned
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
2aed0813 |
| 07-Jan-2022 |
Kazu Hirata <[email protected]> |
[llvm] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
|
| #
69ccc961 |
| 01-Jan-2022 |
Kazu Hirata <[email protected]> |
[llvm] Use the default constructor for SDValue (NFC)
|
| #
fbb61adb |
| 25-Nov-2021 |
David Green <[email protected]> |
[ARM] Convert fptoi.sat to fixed point multiply
This is a very small addition to the existing MVE fixed point vcvt code to also create them from FP_TO_SINT_SAT and FP_TO_UINT_SAT nodes, which should
[ARM] Convert fptoi.sat to fixed point multiply
This is a very small addition to the existing MVE fixed point vcvt code to also create them from FP_TO_SINT_SAT and FP_TO_UINT_SAT nodes, which should be equally valid for native saturating converts under MVE.
Differential Revision: https://reviews.llvm.org/D114360
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
85b4b21c |
| 21-Sep-2021 |
Kazu Hirata <[email protected]> |
[llvm] Use make_early_inc_range (NFC)
|
|
Revision tags: llvmorg-13.0.0-rc3 |
|
| #
9af8f1b1 |
| 09-Sep-2021 |
Craig Topper <[email protected]> |
[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecrate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483.
R
[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecrate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D109535
show more ...
|
| #
9cb8f4d1 |
| 02-Sep-2021 |
David Green <[email protected]> |
[ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words:
mov r
[ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words:
mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated.
This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions.
This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used.
A lot of tests needed updating.
Differential Revision: https://reviews.llvm.org/D107638
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2 |
|
| #
1e770f03 |
| 17-Aug-2021 |
Simon Pilgrim <[email protected]> |
[ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the ca
[ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.
Fixes static analyser warnings.
show more ...
|
| #
77e8f4ee |
| 06-Aug-2021 |
David Green <[email protected]> |
[ARM] Define ComplexPatternFuncMutatesDAG
Some of the Arm complex pattern functions call canExtractShiftFromMul, which can modify the DAG in-place. For this to be valid and handled successfully we n
[ARM] Define ComplexPatternFuncMutatesDAG
Some of the Arm complex pattern functions call canExtractShiftFromMul, which can modify the DAG in-place. For this to be valid and handled successfully we need to define ComplexPatternFuncMutatesDAG.
Differential Revision: https://reviews.llvm.org/D107476
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
| #
24d76419 |
| 21-Jun-2021 |
Sam Tebbs <[email protected]> |
[ARM] Transform a floating-point to fixed-point conversion to a VCVT_fix
Much like fixed-point to floating-point conversion, the converse can also be transformed into a fixed-point VCVT. This patch
[ARM] Transform a floating-point to fixed-point conversion to a VCVT_fix
Much like fixed-point to floating-point conversion, the converse can also be transformed into a fixed-point VCVT. This patch transforms multiplications of floating point numbers by 2^n into a VCVT_fix. The exception is that a float to fixed conversion with 1 fractional bit ends up being an FADD (FADD(x, x) emulates FMUL(x, 2)) rather than an FMUL so there is a special case for that. This patch also moves the code from https://reviews.llvm.org/D103903 into a separate function as fixed to float and float to fixed are very similar.
Differential Revision: https://reviews.llvm.org/D104793
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc2 |
|
| #
bbe16b7a |
| 07-Jun-2021 |
Sam Tebbs <[email protected]> |
[ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix
Conversion from a fixed-point number to a floating-point number is done by multiplying the fixed-point number by 2^(-n) whe
[ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix
Conversion from a fixed-point number to a floating-point number is done by multiplying the fixed-point number by 2^(-n) where n is the number of fractional bits. Currently this is lowered to a vcvt (integer to floating-point) then a vmul, but it can instead be lowered directly to a vcvt (fixed-point to floating-point). This patch enables such transformations as long as the multiplication factor is a power of 2.
Differential Revision: https://reviews.llvm.org/D103903
show more ...
|
| #
521d3732 |
| 20-Jun-2021 |
Fangrui Song <[email protected]> |
Fix -Wunused-variable and -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC
|
| #
f6b9836b |
| 02-Jun-2021 |
Kristina Bessonova <[email protected]> |
[ARM][NEON] Combine base address updates for vld1Ndup intrinsics
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D103836
|
| #
44843e2a |
| 25-May-2021 |
Kristina Bessonova <[email protected]> |
[ARM][NEON] Combine base address updates for vld1x intrinsics
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D102855
|
|
Revision tags: llvmorg-12.0.1-rc1 |
|
| #
d59a2a32 |
| 06-May-2021 |
Kristina Bessonova <[email protected]> |
[ARM][NEON] Combine base address updates for vst1x intrinsics
Differential Revision: https://reviews.llvm.org/D102256
|
| #
1011d4ed |
| 13-May-2021 |
David Green <[email protected]> |
[ARM] Constrain CMPZ shift combine to a single use
We currently prefer t2CMPrs over t2CMPri when the node contains a shift. This can introduce more nodes if the shift has multiple uses though, as va
[ARM] Constrain CMPZ shift combine to a single use
We currently prefer t2CMPrs over t2CMPri when the node contains a shift. This can introduce more nodes if the shift has multiple uses though, as value from the shift will be needed anyway, and in the case of a t2CMPri compared with zero will more readily be removed entirely.
Differential Revision: https://reviews.llvm.org/D101688
show more ...
|
| #
34c098b7 |
| 11-May-2021 |
Tomas Matheson <[email protected]> |
[ARM] Prevent spilling between ldrex/strex pairs
Based on the same for AArch64: 4751cadcca45984d7671e594ce95aed8fe030bf1
At -O0, the fast register allocator may insert spills between the ldrex and
[ARM] Prevent spilling between ldrex/strex pairs
Based on the same for AArch64: 4751cadcca45984d7671e594ce95aed8fe030bf1
At -O0, the fast register allocator may insert spills between the ldrex and strex instructions inserted by AtomicExpandPass when expanding atomicrmw instructions in LL/SC loops. To avoid this, expand to cmpxchg loops and therefore expand the cmpxchg pseudos after register allocation.
Required a tweak to ARMExpandPseudo::ExpandCMP_SWAP to use the 4-byte encoding of UXT, since the pseudo instruction can be allocated a high register (R8-R15) which the 2-byte encoding doesn't support. However, the 4-byte encodings are not present for ARM v8-M Baseline. To enable this, two new pseudos are added for Thumb which are only valid for v8mbase, tCMP_SWAP_8 and tCMP_SWAP_16.
The previously committed attempt in D101164 had to be reverted due to runtime failures in the test suites. Rather than spending time fixing that implementation (adding another implementation of atomic operations and more divergence between backends) I have chosen to follow the approach taken in D101163.
Differential Revision: https://reviews.llvm.org/D101898
Depends on D101912
show more ...
|
| #
9d86095f |
| 03-May-2021 |
Tomas Matheson <[email protected]> |
Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 753185031d939711f8733639a77a6fdc3bdbad22.
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
75318503 |
| 31-Mar-2021 |
Tomas Matheson <[email protected]> |
[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert s
[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert spills between the exclusive loads and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them after register allocation.
Floating point legalisation: f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually f32 ATOMIC_LOAD_FADD_16(*i16, f32)
Differential Revision: https://reviews.llvm.org/D101164
Originally submitted as 3338290c187b254ad071f4b9cbf2ddb2623cefc0. Reverted in c7df6b1223d88dfd15248fbf7b7b83dacad22ae3.
show more ...
|
| #
d1bbe61d |
| 03-May-2021 |
David Green <[email protected]> |
[ARM] Memory operands for MVE gathers/scatters
Similarly to D101096, this makes sure that MMO operands get propagated through from MVE gathers/scatters to the Machine Instructions. This allows extra
[ARM] Memory operands for MVE gathers/scatters
Similarly to D101096, this makes sure that MMO operands get propagated through from MVE gathers/scatters to the Machine Instructions. This allows extra scheduling freedom, not forcing the instructions to act as scheduling barriers. We create MMO's with an unknown size, specifying that they can load from anywhere in memory, similar to the masked_gather or X86 intrinsics.
Differential Revision: https://reviews.llvm.org/D101219
show more ...
|
| #
15b5d1a5 |
| 02-May-2021 |
David Green <[email protected]> |
[ARM] Transfer memory operands for VLDn
We create MMO's for the VLDn/VSTn intrinsics in ARMTargetLowering:: getTgtMemIntrinsic, but they do not currently make it ll the way through ISel. This chang
[ARM] Transfer memory operands for VLDn
We create MMO's for the VLDn/VSTn intrinsics in ARMTargetLowering:: getTgtMemIntrinsic, but they do not currently make it ll the way through ISel. This changes that in the various places it needs changing, making sure that the MMO is propagate through to the final instruction. This can help in scheduling, not treating the VLD2/VST2 as a scheduling barrier.
Differential Revision: https://reviews.llvm.org/D101096
show more ...
|