Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2
# b5b43551 | 03-Aug-2022 | Craig Topper <[email protected]>
[RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into (seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it will be turned back into an 'and' by SimplifyDemandedBits which can cause an infinite loop.
To prevent this, check if bit 31 is 0 with computeKnownBits before doing the transformation.
Fixes PR56905.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131113
(cherry picked from commit 53d560b22f5b5d91ae5296f030e0ca75a5d2c625)
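The flip-flop being fixed is visible at the value level: once computeKnownBits proves bit 31 of X is zero, (sext_inreg X, i32) and (and X, 0xffffffff) compute exactly the same value, so each combine would keep rewriting the other's output. A minimal standalone C++ sketch of that fact (helper name is mine, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// (sext_inreg X, i32): sign-extend the low 32 bits of X to 64 bits.
int64_t sextInreg32(uint64_t x) { return (int64_t)(int32_t)(uint32_t)x; }

int main() {
  std::mt19937_64 rng(0);
  for (int i = 0; i < 100000; ++i) {
    uint64_t x = rng() & ~(1ULL << 31); // bit 31 forced to 0, as known bits would prove
    // With bit 31 clear, the sext_inreg form equals the and form, which is
    // why SimplifyDemandedBits keeps turning one back into the other.
    assert((uint64_t)sextInreg32(x) == (x & 0xffffffffULL));
  }
}
```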
# 98411113 | 02-Aug-2022 | wanglian <[email protected]>
[RISCV][NFC] Use defined variable instead of some code.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D130687
(cherry picked from commit e208bab55fb11a69931a02dec8583a8ec5f94bbf)
Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init
# 45944e7c | 25-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC.
D130508 handles more constants than just 1 or -1. We need to extract the constant instead of relying on isOneConstant or isAllOnesConstant.
# d8800ead | 12-Jul-2022 | jacquesguan <[email protected]>
[RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.
Differential Revision: https://reviews.llvm.org/D129545
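As a value-level illustration (plain C++ standing in for the DAG nodes, not the combine itself; function names are mine): extracting a lane from a vector add gives the same result as adding the two extracted lanes, which is the rewrite shouldScalarizeBinop enables.

```cpp
#include <cassert>

// Vector form: do the binary op across all lanes, then extract one element.
int extractOfVectorAdd(const int a[4], const int b[4], int lane) {
  int sum[4];
  for (int i = 0; i < 4; ++i) sum[i] = a[i] + b[i];
  return sum[lane];
}

// Scalarized form: extract both operands' lanes first, then one scalar add.
int addOfExtracts(const int a[4], const int b[4], int lane) {
  return a[lane] + b[lane];
}

int main() {
  int a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40};
  for (int lane = 0; lane < 4; ++lane)
    assert(extractOfVectorAdd(a, b, lane) == addOfExtracts(a, b, lane));
}
```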
# 9adc00a9 | 23-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Add a continue to reduce nesting. NFC
# 1cc7f5be | 23-Jul-2022 | Kazu Hirata <[email protected]>
Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
# add17fc8 | 21-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, false)
(srl (and X, 1<<C), C) is the form we receive for testing bit C. An earlier combine removed the setcc so it wasn't there to match when we created the SELECT_CC. This doesn't happen for BR_CC because generic DAG combine rebuilds the setcc if it is used by BRCOND.
We can shift X left by XLen-1-C to put the bit to be tested in the MSB, and use a signed compare with 0 to test the MSB.
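A standalone C++ sketch of why the shifted form tests the same bit (RV64 assumed, so XLen-1 = 63; helper names are mine, and two's-complement wrap on the cast is assumed):

```cpp
#include <cassert>
#include <cstdint>

// Direct form: test bit C of X, as in (srl (and X, 1<<C), C).
bool testBitDirect(uint64_t x, unsigned c) { return (x >> c) & 1; }

// Combined form: shift bit C into the MSB (left by XLen-1-C), then a
// signed compare with 0 reads the MSB.
bool testBitViaMsb(uint64_t x, unsigned c) {
  return (int64_t)(x << (63 - c)) < 0;
}

int main() {
  uint64_t samples[] = {0xA5A5A5A5A5A5A5A5ULL, ~0xA5A5A5A5A5A5A5A5ULL, 0, ~0ULL};
  for (uint64_t x : samples)
    for (unsigned c = 0; c < 64; ++c)
      assert(testBitDirect(x, c) == testBitViaMsb(x, c));
}
```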
# 7dda6c71 | 21-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function.
The only difference between the combines were the calls to getNode that include the true/false values for SELECT_CC or the chain and branch target for BR_CC.
Wrap the rest of the code into a helper that reads LHS, RHS, and CC and outputs new values and a bool if a new node needs to be created.
# 8983db15 | 21-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Optimize (brcond (seteq (and X, 1 << C), 0))
If C > 10, this will require a constant to be materialized for the And. To avoid this, we can shift X left by XLen-1-C bits to put the tested bit in the MSB, then we can do a signed compare with 0 to determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.
I've implemented this inside of translateSetCCForBranch which is called when setcc+brcond or setcc+select is converted to br_cc or select_cc during lowering. It doesn't make sense to do this for general setcc since we lack a sgez instruction.
I've tested bits 10, 11, 31, 32, 63 and a couple of bits between 11 and 31 and between 32 and 63 for both i32 and i64 where applicable. Select has some deficiencies where we receive (and (srl X, C), 1) instead. This doesn't happen for br_cc due to the call to rebuildSetCC in the generic DAGCombiner for brcond. I'll explore improving select in a future patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130203
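Why C > 10 is the cutoff: RISC-V's andi takes a 12-bit signed immediate (-2048..2047), so the mask 1<<C stops fitting once C exceeds 10. A quick standalone C++ check (helper name is mine; the simm12 range is from the base ISA):

```cpp
#include <cstdint>
#include <cstdio>

// A value usable as an andi immediate must fit in 12 signed bits.
bool fitsInSImm12(int64_t v) { return v >= -2048 && v <= 2047; }

int main() {
  for (unsigned c = 0; c < 16; ++c)
    std::printf("1<<%-2u %s as an andi immediate\n", c,
                fitsInSImm12(INT64_C(1) << c) ? "fits" : "does not fit");
  // Prints "fits" through c = 10 (1024) and "does not fit" from c = 11 (2048).
}
```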
# 3198364e | 26-Jun-2022 | ksyx <[email protected]>
[RISCV][Clang] Add support for Zmmul extension
This patch implements the recently ratified extension Zmmul, a subextension of M (Integer Multiplication and Division) consisting of only the multiplication part of it.
Reviewed By: craig.topper, jrtc27, asb
Differential Revision: https://reviews.llvm.org/D103313
# 0b027528 | 18-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1)
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we know the result is being used by a compare, we can use a sext_inreg instead of an AND if we also modify C1 to have 33 sign bits instead of 32 leading zeros. This can also improve the generated code for materializing C1.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D129980
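A standalone C++ check of the substitution (not the DAG code; helper name is mine): comparing the zero-extended low 32 bits of X against C1 is equivalent to comparing the sign-extended low 32 bits against a C1 rewritten to have 33 sign bits.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// (sext_inreg X, i32): sign-extend the low 32 bits to 64 bits.
int64_t sextInreg32(uint64_t x) { return (int64_t)(int32_t)(uint32_t)x; }

int main() {
  std::mt19937_64 rng(1);
  for (int i = 0; i < 100000; ++i) {
    uint64_t x = rng();
    uint64_t c1 = rng() & 0xffffffffULL;  // C1 with 32 leading zeros, as in the and form
    int64_t c1Adjusted = sextInreg32(c1); // same low 32 bits, now 33 sign bits
    // The and-based compare and the sext_inreg-based compare always agree.
    assert(((x & 0xffffffffULL) == c1) == (sextInreg32(x) == c1Adjusted));
  }
}
```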
# 259c36e7 | 18-Jul-2022 | Simon Pilgrim <[email protected]>
[DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure it's being called from a shift. NFC.
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# 2b111740 | 16-May-2022 | jacquesguan <[email protected]>
[RISCV][NFC] Use more Arrayref in TargetLowering functions.
This patch replaces some foreach loops with ArrayRef, and replaces some repeated literal arrays with a variable.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125656
# d9554971 | 17-Jul-2022 | Fangrui Song <[email protected]>
[RISCV] Simplify lowerGlobalAddress. NFC
# decf385c | 17-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR.
We were only handling AND before, but SimplifyDemandedBits can also call it for OR and XOR.
# 25775553 | 13-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32).
The former pattern will select as slliw+sraiw while the latter will select as slli+srai. This can enable the slli+srai to be compressed.
Differential Revision: https://reviews.llvm.org/D129688
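A standalone C++ sketch of why the two forms compute the same value on RV64 (C1 and C2 assumed in [0, 32), matching the i32 pattern; helper name is mine):

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// (sext_inreg V, i32): sign-extend the low 32 bits to 64 bits.
int64_t sextInreg32(uint64_t x) { return (int64_t)(int32_t)(uint32_t)x; }

int main() {
  std::mt19937_64 rng(2);
  for (int i = 0; i < 100000; ++i) {
    uint64_t x = rng();
    unsigned c1 = rng() % 32, c2 = rng() % 32;
    int64_t former = sextInreg32(x << c1) >> c2;             // selects as slliw+sraiw
    int64_t latter = (int64_t)(x << (c1 + 32)) >> (c2 + 32); // selects as slli+srai
    assert(former == latter);
  }
}
```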
# dde2a7fb | 12-Jul-2022 | Philip Reames <[email protected]>
[RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.
vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)
We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (This holds for everything other than VLEN<=32, but that case is already broken.)
It is worth noting that the AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.
Differential Revision: https://reviews.llvm.org/D129609
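The payoff, sketched in standalone C++ (not the actual lowering; helper name is mine): once the divisor is known to be a power of two, as a shifted vscale is here, the urem collapses to a single AND with divisor-1.

```cpp
#include <cassert>
#include <cstdint>

// Remainder by a power-of-two divisor via masking: n % d == n & (d - 1).
uint64_t remViaMask(uint64_t n, uint64_t pow2) { return n & (pow2 - 1); }

int main() {
  for (uint64_t d = 1; d <= (1ULL << 16); d <<= 1) // power-of-two divisors only
    for (uint64_t n = 0; n < 1000; ++n)
      assert(n % d == remViaMask(n, d));
}
```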
# c5be6a83 | 12-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Use X0 in place of VLMaxSentinel in lowering.
I thought I had already fixed all of these, but I guess I missed one.
# c3c17b16 | 11-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Use MVT for the argument to getMaskTypeFor. NFC
Only one caller didn't already have an MVT and that was easy to fix. Since the return type is MVT and it uses MVT::getVectorVT, taking an MVT as input makes the most sense.
# 1a2bd44b | 11-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true.
This restores the old behavior before D129402 when enableUnalignedScalarMem is false. This fixes a regression spotted by @asb.
To fix this correctly, we need to consider alignment of the load we'd be replacing, but that's not possible in the current interface.
# 3f68f0f8 | 09-Jul-2022 | LiaoChunyu <[email protected]>
[RISCV] Optimize 2x SELECT for floating-point types
Including the following opcodes: Select_FPR16_Using_CC_GPR, Select_FPR32_Using_CC_GPR, Select_FPR64_Using_CC_GPR
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127871
# 35ec8a42 | 10-Jul-2022 | Craig Topper <[email protected]>
[RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going to turn around and create a constant pool for the immediate.
I left out the check for useConstantPoolForLargeInts() thinking that even if you don't want the compiler to create a constant pool you might still want to avoid materializing an integer that is already available in a global variable.
Test file was copied from AArch64/ARM and has not been committed yet. Will post separate review for that.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D129402
# 9cfb28d6 | 29-Jun-2022 | Lian Wang <[email protected]>
[RISCV] Change VECTOR_SPLICE mask operation from expand to promote
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D128717
# bf1758c3 | 07-Jul-2022 | Diego Caballero <[email protected]>
Revert "[RISCV] Optimize 2x SELECT for floating-point types"
This reverts commit 1178992c72b002c3b2c87203252c566eeb273cc1.
# 51d67294 | 30-Jun-2022 | Craig Topper <[email protected]>
[RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
Similar for a subtract with a constant left hand side.
(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine for (sext (add (trunc X to i32), C1) to i64).
For RISCV, we should lower this as addiw which means turning it into (sext_inreg (add X, C1)).
There is an existing DAG combine to convert back to (sext (add (trunc X to i32), C1) to i64), but it requires isTruncateFree to return true and for i32 to be a legal type, as it uses sign_extend and truncate nodes. So that doesn't work for RISCV.
If the outer sra happens be used by a shl by constant, it will be folded and the shift amount of the sra will be changed before we can do our own DAG combine. This requires us to match the more general pattern and restore the shl.
I had wanted to do this as a separate (add (shl X, 32), C1<<32) -> (shl (add X, C1), 32) combine, but that hit an infinite loop for some values of C1.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128869
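The C==0 case of the fold, checked in standalone C++ (not the DAG combine itself; helper name is mine, and well-defined unsigned wraparound carries the proof):

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// (sext_inreg V, i32): sign-extend the low 32 bits to 64 bits.
int64_t sextInreg32(uint64_t x) { return (int64_t)(int32_t)(uint32_t)x; }

int main() {
  std::mt19937_64 rng(3);
  for (int i = 0; i < 100000; ++i) {
    uint64_t x = rng(), c1 = rng();
    // Canonical form from InstCombine: (sra (add (shl X, 32), C1<<32), 32).
    int64_t canonical = (int64_t)((x << 32) + (c1 << 32)) >> 32;
    // Form that selects as addiw: (sext_inreg (add X, C1), i32).
    int64_t addiwForm = sextInreg32(x + c1);
    assert(canonical == addiwForm);
  }
}
```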