|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
e82d49bf |
| 24-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types
Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors
|
| #
a3e38b4a |
| 24-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero
|
| #
5f89d2ba |
| 23-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....).
It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek through multiple uses of the AND ops.
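The fold rests on AND distributing over OR. The following is a minimal standalone sketch in plain C++ (not SelectionDAG code; the function names are hypothetical and chosen only for illustration) that checks the identity exhaustively on 4-bit values:

```cpp
#include <cassert>
#include <cstdint>

// Original pattern: OR(AND(X,C1), AND(OR(X,Y),C2)).
inline uint8_t beforeFold(uint8_t X, uint8_t Y, uint8_t C1, uint8_t C2) {
  return static_cast<uint8_t>((X & C1) | ((X | Y) & C2));
}

// Folded pattern: OR(AND(X,OR(C1,C2)), AND(Y,C2)).
inline uint8_t afterFold(uint8_t X, uint8_t Y, uint8_t C1, uint8_t C2) {
  return static_cast<uint8_t>((X & (C1 | C2)) | (Y & C2));
}

// Returns true if both forms agree for every 4-bit X, Y, C1 and C2.
inline bool foldHoldsForAllNibbles() {
  for (unsigned X = 0; X < 16; ++X)
    for (unsigned Y = 0; Y < 16; ++Y)
      for (unsigned C1 = 0; C1 < 16; ++C1)
        for (unsigned C2 = 0; C2 < 16; ++C2)
          if (beforeFold(X, Y, C1, C2) != afterFold(X, Y, C1, C2))
            return false;
  return true;
}
```

Because the identity is bitwise, agreement on 4-bit values carries over to any wider integer type.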
|
| #
6aff1b7b |
| 23-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC.
|
| #
0f6b0461 |
| 19-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.
This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115
Alive2: https://alive2.llvm.org/ce/z/fl7T7K
Differential Revision: https://reviews.llvm.org/D129933
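The relaxed condition can be illustrated outside of LLVM. This is a sketch on 8-bit values with a logical right shift; the helper names are hypothetical and the check samples only a subset of demanded masks, so it is an illustration rather than a proof (the Alive2 link above is the authoritative check):

```cpp
#include <cassert>
#include <cstdint>

// xor (X >> ShiftC), XorC
inline uint8_t xorOfShift(uint8_t X, unsigned ShiftC, uint8_t XorC) {
  return static_cast<uint8_t>((X >> ShiftC) ^ XorC);
}

// (not X) >> ShiftC, using a logical shift on the 8-bit complement.
inline uint8_t shiftOfNot(uint8_t X, unsigned ShiftC) {
  return static_cast<uint8_t>(static_cast<uint8_t>(~X) >> ShiftC);
}

// Relaxed precondition: XorC only has to equal the shifted all-ones mask
// on the bits that are actually demanded.
inline bool xorMatchesUnderDemanded(unsigned ShiftC, uint8_t XorC,
                                    uint8_t Demanded) {
  const uint8_t ShiftedOnes = static_cast<uint8_t>(0xFFu >> ShiftC);
  return ((XorC ^ ShiftedOnes) & Demanded) == 0;
}

// Checks every 8-bit X and XorC, every shift amount 0..7, and a sample of
// sixteen demanded masks (0x00, 0x11, ..., 0xFF).
inline bool relaxedFoldHolds() {
  for (unsigned ShiftC = 0; ShiftC < 8; ++ShiftC)
    for (unsigned D = 0; D < 16; ++D) {
      const uint8_t Demanded = static_cast<uint8_t>(D * 0x11u);
      for (unsigned XorC = 0; XorC < 256; ++XorC) {
        if (!xorMatchesUnderDemanded(ShiftC, static_cast<uint8_t>(XorC),
                                     Demanded))
          continue;
        for (unsigned X = 0; X < 256; ++X) {
          const uint8_t A = xorOfShift(static_cast<uint8_t>(X), ShiftC,
                                       static_cast<uint8_t>(XorC));
          const uint8_t B = shiftOfNot(static_cast<uint8_t>(X), ShiftC);
          if ((A & Demanded) != (B & Demanded))
            return false;
        }
      }
    }
  return true;
}
```

For instance, with a shift of 4 and only the low nibble demanded, an XorC of 0xFF is accepted even though it is not the shifted all-bits mask 0x0F.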
|
| #
7fa1c326 |
| 18-Jul-2022 |
Craig Topper <[email protected]> |
[CodeGen] Remove unnecessary APInt copy. NFC
|
| #
a55ff6aa |
| 18-Jul-2022 |
Craig Topper <[email protected]> |
[Support][CodeGen] Fix spelling Divison->Division. NFC
|
| #
795602af |
| 18-Jul-2022 |
Craig Topper <[email protected]> |
[CodeGen] Don't compare bool with integer 0. NFC
The IsAdd field is a bool.
|
| #
3c8bf296 |
| 15-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] Move "xor (X logical_shift ShiftC), XorC --> (not X) logical_shift ShiftC" fold into SimplifyDemandedBits
SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts
As mentioned on D127115, we should be able to further generalise this based off the demanded bits.
|
| #
2a721374 |
| 07-Jul-2022 |
Nikita Popov <[email protected]> |
[IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations:
    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
              to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
              to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change, though; I will follow up with patches to remove limitations in various transformation passes that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
|
| #
dcfc1fd2 |
| 14-Jul-2022 |
Craig Topper <[email protected]> |
[SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount.
If we have a variable shift amount and the demanded mask has leading zeros, we can propagate those leading zeros to not demand those bits from operand 0. This can allow zero_extend/sign_extend to become any_extend. This pattern can occur due to C integer promotion rules.
This transform is already done by InstCombineSimplifyDemanded.cpp where sign_extend can be turned into zero_extend for example.
Reviewed By: spatel, foad
Differential Revision: https://reviews.llvm.org/D121833
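The reasoning behind this can be sketched on plain 8-bit integers: a left shift only moves bits upward, so bits of operand 0 that sit above the highest demanded result bit can never reach a demanded position, whatever the shift amount is. The helper names below are hypothetical and this is not the SelectionDAG implementation:

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits in an 8-bit mask (8 for an all-zero mask).
inline unsigned leadingZeros8(uint8_t Mask) {
  unsigned N = 0;
  for (uint8_t Bit = 0x80; Bit != 0 && (Mask & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Bits of operand 0 that may influence (Op0 << Amt) & Demanded for an
// unknown shift amount: everything at or below the highest demanded bit.
inline uint8_t demandedOp0ForShl(uint8_t Demanded) {
  return static_cast<uint8_t>(0xFFu >> leadingZeros8(Demanded));
}

// Checks, for every 8-bit operand, demanded mask and shift amount 0..7,
// that clearing the undemanded high bits of Op0 leaves the demanded
// result bits unchanged.
inline bool shlIgnoresUndemandedHighBits() {
  for (unsigned Demanded = 0; Demanded < 256; ++Demanded) {
    const uint8_t Keep = demandedOp0ForShl(static_cast<uint8_t>(Demanded));
    for (unsigned Op0 = 0; Op0 < 256; ++Op0)
      for (unsigned Amt = 0; Amt < 8; ++Amt) {
        const unsigned Full = (Op0 << Amt) & Demanded;
        const unsigned Trimmed = ((Op0 & Keep) << Amt) & Demanded;
        if (Full != Trimmed)
          return false;
      }
  }
  return true;
}
```

Since the high bits of operand 0 are not demanded, it does not matter whether an extension filled them with zeros or sign bits, which is what lets zero_extend/sign_extend relax to any_extend.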
|
| #
611ffcf4 |
| 14-Jul-2022 |
Kazu Hirata <[email protected]> |
[llvm] Use value instead of getValue (NFC)
|
| #
d172842b |
| 13-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedVectorElts - adjust demanded elements for selection mask for known zero results
If an element is known zero from both selections then it shouldn't matter what the selection mask element is.
|
| #
8eaf00e0 |
| 09-Jul-2022 |
Craig Topper <[email protected]> |
[TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types.
To convert CTLZ to popcount we do
    x = x | (x >> 1);
    x = x | (x >> 2);
    ...
    x = x | (x >>16);
    x = x | (x >>32); // for 64-bit input
    return popcount(~x);
This smears the most significant set bit across all of the bits below it then inverts the remaining 0s and does a population count.
To support non-power of 2 types, the last shift amount must be more than half of the size of the type. For i15, the last shift was previously a shift by 4; with this patch we add another shift of 8.
Fixes PR56457.
Differential Revision: https://reviews.llvm.org/D129431
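The smear-then-popcount scheme can be sketched for an arbitrary bit width as follows. This is a standalone illustration, not the expandCTLZ code: the function names are hypothetical, the value is modelled in the low Width bits of a uint32_t, and the shift amounts simply double while they stay below Width.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Counts leading zeros of the low Width bits of X (1 <= Width <= 32) by
// smearing the most significant set bit downwards and counting the zeros
// that remain above it.
inline unsigned ctlzViaPopcount(uint32_t X, unsigned Width) {
  const uint32_t Mask = Width >= 32 ? 0xFFFFFFFFu : ((1u << Width) - 1u);
  X &= Mask;
  // Shifts 1, 2, 4, ... while below Width. For Width == 15 this is
  // 1, 2, 4, 8; stopping at 4 would leave the low bits unsmeared.
  for (unsigned Shift = 1; Shift < Width; Shift <<= 1)
    X |= X >> Shift;
  return static_cast<unsigned>(std::bitset<32>(~X & Mask).count());
}

// Straightforward reference: scan from the top bit downwards.
inline unsigned ctlzNaive(uint32_t X, unsigned Width) {
  unsigned N = 0;
  for (unsigned Bit = Width; Bit-- > 0;) {
    if ((X >> Bit) & 1u)
      break;
    ++N;
  }
  return N;
}
```

With only the shifts 1, 2 and 4, an i15 input with just bit 14 set would be smeared down to bit 7, leaving seven zero bits and giving a count of 7 instead of 0; the extra shift of 8 closes that gap.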
|
| #
ded62411 |
| 12-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedBits - AND/OR/XOR - attempt basic knownbits simplifications before calling SimplifyMultipleUseDemandedBits
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
show more ...
|
| #
c64aba5d |
| 11-Jul-2022 |
Nikita Popov <[email protected]> |
[SDAG] Don't duplicate ParseConstraints() implementation in SDAGBuilder (NFCI)
visitInlineAsm() in SDAGBuilder was duplicating a lot of the code in ParseConstraints(), in particular all the logic to determine the operand value and constraint VT.
Rely on the data computed by ParseConstraints() instead, and update its ConstraintVT implementation to match getCallOperandValEVT() more precisely.
show more ...
|
| #
b05160db |
| 11-Jul-2022 |
Craig Topper <[email protected]> |
[SelectionDAG] Simplify how we drop poison flags in SimplifyDemandedBits.
As far as I can tell what was happening in the original code is that the getNode call receives the same operands as the original node with different SDNodeFlags. The logic inside getNode detects that the node already exists and intersects the flags into the existing node and returns it. This results in Op and NewOp for the TLO.CombineTo call always being the same node.
We may have already called CombineTo as part of the recursive handling. A second call to CombineTo as we unwind the recursion overwrites the previous CombineTo. I think this means any time we updated the poison flags that was the only change that ends up getting made and we relied on DAGCombiner to revisit and call SimplifyDemandedBits again. The second time the poison flags wouldn't need to be dropped and we would keep the CombineTo call from further down the recursion.
We can instead call setFlags to drop the poison flags and remove the call to TLO.CombineTo. This way we keep the CombineTo from deeper in the recursion which should be more efficient.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D129511
show more ...
|
| #
b5304612 |
| 08-Jul-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedBits - fold AND(INSERT_SUBVECTOR(C,X,I),M) -> INSERT_SUBVECTOR(AND(C,M),X,I)
If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all.
In that case, if the base vector 'C' is undef/constant, move the AND mask up so that it can be (constant) folded directly.
Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector.
Differential Revision: https://reviews.llvm.org/D129290
show more ...
|
| #
3b7c3a65 |
| 25-Jun-2022 |
Kazu Hirata <[email protected]> |
Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
|
| #
aa8feeef |
| 25-Jun-2022 |
Kazu Hirata <[email protected]> |
Don't use Optional::hasValue (NFC)
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
c0ecbfa4 |
| 20-Jun-2022 |
David Green <[email protected]> |
[AArch64] Known bits for AArch64ISD::DUP
An AArch64ISD::DUP is just a splat, where the known bits for each lane are the same as the input. This teaches that to computeKnownBitsForTargetNode.
Problems arise for constants though, as a constant BUILD_VECTOR can be lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then turn back into a constant BUILD_VECTOR, leading to an infinite cycle. This has been prevented by adding an isTargetCanonicalConstantNode hook to prevent the conversion back into a BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D128144
show more ...
|
| #
1ebe5cac |
| 19-Jun-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification
|
| #
db1be696 |
| 19-Jun-2022 |
Simon Pilgrim <[email protected]> |
[DAG] SimplifyDemandedBits - add ISD::VSELECT handling
|
| #
c06f77ec |
| 15-Jun-2022 |
Ping Deng <[email protected]> |
[SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))'
Reviewed By: craig.topper, spatel
Differential Revision: https://reviews.llvm.org/D127474
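The arithmetic behind this fold can be sketched with wrapping 32-bit integers. The sketch assumes, as the log2(-MulC) in the title suggests, that MulC is a negated power of two; the helper names are hypothetical and this is not the SelectionDAG code:

```cpp
#include <cassert>
#include <cstdint>

// Builds MulC = -(1 << K) in wrapping 32-bit arithmetic (0 <= K <= 31).
inline uint32_t negatedPowerOfTwo(unsigned K) {
  return 0u - (1u << K);
}

// Original form: Op0 - (X * MulC).
inline uint32_t subOfMul(uint32_t Op0, uint32_t X, uint32_t MulC) {
  return Op0 - X * MulC;
}

// Folded form: Op0 + (X << log2(-MulC)).
inline uint32_t addOfShl(uint32_t Op0, uint32_t X, unsigned Log2NegMulC) {
  return Op0 + (X << Log2NegMulC);
}

// Compares both forms for every shift amount and a few sample operands.
inline bool subMulFoldHolds() {
  const uint32_t Samples[] = {0u, 1u, 7u, 0x1234u, 0x80000000u, 0xFFFFFFFFu};
  for (unsigned K = 0; K < 32; ++K)
    for (uint32_t Op0 : Samples)
      for (uint32_t X : Samples)
        if (subOfMul(Op0, X, negatedPowerOfTwo(K)) != addOfShl(Op0, X, K))
          return false;
  return true;
}
```

Subtracting X * -(2^K) is the same as adding X * 2^K, and a multiply by 2^K is a left shift by K, so the multiply and subtract collapse into a shift and add.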
|
| #
1cf9b24d |
| 12-Jun-2022 |
Simon Pilgrim <[email protected]> |
[DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
This helps with several of the regressions from D125836
|