History log of /llvm-project-15.0.7/llvm/lib/Transforms/Vectorize/VectorCombine.cpp (Results 1 – 25 of 105)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 4b7913c3 16-Jul-2022 David Green <[email protected]>

[VectorCombine] Only consider shuffle uses with the same type.

The backend getShuffleCosts do not currently handle shuffles that change
size very well. Limit the shuffles we collect to the same type

[VectorCombine] Only consider shuffle uses with the same type.

The backend getShuffleCosts do not currently handle shuffles that change
size very well. Limit the shuffles we collect to the same type to make
sure they do not cause issues as reported in D128732.

show more ...


# 519d7876 07-Jul-2022 Sander de Smalen <[email protected]>

[VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector.

This addresses https://github.com/llvm/llvm-project/issues/56377

Reviewed By: fhahn

Differential Revision: ht

[VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector.

This addresses https://github.com/llvm/llvm-project/issues/56377

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D129136

show more ...


# 5493f8fc 05-Jul-2022 David Green <[email protected]>

[VectorCombine] Improve shuffle select shuffle-of-shuffles

This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
t

[VectorCombine] Improve shuffle select shuffle-of-shuffles

This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
%x = shuffle %i1, %i2
%y = shuffle %i1, %i2
%a = binop %x, %y
%b = binop %x, %y
shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
%x = shuffle %i1, %i2
%y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

This is a recommit with some additional checks for supported forms and
out-of-bounds mask elements, with some extra tests.

Differential Revision: https://reviews.llvm.org/D128732

show more ...


# b69c75d5 05-Jul-2022 Nikita Popov <[email protected]>

Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles"

This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d.

Clang crashes while linking bullet from llvm-test-suite in
ReleaseL

Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles"

This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d.

Clang crashes while linking bullet from llvm-test-suite in
ReleaseLTO-g cmake configuration.

show more ...


# 19a1e20b 04-Jul-2022 David Green <[email protected]>

[VectorCombine] Improve shuffle select shuffle-of-shuffles

This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
t

[VectorCombine] Improve shuffle select shuffle-of-shuffles

This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
%x = shuffle %i1, %i2
%y = shuffle %i1, %i2
%a = binop %x, %y
%b = binop %x, %y
shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
%x = shuffle %i1, %i2
%y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

Differential Revision: https://reviews.llvm.org/D128732

show more ...


# bdba8278 29-Jun-2022 Nikita Popov <[email protected]>

[VectorCombine] Avoid ConstantExpr::get() (NFC)

Use IRBuilder APIs instead, which will still constant fold.


Revision tags: llvmorg-14.0.6
# c399b3a6 18-Jun-2022 Kazu Hirata <[email protected]>

[Vectorize] Use llvm::is_contained (NFC)


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# 6f9e1ea0 08-May-2022 David Green <[email protected]>

[VectorCombine] Attempt to fold select shuffles from reductions

Given a commutative reduction leading from a shuffle, the order of the
lanes on the shuffle are not important for the result. This mea

[VectorCombine] Attempt to fold select shuffles from reductions

Given a commutative reduction leading from a shuffle, the order of the
lanes on the shuffle are not important for the result. This means we can
reorder the shuffle to something simpler, which we try shuffling the
first vector lanes first. This was D123494.

The new shuffle may not be profitable though, and if it is not we can
try the folding of select shuffles from D123911. This, with some
adjustment as the output lane ordering is now unimportant, can allow the
final shuffle to simplify given the inputs to the patterns from D123911.
Where as each transformation on their own are not profitable, the
combination is.

We can only support a single shuffle when called from reductions, but we
are able to sort the ReconstructMask, potentially allowing it to
simplify to an identity or concat mask.

Differential Revision: https://reviews.llvm.org/D125086

show more ...


# 100cb9a2 06-May-2022 David Green <[email protected]>

[VectorCombine] Fold shuffle select pattern

This patch adds a combine to attempt to reduce the costs of certain
select-shuffle patterns. The form of code it attempts to detect is:
%x = shuffle ...

[VectorCombine] Fold shuffle select pattern

This patch adds a combine to attempt to reduce the costs of certain
select-shuffle patterns. The form of code it attempts to detect is:
%x = shuffle ...
%y = shuffle ...
%a = binop %x, %y
%b = binop %x, %y
shuffle %a, %b, selectmask

A classic select-mask will pick items from each lane of a or b. These
do not always have a great lowering on many architectures. This patch
attempts to pack a and b into the lower elements, creating a differently
ordered shuffle for reconstructing the orignal which may be better than
the select mask. This can be better for performance, especially if less
elements of a and b need to be computed and the input shuffles are
cheaper.

Because select-masks are just one form of shuffle, we generalize to any
mask. So long as the backend has decent costmodel for the shuffles, this
can generally improve things when they come up. For more basic cost
models the folds do not appear to be profitable, not getting past the
cost checks.

Differential Revision: https://reviews.llvm.org/D123911

show more ...


# 70306542 03-May-2022 serge-sans-paille <[email protected]>

[iwyu] Handle regressions in libLLVM header include

Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.

[iwyu] Handle regressions in libLLVM header include

Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D124847

show more ...


# 34f97a37 01-May-2022 Simon Pilgrim <[email protected]>

[VectorCombine] Merge isa<>/cast<> into dyn_cast<>. NFC.

We want to handle the the assert in VectorCombine so avoid the repeated isa/cast code.


# 7047c479 29-Apr-2022 David Green <[email protected]>

[VecCombine] Fix sort comparator logic in foldShuffleFromReductions

I think this sort comparator was overly complex, and the windows
expensive check bot agreed, failing as it was not giving a strict

[VecCombine] Fix sort comparator logic in foldShuffleFromReductions

I think this sort comparator was overly complex, and the windows
expensive check bot agreed, failing as it was not giving a strict weak
ordering. Change it to use the comparison of the mask values as unsigned
integers. This should sort the undef elements to the end whilst keeping
X<Y otherwise.

show more ...


Revision tags: llvmorg-14.0.3
# ded8187e 28-Apr-2022 David Green <[email protected]>

[VectorCombine] Try to reduce shuffle cost for commutative reduction operands

Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is als

[VectorCombine] Try to reduce shuffle cost for commutative reduction operands

Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, providing
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.

This is a more powerful version of a combine that already happens in
instrcombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.

Differential Revision: https://reviews.llvm.org/D123494

show more ...


Revision tags: llvmorg-14.0.2, llvmorg-14.0.1
# 2e44b787 16-Mar-2022 Fraser Cormack <[email protected]>

[VectorCombine] Insert addrspacecast when crossing address space boundaries

We can not bitcast pointers across different address spaces. This was
previously fixed in D89577 but then in D93229 an enh

[VectorCombine] Insert addrspacecast when crossing address space boundaries

We can not bitcast pointers across different address spaces. This was
previously fixed in D89577 but then in D93229 an enhancement was added
which peeks further through the ponter operand, opening up the
possibility that address-space violations could be introduced.

Instead of bailing as the previous fix did, simply insert an
addrspacecast cast instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D121787

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# c141d158 19-Feb-2022 Florian Hahn <[email protected]>

[VectorCombine] Remove redundant checks (NFC).

The removed conditions are already checked by the if above.

Fixes #53761.


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 0edf9995 28-Dec-2021 Sanjay Patel <[email protected]>

[Analysis] allow caller to choose signed/unsigned when computing constant range

We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the oth

[Analysis] allow caller to choose signed/unsigned when computing constant range

We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.

This patch is modeled on a similar construct that was added with
D59386.

I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).

InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.

Fixes #52884

Differential Revision: https://reviews.llvm.org/D116322

show more ...


# 5a81a603 09-Dec-2021 Arthur Eubanks <[email protected]>

[NFC] Remove more calls to getAlignment()

These are deprecated and should be replaced with getAlign().

Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align val

[NFC] Remove more calls to getAlignment()

These are deprecated and should be replaced with getAlign().

Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 66d22b4d 21-Oct-2021 Sanjay Patel <[email protected]>

[VectorCombine] fold shuffle-of-binops with common operand

shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)

This is motivated by an example in D111800
(although that patch avoids the problem

[VectorCombine] fold shuffle-of-binops with common operand

shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)

This is motivated by an example in D111800
(although that patch avoids the problem for that particular example).

The pattern is shown in reduced form with:
https://llvm.org/PR52178
https://alive2.llvm.org/ce/z/d8zB4D

There is no difference on the PhaseOrdering test from D111800
because the aarch64 cost model says that the shuffle cost is 3 while
the fadd cost is 2.

Differential Revision: https://reviews.llvm.org/D111901

show more ...


# 4a1d63d7 15-Oct-2021 Florian Hahn <[email protected]>

[VectorCombine] Add option to only run scalarization transforms.

This patch adds a pass option to only run transforms that scalarize
vector operations and do not create new vector instructions.

Whe

[VectorCombine] Add option to only run scalarization transforms.

This patch adds a pass option to only run transforms that scalarize
vector operations and do not create new vector instructions.

When running VectorCombine early in the pipeline introducing new vector
operations can have negative effects, like blocking loop or SLP
vectorization. To avoid regressions, restrict the early VectorCombine
run (when using -enable-matrix) to only perform scalarization and not
introduce new vector operations.

This is done as option to the pass directly, which is then set when
adding the pass to the pipeline. This is done for the new pass manager
only.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111800

show more ...


# 098a0d8f 30-Sep-2021 Hongtao Yu <[email protected]>

[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.

This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsi

[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.

This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simpliciation
- Allow empty loop deletion by treating probe intrinsic isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847

show more ...


# 0dcd2b40 06-Oct-2021 Simon Pilgrim <[email protected]>

[TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost

We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SS

[TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost

We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument.

This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known.

Differential Revision: https://reviews.llvm.org/D111024

show more ...


# e2f6290e 28-Sep-2021 Florian Hahn <[email protected]>

[VectorCombine] Discard ScalarizationResult state in early exit.

ScalarizationResult's destructor makes sure ToFreeze is not ignored if
set. Currently, scalarizeLoadExtract has an early exit if the

[VectorCombine] Discard ScalarizationResult state in early exit.

ScalarizationResult's destructor makes sure ToFreeze is not ignored if
set. Currently, scalarizeLoadExtract has an early exit if the index is
not safe directly. But when it is SafeWithFreeze, we need to discard the
state first, otherwise we hit the assert in the destructor.

Fixes PR51992.

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# 300870a9 22-Sep-2021 Florian Hahn <[email protected]>

[VectorCombine] Switch to using a worklist.

This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The

[VectorCombine] Switch to using a worklist.

This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171

show more ...


# 5131037e 21-Sep-2021 Florian Hahn <[email protected]>

[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.

isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts compute

[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.

isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The use VectorCombine is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175

show more ...


Revision tags: llvmorg-13.0.0-rc3
# c24fc37e 13-Sep-2021 Florian Hahn <[email protected]>

[VectorCombine] Support AND/UREM indices that require freezing.

38b098be6605 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we

[VectorCombine] Support AND/UREM indices that require freezing.

38b098be6605 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107580

show more ...


12345