History log of /llvm-project-15.0.7/llvm/test/Transforms/SLPVectorizer/X86/tiny-tree.ll (Results 1 – 25 of 31)
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 7d6e8f2a 29-Mar-2022 Philip Reames

[slp] Delete dead scalar instructions feeding vectorized instructions

If we vectorize, e.g., a store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well.

This is purely for test output quality and readability. It should have no effect in any sane pipeline.

Differential Revision: https://reviews.llvm.org/D122493


# 48cc9287 18-Mar-2022 Philip Reames

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)

The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed over the weekend and have had several days to bake. The last was fixed this morning after being noticed in manual review of test changes yesterday. See the review thread for links to each change.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in the number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore it.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
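
As a rough aid for reading the counters quoted in this commit message, the following sketch computes the relative change of each one. The numbers are copied from the message; the helper name `reduction_percent` and the dictionary layout are illustrative assumptions, not part of LLVM.

```python
# Counters copied from the commit message (before and after the patch).
before = {
    "calcDeps actions": 704357,
    "schedule calls": 699021,
    "ReSchedule actions": 5598,
    "ReScheduleOnFail actions": 59,
    "schedule resets": 10084,
    "vector instructions generated": 8523,
}
after = {
    "calcDeps actions": 102895,
    "schedule calls": 161916,
    "ReSchedule actions": 5637,
    "ReScheduleOnFail actions": 55,
    "schedule resets": 10083,
    "vector instructions generated": 8403,
}


def reduction_percent(old, new):
    """Relative reduction in percent; a negative value means the counter grew."""
    return 100.0 * (old - new) / old


for name in before:
    change = reduction_percent(before[name], after[name])
    print(f"{name}: {before[name]} -> {after[name]} ({change:.1f}% reduction)")
```

The two scheduling-work counters (calcDeps actions and schedule calls) drop by a large fraction, while the remaining counters move by only a few percent, which matches the message's claim that the vectorization result is only slightly perturbed.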


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# deae979a 03-Mar-2022 Philip Reames

Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""

This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.


# 73804271 02-Mar-2022 Philip Reames

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""

Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in the number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore it.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538


Revision tags: llvmorg-14.0.0-rc2
# 9c6250ee 01-Mar-2022 Arthur Eubanks

Revert "[SLP] Schedule only sub-graph of vectorizable instructions"

This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.


# 0539a26d 22-Feb-2022 Philip Reames

[SLP] Schedule only sub-graph of vectorizable instructions

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in the number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore it.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
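
The idea of scheduling only the instructions being vectorized and their transitive users can be illustrated with a small toy model. This is a sketch under stated assumptions, not the SLP vectorizer's actual code: instructions are plain strings, the `users` map and the function name `schedulable_subgraph` are hypothetical, and real dependency tracking (memory and control dependencies) is ignored.

```python
from collections import deque


def schedulable_subgraph(window, users, seeds):
    """Return the instructions of `window` reachable from `seeds` via user edges.

    window: instructions of the scheduling window, in program order.
    users:  hypothetical map from an instruction to the instructions using it.
    seeds:  instructions being considered for vectorization.
    """
    in_window = set(window)
    selected = set()
    worklist = deque(s for s in seeds if s in in_window)
    while worklist:
        inst = worklist.popleft()
        if inst in selected:
            continue
        selected.add(inst)
        # Follow def-use edges, but never leave the scheduling window.
        for user in users.get(inst, ()):
            if user in in_window and user not in selected:
                worklist.append(user)
    # Report the sub-graph in original program order.
    return [inst for inst in window if inst in selected]


window = ["gep0", "load0", "gep1", "load1", "call_unrelated",
          "add0", "add1", "store0", "store1"]
users = {
    "gep0": ["load0"], "gep1": ["load1"],
    "load0": ["add0"], "load1": ["add1"],
    "add0": ["store0"], "add1": ["store1"],
}
print(schedulable_subgraph(window, users, seeds=["load0", "load1"]))
```

In this toy window, `call_unrelated` and the address computations are never visited, which is the source of the reduced scheduling work the message describes for large basic blocks.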


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1
# 352c46e7 29-Jul-2021 Alexey Bataev

[SLP]Improve vectorization of split loads.

Need to fix the cost estimation for split loads: since we already look at
the subregisters, there is no need to permute them; we only need to estimate
the subregister insert if it is smaller than the real register. Also, with
split loads, it might already be profitable to vectorize smaller trees
with gathering of the loads.

Differential Revision: https://reviews.llvm.org/D107188


# 3ea7877c 15-Sep-2021 Alexey Bataev

[SLP]Unify vectorization of PHI and store nodes with improved tiny tree vectorization.

Vectorization of PHIs and stores is very similar, so it might be beneficial to
try to revectorize stores (like PHIs) if the total number of stores with
the same/alternate opcode is less than the vector size but the number of
stores with the same type is larger than the vector size.

Differential Revision: https://reviews.llvm.org/D109831


# b6d10beb 22-Sep-2021 Alexey Bataev

[SLP][NFC]Rename function in the test for better matching of the
transformation.


# 446e11fa 15-Sep-2021 Alexey Bataev

[SLP][NFC]Add a test for tiny tree with stores and with not
same/alternate instructions.


# 95e5d401 29-Jul-2021 Alexey Bataev

[SLP]Improve splats vectorization.

Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.

Differential Revision: https://reviews.llvm.org/D107104
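
To make the splat rewrite concrete, here is a small Python model of the two forms: one insertelement per lane versus a single insertelement followed by a broadcast shuffle with an all-zero mask. The helpers imitate the LLVM IR operations on Python lists; their names, the `POISON` placeholder, and the two `splat_*` functions are illustrative assumptions, not the vectorizer's code.

```python
POISON = None  # stand-in for an unspecified lane value


def insertelement(vec, value, index):
    """Model of the IR operation: return a copy of vec with one lane replaced."""
    out = list(vec)
    out[index] = value
    return out


def shufflevector(a, b, mask):
    """Model of the IR operation: each mask entry indexes into a followed by b."""
    combined = list(a) + list(b)
    return [combined[i] for i in mask]


def splat_naive(value, width):
    # One insertelement per lane.
    vec = [POISON] * width
    for lane in range(width):
        vec = insertelement(vec, value, lane)
    return vec


def splat_broadcast(value, width):
    # A single insertelement into lane 0, then a broadcast of lane 0.
    vec = insertelement([POISON] * width, value, 0)
    return shufflevector(vec, [POISON] * width, [0] * width)


print(splat_naive(7, 4), splat_broadcast(7, 4))
```

Both forms produce the same vector, but the second needs two operations regardless of the vector width.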


Revision tags: llvmorg-14-init
# da3dbfca 15-Jul-2021 Alexey Bataev

[SLP]Improve calculations of the cost for reused/reordered scalars.

Part of D105020. Also, fixed FIXMEs that need to use wider vector type
when trying to calculate the cost of reused scalars. This may cause
regressions unless D100486 is landed to improve the cost estimations
for long vectors shuffling.

Differential Revision: https://reviews.llvm.org/D106060


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# a0086add 01-Jun-2021 Alexey Bataev

[SLP]Improve gathering of scalar elements.

1. Better sorting of scalars to be gathered. Trying to insert
constants/arguments/instructions-out-of-loop at first and only then
the instructions which are inside the loop. It improves hoisting of
invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103458


Revision tags: llvmorg-12.0.1-rc1
# 369cd2ae 04-May-2021 Alexey Bataev

Revert "[SLP]Allow masked gathers only if allowed by target."

This reverts commit fd18547e0721983dcb273670d16341921f831e50. Need to
add a check for the size of the vectorization tree to avoid some extra
vectorization.


# fd18547e 03-May-2021 Alexey Bataev

[SLP]Allow masked gathers only if allowed by target.

Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297


# 2e4cc9a7 03-May-2021 Alexey Bataev

Revert "[SLP]Allow masked gathers only if allowed by target."

This reverts commit b5f64768cfeecca16c7c9c53cbd97ac7289c43aa to fix
a compiler crash revealed by buildbots.


# b5f64768 26-Apr-2021 Alexey Bataev

[SLP]Allow masked gathers only if allowed by target.

Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297


# 8af4723c 26-Apr-2021 Alexey Bataev

[SLP]Try to vectorize tiny trees with shuffled gathers.

If the first tree element is vectorize and the second is gather, it
still might be profitable to vectorize it if the gather node contains
fewer scalars to vectorize than the original tree node. It might be
profitable to use shuffles.

Differential Revision: https://reviews.llvm.org/D101397


# 1c0ab341 27-Apr-2021 Alexey Bataev

[SLP]Add a test for possibly vectorized tiny tree, NFC.


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 4a8e6ed2 05-Jan-2021 Juneyoung Lee

[SLP,LV] Use poison constant vector for shufflevector/initial insertelement

This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make valid some nice shufflevector optimizations that are currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94061


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3
# 691c086d 26-Jun-2020 Arthur Eubanks

[NewPM][BasicAA] basicaa -> basic-aa in Transforms/SLPVectorizer

Following https://reviews.llvm.org/D82607.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82681


Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# cee313d2 17-Apr-2019 Eric Christopher

Revert "Temporarily Revert "Add basic loop fusion pass.""

The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552


Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3
# 2534592b 20-Feb-2019 Eric Christopher

Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"

As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.

This reverts commit r353923.

llvm-svn: 354434


# ca9aff93 13-Feb-2019 Anton Afanasyev

[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)

Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halves of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

llvm-svn: 353923


Revision tags: llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1
# f1e66883 04-Apr-2018 Simon Pilgrim

[SLPVectorizer][X86] Regenerate some tests. NFCI

llvm-svn: 329196

