History log of /llvm-project-15.0.7/llvm/test/CodeGen/X86/oddsubvector.ll (Results 1 – 25 of 28)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# 2f448bf5 22-Jun-2022 Nikita Popov <[email protected]>

[X86] Migrate tests to use opaque pointers (NFC)

Test updates were performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

These are only the test updates where the test pas

[X86] Migrate tests to use opaque pointers (NFC)

Test updates were performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

These are only the test updates where the test passed without
further modification (which is almost all of them, as the backend
is largely pointer-type agnostic).

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3
# 03482bcc 27-Apr-2022 Simon Pilgrim <[email protected]>

[X86] collectConcatOps - add ability to collect from vector 'widening' patterns

Recognise insert_subvector(undef, x, lo/hi) patterns where we double the width of a vector - creating an UNDEF subvect

[X86] collectConcatOps - add ability to collect from vector 'widening' patterns

Recognise insert_subvector(undef, x, lo/hi) patterns where we double the width of a vector - creating an UNDEF subvector on the fly.

show more ...


Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# c486b82c 13-Feb-2022 Sanjay Patel <[email protected]>

[x86] try harder to scalarize a vector load with extracted integer op uses

This is a retry of b4b97ec813a0 - that was reverted because it
could cause miscompiles by illegally reordering memory opera

[x86] try harder to scalarize a vector load with extracted integer op uses

This is a retry of b4b97ec813a0 - that was reverted because it
could cause miscompiles by illegally reordering memory operations.
A new test based on #53695 is added here to verify we do not have
that same problem.

extract_vec_elt (load X), C --> scalar load (X+C)

As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.

Fixes #50310
Fixes #53695

Differential Revision: https://reviews.llvm.org/D118376

show more ...


Revision tags: llvmorg-14.0.0-rc1
# 7b037250 04-Feb-2022 Sanjay Patel <[email protected]>

Revert "[x86] try harder to scalarize a vector load with extracted integer op uses"

This reverts commit b4b97ec813a02585000f30ac7d532dda74e8bfda.

As discussed in post-commit feedback at:
https://re

Revert "[x86] try harder to scalarize a vector load with extracted integer op uses"

This reverts commit b4b97ec813a02585000f30ac7d532dda74e8bfda.

As discussed in post-commit feedback at:
https://reviews.llvm.org/D118376
...there's a stage 2 failure on a Mac running a clang-refactor tool test.

show more ...


Revision tags: llvmorg-15-init
# b4b97ec8 28-Jan-2022 Sanjay Patel <[email protected]>

[x86] try harder to scalarize a vector load with extracted integer op uses

extract_vec_elt (load X), C --> scalar load (X+C)

As noted in the comment, DAGCombiner has this fold -- and the code in th

[x86] try harder to scalarize a vector load with extracted integer op uses

extract_vec_elt (load X), C --> scalar load (X+C)

As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.

Fixes #50310

Differential Revision: https://reviews.llvm.org/D118376

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# cf9b1f7a 01-Jun-2021 Roman Lebedev <[email protected]>

[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants

Currently, X86 backend only has a global one-size-fits-all `FeatureFastVariableShuffle` feature,
which controls

[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants

Currently, X86 backend only has a global one-size-fits-all `FeatureFastVariableShuffle` feature,
which controls profitability of both the cross-lane and per-lane variable shuffles.
I guess, this has been fine so far.

But at least on AMD Zen 3, while per-line variable shuffles (e.g. `VPSHUFB`)
are as fast as as shuffles with fixed/immediate mask,
while lane-crossing shuffles, e.g. `VPERMPS` is performing worse.

So to get the benefits of variable-mask shuffles, but not the drawbacks of lane-crossing shuffles,
as suggested by @RKSimon, split the feature flag into two.

Differential Revision: https://reviews.llvm.org/D103274

show more ...


# 629e2b34 26-May-2021 Simon Pilgrim <[email protected]>

[X86][SSE] Regenerate some tests to expose the rip relative vector/broadcast loads


Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 8767f3bb 18-Dec-2020 Simon Pilgrim <[email protected]>

[X86][AVX] Remove X86ISD::SUBV_BROADCAST (PR38969)

Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_L

[X86][AVX] Remove X86ISD::SUBV_BROADCAST (PR38969)

Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead.

Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for if memory folding failed.

show more ...


Revision tags: llvmorg-11.0.1-rc1
# f4467c4d 17-Nov-2020 Wang, Pengfei <[email protected]>

[CodeGen][X86] Remove some unused check-prefixes and regenerate tests.


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 21d02dc5 02-Sep-2020 Simon Pilgrim <[email protected]>

[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support

This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts mak

[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support

This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.

We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.

Differential Revision: https://reviews.llvm.org/D66004

show more ...


Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# 872c5fb1 26-May-2020 Fangrui Song <[email protected]>

[AsmPrinter] Don't generate .Lfoo$local for -fno-PIC and -fPIE

-fno-PIC and -fPIE code generally cannot be linked in -shared mode and there is no benefit accessing via local aliases.

Actually, a .L

[AsmPrinter] Don't generate .Lfoo$local for -fno-PIC and -fPIE

-fno-PIC and -fPIE code generally cannot be linked in -shared mode and there is no benefit accessing via local aliases.

Actually, a .Lfoo$local reference will be converted to a STT_SECTION (if no section relaxation) reference which will cause the section symbol (sizeof(Elf64_Sym)=24) to be generated.

show more ...


# 9fa58d1b 25-May-2020 Simon Pilgrim <[email protected]>

[DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling

For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, t

[DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling

For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies.

This helps with PR40502.

Differential Revision: https://reviews.llvm.org/D79003

show more ...


Revision tags: llvmorg-10.0.1-rc1
# bd60b298 27-Apr-2020 Simon Pilgrim <[email protected]>

[X86][SSE] Regenerate oddsubvector.ll test checks

Fixes some missed address symbol regexs


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1
# 5b22bcc2 22-Jan-2020 Fangrui Song <[email protected]>

[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local

For a MC_GlobalAddress reference to a dso_local external GlobalValue with a definition, emit .Lfoo$local to avoid a relocation.

-

[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local

For a MC_GlobalAddress reference to a dso_local external GlobalValue with a definition, emit .Lfoo$local to avoid a relocation.

-fno-pic and -fpie can infer dso_local but -fpic cannot. In the future,
we can explore the possibility of inferring dso_local with -fpic. As the
description of D73228 says, LLVM's existing IPO optimization behaviors
(like -fno-semantic-interposition) and a previous assembly behavior give
us enough license to be aggressive here.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D73230

show more ...


# a14aa7da 22-Jan-2020 Simon Pilgrim <[email protected]>

[X86][SSE] combineExtractWithShuffle - extract(bictcast(scalar_to_vector(x))) --> x

Removes some unnecessary gpr<-->fpu traffic


Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2
# 8b5f2ab2 07-Aug-2019 Craig Topper <[email protected]>

Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch chang

Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 368183

show more ...


# bd0d97e1 06-Aug-2019 Mitch Phillips <[email protected]>

Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."

This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.

This commit broke the MSan buildbots. See
https://revi

Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."

This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.

This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.

llvm-svn: 368107

show more ...


# 3de33245 05-Aug-2019 Craig Topper <[email protected]>

[X86] Enable -x86-experimental-vector-widening-legalization by default.

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from prom

[X86] Enable -x86-experimental-vector-widening-legalization by default.

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 367901

show more ...


# 24ad2b5e 31-Jul-2019 Simon Pilgrim <[email protected]>

[X86][AVX] Ensure chained subvector insertions are the same size (PR42833)

Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all t

[X86][AVX] Ensure chained subvector insertions are the same size (PR42833)

Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all the same type. On AVX512 targets especially we might have a mixture of 128/256 subvector insertions.

llvm-svn: 367429

show more ...


# 24e4e808 31-Jul-2019 Simon Pilgrim <[email protected]>

[X86][AVX] Add reduced test case for PR42833

llvm-svn: 367412


# e4d5423d 30-Jul-2019 Simon Pilgrim <[email protected]>

[X86][AVX] SimplifyDemandedVectorElts - handle extraction from X86ISD::SUBV_BROADCAST source (PR42819)

PR42819 showed an issue that we couldn't handle the case where we demanded a 'sub-sub-vector' o

[X86][AVX] SimplifyDemandedVectorElts - handle extraction from X86ISD::SUBV_BROADCAST source (PR42819)

PR42819 showed an issue that we couldn't handle the case where we demanded a 'sub-sub-vector' of the SUBV_BROADCAST 'sub-vector' source.

This patch recognizes these cases and extracts the sub-sub-vector instead of trying to broadcast to a type smaller than the 'sub-vector' source.

llvm-svn: 367306

show more ...


Revision tags: llvmorg-9.0.0-rc1
# 510e6fad 22-Jul-2019 Craig Topper <[email protected]>

[X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.

The build_vector

[X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.

The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.

While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.

llvm-svn: 366732

show more ...


Revision tags: llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2
# 606eb236 04-Jun-2019 Sanjay Patel <[email protected]>

[x86] split 256-bit store of concatenated vectors

This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://god

[x86] split 256-bit store of concatenated vectors

This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is a reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 362524

show more ...


# f7980e72 28-May-2019 Sanjay Patel <[email protected]>

Revert "[x86] split 256-bit store of concatenated vectors"

This reverts commit d5a8637072f4c556b88156bd2f6237a2ead47d31.

Most likely suspect for this bot failure:
http://lab.llvm.org:8011/builders/

Revert "[x86] split 256-bit store of concatenated vectors"

This reverts commit d5a8637072f4c556b88156bd2f6237a2ead47d31.

Most likely suspect for this bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684

llvm-svn: 361850

show more ...


# d5a86370 28-May-2019 Sanjay Patel <[email protected]>

[x86] split 256-bit store of concatenated vectors

This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://god

[x86] split 256-bit store of concatenated vectors

This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 361822

show more ...


12