|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
f1879481 |
| 16-Jul-2022 |
Phoebe Wang <[email protected]> |
[X86][FP16] Enable vector support for FP16 emulation
This is follow up of D107082, which enable vector support according to psABI.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org
[X86][FP16] Enable vector support for FP16 emulation
This is follow up of D107082, which enable vector support according to psABI.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D127982
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
7a9ad257 |
| 22-Jun-2022 |
Vasileios Porpodas <[email protected]> |
Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6d6268dcbf0f48e43f6f9fe46b3a28c29ba63c7d.
Review: https://reviews.llvm.org/D125712
|
| #
6d6268dc |
| 22-Jun-2022 |
Vasileios Porpodas <[email protected]> |
Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6f88acf410b48f3e6c1526df2dc32ed86f249685.
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
6f88acf4 |
| 13-May-2022 |
Vasileios Porpodas <[email protected]> |
[SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles like fadd,fsub because this may block them bein
[SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles like fadd,fsub because this may block them being matched into a single vector instruction in x86. We do this by checking if a TreeEntry is such a pattern and adding it to the list of TreeEntries with orders that need to be considered.
Differential Revision: https://reviews.llvm.org/D125712
show more ...
|
| #
cf2072bc |
| 15-Jun-2022 |
Simon Pilgrim <[email protected]> |
[X86] X86TargetTransformInfo.cpp - use InstructionCost type to accumulate instructions costs
|
| #
6a845792 |
| 25-May-2022 |
eopXD <[email protected]> |
[LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D126350
|
| #
6c80267d |
| 24-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] getScalarizationOverhead - improve extraction costs for > 128-bit vectors
We were using the default getScalarizationOverhead expansion for extraction costs, which adds up all the in
[CostModel][X86] getScalarizationOverhead - improve extraction costs for > 128-bit vectors
We were using the default getScalarizationOverhead expansion for extraction costs, which adds up all the individual element extraction costs.
This is fine for 128-bit vectors, but for 256/512-bit vectors each element extraction also has to account for extracting the upper 128-bit subvector extraction before it can handle the element. For scalarization costs we only need to extract each demanded subvector once.
Differential Revision: https://reviews.llvm.org/D125527
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
6bec3e93 |
| 06-Oct-2021 |
Jay Foad <[email protected]> |
[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op).
[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing extending to a *smaller* width, or truncating to a *larger* width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
show more ...
|
| #
3d107ce2 |
| 06-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] Relax fcmp costs on SSE41 targets or later
Only pre-SSE41 targets double-pump the fp comparison ops
|
| #
cbfa8573 |
| 06-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] Adjust 128-bit select costs to account for slow BLENDV op
Based off the script from D103695 - Jaguar, Bulldozer, Silvermont (et al) and Haswell all have slow BLENDV ops, so adjust t
[CostModel][X86] Adjust 128-bit select costs to account for slow BLENDV op
Based off the script from D103695 - Jaguar, Bulldozer, Silvermont (et al) and Haswell all have slow BLENDV ops, so adjust the worse case cost values
show more ...
|
| #
d21bf514 |
| 06-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] Adjust pre-SSE41 fp scalar select costs to account for vector ops
Based off the script from D103695, we now mainly use BLENDV or OR(AND,ANDN) to select scalar float/double ops
|
| #
f0e8c1d6 |
| 06-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] Adjust 256-bit select costs to account for slow BLENDV op
Based off the script from D103695, on AVX1, Jaguar/Bulldozer both have low throughput for ymm select patterns (BLENDV + OR(
[CostModel][X86] Adjust 256-bit select costs to account for slow BLENDV op
Based off the script from D103695, on AVX1, Jaguar/Bulldozer both have low throughput for ymm select patterns (BLENDV + OR(AND,ANDN))), and even on AVX2 Haswell still struggles with BLENDV ops
show more ...
|
| #
86bb7df6 |
| 02-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] getScalarizationOverhead - handle vXi1 extracts with MOVMSK (pre-AVX512)
We can quickly extract multiple elements of a bool vector using MOVMSK ops - since we don't know what genera
[CostModel][X86] getScalarizationOverhead - handle vXi1 extracts with MOVMSK (pre-AVX512)
We can quickly extract multiple elements of a bool vector using MOVMSK ops - since we don't know what generated the vXi1, I've been optimistic and assumed we can use PMOVMSKB to extract the maximum number of bools with a single op.
The MOVMSK pattern isn't great for extract+insert round trips as vXi1 type legalization can interfere with this a lot - so this relies on us remaining good at using getScalarizationOverhead properly (and tagging both Insert and Extract modes) for those round trip cases.
The AVX512 KMOV codegen for bool extraction is a bit of a mess so for now I've not included that - the per-element cost is a lot more accurate for current codegen.
show more ...
|
| #
d5198cf9 |
| 01-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] Check for 'null op' truncations
If the legalized src/dst types are the same, assume the "truncation" is free.
This fixes some edge cases such as mul lo/hi ops and bool vectors whic
[CostModel][X86] Check for 'null op' truncations
If the legalized src/dst types are the same, assume the "truncation" is free.
This fixes some edge cases such as mul lo/hi ops and bool vectors which will get legalized back to legal vector widths
show more ...
|
| #
c2964746 |
| 01-May-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets
Based off the script from D103695, we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion using instruction count i
[CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets
Based off the script from D103695, we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion using instruction count instead of effective throughput
show more ...
|
| #
371412e0 |
| 29-Apr-2022 |
Alexey Bataev <[email protected]> |
[COST]Fix crash for non-power-2 vector shuffle mask.
Need to normalizize the mask to avoid possible crashes during attempts to estimate cost of the very long shuffles with non-power-2 number of elem
[COST]Fix crash for non-power-2 vector shuffle mask.
Need to normalizize the mask to avoid possible crashes during attempts to estimate cost of the very long shuffles with non-power-2 number of elements in masks.
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
75e1cf4a |
| 14-Apr-2021 |
Alexey Bataev <[email protected]> |
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
show more ...
|
| #
9861ca0c |
| 28-Apr-2022 |
Alexey Bataev <[email protected]> |
Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e3804ca216d4e76c88a38086eb61c200f9 to fix a crash reported in https://reviews.llvm.org/D100486#3479989.
|
| #
29a470e3 |
| 14-Apr-2021 |
Alexey Bataev <[email protected]> |
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
show more ...
|
| #
fa8a9fea |
| 26-Apr-2022 |
Vasileios Porpodas <[email protected]> |
Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.
Code review: https://reviews.llvm.or
Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.
Code review: https://reviews.llvm.org/D124202
show more ...
|
| #
6a9bbd9f |
| 26-Apr-2022 |
Vasileios Porpodas <[email protected]> |
Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f217fd0defed2592ff7b74b79b2c1f0.
|
| #
55ce296d |
| 21-Apr-2022 |
Vasileios Porpodas <[email protected]> |
[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcat's arguments by SLP. This patch changes this. `Args` is no
[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcat's arguments by SLP. This patch changes this. `Args` is now used for passing the operands of the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
show more ...
|
| #
889588ee |
| 20-Apr-2022 |
Vasileios Porpodas <[email protected]> |
[SLP] Refactoring isLegalBroadcastLoad() to use `ElementCount`.
Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`. This helps reduce the diff of a future SLP patch
[SLP] Refactoring isLegalBroadcastLoad() to use `ElementCount`.
Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`. This helps reduce the diff of a future SLP patch for AArch64.
show more ...
|
| #
d663166a |
| 25-Mar-2022 |
Simon Pilgrim <[email protected]> |
[CostModel][X86] Reduce cost of v2i64 icmp base cost on SSE2 targets
Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instea
[CostModel][X86] Reduce cost of v2i64 icmp base cost on SSE2 targets
Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instead of effective throughput
show more ...
|
| #
39aa202a |
| 24-Mar-2022 |
Vasileios Porpodas <[email protected]> |
Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354
This reverts commit e6ead19b774718113007ecb1a4
Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354
This reverts commit e6ead19b774718113007ecb1a4449d7af0cbcfeb.
show more ...
|