Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |

# 96515df8 | 06-Jul-2022 | Masoud Ataei <[email protected]>

[PowerPC] Fix the check for scalar MASS conversion
Proposing to move the check for scalar MASS conversion from constructor of PPCTargetLowering to the lowerLibCallBase function which decides about the lowering.
The target machine option Options.PPCGenScalarMASSEntries is set in PPCTargetMachine.cpp, but an object of the class PPCTargetLowering is created in one of the included header files, so the constructor runs before PPCGenScalarMASSEntries is set to its correct value. We therefore cannot check this option in the constructor.
Differential: https://reviews.llvm.org/D128653
Reviewer: @bmahjour

# 88b6d227 | 28-Jun-2022 | Ting Wang <[email protected]>

[PowerPC] Improve getNormalLoadInput to reach more splat load opportunities
There are straightforward splat load opportunities blocked by getNormalLoadInput(), since those cases involve consecutive bitcasts. Improve the function by looking through bitcasts.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D128703

Revision tags: llvmorg-14.0.6


# e09f6ff3 | 20-Jun-2022 | Nemanja Ivanovic <[email protected]>

[PowerPC] Disable automatic generation of STXVP
There are instances where using paired vector stores leads to significant performance degradation due to issues with store forwarding. To avoid falling into this trap with compiler-generated code, we will not emit these instructions unless the user requests them explicitly (with a builtin or by specifying the option).
Reviewed By: lei, amyk, saghir
Differential Revision: https://reviews.llvm.org/D127218

# 34033a84 | 15-Jun-2022 | Amy Kwan <[email protected]>

[PowerPC] Skip combine for vector_shuffles when two scalar_to_vector nodes are different vector types.
Currently in `combineVectorShuffle()`, we update the shuffle mask if either input vector comes from a scalar_to_vector, and we keep the respective input vectors in their permuted form by producing PPCISD::SCALAR_TO_VECTOR_PERMUTED. However, it is possible to end up in a situation where both input vectors to the vector_shuffle are scalar_to_vector and have different vector types. In situations like this, the shuffle mask is updated incorrectly, as the current code assumes both scalar_to_vector inputs are of the same vector type.
This patch skips the combine for vector_shuffle if both input vectors are scalar_to_vector of different vector types. A follow-up patch will fix this case properly, so that the shuffle mask is updated correctly.
Differential Revision: https://reviews.llvm.org/D127818

Revision tags: llvmorg-14.0.5


# 335e8bf1 | 08-Jun-2022 | Quinn Pham <[email protected]>

[PowerPC] emit VSX instructions instead of VMX instructions for vector loads and stores
This patch changes the PowerPC backend to generate VSX load/store instructions for all vector loads/stores on Power8 and earlier (LE) instead of VMX load/store instructions. The reason for this change is that VMX instructions require the vector to be 16-byte aligned, so a vector load/store will fail with VMX instructions if the vector is misaligned. Also, `gcc` generates VSX instructions in this situation, which allow unaligned access but require a swap instruction after loading/before storing. This is not an issue for BE because we already emit VSX instructions, since no swap is required. And it is not an issue on Power9 and up, since we have access to `lxv[x]`/`stxv[x]`, which allow unaligned access and do not require swaps.
This patch also delays the VSX load/store for LE combines until after LegalizeOps to prioritize other load/store combines.
Reviewed By: #powerpc, stefanp
Differential Revision: https://reviews.llvm.org/D127309

# 263f1b2f | 14-Jun-2022 | Stefan Pintilie <[email protected]>

[PowerPC] Fix combine step for shufflevector.
The combine step for shufflevector will sometimes replace undef in the mask with a defined value. This can cause an infinite loop in some cases as another combine will then put the undef back in the mask.
This patch fixes the issue so that undefs are not replaced when doing a combine.
Reviewed By: ZarkoCA, amyk, quinnp, saghir
Differential Revision: https://reviews.llvm.org/D127439

# 07881861 | 03-Jun-2022 | Guillaume Chatelet <[email protected]>

[Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910

# ad73ce31 | 26-May-2022 | Zongwei Lan <[email protected]>

[Target] use getSubtarget<> instead of static_cast<>(getSubtarget())
Differential Revision: https://reviews.llvm.org/D125391

Revision tags: llvmorg-14.0.4


# 115c1888 | 06-May-2022 | David Green <[email protected]>

[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
If the mask is made up of elements that form a mask in the higher type, we can convert shuffle(bitcast(X), Mask) into bitcast(shuffle(X, Mask')), simplifying the instruction sequence. A v4i32 2,3,0,1 shuffle, for example, can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests.
The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general.
Differential Revision: https://reviews.llvm.org/D123801

Revision tags: llvmorg-14.0.3, llvmorg-14.0.2


# fb193db2 | 20-Apr-2022 | Fangrui Song <[email protected]>

[PowerPC] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds

Revision tags: llvmorg-14.0.1


# 18679ac0 | 08-Apr-2022 | Kai Luo <[email protected]>

[PowerPC] Adjust `MaxAtomicSizeInBitsSupported` on PPC64
AtomicExpandPass uses this variable to determine whether to emit libcalls. The default value is 1024, and if we don't set it explicitly for PPC64, AtomicExpandPass won't emit `__atomic_*` libcalls for targets that are unable to inline atomic ops, and the backend ends up emitting `__sync_*` libcalls instead. Thanks @efriedma for pointing it out.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122868

# 549e118e | 08-Apr-2022 | Kai Luo <[email protected]>

[PowerPC] Support 16-byte lock free atomics on pwr8 and up
Make the 16-byte atomic type 16-byte aligned on PPC64, consistent with GCC. Also enable inlining 16-byte atomics on non-AIX targets on PPC64.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D122377

# b389354b | 06-Apr-2022 | Ting Wang <[email protected]>

[Clang][PowerPC] Add max/min intrinsics to Clang and PPC backend
Add support for builtin_[max|min], which have the prototype: A builtin_max(A1, A2, A3, ...). All arguments must have the same type; they must all be float, double, or long double. Internally, SelectCC is used to compute the result.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D122478

# 585c85ab | 31-Mar-2022 | Stefan Pintilie <[email protected]>

[PowerPC] Fix lowering of byval parameters for sizes greater than 8 bytes.
To store a byval parameter, the existing code would store as many 8-byte elements as were required to cover the full size of the byval parameter. For example, a parameter of size 16 would store two elements of 8 bytes. A parameter of size 12 would also store two elements of 8 bytes. This would sometimes store too many bytes, as the size of the parameter is not always a multiple of 8.
This patch fixes that issue; byval parameters are now stored with the correct number of bytes.
Reviewed By: nemanjai, #powerpc, quinnp, amyk
Differential Revision: https://reviews.llvm.org/D121430

# 662b9fa0 | 28-Mar-2022 | Shao-Ce SUN <[email protected]>

[NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122557

# c1a31ee6 | 20-Mar-2022 | Aaron Puchert <[email protected]>

[PPCISelLowering] Avoid emitting calls to __multi3, __muloti4
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3 instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not even with compiler-rt. Block it as well so that we get inline code.
Because libgcc doesn't have __muloti4, we block that as well.
Fixes #54460.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122090

# 300e1293 | 15-Mar-2022 | Qiu Chaofan <[email protected]>

[PowerPC] Disable perfect shuffle by default
We are going to remove the old 'perfect shuffle' optimization since it brings a performance penalty in hot loops over vectors. For example, in the following loop both shuffles share the same mask:
%v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
%v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead of `vperm-vperm`. In some large loop cases, this causes a 20%+ performance penalty.
The original attempt to resolve this was to pre-record the masks of every shufflevector operation in the DAG, but that is somewhat complex and brings unnecessary computation (scanning all nodes) into the optimization. Here we disable it by default. There are indeed some cases that become worse after this; they will be fixed in a more careful way in future patches.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D121082

Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3


# 30f30e1c | 08-Mar-2022 | Masoud Ataei <[email protected]>

[PowerPC] Fix the non-tail-call case in scalar MASS conversion
This patch proposes a fix for patch https://reviews.llvm.org/D101759 regarding the conversion of non-tail-call math functions to MASS calls.
Differential: https://reviews.llvm.org/D121016
Reviewer: @nemanjai

# b2497e54 | 07-Mar-2022 | Qiu Chaofan <[email protected]>

[PowerPC] Add generic fnmsub intrinsic
Currently in Clang, we have two types of builtins for the fnmsub operation: one for float/double vectors, which are transformed into IR operations, and one for float/double scalars, which generate the corresponding intrinsics.
But for the vector version of the builtin, the three-op chain may be recognized as expensive by some passes (like early CSE). We need some way to keep the fnmsub form until code generation.
This patch introduces the ppc.fnmsub.* intrinsic to unify the four fnmsub intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015

Revision tags: llvmorg-14.0.0-rc2


# 43d48ed2 | 20-Feb-2022 | Qiu Chaofan <[email protected]>

[PowerPC] Add option to disable perfect shuffle
Perfect shuffle was introduced into the PowerPC backend years ago and is only available on big-endian subtargets. This optimization has good effects in simple cases, but brings a serious negative impact in large programs with many shuffle instructions sharing the same mask.
This patch introduces a temporary hidden backend option to control it until we implement a better way to close the gap in vector shuffle decomposition.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120072

# 097a95f2 | 10-Feb-2022 | Ting Wang <[email protected]>

[PowerPC] Add custom lowering for SELECT_CC fp128 using xsmaxcqp
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection, and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128, llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.
Reviewed By: nemanjai, shchenz, amyk
Differential Revision: https://reviews.llvm.org/D117006

Revision tags: llvmorg-14.0.0-rc1


# 149195f5 | 07-Feb-2022 | Nikita Popov <[email protected]>

[PPCISelLowering] Avoid use of getPointerElementType()
Use the value type instead.

# 8ce13bc9 | 04-Feb-2022 | Masoud Ataei <[email protected]>

[PowerPC] Option controlling scalar MASS conversion
Differential: https://reviews.llvm.org/D119035
Reviewer: bmahjour

# 85243124 | 04-Feb-2022 | Benjamin Kramer <[email protected]>

Tweak some uses of std::iota to skip initializing the underlying storage. NFCI.

# 256d2533 | 02-Feb-2022 | Masoud Ataei <[email protected]>

[PowerPC] Scalar IBM MASS library conversion pass
This patch introduces the conversion of math function calls to MASS library calls. To resolve calls generated by these conversions, one needs to link the libxlopt.a library. This patch was tested on PowerPC Linux and AIX.
Differential: https://reviews.llvm.org/D101759
Reviewer: bmahjour