|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
bb82f746 |
| 23-May-2022 |
Jingu Kang <[email protected]> |
Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.
The commmit from https://reviews.llvm.org/D125918 has fix
Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.
The commmit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure.
Differential Revision: https://reviews.llvm.org/D118979
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
42ebfa82 |
| 12-Apr-2022 |
Muhammad Omair Javaid <[email protected]> |
Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.
This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:
ht
Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.
This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:
https://lab.llvm.org/buildbot/#/builders/176/builds/1515
llvm-tblgen crashes after applying this patch.
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
64b6192e |
| 05-Apr-2022 |
Jingu Kang <[email protected]> |
[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF of AArch64 with 128 / the size of smallest type in loop.
Differential Revision: https://reviews.llvm.org/D118979
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
23c2f2e6 |
| 07-Jun-2021 |
Florian Hahn <[email protected]> |
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail.
If th
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail.
If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW.
This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling.
At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything.
Note that this could probably be further improved by using information from the original IV.
Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g
Part of a set of fixes required for PR50412.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D103255
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
| #
f31811f2 |
| 27-Jun-2020 |
Fangrui Song <[email protected]> |
[BasicAA] Rename deprecated -basicaa to -basic-aa
Follow-up to D82607 Revert an accidental change (empty.ll) of D82683
|
|
Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
| #
cee313d2 |
| 17-Apr-2019 |
Eric Christopher <[email protected]> |
Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.
Will be re-reverting again.
llvm-svn: 358552
|
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1 |
|
| #
c12306c0 |
| 01-Jul-2017 |
Teresa Johnson <[email protected]> |
Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction instructions to dehao.
llvm-svn: 306936
|
| #
eb4fba9d |
| 01-Jul-2017 |
Teresa Johnson <[email protected]> |
re-commit r306336: Enable vectorizer-maximize-bandwidth by default.
Differential Revision: https://reviews.llvm.org/D33341
llvm-svn: 306935
|
| #
de56903b |
| 01-Jul-2017 |
Teresa Johnson <[email protected]> |
revert r306336 for breaking ppc test.
llvm-svn: 306934
|
| #
1fbaffeb |
| 01-Jul-2017 |
Teresa Johnson <[email protected]> |
Enable vectorizer-maximize-bandwidth by default.
Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu ben
Enable vectorizer-maximize-bandwidth by default.
Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35%
geometric mean +0.29%
The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.
Reviewers: hfinkel, mkuper, davidxl, chandlerc
Reviewed By: chandlerc
Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33341
llvm-svn: 306933
show more ...
|
| #
5ce1ce74 |
| 30-Jun-2017 |
Daniel Jasper <[email protected]> |
Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction instructions to dehao.
llvm-svn: 306792
|
| #
920d0225 |
| 27-Jun-2017 |
Dehao Chen <[email protected]> |
re-commit r306336: Enable vectorizer-maximize-bandwidth by default.
Differential Revision: https://reviews.llvm.org/D33341
llvm-svn: 306473
|
| #
8b7effb3 |
| 26-Jun-2017 |
Dehao Chen <[email protected]> |
revert r306336 for breaking ppc test.
llvm-svn: 306344
|
| #
79655792 |
| 26-Jun-2017 |
Dehao Chen <[email protected]> |
Enable vectorizer-maximize-bandwidth by default.
Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu ben
Enable vectorizer-maximize-bandwidth by default.
Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35%
geometric mean +0.29%
The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.
Reviewers: hfinkel, mkuper, davidxl, chandlerc
Reviewed By: chandlerc
Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33341
llvm-svn: 306336
show more ...
|
| #
b512e915 |
| 22-Jun-2017 |
Diana Picus <[email protected]> |
Revert "Enable vectorizer-maximize-bandwidth by default."
This reverts commit r305960 because it broke self-hosting on AArch64.
llvm-svn: 305990
|
| #
014db29b |
| 21-Jun-2017 |
Dehao Chen <[email protected]> |
Enable vectorizer-maximize-bandwidth by default.
Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu ben
Enable vectorizer-maximize-bandwidth by default.
Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35%
geometric mean +0.29%
The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.
Reviewers: hfinkel, mkuper, davidxl, chandlerc
Reviewed By: chandlerc
Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33341
llvm-svn: 305960
show more ...
|
|
Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3 |
|
| #
a20352e1 |
| 09-Jun-2017 |
George Burgess IV <[email protected]> |
[LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.
If we're shrinking a binary operation, it may be the case that the new operations wraps where the old didn't. If this happens, the behav
[LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.
If we're shrinking a binary operation, it may be the case that the new operations wraps where the old didn't. If this happens, the behavior should be well-defined. So, we can't always carry wrapping flags with us when we shrink operations.
If we do, we get incorrect optimizations in cases like:
void foo(const unsigned char *from, unsigned char *to, int n) { for (int i = 0; i < n; i++) to[i] = from[i] - 128; }
which gets optimized to:
void foo(const unsigned char *from, unsigned char *to, int n) { for (int i = 0; i < n; i++) to[i] = from[i] | 128; }
Because: - InstCombine turned `sub i32 %from.i, 128` into `add nuw nsw i32 %from.i, 128`. - LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a vector full of `i8 128`s - InstCombine took advantage of the fact that the newly-shrunken add "couldn't wrap", and changed the `add` to an `or`.
InstCombine seems happy to figure out whether we can add nuw/nsw on its own, so I just decided to drop the flags. There are already a number of places in LoopVectorize where we rely on InstCombine to clean up.
llvm-svn: 305053
show more ...
|
|
Revision tags: llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1 |
|
| #
dcdb325f |
| 13-Apr-2017 |
Anna Thomas <[email protected]> |
[LV] Fix the vector code generation for first order recurrence
Summary: In first order recurrences where phi's are used outside the loop, we should generate an additional vector.extract of the secon
[LV] Fix the vector code generation for first order recurrence
Summary: In first order recurrences where phi's are used outside the loop, we should generate an additional vector.extract of the second last element from the vectorized phi update. This is because we require the phi itself (which is the value at the second last iteration of the vector loop) and not the phi's update within the loop. Also fix the code gen when we just unroll, but don't vectorize. Fixes PR32396.
Reviewers: mssimpso, mkuper, anemet
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31979
llvm-svn: 300238
show more ...
|
| #
00dc1b74 |
| 11-Apr-2017 |
Anna Thomas <[email protected]> |
[LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of first order recurrence, we vectorize such that the last element in the vector will be the one ext
[LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of first order recurrence, we vectorize such that the last element in the vector will be the one extracted to pass into the scalar remainder loop. However, this is not true when there is a phi (other than the primary induction variable) is used outside the loop. In such a case, we need the value from the second last iteration (i.e. the phi value), not the last iteration (which would be the phi update). I've added a test case for this. Also see PR32396.
A follow up patch would generate the correct code gen for such cases, and turn this vectorization on.
Differential Revision: https://reviews.llvm.org/D31910
Reviewers: mssimpso llvm-svn: 299985
show more ...
|
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
| #
aa1d6388 |
| 10-May-2016 |
James Molloy <[email protected]> |
Revert "[VectorUtils] Query number of sign bits to allow more truncations"
This was a fairly simple patch but on closer inspection was seriously flawed and caused PR27690.
This reverts commit r2689
Revert "[VectorUtils] Query number of sign bits to allow more truncations"
This was a fairly simple patch but on closer inspection was seriously flawed and caused PR27690.
This reverts commit r268921.
llvm-svn: 269051
show more ...
|
| #
5c20e27b |
| 09-May-2016 |
James Molloy <[email protected]> |
[VectorUtils] Query number of sign bits to allow more truncations
When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more inf
[VectorUtils] Query number of sign bits to allow more truncations
When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more information and allow more truncations.
llvm-svn: 268921
show more ...
|
| #
8e46cd05 |
| 30-Mar-2016 |
James Molloy <[email protected]> |
[VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because bot
[VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.
However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018.
This fixes PR17018.
llvm-svn: 264852
show more ...
|
|
Revision tags: llvmorg-3.8.0, llvmorg-3.8.0-rc3 |
|
| #
6e518a3b |
| 03-Feb-2016 |
James Molloy <[email protected]> |
[DemandedBits] Revert r249687 due to PR26071
This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken.
From the writeup in PR2607
[DemandedBits] Revert r249687 due to PR26071
This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken.
From the writeup in PR26071:
What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs.
This is where we're wrong - we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not - during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails.
The fault is in demandedbits - its contract does clearly state that a non-demanded bit may either be zero or one.
llvm-svn: 259649
show more ...
|
|
Revision tags: llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1 |
|
| #
45f67d52 |
| 09-Nov-2015 |
James Molloy <[email protected]> |
[LoopVectorize] Address post-commit feedback on r250032
Implemented as many of Michael's suggestions as were possible: * clang-format the added code while it is still fresh. * tried to change Va
[LoopVectorize] Address post-commit feedback on r250032
Implemented as many of Michael's suggestions as were possible: * clang-format the added code while it is still fresh. * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible. * Reduce the pass list on loop-vectorization-factors.ll. * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I].
llvm-svn: 252469
show more ...
|
| #
55d633bd |
| 12-Oct-2015 |
James Molloy <[email protected]> |
[LoopVectorize] Shrink integer operations into the smallest type possible
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int type (e.g. i32) whenever arithmetic is performed
[LoopVectorize] Shrink integer operations into the smallest type possible
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int type (e.g. i32) whenever arithmetic is performed on them.
For targets with native i8 or i16 operations, usually InstCombine can shrink the arithmetic type down again. However InstCombine refuses to create illegal types, so for targets without i8 or i16 registers, the lengthening and shrinking remains.
Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when their scalar equivalents do not, so during vectorization it is important to remove these lengthens and truncates when deciding the profitability of vectorization.
The algorithm this uses starts at truncs and icmps, trawling their use-def chains until they terminate or instructions outside the loop are found (or unsafe instructions like inttoptr casts are found). If the use-def chains starting from different root instructions (truncs/icmps) meet, they are unioned. The demanded bits of each node in the graph are ORed together to form an overall mask of the demanded bits in the entire graph. The minimum bitwidth that graph can be truncated to is the bitwidth minus the number of leading zeroes in the overall mask.
The intention is that this algorithm should "first do no harm", so it will never insert extra cast instructions. This is why the use-def graphs are unioned, so that subgraphs with different minimum bitwidths do not need casts inserted between them.
This algorithm works hard to reduce compile time impact. DemandedBits are only queried if there are extends of illegal types and if a truncate to an illegal type is seen. In the general case, this results in a simple linear scan of the instructions in the loop.
No non-noise compile time impact was seen on a clang bootstrap build.
llvm-svn: 250032
show more ...
|