|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
d86a206f |
| 05-Jun-2022 |
Fangrui Song <[email protected]> |
Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options
|
| #
557efc9a |
| 04-Jun-2022 |
Fangrui Song <[email protected]> |
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the err
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
show more ...
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
851332a1 |
| 09-Mar-2022 |
Benoit Jacob <[email protected]> |
Fix linking error, undefined class static constants.
Reviewed By: spupyrev
Differential Revision: https://reviews.llvm.org/D121293
|
| #
ce29a042 |
| 09-Mar-2022 |
Vitaly Buka <[email protected]> |
Revert "Attempt to fix linking issue on the bot"
The issue was fixed with 48c74bb2e2a72830f1068823bfc2f6fd4b53d427
This reverts commit ac423a8c8aa87a128e51f3690afc1405d06b8c9d.
|
| #
ac423a8c |
| 08-Mar-2022 |
Vitaly Buka <[email protected]> |
Attempt to fix linking issue on the bot
|
| #
48c74bb2 |
| 08-Mar-2022 |
Fangrui Song <[email protected]> |
[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508
MinBaseDistance may be odr-used by std::max, leading to an undefined symbol linker
[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508
MinBaseDistance may be odr-used by std::max, leading to an undefined symbol linker error:
``` ld.lld: error: undefined symbol: (anonymous namespace)::MinCostMaxFlow::MinBaseDistance >>> referenced by SampleProfileInference.cpp:744 (/home/ray/llvm-project/llvm/lib/Transforms/Utils/SampleProfileInference.cpp:744) >>> lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/SampleProfileInference.cpp.o:((anonymous namespace)::FlowAdjuster::jumpDistance(llvm::FlowJump*) const) ```
Since llvm-project is still using C++ 14, workaround it with a cast.
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
81aedab7 |
| 24-Feb-2022 |
spupyrev <[email protected]> |
introducing some profi flags
Differential Revision: https://reviews.llvm.org/D120508
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
f2ade65f |
| 31-Jan-2022 |
spupyrev <[email protected]> |
[CSSPGO] Even flow distribution
Differential Revision: https://reviews.llvm.org/D118640
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
13d1364a |
| 10-Jan-2022 |
spupyrev <[email protected]> |
A better profi rebalancer
This is an extension of **profi** post-processing step that rebalances counts in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically, the new versio
A better profi rebalancer
This is an extension of **profi** post-processing step that rebalances counts in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically, the new version finds many more "unknown" subgraphs and marks more "unknown" basic blocks as hot (which prevents unwanted optimization passes).
I see up to 0.5% perf on some (large) binaries, e.g., clang-10 and gcc-8.
The algorithm is still linear and yields no build time overhead.
show more ...
|
| #
5f4ae564 |
| 18-Jan-2022 |
Jan Svoboda <[email protected]> |
[llvm] Remove uses of `std::vector<bool>`
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch does just
[llvm] Remove uses of `std::vector<bool>`
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch does just that for llvm.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D117121
show more ...
|
| #
e7774f49 |
| 26-Dec-2021 |
Kazu Hirata <[email protected]> |
Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
|
| #
93a2c291 |
| 02-Dec-2021 |
spupyrev <[email protected]> |
profi - a flow-based profile inference algorithm: Part III (out of 3)
This is a continuation of D109860 and D109903.
An important challenge for profile inference is caused by the fact that the samp
profi - a flow-based profile inference algorithm: Part III (out of 3)
This is a continuation of D109860 and D109903.
An important challenge for profile inference is caused by the fact that the sample profile is collected on a fully optimized binary, while the block and edge frequencies are consumed on an early stage of the compilation that operates with a non-optimized IR. As a result, some of the basic blocks may not have associated sample counts, and it is up to the algorithm to deduce missing frequencies. The problem is illustrated in the figure where three basic blocks are not present in the optimized binary and hence, receive no samples during profiling.
We found that it is beneficial to treat all such blocks equally. Otherwise the compiler may decide that some blocks are “cold” and apply undesirable optimizations (e.g., hot-cold splitting) regressing the performance. Therefore, we want to distribute the counts evenly along the blocks with missing samples. This is achieved by a post-processing step that identifies "dangling" subgraphs consisting of basic blocks with no sampled counts; once the subgraphs are found, we rebalance the flow so as every branch probability is 50:50 within the subgraphs.
Our experiments indicate up to 1% performance win using the optimization on some binaries and a significant improvement in the quality of profile counts (when compared to ground-truth instrumentation-based counts)
{F19093045}
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D109980
show more ...
|
| #
98dd2f9e |
| 02-Dec-2021 |
spupyrev <[email protected]> |
profi - a flow-based profile inference algorithm: Part II (out of 3)
This is a continuation of D109860.
Traditional flow-based algorithms cannot guarantee that the resulting edge frequencies corres
profi - a flow-based profile inference algorithm: Part II (out of 3)
This is a continuation of D109860.
Traditional flow-based algorithms cannot guarantee that the resulting edge frequencies correspond to a *connected* flow in the control-flow graph. For example, for an instance in the attached figure, a flow-based (or any other) inference algorithm may produce an output in which the hot loop is disconnected from the entry block (refer to the rightmost graph in the figure). Furthermore, creating a connected minimum-cost maximum flow is a computationally NP-hard problem. Hence, we apply a post-processing adjustments to the computed flow by connecting all isolated flow components ("islands").
This feature helps to keep all blocks with sample counts connected and results in significant performance wins for some binaries. {F19077343}
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D109903
show more ...
|
| #
22d82949 |
| 02-Dec-2021 |
Kazu Hirata <[email protected]> |
[llvm] Fix "unused variable" warnings
|
| #
7cc2493d |
| 01-Dec-2021 |
spupyrev <[email protected]> |
profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm,
profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected.
Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile.
The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(|V|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second.
The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts.
UPD Dec 1st 2021: - synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`; - added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109860
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
884b6dd3 |
| 23-Nov-2021 |
spupyrev <[email protected]> |
profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm,
profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected.
Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile.
The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(|V|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second.
The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109860
show more ...
|
| #
b00fc198 |
| 23-Nov-2021 |
spupyrev <[email protected]> |
profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm,
profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected.
Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile.
The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(|V|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second.
The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109860
show more ...
|