|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
611ffcf4 |
| 14-Jul-2022 |
Kazu Hirata <[email protected]> |
[llvm] Use value instead of getValue (NFC)
|
| #
7e86b13c |
| 28-Jun-2022 |
wlei <[email protected]> |
[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, promotion and merging of contexts were based on the SampleContext (the array of frames), which costs a lot of memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This saves a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner.
One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the FunctionSamples for one function in order to do the merging and promoting. Previously, it searched each function's contexts and traversed the trie to get the node of the context. Now that the context is no longer stored inside the profile, we instead use an auxiliary map, `ProfileToNodeMap`, from profile to node. It is initialized with the FunctionSamples-to-TrieNode relations and is kept up to date while promoting and merging nodes.
Moreover, I expected the results before and after to remain the same, but I found that the order of `FuncToCtxtProfiles` matters and affects the results. This can happen in the recursive context case, but the difference should be small. Since we no longer have the context, I just used a vector for the order; the result is still deterministic.
Measured on one huge (12GB) profile from one of our internal services. The profile similarity between before and after is 99.999%, the running time is improved by 3X (debug mode), and memory usage is reduced from 170GB to 90GB.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D127031
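The trie-plus-auxiliary-map idea described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `SampleContextTracker` code: the `Profile`, `TrieNode`, and `Tracker` types and all member names are hypothetical stand-ins, and a context is assumed to be a list of (callsite, function name) frames.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-context function profile.
struct Profile {
  std::string FuncName;
  uint64_t TotalSamples = 0;
};

// One trie node per calling context; children are keyed by (callsite, callee).
struct TrieNode {
  TrieNode *Parent = nullptr;
  std::unique_ptr<Profile> Samples; // null when this context has no profile
  std::map<std::pair<uint32_t, std::string>, std::unique_ptr<TrieNode>> Children;

  TrieNode *getOrCreateChild(uint32_t CallSite, const std::string &Callee) {
    auto &Slot = Children[{CallSite, Callee}];
    if (!Slot) {
      Slot = std::make_unique<TrieNode>();
      Slot->Parent = this;
    }
    return Slot.get();
  }
};

struct Tracker {
  TrieNode Root;
  // Auxiliary map from a profile to its trie node, so the profile itself
  // does not need to store its context as an array of frames.
  std::unordered_map<const Profile *, TrieNode *> ProfileToNode;
  // Per-function list of context profiles; a vector keeps insertion order
  // so that iteration stays deterministic.
  std::map<std::string, std::vector<Profile *>> FuncToProfiles;

  Profile *addProfile(
      const std::vector<std::pair<uint32_t, std::string>> &Context,
      uint64_t Samples) {
    assert(!Context.empty() && "a context needs at least one frame");
    TrieNode *Node = &Root;
    for (const auto &Frame : Context)
      Node = Node->getOrCreateChild(Frame.first, Frame.second);
    if (!Node->Samples) {
      Node->Samples = std::make_unique<Profile>();
      Node->Samples->FuncName = Context.back().second;
      ProfileToNode[Node->Samples.get()] = Node;
      FuncToProfiles[Node->Samples->FuncName].push_back(Node->Samples.get());
    }
    Node->Samples->TotalSamples += Samples;
    return Node->Samples.get();
  }

  // Recover the context depth by walking parent links from the mapped node.
  unsigned depthOf(const Profile *P) const {
    unsigned Depth = 0;
    for (const TrieNode *N = ProfileToNode.at(P); N->Parent; N = N->Parent)
      ++Depth;
    return Depth;
  }
};
```

In this sketch, anything that needs the context of a profile goes through `ProfileToNode` and the parent links instead of a stored frame array, which is the memory trade-off the commit message describes.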
|
| #
a7938c74 |
| 26-Jun-2022 |
Kazu Hirata <[email protected]> |
[llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.
|
| #
3b7c3a65 |
| 25-Jun-2022 |
Kazu Hirata <[email protected]> |
Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
|
| #
aa8feeef |
| 25-Jun-2022 |
Kazu Hirata <[email protected]> |
Don't use Optional::hasValue (NFC)
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
d86a206f |
| 05-Jun-2022 |
Fangrui Song <[email protected]> |
Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options
|
| #
557efc9a |
| 04-Jun-2022 |
Fangrui Song <[email protected]> |
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
9f732af5 |
| 12-May-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] Filter out oversized LBR ranges.
As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid.
For example, the last two pairs in the following trace form a range [0x0d7b02b0, 0x368ba706] that covers a ton of functions in the binary. Such an oversized range should also be ignored.
0x0c74505f/0x368b99a0 **0x368ba706**/0x0c745040 0x0d7b1c3f/**0x0d7b02b0**
Add a defensive check to filter out those ranges, based on the rule that a valid range should not cross an unconditional branch (call, return, unconditional jmp).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D125448
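The defensive check described above can be sketched as follows. This is a hedged illustration rather than the code from the patch: the function name, the `std::set` of branch addresses, and the convention that `End` is the address of the last instruction in the range (which may itself be a branch) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Returns true if any unconditional branch (call, return, unconditional jmp)
// lies strictly inside the linear range [Start, End). A branch located at End
// is allowed, since a linear range normally ends at the next taken branch.
// UncondBranchAddrs is a hypothetical sorted set of such instruction addresses.
bool rangeCrossesUncondBranch(const std::set<uint64_t> &UncondBranchAddrs,
                              uint64_t Start, uint64_t End) {
  if (Start >= End)
    return false;
  auto It = UncondBranchAddrs.lower_bound(Start);
  return It != UncondBranchAddrs.end() && *It < End;
}
```

A range for which this returns true would be treated as invalid and dropped, in the same way as the other invalid ranges.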
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
bfcb2c11 |
| 24-Apr-2022 |
wlei <[email protected]> |
[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues
This patch fixes two issues for both CS and non-CS profiles: 1) For external-call-internal, the head samples of the internal function should be recorded. 2) Avoid ignoring LBRs after meeting the interrupt branch for CS profiles.
The LBR parser is shared between CS and non-CS, and we found that dealing with artificial branches inside the LBR parser is error-prone. Since the artificial branch is mainly used for CS profile unwinding, this patch simplifies the LBR parser by decoupling the artificial branch code from it: the concept of the artificial branch is removed and split into two transitional branches (internal-to-external and external-to-internal). All processing of external branches is then left to the unwinder.
Specifically for the unwinder, recall that we introduced the external frame in https://reviews.llvm.org/D115550. We can just treat an external address as a regular address and reuse the current unwind functions (unwindCall, unwindReturn). In the normal case, the external frame will match an external LBR, and it will be filtered out by `unwindLinear` without losing any context.
The data also shows that the interrupt or standalone LBR pattern (the unpaired case) does exist; we choose to handle it by clearing the call stack and continuing to unwind. Here we leverage the checking in `unwindLinear`: because a standalone LBR, whatever its type, has no counterpart to pair with, it will eventually produce a wrong linear range such as [external, internal] or [internal, external]. The state is then set to invalid there.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D118177
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
3f970168 |
| 23-Mar-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] Decoding pseudo probe for profiled function only.
Complete pseudo probe decoding can result in large memory usage. In practice, only a small portion of the decoded probes are used in profile generation. I'm changing the full decoding mode to decode profiled functions only. We still do a full scan of the .pseudoprobe section because it lacks a table of contents, but we don't have to build the in-memory data structure for functions that are not sampled.
To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain.
I also have to change the previous representation of unsymbolized context from a probe-based stack to an address-based stack, since the profiled functions are not yet known at the time of virtual unwinding. The address-based stack will be converted to a probe-based stack after virtual unwinding and on-demand probe decoding.
I'm seeing 20GB of memory saved for one of our large internal services.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D121643
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
db29f437 |
| 23-Feb-2022 |
serge-sans-paille <[email protected]> |
Cleanup include: DebugInfo/Symbolize
Estimation of the impact on preprocessor output after: 1067349756 before:1067487786
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120433
|
| #
b3a778fb |
| 22-Feb-2022 |
wlei <[email protected]> |
[llvm-profgen] Support symbol loading for debug fission
Support loading debug info from DWARF split files, such as .dwo and .dwp files. Leverage the `getNonSkeletonUnitDIE(false)` API to achieve this.
Add a test case to make sure all the ranges are correctly retrieved by the loader.
Reviewed By: ayermolo, hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115973
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
34e131b0 |
| 28-Jan-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] On-demand track optimized-away inlinees for preinliner.
Tracking optimized-away inlinees based on all probes in a binary is expensive in terms of memory usage. I'm making the tracking on-demand, based on profiled functions only. This saves about 10% memory overall for a medium-sized benchmark.
Before:
note: After parsePerfTraces
note: Thu Jan 27 18:42:09 2022
note: VM: 8.68 GB RSS: 8.39 GB
note: After computeSizeForProfiledFunctions
note: Thu Jan 27 18:42:41 2022
note: **VM: 10.63 GB RSS: 10.20 GB**
note: After generateProbeBasedProfile
note: Thu Jan 27 18:45:49 2022
note: VM: 25.00 GB RSS: 24.95 GB
note: After postProcessProfiles
note: Thu Jan 27 18:49:29 2022
note: VM: 26.34 GB RSS: 26.27 GB
After:
note: After parsePerfTraces
note: Fri Jan 28 12:04:49 2022
note: VM: 8.68 GB RSS: 7.65 GB
note: After computeSizeForProfiledFunctions
note: Fri Jan 28 12:05:26 2022
note: **VM: 8.68 GB RSS: 8.42 GB**
note: After generateProbeBasedProfile
note: Fri Jan 28 12:08:03 2022
note: VM: 22.93 GB RSS: 22.89 GB
note: After postProcessProfiles
note: Fri Jan 28 12:11:30 2022
note: VM: 24.27 GB RSS: 24.22 GB
This should be a no-diff change in terms of profile quality.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D118515
|
| #
c56a85fd |
| 02-Feb-2022 |
Simon Pilgrim <[email protected]> |
[llvm-profgen] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr
The pointers are dereferenced immediately, so assert the cast is correct instead of returning nullptr
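The reasoning can be illustrated with a small standard-C++ analogue. This is only a sketch: `checkedCast` is a hypothetical helper built on `dynamic_cast` and `assert`, whereas LLVM's real `cast<>` and `dyn_cast<>` use LLVM's own `isa<>` machinery rather than RTTI, and the `Binary` and `ElfBinary` types are invented for the example.

```cpp
#include <cassert>

// Hypothetical class hierarchy, for illustration only.
struct Binary {
  virtual ~Binary() = default;
};
struct ElfBinary : Binary {
  unsigned getNumSections() const { return 3; }
};

// Analogue of a checked cast: a type mismatch trips the assertion at the
// cast site instead of producing a null pointer that is dereferenced later.
template <typename To, typename From> To *checkedCast(From *Ptr) {
  To *Result = dynamic_cast<To *>(Ptr);
  assert(Result && "checkedCast<> argument of incompatible type");
  return Result;
}

unsigned countSections(Binary *Obj) {
  // The result is dereferenced immediately, so a null-returning cast would
  // add no safety here; asserting documents the expectation instead.
  return checkedCast<ElfBinary>(Obj)->getNumSections();
}
```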
|
| #
6693c562 |
| 25-Jan-2022 |
wlei <[email protected]> |
[llvm-profgen] Support to load debug info from a second binary
To reduce binary size, a binary's debug info and executable segment can be separated (for example with objcopy --only-keep-debug). This adds support in llvm-profgen for using two binaries as input: the original executable binary, plus an additional binary that contains only debug info. With the new `--debug-binary=file-path` flag, debug info is loaded from the debug binary.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115948
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
f4aa2a42 |
| 14-Jan-2022 |
Simon Pilgrim <[email protected]> |
[llvm-profgen] ProfiledBinary::load - use cast<> instead of dyn_cast<> to avoid dereference of nullptr
The pointer is always dereferenced immediately, so assert the cast is correct instead of returning nullptr
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
32205717 |
| 13-Dec-2021 |
wlei <[email protected]> |
[llvm-profgen] Skip disassembling for PLT section
Skip disassembling the .plt section, so that .plt section code is treated as external code.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115699
|
| #
484a569e |
| 03-Dec-2021 |
wlei <[email protected]> |
[llvm-profgen] Fix total samples related issues
Since total samples and body samples are used to compute the hotness threshold in the compiler, we found that in some services changing the total samples computation causes a noticeable regression. Hence, we revert the changes here and keep all total sample numbers identical to the old tool.
Three changes in this diff:
1. Revert the previous diff (https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch.
2. Keep the negative line number. Although the compiler doesn't consume the count, it is used to compute the hot threshold.
3. Change to accumulate total samples per byte instead of per instruction.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115013
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
41a681ce |
| 04-Nov-2021 |
wlei <[email protected]> |
[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator
In order to support generating profiles with the FS discriminator, three kinds of changes are made in llvm-profgen:
1) Disassemble the .rodata section to check whether the FS discriminator variable ('"__llvm_fs_discriminator__"') exists, and set the corresponding flag in the binary.
2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`.
3) Set `FunctionSamples::ProfileIsFS` to true to enable FS functionality in ProfileData.
Reviewed By: xur, hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113296
|
| #
f7976edc |
| 12-Nov-2021 |
Wenlei He <[email protected]> |
[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset
Add `-use-loadable-segment-as-base` to allow use of the first loadable segment for calculating the offset. By default, the first executable segment is used for calculating the offset. The switch helps compatibility with unsymbolized profiles generated by older tools.
Differential Revision: https://reviews.llvm.org/D113727
|
| #
aab18100 |
| 09-Nov-2021 |
wlei <[email protected]> |
[llvm-profgen] Fix bug of setting function entry
Previously, we set the `isFuncEntry` flag to true when the funcName from DWARF is equal to the name in the symbol table, and we use this flag to avoid reporting callsite samples that come from an intra-function branch. However, in HHVM, the symbol table name appears to be inconsistent with the DWARF function name, likely due to `OptimizeGlobalAliases`.
This change is a workaround on the llvm-profgen side: mark the only range as the function entry, and add warnings for the remaining inconsistencies.
This also fixes a missing `getCanonicalFnName` for the symbol name, which caused mismatches as well.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113492
|
| #
5bf191a3 |
| 05-Nov-2021 |
wlei <[email protected]> |
[llvm-profgen] Fix index out of bounds error while using ip.advance
Previously, we assumed there were some non-executing sections at the bottom of the text section, so that we wouldn't hit the array's bound. But on a BOLTed binary, it turned out that the .bolt section is at the bottom of the text section and can be profiled, which then crashes llvm-profgen. This change tries to fix it.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113238
|
| #
138202a8 |
| 26-Oct-2021 |
wlei <[email protected]> |
[llvm-profgen] Warn on invalid range and show warning summary
Two things in this diff:
1) Warn on invalid ranges. Currently there are three types of checks; see the detailed messages in the code.
2) In some situations, llvm-profgen gives lots of warnings on truncated stacks, which is noisy. This change adds a `--show-detailed-warning` switch so that the detailed warnings can be skipped. Instead, we use a summary for those warnings and show the percentage of cases with those issues.
Example of warning summary.
```
warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction.
warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function.
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D111902
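A summary line in the format shown in the example could be produced along these lines. This is a minimal sketch and not the tool's implementation: `formatWarningSummary` and its parameters are hypothetical names, and two-decimal rounding is assumed from the sample output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Builds one summary line: the percentage of affected cases, the raw
// counts, and the issue description.
std::string formatWarningSummary(uint64_t Num, uint64_t Total,
                                 const std::string &Msg) {
  assert(Total != 0 && "no cases were examined");
  char Pct[32];
  std::snprintf(Pct, sizeof(Pct), "%.2f",
                100.0 * static_cast<double>(Num) / static_cast<double>(Total));
  return "warning: " + std::string(Pct) + "%(" + std::to_string(Num) + "/" +
         std::to_string(Total) + ") cases with issue: " + Msg;
}
```

Emitting one line per issue type, instead of one line per affected sample, is what keeps the output short when millions of samples are processed.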
|
| #
f5537643 |
| 27-Oct-2021 |
wlei <[email protected]> |
[llvm-profgen] Update total samples by accumulating all its body samples
Like the probe-based profile, the total samples should be the sum of all its body samples. This patch fixes it with a post-processing update for the line-number based profile. Tested on our internal services; results showed no performance change.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112672
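The post-processing step can be sketched as below. This is a simplified illustration: `FunctionProfile` and `updateTotalSamples` are hypothetical names, body samples are assumed to be keyed by line offset only, and samples belonging to inlined callsites are ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <numeric>
#include <utility>

// Hypothetical, simplified line-number based profile for one function.
struct FunctionProfile {
  uint64_t TotalSamples = 0;
  std::map<uint32_t, uint64_t> BodySamples; // line offset -> sample count
};

// Post-processing update: overwrite the total with the sum of body samples.
void updateTotalSamples(FunctionProfile &P) {
  P.TotalSamples = std::accumulate(
      P.BodySamples.begin(), P.BodySamples.end(), uint64_t{0},
      [](uint64_t Sum, const std::pair<const uint32_t, uint64_t> &Entry) {
        return Sum + Entry.second;
      });
}
```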
|
| #
3b285ff5 |
| 29-Oct-2021 |
Kazu Hirata <[email protected]> |
[llvm-profgen] Fix a set-but-unused warning
This patch fixes:
llvm/tools/llvm-profgen/ProfiledBinary.cpp:357:12: error: variable 'EndOffset' set but not used [-Werror,-Wunused-but-set-variable]
The last use of the variable was removed on Oct 26 in commit 40ca4112515d03bbcf594bd2dfa6b4394d5b00d6.
|