|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
b62e3a73 |
| 16-Jun-2022 |
Corentin Jabot <[email protected]> |
Replace to_hexString by touhexstr [NFC]
LLVM had 2 methods to convert a number to an hexa string, this remove one of them.
Differential Revision: https://reviews.llvm.org/D127958
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
d86a206f |
| 05-Jun-2022 |
Fangrui Song <[email protected]> |
Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options
|
| #
557efc9a |
| 04-Jun-2022 |
Fangrui Song <[email protected]> |
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the err
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
show more ...
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
9f732af5 |
| 12-May-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] Filter out oversized LBR ranges.
As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid.
For example, the last two pairs in the following trace
[llvm-profgen] Filter out oversized LBR ranges.
As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid.
For example, the last two pairs in the following trace form a range [0x0d7b02b0, 0x368ba706] that covers a ton of functions in the binary. Such oversized range should also be ignored.
0x0c74505f/0x368b99a0 **0x368ba706**/0x0c745040 0x0d7b1c3f/**0x0d7b02b0**
Add a defensive check to filter out those ranges based that the valid range should not cross the unconditional branch(Call, return, unconditional jmp).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D125448
show more ...
|
|
Revision tags: llvmorg-14.0.3 |
|
| #
e36786d1 |
| 28-Apr-2022 |
Hongtao Yu <[email protected]> |
[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `P
[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D122602
show more ...
|
|
Revision tags: llvmorg-14.0.2 |
|
| #
bfcb2c11 |
| 24-Apr-2022 |
wlei <[email protected]> |
[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues
This patch is fixing two issues for both CS and non-CS. 1) For external-call-internal, the head samp
[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues
This patch is fixing two issues for both CS and non-CS. 1) For external-call-internal, the head samples of the the internal function should be recorded. 2) avoid ignoring LBR after meeting the interrupt branch for CS profile
LBR parser is shared between CS and non-CS, we found it's error-prone while dealing with artificial branch inside LBR parser. Since artificial branch is mainly used for CS profile unwinding, this patch tries to simplify LBR parser by decoupling artificial branch code from it, the concept of artificial branch is removed and split into two transitional branches(internal-to-external, external-to-internal). Then we leave all the processing of external branch to unwinder.
Specifically for unwinder, remembering that we introduce external frame in https://reviews.llvm.org/D115550. We can just take external address as a regular address and reuse current unwind function(unwindCall, unwindReturn). For a normal case, the external frame will match an external LBR, and it will be filtered out by `unwindLinear` without losing any context.
The data also shows that the interrupt or standalone LBR pattern(unpaired case) does exist, we choose to handle it by clearing the call stack and keeping unwinding. Here we leverage checking in `unwindLinear`, because a standalone LBR, no matter its type, since it doesn’t have other part to pair, it will eventually cause a wrong linear range, like [external, internal], [internal, external]. Then set the state to invalid there.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D118177
show more ...
|
| #
17f6cba3 |
| 15-Apr-2022 |
Wenlei He <[email protected]> |
[llvm-profgen] Add process filter for perf reader
For profile generation, we need to filter raw perf samples for binary of interest. Sometimes binary name along isn't enough as we can have binary of
[llvm-profgen] Add process filter for perf reader
For profile generation, we need to filter raw perf samples for binary of interest. Sometimes binary name along isn't enough as we can have binary of the same name running in the system. This change adds a process id filter to allow users to further disambiguiate the input raw samples.
Differential Revision: https://reviews.llvm.org/D123869
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
8a0406dc |
| 08-Apr-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] Filter out invalid LBR ranges.
The profiler can sometimes give us a LBR trace that implicates bogus code ranges. For example,
0xc5acb56/0xc66c6c0 0xc628195/0xf31fbb0 0xc611261/0x
[llvm-profgen] Filter out invalid LBR ranges.
The profiler can sometimes give us a LBR trace that implicates bogus code ranges. For example,
0xc5acb56/0xc66c6c0 0xc628195/0xf31fbb0 0xc611261/0xc628130 0xc5c1a21/0xc6111c0 0x1f7edfd3/0xc5c3a50 0xc5c154f/0x1f7edec0 0xe8eed07/0xc5c11e0
, note that the first two pairs are supposed to form a linear execution range, in this case, it is [0xf31fbb0, 0xc5acb56] , which doesn't make sense.
Such bogus ranges should be ruled out to avoid generating a bad profile. I'm fixing this for both CS and non-CS cases.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D123271
show more ...
|
| #
3f970168 |
| 23-Mar-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] Decoding pseudo probe for profiled function only.
Complete pseudo probes decoding can result in large memory usage. In practice only a small porting of the decoded probes are used in
[llvm-profgen] Decoding pseudo probe for profiled function only.
Complete pseudo probes decoding can result in large memory usage. In practice only a small porting of the decoded probes are used in profile generation. I'm changing the full decoding mode to be decoding for profiled functions only, though we still do a full scan of the .pseudoprobe section due to a missing table-of-content but we don't have to build the in-memory data structure for functions not sampled.
To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain.
I also have to change the previous representation of unsymbolized context from probe-based stack to address-based stack since the profiled functions are unknown yet by the time of virtual unwinding. The address-based stack will be converted to probe-based stack after virtual unwinding and on-demand probe decoding.
I'm seeing 20GB memory is saved for one of our internal large service.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D121643
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
db29f437 |
| 23-Feb-2022 |
serge-sans-paille <[email protected]> |
Cleanup include: DebugInfo/Symbolize
Estimation of the impact on preprocessor output after: 1067349756 before:1067487786
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-
Cleanup include: DebugInfo/Symbolize
Estimation of the impact on preprocessor output after: 1067349756 before:1067487786
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120433
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
67db3111 |
| 01-Feb-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a l
[llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a large benchmark.
Before: note: Before parsePerfTraces note: VM: 40.73 GB RSS: 39.18 GB note: Before parseAndAggregateTrace note: VM: 40.73 GB RSS: 39.18 GB note: After parseAndAggregateTrace note: VM: 88.93 GB RSS: 87.97 GB note: Before generateUnsymbolizedProfile note: VM: 88.95 GB RSS: 87.99 GB note: After generateUnsymbolizedProfile note: VM: 93.50 GB RSS: 92.53 GB note: After computeSizeForProfiledFunctions note: VM: 101.13 GB RSS: 99.36 GB note: After generateProbeBasedProfile note: VM: 215.61 GB RSS: 210.88 GB note: After postProcessProfiles note: VM: 237.48 GB RSS: 212.50 GB
After: note: Before parsePerfTraces note: VM: 40.73 GB RSS: 39.18 GB note: Before parseAndAggregateTrace note: VM: 40.73 GB RSS: 39.18 GB note: After parseAndAggregateTrace note: VM: 88.93 GB RSS: 87.96 GB note: Before generateUnsymbolizedProfile note: VM: 88.95 GB RSS: 87.97 GB note: After generateUnsymbolizedProfile note: VM: 93.50 GB RSS: 92.51 GB note: After computeSizeForProfiledFunctions note: VM: 93.50 GB RSS: 92.53 GB note: After generateProbeBasedProfile note: VM: 164.87 GB RSS: 163.55 GB note: After postProcessProfiles note: VM: 182.28 GB RSS: 179.43 GB
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D118677
show more ...
|
| #
fec57e5b |
| 01-Feb-2022 |
Hongtao Yu <[email protected]> |
Revert "[llvm-profgen] Clean up unnecessary memory reservations between phases."
This reverts commit 057e784b0962a7c5a17e858932bb6f03c7676c47.
|
| #
057e784b |
| 01-Feb-2022 |
Hongtao Yu <[email protected]> |
[llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a l
[llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a large benchmark.
Before: note: Before parsePerfTraces note: VM: 40.73 GB RSS: 39.18 GB note: Before parseAndAggregateTrace note: VM: 40.73 GB RSS: 39.18 GB note: After parseAndAggregateTrace note: VM: 88.93 GB RSS: 87.97 GB note: Before generateUnsymbolizedProfile note: VM: 88.95 GB RSS: 87.99 GB note: After generateUnsymbolizedProfile note: VM: 93.50 GB RSS: 92.53 GB note: After computeSizeForProfiledFunctions note: VM: 101.13 GB RSS: 99.36 GB note: After generateProbeBasedProfile note: VM: 215.61 GB RSS: 210.88 GB note: After postProcessProfiles note: VM: 237.48 GB RSS: 212.50 GB
After: note: Before parsePerfTraces note: VM: 40.73 GB RSS: 39.18 GB note: Before parseAndAggregateTrace note: VM: 40.73 GB RSS: 39.18 GB note: After parseAndAggregateTrace note: VM: 88.93 GB RSS: 87.96 GB note: Before generateUnsymbolizedProfile note: VM: 88.95 GB RSS: 87.97 GB note: After generateUnsymbolizedProfile note: VM: 93.50 GB RSS: 92.51 GB note: After computeSizeForProfiledFunctions note: VM: 93.50 GB RSS: 92.53 GB note: After generateProbeBasedProfile note: VM: 164.87 GB RSS: 163.55 GB note: After postProcessProfiles note: VM: 182.28 GB RSS: 179.43 GB
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D118677
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
b239b2b0 |
| 16-Dec-2021 |
wlei <[email protected]> |
[llvm-profgen] Fix warning of enumerated and non-enumerated type in conditional expression
Differential Revision: https://reviews.llvm.org/D115842
|
| #
0f53df86 |
| 14-Dec-2021 |
wlei <[email protected]> |
[CSSPGO][llvm-profgen] Fix external address issues of perf reader (return to external addr part)
Before we have an issue with artificial LBR whose source is a return, recalling that "an internal cod
[CSSPGO][llvm-profgen] Fix external address issues of perf reader (return to external addr part)
Before we have an issue with artificial LBR whose source is a return, recalling that "an internal code(A) can return to external address, then from the external address call a new internal code(B), making an artificial branch that looks like a return from A to B can confuse the unwinder". We just ignore the LBRs after this artificial LBR which can miss some samples. This change aims at fixing this by correctly unwinding them instead of ignoring them.
List some typical scenarios covered by this change.
1) multiple sequential call back happen in external address, e.g.
``` [ext, call, foo] [foo, return, ext] [ext, call, bar] ``` Unwinder should avoid having foo return from bar. Wrong call stack is like [foo, bar]
2) the call stack before and after external call should be correctly unwinded. ``` {call stack1} {call stack2} [foo, call, ext] [ext, call, bar] [bar, return, ext] [ext, return, foo ] ``` call stack 1 should be the same to call stack2. Both shouldn't be truncated
3) call stack should be truncated after call into external code since we can't do inlining with external code.
``` [foo, call, ext] [ext, call, bar] [bar, call, baz] [baz, return, bar ] [bar, return, ext] ``` the call stack of code in baz should not include foo.
### Implementation:
We leverage artificial frame to fix #2 and #3: when we got a return artificial LBR, push an extra artificial frame to the stack. when we pop frame, check if the parent is an artificial frame to pop(fix #2). Therefore, call/ return artificial LBR is just the same as regular LBR which can keep the call stack.
While recording context on the trie, artificial frame is used as a tag indicating that we should truncate the call stack(fix #3).
To differentiate #1 and #2, we leverage `getCallAddrFromFrameAddr`. Normally the target of the return should be the next inst of a call inst and `getCallAddrFromFrameAddr` will return the address of call inst. Otherwise, getCallAddrFromFrameAddr will return to 0 which is the case of #1.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115550
show more ...
|
| #
30c3aba9 |
| 12-Dec-2021 |
wlei <[email protected]> |
[llvm-profgen] Fix to use getUntrackedCallsites outside the loop
Unwinder is hoisted out in https://reviews.llvm.org/D115550, so fix the useage of getUntrackedCallsites.
Reviewed By: hoy, wenlei
D
[llvm-profgen] Fix to use getUntrackedCallsites outside the loop
Unwinder is hoisted out in https://reviews.llvm.org/D115550, so fix the useage of getUntrackedCallsites.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115760
show more ...
|
| #
3dcb60db |
| 15-Dec-2021 |
wlei <[email protected]> |
[CSSPGO][llvm-profgen] Fix external address issues of perf reader (leading external LBR part)
We can have the sampling just hit into the external addresses, in that case, both the top stack frame an
[CSSPGO][llvm-profgen] Fix external address issues of perf reader (leading external LBR part)
We can have the sampling just hit into the external addresses, in that case, both the top stack frame and the latest LBR target are external addresses. For example: ``` ffffffff 0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0
ffffffff 40067e 0xffffffff/0xffffffff/P/-/-/0 0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0 ``` Before we will ignore the entire samples. However, we found there exists some internal LBRs in the remaining part of sample, the range between them is still a valid range, we will lose some valid LBRs. Those LBRs will be unwinded based on a empty(context-less) call stack.
This change tries to fix it, instead of ignoring the entire sample, we only ignore the leading external addresses.
Note that the first outgoing LBR is useful since there is a valid range between it's source and next LBR's target.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115538
show more ...
|
| #
5740bb80 |
| 14-Dec-2021 |
Hongtao Yu <[email protected]> |
[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is e
[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ```
Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
41a681ce |
| 04-Nov-2021 |
wlei <[email protected]> |
[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator
In order to support generating profile with FS discriminator, three kind of changes are done in llvm-profgen:
1) Dissassemble .r
[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator
In order to support generating profile with FS discriminator, three kind of changes are done in llvm-profgen:
1) Dissassemble .rodata section to check if FS discriminator var ('"__llvm_fs_discriminator__"') exists and set the corresponding flag in the binary.
2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`.
3) set true for `FunctionSamples::ProfileIsFS` to enable FS functionality in ProfileData.
Reviewed By: xur, hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113296
show more ...
|
| #
f7976edc |
| 12-Nov-2021 |
Wenlei He <[email protected]> |
[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset
Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By defau
[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset
Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools.
Differential Revision: https://reviews.llvm.org/D113727
show more ...
|
| #
aab18100 |
| 09-Nov-2021 |
wlei <[email protected]> |
[llvm-profgen] Fix bug of setting function entry
Previously we set `isFuncEntry` flag to true when the funcName from DWARF is equal to the name in symbol table and we use this flag to ignore report
[llvm-profgen] Fix bug of setting function entry
Previously we set `isFuncEntry` flag to true when the funcName from DWARF is equal to the name in symbol table and we use this flag to ignore reporting callsite sample that's from an intra func branch. However, in HHVM, it appears that the symbol table name is inconsistent with the dwarf info func name, it's likely due to `OptimizeGlobalAliases`.
This change is a workaround in llvm-profgen side to mark the only one range as the function entry and add warnings for the remaining inconsistence.
This also fixed a missing `getCanonicalFnName` for symbol name which caused the mismatching as well.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113492
show more ...
|
| #
dc9f0379 |
| 02-Nov-2021 |
wlei <[email protected]> |
[llvm-profgen] Refactor the code of getHashCode
Refactor to generate hash code lazily. Tested on clang self build, no observable generating time regression.
Reviewed By: hoy, wenlei
Differential R
[llvm-profgen] Refactor the code of getHashCode
Refactor to generate hash code lazily. Tested on clang self build, no observable generating time regression.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113059
show more ...
|
| #
138202a8 |
| 26-Oct-2021 |
wlei <[email protected]> |
[llvm-profgen] Warn on invalid range and show warning summary
Two things in this diff:
1) Warn on the invalid range, currently three types of checking, see the detailed message in the code.
2) In
[llvm-profgen] Warn on invalid range and show warning summary
Two things in this diff:
1) Warn on the invalid range, currently three types of checking, see the detailed message in the code.
2) In some situation, llvm-profgen gives lots of warnings on the truncated stacks which is noisy. This change provides a switch to `--show-detailed-warning` to skip the warnings. Alternatively, we use a summary for those warning and show the percentage of cases with those issues.
Example of warning summary. ``` warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction. warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function. ```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D111902
show more ...
|
| #
a5f411b7 |
| 15-Oct-2021 |
wlei <[email protected]> |
[llvm-profgen] Allow unsymbolized profile as perf input
This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization` and it
[llvm-profgen] Allow unsymbolized profile as perf input
This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization` and it's after the sample aggregation but before symbolization , so it has much small file size. It can be used for sample merging and trimming, also is useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-patch` is added for this.
Format of unsymbolized profile: ```
[context stack1] # If it's a CS profile number of entries in RangeCounter from_1-to_1:count_1 from_2-to_2:count_2 ...... from_n-to_n:count_n number of entries in BranchCounter src_1->dst_1:count_1 src_2->dst_2:count_2 ...... src_n->dst_n:count_n [context stack2] ...... ```
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D111750
show more ...
|
| #
4e3eebc6 |
| 23-Oct-2021 |
Kazu Hirata <[email protected]> |
[tools, utils] Use StringRef::contains (NFC)
|