History log of /llvm-project-15.0.7/llvm/tools/llvm-profgen/ProfiledBinary.cpp (Results 1 – 25 of 68)
Revision Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 611ffcf4 14-Jul-2022 Kazu Hirata <[email protected]>

[llvm] Use value instead of getValue (NFC)


# 7e86b13c 28-Jun-2022 wlei <[email protected]>

[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, promotion and merging of contexts were based on the SampleContext (the array of frames), which costs a lot of memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This can save a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the FunctionSamples for one function in order to do the merging and promoting. Previously it searched each function's context and traversed the trie to get the node for that context. Now we don't have the context inside the profile; instead we directly use an auxiliary map, `ProfileToNodeMap`, for the profile. It is initialized by creating the FunctionSamples-to-TrieNode relations and is kept up to date while promoting and merging nodes.

Moreover, I was expecting the results before and after to remain the same, but I found that the order of FuncToCtxtProfiles matters and affects the results. This can happen in the recursive context case, but the difference should be small. Now that we don't have the context, I just use a vector for the order; the result is still deterministic.

Measured on one huge (12GB) profile from one of our internal services. The profile similarity difference is 99.999%, the running time is improved by 3X (debug mode), and the memory is reduced from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031
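
The idea described in this commit (a context trie plus an auxiliary profile-to-node map) can be illustrated with a minimal C++ sketch. This is not the LLVM implementation: the types below only borrow the names `FunctionSamples`, `ProfileToNodeMap` and `FuncToCtxtProfiles` from the message, and `ContextTracker`, `addContext` and `contextOf` are hypothetical helpers invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-context function profile.
struct FunctionSamples {
  std::string FuncName;
  uint64_t TotalSamples = 0;
};

// One node per calling context. The path from the root encodes the context,
// so the profile itself does not need to store an array of frames.
struct ContextTrieNode {
  std::string FuncName;
  ContextTrieNode *Parent = nullptr;
  FunctionSamples Profile;
  // Children keyed by (call site id in this function, callee name).
  std::map<std::pair<uint32_t, std::string>, std::unique_ptr<ContextTrieNode>>
      Children;

  ContextTrieNode *getOrCreateChild(uint32_t CallSite,
                                    const std::string &Callee) {
    auto &Slot = Children[{CallSite, Callee}];
    if (!Slot) {
      Slot = std::make_unique<ContextTrieNode>();
      Slot->FuncName = Callee;
      Slot->Parent = this;
      Slot->Profile.FuncName = Callee;
    }
    return Slot.get();
  }
};

struct ContextTracker {
  ContextTrieNode Root;
  // Auxiliary map: profile -> trie node that owns it.
  std::unordered_map<const FunctionSamples *, ContextTrieNode *>
      ProfileToNodeMap;
  // Function name -> its context profiles. A vector keeps insertion order,
  // so iteration stays deterministic.
  std::unordered_map<std::string, std::vector<FunctionSamples *>>
      FuncToCtxtProfiles;

  // Walk (or build) the path and add samples to the profile at its end.
  FunctionSamples *
  addContext(const std::vector<std::pair<uint32_t, std::string>> &Path,
             uint64_t Samples) {
    ContextTrieNode *Node = &Root;
    for (const auto &[CallSite, Callee] : Path)
      Node = Node->getOrCreateChild(CallSite, Callee);
    FunctionSamples *FS = &Node->Profile;
    if (ProfileToNodeMap.emplace(FS, Node).second)
      FuncToCtxtProfiles[FS->FuncName].push_back(FS);
    FS->TotalSamples += Samples;
    return FS;
  }

  // Recover the context on demand by walking parent links from the node.
  std::string contextOf(const FunctionSamples *FS) const {
    assert(ProfileToNodeMap.count(FS) && "profile is not tracked");
    std::string Result;
    for (const ContextTrieNode *N = ProfileToNodeMap.at(FS); N && N->Parent;
         N = N->Parent)
      Result = Result.empty() ? N->FuncName : N->FuncName + " @ " + Result;
    return Result;
  }
};
```

In this sketch, two call sites of the same callee produce two distinct profiles that share the trie prefix, and a profile's context is reconstructed only when it is asked for.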


# a7938c74 26-Jun-2022 Kazu Hirata <[email protected]>

[llvm] Don't use Optional::hasValue (NFC)

This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
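
As a rough illustration of the pattern, the following sketch uses `std::optional` as a stand-in for `llvm::Optional` (so `has_value()` plays the role of `hasValue()`). The function `describe` is hypothetical and exists only to show the conditional.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper showing the conditional style the patch moves to.
std::string describe(const std::optional<std::string> &Name) {
  // Before: if (Name.has_value()) ...
  // After: rely on the contextual conversion to bool.
  if (Name)
    return "found " + *Name;
  return "missing";
}
```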


# 3b7c3a65 25-Jun-2022 Kazu Hirata <[email protected]>

Revert "Don't use Optional::hasValue (NFC)"

This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.


# aa8feeef 25-Jun-2022 Kazu Hirata <[email protected]>

Don't use Optional::hasValue (NFC)


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# d86a206f 05-Jun-2022 Fangrui Song <[email protected]>

Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options


# 557efc9a 04-Jun-2022 Fangrui Song <[email protected]>

[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC

Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.


Revision tags: llvmorg-14.0.4
# 9f732af5 12-May-2022 Hongtao Yu <[email protected]>

[llvm-profgen] Filter out oversized LBR ranges.

As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid.

For example, the last two pairs in the following trace form a range [0x0d7b02b0, 0x368ba706] that covers a ton of functions in the binary. Such an oversized range should also be ignored.

0x0c74505f/0x368b99a0 **0x368ba706**/0x0c745040 0x0d7b1c3f/**0x0d7b02b0**

Add a defensive check to filter out those ranges, based on the fact that a valid range should not cross an unconditional branch (call, return, unconditional jmp).

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D125448
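
The defensive check described above might look like the sketch below. It is an illustration under assumptions, not the llvm-profgen code: `AddressSet` and `isValidLinearRange` are hypothetical names, and the range end is assumed to be the address of the taken branch that terminates the range, so a branch located exactly at the end does not count as crossed.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical: sorted addresses of all calls, returns and unconditional
// jumps found while disassembling the binary.
using AddressSet = std::set<uint64_t>;

// A linear range [Start, End] is a run of fall-through execution. It is
// rejected if it is reversed or if an unconditional branch sits strictly
// inside it, because execution could not have fallen through that branch.
bool isValidLinearRange(const AddressSet &UncondBranchAddrs, uint64_t Start,
                        uint64_t End) {
  if (Start > End)
    return false;
  // First unconditional branch at or after Start.
  auto It = UncondBranchAddrs.lower_bound(Start);
  return It == UncondBranchAddrs.end() || *It >= End;
}
```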


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# bfcb2c11 24-Apr-2022 wlei <[email protected]>

[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

This patch fixes two issues for both CS and non-CS profiles:
1) For external-call-internal, the head samples of the internal function should be recorded.
2) Avoid ignoring LBRs after meeting the interrupt branch for CS profiles.

The LBR parser is shared between CS and non-CS, and we found that handling artificial branches inside the LBR parser is error-prone. Since the artificial branch is mainly used for CS profile unwinding, this patch simplifies the LBR parser by decoupling the artificial branch code from it: the concept of an artificial branch is removed and split into two transitional branches (internal-to-external, external-to-internal). All processing of external branches is then left to the unwinder.

Specifically for the unwinder, recall that we introduced the external frame in https://reviews.llvm.org/D115550. We can just treat an external address as a regular address and reuse the current unwind functions (unwindCall, unwindReturn). In the normal case, the external frame will match an external LBR, and it will be filtered out by `unwindLinear` without losing any context.

The data also shows that the interrupt or standalone LBR pattern (the unpaired case) does exist; we handle it by clearing the call stack and continuing to unwind. Here we leverage the checking in `unwindLinear`: a standalone LBR, whatever its type, has no counterpart to pair with, so it will eventually produce a wrong linear range such as [external, internal] or [internal, external]. The state is then set to invalid there.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D118177


Revision tags: llvmorg-14.0.1
# 3f970168 23-Mar-2022 Hongtao Yu <[email protected]>

[llvm-profgen] Decoding pseudo probe for profiled function only.

Complete pseudo probe decoding can result in large memory usage. In practice, only a small portion of the decoded probes is used in profile generation. I'm changing the full decoding mode to decode profiled functions only. We still do a full scan of the .pseudoprobe section because there is no table of contents, but we don't have to build the in-memory data structure for functions that are not sampled.

To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. The recursive version is easy to read and maintain.

I also have to change the previous representation of the unsymbolized context from a probe-based stack to an address-based stack, since the profiled functions are not yet known at the time of virtual unwinding. The address-based stack will be converted to a probe-based stack after virtual unwinding and on-demand probe decoding.

I'm seeing 20GB of memory saved for one of our large internal services.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D121643


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# db29f437 23-Feb-2022 serge-sans-paille <[email protected]>

Cleanup include: DebugInfo/Symbolize

Estimation of the impact on preprocessor output
after: 1067349756
before:1067487786

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120433


# b3a778fb 22-Feb-2022 wlei <[email protected]>

[llvm-profgen] Support symbol loading for debug fission

Support loading debug info from DWARF split files, such as .dwo and .dwp files. Leverage the `getNonSkeletonUnitDIE(false)` API to achieve this.

Add a test case to make sure all the ranges are correctly retrieved by the loader.

Reviewed By: ayermolo, hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115973


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 34e131b0 28-Jan-2022 Hongtao Yu <[email protected]>

[llvm-profgen] On-demand track optimized-away inlinees for preinliner.

Tracking optimized-away inlinees based on all probes in a binary is expensive in terms of memory usage. I'm making the tracking on-demand, based on profiled functions only. This saves about 10% memory overall for a medium-sized benchmark.

Before:

note: After parsePerfTraces
note: Thu Jan 27 18:42:09 2022
note: VM: 8.68 GB RSS: 8.39 GB
note: After computeSizeForProfiledFunctions
note: Thu Jan 27 18:42:41 2022
note: **VM: 10.63 GB RSS: 10.20 GB**
note: After generateProbeBasedProfile
note: Thu Jan 27 18:45:49 2022
note: VM: 25.00 GB RSS: 24.95 GB
note: After postProcessProfiles
note: Thu Jan 27 18:49:29 2022
note: VM: 26.34 GB RSS: 26.27 GB

After:
note: After parsePerfTraces
note: Fri Jan 28 12:04:49 2022
note: VM: 8.68 GB RSS: 7.65 GB
note: After computeSizeForProfiledFunctions
note: Fri Jan 28 12:05:26 2022
note: **VM: 8.68 GB RSS: 8.42 GB**
note: After generateProbeBasedProfile
note: Fri Jan 28 12:08:03 2022
note: VM: 22.93 GB RSS: 22.89 GB
note: After postProcessProfiles
note: Fri Jan 28 12:11:30 2022
note: VM: 24.27 GB RSS: 24.22 GB

This should be a no-diff change in terms of profile quality.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D118515


# c56a85fd 02-Feb-2022 Simon Pilgrim <[email protected]>

[llvm-profgen] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr

The pointers are dereferenced immediately, so assert the cast is correct instead of returning nullptr


# 6693c562 25-Jan-2022 wlei <[email protected]>

[llvm-profgen] Support to load debug info from a second binary

To reduce binary size, a binary's debug info and executable segment can be separated (for example, using objcopy --only-keep-debug). This adds support in llvm-profgen for using two binaries as input: the original one is the executable binary, and the added one is the debug-info-only binary. With the new flag `--debug-binary=file-path`, the binary will load debug info from the debug binary.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115948


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3
# f4aa2a42 14-Jan-2022 Simon Pilgrim <[email protected]>

[llvm-profgen] ProfiledBinary::load - use cast<> instead of dyn_cast<> to avoid dereference of nullptr

The pointer is always dereferenced immediately, so assert the cast is correct instead of returning nullptr


Revision tags: llvmorg-13.0.1-rc2
# 32205717 13-Dec-2021 wlei <[email protected]>

[llvm-profgen] Skip disassembling for PLT section

Skip disassembling the .plt section, so that .plt section code will be treated as external code.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115699


# 484a569e 03-Dec-2021 wlei <[email protected]>

[llvm-profgen] Fix total samples related issues

Since total samples and body samples are used to compute the hotness threshold in the compiler, we found that in some services changing the total samples computation causes a noticeable regression. Hence, we revert the changes here and keep all total sample numbers identical to the old tool.

Three changes in this diff:

1. Revert the previous diff (https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch.

2. Keep the negative line number. Although the compiler doesn't consume the count, it will be used to compute the hot threshold.

3. Change to accumulate total samples per byte instead of per instruction.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115013


Revision tags: llvmorg-13.0.1-rc1
# 41a681ce 04-Nov-2021 wlei <[email protected]>

[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator

In order to support generating profiles with FS discriminators, three kinds of changes are made in llvm-profgen:

1) Disassemble the .rodata section to check whether the FS discriminator variable ('"__llvm_fs_discriminator__"') exists, and set the corresponding flag in the binary.

2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`.

3) Set `FunctionSamples::ProfileIsFS` to true to enable FS functionality in ProfileData.

Reviewed By: xur, hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113296


# f7976edc 12-Nov-2021 Wenlei He <[email protected]>

[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset

Add `-use-loadable-segment-as-base` to allow use of the first loadable segment for calculating the offset. By default, the first executable segment is used for calculating the offset. The switch helps compatibility with unsymbolized profiles generated by older tools.

Differential Revision: https://reviews.llvm.org/D113727
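
The base selection described above can be sketched as follows. This is an assumption-laden illustration rather than the tool's code: `Segment`, `pickBaseAddress` and `toOffset` are hypothetical, segments are assumed to be listed in program header order, and "first" is taken to mean first in that order.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of a program header entry.
struct Segment {
  uint64_t VAddr;
  bool Loadable;
  bool Executable;
};

// Default: base is the first loadable segment that is also executable.
// With the switch on: base is simply the first loadable segment.
std::optional<uint64_t> pickBaseAddress(const std::vector<Segment> &Segs,
                                        bool UseLoadableSegmentAsBase) {
  for (const Segment &S : Segs) {
    if (!S.Loadable)
      continue;
    if (UseLoadableSegmentAsBase || S.Executable)
      return S.VAddr;
  }
  return std::nullopt; // no suitable segment found
}

// Offsets in the profile are then relative to the chosen base.
uint64_t toOffset(uint64_t Address, uint64_t Base) {
  assert(Address >= Base && "address below base");
  return Address - Base;
}
```

Because the two choices of base can differ, the same address maps to different offsets, which is why a profile produced under one convention needs the matching setting when it is consumed.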


# aab18100 09-Nov-2021 wlei <[email protected]>

[llvm-profgen] Fix bug of setting function entry

Previously we set the `isFuncEntry` flag to true when the funcName from DWARF was equal to the name in the symbol table, and we used this flag to skip reporting callsite samples that come from an intra-function branch. However, in HHVM, it appears that the symbol table name is inconsistent with the DWARF function name, likely due to `OptimizeGlobalAliases`.

This change is a workaround on the llvm-profgen side: it marks the only range as the function entry and adds warnings for the remaining inconsistencies.

This also fixes a missing `getCanonicalFnName` call for the symbol name, which caused mismatches as well.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113492


# 5bf191a3 05-Nov-2021 wlei <[email protected]>

[llvm-profgen] Fix index out of bounds error while using ip.advance

Previously we assumed there were some non-executing sections at the bottom of the text section, so that we wouldn't hit the array's bound. But on a BOLTed binary, it turned out that the .bolt section is at the bottom of the text section and can be profiled, which crashes llvm-profgen. This change tries to fix it.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113238
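
A bounds-checked advance of the kind this fix calls for could be sketched as below. The struct is a hypothetical simplification that only borrows the names `InstructionPointer` and `advance`; it walks a sorted table of instruction addresses and uses `UINT64_MAX` as an assumed end-of-table sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified instruction pointer over a sorted address table.
struct InstructionPointer {
  const std::vector<uint64_t> &CodeAddrs; // instruction start addresses
  std::size_t Index;
  uint64_t Address;

  InstructionPointer(const std::vector<uint64_t> &Addrs, std::size_t Start)
      : CodeAddrs(Addrs), Index(Start),
        Address(Start < Addrs.size() ? Addrs[Start] : UINT64_MAX) {}

  // Move to the next instruction. Returns false and parks on the sentinel
  // instead of indexing past the end of the table.
  bool advance() {
    if (Index + 1 >= CodeAddrs.size()) {
      Index = CodeAddrs.size();
      Address = UINT64_MAX;
      return false;
    }
    ++Index;
    Address = CodeAddrs[Index];
    return true;
  }
};
```

A caller looping with `while (IP.Address <= RangeEnd)` then terminates naturally once the sentinel is reached, as long as `RangeEnd` is a real address.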


# 138202a8 26-Oct-2021 wlei <[email protected]>

[llvm-profgen] Warn on invalid range and show warning summary

Two things in this diff:

1) Warn on invalid ranges. There are currently three types of checks; see the detailed messages in the code.

2) In some situations, llvm-profgen gives lots of warnings on truncated stacks, which is noisy. This change provides a switch, `--show-detailed-warning`, to skip the warnings. Alternatively, we use a summary for those warnings and show the percentage of cases with those issues.

Example of warning summary.
```
warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction.
warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function.
```

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D111902
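
The summary lines shown in the example can be reproduced with a small formatter. The function name `formatWarningSummary` is hypothetical and the layout is inferred only from the two sample lines in this commit message, so treat it as a sketch of the format rather than the tool's code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Build "warning: P%(Num/Total) cases with issue: Msg", where P is the
// percentage of affected cases rounded to two decimals.
std::string formatWarningSummary(uint64_t Num, uint64_t Total,
                                 const std::string &Msg) {
  assert(Total != 0 && Num <= Total);
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "%.2f%%(%llu/%llu)", 100.0 * Num / Total,
                static_cast<unsigned long long>(Num),
                static_cast<unsigned long long>(Total));
  return "warning: " + std::string(Buf) + " cases with issue: " + Msg;
}
```

Reporting one aggregated line per issue type keeps the output short while still showing how often each problem occurred.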


# f5537643 27-Oct-2021 wlei <[email protected]>

[llvm-profgen] Update total samples by accumulating all its body samples

Like the probe-based profile, the total samples count is the sum of all its body samples. This patch fixes that with a post-processing update for the line-number-based profile. Tested on our internal services; results showed no performance change.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D112672


# 3b285ff5 29-Oct-2021 Kazu Hirata <[email protected]>

[llvm-profgen] Fix a set-but-unused warning

This patch fixes:

llvm/tools/llvm-profgen/ProfiledBinary.cpp:357:12: error: variable
'EndOffset' set but not used [-Werror,-Wunused-but-set-variable]

The last use of the variable was removed on Oct 26 in commit
40ca4112515d03bbcf594bd2dfa6b4394d5b00d6.
