ProfiledCallGraph.h - OpenGrok history log for /llvm-project-15.0.7/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 7b81a81d	21-Jul-2022	Mircea Trofin <[email protected]>	[NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate The name `getEntrySamples` was misleading for 2 reasons. One, it's close in name to `Function::getEntryCount`, but the equivalent her [NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate The name `getEntrySamples` was misleading for 2 reasons. One, it's close in name to `Function::getEntryCount`, but the equivalent here is `getHeadSamples`; second, as opposed to the other get* APIs in `FunctionSamples`, it performs an estimate/heuristic rather than just retrieving raw data (or a non-heuristic derivate off that data, like `getMaxCountInside`) The new name should more clearly communicate its intent; and, being close (in name) to `getHeadSamples`, it should allow the reader discover the relation between them. Also updated the doc comments for both `getHeadSamples[Estimate]` so a reader may better understand the relation between them. Differential Revision: https://reviews.llvm.org/D130281 show more ...
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3
# e36786d1	28-Apr-2022	Hongtao Yu <[email protected]>	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `P [CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602 show more ...
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1
# dbb158eb	15-Mar-2022	Pavel Labath <[email protected]>	Remove top-level using directives from Transforms/IPO headers These directives pollute the namespace of all files which include the header.
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# 79787b90	25-Feb-2022	Harald van Dijk <[email protected]>	[ADT, CSSPGO] Specify set comparer In _GLIBCXX_DEBUG builds, potentially implicitly enabled by LLVM_ENABLE_EXPENSIVE_CHECKS, std::set<A, B>::iterator and std::set<A, C>::iterator are distinct types [ADT, CSSPGO] Specify set comparer In _GLIBCXX_DEBUG builds, potentially implicitly enabled by LLVM_ENABLE_EXPENSIVE_CHECKS, std::set<A, B>::iterator and std::set<A, C>::iterator are distinct types that are not interconvertible. This change aligns the iterator types with the set types. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D119798 show more ...
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 5740bb80	14-Dec-2021	Hongtao Yu <[email protected]>	[CSSPGO] Use nested context-sensitive profile. CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is e [CSSPGO] Use nested context-sensitive profile. CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts. For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned. In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed. A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum. ``` main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ``` Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile. I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D115205 show more ...
# bf317f66	29-Nov-2021	Hongtao Yu <[email protected]>	[CSSPGO] Sorting nodes in a cycle of profiled call graph. For nodes that are in a cycle of a profiled call graph, the current order the underlying scc_iter computes purely depends on how those nodes [CSSPGO] Sorting nodes in a cycle of profiled call graph. For nodes that are in a cycle of a profiled call graph, the current order the underlying scc_iter computes purely depends on how those nodes are reached from outside the SCC and inside the SCC, based on the Tarjan algorithm. This does not honor profile edge hotness, thus does not gurantee hot callsites to be inlined prior to cold callsites. To mitigate that, I'm adding an extra sorter on top of scc_iter to sort scc functions in the order of callsite hotness, instead of changing the internal of scc_iter. Sorting on callsite hotness can be optimally based on detecting cycles on a directed call graph, i.e, to remove the coldest edge until a cycle is broken. However, detecting cycles isn't cheap. I'm using an MST-based approach which is faster and appear to deliver some performance wins. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D114204 show more ...
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 7ca80300	30-Aug-2021	Hongtao Yu <[email protected]>	[CSSPGO] Enable loading MD5 CS profile. Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbin [CSSPGO] Enable loading MD5 CS profile. Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster. There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D108342 show more ...
Revision tags: llvmorg-13.0.0-rc2
# b9db7036	25-Aug-2021	Hongtao Yu <[email protected]>	[CSSPGO] Split context string to deduplicate function name used in the context. Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. [CSSPGO] Split context string to deduplicate function name used in the context. Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context. A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing. When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`. Testing This is no-diff change in terms of code quality and profile content (for text profile). For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated. The compile time of ads is dropped by 25%. Differential Revision: https://reviews.llvm.org/D107299 show more ...
# eec96db1	03-Aug-2021	Kazu Hirata <[email protected]>	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# 3dc727e8	13-Jun-2021	Simon Pilgrim <[email protected]>	ProfiledCallGraph.h - remove unused <string> include. NFCI.
Revision tags: llvmorg-12.0.1-rc1
# 82956de0	04-May-2021	Wei Mi <[email protected]>	[SampleFDO] Fix a bug when appending function symbol into the Callees set of Root node in ProfiledCallGraph. In ProfiledCallGraph::addProfiledFunction, to add a function symbol into the ProfiledCall [SampleFDO] Fix a bug when appending function symbol into the Callees set of Root node in ProfiledCallGraph. In ProfiledCallGraph::addProfiledFunction, to add a function symbol into the ProfiledCallGraph, currently an uninitialized ProfiledCallGraphNode node is created by ProfiledFunctions[Name] and inserted into Callees set of Root node before the node is initialized. The Callees set use ProfiledCallGraphNodeComparer as its comparator so the uninitialized ProfiledCallGraphNode may fail to be inserted into Callees set if it happens to contain a name in memory which has been inserted into the Callees set before. The problem will prevent some function symbols from being annotated with profiles and cause performance regression. The patch fixes the problem. Differential Revision: https://reviews.llvm.org/D101815 show more ...
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 3e3fc431	29-Mar-2021	Hongtao Yu <[email protected]>	[CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn [CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example: 1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them. 2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining. 3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph. 4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. while this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions. Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case 4. I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmark such as Xalanbmk and GCC, the win is bigger, above 2%. The change is an enhancement to https://reviews.llvm.org/D95988. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99351 show more ...
Revision tags: llvmorg-12.0.0-rc3
# 30b02323	05-Mar-2021	Wenlei He <[email protected]>	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it [CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inined, we can't merge functions profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow up patches for a complete implementation: 1) We need to retrieve size for funciton//inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as place holder for size estimation. 2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR. Differential Revision: https://reviews.llvm.org/D99146 show more ...