| #
a80181a8 |
| 02-Feb-2022 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef][NFC] Free resources at an earlier stage
This patch releases some memory from InstrRefBasedLDV earlier than it would otherwise. The underlying problem is:
* We store a big table of "live in values for each block",
* We translate that into DBG_VALUE instructions in each block,
and both exist in memory at the same time, which needlessly doubles that information. Most of what this patch does is: as we progressively translate live-in information into DBG_VALUEs, we free the variable-value / machine-value tracking information as we go, which significantly reduces peak memory.
While I'm here, also add a clear method to wipe variable assignments that have been accumulated into VLocTracker objects, and turn a DenseMap into a SmallDenseMap to avoid an initial allocation.
Differential Revision: https://reviews.llvm.org/D118453
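The "free as we go" idea can be sketched in isolation. This is a minimal illustration, not the code from the patch: `LiveInTable` and `translateAndFree` are hypothetical names, and counting elements stands in for emitting DBG_VALUEs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-block table of live-in values (not an LLVM type).
using LiveInTable = std::vector<int>;

// Translates each block's table into an output record and releases the
// table's storage as soon as that block is done, so the input tables and
// the full output are never both resident at once.
std::vector<std::size_t> translateAndFree(std::vector<LiveInTable> &PerBlock) {
  std::vector<std::size_t> EmittedPerBlock;
  EmittedPerBlock.reserve(PerBlock.size());
  for (LiveInTable &Table : PerBlock) {
    // Stand-in for emitting one DBG_VALUE per live-in value.
    EmittedPerBlock.push_back(Table.size());
    // clear() would keep the capacity; swapping with an empty temporary
    // releases the buffer when the temporary is destroyed.
    LiveInTable().swap(Table);
  }
  return EmittedPerBlock;
}
```

The swap with a temporary matters here: a plain `clear()` leaves the allocation in place, which would not lower peak memory.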
|
| #
d556eb7e |
| 02-Feb-2022 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef][NFC] Cache some PHI resolutions
Install a cache of DBG_INSTR_REF -> ValueIDNum resolutions, for scenarios where the value has to be reconstructed from several DBG_PHIs. Whenever this happens, it's because branch folding + tail duplication has messed with the SSA form of the program, and we have to solve a mini SSA problem to find the variable value. This is always called twice, so it makes sense to cache the value.
This gives a ~0.5% geomean compile-time-performance improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D118455
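A rough sketch of the caching pattern described above, assuming the expensive resolution can be modelled as a callback. `ResolutionCache` and `ValueID` are hypothetical stand-ins, not the types used in InstrRefBasedLDV.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a resolved machine value; not LLVM's ValueIDNum.
using ValueID = std::uint64_t;

// Memoizes an expensive "instruction number -> value" resolution.
class ResolutionCache {
public:
  using Resolver = std::function<std::optional<ValueID>(unsigned)>;

  explicit ResolutionCache(Resolver SlowFn) : Slow(std::move(SlowFn)) {}

  // Returns the cached answer if present; otherwise runs the slow resolver
  // once and remembers the result, including "no value" results.
  std::optional<ValueID> resolve(unsigned InstrNum) {
    auto It = Cache.find(InstrNum);
    if (It != Cache.end())
      return It->second;
    std::optional<ValueID> Result = Slow(InstrNum);
    Cache.emplace(InstrNum, Result);
    return Result;
  }

private:
  Resolver Slow;
  std::unordered_map<unsigned, std::optional<ValueID>> Cache;
};
```

Caching failed resolutions as well as successful ones means a second query for the same instruction number never repeats the slow path.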
|
|
Revision tags: llvmorg-15-init |
|
| #
14aaaa12 |
| 01-Feb-2022 |
Jeremy Morse <[email protected]> |
Re-apply 3fab2d138e30, now with a triple added
Was reverted in 1c1b670a73a9 as it broke all non-x86 bots. Original commit message:
[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out
In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821.
It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the number of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them.
In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then choose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero.
Differential Revision: https://reviews.llvm.org/D118601
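The cut-out can be illustrated with a small sketch in which slot numbering returns an empty optional once a limit is reached. `SpillSlotNumbering` and `trackSpill` are hypothetical names, and `std::optional` is used here in place of LLVM's own optional type.

```cpp
#include <cstddef>
#include <optional>
#include <unordered_map>

// Hypothetical slot numbering with an upper bound on tracked slots.
class SpillSlotNumbering {
public:
  explicit SpillSlotNumbering(std::size_t Limit) : MaxSlots(Limit) {}

  // Returns a dense ID for the frame index, or std::nullopt if tracking a
  // new slot would exceed the limit. Already-numbered slots stay tracked.
  std::optional<unsigned> getOrTrack(int FrameIndex) {
    auto It = Ids.find(FrameIndex);
    if (It != Ids.end())
      return It->second;
    if (Ids.size() >= MaxSlots)
      return std::nullopt;
    unsigned Id = static_cast<unsigned>(Ids.size());
    Ids.emplace(FrameIndex, Id);
    return Id;
  }

private:
  std::size_t MaxSlots;
  std::unordered_map<int, unsigned> Ids;
};

// A caller that declines to track a spill touching an untracked slot.
inline bool trackSpill(SpillSlotNumbering &Slots, int FrameIndex) {
  return Slots.getOrTrack(FrameIndex).has_value();
}
```

With a limit of zero, every spill is declined, which mirrors the situation the added test is described as checking.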
|
| #
1c1b670a |
| 02-Feb-2022 |
Kevin Athey <[email protected]> |
Revert "[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out"
This reverts commit 3fab2d138e30c65249e1eaea6cc68b2b7f50955a.
Breaking PPC sanitizer build: https://lab.llvm.org/buildbot/#/builders/105/builds/20857
|
| #
8e75536e |
| 01-Feb-2022 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef][NFC] Bypass a frequently-noop loop
Bypass this loop if it would do nothing -- if there are no register masks to be examined, there's no point looking at each location to see if the location has been def'd. Awkwardly, this was responsible for almost an entire half a percent of performance improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D118613
|
| #
3fab2d13 |
| 01-Feb-2022 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out
In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821.
It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the number of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them.
In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then choose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero.
Differential Revision: https://reviews.llvm.org/D118601
|
| #
91fb66cf |
| 01-Feb-2022 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef][NFC] Don't build a map of un-needed values
When finding locations for variable values at the start of a block, we build a large map of every value to every location, and then pick out the locations for values that are desired. This takes up quite a lot of time, because, unsurprisingly, there are usually more values in registers and stack slots than there are variables.
This patch instead creates a map of desired values to their locations, which are initially illegal locations. Then, as we examine every available value, we can select locations for values we care about, and ignore those that we don't. This substantially reduces the amount of work done (i.e., building a map up of values to locations that nothing wants or needs).
Geomean performance improvement of 1% on CTMark, woo.
Differential Revision: https://reviews.llvm.org/D118597
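The "seed the map with wanted values, then scan what is available" approach might look roughly like the following. The types, `IllegalLoc`, and the "first location seen wins" policy are all assumptions made for illustration; the real pass has its own rules for preferring one location over another.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using ValueID = std::uint64_t;    // hypothetical value number
using LocID = unsigned;           // hypothetical location index
constexpr LocID IllegalLoc = ~0u; // placeholder meaning "no location yet"

// Given the values that variables actually need, and the value currently
// held in each location, pick one location per needed value.
std::unordered_map<ValueID, LocID>
pickLocations(const std::vector<ValueID> &Wanted,
              const std::vector<std::pair<LocID, ValueID>> &Available) {
  std::unordered_map<ValueID, LocID> Result;
  for (ValueID V : Wanted)
    Result.emplace(V, IllegalLoc); // seeded with an illegal location

  for (const auto &[Loc, Val] : Available) {
    auto It = Result.find(Val);
    if (It == Result.end())
      continue; // nothing wants this value, so it is never stored
    if (It->second == IllegalLoc)
      It->second = Loc; // simplification: first location seen wins
  }
  return Result;
}
```

The map never grows beyond the number of wanted values, regardless of how many registers and stack slots hold values nobody asked for.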
|
| #
4a2cb013 |
| 31-Jan-2022 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef][NFC] Refactor ahead of further optimisations
This patch shuffles some functions around so that some blocks of code can be reused. In particular,
* Move the determination of "which blocks are in scope" to its own function, as it's non-trivial to solve. Delete the "InScopeBlocks" collection too, which nothing reads from.
* Split transfer emission (i.e., installing DBG_VALUEs into blocks) into its own function.
* Name some useful types.
* Rename "ScopeToBlocks" to "ScopeToAssignBlocks", as that's what the collection contains, blocks where assignments happen.
Differential Revision: https://reviews.llvm.org/D118454
|
| #
c703d77a |
| 31-Jan-2022 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Don't fully propagate single assigned variables
If we only assign a variable value a single time, we can take a short-cut when computing its location: the variable value is only valid up to the dominance frontier of where the assignment happens. Past that point, there are other predecessors from where the variable has no value, meaning the variable has no location past that point.
This patch recognises this scenario, and avoids expensive SSA computation, to improve compile-time performance.
Differential Revision: https://reviews.llvm.org/D117877
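To make the dominance argument concrete, here is a small self-contained sketch that computes which blocks a single assignment block dominates. It uses a deliberately naive method (a block is dominated by `Def` if it is reachable from the entry, but not once `Def` is removed); this is not how the pass or LLVM's dominator tree does it, and `CFG`, `reach`, and `dominatedBy` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// CFG as adjacency lists; block 0 is assumed to be the entry block.
using CFG = std::vector<std::vector<std::size_t>>;

// Marks every block reachable from Start without entering Avoid.
static void reach(const CFG &G, std::size_t Start, std::size_t Avoid,
                  std::vector<bool> &Seen) {
  if (Start == Avoid || Seen[Start])
    return;
  Seen[Start] = true;
  for (std::size_t Succ : G[Start])
    reach(G, Succ, Avoid, Seen);
}

// Blocks dominated by Def: reachable from the entry block, but no longer
// reachable once Def is removed from the graph.
std::vector<bool> dominatedBy(const CFG &G, std::size_t Def) {
  const std::size_t None = G.size(); // sentinel that matches no block
  std::vector<bool> All(G.size(), false), Without(G.size(), false);
  reach(G, 0, None, All);
  reach(G, 0, Def, Without);
  std::vector<bool> Dom(G.size(), false);
  for (std::size_t B = 0; B < G.size(); ++B)
    Dom[B] = All[B] && !Without[B];
  return Dom;
}
```

For the graph 0→{1,2}, 1→3, 3→4, 2→4 with the assignment in block 1, only blocks 1 and 3 are dominated. Block 4 also has block 2 as a predecessor, where the variable has no value, so it lies on the dominance frontier and the value is not propagated into it.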
|
| #
e0b11c76 |
| 30-Jan-2022 |
Markus Böck <[email protected]> |
[Support][NFC] Fix generic `ChildrenGetterTy` of `IDFCalculatorBase`
Both IDFCalculatorBase and its accompanying DominatorTreeBase only support pointer nodes. The template argument is the block type itself, and any use of GraphTraits is therefore done via a pointer to the node type. However, the ChildrenGetterTy type of IDFCalculatorBase uses just the node type instead of a pointer to the node type. Various parts of the monorepo have worked around this issue by providing specializations of GraphTraits for the node type directly, or have not been affected because they use specializations instead of the generic case. These workarounds are unnecessary, however; the generic code should be fixed instead.
An in-tree example is the use of IDFCalculatorBase in InstrRefBasedImpl.cpp. It basically instantiates an IDFCalculatorBase<MachineBasicBlock, false> but, due to the bug above, then goes on to specialize GraphTraits<MachineBasicBlock> although GraphTraits<MachineBasicBlock*> exists (and should be used instead).
Similar dead code exists in clang which defines redundant GraphTraits to work around this bug.
This patch fixes both the original issue and removes the dead code that was used to work around the issue.
Differential Revision: https://reviews.llvm.org/D118386
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
81d35f27 |
| 18-Jan-2022 |
Nikita Popov <[email protected]> |
[DebugInstrRef] Memoize variable order during sorting (NFC)
Instead of constructing DebugVariables and looking up the order in the comparison function, compute the order upfront and then sort a vector of (order, instr).
This improves compile-time by -0.4% geomean on CTMark ReleaseLTO-g.
Differential Revision: https://reviews.llvm.org/D117575
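The general pattern (compute the sort key once per element, then sort the keyed vector) can be sketched as follows. Strings stand in for debug instructions and `sortByOrder` is a hypothetical name; the map lookup plays the role of the per-comparison work the commit removes.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sorts items by a precomputed rank instead of looking the rank up inside
// the comparator. Throws std::out_of_range if an item has no rank.
std::vector<std::string>
sortByOrder(const std::vector<std::string> &Items,
            const std::unordered_map<std::string, unsigned> &Order) {
  std::vector<std::pair<unsigned, const std::string *>> Keyed;
  Keyed.reserve(Items.size());
  for (const std::string &Item : Items)
    Keyed.emplace_back(Order.at(Item), &Item); // one lookup per element

  // The comparator now only compares integers.
  std::sort(Keyed.begin(), Keyed.end(),
            [](const auto &A, const auto &B) { return A.first < B.first; });

  std::vector<std::string> Sorted;
  Sorted.reserve(Keyed.size());
  for (const auto &Entry : Keyed)
    Sorted.push_back(*Entry.second);
  return Sorted;
}
```

A comparison sort calls its comparator O(n log n) times, so moving the lookup out of the comparator reduces it to n lookups.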
|
| #
0d51b6ab |
| 18-Jan-2022 |
Nikita Popov <[email protected]> |
[DebugInstrRef] Add some missing const qualifiers (NFC)
|
| #
cbaae614 |
| 18-Jan-2022 |
Nikita Popov <[email protected]> |
[DebugInstrRef] Use DenseMap for ValueToLoc (NFC)
Just replacing std::map with DenseMap here is a major regression -- because this code used an identity hash for ValueIDNum. Because ValueIDNum is composed of multiple components, it is important that we use a reasonably good hash function here, so switch it to hash_value. DenseMapInfo::getHashValue<uint64_t> would not be sufficient.
This gives a -0.8% geomean improvement on CTMark ReleaseLTO-g.
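Why an identity hash hurts for a multi-component key can be shown with a toy example. Everything below is an assumption for illustration: the `ValueKey` layout is hypothetical, the "identity" hash is modelled as truncation of the packed integer to 32 bits, and the mixing function is the splitmix64 finalizer rather than LLVM's hash_value.

```cpp
#include <cstdint>

// Hypothetical three-part key, packed into 64 bits for illustration only.
struct ValueKey {
  std::uint32_t Block; // assumed to fit in 20 bits
  std::uint32_t Inst;  // assumed to fit in 20 bits
  std::uint32_t Loc;   // assumed to fit in 24 bits
};

constexpr std::uint64_t pack(ValueKey K) {
  return (std::uint64_t(K.Block) << 44) | (std::uint64_t(K.Inst) << 24) |
         std::uint64_t(K.Loc);
}

// "Identity" hash: keeps only the low 32 bits of the packed key, so Block
// and the upper bits of Inst never contribute to the result.
constexpr std::uint32_t identityHash(ValueKey K) {
  return static_cast<std::uint32_t>(pack(K));
}

// Mixing hash (splitmix64 finalizer): a bijection on 64-bit integers, so
// distinct packed keys always map to distinct 64-bit results.
constexpr std::uint64_t mixedHash(ValueKey K) {
  std::uint64_t X = pack(K);
  X ^= X >> 30;
  X *= 0xbf58476d1ce4e5b9ULL;
  X ^= X >> 27;
  X *= 0x94d049bb133111ebULL;
  X ^= X >> 31;
  return X;
}
```

Under these assumptions, two keys that differ only in `Block` collide under `identityHash` but not under `mixedHash`, which is the kind of clustering a hash table with many same-location values would suffer from.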
|
| #
764e52f0 |
| 13-Jan-2022 |
Eugene Zhulenev <[email protected]> |
[DebugInfo][InstrRef] Short-circuit unnecessary preferred location map construction
Reviewed By: cota
Differential Revision: https://reviews.llvm.org/D117162
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
3aed2822 |
| 04-Dec-2021 |
Kazu Hirata <[email protected]> |
[CodeGen] Use range-based for loops (NFC)
|
| #
8dda516b |
| 30-Nov-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Avoid dropping fragment info during PHI elimination
InstrRefBasedLDV used to crash on the added test -- the exit block is not in scope for the variable being propagated, but is still considered because it contains an assignment. The failure-mode was vlocJoin ignoring assign-only blocks and not updating DIExpressions, but pickVPHILoc would still find a variable location for it. That led to DBG_VALUEs created with the wrong fragment information.
Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will now consider assign-only blocks and will update their expressions.
Differential Revision: https://reviews.llvm.org/D114727
|
| #
0eee8445 |
| 29-Nov-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Terminate overlapping variable fragments
If we have a variable where its fragments are split into overlapping segments:
DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 16)
...
DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 32)
we should only propagate the most recently assigned fragment out of a block. LiveDebugValues only deals with live-in variable locations, as overlaps within blocks are DbgEntityHistoryCalculator's domain.
InstrRefBasedLDV has kept the accumulateFragmentMap method from VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's produced a mapping of variable / fragments to the overlapped variable / fragments, VLocTracker uses it to identify when a debug instruction needs to terminate the other parts it overlaps with. The test is updated for some standard "InstrRef picks different registers" variation, and the order of some unrelated DBG_VALUEs changes.
Differential Revision: https://reviews.llvm.org/D114603
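The overlap rule can be sketched independently of the pass. `Fragment`, `overlaps`, and `assign` below are hypothetical helpers, not the accumulateFragmentMap or VLocTracker code: a fragment is modelled as a half-open bit range, and a new assignment terminates every live fragment it overlaps.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical bit range of a variable fragment: [Offset, Offset + Size).
struct Fragment {
  std::uint64_t OffsetInBits;
  std::uint64_t SizeInBits;
};

// Two half-open bit ranges overlap if each starts before the other ends.
constexpr bool overlaps(Fragment A, Fragment B) {
  return A.OffsetInBits < B.OffsetInBits + B.SizeInBits &&
         B.OffsetInBits < A.OffsetInBits + A.SizeInBits;
}

// Assigning NewFrag ends every live fragment that overlaps it, then makes
// NewFrag live, so only the most recent overlapping assignment survives.
void assign(std::vector<Fragment> &Live, Fragment NewFrag) {
  Live.erase(std::remove_if(Live.begin(), Live.end(),
                            [&](const Fragment &F) {
                              return overlaps(F, NewFrag);
                            }),
             Live.end());
  Live.push_back(NewFrag);
}
```

Applied to the example in the message, assigning the 32-bit fragment at offset 0 after the 16-bit one leaves only the 32-bit fragment live at the end of the block.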
|
| #
9cf31b8d |
| 29-Nov-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Preserve properties of restored variables
InstrRefBasedLDV observes when variable locations are clobbered, scans what values are available in the machine, and re-issues a DBG_VALUE for the variable if it can find another location. Unfortunately, I hadn't joined up the Indirectness flag, so if it did this to an Indirect Value, the indirectness would be dropped.
Fix this, and add a test that if we clobber a variable value (on the stack in this case), then the recovered variable location keeps the Indirect flag.
Differential Revision: https://reviews.llvm.org/D114378
|
| #
536b9eb3 |
| 25-Nov-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Add extra indirection for NRVO tests
In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs after SelectionDAG, even in instruction referencing mode (if the variable is an argument). If the corresponding argument value is spilt to the stack, then we have:
* Indirection from it being on the stack,
* Indirection from it being a dbg.declare or a dbg.addr.
However InstrRefBasedLDV only emits one level of indirection. This patch adds the second, by adding an extra DW_OP_deref if necessary. The two tests modified fail otherwise -- they feature some NRVO, and require two levels of indirection to be correct.
Differential Revision: https://reviews.llvm.org/D114364
|
| #
102d2a8a |
| 25-Nov-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Track variable assignments in out-of-scope blocks
DBG_INSTR_REFs and DBG_VALUEs can end up in blocks that aren't in the lexical scope of their variable. It's arguable what we should do about this; however, VarLocBasedLDV permits such variable locations to be propagated, so let's allow it in InstrRefBasedLDV.
It's necessary for the modified test to work.
Differential Revision: https://reviews.llvm.org/D114578
|
| #
bfadc5dc |
| 24-Nov-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Cope with win32 calls changing SP in LiveDebugValues
Almost all of the time, call instructions don't actually lead to SP being different after they return. An exception is win32's _chkstk, which implements stack probes. We need to recognise that as modifying SP, so that copies of the value are tracked as distinct VLA pointers.
This patch adds a target frame-lowering hook to see whether stack probe functions will modify the stack pointer, store that in an internal flag, and if it's true then scan CALL instructions to see whether they're a stack probe. If they are, recognise them as defining a new stack-pointer value.
The added test exercises this behaviour: two calls to _chkstk should be considered as producing two different values.
Differential Revision: https://reviews.llvm.org/D114443
|
| #
133e25f9 |
| 24-Nov-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Ignore SP clobbers on call instructions even more
Avoid unnecessarily recreating DBG_VALUEs on call instructions.
In LiveDebugValues we choose to ignore any clobbers of SP by call instructions, as they're irrelevant to our model of the machine. We currently do so for tracking register values (MTracker); do the same for tracking variable locations (TTracker).
Test modified to ensure that a duplicate DBG_VALUE is not created after the call instruction in this test.
Differential Revision: https://reviews.llvm.org/D114365
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
ef2d0e0f |
| 10-Nov-2021 |
Kazu Hirata <[email protected]> |
[llvm] Use MachineBasicBlock::{successors,predecessors} (NFC)
|
| #
4136897b |
| 25-Oct-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar
There are a few STL containers hanging around that can become DenseMaps, SmallVectors and similar. This recovers a modest amount of compile time performance.
While I'm here, adjust the bit layout of ValueIDNum: this was always supposed to act like a value type, however it seems that clang doesn't compile the comparison functions to act that way. Add a uint64_t to a union that explicitly aliases the bitfields, so that we can compare the whole value as a single integer.
Differential Revision: https://reviews.llvm.org/D112333
|
| #
97ddf49e |
| 25-Oct-2021 |
Jeremy Morse <[email protected]> |
[DebugInfo][InstrRef] Recover stack-slot tracking performance
This patch is like D111627 -- instead of calculating IDF for every location on the stack, only do it for the smallest units of interference, and copy the PHIs for those units to any aliases.
The test added runs placeMLocPHIs directly, and tests that:
* A def of the lower 8 bits of a stack slot causes all aliasing regs to have PHIs placed,
* It doesn't cause the equivalent location to x86's $ah, which isn't aliased, to have a PHI placed.
Differential Revision: https://reviews.llvm.org/D112324
|