# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc, and to instruction
movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
instruction referencing mode makes LLVM refer to the machine instruction and
operand position where the value is defined. Consider how LLVM IR refers to
instruction values:

    %2 = add i32 %0, %1
    call void @llvm.dbg.value(metadata i32 %2,

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 MIR
below, with instruction referencing debug info, corresponding to the earlier
LLVM IR:

    %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
    DBG_INSTR_REF 1, 0, !123, !456, debug-location !789

While the function remains in SSA form, virtual register %2 is sufficient to
identify the value computed by the instruction. However, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. A more robust way of identifying the
instruction's value is to refer to the MachineOperand where the value is
defined, independently of which register that MachineOperand defines. In
the code above, the DBG_INSTR_REF instruction refers to instruction number one,
operand zero, while the ADD32rr has a debug-instr-number attribute attached
indicating that it is instruction number one.
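
The naming scheme can be modelled in a few lines. This Python sketch uses hypothetical stand-in classes (`MachineOperand`, `MachineInstr` and `resolve_instr_ref` are illustrative, not LLVM's C++ API). It shows that a reference to "instruction 1, operand 0" keeps resolving after register allocation rewrites the operand's register. It only models naming. Working out where the value lives at later instructions is the job of LiveDebugValues.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MachineOperand:
    reg: str      # register this operand currently names
    is_def: bool  # True if the operand defines the register


@dataclass
class MachineInstr:
    opcode: str
    operands: List[MachineOperand]
    debug_instr_num: Optional[int] = None  # mirrors "debug-instr-number N"


def resolve_instr_ref(instrs: List[MachineInstr], num: int, op_idx: int) -> Optional[str]:
    """Find the register defined by operand `op_idx` of instruction `num`.

    Returns None (shown as "optimised out") if no instruction carries `num`
    or the operand is not a def.
    """
    for mi in instrs:
        if mi.debug_instr_num == num:
            op = mi.operands[op_idx]
            return op.reg if op.is_def else None
    return None


# %2:gr32 = ADD32rr %0, %1, debug-instr-number 1
add = MachineInstr("ADD32rr",
                   [MachineOperand("%2", True),
                    MachineOperand("%0", False),
                    MachineOperand("%1", False)],
                   debug_instr_num=1)
func = [add]
print(resolve_instr_ref(func, 1, 0))  # %2

# Register allocation rewrites the def to a physical register;
# the reference "instruction 1, operand 0" still finds it.
add.operands[0].reg = "$eax"
print(resolve_instr_ref(func, 1, 0))  # $eax
```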

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but requires additional instrumentation
when the instructions themselves are optimised. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand number pair;
see MachineFunction::substituteDebugValuesForInst. If debug info maintenance is
not performed, or an instruction is eliminated as dead code, the variable
location is safely dropped and marked "optimised out". The exception is
instructions that are mutated rather than replaced, which always need debug
info maintenance: a mutated instruction keeps its number while potentially
computing a different value.
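
This minimal sketch covers the outcomes described above. It assumes the function's surviving def operands can be listed as a dictionary from (instruction number, operand) pairs to registers. The names `resolve`, `live_defs` and `substitutions` are hypothetical. The point is that a missing entry degrades to "optimised out" rather than to a wrong location.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]  # (instruction number, operand index)


def resolve(live_defs: Dict[Pair, str],
            substitutions: Dict[Pair, Pair],
            ref: Pair) -> Optional[str]:
    """Resolve a reference; None means the variable is "optimised out"."""
    ref = substitutions.get(ref, ref)  # follow a recorded substitution, if any
    return live_defs.get(ref)          # an unknown pair is dropped, never guessed


# Number preserved: the replacement instruction still carries number 1.
print(resolve({(1, 0): "%7"}, {}, (1, 0)))                # %7
# Substitution recorded: instruction 1 was replaced by instruction 2.
print(resolve({(2, 0): "%5"}, {(1, 0): (2, 0)}, (1, 0)))  # %5
# No maintenance, or the instruction was deleted as dead code.
print(resolve({(2, 0): "%5"}, {}, (1, 0)))                # None
```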

# Register allocator considerations

When the register allocator runs, debug instructions do not refer directly to
any virtual registers, so there is no need for expensive location maintenance
during regalloc (i.e., LiveDebugVariables). Debug instructions are unlinked
from the function, then linked back in after register allocation completes.

The exception is PHI instructions: these become implicit definitions at control
flow merges once regalloc finishes, and any debug numbers attached to PHI
instructions are lost. To circumvent this, debug numbers of PHIs are recorded
at the start of register allocation (phi-node-elimination), then DBG_PHI
instructions are inserted after regalloc finishes. This requires some
maintenance of which register each PHI value is located in during regalloc,
but only at single positions (block entry points) rather than over ranges of
instructions.

An example, before regalloc:

    bb.2:
      %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1

After:

    bb.2:
      DBG_PHI $rax, 1
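
The bookkeeping can be sketched in two steps. First, record the numbered PHIs before they are eliminated. Then, once registers are known, emit one DBG_PHI per recorded number. This Python sketch uses hypothetical helpers (`record_phi_numbers`, `emit_dbg_phis`). It also assumes each virtual register maps to a single physical register at the block entry. A real allocator may split or spill the value, and this sketch ignores that.

```python
from typing import Dict, List, Tuple


def record_phi_numbers(phis: List[Tuple[str, str, int]]) -> Dict[int, Tuple[str, str]]:
    """At PHI elimination: remember (block, vreg) for each numbered PHI."""
    return {num: (block, vreg) for block, vreg, num in phis}


def emit_dbg_phis(recorded: Dict[int, Tuple[str, str]],
                  assignment: Dict[str, str]) -> Dict[str, List[str]]:
    """After regalloc: build the DBG_PHI lines to insert at each block entry.

    `assignment` maps each virtual register to the physical register it
    holds at the block entry (assumed to be a single register here).
    """
    out: Dict[str, List[str]] = {}
    for num, (block, vreg) in sorted(recorded.items()):
        out.setdefault(block, []).append(f"DBG_PHI {assignment[vreg]}, {num}")
    return out


# bb.2: %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
recorded = record_phi_numbers([("bb.2", "%2", 1)])
print(emit_dbg_phis(recorded, {"%2": "$rax"}))  # {'bb.2': ['DBG_PHI $rax, 1']}
```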

# LiveDebugValues

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place PHIs in each of these two
functions: for variable values in one, and for the values contained in machine
locations in the other. Value propagation then eliminates any unnecessary PHIs.
The two can then be joined up to map variables to values, then values to
locations, for each instruction in the function.
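
Once both functions have been solved, the join at a single instruction can be sketched as follows. This is a hypothetical Python model, not the data structures LiveDebugValues uses. Values are plain strings such as `"v1"`, and the SSA/PHI placement step is assumed to have already produced both maps. The preference for registers over stack slots is an arbitrary policy chosen for this sketch.

```python
from typing import Dict, List, Optional


def variable_locations(var_values: Dict[str, str],
                       loc_values: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Join variable -> value with location -> value at one instruction.

    If a value lives in several locations, registers are preferred over
    stack slots (a policy chosen for this sketch only).
    """
    value_to_locs: Dict[str, List[str]] = {}
    for loc, val in loc_values.items():
        value_to_locs.setdefault(val, []).append(loc)

    result: Dict[str, Optional[str]] = {}
    for var, val in var_values.items():
        # False sorts before True, so register names come first.
        locs = sorted(value_to_locs.get(val, []), key=lambda loc: loc.startswith("stack."))
        result[var] = locs[0] if locs else None  # None: value not in the machine
    return result


var_values = {"x": "v1", "y": "v2", "z": "v3"}              # from the debug instructions
loc_values = {"$rax": "v1", "stack.0": "v2", "$rbx": "v2"}  # from the machine code
print(variable_locations(var_values, loc_values))
# {'x': '$rax', 'y': '$rbx', 'z': None}
```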

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage
of variable locations. Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow LiveDebugValues to follow values through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction numbers.

## Target hooks

TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any
instructions that are copy-like; LiveDebugValues uses this to identify when
values move between registers.
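
The effect of recognising copies can be sketched as a location-to-value map that is updated for each instruction. In this hypothetical Python model, a `"copy"` tuple stands in for an instruction that `isCopyInstrImpl` identifies, and each value is named by the (instruction number, operand) pair that defined it. Because the copy is recognised, value (1, 0) survives in `$ecx` after `$eax` is overwritten.

```python
from typing import Dict, Optional, Tuple

Value = Tuple[int, int]  # (instruction number, operand index) that defined it


def step(locs: Dict[str, Optional[Value]], mi: tuple) -> Dict[str, Optional[Value]]:
    """Transfer function over a location -> value map for one instruction."""
    out = dict(locs)
    if mi[0] == "copy":  # an instruction the copy hook recognised
        _, dst, src = mi
        out[dst] = locs.get(src)
    else:                # any other def produces a fresh value
        _, reg, num, op = mi
        out[reg] = (num, op)
    return out


prog = [
    ("def", "$eax", 1, 0),     # $eax = ADD32rr ..., debug-instr-number 1
    ("copy", "$ecx", "$eax"),  # $ecx = COPY $eax
    ("def", "$eax", 2, 0),     # $eax is overwritten by instruction 2
]
locs: Dict[str, Optional[Value]] = {}
for mi in prog:
    locs = step(locs, mi)
print(locs)  # {'$eax': (2, 0), '$ecx': (1, 0)}
```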

TargetInstrInfo::isLoadFromStackSlotPostFE and
TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify restore and
spill instructions, respectively. isLoadFromStackSlotPostFE should return the
destination register, and isStoreToStackSlotPostFE the source register.
LiveDebugValues will then track the movement of a value from / to the stack
slot. In addition, any instruction that writes to a stack spill slot should
have a MachineMemOperand attached, so that LiveDebugValues can recognise that
the slot has been clobbered.
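
Spill, restore and clobber handling extends the same kind of location-to-value map to cover stack slots. In this hypothetical Python sketch, the tuple kinds `"spill"`, `"restore"` and `"store"` stand in for what the two hooks and the memory operand let LiveDebugValues recognise. They are not real LLVM data structures.

```python
from typing import Dict, Optional


def step(locs: Dict[str, Optional[str]], mi: tuple) -> Dict[str, Optional[str]]:
    """Update a location -> value map for one instruction (hypothetical model)."""
    out = dict(locs)
    kind = mi[0]
    if kind == "spill":      # store recognised by the store-to-stack-slot hook
        _, slot, src = mi
        out[slot] = locs.get(src)
    elif kind == "restore":  # load recognised by the load-from-stack-slot hook
        _, dst, slot = mi
        out[dst] = locs.get(slot)
    elif kind == "store":    # other write, known to hit `slot` via its memory operand
        _, slot = mi
        out[slot] = None     # slot clobbered: its old value is gone
    return out


locs = {"$rax": "v1"}
for mi in [("spill", "stack.0", "$rax"),
           ("restore", "$rbx", "stack.0"),
           ("store", "stack.0")]:
    locs = step(locs, mi)
print(locs)  # {'$rax': 'v1', 'stack.0': None, '$rbx': 'v1'}
```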

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a MachineInstr to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented. The relevant question is whether any
register def in any operand will produce a different value as a result of the
mutation. If the answer is yes, there is a risk that a DBG_INSTR_REF
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call MachineInstr::dropDebugNumber() on the
mutated instruction to erase its instruction number. Any DBG_INSTR_REF
referring to it will then produce an empty variable location, which appears
as "optimised out" in the debugger.
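
The hazard and the fix can be shown with a toy model. This Python sketch is hypothetical: `Instr`, `describe` and `drop_debug_number` only mimic the behaviour described above, with the last standing in for `MachineInstr::dropDebugNumber()`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    opcode: str
    srcs: Tuple[str, ...]
    debug_num: Optional[int] = None

    def drop_debug_number(self) -> None:
        """Stand-in for MachineInstr::dropDebugNumber()."""
        self.debug_num = None


def describe(instrs: List[Instr], num: int) -> str:
    """What a DBG_INSTR_REF to instruction `num`, operand 0, would describe."""
    for mi in instrs:
        if mi.debug_num == num:
            return f"{mi.opcode} {', '.join(mi.srcs)}"
    return "optimised out"


add = Instr("ADD32rr", ("%0", "%1"), debug_num=1)
func = [add]
print(describe(func, 1))  # ADD32rr %0, %1

add.opcode = "SUB32rr"    # mutation: operand 0 now defines a different value
print(describe(func, 1))  # SUB32rr %0, %1  (the variable would show the wrong value)

add.drop_debug_number()
print(describe(func, 1))  # optimised out
```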

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

    %2:gr32 = ADD32rr %0, %1, debug-instr-number 1

becomes

    %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2

with a substitution from "instruction number 1 operand 0" to "instruction
number 2 operand 0" recorded in the MachineFunction. In LiveDebugValues,
DBG_INSTR_REFs are mapped through the substitution table to find the most
recent instruction number / operand number pair of the value they refer to.
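
An instruction may be replaced several times, so a reference is mapped through the table repeatedly until no further substitution applies. This hypothetical Python sketch of that lookup keeps only the number / operand pairs. LLVM's substitution records may carry further detail, which the sketch ignores.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]  # (instruction number, operand index)


def follow_substitutions(subs: Dict[Pair, Pair], ref: Pair) -> Pair:
    """Map `ref` through `subs` until reaching a pair with no substitution."""
    seen = set()
    while ref in subs:
        if ref in seen:  # defensive: a well-formed table has no cycles
            raise ValueError(f"substitution cycle at {ref}")
        seen.add(ref)
        ref = subs[ref]
    return ref


# Instruction 1 was replaced by 2, and 2 later by 5.
subs = {(1, 0): (2, 0), (2, 0): (5, 0)}
print(follow_substitutions(subs, (1, 0)))  # (5, 0)
print(follow_substitutions(subs, (3, 1)))  # (3, 1): never substituted
```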

Use MachineFunction::substituteDebugValuesForInst to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This holds in most cases, including the example above.
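
The position-based assumption can be sketched as follows. `Instr`, `Function` and their methods are hypothetical Python stand-ins. The lazy numbering only mimics `MachineInstr::getDebugInstrNum`, and the `assert` just makes the assumption visible. It is not a claim about how LLVM checks it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]  # (instruction number, operand index)


@dataclass
class Instr:
    is_def: List[bool]         # is_def[i]: whether operand i is a def
    num: Optional[int] = None  # debug instruction number, if any


@dataclass
class Function:
    next_num: int = 1
    subs: Dict[Pair, Pair] = field(default_factory=dict)

    def get_debug_instr_num(self, mi: Instr) -> int:
        """Hand out a number on first request (cf. getDebugInstrNum)."""
        if mi.num is None:
            mi.num = self.next_num
            self.next_num += 1
        return mi.num

    def substitute_for_inst(self, old: Instr, new: Instr) -> None:
        """Position-based substitution, as described for substituteDebugValuesForInst."""
        if old.num is None:
            return  # nothing can refer to the old instruction
        new_num = self.get_debug_instr_num(new)
        for i, is_def in enumerate(old.is_def):
            if is_def:
                # The key assumption: operand i is also a def in `new`.
                assert new.is_def[i], f"operand {i} is not a def in the new instruction"
                self.subs[(old.num, i)] = (new_num, i)


fn = Function()
old = Instr([True, False, False])  # %2:gr32 = ADD32rr %0, %1
fn.get_debug_instr_num(old)        # old becomes instruction number 1
new = Instr([True, False, False])  # %2:gr32 = ADD32rr %0(tied-def 0), %1
fn.substitute_for_inst(old, new)
print(fn.subs)  # {(1, 0): (2, 0)}
```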

If operand numbers do not line up between the old and new instruction, use
MachineInstr::getDebugInstrNum to acquire the instruction number for the new
instruction, and MachineFunction::makeDebugValueSubstitution to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution for them; LiveDebugValues will safely drop
the now-unavailable variable value.
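
This hypothetical Python sketch shows an explicit mapping. The replacement defines the surviving value at a different operand index. The value it no longer computes gets no entry, so references to that value resolve to "optimised out". The instruction numbers, registers and helper names are invented for illustration.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]  # (instruction number, operand index)


def make_substitution(subs: Dict[Pair, Pair], old: Pair, new: Pair) -> None:
    """Record one explicit old -> new mapping (cf. makeDebugValueSubstitution)."""
    subs[old] = new


def resolve(subs: Dict[Pair, Pair], live_defs: Dict[Pair, str], ref: Pair) -> Optional[str]:
    ref = subs.get(ref, ref)
    return live_defs.get(ref)  # None: "optimised out"


subs: Dict[Pair, Pair] = {}
# Old instruction 4 defined values at operands 0 and 1. Its replacement,
# instruction 9, computes only the first of them, at operand 1.
make_substitution(subs, (4, 0), (9, 1))
# No substitution is recorded for (4, 1): that value is no longer computed.
live_defs = {(9, 1): "$edx"}
print(resolve(subs, live_defs, (4, 0)))  # $edx
print(resolve(subs, live_defs, (4, 1)))  # None
```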

Should your target clone instructions, as the TailDuplicator optimisation pass
does, do not attempt to preserve the instruction numbers or record any
substitutions. MachineFunction::CloneMachineInstr should drop the instruction
number of any cloned instruction, to avoid duplicate numbers appearing to
LiveDebugValues. Dealing with duplicated instructions is a natural extension to
instruction referencing that is currently unimplemented.
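
A short hypothetical sketch shows why clones lose their number. If two instructions carried number 3, a DBG_INSTR_REF to 3 could mean either of them. `Instr`, `clone_instr` and `numbers_unique` are illustrative Python names, not LLVM functions.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    opcode: str
    num: Optional[int] = None  # debug instruction number


def clone_instr(mi: Instr) -> Instr:
    """Clone without the debug number, as described for CloneMachineInstr."""
    c = copy.copy(mi)
    c.num = None
    return c


def numbers_unique(instrs: List[Instr]) -> bool:
    """True if no debug instruction number appears on more than one instruction."""
    nums = [mi.num for mi in instrs if mi.num is not None]
    return len(nums) == len(set(nums))


orig = Instr("ADD32rr", num=3)
naive = copy.copy(orig)                           # keeps number 3: now ambiguous
print(numbers_unique([orig, naive]))              # False
print(numbers_unique([orig, clone_instr(orig)]))  # True
```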

[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations
181