# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc, and to instruction
movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
in instruction referencing mode LLVM refers to the machine instruction and
operand position that the value is defined in. Consider the LLVM IR way of
referring to instruction values:

    %2 = add i32 %0, %1
    call void @llvm.dbg.value(metadata i32 %2,

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection.
Consider the X86 assembly
below and instruction referencing debug info, corresponding to the earlier
LLVM IR:

    %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
    DBG_INSTR_REF 1, 0, !123, !456, debug-location !789

While the function remains in SSA form, virtual register %2 is sufficient to
identify the value computed by the instruction -- however, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. Instead, a more consistent way of identifying
the instruction's value is to refer to the MachineOperand where the value is
defined, independently of which register is defined by that MachineOperand. In
the code above, the DBG_INSTR_REF instruction refers to instruction number one,
operand zero, while the ADD32rr has a debug-instr-number attribute attached
indicating that it is instruction number one.

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but requires additional instrumentation
when the instructions are optimised instead. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand pair -- see
MachineFunction::substituteDebugValuesForInst. If debug info maintenance is not
performed, or an instruction is eliminated as dead code, the variable location
is safely dropped and marked "optimised out". The exception is instructions
that are mutated rather than replaced, which always need debug info
maintenance.

# Register allocator considerations

When the register allocator runs, debugging instructions do not directly refer
to any virtual registers, and thus there is no need for expensive location
maintenance during regalloc (i.e., LiveDebugVariables).
Debug instructions are
unlinked from the function, then linked back in after register allocation
completes.

The exception is PHI instructions: these become implicit definitions at control
flow merges once regalloc finishes, and any debug numbers attached to PHI
instructions are lost. To circumvent this, debug numbers of PHIs are recorded
at the start of register allocation (phi-node-elimination), then DBG_PHI
instructions are inserted after regalloc finishes. This requires some
maintenance of which register a variable is located in during regalloc, but at
single positions (block entry points) rather than ranges of instructions.

An example, before regalloc:

    bb.2:
      %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1

After:

    bb.2:
      DBG_PHI $rax, 1

# LiveDebugValues

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place PHIs for each function, between
variable values and the values contained in machine locations, with value
propagation eliminating any unnecessary PHIs. The two can then be joined up
to map variables to values, then values to locations, for each instruction in
the function.

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow LiveDebugValues to follow values through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction numbers.

## Target hooks

TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any
instructions that are copy-like -- LiveDebugValues uses this to identify when
values move between registers.

TargetInstrInfo::isLoadFromStackSlotPostFE and
TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify spill and
restore instructions. Each should return the destination or source register
respectively. LiveDebugValues will track the movement of a value from / to
the stack slot. In addition, any instruction that writes to a stack spill
should have a MachineMemOperand attached, so that LiveDebugValues can
recognise that a slot has been clobbered.

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a MachineInstr to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented -- the relevant question is whether any
register def in any operand will produce a different value, as a result of the
mutation. If the answer is yes, then there is a risk that a DBG_INSTR_REF
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call MachineInstr::dropDebugNumber() on the
mutated instruction to erase its instruction number. Any DBG_INSTR_REF
referring to it will produce an empty variable location instead, which appears
as "optimised out" in the debugger.
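
As a rough illustration of why dropping the number is the safe choice, the
following is a toy C++ model rather than LLVM code. `ToyInstr` and
`findDefiningInstr` are hypothetical names invented for this sketch, and the
convention that 0 means "no number attached" is an assumption of the model.
Once the number is dropped, a reference to it simply finds nothing, instead of
finding an instruction that now computes a different value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; this is NOT llvm::MachineInstr.
struct ToyInstr {
  std::string Opcode;
  unsigned DebugInstrNum; // toy convention (assumed): 0 means "no number"

  // Models the effect the text describes for MachineInstr::dropDebugNumber().
  void dropDebugNumber() { DebugInstrNum = 0; }
};

// Hypothetical helper: find the instruction that a reference to InstrNum
// points at. Returns nullptr when no instruction carries that number, which
// corresponds to an empty ("optimised out") variable location.
const ToyInstr *findDefiningInstr(const std::vector<ToyInstr> &Block,
                                  unsigned InstrNum) {
  assert(InstrNum != 0 && "0 is reserved for 'no number' in this toy model");
  for (const ToyInstr &MI : Block)
    if (MI.DebugInstrNum == InstrNum)
      return &MI;
  return nullptr;
}
```

In this model, a pass that mutates an instruction would call
`dropDebugNumber()` on it straight away, so that a later lookup of the old
number returns `nullptr` rather than the mutated instruction.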

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to new instruction number / operand pair. Consider if we replace
a three-address add instruction with a two-address add:

    %2:gr32 = ADD32rr %0, %1, debug-instr-number 1

becomes

    %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2

A substitution from "instruction number 1 operand 0" to "instruction number
2 operand 0" is then recorded in the MachineFunction. In LiveDebugValues,
DBG_INSTR_REFs will be mapped through the substitution table to find the most
recent instruction number / operand number of the value they refer to.

Use MachineFunction::substituteDebugValuesForInst to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This holds most of the time, including in the example
above.

If operand numbers do not line up between the old and new instruction, use
MachineInstr::getDebugInstrNum to acquire the instruction number for the new
instruction, and MachineFunction::makeDebugValueSubstitution to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution -- LiveDebugValues will safely drop the
now unavailable variable value.

Should your target clone instructions, much the same as the TailDuplicator
optimisation pass, do not attempt to preserve the instruction numbers or
record any substitutions. MachineFunction::CloneMachineInstr should drop the
instruction number of any cloned instruction, to avoid duplicate numbers
appearing to LiveDebugValues.
Dealing with duplicated instructions is a
natural extension to instruction referencing that's currently unimplemented.

[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations
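
The substitution table described under "Target-specific optimisation
instrumentation" can be pictured with a small toy model. This is only a sketch
under stated assumptions, not LLVM's implementation: `InstrOperandPair` and
`ToySubstitutionTable` are hypothetical names, and the hop limit is a safeguard
added for the sketch, on the assumption that new instruction numbers are always
fresh so that chains never form a cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// (instruction number, operand index), as in "instruction number 1 operand 0".
using InstrOperandPair = std::pair<unsigned, unsigned>;

// Toy model of a substitution table; NOT the MachineFunction data structure.
class ToySubstitutionTable {
  std::map<InstrOperandPair, InstrOperandPair> Subst;

public:
  // Record that the value once defined by Old is now defined by New.
  void record(InstrOperandPair Old, InstrOperandPair New) {
    assert(Old != New && "a self-substitution carries no information");
    Subst[Old] = New;
  }

  // Follow substitutions until reaching a pair with no further entry, i.e.
  // the most recent instruction / operand pair for the value.
  InstrOperandPair resolve(InstrOperandPair Ref) const {
    // An acyclic chain has at most Subst.size() links, so this bound never
    // cuts a valid chain short and guarantees termination otherwise.
    for (std::size_t Hops = 0; Hops < Subst.size(); ++Hops) {
      auto It = Subst.find(Ref);
      if (It == Subst.end())
        break;
      Ref = It->second;
    }
    return Ref;
  }
};
```

If an instruction is replaced twice, recording one substitution per replacement
is enough in this model: `resolve` walks the chain from the oldest pair to the
newest, and a pair with no entry resolves to itself.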