1# XNU debugging 2 3Debugging xnu through kernel core files or with a live device. 4 5## Overview 6 7xnu’s debugging macros are compatible with both Python 3.9+. Please be careful about pulling 8in the latest language features. Some users are living on older Xcodes and may not have the newest 9Python installed. 10 11## General coding tips 12 13### Imports 14 15The current implementation re-exports a lot of submodules through the XNU main module. This leads to some 16surprising behavior: 17 18* Name collisions at the top level may override methods with unexpected results. 19* New imports may change the order of imports, leading to some surpising side effects. 20 21Please avoid `from xnu import *` where possible and always explicitly import only what is 22required from other modules. 23 24### Checking the type of an object 25 26Avoid testing for a `type` explicitly like `type(obj) == type`. 27Instead, always use the inheritance-sensitive `isinstance(obj, type)`. 28 29### Dealing with binary data 30 31It’s recommended to use **bytearray**, **bytes**, and **memoryviews** instead of a string. 32Some LLDB APIs no longer accept a string in place of binary data in Python 3. 33 34### Accessing large amounts of binary data (or accessing small amounts frequently) 35 36In case you're planning on accessing large contiguous blocks of memory (e.g. reading a whole 10KB of memory), 37or you're accessing small semi-contiguous chunks (e.g. if you're parsing large structured data), then it might 38be hugely beneficial performance-wise to make use of the `io.SBProcessRawIO` class. Furthermore, if you're in 39a hurry and just want to read one specific chunk once, then it might be easier to use `LazyTarget.GetProcess().ReadMemory()` 40directly. 41 42In other words, avoid the following: 43 44``` 45data_ptr = kern.GetValueFromAddress(start_addr, 'uint8_t *') 46with open(filepath, 'wb') as f: 47 f.write(data_ptr[:4096]) 48``` 49 50And instead use: 51 52``` 53from core.io import SBProcessRawIO 54import shutil 55 56io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, 4096) 57with open(filepath, 'wb') as f: 58 shutil.copyfileobj(io_access, f) 59``` 60 61Or, if you're in a hurry: 62 63``` 64err = lldb.SBError() 65my_data = LazyTarget.GetProcess().ReadMemory(start_addr, length, err) 66if err.Success(): 67 # Use my precious data 68 pass 69``` 70 71For small semi-contiguous chunks, you can map the whole region and access random chunks from it like so: 72 73``` 74from core.io import SBProcessRawIO 75 76io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size) 77io_access.seek(my_struct_offset) 78my_struct_contents = io_access.read(my_struct_size) 79``` 80 81Not only that, but you can also tack on a BufferedRandom class on top of the SBProcessRawIO instance, which 82provides you with buffering (aka caching) in case your random small chunk accesses are repeated: 83 84``` 85from core.io import SBProcessRawIO 86from io import BufferedRandom 87 88io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size) 89buffered_io = BufferedRandom(io_access) 90# And then use buffered_io for your accesses 91``` 92 93### Encoding data to strings and back 94 95All strings are now `unicode` and must be converted between binary data and strings explicitly. 96When no explicit encoding is selected then UTF-8 is the default. 97 98``` 99mystring = mybytes.decode() 100mybytes = mystring.encode() 101``` 102In most cases **utf-8** will work but be careful to be sure that the encoding matches your data. 103 104There are two options to consider when trying to get a string out of the raw data without knowing if 105they are valid string or not: 106 107* **lossy conversion** - escapes all non-standard characters in form of ‘\xNNN’ 108* **lossless conversion** - maps invalid characters to special unicode range so it can reconstruct 109the string precisely 110 111Which to use depends on the transformation goals. The lossy conversion produces a printable string 112with strange characters in it. The lossless option is meant to be used when a string is only a transport 113mechanism and needs to be converted back to original values later. 114 115Switch the method by using `errors` handler during conversion: 116 117``` 118# Lossy escapes invalid chars 119b.decode('utf-8', errors='`backslashreplace'`) 120# Lossy removes invalid chars 121b.decode('utf-8', errors='ignore') 122# Loss-less but may likely fail to print() 123b.decode('utf-8', errors='surrogateescape') 124``` 125 126### Dealing with signed numbers 127 128Python's int has unlimited precision. This may be surprising for kernel developers who expect 129the behavior follows twos complement. 130 131Always use **unsigned()** or **signed()** regardless of what the actual underlying type is 132to ensure that macros use the correct semantics. 133 134## Testing changes 135 136There is no perfect test suite to check that macros are producing a correct value compared to what 137the debugger sees in a target. 138 139Be careful when touching common framework code. For larger changes, ask the Platform Triage team to 140validate that the changes work in their environment before integration. 141 142### Coding style 143 144Use a static analyzer like **pylint** or **flake8** to check the macro source code: 145 146``` 147$ python3 -m pip install --user pylint flake8 148 149# Run the lint either by setting your path to point to one of the runtimes 150# or through python 151$ python3 -m pylint <src files/dirs> 152$ python3 -m flake8 <src files/dirs> 153``` 154 155### Correctness 156 157Ensure the macro matches what LLDB returns from the REPL. For example, compare `showproc(xxx)` with `p/x *(proc_t)xxx`. 158 159``` 160# 1. Run LLDB with debug options set 161$ DEBUG_XNU_LLDBMACROS=1 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel 162 163# 2. Optionally load modified operating system plugin 164(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py 165 166# 3. Load modified scripts 167(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py 168 169# 4. Exercise macros 170``` 171 172Depending on the change, test other targets and architectures (for instance, both Astris and KDP). 173 174### Regression 175 176This is simpler than previous step because the goal is to ensure behavior has not changed. 177You can speed up few things by using local symbols: 178 179``` 180# 1. Get a coredump from a device and kernel UUID 181# 2. Grab symbols with dsymForUUID 182$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path> 183 184# 3. Run lldb with local symbols to avoid dsymForUUID NFS 185 186$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image> 187``` 188 189The actual steps are identical to previous testing. Run of a macro to different file with `-o <outfile>` 190option. Then run `diff` on the outputs of the baseline and modified code: 191 192* No environment variables to get baseline 193* Modified dSYM as described above 194 195It’s difficult to make this automated: 196 197* Some macros needs arguments which must be found in a core file. 198* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump 199 should be taken and then inspected afterwards, but this ties up a lab device for the duration of the 200 test. 201* Even with coredumps, testing the macros takes too long in our automation system and triggers the 202 failsafe timeout. 203 204### Code coverage 205 206Use code coverage to check which parts of macros have actually been tested. 207Install **coverage** lib with: 208 209``` 210$ python3 -m pip install --user coverage 211``` 212 213Then collect coverage:. 214 215``` 216# 1. Start LLDB with your macros as described above. 217 218# 2. Load and start code coverage recording. 219(lldb) script import coverage 220(lldb) script cov = coverage.Coverage() 221(lldb) script cov.start() 222 223# 3. Do the testing. 224 225# 4. Collect the coverage. 226(lldb) script cov.stop() 227(lldb) script cov.save() 228``` 229 230You can override the default file (*.coverage*) by adding an additional environment variable to LLDB: 231 232``` 233$ env COVERAGE_FILE="${OUTDIR}/.coverage.mytest" # usual LLDB command line 234``` 235 236Combine coverage from multiple files: 237 238``` 239# Point PATH to local python where coverage is installed. 240$ export PATH="$HOME/Library/Python/3.8/bin:$PATH" 241 242# Use --keep to avoid deletion of input files after merge. 243$ coverage combine --keep <list of .coverage files or dirs to scan> 244 245# Get HTML report or use other subcommands to inspect. 246$ coverage html 247``` 248 249It is possible to start coverage collection **before** importing the operating system library and 250loading macros to check code run during bootstrapping. 251 252### Performance testing 253 254Some macros can run for a long time. Some code may be costly even if it looks simple because objects 255aren’t cached or too many temporary objects are created. Simple profiling is similar to collecting 256code coverage. 257 258First setup your environment: 259 260``` 261# Install gprof2dot 262$ python3 -m pip install gprof2dot 263# Install graphviz 264$ brew install graphviz 265``` 266 267Then to profile commands, follow this sequence: 268 269``` 270(lldb) xnudebug profile /tmp/macro.prof showcurrentstacks 271[... command outputs ...] 272 273 Ordered by: cumulative time 274 List reduced from 468 to 30 due to restriction <30> 275 276 ncalls tottime percall cumtime percall filename:lineno(function) 277 [... profiling output ...] 278 279Profile info saved to "/tmp/macro.prof" 280``` 281 282Then to visualize callgraphs in context, in a separate shell: 283 284``` 285# Now convert the file to a colored SVG call graph 286$ python3 -m gprof2dot -f pstats /tmp/macro.prof -o /tmp/call.dot 287$ dot -O -T svg /tmp/call.dot 288 289# and view it in your favourite viewer 290$ open /tmp/call.dot.svg 291``` 292 293## Debugging your changes 294 295### Get detailed exception report 296 297The easiest way to debug an exception is to re-run your macro with the `--debug` option. 298This turns on more detailed output for each stack frame that includes source lines 299and local variables. 300 301### File a radar 302 303To report an actionable radar, please use re-run your failing macro with `--radar`. 304This will collect additional logs to an archive located in `/tmp`. 305 306Use the link provided to create a new radar. 307 308### Debugging with pdb 309 310YES, It is possible to use a debugger to debug your macro! 311 312The steps are similar to testing techniques described above (use scripting interactive mode). There is no point to 313document the debugger itself. Lets focus on how to use it on a real life example. The debugger used here is PDB which 314is part of Python installation so works out of the box. 315 316Problem: Something wrong is going on with addkext macro. What now? 317 318 (lldb) addkext -N com.apple.driver.AppleT8103PCIeC 319 Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310] 320 Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536] 321 Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536] 322 ... 323 Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c 324 Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC 325 section '__TEXT' loaded at 0xfffffe001478c780 326 327There is no exception, lot of errors and no output. So what next? 328Try to narrow the problem down to an isolated piece of macro code: 329 330 1. Try to get values of globals through regular LLDB commands 331 2. Use interactive mode and invoke functions with arguments directly. 332 333After inspecting addkext macro code and calling few functions with arguments directly we can see that there is an 334exception in the end. It was just captured in try/catch block. So the simplified reproducer is: 335 336 (lldb) script 337 >>> import lldb 338 >>> import xnu 339 >>> err = lldb.SBError() 340 >>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err) 341 >>> m = macho.MemMacho(data, len(data)) 342 Traceback (most recent call last): 343 File "<console>", line 1, in <module> 344 File ".../lldbmacros/macho.py", line 91, in __init__ 345 self.load(fp) 346 File ".../site-packages/macholib/MachO.py", line 133, in load 347 self.load_header(fh, 0, size) 348 File ".../site-packages/macholib/MachO.py", line 168, in load_header 349 hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian) 350 File ".../site-packages/macholib/MachO.py", line 209, in __init__ 351 self.load(fh) 352 File ".../lldbmacros/macho.py", line 23, in new_load 353 _old_MachOHeader_load(s, fh) 354 File ".../site-packages/macholib/MachO.py", line 287, in load 355 fh.seek(seg.offset) 356 File ".../site-packages/macholib/util.py", line 91, in seek 357 self._checkwindow(seekto, "seek") 358 File ".../site-packages/macholib/util.py", line 76, in _checkwindow 359 raise IOError( 360 OSError: seek to offset 9042440 is outside window [0, 112080] 361 362Clearly an external library is involved and execution flow jumps between dSYM and the library few times. 363Lets try to look around with a debugger. 364 365 (lldb) script 366 # Prepare data variable as described above. 367 368 # Run last statement with debugger. 369 >>> import pdb 370 >>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals()) 371 > <string>(1)<module>() 372 373 # Show debugger's help 374 (Pdb) help 375 376It is not possible to break on exception. Python uses them a lot so it is better to put a breakpoint to source 377code. This puts breakpoint on the IOError exception mentioned above. 378 379 (Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76 380 Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76 381 382You can now single step or continue the execution as usuall for a debugger. 383 384 (Pdb) cont 385 > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow() 386 -> raise IOError( 387 (Pdb) bt 388 /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run() 389 -> exec(cmd, globals, locals) 390 <string>(1)<module>() 391 /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__() 392 -> self.load(fp) 393 /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load() 394 -> self.load_header(fh, 0, size) 395 /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header() 396 -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian) 397 /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__() 398 -> self.load(fh) 399 /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load() 400 -> _old_MachOHeader_load(s, fh) 401 /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load() 402 -> fh.seek(seg.offset) 403 /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek() 404 -> self._checkwindow(seekto, "seek") 405 > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow() 406 -> raise IOError( 407 408 409Now we can move a frame above and inspect stopped target: 410 411 # Show current frame arguments 412 (Pdb) up 413 (Pdb) a 414 self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>> 415 offset = 9042440 416 whence = 0 417 418 # globals, local or expressons 419 (Pdb) p type(seg.offset) 420 <class 'macholib.ptypes.p_uint32'> 421 (Pdb) p hex(seg.offset) 422 '0x89fa08' 423 424 # Find attributes of a Python object. 425 (Pdb) p dir(section_cls) 426 ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str'] 427 (Pdb) p section_cls.sectname 428 <property object at 0x1077bbef0> 429 430Unfortunately everything looks correct but there is actually one ineteresting frame in the stack. The one which 431provides the offset to the seek method. Lets see where we are in the source code. 432 433 (Pdb) up 434 > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load() 435 -> fh.seek(seg.offset) 436 (Pdb) list 437 282 not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL 438 283 if seg.offset > 0 and seg.size > 0 and not_zerofill: 439 284 low_offset = min(low_offset, seg.offset) 440 285 if not_zerofill: 441 286 c = fh.tell() 442 287 -> fh.seek(seg.offset) 443 288 sd = fh.read(seg.size) 444 289 seg.add_section_data(sd) 445 290 fh.seek(c) 446 291 segs.append(seg) 447 292 # data is a list of segments 448 449Running debugger on working case and stepping through the load() method shows that this code is not present. 450That means we are broken by a library update! Older versions of library do not load data for a section. 451