# XNU debugging

Debugging XNU through kernel core files or with a live device.

## Overview

XNU’s debugging macros are compatible with Python 3.9+. Please be careful about pulling
in the latest language features. Some users are still on older Xcode versions and may not have the newest
Python installed.

## General coding tips

### Imports

The current implementation re-exports a lot of submodules through the XNU main module. This leads to some
surprising behavior:

* Name collisions at the top level may override methods with unexpected results.
* New imports may change the order of imports, leading to some surprising side effects.

Please avoid `from xnu import *` where possible and always explicitly import only what is
required from other modules.

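The name-collision problem can be reproduced outside LLDB. The sketch below is only an illustration: `mod_a` and `mod_b` are hypothetical in-memory modules (not part of the macros) that both export a function called `dump`.

```python
import sys
import types

# Two hypothetical modules that both export a function named `dump`.
mod_a = types.ModuleType("mod_a")
exec("def dump():\n    return 'mod_a.dump'", mod_a.__dict__)
mod_b = types.ModuleType("mod_b")
exec("def dump():\n    return 'mod_b.dump'", mod_b.__dict__)
sys.modules["mod_a"] = mod_a
sys.modules["mod_b"] = mod_b

# Star imports: whichever module is imported last silently wins.
star_ns = {}
exec("from mod_a import *\nfrom mod_b import *", star_ns)
star_result = star_ns["dump"]()

# Explicit imports keep both functions reachable and unambiguous.
explicit_ns = {}
exec(
    "from mod_a import dump as dump_a\n"
    "from mod_b import dump as dump_b",
    explicit_ns,
)
explicit_result = (explicit_ns["dump_a"](), explicit_ns["dump_b"]())
```

With star imports, reordering the two import lines changes which `dump` is called, which is the kind of side effect described above.
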
### Checking the type of an object

Avoid testing for a type explicitly, as in `type(obj) == SomeType`.
Instead, always use the inheritance-sensitive `isinstance(obj, SomeType)`.

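A minimal illustration of the difference, using two hypothetical classes (`Base` and `Derived` are placeholders, not macro types):

```python
class Base:
    pass


class Derived(Base):
    pass


obj = Derived()

# Exact comparison ignores inheritance and misses subclasses.
exact_match = type(obj) == Base

# isinstance() also accepts instances of subclasses.
inherit_match = isinstance(obj, Base)
```
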
### Dealing with binary data

It’s recommended to use **bytearray**, **bytes**, and **memoryview** objects instead of a string.
Some LLDB APIs no longer accept a string in place of binary data in Python 3.

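As a rough sketch of how the three types fit together, the example below parses two little-endian 32-bit words from a made-up buffer. The byte values are illustrative only and do not come from a real target.

```python
import struct

# Made-up 8 bytes standing in for memory read from a target.
raw = bytes([0x78, 0x56, 0x34, 0x12, 0xEF, 0xBE, 0xAD, 0xDE])

# memoryview slices the buffer without copying it.
view = memoryview(raw)
first_word = struct.unpack_from("<I", view, 0)[0]
second_word = int.from_bytes(view[4:8], "little")

# bytearray is the mutable variant, useful for patching a local copy.
scratch = bytearray(raw)
scratch[0:4] = struct.pack("<I", 0)
```
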
### Accessing large amounts of binary data (or accessing small amounts frequently)

If you plan to access large contiguous blocks of memory (e.g. reading a whole 10KB of memory),
or small semi-contiguous chunks (e.g. when parsing large structured data), using the `SBProcessRawIO`
class from `core.io` can improve performance considerably. Alternatively, if you're in
a hurry and just want to read one specific chunk once, it might be easier to use `LazyTarget.GetProcess().ReadMemory()`
directly.

In other words, avoid the following:

```
data_ptr = kern.GetValueFromAddress(start_addr, 'uint8_t *')
with open(filepath, 'wb') as f:
    f.write(data_ptr[:4096])
```

And instead use:

```
from core.io import SBProcessRawIO
import shutil

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, 4096)
with open(filepath, 'wb') as f:
    shutil.copyfileobj(io_access, f)
```

Or, if you're in a hurry:

```
err = lldb.SBError()
my_data = LazyTarget.GetProcess().ReadMemory(start_addr, length, err)
if err.Success():
    # Use my precious data
    pass
```

For small semi-contiguous chunks, you can map the whole region and access random chunks from it like so:

```
from core.io import SBProcessRawIO

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
io_access.seek(my_struct_offset)
my_struct_contents = io_access.read(my_struct_size)
```

You can also wrap the SBProcessRawIO instance in a BufferedRandom object, which
provides buffering (i.e. caching) in case your small random accesses are repeated:

```
from core.io import SBProcessRawIO
from io import BufferedRandom

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
buffered_io = BufferedRandom(io_access)
# And then use buffered_io for your accesses
```

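To see what the buffering buys you without a live target, here is a sketch that swaps `SBProcessRawIO` for a hypothetical stand-in, `CountingRawIO`. It serves bytes from a local buffer and counts how often a read reaches the underlying "target". Only the standard `io` module is used, and the stand-in makes no claim about how `SBProcessRawIO` is implemented.

```python
import io


class CountingRawIO(io.RawIOBase):
    """Hypothetical stand-in for SBProcessRawIO, backed by a bytes object."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.raw_reads = 0  # number of reads that reached the "target"

    def readable(self):
        return True

    def writable(self):
        return True  # BufferedRandom rejects raw streams that are not writable

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_CUR:
            offset += self._pos
        elif whence == io.SEEK_END:
            offset += len(self._data)
        self._pos = offset
        return self._pos

    def readinto(self, buffer):
        self.raw_reads += 1
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


raw = CountingRawIO(bytes(range(256)) * 16)  # 4096 bytes of fake memory
buffered = io.BufferedRandom(raw, buffer_size=4096)

# Read the same 16-byte chunk repeatedly through the buffered wrapper.
reads = []
for _ in range(50):
    buffered.seek(128)
    reads.append(buffered.read(16))
```

After the loop, `raw.raw_reads` should be much smaller than the number of `read()` calls, because repeated accesses are served from the buffer instead of the raw stream.
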
### Encoding data to strings and back

All strings are now `unicode`, and conversion between binary data and strings must be explicit.
When no explicit encoding is selected, UTF-8 is the default.

```
mystring = mybytes.decode()
mybytes = mystring.encode()
```

In most cases **utf-8** will work, but make sure that the encoding matches your data.

There are two options to consider when trying to get a string out of raw data without knowing whether
it is a valid string:

* **lossy conversion** - escapes invalid bytes in the form `\xNN` (or drops them)
* **lossless conversion** - maps invalid bytes to a special Unicode range so that the original bytes
  can be reconstructed precisely

Which to use depends on the transformation goals. The lossy conversion produces a printable string
with strange characters in it. The lossless option is meant to be used when a string is only a transport
mechanism and needs to be converted back to the original values later.

Switch the method by passing an `errors` handler during conversion:

```
# Lossy: escapes invalid bytes
b.decode('utf-8', errors='backslashreplace')
# Lossy: removes invalid bytes
b.decode('utf-8', errors='ignore')
# Lossless, but print() may fail on the result
b.decode('utf-8', errors='surrogateescape')
```

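The effect of each handler is easiest to see on a small input. The sketch below decodes a made-up byte string containing one byte (`0xff`) that is not valid UTF-8, then round-trips the lossless variant back to bytes.

```python
# Made-up input: three ASCII characters followed by an invalid UTF-8 byte.
b = b"abc\xff"

# Lossy: the invalid byte becomes the literal four characters \xff.
escaped = b.decode("utf-8", errors="backslashreplace")

# Lossy: the invalid byte is dropped.
dropped = b.decode("utf-8", errors="ignore")

# Lossless: the invalid byte maps to a lone surrogate code point ...
transported = b.decode("utf-8", errors="surrogateescape")

# ... and encoding with the same handler restores the original bytes.
restored = transported.encode("utf-8", errors="surrogateescape")
```

Note that the round trip only works if the same `errors` handler is used in both directions.
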
### Dealing with signed numbers

Python's `int` has unlimited precision. This may be surprising for kernel developers who expect
two's complement behavior.

Always use **unsigned()** or **signed()** regardless of what the actual underlying type is
to ensure that macros use the correct semantics.

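To make the surprise concrete, the sketch below emulates 64-bit two's complement in plain Python. `as_unsigned64` and `as_signed64` are hypothetical helpers written for this illustration; they are not the macros' `unsigned()` and `signed()`, and the 64-bit width is an assumption.

```python
MASK64 = (1 << 64) - 1


def as_unsigned64(value):
    """Reinterpret an arbitrary Python int as an unsigned 64-bit value."""
    return value & MASK64


def as_signed64(value):
    """Reinterpret an arbitrary Python int as a signed 64-bit value."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


# Python ints never wrap on their own ...
no_wrap = MASK64 + 1

# ... so the wrap-around has to be applied explicitly.
wrapped = as_unsigned64(MASK64 + 1)
all_ones = as_unsigned64(-1)
minus_one = as_signed64(0xFFFFFFFFFFFFFFFF)
```
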
## Testing changes

Please check documentation here: <doc:macro_testing>

### Coding style

Use a static analyzer like **pylint** or **flake8** to check the macro source code:

```
$ python3 -m pip install --user pylint flake8

# Run the lint either by setting your path to point to one of the runtimes
# or through python
$ python3 -m pylint <src files/dirs>
$ python3 -m flake8 <src files/dirs>
```

### Correctness

Ensure the macro matches what LLDB returns from the REPL. For example, compare `showproc(xxx)` with `p/x *(proc_t)xxx`.

```
# 1. Run LLDB with debug options set
$ DEBUG_XNU_LLDBMACROS=1 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel

# 2. Optionally load modified operating system plugin
(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py

# 3. Load modified scripts
(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py

# 4. Exercise macros
```

Depending on the change, test other targets and architectures (for instance, both Astris and KDP).

### Regression

This is simpler than the previous step because the goal is to ensure behavior has not changed.
You can speed up a few things by using local symbols:

```
# 1. Get a coredump from a device and kernel UUID
# 2. Grab symbols with dsymForUUID
$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path>

# 3. Run lldb with local symbols to avoid dsymForUUID NFS

$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image>
```

The actual steps are identical to the previous testing. Run a macro with its output written to a file using the
`-o <outfile>` option. Then run `diff` on the outputs of the baseline and modified code:

* No environment variables to get baseline
* Modified dSYM as described above

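If you would rather do the comparison in Python than with `diff`, something along these lines should work. It is only a sketch: `diff_macro_outputs` is a hypothetical helper, and the two sample outputs are made up for illustration rather than real macro output.

```python
import difflib


def diff_macro_outputs(baseline_text, modified_text):
    """Return unified-diff lines between two captured macro outputs."""
    return list(
        difflib.unified_diff(
            baseline_text.splitlines(),
            modified_text.splitlines(),
            fromfile="baseline",
            tofile="modified",
            lineterm="",
        )
    )


# Made-up outputs standing in for the contents of two `-o <outfile>` files.
baseline = "pid 1 launchd\npid 2 kernel_task"
modified = "pid 1 launchd\npid 2 kernel_task_x"

unchanged = diff_macro_outputs(baseline, baseline)
changed = diff_macro_outputs(baseline, modified)
```

An empty result means the outputs are identical; otherwise each changed line shows up with a `-` or `+` prefix.
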
It’s difficult to make this automated:

* Some macros need arguments which must be found in a core file.
* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump
  should be taken and then inspected afterwards, but this ties up a lab device for the duration of the
  test.
* Even with coredumps, testing the macros takes too long in our automation system and triggers the
  failsafe timeout.

### Code coverage

Use code coverage to check which parts of macros have actually been tested.
Install the **coverage** library with:

```
$ python3 -m pip install --user coverage
```

Then collect coverage:

```
(lldb) xnudebug coverage /tmp/coverage.cov showallstacks

...

Coverage info saved to: "/tmp/coverage.cov"
```

You can then run `coverage html --data-file=/tmp/coverage.cov` in your terminal
to generate an HTML report.

Combine coverage from multiple files:

```
# Point PATH to local python where coverage is installed.
$ export PATH="$HOME/Library/Python/3.8/bin:$PATH"

# Use --keep to avoid deletion of input files after merge.
$ coverage combine --keep <list of .coverage files or dirs to scan>

# Get HTML report or use other subcommands to inspect.
$ coverage html
```

It is possible to start coverage collection **before** importing the operating system library and
loading macros to check code run during bootstrapping.

For this, you'll need to run coverage manually:

```
# 1. Start LLDB

# 2. Load and start code coverage recording.
(lldb) script import coverage
(lldb) script cov = coverage.Coverage(data_file="<filepath>")
(lldb) script cov.start()

# 3. Load macros

# 4. Collect the coverage.
(lldb) script cov.stop()
(lldb) script cov.save()
```

### Performance testing

Some macros can run for a long time. Some code may be costly even if it looks simple because objects
aren’t cached or too many temporary objects are created. Simple profiling is similar to collecting
code coverage.

First, set up your environment:

```
# Install gprof2dot
$ python3 -m pip install gprof2dot
# Install graphviz
$ brew install graphviz
```

Then to profile commands, follow this sequence:

```
(lldb) xnudebug profile /tmp/macro.prof showcurrentstacks
[... command outputs ...]

   Ordered by: cumulative time
   List reduced from 468 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   [... profiling output ...]

Profile info saved to "/tmp/macro.prof"
```

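The report above has the shape of a standard `pstats` listing, and `gprof2dot -f pstats` expects a `pstats` file. Assuming that is what `xnudebug profile` writes (an assumption, not something this document confirms), a comparable file can be produced for any Python function with the standard library alone. In the sketch below `busy_work` is a hypothetical placeholder for the code under test.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical placeholder for the code being profiled."""
    return sum(i * i for i in range(n))


# Write the profile into a fresh temporary directory.
profile_path = os.path.join(tempfile.mkdtemp(), "macro.prof")

profiler = cProfile.Profile()
profiler.enable()
result = busy_work(10000)
profiler.disable()
profiler.dump_stats(profile_path)

# Load the saved file and print the 30 most expensive entries by cumulative time.
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative").print_stats(30)
```
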
Then to visualize call graphs in context, in a separate shell:

```
# Now convert the file to a colored SVG call graph
$ python3 -m gprof2dot -f pstats /tmp/macro.prof -o /tmp/call.dot
$ dot -O -T svg /tmp/call.dot

# and view it in your favourite viewer
$ open /tmp/call.dot.svg
```

## Debugging your changes

### Get detailed exception report

The easiest way to debug an exception is to re-run your macro with the `--debug` option.
This turns on more detailed output for each stack frame that includes source lines
and local variables.

### File a radar

To report an actionable radar, please re-run your failing macro with `--radar`.
This will collect additional logs to an archive located in `/tmp`.

Use the link provided to create a new radar.

### Debugging with pdb

Yes, it is possible to use a debugger to debug your macro!

The steps are similar to the testing techniques described above (use scripting interactive mode). There is no point in
documenting the debugger itself. Let's focus on how to use it on a real-life example. The debugger used here is PDB, which
is part of the Python installation and so works out of the box.

Problem: Something is going wrong with the addkext macro. What now?

    (lldb) addkext -N com.apple.driver.AppleT8103PCIeC
    Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310]
    Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536]
    Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536]
    ...
    Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c
    Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC
    section '__TEXT' loaded at 0xfffffe001478c780

There is no exception, a lot of errors, and no output. So what next?
Try to narrow the problem down to an isolated piece of macro code:

  1. Try to get values of globals through regular LLDB commands.
  2. Use interactive mode and invoke functions with arguments directly.

After inspecting the addkext macro code and calling a few functions with arguments directly, we can see that there is an
exception in the end. It was just caught in a try/except block. So the simplified reproducer is:

    (lldb) script
    >>> import lldb
    >>> import xnu
    >>> err = lldb.SBError()
    >>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err)
    >>> m = macho.MemMacho(data, len(data))
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File ".../lldbmacros/macho.py", line 91, in __init__
        self.load(fp)
      File ".../site-packages/macholib/MachO.py", line 133, in load
        self.load_header(fh, 0, size)
      File ".../site-packages/macholib/MachO.py", line 168, in load_header
        hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
      File ".../site-packages/macholib/MachO.py", line 209, in __init__
        self.load(fh)
      File ".../lldbmacros/macho.py", line 23, in new_load
        _old_MachOHeader_load(s, fh)
      File ".../site-packages/macholib/MachO.py", line 287, in load
        fh.seek(seg.offset)
      File ".../site-packages/macholib/util.py", line 91, in seek
        self._checkwindow(seekto, "seek")
      File ".../site-packages/macholib/util.py", line 76, in _checkwindow
        raise IOError(
    OSError: seek to offset 9042440 is outside window [0, 112080]

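The pattern that hid the traceback can be sketched in isolation. Everything in the example below is hypothetical: `load_image` and `macro` only mimic the shape of the problem (an error message printed from an `except` block) and are not the addkext implementation.

```python
import traceback


def load_image(offset, window_size):
    """Hypothetical loader that fails the same way the real one did."""
    if offset > window_size:
        raise OSError(
            f"seek to offset {offset} is outside window [0, {window_size}]"
        )
    return "loaded"


def macro(offset, window_size):
    """Hypothetical macro body: catches the error and keeps only its message."""
    try:
        return load_image(offset, window_size)
    except Exception as exc:
        return f"Failed to read MachO: {exc}"


# Through the macro, only a one-line summary survives.
summary = macro(9042440, 112080)

# Calling the inner function directly exposes the full traceback.
try:
    load_image(9042440, 112080)
except OSError:
    full_report = traceback.format_exc()
```

This is why invoking the inner functions directly from interactive mode is a useful first step: the full traceback names the frame that raised.
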
Clearly an external library is involved, and execution flow jumps between the dSYM and the library a few times.
Let's try to look around with a debugger.

    (lldb) script
    # Prepare data variable as described above.

    # Run last statement with debugger.
    >>> import pdb
    >>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals())
    > <string>(1)<module>()

    # Show debugger's help
    (Pdb) help

It is not possible to break on an exception. Python uses them a lot, so it is better to put a breakpoint in the source
code. This puts a breakpoint on the line that raises the IOError mentioned above.

    (Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
    Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76

You can now single-step or continue the execution as usual for a debugger.

    (Pdb) cont
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(
    (Pdb) bt
      /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run()
    -> exec(cmd, globals, locals)
      <string>(1)<module>()
      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__()
    -> self.load(fp)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load()
    -> self.load_header(fh, 0, size)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header()
    -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__()
    -> self.load(fh)
      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load()
    -> _old_MachOHeader_load(s, fh)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek()
    -> self._checkwindow(seekto, "seek")
    > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(

Now we can move up one frame and inspect the stopped target:

    # Show current frame arguments
    (Pdb) up
    (Pdb) a
    self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>>
    offset = 9042440
    whence = 0

    # globals, locals, or expressions
    (Pdb) p type(seg.offset)
    <class 'macholib.ptypes.p_uint32'>
    (Pdb) p hex(seg.offset)
    '0x89fa08'

    # Find attributes of a Python object.
    (Pdb) p dir(section_cls)
    ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str']
    (Pdb) p section_cls.sectname
    <property object at 0x1077bbef0>

Unfortunately everything looks correct, but there is one interesting frame in the stack: the one that
provides the offset to the seek method. Let's see where we are in the source code.

    (Pdb) up
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
    (Pdb) list
    282                             not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL
    283                             if seg.offset > 0 and seg.size > 0 and not_zerofill:
    284                                 low_offset = min(low_offset, seg.offset)
    285                             if not_zerofill:
    286                                 c = fh.tell()
    287  ->                             fh.seek(seg.offset)
    288                                 sd = fh.read(seg.size)
    289                                 seg.add_section_data(sd)
    290                                 fh.seek(c)
    291                             segs.append(seg)
    292                     # data is a list of segments

Running the debugger on a working case and stepping through the load() method shows that this code is not present.
That means we were broken by a library update! Older versions of the library do not load data for a section.

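When a library update is the suspect, it helps to record which version is actually installed in the Python that LLDB uses. One possible way, using only the standard library, is sketched below. `installed_version` is a hypothetical helper, and the sketch reports the version without claiming which macholib release introduced the change.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# None means macholib is not installed for this interpreter.
macholib_version = installed_version("macholib")
print("macholib version:", macholib_version)
```

Comparing this value between a working and a failing setup is a quick way to confirm or rule out a version difference.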