xref: /xnu-11215/doc/debugging/debugging.md (revision 94d3b452)
1# XNU debugging
2
3Debugging xnu through kernel core files or with a live device.
4
5## Overview
6
7xnu’s debugging macros are compatible with both Python 3.9+. Please be careful about pulling
8in the latest language features. Some users are living on older Xcodes and may not have the newest
9Python installed.
10
11## General coding tips
12
13### Imports
14
15The current implementation re-exports a lot of submodules through the XNU main module. This leads to some
16surprising behavior:
17
18* Name collisions at the top level may override methods with unexpected results.
19* New imports may change the order of imports, leading to some surpising side effects.
20
21Please avoid `from xnu import *` where possible and always explicitly import only what is
22required from other modules.
23
24### Checking the type of an object
25
26Avoid testing for a `type` explicitly like `type(obj) == type`.
27Instead, always use the inheritance-sensitive `isinstance(obj, type)`.
28
29### Dealing with binary data
30
31It’s recommended to use **bytearray**, **bytes**, and **memoryviews** instead of a string.
32Some LLDB APIs no longer accept a string in place of binary data in Python 3.
33
34### Accessing large amounts of binary data (or accessing small amounts frequently)
35
36In case you're planning on accessing large contiguous blocks of memory (e.g. reading a whole 10KB of memory),
37or you're accessing small semi-contiguous chunks (e.g. if you're parsing large structured data), then it might
38be hugely beneficial performance-wise to make use of the `io.SBProcessRawIO` class. Furthermore, if you're in
39a hurry and just want to read one specific chunk once, then it might be easier to use `LazyTarget.GetProcess().ReadMemory()`
40directly.
41
42In other words, avoid the following:
43
44```
45data_ptr = kern.GetValueFromAddress(start_addr, 'uint8_t *')
46with open(filepath, 'wb') as f:
47    f.write(data_ptr[:4096])
48```
49
50And instead use:
51
52```
53from core.io import SBProcessRawIO
54import shutil
55
56io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, 4096)
57with open(filepath, 'wb') as f:
58    shutil.copyfileobj(io_access, f)
59```
60
61Or, if you're in a hurry:
62
63```
64err = lldb.SBError()
65my_data = LazyTarget.GetProcess().ReadMemory(start_addr, length, err)
66if err.Success():
67    # Use my precious data
68    pass
69```
70
71For small semi-contiguous chunks, you can map the whole region and access random chunks from it like so:
72
73```
74from core.io import SBProcessRawIO
75
76io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
77io_access.seek(my_struct_offset)
78my_struct_contents = io_access.read(my_struct_size)
79```
80
81Not only that, but you can also tack on a BufferedRandom class on top of the SBProcessRawIO instance, which
82provides you with buffering (aka caching) in case your random small chunk accesses are repeated:
83
84```
85from core.io import SBProcessRawIO
86from io import BufferedRandom
87
88io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
89buffered_io = BufferedRandom(io_access)
90# And then use buffered_io for your accesses
91```
92
93### Encoding data to strings and back
94
95All strings are now `unicode` and must be converted between binary data and strings explicitly.
96When no explicit encoding is selected then UTF-8 is the default.
97
98```
99mystring = mybytes.decode()
100mybytes = mystring.encode()
101```
102In most cases **utf-8** will work but be careful to be sure that the encoding matches your data.
103
104There are two options to consider when trying to get a string out of the raw data without knowing if
105they are valid string or not:
106
107* **lossy conversion** - escapes all non-standard characters in form of ‘\xNNN’
108* **lossless conversion** - maps invalid characters to special unicode range so it can reconstruct
109the string precisely
110
111Which to use depends on the transformation goals. The lossy conversion produces a printable string
112with strange characters in it. The lossless option is meant to be used when a string is only a transport
113mechanism and needs to be converted back to original values later.
114
115Switch the method by using `errors` handler during conversion:
116
117```
118# Lossy escapes invalid chars
119b.decode('utf-8', errors='`backslashreplace'`)
120# Lossy removes invalid chars
121b.decode('utf-8', errors='ignore')
122# Loss-less but may likely fail to print()
123b.decode('utf-8', errors='surrogateescape')
124```
125
126### Dealing with signed numbers
127
128Python's int has unlimited precision. This may be surprising for kernel developers who expect
129the behavior follows twos complement.
130
131Always use **unsigned()** or **signed()** regardless of what the actual underlying type is
132to ensure that macros use the correct semantics.
133
134## Testing changes
135
136There is no perfect test suite to check that macros are producing a correct value compared to what
137the debugger sees in a target.
138
139Be careful when touching common framework code. For larger changes, ask the Platform Triage team to
140validate that the changes work in their environment before integration.
141
142### Coding style
143
144Use a static analyzer like **pylint** or **flake8** to check the macro source code:
145
146```
147$ python3 -m pip install --user pylint flake8
148
149# Run the lint either by setting your path to point to one of the runtimes
150# or through python
151$ python3 -m pylint <src files/dirs>
152$ python3 -m flake8 <src files/dirs>
153```
154
155### Correctness
156
157Ensure the macro matches what LLDB returns from the REPL. For example, compare `showproc(xxx)` with `p/x *(proc_t)xxx`.
158
159```
160# 1. Run LLDB with debug options set
161$ DEBUG_XNU_LLDBMACROS=1 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel
162
163# 2. Optionally load modified operating system plugin
164(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py
165
166# 3. Load modified scripts
167(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py
168
169# 4. Exercise macros
170```
171
172Depending on the change, test other targets and architectures (for instance, both Astris and KDP).
173
174### Regression
175
176This is simpler than previous step because the goal is to ensure behavior has not changed.
177You can speed up few things by using local symbols:
178
179```
180# 1. Get a coredump from a device and kernel UUID
181# 2. Grab symbols with dsymForUUID
182$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path>
183
184# 3. Run lldb with local symbols to avoid dsymForUUID NFS
185
186$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image>
187```
188
189The actual steps are identical to previous testing. Run of a macro to different file with `-o <outfile>`
190option. Then run `diff` on the outputs of the baseline and modified code:
191
192* No environment variables to get baseline
193* Modified dSYM as described above
194
195It’s difficult to make this automated:
196
197* Some macros needs arguments which must be found in a core file.
198* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump
199  should be taken and then inspected afterwards, but this ties up a lab device for the duration of the
200  test.
201* Even with coredumps, testing the macros takes too long in our automation system and triggers the
202  failsafe timeout.
203
204### Code coverage
205
206Use code coverage to check which parts of macros have actually been tested.
207Install **coverage** lib with:
208
209```
210$ python3 -m pip install --user coverage
211```
212
213Then collect coverage:.
214
215```
216# 1. Start LLDB with your macros as described above.
217
218# 2. Load and start code coverage recording.
219(lldb) script import coverage
220(lldb) script cov = coverage.Coverage()
221(lldb) script cov.start()
222
223# 3. Do the testing.
224
225# 4. Collect the coverage.
226(lldb) script cov.stop()
227(lldb) script cov.save()
228```
229
230You can override the default file (*.coverage*) by adding an additional environment variable to LLDB:
231
232```
233$ env COVERAGE_FILE="${OUTDIR}/.coverage.mytest" # usual LLDB command line
234```
235
236Combine coverage from multiple files:
237
238```
239# Point PATH to local python where coverage is installed.
240$ export PATH="$HOME/Library/Python/3.8/bin:$PATH"
241
242# Use --keep to avoid deletion of input files after merge.
243$ coverage combine --keep <list of .coverage files or dirs to scan>
244
245# Get HTML report or use other subcommands to inspect.
246$ coverage html
247```
248
249It is possible to start coverage collection **before** importing the operating system library and
250loading macros to check code run during bootstrapping.
251
252### Performance testing
253
254Some macros can run for a long time. Some code may be costly even if it looks simple because objects
255aren’t cached or too many temporary objects are created. Simple profiling is similar to collecting
256code coverage.
257
258First setup your environment:
259
260```
261# Install gprof2dot
262$ python3 -m pip install gprof2dot
263# Install graphviz
264$ brew install graphviz
265```
266
267Then to profile commands, follow this sequence:
268
269```
270(lldb) xnudebug profile /tmp/macro.prof showcurrentstacks
271[... command outputs ...]
272
273   Ordered by: cumulative time
274   List reduced from 468 to 30 due to restriction <30>
275
276   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
277   [... profiling output ...]
278
279Profile info saved to "/tmp/macro.prof"
280```
281
282Then to visualize callgraphs in context, in a separate shell:
283
284```
285# Now convert the file to a colored SVG call graph
286$ python3 -m gprof2dot -f pstats /tmp/macro.prof -o /tmp/call.dot
287$ dot -O -T svg /tmp/call.dot
288
289# and view it in your favourite viewer
290$ open /tmp/call.dot.svg
291```
292
293## Debugging your changes
294
295### Get detailed exception report
296
297The easiest way to debug an exception is to re-run your macro with the `--debug` option.
298This turns on more detailed output for each stack frame that includes source lines
299and local variables.
300
301### File a radar
302
303To report an actionable radar, please use re-run your failing macro with `--radar`.
304This will collect additional logs to an archive located in `/tmp`.
305
306Use the link provided to create a new radar.
307
308### Debugging with pdb
309
310YES, It is possible to use a debugger to debug your macro!
311
312The steps are similar to testing techniques described above (use scripting interactive mode). There is no point to
313document the debugger itself. Lets focus on how to use it on a real life example. The debugger used here is PDB which
314is part of Python installation so works out of the box.
315
316Problem: Something wrong is going on with addkext macro. What now?
317
318    (lldb) addkext -N com.apple.driver.AppleT8103PCIeC
319    Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310]
320    Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536]
321    Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536]
322	...
323	Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c
324    Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC
325    section '__TEXT' loaded at 0xfffffe001478c780
326
327There is no exception, lot of errors and no output. So what next?
328Try to narrow the problem down to an isolated piece of macro code:
329
330  1. Try to get values of globals through regular LLDB commands
331  2. Use interactive mode and invoke functions with arguments directly.
332
333After inspecting addkext macro code and calling few functions with arguments directly we can see that there is an
334exception in the end. It was just captured in try/catch block. So the simplified reproducer is:
335
336    (lldb) script
337    >>> import lldb
338    >>> import xnu
339    >>> err = lldb.SBError()
340    >>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err)
341    >>> m = macho.MemMacho(data, len(data))
342    Traceback (most recent call last):
343      File "<console>", line 1, in <module>
344      File ".../lldbmacros/macho.py", line 91, in __init__
345        self.load(fp)
346      File ".../site-packages/macholib/MachO.py", line 133, in load
347        self.load_header(fh, 0, size)
348      File ".../site-packages/macholib/MachO.py", line 168, in load_header
349        hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
350      File ".../site-packages/macholib/MachO.py", line 209, in __init__
351        self.load(fh)
352      File ".../lldbmacros/macho.py", line 23, in new_load
353        _old_MachOHeader_load(s, fh)
354      File ".../site-packages/macholib/MachO.py", line 287, in load
355        fh.seek(seg.offset)
356      File ".../site-packages/macholib/util.py", line 91, in seek
357        self._checkwindow(seekto, "seek")
358      File ".../site-packages/macholib/util.py", line 76, in _checkwindow
359        raise IOError(
360    OSError: seek to offset 9042440 is outside window [0, 112080]
361
362Clearly an external library is involved and execution flow jumps between dSYM and the library few times.
363Lets try to look around with a debugger.
364
365    (lldb) script
366	# Prepare data variable as described above.
367
368	# Run last statement with debugger.
369	>>> import pdb
370	>>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals())
371	> <string>(1)<module>()
372
373	# Show debugger's help
374	(Pdb) help
375
376It is not possible to break on exception. Python uses them a lot so it is better to put a breakpoint to source
377code. This puts breakpoint on the IOError exception mentioned above.
378
379	(Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
380    Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
381
382You can now single step or continue the execution as usuall for a debugger.
383
384    (Pdb) cont
385    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
386    -> raise IOError(
387    (Pdb) bt
388      /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run()
389    -> exec(cmd, globals, locals)
390      <string>(1)<module>()
391      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__()
392    -> self.load(fp)
393      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load()
394    -> self.load_header(fh, 0, size)
395      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header()
396    -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
397      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__()
398    -> self.load(fh)
399      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load()
400    -> _old_MachOHeader_load(s, fh)
401      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
402    -> fh.seek(seg.offset)
403      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek()
404    -> self._checkwindow(seekto, "seek")
405    > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
406    -> raise IOError(
407
408
409Now we can move a frame above and inspect stopped target:
410
411    # Show current frame arguments
412    (Pdb) up
413    (Pdb) a
414    self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>>
415    offset = 9042440
416    whence = 0
417
418    # globals, local or expressons
419    (Pdb) p type(seg.offset)
420    <class 'macholib.ptypes.p_uint32'>
421    (Pdb) p hex(seg.offset)
422    '0x89fa08'
423
424    # Find attributes of a Python object.
425    (Pdb) p dir(section_cls)
426    ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str']
427    (Pdb) p section_cls.sectname
428    <property object at 0x1077bbef0>
429
430Unfortunately everything looks correct but there is actually one ineteresting frame in the stack. The one which
431provides the offset to the seek method. Lets see where we are in the source code.
432
433    (Pdb) up
434    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
435    -> fh.seek(seg.offset)
436    (Pdb) list
437    282  	                        not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL
438    283  	                        if seg.offset > 0 and seg.size > 0 and not_zerofill:
439    284  	                            low_offset = min(low_offset, seg.offset)
440    285  	                        if not_zerofill:
441    286  	                            c = fh.tell()
442    287  ->	                            fh.seek(seg.offset)
443    288  	                            sd = fh.read(seg.size)
444    289  	                            seg.add_section_data(sd)
445    290  	                            fh.seek(c)
446    291  	                        segs.append(seg)
447    292  	                # data is a list of segments
448
449Running debugger on working case and stepping through the load() method shows that this code is not present.
450That means we are broken by a library update! Older versions of library do not load data for a section.
451