# XNU Allocators best practices

The right way to allocate memory in the kernel.

## Introduction

XNU offers two ways to allocate memory:

- the VM subsystem, which provides allocations at the granularity of pages (with
  `kmem_alloc` and similar interfaces);
- the zone allocator subsystem (`<kern/zalloc.h>`), which is a slab allocator of
  objects of fixed size.

In addition, `<kern/kalloc.h>` provides a variable-size general-purpose
allocator implemented as a collection of zones of fixed size, overflowing to
`kmem_alloc` for allocations larger than a few pages (32KB when this
document was written, but this is subject to change or tuning in the future).

The Core Kernel allocators rely on the following headers:

- `<kern/zalloc.h>` and `<kern/kalloc.h>` for their API surface, which most
  clients should find sufficient;
- `<kern/zalloc_internal.h>` for interfaces that need to be exported
  for introspection and implementation purposes, and are not meant
  for general consumption.

This document presents the best practices for allocating memory
in the kernel, from a security perspective.

## Permanent allocations

The kernel sometimes needs to provide persistent allocations that depend on
parameters that aren't compile-time constants but will not vary over time (NCPU
is an obvious example here).

The zone subsystem provides a `zalloc_permanent*` family of functions that help
allocate memory in such a fashion in a very compact way.

Unlike the typical zone allocators, this allows for arbitrary sizes, in a
similar fashion to `kalloc`. These functions never fail (if the allocation
fails, the kernel panics), and always return zeroed memory. Trying to free
these allocations results in a kernel panic.

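To make the contract above concrete, here is a minimal user-space model of a permanent allocator: a bump allocator over a static arena that hands out zeroed memory, aborts instead of reporting failure, and has no free path. The names `perm_alloc_model`, `PERM_ARENA_SIZE`, `cpu_stats_model` and `make_cpu_stats_table` are hypothetical; this sketch only mirrors the documented semantics and is not how `zalloc_permanent*` is implemented or called in XNU.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical user-space model of a "permanent" allocator.
// It is NOT the XNU implementation; it only mirrors the documented contract.
constexpr std::size_t PERM_ARENA_SIZE = 4096;

// Static storage is zero-initialized, and no byte is ever handed out twice.
alignas(std::max_align_t) static unsigned char perm_arena[PERM_ARENA_SIZE];
static std::size_t perm_used = 0;

// Returns zeroed memory of arbitrary size and never returns nullptr.
// `align` must be a power of two.
static void *perm_alloc_model(std::size_t size, std::size_t align)
{
    std::size_t start = (perm_used + align - 1) & ~(align - 1);
    if (start > PERM_ARENA_SIZE || size > PERM_ARENA_SIZE - start) {
        std::abort();           // models "the kernel will panic"
    }
    perm_used = start + size;
    return &perm_arena[start];  // there is deliberately no matching free
}

// Example: a table sized by a boot-time parameter (hypothetical `ncpu`).
struct cpu_stats_model {
    std::uint64_t interrupts;
    std::uint64_t context_switches;
};

static cpu_stats_model *make_cpu_stats_table(unsigned ncpu)
{
    return static_cast<cpu_stats_model *>(
        perm_alloc_model(ncpu * sizeof(cpu_stats_model), alignof(cpu_stats_model)));
}
```

Because nothing is ever freed, the model needs no per-allocation metadata, which is what makes this style of allocation compact.
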
## Allocation flags

Most `zalloc` or `kalloc` functions take `zalloc_flags_t` typed flags.
When flags are expected, exactly one of `Z_WAITOK`, `Z_NOWAIT` or `Z_NOPAGEWAIT`
is to be passed:

- `Z_WAITOK` means that the zone allocator can wait and block;
- `Z_NOWAIT` requires fully non-blocking behavior, which makes it usable
  for allocations under spinlock and in other preemption-disabled contexts;
- `Z_NOPAGEWAIT` allows the allocator to block (typically on mutexes),
  but not to wait for available pages if there are none. This is only useful
  for the buffer cache, and most clients should use either `Z_NOWAIT` or `Z_WAITOK`.

Other important flags:

- `Z_ZERO` if zeroed memory is expected (nowadays most allocations are
  zeroed regardless, but it's always clearer to specify it). Note that it is
  often more efficient than calling `bzero`, as the allocator tends to maintain
  freed memory as zeroed in the first place;
- `Z_NOFAIL` if the caller knows the allocation can't fail: allocations that are
  made with `Z_WAITOK` from regular (non-exhaustible) zones, or from `kalloc*`
  interfaces with a size smaller than `KALLOC_SAFE_ALLOC_SIZE`,
  will never fail (the kernel will instead panic if no memory can be found).
  `Z_NOFAIL` can be used to denote that the caller knows about this.
  If `Z_NOFAIL` is used incorrectly, the zone allocator will panic at runtime.

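The two most common call-site patterns implied by these flags can be sketched in user space as follows. Everything in the `alloc_model` namespace is a stand-in: the flag values are placeholders (the real `zalloc_flags_t` values are defined in `<kern/zalloc.h>` and may differ), and `kalloc_data_stub`/`kfree_data_stub` are hypothetical wrappers over the C library, not the kernel functions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace alloc_model {

// Placeholder flag values for this sketch only; not the kernel's values.
constexpr unsigned Z_WAITOK = 0x1;
constexpr unsigned Z_NOWAIT = 0x2;
constexpr unsigned Z_ZERO   = 0x4;
constexpr unsigned Z_NOFAIL = 0x8;

// Hypothetical user-space stand-in for `kalloc_data(size, flags)`.
static void *kalloc_data_stub(std::size_t size, unsigned flags)
{
    void *p = (flags & Z_ZERO) ? std::calloc(1, size) : std::malloc(size);
    if (p == nullptr && (flags & Z_NOFAIL)) {
        std::abort();   // models the panic described above
    }
    return p;
}

// Hypothetical stand-in for `kfree_data(ptr, size)`.
static void kfree_data_stub(void *p, std::size_t /* size */)
{
    std::free(p);
}

// Blocking context: the caller may sleep, so it states Z_NOFAIL and does not
// need a nullptr check.
static char *make_name_buffer(std::size_t len)
{
    return static_cast<char *>(
        kalloc_data_stub(len, Z_WAITOK | Z_ZERO | Z_NOFAIL));
}

// Non-blocking context (for example under a spinlock): the allocation may
// fail, so the caller must handle nullptr.
static bool try_copy_name(const char *src, std::size_t len, char **out)
{
    char *dst = static_cast<char *>(kalloc_data_stub(len, Z_NOWAIT | Z_ZERO));
    if (dst == nullptr) {
        return false;
    }
    std::memcpy(dst, src, len);
    *out = dst;
    return true;
}

} // namespace alloc_model
```
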
## Zones (`zalloc`)

The first blessed way to allocate memory in the kernel is by using zones.
Zones are mostly meant to be used in Core XNU and some "BSD" kexts.

It is generally recommended to create zones early and to store the `zone_t`
pointer in read-only memory (using `SECURITY_READ_ONLY_LATE` storage).

Zones are more feature-rich than `kalloc`, and some features can only be
used when making a zone:

- the object type being allocated requires extremely strong segregation
  from other types (typically `zone_require` will be used with this zone),
- the object type implements some form of security boundary and wants to adopt
  the read-only allocator (see `ZC_READONLY`),
- the allocation must be per-cpu,
- ...

In the vast majority of cases however, using `kalloc_type` (or `IOMallocType`)
is preferred.

## The Typed allocator

Ignoring VM allocations (or wrappers like `IOMemoryDescriptor`), the only
blessed way to allocate typed memory in XNU is the typed allocator
`kalloc_type` or one of its variants (like IOKit's `IOMallocType`), and the
only blessed way to allocate untyped memory that doesn't contain pointers is
the data API `kalloc_data` or one of its variants (like IOKit's
`IOMallocData`). However, this comes with additional requirements.

Note that at this time, those interfaces aren't exported to third parties,
as their ABI has not yet converged.

### A word about types

The typed allocators assume that allocated types fit a very precise model.
If the allocations you perform do not fit the model, then your types
must be restructured to fit, for security reasons.

A general theme is the separation of data/primitive types from pointers,
as attackers tend to use data/pointer overlaps to carry out their exploits.

The typed allocators use compiler support to infer signatures
of the types being allocated. Because some scalars actually represent
kernel pointers (like `vm_offset_t`, `vm_address_t`, `uintptr_t`, ...),
types or structure members can be decorated with `__kernel_ptr_semantics`
to denote when a data-looking type is actually a pointer.

Do note that `__kernel_data_semantics` and `__kernel_dual_semantics`
are also provided but should rarely be used.

#### Fixed-sized types

The first case is fixed-sized types: typically a `struct`, `union`
or C++ `class`. Fixed-sized types must follow certain rules:

- types should be small enough to fit in the zone allocator, that is,
  smaller than `KALLOC_SAFE_ALLOC_SIZE`. When this is not the case,
  we have typically found that the type contains a large array of data
  or some buffer; the solution is to outline this allocation.
  Kernel extensions must define `KALLOC_TYPE_STRICT_SIZE_CHECK` to catch
  misuse of `kalloc_type()` relative to size at compile time; this is the default in XNU.
- for union types, data/pointer overlaps should be avoided if possible.
  When this isn't possible, a zone should be considered.

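As an illustration of "outlining" a buffer, the sketch below contrasts a type that embeds a large data buffer with a restructured version that keeps only pointers and bookkeeping. The type names (`socket_model`, `session_before`, `session_after`) and the 64KB buffer size are hypothetical and chosen only for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical types, for illustration only.
struct socket_model;   // opaque; stands for any kernel object

// Before: a pointer and a large inline data buffer share one allocation, and
// the type is far larger than a zone element should be.
struct session_before {
    socket_model *owner;
    std::uint8_t  payload[64 * 1024];
};

// After: the buffer is outlined. The fixed-sized type keeps the pointers and
// the bookkeeping; the bytes live in a separate data-only allocation.
struct session_after {
    socket_model *owner;
    std::uint8_t *payload;       // data-only buffer, allocated separately
    std::size_t   payload_size;  // size to pass back when freeing the buffer
};

static_assert(sizeof(session_before) > 64 * 1024,
    "the inline form is dominated by its buffer");
static_assert(sizeof(session_after) <= 3 * sizeof(void *),
    "the outlined form stays small");
```

In kernel code, the restructured type would be a candidate for `kalloc_type`, and its buffer for `kalloc_data`.
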
#### Variable-sized types

These come in two variants: arrays, and arrays prefixed with a header.
Any other case must be reduced to those, possibly by making more allocations.

An array is simply an allocation of several fixed-sized types,
and the rules of "Fixed-sized types" above apply to them.

The following rules are expected when dealing with variable-sized allocations:

- variable-sized allocations should have a single owner and not be refcounted;
- under the header-prefixed form, if the header contains pointers,
  then the array element type **must not** be only data.

If those rules can't be followed, then the allocation must be split, with
the header becoming a fixed-sized type that is the single owner
of an array.

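The split described in the last paragraph can be sketched as follows. Suppose a header holding a pointer was originally followed inline by an array of plain integers, which breaks the second rule. In the restructured form the header is a fixed-sized type that solely owns a separate data-only array. `table_model`, `table_create` and `table_destroy` are hypothetical names, and `calloc`/`free` stand in for the kernel's typed and data allocators.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical restructured type: pointers stay in the fixed-sized header,
// the data-only elements move to their own allocation.
struct table_model {
    table_model   *next;    // pointer member: belongs in the typed header
    std::uint32_t  count;   // number of elements in `values`
    std::uint32_t *values;  // single-owner, data-only array
};

// Allocates the header and its array separately (expects count > 0).
// In kernel code the header would come from the typed allocator and the
// array from the data API; calloc is only a user-space stand-in.
static table_model *table_create(std::uint32_t count)
{
    auto *t = static_cast<table_model *>(std::calloc(1, sizeof(table_model)));
    if (t == nullptr) {
        return nullptr;
    }
    t->values = static_cast<std::uint32_t *>(
        std::calloc(count, sizeof(std::uint32_t)));
    if (t->values == nullptr) {
        std::free(t);
        return nullptr;
    }
    t->count = count;
    return t;
}

// The header is the only owner, so it frees the array before itself.
static void table_destroy(table_model *t)
{
    std::free(t->values);
    std::free(t);
}
```
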
#### Untyped memory

When allocating untyped memory with the data APIs, ensure that it doesn't
contain kernel pointers. If your untyped allocation contains kernel pointers,
consider splitting the allocation into two: one part that is typed and contains
the kernel pointers, and a second that is untyped and data-only.

### API surface

<table>
  <tr>
    <th>Interface</th>
    <th>API</th>
    <th>Notes</th>
  </tr>
  <tr>
    <td>Data/Primitive types</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_data(size, flags)</tt><br/>
      <tt>krealloc_data(ptr, old_size, new_size, flags)</tt><br/>
      <tt>kfree_data(ptr, size)</tt><br/>
      <tt>kfree_data_counted_by(ptr_var, count_var)</tt><br/>
      <tt>kfree_data_sized_by(ptr_var, byte_count_var)</tt><br/>
      <tt>kfree_data_addr(ptr)</tt>
      </p>
      <p>
      <b>IOKit untyped variant (returns <tt>void *</tt>)</b>:<br/>
      <tt>IOMallocData(size)</tt><br/>
      <tt>IOMallocZeroData(size)</tt><br/>
      <tt>IOFreeData(ptr, size)</tt>
      </p>
      <p>
      <b>IOKit typed variant (returns <tt>type_t *</tt>)</b>:<br/>
      <tt>IONewData(type_t, count)</tt><br/>
      <tt>IONewZeroData(type_t, count)</tt><br/>
      <tt>IODeleteData(ptr, type_t, count)</tt>
      </p>
    </td>
    <td>This should be used only when the allocated type contains no kernel pointers</td>
  </tr>
  <tr>
    <td>Fixed-sized type</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_type(type_t, flags)</tt><br/>
      <tt>kfree_type(type_t, ptr)</tt>
      </p>
      <p>
      <b>IOKit:</b><br/>
      <tt>IOMallocType(type_t)</tt><br/>
      <tt>IOFreeType(ptr, type_t)</tt>
      </p>
    </td>
    <td>
      <p>
      Note that it is absolutely OK to use this variant
      for data/primitive types, it will be redirected to <tt>kalloc_data</tt>
      (or <tt>IOMallocData</tt>).
      </p>
    </td>
  </tr>
  <tr>
    <td>Arrays of fixed-sized type</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_type(type_t, count, flags)</tt><br/>
      <tt>kfree_type(type_t, count, ptr)</tt>
      </p>
      <p>
      <b>IOKit:</b><br/>
      <tt>IONew(type_t, count)</tt><br/>
      <tt>IONewZero(type_t, count)</tt><br/>
      <tt>IODelete(ptr, type_t, count)</tt>
      </p>
    </td>
    <td>
      <p>
      <tt>kalloc_type(type_t, ...)</tt> (resp. <tt>IONew(type_t, 1)</tt>)
      <b>isn't</b> equivalent to <tt>kalloc_type(type_t, 1, ...)</tt>
      (resp. <tt>IOMallocType(type_t)</tt>). Mixing and matching these interfaces
      will result in panics.
      </p>
      <p>
      Note that it is absolutely OK to use this variant
      for data/primitive types, it will be redirected to <tt>kalloc_data</tt>.
      </p>
    </td>
  </tr>
  <tr>
    <td>Header-prefixed arrays of fixed-sized type</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_type(hdr_type_t, type_t, count, flags)</tt><br/>
      <tt>kfree_type(hdr_type_t, type_t, count, ptr)</tt>
      </p>
      <p>
      <b>IOKit:</b><br/>
      <tt>IONew(hdr_type_t, type_t, count)</tt><br/>
      <tt>IONewZero(hdr_type_t, type_t, count)</tt><br/>
      <tt>IODelete(ptr, hdr_type_t, type_t, count)</tt>
      </p>
    </td>
    <td>
      <p>
      <tt>hdr_type_t</tt> can't contain a refcount,
      and <tt>type_t</tt> can't be a primitive type.
      </p>
    </td>
  </tr>
</table>

`kfree_data_counted_by` and `kfree_data_sized_by` are used when working with
`-fbounds-safety` and pointers with `__counted_by` and `__sized_by` modifiers,
respectively. They expect both their pointer and size arguments to be
modifiable, and the pointer and size will be set to 0 together, in accordance
with `-fbounds-safety` semantics. Please note that arguments are evaluated
multiple times. When `-fbounds-safety` is enabled, the compiler can help ensure
correct usage of these macros; with `-fbounds-safety` disabled, engineers are on
their own to ensure proper usage.

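The behavior described above (modifiable arguments, pointer and count reset together, repeated evaluation) can be modeled in user space as shown below. `FREE_COUNTED_BY_MODEL`, `sample_buffer` and its helpers are hypothetical; the macro is not the XNU definition of `kfree_data_counted_by`, and `std::free` stands in for the kernel free path.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical model of a "free and reset together" macro. Both arguments
// must be modifiable lvalues, and `ptr_var` is evaluated more than once.
#define FREE_COUNTED_BY_MODEL(ptr_var, count_var) do { \
    std::free(ptr_var);                                \
    (ptr_var) = nullptr;                               \
    (count_var) = 0;                                   \
} while (0)

struct sample_buffer {
    int        *samples;  // would carry __counted_by(count) under -fbounds-safety
    std::size_t count;
};

// Allocates `count` zeroed samples; on failure the buffer is left empty.
static bool sample_buffer_init(sample_buffer *b, std::size_t count)
{
    b->samples = static_cast<int *>(std::calloc(count, sizeof(int)));
    b->count = (b->samples != nullptr) ? count : 0;
    return b->samples != nullptr;
}

static void sample_buffer_reset(sample_buffer *b)
{
    // Pass plain lvalues. An argument with side effects, such as
    // `bufs[i++].samples`, would have them applied more than once.
    FREE_COUNTED_BY_MODEL(b->samples, b->count);
}
```

Resetting both fields in one place keeps the pointer and its count from drifting apart, which is the invariant that bounds annotations rely on.
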
## C++ classes and operator new

This section covers how typed allocators should be adopted to use
`operator new/delete` in C++. For C++ classes, the approach required
differs based on whether the class inherits from `OSObject` or not.

Most, if not all, C++ objects used in conjunction with IOKit APIs
should probably use `OSObject` as a base class. C++ operators
and non-POD types should be used sparingly.

### `OSObject` subclasses

All subclasses of `OSObject` must use one of IOKit's `OSDeclare*` macros
together with the matching `OSDefine*` macro. As part of those, an `operator new` and
`operator delete` are injected that force objects to enroll into `kalloc_type`.

Note that idiomatic IOKit is supposed to use `OSTypeAlloc(Class)`.

### Other classes

Unlike `OSObject` subclasses, regular C++ classes must adopt typed allocators
manually. If your struct or class is POD (Plain Old Data), then replacing usage of
`new/delete` (resp. `new[]/delete[]`) with `IOMallocType/IOFreeType` (resp.
`IONew/IODelete`) is safe.

However, if your class/struct has non-default constructors or destructors, or
has members that do, you will need to manually enroll it into `kalloc_type`.
This can be accomplished through one of the following approaches, which let you
continue to use C++'s `new` and `delete` keywords to allocate/deallocate instances.

The first approach is to subclass the `IOTypedOperatorsMixin` struct. This will
adopt typed allocators for your class/struct by providing the appropriate
implementations for `operator new/delete`:

```cpp
struct Type : public IOTypedOperatorsMixin<Type> {
    ...
};
```

Alternatively, if you cannot use the mixin approach, you can use the
`IOOverrideTypedOperators` macro to override `operator new/delete`
within your class/struct declaration:

```cpp
struct Type {
    IOOverrideTypedOperators(Type);
    ...
};
```

Finally, if you need to decouple the declaration of the operators from
their implementation, you can use `IODeclareTypedOperators` paired with
`IODefineTypedOperators`, to declare the operators within your class/struct
declaration and then provide their definition out of line:

```cpp
// declaration
struct Type {
    IODeclareTypedOperators(Type);
    ...
};

// definition
IODefineTypedOperators(Type)
```

When a class/struct adopts typed allocators through one of those approaches,
all its subclasses must also explicitly adopt typed allocators. It is not
sufficient for a common parent within the class hierarchy to enroll in order to
automatically provide the implementation of the operators for all of its children:
each and every subclass in the class hierarchy must also explicitly do the same.

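One plausible way to see why a subclass cannot rely on its parent's enrollment is that a class-scope `operator new` is inherited, so an unenrolled child would be served by the operators written for its parent's type. The user-space model below makes that visible with a per-type counter. `TypedOperatorsModel` and the three classes are hypothetical and are not IOKit's `IOTypedOperatorsMixin`; the real mechanism may differ.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical model of a typed-operators mixin. Each instantiation counts
// the allocations made for "its" T, standing in for a per-type bucket.
template <typename T>
struct TypedOperatorsModel {
    static int &allocations()
    {
        static int n = 0;
        return n;
    }

    static void *operator new(std::size_t size)
    {
        ++allocations();
        void *p = std::calloc(1, size);
        if (p == nullptr) {
            std::abort();
        }
        return p;
    }

    static void operator delete(void *p) { std::free(p); }
};

struct Parent : TypedOperatorsModel<Parent> {
    int id = 0;
};

// Does not enroll: it silently inherits Parent's operators, so its
// allocations are attributed to Parent's bucket.
struct LazyChild : Parent {
    void *extra = nullptr;
};

// Enrolls explicitly: its own operators hide the inherited ones.
struct GoodChild : Parent {
    void *extra = nullptr;

    static void *operator new(std::size_t size)
    {
        return TypedOperatorsModel<GoodChild>::operator new(size);
    }
    static void operator delete(void *p)
    {
        TypedOperatorsModel<GoodChild>::operator delete(p);
    }
};
```

In this model, `new LazyChild` increments the counter for `Parent`, while `new GoodChild` increments its own.
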
### The case of `operator new[]`

The ABI of `operator new[]` is unfortunate, as it denormalizes
data that we prefer to be known by the owning object
(the element sizes and array element count).

It also makes those allocations ripe for abuse in an adversarial
context, as this denormalized information is at the beginning
of the structure, making it relatively easy to attack with
out-of-bounds bugs.

For this reason, the default variants of the mixin and the macros
presented above will delete the implementation of `operator new[]`
from the class they are applied to.

However, if array operators must be used, you can adopt the typed
allocators on your class by using the appropriate variant,
which explicitly implements support for array operators:

- `IOTypedOperatorsMixinSupportingArrayOperators`
- `IOOverrideTypedOperatorsSupportingArrayOperators`
- `IO{Declare, Define}TypedOperatorsSupportingArrayOperators`

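The hidden bookkeeping mentioned at the start of this section can be observed in plain C++. On common C++ ABIs, `new[]` on a type with a non-trivial destructor typically requests extra bytes in front of the first element to record the element count. The sketch below records the size that `operator new[]` receives and also shows what deleting the array operator looks like. `Tracked` and `NoArrays` are hypothetical types, not IOKit classes.

```cpp
#include <cstddef>
#include <cstdlib>

// Records the byte count that `new[]` asks for, to expose the bookkeeping.
struct Tracked {
    int value = 0;
    ~Tracked() {}   // non-trivial destructor: delete[] needs the element count

    static std::size_t &last_array_request()
    {
        static std::size_t bytes = 0;
        return bytes;
    }

    static void *operator new[](std::size_t size)
    {
        last_array_request() = size;   // element storage plus any ABI overhead
        void *p = std::malloc(size);
        if (p == nullptr) {
            std::abort();
        }
        return p;
    }

    static void operator delete[](void *p) { std::free(p); }
};

// A class that opts out of array allocation, in the spirit of the default
// variants described above: `new NoArrays[4]` is rejected at compile time.
struct NoArrays {
    int value = 0;
    static void *operator new[](std::size_t) = delete;
};
```

After `new Tracked[4]`, `Tracked::last_array_request()` is at least `4 * sizeof(Tracked)`, and on ABIs that store an array cookie it is strictly larger.
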
### Scalar types

The only accepted ways of using `operator new/delete` and their variants are the ones
described above. You should never use the operators on scalar types. Instead, you
should use the appropriate typed allocator API based on the semantics of the memory
being allocated (i.e. `IOMallocData` for data-only buffers, and `IOMallocType`/`IONew`
for any other type).

### Wrapping C++ type allocation in container OSObjects

The blessed way of wrapping and passing a C++ type allocation for use in the
libkern collections is using `OSValueObject`. Please do not use `OSData` for this
purpose, as its backing store should not contain kernel pointers.

`OSValueObject<T>` allows you to safely use an `OSData`-like API surface
wrapping a structure of type `T`. For each unique `T` being used,
`OSValueObject<T>` must be instantiated in a module of your kernel extension,
using `OSDefineValueObjectForDependentType(T);`.