# String handling in xnu

xnu implements most POSIX C string functions, including the inherited subset of
standard C string functions. Unfortunately, poor design choices have made many
of these functions, including the more modern `strl` functions, confusing or
unsafe. In addition, the advent of -fbounds-safety support in xnu is forcing
some string handling practices to be revisited. This document explains the
failings of POSIX C string functions, xnu's `strbuf` functions, and their
intersection with the -fbounds-safety C extension.

## The short-form guidance

* Use `strbuf*` when you have the length for all the strings;
* use `strl*` when you have the length of _one_ string, and the other is
  guaranteed to be NUL-terminated;
* use `str*` when you don't have the length for any of the strings, and they
  are all guaranteed to be NUL-terminated;
* stop using `strn*` functions.

## Replacing `strncmp`

`strncmp` is always wrong with -fbounds-safety, and it's unavailable as a
result. Given `strncmp(first, secnd, n)`, you need to know the types of `first`
and `secnd` to pick a replacement. Choose according to this table, where `n1`
and `n2` are the sizes (bounds) of `first` and `secnd`, respectively:

| `strncmp(first, secnd, n)` | `__null_terminated first`   | `__indexable first`               |
| -------------------------- | --------------------------- | --------------------------------- |
| `__null_terminated secnd`  | n/a                         | `strlcmp(first, secnd, n1)`       |
| `__indexable secnd`        | `strlcmp(secnd, first, n2)` | `strbufcmp(first, n1, secnd, n2)` |

Using `strncmp` with two NUL-terminated strings is uncommon, and it has no
direct replacement. The first person who needs to enable -fbounds-safety in a
file that does this might need to write that string function.

If you try to use `strlcmp` and you get a diagnostic like this:

> passing 'const char *__indexable' to parameter of incompatible type
> 'const char *__null_terminated' is an unsafe operation ...

then you might need to swap the two string arguments.
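
The following is a minimal userspace sketch of what the two replacements in the table compute. It is not xnu's implementation. `sketch_strbufcmp` and `sketch_strlcmp` are hypothetical names, and the -fbounds-safety annotations are left out so that it builds as plain C. It assumes that a bounded string ends at its first NUL or at its bound, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not xnu's code: compare `a` (at most `alen` bytes)
 * with `b` (at most `blen` bytes) like strcmp. Each string ends at its first
 * NUL or at its bound, whichever comes first, and is never read past it.
 */
int sketch_strbufcmp(const char *a, size_t alen, const char *b, size_t blen) {
    for (size_t i = 0;; i++) {
        /* Past its bound, a string behaves as if it had a NUL there. */
        unsigned char ca = i < alen ? (unsigned char)a[i] : 0;
        unsigned char cb = i < blen ? (unsigned char)b[i] : 0;

        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
        if (ca == 0) {
            return 0; /* both strings ended at the same position */
        }
    }
}

/*
 * Hypothetical sketch of strlcmp(a, b, alen): `a` is the bounded buffer and
 * `b` is NUL-terminated, so b's own NUL stops the scan and b needs no bound.
 */
int sketch_strlcmp(const char *a, const char *b, size_t alen) {
    return sketch_strbufcmp(a, alen, b, SIZE_MAX);
}

void sketch_cmp_demo(void) {
    char buf[4] = { 'a', 'b', 'c', 'd' }; /* full buffer, no NUL */

    /* The buffer goes first; only its 4 bytes are ever read. */
    assert(sketch_strlcmp(buf, "abcd", sizeof(buf)) == 0);
    assert(sketch_strlcmp(buf, "abc", sizeof(buf)) > 0);
    assert(sketch_strbufcmp(buf, sizeof(buf), "abcdef", 6) < 0);
}
```

`sketch_strlcmp` bounds only its first argument. If you pass the unterminated buffer second, the scan can run off its end, and that is the mistake the diagnostic above catches.
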
`strlcmp` is sensitive to the argument order: just like for `strlcpy`, the
indexable string goes first. Swapping the arguments also flips the sign of the
result, so negate it if you use the result for ordering and not only for
equality.

# The problems with string functions

POSIX/BSD string handling functions come in many variants:

* `str` functions (`strlen`, `strcat`, etc.), unsafe for writing;
* `strn` functions (`strnlen`, `strncat`, etc.), unsafe for writing;
* `strl` functions (`strlcpy`, `strlcat`, etc.), safe but easily misunderstood.

`str` functions for writing (`strcpy`, `strcat`, etc.) are **all** unsafe
because they ignore the bounds of the output buffer. Most or all of these
functions have been deprecated or outright removed from xnu. You should never
use `str` functions to write to strings. Functions that only read strings
(`strlen`, `strcmp`, `strchr`, etc.) are generally safe. Nobody is confused
about whether their input must be NUL-terminated, and they cannot write out of
bounds because they don't write at all.

`strn` functions for writing (`strncpy`, `strncat`, etc.) are **all** unsafe.
`strncpy` doesn't NUL-terminate the output buffer when the input is at least as
long as the given size. `strncat` doesn't accept a length for the output
buffer: its size argument only bounds how much of the input is appended.
**All** new string buffers should include space for a NUL terminator. `strn`
functions for reading (`strncmp`, `strnlen`) are _generally_ safe, but
`strncmp` can cause confusion over which string is bound by the given size. In
extreme cases, this can create information disclosure bugs or stability
issues.

`strl` functions, from OpenBSD, only come in writing variants, and they always
NUL-terminate their output. This makes the writing part safe. (xnu adds `strl`
comparison functions, which do no writing and are also safe.) However, these
functions assume the output pointer is a buffer and the input is a
NUL-terminated string. Because they coexist with `strn` functions that make no
such assumption, many users haven't fully adopted this mental model.
For instance, the following code is buggy:

```c
char output[4];
char input[8] = "abcdefgh"; /* not NUL-terminated */
strlcpy(output, input, sizeof(output));
```

`strlcpy` returns the length of the input string. In xnu's implementation, it
literally calls `strlen(input)`. Even though only 3 characters are written to
`output` (plus a NUL), `input` is read until a NUL character is reached, past
the end of its 8 bytes. This out-of-bounds read is always a memory disclosure
concern, and in some cases it can also lead to stability issues.

# Changes with -fbounds-safety

When -fbounds-safety is enabled, character buffers and NUL-terminated strings
are two distinct types, and they do not implicitly convert to each other. This
prevents confusing the two in the way that is problematic with `strlcpy`, for
instance. However, it raises new questions:

* What is the correct way to transform a character buffer into a NUL-terminated
  string?
* When -fbounds-safety flags the use of a string function as improper, what is
  the solution?

The most common use of character buffers is to build a string and then pass it
without bounds, as a NUL-terminated string, to downstream users.
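
In plain C, that pattern usually means filling a fixed-size buffer and handing it off as a `const char *`. The sketch below applies the pattern to the buggy `strlcpy` example above. It uses `sketch_strbufcpy`, a hypothetical userspace stand-in for the behavior this document attributes to `strbufcpy(a, alen, b, blen)`, read as follows: copy at most `alen - 1` bytes, stop at the source's NUL or bound, always NUL-terminate, and return NULL when `alen` is 0. It illustrates those assumptions and is not xnu's implementation. Because the input's size is passed in, the input is never read past its 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strbufcpy(a, alen, b, blen), not xnu's code.
 * Returns `a` as the finished NUL-terminated string, or NULL if alen is 0.
 */
const char *sketch_strbufcpy(char *a, size_t alen, const char *b, size_t blen) {
    size_t n = 0;

    if (alen == 0) {
        return NULL; /* no room even for a NUL terminator */
    }
    /* Source length: stop at b's first NUL or at its bound, like strnlen. */
    while (n < blen && b[n] != '\0') {
        n++;
    }
    if (n > alen - 1) {
        n = alen - 1; /* truncate, keeping room for the NUL */
    }
    memcpy(a, b, n);
    a[n] = '\0';
    return a;
}

void sketch_copy_demo(void) {
    char output[4];
    char input[8] = "abcdefgh"; /* not NUL-terminated */

    /* Build into the buffer, then keep only the finished const view. */
    const char *finished = sketch_strbufcpy(output, sizeof(output),
        input, sizeof(input));

    assert(finished == output);
    assert(strcmp(finished, "abc") == 0); /* 3 characters plus a NUL */
}
```
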
-fbounds-safety and xnu enshrine this build-then-pass practice with the
following additions:

* `tsnprintf`: like `snprintf`, but it returns a NUL-terminated string;
* `strbuf` functions, explicitly accepting character buffers and a distinct
  count for each:
  * `strbuflen(buffer, length)`: like `strnlen`;
  * `strbufcmp(a, alen, b, blen)`: like `strcmp`;
  * `strbufcasecmp(a, alen, b, blen)`: like `strcasecmp`;
  * `strbufcpy(a, alen, b, blen)`: like `strlcpy` but returns `a` as a
    NUL-terminated string;
  * `strbufcat(a, alen, b, blen)`: like `strlcat` but returns `a` as a
    NUL-terminated string;
* `strl` (new) functions, accepting _one_ character buffer of a known size and
  _one_ NUL-terminated string:
  * `strlcmp(a, b, alen)`: like `strcmp`;
  * `strlcasecmp(a, b, alen)`: like `strcasecmp`.

`strbuf` functions additionally all have overloads accepting character arrays
in lieu of a pointer+length pair: `strbuflen(array)`, `strbufcmp(a, b)`,
`strbufcasecmp(a, b)`, `strbufcpy(a, b)`, `strbufcat(a, b)`.

If the destination array of `strbufcpy` or `strbufcat` has a size of 0, they
return NULL without doing anything else. Otherwise, the destination is always
NUL-terminated and returned as a NUL-terminated string pointer.

While you are modifying a string, you should reference its data as some flavor
of indexable pointer, and convert it to a NUL-terminated string only once
you're done. NUL-terminated character pointers are generally not suitable for
modifications because their bounds are determined by their contents.
Overwriting any NUL character found through a `__null_terminated` pointer
access results in a trap. For instance:

```c
void my_string_consuming_func(const char *);

// lots of __unsafe!
char *__null_terminated my_string = __unsafe_forge_null_terminated(char *,
    kalloc_data(my_string_size, Z_WAITOK));
memcpy(
    __unsafe_forge_bidi_indexable(void *, my_string, my_string_size),
    my_data,
    my_string_size);
my_string_consuming_func(my_string);
```

This code converts the string pointer to a NUL-terminated string too early,
while it's still being modified. Keeping `my_string` a `__null_terminated`
pointer while it's being modified requires more forging. That creates more
opportunities to introduce errors and is less ergonomic. Consider this instead:

```c
void my_string_consuming_func(const char *);

char *my_buffer = kalloc_data(my_string_size, Z_WAITOK);
const char *__null_terminated finished_string =
    strbufcpy(my_buffer, my_string_size, my_data, my_string_size);
my_string_consuming_func(finished_string);
```

This example has two views of the same data: `my_buffer` (through which the
string is being modified) and `finished_string` (which is `const` and
NUL-terminated). Using `my_buffer` as an indexable pointer allows you to modify
it ergonomically and, importantly, without forging. You turn it into a
NUL-terminated string at the same time you turn it into a `const` reference,
signalling that you're done making changes.

With -fbounds-safety enabled, you should structure the final operation that
modifies a character array so that it gives you a NUL-terminated view of it.
For instance, this plain C code:

```c
char thread_name[MAXTHREADNAMESIZE];
(void) snprintf(thread_name, sizeof(thread_name),
    "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

becomes:

```c
char thread_name_buf[MAXTHREADNAMESIZE];
const char *__null_terminated thread_name;
thread_name = tsnprintf(thread_name_buf, sizeof(thread_name_buf),
    "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

`tsnprintf` and `strbuf` functions return a `__null_terminated` pointer for
convenience, but not every use case is resolved by a single call to
`tsnprintf` or `strbufcpy`. As a quick reference, with -fbounds-safety
enabled, you can use `__unsafe_null_terminated_from_indexable(p_start, p_nul)`
to convert a character array to a `__null_terminated` string when building it
takes more than one operation. (`p_start` is a pointer to the first character,
and `p_nul` is a pointer to the NUL character in that string.) For instance, if
you build a string with successive calls to `scnprintf`, you would use
`__unsafe_null_terminated_from_indexable` at the end of the sequence to get your
NUL-terminated string pointer.

Occasionally, you need to turn a NUL-terminated string back into a character
buffer, usually to interoperate with copy APIs that need a pointer and a byte
count. When possible, prefer APIs that copy from NUL-terminated pointers (like
`strlcpy`). Otherwise, convert the NUL-terminated string to an indexable buffer
using `__null_terminated_to_indexable` (if you don't need the NUL terminator to
be in bounds of the result pointer) or `__unsafe_null_terminated_to_indexable`
(if you need it).
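
The successive-`scnprintf` pattern described above can be sketched in plain userspace C. This is a hypothetical illustration. `sketch_scnprintf` stands in for `scnprintf` and assumes that it returns the number of characters actually stored, excluding the NUL. `sketch_build_name` and its format string are made up for the example. Plain C has no -fbounds-safety conversions, so a comment marks the point where `__unsafe_null_terminated_from_indexable(p_start, p_nul)` would go.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for scnprintf: like snprintf, but returns how many
 * characters were actually stored in buf (excluding the NUL), not how many
 * would have been needed.
 */
int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    int ret;

    if (size == 0) {
        return 0;
    }
    va_start(ap, fmt);
    ret = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (ret < 0) {
        buf[0] = '\0'; /* encoding error: store an empty string */
        return 0;
    }
    if ((size_t)ret >= size) {
        ret = (int)(size - 1); /* output was truncated */
    }
    return ret;
}

/* Hypothetical example: build "if=<name> unit=<unit>" with two appends. */
const char *sketch_build_name(char *buf, size_t size, const char *name, int unit) {
    char *p_start = buf; /* first character of the string */
    char *p_nul = buf;   /* always points at the current NUL terminator */
    char *end = buf + size;

    assert(size > 0); /* callers must pass a non-empty buffer */
    *p_nul = '\0';
    p_nul += sketch_scnprintf(p_nul, (size_t)(end - p_nul), "if=%s", name);
    p_nul += sketch_scnprintf(p_nul, (size_t)(end - p_nul), " unit=%d", unit);
    assert(*p_nul == '\0');

    /*
     * With -fbounds-safety, this is where
     * __unsafe_null_terminated_from_indexable(p_start, p_nul) would produce
     * the __null_terminated result; plain C simply returns p_start.
     */
    return p_start;
}
```

Tracking `p_nul` through each append keeps the remaining size accurate even when an append is truncated, so `p_nul` is already the pointer the final conversion needs.
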
Also keep in mind that some code deals pervasively with buffers that have
lengths, only some of which happen to also be NUL-terminated strings. In such
code, it may simply be more convenient to keep string buffers in some flavor of
indexable pointer than to keep converting them to and from NUL-terminated
strings.

# I have a choice between `str*`, `strn*`, `strl*` and `strbuf*`. Which one do I use?

You might come across cases where functions from several families all seem to
do the trick. For instance:

```c
struct foo {
    char buf1[10];
    char buf2[16];
};

void bar(struct foo *f) {
    /* how do I test whether buf1 and buf2 contain the same string? */
    if (strcmp(f->buf1, f->buf2) == 0) { /* ... */ }
    if (strncmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strlcmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strbufcmp(f->buf1, f->buf2) == 0) { /* ... */ }
}
```

Without -fbounds-safety, all four compile, and they give the same answer as
long as both buffers are NUL-terminated within their bounds. When you enable
it, `strbufcmp` could be the only one that builds. If you don't have
-fbounds-safety to guide you to the best choice, a rule of thumb is to prefer
APIs in the following order:

1. `strbuf*` APIs;
2. `strl*` APIs;
3. `str*` APIs.

That is, to implement `bar`, you have a choice of `strcmp`, `strncmp`,
`strlcmp` and `strbufcmp`, and you should prefer `strbufcmp`.

`strn` functions are **never** recommended. You should use `strbuflen` over
`strnlen`. They do the same thing, but a separate `strbuflen` function makes
the guidance to avoid `strn` functions easier to follow. Likewise, you should
use `strbufcmp`, `strlcmp` or even `strcmp` over `strncmp`, depending on
whether you know the length of each string, of only one, or of neither.
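
The difference between these choices shows up when a buffer is completely full. The sketch below is plain userspace C, not xnu code. `sketch_strbufcmp` is a hypothetical stand-in for `strbufcmp(a, alen, b, blen)`, and `SKETCH_STRBUFCMP_ARRAYS` is a hypothetical macro that assumes the array overload simply passes each array's size. With a 10-character `buf1` and no room for a NUL, the `strncmp` call from `bar` reports a match with a longer `buf2`. Bounding each array by its own size does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of strbufcmp(a, alen, b, blen), not xnu's code. */
int sketch_strbufcmp(const char *a, size_t alen, const char *b, size_t blen) {
    for (size_t i = 0;; i++) {
        /* Past its bound, a string behaves as if it had a NUL there. */
        unsigned char ca = i < alen ? (unsigned char)a[i] : 0;
        unsigned char cb = i < blen ? (unsigned char)b[i] : 0;

        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
        if (ca == 0) {
            return 0;
        }
    }
}

/* Hypothetical array overload: only valid for real arrays, not pointers. */
#define SKETCH_STRBUFCMP_ARRAYS(a, b) \
    sketch_strbufcmp((a), sizeof(a), (b), sizeof(b))

struct foo {
    char buf1[10];
    char buf2[16];
};

void sketch_foo_demo(void) {
    /* buf1 is completely full, so it holds no NUL terminator. */
    struct foo f = { .buf1 = "0123456789", .buf2 = "0123456789abc" };

    /* Bounded by buf1's size, strncmp never looks at the rest of buf2... */
    assert(strncmp(f.buf1, f.buf2, sizeof(f.buf1)) == 0);
    /* ...while bounding each array by its own size sees that they differ. */
    assert(SKETCH_STRBUFCMP_ARRAYS(f.buf1, f.buf2) < 0);
}
```

The demo leaves out `strcmp(f.buf1, f.buf2)` on purpose: with `buf1` full, it would read past the end of `buf1`.
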