.. SPDX-License-Identifier: GPL-2.0+

======
XArray
======

:Author: Matthew Wilcox

Overview
========

The XArray is an abstract data type which behaves like a very large array
of pointers.  It meets many of the same needs as a hash or a conventional
resizable array.  Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner.  In contrast to a
resizable array, there is no need to copy data or change MMU mappings in
order to grow the array.  It is more memory-efficient, parallelisable
and cache friendly than a doubly-linked list.  It takes advantage of
RCU to perform lookups without locking.

The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well.  The XArray is optimised for small indices, but still has
good performance with large indices.  If your index can be larger than
``ULONG_MAX`` then the XArray is not the data type for you.  The most
important user of the XArray is the page cache.

Each non-``NULL`` entry in the array has three bits associated with
it called marks.  Each mark may be set or cleared independently of
the others.  You can iterate over entries which are marked.

Normal pointers may be stored in the XArray directly.  They must be 4-byte
aligned, which is true for any pointer returned from :c:func:`kmalloc` and
:c:func:`alloc_page`.  It isn't true for arbitrary user-space pointers,
nor for function pointers.  You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.

You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert each integer into an entry using :c:func:`xa_mk_value`.
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling :c:func:`xa_is_value`, and convert it back to
an integer by calling :c:func:`xa_to_value`.
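
The encoding behind value entries can be modelled in ordinary userspace C.
The sketch below is not kernel code: ``model_mk_value()``,
``model_is_value()`` and ``model_to_value()`` are hypothetical names, and
the encoding (shift the integer left by one bit and set bit 0) is assumed
to match :c:func:`xa_mk_value` at the time of writing.  Because a 4-byte
aligned pointer always has bit 0 clear, the two kinds of entry cannot be
confused, and because one bit is spent on the tag, only integers up to
``LONG_MAX`` survive the round trip::

    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>

    /* The model keeps integers and pointers in the same word. */
    static_assert(sizeof(unsigned long) == sizeof(void *),
                  "model assumes pointers fit in an unsigned long");

    /* Hypothetical model of xa_mk_value(): shift left, set bit 0. */
    static void *model_mk_value(unsigned long v)
    {
        assert(v <= LONG_MAX);      /* the top bit would be shifted out */
        return (void *)((v << 1) | 1);
    }

    /* Hypothetical model of xa_is_value(): bit 0 marks a value entry. */
    static bool model_is_value(const void *entry)
    {
        return (unsigned long)entry & 1;
    }

    /* Hypothetical model of xa_to_value(): undo the shift. */
    static unsigned long model_to_value(const void *entry)
    {
        return (unsigned long)entry >> 1;
    }

For instance, ``model_to_value(model_mk_value(42))`` gives back 42, while
any 4-byte aligned pointer fails ``model_is_value()``.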

Some users want to store tagged pointers instead of using the marks
described above.  They can call :c:func:`xa_tag_pointer` to create an
entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
the tag of an entry.  Tagged pointers use the same bits that are used
to distinguish value entries from normal pointers, so each user must
decide whether they want to store value entries or tagged pointers in
any particular XArray.
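
The arithmetic behind tagged pointers can be modelled the same way.  In
this userspace sketch the names are hypothetical; the assumption, based on
``include/linux/xarray.h`` at the time of writing, is that the tag occupies
the two low bits and that only tags 0, 1 and 3 are available, since the bit
pattern ``10`` is used for internal entries.  The last function shows the
conflict described above: tags 1 and 3 set bit 0, exactly like a value
entry::

    #include <assert.h>
    #include <stdbool.h>

    static_assert(sizeof(unsigned long) == sizeof(void *),
                  "model assumes pointers fit in an unsigned long");

    #define MODEL_TAG_MASK 3UL

    /* Hypothetical model of xa_tag_pointer(). */
    static void *model_tag_pointer(void *p, unsigned long tag)
    {
        assert(((unsigned long)p & MODEL_TAG_MASK) == 0); /* 4-byte aligned */
        assert(tag == 0 || tag == 1 || tag == 3);
        return (void *)((unsigned long)p | tag);
    }

    /* Hypothetical model of xa_untag_pointer(). */
    static void *model_untag_pointer(void *entry)
    {
        return (void *)((unsigned long)entry & ~MODEL_TAG_MASK);
    }

    /* Hypothetical model of xa_pointer_tag(). */
    static unsigned long model_pointer_tag(void *entry)
    {
        return (unsigned long)entry & MODEL_TAG_MASK;
    }

    /* The value-entry test (bit 0 set); odd tags trip it too. */
    static bool model_looks_like_value(const void *entry)
    {
        return (unsigned long)entry & 1;
    }

This is why a given XArray should hold either value entries or tagged
pointers, but not both.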

The XArray does not support storing :c:func:`IS_ERR` pointers as some
conflict with value entries or internal entries.

An unusual feature of the XArray is the ability to create entries which
occupy a range of indices.  Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range.  Setting a mark on one index will set it on all of them.
Storing to any index will store to all of them.  Storing ``NULL`` into
any index of a multi-index entry will cause the XArray to forget about
the range.

Normal API
==========

Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
for statically allocated XArrays or :c:func:`xa_init` for dynamically
allocated ones.  A freshly-initialised XArray contains a ``NULL``
pointer at every index.

You can then set entries using :c:func:`xa_store` and get entries
using :c:func:`xa_load`.  :c:func:`xa_store` will overwrite any entry
with the new entry and return the previous entry stored at that index.
You can use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with
a ``NULL`` entry.  There is no difference between an entry that has never
been stored to, one that has been erased and one that has most recently
had ``NULL`` stored to it.

You can conditionally replace an entry at an index by using
:c:func:`xa_cmpxchg`.  Like :c:func:`cmpxchg`, it will only succeed if
the entry at that index has the 'old' value.  It also returns the entry
which was at that index; if it returns the same entry which was passed as
'old', then :c:func:`xa_cmpxchg` succeeded.

If you want to store a new entry to an index only if the current entry
at that index is ``NULL``, you can use :c:func:`xa_insert` which
returns ``-EBUSY`` if the entry is not empty.
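
The return conventions of :c:func:`xa_store`, :c:func:`xa_erase`,
:c:func:`xa_cmpxchg` and :c:func:`xa_insert` can be illustrated with a
toy, single-threaded model in userspace C.  The ``toy_*`` functions and
``struct toy_array`` are hypothetical: a small flat array stands in for the
XArray, and there is no locking and no memory allocation, so only the
return values are modelled::

    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    #define TOY_SIZE 16

    struct toy_array {
        void *slot[TOY_SIZE];       /* every index starts out NULL */
    };

    static void *toy_load(struct toy_array *ta, unsigned long index)
    {
        assert(index < TOY_SIZE);
        return ta->slot[index];
    }

    /* Overwrites unconditionally and returns the previous entry. */
    static void *toy_store(struct toy_array *ta, unsigned long index,
                           void *entry)
    {
        void *old = toy_load(ta, index);

        ta->slot[index] = entry;
        return old;
    }

    /* Same as storing NULL. */
    static void *toy_erase(struct toy_array *ta, unsigned long index)
    {
        return toy_store(ta, index, NULL);
    }

    /*
     * Stores only if the current entry is @old.  Always returns the entry
     * that was there; the caller detects success by comparing it with @old.
     */
    static void *toy_cmpxchg(struct toy_array *ta, unsigned long index,
                             void *old, void *entry)
    {
        void *curr = toy_load(ta, index);

        if (curr == old)
            ta->slot[index] = entry;
        return curr;
    }

    /* Stores only into an empty index; -EBUSY otherwise. */
    static int toy_insert(struct toy_array *ta, unsigned long index,
                          void *entry)
    {
        if (toy_load(ta, index) != NULL)
            return -EBUSY;
        ta->slot[index] = entry;
        return 0;
    }

A caller of ``toy_cmpxchg()`` compares the return value with its 'old'
argument to learn whether the exchange happened, just as the text
describes for :c:func:`xa_cmpxchg`.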

You can enquire whether a mark is set on an entry by using
:c:func:`xa_get_mark`.  If the entry is not ``NULL``, you can set a mark
on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
calling :c:func:`xa_clear_mark`.  You can ask whether any entry in the
XArray has a particular mark set by calling :c:func:`xa_marked`.
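
A toy model can make the mark rules concrete.  In this hypothetical
userspace sketch, each slot of a small flat array carries a 3-bit set of
marks, a mark can only be set on a non-``NULL`` entry, and storing ``NULL``
drops the entry's marks.  ``toy_marked()`` scans every slot; the real
XArray instead propagates marks up its tree so that :c:func:`xa_marked`
does not have to scan::

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define TOY_SIZE  16
    #define TOY_MARKS 3

    /* Hypothetical toy array: one byte of mark bits per slot. */
    struct toy_array {
        void *slot[TOY_SIZE];
        unsigned char marks[TOY_SIZE];  /* bit m set means mark m is set */
    };

    static void toy_store(struct toy_array *ta, unsigned long index,
                          void *entry)
    {
        assert(index < TOY_SIZE);
        ta->slot[index] = entry;
        if (!entry)
            ta->marks[index] = 0;       /* NULL entries carry no marks */
    }

    static bool toy_get_mark(const struct toy_array *ta, unsigned long index,
                             unsigned int mark)
    {
        assert(index < TOY_SIZE && mark < TOY_MARKS);
        return ta->marks[index] & (1u << mark);
    }

    static void toy_set_mark(struct toy_array *ta, unsigned long index,
                             unsigned int mark)
    {
        assert(index < TOY_SIZE && mark < TOY_MARKS);
        if (ta->slot[index])            /* no effect on a NULL entry */
            ta->marks[index] |= 1u << mark;
    }

    static void toy_clear_mark(struct toy_array *ta, unsigned long index,
                               unsigned int mark)
    {
        assert(index < TOY_SIZE && mark < TOY_MARKS);
        ta->marks[index] &= ~(1u << mark);
    }

    /* Is @mark set on any entry?  (A linear scan in this toy.) */
    static bool toy_marked(const struct toy_array *ta, unsigned int mark)
    {
        for (unsigned long i = 0; i < TOY_SIZE; i++)
            if (toy_get_mark(ta, i, mark))
                return true;
        return false;
    }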

You can copy entries out of the XArray into a plain array by calling
:c:func:`xa_extract`.  Or you can iterate over the present entries in
the XArray by calling :c:func:`xa_for_each`.  You may prefer to use
:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
entry in the XArray.
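
The difference between :c:func:`xa_find` and :c:func:`xa_find_after` is
easiest to see in a toy model.  In this hypothetical userspace sketch a
flat array stands in for the XArray; ``toy_find()`` returns the first
non-``NULL`` entry at or after ``*indexp`` (up to ``max``),
``toy_find_after()`` starts one index later, and both update ``*indexp``
only when they find something::

    #include <assert.h>
    #include <stddef.h>

    #define TOY_SIZE 16

    /* First non-NULL slot in [*indexp, max]; *indexp updated on success. */
    static void *toy_find(void *slot[TOY_SIZE], unsigned long *indexp,
                          unsigned long max)
    {
        assert(*indexp < TOY_SIZE);
        for (unsigned long i = *indexp; i <= max && i < TOY_SIZE; i++) {
            if (slot[i]) {
                *indexp = i;
                return slot[i];
            }
        }
        return NULL;
    }

    /* Same, but never returns the entry at *indexp itself. */
    static void *toy_find_after(void *slot[TOY_SIZE], unsigned long *indexp,
                                unsigned long max)
    {
        unsigned long next;
        void *entry;

        assert(*indexp < TOY_SIZE);
        next = *indexp + 1;
        if (next >= TOY_SIZE || next > max)     /* nothing left to search */
            return NULL;
        entry = toy_find(slot, &next, max);
        if (entry)
            *indexp = next;
        return entry;
    }

Calling ``toy_find()`` once and then ``toy_find_after()`` until it returns
``NULL`` visits every present entry once, which is the shape of an
:c:func:`xa_for_each` style loop.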

Calling :c:func:`xa_store_range` stores the same entry in a range
of indices.  If you do this, some of the other operations will behave
in a slightly odd way.  For example, marking the entry at one index
may result in the entry being marked at some, but not all, of the other
indices.  Storing into one index may result in the entry retrieved by
some, but not all, of the other indices changing.

Sometimes you need to ensure that a subsequent call to :c:func:`xa_store`
will not need to allocate memory.  The :c:func:`xa_reserve` function
will store a reserved entry at the indicated index.  Users of the
normal API will see this entry as containing ``NULL``.  If you do
not need to use the reserved entry, you can call :c:func:`xa_release`
to remove the unused entry.  If another user has stored to the entry
in the meantime, :c:func:`xa_release` will do nothing; if instead you
want the entry to become ``NULL``, you should use :c:func:`xa_erase`.
Using :c:func:`xa_insert` on a reserved entry will fail.
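
The interplay of :c:func:`xa_reserve`, :c:func:`xa_release`,
:c:func:`xa_store` and :c:func:`xa_insert` can be sketched with another toy
model.  Everything here is hypothetical userspace code: a private sentinel
object plays the role of the reserved (zero) entry described under
Internal Entries, the normal-API view reports it as ``NULL``, and only slot
contents and return values are modelled::

    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    #define TOY_SIZE 16

    /* Sentinel standing in for a reserved entry (hypothetical). */
    static char toy_zero_object;
    #define TOY_ZERO ((void *)&toy_zero_object)

    struct toy_array {
        void *slot[TOY_SIZE];
    };

    static void **toy_slot(struct toy_array *ta, unsigned long index)
    {
        assert(index < TOY_SIZE);
        return &ta->slot[index];
    }

    /* Reserved slots read as NULL through the normal API. */
    static void *toy_load(struct toy_array *ta, unsigned long index)
    {
        void *entry = *toy_slot(ta, index);

        return entry == TOY_ZERO ? NULL : entry;
    }

    /* Reserve an empty index; an occupied index is left alone. */
    static void toy_reserve(struct toy_array *ta, unsigned long index)
    {
        void **slot = toy_slot(ta, index);

        if (*slot == NULL)
            *slot = TOY_ZERO;
    }

    /* Drop the reservation only if nobody has stored there since. */
    static void toy_release(struct toy_array *ta, unsigned long index)
    {
        void **slot = toy_slot(ta, index);

        if (*slot == TOY_ZERO)
            *slot = NULL;
    }

    /* Storing over a reservation simply replaces it. */
    static void *toy_store(struct toy_array *ta, unsigned long index,
                           void *entry)
    {
        void *old = toy_load(ta, index);

        *toy_slot(ta, index) = entry;
        return old;
    }

    /* A reserved index is not empty, so insertion fails with -EBUSY. */
    static int toy_insert(struct toy_array *ta, unsigned long index,
                          void *entry)
    {
        void **slot = toy_slot(ta, index);

        if (*slot != NULL)
            return -EBUSY;
        *slot = entry;
        return 0;
    }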

If all entries in the array are ``NULL``, the :c:func:`xa_empty` function
will return ``true``.

Finally, you can remove all entries from an XArray by calling
:c:func:`xa_destroy`.  If the XArray entries are pointers, you may wish
to free the entries first.  You can do this by iterating over all present
entries in the XArray using the :c:func:`xa_for_each` iterator.

Allocating XArrays
------------------

If you use :c:func:`DEFINE_XARRAY_ALLOC` to define the XArray, or
initialise it by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`,
the XArray changes to track whether entries are in use or not.

You can call :c:func:`xa_alloc` to store the entry at an unused index
in the XArray.  If you need to modify the array from softirq or interrupt
context, you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to
disable softirqs or interrupts, respectively, while allocating the ID.

Using :c:func:`xa_store`, :c:func:`xa_cmpxchg` or :c:func:`xa_insert` will
also mark the entry as being allocated.  Unlike a normal XArray, storing
``NULL`` will mark the entry as being in use, like :c:func:`xa_reserve`.
To free an entry, use :c:func:`xa_erase` (or :c:func:`xa_release` if
you only want to free the entry if it's ``NULL``).

By default, the lowest free entry is allocated starting from 0.  If you
want to allocate entries starting at 1, it is more efficient to use
:c:func:`DEFINE_XARRAY_ALLOC1` or ``XA_FLAGS_ALLOC1``.  If you want to
allocate IDs up to a maximum, then wrap back around to the lowest free
ID, you can use :c:func:`xa_alloc_cyclic`.
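
The allocation policies above can be illustrated with a toy ID allocator.
This is a hypothetical userspace sketch, not the XArray's algorithm (an
allocating XArray tracks free indices with ``XA_MARK_0``); it only shows
which ID each call hands out.  ``base`` plays the role of the choice
between ``XA_FLAGS_ALLOC`` and ``XA_FLAGS_ALLOC1``, and the functions
return the ID itself, or ``-EBUSY`` when the range is full::

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>

    #define TOY_SIZE 8

    struct toy_alloc {
        bool used[TOY_SIZE];
        unsigned int base;  /* 0 models XA_FLAGS_ALLOC, 1 XA_FLAGS_ALLOC1 */
    };

    /* Lowest free ID in [lo, hi], or -EBUSY if there is none. */
    static int toy_find_free(const struct toy_alloc *ta, unsigned int lo,
                             unsigned int hi)
    {
        for (unsigned int id = lo; id <= hi && id < TOY_SIZE; id++)
            if (!ta->used[id])
                return (int)id;
        return -EBUSY;
    }

    /* In the spirit of xa_alloc(): the lowest free ID at or above base. */
    static int toy_alloc(struct toy_alloc *ta)
    {
        int id = toy_find_free(ta, ta->base, TOY_SIZE - 1);

        if (id >= 0)
            ta->used[id] = true;
        return id;
    }

    /*
     * In the spirit of xa_alloc_cyclic(): search from *next up to max,
     * then wrap around to base; *next moves past the ID handed out.
     */
    static int toy_alloc_cyclic(struct toy_alloc *ta, unsigned int *next,
                                unsigned int max)
    {
        int id = toy_find_free(ta, *next, max);

        if (id < 0 && *next > ta->base) {
            unsigned int hi = *next - 1 < max ? *next - 1 : max;

            id = toy_find_free(ta, ta->base, hi);
        }
        if (id >= 0) {
            ta->used[id] = true;
            *next = (unsigned int)id + 1;
        }
        return id;
    }

Freeing ID 0 shows the difference: ``toy_alloc()`` would hand out 0 again
at once, whereas ``toy_alloc_cyclic()`` only comes back to it after
reaching ``max``.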

You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
is used to track whether an entry is free or not.  The other marks are
available for your use.

Memory allocation
-----------------

The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`,
:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a ``gfp_t``
parameter in case the XArray needs to allocate memory to store this entry.
If the entry is being deleted, no memory allocation needs to be performed,
and the GFP flags specified will be ignored.

It is possible for no memory to be allocatable, particularly if you pass
a restrictive set of GFP flags.  In that case, the functions which return
an entry, such as :c:func:`xa_store` and :c:func:`xa_cmpxchg`, return a
special value which can be turned into an errno using :c:func:`xa_err`.
If you don't need to know exactly which error occurred, using
:c:func:`xa_is_err` is slightly more efficient.
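
One way to see how an error can travel in a pointer-sized return value is
a userspace model.  The names are hypothetical, and the encoding is an
assumption based on ``include/linux/xarray.h`` at the time of writing:
errors are internal entries (low bits ``10``) whose payload is the negative
errno, and anything at or above the encoding of ``-MAX_ERRNO`` (4095)
counts as an error.  Recovering the errno relies on an arithmetic right
shift of a negative number, which is implementation-defined in C but is
what GCC and Clang do::

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>

    static_assert(sizeof(long) == sizeof(void *),
                  "model assumes pointers fit in a long");

    #define MODEL_MAX_ERRNO 4095L

    /* Internal entries: payload shifted left by two, low bits 10. */
    static void *model_mk_internal(long v)
    {
        return (void *)(((unsigned long)v << 2) | 2UL);
    }

    static bool model_is_internal(const void *entry)
    {
        return ((unsigned long)entry & 3) == 2;
    }

    /* Hypothetical model of xa_is_err(): top of the address space only. */
    static bool model_is_err(const void *entry)
    {
        return model_is_internal(entry) &&
               (unsigned long)entry >=
                       (unsigned long)model_mk_internal(-MODEL_MAX_ERRNO);
    }

    /* Hypothetical model of xa_err(): the negative errno, or 0. */
    static int model_err(const void *entry)
    {
        if (model_is_err(entry))
            return (int)((long)entry >> 2);     /* arithmetic shift assumed */
        return 0;
    }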

Locking
-------

When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an internal spinlock to synchronise access:

No lock needed:
 * :c:func:`xa_empty`
 * :c:func:`xa_marked`

Takes RCU read lock:
 * :c:func:`xa_load`
 * :c:func:`xa_for_each`
 * :c:func:`xa_find`
 * :c:func:`xa_find_after`
 * :c:func:`xa_extract`
 * :c:func:`xa_get_mark`

Takes xa_lock internally:
 * :c:func:`xa_store`
 * :c:func:`xa_store_bh`
 * :c:func:`xa_store_irq`
 * :c:func:`xa_insert`
 * :c:func:`xa_insert_bh`
 * :c:func:`xa_insert_irq`
 * :c:func:`xa_erase`
 * :c:func:`xa_erase_bh`
 * :c:func:`xa_erase_irq`
 * :c:func:`xa_cmpxchg`
 * :c:func:`xa_cmpxchg_bh`
 * :c:func:`xa_cmpxchg_irq`
 * :c:func:`xa_store_range`
 * :c:func:`xa_alloc`
 * :c:func:`xa_alloc_bh`
 * :c:func:`xa_alloc_irq`
 * :c:func:`xa_reserve`
 * :c:func:`xa_reserve_bh`
 * :c:func:`xa_reserve_irq`
 * :c:func:`xa_destroy`
 * :c:func:`xa_set_mark`
 * :c:func:`xa_clear_mark`

Assumes xa_lock held on entry:
 * :c:func:`__xa_store`
 * :c:func:`__xa_insert`
 * :c:func:`__xa_erase`
 * :c:func:`__xa_cmpxchg`
 * :c:func:`__xa_alloc`
 * :c:func:`__xa_reserve`
 * :c:func:`__xa_set_mark`
 * :c:func:`__xa_clear_mark`

If you want to take advantage of the lock to protect the data structures
that you are storing in the XArray, you can call :c:func:`xa_lock`
before calling :c:func:`xa_load`, then take a reference count on the
object you have found before calling :c:func:`xa_unlock`.  This will
prevent stores from removing the object from the array between looking
up the object and incrementing the refcount.  You can also use RCU to
avoid dereferencing freed memory, but an explanation of that is beyond
the scope of this document.

The XArray does not disable interrupts or softirqs while modifying
the array.  It is safe to read the XArray from interrupt or softirq
context as the RCU lock provides enough protection.

If, for example, you want to store entries in the XArray in process
context and then erase them in softirq context, you can do that this way::
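
    #include <linux/xarray.h>

    /* Hypothetical structure for this example */
    struct foo {
        struct xarray array;
        unsigned long count;    /* number of present entries */
    };

    void foo_init(struct foo *foo)
    {
        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
        foo->count = 0;
    }

    int foo_store(struct foo *foo, unsigned long index, void *entry)
    {
        void *curr;
        int err;

        xa_lock_bh(&foo->array);
        /* May drop and retake the lock to allocate memory with GFP_KERNEL */
        curr = __xa_store(&foo->array, index, entry, GFP_KERNEL);
        err = xa_err(curr);
        if (!err && !curr && entry)
            foo->count++;           /* an empty index became occupied */
        else if (!err && curr && !entry)
            foo->count--;           /* an occupied index was cleared */
        xa_unlock_bh(&foo->array);
        return err;
    }

    /* foo_erase() is only called from softirq context */
    void foo_erase(struct foo *foo, unsigned long index)
    {
        xa_lock(&foo->array);
        if (__xa_erase(&foo->array, index))
            foo->count--;           /* only count entries that were present */
        xa_unlock(&foo->array);
    }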

If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using :c:func:`xa_init_flags`, passing
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.

The above example also shows a common pattern of wanting to extend the
coverage of the xa_lock on the store side to protect some statistics
associated with the array.

Sharing the XArray with interrupt context is also possible, either
using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
in the interrupt handler.  Some of the more common patterns have helper
functions such as :c:func:`xa_store_bh`, :c:func:`xa_store_irq`,
:c:func:`xa_erase_bh`, :c:func:`xa_erase_irq`, :c:func:`xa_cmpxchg_bh`
and :c:func:`xa_cmpxchg_irq`.

Sometimes you need to protect access to the XArray with a mutex because
that lock sits above another mutex in the locking hierarchy.  That does
not entitle you to use functions like :c:func:`__xa_erase` without taking
the xa_lock; the xa_lock is used for lockdep validation and will be used
for other purposes in the future.

The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
available for situations where you look up an entry and want to atomically
set or clear a mark.  It may be more efficient to use the advanced API
in this case, as it will save you from walking the tree twice.

Advanced API
============

The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required
to use the xa_lock while modifying the array.  You can choose whether
to use the xa_lock or the RCU lock while doing read-only operations on
the array.  You can mix advanced and normal operations on the same array;
indeed the normal API is implemented in terms of the advanced API.  The
advanced API is only available to modules with a GPL-compatible license.

The advanced API is based around the xa_state.  This is an opaque data
structure which you declare on the stack using the :c:func:`XA_STATE`
macro.  This macro initialises the xa_state ready to start walking
around the XArray.  It is used as a cursor to maintain the position
in the XArray and let you compose various operations together without
having to restart from the top every time.

The xa_state is also used to store errors.  You can call
:c:func:`xas_error` to retrieve the error.  All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point.  The only
errors currently generated by the XArray code itself are ``ENOMEM`` and
``EINVAL``, but it supports arbitrary errors in case you want to call
:c:func:`xas_set_err` yourself.

If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
will attempt to allocate more memory using the specified GFP flags and
cache it in the xa_state for the next attempt.  The idea is that you take
the xa_lock, attempt the operation and drop the lock.  The operation
attempts to allocate memory while holding the lock, but that allocation
is more likely to fail.  Once you have dropped the lock, :c:func:`xas_nomem`
can try harder to allocate more memory.  It will return ``true`` if it
is worth retrying the operation (i.e. that there was a memory error *and*
more memory was allocated).  If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
``ENOMEM``), then it will free the memory previously allocated.
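
The retry protocol can be illustrated with a userspace model.  Everything
here is hypothetical: ``struct model_state`` stands in for the xa_state, a
boolean stands in for the memory that :c:func:`xas_nomem` caches, and the
"store" pretends that allocating under the lock always fails.  In kernel
code the same loop would take and drop the xa_lock around the store and
pass GFP flags to :c:func:`xas_nomem`::

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>

    /* Hypothetical stand-in for an xa_state over a one-slot "array". */
    struct model_state {
        int error;          /* sticky: 0 or a negative errno */
        bool have_memory;   /* stands in for memory cached by xas_nomem() */
        void *slot;
        int attempts;       /* how many stores actually ran */
    };

    static void model_store(struct model_state *xas, void *entry)
    {
        if (xas->error)             /* earlier error: do nothing */
            return;
        xas->attempts++;
        if (!xas->have_memory) {    /* pretend allocating under the lock failed */
            xas->error = -ENOMEM;
            return;
        }
        xas->have_memory = false;   /* the cached memory is consumed */
        xas->slot = entry;
    }

    /* Hypothetical model of xas_nomem(): true means "retry". */
    static bool model_nomem(struct model_state *xas)
    {
        if (xas->error != -ENOMEM) {
            xas->have_memory = false;   /* free memory that was not used */
            return false;
        }
        xas->have_memory = true;        /* a blocking allocation would go here */
        xas->error = 0;                 /* restart the operation */
        return true;
    }

    static int model_store_with_retry(struct model_state *xas, void *entry)
    {
        assert(xas->error == 0);
        do {
            /* xas_lock() would go here */
            model_store(xas, entry);
            /* xas_unlock() would go here */
        } while (model_nomem(xas));
        return xas->error;
    }

The store runs twice: the first attempt records ``-ENOMEM``, the model of
:c:func:`xas_nomem` obtains memory outside the (imaginary) lock, and the
second attempt succeeds.  Because the error is sticky, any further
operations between the two points would have been skipped rather than
needing their own checks.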

Internal Entries
----------------

The XArray reserves some entries for its own purposes.  These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them.  Usually the best way to handle them is to pass them
to :c:func:`xas_retry`, and retry the operation if it returns ``true``.

.. flat-table::
   :widths: 1 1 6

   * - Name
     - Test
     - Usage

   * - Node
     - :c:func:`xa_is_node`
     - An XArray node.  May be visible when using a multi-index xa_state.

   * - Sibling
     - :c:func:`xa_is_sibling`
     - A non-canonical entry for a multi-index entry.  The value indicates
       which slot in this node has the canonical entry.

   * - Retry
     - :c:func:`xa_is_retry`
     - This entry is currently being modified by a thread which has the
       xa_lock.  The node containing this entry may be freed at the end
       of this RCU period.  You should restart the lookup from the head
       of the array.

   * - Zero
     - :c:func:`xa_is_zero`
     - Zero entries appear as ``NULL`` through the Normal API, but occupy
       an entry in the XArray which can be used to reserve the index for
       future use.  This is used by allocating XArrays for allocated entries
       which are ``NULL``.

Other internal entries may be added in the future.  As far as possible, they
will be handled by :c:func:`xas_retry`.
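
To make the table concrete, here is a userspace model of how an entry's
bits might be decoded.  The enum and function are hypothetical, and the
constants (sibling entries 0 to 62, retry 256, zero 257, node pointers
above 4096, errors in the negative range) are assumptions based on comments
in ``include/linux/xarray.h`` at the time of writing.  They may change,
which is one more reason to rely on :c:func:`xas_retry` and the tests in
the table rather than on raw bits::

    #include <assert.h>
    #include <stddef.h>

    static_assert(sizeof(long) == sizeof(void *),
                  "model assumes pointers fit in a long");

    enum model_kind {
        MODEL_NULL, MODEL_POINTER, MODEL_VALUE, MODEL_ERROR, MODEL_SIBLING,
        MODEL_RETRY, MODEL_ZERO, MODEL_NODE, MODEL_OTHER_INTERNAL,
    };

    static enum model_kind model_classify(const void *entry)
    {
        unsigned long bits = (unsigned long)entry;
        unsigned long payload = bits >> 2;      /* internal-entry payload */
        long signed_payload = (long)bits >> 2;

        if (entry == NULL)
            return MODEL_NULL;
        if (bits & 1)
            return MODEL_VALUE;         /* or a pointer with tag 1 or 3 */
        if ((bits & 3) == 0)
            return MODEL_POINTER;
        /* Low bits are 10: an internal entry. */
        if (signed_payload < 0 && signed_payload >= -4095)
            return MODEL_ERROR;
        if (payload <= 62)
            return MODEL_SIBLING;
        if (payload == 256)
            return MODEL_RETRY;
        if (payload == 257)
            return MODEL_ZERO;
        if (bits > 4096)
            return MODEL_NODE;          /* a node pointer with bit 1 set */
        return MODEL_OTHER_INTERNAL;
    }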

Additional functionality
------------------------

The :c:func:`xas_create_range` function allocates all the necessary memory
to store every entry in a range.  It will set ``ENOMEM`` in the xa_state if
it cannot allocate memory.

You can use :c:func:`xas_init_marks` to reset the marks on an entry
to their default state.  This is usually all marks clear, unless the
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
and all other marks are clear.  Replacing one entry with another using
:c:func:`xas_store` will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.

The :c:func:`xas_load` function will walk the xa_state as close to the
entry as it can.  If you know the xa_state has already been walked to the
entry and need to check that the entry hasn't changed, you can use
:c:func:`xas_reload` to save a function call.

If you need to move to a different index in the XArray, call
:c:func:`xas_set`.  This resets the cursor to the top of the tree, which
will generally make the next operation walk the cursor to the desired
spot in the tree.  If you want to move to the next or previous index,
call :c:func:`xas_next` or :c:func:`xas_prev`.  Setting the index does
not walk the cursor around the array and so does not require a lock to be
held, while moving to the next or previous index does.

You can search for the next present entry using :c:func:`xas_find`.  This
is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
if the cursor has been walked to an entry, then it will find the next
entry after the one currently referenced.  If not, it will return the
entry at the index of the xa_state.  Using :c:func:`xas_next_entry` to
move to the next present entry instead of :c:func:`xas_find` will save
a function call in the majority of cases at the expense of emitting more
inline code.

The :c:func:`xas_find_marked` function is similar.  If the xa_state has
not been walked, it will return the entry at the index of the xa_state,
if it is marked.  Otherwise, it will return the first marked entry after
the entry referenced by the xa_state.  The :c:func:`xas_next_marked`
function is the equivalent of :c:func:`xas_next_entry`.

When iterating over a range of the XArray using :c:func:`xas_for_each`
or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
the iteration.  The :c:func:`xas_pause` function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed.  If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and re-enable interrupts
every ``XA_CHECK_SCHED`` entries.

The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
:c:func:`xas_clear_mark` functions require the xa_state cursor to have
been moved to the appropriate location in the XArray; they will do
nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
immediately before.

You can call :c:func:`xas_set_update` to have a callback function
called each time the XArray updates a node.  This is used by the page
cache workingset code to maintain its list of nodes which contain only
shadow entries.

Multi-Index Entries
-------------------

The XArray has the ability to tie multiple indices together so that
operations on one index affect all indices.  For example, storing into
any index will change the value of the entry retrieved from any index.
Setting or clearing a mark on any index will set or clear the mark
on every index that is tied together.  The current implementation
only allows tying ranges which are aligned powers of two together;
e.g. indices 64-127 may be tied together, but 2-6 may not be.  This may
save substantial quantities of memory; for example tying 512 entries
together will save over 4kB.
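
The alignment rule can be expressed as a small check.  This hypothetical
helper is a userspace sketch, not an XArray function: it accepts an
inclusive range only if its size is a power of two and its first index is
a multiple of that size, and it reports the order (log2 of the size),
which is the kind of value :c:func:`XA_STATE_ORDER` and
:c:func:`xas_set_order` take::

    #include <assert.h>
    #include <stdbool.h>

    /* Can [first, last] be tied together?  On success, *order = log2(size). */
    static bool model_can_tie(unsigned long first, unsigned long last,
                              unsigned int *order)
    {
        unsigned long size;

        assert(first <= last);
        size = last - first + 1;
        /* size 0 means the range covers every index; rejected here */
        if (size == 0 || (size & (size - 1)) != 0)
            return false;               /* not a power of two */
        if (first & (size - 1))
            return false;               /* not aligned to its own size */
        *order = 0;
        while ((1UL << *order) != size)
            (*order)++;
        return true;
    }

For example, 64-127 gives order 6, while 2-6 (five indices) and 32-95
(64 indices starting at an index that is not a multiple of 64) are
rejected.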

You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
Calling :c:func:`xas_load` with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or ``NULL`` even when there
is an entry stored within the range.  Calling :c:func:`xas_find_conflict`
will return the first entry within the range or ``NULL`` if there are no
entries in the range.  The :c:func:`xas_for_each_conflict` iterator will
iterate over every entry which overlaps the specified range.

If :c:func:`xas_load` encounters a multi-index entry, the xa_index
in the xa_state will not be changed.  When iterating over an XArray
or calling :c:func:`xas_find`, if the initial index is in the middle
of a multi-index entry, it will not be altered.  Subsequent calls
or iterations will move the index to the first index in the range.
Each entry will only be returned once, no matter how many indices it
occupies.

Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
is not supported.  Using either of these functions on a multi-index entry
will reveal sibling entries; these should be skipped over by the caller.

Storing ``NULL`` into any index of a multi-index entry will set the entry
at every index to ``NULL`` and dissolve the tie.  Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.

Functions and structures
========================

.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c