1992a8e60SMatthew Wilcox.. SPDX-License-Identifier: GPL-2.0+ 2992a8e60SMatthew Wilcox 3992a8e60SMatthew Wilcox====== 4992a8e60SMatthew WilcoxXArray 5992a8e60SMatthew Wilcox====== 6992a8e60SMatthew Wilcox 7992a8e60SMatthew Wilcox:Author: Matthew Wilcox 8992a8e60SMatthew Wilcox 9992a8e60SMatthew WilcoxOverview 10992a8e60SMatthew Wilcox======== 11992a8e60SMatthew Wilcox 12992a8e60SMatthew WilcoxThe XArray is an abstract data type which behaves like a very large array 13992a8e60SMatthew Wilcoxof pointers. It meets many of the same needs as a hash or a conventional 14992a8e60SMatthew Wilcoxresizable array. Unlike a hash, it allows you to sensibly go to the 15992a8e60SMatthew Wilcoxnext or previous entry in a cache-efficient manner. In contrast to a 16992a8e60SMatthew Wilcoxresizable array, there is no need to copy data or change MMU mappings in 17992a8e60SMatthew Wilcoxorder to grow the array. It is more memory-efficient, parallelisable 18992a8e60SMatthew Wilcoxand cache friendly than a doubly-linked list. It takes advantage of 19992a8e60SMatthew WilcoxRCU to perform lookups without locking. 20992a8e60SMatthew Wilcox 21992a8e60SMatthew WilcoxThe XArray implementation is efficient when the indices used are densely 22992a8e60SMatthew Wilcoxclustered; hashing the object and using the hash as the index will not 23992a8e60SMatthew Wilcoxperform well. The XArray is optimised for small indices, but still has 24992a8e60SMatthew Wilcoxgood performance with large indices. If your index can be larger than 25992a8e60SMatthew Wilcox``ULONG_MAX`` then the XArray is not the data type for you. The most 26992a8e60SMatthew Wilcoximportant user of the XArray is the page cache. 27992a8e60SMatthew Wilcox 28992a8e60SMatthew WilcoxNormal pointers may be stored in the XArray directly. They must be 4-byte 299c79df7fSJonathan Corbetaligned, which is true for any pointer returned from kmalloc() and 309c79df7fSJonathan Corbetalloc_page(). It isn't true for arbitrary user-space pointers, 31992a8e60SMatthew Wilcoxnor for function pointers. You can store pointers to statically allocated 32992a8e60SMatthew Wilcoxobjects, as long as those objects have an alignment of at least 4. 33992a8e60SMatthew Wilcox 34992a8e60SMatthew WilcoxYou can also store integers between 0 and ``LONG_MAX`` in the XArray. 359c79df7fSJonathan CorbetYou must first convert it into an entry using xa_mk_value(). 36992a8e60SMatthew WilcoxWhen you retrieve an entry from the XArray, you can check whether it is 379c79df7fSJonathan Corbeta value entry by calling xa_is_value(), and convert it back to 389c79df7fSJonathan Corbetan integer by calling xa_to_value(). 39992a8e60SMatthew Wilcox 406b81141dSMatthew Wilcox (Oracle)Some users want to tag the pointers they store in the XArray. You can 416b81141dSMatthew Wilcox (Oracle)call xa_tag_pointer() to create an entry with a tag, xa_untag_pointer() 426b81141dSMatthew Wilcox (Oracle)to turn a tagged entry back into an untagged pointer and xa_pointer_tag() 436b81141dSMatthew Wilcox (Oracle)to retrieve the tag of an entry. Tagged pointers use the same bits that 446b81141dSMatthew Wilcox (Oracle)are used to distinguish value entries from normal pointers, so you must 45eb7a18ebSTamir Dubersteindecide whether you want to store value entries or tagged pointers in any 46eb7a18ebSTamir Dubersteinparticular XArray. 47992a8e60SMatthew Wilcox 489c79df7fSJonathan CorbetThe XArray does not support storing IS_ERR() pointers as some 49992a8e60SMatthew Wilcoxconflict with value entries or internal entries. 50992a8e60SMatthew Wilcox 51992a8e60SMatthew WilcoxAn unusual feature of the XArray is the ability to create entries which 52992a8e60SMatthew Wilcoxoccupy a range of indices. Once stored to, looking up any index in 53992a8e60SMatthew Wilcoxthe range will return the same entry as looking up any other index in 546b81141dSMatthew Wilcox (Oracle)the range. Storing to any index will store to all of them. Multi-index 55eb7a18ebSTamir Dubersteinentries can be explicitly split into smaller entries. Unsetting (using 56eb7a18ebSTamir Dubersteinxa_erase() or xa_store() with ``NULL``) any entry will cause the XArray 57eb7a18ebSTamir Dubersteinto forget about the range. 58992a8e60SMatthew Wilcox 59992a8e60SMatthew WilcoxNormal API 60992a8e60SMatthew Wilcox========== 61992a8e60SMatthew Wilcox 629c79df7fSJonathan CorbetStart by initialising an XArray, either with DEFINE_XARRAY() 639c79df7fSJonathan Corbetfor statically allocated XArrays or xa_init() for dynamically 64992a8e60SMatthew Wilcoxallocated ones. A freshly-initialised XArray contains a ``NULL`` 65992a8e60SMatthew Wilcoxpointer at every index. 66992a8e60SMatthew Wilcox 67eb7a18ebSTamir DubersteinYou can then set entries using xa_store() and get entries using 68eb7a18ebSTamir Dubersteinxa_load(). xa_store() will overwrite any entry with the new entry and 69eb7a18ebSTamir Dubersteinreturn the previous entry stored at that index. You can unset entries 70eb7a18ebSTamir Dubersteinusing xa_erase() or by setting the entry to ``NULL`` using xa_store(). 71eb7a18ebSTamir DubersteinThere is no difference between an entry that has never been stored to 72eb7a18ebSTamir Dubersteinand one that has been erased with xa_erase(); an entry that has most 73eb7a18ebSTamir Dubersteinrecently had ``NULL`` stored to it is also equivalent except if the 74eb7a18ebSTamir DubersteinXArray was initialized with ``XA_FLAGS_ALLOC``. 75992a8e60SMatthew Wilcox 76992a8e60SMatthew WilcoxYou can conditionally replace an entry at an index by using 779c79df7fSJonathan Corbetxa_cmpxchg(). Like cmpxchg(), it will only succeed if 78992a8e60SMatthew Wilcoxthe entry at that index has the 'old' value. It also returns the entry 79992a8e60SMatthew Wilcoxwhich was at that index; if it returns the same entry which was passed as 809c79df7fSJonathan Corbet'old', then xa_cmpxchg() succeeded. 81992a8e60SMatthew Wilcox 82992a8e60SMatthew WilcoxIf you want to only store a new entry to an index if the current entry 839c79df7fSJonathan Corbetat that index is ``NULL``, you can use xa_insert() which 84fd9dc93eSMatthew Wilcoxreturns ``-EBUSY`` if the entry is not empty. 85992a8e60SMatthew Wilcox 86992a8e60SMatthew WilcoxYou can copy entries out of the XArray into a plain array by calling 8700ed452cSMatthew Wilcox (Oracle)xa_extract(). Or you can iterate over the present entries in the XArray 8800ed452cSMatthew Wilcox (Oracle)by calling xa_for_each(), xa_for_each_start() or xa_for_each_range(). 8900ed452cSMatthew Wilcox (Oracle)You may prefer to use xa_find() or xa_find_after() to move to the next 9000ed452cSMatthew Wilcox (Oracle)present entry in the XArray. 91992a8e60SMatthew Wilcox 929c79df7fSJonathan CorbetCalling xa_store_range() stores the same entry in a range 930e9446c3SMatthew Wilcoxof indices. If you do this, some of the other operations will behave 940e9446c3SMatthew Wilcoxin a slightly odd way. For example, marking the entry at one index 950e9446c3SMatthew Wilcoxmay result in the entry being marked at some, but not all of the other 960e9446c3SMatthew Wilcoxindices. Storing into one index may result in the entry retrieved by 970e9446c3SMatthew Wilcoxsome, but not all of the other indices changing. 980e9446c3SMatthew Wilcox 999c79df7fSJonathan CorbetSometimes you need to ensure that a subsequent call to xa_store() 1009c79df7fSJonathan Corbetwill not need to allocate memory. The xa_reserve() function 101b0606fedSMatthew Wilcoxwill store a reserved entry at the indicated index. Users of the 102b0606fedSMatthew Wilcoxnormal API will see this entry as containing ``NULL``. If you do 1039c79df7fSJonathan Corbetnot need to use the reserved entry, you can call xa_release() 104b0606fedSMatthew Wilcoxto remove the unused entry. If another user has stored to the entry 1059c79df7fSJonathan Corbetin the meantime, xa_release() will do nothing; if instead you 1069c79df7fSJonathan Corbetwant the entry to become ``NULL``, you should use xa_erase(). 1079c79df7fSJonathan CorbetUsing xa_insert() on a reserved entry will fail. 1084c0608f4SMatthew Wilcox 1099c79df7fSJonathan CorbetIf all entries in the array are ``NULL``, the xa_empty() function 110804dfaf0SMatthew Wilcoxwill return ``true``. 111804dfaf0SMatthew Wilcox 112992a8e60SMatthew WilcoxFinally, you can remove all entries from an XArray by calling 1139c79df7fSJonathan Corbetxa_destroy(). If the XArray entries are pointers, you may wish 114992a8e60SMatthew Wilcoxto free the entries first. You can do this by iterating over all present 1159c79df7fSJonathan Corbetentries in the XArray using the xa_for_each() iterator. 116992a8e60SMatthew Wilcox 1176b81141dSMatthew Wilcox (Oracle)Search Marks 1186b81141dSMatthew Wilcox (Oracle)------------ 1196b81141dSMatthew Wilcox (Oracle) 1206b81141dSMatthew Wilcox (Oracle)Each entry in the array has three bits associated with it called marks. 1216b81141dSMatthew Wilcox (Oracle)Each mark may be set or cleared independently of the others. You can 1226b81141dSMatthew Wilcox (Oracle)iterate over marked entries by using the xa_for_each_marked() iterator. 1236b81141dSMatthew Wilcox (Oracle) 1246b81141dSMatthew Wilcox (Oracle)You can enquire whether a mark is set on an entry by using 1256b81141dSMatthew Wilcox (Oracle)xa_get_mark(). If the entry is not ``NULL``, you can set a mark on it 1266b81141dSMatthew Wilcox (Oracle)by using xa_set_mark() and remove the mark from an entry by calling 1276b81141dSMatthew Wilcox (Oracle)xa_clear_mark(). You can ask whether any entry in the XArray has a 1286b81141dSMatthew Wilcox (Oracle)particular mark set by calling xa_marked(). Erasing an entry from the 1296b81141dSMatthew Wilcox (Oracle)XArray causes all marks associated with that entry to be cleared. 1306b81141dSMatthew Wilcox (Oracle) 1316b81141dSMatthew Wilcox (Oracle)Setting or clearing a mark on any index of a multi-index entry will 1326b81141dSMatthew Wilcox (Oracle)affect all indices covered by that entry. Querying the mark on any 1336b81141dSMatthew Wilcox (Oracle)index will return the same result. 1346b81141dSMatthew Wilcox (Oracle) 1356b81141dSMatthew Wilcox (Oracle)There is no way to iterate over entries which are not marked; the data 1366b81141dSMatthew Wilcox (Oracle)structure does not allow this to be implemented efficiently. There are 1376b81141dSMatthew Wilcox (Oracle)not currently iterators to search for logical combinations of bits (eg 1386b81141dSMatthew Wilcox (Oracle)iterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2`` 1396b81141dSMatthew Wilcox (Oracle)set, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2`` 1406b81141dSMatthew Wilcox (Oracle)set). It would be possible to add these if a user arises. 1416b81141dSMatthew Wilcox (Oracle) 142d9c48043SMatthew WilcoxAllocating XArrays 143d9c48043SMatthew Wilcox------------------ 144d9c48043SMatthew Wilcox 1459c79df7fSJonathan CorbetIf you use DEFINE_XARRAY_ALLOC() to define the XArray, or 1469c79df7fSJonathan Corbetinitialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(), 147d9c48043SMatthew Wilcoxthe XArray changes to track whether entries are in use or not. 148371c752dSMatthew Wilcox 1499c79df7fSJonathan CorbetYou can call xa_alloc() to store the entry at an unused index 150371c752dSMatthew Wilcoxin the XArray. If you need to modify the array from interrupt context, 1519c79df7fSJonathan Corbetyou can use xa_alloc_bh() or xa_alloc_irq() to disable 152d9c48043SMatthew Wilcoxinterrupts while allocating the ID. 153d9c48043SMatthew Wilcox 1549c79df7fSJonathan CorbetUsing xa_store(), xa_cmpxchg() or xa_insert() will 1553ccaf57aSMatthew Wilcoxalso mark the entry as being allocated. Unlike a normal XArray, storing 1569c79df7fSJonathan Corbet``NULL`` will mark the entry as being in use, like xa_reserve(). 1579c79df7fSJonathan CorbetTo free an entry, use xa_erase() (or xa_release() if 158d9c48043SMatthew Wilcoxyou only want to free the entry if it's ``NULL``). 159d9c48043SMatthew Wilcox 1603ccaf57aSMatthew WilcoxBy default, the lowest free entry is allocated starting from 0. If you 1613ccaf57aSMatthew Wilcoxwant to allocate entries starting at 1, it is more efficient to use 1629c79df7fSJonathan CorbetDEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``. If you want to 1632fa044e5SMatthew Wilcoxallocate IDs up to a maximum, then wrap back around to the lowest free 1649c79df7fSJonathan CorbetID, you can use xa_alloc_cyclic(). 1653ccaf57aSMatthew Wilcox 166d9c48043SMatthew WilcoxYou cannot use ``XA_MARK_0`` with an allocating XArray as this mark 167d9c48043SMatthew Wilcoxis used to track whether an entry is free or not. The other marks are 168d9c48043SMatthew Wilcoxavailable for your use. 169371c752dSMatthew Wilcox 170992a8e60SMatthew WilcoxMemory allocation 171992a8e60SMatthew Wilcox----------------- 172992a8e60SMatthew Wilcox 1739c79df7fSJonathan CorbetThe xa_store(), xa_cmpxchg(), xa_alloc(), 1749c79df7fSJonathan Corbetxa_reserve() and xa_insert() functions take a gfp_t 175371c752dSMatthew Wilcoxparameter in case the XArray needs to allocate memory to store this entry. 176992a8e60SMatthew WilcoxIf the entry is being deleted, no memory allocation needs to be performed, 177992a8e60SMatthew Wilcoxand the GFP flags specified will be ignored. 178992a8e60SMatthew Wilcox 179992a8e60SMatthew WilcoxIt is possible for no memory to be allocatable, particularly if you pass 180992a8e60SMatthew Wilcoxa restrictive set of GFP flags. In that case, the functions return a 1819c79df7fSJonathan Corbetspecial value which can be turned into an errno using xa_err(). 182992a8e60SMatthew WilcoxIf you don't need to know exactly which error occurred, using 1839c79df7fSJonathan Corbetxa_is_err() is slightly more efficient. 184992a8e60SMatthew Wilcox 185992a8e60SMatthew WilcoxLocking 186992a8e60SMatthew Wilcox------- 187992a8e60SMatthew Wilcox 188992a8e60SMatthew WilcoxWhen using the Normal API, you do not have to worry about locking. 189992a8e60SMatthew WilcoxThe XArray uses RCU and an internal spinlock to synchronise access: 190992a8e60SMatthew Wilcox 191992a8e60SMatthew WilcoxNo lock needed: 1929c79df7fSJonathan Corbet * xa_empty() 1939c79df7fSJonathan Corbet * xa_marked() 194992a8e60SMatthew Wilcox 195992a8e60SMatthew WilcoxTakes RCU read lock: 1969c79df7fSJonathan Corbet * xa_load() 1979c79df7fSJonathan Corbet * xa_for_each() 19800ed452cSMatthew Wilcox (Oracle) * xa_for_each_start() 19900ed452cSMatthew Wilcox (Oracle) * xa_for_each_range() 2009c79df7fSJonathan Corbet * xa_find() 2019c79df7fSJonathan Corbet * xa_find_after() 2029c79df7fSJonathan Corbet * xa_extract() 2039c79df7fSJonathan Corbet * xa_get_mark() 204992a8e60SMatthew Wilcox 205992a8e60SMatthew WilcoxTakes xa_lock internally: 2069c79df7fSJonathan Corbet * xa_store() 2079c79df7fSJonathan Corbet * xa_store_bh() 2089c79df7fSJonathan Corbet * xa_store_irq() 2099c79df7fSJonathan Corbet * xa_insert() 2109c79df7fSJonathan Corbet * xa_insert_bh() 2119c79df7fSJonathan Corbet * xa_insert_irq() 2129c79df7fSJonathan Corbet * xa_erase() 2139c79df7fSJonathan Corbet * xa_erase_bh() 2149c79df7fSJonathan Corbet * xa_erase_irq() 2159c79df7fSJonathan Corbet * xa_cmpxchg() 2169c79df7fSJonathan Corbet * xa_cmpxchg_bh() 2179c79df7fSJonathan Corbet * xa_cmpxchg_irq() 2189c79df7fSJonathan Corbet * xa_store_range() 2199c79df7fSJonathan Corbet * xa_alloc() 2209c79df7fSJonathan Corbet * xa_alloc_bh() 2219c79df7fSJonathan Corbet * xa_alloc_irq() 2229c79df7fSJonathan Corbet * xa_reserve() 2239c79df7fSJonathan Corbet * xa_reserve_bh() 2249c79df7fSJonathan Corbet * xa_reserve_irq() 2259c79df7fSJonathan Corbet * xa_destroy() 2269c79df7fSJonathan Corbet * xa_set_mark() 2279c79df7fSJonathan Corbet * xa_clear_mark() 228992a8e60SMatthew Wilcox 229992a8e60SMatthew WilcoxAssumes xa_lock held on entry: 2309c79df7fSJonathan Corbet * __xa_store() 2319c79df7fSJonathan Corbet * __xa_insert() 2329c79df7fSJonathan Corbet * __xa_erase() 2339c79df7fSJonathan Corbet * __xa_cmpxchg() 2349c79df7fSJonathan Corbet * __xa_alloc() 2359c79df7fSJonathan Corbet * __xa_set_mark() 2369c79df7fSJonathan Corbet * __xa_clear_mark() 237992a8e60SMatthew Wilcox 238992a8e60SMatthew WilcoxIf you want to take advantage of the lock to protect the data structures 2399c79df7fSJonathan Corbetthat you are storing in the XArray, you can call xa_lock() 2409c79df7fSJonathan Corbetbefore calling xa_load(), then take a reference count on the 2419c79df7fSJonathan Corbetobject you have found before calling xa_unlock(). This will 242992a8e60SMatthew Wilcoxprevent stores from removing the object from the array between looking 243992a8e60SMatthew Wilcoxup the object and incrementing the refcount. You can also use RCU to 244992a8e60SMatthew Wilcoxavoid dereferencing freed memory, but an explanation of that is beyond 245992a8e60SMatthew Wilcoxthe scope of this document. 246992a8e60SMatthew Wilcox 247992a8e60SMatthew WilcoxThe XArray does not disable interrupts or softirqs while modifying 248992a8e60SMatthew Wilcoxthe array. It is safe to read the XArray from interrupt or softirq 249992a8e60SMatthew Wilcoxcontext as the RCU lock provides enough protection. 250992a8e60SMatthew Wilcox 251992a8e60SMatthew WilcoxIf, for example, you want to store entries in the XArray in process 252992a8e60SMatthew Wilcoxcontext and then erase them in softirq context, you can do that this way:: 253992a8e60SMatthew Wilcox 254992a8e60SMatthew Wilcox void foo_init(struct foo *foo) 255992a8e60SMatthew Wilcox { 256992a8e60SMatthew Wilcox xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH); 257992a8e60SMatthew Wilcox } 258992a8e60SMatthew Wilcox 259992a8e60SMatthew Wilcox int foo_store(struct foo *foo, unsigned long index, void *entry) 260992a8e60SMatthew Wilcox { 261992a8e60SMatthew Wilcox int err; 262992a8e60SMatthew Wilcox 263992a8e60SMatthew Wilcox xa_lock_bh(&foo->array); 264992a8e60SMatthew Wilcox err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL)); 265992a8e60SMatthew Wilcox if (!err) 266992a8e60SMatthew Wilcox foo->count++; 267992a8e60SMatthew Wilcox xa_unlock_bh(&foo->array); 268992a8e60SMatthew Wilcox return err; 269992a8e60SMatthew Wilcox } 270992a8e60SMatthew Wilcox 271992a8e60SMatthew Wilcox /* foo_erase() is only called from softirq context */ 272992a8e60SMatthew Wilcox void foo_erase(struct foo *foo, unsigned long index) 273992a8e60SMatthew Wilcox { 274992a8e60SMatthew Wilcox xa_lock(&foo->array); 275992a8e60SMatthew Wilcox __xa_erase(&foo->array, index); 276992a8e60SMatthew Wilcox foo->count--; 277992a8e60SMatthew Wilcox xa_unlock(&foo->array); 278992a8e60SMatthew Wilcox } 279992a8e60SMatthew Wilcox 280992a8e60SMatthew WilcoxIf you are going to modify the XArray from interrupt or softirq context, 2819c79df7fSJonathan Corbetyou need to initialise the array using xa_init_flags(), passing 282992a8e60SMatthew Wilcox``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``. 283992a8e60SMatthew Wilcox 284992a8e60SMatthew WilcoxThe above example also shows a common pattern of wanting to extend the 285992a8e60SMatthew Wilcoxcoverage of the xa_lock on the store side to protect some statistics 286992a8e60SMatthew Wilcoxassociated with the array. 287992a8e60SMatthew Wilcox 288992a8e60SMatthew WilcoxSharing the XArray with interrupt context is also possible, either 2899c79df7fSJonathan Corbetusing xa_lock_irqsave() in both the interrupt handler and process 2909c79df7fSJonathan Corbetcontext, or xa_lock_irq() in process context and xa_lock() 291992a8e60SMatthew Wilcoxin the interrupt handler. Some of the more common patterns have helper 2929c79df7fSJonathan Corbetfunctions such as xa_store_bh(), xa_store_irq(), 2939c79df7fSJonathan Corbetxa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh() 2949c79df7fSJonathan Corbetand xa_cmpxchg_irq(). 295992a8e60SMatthew Wilcox 296992a8e60SMatthew WilcoxSometimes you need to protect access to the XArray with a mutex because 297992a8e60SMatthew Wilcoxthat lock sits above another mutex in the locking hierarchy. That does 2989c79df7fSJonathan Corbetnot entitle you to use functions like __xa_erase() without taking 299992a8e60SMatthew Wilcoxthe xa_lock; the xa_lock is used for lockdep validation and will be used 300992a8e60SMatthew Wilcoxfor other purposes in the future. 301992a8e60SMatthew Wilcox 3029c79df7fSJonathan CorbetThe __xa_set_mark() and __xa_clear_mark() functions are also 303992a8e60SMatthew Wilcoxavailable for situations where you look up an entry and want to atomically 304992a8e60SMatthew Wilcoxset or clear a mark. It may be more efficient to use the advanced API 305992a8e60SMatthew Wilcoxin this case, as it will save you from walking the tree twice. 306992a8e60SMatthew Wilcox 307992a8e60SMatthew WilcoxAdvanced API 308992a8e60SMatthew Wilcox============ 309992a8e60SMatthew Wilcox 310992a8e60SMatthew WilcoxThe advanced API offers more flexibility and better performance at the 311992a8e60SMatthew Wilcoxcost of an interface which can be harder to use and has fewer safeguards. 312992a8e60SMatthew WilcoxNo locking is done for you by the advanced API, and you are required 313992a8e60SMatthew Wilcoxto use the xa_lock while modifying the array. You can choose whether 314992a8e60SMatthew Wilcoxto use the xa_lock or the RCU lock while doing read-only operations on 315992a8e60SMatthew Wilcoxthe array. You can mix advanced and normal operations on the same array; 316992a8e60SMatthew Wilcoxindeed the normal API is implemented in terms of the advanced API. The 317992a8e60SMatthew Wilcoxadvanced API is only available to modules with a GPL-compatible license. 318992a8e60SMatthew Wilcox 319992a8e60SMatthew WilcoxThe advanced API is based around the xa_state. This is an opaque data 320ac23d1a9SMatthew Wilcox (Oracle)structure which you declare on the stack using the XA_STATE() macro. 321ac23d1a9SMatthew Wilcox (Oracle)This macro initialises the xa_state ready to start walking around the 322ac23d1a9SMatthew Wilcox (Oracle)XArray. It is used as a cursor to maintain the position in the XArray 323ac23d1a9SMatthew Wilcox (Oracle)and let you compose various operations together without having to restart 324ac23d1a9SMatthew Wilcox (Oracle)from the top every time. The contents of the xa_state are protected by 325ac23d1a9SMatthew Wilcox (Oracle)the rcu_read_lock() or the xas_lock(). If you need to drop whichever of 326ac23d1a9SMatthew Wilcox (Oracle)those locks is protecting your state and tree, you must call xas_pause() 327ac23d1a9SMatthew Wilcox (Oracle)so that future calls do not rely on the parts of the state which were 328ac23d1a9SMatthew Wilcox (Oracle)left unprotected. 329992a8e60SMatthew Wilcox 330992a8e60SMatthew WilcoxThe xa_state is also used to store errors. You can call 3319c79df7fSJonathan Corbetxas_error() to retrieve the error. All operations check whether 332992a8e60SMatthew Wilcoxthe xa_state is in an error state before proceeding, so there's no need 333992a8e60SMatthew Wilcoxfor you to check for an error after each call; you can make multiple 334992a8e60SMatthew Wilcoxcalls in succession and only check at a convenient point. The only 335992a8e60SMatthew Wilcoxerrors currently generated by the XArray code itself are ``ENOMEM`` and 336992a8e60SMatthew Wilcox``EINVAL``, but it supports arbitrary errors in case you want to call 3379c79df7fSJonathan Corbetxas_set_err() yourself. 338992a8e60SMatthew Wilcox 3399c79df7fSJonathan CorbetIf the xa_state is holding an ``ENOMEM`` error, calling xas_nomem() 340992a8e60SMatthew Wilcoxwill attempt to allocate more memory using the specified gfp flags and 341992a8e60SMatthew Wilcoxcache it in the xa_state for the next attempt. The idea is that you take 342992a8e60SMatthew Wilcoxthe xa_lock, attempt the operation and drop the lock. The operation 343992a8e60SMatthew Wilcoxattempts to allocate memory while holding the lock, but it is more 3449c79df7fSJonathan Corbetlikely to fail. Once you have dropped the lock, xas_nomem() 345992a8e60SMatthew Wilcoxcan try harder to allocate more memory. It will return ``true`` if it 346992a8e60SMatthew Wilcoxis worth retrying the operation (i.e. that there was a memory error *and* 347992a8e60SMatthew Wilcoxmore memory was allocated). If it has previously allocated memory, and 348992a8e60SMatthew Wilcoxthat memory wasn't used, and there is no error (or some error that isn't 349992a8e60SMatthew Wilcox``ENOMEM``), then it will free the memory previously allocated. 350992a8e60SMatthew Wilcox 351992a8e60SMatthew WilcoxInternal Entries 352992a8e60SMatthew Wilcox---------------- 353992a8e60SMatthew Wilcox 354992a8e60SMatthew WilcoxThe XArray reserves some entries for its own purposes. These are never 355992a8e60SMatthew Wilcoxexposed through the normal API, but when using the advanced API, it's 356992a8e60SMatthew Wilcoxpossible to see them. Usually the best way to handle them is to pass them 3579c79df7fSJonathan Corbetto xas_retry(), and retry the operation if it returns ``true``. 358992a8e60SMatthew Wilcox 359992a8e60SMatthew Wilcox.. flat-table:: 360992a8e60SMatthew Wilcox :widths: 1 1 6 361992a8e60SMatthew Wilcox 362992a8e60SMatthew Wilcox * - Name 363992a8e60SMatthew Wilcox - Test 364992a8e60SMatthew Wilcox - Usage 365992a8e60SMatthew Wilcox 366992a8e60SMatthew Wilcox * - Node 3679c79df7fSJonathan Corbet - xa_is_node() 368992a8e60SMatthew Wilcox - An XArray node. May be visible when using a multi-index xa_state. 369992a8e60SMatthew Wilcox 370992a8e60SMatthew Wilcox * - Sibling 3719c79df7fSJonathan Corbet - xa_is_sibling() 372992a8e60SMatthew Wilcox - A non-canonical entry for a multi-index entry. The value indicates 373992a8e60SMatthew Wilcox which slot in this node has the canonical entry. 374992a8e60SMatthew Wilcox 375992a8e60SMatthew Wilcox * - Retry 3769c79df7fSJonathan Corbet - xa_is_retry() 377992a8e60SMatthew Wilcox - This entry is currently being modified by a thread which has the 378992a8e60SMatthew Wilcox xa_lock. The node containing this entry may be freed at the end 379992a8e60SMatthew Wilcox of this RCU period. You should restart the lookup from the head 380992a8e60SMatthew Wilcox of the array. 381992a8e60SMatthew Wilcox 3829f14d4f1SMatthew Wilcox * - Zero 3839c79df7fSJonathan Corbet - xa_is_zero() 3849f14d4f1SMatthew Wilcox - Zero entries appear as ``NULL`` through the Normal API, but occupy 3859f14d4f1SMatthew Wilcox an entry in the XArray which can be used to reserve the index for 386d9c48043SMatthew Wilcox future use. This is used by allocating XArrays for allocated entries 387d9c48043SMatthew Wilcox which are ``NULL``. 3889f14d4f1SMatthew Wilcox 389992a8e60SMatthew WilcoxOther internal entries may be added in the future. As far as possible, they 3909c79df7fSJonathan Corbetwill be handled by xas_retry(). 391992a8e60SMatthew Wilcox 392992a8e60SMatthew WilcoxAdditional functionality 393992a8e60SMatthew Wilcox------------------------ 394992a8e60SMatthew Wilcox 3959c79df7fSJonathan CorbetThe xas_create_range() function allocates all the necessary memory 396992a8e60SMatthew Wilcoxto store every entry in a range. It will set ENOMEM in the xa_state if 397992a8e60SMatthew Wilcoxit cannot allocate memory. 398992a8e60SMatthew Wilcox 3999c79df7fSJonathan CorbetYou can use xas_init_marks() to reset the marks on an entry 400992a8e60SMatthew Wilcoxto their default state. This is usually all marks clear, unless the 401992a8e60SMatthew WilcoxXArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set 402992a8e60SMatthew Wilcoxand all other marks are clear. Replacing one entry with another using 4039c79df7fSJonathan Corbetxas_store() will not reset the marks on that entry; if you want 404992a8e60SMatthew Wilcoxthe marks reset, you should do that explicitly. 405992a8e60SMatthew Wilcox 4069c79df7fSJonathan CorbetThe xas_load() will walk the xa_state as close to the entry 407992a8e60SMatthew Wilcoxas it can. If you know the xa_state has already been walked to the 408992a8e60SMatthew Wilcoxentry and need to check that the entry hasn't changed, you can use 4099c79df7fSJonathan Corbetxas_reload() to save a function call. 410992a8e60SMatthew Wilcox 411992a8e60SMatthew WilcoxIf you need to move to a different index in the XArray, call 4129c79df7fSJonathan Corbetxas_set(). This resets the cursor to the top of the tree, which 413992a8e60SMatthew Wilcoxwill generally make the next operation walk the cursor to the desired 414992a8e60SMatthew Wilcoxspot in the tree. If you want to move to the next or previous index, 4159c79df7fSJonathan Corbetcall xas_next() or xas_prev(). Setting the index does 416992a8e60SMatthew Wilcoxnot walk the cursor around the array so does not require a lock to be 417992a8e60SMatthew Wilcoxheld, while moving to the next or previous index does. 418992a8e60SMatthew Wilcox 4199c79df7fSJonathan CorbetYou can search for the next present entry using xas_find(). This 4209c79df7fSJonathan Corbetis the equivalent of both xa_find() and xa_find_after(); 421992a8e60SMatthew Wilcoxif the cursor has been walked to an entry, then it will find the next 422992a8e60SMatthew Wilcoxentry after the one currently referenced. If not, it will return the 4239c79df7fSJonathan Corbetentry at the index of the xa_state. Using xas_next_entry() to 4249c79df7fSJonathan Corbetmove to the next present entry instead of xas_find() will save 425992a8e60SMatthew Wilcoxa function call in the majority of cases at the expense of emitting more 426992a8e60SMatthew Wilcoxinline code. 427992a8e60SMatthew Wilcox 4289c79df7fSJonathan CorbetThe xas_find_marked() function is similar. If the xa_state has 429992a8e60SMatthew Wilcoxnot been walked, it will return the entry at the index of the xa_state, 430992a8e60SMatthew Wilcoxif it is marked. Otherwise, it will return the first marked entry after 4319c79df7fSJonathan Corbetthe entry referenced by the xa_state. The xas_next_marked() 4329c79df7fSJonathan Corbetfunction is the equivalent of xas_next_entry(). 433992a8e60SMatthew Wilcox 4349c79df7fSJonathan CorbetWhen iterating over a range of the XArray using xas_for_each() 4359c79df7fSJonathan Corbetor xas_for_each_marked(), it may be necessary to temporarily stop 4369c79df7fSJonathan Corbetthe iteration. The xas_pause() function exists for this purpose. 437992a8e60SMatthew WilcoxAfter you have done the necessary work and wish to resume, the xa_state 438992a8e60SMatthew Wilcoxis in an appropriate state to continue the iteration after the entry 439992a8e60SMatthew Wilcoxyou last processed. If you have interrupts disabled while iterating, 440992a8e60SMatthew Wilcoxthen it is good manners to pause the iteration and reenable interrupts 441992a8e60SMatthew Wilcoxevery ``XA_CHECK_SCHED`` entries. 442992a8e60SMatthew Wilcox 4436b81141dSMatthew Wilcox (Oracle)The xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require 4446b81141dSMatthew Wilcox (Oracle)the xa_state cursor to have been moved to the appropriate location in the 4456b81141dSMatthew Wilcox (Oracle)XArray; they will do nothing if you have called xas_pause() or xas_set() 446992a8e60SMatthew Wilcoximmediately before. 447992a8e60SMatthew Wilcox 4489c79df7fSJonathan CorbetYou can call xas_set_update() to have a callback function 449992a8e60SMatthew Wilcoxcalled each time the XArray updates a node. This is used by the page 450992a8e60SMatthew Wilcoxcache workingset code to maintain its list of nodes which contain only 451992a8e60SMatthew Wilcoxshadow entries. 452992a8e60SMatthew Wilcox 453992a8e60SMatthew WilcoxMulti-Index Entries 454992a8e60SMatthew Wilcox------------------- 455992a8e60SMatthew Wilcox 456992a8e60SMatthew WilcoxThe XArray has the ability to tie multiple indices together so that 457992a8e60SMatthew Wilcoxoperations on one index affect all indices. For example, storing into 458992a8e60SMatthew Wilcoxany index will change the value of the entry retrieved from any index. 459992a8e60SMatthew WilcoxSetting or clearing a mark on any index will set or clear the mark 460992a8e60SMatthew Wilcoxon every index that is tied together. The current implementation 461992a8e60SMatthew Wilcoxonly allows tying ranges which are aligned powers of two together; 462992a8e60SMatthew Wilcoxeg indices 64-127 may be tied together, but 2-6 may not be. This may 463992a8e60SMatthew Wilcoxsave substantial quantities of memory; for example tying 512 entries 464992a8e60SMatthew Wilcoxtogether will save over 4kB. 465992a8e60SMatthew Wilcox 4669c79df7fSJonathan CorbetYou can create a multi-index entry by using XA_STATE_ORDER() 4679c79df7fSJonathan Corbetor xas_set_order() followed by a call to xas_store(). 4689c79df7fSJonathan CorbetCalling xas_load() with a multi-index xa_state will walk the 469992a8e60SMatthew Wilcoxxa_state to the right location in the tree, but the return value is not 470992a8e60SMatthew Wilcoxmeaningful, potentially being an internal entry or ``NULL`` even when there 4719c79df7fSJonathan Corbetis an entry stored within the range. Calling xas_find_conflict() 472992a8e60SMatthew Wilcoxwill return the first entry within the range or ``NULL`` if there are no 4739c79df7fSJonathan Corbetentries in the range. The xas_for_each_conflict() iterator will 474992a8e60SMatthew Wilcoxiterate over every entry which overlaps the specified range. 475992a8e60SMatthew Wilcox 4769c79df7fSJonathan CorbetIf xas_load() encounters a multi-index entry, the xa_index 477992a8e60SMatthew Wilcoxin the xa_state will not be changed. When iterating over an XArray 4789c79df7fSJonathan Corbetor calling xas_find(), if the initial index is in the middle 479992a8e60SMatthew Wilcoxof a multi-index entry, it will not be altered. Subsequent calls 480992a8e60SMatthew Wilcoxor iterations will move the index to the first index in the range. 481992a8e60SMatthew WilcoxEach entry will only be returned once, no matter how many indices it 482992a8e60SMatthew Wilcoxoccupies. 483992a8e60SMatthew Wilcox 4848fc75643SMatthew Wilcox (Oracle)Using xas_next() or xas_prev() with a multi-index xa_state is not 4858fc75643SMatthew Wilcox (Oracle)supported. Using either of these functions on a multi-index entry will 4868fc75643SMatthew Wilcox (Oracle)reveal sibling entries; these should be skipped over by the caller. 487992a8e60SMatthew Wilcox 4888fc75643SMatthew Wilcox (Oracle)Storing ``NULL`` into any index of a multi-index entry will set the 4898fc75643SMatthew Wilcox (Oracle)entry at every index to ``NULL`` and dissolve the tie. A multi-index 4908fc75643SMatthew Wilcox (Oracle)entry can be split into entries occupying smaller ranges by calling 4918fc75643SMatthew Wilcox (Oracle)xas_split_alloc() without the xa_lock held, followed by taking the lock 492*3fec86f8SZi Yanand calling xas_split() or calling xas_try_split() with xa_lock. The 493*3fec86f8SZi Yandifference between xas_split_alloc()+xas_split() and xas_try_alloc() is 494*3fec86f8SZi Yanthat xas_split_alloc() + xas_split() split the entry from the original 495*3fec86f8SZi Yanorder to the new order in one shot uniformly, whereas xas_try_split() 496*3fec86f8SZi Yaniteratively splits the entry containing the index non-uniformly. 497*3fec86f8SZi YanFor example, to split an order-9 entry, which takes 2^(9-6)=8 slots, 498*3fec86f8SZi Yanassuming ``XA_CHUNK_SHIFT`` is 6, xas_split_alloc() + xas_split() need 499*3fec86f8SZi Yan8 xa_node. xas_try_split() splits the order-9 entry into 500*3fec86f8SZi Yan2 order-8 entries, then split one order-8 entry, based on the given index, 501*3fec86f8SZi Yanto 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries. 502*3fec86f8SZi YanWhen splitting the order-6 entry and a new xa_node is needed, xas_try_split() 503*3fec86f8SZi Yanwill try to allocate one if possible. As a result, xas_try_split() would only 504*3fec86f8SZi Yanneed 1 xa_node instead of 8. 505992a8e60SMatthew Wilcox 506992a8e60SMatthew WilcoxFunctions and structures 507992a8e60SMatthew Wilcox======================== 508992a8e60SMatthew Wilcox 509992a8e60SMatthew Wilcox.. kernel-doc:: include/linux/xarray.h 510992a8e60SMatthew Wilcox.. kernel-doc:: lib/xarray.c 511