|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
200a89c1 |
| 14-Mar-2025 |
Zi Yan <[email protected]> |
mm/filemap: use xas_try_split() in __filemap_add_folio()
Patch series "Minimize xa_node allocation during xarry split", v3.
When splitting a multi-index entry in the XArray from order-n to order-m, the existing xas_split_alloc()+xas_split() approach requires 2^(n % XA_CHUNK_SHIFT) xa_node allocations. But its callers, __filemap_add_folio() and shmem_split_large_entry(), use at most 1 xa_node. To minimize xa_node allocations and to remove the limitation that an entry cannot be split from order-12 (or above) to order-0 (or anything between 0 and 5)[1], xas_try_split() was added[2], which allocates (n / XA_CHUNK_SHIFT - m / XA_CHUNK_SHIFT) xa_nodes. It is used for non-uniform folio split, but can also be used by __filemap_add_folio() and shmem_split_large_entry().
xas_split_alloc() and xas_split() split an order-9 to order-0:
     ---------------------------------
    |   |   |   |   |   |   |   |   |
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    |   |   |   |   |   |   |   |   |
     ---------------------------------
       |   |                   |   |
 -------   ---               ---   -------
|             |             |             |
V             V             V             V
-----------  -----------     -----------  -----------
| xa_node |  | xa_node | ... | xa_node |  | xa_node |
-----------  -----------     -----------  -----------
xas_try_split() splits an order-9 to order-0:

     ---------------------------------
    |   |   |   |   |   |   |   |   |
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    |   |   |   |   |   |   |   |   |
     ---------------------------------
      |
      |
      V
 -----------
 | xa_node |
 -----------
xas_try_split() is designed to be called iteratively with n = m + 1. xas_try_split_min_order() is added to minimize the number of calls to xas_try_split() by telling the caller the next minimal order to split to, instead of n - 1. Splitting from order-n to order-m requires no xa_node allocation when m = l * XA_CHUNK_SHIFT, and requires 1 xa_node when n = l * XA_CHUNK_SHIFT and m = n - 1, so it is OK to use xas_try_split() with n > m + 1 when no new xa_node is needed.
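To make the order arithmetic above concrete, here is a minimal sketch of what xas_try_split_min_order() can look like, inferred purely from the behavior described in this message (not necessarily the exact upstream implementation):

    /* Next minimal order to split to: the next lower multiple of
     * XA_CHUNK_SHIFT, or order - 1 when order is already a multiple. */
    unsigned int xas_try_split_min_order(unsigned int order)
    {
            if (order % XA_CHUNK_SHIFT == 0)
                    return order == 0 ? 0 : order - 1;

            return round_down(order, XA_CHUNK_SHIFT);
    }

With XA_CHUNK_SHIFT = 6, splitting an order-9 entry then proceeds 9 -> 6 (no new xa_node) -> 5 (one xa_node) -> 0 (no new xa_node), i.e. 1 xa_node in total, matching the figure above.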
xfstests quick group test passed on xfs and tmpfs.
[1] https://lore.kernel.org/linux-mm/[email protected]/ [2] https://lore.kernel.org/linux-mm/[email protected]/
This patch (of 2):
During __filemap_add_folio(), a shadow entry covering n slots may be present while a folio covering m slots, with m < n, is to be added. Instead of splitting all n slots, only the m slots covered by the folio need to be split, and the remaining n-m shadow entries can be retained with orders ranging from m to n-1. This method only requires
(n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT)
new xa_nodes instead of
(n % XA_CHUNK_SHIFT) * ((n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT))
new xa_nodes, compared to the original xas_split_alloc() + xas_split() approach. For example, to insert an order-0 folio when an order-9 shadow entry is present (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of 8.
xas_try_split_min_order() is introduced to reduce the number of calls to xas_try_split() during split.
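A hedged sketch of the resulting split loop in __filemap_add_folio(), assuming the xas_try_split()/xas_try_split_min_order() semantics described above (locking, conflict checks, and shadow-entry bookkeeping are omitted):

    /* Split the shadow entry `old` of order `order` down to the
     * folio's order, one minimal step at a time. */
    unsigned int split_order = xas_try_split_min_order(order);

    while (order > folio_order(folio)) {
            xas_set_order(&xas, index, split_order);
            xas_try_split(&xas, old, order);
            if (xas_error(&xas))
                    break;                  /* e.g. -ENOMEM */
            order = split_order;
            split_order = xas_try_split_min_order(split_order);
    }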
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zi Yan <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kairui Song <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: John Hubbard <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.14-rc6 |
|
| #
3fec86f8 |
| 07-Mar-2025 |
Zi Yan <[email protected]> |
xarray: add xas_try_split() to split a multi-index entry
Patch series "Buddy allocator like (or non-uniform) folio split", v10.
This patchset adds a new buddy allocator like (or non-uniform) large folio split from an order-n folio to order-m with m < n. It reduces
1. the total number of after-split folios from 2^(n-m) to n-m+1;
2. the amount of memory needed for a multi-index xarray split from 2^(n % XA_CHUNK_SHIFT) to n/XA_CHUNK_SHIFT - m/XA_CHUNK_SHIFT xa_nodes, assuming XA_CHUNK_SHIFT=6;
3. keeps more large folios after a split: instead of all order-m folios, the result consists of folios with orders ranging from n-1 down to m.
For example, splitting an order-9 folio to order-0 generates 10 (or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0 folios (or 4 order-0 folios for anonymous memory) instead of 512 order-0 folios.
Instead of duplicating existing split_huge_page*() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page*() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split().
xfstests quick group passed for both tmpfs and xfs. I also semi-replicated Hugh's test[12] and ran it without any issue for almost 24 hours.
This patch (of 8):
A preparation patch for non-uniform folio split, which always splits a folio in half iteratively, and for minimal xarray entry split.
Currently, xas_split_alloc() and xas_split() always split all slots from a multi-index entry. They cost as many xa_nodes as the number of to-be-split slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_nodes are needed. Instead, xas_try_split() is intended to be used iteratively: split the order-9 entry into 2 order-8 entries, then split one order-8 entry, based on the given index, into 2 order-7 entries, ..., and finally split one order-1 entry into 2 order-0 entries. When splitting the order-6 entry, a new xa_node is needed, and xas_try_split() will try to allocate one if possible. As a result, xas_try_split() only needs 1 xa_node instead of 8.
When a new xa_node is needed during the split, xas_try_split() can try to allocate one, but no more than one. -ENOMEM is returned if a node cannot be allocated. -EINVAL is returned if a sibling node would be split or a cascading split would happen, where two or more new nodes are needed; these cases are not supported by xas_try_split().
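A hedged usage sketch of this error contract (the index, orders, and surrounding context are illustrative):

    /* Split the order-9 entry at `index` one step, to order-8. */
    XA_STATE_ORDER(xas, &mapping->i_pages, index, 8);
    void *entry;
    int err = 0;

    xas_lock_irq(&xas);
    entry = xas_load(&xas);
    xas_try_split(&xas, entry, 9);  /* current order of `entry` is 9 */
    if (xas_error(&xas))
            err = xas_error(&xas);  /* -ENOMEM or -EINVAL, as above */
    xas_unlock_irq(&xas);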
xas_split_alloc() and xas_split() split an order-9 to order-0:
     ---------------------------------
    |   |   |   |   |   |   |   |   |
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    |   |   |   |   |   |   |   |   |
     ---------------------------------
       |   |                   |   |
 -------   ---               ---   -------
|             |             |             |
V             V             V             V
-----------  -----------     -----------  -----------
| xa_node |  | xa_node | ... | xa_node |  | xa_node |
-----------  -----------     -----------  -----------
xas_try_split() splits an order-9 to order-0:

     ---------------------------------
    |   |   |   |   |   |   |   |   |
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    |   |   |   |   |   |   |   |   |
     ---------------------------------
      |
      |
      V
 -----------
 | xa_node |
 -----------
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zi Yan <[email protected]> Cc: Baolin Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: John Hubbard <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Zi Yan <[email protected]> Cc: Kairui Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
13fd5cf3 |
| 13-Dec-2024 |
Kemeng Shi <[email protected]> |
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Besides xas_squash_marks(), all functions use the xa_mark_t type to iterate over all possible marks. Use xa_mark_t in xas_squash_marks() to keep the code consistent.
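For illustration, a sketch of the iteration pattern in question (the mark-advance step is written with explicit __force casts since xa_mark_t is a bitwise type; the upstream helper may differ):

    xa_mark_t mark = XA_MARK_0;

    for (;;) {
            /* ... handle this mark for the squashed slots ... */
            if (mark == XA_MARK_MAX)
                    break;
            mark = (__force xa_mark_t)((__force unsigned int)mark + 1);
    }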
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
1988b318 |
| 13-Dec-2024 |
Kemeng Shi <[email protected]> |
Xarray: remove repeat check in xas_squash_marks()
The caller of xas_squash_marks() has already ensured that xas->xa_sibs is non-zero, so remove the repeated check of xas->xa_sibs in xas_squash_marks().
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
97db889b |
| 13-Dec-2024 |
Kemeng Shi <[email protected]> |
Xarray: distinguish large entries correctly in xas_split_alloc()
We don't support splitting large entries which would expand two more levels of xa_node. In the case "xas->xa_shift + 2 * XA_CHUNK_SHIFT == order", two levels of xa_node are also needed to expand, so distinguish the entry as a large entry in that case as well.
As the maximum order of a folio in the page cache (MAX_PAGECACHE_ORDER) is <= (XA_CHUNK_SHIFT * 2 - 1), this change is more of a cleanup...
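A hedged sketch of the adjusted check (the exact upstream condition and error handling may differ):

    /* An entry needing two or more extra levels of xa_node is a large
     * entry we cannot split, including the boundary case where
     * xas->xa_shift + 2 * XA_CHUNK_SHIFT == order. */
    if (xas->xa_shift + 2 * XA_CHUNK_SHIFT <= order) {
            xas_set_err(xas, -EINVAL);
            return;
    }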
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
c9ba5249 |
| 13-Dec-2024 |
Kemeng Shi <[email protected]> |
Xarray: move forward index correctly in xas_pause()
After xas_load(), xas->index could point into the middle of a found multi-index entry, so the bits of xas->index below node->shift may be non-zero. A subsequent xas_pause() then moves xas->index forward by (1 << node->shift) with the bits below node->shift left unmasked, and thus may skip some indices unexpectedly.
Consider the following case (assume XA_CHUNK_SHIFT is 4):

    xa_store_range(xa, 16, 31, ...)
    xa_store(xa, 32, ...)
    XA_STATE(xas, xa, 17);
    xas_for_each(&xas, ...)
        xas_load(&xas)
        /* xas->index = 17, xas->xa_offset = 1, xas->xa_node->shift = 4 */
        xas_pause(&xas)
        /* xas->index = 33, xas->xa_offset = 2, xas->xa_node->shift = 4 */

As we can see, index 32 is skipped unexpectedly.
Fix this by masking off the bits below node->shift when moving the index forward in xas_pause().
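A hedged illustration of the masking step (illustrative, not the exact upstream diff):

    /* Align the index to the start of the current multi-index entry
     * before stepping over it, so the step lands on an entry boundary. */
    unsigned long index = xas->xa_index;

    index &= ~((1UL << node->shift) - 1);              /* mask bits below shift */
    index += (offset - xas->xa_offset) << node->shift; /* step past the entry */
    xas->xa_index = index;

In the example above this aligns 17 down to 16 before adding (2 - 1) << 4, landing on index 32 instead of 33.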
For now, this does not cause serious problems; only minor ones, such as cachestat returning fewer page statuses than expected, could occur.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
7e060df0 |
| 13-Dec-2024 |
Kemeng Shi <[email protected]> |
Xarray: do not return sibling entries from xas_find_marked()
Patch series "Fixes and cleanups to xarray", v5.
This series contains some random fixes and cleanups to xarray. Patches 1-2 are fixes and patches 3-6 are cleanups. More details can be found in the respective patches.
This patch (of 5):
Similar to the issue fixed in commit cbc02854331ed ("XArray: Do not return sibling entries from xa_load()"), we may return sibling entries from xas_find_marked() as follows:

Thread A:                         Thread B:
xa_store_range(xa, entry, 6, 7, gfp);
xa_set_mark(xa, 6, mark)
                                  XA_STATE(xas, xa, 6);
                                  xas_find_marked(&xas, 7, mark);
                                  offset = xas_find_chunk(xas, advance, mark);
                                  [offset is 6 which points to a valid entry]
xa_store_range(xa, entry, 4, 7, gfp);
                                  entry = xa_entry(xa, node, 6);
                                  [entry is a sibling of 4]
                                  if (!xa_is_node(entry))
                                          return entry;
Skip sibling entries, as xas_find() does, to protect callers from seeing a sibling entry returned from xas_find_marked(); otherwise a caller may use the sibling entry as a valid entry and crash the kernel.
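A hedged sketch of the skip, modeled on the sibling handling in xas_find() (the exact upstream code may differ):

    entry = xa_entry(xas->xa, xas->xa_node, offset);
    if (xa_is_sibling(entry)) {
            /* Resolve the sibling to its canonical offset instead of
             * returning it to the caller. */
            xas->xa_offset = xa_to_sibling(entry);
            entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
    }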
Besides, the load_race() test is modified to catch the mentioned issue; the modified load_race() only passes after this fix is merged.
Here is an example of how this bug could be triggered in tmpfs with large folios enabled in the mapping. Let's take a look at the involved racers:

1. How pages could be created and dirtied in a shmem file:

write
  ksys_write
    vfs_write
      new_sync_write
        shmem_file_write_iter
          generic_perform_write
            shmem_write_begin
              shmem_get_folio
                shmem_allowable_huge_orders
                shmem_alloc_and_add_folios
                  shmem_alloc_folio
                  __folio_set_locked
                  shmem_add_to_page_cache
                    XA_STATE_ORDER(..., index, order)
                    xas_store()
            shmem_write_end
              folio_mark_dirty()

2. How dirty pages could be deleted in a shmem file:

ioctl
  do_vfs_ioctl
    file_ioctl
      ioctl_preallocate
        vfs_fallocate
          shmem_fallocate
            shmem_truncate_range
              shmem_undo_range
                truncate_inode_folio
                  filemap_remove_folio
                    page_cache_delete
                      xas_store(&xas, NULL);

3. How dirty pages could be searched locklessly:

sync_file_range
  ksys_sync_file_range
    __filemap_fdatawrite_range
      filemap_fdatawrite_wbc
        do_writepages
          writeback_use_writepage
            writeback_iter
              writeback_get_folio
                filemap_get_folios_tag
                  find_get_entry
                    folio = xas_find_marked()
                    folio_try_get(folio)

With the racers above, the kernel will crash as follows:

1.Create                      2.Search                      3.Delete
/* write page 2,3 */
write
...
shmem_write_begin
  XA_STATE_ORDER(xas, i_pages, index = 2, order = 1)
  xa_store(&xas, folio)
shmem_write_end
  folio_mark_dirty()

                              /* sync page 2 and page 3 */
                              sync_file_range
                              ...
                              find_get_entry
                                folio = xas_find_marked()
                                /* offset will be 2 */
                                offset = xas_find_chunk()

                                                            /* delete page 2 and page 3 */
                                                            ioctl
                                                            ...
                                                            xas_store(&xas, NULL);

/* write page 0-3 */
write
...
shmem_write_begin
  XA_STATE_ORDER(xas, i_pages, index = 0, order = 2)
  xa_store(&xas, folio)
shmem_write_end
  folio_mark_dirty(folio)

                              /* get sibling entry from offset 2 */
                              entry = xa_entry(.., 2)
                              /* use sibling entry as folio and
                                 crash kernel */
                              folio_try_get(folio)
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Cc: Matthew Wilcox <[email protected]> [English fixes] Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12 |
|
| #
79ada2ae |
| 12-Nov-2024 |
Tamir Duberstein <[email protected]> |
xarray: extract helper from __xa_{insert,cmpxchg}
Reduce code duplication by extracting a static inline function. This function is identical to __xa_cmpxchg with the exception that it does not coerce zero entries to null on the return path.
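A hedged sketch of what such an extracted helper can look like inside lib/xarray.c (details such as XA_FREE_MARK handling are omitted; the name follows the series title):

    /* Identical to __xa_cmpxchg except that the result is returned raw,
     * without coercing XA_ZERO_ENTRY to NULL. */
    static void *__xa_cmpxchg_raw(struct xarray *xa, unsigned long index,
                    void *old, void *entry, gfp_t gfp)
    {
            XA_STATE(xas, xa, index);
            void *curr;

            do {
                    curr = xas_load(&xas);
                    if (curr == old)
                            xas_store(&xas, entry);
            } while (__xas_nomem(&xas, gfp));

            return curr;
    }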
[[email protected]: fix __xa_erase()] Link: https://lkml.kernel.org/r/CAJ-ks9kN_qddZ3Ne5d=cADu5POC1rHd4rQcbVSD_spnZOrLLZg@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tamir Duberstein <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
74e2712b |
| 12-Nov-2024 |
Tamir Duberstein <[email protected]> |
xarray: extract xa_zero_to_null
Patch series "xarray: extract __xa_cmpxchg_raw".
This series reduces duplication between __xa_cmpxchg and __xa_insert by extracting a new function that does not coerce zero entries to null on the return path.
The new function may be used by the upcoming Rust xarray abstraction in its reservation API where it is useful to tell the difference between zero entries and null slots.
This patch (of 2):
Reduce code duplication by extracting a static inline function that returns its argument if it is non-zero and NULL otherwise.
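A sketch of the helper as described (xa_is_zero() tests for the internal XA_ZERO_ENTRY):

    static inline void *xa_zero_to_null(void *entry)
    {
            return xa_is_zero(entry) ? NULL : entry;
    }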
This changes xas_result to check for errors before checking for zero, but this cannot change the behavior of existing callers:

- __xa_erase: passes the result of xas_store(_, NULL), which cannot fail.
- __xa_store: passes the result of xas_store(_, entry), which may fail. xas_store calls xas_create when entry is not NULL, which returns NULL on error, which is immediately checked. This should not change observable behavior.
- __xa_cmpxchg: passes the result of xas_load(_), which might be zero. This would previously return NULL regardless of the outcome of xas_store, but xas_store cannot fail if xas_load returns zero because there is no need to allocate memory.
- xa_store_range: same as __xa_erase.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tamir Duberstein <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5 |
|
| #
ba591801 |
| 16-Apr-2024 |
Long Li <[email protected]> |
xarray: inline xas_descend to improve performance
Commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes") modified xas_descend() in such a way that it was no longer compiled as an inline function: the change increased the size of xas_descend(), so the compiler no longer optimized it as inline. This had a negative impact on performance; xas_descend() is called frequently to traverse downwards in the xarray tree, making it a hot function.
Inlining xas_descend has been shown to significantly improve performance by approximately 4.95% in the iozone write test.
Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
# iozone -i 0 -i 1 -s 64g -r 16m -f /test/tmptest
Before this patch:

              kB  reclen    write  rewrite     read   reread
        67108864   16384  2230080  3637689  6315197  5496027

After this patch:

              kB  reclen    write  rewrite     read   reread
        67108864   16384  2340360  3666175  6272401  5460782

Percentage change:
                          +4.95%   +0.78%   -0.68%   -0.64%
This patch introduces inlining to the xas_descend function. While this change increases the size of lib/xarray.o, the performance gains in critical workloads make this an acceptable trade-off.
Size comparison before and after patch:

         .text    .data     .bss     file
        0x3502        0        0     lib/xarray.o.before
        0x3602        0        0     lib/xarray.o.after
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Long Li <[email protected]> Cc: Hou Tao <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: yangerkun <[email protected]> Cc: Zhang Yi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
2a0774c2 |
| 01-May-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: set the marks correctly when splitting an entry
If we created a new node to replace an entry which had search marks set, we were setting the search mark on every entry in that node. That works fine when we're splitting to order 0, but when splitting to a larger order, we must not set the search marks on the sibling entries.
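A hedged sketch of the rule (illustrative only; node_set_mark() is internal to lib/xarray.c and the upstream diff is structured differently):

    /* When propagating a search mark into a freshly created child node
     * during a split to a non-zero order, mark only canonical entries
     * and skip sibling entries. */
    for (offset = 0; offset < XA_CHUNK_SIZE; offset++) {
            if (xa_is_sibling(xa_entry_locked(xas->xa, child, offset)))
                    continue;
            node_set_mark(child, offset, mark);
    }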
Link: https://lkml.kernel.org/r/[email protected] Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Luis Chamberlain <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
a4864671 |
| 15-Apr-2024 |
Kairui Song <[email protected]> |
lib/xarray: introduce a new helper xas_get_order
It can be used after xas_load to check the order of loaded entries. Compared to xa_get_order, it saves an XA_STATE and avoids a rewalk.
A new test is added for xas_get_order; to make the test work, xas_get_order is exported with EXPORT_SYMBOL_GPL.
Also fix a sparse warning by checking the slot value with xa_entry instead of accessing it directly, as suggested by Matthew Wilcox.
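A hedged usage sketch (the mapping and index are illustrative):

    XA_STATE(xas, &mapping->i_pages, index);
    void *entry;
    unsigned int order;

    rcu_read_lock();
    entry = xas_load(&xas);
    order = xas_get_order(&xas);  /* order of the entry just loaded, no rewalk */
    rcu_read_unlock();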
[[email protected]: simplify comment, sparse warning fix, per Matthew Wilcox] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kairui Song <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5 |
|
| #
e7716c74 |
| 21-Aug-2023 |
Philipp Stanner <[email protected]> |
xarray: Document necessary flag in alloc functions
Add a note to the docstrings of functions wrapping __xa_alloc() and __xa_alloc_cyclic(), stating that the flag XA_FLAGS_ALLOC must have been set beforehand.
The documentation so far says that functions wrapping __xa_alloc() and __xa_alloc_cyclic() are supposed to return either -ENOMEM or -EBUSY in case of an error. If the xarray has been initialized without the flag XA_FLAGS_ALLOC, however, they fail with a different, undocumented error code.
As hinted at in Documentation/core-api/xarray.rst, wrappers around these functions should only be invoked when the flag has been set. The functions' documentation should reflect that as well.
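For illustration, a minimal usage sketch of the documented requirement (some_ptr stands in for whatever non-NULL entry the caller wants to store):

    /* The array must be initialized with XA_FLAGS_ALLOC (or
     * XA_FLAGS_ALLOC1) before the allocating API may be used on it. */
    DEFINE_XARRAY_ALLOC(ids);       /* sets XA_FLAGS_ALLOC */
    u32 id;
    int err;

    err = xa_alloc(&ids, &id, some_ptr, xa_limit_32b, GFP_KERNEL);
    /* -ENOMEM or -EBUSY on failure -- but only with XA_FLAGS_ALLOC set;
     * without it, the error is different and was previously undocumented. */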
Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
Revision tags: v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4 |
|
| #
cbc02854 |
| 27-Jul-2023 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Do not return sibling entries from xa_load()
It is possible for xa_load() to observe a sibling entry pointing to another sibling entry. An example:
Thread A:                         Thread B:
xa_store_range(xa, entry, 188, 191, gfp);
                                  xa_load(xa, 191);
                                  entry = xa_entry(xa, node, 63);
                                  [entry is a sibling of 188]
xa_store_range(xa, entry, 184, 191, gfp);
                                  if (xa_is_sibling(entry))
                                          offset = xa_to_sibling(entry);
                                  entry = xa_entry(xas->xa, node, offset);
                                  [entry is now a sibling of 184]
It is sufficient to go around this loop until we hit a non-sibling entry. Sibling entries always point earlier in the node, so we are guaranteed to terminate this search.
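A hedged sketch of that loop:

    /* Keep resolving sibling entries until a non-sibling entry is
     * reached; sibling entries always point earlier in the node, so
     * the loop terminates. */
    while (xa_is_sibling(entry)) {
            offset = xa_to_sibling(entry);
            entry = xa_entry(xas->xa, node, offset);
    }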
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Cc: [email protected]
|
|
Revision tags: v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3 |
|
| #
bde1597d |
| 16-May-2023 |
Arnd Bergmann <[email protected]> |
radix-tree: move declarations to header
The xarray.c file contains the only call to radix_tree_node_rcu_free(), and it comes with its own extern declaration for it. This means the function definition causes a missing-prototype warning:
lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]
Instead, move the declaration for this function to a new header that can be included by both, and do the same for the radix_tree_node_cachep variable that has the same underlying problem but does not cause a warning with gcc.
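A hedged sketch of the shared header's contents (the actual file layout may differ):

    /* lib/radix-tree.h -- declarations shared by radix-tree.c and xarray.c */
    #ifndef _LIB_RADIX_TREE_H
    #define _LIB_RADIX_TREE_H

    #include <linux/radix-tree.h>

    void radix_tree_node_rcu_free(struct rcu_head *head);
    extern struct kmem_cache *radix_tree_node_cachep;

    #endif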
[[email protected]: fix building radix tree test suite] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Peng Zhang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2 |
|
| #
69a37a8b |
| 08-Jun-2022 |
Matthew Wilcox (Oracle) <[email protected]> |
mm/huge_memory: Fix xarray node memory leak
If xas_split_alloc() fails to allocate the necessary nodes to complete the xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem() then interprets as "Please allocate more memory", not as "Please free any unnecessary memory" (which was the intended outcome). It's confusing to use xas_nomem() to free memory in this context, so call xas_destroy() instead.
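A hedged sketch of the corrected error path (context simplified from the split code):

    xas_split_alloc(&xas, folio, folio_order(folio), GFP_KERNEL);
    if (xas_error(&xas)) {
            /* Free the nodes we managed to preallocate; calling
             * xas_nomem() here would try to allocate more instead. */
            xas_destroy(&xas);
            goto fail;
    }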
Reported-by: [email protected] Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Cc: [email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4 |
|
| #
63b1898f |
| 22-Apr-2022 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Disallow sibling entries of nodes
There is a race between xas_split() and xas_load() which can result in the wrong page being returned, and thus data corruption. Fortunately, it's hard to hit (syzbot took three months to find it) and often guarded with VM_BUG_ON().
The anatomy of this race is:
thread A                                 thread B
order-9 page is stored at index 0x200
                                         lookup of page at index 0x274
page split starts
                                         load of sibling entry at offset 9
stores nodes at offsets 8-15
                                         load of entry at offset 8
The entry at offset 8 turns out to be a node, and so we descend into it, and load the page at index 0x234 instead of 0x274. This is hard to fix on the split side; we could replace the entire node that contains the order-9 page instead of replacing the eight entries. Fixing it on the lookup side is easier; just disallow sibling entries that point to nodes. This cannot ever be a useful thing as the descent would not know the correct offset to use within the new node.
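A hedged sketch of the lookup-side rule (matching the description above; the upstream change lives in the descent path):

    if (xa_is_sibling(entry)) {
            entry = xa_entry(xas->xa, node, xa_to_sibling(entry));
            /* A sibling entry must never point at a node; force a
             * retry rather than descending with an unknowable offset. */
            if (node->shift && xa_is_node(entry))
                    entry = XA_RETRY_ENTRY;
    }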
The test suite continues to pass, but I have not added a new test for this bug.
Reported-by: [email protected] Tested-by: [email protected] Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
Revision tags: v5.18-rc3, v5.18-rc2, v5.18-rc1 |
|
| #
3ed4bb77 |
| 31-Mar-2022 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Update the LRU list in xas_split()
When splitting a value entry, we may need to add the new nodes to the LRU list and remove the parent node from the LRU list. The WARN_ON checks in shadow_lru_isolate() catch this oversight. This bug was latent until we stopped splitting folios in shrink_page_list() with commit 820c4e2e6f51 ("mm/vmscan: Free non-shmem folios without splitting them"). That allows the creation of large shadow entries, and subsequently when trying to page in a small page, we will split the large shadow entry in __filemap_add_folio().
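A hedged sketch of the bookkeeping involved (xas_update() is the internal hook through which the workingset code maintains the shadow-node LRU; the upstream diff differs in detail):

    /* after populating a new child node with value (shadow) entries: */
    xas_update(xas, child);     /* child may join the shadow LRU list */
    /* after all children are linked into the parent: */
    xas_update(xas, node);      /* parent may leave the shadow LRU list */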
Fixes: 8fc75643c5e1 ("XArray: add xas_split") Reported-by: Hugh Dickins <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
| #
3e3c6580 |
| 28-Mar-2022 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix xas_create_range() when multi-order entry present
If there is already an entry present that is of order >= XA_CHUNK_SHIFT when we call xas_create_range(), xas_create_range() will misinterpret that entry as a node and dereference xa_node->parent, generally leading to a crash that looks something like this:
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0
RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline]
RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725
It's deterministically reproducible once you know what the problem is, but producing it in a live kernel requires khugepaged to hit a race. While the problem has been present since xas_create_range() was introduced, I'm not aware of a way to hit it before the page cache was converted to use multi-index entries.
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Reported-by: [email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
| #
9bbdc0f3 |
| 22-Mar-2022 |
Muchun Song <[email protected]> |
xarray: use kmem_cache_alloc_lru to allocate xa_node
The workingset code adds the xa_node to the shadow_nodes list, so the allocation of the xa_node should be done by kmem_cache_alloc_lru(). Use xas_set_lru() to pass the list_lru into which we want to insert the xa_node, so that the xa_node reclaim context is set up correctly.
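A hedged usage sketch (shadow_nodes is the workingset list_lru):

    XA_STATE(xas, &mapping->i_pages, index);

    /* Any xa_node allocated on behalf of this operation is charged to
     * the shadow_nodes list_lru via kmem_cache_alloc_lru(). */
    xas_set_lru(&xas, &shadow_nodes);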
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Alex Shi <[email protected]> Cc: Anna Schumaker <[email protected]> Cc: Chao Yu <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Fam Zheng <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Kari Argillander <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Qi Zheng <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Xiongchun Duan <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14 |
|
| #
25a8de7f |
| 27-Aug-2021 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Add xas_advance()
Add a new helper function to help iterate over multi-index entries.
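A hedged sketch of the helper's shape (consistent with the one-line description; upstream details may differ):

    /* Move the iterator to @index without rewalking from the root:
     * recompute the cached offset within the current node. */
    static inline void xas_advance(struct xa_state *xas, unsigned long index)
    {
            unsigned char shift = xas_is_node(xas) ? xas->xa_node->shift : 0;

            xas->xa_index = index;
            xas->xa_offset = (index >> shift) & XA_CHUNK_MASK;
    }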
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]>
|
|
Revision tags: v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5 |
|
| #
3012110d |
| 19-Nov-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix splitting to non-zero orders
Splitting an order-4 entry into order-2 entries would leave the array containing pointers to 000040008000c000 instead of 000044448888cccc. This is a one-character fix, but the test suite is also enhanced to check this case.
Reported-by: Zi Yan <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
Revision tags: v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9 |
|
| #
12efebab |
| 10-Oct-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix split documentation
I wrote the documentation backwards; the new order of the entry is stored in the xas and the caller passes the old entry.
Reported-by: Zi Yan <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
| #
8fc75643 |
| 16-Oct-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: add xas_split
In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page entry). This version does not support splitting more than one level of the tree at a time. This is an acceptable limitation for the page cache as we do not expect to support order-12 pages in the near future.
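A hedged usage sketch of the pair (orders and context are illustrative; preallocation happens before taking the lock so the split itself cannot fail):

    /* Split the order-9 entry at `index` into order-0 entries. */
    XA_STATE_ORDER(xas, &mapping->i_pages, index, 0);

    xas_split_alloc(&xas, entry, 9, GFP_KERNEL);  /* preallocate nodes */
    if (!xas_error(&xas)) {
            xas_lock_irq(&xas);
            xas_split(&xas, entry, 9);            /* cannot fail here */
            xas_unlock_irq(&xas);
    }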
[[email protected]: export xas_split_alloc() to modules] [[email protected]: fix xarray split] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix xarray] Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Qian Cai <[email protected]> Cc: Song Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
|
| #
57417ceb |
| 16-Oct-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: add xa_get_order
Patch series "Fix read-only THP for non-tmpfs filesystems".
As described more verbosely in the [3/3] changelog, we can inadvertently put an order-0 page in the page cache which occupies 512 consecutive entries. Users are running into this if they enable the READ_ONLY_THP_FOR_FS config option; see https://bugzilla.kernel.org/show_bug.cgi?id=206569 and Qian Cai has also reported it here: https://lore.kernel.org/lkml/[email protected]/
This is a rather intrusive way of fixing the problem, but has the advantage that I've actually been testing it with the THP patches, which means that it sees far more use than it does upstream -- indeed, Song has been entirely unable to reproduce it. It also has the advantage that it removes a few patches from my gargantuan backlog of THP patches.
This patch (of 3):
This function returns the order of the entry at the index. We need this because there isn't space in the shadow entry to encode its order.
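A hedged usage sketch:

    /* Order of the entry at @index; 0 for a normal (single-index) entry. */
    int order = xa_get_order(&mapping->i_pages, index);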
[[email protected]: export xa_get_order to modules]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Qian Cai <[email protected]> Cc: Song Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
|