History log of /linux-6.15/lib/xarray.c (Results 1 – 25 of 73)
Revision Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# 200a89c1 14-Mar-2025 Zi Yan <[email protected]>

mm/filemap: use xas_try_split() in __filemap_add_folio()

Patch series "Minimize xa_node allocation during xarray split", v3.

When splitting a multi-index entry in XArray from order-n to order-m,
the existing xas_split_alloc()+xas_split() approach requires 2^(n %
XA_CHUNK_SHIFT) xa_node allocations. But its callers,
__filemap_add_folio() and shmem_split_large_entry(), use at most 1
xa_node. To minimize xa_node allocation and remove the limitation that
there is no split from order-12 (or above) to order-0 (or anything
between 0 and 5)[1], xas_try_split() was added[2], which allocates (n /
XA_CHUNK_SHIFT - m / XA_CHUNK_SHIFT) xa_nodes. It is used for
non-uniform folio split, but can also be used by __filemap_add_folio()
and shmem_split_large_entry().

xas_split_alloc() and xas_split() split an order-9 to order-0:

---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
| | | |
------- --- --- -------
| | ... | |
V V V V
----------- ----------- ----------- -----------
| xa_node | | xa_node | ... | xa_node | | xa_node |
----------- ----------- ----------- -----------

xas_try_split() splits an order-9 to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
|
|
V
-----------
| xa_node |
-----------

xas_try_split() is designed to be called iteratively with n = m + 1.
xas_try_split_min_order() is added to minimize the number of calls to
xas_try_split() by telling the caller the next minimal order to split
to, instead of n - 1. Splitting order-n to order-m when m = l *
XA_CHUNK_SHIFT does not require xa_node allocation, and requires 1
xa_node when n = l * XA_CHUNK_SHIFT and m = n - 1, so it is OK to use
xas_try_split() with n > m + 1 when no new xa_node is needed.
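
As an illustration of the intended calling pattern (a hedged sketch, not
the verbatim kernel code; `index` and `target_order` are hypothetical
locals, and xas_get_order() is the helper introduced later in this log):

void *entry = xas_load(&xas);
unsigned int order = xas_get_order(&xas);

while (order > target_order) {
        unsigned int new_order = max(target_order,
                                     xas_try_split_min_order(order));

        xas_set_order(&xas, index, new_order);
        xas_try_split(&xas, entry, order);      /* needs at most 1 xa_node */
        if (xas_error(&xas))
                break;                          /* e.g. -ENOMEM */
        order = new_order;
}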

xfstests quick group test passed on xfs and tmpfs.

[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/


This patch (of 2):

During __filemap_add_folio(), a shadow entry may be covering n slots
when a folio covering m slots (m < n) is to be added. Instead of
splitting all n slots, only the m slots covered by the folio need to be
split, and the remaining n-m shadow entries can be retained with orders
ranging from m to n-1. This method only requires

(n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT)

new xa_nodes instead of

2^(n % XA_CHUNK_SHIFT) * ((n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT))

new xa_nodes, compared to the original xas_split_alloc() + xas_split()
one. For example, to insert an order-0 folio when an order-9 shadow entry
is present (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of
8.

xas_try_split_min_order() is introduced to reduce the number of calls to
xas_try_split() during split.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zi Yan <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


Revision tags: v6.14-rc6
# 3fec86f8 07-Mar-2025 Zi Yan <[email protected]>

xarray: add xas_try_split() to split a multi-index entry

Patch series "Buddy allocator like (or non-uniform) folio split", v10.

This patchset adds a new buddy allocator like (or non-uniform) large
folio split from an order-n folio to order-m with m < n. It reduces

1. the total number of after-split folios from 2^(n-m) to n-m+1;

2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to
n/6-m/6, assuming XA_CHUNK_SHIFT=6;

3. the loss of large folios: after the split, folios from order-(n-1)
down to order-m are kept, instead of all order-m folios.

For example, to split an order-9 to order-0, folio split generates 10 (or
11 for anonymous memory) folios instead of 512, allocates 1 xa_node
instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2
order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0
folios.

Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split(). __folio_split() can
support both uniform split and buddy allocator like (or non-uniform)
split. All existing split_huge_page*() users can be gradually converted
to use folio_split() if possible. In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().

xfstests quick group passed for both tmpfs and xfs. I also
semi-replicated Hugh's test[12] and ran it without any issue for almost 24
hours.


This patch (of 8):

A preparation patch for non-uniform folio split, which always splits a
folio in half iteratively, and for minimal xarray entry split.

Currently, xas_split_alloc() and xas_split() always split all slots from
a multi-index entry. They cost as many xa_nodes as there are
to-be-split slots. For example, to split an order-9 entry, which takes
2^(9-6)=8 slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8
xa_nodes are needed. Instead, xas_try_split() is intended to be used
iteratively to split the order-9 entry into 2 order-8 entries, then
split one order-8 entry, based on the given index, to 2 order-7
entries, ..., and split one order-1 entry to 2 order-0 entries. When
splitting the order-6 entry and a new xa_node is needed, xas_try_split()
will try to allocate one if possible. As a result, xas_try_split()
would only need 1 xa_node instead of 8.

When a new xa_node is needed during the split, xas_try_split() can try
to allocate one but no more. -ENOMEM will be returned if a node cannot
be allocated. -EINVAL will be returned if a sibling node is split or a
cascading split happens, where two or more new nodes are needed; these
cases are not supported by xas_try_split().
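
A hedged sketch of one split step under these error semantics, using the
standard xarray allocation-retry idiom (whether xas_nomem() can satisfy
xas_try_split()'s single-node allocation is an assumption here):

do {
        xas_lock_irq(&xas);
        xas_try_split(&xas, entry, order);
        xas_unlock_irq(&xas);
} while (xas_nomem(&xas, GFP_KERNEL)); /* preallocate, retry on -ENOMEM */

if (xas_error(&xas))
        return xas_error(&xas);         /* -ENOMEM or -EINVAL */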

xas_split_alloc() and xas_split() split an order-9 to order-0:

---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
| | | |
------- --- --- -------
| | ... | |
V V V V
----------- ----------- ----------- -----------
| xa_node | | xa_node | ... | xa_node | | xa_node |
----------- ----------- ----------- -----------

xas_try_split() splits an order-9 to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
|
|
V
-----------
| xa_node |
-----------

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zi Yan <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Kairui Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# 13fd5cf3 13-Dec-2024 Kemeng Shi <[email protected]>

Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent

Besides xas_squash_marks(), all functions use the xa_mark_t type to
iterate over all possible marks. Use xa_mark_t in xas_squash_marks() to
keep the code consistent.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


# 1988b318 13-Dec-2024 Kemeng Shi <[email protected]>

Xarray: remove repeat check in xas_squash_marks()

The caller of xas_squash_marks() has ensured that xas->xa_sibs is
non-zero, so remove the repeated check of xas->xa_sibs in
xas_squash_marks().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


# 97db889b 13-Dec-2024 Kemeng Shi <[email protected]>

Xarray: distinguish large entries correctly in xas_split_alloc()

We don't support splitting large entries which would expand by two more
levels of xa_node. In the case "xas->xa_shift + 2 * XA_CHUNK_SHIFT ==
order", two levels of xa_node are also needed to expand, so distinguish
the entry as a large entry in that case as well.

As the max order of a folio in the pagecache (MAX_PAGECACHE_ORDER) is <=
(XA_CHUNK_SHIFT * 2 - 1), this change is mostly a cleanup.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


# c9ba5249 13-Dec-2024 Kemeng Shi <[email protected]>

Xarray: move forward index correctly in xas_pause()

After xas_load(), xas->index could point to the middle of the found
multi-index entry, so the bits of xas->index below node->shift may be
non-zero. The subsequent xas_pause() then moves xas->index forward by
the node's stride while leaving the bits below node->shift unmasked, and
thus skips some indices unexpectedly.

Consider following case:
Assume XA_CHUNK_SHIFT is 4.
xa_store_range(xa, 16, 31, ...)
xa_store(xa, 32, ...)
XA_STATE(xas, xa, 17);
xas_for_each(&xas,...)
xas_load(&xas)
/* xas->index = 17, xas->xa_offset = 1, xas->xa_node->xa_shift = 4 */
xas_pause()
/* xas->index = 33, xas->xa_offset = 2, xas->xa_node->xa_shift = 4 */
As we can see, index 32 is skipped unexpectedly.

Fix this by masking the bits below node->shift when moving the index
forward in xas_pause().
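
A hedged illustration of the idea (not the verbatim fix), using the
example above where node->shift = 4 so the stride is 16; xa_index is the
xa_state field behind xas->index:

unsigned long stride = 1UL << node->shift;

xas->xa_index &= ~(stride - 1);         /* mask bits under shift: 17 -> 16 */
xas->xa_index += stride;                /* step to the next slot: 16 -> 32 */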

For now, this will not cause serious problems; only minor ones, such as
cachestat returning fewer page statuses than expected, could happen.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


# 7e060df0 13-Dec-2024 Kemeng Shi <[email protected]>

Xarray: do not return sibling entries from xas_find_marked()

Patch series "Fixes and cleanups to xarray", v5.

This series contains some random fixes and cleanups to xarray. Patches
1-2 are fixes and patches 3-6 are cleanups. More details can be found
in the respective patches.


This patch (of 5):

Similar to the issue fixed in commit cbc02854331ed ("XArray: Do not
return sibling entries from xa_load()"), we may return sibling entries
from xas_find_marked() as follows:
Thread A:                       Thread B:
xa_store_range(xa, entry, 6, 7, gfp);
xa_set_mark(xa, 6, mark)
                                XA_STATE(xas, xa, 6);
                                xas_find_marked(&xas, 7, mark);
                                offset = xas_find_chunk(xas, advance, mark);
                                [offset is 6 which points to a valid entry]
xa_store_range(xa, entry, 4, 7, gfp);
                                entry = xa_entry(xa, node, 6);
                                [entry is a sibling of 4]
                                if (!xa_is_node(entry))
                                        return entry;

Skip sibling entries as xas_find() does, to protect callers from seeing
a sibling entry returned from xas_find_marked(); otherwise a caller may
use the sibling entry as a valid entry and crash the kernel.
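
A hedged sketch of the guard at the point where xas_find_marked() would
return the found entry (an illustration of the described fix, not the
verbatim diff):

entry = xa_entry(xas->xa, xas->xa_node, offset);
if (xa_is_sibling(entry))
        continue;               /* keep scanning; never return a sibling */
if (!xa_is_node(entry))
        return entry;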

Besides, the load_race() test is modified to catch the mentioned issue;
the modified load_race() only passes after this fix is applied.

Here is an example of how this bug could be triggered in tmpfs with
large folios enabled in the mapping. Let's take a look at the racers
involved:
1. How pages could be created and dirtied in shmem file.
write
ksys_write
vfs_write
new_sync_write
shmem_file_write_iter
generic_perform_write
shmem_write_begin
shmem_get_folio
shmem_allowable_huge_orders
shmem_alloc_and_add_folios
shmem_alloc_folio
__folio_set_locked
shmem_add_to_page_cache
XA_STATE_ORDER(..., index, order)
xas_store()
shmem_write_end
folio_mark_dirty()

2. How dirty pages could be deleted in shmem file.
ioctl
do_vfs_ioctl
file_ioctl
ioctl_preallocate
vfs_fallocate
shmem_fallocate
shmem_truncate_range
shmem_undo_range
truncate_inode_folio
filemap_remove_folio
page_cache_delete
xas_store(&xas, NULL);

3. How dirty pages could be searched locklessly
sync_file_range
ksys_sync_file_range
__filemap_fdatawrite_range
filemap_fdatawrite_wbc
do_writepages
writeback_use_writepage
writeback_iter
writeback_get_folio
filemap_get_folios_tag
find_get_entry
folio = xas_find_marked()
folio_try_get(folio)

The kernel will crash as follows:
1.Create 2.Search 3.Delete
/* write page 2,3 */
write
...
shmem_write_begin
XA_STATE_ORDER(xas, i_pages, index = 2, order = 1)
xa_store(&xas, folio)
shmem_write_end
folio_mark_dirty()

/* sync page 2 and page 3 */
sync_file_range
...
find_get_entry
folio = xas_find_marked()
/* offset will be 2 */
offset = xas_find_chunk()

/* delete page 2 and page 3 */
ioctl
...
xas_store(&xas, NULL);

/* write page 0-3 */
write
...
shmem_write_begin
XA_STATE_ORDER(xas, i_pages, index = 0, order = 2)
xas_store(&xas, folio)
shmem_write_end
folio_mark_dirty(folio)

/* get sibling entry from offset 2 */
entry = xa_entry(.., 2)
/* use sibling entry as folio and crash kernel */
folio_try_get(folio)

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]> [English fixes]
Signed-off-by: Andrew Morton <[email protected]>


Revision tags: v6.13-rc2, v6.13-rc1, v6.12
# 79ada2ae 12-Nov-2024 Tamir Duberstein <[email protected]>

xarray: extract helper from __xa_{insert,cmpxchg}

Reduce code duplication by extracting a static inline function. This
function is identical to __xa_cmpxchg with the exception that it does not
coerce zero entries to null on the return path.
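
A hedged sketch of the resulting shape (the helper name __xa_cmpxchg_raw
comes from the series title below; the exact signatures are assumptions):

void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
                   void *old, void *entry, gfp_t gfp)
{
        return xa_zero_to_null(__xa_cmpxchg_raw(xa, index, old, entry, gfp));
}

int __xa_insert(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp)
{
        void *curr = __xa_cmpxchg_raw(xa, index, NULL, entry, gfp);

        if (xa_is_err(curr))
                return xa_err(curr);
        return curr ? -EBUSY : 0;       /* a reserved (zero) entry is busy */
}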

[[email protected]: fix __xa_erase()]
Link: https://lkml.kernel.org/r/CAJ-ks9kN_qddZ3Ne5d=cADu5POC1rHd4rQcbVSD_spnZOrLLZg@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tamir Duberstein <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


# 74e2712b 12-Nov-2024 Tamir Duberstein <[email protected]>

xarray: extract xa_zero_to_null

Patch series "xarray: extract __xa_cmpxchg_raw".

This series reduces duplication between __xa_cmpxchg and __xa_insert by
extracting a new function that does not coerce zero entries to null on the
return path.

The new function may be used by the upcoming Rust xarray abstraction in
its reservation API where it is useful to tell the difference between zero
entries and null slots.


This patch (of 2):

Reduce code duplication by extracting a static inline function that
returns its argument if it is non-zero and NULL otherwise.
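
Presumably the helper is along these lines (a sketch based on the
description above, not a quoted diff):

static inline void *xa_zero_to_null(void *entry)
{
        return xa_is_zero(entry) ? NULL : entry;
}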

This changes xas_result to check for errors before checking for zero but
this cannot change the behavior of existing callers:
- __xa_erase: passes the result of xas_store(_, NULL) which cannot fail.
- __xa_store: passes the result of xas_store(_, entry) which may fail.
xas_store calls xas_create when entry is not NULL which returns NULL
on error, which is immediately checked. This should not change
observable behavior.
- __xa_cmpxchg: passes the result of xas_load(_) which might be zero.
This would previously return NULL regardless of the outcome of
xas_store but xas_store cannot fail if xas_load returns zero
because there is no need to allocate memory.
- xa_store_range: same as __xa_erase.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tamir Duberstein <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5
# ba591801 16-Apr-2024 Long Li <[email protected]>

xarray: inline xas_descend to improve performance

The commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes")
modified the xas_descend function in a way that increased its size, so
it was no longer being compiled as an inline function. This had a
negative impact on performance; xas_descend is called frequently to
traverse downwards in the xarray tree, making it a hot function.

Inlining xas_descend has been shown to significantly improve performance
by approximately 4.95% in the iozone write test.

Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
# iozone -i 0 -i 1 -s 64g -r 16m -f /test/tmptest

Before this patch:

kB reclen write rewrite read reread
67108864 16384 2230080 3637689 6315197 5496027

After this patch:

kB reclen write rewrite read reread
67108864 16384 2340360 3666175 6272401 5460782

Percentage change:
                         write  rewrite     read   reread
                         4.95%    0.78%   -0.68%   -0.64%

This patch introduces inlining to the xas_descend function. While this
change increases the size of lib/xarray.o, the performance gains in
critical workloads make this an acceptable trade-off.

Size comparison before and after patch:
.text .data .bss file
0x3502 0 0 lib/xarray.o.before
0x3602 0 0 lib/xarray.o.after

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Long Li <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: yangerkun <[email protected]>
Cc: Zhang Yi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


# 2a0774c2 01-May-2024 Matthew Wilcox (Oracle) <[email protected]>

XArray: set the marks correctly when splitting an entry

If we created a new node to replace an entry which had search marks set,
we were setting the search mark on every entry in that node. That works
fine when we're splitting to order 0, but when splitting to a larger
order, we must not set the search marks on the sibling entries.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reported-by: Luis Chamberlain <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Luis Chamberlain <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


# a4864671 15-Apr-2024 Kairui Song <[email protected]>

lib/xarray: introduce a new helper xas_get_order

It can be used after xas_load() to check the order of the loaded entry.
Compared to xa_get_order(), it saves an XA_STATE and avoids a rewalk.
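
A hedged usage sketch (assuming xas_get_order() simply reads the order
off the xa_state left by the previous walk):

XA_STATE(xas, &mapping->i_pages, index);
void *entry;
unsigned int order;

rcu_read_lock();
entry = xas_load(&xas);         /* one walk */
order = xas_get_order(&xas);    /* no second walk to get the order */
rcu_read_unlock();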

A new test is added for xas_get_order; to make the test work, we have to
export xas_get_order with EXPORT_SYMBOL_GPL.

Also fix a sparse warning by checking the slot value with xa_entry
instead of accessing it directly, as suggested by Matthew Wilcox.

[[email protected]: simplify comment, sparse warning fix, per Matthew Wilcox]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5
# e7716c74 21-Aug-2023 Philipp Stanner <[email protected]>

xarray: Document necessary flag in alloc functions

Add a new line to the docstrings of functions wrapping __xa_alloc() and
__xa_alloc_cyclic(), noting that the flag XA_FLAGS_ALLOC must have been
set beforehand.

The documentation so far says that functions wrapping __xa_alloc() and
__xa_alloc_cyclic() are supposed to return either -ENOMEM or -EBUSY in
case of an error. If the xarray has been initialized without the flag
XA_FLAGS_ALLOC, however, they fail with a different, undocumented error
code.

As hinted at in Documentation/core-api/xarray.rst, wrappers around these
functions should only be invoked when the flag has been set. The
functions' documentation should reflect that as well.
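
A hedged sketch of correct usage per the documented requirement (`ptr`
is a hypothetical payload pointer):

DEFINE_XARRAY_ALLOC(ids);       /* initializes with XA_FLAGS_ALLOC */

u32 id;
int err = xa_alloc(&ids, &id, ptr, xa_limit_32b, GFP_KERNEL);
if (err)                        /* now only -ENOMEM or -EBUSY */
        return err;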

Signed-off-by: Philipp Stanner <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>


Revision tags: v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4
# cbc02854 27-Jul-2023 Matthew Wilcox (Oracle) <[email protected]>

XArray: Do not return sibling entries from xa_load()

It is possible for xa_load() to observe a sibling entry pointing to
another sibling entry. An example:

Thread A:                       Thread B:
xa_store_range(xa, entry, 188, 191, gfp);
                                xa_load(xa, 191);
                                entry = xa_entry(xa, node, 63);
                                [entry is a sibling of 188]
xa_store_range(xa, entry, 184, 191, gfp);
                                if (xa_is_sibling(entry))
                                        offset = xa_to_sibling(entry);
                                entry = xa_entry(xas->xa, node, offset);
                                [entry is now a sibling of 184]

It is sufficient to go around this loop until we hit a non-sibling entry.
Sibling entries always point earlier in the node, so we are guaranteed
to terminate this search.
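
A hedged sketch of that loop (illustrating the termination argument, not
quoting the actual descend code):

entry = xa_entry(xas->xa, node, offset);
while (xa_is_sibling(entry)) {
        offset = xa_to_sibling(entry);  /* always points earlier in the node */
        entry = xa_entry(xas->xa, node, offset);
}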

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Cc: [email protected]


Revision tags: v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3
# bde1597d 16-May-2023 Arnd Bergmann <[email protected]>

radix-tree: move declarations to header

The xarray.c file contains the only call to radix_tree_node_rcu_free(),
and it comes with its own extern declaration for it. This means the
function definition causes a missing-prototype warning:

lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]

Instead, move the declaration for this function to a new header that can
be included by both, and do the same for the radix_tree_node_cachep
variable that has the same underlying problem but does not cause a warning
with gcc.
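
Presumably the new header looks roughly like this (the file name and
exact contents are assumptions based on the description):

/* lib/radix-tree.h */
void radix_tree_node_rcu_free(struct rcu_head *head);
extern struct kmem_cache *radix_tree_node_cachep;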

[[email protected]: fix building radix tree test suite]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Peng Zhang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2
# 69a37a8b 08-Jun-2022 Matthew Wilcox (Oracle) <[email protected]>

mm/huge_memory: Fix xarray node memory leak

If xas_split_alloc() fails to allocate the necessary nodes to complete the
xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem()
then interprets as "Please allocate more memory", not as "Please free
any unnecessary memory" (which was the intended outcome). It's confusing
to use xas_nomem() to free memory in this context, so call xas_destroy()
instead.
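
A hedged sketch of the corrected error path (illustrative; the actual
call site is the __filemap_add_folio() split handling):

xas_split_alloc(&xas, entry, order, gfp);
if (xas_error(&xas)) {
        xas_destroy(&xas);              /* frees the preallocated nodes */
        return xas_error(&xas);         /* not xas_nomem(), which allocates */
}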

Reported-by: [email protected]
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Cc: [email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>


Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4
# 63b1898f 22-Apr-2022 Matthew Wilcox (Oracle) <[email protected]>

XArray: Disallow sibling entries of nodes

There is a race between xas_split() and xas_load() which can result in
the wrong page being returned, and thus data corruption. Fortunately,
it's hard to hit (syzbot took three months to find it) and often guarded
with VM_BUG_ON().

The anatomy of this race is:

thread A                        thread B
order-9 page is stored at index 0x200
                                lookup of page at index 0x274
page split starts
                                load of sibling entry at offset 9
stores nodes at offsets 8-15
                                load of entry at offset 8

The entry at offset 8 turns out to be a node, and so we descend into it,
and load the page at index 0x234 instead of 0x274. This is hard to fix
on the split side; we could replace the entire node that contains the
order-9 page instead of replacing the eight entries. Fixing it on
the lookup side is easier; just disallow sibling entries that point
to nodes. This cannot ever be a useful thing as the descent would not
know the correct offset to use within the new node.
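
A hedged sketch of the lookup-side guard (treating a sibling that points
to a node as a retry is an assumption about the mechanism):

if (xa_is_sibling(entry)) {
        offset = xa_to_sibling(entry);
        entry = xa_entry(xas->xa, node, offset);
        if (node->shift && xa_is_node(entry))
                entry = XA_RETRY_ENTRY; /* disallow siblings of nodes */
}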

The test suite continues to pass, but I have not added a new test for
this bug.

Reported-by: [email protected]
Tested-by: [email protected]
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>


Revision tags: v5.18-rc3, v5.18-rc2, v5.18-rc1
# 3ed4bb77 31-Mar-2022 Matthew Wilcox (Oracle) <[email protected]>

XArray: Update the LRU list in xas_split()

When splitting a value entry, we may need to add the new nodes to the LRU
list and remove the parent node from the LRU list. The WARN_ON checks
in shadow_lru_isolate() catch this oversight. This bug was latent
until we stopped splitting folios in shrink_page_list() with commit
820c4e2e6f51 ("mm/vmscan: Free non-shmem folios without splitting them").
That allows the creation of large shadow entries, and subsequently when
trying to page in a small page, we will split the large shadow entry
in __filemap_add_folio().

Fixes: 8fc75643c5e1 ("XArray: add xas_split")
Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>


# 3e3c6580 28-Mar-2022 Matthew Wilcox (Oracle) <[email protected]>

XArray: Fix xas_create_range() when multi-order entry present

If there is already an entry present that is of order >= XA_CHUNK_SHIFT
when we call xas_create_range(), xas_create_range() will misinterpret
that entry as a node and dereference xa_node->parent, generally leading
to a crash that looks something like this:

general protection fault, probably for non-canonical address 0xdffffc0000000001:
0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0
RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline]
RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725

It's deterministically reproducible once you know what the problem is,
but producing it in a live kernel requires khugepaged to hit a race.
While the problem has been present since xas_create_range() was
introduced, I'm not aware of a way to hit it before the page cache was
converted to use multi-index entries.

Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Reported-by: [email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>


# 9bbdc0f3 22-Mar-2022 Muchun Song <[email protected]>

xarray: use kmem_cache_alloc_lru to allocate xa_node

The workingset code adds the xa_node to the shadow_nodes list, so the
allocation of the xa_node should be done by kmem_cache_alloc_lru(). Use
xas_set_lru() to pass the list_lru into which we want to insert the
xa_node, so that the xa_node reclaim context is set up correctly.
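
A hedged usage sketch (shadow_nodes is the workingset list_lru; the
store itself is illustrative):

XA_STATE(xas, &mapping->i_pages, index);

xas_set_lru(&xas, &shadow_nodes);       /* account new xa_nodes to this lru */
xas_store(&xas, shadow);                /* nodes via kmem_cache_alloc_lru() */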

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Fam Zheng <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Kari Argillander <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>


Revision tags: v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14
# 25a8de7f 27-Aug-2021 Matthew Wilcox (Oracle) <[email protected]>

XArray: Add xas_advance()

Add a new helper function to help iterate over multi-index entries.
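
A hedged usage sketch: move the cursor to the last index covered by a
multi-index entry so the next iteration step lands beyond it.

xas_advance(&xas, folio->index + folio_nr_pages(folio) - 1);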

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: William Kucharski <[email protected]>


Revision tags: v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5
# 3012110d 19-Nov-2020 Matthew Wilcox (Oracle) <[email protected]>

XArray: Fix splitting to non-zero orders

Splitting an order-4 entry into order-2 entries would leave the array
containing pointers to 000040008000c000 instead of 000044448888cccc.
This is a one-character fix, but the test suite is also enhanced to
check this case.

Reported-by: Zi Yan <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>


Revision tags: v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9
# 12efebab 10-Oct-2020 Matthew Wilcox (Oracle) <[email protected]>

XArray: Fix split documentation

I wrote the documentation backwards; the new order of the entry is stored
in the xas and the caller passes the old entry.

Reported-by: Zi Yan <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>


# 8fc75643 16-Oct-2020 Matthew Wilcox (Oracle) <[email protected]>

XArray: add xas_split

In order to use multi-index entries for huge pages in the page cache, we
need to be able to split a multi-index entry (eg if a file is truncated in
the middle of a huge page entry). This version does not support splitting
more than one level of the tree at a time. This is an acceptable
limitation for the page cache as we do not expect to support order-12
pages in the near future.
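
A hedged usage sketch of the pair added here: preallocate the nodes
outside the lock, then perform the split under it (the locking shown is
illustrative):

xas_split_alloc(&xas, entry, order, GFP_KERNEL);
if (xas_error(&xas))
        return xas_error(&xas);         /* preallocation failed */

xas_lock_irq(&xas);
xas_split(&xas, entry, order);          /* does not allocate; cannot fail */
xas_unlock_irq(&xas);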

[[email protected]: export xas_split_alloc() to modules]
[[email protected]: fix xarray split]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix xarray]
Link: https://lkml.kernel.org/r/[email protected]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Song Liu <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>


# 57417ceb 16-Oct-2020 Matthew Wilcox (Oracle) <[email protected]>

XArray: add xa_get_order

Patch series "Fix read-only THP for non-tmpfs filesystems".

As described more verbosely in the [3/3] changelog, we can inadvertently
put an order-0 page in the page cache which occupies 512 consecutive
entries. Users are running into this if they enable the
READ_ONLY_THP_FOR_FS config option; see
https://bugzilla.kernel.org/show_bug.cgi?id=206569 and Qian Cai has also
reported it here:
https://lore.kernel.org/lkml/[email protected]/

This is a rather intrusive way of fixing the problem, but has the
advantage that I've actually been testing it with the THP patches, which
means that it sees far more use than it does upstream -- indeed, Song has
been entirely unable to reproduce it. It also has the advantage that it
removes a few patches from my gargantuan backlog of THP patches.

This patch (of 3):

This function returns the order of the entry at the index. We need this
because there isn't space in the shadow entry to encode its order.
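
A hedged usage sketch:

/* How many slots does the entry at this index occupy? */
int order = xa_get_order(&mapping->i_pages, index);
unsigned long nr = 1UL << order;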

[[email protected]: export xa_get_order to modules]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Song Liu <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
