|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
3fec86f8 |
| 07-Mar-2025 |
Zi Yan <[email protected]> |
xarray: add xas_try_split() to split a multi-index entry
Patch series "Buddy allocator like (or non-uniform) folio split", v10.
This patchset adds a new buddy allocator like (or non-uniform) large
xarray: add xas_try_split() to split a multi-index entry
Patch series "Buddy allocator like (or non-uniform) folio split", v10.
This patchset adds a new buddy allocator like (or non-uniform) large folio split from a order-n folio to order-m with m < n. It reduces
1. the total number of after-split folios from 2^(n-m) to n-m+1;
2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to n/6-m/6, assuming XA_CHUNK_SHIFT=6;
3. keep more large folios after a split from all order-m folios to order-(n-1) to order-m folios.
For example, to split an order-9 to order-0, folio split generates 10 (or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios.
Instead of duplicating existing split_huge_page*() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page*() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split().
xfstests quick group passed for both tmpfs and xfs. I also semi-replicated Hugh's test[12] and ran it without any issue for almost 24 hours.
This patch (of 8):
A preparation patch for non-uniform folio split, which always split a folio into half iteratively, and minimal xarray entry split.
Currently, xas_split_alloc() and xas_split() always split all slots from a multi-index entry. They cost the same number of xa_node as the to-be-split slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are needed. Instead xas_try_split() is intended to be used iteratively to split the order-9 entry into 2 order-8 entries, then split one order-8 entry, based on the given index, to 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries. When splitting the order-6 entry and a new xa_node is needed, xas_try_split() will try to allocate one if possible. As a result, xas_try_split() would only need 1 xa_node instead of 8.
When a new xa_node is needed during the split, xas_try_split() can try to allocate one but no more. -ENOMEM will be return if a node cannot be allocated. -EINVAL will be return if a sibling node is split or cascade split happens, where two or more new nodes are needed, and these are not supported by xas_try_split().
xas_split_alloc() and xas_split() split an order-9 to order-0:
--------------------------------- | | | | | | | | | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | | | | | | | | | --------------------------------- | | | | ------- --- --- ------- | | ... | | V V V V ----------- ----------- ----------- ----------- | xa_node | | xa_node | ... | xa_node | | xa_node | ----------- ----------- ----------- -----------
xas_try_split() splits an order-9 to order-0: --------------------------------- | | | | | | | | | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | | | | | | | | | --------------------------------- | | V ----------- | xa_node | -----------
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zi Yan <[email protected]> Cc: Baolin Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: John Hubbard <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Kirill A. Shuemov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Zi Yan <[email protected]> Cc: Kairui Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
| #
8344017a |
| 13-Feb-2025 |
Kemeng Shi <[email protected]> |
test_xarray: fix failure in check_pause when CONFIG_XARRAY_MULTI is not defined
In case CONFIG_XARRAY_MULTI is not defined, xa_store_order can store a multi-index entry but xas_for_each can't tell s
test_xarray: fix failure in check_pause when CONFIG_XARRAY_MULTI is not defined
In case CONFIG_XARRAY_MULTI is not defined, xa_store_order can store a multi-index entry but xas_for_each can't tell sbiling entry from valid entry. So the check_pause failed when we store a multi-index entry and wish xas_for_each can handle it normally. Avoid to store multi-index entry when CONFIG_XARRAY_MULTI is disabled to fix the failure.
Link: https://lkml.kernel.org/r/[email protected] Fixes: c9ba5249ef8b ("Xarray: move forward index correctly in xas_pause()") Signed-off-by: Kemeng Shi <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Closes: https://lore.kernel.org/r/CAMuHMdU_bfadUO=0OZ=AoQ9EAmQPA4wsLCBqohXR+QCeCKRn4A@mail.gmail.com Tested-by: Geert Uytterhoeven <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc2, v6.14-rc1 |
|
| #
05033905 |
| 31-Jan-2025 |
Andrew Morton <[email protected]> |
revert "xarray: port tests to kunit"
Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code.
Reported-by: Sidhartha Kumar <sid
revert "xarray: port tests to kunit"
Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code.
Reported-by: Sidhartha Kumar <[email protected]> Closes: https://lkml.kernel.org/r/[email protected] Cc: David Gow <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Tamir Duberstein <[email protected]> Cc: "Liam R. Howlett" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
c9ba5249 |
| 13-Dec-2024 |
Kemeng Shi <[email protected]> |
Xarray: move forward index correctly in xas_pause()
After xas_load(), xas->index could point to mid of found multi-index entry and xas->index's bits under node->shift maybe non-zero. The afterward
Xarray: move forward index correctly in xas_pause()
After xas_load(), xas->index could point to mid of found multi-index entry and xas->index's bits under node->shift maybe non-zero. The afterward xas_pause() will move forward xas->index with xa->node->shift with bits under node->shift un-masked and thus skip some index unexpectedly.
Consider following case: Assume XA_CHUNK_SHIFT is 4. xa_store_range(xa, 16, 31, ...) xa_store(xa, 32, ...) XA_STATE(xas, xa, 17); xas_for_each(&xas,...) xas_load(&xas) /* xas->index = 17, xas->xa_offset = 1, xas->xa_node->xa_shift = 4 */ xas_pause() /* xas->index = 33, xas->xa_offset = 2, xas->xa_node->xa_shift = 4 */ As we can see, index of 32 is skipped unexpectedly.
Fix this by mask bit under node->xa_shift when move forward index in xas_pause().
For now, this will not cause serious problems. Only minor problem like cachestat return less number of page status could happen.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Cc: Mattew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2 |
|
| #
c7bb5cf9 |
| 05-Dec-2024 |
Tamir Duberstein <[email protected]> |
xarray: port tests to kunit
Minimally rewrite the XArray unit tests to use kunit. This integrates nicely with existing kunit tools which produce nicer human-readable output compared to the existing
xarray: port tests to kunit
Minimally rewrite the XArray unit tests to use kunit. This integrates nicely with existing kunit tools which produce nicer human-readable output compared to the existing machinery.
Running the xarray tests before this change requires an obscure invocation
``` tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 \ --kconfig_add CONFIG_TEST_XARRAY=y --raw_output=all nothing ```
which on failure produces
``` BUG at check_reserve:513 ... XArray: 6782340 of 6782364 tests passed ```
and exits 0.
Running the xarray tests after this change requires a simpler invocation
``` tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 \ xarray ```
which on failure produces (colors omitted)
``` [09:50:53] ====================== check_reserve ====================== [09:50:53] [FAILED] param-0 [09:50:53] # check_reserve: EXPECTATION FAILED at lib/test_xarray.c:536 [09:50:53] xa_erase(xa, 12345678) != NULL ... [09:50:53] # module: test_xarray [09:50:53] # xarray: pass:26 fail:3 skip:0 total:29 [09:50:53] # Totals: pass:28 fail:3 skip:0 total:31 [09:50:53] ===================== [FAILED] xarray ====================== ```
and exits 1.
Use of richer kunit assertions is intentionally omitted to reduce the scope of the change.
[[email protected]: fix cocci warning] Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tamir Duberstein <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Naveen N Rao <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Nick Desaulniers <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
757234f1 |
| 01-Jun-2024 |
Jeff Johnson <[email protected]> |
test_xarray: add missing MODULE_DESCRIPTION() macro
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_xarray.o
Add the missing invocation of the
test_xarray: add missing MODULE_DESCRIPTION() macro
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_xarray.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Johnson <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
2a0774c2 |
| 01-May-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: set the marks correctly when splitting an entry
If we created a new node to replace an entry which had search marks set, we were setting the search mark on every entry in that node. That wo
XArray: set the marks correctly when splitting an entry
If we created a new node to replace an entry which had search marks set, we were setting the search mark on every entry in that node. That works fine when we're splitting to order 0, but when splitting to a larger order, we must not set the search marks on the sibling entries.
Link: https://lkml.kernel.org/r/[email protected] Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Luis Chamberlain <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6 |
|
| #
2aaba39e |
| 23-Apr-2024 |
Luis Chamberlain <[email protected]> |
lib/test_xarray.c: fix error assumptions on check_xa_multi_store_adv_add()
While testing lib/test_xarray in userspace I've noticed we can fail with:
make -C tools/testing/radix-tree ./tools/testing
lib/test_xarray.c: fix error assumptions on check_xa_multi_store_adv_add()
While testing lib/test_xarray in userspace I've noticed we can fail with:
make -C tools/testing/radix-tree ./tools/testing/radix-tree/xarray
BUG at check_xa_multi_store_adv_add:749 xarray: 0x55905fb21a00x head 0x55905fa1d8e0x flags 0 marks 0 0 0 0: 0x55905fa1d8e0x xarray: ../../../lib/test_xarray.c:749: check_xa_multi_store_adv_add: Assertion `0' failed. Aborted
We get a failure with a BUG_ON(), and that is because we actually can fail due to -ENOMEM, the check in xas_nomem() will fix this for us so it makes no sense to expect no failure inside the loop. So modify the check and since this is also useful for instructional purposes clarify the situation.
The check for XA_BUG_ON(xa, xa_load(xa, index) != p) is already done at the end of the loop so just remove the bogus on inside the loop.
With this we now pass the test in both kernel and userspace:
In userspace:
./tools/testing/radix-tree/xarray XArray: 149092856 of 149092856 tests passed
In kernel space:
XArray: 148257077 of 148257077 tests passed
Link: https://lkml.kernel.org/r/[email protected] Fixes: a60cc288a1a2 ("test_xarray: add tests for advanced multi-index use") Signed-off-by: Luis Chamberlain <[email protected]> Cc: Daniel Gomez <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: "Liam R. Howlett" <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Pankaj Raghav <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc5 |
|
| #
6758c112 |
| 15-Apr-2024 |
Kairui Song <[email protected]> |
mm/filemap: optimize filemap folio adding
Instead of doing multiple tree walks, do one optimism range check with lock hold, and exit if raced with another insertion. If a shadow exists, check it wi
mm/filemap: optimize filemap folio adding
Instead of doing multiple tree walks, do one optimism range check with lock hold, and exit if raced with another insertion. If a shadow exists, check it with a new xas_get_order helper before releasing the lock to avoid redundant tree walks for getting its order.
Drop the lock and do the allocation only if a split is needed.
In the best case, it only need to walk the tree once. If it needs to alloc and split, 3 walks are issued (One for first ranged conflict check and order retrieving, one for the second check after allocation, one for the insert after split).
Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap --rw=randread --time_based \ --ramp_time=30s --runtime=5m --group_reporting
Before: bw ( MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691 iops : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691
After (+7.3%): bw ( MiB/s): min= 493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651 iops : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651
Test result with THP (do a THP randread then switch to 4K page in hope it issues a lot of splitting):
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap -thp=1 --readonly \ --rw=randread --time_based --ramp_time=30s --runtime=10m \ --group_reporting
fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap \ --rw=randread --time_based --runtime=5s --group_reporting
Before: bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976 iops : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976·
READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec
After (+12.5%): bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146 iops : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec
The performance is better for both 4K (+7.5%) and THP (+12.5%) cached read.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kairui Song <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
a4864671 |
| 15-Apr-2024 |
Kairui Song <[email protected]> |
lib/xarray: introduce a new helper xas_get_order
It can be used after xas_load to check the order of loaded entries. Compared to xa_get_order, it saves an XA_STATE and avoid a rewalk.
Added new te
lib/xarray: introduce a new helper xas_get_order
It can be used after xas_load to check the order of loaded entries. Compared to xa_get_order, it saves an XA_STATE and avoid a rewalk.
Added new test for xas_get_order, to make the test work, we have to export xas_get_order with EXPORT_SYMBOL_GPL.
Also fix a sparse warning by checking the slot value with xa_entry instead of accessing it directly, as suggested by Matthew Wilcox.
[[email protected]: simplify comment, sparse warning fix, per Matthew Wilcox] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kairui Song <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3 |
|
| #
e777ae44 |
| 31-Jan-2024 |
Daniel Gomez <[email protected]> |
XArray: add cmpxchg order test
XArray multi-index entries do not keep track of the order stored once the entry is being marked as used with cmpxchg (conditionally replaced with NULL). Add a test to
XArray: add cmpxchg order test
XArray multi-index entries do not keep track of the order stored once the entry is being marked as used with cmpxchg (conditionally replaced with NULL). Add a test to check the order is actually lost. The test also verifies the order and entries for all the tied indexes before and after the NULL replacement with xa_cmpxchg.
Add another entry at 1 << order that keeps the node around and the order information for the NULL-entry after xa_cmpxchg.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Daniel Gomez <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Pankaj Raghav <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
a60cc288 |
| 31-Jan-2024 |
Luis Chamberlain <[email protected]> |
test_xarray: add tests for advanced multi-index use
Patch series "test_xarray: advanced API multi-index tests", v2.
This is a respin of the test_xarray multi-index tests [0] which use and demonstra
test_xarray: add tests for advanced multi-index use
Patch series "test_xarray: advanced API multi-index tests", v2.
This is a respin of the test_xarray multi-index tests [0] which use and demonstrate the advanced API which is used by the page cache. This should let folks more easily follow how we use multi-index to support for example a min order later in the page cache. It also lets us grow the selftests to mimic more of what we do in the page cache.
This patch (of 2):
The multi index selftests are great but they don't replicate how we deal with the page cache exactly, which makes it a bit hard to follow as the page cache uses the advanced API.
Add tests which use the advanced API, mimicking what we do in the page cache, while at it, extend the example to do what is needed for min order support.
[[email protected]: fix soft lockup for advanced-api tests] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: s/i/loops/, make non-static] [[email protected]: restore static storage for loop counter] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Tested-by: Daniel Gomez <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Pankaj Raghav <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1 |
|
| #
3e3c6580 |
| 28-Mar-2022 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix xas_create_range() when multi-order entry present
If there is already an entry present that is of order >= XA_CHUNK_SHIFT when we call xas_create_range(), xas_create_range() will misinte
XArray: Fix xas_create_range() when multi-order entry present
If there is already an entry present that is of order >= XA_CHUNK_SHIFT when we call xas_create_range(), xas_create_range() will misinterpret that entry as a node and dereference xa_node->parent, generally leading to a crash that looks something like this:
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0 RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline] RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725
It's deterministically reproducable once you know what the problem is, but producing it in a live kernel requires khugepaged to hit a race. While the problem has been present since xas_create_range() was introduced, I'm not aware of a way to hit it before the page cache was converted to use multi-index entries.
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Reported-by: [email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
show more ...
|
|
Revision tags: v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5 |
|
| #
3012110d |
| 19-Nov-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix splitting to non-zero orders
Splitting an order-4 entry into order-2 entries would leave the array containing pointers to 000040008000c000 instead of 000044448888cccc. This is a one-char
XArray: Fix splitting to non-zero orders
Splitting an order-4 entry into order-2 entries would leave the array containing pointers to 000040008000c000 instead of 000044448888cccc. This is a one-character fix, but enhance the test suite to check this case.
Reported-by: Zi Yan <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
show more ...
|
|
Revision tags: v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1 |
|
| #
8fc75643 |
| 16-Oct-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: add xas_split
In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page en
XArray: add xas_split
In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page entry). This version does not support splitting more than one level of the tree at a time. This is an acceptable limitation for the page cache as we do not expect to support order-12 pages in the near future.
[[email protected]: export xas_split_alloc() to modules] [[email protected]: fix xarray split] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix xarray] Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Qian Cai <[email protected]> Cc: Song Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
57417ceb |
| 16-Oct-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: add xa_get_order
Patch series "Fix read-only THP for non-tmpfs filesystems".
As described more verbosely in the [3/3] changelog, we can inadvertently put an order-0 page in the page cache w
XArray: add xa_get_order
Patch series "Fix read-only THP for non-tmpfs filesystems".
As described more verbosely in the [3/3] changelog, we can inadvertently put an order-0 page in the page cache which occupies 512 consecutive entries. Users are running into this if they enable the READ_ONLY_THP_FOR_FS config option; see https://bugzilla.kernel.org/show_bug.cgi?id=206569 and Qian Cai has also reported it here: https://lore.kernel.org/lkml/[email protected]/
This is a rather intrusive way of fixing the problem, but has the advantage that I've actually been testing it with the THP patches, which means that it sees far more use than it does upstream -- indeed, Song has been entirely unable to reproduce it. It also has the advantage that it removes a few patches from my gargantuan backlog of THP patches.
This patch (of 3):
This function returns the order of the entry at the index. We need this because there isn't space in the shadow entry to encode its order.
[[email protected]: export xa_get_order to modules]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Qian Cai <[email protected]> Cc: Song Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2 |
|
| #
f82cd2f0 |
| 18-Aug-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Add private interface for workingset node deletion
Move the tricky bits of dealing with the XArray from the workingset code to the XArray. Make it clear in the documentation that this is a
XArray: Add private interface for workingset node deletion
Move the tricky bits of dealing with the XArray from the workingset code to the XArray. Make it clear in the documentation that this is a private interface, and only export it for the benefit of the test suite.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
show more ...
|
|
Revision tags: v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2 |
|
| #
04e9e9bb |
| 15-Jun-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Test marked multiorder iterations
Demonstrate that starting a marked iteration partway through a marked multi-order entry works.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
Revision tags: v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1 |
|
| #
062b7359 |
| 31-Mar-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Test two more things about xa_cmpxchg
1. If we xa_cmpxchg() an entry in, it marks the index as not free. 2. If we xa_cmpxchg() NULL in, it marks the index as free.
Signed-off-by: Matthew Wi
XArray: Test two more things about xa_cmpxchg
1. If we xa_cmpxchg() an entry in, it marks the index as not free. 2. If we xa_cmpxchg() NULL in, it marks the index as free.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
show more ...
|
|
Revision tags: v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1 |
|
| #
c36d451a |
| 31-Jan-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix xas_pause for large multi-index entries
Inspired by the recent Coverity report, I looked for other places where the offset wasn't being converted to an unsigned long before being shifted
XArray: Fix xas_pause for large multi-index entries
Inspired by the recent Coverity report, I looked for other places where the offset wasn't being converted to an unsigned long before being shifted, and I found one in xas_pause() when the entry being paused is of order >32.
Fixes: b803b42823d0 ("xarray: Add XArray iterators") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: [email protected]
show more ...
|
| #
bd40b17c |
| 31-Jan-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix xa_find_next for large multi-index entries
Coverity pointed out that xas_sibling() was shifting xa_offset without promoting it to an unsigned long first, so the shift could cause an over
XArray: Fix xa_find_next for large multi-index entries
Coverity pointed out that xas_sibling() was shifting xa_offset without promoting it to an unsigned long first, so the shift could cause an overflow and we'd get the wrong answer. The fix is obvious, and the new test-case provokes UBSAN to report an error: runtime error: shift exponent 60 is too large for 32-bit type 'int'
Fixes: 19c30f4dd092 ("XArray: Fix xa_find_after with multi-index entries") Reported-by: Bjorn Helgaas <[email protected]> Reported-by: Kees Cook <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: [email protected]
show more ...
|
|
Revision tags: v5.5, v5.5-rc7 |
|
| #
c44aa5e8 |
| 18-Jan-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix xas_find returning too many entries
If you call xas_find() with the initial index > max, it should have returned NULL but was returning the entry at index.
Signed-off-by: Matthew Wilcox
XArray: Fix xas_find returning too many entries
If you call xas_find() with the initial index > max, it should have returned NULL but was returning the entry at index.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: [email protected]
show more ...
|
| #
19c30f4d |
| 18-Jan-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix xa_find_after with multi-index entries
If the entry is of an order which is a multiple of XA_CHUNK_SIZE, the current detection of sibling entries does not work. Factor out an xas_siblin
XArray: Fix xa_find_after with multi-index entries
If the entry is of an order which is a multiple of XA_CHUNK_SIZE, the current detection of sibling entries does not work. Factor out an xas_sibling() function to make xa_find_after() a little more understandable, and write a new implementation that doesn't suffer from the same bug.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: [email protected]
show more ...
|
| #
430f24f9 |
| 17-Jan-2020 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix infinite loop with entry at ULONG_MAX
If there is an entry at ULONG_MAX, xa_for_each() will overflow the 'index + 1' in xa_find_after() and wrap around to 0. Catch this case and termina
XArray: Fix infinite loop with entry at ULONG_MAX
If there is an entry at ULONG_MAX, xa_for_each() will overflow the 'index + 1' in xa_find_after() and wrap around to 0. Catch this case and terminate the loop by returning NULL.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: [email protected]
show more ...
|
|
Revision tags: v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7 |
|
| #
82a22311 |
| 08-Nov-2019 |
Matthew Wilcox (Oracle) <[email protected]> |
XArray: Fix xas_pause at ULONG_MAX
If we were unlucky enough to call xas_pause() when the index was at ULONG_MAX (or a multi-slot entry which ends at ULONG_MAX), we would wrap the index back around
XArray: Fix xas_pause at ULONG_MAX
If we were unlucky enough to call xas_pause() when the index was at ULONG_MAX (or a multi-slot entry which ends at ULONG_MAX), we would wrap the index back around to 0 and restart the iteration from the beginning. Use the XAS_BOUNDS state to indicate that we should just stop the iteration.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
show more ...
|