Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3

# d01c495f | 12-Feb-2025 | David Howells <[email protected]>

netfs: Add retry stat counters
Add stat counters to count the number of request and subrequest retries and display them in /proc/fs/netfs/stats.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
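
A minimal sketch of the counter pattern this commit describes (the counter and function names below are illustrative stand-ins, not necessarily the symbols the patch adds):

    #include <linux/atomic.h>
    #include <linux/seq_file.h>

    /* Hypothetical counters, one per event class. */
    static atomic_t netfs_n_retry_req;     /* whole-request retries */
    static atomic_t netfs_n_retry_subreq;  /* subrequest retries */

    /* Bumped from the retry paths. */
    static inline void netfs_stat_retry_subreq(void)
    {
            atomic_inc(&netfs_n_retry_subreq);
    }

    /* Emitted by the /proc/fs/netfs/stats show routine. */
    static void netfs_show_retry_stats(struct seq_file *m)
    {
            seq_printf(m, "Retries: rq=%u rs=%u\n",
                       atomic_read(&netfs_n_retry_req),
                       atomic_read(&netfs_n_retry_subreq));
    }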

Revision tags: v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4

# e2d46f2e | 16-Dec-2024 | David Howells <[email protected]>

netfs: Change the read result collector to only use one work item
Change the way netfslib collects read results to do all the collection for a particular read request using a single work item that walks along the subrequest queue as subrequests make progress or complete, unlocking folios progressively rather than doing the unlock in parallel as parallel requests come in.
The code is remodelled to be more like the write-side code, though only using a single stream. This makes it more directly comparable and thus easier to duplicate fixes between the two sides.
This has a number of advantages:
(1) It's simpler. There doesn't need to be a complex donation mechanism to handle mismatches between the size and alignment of subrequests and folios. The collector unlocks folios as the subrequests covering each complete.
(2) It should cause less scheduler overhead as there's a single work item in play unlocking pages in parallel when a read gets split up into a lot of subrequests instead of one per subrequest.
Whilst the parallelism is nice in theory, in practice, the vast majority of loads are sequential reads of the whole file, so committing a bunch of threads to unlocking folios out of order doesn't help in those cases.
(3) It should make it easier to implement content decryption. A folio cannot be decrypted until all the requests that contribute to it have completed - and, again, most loads are sequential and so, most of the time, we want to begin decryption sequentially (though it's great if the decryption can happen in parallel).
There is a disadvantage in that we're losing the ability to decrypt and unlock things on an as-things-arrive basis which may affect some applications.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
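
The shape of the change, reduced to a hedged sketch (the structures are heavily simplified and the names illustrative; the real netfs structures carry far more state):

    #include <linux/workqueue.h>
    #include <linux/list.h>
    #include <linux/slab.h>

    struct my_subreq {
            struct list_head link;          /* in file order on the queue */
            bool done;
    };

    struct my_rreq {
            struct work_struct work;        /* the single collector item */
            struct list_head subrequests;
    };

    /* One work item walks the queue in order, unlocking the folios each
     * leading subrequest fully covers, instead of one worker per subreq. */
    static void my_read_collector(struct work_struct *work)
    {
            struct my_rreq *rreq = container_of(work, struct my_rreq, work);
            struct my_subreq *s, *tmp;

            list_for_each_entry_safe(s, tmp, &rreq->subrequests, link) {
                    if (!s->done)
                            break;          /* stop at first incomplete one */
                    /* ... unlock the folios this subrequest spans ... */
                    list_del(&s->link);
                    kfree(s);
            }
    }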

# 49866ce7 | 16-Dec-2024 | David Howells <[email protected]>

netfs: Add support for caching single monolithic objects such as AFS dirs
Add support for caching the content of a file that contains a single monolithic object that must be read/written with a single I/O operation, such as an AFS directory.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

# 627cf645 | 16-Dec-2024 | David Howells <[email protected]>

netfs: Don't use bh spinlock
All the accessing of the subrequest lists is now done in process context, possibly in a workqueue, but no longer in BH context, so we don't need to guard against BH interference when taking the netfs_io_request::lock spinlock.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
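
In effect, the change amounts to dropping the _bh lock variants; roughly (an illustrative hunk, not the exact one; the list names are abbreviated from the netfs code):

    /* Before: subrequest lists could be touched from BH context. */
    spin_lock_bh(&rreq->lock);
    list_add_tail(&subreq->rreq_link, &stream->subrequests);
    spin_unlock_bh(&rreq->lock);

    /* After: all access is in process context, so plain spinlocks suffice. */
    spin_lock(&rreq->lock);
    list_add_tail(&subreq->rreq_link, &stream->subrequests);
    spin_unlock(&rreq->lock);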

# d606c362 | 16-Dec-2024 | David Howells <[email protected]>

netfs: Make netfs_advance_write() return size_t
netfs_advance_write() calculates the amount of data it's attaching to a stream with size_t, but then returns this as an int. Switch the return value to size_t for consistency.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
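
In effect a one-line signature change; sketched below with a parameter list reconstructed from context, which may differ in detail from the actual prototype:

    /* Before: the size_t byte count was narrowed to int on return. */
    int netfs_advance_write(struct netfs_io_request *wreq,
                            struct netfs_io_stream *stream,
                            loff_t start, size_t len, bool to_eof);

    /* After: return size_t, matching how the length is computed. */
    size_t netfs_advance_write(struct netfs_io_request *wreq,
                               struct netfs_io_stream *stream,
                               loff_t start, size_t len, bool to_eof);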

# 06fa229c | 16-Dec-2024 | David Howells <[email protected]>

netfs: Abstract out a rolling folio buffer implementation
A rolling buffer is a series of folios held in a list of folio_queues. New folios and folio_queue structs may be inserted at the head simultaneously with spent ones being removed from the tail without the need for locking.
The rolling buffer includes an iov_iter, which has to be managed carefully as the list of folio_queues is extended so that an oops isn't incurred because the iterator was pointing to the end of a folio_queue segment that got appended to and then removed.
We need to use the mechanism twice, once for read and once for write, and, in future patches, we will use a second rolling buffer to handle bounce buffering for content encryption.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
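
The rough shape of such a buffer, as a hedged sketch (field names illustrative; the real structure carries additional slot-tracking state):

    #include <linux/folio_queue.h>
    #include <linux/uio.h>

    /* A chain of folio_queue segments plus an iterator over the
     * unconsumed span. */
    struct rolling_buffer {
            struct folio_queue *head;   /* producer appends here */
            struct folio_queue *tail;   /* consumer frees spent segments */
            struct iov_iter iter;       /* walks the unconsumed folios */
    };

Because the producer only touches the head pointer and the consumer only the tail, the two sides can run concurrently without a lock, as the commit message notes.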

# aabcabf2 | 16-Dec-2024 | David Howells <[email protected]>

netfs: Add a tracepoint to log the lifespan of folio_queue structs
Add a tracepoint to log the lifespan of folio_queue structs. For tracing illustrative purposes, folio_queues are tagged with the debug ID of whatever they're related to (typically a netfs_io_request) and a debug ID of their own.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
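
A cut-down TRACE_EVENT of the sort described (event and field names are guesses for illustration; the real event likely prints a symbolic trace tag rather than a raw integer):

    #include <linux/tracepoint.h>

    TRACE_EVENT(netfs_folioq_life,
            TP_PROTO(unsigned int rreq_debug_id,
                     unsigned int folioq_debug_id, int what),
            TP_ARGS(rreq_debug_id, folioq_debug_id, what),
            TP_STRUCT__entry(
                    __field(unsigned int,   rreq)
                    __field(unsigned int,   folioq)
                    __field(int,            what)
            ),
            TP_fast_assign(
                    __entry->rreq   = rreq_debug_id;
                    __entry->folioq = folioq_debug_id;
                    __entry->what   = what;
            ),
            TP_printk("R=%08x fq=%x what=%d",
                      __entry->rreq, __entry->folioq, __entry->what)
    );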

Revision tags: v6.13-rc3

# 4acb665c | 13-Dec-2024 | David Howells <[email protected]>

netfs: Work around recursion by abandoning retry if nothing read
syzkaller reported recursion with a loop of three calls (netfs_rreq_assess, netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack during an unbuffered or direct I/O read.
There are a number of issues:
(1) There is no limit on the number of retries.
(2) A subrequest is supposed to be abandoned if it does not transfer anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all circumstances.
(3) The actual root cause, which is this:
    if (atomic_dec_and_test(&rreq->nr_outstanding))
            netfs_rreq_terminated(rreq, ...);
When we do a retry, we bump the rreq->nr_outstanding counter to prevent the final cleanup phase running before we've finished dispatching the retries. The problem is if we hit 0, we have to do the cleanup phase - but we're in the cleanup phase and end up repeating the retry cycle, hence the recursion.
Work around the problem by limiting the number of retries. This is based on Lizhi Xu's patch[1], and makes the following changes:
(1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make the filesystem set it if it managed to read or write at least one byte of data. Clear this bit before issuing a subrequest.
(2) Add a ->retry_count member to the subrequest and increment it any time we do a retry.
(3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with ->retry_count. If the latter is non-zero, we're doing a retry.
(4) Abandon a subrequest if retry_count is non-zero and we made no progress.
(5) Use ->retry_count in both the write-side and the read-side.
[?] Question: Should I set a hard limit on retry_count in both read and write? Say it hits 50, we always abandon it. The problem is that these changes only mitigate the issue. As long as it made at least one byte of progress, the recursion is still an issue. This patch mitigates the problem, but does not fix the underlying cause. I have patches that will do that, but it's an intrusive fix that's currently pending for the next merge window.
The oops generated by KASAN looks something like:
  BUG: TASK stack guard page was hit at ffffc9000482ff48 (stack is ffffc90004830000..ffffc90004838000)
  Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN NOPTI
  ...
  RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
  ...
   mark_usage kernel/locking/lockdep.c:4646 [inline]
   __lock_acquire+0x906/0x3ce0 kernel/locking/lockdep.c:5156
   lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5825
   local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
   ___slab_alloc+0x123/0x1880 mm/slub.c:3695
   __slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
   __slab_alloc_node mm/slub.c:3961 [inline]
   slab_alloc_node mm/slub.c:4122 [inline]
   kmem_cache_alloc_noprof+0x2a7/0x2f0 mm/slub.c:4141
   radix_tree_node_alloc.constprop.0+0x1e8/0x350 lib/radix-tree.c:253
   idr_get_free+0x528/0xa40 lib/radix-tree.c:1506
   idr_alloc_u32+0x191/0x2f0 lib/idr.c:46
   idr_alloc+0xc1/0x130 lib/idr.c:87
   p9_tag_alloc+0x394/0x870 net/9p/client.c:321
   p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644
   p9_client_zc_rpc.constprop.0+0x105/0x880 net/9p/client.c:793
   p9_client_read_once+0x443/0x820 net/9p/client.c:1570
   p9_client_read+0x13f/0x1b0 net/9p/client.c:1534
   v9fs_issue_read+0x115/0x310 fs/9p/vfs_addr.c:74
   netfs_retry_read_subrequests fs/netfs/read_retry.c:60 [inline]
   netfs_retry_reads+0x153a/0x1d00 fs/netfs/read_retry.c:232
   netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
   netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
   netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
   netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
   netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
   netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
   netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
   ...
   netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
   netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
   netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
   netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
   netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
   netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
   netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
   netfs_dispatch_unbuffered_reads fs/netfs/direct_read.c:103 [inline]
   netfs_unbuffered_read fs/netfs/direct_read.c:127 [inline]
   netfs_unbuffered_read_iter_locked+0x12f6/0x19b0 fs/netfs/direct_read.c:221
   netfs_unbuffered_read_iter+0xc5/0x100 fs/netfs/direct_read.c:256
   v9fs_file_read_iter+0xbf/0x100 fs/9p/vfs_file.c:361
   do_iter_readv_writev+0x614/0x7f0 fs/read_write.c:832
   vfs_readv+0x4cf/0x890 fs/read_write.c:1025
   do_preadv fs/read_write.c:1142 [inline]
   __do_sys_preadv fs/read_write.c:1192 [inline]
   __se_sys_preadv fs/read_write.c:1187 [inline]
   __x64_sys_preadv+0x22d/0x310 fs/read_write.c:1187
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Closes: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]
Tested-by: [email protected]
Suggested-by: Lizhi Xu <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Reported-by: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
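
The abandonment rule from points (1)-(5), condensed into a hedged sketch (the flag and field names come from the commit text; the helper itself is a hypothetical stand-in):

    #include <linux/bitops.h>

    /* A subrequest on its second or later attempt that made no progress
     * is abandoned rather than reissued, which breaks the
     * terminate -> assess -> retry loop. */
    static bool netfs_should_abandon(struct netfs_io_subrequest *subreq)
    {
            return subreq->retry_count > 0 &&
                   !test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
    }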

Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2

# 1ca4169c | 02-Oct-2024 | David Howells <[email protected]>

netfs: Fix missing wakeup after issuing writes
After dividing up a proposed write into subrequests, netfslib sets NETFS_RREQ_ALL_QUEUED to indicate to the collector that it can move on to the final cleanup once it has emptied the subrequest queues.
Now, whilst the collector will normally end up running at least once after this bit is set just because it takes a while to process all the write subrequests before the collector runs out of subrequests, there exists the possibility that the issuing thread will be forced to sleep and the collector thread will clean up all the subrequests before ALL_QUEUED gets set.
In such a case, the collector thread will not get triggered again and will never clear NETFS_RREQ_IN_PROGRESS, thus leaving a request uncompleted and causing a potential future hang.
Fix this by scheduling the write collector if all the subrequest queues are empty (and thus no writes pending issuance).
Note that we'd do this ideally before queuing the subrequest, but in the case of buffered writeback, at least, we can't find out that we've run out of folios until after we've called writeback_iter() and it has returned NULL - at which point we might not actually have any subrequests still under construction.
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
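
The fix, reduced to a hedged sketch (names abridged; the real code likely uses a dedicated wake helper rather than queuing the work item directly):

    /* After queuing the last subrequest: if both streams have already
     * drained, the collector won't run again by itself, so kick it. */
    set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
    if (list_empty(&wreq->io_streams[0].subrequests) &&
        list_empty(&wreq->io_streams[1].subrequests))
            queue_work(system_unbound_wq, &wreq->work);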

Revision tags: v6.12-rc1

# 9fffa4e9 | 27-Sep-2024 | David Howells <[email protected]>

netfs: Advance iterator correctly rather than jumping it
In netfs_write_folio(), use iov_iter_advance() to advance the folio as we split bits of it off to subrequests rather than manually jumping the ->iov_offset value around. This becomes more problematic when we use a bounce buffer made out of single-page folios to cover a multipage pagecache folio.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
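
The difference, as a hedged sketch (the iterator variable is illustrative):

    #include <linux/uio.h>

    /* Fragile: hand-adjusting the offset only works while the data is
     * one contiguous segment. */
    iter->iov_offset = new_offset;          /* the old approach */

    /* Robust: let the iterator walk forward itself, crossing segment
     * boundaries such as the single-page folios of a bounce buffer
     * that backs a multipage pagecache folio. */
    iov_iter_advance(iter, part);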

# df9b4556 | 26-Sep-2024 | David Howells <[email protected]>

netfs: Fix write oops in generic/346 (9p) and generic/074 (cifs)
In netfslib, a buffered writeback operation has a 'write queue' of folios that are being written, held in a linear sequence of folio_queue structs. The 'issuer' adds new folio_queues on the leading edge of the queue and populates each one progressively; the 'collector' pops them off the trailing edge and discards them and the folios they point to as they are consumed.
The queue is required to always retain at least one folio_queue structure. This allows the queue to be accessed without locking and with just a bit of barriering.
When a new subrequest is prepared, its ->io_iter iterator is pointed at the current end of the write queue and then the iterator is extended as more data is added to the queue until the subrequest is committed.
Now, the problem is that the folio_queue at the leading edge of the write queue when a subrequest is prepared might have been entirely consumed - but not yet removed from the queue as it is the only remaining one and is preventing the queue from collapsing.
So, what happens is that subreq->io_iter is pointed at the spent folio_queue, then a new folio_queue is added, and, at that point, the collector is entirely at liberty to immediately delete the spent folio_queue.
This leaves the subreq->io_iter pointing at a freed object. If the system is lucky, iterate_folioq() sees ->io_iter, sees the as-yet uncorrupted freed object and advances to the next folio_queue in the queue.
In the case seen, however, the freed object gets recycled and put back onto the queue at the tail and filled to the end. This confuses iterate_folioq() and it tries to step ->next, which may be NULL - resulting in an oops.
Fix this by the following means:
(1) When preparing a write subrequest, make sure there's a folio_queue struct with space in it at the leading edge of the queue. A function to make space is split out of the function to append a folio so that it can be called for this purpose.
(2) If the request struct iterator is pointing to a completely spent folio_queue when we make space, then advance the iterator to the newly allocated folio_queue. The subrequest's iterator will then be set from this.
The oops could be triggered using the generic/346 xfstest with a filesystem on 9P over TCP with cache=loose. The oops looked something like:
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  ...
  RIP: 0010:_copy_from_iter+0x2db/0x530
  ...
  Call Trace:
   <TASK>
   ...
   p9pdu_vwritef+0x3d8/0x5d0
   p9_client_prepare_req+0xa8/0x140
   p9_client_rpc+0x81/0x280
   p9_client_write+0xcf/0x1c0
   v9fs_issue_write+0x87/0xc0
   netfs_advance_write+0xa0/0xb0
   netfs_write_folio.isra.0+0x42d/0x500
   netfs_writepages+0x15a/0x1f0
   do_writepages+0xd1/0x220
   filemap_fdatawrite_wbc+0x5c/0x80
   v9fs_mmap_vm_close+0x7d/0xb0
   remove_vma+0x35/0x70
   vms_complete_munmap_vmas+0x11a/0x170
   do_vmi_align_munmap+0x17d/0x1c0
   do_vmi_munmap+0x13e/0x150
   __vm_munmap+0x92/0xd0
   __x64_sys_munmap+0x17/0x20
   do_syscall_64+0x80/0xe0
   entry_SYSCALL_64_after_hwframe+0x71/0x79
This also fixed a similar-looking issue with cifs and generic/074.
Fixes: cd0277ed0c18 ("netfs: Use new folio_queue data type and iterator instead of xarray iter")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: David Howells <[email protected]>
Tested-by: kernel test robot <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Paulo Alcantara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Steve French <[email protected]>
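
A hedged sketch of fix (1): guarantee an unspent folio_queue at the leading edge before basing a subrequest's io_iter on the queue (netfs_buffer_make_space() stands for the helper split out of the append-folio path; the buffer_tail field name is illustrative):

    #include <linux/folio_queue.h>

    struct folio_queue *fq = wreq->buffer_tail;

    if (!fq || folioq_full(fq))
            fq = netfs_buffer_make_space(wreq);
    /* Fix (2): if the request iterator pointed at a fully spent
     * segment, advance it to the new one before the subrequest's
     * io_iter is set from it. */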

Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2

# 8f246b7c | 29-Jul-2024 | David Howells <[email protected]>

netfs: Cancel dirty folios that have no storage destination
Kafs wants to be able to cache the contents of directories (and symlinks), but whilst these are downloaded from the server with the FS.FetchData RPC op and similar, the same as for regular files, they can't be updated by FS.StoreData, but rather have special operations (FS.MakeDir, etc.).
Now, rather than redownloading a directory's content after each change made to that directory, kafs modifies the local blob. This blob can be saved out to the cache, and since it's using netfslib, kafs just marks the folios dirty and lets ->writepages() on the directory take care of it, as for a regular file.
This is fine as long as there's a cache, as although the upload stream is disabled, there's a cache stream to drive the procedure. But if the cache goes away in the meantime, suddenly there's no way to do any writes and the code gets confused, complains "R=%x: No submit" to dmesg and leaves the dirty folio hanging.
Fix this by just cancelling the store of the folio if neither stream is active. (If there's no cache at the time of dirtying, we should just not mark the folio dirty).
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>
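
An illustrative guard in the per-folio writeback path (netfs_cancel_folio_store() is a hypothetical stand-in for whatever the patch actually calls):

    /* With neither the upload stream nor the cache stream active there
     * is nowhere to write, so cancel the folio's store rather than
     * leaving it dirty and stuck. */
    if (!wreq->io_streams[0].active && !wreq->io_streams[1].active) {
            netfs_cancel_folio_store(wreq, folio);
            return 0;
    }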

Revision tags: v6.11-rc1, v6.10

# c4f1450e | 12-Jul-2024 | David Howells <[email protected]>

cachefiles, netfs: Fix write to partial block at EOF
Because it uses DIO writes, cachefiles is unable to make a write to the backing file if that write is not aligned to and sized according to the backing file's DIO block alignment. This makes it tricky to handle a write to the cache where the EOF on the network file is not correctly aligned.
To get around this, netfslib attempts to tell the driver it is calling how much more data there is available beyond the EOF that it can use to pad the write (netfslib preclears the part of the folio above the EOF). However, it tries to tell the cache what the maximum length is, but doesn't calculate this correctly; and, in any case, cachefiles actually ignores the value and just skips the block.
Fix this by:
(1) Change the value passed to indicate the amount of extra data that can be added to the operation (now ->submit_extendable_to). This is much simpler to calculate as it's just the end of the folio minus the top of the data within the folio - rather than having to account for data spread over multiple folios.
(2) Make cachefiles add some of this data if the subrequest it is given ends at the network file's i_size if the extra data is sufficient to pad out to a whole block.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>
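
A hedged sketch of both points (data_top_in_folio and dio_block_size are illustrative stand-ins for values the real code derives from the folio and the cache):

    /* (1) How far the write may extend past the data: the folio end
     * minus the top of the data within the folio. */
    subreq->submit_extendable_to = folio_size(folio) - data_top_in_folio;

    /* (2) cachefiles side: if the subrequest ends at i_size and the
     * available extension pads out to a whole DIO block, extend the
     * write instead of skipping the block. */
    loff_t data_end = subreq->start + subreq->len;
    loff_t io_end = round_up(data_end, dio_block_size);

    if (data_end == i_size &&
        io_end - data_end <= subreq->submit_extendable_to)
            subreq->len = io_end - subreq->start;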

Revision tags: v6.10-rc7

# ee4cdf7b | 01-Jul-2024 | David Howells <[email protected]>

netfs: Speed up buffered reading
Improve the efficiency of buffered reads in a number of ways:
(1) Overhaul the algorithm in general so that it's a lot more compact and split the read submission code between buffered and unbuffered versions. The unbuffered version can be vastly simplified.
(2) Read-result collection is handed off to a work queue rather than being done in the I/O thread. Multiple subrequests can be processed simultaneously.
(3) When a subrequest is collected, any folios it fully spans are collected and "spare" data on either side is donated to either the previous or the next subrequest in the sequence.
Notes:
(*) Readahead expansion massively slows down fio, presumably because it causes a load of extra allocations, both folio and xarray, up front before RPC requests can be transmitted.
(*) RDMA with cifs does appear to work, both with SIW and RXE.
(*) PG_private_2-based reading and copy-to-cache is split out into its own file and altered to use folio_queue. Note that the copy to the cache now creates a new write transaction against the cache and adds the folios to be copied into it. This allows it to use part of the writeback I/O code.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>

Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3

# 983cdcf8 | 06-Jun-2024 | David Howells <[email protected]>

netfs: Simplify the writeback code
Use the new folio_queue structures to simplify the writeback code. The problem with referring to the i_pages xarray directly is that we may have gaps in the sequence of folios we're writing from that we need to skip when we're removing the writeback mark from the folios we're writing back from.
At the moment the code tries to deal with this by carefully tracking the gaps in each writeback stream (eg. write to server and write to cache) and divining when there's a gap that spans folios (something that's not helped by folios not being a consistent size).
Instead, the folio_queue buffer contains pointers to only the folios we're dealing with, holds them in ascending order and indicates a gap by placing non-consecutive folios next to each other. This makes it possible to track where we need to clean up to by just keeping track of where we've processed to on each stream and taking the minimum.
Note that the I/O iterator is always rounded up to the end of the folio, even if that is beyond the EOF position, so that the cache can do DIO from the page. The excess space is cleared, though mmapped writes clobber it.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>

# bfaa33b8 | 08-Jul-2024 | David Howells <[email protected]>

netfs: Provide an iterator-reset function
Provide a function to reset the iterator on a subrequest.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>

Revision tags: v6.10-rc2

# cd0277ed | 29-May-2024 | David Howells <[email protected]>

netfs: Use new folio_queue data type and iterator instead of xarray iter
Make the netfs write-side routines use the new folio_queue struct to hold a rolling buffer of folios, with the issuer adding folios at the tail and the collector removing them from the head as they're processed instead of using an xarray.
This will allow a subsequent patch to simplify the write collector.
The primary mark (as tested by folioq_is_marked()) is used to note if the corresponding folio needs putting.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>
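
A hedged sketch of the primary-mark convention using the folio_queue helpers from include/linux/folio_queue.h (the surrounding logic is illustrative):

    #include <linux/folio_queue.h>

    /* Issuer: the primary mark records "needs a put on collection". */
    unsigned int slot = folioq_append(fq, folio);
    if (needs_put)
            folioq_mark(fq, slot);

    /* Collector, when retiring the slot: */
    if (folioq_is_marked(fq, slot))
            folio_put(folioq_folio(fq, slot));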

# 22de489d | 12-Jul-2024 | David Howells <[email protected]>

netfs: Use bh-disabling spinlocks for rreq->lock
Use bh-disabling spinlocks when accessing rreq->lock because, in the future, it may be twiddled from softirq context when cleanup is driven from cache backend DIO completion.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>

# 24c90a79 | 09-Jul-2024 | David Howells <[email protected]>

netfs: Set the request work function upon allocation
Set the work function in the netfs_io_request work_struct when we allocate the request rather than doing this later. This reduces the number of places we need to set it in future code.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>
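
As a hedged sketch of the allocation-time binding (netfs_rreq_work is a hypothetical handler name):

    rreq = kzalloc(sizeof(*rreq), GFP_KERNEL);
    if (!rreq)
            return ERR_PTR(-ENOMEM);
    /* Bind the work function once, here, rather than at each point
     * that later queues the request. */
    INIT_WORK(&rreq->work, netfs_rreq_work);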

# 52d55922 | 07-Jun-2024 | David Howells <[email protected]>

netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
Move max_len/max_nr_segs from struct netfs_io_subrequest to struct netfs_io_stream as we only issue one subreq at a time and then don't need these values again for that subreq unless and until we have to retry it - in which case we want to renegotiate them.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>

# ef966d73 | 31-May-2024 | David Howells <[email protected]>

netfs: Record contention stats for writeback lock
Record statistics for contention upon the writeback serialisation lock that prevents racing writeback calls from causing each other to interleave their writebacks. These can be viewed in /proc/fs/netfs/stats on the WbLock line, with skip=N indicating the number of non-SYNC writebacks skipped and wait=N indicating the number of SYNC writebacks that waited.
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Steve French <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Christian Brauner <[email protected]>

# 7b589a9b | 07-Aug-2024 | David Howells <[email protected]>

netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used correctly. The problem is that we try to set them up during request initialisation, but the cache may still be in the process of being set up, and so the state may not be correct. Further, we secondarily sample the cache state and make contradictory decisions later.
The issue arises because we set up the cache resources, which allows the cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which triggers cache writing even if we didn't set the flags when allocating.
Fix this in the following way:
(1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in ->init_request() rather than trying to juggle that in netfs_alloc_request().
(2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is to be done, then PG_private_2 is to be used rather than only setting it if we decide to cache and then having netfs_rreq_unlock_folios() set the non-PG_private_2 writeback-to-cache if it wasn't set.
(3) Split netfs_rreq_unlock_folios() into two functions, one of which contains the deprecated code for using PG_private_2 to avoid accidentally doing the writeback path - and always use it if USE_PGPRIV2 is set.
(4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always wait for PG_private_2. This function is deprecated and only used by ceph anyway, and so label it so.
(5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use fscache_operation_valid() on the cache_resources instead. This has the advantage of picking up the result of netfs_begin_cache_read() and fscache_begin_write_operation() - which are called after the object is initialised and will wait for the cache to come to a usable state.
Just reverting ae678317b95e[1] isn't a sufficient fix, so this needs to be applied on top of that. Without this as well, things like:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
and:
WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386
may happen, along with some UAFs due to PG_private_2 not getting used to wait on writeback completion.
Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Reported-by: Max Kellermann <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Ilya Dryomov <[email protected]>
cc: Xiubo Li <[email protected]>
cc: Hristo Venev <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

# 212be98a | 19-Jul-2024 | David Howells <[email protected]>

netfs: Fix writeback that needs to go to both server and cache
When netfslib is performing writeback (ie. ->writepages), it maintains two parallel streams of writes, one to the server and one to the cache, but it doesn't mark either stream of writes as active until it gets some data that needs to be written to that stream.
This is done because some folios will only be written to the cache (e.g. copying to the cache on read is done by marking the folios and letting writeback do the actual work) and sometimes we'll only be writing to the server (e.g. if there's no cache).
Now, since we don't actually dispatch uploads and cache writes in parallel, but rather flip between the streams, depending on which has the lowest so-far-issued offset, and don't wait for the subreqs to finish before flipping, we can end up in a situation where, say, we issue a write to the server and this completes before we start the write to the cache.
But because we only activate a stream when we first add a subreq to it, the result collection code may run before we manage to activate the stream - resulting in the folio being cleaned and having the writeback-in-progress mark removed. At this point, the folio no longer belongs to us.
This is only really a problem for folios that need to be written to both streams - and in that case, the upload to the server is started first, followed by the write to the cache - and the cache write may see a bad folio.
Fix this by activating the cache stream up front if there's a cache available. If there's a cache, then all data is going to be written to it.
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

# a9d47a50 | 18-Jul-2024 | David Howells <[email protected]>

netfs: Revert "netfs: Switch debug logging to pr_debug()"
Revert commit 163eae0fb0d4c610c59a8de38040f8e12f89fd43 to get back the original operation of the debugging macros.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
cc: Uwe Kleine-König <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

# 6470e0bc | 05-Jun-2024 | David Howells <[email protected]>

netfs: Fix early issue of write op on partial write to folio tail
During the writeback procedure, at the end of netfs_write_folio(), pending write operations are flushed if the amount of write-streaming data stored in a page is less than the size of the folio because if we haven't modified a folio to the end, it cannot be contiguous with the following folio... except if the dirty region of the folio is right at the end of the folio space.
Fix the test to take the offset into the folio into account as well, such that if the dirty region runs right up to the end of the folio, we leave the flushing for later.
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Marc Dionne <[email protected]>
cc: Steve French <[email protected]>
cc: Paulo Alcantara <[email protected]> (DFS, global name space)
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
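
The corrected test, as a hedged sketch (foff/flen are the dirty region's offset and length within the folio; the flush call is abridged):

    /* Flush only if the dirty region stops short of the folio's end;
     * account for the start offset, not just the length, so a region
     * that runs right up to the end is left for later. */
    if (foff + flen < folio_size(folio))
            netfs_issue_flush(wreq);    /* hypothetical call site */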