| 15e9aaf9 | 14-Mar-2025 |
David Howells <[email protected]> |
netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits
rolling_buffer_load_from_ra() looms large in the perf report because it loops around doing an atomic clear for each of the three mark
netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits
rolling_buffer_load_from_ra() looms large in the perf report because it loops around doing an atomic clear for each of the three mark bits per folio. However, this is both inefficient (it would be better to build a mask and atomically AND them out) and unnecessary as they shouldn't be set.
Fix this by removing the loop.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: "Paulo Alcantara (Red Hat)" <[email protected]> cc: Jeff Layton <[email protected]> cc: Steve French <[email protected]> cc: Paulo Alcantara <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| 344b7ef2 | 14-Mar-2025 |
Max Kellermann <[email protected]> |
netfs: Call `invalidate_cache` only if implemented
Many filesystems such as NFS and Ceph do not implement the `invalidate_cache` method. On those filesystems, if writing to the cache (`NETFS_WRITE_
netfs: Call `invalidate_cache` only if implemented
Many filesystems such as NFS and Ceph do not implement the `invalidate_cache` method. On those filesystems, if writing to the cache (`NETFS_WRITE_TO_CACHE`) fails for some reason, the kernel crashes like this:
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD 0 P4D 0 Oops: Oops: 0010 [#1] SMP PTI CPU: 9 UID: 0 PID: 3380 Comm: kworker/u193:11 Not tainted 6.13.3-cm4all1-hp #437 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_write_collection_worker RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffff9b86e2ca7dc0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 7fffffffffffffff RDX: 0000000000000001 RSI: ffff89259d576a18 RDI: ffff89259d576900 RBP: ffff89259d5769b0 R08: ffff9b86e2ca7d28 R09: 0000000000000002 R10: ffff89258ceaca80 R11: 0000000000000001 R12: 0000000000000020 R13: ffff893d158b9338 R14: ffff89259d576900 R15: ffff89259d5769b0 FS: 0000000000000000(0000) GS:ffff893c9fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 000000054442e003 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x60 ? page_fault_oops+0x15c/0x460 ? try_to_wake_up+0x2d2/0x530 ? exc_page_fault+0x5e/0x100 ? asm_exc_page_fault+0x22/0x30 netfs_write_collection_worker+0xe9f/0x12b0 ? xs_poll_check_readable+0x3f/0x80 ? xs_stream_data_receive_workfn+0x8d/0x110 process_one_work+0x134/0x2d0 worker_thread+0x299/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xba/0xe0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 0000000000000000
This patch adds the missing `NULL` check.
Fixes: 0e0f2dfe880f ("netfs: Dispatch write requests to process a writeback slice") Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: Max Kellermann <[email protected]> Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: "Paulo Alcantara (Red Hat)" <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| 5de0219a | 12-Feb-2025 |
David Howells <[email protected]> |
netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued
Due to the code that queues a subreq on the active subrequest list getting moved to netfs_issue_read(), the NETFS_RREQ_ALL_QUE
netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued
Due to the code that queues a subreq on the active subrequest list getting moved to netfs_issue_read(), the NETFS_RREQ_ALL_QUEUED flag may now get set before the list-add actually happens. This is not a problem if the collection worker happens after the list-add, but it's a race - and, for 9P, where the read from the server is synchronous and done in the submitting thread, this is a lot more likely.
The result is that, if the timing is wrong, a ref gets leaked because the collector thinks that all the subreqs have completed (because it can't see the last one yet) and clears NETFS_RREQ_IN_PROGRESS - at which point, the collection worker no longer goes into the collector.
This can be provoked with AFS by injecting an msleep() right before the final subreq is queued.
Fix this by splitting the queuing part out of netfs_issue_read() into a new function, netfs_queue_read(), and calling it separately. The setting of NETFS_RREQ_ALL_QUEUED is then done by netfs_queue_read() whilst it is holding the spinlock (that's probably unnecessary, but shouldn't hurt).
It might be better to set a flag on the final subreq, but this could be a problem if an error occurs and we can't queue it.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Reported-by: Ihor Solodrai <[email protected]> Closes: https://lore.kernel.org/r/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/ Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Ihor Solodrai <[email protected]> cc: Eric Van Hensbergen <[email protected]> cc: Latchesar Ionkov <[email protected]> cc: Dominique Martinet <[email protected]> cc: Christian Schoenebeck <[email protected]> cc: Marc Dionne <[email protected]> cc: Steve French <[email protected]> cc: Paulo Alcantara <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| 794d8cf3 | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
It seems that it's possible to get to netfs_writeback_unlock_folios() with an empty rolling buffer during buffered writes. This shoul
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
It seems that it's possible to get to netfs_writeback_unlock_folios() with an empty rolling buffer during buffered writes. This should not be possible as the rolling buffer is initialised as the write request is set up and thereafter maintains at least one folio_queue struct therein until it gets destroyed. This allows lockless addition and removal of folio_queue structs in the buffer because, unlike with a ring buffer, the producer and consumer each only need to look at and alter one pointer into the buffer.
Now, the rolling buffer is only used for buffered I/O operations as netfs_collect_write_results() should only call netfs_writeback_unlock_folios() if the request is of origin type NETFS_WRITEBACK, NETFS_WRITETHROUGH or NETFS_PGPRIV2_COPY_TO_CACHE.
So it would seem that one of the following occurred: (1) I/O started before the request was fully initialised, (2) the origin got switched mid-flow or (3) the request has already been freed and this is a UAF error. I think the last is the most likely.
Make netfs_writeback_unlock_folios() report information about the request and subrequests if folioq is seen to be NULL to try and help debug this, throw a warning and return.
Note that this does not try to fix the problem.
Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=af5c06208fa71bf31b16 Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/r/[email protected] cc: Chang Yu <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| e2d46f2e | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Change the read result collector to only use one work item
Change the way netfslib collects read results to do all the collection for a particular read request using a single work item that w
netfs: Change the read result collector to only use one work item
Change the way netfslib collects read results to do all the collection for a particular read request using a single work item that walks along the subrequest queue as subrequests make progress or complete, unlocking folios progressively rather than doing the unlock in parallel as parallel requests come in.
The code is remodelled to be more like the write-side code, though only using a single stream. This makes it more directly comparable and thus easier to duplicate fixes between the two sides.
This has a number of advantages:
(1) It's simpler. There doesn't need to be a complex donation mechanism to handle mismatches between the size and alignment of subrequests and folios. The collector unlocks folios as the subrequests covering each complete.
(2) It should cause less scheduler overhead as there's a single work item in play unlocking pages in parallel when a read gets split up into a lot of subrequests instead of one per subrequest.
Whilst the parallellism is nice in theory, in practice, the vast majority of loads are sequential reads of the whole file, so committing a bunch of threads to unlocking folios out of order doesn't help in those cases.
(3) It should make it easier to implement content decryption. A folio cannot be decrypted until all the requests that contribute to it have completed - and, again, most loads are sequential and so, most of the time, we want to begin decryption sequentially (though it's great if the decryption can happen in parallel).
There is a disadvantage in that we're losing the ability to decrypt and unlock things on an as-things-arrive basis which may affect some applications.
Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| e61bfaad | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Add functions to build/clean a buffer in a folio_queue
Add two netfslib functions to build up or clean up a buffer in a folio_queue. The first, netfs_alloc_folioq_buffer() will add folios to
netfs: Add functions to build/clean a buffer in a folio_queue
Add two netfslib functions to build up or clean up a buffer in a folio_queue. The first, netfs_alloc_folioq_buffer() will add folios to a buffer, extending up at least to the given size. If it can, it will add multipage folios. The folios are optionally have the mapping set and will have the index set according to the distance from the front of the folio queue.
The second function will free up a folio queue and put any folios in the queue that have the first mark set.
The netfs_folio tracepoint is also altered to cope with folios that have a NULL mapping, and the folios being added/put will have trace lines emitted and will be accounted in the stats.
Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| 627cf645 | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Don't use bh spinlock
All the accessing of the subrequest lists is now done in process context, possibly in a workqueue, but not now in a BH context, so we don't need the lock against BH inte
netfs: Don't use bh spinlock
All the accessing of the subrequest lists is now done in process context, possibly in a workqueue, but not now in a BH context, so we don't need the lock against BH interference when taking the netfs_io_request::lock spinlock.
Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| 31fc366a | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Drop the was_async arg from netfs_read_subreq_terminated()
Drop the was_async argument from netfs_read_subreq_terminated(). Almost every caller is either in process context and passes false.
netfs: Drop the was_async arg from netfs_read_subreq_terminated()
Drop the was_async argument from netfs_read_subreq_terminated(). Almost every caller is either in process context and passes false. Some filesystems delegate the call to a workqueue to avoid doing the work in their network message queue parsing thread.
The only exception is netfs_cache_read_terminated() which handles completion in the cache - which is usually a callback from the backing filesystem in softirq context, though it can be from process context if an error occurred. In this case, delegate to a workqueue.
Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wiVC5Cgyz6QKXFu6fTaA6h4CjexDR-OV9kL6Vo5x9v8=A@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| d606c362 | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Make netfs_advance_write() return size_t
netfs_advance_write() calculates the amount of data it's attaching to a stream with size_t, but then returns this as an int. Switch the return value
netfs: Make netfs_advance_write() return size_t
netfs_advance_write() calculates the amount of data it's attaching to a stream with size_t, but then returns this as an int. Switch the return value to size_t for consistency.
Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| 06fa229c | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Abstract out a rolling folio buffer implementation
A rolling buffer is a series of folios held in a list of folio_queues. New folios and folio_queue structs may be inserted at the head simul
netfs: Abstract out a rolling folio buffer implementation
A rolling buffer is a series of folios held in a list of folio_queues. New folios and folio_queue structs may be inserted at the head simultaneously with spent ones being removed from the tail without the need for locking.
The rolling buffer includes an iov_iter and it has to be careful managing this as the list of folio_queues is extended such that an oops doesn't incurred because the iterator was pointing to the end of a folio_queue segment that got appended to and then removed.
We need to use the mechanism twice, once for read and once for write, and, in future patches, we will use a second rolling buffer to handle bounce buffering for content encryption.
Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| eb118159 | 16-Dec-2024 |
David Howells <[email protected]> |
netfs: Use a folio_queue allocation and free functions
Provide and use folio_queue allocation and free functions to combine the allocation, initialisation and stat (un)accounting steps that are repe
netfs: Use a folio_queue allocation and free functions
Provide and use folio_queue allocation and free functions to combine the allocation, initialisation and stat (un)accounting steps that are repeated in several places.
Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|