Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2
# cf960726 | 07-Apr-2025 | Jens Axboe <[email protected]>
io_uring/kbuf: reject zero sized provided buffers
This isn't fixing a real issue, but there's also zero point in going through group and buffer setup when the buffers are going to be rejected once an attempt is made to use them.
Cc: [email protected] Reported-by: [email protected] Signed-off-by: Jens Axboe <[email protected]>
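
For illustration, a minimal liburing sketch of the legacy provide-buffers flow this fix hardens; with the patch, a zero-sized buffer is rejected at registration time rather than when the buffer is first picked for IO. The -EINVAL value is an assumption, as the commit doesn't name the errno.

```c
#include <liburing.h>

#define NR_BUFS 8
#define BGID    0

static int provide_buffers(struct io_uring *ring, void *base, unsigned len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int ret;

	/* With this fix, len == 0 fails here instead of at first use */
	io_uring_prep_provide_buffers(sqe, base, len, NR_BUFS, BGID, 0);
	io_uring_submit(ring);
	ret = io_uring_wait_cqe(ring, &cqe);
	if (!ret) {
		ret = cqe->res;		/* presumably -EINVAL for len == 0 */
		io_uring_cqe_seen(ring, cqe);
	}
	return ret;
}
```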
Revision tags: v6.15-rc1, v6.14, v6.14-rc7
# cf9536e5 | 10-Mar-2025 | Jens Axboe <[email protected]>
io_uring/kbuf: enable bundles for incrementally consumed buffers
The original support for incrementally consumed buffers didn't allow them to be used with bundles, the assumption being that incremental buffers are generally larger, and hence there's less of a need to support it.
But that assumption may not be correct - it's perfectly viable to use smaller buffers with incremental consumption, and there may be valid reasons for an application or framework to do so.
As there's really no need to explicitly disable bundles with incrementally consumed buffers, allow it. This actually makes the peek side cheaper and simpler, with the completion side basically the same, just needing to iterate for the consumed length.
Reported-by: Norman Maurer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
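
As a hedged sketch of what this change enables from userspace: a recv marked as a bundle selecting from a buffer ring registered with IOU_PBUF_RING_INC. Ring size and the use of io_uring_setup_buf_ring() are illustrative assumptions, not taken from the patch.

```c
#include <liburing.h>

/* Register a buffer ring whose buffers may be consumed incrementally */
static struct io_uring_buf_ring *setup_inc_ring(struct io_uring *ring, int bgid)
{
	int err;

	return io_uring_setup_buf_ring(ring, 64, bgid, IOU_PBUF_RING_INC, &err);
}

/* Issue a bundled recv that picks buffers from the ring above */
static void prep_bundled_recv(struct io_uring *ring, int sockfd, int bgid)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_recv(sqe, sockfd, NULL, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;	/* select from a provided group */
	sqe->buf_group = bgid;
	sqe->ioprio |= IORING_RECVSEND_BUNDLE;	/* complete as a bundle */
}
```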
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2
# 5d3e5124 | 05-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: uninline __io_put_kbufs
__io_put_kbufs() and other helper functions are too large to be inlined; compilers would normally refuse to do so. Uninline it and move it together with io_kbuf_commit into kbuf.c.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/3dade7f55ad590e811aff83b1ec55c9c04e17b2b.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# 54e00d9a | 05-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: introduce io_kbuf_drop_legacy()
io_kbuf_drop() is only used for legacy provided buffers, and so __io_put_kbuf_list() is never called for REQ_F_BUFFER_RING. Remove the dead branch out of __io_put_kbuf_list(), rename it to io_kbuf_drop_legacy() and use it directly instead of io_kbuf_drop().
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/c8cc73e2272f09a86ecbdad9ebdd8304f8e583c0.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# e150e70f | 05-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: open code __io_put_kbuf()
__io_put_kbuf() is a trivial wrapper; open code it into __io_put_kbufs().
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/9dc17380272b48d56c95992c6f9eaacd5546e1d3.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# 13ee854e | 05-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: remove legacy kbuf caching
Remove all struct io_buffer caches. It makes things a fair bit simpler. Apart from killing a bunch of lines and juggling between lists, __io_put_kbuf_list() doesn't need ->completion_lock locking now.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/18287217466ee2576ea0b1e72daccf7b22c7e856.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# dc39fb10 | 05-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: simplify __io_put_kbuf
As a preparation step, remove an optimisation from __io_put_kbuf() trying to use the locked cache. With that, __io_put_kbuf_list() is only used with ->io_buffers_comp, and we remove the explicit list argument.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/1b7f1394ec4afc7f96b35a61f5992e27c49fd067.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# 9afe6847 | 05-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: remove legacy kbuf kmem cache
Remove the kmem cache used by legacy provided buffers.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/8195c207d8524d94e972c0c82de99282289f7f5c.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# 7919292a | 05-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: remove legacy kbuf bulk allocation
Legacy provided buffers are slow and discouraged in favour of the ring variant. Remove the bulk allocation to keep it simpler as we don't care about performance.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/a064d70370e590efed8076e9501ae4cfc20fe0ca.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# 88027663 | 12-Feb-2025 | Pavel Begunkov <[email protected]>
io_uring/kbuf: reallocate buf lists on upgrade
IORING_REGISTER_PBUF_RING can reuse an old struct io_buffer_list if it was created for a legacy selected buffer and has been emptied. This violates the requirement that most of the fields should stay stable after publish. Always reallocate it instead.
Cc: [email protected] Reported-by: Pumpkin Chang <[email protected]> Fixes: 2fcabce2d7d34 ("io_uring: disallow mixed provided buffer group registrations") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
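
A userspace model of the rule being enforced here, not the kernel code itself: once an object is published for concurrent lookup, its fields must stay stable, so an upgrade swaps in a freshly allocated replacement instead of mutating in place (the kernel swaps the xarray entry under its locks).

```c
#include <stdlib.h>

struct buf_list { unsigned flags; unsigned nr_entries; };

/* Buggy shape: mutate the already-published object. */
static void upgrade_in_place(struct buf_list *published)
{
	published->flags |= 1;		/* readers may see a half-updated object */
	published->nr_entries = 64;
}

/* Fixed shape: build a fresh object, then publish it fully initialized. */
static struct buf_list *upgrade_by_realloc(struct buf_list **slot)
{
	struct buf_list *nbl = calloc(1, sizeof(*nbl));
	struct buf_list *old = *slot;

	if (!nbl)
		return NULL;
	nbl->flags = 1;
	nbl->nr_entries = 64;
	*slot = nbl;	/* publish the fully initialized replacement */
	free(old);	/* old list was emptied, safe to drop */
	return nbl;
}
```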
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6
# ed123c94 | 03-Jan-2025 | Jens Axboe <[email protected]>
io_uring/kbuf: use pre-committed buffer address for non-pollable file
For non-pollable files, buffer ring consumption will commit upfront. This is fine, but io_ring_buffer_select() will return the address of the buffer after having committed it. For incrementally consumed buffers, this is incorrect as it will modify the buffer address.
Store the pre-committed value and return that. If that isn't done, then the initial part of the buffer is not used and the application will correctly assume the content arrived at the start of the userspace buffer, but the kernel will have put it later in the buffer. Or it can cause a spurious -EFAULT returned in the CQE, depending on the buffer size. As bounds are suitably checked for doing the actual IO, no adverse side effects are possible - it's just a data misplacement within the existing buffer.
Reported-by: Gwendal Fernet <[email protected]> Cc: [email protected] Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Signed-off-by: Jens Axboe <[email protected]>
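
A self-contained model of the bug and the fix, with the commit's logic reduced to plain C; the real code operates on struct io_buffer_list inside io_ring_buffer_select().

```c
#include <stdio.h>
#include <stdint.h>

struct buf { uint64_t addr; uint32_t len; };

/* Commit consumes 'used' bytes: advances addr, shrinks len (incremental mode) */
static void commit(struct buf *b, uint32_t used)
{
	b->addr += used;
	b->len  -= used;
}

/* Buggy: commits first, then returns the already-advanced address */
static uint64_t select_buggy(struct buf *b, uint32_t used)
{
	commit(b, used);
	return b->addr;		/* points past the data just placed */
}

/* Fixed: capture the pre-committed address, as the patch does */
static uint64_t select_fixed(struct buf *b, uint32_t used)
{
	uint64_t ret = b->addr;	/* pre-committed value */
	commit(b, used);
	return ret;
}

int main(void)
{
	struct buf a = { 0x1000000, 32768 }, b = a;

	printf("buggy: 0x%llx fixed: 0x%llx\n",
	       (unsigned long long)select_buggy(&a, 4096),
	       (unsigned long long)select_fixed(&b, 4096));
	return 0;
}
```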
Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2
# febfbf76 | 04-Dec-2024 | Colin Ian King <[email protected]>
io_uring/kbuf: fix unintentional sign extension on shift of reg.bgid
Shifting reg.bgid << IORING_OFF_PBUF_SHIFT results in a promotion from __u16 to a 32 bit signed integer, this is then sign extended to a 64 bit unsigned long on 64 bit architectures. If reg.bgid is greater than 0x7fff then this leads to a sign extended result where all the upper 32 bits of mmap_offset are set to 1. Fix this by casting reg.bgid to the same type as mmap_offset before performing the shift.
Fixes: ef62de3c4ad5 ("io_uring/kbuf: use region api for pbuf rings") Signed-off-by: Colin Ian King <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
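
The pitfall reproduces in a few lines of standalone C; IORING_OFF_PBUF_SHIFT is 16 in the io_uring UAPI, replicated here for the demo.

```c
#include <stdio.h>

#define IORING_OFF_PBUF_SHIFT 16	/* value from the io_uring UAPI */

int main(void)
{
	unsigned short bgid = 0x8000;	/* __u16 with the top bit set */
	unsigned long bad, good;

	/* bgid is promoted to (signed) int before the shift; the result has
	 * bit 31 set and is sign-extended when widened to unsigned long. */
	bad = bgid << IORING_OFF_PBUF_SHIFT;

	/* Casting to the destination type first keeps the shift unsigned. */
	good = (unsigned long)bgid << IORING_OFF_PBUF_SHIFT;

	/* On 64-bit: bad = 0xffffffff80000000, good = 0x80000000 */
	printf("bad:  %#lx\ngood: %#lx\n", bad, good);
	return 0;
}
```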
Revision tags: v6.13-rc1
# 7cd7b957 | 29-Nov-2024 | Pavel Begunkov <[email protected]>
io_uring/memmap: unify io_uring mmap'ing code
All mapped memory is now backed by regions and we can unify and clean up io_region_validate_mmap() and io_uring_mmap(). Extract a function looking up a region; the rest of the handling should be generic and just needs the region.
There is one more piece of ring-type-specific code, i.e. the mmap size truncation quirk for IORING_OFF_[S,C]Q_RING, which is left as is.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/f5e1eda1562bfd34276de07465525ae5f10e1e84.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# ef62de3c | 29-Nov-2024 | Pavel Begunkov <[email protected]>
io_uring/kbuf: use region api for pbuf rings
Convert internal parts of the provided buffer ring management to the region API. It's the last non-region mapped ring we have, so it also kills a bunch of now unused memmap.c helpers.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/6c40cf7beaa648558acd4d84bc0fb3279a35d74b.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# 90175f3f | 29-Nov-2024 | Pavel Begunkov <[email protected]>
io_uring/kbuf: remove pbuf ring refcounting
struct io_buffer_list refcounting was needed for RCU based sync with mmap; now we can kill it.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/4a9cc54bf0077bb2bf2f3daf917549ddd41080da.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
# 78fda3d0 | 29-Nov-2024 | Pavel Begunkov <[email protected]>
io_uring/kbuf: use mmap_lock to sync with mmap
A preparation / cleanup patch simplifying the buf ring - mmap synchronisation. Instead of relying on RCU, which is trickier, do it by grabbing the mmap_lock when anyone tries to publish or remove a registered buffer to / from ->io_bl_xa.
Modifications of the xarray should always be protected by both ->uring_lock and ->mmap_lock, while lookups should hold either of them. While a struct io_buffer_list is in the xarray, the mmap related fields like ->flags and ->buf_pages should stay stable.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/af13bde56ee1a26bcaefaa9aad37a9ea318a590e.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
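
A pthread model of the stated rule, assuming the scheme exactly as described: writers take both locks, so a reader holding either one observes a stable entry.

```c
#include <pthread.h>

static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mmap_lock  = PTHREAD_MUTEX_INITIALIZER;
static void *io_bl_xa_slot;	/* stand-in for an ->io_bl_xa entry */

/* Modifications hold BOTH locks */
static void publish_bl(void *bl)
{
	pthread_mutex_lock(&uring_lock);
	pthread_mutex_lock(&mmap_lock);
	io_bl_xa_slot = bl;
	pthread_mutex_unlock(&mmap_lock);
	pthread_mutex_unlock(&uring_lock);
}

/* Lookups hold EITHER lock; mmap naturally holds mmap_lock */
static void *lookup_bl_from_mmap(void)
{
	void *bl;

	pthread_mutex_lock(&mmap_lock);
	bl = io_bl_xa_slot;
	pthread_mutex_unlock(&mmap_lock);
	return bl;
}
```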
Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6
# f274495a | 30-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: return correct iovec count from classic buffer peek
io_provided_buffers_select() returns 0 to indicate success, but it should be returning 1 to indicate that 1 vec was mapped. This causes peeking to fail with classic provided buffers, and while that's not a use case that anyone should use, it should still work correctly.
The end result is that no buffer will be selected, and hence a completion with '0' as the result will be posted, without a buffer attached.
Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers") Signed-off-by: Jens Axboe <[email protected]>
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3
# ae98dbf4 | 09-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: add support for incremental buffer consumption
By default, any recv/read operation that uses provided buffers will consume at least 1 buffer fully (and maybe more, in case of bundles). This adds support for incremental consumption, meaning that an application may add large buffers, and each read/recv will just consume the part of the buffer that it needs.
For example, let's say an application registers 1MB buffers in a provided buffer ring, for streaming receives. If it gets a short recv, then the full 1MB buffer will be consumed and passed back to the application. With incremental consumption, only the part that was actually used is consumed, and the buffer remains the current one.
This means that both the application and the kernel need to keep track of what the current receive point is. Each recv will still pass back a buffer ID and the size consumed; the only difference is that, before, the next receive would always be from the next buffer in the ring. Now the same buffer ID may return multiple receives, each at an offset into that buffer from where the previous receive left off. Example:
Application registers a provided buffer ring, and adds two 32K buffers to the ring.
Buffer1 address: 0x1000000 (buffer ID 0) Buffer2 address: 0x2000000 (buffer ID 1)
A recv completion is received with the following values:
cqe->res 0x1000 (4k bytes received) cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)
and the application now knows that 4096b of data is available at 0x1000000, the start of that buffer, and that more data from this buffer will be coming. Now the next receive comes in:
cqe->res 0x2000 (8k bytes received) cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)
which tells the application that 8k is available where the last completion left off, at 0x1001000. Next completion is:
cqe->res 0x5000 (20k bytes received) cqe->flags 0x1 (CQE_F_BUFFER set, buffer ID 0)
and the application now knows that 20k of data is available at 0x1003000, which is where the previous receive ended. CQE_F_BUF_MORE isn't set, as no more data is available in this buffer ID. The next completion is then:
cqe->res 0x1000 (4k bytes received) cqe->flags 0x10011 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 1)
which tells the application that buffer ID 1 is now the current one, hence there's 4k of valid data at 0x2000000. 0x2001000 will be the next receive point for this buffer ID.
When a buffer will be reused by future CQE completions, IORING_CQE_F_BUF_MORE will be set in cqe->flags. This tells the application that the kernel isn't done with the buffer yet, and that it should expect more completions for this buffer ID. It will only be set by provided buffer rings set up with IOU_PBUF_RING_INC, as that's the only type of buffer that will see multiple consecutive completions for the same buffer ID. For any other provided buffer type, any completion that passes back a buffer to the application is final.
Once a buffer has been fully consumed, the buffer ring head is incremented and the next receive will indicate the next buffer ID in the CQE cflags.
On the send side, the application can manage how much data is sent from an existing buffer by setting sqe->len to the desired send length.
An application can request incremental consumption by setting IOU_PBUF_RING_INC in the provided buffer ring registration. Outside of that, any provided buffer ring setup and buffer additions are done as before, no changes there. The only change is in how an application may see multiple completions for the same buffer ID, hence needing to know where the next receive will happen.
Note that like existing provided buffer rings, this should not be used with IOSQE_ASYNC, as both really require the ring to remain locked over the duration of the buffer selection and the operation completion. It will consume a buffer otherwise regardless of the size of the IO done.
Signed-off-by: Jens Axboe <[email protected]>
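
To make the bookkeeping concrete, a hedged sketch of the userspace side of this scheme: the application keeps a per-buffer-ID offset, advances it by cqe->res while IORING_CQE_F_BUF_MORE is set, and resets it once the kernel is done with the buffer. Struct and helper names are illustrative.

```c
#include <liburing.h>

struct inc_buf {
	char   *base;	/* registered buffer base address */
	size_t  off;	/* next receive point within the buffer */
};

static void handle_cqe(struct inc_buf *bufs, struct io_uring_cqe *cqe)
{
	unsigned bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
	struct inc_buf *b = &bufs[bid];
	char *data = b->base + b->off;	/* data landed at the saved offset */
	size_t len = cqe->res;		/* successful recv: res >= 0 */

	/* ... consume 'len' bytes at 'data' ... */

	if (cqe->flags & IORING_CQE_F_BUF_MORE)
		b->off += len;	/* more completions coming for this bid */
	else
		b->off = 0;	/* buffer fully consumed; may be re-added */
}
```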
# 6733e678 | 27-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: pass in 'len' argument for buffer commit
In preparation for needing the consumed length, pass in the length being completed. Unused right now, but will be used when it is possible to partially consume a buffer.
Signed-off-by: Jens Axboe <[email protected]>
# 2c8fa70b | 12-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h
In preparation for using this helper in kbuf.h as well, move it there and turn it into a macro.
Signed-off-by: Jens Axboe <[email protected]>
# ecd5c9b2 | 12-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: add io_kbuf_commit() helper
Committing the selected ring buffer is currently done in three different spots; combine it into a helper and just call that.
Signed-off-by: Jens Axboe <[email protected]>
# a69307a5 | 09-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: turn io_buffer_list booleans into flags
We could just move these two and save some space, but in preparation for adding another flag, turn them into flags first.
This saves 8 bytes in struct io_buffer_list, making it exactly half a cacheline on 64-bit archs now rather than 40 bytes.
Signed-off-by: Jens Axboe <[email protected]>
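
Schematically, the change looks like the following; the IOBL_* names follow the flag naming around this change in the tree, while the surrounding fields are elided and the layout is illustrative rather than the kernel's.

```c
#include <stdbool.h>

/* Before: two bools (plus padding) in struct io_buffer_list */
struct io_buffer_list_before {
	/* ... other fields ... */
	bool is_buf_ring;	/* ring mapped provided buffers */
	bool is_mmap;		/* ring is mmap'ed by userspace */
};

/* After: a single flags word, with room for the next flag */
enum {
	IOBL_BUF_RING	= 1,	/* buffers are a provided buffer ring */
	IOBL_MMAP	= 2,	/* ring mapped via mmap */
};

struct io_buffer_list_after {
	/* ... other fields ... */
	unsigned short flags;	/* IOBL_* bits */
};
```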
# 03e02e8f | 08-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: use 'bl' directly rather than req->buf_list
req->buf_list is assigned higher up and is safe to use as we remain within a locked region, as is the 'bl' variable itself from which it was assigned. To improve readability, use 'bl' directly rather than get it from the io_kiocb, if we need to increment the head directly in the buffer selection path. This makes it readily apparent that it's the same io_buffer_list being used.
Signed-off-by: Jens Axboe <[email protected]>
# e0ee9676 | 21-Aug-2024 | Jens Axboe <[email protected]>
io_uring/kbuf: sanitize peek buffer setup
Harden the buffer peeking a bit, by adding a sanity check for it having a valid size. Outside of that, arg->max_len is a size_t, though it's only ever set to a 32-bit value (as it's governed by MAX_RW_COUNT). Bump our needed check to a size_t so we know it fits. Finally, cap the calculated needed iov value at PEEK_MAX_IMPORT, which is the maximum number of segments that should be peeked.
Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers") Signed-off-by: Jens Axboe <[email protected]>
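
A standalone model of the three checks the commit describes: reject a zero buffer length, do the comparison in size_t, and cap the segment estimate. PEEK_MAX_IMPORT's value here is assumed for the demo.

```c
#include <stddef.h>

#define PEEK_MAX_IMPORT 256	/* assumed value, constant name per commit */

static int nr_iovs_needed(size_t max_len, unsigned int buf_len)
{
	size_t needed;

	if (!buf_len)			/* reject a zero-sized buffer */
		return -1;
	needed = (max_len + buf_len - 1) / buf_len;	/* size_t math */
	if (needed > PEEK_MAX_IMPORT)	/* cap the peeked segment count */
		needed = PEEK_MAX_IMPORT;
	return (int)needed;
}
```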
Revision tags: v6.11-rc2, v6.11-rc1
# bcc87d97 | 18-Jul-2024 | Pavel Begunkov <[email protected]>
io_uring: fix error pbuf checking
Syz reports a problem, which boils down to NULL vs IS_ERR inconsistent error handling in io_alloc_pbuf_ring().
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:__io_remove_buffers+0xac/0x700 io_uring/kbuf.c:341
Call Trace:
 <TASK>
 io_put_bl io_uring/kbuf.c:378 [inline]
 io_destroy_buffers+0x14e/0x490 io_uring/kbuf.c:392
 io_ring_ctx_free+0xa00/0x1070 io_uring/io_uring.c:2613
 io_ring_exit_work+0x80f/0x8a0 io_uring/io_uring.c:2844
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
 worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Cc: [email protected] Reported-by: [email protected] Fixes: 87585b05757dc ("io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/c5f9df20560bd9830401e8e48abc029e7cfd9f5e.1721329239.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
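
The underlying C pattern, modeled standalone: a function that reports failure via ERR_PTR() must not be checked against NULL (or vice versa), or the error pointer leaks into paths that dereference it. The macros below mimic the kernel's err.h conventions.

```c
#include <errno.h>
#include <stdio.h>

#define ERR_PTR(err)	((void *)(long)(err))
#define PTR_ERR(ptr)	((long)(ptr))
#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-4095)

/* Reports failure with an error pointer, never NULL */
static void *alloc_pbuf_ring(int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	return "ring";
}

int main(void)
{
	void *ring = alloc_pbuf_ring(1);

	if (!ring)		/* buggy: NULL check misses ERR_PTR values */
		return 1;
	if (IS_ERR(ring)) {	/* correct check for this convention */
		fprintf(stderr, "alloc failed: %ld\n", PTR_ERR(ring));
		return 1;
	}
	puts("ok");
	return 0;
}
```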