History log of /linux-6.15/io_uring/kbuf.c (Results 1 – 25 of 74)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2
# cf960726 07-Apr-2025 Jens Axboe <[email protected]>

io_uring/kbuf: reject zero sized provided buffers

This isn't fixing a real issue, but there's also zero point in going
through group and buffer setup when the buffers are going to be
rejected once use is attempted.

Cc: [email protected]
Reported-by: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
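
For illustration, the check described above amounts to a one-line guard
in the provide-buffers prep path. A minimal sketch, assuming 'p' is the
parsed provide-buffers request (names illustrative, not the literal
upstream diff):

    /* zero sized buffers can never be selected; reject at setup */
    if (!p->len)
            return -EINVAL;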



Revision tags: v6.15-rc1, v6.14, v6.14-rc7
# cf9536e5 10-Mar-2025 Jens Axboe <[email protected]>

io_uring/kbuf: enable bundles for incrementally consumed buffers

The original support for incrementally consumed buffers didn't allow it
to be used with bundles, with the assumption being that incremental
buffers are generally larger, and hence there's less of a need to
support it.

But that assumption may not be correct - it's perfectly viable to use
smaller buffers with incremental consumption, and there may be valid
reasons for an application or framework to do so.

As there's really no need to explicitly disable bundles with
incrementally consumed buffers, allow it. This actually makes the peek
side cheaper and simpler, with the completion side basically the same,
just needing to iterate for the consumed length.

Reported-by: Norman Maurer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>



Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2
# 5d3e5124 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: uninline __io_put_kbufs

__io_put_kbufs() and other helper functions are too large to be inlined;
compilers would normally refuse to do so. Uninline them and move them,
together with io_kbuf_commit(), into kbuf.c.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/3dade7f55ad590e811aff83b1ec55c9c04e17b2b.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# 54e00d9a 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: introduce io_kbuf_drop_legacy()

io_kbuf_drop() is only used for legacy provided buffers, and so
__io_put_kbuf_list() is never called for REQ_F_BUFFER_RING. Remove the
dead branch out of __io_put_kbuf_list(), rename it into
io_kbuf_drop_legacy() and use it directly instead of io_kbuf_drop().

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c8cc73e2272f09a86ecbdad9ebdd8304f8e583c0.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# e150e70f 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: open code __io_put_kbuf()

__io_put_kbuf() is a trivial wrapper, open code it into
__io_put_kbufs().

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/9dc17380272b48d56c95992c6f9eaacd5546e1d3.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# 13ee854e 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: remove legacy kbuf caching

Remove all struct io_buffer caches. It makes it a fair bit simpler.
Apart from killing a bunch of lines and juggling between lists,
__io_put_kbuf_list() doesn't need ->completion_lock locking now.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/18287217466ee2576ea0b1e72daccf7b22c7e856.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# dc39fb10 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: simplify __io_put_kbuf

As a preparation step remove an optimisation from __io_put_kbuf() trying
to use the locked cache. With that __io_put_kbuf_list() is only used
with ->io_buffers_comp, and we remove the explicit list argument.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/1b7f1394ec4afc7f96b35a61f5992e27c49fd067.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# 9afe6847 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: remove legacy kbuf kmem cache

Remove the kmem cache used by legacy provided buffers.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/8195c207d8524d94e972c0c82de99282289f7f5c.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# 7919292a 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: remove legacy kbuf bulk allocation

Legacy provided buffers are slow and discouraged in favour of the ring
variant. Remove the bulk allocation to keep it simpler as we don't care
about performance.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/a064d70370e590efed8076e9501ae4cfc20fe0ca.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# 88027663 12-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: reallocate buf lists on upgrade

IORING_REGISTER_PBUF_RING can reuse an old struct io_buffer_list if it
was created for a legacy selected buffer and has been emptied. It
violates the requirement that most of the fields stay stable after
publish. Always reallocate it instead.

Cc: [email protected]
Reported-by: Pumpkin Chang <[email protected]>
Fixes: 2fcabce2d7d34 ("io_uring: disallow mixed provided buffer group registrations")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>



Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6
# ed123c94 03-Jan-2025 Jens Axboe <[email protected]>

io_uring/kbuf: use pre-committed buffer address for non-pollable file

For non-pollable files, buffer ring consumption will commit upfront.
This is fine, but io_ring_buffer_select() will return the address of the
buffer after having committed it. For incrementally consumed buffers,
this is incorrect as it will modify the buffer address.

Store the pre-committed value and return that. If that isn't done, then
the initial part of the buffer is not used and the application will
correctly assume the content arrived at the start of the userspace
buffer, but the kernel will have put it later in the buffer. Or it can
cause a spurious -EFAULT returned in the CQE, depending on the buffer
size. As bounds are suitably checked for doing the actual IO, no adverse
side effects are possible - it's just a data misplacement within the
existing buffer.

Reported-by: Gwendal Fernet <[email protected]>
Cc: [email protected]
Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
Signed-off-by: Jens Axboe <[email protected]>
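
A minimal sketch of the fix as described, assuming a buffer-select
helper of roughly this shape ('commit_upfront' and the surrounding
names are illustrative): capture the user address before committing,
since the commit advances the buffer address for incrementally consumed
buffers.

    /* save the address before the commit can advance buf->addr */
    void __user *ret = u64_to_user_ptr(buf->addr);

    if (commit_upfront)     /* non-pollable file: commit immediately */
            io_kbuf_commit(req, bl, *len, 1);
    return ret;             /* the pre-committed address */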



Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2
# febfbf76 04-Dec-2024 Colin Ian King <[email protected]>

io_uring/kbuf: fix unintentional sign extension on shift of reg.bgid

Shifting reg.bgid << IORING_OFF_PBUF_SHIFT results in a promotion
from __u16 to a 32 bit signed integer, which is then sign extended
to a 64 bit unsigned long on 64 bit architectures. If reg.bgid is
greater than 0x7fff then this leads to a sign extended result where
all the upper 32 bits of mmap_offset are set to 1. Fix this by
casting reg.bgid to the same type as mmap_offset before performing
the shift.

Fixes: ef62de3c4ad5 ("io_uring/kbuf: use region api for pbuf rings")
Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
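
The promotion is easy to reproduce in isolation. A self-contained
sketch (the shift constant mirrors IORING_OFF_PBUF_SHIFT from the uapi
header; the values are illustrative):

    #include <stdio.h>

    #define IORING_OFF_PBUF_SHIFT 16   /* as in <linux/io_uring.h> */

    int main(void)
    {
            unsigned short bgid = 0x8000;   /* any bgid > 0x7fff */
            unsigned long bad, good;

            /* bgid is promoted to signed int, the shift sets the sign
             * bit, and the conversion to unsigned long sign extends */
            bad = bgid << IORING_OFF_PBUF_SHIFT;
            /* the fix: cast to the destination type before shifting */
            good = (unsigned long)bgid << IORING_OFF_PBUF_SHIFT;

            printf("bad:  %#lx\ngood: %#lx\n", bad, good);
            return 0;
    }

On a 64 bit architecture this prints 0xffffffff80000000 for the broken
variant and 0x80000000 for the fixed one.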



Revision tags: v6.13-rc1
# 7cd7b957 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring/memmap: unify io_uring mmap'ing code

All mapped memory is now backed by regions and we can unify and clean
up io_region_validate_mmap() and io_uring_mmap(). Extract a function
looking up a region, the rest of the handling should be generic and just
needs the region.

There is one more piece of ring type specific code, i.e. the mmap size
truncation quirk for IORING_OFF_[S,C]Q_RING, which is left as is.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/f5e1eda1562bfd34276de07465525ae5f10e1e84.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# ef62de3c 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring/kbuf: use region api for pbuf rings

Convert internal parts of the provided buffer ring management to the
region API. It's the last non-region mapped ring we have, so it also
kills a bunch of now unused memmap.c helpers.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/6c40cf7beaa648558acd4d84bc0fb3279a35d74b.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# 90175f3f 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring/kbuf: remove pbuf ring refcounting

struct io_buffer_list refcounting was needed for RCU based sync with
mmap; now we can kill it.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/4a9cc54bf0077bb2bf2f3daf917549ddd41080da.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>



# 78fda3d0 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring/kbuf: use mmap_lock to sync with mmap

A preparation / cleanup patch simplifying the buf ring - mmap
synchronisation. Instead of relying on RCU, which is trickier, do it by
grabbing the mmap_lock when anyone tries to publish or remove a
registered buffer to / from ->io_bl_xa.

Modifications of the xarray should always be protected by both
->uring_lock and ->mmap_lock, while lookups should hold either of them.
While a struct io_buffer_list is in the xarray, the mmap related fields
like ->flags and ->buf_pages should stay stable.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/af13bde56ee1a26bcaefaa9aad37a9ea318a590e.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
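
In sketch form, the locking rule described above (xa_store()/xa_load()
are the regular xarray API; the lock member names follow the
description and should be treated as assumptions):

    /* publish / remove: hold both locks */
    mutex_lock(&ctx->uring_lock);
    mutex_lock(&ctx->mmap_lock);
    xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL);
    mutex_unlock(&ctx->mmap_lock);
    mutex_unlock(&ctx->uring_lock);

    /* lookup: either lock suffices, e.g. from the mmap path */
    mutex_lock(&ctx->mmap_lock);
    bl = xa_load(&ctx->io_bl_xa, bgid);
    mutex_unlock(&ctx->mmap_lock);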



Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6
# f274495a 30-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: return correct iovec count from classic buffer peek

io_provided_buffers_select() returns 0 to indicate success, but it should
be returning 1 to indicate that 1 vec was mapped. This causes peeking
to fail with classic provided buffers, and while that's not a use case
that anyone should use, it should still work correctly.

The end result is that no buffer will be selected, and hence a completion
with '0' as the result will be posted, without a buffer attached.

Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Signed-off-by: Jens Axboe <[email protected]>
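
A minimal sketch of the corrected convention, assuming the helper shape
implied by the description (the caller interprets the return value as
the number of mapped vecs):

    buf = io_provided_buffer_select(req, len, bl);
    if (unlikely(!buf))
            return -ENOBUFS;
    iov[0].iov_base = buf;
    iov[0].iov_len = *len;
    return 1;       /* one iovec mapped, not 0-for-success */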



Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3
# ae98dbf4 09-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: add support for incremental buffer consumption

By default, any recv/read operation that uses provided buffers will
consume at least 1 buffer fully (and maybe more, in case of bundles).
This adds support for incremental consumption, meaning that an
application may add large buffers, and each read/recv will just consume
the part of the buffer that it needs.

For example, let's say an application registers 1MB buffers in a
provided buffer ring, for streaming receives. If it gets a short recv,
then the full 1MB buffer will be consumed and passed back to the
application. With incremental consumption, only the part that was
actually used is consumed, and the buffer remains the current one.

This means that both the application and the kernel need to keep track
of what the current receive point is. Each recv will still pass back a
buffer ID and the size consumed; the only difference is that, before,
the next receive would always use the next buffer in the ring. Now the same
buffer ID may return multiple receives, each at an offset into that
buffer from where the previous receive left off. Example:

Application registers a provided buffer ring, and adds two 32K buffers
to the ring.

Buffer1 address: 0x1000000 (buffer ID 0)
Buffer2 address: 0x2000000 (buffer ID 1)

A recv completion is received with the following values:

cqe->res 0x1000 (4k bytes received)
cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)

and the application now knows that 4096b of data is available at
0x1000000, the start of that buffer, and that more data from this buffer
will be coming. Now the next receive comes in:

cqe->res 0x2000 (8k bytes received)
cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)

which tells the application that 8k is available where the last
completion left off, at 0x1001000. Next completion is:

cqe->res 0x5000 (20k bytes received)
cqe->flags 0x1 (CQE_F_BUFFER set, buffer ID 0)

and the application now knows that 20k of data is available at
0x1003000, which is where the previous receive ended. CQE_F_BUF_MORE
isn't set, as no more data is available in this buffer ID. The next
completion is then:

cqe->res 0x1000 (4k bytes received)
cqe->flags 0x10011 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 1)

which tells the application that buffer ID 1 is now the current one,
hence there's 4k of valid data at 0x2000000. 0x2001000 will be the next
receive point for this buffer ID.

When a buffer will be reused by future CQE completions,
IORING_CQE_F_BUF_MORE will be set in cqe->flags. This tells the application
that the kernel isn't done with the buffer yet, and that it should expect
more completions for this buffer ID. Will only be set by provided buffer
rings setup with IOU_PBUF_RING_INC, as that's the only type of buffer
that will see multiple consecutive completions for the same buffer ID.
For any other provided buffer type, any completion that passes back
a buffer to the application is final.

Once a buffer has been fully consumed, the buffer ring head is
incremented and the next receive will indicate the next buffer ID in the
CQE cflags.

On the send side, the application can manage how much data is sent from
an existing buffer by setting sqe->len to the desired send length.

An application can request incremental consumption by setting
IOU_PBUF_RING_INC in the provided buffer ring registration. Outside of
that, any provided buffer ring setup and buffer additions are done like
before, no changes there. The only change is in how an application may
see multiple completions for the same buffer ID, hence needing to know
where the next receive will happen.

Note that like existing provided buffer rings, this should not be used
with IOSQE_ASYNC, as both really require the ring to remain locked over
the duration of the buffer selection and the operation completion.
Otherwise a buffer would be consumed regardless of the size of the IO
done.

Signed-off-by: Jens Axboe <[email protected]>
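
From userspace, the bookkeeping looks roughly like the sketch below,
using liburing. The IOU_PBUF_RING_INC registration flag comes straight
from the description above; the buffer count, sizes, group ID and the
elided recv submission are illustrative assumptions:

    #include <liburing.h>
    #include <stdlib.h>
    #include <string.h>

    #define NBUFS   2
    #define BUF_SZ  (32 * 1024)
    #define BGID    0

    static char bufs[NBUFS][BUF_SZ];
    static size_t next_off[NBUFS];  /* next receive point per buffer ID */

    int main(void)
    {
            struct io_uring ring;
            struct io_uring_buf_ring *br;
            struct io_uring_buf_reg reg;
            struct io_uring_cqe *cqe;
            int i;

            if (io_uring_queue_init(8, &ring, 0))
                    return 1;
            /* buffer ring memory must be page aligned */
            if (posix_memalign((void **)&br, 4096, 4096))
                    return 1;
            io_uring_buf_ring_init(br);

            memset(&reg, 0, sizeof(reg));
            reg.ring_addr = (unsigned long)br;
            reg.ring_entries = NBUFS;
            reg.bgid = BGID;
            reg.flags = IOU_PBUF_RING_INC;  /* incremental consumption */
            if (io_uring_register_buf_ring(&ring, &reg, 0))
                    return 1;

            for (i = 0; i < NBUFS; i++)
                    io_uring_buf_ring_add(br, bufs[i], BUF_SZ, i,
                                          io_uring_buf_ring_mask(NBUFS), i);
            io_uring_buf_ring_advance(br, NBUFS);

            /* ... submit IORING_OP_RECV with IOSQE_BUFFER_SELECT ... */

            while (!io_uring_wait_cqe(&ring, &cqe)) {
                    if (cqe->flags & IORING_CQE_F_BUFFER) {
                            unsigned bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
                            char *data = bufs[bid] + next_off[bid];

                            /* cqe->res bytes landed at the current offset */
                            if (cqe->res > 0)
                                    next_off[bid] += cqe->res;
                            /* no BUF_MORE: this buffer ID is consumed; reset
                             * the offset in case it gets re-added later */
                            if (!(cqe->flags & IORING_CQE_F_BUF_MORE))
                                    next_off[bid] = 0;
                            (void)data;     /* process 'data' here */
                    }
                    io_uring_cqe_seen(&ring, cqe);
            }
            return 0;
    }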



# 6733e678 27-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: pass in 'len' argument for buffer commit

In preparation for needing the consumed length, pass in the length being
completed. Unused right now, but will be used when it is possible to
partially consume a buffer.

Signed-off-by: Jens Axboe <[email protected]>



# 2c8fa70b 12-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h

In preparation for using this helper in kbuf.h as well, move it there and
turn it into a macro.

Signed-off-by: Jens Axboe <[email protected]>


# ecd5c9b2 12-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: add io_kbuf_commit() helper

Committing the selected ring buffer is currently done in three different
spots, combine it into a helper and just call that.

Signed-off-by: Jens Axboe <[email protected]>



# a69307a5 09-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: turn io_buffer_list booleans into flags

We could just move these two and save some space, but in preparation
for adding another flag, turn them into flags first.

This saves 8 bytes in struct io_buffer_list, making it exactly half
a cacheline on 64-bit archs now rather than 40 bytes.

Signed-off-by: Jens Axboe <[email protected]>
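
Illustratively, the conversion follows the usual bools-to-flags
pattern; the flag and member names below mirror the description but are
assumptions, not the literal upstream diff:

    enum {
            IOBL_BUF_RING   = 1,    /* was: bool is_buf_ring */
            IOBL_MMAP       = 2,    /* was: bool is_mapped */
    };

    /* in struct io_buffer_list, the two bools (plus padding) become: */
    __u16 flags;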



# 03e02e8f 08-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: use 'bl' directly rather than req->buf_list

req->buf_list is assigned higher up and is safe to use as we remain
within a locked region, as is the 'bl' variable itself from which it
was assigned. To improve readability, use 'bl' directly rather than
get it from the io_kiocb, if we need to increment the head directly
in the buffer selection path. This makes it readily apparent that
it's the same io_buffer_list being used.

Signed-off-by: Jens Axboe <[email protected]>



# e0ee9676 21-Aug-2024 Jens Axboe <[email protected]>

io_uring/kbuf: sanitize peek buffer setup

Harden the buffer peeking a bit, by adding a sanity check for it having
a valid size. Outside of that, arg->max_len is a size_t, though it's
only ever set to a 32-bit value (as it's governed by MAX_RW_COUNT).
Bump our needed check to a size_t so we know it fits. Finally, cap the
calculated needed iov count at PEEK_MAX_IMPORT, which is the maximum
number of segments that should be peeked.

Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Signed-off-by: Jens Axboe <[email protected]>
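
In sketch form, with local names assumed from the description, the
hardening looks like:

    size_t needed;
    u32 len = READ_ONCE(buf->len);

    if (unlikely(!len))
            return -ENOBUFS;        /* reject zero-sized ring entries */
    /* arg->max_len is a size_t, so keep the arithmetic in size_t */
    needed = (arg->max_len + len - 1) / len;
    /* never peek more than PEEK_MAX_IMPORT segments */
    needed = min_not_zero(needed, (size_t) PEEK_MAX_IMPORT);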



Revision tags: v6.11-rc2, v6.11-rc1
# bcc87d97 18-Jul-2024 Pavel Begunkov <[email protected]>

io_uring: fix error pbuf checking

Syz reports a problem, which boils down to NULL vs IS_ERR inconsistent
error handling in io_alloc_pbuf_ring().

KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:__io_remove_buffers+0xac/0x700 io_uring/kbuf.c:341
Call Trace:
<TASK>
io_put_bl io_uring/kbuf.c:378 [inline]
io_destroy_buffers+0x14e/0x490 io_uring/kbuf.c:392
io_ring_ctx_free+0xa00/0x1070 io_uring/io_uring.c:2613
io_ring_exit_work+0x80f/0x8a0 io_uring/io_uring.c:2844
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Cc: [email protected]
Reported-by: [email protected]
Fixes: 87585b05757dc ("io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c5f9df20560bd9830401e8e48abc029e7cfd9f5e.1721329239.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
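
The shape of the bug and the fix, illustratively (the allocation helper
comes from the Fixes commit; treat the exact call as an assumption):

    ptr = io_pages_map(&bl->buf_pages, &bl->buf_nr_pages, ring_size);
    if (!ptr)               /* wrong: failure is ERR_PTR(), so this never triggers */
            return -ENOMEM;

    /* fix: test for and propagate the encoded error instead */
    if (IS_ERR(ptr))
            return PTR_ERR(ptr);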


