History log of /linux-6.15/io_uring/register.c (Results 1 – 25 of 37)
Revision  Date  Author  Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3
# 6f377873 15-Feb-2025 David Wei <[email protected]>

io_uring/zcrx: add interface queue and refill queue

Add a new object called an interface queue (ifq) that represents a net
rx queue that has been configured for zero copy. Each ifq is registered
using a new registration opcode IORING_REGISTER_ZCRX_IFQ.

The refill queue is allocated by the kernel and mapped by userspace
using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main
SQ/CQ. It is used by userspace to return buffers that it is done with,
which can then be re-used by the netdev.

The main CQ ring is used to notify userspace of received data by using
the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each
entry contains the offset + len to the data.

For now, each io_uring instance only has a single ifq.

Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: David Wei <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
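
As a hedged illustration of the CQE layout described above: with
IORING_SETUP_CQE32, a userspace consumer could pull the descriptor out
of the big CQE roughly as follows. The payload struct and consume_data()
are illustrative stand-ins assumed from the message text; the real
struct io_uring_zcrx_cqe is defined in the io_uring uapi header.

/* assumed payload layout, per the commit text: offset + len */
struct zcrx_payload {
        __u64 off;
        __u64 len;
};

static void handle_zcrx_cqe(struct io_uring_cqe *cqe)
{
        /* the upper 16 bytes of the 32-byte CQE hold the descriptor */
        struct zcrx_payload *p = (struct zcrx_payload *)&cqe->big_cqe[0];

        consume_data(p->off, p->len);   /* hypothetical consumer */
}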


Revision tags: v6.14-rc2, v6.14-rc1
# a23ad06b 24-Jan-2025 Jens Axboe <[email protected]>

io_uring/register: use atomic_read/write for sq_flags migration

A previous commit changed all of the migration from the old to the new
ring for resizing to use READ/WRITE_ONCE. However, ->sq_flags is an
atomic_t, and while most archs won't complain on this, some will indeed
flag this:

io_uring/register.c:554:9: sparse: sparse: cast to non-scalar
io_uring/register.c:554:9: sparse: sparse: cast from non-scalar

Just use atomic_set/atomic_read for handling this case.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Fixes: 2c5aae129f42 ("io_uring/register: document io_register_resize_rings() shared mem usage")
Signed-off-by: Jens Axboe <[email protected]>
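
The gist of the fix, as a sketch with assumed local names (the resize
path copies ring state from the old rings into the new ones):

/* ->sq_flags is an atomic_t: use atomic ops, not the READ_ONCE/
 * WRITE_ONCE casts that sparse flags as non-scalar */
atomic_set(&new_rings->sq_flags, atomic_read(&old_rings->sq_flags));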


Revision tags: v6.13
# bb2d7634 16-Jan-2025 Pavel Begunkov <[email protected]>

io_uring: clean up io_uring_register_get_file()

Make it always reference the returned file. It's safer, especially with
unregistrations happening under it. And it makes the API cleaner, with
no conditional cleanups by the caller.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/0d0b13a63e8edd6b5d360fc821dcdb035cb6b7e0.1736995897.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
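
A sketch of the resulting calling convention, with the surrounding
helper names assumed: the function now always returns a referenced
file, so every caller pairs it with an unconditional fput().

struct file *file;

file = io_uring_register_get_file(fd, use_registered_ring);
if (IS_ERR(file))
        return PTR_ERR(file);

ret = do_register(file->private_data, opcode, arg, nr_args); /* assumed */
fput(file);     /* always referenced, so always dropped */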


# 53745105 15-Jan-2025 Josh Triplett <[email protected]>

io_uring: Factor out a function to parse restrictions

Preparation for subsequent work on inherited restrictions.

Signed-off-by: Josh Triplett <[email protected]>
Reviewed-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/9bac2b4d1b9b9ab41c55ea3816021be847f354df.1736932318.git.josh@joshtriplett.org
Signed-off-by: Jens Axboe <[email protected]>
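
For context, a sketch of what restriction registration looks like from
userspace with liburing; the factored-out kernel helper parses the same
struct io_uring_restriction array:

struct io_uring_restriction res[2] = {};

/* only allow NOP and READ submissions on this ring */
res[0].opcode = IORING_RESTRICTION_SQE_OP;
res[0].sqe_op = IORING_OP_NOP;
res[1].opcode = IORING_RESTRICTION_SQE_OP;
res[1].sqe_op = IORING_OP_READ;

/* the ring must have been created with IORING_SETUP_R_DISABLED */
io_uring_register_restrictions(&ring, res, 2);
io_uring_enable_rings(&ring);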


# 6f7a644e 15-Jan-2025 Jens Axboe <[email protected]>

io_uring/register: cache old SQ/CQ head reading for copies

The SQ and CQ ring heads are read twice - once to verify that they're
within bounds, and once inside the loops copying SQE and CQE entries.
This is technically incorrect, as the values could get modified in
between verifying them and using them in the copy loop. While this
won't lead to anything truly nefarious, it may cause longer loop times
for the copies than expected.

Read the ring head values once, and use the verified value in the copy
loops.

Signed-off-by: Jens Axboe <[email protected]>
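
The pattern in sketch form, with names assumed: read the shared head
once, validate it, and use only the validated local copy afterwards.

unsigned sq_head = READ_ONCE(old_rings->sq.head);

if (!head_within_bounds(sq_head))       /* hypothetical check */
        return -EINVAL;

/* the copy loop indexes with the validated local value only */
for (i = sq_head; i != sq_tail; i++)
        copy_sqe(new_rings, old_rings, i);      /* hypothetical */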


# 2c5aae12 15-Jan-2025 Jens Axboe <[email protected]>

io_uring/register: document io_register_resize_rings() shared mem usage

It can be a bit hard to tell which parts of io_register_resize_rings()
are operating on shared memory, and which ones are not. And anything
reading or writing to those regions should really use the read/write
once primitives.

Hence add those, ensuring sanity in how this memory is accessed, and
helping document the shared nature of it.

Signed-off-by: Jens Axboe <[email protected]>
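
The annotation pattern this adds, sketched with assumed field names:
plain accesses for kernel-private state, READ_ONCE/WRITE_ONCE for
anything living in the rings shared with userspace.

/* shared with userspace: annotate every access */
WRITE_ONCE(new_rings->sq.tail, READ_ONCE(old_rings->sq.tail));

/* kernel-private state: plain stores are fine */
ctx->sq_entries = p.sq_entries;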


# 8911798d 15-Jan-2025 Jens Axboe <[email protected]>

io_uring/register: use stable SQ/CQ ring data during resize

Normally the kernel would not expect an application to modify any of
the data shared with the kernel during a resize operation, but of
course the kernel cannot always assume good intent on behalf of the
application.

As part of resizing the rings, existing SQEs and CQEs are copied over
to the new storage. Resizing uses the masks in the newly allocated
shared storage to index the arrays; however, it's possible that malicious
userspace could modify these after they have been sanity checked.

Use the validated and locally stored CQ and SQ ring sizing for masking
to ensure the values are both stable and valid.

Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Jann Horn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
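
In sketch form, with names assumed: index with a mask derived from the
locally validated entry count, not with the mask stored in shared
memory, which userspace could rewrite after it has been checked.

/* racy: the shared mask may change between check and use */
idx = head & READ_ONCE(shared_rings->sq_ring_mask);

/* stable: derived from the validated, kernel-private size */
idx = head & (ctx->sq_entries - 1);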


Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1
# 81a4058e 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: use region api for CQ

Convert internal parts of the CQ/SQ array management to the region API.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/46fc3c801290d6b1ac16023d78f6b8e685c87fd6.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>


# 8078486e 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: use region api for SQ

Convert internal parts of the SQ management to the region API.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/1fb73ced6b835cb319ab0fe1dc0b2e982a9a5650.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>


# 02255d55 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: pass ctx to io_register_free_rings

A preparation patch, pass the context to io_register_free_rings.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c1865fd2b3d4db22d1a1aac7dd06ea22cb990834.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>


# 087f9978 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring/memmap: implement mmap for regions

The patch implements mmap for the param region and enables the kernel
allocation mode. Internally it uses a fixed mmap offset; however, the
user has to use the offset returned in
struct io_uring_region_desc::mmap_offset.

Note, mmap doesn't and can't take ->uring_lock and the region / ring
lookup is protected by ->mmap_lock, and it's directly peeking at
ctx->param_region. We can't protect io_create_region() with the
mmap_lock as it'd deadlock, which is why io_create_region_mmap_safe()
initialises it for us in a temporary variable and then publishes it
with the lock taken. It's intentionally decoupled from main region
helpers, and in the future we might want to have a list of active
regions, which then could be protected by the ->mmap_lock.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/0f1212bd6af7fb39b63514b34fae8948014221d1.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
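
From userspace, mapping such a region might look roughly like this
(hedged sketch; the kernel fills in the mmap_offset field of the
struct io_uring_region_desc passed at registration time):

void *p = mmap(NULL, rd.size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_POPULATE, ring_fd, rd.mmap_offset);
if (p == MAP_FAILED)
        /* handle the error */;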


# 1e21df69 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring/memmap: implement kernel allocated regions

Allow the kernel to allocate memory for a region. That's the classical
way SQ/CQ are allocated. It's not yet useful to user space as there
is no way to mmap it, which is why it's explicitly disabled in
io_register_mem_region().

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/7b8c40e6542546bbf93f4842a9a42a7373b81e0d.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>


# 943d0609 29-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: rename ->resize_lock

->resize_lock is used for resizing rings, but it's a good idea to reuse
it in other cases as well. Rename it to mmap_lock, as it protects
against races with mmap.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/68f705306f3ac4d2fb999eb80ea1615015ce9f7f.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>


# c261e4f1 19-Dec-2024 Jens Axboe <[email protected]>

io_uring/register: limit ring resizing to DEFER_TASKRUN

With DEFER_TASKRUN, we know the ring can't be both waited upon and
resized at the same time. This is important for CQ resizing. Allowing SQ
ring resizing alone would be simpler, but isn't the interesting use case. Hence
limit ring resizing in general to DEFER_TASKRUN only for now. This isn't
a huge problem as CQ ring resizing is generally the most useful on
networking type of workloads where it can be hard to size the ring
appropriately upfront, and those should be using DEFER_TASKRUN for
better performance.

Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Signed-off-by: Jens Axboe <[email protected]>
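
The shape of the gating check, as a sketch:

/* ring resizing is only supported with DEFER_TASKRUN for now */
if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
        return -EINVAL;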


# e358e09a 18-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: protect register tracing

Syz reports:

BUG: KCSAN: data-race in __se_sys_io_uring_register / io_sqe_files_register

read-write to 0xffff8881021940b8 of 4 bytes by task 5923 on cpu 1:
io_sqe_files_register+0x2c4/0x3b0 io_uring/rsrc.c:713
__io_uring_register io_uring/register.c:403 [inline]
__do_sys_io_uring_register io_uring/register.c:611 [inline]
__se_sys_io_uring_register+0x8d0/0x1280 io_uring/register.c:591
__x64_sys_io_uring_register+0x55/0x70 io_uring/register.c:591
x64_sys_call+0x202/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:428
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff8881021940b8 of 4 bytes by task 5924 on cpu 0:
__do_sys_io_uring_register io_uring/register.c:613 [inline]
__se_sys_io_uring_register+0xe4a/0x1280 io_uring/register.c:591
__x64_sys_io_uring_register+0x55/0x70 io_uring/register.c:591
x64_sys_call+0x202/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:428
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

This should be due to reading the table size after unlock. We don't
care much, as it's only printed in the trace, but we might as well read
it under the lock.

Reported-by: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/8233af2886a37b57f79e444e3db88fcfda1817ac.1731942203.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
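
The shape of the fix, sketched with the tracepoint arguments
abbreviated: the trace call moves inside the locked section so the
table sizes it reads are stable.

mutex_lock(&ctx->uring_lock);
ret = __io_uring_register(ctx, opcode, arg, nr_args);
/* read the resource table sizes while still holding the lock */
trace_io_uring_register(ctx, opcode, nr_files, nr_bufs, ret);
mutex_unlock(&ctx->uring_lock);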


Revision tags: v6.12
# d617b314 15-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: restore back registered wait arguments

Now that we've got a more generic region registration API, move
IORING_ENTER_EXT_ARG_REG over to it and re-enable it.

First, the user has to register a region with the
IORING_MEM_REGION_REG_WAIT_ARG flag set. It can only be done for a
ring in a disabled state, aka IORING_SETUP_R_DISABLED, to avoid races
with already running waiters. With that we should have stable constant
values for ctx->cq_wait_{size,arg} in io_get_ext_arg_reg() and hence no
READ_ONCE required.

The other API difference is that we're now passing byte offsets instead
of indexes. The user _must_ align all offsets / pointers to the native
word size; failing to do so may, but does not necessarily, lead to a
failure, usually returned as -EFAULT. liburing will hide these details
from users.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/81822c1b4ffbe8ad391b4f9ad1564def0d26d990.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
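
A hedged userspace sketch of the registration step, using the raw
register syscall; the wait argument buffer must be word-aligned, per
the alignment rule above:

struct io_uring_region_desc rd = {
        .user_addr = (unsigned long)wait_args_buf,
        .size      = wait_args_size,
        .flags     = IORING_MEM_REGION_TYPE_USER,
};
struct io_uring_mem_region_reg mr = {
        .region_uptr = (unsigned long)&rd,
        .flags       = IORING_MEM_REGION_REG_WAIT_ARG,
};

/* only valid while the ring is still IORING_SETUP_R_DISABLED */
syscall(__NR_io_uring_register, ring_fd,
        IORING_REGISTER_MEM_REGION, &mr, 1);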


# 93238e66 15-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: add memory region registration

Regions will serve multiple purposes. First, they let us decouple
ring/etc. object creation from registration / mapping of the memory they
will be placed in. We already have hacks that allow putting both SQ and
CQ into the same huge page; in the future we should be able to:

region = create_region(io_ring);
create_pbuf_ring(io_uring, region, offset=0);
create_pbuf_ring(io_uring, region, offset=N);

The second use case is efficiently passing parameters. The following
patch re-enables IORING_ENTER_EXT_ARG_REG on top of regions, which
optimises wait arguments. It'll also be useful for request arguments
replacing iovecs, msghdr, etc. pointers. Eventually it would also be
handy for BPF as well if it comes to fruition.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/0798cf3a14fad19cfc96fc9feca5f3e11481691d.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>


# 83e04152 15-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: temporarily disable registered waits

Disable wait argument registration as it'll be replaced with a more
generic feature. We'll still need IORING_ENTER_EXT_ARG_REG parsing
in a few commits so leave it be.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/70b1d1d218c41ba77a76d1789c8641dab0b0563e.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>


Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5
# 3597f278 26-Oct-2024 Jens Axboe <[email protected]>

io_uring/rsrc: unify file and buffer resource tables

For files, there's nr_user_files/file_table/file_data, and buffers have
nr_user_bufs/user_bufs/buf_data. There's no reason why file_table and
file_data can't be the same thing, and ditto for the buffer side. That
gets rid of more io_ring_ctx state that's in two spots rather than just
being in one spot, as it should be. Put all the registered file data in
one location, and ditto on the buffer front.

This also avoids having both io_rsrc_data->nodes being an allocated
array, and ->user_bufs[] or ->file_table.nodes. There's no reason to
have this information duplicated. Keep it in one spot, io_rsrc_data,
along with how many resources are available.

Signed-off-by: Jens Axboe <[email protected]>


# aa00f67a 22-Oct-2024 Jens Axboe <[email protected]>

io_uring: add support for fixed wait regions

Generally applications have 1 or a few ways of waiting, yet they pass
in a struct io_uring_getevents_arg every time. This needs to get copied
and, in turn, the timeout value needs to get copied.

Rather than do this for every invocation, allow the application to
register a fixed set of wait regions that can simply be indexed when
asking the kernel to wait on events.

At ring setup time, the application can register a number of these wait
regions and initialize region/index 0 upfront:

struct io_uring_reg_wait *reg;

reg = io_uring_setup_reg_wait(ring, nr_regions, &ret);

/* set timeout and mark as set, sigmask/sigmask_sz as needed */
reg->ts.tv_sec = 0;
reg->ts.tv_nsec = 100000;
reg->flags = IORING_REG_WAIT_TS;

where nr_regions >= 1 && nr_regions <= PAGE_SIZE / sizeof(*reg). The
above initializes index 0, but 63 other regions can be initialized,
if needed. Now, instead of doing:

struct __kernel_timespec timeout = { .tv_nsec = 100000, };

io_uring_submit_and_wait_timeout(ring, &cqe, nr, &t, NULL);

to wait for events for each submit_and_wait, or just wait, operation, it
can just reference the above region at offset 0 and do:

io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);

to achieve the same goal of waiting 100usec without needing to copy
both struct io_uring_getevents_arg (24b) and struct __kernel_timespec
(16b) for each invocation. Struct io_uring_reg_wait looks as follows:

struct io_uring_reg_wait {
struct __kernel_timespec ts;
__u32 min_wait_usec;
__u32 flags;
__u64 sigmask;
__u32 sigmask_sz;
__u32 pad[3];
__u64 pad2[2];
};

embedding the timeout itself in the region, rather than passing it as
a pointer as well. Note that the signal mask is still passed as a
pointer, both for compatibility reasons and because there don't seem
to be many high frequency wait scenarios that involve setting and
resetting the signal mask for each wait.

The application is free to modify any region before a wait call, or it
can keep multiple regions with different settings to avoid needing to
modify the same one between wait calls. Up to a page size of regions is
mapped by default, allowing PAGE_SIZE / 64 available regions for use.

The registered region must fit within a page. On a 4kb page size system,
that allows for 64 wait regions if a full page is used, as the size of
struct io_uring_reg_wait is 64b. The size of the registered region must
be a multiple of the size of struct io_uring_reg_wait. It's valid to
register fewer than 64 entries.

In network performance testing with zero-copy, this reduced the time
spent waiting on the TX side from 3.12% to 0.3% and the RX side from 4.4%
to 0.3%.

Wait regions are fixed for the lifetime of the ring - once registered,
they are persistent until the ring is torn down. The regions support
minimum wait timeout as well as the regular waits.

Signed-off-by: Jens Axboe <[email protected]>


# 79cfe9e5 21-Oct-2024 Jens Axboe <[email protected]>

io_uring/register: add IORING_REGISTER_RESIZE_RINGS

Once a ring has been created, the size of the CQ and SQ rings are fixed.
Usually this isn't a problem on the SQ ring side, as it merely controls
the available number of requests that can be submitted in a single
system call, and there's rarely a need to change that.

For the CQ ring, it's a different story. For most efficient use of
io_uring, it's important that the CQ ring never overflows. This means
that applications must size it for the worst case scenario, which can
be wasteful.

Add IORING_REGISTER_RESIZE_RINGS, which allows an application to resize
the existing rings. It takes a struct io_uring_params argument, the same
one which is used to setup the ring initially, and resizes rings
according to the sizes given.

Certain properties are always inherited from the original ring setup,
like SQE128/CQE32 and other setup options. The implementation only
allows flags associated with how the CQ ring is sized and clamped.

Existing unconsumed SQE and CQE entries are copied as part of the
process. If either resized destination ring (SQ or CQ) cannot hold the
entries already present in the source rings, then the operation fails
with -EOVERFLOW. Any register op holds ->uring_lock, which prevents new
submissions, and the internal mapping holds the completion lock as well
across moving CQ ring state.

To prevent races between mmap and ring resizing, add a mutex that's
solely used to serialize ring resize and mmap. mmap_sem can't be used
here, as a fork'ed process may be doing mmaps on the ring as well.
The ctx->resize_lock is held across mmap operations, and the resize
will grab it before swapping out the already mapped new data.

Signed-off-by: Jens Axboe <[email protected]>
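
Assuming the liburing wrapper that accompanied this opcode, usage can
be sketched as:

struct io_uring_params p = {
        /* only CQ sizing/clamping related flags may change */
        .sq_entries = 32,
        .cq_entries = 1024,
};

int ret = io_uring_resize_rings(&ring, &p);
if (ret == -EOVERFLOW)
        /* unconsumed entries don't fit in the new rings */;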


Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1
# a3771321 24-Sep-2024 Jens Axboe <[email protected]>

io_uring/msg_ring: add support for sending a sync message

Normally MSG_RING requires both a source and a destination ring. But
some users don't always have a ring available to send a message from, yet
they still need to notify a target ring.

Add support for using io_uring_register(2) without having a source ring,
using a file descriptor of -1 for that. Internally those are called
blind registration opcodes. Implement IORING_REGISTER_SEND_MSG_RING as a
blind opcode, which simply takes an sqe that the application can put on
the stack and use the normal liburing helpers to initialize it. Then the
app can call:

io_uring_register(-1, IORING_REGISTER_SEND_MSG_RING, &sqe, 1);

and get the same behavior in terms of the target, where a CQE is posted
with the details given in the sqe.

For now this takes a single sqe pointer argument, and hence arg must
be set to that, and nr_args must be 1. Could easily be extended to take
an array of sqes, but for now let's keep it simple.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
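
Filling in the example: the sqe can live on the stack and be set up
with the normal liburing prep helper before the blind register call
(sketch; target_ring_fd is assumed):

struct io_uring_sqe sqe;

memset(&sqe, 0, sizeof(sqe));
io_uring_prep_msg_ring(&sqe, target_ring_fd, 0, 0xcafe, 0);

/* fd == -1: no source ring, i.e. the blind registration path */
syscall(__NR_io_uring_register, -1,
        IORING_REGISTER_SEND_MSG_RING, &sqe, 1);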


# 2f6a55e4 16-Sep-2024 Dan Carpenter <[email protected]>

io_uring: clean up a type in io_uring_register_get_file()

Originally "fd" was unsigned int but it was changed to int when we pulled
this code into a separate function in commit 0b6d253e084a
("io_uring/register: provide helper to get io_ring_ctx from 'fd'"). This
doesn't really cause a runtime problem because the call to
array_index_nospec() will clamp negative fds to 0 and nothing else uses
the negative values.

Signed-off-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>


Revision tags: v6.11
# 636119af 14-Sep-2024 Jens Axboe <[email protected]>

io_uring: rename "copy buffers" to "clone buffers"

A recent commit added support for copying registered buffers from one
ring to another. But that term is a bit confusing, as no copying of
buffer data is done here. What is being done is simply cloning the
buffer registrations from one ring to another.

Rename it while we still can, so that it's more descriptive. No
functional changes in this patch.

Fixes: 7cc2a6eadcd7 ("io_uring: add IORING_REGISTER_COPY_BUFFERS method")
Signed-off-by: Jens Axboe <[email protected]>


# 7cc2a6ea 11-Sep-2024 Jens Axboe <[email protected]>

io_uring: add IORING_REGISTER_COPY_BUFFERS method

Buffers can get registered with io_uring, which allows skipping the
repeated pin_pages and unpin/unref of pages for each O_DIRECT operation.
This reduces the overhead of O_DIRECT IO.

However, registering buffers can take some time. Normally this isn't an
issue as it's done at initialization time (and hence less critical), but
for cases where rings can be created and destroyed as part of an IO
thread pool, registering the same buffers for multiple rings becomes a
more time-sensitive proposition. As an example, let's say an application
has an IO memory pool of 500G. Initial registration takes:

Got 500 huge pages (each 1024MB)
Registered 500 pages in 409 msec

or about 0.4 seconds. If we go higher to 900 1GB huge pages being
registered:

Registered 900 pages in 738 msec

which is, as expected, a fully linear scaling.

Rather than have each ring pin/map/register the same buffer pool,
provide an io_uring_register(2) opcode to simply duplicate the buffers
that are registered with another ring. Adding the same 900GB of
registered buffers to the target ring can then be accomplished in:

Copied 900 pages in 17 usec

While timing differs a bit, this provides around a 25,000-40,000x
speedup for this use case.

Signed-off-by: Jens Axboe <[email protected]>
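
A sketch of the intended flow with the liburing helpers (the opcode was
later renamed to "clone", per the 636119af entry above, and is exposed
as io_uring_clone_buffers):

/* slow, one-time: pin and register the pool with the source ring */
io_uring_register_buffers(&src_ring, iovecs, nr_iovecs);

/* fast, per new ring: duplicate the existing registrations */
io_uring_clone_buffers(&dst_ring, &src_ring);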
