History log of /linux-6.15/io_uring/io_uring.h (Results 1 – 25 of 163)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1
# ea910678 28-Mar-2025 Pavel Begunkov <[email protected]>

io_uring: don't pass ctx to tw add remote helper

Unlike earlier versions, io_msg_remote_post() creates a valid request
with a proper context, so don't pass a context to
io_req_task_work_add_remote() explicitly but derive it from the request.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/721f51cf34996d98b48f0bfd24ad40aa2730167e.1743190078.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
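
A minimal sketch of the call-site shape this change describes, assuming the post-patch helper takes only the request and the task-work flags; the surrounding function and the IOU_F_TWQ_LAZY_WAKE flag are used for illustration only:

  /* Hypothetical caller: the ring context is now taken from req->ctx
   * inside the helper, so only the request and tw flags are passed. */
  static void example_msg_post_tw(struct io_kiocb *req)
  {
          io_req_task_work_add_remote(req, IOU_F_TWQ_LAZY_WAKE);
  }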

Revision tags: v6.14, v6.14-rc7, v6.14-rc6
# 5027d024 08-Mar-2025 Pavel Begunkov <[email protected]>

io_uring: unify STOP_MULTISHOT with IOU_OK

IOU_OK means that the request ownership is now handed back to core
io_uring and it has to complete it using the result provided in
req->cqe. Same is true for multishot and IOU_STOP_MULTISHOT.

Rename it to IOU_COMPLETE to avoid confusion and use it for both modes.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/e6a5b2edb0eb9558acb1c8f1db38ac45fee95491.1741453534.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
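
A plausible sketch of the issue-handler return codes after the rename, as assumed from the description (only IOU_COMPLETE replacing IOU_OK is stated here; the companion value is an assumption):

  enum {
          /* ownership returns to core io_uring, complete via req->cqe */
          IOU_COMPLETE            = 0,
          /* the handler will post the completion itself */
          IOU_ISSUE_SKIP_COMPLETE = -EIOCBQUEUED,
  };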

# 7a9dcb05 08-Mar-2025 Pavel Begunkov <[email protected]>

io_uring: return -EAGAIN to continue multishot

Multishot errors can be mapped 1:1 to normal errors, but they are not
identical. This leads to a peculiar situation where all multishot requests
have to check in what context they're run and return different codes.

Unify them, starting with the EAGAIN / IOU_ISSUE_SKIP_COMPLETE(EIOCBQUEUED)
pair, which means that core io_uring still owns the request and it should
be retried. In the multishot case it naturally just continues to poll,
otherwise it might poll, use iowq or do any other kind of allowed
blocking. Introduce IOU_RETRY, aliased to -EAGAIN, for that.

Apart from obvious upsides, multishot can now also check for misuse of
IOU_ISSUE_SKIP_COMPLETE.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/da117b79ce72ecc3ab488c744e29fae9ba54e23b.1741453534.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
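
A hedged sketch of how a multishot handler uses the unified codes described above: IOU_RETRY (aliased to -EAGAIN) hands the request back to core io_uring for another poll/iowq attempt, while IOU_COMPLETE finishes it. The handler and its helper are hypothetical:

  static int example_multishot_issue(struct io_kiocb *req, unsigned int issue_flags)
  {
          int ret = example_try_one(req);         /* hypothetical per-shot work */

          if (ret == -EAGAIN)
                  return IOU_RETRY;               /* core io_uring retries: poll, iowq, ... */
          req->cqe.res = ret;
          return IOU_COMPLETE;                    /* hand ownership back, complete via req->cqe */
  }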

# 30c97035 05-Mar-2025 Yue Haibing <[email protected]>

io_uring: Remove unused declaration io_alloc_async_data()

Commit ef623a647f42 ("io_uring: Move old async data allocation helper
to header") left behind this unused declaration.

Signed-off-by: Yue Haibing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.14-rc5
# 3035deac 24-Feb-2025 Pavel Begunkov <[email protected]>

io_uring: introduce io_is_compat()

A preparation patch adding a simple helper for gauging the compat state.
It'll help us to optimise and compile out more code in the following
commits.

Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Anuj Gupta <[email protected]>
Link: https://lore.kernel.org/r/1a87a640265196a67bc38300128e0bfd7839ab1f.1740400452.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
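
A minimal sketch of what such a helper plausibly looks like, assuming it folds the CONFIG_COMPAT build option into the per-ring compat flag so non-compat kernels can compile the checks away:

  static inline bool io_is_compat(struct io_ring_ctx *ctx)
  {
          /* constant-false when CONFIG_COMPAT is not built in */
          return IS_ENABLED(CONFIG_COMPAT) && unlikely(ctx->compat);
  }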

Revision tags: v6.14-rc4, v6.14-rc3
# 11ed914b 15-Feb-2025 David Wei <[email protected]>

io_uring/zcrx: add io_recvzc request

Add io_uring opcode OP_RECV_ZC for doing zero copy reads out of a
socket. Only the connection should land on the specific rx queue set
up for zero copy, and the socket must be handled by the io_uring
instance that the rx queue was registered for zero copy with. That's
because net_iovs / buffers from our queue cannot be read by outside
applications, and zero copy is not possible if traffic for the zero copy
connection goes to another queue. This coordination is outside of the
scope of this patch series. Also, any traffic directed to the zero copy
enabled queue is immediately visible to the application, which is why
CAP_NET_ADMIN is required at the registration step.

Of course, no data is actually read out of the socket; it has already
been copied by the netdev into userspace memory via DMA. OP_RECV_ZC
reads skbs out of the socket and checks that its frags are indeed
net_iovs that belong to io_uring. A cqe is queued for each one of these
frags.

Recall that each cqe is a big cqe, with the top half being an
io_uring_zcrx_cqe. The cqe res field contains the len or error. The
lower IORING_ZCRX_AREA_SHIFT bits of the struct io_uring_zcrx_cqe::off
field contain the offset relative to the start of the zero copy area.
The upper part of the off field is trivially zero, and will be used
to carry the area id.

For now, there is no limit as to how much work each OP_RECV_ZC request
does. It will attempt to drain a socket of all available data. This
request always operates in multishot mode.

Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: David Wei <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
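
A hedged userspace-side sketch of decoding the off field described above: the low IORING_ZCRX_AREA_SHIFT bits give the offset into the zero copy area, and the bits above them carry the (currently zero) area id. The constant and field semantics are quoted from the text; the helper itself is illustrative:

  static inline void zcrx_decode_off(__u64 off, __u64 *area_off, __u64 *area_id)
  {
          const __u64 mask = (1ULL << IORING_ZCRX_AREA_SHIFT) - 1;

          *area_off = off & mask;                     /* offset within the zc area */
          *area_id  = off >> IORING_ZCRX_AREA_SHIFT;  /* trivially zero for now */
  }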

# bcf8a029 17-Feb-2025 Caleb Sander Mateos <[email protected]>

io_uring: introduce type alias for io_tw_state

In preparation for changing how io_tw_state is passed, introduce a type
alias io_tw_token_t for struct io_tw_state *. This allows for changing
the representation in one place, without having to update the many
functions that just forward their struct io_tw_state * argument.

Also add a comment to struct io_tw_state to explain its purpose.

Signed-off-by: Caleb Sander Mateos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
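
The alias described above plausibly reduces to a single typedef, so the underlying representation can later change in one place:

  /* an opaque token proving the caller is in task-work context */
  typedef struct io_tw_state *io_tw_token_t;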

Revision tags: v6.14-rc2
# 9afe6847 05-Feb-2025 Pavel Begunkov <[email protected]>

io_uring/kbuf: remove legacy kbuf kmem cache

Remove the kmem cache used by legacy provided buffers.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/8195c207d8524d94e972c0c82de99282289f7f5c.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.14-rc1
# ff74954e 23-Jan-2025 Jens Axboe <[email protected]>

io_uring/alloc_cache: get rid of _nocache() helper

Just allow passing in NULL for the cache, if the type in question
doesn't have a cache associated with it.

Signed-off-by: Jens Axboe <[email protected]>
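
A hedged sketch of the unified behaviour: with the _nocache() variant gone, a NULL cache simply means a plain kmalloc(). The helper and cache-callee names below are assumptions based on the description:

  static inline void *example_alloc_async(struct io_alloc_cache *cache, size_t size)
  {
          if (cache)
                  return io_cache_alloc(cache, GFP_KERNEL);   /* assumed cache API */
          return kmalloc(size, GFP_KERNEL);                   /* no cache for this type */
  }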

# fa359552 23-Jan-2025 Jens Axboe <[email protected]>

io_uring: get rid of alloc cache init_once handling

init_once is called when an object doesn't come from the cache, and
hence needs initial clearing of certain members. While the whole
struct could get cleared by memset() in that case, a few of the cache
members are large enough that this may cause unnecessary overhead if
the caches used aren't large enough to satisfy the workload. For those
cases, some churn of kmalloc+kfree is to be expected.

Ensure that the 3 users that need clearing put the members they need
cleared at the start of the struct, and wrap the rest of the struct in
a struct group so the offset is known.

While at it, improve the interaction with KASAN such that when/if
KASAN writes to members inside the struct that should be retained over
caching, it won't trip over itself. For rw and net, the retaining of
the iovec over caching is disabled if KASAN is enabled. A helper will
free and clear those members in that case.

Signed-off-by: Jens Axboe <[email protected]>
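
An illustrative pattern for the layout change described above, assuming the kernel's struct_group() macro: members that must survive caching sit at the front, and everything that needs clearing is grouped so a single memset() covers it. The struct and member names here are hypothetical:

  struct io_async_example {
          struct iovec    *free_iov;      /* retained across cache recycling */
          struct_group(clear,             /* cleared when reused from the cache */
                  struct iovec    fast_iov;
                  int             nr_segs;
          );
  };

  static inline void io_async_example_reset(struct io_async_example *a)
  {
          memset(&a->clear, 0, sizeof(a->clear));
  }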

Revision tags: v6.13, v6.13-rc7
# 60495b08 07-Jan-2025 Pavel Begunkov <[email protected]>

io_uring: silence false positive warnings

If we kill a ring and then immediately exit the task, we'll get
cancellation run by both the task and a kthread in io_ring_exit_work.
For DEFER_TASKRUN, we do want to limit it to only one entity executing
it; however, it's currently not an issue as it's protected by uring_lock.

Silence lockdep assertions for now, we'll return to it later.

Reported-by: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/7e5f68281acb0f081f65fde435833c68a3b7e02f.1736257837.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4
# ef623a64 16-Dec-2024 Gabriel Krisman Bertazi <[email protected]>

io_uring: Move old async data allocation helper to header

There are two remaining uses of the old async data allocator that do not
rely on the alloc cache. I don't want to make them use the new
allocator helper because that would require an if(cache) check, which
will result in dead code for the cached case (for callers passing a
cache, gcc can't prove the cache isn't NULL, and will therefore preserve
the check). Since this is an inline function and just a few lines long,
keep a second helper to deal with cases where we don't have an async
data cache.

No functional change intended here. This is just moving the helper
around and making it inline.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

# 49f7a309 16-Dec-2024 Gabriel Krisman Bertazi <[email protected]>

io_uring: Add generic helper to allocate async data

This helper replaces io_alloc_async_data by using the folded allocation.
Do it in a header to allow the compiler to decide whether to inline.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
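
A hedged example of how an opcode handler might call such a header-inlined helper, preferring the per-ring cache and failing the request otherwise; the per-op state type, the cache field, and the exact helper signature are assumptions:

  static int example_prep_async(struct io_kiocb *req)
  {
          /* hypothetical per-op async state and per-ring cache */
          struct io_async_foo *foo;

          foo = io_uring_alloc_async_data(&req->ctx->foo_cache, req);
          if (unlikely(!foo))
                  return -ENOMEM;
          return 0;
  }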

Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1
# f46b9cdb 20-Nov-2024 David Wei <[email protected]>

io_uring: limit local tw done

Instead of eagerly running all available local tw, limit the amount of
local tw done to the max of IO_LOCAL_TW_DEFAULT_MAX (20) or wait_nr. The
value of 20 is chosen as a reasonable heuristic to allow enough work
batching but also keep latency down.

Add a retry_llist that maintains a list of local tw that couldn't be
done in time. No synchronisation is needed since it is only modified
within the task context.

Signed-off-by: David Wei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
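
A small sketch of the capping described above, assuming a helper that bounds one round of local task-work at the larger of the default batch size and the caller's wait target; the helper name is hypothetical:

  #define IO_LOCAL_TW_DEFAULT_MAX 20

  static inline unsigned int example_local_tw_budget(unsigned int wait_nr)
  {
          return max(wait_nr, (unsigned int)IO_LOCAL_TW_DEFAULT_MAX);
  }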

# 40cfe553 20-Nov-2024 David Wei <[email protected]>

io_uring: add io_local_work_pending()

In preparation for adding a new llist of tw to retry due to hitting the
tw limit, add a helper io_local_work_pending(). This function returns
true if there is any local tw pending. For now it only checks
ctx->work_llist.

Signed-off-by: David Wei <[email protected]>
Reviewed-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
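
A plausible shape of the helper as introduced here, checking only ctx->work_llist (the retry list arrives in the follow-up change):

  static inline bool io_local_work_pending(struct io_ring_ctx *ctx)
  {
          return !llist_empty(&ctx->work_llist);
  }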

Revision tags: v6.12, v6.12-rc7
# af0a2ffe 05-Nov-2024 Pavel Begunkov <[email protected]>

io_uring: avoid normal tw intermediate fallback

When a DEFER_TASKRUN io_uring is terminating it requeues deferred task
work items as normal tw, which can further fallback to kthread
execution. Avoid this extra step and always push them to the fallback
kthread.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/d1cd472cec2230c66bd1c8d412a5833f0af75384.1730772720.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.12-rc6
# b6f58a3f 03-Nov-2024 Jens Axboe <[email protected]>

io_uring: move struct io_kiocb from task_struct to io_uring_task

Rather than store the task_struct itself in struct io_kiocb, store
the io_uring specific task_struct. The lifetimes are the same in terms
of io_uring, and this avoids doing some dereferences through the
task_struct. For the hot path of putting local task references, we can
deref req->tctx instead, which we'll need anyway in that function
regardless of whether it's local or remote references.

This is mostly straightforward, except the original task PF_EXITING
check needs a bit of tweaking. task_work is _always_ run from the
originating task, except in the fallback case, where it's run from a
kernel thread. Replace the potentially racy (in case of fallback work)
checks for req->task->flags with current->flags. It's either still
the original task, in which case PF_EXITING will be sane, or it has
PF_KTHREAD set, in which case it's fallback work. Both cases should
prevent moving forward with the given request.

Signed-off-by: Jens Axboe <[email protected]>
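
An illustrative form of the flags check described above, assuming a small helper: either current is still the submitting task (so PF_EXITING is meaningful) or it is the fallback kthread (PF_KTHREAD), and both mean the task-work should not proceed with the request. The helper name is hypothetical:

  static inline bool example_should_terminate_tw(void)
  {
          return current->flags & (PF_KTHREAD | PF_EXITING);
  }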

# f03baece 03-Nov-2024 Jens Axboe <[email protected]>

io_uring: move cancelations to be io_uring_task based

Right now the task_struct pointer is used as the key to match a task,
but in preparation for some io_kiocb changes, move it to using struct
io_uring_task instead. No functional changes intended in this patch.

Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.12-rc5
# 81d8191e 21-Oct-2024 Jens Axboe <[email protected]>

io_uring: abstract out a bit of the ring filling logic

Abstract out an io_uring_fill_params() helper, which fills out the
necessary bits of struct io_uring_params. Add it to io_uring.h as well,
in preparation for having another internal user of it.

Signed-off-by: Jens Axboe <[email protected]>

# 09d0a8ea 21-Oct-2024 Jens Axboe <[email protected]>

io_uring: move max entry definition and ring sizing into header

In preparation for needing this somewhere else, move the definitions
for the maximum CQ and SQ ring size into io_uring.h. Make the
rings_size() helper available as well, and have it take just the setup
flags argument rather than the fill ring pointer. That's all that is
needed.

Signed-off-by: Jens Axboe <[email protected]>
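
A hedged illustration of why the sizing helper only needs the setup flags: the per-entry CQE footprint is derived from flags such as IORING_SETUP_CQE32, which doubles the completion entry size. This is a simplified fragment, not the helper's actual body:

  static inline size_t example_cqe_size(unsigned int setup_flags)
  {
          size_t sz = sizeof(struct io_uring_cqe);

          if (setup_flags & IORING_SETUP_CQE32)
                  sz *= 2;                /* big CQEs take two slots */
          return sz;
  }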

Revision tags: v6.12-rc4
# 2946f08a 18-Oct-2024 Pavel Begunkov <[email protected]>

io_uring: clean up cqe trace points

We have too many helpers posting CQEs. Instead of tracing completion
events before filling in a CQE and thus having to pass all the data,
set the CQE first, pass it to the tracing helper, and let it extract
everything it needs.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/b83c1ca9ee5aed2df0f3bb743bf5ed699cce4c86.1729267437.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# 8f7033aa 17-Oct-2024 Jens Axboe <[email protected]>

io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work

When the sqpoll thread is exiting and cancels pending work items, it may
need to run task_work. If this happens from within
io_uring_cancel_generic(), then it may be waiting on the io_uring_task
waitqueue. This results in the below splat from the scheduler, as the
ring mutex may be grabbed while in a TASK_INTERRUPTIBLE state.

Ensure that the task state is set appropriately for that, just like what
is done for the other cases in io_run_task_work().

do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000029387fd2>] prepare_to_wait+0x88/0x2fc
WARNING: CPU: 6 PID: 59939 at kernel/sched/core.c:8561 __might_sleep+0xf4/0x140
Modules linked in:
CPU: 6 UID: 0 PID: 59939 Comm: iou-sqp-59938 Not tainted 6.12.0-rc3-00113-g8d020023b155 #7456
Hardware name: linux,dummy-virt (DT)
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __might_sleep+0xf4/0x140
lr : __might_sleep+0xf4/0x140
sp : ffff80008c5e7830
x29: ffff80008c5e7830 x28: ffff0000d93088c0 x27: ffff60001c2d7230
x26: dfff800000000000 x25: ffff0000e16b9180 x24: ffff80008c5e7a50
x23: 1ffff000118bcf4a x22: ffff0000e16b9180 x21: ffff0000e16b9180
x20: 000000000000011b x19: ffff80008310fac0 x18: 1ffff000118bcd90
x17: 30303c5b20746120 x16: 74657320313d6574 x15: 0720072007200720
x14: 0720072007200720 x13: 0720072007200720 x12: ffff600036c64f0b
x11: 1fffe00036c64f0a x10: ffff600036c64f0a x9 : dfff800000000000
x8 : 00009fffc939b0f6 x7 : ffff0001b6327853 x6 : 0000000000000001
x5 : ffff0001b6327850 x4 : ffff600036c64f0b x3 : ffff8000803c35bc
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000e16b9180
Call trace:
__might_sleep+0xf4/0x140
mutex_lock+0x84/0x124
io_handle_tw_list+0xf4/0x260
tctx_task_work_run+0x94/0x340
io_run_task_work+0x1ec/0x3c0
io_uring_cancel_generic+0x364/0x524
io_sq_thread+0x820/0x124c
ret_from_fork+0x10/0x20

Cc: [email protected]
Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Jens Axboe <[email protected]>
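
The pattern the fix relies on, sketched with generic task_work API calls (not the exact diff): restore TASK_RUNNING before executing task_work that may sleep, e.g. in mutex_lock():

  static void example_run_pending_tw(void)
  {
          if (task_work_pending(current)) {
                  /* may block (ring mutex), so must not be TASK_INTERRUPTIBLE */
                  __set_current_state(TASK_RUNNING);
                  task_work_run();
          }
  }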

# 28aabffa 15-Oct-2024 Jens Axboe <[email protected]>

io_uring/sqpoll: close race on waiting for sqring entries

When an application uses SQPOLL, it must wait for the SQPOLL thread to
consume SQE entries, if it fails to get an sqe when calling
io_uring_get_sqe(). It can do so by calling io_uring_enter(2) with the
flag value of IORING_ENTER_SQ_WAIT. In liburing, this is generally done
with io_uring_sqring_wait(). There's a natural expectation that once
this call returns, a new SQE entry can be retrieved, filled out, and
submitted. However, the kernel uses the cached sq head to determine if
the SQRING is full or not. If the SQPOLL thread is currently in the
process of submitting SQE entries, it may have updated the cached sq
head, but not yet committed it to the SQ ring. Hence the kernel may find
that there are SQE entries ready to be consumed, and return successfully
to the application. If the SQPOLL thread hasn't yet committed the SQ
ring entries by the time the application returns to userspace and
attempts to get a new SQE, it will fail getting a new SQE.

Fix this by having io_sqring_full() always use the user visible SQ ring
head entry, rather than the internally cached one.

Cc: [email protected] # 5.10+
Link: https://github.com/axboe/liburing/discussions/1267
Reported-by: Benedek Thaler <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
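
A hedged sketch of the corrected check: compare the user-visible SQ head in the shared rings (rather than the internally cached head) against the tail, so userspace only sees the ring as non-full once the SQPOLL thread has actually committed its consumption:

  static inline bool io_sqring_full(struct io_ring_ctx *ctx)
  {
          struct io_rings *r = ctx->rings;

          /* use the committed, user-visible head, not ctx->cached_sq_head */
          return READ_ONCE(r->sq.tail) - READ_ONCE(r->sq.head) == ctx->sq_entries;
  }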

Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11
# 6746ee4c 11-Sep-2024 Pavel Begunkov <[email protected]>

io_uring/cmd: expose iowq to cmds

When an io_uring request needs a blocking context, we offload it to
io_uring's thread pool, called io-wq. We can get there off ->uring_cmd
by returning -EAGAIN, but there is no straightforward way of doing that
from an asynchronous callback. Add a helper that would transfer a
command to a blocking context.

Note, we do an extra hop via task_work before io_queue_iowq(); that's a
limitation of the io_uring infrastructure we have, which can likely be
lifted later if it ever becomes a problem.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/f735f807d7c8ba50c9452c69dfe5d3e9e535037b.1726072086.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
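
A hedged sketch of the issue-path half of this, which the text states directly: a driver's ->uring_cmd handler can return -EAGAIN from non-blocking context to have io_uring reissue it from io-wq. The driver helpers are hypothetical, and the asynchronous-callback helper added by this patch is not shown:

  static int example_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
  {
          if ((issue_flags & IO_URING_F_NONBLOCK) && example_would_block(cmd))
                  return -EAGAIN;         /* punt: io_uring retries from io-wq */
          return example_do_cmd(cmd);     /* hypothetical blocking work */
  }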

Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7
# 1100c4a2 04-Jan-2024 Jens Axboe <[email protected]>

io_uring: add support for batch wait timeout

Waiting for events with io_uring has two knobs that can be set:

1) The number of events to wake for
2) The timeout associated with the event

Waiting will abort when either of those conditions is met, as expected.

This adds support for a third knob, associated with the number of events
to wait for. Applications generally like to handle batches of
completions, and right now they'd set a number of events to wait for and
the timeout for that. If no events have been received but the timeout
triggers, control is returned to the application and it can wait again.
However, if the application doesn't have anything to do until events are
reaped, then it's possible to make this waiting more efficient.

For example, the application may have a latency target of 50 usecs and
want to handle a batch of 8 requests at a time. If it uses 50 usecs
as the timeout, then it'll be doing 20K context switches per second even
if nothing is happening.

This introduces the notion of min batch wait time. If the min batch wait
time expires, then we'll return to userspace if we have any events at all.
If none are available, the general wait time is applied. Any request
arriving after the min batch wait time will cause waiting to stop and
return control to the application.

Signed-off-by: Jens Axboe <[email protected]>
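
A simplified sketch of the two-phase decision described above (not the kernel's actual timer plumbing): once the minimum batch window has passed, return if anything at all has completed; otherwise keep waiting until the requested batch size is reached or the full timeout expires:

  static bool example_batch_wait_done(unsigned long now,
                                      unsigned long min_deadline,
                                      unsigned long full_deadline,
                                      unsigned int nr_done,
                                      unsigned int nr_wanted)
  {
          if (nr_done >= nr_wanted)
                  return true;                            /* full batch collected */
          if (nr_done && time_after(now, min_deadline))
                  return true;                            /* partial batch after min wait */
          return time_after(now, full_deadline);          /* give up at the full timeout */
  }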
