|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
ea910678 |
| 28-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: don't pass ctx to tw add remote helper
Unlike earlier versions, io_msg_remote_post() creates a valid request with a proper context, so don't pass a context to io_req_task_work_add_remote() explicitly but derive it from the request.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/721f51cf34996d98b48f0bfd24ad40aa2730167e.1743190078.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
5027d024 |
| 08-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: unify STOP_MULTISHOT with IOU_OK
IOU_OK means that the request ownership is now handed back to core io_uring and it has to complete it using the result provided in req->cqe. Same is true for multishot and IOU_STOP_MULTISHOT.
Rename it into IOU_COMPLETE to avoid confusion and use for both modes.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/e6a5b2edb0eb9558acb1c8f1db38ac45fee95491.1741453534.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
7a9dcb05 |
| 08-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: return -EAGAIN to continue multishot
Multishot errors can be mapped 1:1 to normal errors, but they are not identical. This leads to a peculiar situation where all multishot requests have to check in what context they're run and return different codes.
Unify them starting with the EAGAIN / IOU_ISSUE_SKIP_COMPLETE(EIOCBQUEUED) pair, which means that core io_uring still owns the request and it should be retried. In the multishot case it naturally just continues to poll; otherwise it might poll, use iowq or do any other kind of allowed blocking. Introduce IOU_RETRY, aliased to -EAGAIN, for that.
Apart from obvious upsides, multishot can now also check for misuse of IOU_ISSUE_SKIP_COMPLETE.
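As an illustration of the convention, a minimal standalone C sketch (stand-in definitions, not the kernel's actual enum or dispatcher): a single retry code aliased to -EAGAIN that both single-shot and multishot handlers return, with the core deciding how to retry.

#include <errno.h>
#include <stdio.h>

enum {
        IOU_COMPLETE = 0,        /* ownership handed back; complete via req->cqe */
        IOU_RETRY    = -EAGAIN,  /* core io_uring still owns the request; retry it */
};

/* Hypothetical core-side dispatcher: one code, two retry strategies. */
static void dispatch(int ret, int multishot)
{
        if (ret == IOU_RETRY)
                puts(multishot ? "keep polling" : "re-arm poll or punt to io-wq");
        else if (ret == IOU_COMPLETE)
                puts("post CQE from req->cqe");
}

int main(void)
{
        dispatch(IOU_RETRY, 1);     /* multishot keeps polling */
        dispatch(IOU_RETRY, 0);     /* single-shot retries via poll or io-wq */
        dispatch(IOU_COMPLETE, 0);  /* core posts the CQE */
        return 0;
}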
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/da117b79ce72ecc3ab488c744e29fae9ba54e23b.1741453534.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
30c97035 |
| 05-Mar-2025 |
Yue Haibing <[email protected]> |
io_uring: Remove unused declaration io_alloc_async_data()
Commit ef623a647f42 ("io_uring: Move old async data allocation helper to header") left behind this unused declaration.
Signed-off-by: Yue Haibing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14-rc5 |
|
| #
3035deac |
| 24-Feb-2025 |
Pavel Begunkov <[email protected]> |
io_uring: introduce io_is_compat()
A preparation patch adding a simple helper for gauging the compat state. It'll help us to optimise and compile out more code in the following commits.
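A rough sketch of the shape such a helper can take (stand-in names and a stand-in config macro, not the kernel implementation): with compat support compiled out it becomes a constant, so the compiler can drop compat-only branches.

#include <stdbool.h>
#include <stdio.h>

#define MY_CONFIG_COMPAT 1                 /* stand-in for CONFIG_COMPAT */

struct my_ring_ctx {
        bool compat;                       /* recorded at ring setup for 32-bit tasks */
};

static inline bool my_io_is_compat(const struct my_ring_ctx *ctx)
{
#if MY_CONFIG_COMPAT
        return ctx->compat;
#else
        (void)ctx;
        return false;                      /* constant false: compat-only code folds away */
#endif
}

int main(void)
{
        struct my_ring_ctx ctx = { .compat = false };
        printf("compat: %d\n", my_io_is_compat(&ctx));
        return 0;
}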
Signed-off-by: Pavel Begunkov <[email protected]> Reviewed-by: Anuj Gupta <[email protected]> Link: https://lore.kernel.org/r/1a87a640265196a67bc38300128e0bfd7839ab1f.1740400452.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14-rc4, v6.14-rc3 |
|
| #
11ed914b |
| 15-Feb-2025 |
David Wei <[email protected]> |
io_uring/zcrx: add io_recvzc request
Add io_uring opcode OP_RECV_ZC for doing zero copy reads out of a socket. Only the connection should land on the specific rx queue set up for zero copy, and the socket must be handled by the io_uring instance that the rx queue was registered for zero copy with. That's because net_iovs / buffers from our queue cannot be read by outside applications, nor is zero copy possible if traffic for the zero copy connection goes to another queue. This coordination is outside of the scope of this patch series. Also, any traffic directed to the zero copy enabled queue is immediately visible to the application, which is why CAP_NET_ADMIN is required at the registration step.
Of course, no data is actually read out of the socket; it has already been copied by the netdev into userspace memory via DMA. OP_RECV_ZC reads skbs out of the socket and checks that their frags are indeed net_iovs that belong to io_uring. A cqe is queued for each one of these frags.
Recall that each cqe is a big cqe, with the top half being an io_uring_zcrx_cqe. The cqe res field contains the len or error. The lower IORING_ZCRX_AREA_SHIFT bits of the struct io_uring_zcrx_cqe::off field contain the offset relative to the start of the zero copy area. The upper part of the off field is trivially zero, and will be used to carry the area id.
For now, there is no limit as to how much work each OP_RECV_ZC request does. It will attempt to drain a socket of all available data. This request always operates in multishot mode.
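A userspace-side sketch of consuming these big CQEs as described (struct layout and the 48-bit shift are assumptions made for illustration, not a uapi reference):

#include <stdint.h>
#include <stdio.h>

#define ZCRX_AREA_SHIFT 48                          /* assumed value of IORING_ZCRX_AREA_SHIFT */
#define ZCRX_AREA_MASK  (~(((uint64_t)1 << ZCRX_AREA_SHIFT) - 1))

struct zcrx_cqe_view {                              /* stand-in for struct io_uring_zcrx_cqe */
        uint64_t off;                               /* area id (upper bits) | offset (lower bits) */
        uint64_t pad;
};

static void handle_frag(int32_t res, const struct zcrx_cqe_view *z)
{
        if (res < 0) {                              /* cqe->res carries the len or an error */
                fprintf(stderr, "recvzc error %d\n", res);
                return;
        }
        uint64_t off  = z->off & ~ZCRX_AREA_MASK;   /* offset into the zero copy area */
        uint64_t area = z->off >> ZCRX_AREA_SHIFT;  /* trivially zero for now */
        printf("area %llu: %d bytes at offset %llu\n",
               (unsigned long long)area, res, (unsigned long long)off);
}

int main(void)
{
        struct zcrx_cqe_view z = { .off = 4096, .pad = 0 };
        handle_frag(1448, &z);                      /* e.g. one MTU-sized frag */
        return 0;
}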
Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: David Wei <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
| #
bcf8a029 |
| 17-Feb-2025 |
Caleb Sander Mateos <[email protected]> |
io_uring: introduce type alias for io_tw_state
In preparation for changing how io_tw_state is passed, introduce a type alias io_tw_token_t for struct io_tw_state *. This allows for changing the representation in one place, without having to update the many functions that just forward their struct io_tw_state * argument.
Also add a comment to struct io_tw_state to explain its purpose.
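The aliasing itself is tiny; a sketch with stand-in names (not the kernel's types) shows how forwarders stay oblivious to the representation:

struct my_tw_state { int unused; };                 /* stand-in for struct io_tw_state */
typedef struct my_tw_state *my_tw_token_t;          /* the alias: callers only see the token */

/* Forwarders pass the token through without depending on its layout. */
static void my_req_complete(void *req, my_tw_token_t tw) { (void)req; (void)tw; }
static void my_tw_runner(void *req, my_tw_token_t tw) { my_req_complete(req, tw); }

int main(void)
{
        struct my_tw_state ts = { 0 };
        my_tw_runner((void *)0, &ts);
        return 0;
}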
Signed-off-by: Caleb Sander Mateos <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14-rc2 |
|
| #
9afe6847 |
| 05-Feb-2025 |
Pavel Begunkov <[email protected]> |
io_uring/kbuf: remove legacy kbuf kmem cache
Remove the kmem cache used by legacy provided buffers.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/8195c207d8524d94e972c0c82de99282289f7f5c.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14-rc1 |
|
| #
ff74954e |
| 23-Jan-2025 |
Jens Axboe <[email protected]> |
io_uring/alloc_cache: get rid of _nocache() helper
Just allow passing in NULL for the cache, if the type in question doesn't have a cache associated with it.
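A simplified sketch of what the unified helper then looks like (toy cache and plain malloc standing in for the kernel allocators): a NULL cache simply degenerates to a heap allocation.

#include <stdlib.h>
#include <stdio.h>

/* Toy fixed-size object cache standing in for io_uring's alloc cache. */
struct my_cache {
        void *slots[8];
        unsigned nr;
        size_t obj_size;
};

/* One helper for both cases: cache may be NULL for uncached request types. */
static void *my_cache_alloc(struct my_cache *cache, size_t size)
{
        if (cache && cache->nr)
                return cache->slots[--cache->nr];
        return malloc(cache ? cache->obj_size : size);
}

int main(void)
{
        void *obj = my_cache_alloc(NULL, 64);       /* this type has no cache */
        printf("allocated %p\n", obj);
        free(obj);
        return 0;
}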
Signed-off-by: Jens Axboe <[email protected]>
|
| #
fa359552 |
| 23-Jan-2025 |
Jens Axboe <[email protected]> |
io_uring: get rid of alloc cache init_once handling
init_once is called when an object doesn't come from the cache, and hence needs initial clearing of certain members. While the whole struct could get cleared by memset() in that case, a few of the cache members are large enough that this may cause unnecessary overhead if the caches used aren't large enough to satisfy the workload. For those cases, some churn of kmalloc+kfree is to be expected.
Ensure that the 3 users that need clearing put the members they need cleared at the start of the struct, and wrap the rest of the struct in a struct group so the offset is known.
While at it, improve the interaction with KASAN such that when/if KASAN writes to members inside the struct that should be retained over caching, it won't trip over itself. For rw and net, the retaining of the iovec over caching is disabled if KASAN is enabled. A helper will free and clear those members in that case.
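A sketch of the layout trick (stand-in struct; the kernel's struct_group() is approximated here with a named inner struct): members that must start zeroed sit before the group, so a fresh allocation only clears up to that known offset.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct my_async_data {
        /* members that must be zero for a brand new object */
        void *iov;
        unsigned nr_segs;
        /* everything below survives caching and is deliberately not cleared */
        struct {
                char inline_buf[256];
                size_t inline_len;
        } cached;                                   /* stand-in for the struct group */
};

static struct my_async_data *my_alloc_async(void)
{
        struct my_async_data *d = malloc(sizeof(*d));

        /* clear only up to the cached group, not the whole (large) struct */
        if (d)
                memset(d, 0, offsetof(struct my_async_data, cached));
        return d;
}

int main(void)
{
        struct my_async_data *d = my_alloc_async();

        if (!d)
                return 1;
        printf("iov=%p nr_segs=%u\n", d->iov, d->nr_segs);
        free(d);
        return 0;
}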
Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.13, v6.13-rc7 |
|
| #
60495b08 |
| 07-Jan-2025 |
Pavel Begunkov <[email protected]> |
io_uring: silence false positive warnings
If we kill a ring and then immediately exit the task, we'll get cancellation run by the task and a kthread in io_ring_exit_work. For DEFER_TASKRUN, we do want to limit it to only one entity executing it; however, it's currently not an issue as it's protected by uring_lock.
Silence lockdep assertions for now, we'll return to it later.
Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/7e5f68281acb0f081f65fde435833c68a3b7e02f.1736257837.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
ef623a64 |
| 16-Dec-2024 |
Gabriel Krisman Bertazi <[email protected]> |
io_uring: Move old async data allocation helper to header
There are two remaining uses of the old async data allocator that do not rely on the alloc cache. I don't want to make them use the new allocator helper because that would require an if (cache) check, which would result in dead code for the cached case (for callers passing a cache, gcc can't prove the cache isn't NULL, and will therefore preserve the check). Since this is an inline function and just a few lines long, keep a second helper to deal with cases where we don't have an async data cache.
No functional change intended here. This is just moving the helper around and making it inline.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
| #
49f7a309 |
| 16-Dec-2024 |
Gabriel Krisman Bertazi <[email protected]> |
io_uring: Add generic helper to allocate async data
This helper replaces io_alloc_async_data by using the folded allocation. Do it in a header to allow the compiler to decide whether to inline.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
f46b9cdb |
| 20-Nov-2024 |
David Wei <[email protected]> |
io_uring: limit local tw done
Instead of eagerly running all available local tw, limit the amount of local tw done to the max of IO_LOCAL_TW_DEFAULT_MAX (20) or wait_nr. The value of 20 is chosen as a reasonable heuristic to allow enough work batching but also keep latency down.
Add a retry_llist that maintains a list of local tw that couldn't be done in time. No synchronisation is needed since it is only modified within the task context.
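A simplified, single-threaded sketch of the batching policy (toy list types, not the kernel's llist machinery): run at most max(IO_LOCAL_TW_DEFAULT_MAX, wait_nr) items and park the remainder on a retry list.

#include <stdio.h>

#define LOCAL_TW_DEFAULT_MAX 20

struct my_node { struct my_node *next; int id; };

struct my_ctx {
        struct my_node *work_list;                  /* pending local task work */
        struct my_node *retry_list;                 /* items deferred past the cap */
};

static unsigned run_local_work(struct my_ctx *ctx, unsigned wait_nr)
{
        unsigned limit = wait_nr > LOCAL_TW_DEFAULT_MAX ? wait_nr : LOCAL_TW_DEFAULT_MAX;
        unsigned done = 0;

        while (ctx->work_list && done < limit) {
                struct my_node *n = ctx->work_list;
                ctx->work_list = n->next;
                /* ... run the work item here ... */
                done++;
        }
        /* anything left over is retried later; task context, so no locking */
        while (ctx->work_list) {
                struct my_node *n = ctx->work_list;
                ctx->work_list = n->next;
                n->next = ctx->retry_list;
                ctx->retry_list = n;
        }
        return done;
}

int main(void)
{
        struct my_node nodes[25];
        struct my_ctx ctx = { 0 };

        for (int i = 24; i >= 0; i--) {
                nodes[i].id = i;
                nodes[i].next = ctx.work_list;
                ctx.work_list = &nodes[i];
        }
        printf("ran %u items\n", run_local_work(&ctx, 8));  /* caps at 20, parks 5 */
        return 0;
}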
Signed-off-by: David Wei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
| #
40cfe553 |
| 20-Nov-2024 |
David Wei <[email protected]> |
io_uring: add io_local_work_pending()
In preparation for adding a new llist of tw to retry due to hitting the tw limit, add a helper io_local_work_pending(). This function returns true if there is any local tw pending. For now it only checks ctx->work_llist.
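The helper amounts to a cheap emptiness check; a sketch with stand-in types (not the kernel's llist API):

#include <stdbool.h>
#include <stddef.h>

struct my_llist_head { void *first; };

struct my_ring_ctx {
        struct my_llist_head work_llist;            /* stand-in for ctx->work_llist */
};

static inline bool my_local_work_pending(const struct my_ring_ctx *ctx)
{
        /* for now only the main list; a retry list is folded in by a later patch */
        return ctx->work_llist.first != NULL;
}

int main(void)
{
        struct my_ring_ctx ctx = { { NULL } };
        return my_local_work_pending(&ctx) ? 1 : 0;
}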
Signed-off-by: David Wei <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12, v6.12-rc7 |
|
| #
af0a2ffe |
| 05-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: avoid normal tw intermediate fallback
When a DEFER_TASKRUN io_uring is terminating it requeues deferred task work items as normal tw, which can further fallback to kthread execution. Avoid this extra step and always push them to the fallback kthread.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/d1cd472cec2230c66bd1c8d412a5833f0af75384.1730772720.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12-rc6 |
|
| #
b6f58a3f |
| 03-Nov-2024 |
Jens Axboe <[email protected]> |
io_uring: move struct io_kiocb from task_struct to io_uring_task
Rather than store the task_struct itself in struct io_kiocb, store the io_uring specific task_struct. The lifetimes are the same in terms of io_uring, and this avoids doing some dereferences through the task_struct. For the hot path of putting local task references, we can deref req->tctx instead, which we'll need anyway in that function regardless of whether it's local or remote references.
This is mostly straightforward, except the original task PF_EXITING check needs a bit of tweaking. task_work is _always_ run from the originating task, except in the fallback case, where it's run from a kernel thread. Replace the potentially racy (in case of fallback work) checks of req->task->flags with current->flags. It's either still the original task, in which case PF_EXITING will be sane, or it has PF_KTHREAD set, in which case it's fallback work. Both cases should prevent moving forward with the given request.
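A sketch of the flag check described above (flag values are illustrative stand-ins for PF_EXITING and PF_KTHREAD, not the kernel's definitions):

#include <stdbool.h>
#include <stdio.h>

#define MY_PF_EXITING 0x1                           /* illustrative stand-in for PF_EXITING */
#define MY_PF_KTHREAD 0x2                           /* illustrative stand-in for PF_KTHREAD */

/* Checking current's flags covers both cases: either we are still the
 * originating task (PF_EXITING is meaningful), or we are the fallback
 * kernel thread (PF_KTHREAD is set). Both mean: don't move forward. */
static bool my_should_terminate_tw(unsigned int current_flags)
{
        return current_flags & (MY_PF_EXITING | MY_PF_KTHREAD);
}

int main(void)
{
        printf("%d %d %d\n",
               my_should_terminate_tw(0),
               my_should_terminate_tw(MY_PF_EXITING),
               my_should_terminate_tw(MY_PF_KTHREAD));
        return 0;
}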
Signed-off-by: Jens Axboe <[email protected]>
|
| #
f03baece |
| 03-Nov-2024 |
Jens Axboe <[email protected]> |
io_uring: move cancelations to be io_uring_task based
Right now the task_struct pointer is used as the key to match a task, but in preparation for some io_kiocb changes, move it to using struct io_uring_task instead. No functional changes intended in this patch.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12-rc5 |
|
| #
81d8191e |
| 21-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring: abstract out a bit of the ring filling logic
Abstract out an io_uring_fill_params() helper, which fills out the necessary bits of struct io_uring_params. Add it to io_uring.h as well, in preparation for having another internal user of it.
Signed-off-by: Jens Axboe <[email protected]>
|
| #
09d0a8ea |
| 21-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring: move max entry definition and ring sizing into header
In preparation for needing this somewhere else, move the definitions for the maximum CQ and SQ ring size into io_uring.h. Make the rings_size() helper available as well, and have it take just the setup flags argument rather than the fill ring pointer. That's all that is needed.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12-rc4 |
|
| #
2946f08a |
| 18-Oct-2024 |
Pavel Begunkov <[email protected]> |
io_uring: clean up cqe trace points
We have too many helpers posting CQEs. Instead of tracing completion events before filling in a CQE, and thus having to pass all the data, set the CQE first, pass it to the tracing helper and let it extract everything it needs.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/b83c1ca9ee5aed2df0f3bb743bf5ed699cce4c86.1729267437.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
8f7033aa |
| 17-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work
When the sqpoll thread is exiting and cancels pending work items, it may need to run task_work. If this happens from within io_uring_cancel_generic(), then it may be while waiting on the io_uring_task waitqueue. This results in the below splat from the scheduler, as the ring mutex may be grabbed while in a TASK_INTERRUPTIBLE state.
Ensure that the task state is set appropriately for that, just like what is done for the other cases in io_run_task_work().
do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000029387fd2>] prepare_to_wait+0x88/0x2fc
WARNING: CPU: 6 PID: 59939 at kernel/sched/core.c:8561 __might_sleep+0xf4/0x140
Modules linked in:
CPU: 6 UID: 0 PID: 59939 Comm: iou-sqp-59938 Not tainted 6.12.0-rc3-00113-g8d020023b155 #7456
Hardware name: linux,dummy-virt (DT)
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __might_sleep+0xf4/0x140
lr : __might_sleep+0xf4/0x140
sp : ffff80008c5e7830
x29: ffff80008c5e7830 x28: ffff0000d93088c0 x27: ffff60001c2d7230
x26: dfff800000000000 x25: ffff0000e16b9180 x24: ffff80008c5e7a50
x23: 1ffff000118bcf4a x22: ffff0000e16b9180 x21: ffff0000e16b9180
x20: 000000000000011b x19: ffff80008310fac0 x18: 1ffff000118bcd90
x17: 30303c5b20746120 x16: 74657320313d6574 x15: 0720072007200720
x14: 0720072007200720 x13: 0720072007200720 x12: ffff600036c64f0b
x11: 1fffe00036c64f0a x10: ffff600036c64f0a x9 : dfff800000000000
x8 : 00009fffc939b0f6 x7 : ffff0001b6327853 x6 : 0000000000000001
x5 : ffff0001b6327850 x4 : ffff600036c64f0b x3 : ffff8000803c35bc
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000e16b9180
Call trace:
 __might_sleep+0xf4/0x140
 mutex_lock+0x84/0x124
 io_handle_tw_list+0xf4/0x260
 tctx_task_work_run+0x94/0x340
 io_run_task_work+0x1ec/0x3c0
 io_uring_cancel_generic+0x364/0x524
 io_sq_thread+0x820/0x124c
 ret_from_fork+0x10/0x20
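A sketch of the pattern the fix applies (userspace stand-ins for the scheduler state calls, not kernel code): restore TASK_RUNNING before running work that may block.

#include <stdio.h>

enum my_task_state { MY_TASK_RUNNING, MY_TASK_INTERRUPTIBLE };

static enum my_task_state cur_state = MY_TASK_RUNNING;

static void my_set_current_state(enum my_task_state s) { cur_state = s; }

/* Work that may grab a mutex; must only run while RUNNING. */
static void my_run_task_work(void)
{
        if (cur_state != MY_TASK_RUNNING)
                fprintf(stderr, "BUG: blocking op while not TASK_RUNNING\n");
        /* ... grab ring mutex, run queued task_work ... */
}

static void my_cancel_generic(void)
{
        my_set_current_state(MY_TASK_INTERRUPTIBLE);    /* e.g. after prepare_to_wait() */
        /* the fix: drop back to RUNNING before running task_work */
        my_set_current_state(MY_TASK_RUNNING);
        my_run_task_work();
}

int main(void)
{
        my_cancel_generic();
        return 0;
}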
Cc: [email protected] Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately") Signed-off-by: Jens Axboe <[email protected]>
|
| #
28aabffa |
| 15-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring/sqpoll: close race on waiting for sqring entries
When an application uses SQPOLL, it must wait for the SQPOLL thread to consume SQE entries, if it fails to get an sqe when calling io_uring_get_sqe(). It can do so by calling io_uring_enter(2) with the flag value of IORING_ENTER_SQ_WAIT. In liburing, this is generally done with io_uring_sqring_wait(). There's a natural expectation that once this call returns, a new SQE entry can be retrieved, filled out, and submitted. However, the kernel uses the cached sq head to determine if the SQRING is full or not. If the SQPOLL thread is currently in the process of submitting SQE entries, it may have updated the cached sq head, but not yet committed it to the SQ ring. Hence the kernel may find that there are SQE entries ready to be consumed, and return successfully to the application. If the SQPOLL thread hasn't yet committed the SQ ring entries by the time the application returns to userspace and attempts to get a new SQE, it will fail getting a new SQE.
Fix this by having io_sqring_full() always use the user visible SQ ring head entry, rather than the internally cached one.
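A sketch of the difference (toy ring bookkeeping; the kernel uses the real SQ ring and READ_ONCE): fullness has to be judged against the head the application can actually observe, not the kernel's cached copy.

#include <stdbool.h>
#include <stdio.h>

struct my_sq_ring {
        unsigned shared_head;   /* written back to the SQ ring, visible to userspace */
        unsigned cached_head;   /* kernel-internal, may run ahead during submission */
        unsigned tail;
        unsigned entries;
};

/* Buggy variant: uses the cached head, so the ring can look non-full
 * before the consumed entries have actually been committed to the ring. */
static bool sqring_full_cached(const struct my_sq_ring *r)
{
        return r->tail - r->cached_head == r->entries;
}

/* Fixed variant: uses the user-visible head. */
static bool sqring_full_shared(const struct my_sq_ring *r)
{
        return r->tail - r->shared_head == r->entries;
}

int main(void)
{
        /* SQPOLL advanced its cached head but hasn't committed it yet */
        struct my_sq_ring r = { .shared_head = 0, .cached_head = 4, .tail = 8, .entries = 8 };

        printf("cached says full: %d, shared says full: %d\n",
               sqring_full_cached(&r), sqring_full_shared(&r));
        return 0;
}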
Cc: [email protected] # 5.10+ Link: https://github.com/axboe/liburing/discussions/1267 Reported-by: Benedek Thaler <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
6746ee4c |
| 11-Sep-2024 |
Pavel Begunkov <[email protected]> |
io_uring/cmd: expose iowq to cmds
When an io_uring request needs blocking context we offload it to the io_uring's thread pool called io-wq. We can get there off ->uring_cmd by returning -EAGAIN, but there is no straightforward way of doing that from an asynchronous callback. Add a helper that would transfer a command to a blocking context.
Note, we do an extra hop via task_work before io_queue_iowq(), that's a limitation of io_uring infra we have that can likely be lifted later if that would ever become a problem.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/f735f807d7c8ba50c9452c69dfe5d3e9e535037b.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7 |
|
| #
1100c4a2 |
| 04-Jan-2024 |
Jens Axboe <[email protected]> |
io_uring: add support for batch wait timeout
Waiting for events with io_uring has two knobs that can be set:
1) The number of events to wake for 2) The timeout associated with the event
Waiting will abort when either of those conditions is met, as expected.
This adds support for a third event, which is associated with the number of events to wait for. Applications generally like to handle batches of completions, and right now they'd set a number of events to wait for and the timeout for that. If no events have been received but the timeout triggers, control is returned to the application and it can wait again. However, if the application doesn't have anything to do until events are reaped, then it's possible to make this waiting more efficient.
For example, the application may have a latency time of 50 usecs and want to handle a batch of 8 requests at a time. If it uses 50 usecs as the timeout, then it'll be doing 20K context switches per second even if nothing is happening.
This introduces the notion of min batch wait time. If the min batch wait time expires, then we'll return to userspace if we have any events at all. If none are available, the general wait time is applied. Any request arriving after the min batch wait time will cause waiting to stop and return control to the application.
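A simplified sketch of the resulting wait policy as plain decision logic (hypothetical names, not the kernel's wait loop):

#include <stdbool.h>
#include <stdio.h>

struct my_wait_args {
        unsigned wait_nr;         /* events the caller asked to wait for */
        unsigned min_batch_usec;  /* min batch wait time */
        unsigned total_usec;      /* overall timeout */
};

/* Decide whether to return to userspace at a given point in the wait. */
static bool my_should_wake(const struct my_wait_args *w,
                           unsigned elapsed_usec, unsigned avail_cqes)
{
        if (avail_cqes >= w->wait_nr)
                return true;                      /* batch target reached */
        if (elapsed_usec >= w->total_usec)
                return true;                      /* overall timeout */
        if (elapsed_usec >= w->min_batch_usec)
                return avail_cqes > 0;            /* past min batch: any event wakes us */
        return false;                             /* still inside the min batch window */
}

int main(void)
{
        struct my_wait_args w = { .wait_nr = 8, .min_batch_usec = 50, .total_usec = 1000 };

        printf("%d %d %d %d\n",
               my_should_wake(&w, 10, 3),    /* 0: inside min batch window */
               my_should_wake(&w, 60, 3),    /* 1: min batch expired, have events */
               my_should_wake(&w, 60, 0),    /* 0: no events yet, keep waiting */
               my_should_wake(&w, 1000, 0)); /* 1: overall timeout */
        return 0;
}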
Signed-off-by: Jens Axboe <[email protected]>
|