|
Revision tags: v6.15, v6.15-rc7 |
|
| #
a7d755ed |
| 17-May-2025 |
Pavel Begunkov <[email protected]> |
io_uring: fix overflow resched cqe reordering
Leaving the CQ critical section in the middle of overflow flushing can cause cqe reordering, since the cached cq pointers are reset and any new cqe emitters that might get called in between are not going to be forced into io_cqe_cache_refill().
Fixes: eac2ca2d682f9 ("io_uring: check if we need to reschedule during overflow flush") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.15-rc6 |
|
| #
687b2bae |
| 07-May-2025 |
Jens Axboe <[email protected]> |
io_uring: ensure deferred completions are flushed for multishot
Multishot normally uses io_req_post_cqe() to post completions, but when stopping it, it may finish up with a deferred completion. This is fine, except if another multishot event triggers before the deferred completions get flushed. If this occurs, then CQEs may get reordered in the CQ ring, as new multishot completions get posted before the deferred ones are flushed. This can cause confusion on the application side, if strict ordering is required for the use case.
When posting a multishot completion via io_req_post_cqe(), flush any pending deferred completions first.
Cc: [email protected] # 6.1+ Reported-by: Norman Maurer <[email protected]> Reported-by: Christian Mazakas <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
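A minimal sketch of the ordering described above, illustrative only and not the upstream diff; it assumes the ctx->submit_state.cq_flush marker referenced in the 5e16f1a6 entry below, and the helper names (__io_submit_flush_completions(), io_fill_cqe_aux(), the CQ lock pair) only approximate the io_uring code base:

/*
 * Illustrative sketch only; the real function does more. The intent is
 * the ordering: drain any deferred completions before the new multishot
 * CQE is filled, so it cannot jump ahead of them in the CQ ring.
 */
bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
{
	struct io_ring_ctx *ctx = req->ctx;
	bool posted;

	/* Deferred completions still parked in the submit state? Flush them first. */
	if (ctx->submit_state.cq_flush)
		__io_submit_flush_completions(ctx);

	io_cq_lock(ctx);
	posted = io_fill_cqe_aux(ctx, req->cqe.user_data, res, cflags);
	io_cq_unlock_post(ctx);
	return posted;
}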
|
|
Revision tags: v6.15-rc5 |
|
| #
b53e5232 |
| 04-May-2025 |
Jens Axboe <[email protected]> |
io_uring: always arm linked timeouts prior to issue
There are a few spots where linked timeouts are armed, and not all of them adhere to the pre-arm, attempt issue, post-arm pattern. This can be problematic if the linked request returns that it will trigger a callback later, and does so before the linked timeout is fully armed.
Consolidate all the linked timeout handling into __io_issue_sqe(), rather than have it spread throughout the various issue entry points.
Cc: [email protected] Link: https://github.com/axboe/liburing/issues/1390 Reported-by: Chase Hiltz <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
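A rough sketch of the consolidated pre-arm, attempt issue, post-arm shape described above; the helper names only approximate the io_uring code, and the real __io_issue_sqe() does considerably more:

static int __io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
			  const struct io_issue_def *def)
{
	/* Pre-arm: set up the linked timeout before the request is issued. */
	struct io_kiocb *link = io_prep_linked_timeout(req);
	int ret;

	/* Attempt issue: may complete inline, go async, or arm polling. */
	ret = def->issue(req, issue_flags);

	/* Post-arm: start the linked timeout after the issue attempt. */
	if (link)
		io_queue_linked_timeout(link);

	return ret;
}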
|
|
Revision tags: v6.15-rc4 |
|
| #
edd43f4d |
| 24-Apr-2025 |
Jens Axboe <[email protected]> |
io_uring: fix 'sync' handling of io_fallback_tw()
A previous commit added a 'sync' parameter to io_fallback_tw() which, if true, means the caller wants to wait on the fallback thread handling it. But the logic is somewhat messed up; ensure that ctxs are swapped and flushed appropriately.
Cc: [email protected] Fixes: dfbe5561ae93 ("io_uring: flush offloaded and delayed task_work on exit") Signed-off-by: Jens Axboe <[email protected]>
|
| #
5e16f1a6 |
| 24-Apr-2025 |
Pavel Begunkov <[email protected]> |
io_uring: don't duplicate flushing in io_req_post_cqe
io_req_post_cqe() sets submit_state.cq_flush so that *flush_completions() can take care of batch committing CQEs. Don't commit them twice by using __io_cq_unlock_post().
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/41c416660c509cee676b6cad96081274bcb459f3.1745493861.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
39051364 |
| 03-Apr-2025 |
Pavel Begunkov <[email protected]> |
io_uring: always do atomic put from iowq
io_uring always switches requests to atomic refcounting for iowq execution before there is any parallelism by setting REQ_F_REFCOUNT, and the flag is not cleared until the request completes. That should be fine as long as the compiler doesn't make up a non-existent value for the flags; however, KCSAN still complains when the request owner changes other flag bits:
BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work
...
read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0:
 req_ref_put_and_test io_uring/refs.h:22 [inline]
Skip REQ_F_REFCOUNT checks for iowq, we know it's set.
Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
697b2876 |
| 31-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: add req flag invariant build assertion
We're caching some of the file-related request flags in a tricky way; add a build check to make sure the flags don't get reshuffled.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/9877577b83c25dd78224a8274f799187e7ec7639.1743407551.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
ea910678 |
| 28-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: don't pass ctx to tw add remote helper
Unlike earlier versions, io_msg_remote_post() creates a valid request with a proper context, so don't pass a context to io_req_task_work_add_remote() explicitly but derive it from the request.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/721f51cf34996d98b48f0bfd24ad40aa2730167e.1743190078.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
6889ae1b |
| 27-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring/net: fix io_req_post_cqe abuse by send bundle
[ 114.987980][ T5313] WARNING: CPU: 6 PID: 5313 at io_uring/io_uring.c:872 io_req_post_cqe+0x12e/0x4f0
[ 114.991597][ T5313] RIP: 0010:io_req_post_cqe+0x12e/0x4f0
[ 115.001880][ T5313] Call Trace:
[ 115.002222][ T5313] <TASK>
[ 115.007813][ T5313] io_send+0x4fe/0x10f0
[ 115.009317][ T5313] io_issue_sqe+0x1a6/0x1740
[ 115.012094][ T5313] io_wq_submit_work+0x38b/0xed0
[ 115.013223][ T5313] io_worker_handle_work+0x62a/0x1600
[ 115.013876][ T5313] io_wq_worker+0x34f/0xdf0
As the comment states, io_req_post_cqe() should only be used by multishot requests, i.e. REQ_F_APOLL_MULTISHOT, which bundled sends are not. Add a flag signifying whether a request wants to post multiple CQEs. Eventually REQ_F_APOLL_MULTISHOT should imply the new flag, but that's left out for simplicity.
Cc: [email protected] Fixes: a05d1f625c7aa ("io_uring/net: support bundles for send") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/8b611dbb54d1cd47a88681f5d38c84d0c02bc563.1743067183.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
81661978 |
| 24-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: move min_events sanitisation
iopoll and normal waiting already duplicate min_completion truncation, so move them inside the corresponding routines.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/254adb289cc04638f25d746a7499260fa89a179e.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
d73acd7a |
| 24-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: rename "min" arg in io_iopoll_check()
Don't name arguments "min", it shadows the namesake function. min_events is also more consistent.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/f52ce9d88d3bca5732a218b0da14924aa6968909.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
4c76de42 |
| 24-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: open code __io_post_aux_cqe()
There is no reason to keep __io_post_aux_cqe() separately from io_post_aux_cqe().
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/2c4c1f68d694deea25a212fc09bbb11f330cd82e.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
3afcb3b2 |
| 24-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: defer iowq cqe overflow via task_work
Don't handle CQE overflows in io_req_complete_post() and defer it to flush_completions. It cuts some duplication, and I also want to limit the number of places directly overflowing completions.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/9046410ac27e18f2baa6f7cdb363ec921cbc3b79.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
3f0cb8de |
| 24-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: fix retry handling off iowq
io_req_complete_post() doesn't handle reissue, and if called with a REQ_F_REISSUE request it might post extra unexpected completions. Fix it by pushing it into flush_completions via task work.
Fixes: d803d123948fe ("io_uring/rw: handle -EAGAIN retry at IO completion time") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/badb3d7e462881e7edbfcc2be6301090b07dbe53.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14 |
|
| #
3a4689ac |
| 21-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring/cmd: add iovec cache for commands
Add iou_vec to commands and wire caching for it, but don't expose it to users just yet. We need the vec cleared on initial alloc, but since we can't place it at the beginning at the moment, zero the entire async_data. It's cached, so the performance cost affects only the initial allocation, and it might not be a bad idea anyway since we're exposing those bits to outside drivers.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/c0f2145b75791bc6106eb4e72add2cf6a2c72a7a.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14-rc7 |
|
| #
07754bfd |
| 14-Mar-2025 |
Jens Axboe <[email protected]> |
io_uring: enable toggle of iowait usage when waiting on CQEs
By default, io_uring marks a waiting task as being in iowait, if it's sleeping waiting on events and there are pending requests. This isn't necessarily always useful, and may be confusing on non-storage setups where iowait isn't expected. It can also cause extra power usage, by preventing the CPU from entering lower sleep states.
This adds a new enter flag, IORING_ENTER_NO_IOWAIT. If set, then io_uring will not account the sleeping task as being in iowait. If the kernel supports this feature, then it will be marked by having the IORING_FEAT_NO_IOWAIT feature flag set.
As the kernel currently does not support separating the iowait accounting and CPU frequency boosting, the IORING_ENTER_NO_IOWAIT controls both of these at the same time. In the future, if those do end up being split, then it'd be possible to control them separately. However, it seems more likely that the kernel will decouple iowait and CPU frequency boosting anyway.
Signed-off-by: Jens Axboe <[email protected]>
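A minimal userspace sketch of the flag usage described above, assuming a uapi <linux/io_uring.h> new enough to define IORING_FEAT_NO_IOWAIT and IORING_ENTER_NO_IOWAIT; it only demonstrates probing the feature bit and passing the enter flag:

#include <linux/io_uring.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct io_uring_params p;
	unsigned int enter_flags = IORING_ENTER_GETEVENTS;
	int ring_fd;

	memset(&p, 0, sizeof(p));
	ring_fd = syscall(__NR_io_uring_setup, 8, &p);
	if (ring_fd < 0) {
		perror("io_uring_setup");
		return 1;
	}

#if defined(IORING_FEAT_NO_IOWAIT) && defined(IORING_ENTER_NO_IOWAIT)
	/* Only ask to skip iowait accounting if the kernel advertises it. */
	if (p.features & IORING_FEAT_NO_IOWAIT)
		enter_flags |= IORING_ENTER_NO_IOWAIT;
#endif

	/*
	 * In a real program SQEs would have been submitted and min_complete
	 * would typically be non-zero; this call only shows the flag.
	 */
	syscall(__NR_io_uring_enter, ring_fd, 0, 0, enter_flags, NULL, 0);

	close(ring_fd);
	return 0;
}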
|
| #
5f14404b |
| 19-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring/cmd: don't expose entire cmd async data
io_uring needs private bits in cmd's ->async_data, and they should never be exposed to drivers as it'd certainly be abused. Leave struct io_uring_cmd_data for the drivers but wrap it into a structure. It's a prep patch and doesn't do anything useful yet.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
| #
575e7b06 |
| 19-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: rename the data cmd cache
Pick a more descriptive name for the cmd async data cache.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14-rc6 |
|
| #
5027d024 |
| 08-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: unify STOP_MULTISHOT with IOU_OK
IOU_OK means that the request ownership is now handed back to core io_uring and it has to complete it using the result provided in req->cqe. Same is true for multishot and IOU_STOP_MULTISHOT.
Rename it to IOU_COMPLETE to avoid confusion and use it for both modes.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/e6a5b2edb0eb9558acb1c8f1db38ac45fee95491.1741453534.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
7a9dcb05 |
| 08-Mar-2025 |
Pavel Begunkov <[email protected]> |
io_uring: return -EAGAIN to continue multishot
Multishot errors can be mapped 1:1 to normal errors, but they are not identical. It leads to a peculiar situation where all multishot requests have to check in what context they're run and return different codes.
Unify them, starting with the EAGAIN / IOU_ISSUE_SKIP_COMPLETE(EIOCBQUEUED) pair, which means that core io_uring still owns the request and it should be retried. In the case of multishot it naturally just continues to poll; otherwise it might poll, use iowq or do any other kind of allowed blocking. Introduce IOU_RETRY, aliased to -EAGAIN, for that.
Apart from obvious upsides, multishot can now also check for misuse of IOU_ISSUE_SKIP_COMPLETE.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/da117b79ce72ecc3ab488c744e29fae9ba54e23b.1741453534.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
0d83b8a9 |
| 04-Mar-2025 |
Caleb Sander Mateos <[email protected]> |
io_uring: introduce io_cache_free() helper
Add a helper function io_cache_free() that returns an allocation to an io_alloc_cache, falling back on kfree() if the io_alloc_cache is full. This is the inverse of io_cache_alloc(), which takes an allocation from an io_alloc_cache and falls back on kmalloc() if the cache is empty.
Convert 4 callers to use the helper.
Signed-off-by: Caleb Sander Mateos <[email protected]> Suggested-by: Li Zetao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
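A sketch of what a helper with the described semantics could look like (not necessarily the upstream definition); it assumes io_alloc_cache_put() reports whether the object was accepted into the cache, which is what the kfree() fallback above implies:

/* Illustrative: return an object to the cache, kfree() it if the cache is
 * full; the inverse of io_cache_alloc()'s kmalloc() fallback when empty. */
static inline void io_cache_free(struct io_alloc_cache *cache, void *obj)
{
	if (!io_alloc_cache_put(cache, obj))
		kfree(obj);
}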
|
|
Revision tags: v6.14-rc5 |
|
| #
ed9f3112 |
| 27-Feb-2025 |
Keith Busch <[email protected]> |
io_uring: cache nodes and mapped buffers
Frequent alloc/free cycles on these are pretty costly. Use an io cache to more efficiently reuse these buffers.
Signed-off-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: fix imu leak] Signed-off-by: Jens Axboe <[email protected]>
|
| #
27cb27b6 |
| 27-Feb-2025 |
Keith Busch <[email protected]> |
io_uring: add support for kernel registered bvecs
Provide an interface for the kernel to leverage the existing pre-registered buffers that io_uring provides. User space can reference these later to achieve zero-copy IO.
User space must register an empty fixed buffer table with io_uring in order for the kernel to make use of it.
Signed-off-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
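A userspace sketch of the registration step called out above, assuming liburing's io_uring_register_buffers_sparse() helper for setting up an empty fixed buffer table:

#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	int ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/*
	 * Register 16 empty buffer slots: no user memory is pinned here,
	 * the table just gives the kernel side slots it can later fill
	 * with its own registered bvecs for zero-copy IO.
	 */
	ret = io_uring_register_buffers_sparse(&ring, 16);
	if (ret < 0)
		fprintf(stderr, "register_buffers_sparse: %d\n", ret);

	io_uring_queue_exit(&ring);
	return ret < 0 ? 1 : 0;
}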
|
|
Revision tags: v6.14-rc4 |
|
| #
c457eed5 |
| 23-Feb-2025 |
Pavel Begunkov <[email protected]> |
io_uring: make io_poll_issue() sturdier
io_poll_issue() forwards the call to io_issue_sqe() and thus inherits some of the handling. That's not particularly failure-resistant, as, for example, returning an innocent-looking IOU_OK from a multishot issue will lead to severe bugs.
Reimplement io_poll_issue() without io_issue_sqe()'s request completion logic. Remove extra checks, as we know that req->file is already set, linked timeouts are armed, and iopoll is not supported. Also cover it with warnings for now.
The patch should be useful by itself, but it's also preparing the codebase for other future clean ups.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/3096d7b1026d9a52426a598bdfc8d9d324555545.1740331076.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.14-rc3 |
|
| #
62aa9805 |
| 12-Feb-2025 |
Caleb Sander Mateos <[email protected]> |
io_uring: use lockless_cq flag in io_req_complete_post()
io_uring_create() computes ctx->lockless_cq as: ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL)
So use it to simplify that expression in io_req_complete_post().
Signed-off-by: Caleb Sander Mateos <[email protected]> Reviewed-by: Li Zetao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|