Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6

# 5027d024 | 08-Mar-2025 | Pavel Begunkov <[email protected]>
io_uring: unify STOP_MULTISHOT with IOU_OK
IOU_OK means that request ownership is now handed back to core io_uring, which has to complete the request using the result provided in req->cqe. The same is true for multishot and IOU_STOP_MULTISHOT.
Rename it to IOU_COMPLETE to avoid confusion and use it for both modes.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/e6a5b2edb0eb9558acb1c8f1db38ac45fee95491.1741453534.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
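For illustration only, a minimal userspace sketch of the unified convention (the request struct and dispatcher below are invented for this example; only the IOU_COMPLETE and IOU_RETRY names come from the commit messages): one-shot and multishot completions funnel through the same branch.

#include <stdio.h>

/* Hypothetical, simplified return codes: IOU_COMPLETE stands in for both
 * the old IOU_OK and IOU_STOP_MULTISHOT outcomes. */
enum iou_ret {
	IOU_COMPLETE,	/* ownership handed back: post the result in req->cqe */
	IOU_RETRY,	/* core still owns the request and will retry it */
};

struct fake_req {
	int cqe_res;	/* stand-in for req->cqe.res */
	int multishot;	/* was the request armed in multishot mode? */
};

/* Core-side dispatch: both modes now go through one completion branch. */
static void core_handle(struct fake_req *req, enum iou_ret ret)
{
	if (ret == IOU_COMPLETE)
		printf("complete (multishot=%d) with res=%d\n",
		       req->multishot, req->cqe_res);
	else
		printf("retry later, core keeps ownership\n");
}

int main(void)
{
	struct fake_req oneshot = { .cqe_res = 42, .multishot = 0 };
	struct fake_req multi = { .cqe_res = -104, .multishot = 1 };

	core_handle(&oneshot, IOU_COMPLETE);
	core_handle(&multi, IOU_COMPLETE);	/* formerly IOU_STOP_MULTISHOT */
	return 0;
}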

# 7a9dcb05 | 08-Mar-2025 | Pavel Begunkov <[email protected]>
io_uring: return -EAGAIN to continue multishot
Multishot errors can be mapped 1:1 to normal errors, but they are not identical. This leads to a peculiar situation where every multishot request has to check in what context it is run and return different codes.
Unify them, starting with the EAGAIN / IOU_ISSUE_SKIP_COMPLETE (EIOCBQUEUED) pair, which means that core io_uring still owns the request and it should be retried. In the multishot case it naturally just continues to poll; otherwise it might poll, use io-wq or do any other kind of allowed blocking. Introduce IOU_RETRY, aliased to -EAGAIN, for that.
Apart from the obvious upsides, multishot handlers can now also check for misuse of IOU_ISSUE_SKIP_COMPLETE.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/da117b79ce72ecc3ab488c744e29fae9ba54e23b.1741453534.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
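A hedged sketch of the retry convention described above (illustrative only; the retry-mode enum and handler are invented for this example): handlers report "not done yet" with a single code and the core picks the retry mechanism.

#include <errno.h>
#include <stdio.h>

#define IOU_RETRY (-EAGAIN)	/* the alias named in the commit message */

/* Invented for the example: how the core could act on IOU_RETRY. */
enum retry_mode { RETRY_REPOLL, RETRY_IOWQ };

static enum retry_mode core_retry(int multishot)
{
	/* Multishot requests keep polling; one-shot requests may instead be
	 * punted to io-wq or re-armed for a poll wakeup. */
	return multishot ? RETRY_REPOLL : RETRY_IOWQ;
}

/* An op handler no longer cares which context it runs in. */
static int handle_recv(int data_ready)
{
	return data_ready ? 0 : IOU_RETRY;
}

int main(void)
{
	if (handle_recv(0) == IOU_RETRY)
		printf("retry mode = %d\n", core_retry(1));
	return 0;
}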

Revision tags: v6.14-rc5, v6.14-rc4

# bcf8a029 | 17-Feb-2025 | Caleb Sander Mateos <[email protected]>
io_uring: introduce type alias for io_tw_state
In preparation for changing how io_tw_state is passed, introduce a type alias io_tw_token_t for struct io_tw_state *. This allows for changing the representation in one place, without having to update the many functions that just forward their struct io_tw_state * argument.
Also add a comment to struct io_tw_state to explain its purpose.
Signed-off-by: Caleb Sander Mateos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
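For illustration, the kind of alias being introduced (a sketch based on the commit message; the struct contents and the callback are simplified stand-ins, not the kernel definitions):

#include <stdio.h>

/* Simplified stand-in: in the kernel this struct tags functions that run
 * as task_work with the ctx lock held. */
struct io_tw_state {
	char unused;	/* placeholder so the sketch stays plain ISO C */
};

/* The alias from the commit: one place to change the representation. */
typedef struct io_tw_state *io_tw_token_t;

/* Callbacks just forward the token; if the representation changes later,
 * only the typedef needs to change. */
static void example_tw_callback(io_tw_token_t tw)
{
	(void)tw;
	printf("running a task_work callback\n");
}

int main(void)
{
	struct io_tw_state ts = { 0 };

	example_tw_callback(&ts);
	return 0;
}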

Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1

# 8c8492ca | 30-Jan-2025 | Jens Axboe <[email protected]>
io_uring/net: don't retry connect operation on EPOLLERR
If a socket is shut down before the connection completes, POLLERR is set in the poll mask. However, connect ignores this, as it doesn't know about it, and attempts the connection again. This may lead to a bogus -ETIMEDOUT result, where it should have noticed the POLLERR and returned -ECONNRESET instead.
Have the poll logic check whether POLLERR is set in the mask, and if so, mark the request as failed. Then connect can appropriately fail the request rather than retry it.
Reported-by: Sergey Galas <[email protected]>
Cc: [email protected]
Link: https://github.com/axboe/liburing/discussions/1335
Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Signed-off-by: Jens Axboe <[email protected]>
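The userspace analogue of the fix, as a hedged sketch (this is not the kernel patch; the helper name and error mapping are invented): when poll reports POLLERR on a pending non-blocking connect, treat it as a failure and fetch the error rather than retrying.

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Returns 0 once the pending connect has completed, a negative errno value
 * otherwise. Intended to be called after a non-blocking connect() returned
 * -1 with errno set to EINPROGRESS. */
int check_pending_connect(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int err = 0;
	socklen_t len = sizeof(err);

	if (poll(&pfd, 1, 0) < 0)
		return -errno;

	/* The essence of the fix: POLLERR means "fail now", not "retry". */
	if (pfd.revents & POLLERR) {
		if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
			return -errno;
		return err ? -err : -ECONNRESET;
	}

	return (pfd.revents & POLLOUT) ? 0 : -EAGAIN;
}

A caller would invoke this only while a connect is outstanding and stop retrying as soon as it returns anything other than -EAGAIN.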

# d63b0e8a | 28-Jan-2025 | Pavel Begunkov <[email protected]>
io_uring: fix multishots with selected buffers
We do io_kbuf_recycle() when arming a poll but every iteration of a multishot can grab more buffers, which is why we need to flush the kbuf ring state before continuing with waiting.
Cc: [email protected]
Fixes: b3fdea6ecb55c ("io_uring: multishot recv")
Reported-by: Muhammad Ramdhan <[email protected]>
Reported-by: Bing-Jhong Billy Jheng <[email protected]>
Reported-by: Jacob Soo <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/1bfc9990fe435f1fc6152ca9efeba5eb3e68339c.1738025570.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# fa359552 | 23-Jan-2025 | Jens Axboe <[email protected]>
io_uring: get rid of alloc cache init_once handling
init_once is called when an object doesn't come from the cache, and hence needs initial clearing of certain members. While the whole struct could get cleared by memset() in that case, a few of the cache members are large enough that this may cause unnecessary overhead if the caches used aren't large enough to satisfy the workload. For those cases, some churn of kmalloc+kfree is to be expected.
Ensure that the 3 users that need clearing put the members they need cleared at the start of the struct, and wrap the rest of the struct in a struct group so the offset is known.
While at it, improve the interaction with KASAN such that when/if KASAN writes to members inside the struct that should be retained over caching, it won't trip over itself. For rw and net, the retaining of the iovec over caching is disabled if KASAN is enabled. A helper will free and clear those members in that case.
Signed-off-by: Jens Axboe <[email protected]>
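As a rough illustration (a userspace sketch; the struct and all names are invented, and the kernel uses its struct_group() helper rather than the nested struct shown here), placing the members that need clearing at the front lets a cache hit skip touching the expensive tail:

#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct cached_op {
	/* Members that must be re-initialised for every use go first... */
	int	flags;
	void	*msg;
	/* ...and the part that survives a trip through the cache is grouped
	 * so its starting offset is known at compile time. */
	struct {
		char iovec_space[256];	/* stand-in for a retained iovec */
	} retained;
};

/* Clear only the leading members; the retained group keeps its contents. */
static void reuse_from_cache(struct cached_op *op)
{
	memset(op, 0, offsetof(struct cached_op, retained));
}

int main(void)
{
	struct cached_op op;

	memset(&op, 0x55, sizeof(op));
	reuse_from_cache(&op);
	printf("flags=%d retained[0]=0x%x\n", op.flags,
	       (unsigned char)op.retained.iovec_space[0]);
	return 0;
}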

Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4

# 12108729 | 16-Dec-2024 | Gabriel Krisman Bertazi <[email protected]>
io_uring/poll: Allocate apoll with generic alloc_cache helper
This abstracts away the cache details to simplify the code.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6

# b6f58a3f | 03-Nov-2024 | Jens Axboe <[email protected]>
io_uring: move struct io_kiocb from task_struct to io_uring_task
Rather than store the task_struct itself in struct io_kiocb, store the io_uring-specific task structure (struct io_uring_task). The lifetimes are the same in terms of io_uring, and this avoids doing some dereferences through the task_struct. For the hot path of putting local task references, we can deref req->tctx instead, which we'll need anyway in that function regardless of whether it's local or remote references.
This is mostly straightforward, except the original task PF_EXITING check needs a bit of tweaking. task_work is _always_ run from the originating task, except in the fallback case, where it's run from a kernel thread. Replace the potentially racy (in case of fallback work) checks of req->task->flags with current->flags. It's either still the original task, in which case PF_EXITING will be sane, or it has PF_KTHREAD set, in which case it's fallback work. Both cases should prevent moving forward with the given request.
Signed-off-by: Jens Axboe <[email protected]>
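A hedged userspace sketch of that flags check (the flag values, struct and helper are stand-ins, not the kernel's): the callback inspects the flags of whoever is running it, instead of a stored task pointer that may be stale for fallback work.

#include <stdbool.h>
#include <stdio.h>

/* Stand-in flag bits for the sketch (the kernel's PF_* values differ). */
#define FAKE_PF_EXITING	0x1u
#define FAKE_PF_KTHREAD	0x2u

/* The task currently executing the callback ("current" in the kernel). */
struct fake_task {
	unsigned int flags;
};

/* Don't make progress if the submitting task is exiting, or if we're the
 * fallback kernel thread cleaning up after it. */
static bool should_terminate(const struct fake_task *current_task)
{
	return current_task->flags & (FAKE_PF_EXITING | FAKE_PF_KTHREAD);
}

int main(void)
{
	struct fake_task normal = { .flags = 0 };
	struct fake_task fallback = { .flags = FAKE_PF_KTHREAD };

	printf("normal task: terminate=%d\n", should_terminate(&normal));
	printf("fallback worker: terminate=%d\n", should_terminate(&fallback));
	return 0;
}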

# f03baece | 03-Nov-2024 | Jens Axboe <[email protected]>
io_uring: move cancelations to be io_uring_task based
Right now the task_struct pointer is used as the key to match a task, but in preparation for some io_kiocb changes, move it to using struct io_uring_task instead. No functional changes intended in this patch.
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2

# ba4366f5 | 30-Sep-2024 | Jens Axboe <[email protected]>
io_uring/poll: get rid of per-hashtable bucket locks
Any access to the table is protected by ctx->uring_lock now anyway; the per-bucket locking doesn't buy us anything.
Signed-off-by: Jens Axboe <[email protected]>

# 879ba46a | 30-Sep-2024 | Jens Axboe <[email protected]>
io_uring/poll: get rid of io_poll_tw_hash_eject()
It serves no purpose anymore; all it does is delete the hash list entry. task_work always has the ring locked.
Signed-off-by: Jens Axboe <[email protected]>

# 08526882 | 30-Sep-2024 | Jens Axboe <[email protected]>
io_uring/poll: get rid of unlocked cancel hash
io_uring maintains two hash lists of inflight requests:
1) ctx->cancel_table_locked. This is used when the caller has the ctx->uring_lock held already. This is only an issue side parameter, as removal or task_work will always have it held.
2) ctx->cancel_table. This is used when the issuer does NOT have the ctx->uring_lock held, and relies on the table spinlocks for access.
However, it's pretty trivial to simply grab the lock in the one spot where we care about it, for insertion. With that, we can kill the unlocked table (and get rid of the _locked postfix for the other one).
Signed-off-by: Jens Axboe <[email protected]>
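A hedged sketch of the resulting pattern (userspace, with a mutex standing in for ctx->uring_lock and an invented entry type): only insertion takes the lock explicitly, because the removal paths are documented to hold it already.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;

struct hash_entry {
	struct hash_entry *next;
	int key;
};

static struct hash_entry *cancel_table[16];

/* The issue side may run without the lock, so take it just for the insert. */
static void hash_insert(struct hash_entry *e)
{
	unsigned int bucket = (unsigned int)e->key % 16;

	pthread_mutex_lock(&uring_lock);
	e->next = cancel_table[bucket];
	cancel_table[bucket] = e;
	pthread_mutex_unlock(&uring_lock);
}

/* Removal/cancelation paths already run with the lock held. */
static void hash_remove_locked(struct hash_entry *e)
{
	unsigned int bucket = (unsigned int)e->key % 16;
	struct hash_entry **pp = &cancel_table[bucket];

	for (; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			return;
		}
	}
}

int main(void)
{
	struct hash_entry e = { .key = 42 };

	hash_insert(&e);
	pthread_mutex_lock(&uring_lock);
	hash_remove_locked(&e);
	pthread_mutex_unlock(&uring_lock);
	printf("inserted and removed entry %d\n", e.key);
	return 0;
}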

# 829ab73e | 30-Sep-2024 | Jens Axboe <[email protected]>
io_uring/poll: remove 'ctx' argument from io_poll_req_delete()
It's always req->ctx being used anyway; having this as a separate argument (that is then not even used) just makes it more confusing.
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2

# 2c762be5 | 29-Jul-2024 | Olivier Langlois <[email protected]>
io_uring: keep multishot request NAPI timeout current
This refresh statement was originally present in the original patch: https://lore.kernel.org/netdev/[email protected]/
It has been removed with no explanation in v6: https://lore.kernel.org/netdev/[email protected]/
It is important to perform the refresh for multishot requests, because if no new requests using the same NAPI device are added to the ring, the entry will become stale and be removed silently. The unsuspecting user will not know that their ring only had busy polling for 60 seconds before it was pruned.
Signed-off-by: Olivier Langlois <[email protected]>
Reviewed-by: Pavel Begunkov <[email protected]>
Fixes: 8d0c12a80cdeb ("io-uring: add napi busy poll support")
Cc: [email protected]
Link: https://lore.kernel.org/r/0fe61a019ec61e5708cd117cb42ed0dab95e1617.1722294646.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <[email protected]>
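A hedged illustration of why the refresh matters (invented types and timeout value; this is not the kernel's NAPI tracking code): each multishot iteration bumps the entry's last-seen time, so the periodic prune of stale entries leaves it alone.

#include <stdio.h>
#include <time.h>

#define NAPI_STALE_SECONDS 60	/* illustrative prune timeout */

struct napi_entry {
	unsigned int napi_id;
	time_t last_seen;	/* refreshed on every multishot iteration */
};

/* Called from each poll iteration that touches this NAPI device. */
static void napi_refresh(struct napi_entry *e)
{
	e->last_seen = time(NULL);
}

/* Periodic cleanup: entries not refreshed recently get pruned. */
static int napi_is_stale(const struct napi_entry *e)
{
	return time(NULL) - e->last_seen > NAPI_STALE_SECONDS;
}

int main(void)
{
	struct napi_entry e = { .napi_id = 7 };

	napi_refresh(&e);	/* without this, the entry would age out */
	printf("napi %u stale=%d\n", e.napi_id, napi_is_stale(&e));
	return 0;
}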

Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1

# 414d0f45 | 20-Mar-2024 | Jens Axboe <[email protected]>
io_uring/alloc_cache: switch to array based caching
Currently lists are being used to manage this, but best practice is usually to have these in an array instead, as that is cheaper to manage.
Outside of that detail, games are also played with KASAN, as the list is inside the cached entry itself.
Finally, all users of this need a struct io_cache_entry embedded in their struct, which is union'ized with something else in there that isn't used across the free -> realloc cycle.
Get rid of all of that, and simply have it be an array. This will not change the memory used, as we're just trading an 8-byte member entry for the per-elem array size.
This reduces the overhead of the recycled allocations, and it reduces the amount of code needed to support recycling to about half of what it currently is.
Signed-off-by: Jens Axboe <[email protected]>
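A hedged sketch of an array-backed allocation cache (illustrative only; the kernel's io_alloc_cache differs in details such as KASAN handling and sizing): free objects are pushed into a fixed array and popped on the next allocation.

#include <stdio.h>
#include <stdlib.h>

#define CACHE_SLOTS 32

struct alloc_cache {
	void *entries[CACHE_SLOTS];
	unsigned int nr;	/* number of cached (free) objects */
	size_t obj_size;
};

static void *cache_get(struct alloc_cache *c)
{
	if (c->nr)
		return c->entries[--c->nr];	/* reuse a recycled object */
	return malloc(c->obj_size);		/* fall back to the allocator */
}

static void cache_put(struct alloc_cache *c, void *obj)
{
	if (c->nr < CACHE_SLOTS)
		c->entries[c->nr++] = obj;	/* keep it for reuse */
	else
		free(obj);			/* cache full, really free it */
}

int main(void)
{
	struct alloc_cache c = { .obj_size = 64 };
	void *a = cache_get(&c);

	cache_put(&c, a);
	void *b = cache_get(&c);	/* the same object comes straight back */
	printf("recycled: %s\n", a == b ? "yes" : "no");
	free(b);
	return 0;
}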

# e5c12945 | 18-Mar-2024 | Pavel Begunkov <[email protected]>
io_uring: refactor io_fill_cqe_req_aux
The restriction on multishot execution context disallowing io-wq is driven by the rules of io_fill_cqe_req_aux(): it should only be called in the master task context, either from the syscall path or in task_work. Since task_work now always takes the ctx lock, implying IO_URING_F_COMPLETE_DEFER, we can just assume that the function is always called with its defer argument set to true.
Kill the argument. Also rename the function for more consistency, as "fill" in CQE-related functions was usually meant for raw interfaces that only copy data into the CQ, without any of the locking, user waking and other accounting that "post" functions take care of.
Signed-off-by: Pavel Begunkov <[email protected]>
Tested-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# 8e5b3b89 | 18-Mar-2024 | Pavel Begunkov <[email protected]>
io_uring: remove struct io_tw_state::locked
ctx is always locked for task_work now, so get rid of struct io_tw_state::locked. Note I'm stopping one step before removing io_tw_state altogether, which is now empty, because it still serves the purpose of indicating which function is a tw callback and forcing users not to invoke them carelessly out of a wrong context. The removal can always be done later.
Signed-off-by: Pavel Begunkov <[email protected]>
Tested-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# 5e3afe58 | 15-Mar-2024 | Pavel Begunkov <[email protected]>
io_uring: fix poll_remove stalled req completion
Taking the ctx lock is not enough to use the deferred request completion infrastructure; the request will get queued into the list, but no one would expect it there, so it will sit there until the next io_submit_flush_completions(). It's hard to care about the cancellation path, so complete it via tw.
Fixes: ef7dfac51d8ed ("io_uring/poll: serialize poll linked timer start with poll removal")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c446740bc16858f8a2a8dcdce899812f21d15f23.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.8

# 1a8ec63b | 07-Mar-2024 | Pavel Begunkov <[email protected]>
io_uring: fix io_queue_proc modifying req->flags
With multiple poll entries, __io_queue_proc() might be running in parallel with poll handlers and possibly task_work, so we should not be carelessly modifying req->flags there. io_poll_double_prepare() handles a similar case with locking, but it's much easier to move the modification into __io_arm_poll_handler().
Cc: [email protected]
Fixes: 595e52284d24a ("io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/455cc49e38cf32026fa1b49670be8c162c2cb583.1709834755.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6

# 8d0c12a8 | 08-Jun-2023 | Stefan Roesch <[email protected]>
io-uring: add napi busy poll support
This adds the napi busy polling support in io_uring.c. It adds a new napi_list to the io_ring_ctx structure. This list contains the list of napi_id's that are currently enabled for busy polling. The list is synchronized by the new napi_lock spin lock. The current default napi busy polling time is stored in napi_busy_poll_to. If napi busy polling is not enabled, the value is 0.
In addition there is also a hash table. The hash table stores the napi id and the pointer to the above list nodes. The hash table is used to speed up the lookup of the list elements. It is synchronized with RCU.
The NAPI_TIMEOUT is stored as a timeout to make sure that the time a napi entry is stored in the napi list is limited.
The busy poll timeout is also stored as part of the io_wait_queue. This is necessary as, for sq polling, the poll interval needs to be adjusted and the napi callback only allows passing in one value.
This has been tested with two simple programs from the liburing library repository: the napi client and the napi server program. The client sends a request, which has a timestamp in its payload and the server replies with the same payload. The client calculates the roundtrip time and stores it to calculate the results.
The client is running on host1 and the server is running on host 2 (in the same rack). The measured times below are roundtrip times. They are average times over 5 runs each. Each run measures 1 million roundtrips.
Configuration                                                                   no rx coal   rx coal: frames=88,usecs=33
Default                                                                         57us         56us
client_poll=100us                                                               47us         46us
server_poll=100us                                                               51us         46us
client_poll=100us + server_poll=100us                                           40us         40us
client_poll=100us + server_poll=100us, prefer napi busy poll on client          41us         39us
client_poll=100us + server_poll=100us, prefer napi busy poll on server          41us         39us
client_poll=100us + server_poll=100us, prefer napi busy poll on client+server   41us         39us
Signed-off-by: Stefan Roesch <[email protected]>
Suggested-by: Olivier Langlois <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

# 3cdc4be1 | 01-Feb-2024 | Jens Axboe <[email protected]>
io_uring/poll: improve readability of poll reference decrementing
This overly long line is hard to read. Break it up by AND'ing the ref mask first, then perform the atomic_sub_return() with the value itself.
Signed-off-by: Jens Axboe <[email protected]>
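Roughly the kind of change described, as a hedged userspace sketch with C11 atomics (the mask value and helpers are invented; this is not the kernel diff, and atomic_sub_return's "return the new value" behaviour is emulated here): mask the value on its own line first, then do the subtraction with that value.

#include <stdatomic.h>
#include <stdio.h>

#define REF_MASK 0xffffu	/* illustrative reference-count mask */

static atomic_uint poll_refs;

/* Before: everything crammed into one long expression. */
static unsigned int drop_refs_old(unsigned int v)
{
	return atomic_fetch_sub(&poll_refs, v & REF_MASK) - (v & REF_MASK);
}

/* After: AND the mask first, then a plain subtraction with that value. */
static unsigned int drop_refs_new(unsigned int v)
{
	v &= REF_MASK;
	return atomic_fetch_sub(&poll_refs, v) - v;
}

int main(void)
{
	atomic_store(&poll_refs, 3);
	printf("old style: remaining=%u\n", drop_refs_old(1) & REF_MASK);
	printf("new style: remaining=%u\n", drop_refs_new(1) & REF_MASK);
	return 0;
}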

# 95041b93 | 29-Jan-2024 | Jens Axboe <[email protected]>
io_uring: add io_file_can_poll() helper
This adds a flag to avoid having to dereference file and then f_op to figure out whether the file has a poll handler defined or not. We generally call this at least twice for networked workloads, and if using ring-provided buffers, we do it on every buffer selection. Particularly the latter is troublesome, as it's otherwise a very fast operation.
Signed-off-by: Jens Axboe <[email protected]>
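A hedged sketch of the caching idea (userspace; the flag bit and the fake structs are invented, and the real helper's details differ): compute the "can poll" answer once and stash it as a request flag, instead of chasing file->f_op on every check.

#include <stdbool.h>
#include <stdio.h>

#define REQ_F_CAN_POLL	(1u << 0)	/* invented flag bit for the sketch */

struct fake_fops { bool has_poll; };
struct fake_file { const struct fake_fops *f_op; };

struct fake_req {
	struct fake_file *file;
	unsigned int flags;
};

/* The first call dereferences f_op; later calls just test the cached flag. */
static bool io_file_can_poll(struct fake_req *req)
{
	if (req->flags & REQ_F_CAN_POLL)
		return true;
	if (req->file->f_op->has_poll) {
		req->flags |= REQ_F_CAN_POLL;
		return true;
	}
	return false;
}

int main(void)
{
	static const struct fake_fops sock_ops = { .has_poll = true };
	struct fake_file f = { .f_op = &sock_ops };
	struct fake_req req = { .file = &f };

	printf("can poll: %d (flags=%#x)\n", io_file_can_poll(&req), req.flags);
	printf("can poll again: %d\n", io_file_can_poll(&req));
	return 0;
}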

# 521223d7 | 29-Jan-2024 | Jens Axboe <[email protected]>
io_uring/cancel: don't default to setting req->work.cancel_seq
Just leave it unset by default, avoiding dipping into the last cacheline (which is otherwise untouched) for the fast path of using poll to drive networked traffic. Add a flag that tells us if the sequence is valid or not, and then we can defer actually assigning the flag and sequence until someone runs cancelations.
Signed-off-by: Jens Axboe <[email protected]>

# 704ea888 | 29-Jan-2024 | Jens Axboe <[email protected]>
io_uring/poll: add requeue return code from poll multishot handling
Since our poll handling is edge triggered, multishot handlers retry internally until they know that no more data is available. In preparation for limiting these retries, add an internal return code, IOU_REQUEUE, which can be used to inform the poll backend that the handler wants to retry, but that this should happen through a normal task_work requeue rather than by hammering on the issue side for this one request.
No functional changes in this patch; nobody is using this return code just yet.
Signed-off-by: Jens Axboe <[email protected]>

# e84b01a8 | 29-Jan-2024 | Jens Axboe <[email protected]>
io_uring/poll: move poll execution helpers higher up
In preparation for calling __io_poll_execute() higher up, move the functions to avoid forward declarations.
No functional changes in this patch.
Signed-off-by: Jens Axboe <[email protected]>