Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1

# 0fb3f560 | 31-Jan-2025 | Jens Axboe <[email protected]>

io_uring/epoll: remove CONFIG_EPOLL guards
Just have the Makefile add the object if epoll is enabled; then it's not necessary to guard the entire epoll.c file inside a CONFIG_EPOLL ifdef.
Signed-off-by: Jens Axboe <[email protected]>
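
As a sketch of the pattern (the kbuild line in the comment below is illustrative of the approach, not the verbatim diff), the conditional moves from a file-wide guard into the Makefile:

    /* Before: all of io_uring/epoll.c wrapped in a guard. */
    #if defined(CONFIG_EPOLL)
    /* ... entire contents of epoll.c ... */
    #endif

    /*
     * After: the guard is dropped, and io_uring/Makefile only builds the
     * object when epoll is enabled, along the lines of:
     *
     *   obj-$(CONFIG_EPOLL) += epoll.o
     */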

# 6f377873 | 15-Feb-2025 | David Wei <[email protected]>

io_uring/zcrx: add interface queue and refill queue
Add a new object called an interface queue (ifq) that represents a net rx queue that has been configured for zero copy. Each ifq is registered using a new registration opcode IORING_REGISTER_ZCRX_IFQ.
The refill queue is allocated by the kernel and mapped by userspace using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main SQ/CQ. It is used by userspace to return buffers that it is done with, which will then be reused by the netdev.
The main CQ ring is used to notify userspace of received data by using the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each entry contains the offset + len to the data.
For now, each io_uring instance only has a single ifq.
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: David Wei <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
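
A hedged sketch of how userspace might consume these entries, assuming big (32-byte) CQEs and an extension carrying offset + len as described above; the struct and field names here are illustrative, not the uapi definitions:

    #include <stdint.h>
    #include <liburing.h>

    /* Illustrative layout per the text above: the upper 16 bytes of a big
     * CQE carry the offset + len of the received data. */
    struct zcrx_cqe_ext {
            uint64_t off;   /* offset into the registered buffer area */
            uint32_t len;   /* bytes received */
            uint32_t pad;
    };

    /* With IORING_SETUP_CQE32, the extension sits right after the 16-byte
     * base CQE. */
    static void handle_zcrx_cqe(struct io_uring_cqe *cqe, char *area_base)
    {
            struct zcrx_cqe_ext *ext = (struct zcrx_cqe_ext *)(cqe + 1);
            char *data = area_base + ext->off;

            /* Consume ext->len bytes at data, then hand the buffer back
             * via the refill ring so the netdev can reuse it. */
            (void)data;
    }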

# d19af0e9 | 28-Jan-2025 | Pavel Begunkov <[email protected]>

io_uring: add alloc_cache.c
Avoid inlining everything from alloc_cache.h and move the cold bits into a new file.
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
Link: https://lore.kernel.org/r/06984c6cd58e703f7cfae5ab3067912f9f635a06.1738087204.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
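
The hot/cold split at work here is a common pattern; a hypothetical sketch (names are illustrative, not the actual alloc_cache API): the common case stays inline in the header, the refill path moves out of line.

    /* alloc_cache.h (sketch): hot path stays inline */
    struct alloc_cache_sketch {
            void            **entries;
            unsigned int    nr;
    };

    void *alloc_cache_refill_slow(struct alloc_cache_sketch *c); /* in the .c */

    static inline void *alloc_cache_get(struct alloc_cache_sketch *c)
    {
            if (c->nr)                              /* common case */
                    return c->entries[--c->nr];
            return alloc_cache_refill_slow(c);      /* cold, out of line */
    }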

Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6

# 1802656e | 30-Aug-2024 | Jens Axboe <[email protected]>

io_uring: add GCOV_PROFILE_URING Kconfig option
If GCOV is enabled and this option is set, it enables code coverage profiling of the io_uring subsystem. Only use this for test purposes, as it will impact the runtime performance.
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3

# 200f3abd | 03-Jun-2024 | Jens Axboe <[email protected]>

io_uring/eventfd: move eventfd handling to separate file
This is pretty nicely abstracted already, but let's move it to a separate file rather than have it in the main io_uring file. With that, we can also move the io_ev_fd struct and enum out of global scope.
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2

# f15ed8b4 | 27-Mar-2024 | Jens Axboe <[email protected]>

io_uring: move mapping/allocation helpers to a separate file
Move the related code from io_uring.c into memmap.c. No functional changes in this patch, just cleaning it up a bit now that the full transition is done.
Signed-off-by: Jens Axboe <[email protected]>

# 77a1cd5e | 27-Mar-2024 | Jens Axboe <[email protected]>

io_uring: re-arrange Makefile order
The object list is a bit of a mess, with core and opcode files mixed in. Re-arrange it so that we have the core bits first, and then opcode specific files after that.
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6

# 8d0c12a8 | 08-Jun-2023 | Stefan Roesch <[email protected]>

io-uring: add napi busy poll support
This adds napi busy polling support to io_uring.c. It adds a new napi_list to the io_ring_ctx structure, containing the napi_ids that are currently enabled for busy polling; the list is synchronized by the new napi_lock spinlock. The current default napi busy poll timeout is stored in napi_busy_poll_to; if napi busy polling is not enabled, the value is 0.
In addition there is a hash table, which stores the napi id and a pointer to the corresponding list node. The hash table is used to speed up lookups of list elements and is synchronized with RCU.
NAPI_TIMEOUT is stored as a timeout to ensure that the time a napi entry spends on the napi list is limited.
The busy poll timeout is also stored as part of the io_wait_queue. This is necessary because the poll interval needs to be adjusted for sq polling, and the napi callback only allows passing in one value.
This has been tested with two simple programs from the liburing library repository: the napi client and the napi server program. The client sends a request with a timestamp in its payload, and the server replies with the same payload. The client measures the roundtrip time and stores it to compute the results.
The client runs on host1 and the server runs on host2 (in the same rack). The measured times below are roundtrip times, averaged over 5 runs each; each run measures 1 million roundtrips.
                                                 no rx coal   rx coal: frames=88,usecs=33
    Default                                      57us         56us
    client_poll=100us                            47us         46us
    server_poll=100us                            51us         46us
    client_poll=100us + server_poll=100us        40us         40us
    client_poll=100us + server_poll=100us
      + prefer napi busy poll on client          41us         39us
    client_poll=100us + server_poll=100us
      + prefer napi busy poll on server          41us         39us
    client_poll=100us + server_poll=100us
      + prefer napi busy poll on client + server 41us         39us
Signed-off-by: Stefan Roesch <[email protected]>
Suggested-by: Olivier Langlois <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
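
A hypothetical sketch of the per-ring tracking the text describes, combining a list with an RCU-protected hash for fast napi_id lookup; the names are illustrative, not the kernel's exact definitions:

    #include <linux/list.h>
    #include <linux/types.h>

    /* One entry per napi_id currently enabled for busy polling on a ring. */
    struct napi_poll_entry {
            struct list_head        list;     /* linked into ctx->napi_list,
                                               * protected by ctx->napi_lock */
            struct hlist_node       node;     /* hashed by napi_id for fast lookup */
            unsigned int            napi_id;
            unsigned long           timeout;  /* jiffies; entries older than
                                               * NAPI_TIMEOUT are pruned */
            struct rcu_head         rcu;      /* hash table readers use RCU */
    };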

# b4bb1900 | 02-Feb-2024 | Tony Solomonik <[email protected]>

io_uring: add support for ftruncate
Adds support for doing truncate through io_uring, eliminating the need for applications to roll their own thread pool or offload mechanism to be able to do non-blocking truncates.
Signed-off-by: Tony Solomonik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
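
A minimal usage sketch, assuming a liburing recent enough to expose io_uring_prep_ftruncate() for this opcode:

    #include <errno.h>
    #include <sys/types.h>
    #include <liburing.h>

    /* Truncate fd to new_len without blocking the submitting thread. */
    static int truncate_async(struct io_uring *ring, int fd, off_t new_len)
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            struct io_uring_cqe *cqe;
            int ret;

            if (!sqe)
                    return -EBUSY;
            io_uring_prep_ftruncate(sqe, fd, new_len);
            io_uring_submit(ring);

            ret = io_uring_wait_cqe(ring, &cqe);
            if (ret)
                    return ret;
            ret = cqe->res;         /* 0 on success, -errno on failure */
            io_uring_cqe_seen(ring, cqe);
            return ret;
    }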

# c4320315 | 19-Dec-2023 | Jens Axboe <[email protected]>

io_uring/register: move io_uring_register(2) related code to register.c
Most of this code is basically self contained, move it out of the core io_uring file to bring a bit more separation to the registration related bits. This moves another ~10% of the code into register.c.
Signed-off-by: Jens Axboe <[email protected]>

# 194bb58c | 08-Jun-2023 | Jens Axboe <[email protected]>

io_uring: add support for futex wake and wait
Add support for FUTEX_WAKE/WAIT primitives.
IORING_OP_FUTEX_WAKE is a mix of FUTEX_WAKE and FUTEX_WAKE_BITSET, as it supports passing in a bitset.
Similarly, IORING_OP_FUTEX_WAIT is a mix of FUTEX_WAIT and FUTEX_WAIT_BITSET.
Both of them use the futex2 interface.
FUTEX_WAKE is straightforward, as wakes can always be done directly from the io_uring submission without needing async handling. For FUTEX_WAIT, things are a bit more complicated: if the futex isn't ready, we rely on a callback via futex_queue->wake() when someone wakes up the futex. From that callback, we queue up task_work with the original task, which will post a CQE and wake it, if necessary.
Cancelations are supported, both from the application point of view and to be able to cancel pending waits if the ring exits before all events have occurred. The return value of futex_unqueue() is used to gate who wins the potential race between cancelation and futex wakeups: whoever gets a 'ret == 1' return from that claims ownership of the io_uring futex request.
This is just the barebones wait/wake support. PI or REQUEUE support is not added at this point; it's unclear whether we might look into that later.
Likewise, explicit timeouts are not supported either. It is expected that users who need timeouts will use the usual io_uring mechanism for that: linked timeouts.
The SQE format is as follows:
    `addr`          Address of futex
    `fd`            futex2(2) FUTEX2_* flags
    `futex_flags`   io_uring-specific command flags. None valid now.
    `addr2`         Value of futex
    `addr3`         Mask to wake/wait
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
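
A hedged raw-SQE sketch following that format; FUTEX2_SIZE_U32 is the futex2(2) flag for 32-bit futexes, the headers assume a 6.7+ uapi, and the helper name is made up for illustration:

    #include <linux/futex.h>
    #include <linux/io_uring.h>
    #include <stdint.h>
    #include <string.h>

    /* Arm an async wait: completes when *futex changes from val (per mask). */
    static void prep_futex_wait(struct io_uring_sqe *sqe, uint32_t *futex,
                                uint64_t val, uint64_t mask)
    {
            memset(sqe, 0, sizeof(*sqe));
            sqe->opcode = IORING_OP_FUTEX_WAIT;
            sqe->addr = (uint64_t)(uintptr_t)futex; /* address of futex */
            sqe->fd = FUTEX2_SIZE_U32;              /* futex2(2) FUTEX2_* flags */
            sqe->futex_flags = 0;                   /* no command flags valid yet */
            sqe->addr2 = val;                       /* value of futex */
            sqe->addr3 = mask;                      /* mask to wait on */
    }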

# f31ecf67 | 10-Jul-2023 | Jens Axboe <[email protected]>

io_uring: add IORING_OP_WAITID support
This adds support for a fully async version of waitid(2). If an event isn't immediately available, wait for a callback to trigger a retry.
The format of the sqe is as follows:
    sqe->len          The 'which', the idtype being queried/waited for.
    sqe->fd           The 'pid' (or id) being waited for.
    sqe->file_index   The 'options' being set.
    sqe->addr2        A pointer to siginfo_t, if any, being filled in.
buf_index, addr3, and waitid_flags are reserved/unused for now. waitid_flags will be used for options for this request type. One interesting use case may be to add multi-shot support, so that the request stays armed and posts a notification every time a monitored process state change occurs.
Note that this does not support rusage, on Arnd's recommendation.
See the waitid(2) man page for details on the arguments.
Signed-off-by: Jens Axboe <[email protected]>
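
A hypothetical raw-SQE sketch following the layout above (liburing later gained a prep helper for this, but the sketch sticks to the commit's description):

    #include <linux/io_uring.h>
    #include <signal.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/wait.h>

    /* Wait for a child to exit, per the field mapping described above. */
    static void prep_waitid(struct io_uring_sqe *sqe, pid_t pid, siginfo_t *infop)
    {
            memset(sqe, 0, sizeof(*sqe));
            sqe->opcode = IORING_OP_WAITID;
            sqe->len = P_PID;                        /* the 'which'/idtype */
            sqe->fd = pid;                           /* the id being waited for */
            sqe->file_index = WEXITED;               /* the 'options' */
            sqe->addr2 = (uint64_t)(uintptr_t)infop; /* siginfo_t to fill */
    }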

Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7

# eb42cebb | 12-Jul-2022 | Pavel Begunkov <[email protected]>

io_uring: add zc notification infrastructure
Add the internal part of send zerocopy notifications. There are two main structures. The first is struct io_notif, which carries a struct ubuf_info inside it and maps 1:1 to it; io_uring will bind a number of zerocopy send requests to it and ask to complete (aka flush) it. When flushed, and once all attached requests and skbs complete, it'll generate one and only one CQE. Notifiers are intended to be passed into the network layer as struct msghdr::msg_ubuf.
The second concept is notification slots. Userspace will be able to register an array of slots and subsequently address them by index into the array. Slots are independent of each other. Each slot can have only one notifier at a time (called the active notifier) but many notifiers over its lifetime. While active, a notifier is not going to post any completion, but userspace can attach requests to it by specifying the corresponding slot while issuing send zc requests. Eventually, userspace will want to "flush" the notifier, losing any way to attach new requests to it; it can then use the next automatically added notifier of this slot, or of any other slot.
When the network layer is done with all enqueued skbs attached to a notifier and no longer needs the user data specified in them, the flushed notifier will post a CQE.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
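
A hypothetical sketch of the 1:1 notif/ubuf_info pairing described above; the fields are illustrative, not the kernel's actual io_notif definition:

    #include <linux/refcount.h>
    #include <linux/skbuff.h>       /* struct ubuf_info */

    struct io_ring_ctx;

    /* Embeds the ubuf_info handed to the net stack via msghdr::msg_ubuf.
     * Each attached zc send request holds a reference; the single CQE
     * posts once the notifier is flushed and all references drop. */
    struct io_notif_sketch {
            struct ubuf_info        uarg;
            struct io_ring_ctx      *ctx;
            refcount_t              refs;
            u64                     user_data;      /* CQE identity to post */
    };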

Revision tags: v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3

# d9b57aa3 | 15-Jun-2022 | Jens Axboe <[email protected]>

io_uring: move opcode table to opdef.c
We already have the declarations in opdef.h; move the rest into its own file rather than keeping it in the main io_uring.c file.
Signed-off-by: Jens Axboe <[email protected]>

# f3b44f92 | 13-Jun-2022 | Jens Axboe <[email protected]>

io_uring: move read/write related opcodes to its own file
Signed-off-by: Jens Axboe <[email protected]>

# 73572984 | 13-Jun-2022 | Jens Axboe <[email protected]>

io_uring: move rsrc related data, core, and commands
Signed-off-by: Jens Axboe <[email protected]>

# 3b77495a | 13-Jun-2022 | Jens Axboe <[email protected]>

io_uring: split provided buffers handling into its own file
Move both the opcodes related to it and the internal code dealing with it.
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v5.19-rc2, v5.19-rc1

# 7aaff708 | 26-May-2022 | Jens Axboe <[email protected]>

io_uring: move cancelation into its own file
This also helps clean up the io_uring.h cancel parts, as we can mostly make things static in the cancel.c file.
Signed-off-by: Jens Axboe <[email protected]>

# 329061d3 | 26-May-2022 | Jens Axboe <[email protected]>

io_uring: move poll handling into its own file
Add an io_poll_issue() rather than exporting the general task_work locking and io_issue_sqe(), and put the io_op_defs definition and structure into a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <[email protected]>

# c9f06aa7 | 25-May-2022 | Jens Axboe <[email protected]>

io_uring: move io_uring_task (tctx) helpers into its own file
Signed-off-by: Jens Axboe <[email protected]>

# a4ad4f74 | 25-May-2022 | Jens Axboe <[email protected]>

io_uring: move fdinfo helpers to its own file
This also means moving a bit more of the fixed file handling to the filetable side, which makes sense separately too.
Signed-off-by: Jens Axboe <[email protected]>

# 17437f31 | 25-May-2022 | Jens Axboe <[email protected]>

io_uring: move SQPOLL related handling into its own file
Signed-off-by: Jens Axboe <[email protected]>

# 59915143 | 25-May-2022 | Jens Axboe <[email protected]>

io_uring: move timeout opcodes and handling into its own file
Signed-off-by: Jens Axboe <[email protected]>

# 36404b09 | 25-May-2022 | Jens Axboe <[email protected]>

io_uring: move msg_ring into its own file
Signed-off-by: Jens Axboe <[email protected]>

# f9ead18c | 25-May-2022 | Jens Axboe <[email protected]>

io_uring: split network related opcodes into its own file
While at it, convert the handlers to just use io_eopnotsupp_prep() if CONFIG_NET isn't set.
Signed-off-by: Jens Axboe <[email protected]>