Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1

# 6889ae1b | 27-Mar-2025 | Pavel Begunkov <[email protected]>

io_uring/net: fix io_req_post_cqe abuse by send bundle
[ 114.987980][ T5313] WARNING: CPU: 6 PID: 5313 at io_uring/io_uring.c:872 io_req_post_cqe+0x12e/0x4f0
[ 114.991597][ T5313] RIP: 0010:io_req_post_cqe+0x12e/0x4f0
[ 115.001880][ T5313] Call Trace:
[ 115.002222][ T5313]  <TASK>
[ 115.007813][ T5313]  io_send+0x4fe/0x10f0
[ 115.009317][ T5313]  io_issue_sqe+0x1a6/0x1740
[ 115.012094][ T5313]  io_wq_submit_work+0x38b/0xed0
[ 115.013223][ T5313]  io_worker_handle_work+0x62a/0x1600
[ 115.013876][ T5313]  io_wq_worker+0x34f/0xdf0
As the comment states, io_req_post_cqe() should only be used by multishot requests, i.e. REQ_F_APOLL_MULTISHOT, which bundled sends are not. Add a flag signifying whether a request wants to post multiple CQEs. Eventually REQ_F_APOLL_MULTISHOT should imply the new flag, but that's left out for simplicity.
Cc: [email protected]
Fixes: a05d1f625c7aa ("io_uring/net: support bundles for send")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/8b611dbb54d1cd47a88681f5d38c84d0c02bc563.1743067183.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
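
[Editor's sketch] The shape of the fix in plain C; REQ_F_MULTI_CQES, the bit values, and the helper names below are assumptions for illustration, not the patch's exact identifiers:

    #include <stdbool.h>

    #define REQ_F_APOLL_MULTISHOT (1u << 0) /* bit values illustrative */
    #define REQ_F_MULTI_CQES      (1u << 1) /* hypothetical "may post >1 CQE" flag */

    struct io_kiocb {
        unsigned int flags;
    };

    static bool io_req_may_post_cqe(const struct io_kiocb *req)
    {
        /* Was: req->flags & REQ_F_APOLL_MULTISHOT, which bundled sends
         * don't set, tripping the WARN at io_uring/io_uring.c:872. */
        return req->flags & REQ_F_MULTI_CQES;
    }

    static void io_send_bundle_prep(struct io_kiocb *req)
    {
        /* Bundles post one CQE per completed buffer, so opt in explicitly. */
        req->flags |= REQ_F_MULTI_CQES;
    }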

# 73b6dacb | 25-Mar-2025 | Caleb Sander Mateos <[email protected]>

io_uring/net: use REQ_F_IMPORT_BUFFER for send_zc
Instead of a bool field in struct io_sr_msg, use REQ_F_IMPORT_BUFFER to track whether io_send_zc() has already imported the buffer. This flag already serves a similar purpose for sendmsg_zc and {read,write}v_fixed.
Signed-off-by: Caleb Sander Mateos <[email protected]>
Suggested-by: Pavel Begunkov <[email protected]>
Reviewed-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
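
[Editor's sketch] The pattern with simplified types; the flag's bit position and the helper are illustrative, standing in for the bool that struct io_sr_msg used to carry:

    #include <stdbool.h>

    #define REQ_F_IMPORT_BUFFER (1u << 5) /* bit position illustrative */

    struct io_kiocb {
        unsigned int flags;
    };

    /* Check and clear the generic request flag instead of a private bool,
     * so the buffer import happens exactly once across retries. */
    static bool send_zc_should_import(struct io_kiocb *req)
    {
        if (!(req->flags & REQ_F_IMPORT_BUFFER))
            return false;
        req->flags &= ~REQ_F_IMPORT_BUFFER;
        return true;
    }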

Revision tags: v6.14

# 575e7b06 | 19-Mar-2025 | Pavel Begunkov <[email protected]>

io_uring: rename the data cmd cache
Pick a more descriptive name for the cmd async data cache.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.14-rc7, v6.14-rc6

# 835c4bdf | 07-Mar-2025 | Pavel Begunkov <[email protected]>

io_uring/rw: defer reg buf vec import
Import registered buffers for vectored reads and writes later at issue time as we now do for other fixed ops.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/e8491c976e4ab83a4e3dc428e9fe7555e59583b8.1741362889.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# 9ef4cbbc | 07-Mar-2025 | Pavel Begunkov <[email protected]>

io_uring: add infra for importing vectored reg buffers
Add io_import_reg_vec(), which will be responsible for importing vectored registered buffers. The function might reallocate the vector, but it will try to do the conversion in place first, which is why the user is required to pad the iovec out to the right boundary of the cache.
Overlapping also depends on struct iovec being larger than bvec, which is not the case on e.g. 32-bit architectures. Don't complicate this case; make sure the vectors never overlap. It will be improved later.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/60bd246b1249476a6996407c1dbc38ef6febad14.1741362889.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
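
[Editor's sketch] Why overlap depends on relative sizes: converting an iovec array to a bio_vec array in the same memory is only safe if writing output entry i never clobbers input entry i+1, i.e. if output entries are no larger than input entries. On typical 64-bit builds both are 16 bytes; on 32-bit an iovec is 8 bytes while a bio_vec is larger, so the in-place path is skipped there. A userspace illustration with struct bio_vec mocked up:

    #include <stdio.h>
    #include <sys/uio.h> /* struct iovec */

    /* Stand-in with the same members as the kernel's struct bio_vec. */
    struct bio_vec_sketch {
        void        *bv_page;
        unsigned int bv_len;
        unsigned int bv_offset;
    };

    int main(void)
    {
        /* In-place iovec -> bio_vec conversion is only safe when this holds. */
        printf("in-place ok: %d\n",
               sizeof(struct bio_vec_sketch) <= sizeof(struct iovec));
        return 0;
    }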

# e1d49959 | 07-Mar-2025 | Pavel Begunkov <[email protected]>

io_uring: introduce struct iou_vec
I need a convenient way to pass around and work with an iovec+size pair. Put them into a structure and make use of it in rw.c.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/d39fadafc9e9047b0a292e5be6db3cf2f48bb1f7.1741362889.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
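
[Editor's sketch] Per the description, the structure is just a pointer/length pair; field names here are assumptions (later patches in the series union the pointer with a bio_vec variant):

    #include <sys/uio.h>

    struct iou_vec {
        struct iovec *iovec; /* the segment array */
        unsigned int  nr;    /* number of entries */
    };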

Revision tags: v6.14-rc5

# 09fdd351 | 28-Feb-2025 | Caleb Sander Mateos <[email protected]>

io_uring: convert cmd_to_io_kiocb() macro to function
The cmd_to_io_kiocb() macro applies a pointer cast to its input without parenthesizing it. Currently all inputs are variable names, so this has the intended effect. But since casts have relatively high precedence, the macro would apply the cast to the wrong value if the input was a pointer addition, for example.
Turn the macro into a static inline function to ensure the pointer cast is applied to the full input value.
Signed-off-by: Caleb Sander Mateos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
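
[Editor's sketch] The hazard and the fix; the one-field struct stands in for the real io_kiocb:

    struct io_kiocb { int dummy; };

    /* Buggy pattern: casts bind tighter than '+', so with an expression
     * argument this expands cmd_to_io_kiocb_unsafe(p + 1) to
     * ((struct io_kiocb *)p + 1), i.e. arithmetic in the wrong units. */
    #define cmd_to_io_kiocb_unsafe(ptr) ((struct io_kiocb *) ptr)

    /* Fixed: a function evaluates its whole argument before converting. */
    static inline struct io_kiocb *cmd_to_io_kiocb(void *cmd)
    {
        return (struct io_kiocb *)cmd;
    }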

# ed9f3112 | 27-Feb-2025 | Keith Busch <[email protected]>

io_uring: cache nodes and mapped buffers
Frequent alloc/free cycles on these are pretty costly. Use an io cache to reuse these buffers more efficiently.
Signed-off-by: Keith Busch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[axboe: fix imu leak]
Signed-off-by: Jens Axboe <[email protected]>
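
[Editor's sketch] A minimal free-list cache showing the saving this relies on; io_uring's io_alloc_cache has a different API, so treat this purely as an illustration:

    #include <stdlib.h>

    struct obj_cache {
        void   **slots;    /* stashed free objects */
        unsigned nr, max;  /* current / maximum stashed */
        size_t   obj_size;
    };

    static void *cache_get(struct obj_cache *c)
    {
        if (c->nr)
            return c->slots[--c->nr]; /* hot path: no allocator round trip */
        return malloc(c->obj_size);
    }

    static void cache_put(struct obj_cache *c, void *obj)
    {
        if (c->nr < c->max)
            c->slots[c->nr++] = obj;  /* keep for reuse */
        else
            free(obj);
    }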

Revision tags: v6.14-rc4, v6.14-rc3

# 6f377873 | 15-Feb-2025 | David Wei <[email protected]>

io_uring/zcrx: add interface queue and refill queue
Add a new object called an interface queue (ifq) that represents a net rx queue that has been configured for zero copy. Each ifq is registered using a new registration opcode IORING_REGISTER_ZCRX_IFQ.
The refill queue is allocated by the kernel and mapped by userspace using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main SQ/CQ. It is used by userspace to return buffers that it is done with, which will then be reused by the netdev.
The main CQ ring is used to notify userspace of received data by using the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each entry contains the offset + len to the data.
For now, each io_uring instance only has a single ifq.
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: David Wei <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
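
[Editor's sketch] The big-CQE split described above: the first 16 bytes stay an ordinary CQE and the upper 16 bytes carry the zero-copy completion. The field names and widths below only illustrate "offset + len"; they are not the exact uAPI layout:

    #include <stdint.h>

    struct cqe16_sketch {      /* the ordinary 16-byte CQE, abbreviated */
        uint64_t user_data;
        int32_t  res;
        uint32_t flags;
    };

    struct zcrx_cqe_sketch {   /* upper 16 bytes of a 32-byte big CQE */
        uint64_t off;          /* offset of the received data */
        uint32_t len;          /* length of the received data */
        uint32_t pad;
    };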

# 94a4274b | 17-Feb-2025 | Caleb Sander Mateos <[email protected]>

io_uring: pass struct io_tw_state by value
8e5b3b89ecaf ("io_uring: remove struct io_tw_state::locked") removed the only field of io_tw_state but kept it as a task work callback argument to "forc[e] users not to invoke them carelessly out of a wrong context". Passing the struct io_tw_state * argument adds a few instructions to all callers that can't inline the functions and see the argument is unused.
So pass struct io_tw_state by value instead. Since it's a 0-sized value, it can be passed without any instructions needed to initialize it.
Signed-off-by: Caleb Sander Mateos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
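
[Editor's sketch] What a 0-sized by-value argument looks like; empty structs are the GNU C extension the kernel builds with, and the callback name is illustrative:

    struct io_tw_state { }; /* sizeof == 0 under GNU C */

    /* The parameter consumes no registers or stack; its only job is to
     * prove the caller runs in task-work context. */
    typedef void (*io_req_tw_func)(void *req, struct io_tw_state ts);

    static void example_tw_cb(void *req, struct io_tw_state ts)
    {
        (void)req;
        (void)ts;
    }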

# bcf8a029 | 17-Feb-2025 | Caleb Sander Mateos <[email protected]>

io_uring: introduce type alias for io_tw_state
In preparation for changing how io_tw_state is passed, introduce a type alias io_tw_token_t for struct io_tw_state *. This allows for changing the representation in one place, without having to update the many functions that just forward their struct io_tw_state * argument.
Also add a comment to struct io_tw_state to explain its purpose.
Signed-off-by: Caleb Sander Mateos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
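
[Editor's sketch] The alias itself; at this point in the series it still wraps the pointer, so forwarding call sites need only the typedef swapped in:

    struct io_tw_state {
        /* empty: a value of this type proves the caller is in
         * task-work context with the right locking */
    };

    /* Single point of change for the later switch to pass-by-value. */
    typedef struct io_tw_state *io_tw_token_t;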

Revision tags: v6.14-rc2

# 13ee854e | 05-Feb-2025 | Pavel Begunkov <[email protected]>

io_uring/kbuf: remove legacy kbuf caching
Remove all struct io_buffer caches, which makes things a fair bit simpler. Apart from killing a bunch of lines and juggling between lists, __io_put_kbuf_list() no longer needs ->completion_lock locking.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/18287217466ee2576ea0b1e72daccf7b22c7e856.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

Revision tags: v6.14-rc1

# fa359552 | 23-Jan-2025 | Jens Axboe <[email protected]>

io_uring: get rid of alloc cache init_once handling
init_once is called when an object doesn't come from the cache, and hence needs initial clearing of certain members. While the whole struct could get cleared by memset() in that case, a few of the cache members are large enough that this may cause unnecessary overhead if the caches used aren't large enough to satisfy the workload. For those cases, some churn of kmalloc+kfree is to be expected.
Ensure that the 3 users that need clearing put the members they need cleared at the start of the struct, and wrap the rest of the struct in a struct group so the offset is known.
While at it, improve the interaction with KASAN such that when/if KASAN writes to members inside the struct that should be retained over caching, it won't trip over itself. For rw and net, the retaining of the iovec over caching is disabled if KASAN is enabled. A helper will free and clear those members in that case.
Signed-off-by: Jens Axboe <[email protected]>
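
[Editor's sketch] The layout idea, with invented member names: members needing zeroing sit at the front, and clearing stops at a known offset. The kernel names the retained region with struct_group(); plain nesting is used here to stay self-contained:

    #include <stddef.h>
    #include <string.h>

    struct async_data_sketch {
        /* cleared when the object doesn't come from the cache */
        unsigned long bytes_done;
        unsigned int  flags;

        /* retained across caching; iovec kept to avoid kmalloc churn */
        struct {
            void  *iov;
            size_t iov_cap;
        } kept;
    };

    static void init_fresh(struct async_data_sketch *d)
    {
        /* Clear only the leading members; the group's offset is known. */
        memset(d, 0, offsetof(struct async_data_sketch, kept));
    }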

Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4

# 1143be17 | 17-Dec-2024 | Jens Axboe <[email protected]>

io_uring/rw: don't mask in f_iocb_flags
A previous commit changed from overwriting kiocb->ki_flags with ->f_iocb_flags to masking it in. This breaks retry situations, where we don't necessarily want to retain previously set flags, like IOCB_NOWAIT.
The use case needs IOCB_HAS_METADATA to be persistent, but the change makes all flags persistent, which is an issue. Add a request flag to track whether the request has metadata or not, as that is persistent across issues.
Fixes: 59a7d12a7fb5 ("io_uring: introduce attributes for read/write and PI support")
Signed-off-by: Jens Axboe <[email protected]>
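
[Editor's sketch] The one-line difference, with reduced types; IOCB_NOWAIT's value is illustrative:

    #define IOCB_NOWAIT (1u << 7) /* value illustrative */

    struct kiocb_sketch {
        unsigned int ki_flags;
    };

    static void rw_init_flags(struct kiocb_sketch *kiocb, unsigned int f_iocb_flags)
    {
        /* Regression: kiocb->ki_flags |= f_iocb_flags; on a retry this
         * keeps flags set on the first attempt, e.g. IOCB_NOWAIT. */
        kiocb->ki_flags = f_iocb_flags; /* overwrite, restoring old behaviour */
    }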

Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1

# 78fda3d0 | 29-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring/kbuf: use mmap_lock to sync with mmap
A preparation / cleanup patch simplifying the buf ring - mmap synchronisation. Instead of relying on RCU, which is trickier, do it by grabbing the mmap_lock when anyone tries to publish or remove a registered buffer to / from ->io_bl_xa.
Modifications of the xarray should always be protected by both ->uring_lock and ->mmap_lock, while lookups should hold either of them. While a struct io_buffer_list is in the xarray, the mmap related fields like ->flags and ->buf_pages should stay stable.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/af13bde56ee1a26bcaefaa9aad37a9ea318a590e.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
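
[Editor's sketch] The locking rule, with pthread mutexes standing in for ->uring_lock and ->mmap_lock and an array standing in for the xarray: writers take both locks, so readers are safe under either one.

    #include <pthread.h>

    static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t mmap_lock  = PTHREAD_MUTEX_INITIALIZER;
    static void *io_bl_xa[16]; /* stand-in for the buffer-list xarray */

    static void bl_publish(unsigned int idx, void *bl)
    {
        pthread_mutex_lock(&uring_lock);
        pthread_mutex_lock(&mmap_lock); /* writers hold BOTH */
        io_bl_xa[idx] = bl;
        pthread_mutex_unlock(&mmap_lock);
        pthread_mutex_unlock(&uring_lock);
    }

    static void *bl_lookup_from_mmap(unsigned int idx)
    {
        pthread_mutex_lock(&mmap_lock); /* readers hold either one */
        void *bl = io_bl_xa[idx];
        pthread_mutex_unlock(&mmap_lock);
        return bl;
    }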

# 81a4058e | 29-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring: use region api for CQ
Convert internal parts of the CQ/SQ array management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/46fc3c801290d6b1ac16023d78f6b8e685c87fd6.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# 8078486e | 29-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring: use region api for SQ
Convert internal parts of the SQ management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/1fb73ced6b835cb319ab0fe1dc0b2e982a9a5650.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# a730d204 | 29-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring/memmap: flag vmap'ed regions
Add internal flags for struct io_mapped_region. The first flag we need is IO_REGION_F_VMAPPED, that indicates that the pointer has to be unmapped on region destruction. For now all regions are vmap'ed, so it's set unconditionally.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/5a3d8046a038da97c0f8a8c8f1733fa3fc689d31.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# 943d0609 | 29-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring: rename ->resize_lock
->resize_lock is used for resizing rings, but it's a good idea to reuse it in other cases as well. Rename it to mmap_lock, as it protects against races with mmap.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/68f705306f3ac4d2fb999eb80ea1615015ce9f7f.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# 020b40f3 | 17-Dec-2024 | Jens Axboe <[email protected]>

io_uring: make ctx->timeout_lock a raw spinlock
Chase reports that their tester complains about a locking context mismatch:
=============================
[ BUG: Invalid wait context ]
6.13.0-rc1-gf137f14b7ccb-dirty #9 Not tainted
-----------------------------
syz.1.25198/182604 is trying to lock:
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: spin_lock_irq include/linux/spinlock.h:376 [inline]
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: io_match_task_safe io_uring/io_uring.c:218 [inline]
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204
other info that might help us debug this:
context-{5:5}
1 lock held by syz.1.25198/182604:
 #0: ffff88802b7d48c0 (&acct->lock){+.+.}-{2:2}, at: io_acct_cancel_pending_work+0x2d/0x6b0 io_uring/io-wq.c:1049
stack backtrace:
CPU: 0 UID: 0 PID: 182604 Comm: syz.1.25198 Not tainted 6.13.0-rc1-gf137f14b7ccb-dirty #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x82/0xd0 lib/dump_stack.c:120
 print_lock_invalid_wait_context kernel/locking/lockdep.c:4826 [inline]
 check_wait_context kernel/locking/lockdep.c:4898 [inline]
 __lock_acquire+0x883/0x3c80 kernel/locking/lockdep.c:5176
 lock_acquire.part.0+0x11b/0x370 kernel/locking/lockdep.c:5849
 __raw_spin_lock_irq include/linux/spinlock_api_smp.h:119 [inline]
 _raw_spin_lock_irq+0x36/0x50 kernel/locking/spinlock.c:170
 spin_lock_irq include/linux/spinlock.h:376 [inline]
 io_match_task_safe io_uring/io_uring.c:218 [inline]
 io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204
 io_acct_cancel_pending_work+0xb8/0x6b0 io_uring/io-wq.c:1052
 io_wq_cancel_pending_work io_uring/io-wq.c:1074 [inline]
 io_wq_cancel_cb+0xb0/0x390 io_uring/io-wq.c:1112
 io_uring_try_cancel_requests+0x15e/0xd70 io_uring/io_uring.c:3062
 io_uring_cancel_generic+0x6ec/0x8c0 io_uring/io_uring.c:3140
 io_uring_files_cancel include/linux/io_uring.h:20 [inline]
 do_exit+0x494/0x27a0 kernel/exit.c:894
 do_group_exit+0xb3/0x250 kernel/exit.c:1087
 get_signal+0x1d77/0x1ef0 kernel/signal.c:3017
 arch_do_signal_or_restart+0x79/0x5b0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xd8/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
which is because io_uring has ctx->timeout_lock nesting inside the io-wq acct lock, the latter of which is used from inside the scheduler and hence is a raw spinlock, while the former is a "normal" spinlock and can hence sleep on PREEMPT_RT.
Change ctx->timeout_lock to be a raw spinlock to solve this nesting dependency on PREEMPT_RT=y.
Reported-by: chase xd <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
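
[Editor's sketch] The mechanical shape of the fix (kernel-only, not standalone-compilable): the field changes type and the lock/unlock calls move to the raw_ variants, which keep spinning even on PREEMPT_RT:

    #include <linux/spinlock.h>

    struct ctx_sketch {
        raw_spinlock_t timeout_lock; /* was: spinlock_t timeout_lock; */
    };

    static void walk_timeouts(struct ctx_sketch *ctx)
    {
        /* was: spin_lock_irq(&ctx->timeout_lock); */
        raw_spin_lock_irq(&ctx->timeout_lock);
        /* ... scan the timeout list ... */
        raw_spin_unlock_irq(&ctx->timeout_lock);
    }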

# f46b9cdb | 20-Nov-2024 | David Wei <[email protected]>

io_uring: limit local tw done
Instead of eagerly running all available local tw, limit the amount of local tw done to the max of IO_LOCAL_TW_DEFAULT_MAX (20) or wait_nr. The value of 20 is chosen as a reasonable heuristic to allow enough work batching but also keep latency down.
Add a retry_llist that maintains a list of local tw that couldn't be done in time. No synchronisation is needed since it is only modified within the task context.
Signed-off-by: David Wei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
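
[Editor's sketch] The budgeting logic with the list handling elided; IO_LOCAL_TW_DEFAULT_MAX and wait_nr follow the commit text, everything else is illustrative:

    #define IO_LOCAL_TW_DEFAULT_MAX 20

    /* Run at most max(IO_LOCAL_TW_DEFAULT_MAX, wait_nr) items; the
     * remainder stays queued (the kernel parks it on a retry_llist that
     * only this task touches, so no locking is needed). */
    static unsigned int local_tw_budget(unsigned int wait_nr)
    {
        return wait_nr > IO_LOCAL_TW_DEFAULT_MAX
               ? wait_nr : IO_LOCAL_TW_DEFAULT_MAX;
    }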

Revision tags: v6.12

# d617b314 | 15-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring: restore back registered wait arguments
Now that we've got a more generic region registration API, move IORING_ENTER_EXT_ARG_REG over to it and re-enable it.
First, the user has to register a region with the IORING_MEM_REGION_REG_WAIT_ARG flag set. It can only be done for a ring in a disabled state, aka IORING_SETUP_R_DISABLED, to avoid races with already running waiters. With that we should have stable constant values for ctx->cq_wait_{size,arg} in io_get_ext_arg_reg() and hence no READ_ONCE required.
The other API difference is that we're now passing byte offsets instead of indexes. The user _must_ align all offsets / pointers to the native word size; failing to do so may, but does not necessarily, lead to a failure, usually returned as -EFAULT. liburing will hide these details from users.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/81822c1b4ffbe8ad391b4f9ad1564def0d26d990.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
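
[Editor's sketch] The alignment rule in code form; where the kernel performs this check, and the struct it reads, are not shown here, so take this as an illustration only:

    #include <errno.h>
    #include <stdint.h>

    /* Offsets/pointers passed through the registered wait region must be
     * aligned to the native word size; violations typically surface as
     * -EFAULT, though that is not guaranteed. */
    static int check_ext_arg_offset(uint64_t off)
    {
        return (off & (sizeof(long) - 1)) ? -EFAULT : 0;
    }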

# 93238e66 | 15-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring: add memory region registration
Regions will serve multiple purposes. First, with them we can decouple ring/etc. object creation from registration / mapping of the memory they will be placed in. We already have hacks that allow putting both SQ and CQ into the same huge page; in the future we should be able to:
region = create_region(io_ring);
create_pbuf_ring(io_uring, region, offset=0);
create_pbuf_ring(io_uring, region, offset=N);
The second use case is efficiently passing parameters. The following patch re-enables IORING_ENTER_EXT_ARG_REG on top of regions, which optimises wait arguments. It'll also be useful for request arguments, replacing iovecs, msghdr, etc. pointers. Eventually it would be handy for BPF as well, if it comes to fruition.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/0798cf3a14fad19cfc96fc9feca5f3e11481691d.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

# dfbbfbf1 | 15-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring: introduce concept of memory regions
We've got a good number of mappings we share with userspace, including the main rings, provided buffer rings, upcoming rings for zerocopy rx, and more. All of them duplicate user argument parsing and some internal details as well (page pinning, huge page optimisations, mmap'ing, etc.).
Introduce a notion of regions. For userspace, for now, it's just a new structure called struct io_uring_region_desc which is supposed to parameterise all such mapping / queue creations. A region either represents a user-provided chunk of memory, in which case the user_addr field should point to it, or a request for the kernel to allocate the memory, in which case the user would need to mmap it after using the offset returned in the mmap_offset field. With a uniform userspace API we can avoid additional boilerplate code and apply future optimisations to all of them at once.
Internally, there is a new structure struct io_mapped_region holding all relevant runtime information and some helpers to work with it. This patch limits it to user provided regions.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/0e6fe25818dfbaebd1bd90b870a6cac503fe1a24.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
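
[Editor's sketch] The uAPI shape the paragraph describes: either user_addr points at caller-provided memory, or the kernel allocates and the caller mmap()s at the returned mmap_offset. Field order, widths, and reserved space below are assumptions; consult include/uapi/linux/io_uring.h for the real struct io_uring_region_desc:

    #include <stdint.h>

    struct region_desc_sketch {
        uint64_t user_addr;   /* in: user memory, when the flags say so */
        uint64_t size;        /* region size in bytes */
        uint32_t flags;       /* user-provided vs kernel-allocated */
        uint32_t pad;
        uint64_t mmap_offset; /* out: mmap() offset for kernel memory */
        uint64_t resv[4];     /* reserved, must be zero */
    };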

# 83e04152 | 15-Nov-2024 | Pavel Begunkov <[email protected]>

io_uring: temporarily disable registered waits
Disable wait argument registration as it'll be replaced with a more generic feature. We'll still need IORING_ENTER_EXT_ARG_REG parsing in a few commits so leave it be.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/70b1d1d218c41ba77a76d1789c8641dab0b0563e.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>