|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
| #
6f377873 |
| 15-Feb-2025 |
David Wei <[email protected]> |
io_uring/zcrx: add interface queue and refill queue
Add a new object called an interface queue (ifq) that represents a net rx queue that has been configured for zero copy. Each ifq is registered using a new registration opcode IORING_REGISTER_ZCRX_IFQ.
The refill queue is allocated by the kernel and mapped by userspace using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main SQ/CQ. It is used by userspace to return buffers that it is done with, which will then be re-used by the netdev again.
The main CQ ring is used to notify userspace of received data by using the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each entry contains the offset + len to the data.
For now, each io_uring instance only has a single ifq.
Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: David Wei <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
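As a sketch of how userspace might consume these completions (the struct layout and field names below are illustrative, inferred from the description above, not the final uapi; a ring set up with IORING_SETUP_CQE32 is assumed):

/* Hypothetical mirror of the zcrx metadata carried in the upper 16 bytes
 * of a big CQE; the real uapi struct is io_uring_zcrx_cqe. */
struct zcrx_cqe_meta {
	__u64 off;	/* offset of the data in the zero copy area */
	__u64 len;	/* length of the received data (illustrative split) */
};

static void handle_zcrx_cqe(struct io_uring_cqe *cqe)
{
	/* With IORING_SETUP_CQE32, big_cqe[] holds the extra 16 bytes. */
	struct zcrx_cqe_meta *meta = (struct zcrx_cqe_meta *)cqe->big_cqe;

	consume_rx_data(meta->off, meta->len);	/* app-defined consumer */
}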
|
|
Revision tags: v6.14-rc2, v6.14-rc1 |
|
| #
a23ad06b |
| 24-Jan-2025 |
Jens Axboe <[email protected]> |
io_uring/register: use atomic_read/write for sq_flags migration
A previous commit changed all of the migration from the old to the new ring for resizing to use READ/WRITE_ONCE. However, ->sq_flags is an atomic_t, and while most archs won't complain on this, some will indeed flag this:
io_uring/register.c:554:9: sparse: sparse: cast to non-scalar io_uring/register.c:554:9: sparse: sparse: cast from non-scalar
Just use atomic_set/atomic_read for handling this case.
Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: 2c5aae129f42 ("io_uring/register: document io_register_resize_rings() shared mem usage") Signed-off-by: Jens Axboe <[email protected]>
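In sketch form, the change amounts to the following (ring and variable names follow the commit's description, not the exact diff):

/* ->sq_flags is an atomic_t, so sparse rejects READ_ONCE/WRITE_ONCE on it:
 *	WRITE_ONCE(n->sq_flags, READ_ONCE(o->sq_flags));
 * Use the atomic accessors for the migration copy instead: */
atomic_set(&new_rings->sq_flags, atomic_read(&old_rings->sq_flags));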
|
|
Revision tags: v6.13 |
|
| #
bb2d7634 |
| 16-Jan-2025 |
Pavel Begunkov <[email protected]> |
io_uring: clean up io_uring_register_get_file()
Make it always reference the returned file. It's safer, especially with unregistrations happening under it. And it makes the API cleaner, with no conditional cleanups by the caller.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0d0b13a63e8edd6b5d360fc821dcdb035cb6b7e0.1736995897.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
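The resulting calling convention might look like this simplified sketch (do_register_op() is a hypothetical stand-in for the caller's work):

struct file *file;
int ret;

file = io_uring_register_get_file(fd, use_registered);
if (IS_ERR(file))
	return PTR_ERR(file);
ret = do_register_op(file->private_data);
fput(file);	/* always paired: the helper always took a reference */
return ret;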
|
| #
53745105 |
| 15-Jan-2025 |
Josh Triplett <[email protected]> |
io_uring: Factor out a function to parse restrictions
Preparation for subsequent work on inherited restrictions.
Signed-off-by: Josh Triplett <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/9bac2b4d1b9b9ab41c55ea3816021be847f354df.1736932318.git.josh@joshtriplett.org Signed-off-by: Jens Axboe <[email protected]>
|
| #
6f7a644e |
| 15-Jan-2025 |
Jens Axboe <[email protected]> |
io_uring/register: cache old SQ/CQ head reading for copies
The SQ and CQ ring heads are read twice - once for verifying that it's within bounds, and once inside the loops copying SQE and CQE entries. This is technically incorrect, as the values could get modified between being verified and being used in the copy loop. While this won't lead to anything truly nefarious, it may cause longer loop times for the copies than expected.
Read the ring head values once, and use the verified value in the copy loops.
Signed-off-by: Jens Axboe <[email protected]>
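A sketch of the pattern (names illustrative; the real code does this for both the SQ and CQ heads):

/* Read the shared head exactly once, validate it, then reuse the local
 * copy in the copy loop rather than re-reading shared memory. */
unsigned head = READ_ONCE(old_rings->sq.head);

if (tail - head > old_sq_entries)
	return -EOVERFLOW;	/* illustrative sanity check */
while (head != tail) {
	new_sqes[head & new_mask] = old_sqes[head & old_mask];
	head++;
}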
|
| #
2c5aae12 |
| 15-Jan-2025 |
Jens Axboe <[email protected]> |
io_uring/register: document io_register_resize_rings() shared mem usage
It can be a bit hard to tell which parts of io_register_resize_rings() are operating on shared memory, and which ones are not. And anything reading or writing to those regions should really use the read/write once primitives.
Hence add those, ensuring sanity in how this memory is accessed, and helping document the shared nature of it.
Signed-off-by: Jens Axboe <[email protected]>
|
| #
8911798d |
| 15-Jan-2025 |
Jens Axboe <[email protected]> |
io_uring/register: use stable SQ/CQ ring data during resize
Normally the kernel would not expect an application to modify any of the data shared with the kernel during a resize operation, but of course the kernel cannot always assume good intent on behalf of the application.
As part of resizing the rings, existing SQEs and CQEs are copied over to the new storage. Resizing uses the masks in the newly allocated shared storage to index the arrays, however it's possible that malicious userspace could modify these after they have been sanity checked.
Use the validated and locally stored CQ and SQ ring sizing for masking to ensure the values are both stable and valid.
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Reported-by: Jann Horn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
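In sketch form (illustrative names), the fix swaps the shared-memory masks for ones derived from the validated, locally held sizes:

/* Bad: the mask lives in the shared ring and can be rewritten by
 * userspace after it has been sanity checked:
 *	dst[i & n->sq_ring_mask] = src[i & o->sq_ring_mask];
 * Good: derive masks from the validated local sizes (powers of two): */
unsigned new_mask = new_sq_entries - 1, old_mask = old_sq_entries - 1;

dst[i & new_mask] = src[i & old_mask];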
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
81a4058e |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: use region api for CQ
Convert internal parts of the CQ/SQ array management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/46fc3c801290d6b1ac16023d78f6b8e685c87fd6.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
8078486e |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: use region api for SQ
Convert internal parts of the SQ management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/1fb73ced6b835cb319ab0fe1dc0b2e982a9a5650.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
02255d55 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: pass ctx to io_register_free_rings
A preparation patch, pass the context to io_register_free_rings.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/c1865fd2b3d4db22d1a1aac7dd06ea22cb990834.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
087f9978 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: implement mmap for regions
The patch implements mmap for the param region and enables the kernel allocation mode. Internally it uses a fixed mmap offset, however the user has to use the offset returned in struct io_uring_region_desc::mmap_offset.
Note, mmap doesn't and can't take ->uring_lock and the region / ring lookup is protected by ->mmap_lock, and it's directly peeking at ctx->param_region. We can't protect io_create_region() with the mmap_lock as it'd deadlock, which is why io_create_region_mmap_safe() initialises it for us in a temporary variable and then publishes it with the lock taken. It's intentionally decoupled from main region helpers, and in the future we might want to have a list of active regions, which then could be protected by the ->mmap_lock.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0f1212bd6af7fb39b63514b34fae8948014221d1.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
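From userspace, the flow described here might look like the following sketch (kernel-allocated region; error handling elided, and the registration step only referenced):

/* After registration, the kernel has filled rd.mmap_offset; map through
 * it rather than assuming any fixed offset. */
struct io_uring_region_desc rd = {
	.size = 4096,	/* kernel allocation: no user_addr / TYPE_USER flag */
};
/* ... register the region with the ring (see the region commits below) ... */
void *p = mmap(NULL, rd.size, PROT_READ | PROT_WRITE,
	       MAP_SHARED | MAP_POPULATE, ring_fd, rd.mmap_offset);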
|
| #
1e21df69 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: implement kernel allocated regions
Allow the kernel to allocate memory for a region. That's the classical way SQ/CQ are allocated. It's not yet useful to user space as there is no way to mmap it, which is why it's explicitly disabled in io_register_mem_region().
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/7b8c40e6542546bbf93f4842a9a42a7373b81e0d.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
943d0609 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: rename ->resize_lock
->resize_lock is used for resizing rings, but it's a good idea to reuse it in other cases as well. Rename it to mmap_lock, as it protects against races with mmap.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/68f705306f3ac4d2fb999eb80ea1615015ce9f7f.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
c261e4f1 |
| 19-Dec-2024 |
Jens Axboe <[email protected]> |
io_uring/register: limit ring resizing to DEFER_TASKRUN
With DEFER_TASKRUN, we know the ring can't be both waited upon and resized at the same time. This is important for CQ resizing. Allowing SQ ring resizing is more trivial, but isn't the interesting use case. Hence limit ring resizing in general to DEFER_TASKRUN only for now. This isn't a huge problem as CQ ring resizing is generally the most useful on networking type of workloads where it can be hard to size the ring appropriately upfront, and those should be using DEFER_TASKRUN for better performance.
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Signed-off-by: Jens Axboe <[email protected]>
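The gate itself reduces to a setup-flag check, roughly:

/* Sketch: reject ring resize unless the ring was created with
 * IORING_SETUP_DEFER_TASKRUN, per the reasoning above. */
if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
	return -EINVAL;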
|
| #
e358e09a |
| 18-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: protect register tracing
Syz reports:
BUG: KCSAN: data-race in __se_sys_io_uring_register / io_sqe_files_register
read-write to 0xffff8881021940b8 of 4 bytes by task 5923 on cpu 1:
 io_sqe_files_register+0x2c4/0x3b0 io_uring/rsrc.c:713
 __io_uring_register io_uring/register.c:403 [inline]
 __do_sys_io_uring_register io_uring/register.c:611 [inline]
 __se_sys_io_uring_register+0x8d0/0x1280 io_uring/register.c:591
 __x64_sys_io_uring_register+0x55/0x70 io_uring/register.c:591
 x64_sys_call+0x202/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:428
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff8881021940b8 of 4 bytes by task 5924 on cpu 0:
 __do_sys_io_uring_register io_uring/register.c:613 [inline]
 __se_sys_io_uring_register+0xe4a/0x1280 io_uring/register.c:591
 __x64_sys_io_uring_register+0x55/0x70 io_uring/register.c:591
 x64_sys_call+0x202/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:428
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
This should be due to reading the table size after unlock. We don't care much, as it's only printed in the trace, but we might as well do the read under the lock.
Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/8233af2886a37b57f79e444e3db88fcfda1817ac.1731942203.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12 |
|
| #
d617b314 |
| 15-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: restore back registered wait arguments
Now we've got a more generic region registration API, place IORING_ENTER_EXT_ARG_REG and re-enable it.
First, the user has to register a region with the IORING_MEM_REGION_REG_WAIT_ARG flag set. It can only be done for a ring in a disabled state, aka IORING_SETUP_R_DISABLED, to avoid races with already running waiters. With that we should have stable constant values for ctx->cq_wait_{size,arg} in io_get_ext_arg_reg() and hence no READ_ONCE required.
The other API difference is that we're now passing byte offsets instead of indexes. The user _must_ align all offsets / pointers to the native word size; failing to do so might, but does not necessarily, lead to a failure, usually returned as -EFAULT. liburing will hide these details from users.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/81822c1b4ffbe8ad391b4f9ad1564def0d26d990.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
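A userspace sketch of this registration (struct and flag names per the region uapi introduced in the commits below; wait_args_buf is an assumed page-aligned, word-aligned allocation):

struct io_uring_region_desc rd = {
	.user_addr = (__u64)(uintptr_t)wait_args_buf,
	.size      = 4096,
	.flags     = IORING_MEM_REGION_TYPE_USER,
};
struct io_uring_mem_region_reg mr = {
	.region_uptr = (__u64)(uintptr_t)&rd,
	.flags       = IORING_MEM_REGION_REG_WAIT_ARG,
};
/* Ring must still be disabled (IORING_SETUP_R_DISABLED) at this point. */
ret = io_uring_register(ring_fd, IORING_REGISTER_MEM_REGION, &mr, 1);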
|
| #
93238e66 |
| 15-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: add memory region registration
Regions will serve multiple purposes. First, with them we can decouple ring/etc. object creation from registration / mapping of the memory they will be placed in. We already have hacks that allow putting both SQ and CQ into the same huge page; in the future we should be able to:
region = create_region(io_ring);
create_pbuf_ring(io_uring, region, offset=0);
create_pbuf_ring(io_uring, region, offset=N);
The second use case is efficiently passing parameters. The following patch enables back on top of regions IORING_ENTER_EXT_ARG_REG, which optimises wait arguments. It'll also be useful for request arguments replacing iovecs, msghdr, etc. pointers. Eventually it would also be handy for BPF as well if it comes to fruition.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0798cf3a14fad19cfc96fc9feca5f3e11481691d.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
83e04152 |
| 15-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: temporarily disable registered waits
Disable wait argument registration as it'll be replaced with a more generic feature. We'll still need IORING_ENTER_EXT_ARG_REG parsing in a few commits so leave it be.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/70b1d1d218c41ba77a76d1789c8641dab0b0563e.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
3597f278 |
| 26-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring/rsrc: unify file and buffer resource tables
For files, there's nr_user_files/file_table/file_data, and buffers have nr_user_bufs/user_bufs/buf_data. There's no reason why file_table and file_data can't be the same thing, and ditto for the buffer side. That gets rid of more io_ring_ctx state that's in two spots rather than just being in one spot, as it should be. Put all the registered file data in one location, and ditto on the buffer front.
This also avoids having both io_rsrc_data->nodes being an allocated array, and ->user_bufs[] or ->file_table.nodes. There's no reason to have this information duplicated. Keep it in one spot, io_rsrc_data, along with how many resources are available.
Signed-off-by: Jens Axboe <[email protected]>
|
| #
aa00f67a |
| 22-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring: add support for fixed wait regions
Generally applications have one or a few ways of waiting, yet they pass in a struct io_uring_getevents_arg every time. This needs to get copied and, in turn, the timeout value needs to get copied.
Rather than do this for every invocation, allow the application to register a fixed set of wait regions that can simply be indexed when asking the kernel to wait on events.
At ring setup time, the application can register a number of these wait regions and initialize region/index 0 upfront:
struct io_uring_reg_wait *reg;
reg = io_uring_setup_reg_wait(ring, nr_regions, &ret);
/* set timeout and mark as set, sigmask/sigmask_sz as needed */
reg->ts.tv_sec = 0;
reg->ts.tv_nsec = 100000;
reg->flags = IORING_REG_WAIT_TS;
where nr_regions >= 1 && nr_regions <= PAGE_SIZE / sizeof(*reg). The above initializes index 0, but 63 other regions can be initialized, if needed. Now, instead of doing:
struct __kernel_timespec timeout = { .tv_nsec = 100000, };
io_uring_submit_and_wait_timeout(ring, &cqe, nr, &timeout, NULL);
to wait for events for each submit_and_wait, or just wait, operation, it can just reference the above region at offset 0 and do:
io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);
to achieve the same goal of waiting 100usec without needing to copy both struct io_uring_getevents_arg (24b) and struct __kernel_timespec (16b) for each invocation. Struct io_uring_reg_wait looks as follows:
struct io_uring_reg_wait {
	struct __kernel_timespec	ts;
	__u32				min_wait_usec;
	__u32				flags;
	__u64				sigmask;
	__u32				sigmask_sz;
	__u32				pad[3];
	__u64				pad2[2];
};
embedding the timeout itself in the region, rather than passing it as a pointer as well. Note that the signal mask is still passed as a pointer, both for compatibility reasons and because there doesn't seem to be a lot of high frequency wait scenarios that involve setting and resetting the signal mask for each wait.
The application is free to modify any region before a wait call, or it can keep multiple regions with different settings to avoid needing to modify the same one for wait calls. Up to a page size of regions is mapped by default, allowing PAGE_SIZE / 64 available regions for use.
The registered region must fit within a page. On a 4kb page size system, that allows for 64 wait regions if a full page is used, as the size of struct io_uring_reg_wait is 64b. The region registered must be aligned to io_uring_reg_wait in size. It's valid to register less than 64 entries.
In network performance testing with zero-copy, this reduced the time spent waiting on the TX side from 3.12% to 0.3% and the RX side from 4.4% to 0.3%.
Wait regions are fixed for the lifetime of the ring - once registered, they are persistent until the ring is torn down. The regions support minimum wait timeout as well as the regular waits.
Signed-off-by: Jens Axboe <[email protected]>
|
| #
79cfe9e5 |
| 21-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring/register: add IORING_REGISTER_RESIZE_RINGS
Once a ring has been created, the size of the CQ and SQ rings are fixed. Usually this isn't a problem on the SQ ring side, as it merely controls the available number of requests that can be submitted in a single system call, and there's rarely a need to change that.
For the CQ ring, it's a different story. For most efficient use of io_uring, it's important that the CQ ring never overflows. This means that applications must size it for the worst case scenario, which can be wasteful.
Add IORING_REGISTER_RESIZE_RINGS, which allows an application to resize the existing rings. It takes a struct io_uring_params argument, the same one which is used to setup the ring initially, and resizes rings according to the sizes given.
Certain properties are always inherited from the original ring setup, like SQE128/CQE32 and other setup options. The implementation only allows flags associated with how the CQ ring is sized and clamped.
Existing unconsumed SQE and CQE entries are copied as part of the process. If either the SQ or CQ resized destination ring cannot hold the entries already present in the source rings, then the operation is failed with -EOVERFLOW. Any register op holds ->uring_lock, which prevents new submissions, and the internal mapping holds the completion lock as well across moving CQ ring state.
To prevent races between mmap and ring resizing, add a mutex that's solely used to serialize ring resize and mmap. mmap_sem can't be used here, as a fork'ed process may be doing mmaps on the ring as well. The ctx->resize_lock is held across mmap operations, and the resize will grab it before swapping out the already mapped new data.
Signed-off-by: Jens Axboe <[email protected]>
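From userspace, a resize per this description might look like the following sketch (only the sizing-related fields of the params struct matter here):

/* Same struct io_uring_params as at setup time; here we grow the CQ ring. */
struct io_uring_params p = {
	.sq_entries = 32,
	.cq_entries = 4096,
};
ret = io_uring_register(ring_fd, IORING_REGISTER_RESIZE_RINGS, &p, 1);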
|
|
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
a3771321 |
| 24-Sep-2024 |
Jens Axboe <[email protected]> |
io_uring/msg_ring: add support for sending a sync message
Normally MSG_RING requires both a source and a destination ring. But some users don't always have a ring available to send a message from, yet they still need to notify a target ring.
Add support for using io_uring_register(2) without having a source ring, using a file descriptor of -1 for that. Internally those are called blind registration opcodes. Implement IORING_REGISTER_SEND_MSG_RING as a blind opcode, which simply takes an sqe that the application can put on the stack and use the normal liburing helpers to initialize it. Then the app can call:
io_uring_register(-1, IORING_REGISTER_SEND_MSG_RING, &sqe, 1);
and get the same behavior in terms of the target, where a CQE is posted with the details given in the sqe.
For now this takes a single sqe pointer argument, and hence arg must be set to that, and nr_args must be 1. Could easily be extended to take an array of sqes, but for now let's keep it simple.
Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
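For illustration, preparing the on-stack SQE with the usual liburing helper before the blind-register call shown above (a sketch; target_ring_fd and the res/user_data values are placeholders):

struct io_uring_sqe sqe;

memset(&sqe, 0, sizeof(sqe));
/* target ring fd, CQE res value, CQE user_data -- as for normal MSG_RING */
io_uring_prep_msg_ring(&sqe, target_ring_fd, 0, 0x1234, 0);
io_uring_register(-1, IORING_REGISTER_SEND_MSG_RING, &sqe, 1);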
|
| #
2f6a55e4 |
| 16-Sep-2024 |
Dan Carpenter <[email protected]> |
io_uring: clean up a type in io_uring_register_get_file()
Originally "fd" was unsigned int but it was changed to int when we pulled this code into a separate function in commit 0b6d253e084a ("io_uri
io_uring: clean up a type in io_uring_register_get_file()
Originally "fd" was unsigned int but it was changed to int when we pulled this code into a separate function in commit 0b6d253e084a ("io_uring/register: provide helper to get io_ring_ctx from 'fd'"). This doesn't really cause a runtime problem because the call to array_index_nospec() will clamp negative fds to 0 and nothing else uses the negative values.
Signed-off-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.11 |
|
| #
636119af |
| 14-Sep-2024 |
Jens Axboe <[email protected]> |
io_uring: rename "copy buffers" to "clone buffers"
A recent commit added support for copying registered buffers from one ring to another. But that term is a bit confusing, as no copying of buffer data is done here. What is being done is simply cloning the buffer registrations from one ring to another.
Rename it while we still can, so that it's more descriptive. No functional changes in this patch.
Fixes: 7cc2a6eadcd7 ("io_uring: add IORING_REGISTER_COPY_BUFFERS method") Signed-off-by: Jens Axboe <[email protected]>
|
| #
7cc2a6ea |
| 11-Sep-2024 |
Jens Axboe <[email protected]> |
io_uring: add IORING_REGISTER_COPY_BUFFERS method
Buffers can get registered with io_uring, which allows skipping the repeated pin_pages and unpin/unref of pages for each O_DIRECT operation. This reduces the overhead of O_DIRECT IO.
However, registering buffers can take some time. Normally this isn't an issue as it's done at initialization time (and hence less critical), but for cases where rings can be created and destroyed as part of an IO thread pool, registering the same buffers for multiple rings becomes a more time sensitive proposition. As an example, let's say an application has an IO memory pool of 500G. Initial registration takes:
Got 500 huge pages (each 1024MB) Registered 500 pages in 409 msec
or about 0.4 seconds. If we go higher to 900 1GB huge pages being registered:
Registered 900 pages in 738 msec
which is, as expected, a fully linear scaling.
Rather than have each ring pin/map/register the same buffer pool, provide an io_uring_register(2) opcode to simply duplicate the buffers that are registered with another ring. Adding the same 900GB of registered buffers to the target ring can then be accomplished in:
Copied 900 pages in 17 usec
While timing differs a bit, this provides around a 25,000-40,000x speedup for this use case.
Signed-off-by: Jens Axboe <[email protected]>
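A sketch of the registration call described above (the struct layout is as best recalled from the uapi this commit adds, so treat the field names as assumptions; the opcode was later renamed to clone, see 636119af above):

/* Duplicate the buffers registered with src_ring_fd into this ring. */
struct io_uring_copy_buffers buf = {
	.src_fd = src_ring_fd,
};
ret = io_uring_register(dst_ring_fd, IORING_REGISTER_COPY_BUFFERS, &buf, 1);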
|