|
Revision tags: v6.15, v6.15-rc7 |
|
| #
f446c631 |
| 12-May-2025 |
Jens Axboe <[email protected]> |
io_uring/memmap: don't use page_address() on a highmem page
For older/32-bit systems with highmem, don't assume that the pages in a mapped region are always going to be mapped. If io_region_init_ptr() finds that the pages are coalescable, also check whether the first page is a highmem page. If it is, fall through to the usual vmap() mapping rather than attempting to get the unmapped page address.
Cc: [email protected] Fixes: c4d0ac1c1567 ("io_uring/memmap: optimise single folio regions") Link: https://lore.kernel.org/all/[email protected]/ Reported-by: [email protected] Link: https://lore.kernel.org/all/[email protected]/ Reported-by: [email protected] Tested-by: [email protected] Signed-off-by: Jens Axboe <[email protected]>
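For illustration, a minimal sketch of the guarded fast path; PageHighMem(), page_address() and vmap() are the real kernel primitives involved, while the function shape and the single_folio flag (standing in for the coalescing check) are paraphrased from the commit message:

    /* sketch: only take the page_address() shortcut for lowmem pages */
    static void *io_region_ptr_sketch(struct page **pages,
                                      unsigned int nr_pages, bool single_folio)
    {
            /*
             * A highmem page has no permanent kernel mapping, so
             * page_address() can't be relied on; fall through to vmap().
             */
            if (single_folio && !PageHighMem(pages[0]))
                    return page_address(pages[0]);

            return vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
    }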
|
|
Revision tags: v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4 |
|
| #
92ade52f |
| 21-Feb-2025 |
Bui Quang Minh <[email protected]> |
io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
Allow the user to mmap the kernel-allocated zerocopy-rx refill queue.
Signed-off-by: Bui Quang Minh <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Reviewed-by: Li Zetao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
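The shape of the fix, sketched: one more case in the mmap-offset dispatch. IORING_OFF_MMAP_MASK and IORING_MAP_OFF_ZCRX_REGION are real uapi constants; the surrounding lookup and the io_zcrx_get_region() helper are hypothetical here:

    /* sketch: offset -> mapped-region dispatch gaining the zcrx case */
    loff_t offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT;

    switch (offset & IORING_OFF_MMAP_MASK) {
    case IORING_MAP_OFF_PARAM_REGION:
            return &ctx->param_region;
    case IORING_MAP_OFF_ZCRX_REGION:
            /* previously missing: the zerocopy-rx refill queue */
            return io_zcrx_get_region(ctx, offset); /* hypothetical helper */
    default:
            return NULL;
    }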
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
7cd7b957 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: unify io_uring mmap'ing code
All mapped memory is now backed by regions, so we can unify and clean up io_region_validate_mmap() and io_uring_mmap(). Extract a function that looks up a region; the rest of the handling is generic and only needs the region.
One piece of ring-type-specific code remains, namely the mmap size truncation quirk for IORING_OFF_[S,C]Q_RING, which is left as is.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/f5e1eda1562bfd34276de07465525ae5f10e1e84.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
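Roughly, the unified flow reads like the sketch below; io_mmap_get_region() and io_region_mmap() are assumed names for the extracted lookup and the generic handler:

    /* sketch: one region lookup, then fully generic handling */
    static __cold int io_uring_mmap_sketch(struct file *file,
                                           struct vm_area_struct *vma)
    {
            struct io_ring_ctx *ctx = file->private_data;
            struct io_mapped_region *region;

            guard(mutex)(&ctx->mmap_lock);
            region = io_mmap_get_region(ctx, vma->vm_pgoff);
            if (!region)
                    return -EINVAL;
            /* nothing past this point is ring-type specific */
            return io_region_mmap(ctx, region, vma);
    }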
|
| #
ef62de3c |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/kbuf: use region api for pbuf rings
Convert internal parts of the provided buffer ring management to the region API. It's the last non-region mapped ring we have, so it also kills a bunch of now unused memmap.c helpers.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/6c40cf7beaa648558acd4d84bc0fb3279a35d74b.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
90175f3f |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/kbuf: remove pbuf ring refcounting
struct io_buffer_list refcounting was needed for RCU-based sync with mmap; now we can kill it.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/4a9cc54bf0077bb2bf2f3daf917549ddd41080da.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
81a4058e |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: use region api for CQ
Convert internal parts of the CQ/SQ array management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/46fc3c801290d6b1ac16023d78f6b8e685c87fd6.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
8078486e |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: use region api for SQ
Convert internal parts of the SQ management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/1fb73ced6b835cb319ab0fe1dc0b2e982a9a5650.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
087f9978 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: implement mmap for regions
The patch implements mmap for the param region and enables the kernel allocation mode. Internally it uses a fixed mmap offset, however the user has to use the offset returned in struct io_uring_region_desc::mmap_offset.
Note that mmap doesn't and can't take ->uring_lock; the region / ring lookup is protected by ->mmap_lock instead, and it directly peeks at ctx->param_region. We can't protect io_create_region() with the mmap_lock as that would deadlock, which is why io_create_region_mmap_safe() initialises the region for us in a temporary variable and then publishes it with the lock taken. It's intentionally decoupled from the main region helpers, and in the future we might want to have a list of active regions, which could then be protected by the ->mmap_lock.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0f1212bd6af7fb39b63514b34fae8948014221d1.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
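A minimal sketch of the init-then-publish pattern described above, using the names from the commit message (exact signatures may differ):

    /* sketch: init into a temporary without mmap_lock, publish with it held */
    int io_create_region_mmap_safe(struct io_ring_ctx *ctx,
                                   struct io_mapped_region *mr,
                                   struct io_uring_region_desc *reg)
    {
            struct io_mapped_region tmp_mr;
            int ret;

            memcpy(&tmp_mr, mr, sizeof(tmp_mr));
            ret = io_create_region(ctx, &tmp_mr, reg); /* may sleep, no lock */
            if (ret)
                    return ret;

            /* publish atomically w.r.t. concurrent mmap lookups */
            guard(mutex)(&ctx->mmap_lock);
            memcpy(mr, &tmp_mr, sizeof(tmp_mr));
            return 0;
    }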
|
| #
1e21df69 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: implement kernel allocated regions
Allow the kernel to allocate memory for a region. That's the classical way SQ/CQ are allocated. It's not yet useful to user space as there is no way to mmap it, which is why it's explicitly disabled in io_register_mem_region().
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/7b8c40e6542546bbf93f4842a9a42a7373b81e0d.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
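What "kernel allocation mode" plausibly amounts to, sketched (assumed shape; mr->pages is presumed already sized, and the real code may also have a page-by-page fallback):

    /* sketch: kernel-owned backing via one compound allocation */
    static int io_region_alloc_pages_sketch(struct io_mapped_region *mr,
                                            unsigned int nr_pages)
    {
            gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
            struct page *page;
            unsigned int i;

            /* one physically contiguous compound allocation */
            page = alloc_pages(gfp | __GFP_COMP,
                               get_order((unsigned long)nr_pages << PAGE_SHIFT));
            if (!page)
                    return -ENOMEM;

            for (i = 0; i < nr_pages; i++)
                    mr->pages[i] = page + i;
            return 0;
    }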
|
| #
4b851d20 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: add IO_REGION_F_SINGLE_REF
Kernel allocated compound pages will have just one reference for the entire page array; add a flag telling io_free_region() about that.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/a7abfa7535e9728d5fcade29a1ea1605ec2c04ce.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
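Sketch of the matching free-side logic, combining the flags introduced in this series (assumed; the real io_free_region() may differ in detail):

    /* sketch: drop page references according to how the region was built */
    static void io_region_put_pages_sketch(struct io_mapped_region *mr)
    {
            if (mr->flags & IO_REGION_F_SINGLE_REF) {
                    /* kernel compound allocation: one ref for the array */
                    put_page(mr->pages[0]);
            } else if (mr->flags & IO_REGION_F_USER_PROVIDED) {
                    /* pinned user memory: drop the pin on each page */
                    unpin_user_pages(mr->pages, mr->nr_pages);
            } else {
                    unsigned int i;

                    for (i = 0; i < mr->nr_pages; i++)
                            put_page(mr->pages[i]);
            }
    }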
|
| #
a90558b3 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: helper for pinning region pages
In preparation for adding kernel allocated regions, extract a new helper that pins user pages.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/a17d7c39c3de4266b66b75b2dcf768150e1fc618.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
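The extracted helper plausibly looks like this; io_pin_pages() is the existing io_uring helper, the rest of the shape is assumed:

    /* sketch: pin the user-provided backing of a region */
    static int io_region_pin_pages_sketch(struct io_mapped_region *mr,
                                          struct io_uring_region_desc *reg)
    {
            struct page **pages;
            int nr_pages;

            pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
            if (IS_ERR(pages))
                    return PTR_ERR(pages);

            mr->pages = pages;
            mr->nr_pages = nr_pages;
            mr->flags |= IO_REGION_F_USER_PROVIDED;
            return 0;
    }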
|
| #
c4d0ac1c |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: optimise single folio regions
We don't need to vmap if memory is already physically contiguous. There are two important cases it covers: PAGE_SIZE regions and huge pages. Use io_check_coalesce_buffer() to get the number of contiguous folios.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/d5240af23064a824c29d14d2406f1ae764bf4505.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
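Sketched as a fragment of io_region_init_ptr(); io_check_coalesce_buffer() and struct io_imu_folio_data exist in io_uring, though the exact usage here is paraphrased:

    /* sketch: skip vmap when the pages form one physically contiguous folio */
    struct io_imu_folio_data ifd;

    if (io_check_coalesce_buffer(mr->pages, mr->nr_pages, &ifd) &&
        ifd.nr_folios == 1) {
            /* the newer fix above adds a highmem guard here */
            mr->ptr = page_address(mr->pages[0]);
            return 0;
    }
    mr->ptr = vmap(mr->pages, mr->nr_pages, VM_MAP, PAGE_KERNEL);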
|
| #
226ae1b4 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: reuse io_free_region for failure path
Regions are going to become more complex with allocation options and optimisations, so I want to split initialisation into steps, and for that it needs a sane failure path. Reuse io_free_region(): it's smart enough to undo only what's needed and leaves the structure in a consistent state.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/b853b4ec407cc80d033d021bdd2c14e22378fc78.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
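In other words, every setup step can share a single undo label (a fragment; the pinning helper follows the sketch above, io_region_init_ptr() is the real function named elsewhere in this log):

    ret = io_region_pin_pages_sketch(mr, reg);
    if (ret)
            goto out_free;
    ret = io_region_init_ptr(mr);
    if (ret)
            goto out_free;
    return 0;
out_free:
    /* undoes only what was actually set up; leaves *mr consistent */
    io_free_region(ctx, mr);
    return ret;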
|
| #
fc5f22a6 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: account memory before pinning
Move memory accounting before page pinning. It shouldn't even try to pin pages if it's not allowed, and accounting is also relatively inexpensive. It also gives a better code structure, as we do generic accounting and can then branch for different mapping types.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/1e242b8038411a222e8b269d35e021fa5015289f.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
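The reordering, sketched; io_account_mem() is the existing io_uring accounting helper, IORING_MEM_REGION_TYPE_USER the real uapi flag, and the two branch helpers reuse the sketches above:

    /* sketch: charge memory first, only then pin or allocate pages */
    ret = io_account_mem(ctx, nr_pages);
    if (ret)
            return ret;

    if (reg->flags & IORING_MEM_REGION_TYPE_USER)
            ret = io_region_pin_pages_sketch(mr, reg);
    else
            ret = io_region_alloc_pages_sketch(mr, nr_pages);
    if (ret)
            goto out_free;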
|
| #
16375af3 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: flag regions with user pages
In preparation for kernel allocated regions, add a flag telling whether the region contains user-pinned pages.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0dc91564642654405bab080b7ec911cb4a43ec6e.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
a730d204 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/memmap: flag vmap'ed regions
Add internal flags for struct io_mapped_region. The first flag we need is IO_REGION_F_VMAPPED, which indicates that the pointer has to be unmapped on region destruction. For now all regions are vmap'ed, so it's set unconditionally.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/5a3d8046a038da97c0f8a8c8f1733fa3fc689d31.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
943d0609 |
| 29-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: rename ->resize_lock
->resize_lock is used for resizing rings, but it's a good idea to reuse it in other cases as well. Rename it to mmap_lock, as it protects against races with mmap.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/68f705306f3ac4d2fb999eb80ea1615015ce9f7f.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
| #
43eef70e |
| 25-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: fix corner case forgetting to vunmap
io_pages_unmap() is a bit tricky in trying to figure out whether the pages were previously vmap'ed or not. In particular, if there is just one page it believes there is no need to vunmap. Its pair, io_pages_map(), however, could've failed io_mem_alloc_compound() and attempted io_mem_alloc_single(), which does vmap, and that leads to an unpaired vmap.
The solution is to fail if io_mem_alloc_compound() can't allocate a single page. That's the easiest way to deal with it, and those two functions are getting removed soon, so no need to overcomplicate it.
Cc: [email protected] Fixes: 3ab1db3c6039e ("io_uring: get rid of remap_pfn_range() for mapping rings/sqes") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/477e75a3907a2fe83249e49c0a92cd480b2c60e0.1732569842.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
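Sketched, the fix makes a compound-allocation failure fatal for the single-page case rather than falling back to the vmap-based single-page path (function names come from the commit message; signatures are assumed):

    /* sketch: keep map/unmap assumptions paired for nr_pages == 1 */
    ptr = io_mem_alloc_compound(pages, nr_pages, size, gfp);
    if (!IS_ERR(ptr))
            return ptr;
    if (nr_pages == 1)
            return ptr;     /* propagate the error, never vmap one page */
    return io_mem_alloc_single(pages, nr_pages, size, gfp); /* vmaps */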
|
| #
0c0a4eae |
| 26-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: check for overflows in io_pin_pages
WARNING: CPU: 0 PID: 5834 at io_uring/memmap.c:144 io_pin_pages+0x149/0x180 io_uring/memmap.c:144
CPU: 0 UID: 0 PID: 5834 Comm: syz-executor825 Not tainted 6.12.0-next-20241118-syzkaller #0
Call Trace:
 <TASK>
 __io_uaddr_map+0xfb/0x2d0 io_uring/memmap.c:183
 io_rings_map io_uring/io_uring.c:2611 [inline]
 io_allocate_scq_urings+0x1c0/0x650 io_uring/io_uring.c:3470
 io_uring_create+0x5b5/0xc00 io_uring/io_uring.c:3692
 io_uring_setup io_uring/io_uring.c:3781 [inline]
 ...
 </TASK>
io_pin_pages()'s uaddr parameter came directly from the user and can be garbage. Don't just add size to it as it can overflow.
Cc: [email protected] Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/1b7520ddb168e1d537d64be47414a0629d0d8f8f.1732581026.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
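The pattern of the fix, sketched with the real check_add_overflow() helper (the surrounding computation is assumed):

    /* sketch: validate uaddr + len without risking wraparound */
    unsigned long end;

    if (check_add_overflow(uaddr, len, &end) ||
        check_add_overflow(end, PAGE_SIZE - 1, &end))
            return ERR_PTR(-EOVERFLOW);

    nr_pages = (end >> PAGE_SHIFT) - (uaddr >> PAGE_SHIFT);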
|
| #
2ae6bdb1 |
| 20-Nov-2024 |
Dan Carpenter <[email protected]> |
io_uring/region: return negative -E2BIG in io_create_region()
This code accidentally returns positive E2BIG instead of negative -E2BIG. The callers treat negatives and positives the same, so this doesn't affect the kernel, but the error code is returned to userspace via the system call.
Fixes: dfbbfbf19187 ("io_uring: introduce concept of memory regions") Signed-off-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12 |
|
| #
a6529588 |
| 17-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring/region: fix error codes after failed vmap
io_create_region() jumps after a vmap failure without setting the return code; it could be 0 or just uninitialised.
Fixes: dfbbfbf191878 ("io_uring: introduce concept of memory regions") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0abac19dbf81c061cffaa9534a2471ed5460ad3e.1731803848.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
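The fix amounts to setting the error before the jump (a sketch):

    mr->ptr = vmap(mr->pages, mr->nr_pages, VM_MAP, PAGE_KERNEL);
    if (!mr->ptr) {
            ret = -ENOMEM;  /* previously left 0 or uninitialised */
            goto out_free;
    }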
|
| #
dfbbfbf1 |
| 15-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: introduce concept of memory regions
We've got a good number of mappings we share with userspace, which includes the main rings, provided buffer rings, upcoming rings for zerocopy rx, and more. All of them duplicate user argument parsing and some internal details as well (page pinning, huge page optimisations, mmap'ing, etc.)
Introduce a notion of regions. For userspace, for now, it's just a new structure called struct io_uring_region_desc which is supposed to parameterise all such mapping / queue creations. A region either represents a user provided chunk of memory, in which case the user_addr field should point to it, or a request for the kernel to allocate the memory, in which case the user would need to mmap it afterwards using the offset returned in the mmap_offset field. With a uniform userspace API we can avoid additional boilerplate code and apply future optimisations to all of them at once.
Internally, there is a new structure struct io_mapped_region holding all relevant runtime information and some helpers to work with it. This patch limits it to user provided regions.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0e6fe25818dfbaebd1bd90b870a6cac503fe1a24.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
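For reference, the uapi structure reads essentially as below (reproduced from memory of include/uapi/linux/io_uring.h around this series; treat field details as approximate):

    struct io_uring_region_desc {
            __u64 user_addr;    /* in: user memory, for user-provided regions */
            __u64 size;         /* in: region size in bytes */
            __u32 flags;        /* in: e.g. IORING_MEM_REGION_TYPE_USER */
            __u32 id;
            __u64 mmap_offset;  /* out: where to mmap a kernel allocation */
            __u64 __resv[4];
    };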
|
| #
68685fa2 |
| 15-Nov-2024 |
Pavel Begunkov <[email protected]> |
io_uring: fortify io_pin_pages with a warning
We're a bit too frivolous with the types of nr_pages arguments, converting it to long and back to int, passing an unsigned int pointer as an int pointer and so on. It shouldn't cause any problems, but it should be carefully reviewed; until then, let's add a WARN_ON_ONCE check to be more confident that callers don't pass poorly checked arguments.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/d48e0c097cbd90fb47acaddb6c247596510d8cfc.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
79cfe9e5 |
| 21-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring/register: add IORING_REGISTER_RESIZE_RINGS
Once a ring has been created, the size of the CQ and SQ rings are fixed. Usually this isn't a problem on the SQ ring side, as it merely controls the available number of requests that can be submitted in a single system call, and there's rarely a need to change that.
For the CQ ring, it's a different story. For most efficient use of io_uring, it's important that the CQ ring never overflows. This means that applications must size it for the worst case scenario, which can be wasteful.
Add IORING_REGISTER_RESIZE_RINGS, which allows an application to resize the existing rings. It takes a struct io_uring_params argument, the same one which is used to setup the ring initially, and resizes rings according to the sizes given.
Certain properties are always inherited from the original ring setup, like SQE128/CQE32 and other setup options. The implementation only allows flags associated with how the CQ ring is sized and clamped.
Existing unconsumed SQE and CQE entries are copied as part of the process. If either the SQ or CQ resized destination ring cannot hold the entries already present in the source rings, then the operation is failed with -EOVERFLOW. Any register op holds ->uring_lock, which prevents new submissions, and the internal mapping holds the completion lock as well across moving CQ ring state.
To prevent races between mmap and ring resizing, add a mutex that's solely used to serialize ring resize and mmap. mmap_sem can't be used here, as a fork'ed process may be doing mmaps on the ring as well. The ctx->resize_lock is held across mmap operations, and the resize will grab it before swapping out the already mapped new data.
Signed-off-by: Jens Axboe <[email protected]>
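From userspace, usage is essentially the following sketch with the raw register syscall (ring_fd is an existing io_uring fd; liburing later added a wrapper for this):

    /* sketch: grow the CQ ring of a live io_uring instance */
    struct io_uring_params p = {
            .sq_entries = 32,
            .cq_entries = 4096,
            .flags      = IORING_SETUP_CQSIZE,
    };
    int ret = syscall(__NR_io_uring_register, ring_fd,
                      IORING_REGISTER_RESIZE_RINGS, &p, 1);
    if (ret < 0)
            perror("io_uring_register(RESIZE_RINGS)");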
|
| #
d090bffa |
| 24-Oct-2024 |
Jens Axboe <[email protected]> |
io_uring/memmap: explicitly return -EFAULT for mmap on NULL rings
The later mapping will actually check this too, but in terms of code clarity, explicitly check whether the rings and sqes are valid during validation. That makes it explicit that if they are non-NULL, they are valid and can get mapped.
Signed-off-by: Jens Axboe <[email protected]>
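That is, a sketch of the explicit check during mmap validation:

    /* sketch: if the rings or sqes are NULL they can't be mapped */
    if (!ctx->rings || !ctx->sq_sqes)
            return ERR_PTR(-EFAULT);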
|