|
Revision tags: release/12.4.0, release/13.1.0, release/12.3.0 |
|
| #
b07b1f89 |
| 12-Oct-2021 |
John Baldwin <[email protected]> |
Stop creating socket aio kprocs during boot.
Create the initial pool of kprocs on demand when the first socket AIO request is submitted instead. The pool of kprocs used for other AIO requests is si
Stop creating socket aio kprocs during boot.
Create the initial pool of kprocs on demand when the first socket AIO request is submitted instead. The pool of kprocs used for other AIO requests is similarly created on first use.
Reviewed by: asomers Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32468
(cherry picked from commit d1b6fef0751b70819e632d7d4722efbc8f94b80b)
show more ...
|
|
Revision tags: release/13.0.0 |
|
| #
2247f489 |
| 03-Jan-2021 |
Alan Somers <[email protected]> |
aio: micro-optimize the lio_opcode assignments
This allows slightly more efficient opcode testing in-kernel. It is transparent to userland, except to applications that sneakily submit aio fsync or
aio: micro-optimize the lio_opcode assignments
This allows slightly more efficient opcode testing in-kernel. It is transparent to userland, except to applications that sneakily submit aio fsync or aio mlock operations via lio_listio, which has never been documented, requires the use of deliberately undefined constants (LIO_SYNC and LIO_MLOCK), and is arguably a bug.
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D27942
show more ...
|
| #
022ca2fc |
| 02-Jan-2021 |
Alan Somers <[email protected]> |
Add aio_writev and aio_readv
POSIX AIO is great, but it lacks vectored I/O functions. This commit fixes that shortcoming by adding aio_writev and aio_readv. They aren't part of the standard, but the
Add aio_writev and aio_readv
POSIX AIO is great, but it lacks vectored I/O functions. This commit fixes that shortcoming by adding aio_writev and aio_readv. They aren't part of the standard, but they're an obvious extension. They work just like their synchronous equivalents pwritev and preadv.
It isn't yet possible to use vectored aiocbs with lio_listio, but that could be added in the future.
Reviewed by: jhb, kib, bcr Relnotes: yes Differential Revision: https://reviews.freebsd.org/D27743
show more ...
|
| #
f908d824 |
| 07-Nov-2020 |
Michael Tuexen <[email protected]> |
The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK access the socket send or receive buffer. This is not possible for listening sockets since r319722. Because send()/recv() calls
The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK access the socket send or receive buffer. This is not possible for listening sockets since r319722. Because send()/recv() calls fail on listening sockets, fail also ioctl() indicating EINVAL.
PR: 250366 Reported by: Yong-Hao Zou Reviewed by: glebius, rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D26897
show more ...
|
|
Revision tags: release/12.2.0 |
|
| #
6fed89b1 |
| 01-Sep-2020 |
Mateusz Guzik <[email protected]> |
kern: clean up empty lines in .c and .h files
|
|
Revision tags: release/11.4.0 |
|
| #
99258935 |
| 20-Mar-2020 |
Mark Johnston <[email protected]> |
Lock the socket in soo_stat().
Otherwise nothing synchronizes with a concurrent conversion of the socket to a listening socket.
Only the PF_LOCAL protocols implement pru_sense, and it is safe to ho
Lock the socket in soo_stat().
Otherwise nothing synchronizes with a concurrent conversion of the socket to a listening socket.
Only the PF_LOCAL protocols implement pru_sense, and it is safe to hold the socket lock there, so do so for now.
Reported by: [email protected] MFC after: 1 week Sponsored by: The FreeBSD Foundation
show more ...
|
| #
7029da5c |
| 26-Feb-2020 |
Pawel Biernacki <[email protected]> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly mark
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
show more ...
|
|
Revision tags: release/12.1.0, release/11.3.0, release/12.0.0, release/11.2.0 |
|
| #
ce076a1f |
| 11-Jan-2018 |
Michael Tuexen <[email protected]> |
Ensure that the vnet is set when calling pru_sockaddr() and pru_peeraddr().
This is already true when called via kern_getsockname() and kern_getpeername(). This patch sets it also, when they arecall
Ensure that the vnet is set when calling pru_sockaddr() and pru_peeraddr().
This is already true when called via kern_getsockname() and kern_getpeername(). This patch sets it also, when they arecalled via soo_fill_kinfo(). This is necessary, since the corresponding functions for SCTP require the vnet to be set. Without this, if a process having an wildcard bound SCTP socket is terminated and a core is written, the kernel panics.
Reviewed by: bz Differential Revision: https://reviews.freebsd.org/D13652
show more ...
|
| #
51369649 |
| 20-Nov-2017 |
Pedro F. Giffuni <[email protected]> |
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
show more ...
|
|
Revision tags: release/10.4.0 |
|
| #
12fb14f3 |
| 25-Aug-2017 |
John Baldwin <[email protected]> |
Don't grab SOCK_LOCK for soref() when queuing an AIO request.
The AIO job holds a reference on the associated file descriptor, so the socket's count should already be > 0. This fixes a LOR with the
Don't grab SOCK_LOCK for soref() when queuing an AIO request.
The AIO job holds a reference on the associated file descriptor, so the socket's count should already be > 0. This fixes a LOR with the socket buffer lock after recent socket locking changes in HEAD.
Sponsored by: Chelsio Communications
show more ...
|
|
Revision tags: release/11.1.0 |
|
| #
c7af7893 |
| 17-Jul-2017 |
John Baldwin <[email protected]> |
Set the current vnet pointer in the socket buffer AIO handler.
This fixes panics when using AIO under VIMAGE.
Reported by: kp MFC after: 3 days Sponsored by: Chelsio Communications
|
| #
438902ed |
| 09-Jun-2017 |
Gleb Smirnoff <[email protected]> |
Fix stat(2) on a listening socket.
|
| #
779f106a |
| 08-Jun-2017 |
Gleb Smirnoff <[email protected]> |
Listening sockets improvements.
o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. -
Listening sockets improvements.
o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. - Take out selinfo's from the socket buffers into the socket. The first reason is to support braindamaged scenario when a socket is added to kevent(2) and then listen(2) is cast on it. The second reason is that there is future plan to make socket buffers pluggable, so that for a dataflow socket a socket buffer can be changed, and in this case we also want to keep same selinfos through the lifetime of a socket. - Remove struct struct so_accf. Since now listening stuff no longer affects struct socket size, just move its fields into listening part of the union. - Provide sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers, and for listening sockets provide solisten_upcall_set().
o Remove ACCEPT_LOCK() global. - Add a mutex to socket, to be used instead of socket buffer lock to lock fields of struct socket that don't belong to a socket buffer. - Allow to acquire two socket locks, but the first one must belong to a listening socket. - Make soref()/sorele() to use atomic(9). This allows in some situations to do soref() without owning socket lock. There is place for improvement here, it is possible to make sorele() also to lock optionally. - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information.
o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc.
o UNIX local sockets. - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most races exist around spawning a new socket, when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between pcb locks and unp_list_lock. - To fix the new LOR, abandon the global unp_list_lock in favor of global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parralelism in the UNIX sockets. - Now call into uipc_attach() may happen with unp_link_lock hold if, we are accepting, or without unp_link_lock in case if we are just creating a socket. - Another problem in UNIX sockets is that uipc_close() basicly did nothing for a listening socket. The vnode remained opened for connections. This is fixed by removing vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply move the vnode teardown from uipc_detach() to uipc_close()?
Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9770
show more ...
|
| #
95b97895 |
| 26-May-2017 |
Conrad Meyer <[email protected]> |
procstat(1): Add TCP socket send/recv buffer size
Add TCP socket send and receive buffer size to procstat -f output.
Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: htt
procstat(1): Add TCP socket send/recv buffer size
Add TCP socket send and receive buffer size to procstat -f output.
Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10689
show more ...
|
| #
69921123 |
| 23-May-2017 |
Konstantin Belousov <[email protected]> |
Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_na
Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024.
ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways.
Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important.
Struct xvnode changed layout, no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439
show more ...
|
| #
14da48cb |
| 06-Jan-2017 |
John Baldwin <[email protected]> |
Set MORETOCOME for AIO write requests on a socket.
Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend* set PRUS_MOREOTOCOME when invoking the protocol send method. The aio worker task
Set MORETOCOME for AIO write requests on a socket.
Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend* set PRUS_MOREOTOCOME when invoking the protocol send method. The aio worker tasks for sending on a socket set this flag when there are additional write jobs waiting on the socket buffer.
Reviewed by: adrian MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D8955
show more ...
|
|
Revision tags: release/11.0.1, release/11.0.0 |
|
| #
69a28758 |
| 15-Sep-2016 |
Ed Maste <[email protected]> |
Renumber license clauses in sys/kern to avoid skipping #3
|
| #
b1012d80 |
| 21-Jun-2016 |
John Baldwin <[email protected]> |
Account for AIO socket operations in thread/process resource usage.
File and disk-backed I/O requests store counts of read/written disk blocks in each AIO job so that they can be charged to the thre
Account for AIO socket operations in thread/process resource usage.
File and disk-backed I/O requests store counts of read/written disk blocks in each AIO job so that they can be charged to the thread that completes an AIO request via aio_return() or aio_waitcomplete(). This change extends AIO jobs to store counts of received/sent messages and updates socket backends to set these counts accordingly. Note that the socket backends are careful to only charge a single messages for each AIO request even though a single request on a blocking socket might invoke sosend or soreceive multiple times. This is to mimic the resource accounting of synchronous read/write.
Adjust the UNIX socketpair AIO test to verify that the message resource usage counts update accordingly for aio_read and aio_write.
Approved by: re (hrs) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6911
show more ...
|
| #
fe0bdd1d |
| 15-Jun-2016 |
John Baldwin <[email protected]> |
Move backend-specific fields of kaiocb into a union.
This reduces the size of kaiocb slightly. I've also added some generic fields that other backends can use in place of the BIO-specific fields.
C
Move backend-specific fields of kaiocb into a union.
This reduces the size of kaiocb slightly. I've also added some generic fields that other backends can use in place of the BIO-specific fields.
Change the socket and Chelsio DDP backends to use 'backend3' instead of abusing _aiocb_private.status directly. This confines the use of _aiocb_private to the AIO internals in vfs_aio.c.
Reviewed by: kib (earlier version) Approved by: re (gjb) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6547
show more ...
|
| #
778ce4f2 |
| 24-May-2016 |
John Baldwin <[email protected]> |
Return the correct status when a partially completed request is cancelled.
After the previous changes to fix requests on blocking sockets to complete across multiple operations, an edge case exists
Return the correct status when a partially completed request is cancelled.
After the previous changes to fix requests on blocking sockets to complete across multiple operations, an edge case exists where a request can be cancelled after it has partially completed. POSIX doesn't appear to dictate exactly how to handle this case, but in general I feel that aio_cancel() should arrange to cancel any request it can, but that any partially completed requests should return a partial completion rather than ECANCELED. To that end, fix the socket AIO cancellation routine to return a short read/write if a partially completed request is cancelled rather than ECANCELED.
Sponsored by: Chelsio Communications
show more ...
|
| #
1717b68a |
| 24-May-2016 |
John Baldwin <[email protected]> |
Don't prematurely return short completions on blocking sockets.
Always requeue an AIO job at the head of the socket buffer's queue if sosend() or soreceive() returns EWOULDBLOCK on a blocking socket
Don't prematurely return short completions on blocking sockets.
Always requeue an AIO job at the head of the socket buffer's queue if sosend() or soreceive() returns EWOULDBLOCK on a blocking socket. Previously, requests were only requeued if they returned EWOULDBLOCK and completed no data. Now after a partial completion on a blocking socket the request is queued and the remaining request is retried when the socket is ready. This allows writes larger than the currently available space on a blocking socket to fully complete. Reads on a blocking socket that satifsy the low watermark can still return a short read (same as read()).
In order to track previously completed data, the internal 'status' field of the AIO job is used to store the amount of previously computed data.
Non-blocking sockets continue to return short completions for both reads and writes.
Add a test for a "large" AIO write on a blocking socket that writes twice the socket buffer size to a UNIX domain socket.
Sponsored by: Chelsio Communications
show more ...
|
| #
f0ec1740 |
| 20-May-2016 |
John Baldwin <[email protected]> |
Consistently set status to -1 when completing an AIO request with an error.
Sponsored by: Chelsio Communications
|
| #
5163d2ec |
| 29-Apr-2016 |
John Baldwin <[email protected]> |
Expose soaio_enqueue().
This can be used by protocol-specific AIO handlers to queue work to the socket AIO daemon pool.
Sponsored by: Chelsio Communications
|
| #
8722384b |
| 29-Apr-2016 |
John Baldwin <[email protected]> |
Introduce a new protocol hook pru_aio_queue.
This allows a protocol to claim individual AIO requests instead of using the default socket AIO handling.
Sponsored by: Chelsio Communications
|
|
Revision tags: release/10.3.0 |
|
| #
f3215338 |
| 01-Mar-2016 |
John Baldwin <[email protected]> |
Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness.
Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing a
Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness.
Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing an asynchronous I/O request for a given file. The AIO subystem now exports library of routines to manipulate AIO requests as well as the ability to run a handler function in the "default" pool of AIO daemons to service a request.
A default implementation for file types which do not include an fo_aio_queue method queues requests to the "default" pool invoking the fo_read or fo_write methods as before.
The AIO subsystem permits file types to install a private "cancel" routine when a request is queued to permit safe dequeueing and cleanup of cancelled requests.
Sockets now use their own pool of AIO daemons and service per-socket requests in FIFO order. Socket requests will not block indefinitely permitting timely cancellation of all requests.
Due to the now-tight coupling of the AIO subsystem with file types, the AIO subsystem is now a standard part of all kernels. The VFS_AIO kernel option and aio.ko module are gone.
Many file types may block indefinitely in their fo_read or fo_write callbacks resulting in a hung AIO daemon. This can result in hung user processes (when processes attempt to cancel all outstanding requests during exit) or a hung system. To protect against this, AIO requests are only permitted for known "safe" files by default. AIO requests for all file types can be enabled by setting the new vfs.aio.enable_usafe sysctl to a non-zero value. The AIO tests have been updated to skip operations on unsafe file types if the sysctl is zero.
Currently, AIO requests on sockets and raw disks are considered safe and are enabled by default. aio_mlock() is also enabled by default.
Reviewed by: cem, jilles Discussed with: kib (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5289
show more ...
|