History log of /freebsd-13.1/sys/kern/sys_socket.c (Results 1 – 25 of 116)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: release/12.4.0, release/13.1.0, release/12.3.0
# b07b1f89 12-Oct-2021 John Baldwin <[email protected]>

Stop creating socket aio kprocs during boot.

Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead. The pool of kprocs used for other AIO
requests is si

Stop creating socket aio kprocs during boot.

Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead. The pool of kprocs used for other AIO
requests is similarly created on first use.

Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32468

(cherry picked from commit d1b6fef0751b70819e632d7d4722efbc8f94b80b)

show more ...


Revision tags: release/13.0.0
# 2247f489 03-Jan-2021 Alan Somers <[email protected]>

aio: micro-optimize the lio_opcode assignments

This allows slightly more efficient opcode testing in-kernel. It is
transparent to userland, except to applications that sneakily submit
aio fsync or

aio: micro-optimize the lio_opcode assignments

This allows slightly more efficient opcode testing in-kernel. It is
transparent to userland, except to applications that sneakily submit
aio fsync or aio mlock operations via lio_listio, which has never been
documented, requires the use of deliberately undefined constants
(LIO_SYNC and LIO_MLOCK), and is arguably a bug.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D27942

show more ...


# 022ca2fc 02-Jan-2021 Alan Somers <[email protected]>

Add aio_writev and aio_readv

POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but the

Add aio_writev and aio_readv

POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.

It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.

Reviewed by: jhb, kib, bcr
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27743

show more ...


# f908d824 07-Nov-2020 Michael Tuexen <[email protected]>

The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK
access the socket send or receive buffer. This is not possible for
listening sockets since r319722.
Because send()/recv() calls

The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK
access the socket send or receive buffer. This is not possible for
listening sockets since r319722.
Because send()/recv() calls fail on listening sockets, fail also ioctl()
indicating EINVAL.

PR: 250366
Reported by: Yong-Hao Zou
Reviewed by: glebius, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D26897

show more ...


Revision tags: release/12.2.0
# 6fed89b1 01-Sep-2020 Mateusz Guzik <[email protected]>

kern: clean up empty lines in .c and .h files


Revision tags: release/11.4.0
# 99258935 20-Mar-2020 Mark Johnston <[email protected]>

Lock the socket in soo_stat().

Otherwise nothing synchronizes with a concurrent conversion of the
socket to a listening socket.

Only the PF_LOCAL protocols implement pru_sense, and it is safe to ho

Lock the socket in soo_stat().

Otherwise nothing synchronizes with a concurrent conversion of the
socket to a listening socket.

Only the PF_LOCAL protocols implement pru_sense, and it is safe to hold
the socket lock there, so do so for now.

Reported by: [email protected]
MFC after: 1 week
Sponsored by: The FreeBSD Foundation

show more ...


# 7029da5c 26-Feb-2020 Pawel Biernacki <[email protected]>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly mark

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718

show more ...


Revision tags: release/12.1.0, release/11.3.0, release/12.0.0, release/11.2.0
# ce076a1f 11-Jan-2018 Michael Tuexen <[email protected]>

Ensure that the vnet is set when calling pru_sockaddr() and
pru_peeraddr().

This is already true when called via kern_getsockname() and
kern_getpeername(). This patch sets it also, when they arecall

Ensure that the vnet is set when calling pru_sockaddr() and
pru_peeraddr().

This is already true when called via kern_getsockname() and
kern_getpeername(). This patch sets it also, when they arecalled
via soo_fill_kinfo(). This is necessary, since the corresponding
functions for SCTP require the vnet to be set. Without this,
if a process having an wildcard bound SCTP socket is
terminated and a core is written, the kernel panics.

Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D13652

show more ...


# 51369649 20-Nov-2017 Pedro F. Giffuni <[email protected]>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

show more ...


Revision tags: release/10.4.0
# 12fb14f3 25-Aug-2017 John Baldwin <[email protected]>

Don't grab SOCK_LOCK for soref() when queuing an AIO request.

The AIO job holds a reference on the associated file descriptor, so the
socket's count should already be > 0. This fixes a LOR with the

Don't grab SOCK_LOCK for soref() when queuing an AIO request.

The AIO job holds a reference on the associated file descriptor, so the
socket's count should already be > 0. This fixes a LOR with the socket
buffer lock after recent socket locking changes in HEAD.

Sponsored by: Chelsio Communications

show more ...


Revision tags: release/11.1.0
# c7af7893 17-Jul-2017 John Baldwin <[email protected]>

Set the current vnet pointer in the socket buffer AIO handler.

This fixes panics when using AIO under VIMAGE.

Reported by: kp
MFC after: 3 days
Sponsored by: Chelsio Communications


# 438902ed 09-Jun-2017 Gleb Smirnoff <[email protected]>

Fix stat(2) on a listening socket.


# 779f106a 08-Jun-2017 Gleb Smirnoff <[email protected]>

Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
fields that belong to normal dataflow, and unionize them. This
shrinks the structure a bit.
-

Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
fields that belong to normal dataflow, and unionize them. This
shrinks the structure a bit.
- Take out selinfo's from the socket buffers into the socket. The
first reason is to support braindamaged scenario when a socket is
added to kevent(2) and then listen(2) is cast on it. The second
reason is that there is future plan to make socket buffers pluggable,
so that for a dataflow socket a socket buffer can be changed, and
in this case we also want to keep same selinfos through the lifetime
of a socket.
- Remove struct struct so_accf. Since now listening stuff no longer
affects struct socket size, just move its fields into listening part
of the union.
- Provide sol_upcall field and enforce that so_upcall_set() may be called
only on a dataflow socket, which has buffers, and for listening sockets
provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
- Add a mutex to socket, to be used instead of socket buffer lock to lock
fields of struct socket that don't belong to a socket buffer.
- Allow to acquire two socket locks, but the first one must belong to a
listening socket.
- Make soref()/sorele() to use atomic(9). This allows in some situations
to do soref() without owning socket lock. There is place for improvement
here, it is possible to make sorele() also to lock optionally.
- Most protocols aren't touched by this change, except UNIX local sockets.
See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
listening sockets: provide function solisten_dequeue(), and use it in
the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
infiniband, rpc.

o UNIX local sockets.
- Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
local sockets. Most races exist around spawning a new socket, when we
are connecting to a local listening socket. To cover them, we need to
hold locks on both PCBs when spawning a third one. This means holding
them across sonewconn(). This creates a LOR between pcb locks and
unp_list_lock.
- To fix the new LOR, abandon the global unp_list_lock in favor of global
unp_link_lock. Indeed, separating these two locks didn't provide us any
extra parralelism in the UNIX sockets.
- Now call into uipc_attach() may happen with unp_link_lock hold if, we
are accepting, or without unp_link_lock in case if we are just creating
a socket.
- Another problem in UNIX sockets is that uipc_close() basicly did nothing
for a listening socket. The vnode remained opened for connections. This
is fixed by removing vnode in uipc_close(). Maybe the right way would be
to do it for all sockets (not only listening), simply move the vnode
teardown from uipc_detach() to uipc_close()?

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9770

show more ...


# 95b97895 26-May-2017 Conrad Meyer <[email protected]>

procstat(1): Add TCP socket send/recv buffer size

Add TCP socket send and receive buffer size to procstat -f output.

Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: htt

procstat(1): Add TCP socket send/recv buffer size

Add TCP socket send and receive buffer size to procstat -f output.

Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10689

show more ...


# 69921123 23-May-2017 Konstantin Belousov <[email protected]>

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_na

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439

show more ...


# 14da48cb 06-Jan-2017 John Baldwin <[email protected]>

Set MORETOCOME for AIO write requests on a socket.

Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend*
set PRUS_MOREOTOCOME when invoking the protocol send method. The aio
worker task

Set MORETOCOME for AIO write requests on a socket.

Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend*
set PRUS_MOREOTOCOME when invoking the protocol send method. The aio
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.

Reviewed by: adrian
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D8955

show more ...


Revision tags: release/11.0.1, release/11.0.0
# 69a28758 15-Sep-2016 Ed Maste <[email protected]>

Renumber license clauses in sys/kern to avoid skipping #3


# b1012d80 21-Jun-2016 John Baldwin <[email protected]>

Account for AIO socket operations in thread/process resource usage.

File and disk-backed I/O requests store counts of read/written disk
blocks in each AIO job so that they can be charged to the thre

Account for AIO socket operations in thread/process resource usage.

File and disk-backed I/O requests store counts of read/written disk
blocks in each AIO job so that they can be charged to the thread that
completes an AIO request via aio_return() or aio_waitcomplete(). This
change extends AIO jobs to store counts of received/sent messages and
updates socket backends to set these counts accordingly. Note that
the socket backends are careful to only charge a single messages for
each AIO request even though a single request on a blocking socket might
invoke sosend or soreceive multiple times. This is to mimic the
resource accounting of synchronous read/write.

Adjust the UNIX socketpair AIO test to verify that the message resource
usage counts update accordingly for aio_read and aio_write.

Approved by: re (hrs)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6911

show more ...


# fe0bdd1d 15-Jun-2016 John Baldwin <[email protected]>

Move backend-specific fields of kaiocb into a union.

This reduces the size of kaiocb slightly. I've also added some generic
fields that other backends can use in place of the BIO-specific fields.

C

Move backend-specific fields of kaiocb into a union.

This reduces the size of kaiocb slightly. I've also added some generic
fields that other backends can use in place of the BIO-specific fields.

Change the socket and Chelsio DDP backends to use 'backend3' instead of
abusing _aiocb_private.status directly. This confines the use of
_aiocb_private to the AIO internals in vfs_aio.c.

Reviewed by: kib (earlier version)
Approved by: re (gjb)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6547

show more ...


# 778ce4f2 24-May-2016 John Baldwin <[email protected]>

Return the correct status when a partially completed request is cancelled.

After the previous changes to fix requests on blocking sockets to complete
across multiple operations, an edge case exists

Return the correct status when a partially completed request is cancelled.

After the previous changes to fix requests on blocking sockets to complete
across multiple operations, an edge case exists where a request can be
cancelled after it has partially completed. POSIX doesn't appear to
dictate exactly how to handle this case, but in general I feel that
aio_cancel() should arrange to cancel any request it can, but that any
partially completed requests should return a partial completion rather
than ECANCELED. To that end, fix the socket AIO cancellation routine to
return a short read/write if a partially completed request is cancelled
rather than ECANCELED.

Sponsored by: Chelsio Communications

show more ...


# 1717b68a 24-May-2016 John Baldwin <[email protected]>

Don't prematurely return short completions on blocking sockets.

Always requeue an AIO job at the head of the socket buffer's queue if
sosend() or soreceive() returns EWOULDBLOCK on a blocking socket

Don't prematurely return short completions on blocking sockets.

Always requeue an AIO job at the head of the socket buffer's queue if
sosend() or soreceive() returns EWOULDBLOCK on a blocking socket.
Previously, requests were only requeued if they returned EWOULDBLOCK
and completed no data. Now after a partial completion on a blocking
socket the request is queued and the remaining request is retried when
the socket is ready. This allows writes larger than the currently
available space on a blocking socket to fully complete. Reads on a
blocking socket that satifsy the low watermark can still return a short
read (same as read()).

In order to track previously completed data, the internal 'status'
field of the AIO job is used to store the amount of previously
computed data.

Non-blocking sockets continue to return short completions for both
reads and writes.

Add a test for a "large" AIO write on a blocking socket that writes
twice the socket buffer size to a UNIX domain socket.

Sponsored by: Chelsio Communications

show more ...


# f0ec1740 20-May-2016 John Baldwin <[email protected]>

Consistently set status to -1 when completing an AIO request with an error.

Sponsored by: Chelsio Communications


# 5163d2ec 29-Apr-2016 John Baldwin <[email protected]>

Expose soaio_enqueue().

This can be used by protocol-specific AIO handlers to queue work to the
socket AIO daemon pool.

Sponsored by: Chelsio Communications


# 8722384b 29-Apr-2016 John Baldwin <[email protected]>

Introduce a new protocol hook pru_aio_queue.

This allows a protocol to claim individual AIO requests instead of using
the default socket AIO handling.

Sponsored by: Chelsio Communications


Revision tags: release/10.3.0
# f3215338 01-Mar-2016 John Baldwin <[email protected]>

Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing a

Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order. Socket requests will not block indefinitely
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels. The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon. This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system. To protect against this, AIO
requests are only permitted for known "safe" files by default. AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value. The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default. aio_mlock() is also enabled by default.

Reviewed by: cem, jilles
Discussed with: kib (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5289

show more ...


12345