|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
e45469e5 |
| 13-Dec-2024 |
Anna Emese Nyiri <[email protected]> |
sock: Introduce SO_RCVPRIORITY socket option
Add new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the ancillary data returned by recvmsg(). This is analogous to the existing support for SO_RCVMARK, as implemented in commit 6fd1d51cfa253 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()").
Reviewed-by: Willem de Bruijn <[email protected]> Suggested-by: Ferenc Fejes <[email protected]> Signed-off-by: Anna Emese Nyiri <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2 |
|
| #
4aecca4c |
| 01-Oct-2024 |
Vadim Fedorenko <[email protected]> |
net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
The SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps with packets sent via a socket. Unfortunately, there is no way to reliably predict the socket timestamp ID value when sendmsg returns an error. For UDP sockets it is impossible because the UDP transmit path is lockless and several threads may send packets in parallel. For RAW sockets, the MSG_MORE option complicates things. More details are in the conversation [1]. This patch adds a new control message type that lets user-space software control the mapping between packets and values by providing an ID with each sendmsg for UDP sockets. The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Jason Xing <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
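The control message described above could be built along these lines. This is a sketch under assumptions: attach_ts_opt_id is a hypothetical helper, the fallback value 81 for SCM_TS_OPT_ID is the asm-generic number, and the socket is assumed to already have SOF_TIMESTAMPING_OPT_ID enabled.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SCM_TS_OPT_ID
#define SCM_TS_OPT_ID 81 /* assumption: asm-generic value, older headers lack it */
#endif

/*
 * Point msg at ctrl and fill in one SCM_TS_OPT_ID record carrying id.
 * ctrl must be aligned for struct cmsghdr and hold at least
 * CMSG_SPACE(sizeof(uint32_t)) bytes. Returns 0, or -1 if it is too small.
 */
int attach_ts_opt_id(struct msghdr *msg, void *ctrl, size_t ctrl_len, uint32_t id)
{
    struct cmsghdr *c;

    if (ctrl_len < CMSG_SPACE(sizeof(id)))
        return -1;
    memset(ctrl, 0, CMSG_SPACE(sizeof(id)));
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(id));

    c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_TS_OPT_ID;
    c->cmsg_len = CMSG_LEN(sizeof(id));
    memcpy(CMSG_DATA(c), &id, sizeof(id));
    return 0;
}
```

After this call the same msghdr is passed to sendmsg(), so the application chooses the ID itself rather than predicting the kernel's counter.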
|
|
Revision tags: v6.12-rc1, v6.11 |
|
| #
678f6e28 |
| 10-Sep-2024 |
Mina Almasry <[email protected]> |
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
Add an interface for the user to notify the kernel that it is done reading the devmem dmabuf frags returned as cmsg. The kernel will drop the reference on the frags to make them available for reuse.
Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: Kaiyuan Zhang <[email protected]> Signed-off-by: Mina Almasry <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
8f0b3cc9 |
| 10-Sep-2024 |
Mina Almasry <[email protected]> |
tcp: RX path for devmem TCP
In tcp_recvmsg_locked(), detect if the skb being received by the user is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM flag - pass it to tcp_recvmsg_devmem() for custom handling.
tcp_recvmsg_devmem() copies any data in the skb header to the linear buffer, and returns a cmsg to the user indicating the number of bytes returned in the linear buffer.
tcp_recvmsg_devmem() then loops over the inaccessible devmem skb frags, and returns to the user a cmsg_devmem indicating the location of the data in the dmabuf device memory. cmsg_devmem contains this information:
1. the offset into the dmabuf where the payload starts. 'frag_offset'.
2. the size of the frag. 'frag_size'.
3. an opaque token 'frag_token' to return to the kernel when the buffer is to be released.
The pages awaiting freeing are stored in the newly added sk->sk_user_frags, and each page passed to userspace is get_page()'d. This reference is dropped once the userspace indicates that it is done reading this page. All pages are released when the socket is destroyed.
Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: Kaiyuan Zhang <[email protected]> Signed-off-by: Mina Almasry <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
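The receive and release steps from the two devmem commits might look like this from user space. Everything here is a sketch under assumptions: the two structs are local mirrors whose layout is assumed to match the uapi definitions, the fallback numbers 79 and 80 are the asm-generic values, and collect_frags and release_frag are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_DEVMEM_DMABUF
#define SO_DEVMEM_DMABUF 79   /* assumption: asm-generic value */
#endif
#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED 80 /* assumption: asm-generic value */
#endif

/* Local mirror of the per-frag cmsg payload; layout is an assumption. */
struct frag_info {
    uint64_t frag_offset; /* offset into the dmabuf where the payload starts */
    uint32_t frag_size;   /* size of the frag */
    uint32_t frag_token;  /* opaque token to hand back on release */
    uint32_t dmabuf_id;
    uint32_t flags;
};

/* Local mirror of the release request; layout is an assumption. */
struct frag_release {
    uint32_t token_start;
    uint32_t token_count;
};

/* Copy up to max devmem frag descriptors out of msg. Returns the count. */
size_t collect_frags(struct msghdr *msg, struct frag_info *out, size_t max)
{
    size_t n = 0;
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL && n < max; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SO_DEVMEM_DMABUF)
            continue;
        if (c->cmsg_len < CMSG_LEN(sizeof(out[0])))
            continue;
        memcpy(&out[n++], CMSG_DATA(c), sizeof(out[0]));
    }
    return n;
}

/* Tell the kernel the application is done with one frag. */
int release_frag(int fd, uint32_t token)
{
    struct frag_release r = { .token_start = token, .token_count = 1 };
    return setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, &r, sizeof(r));
}
```

Keeping the token alongside each descriptor lets the application release frags individually once it has consumed the data.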
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6 |
|
| #
7b26952a |
| 08-Jun-2023 |
Alexander Mikhalitsyn <[email protected]> |
net: core: add getsockopt SO_PEERPIDFD
Add SO_PEERPIDFD, which allows getting a pidfd for the peer socket holder. This is a direct analog of SO_PEERCRED, which allows getting the plain PID.
Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: David Ahern <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Lennart Poettering <[email protected]> Cc: Luca Boccassi <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Stanislav Fomichev <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Christian Brauner <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Tested-by: Luca Boccassi <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
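Querying the option could look like the sketch below. The helper name peer_pidfd is hypothetical and the fallback value 77 is the asm-generic number, used only when the installed headers predate the option.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_PEERPIDFD
#define SO_PEERPIDFD 77 /* assumption: asm-generic value, older headers lack it */
#endif

/*
 * Return a new pidfd referring to the peer of a connected AF_UNIX socket,
 * or a negative errno value. The caller owns the descriptor and must close it.
 */
int peer_pidfd(int sock)
{
    int pidfd = -1;
    socklen_t len = sizeof(pidfd);

    if (getsockopt(sock, SOL_SOCKET, SO_PEERPIDFD, &pidfd, &len) < 0)
        return -errno;
    return pidfd;
}
```

Unlike the PID obtained through SO_PEERCRED, the returned descriptor keeps referring to the same process even if its PID is later recycled.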
|
| #
5e2ff670 |
| 08-Jun-2023 |
Alexander Mikhalitsyn <[email protected]> |
scm: add SO_PASSPIDFD and SCM_PIDFD
Implement SCM_PIDFD, a new CMSG type analogous to SCM_CREDENTIALS, but it contains a pidfd instead of a plain pid, which allows programmers not to care about the PID reuse problem.
We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because it depends on a pidfd_prepare() API which is not exported to the kernel modules.
Idea comes from UAPI kernel group: https://uapi-group.org/kernel-features/
Big thanks to Christian Brauner and Lennart Poettering for productive discussions about this.
Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: David Ahern <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Lennart Poettering <[email protected]> Cc: Luca Boccassi <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Tested-by: Luca Boccassi <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5 |
|
| #
6fd1d51c |
| 27-Apr-2022 |
Erin MacNeil <[email protected]> |
net: SO_RCVMARK socket option for SO_MARK with recvmsg()
Adding a new socket option, SO_RCVMARK, to indicate that SO_MARK should be included in the ancillary data returned by recvmsg().
Renamed the sock_recv_ts_and_drops() function to sock_recv_cmsgs().
Signed-off-by: Erin MacNeil <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Marc Kleine-Budde <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3 |
|
| #
26859240 |
| 31-Jan-2022 |
Akhmat Karakotov <[email protected]> |
txhash: Add socket option to control TX hash rethink behavior
Add the SO_TXREHASH socket option to control hash rethink behavior per socket. When default mode is set, sockets disable rehash at initialization and use sysctl option when entering listen state. setsockopt() overrides default behavior.
Signed-off-by: Akhmat Karakotov <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4 |
|
| #
2bb2f5fb |
| 29-Sep-2021 |
Wei Wang <[email protected]> |
net: add new socket option SO_RESERVE_MEM
This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, the kernel charges the user-specified amount of memory to memcg, as well as to sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends fewer cycles doing forward alloc and reclaim, which should lead to better system performance, at the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure.
Note: This socket option is only available when memory cgroup is enabled, and we require this reserved memory to be charged to the user's memcg. We hope this prevents misbehaving users from abusing this feature to reserve a large amount on certain sockets and cause unfairness for others.
Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5 |
|
| #
04190bf8 |
| 04-Aug-2021 |
Pavel Tikhomirov <[email protected]> |
sock: allow reading and changing sk_userlocks with setsockopt
The SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable the automatic socket buffer adjustment done by the kernel (see tcp_fixup_rcvbuf() and tcp_sndbuf_expand()). On a newly created socket this adjustment is enabled, but once the socket buffer size is changed by setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on restore: first to increase buffer sizes so the packet queues can be restored, and then to restore the original buffer sizes. So after a CRIU restore all sockets become non-auto-adjustable, which can decrease network performance of restored applications significantly.
CRIU needs to be able to restore sockets with adjustment enabled or disabled, matching the state before the dump, so let's add a special setsockopt for it.
Let's also export the SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to uAPI so that, using this interface, one can re-enable automatic socket buffer adjustment on their sockets.
Signed-off-by: Pavel Tikhomirov <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
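The save-and-restore pattern described for CRIU could be sketched as follows. The helper name is hypothetical, and the fallback value 72 for SO_BUF_LOCK is the asm-generic number, assumed only for headers that predate the option.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72 /* assumption: asm-generic value, older headers lack it */
#endif

/*
 * Change the receive buffer size without permanently losing kernel
 * auto-tuning: remember the lock flags, resize (which sets the RCVBUF
 * lock), then write the remembered flags back.
 * Returns 0 on success or a negative errno value.
 */
int resize_rcvbuf_keep_locks(int fd, int bytes)
{
    int locks = 0;
    socklen_t len = sizeof(locks);

    if (getsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, &locks, &len) < 0)
        return -errno;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -errno;
    if (setsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, &locks, sizeof(locks)) < 0)
        return -errno;
    return 0;
}
```

If the socket was auto-adjustable before the call, it is auto-adjustable again afterwards.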
|
|
Revision tags: v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13 |
|
| #
e8b9eab9 |
| 23-Jun-2021 |
Martynas Pumputis <[email protected]> |
net: retrieve netns cookie via getsocketopt
It's getting more common to run nested container environments for testing cloud software. One of such examples is Kind [1] which runs a Kubernetes cluster in Docker containers on a single host. Each container acts as a Kubernetes node, and thus can run any Pod (aka container) inside the former. This approach simplifies testing a lot, as it eliminates complicated VM setups.
Unfortunately, such a setup breaks some functionality when cgroupv2 BPF programs are used for load-balancing. The load-balancer BPF program needs to detect whether a request originates from the host netns or a container netns in order to allow some access, e.g. to a service via a loopback IP address. Typically, the programs detect this by comparing netns cookies with the one of the init ns via a call to bpf_get_netns_cookie(NULL). However, in nested environments the latter cannot be used given the Kubernetes node's netns is outside the init ns. To fix this, we need to pass the Kubernetes node netns cookie to the program in a different way: by extending getsockopt() with a SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes node netns can retrieve the cookie and pass it to the program instead.
Thus, this is following up on Eric's commit 3d368ab87cf6 ("net: initialize net->net_cookie at netns setup") to allow retrieval via SO_NETNS_COOKIE. This is also in line with how we retrieve the socket cookie via SO_COOKIE.
[1] https://kind.sigs.k8s.io/
Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Martynas Pumputis <[email protected]> Cc: Eric Dumazet <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
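An orchestrator could read the cookie with a call like the one sketched here. The helper name netns_cookie is hypothetical, and the fallback value 71 is the asm-generic number, assumed for headers that predate the option.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_NETNS_COOKIE
#define SO_NETNS_COOKIE 71 /* assumption: asm-generic value, older headers lack it */
#endif

/*
 * Read the cookie of the network namespace that sock belongs to.
 * Returns 0 and fills *cookie, or a negative errno value.
 */
int netns_cookie(int sock, uint64_t *cookie)
{
    socklen_t len = sizeof(*cookie);

    if (getsockopt(sock, SOL_SOCKET, SO_NETNS_COOKIE, cookie, &len) < 0)
        return -errno;
    return 0;
}
```

Any socket created inside the Kubernetes node netns would do as the argument; the value can then be handed to the BPF program for comparison.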
|
|
Revision tags: v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7 |
|
| #
7c951caf |
| 30-Nov-2020 |
Björn Töpel <[email protected]> |
net: Add SO_BUSY_POLL_BUDGET socket option
This option lets a user set a per-socket NAPI budget for busy-polling. If the option is not set, it will use the default of 8.
Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
|
| #
7fd3253a |
| 30-Nov-2020 |
Björn Töpel <[email protected]> |
net: Introduce preferred busy-polling
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is opportunistic. That means that if the NAPI context is not scheduled, it will poll it. If, after busy-polling, the budget is exceeded, the busy-polling logic will schedule the NAPI onto the regular softirq handling.
One implication of the behavior above is that a busy or heavily loaded NAPI context will never enter or allow busy-polling. Some applications prefer that most NAPI processing be done by busy-polling.
This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature"), and allow a user to defer the enabling of interrupts and instead schedule the NAPI context from a watchdog timer. When a user enables SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed.
If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will time out, and regular softirq handling will resume.
In summary: heavy traffic applications that prefer busy-polling over softirq processing should use this option.
Example usage:
$ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
$ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will time out and fall back to regular softirq processing.
Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
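The socket-side half of the setup, covering both busy-polling commits, could be sketched like this. The helper name is hypothetical, the fallback numbers are the asm-generic values, and the calls may need elevated privileges (for example CAP_NET_ADMIN) depending on the kernel and the values requested.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46        /* assumption: asm-generic value */
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69 /* assumption: asm-generic value */
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70 /* assumption: asm-generic value */
#endif

/*
 * Enable busy-polling for usecs microseconds, mark it as preferred over
 * softirq processing, and set the per-socket NAPI budget.
 * Returns 0 on success or a negative errno value from the first failure.
 */
int enable_preferred_busy_poll(int fd, int usecs, int budget)
{
    int one = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)) < 0)
        return -errno;
    if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one)) < 0)
        return -errno;
    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget)) < 0)
        return -errno;
    return 0;
}
```

The device-level knobs from the example usage still have to be configured separately for the preference to take effect.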
|
|
Revision tags: v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5 |
|
| #
99f3a064 |
| 13-Jun-2019 |
Martin KaFai Lau <[email protected]> |
bpf: net: Add SO_DETACH_REUSEPORT_BPF
There is SO_ATTACH_REUSEPORT_[CE]BPF but there is no DETACH. This patch adds the SO_DETACH_REUSEPORT_BPF sockopt. The same sockopt can be used to undo either SO_ATTACH_REUSEPORT_[CE]BPF.
reuseport_detach_prog() is added and it is mostly a mirror of the existing reuseport_attach_prog(). The differences are that it does not call reuseport_alloc() and that it returns -ENOENT when there is no old prog.
Cc: Craig Gallek <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
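From user space, the detach call and its -ENOENT case could be handled as in this sketch. The helper name is hypothetical and the fallback value 68 is the asm-generic number; the dummy integer argument is passed on the assumption that the generic setsockopt path expects at least an int-sized value.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_DETACH_REUSEPORT_BPF
#define SO_DETACH_REUSEPORT_BPF 68 /* assumption: asm-generic value */
#endif

/*
 * Detach whatever reuseport program (classic or extended BPF) is attached.
 * Returns 1 if a program was detached, 0 if none was attached,
 * or a negative errno value on any other failure.
 */
int detach_reuseport_prog(int fd)
{
    int unused = 0;

    if (setsockopt(fd, SOL_SOCKET, SO_DETACH_REUSEPORT_BPF,
                   &unused, sizeof(unused)) < 0) {
        if (errno == ENOENT)
            return 0; /* nothing was attached */
        return -errno;
    }
    return 1;
}
```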
|
|
Revision tags: v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1 |
|
| #
a623a7a1 |
| 11-Mar-2019 |
Arnd Bergmann <[email protected]> |
y2038: fix socket.h header inclusion
Referencing the __kernel_long_t type caused some user space applications to stop compiling when they had not already included linux/posix_types.h, e.g.
s/multicast.c -o ext/sockets/multicast.lo
In file included from /builddir/build/BUILD/php-7.3.3/main/php.h:468,
                 from /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:27:
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c: In function 'zm_startup_sockets':
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:776:40: error: '__kernel_long_t' undeclared (first use in this function)
  776 | REGISTER_LONG_CONSTANT("SO_SNDTIMEO", SO_SNDTIMEO, CONST_CS | CONST_PERSISTENT);
It is safe to include that header here, since it only contains kernel internal types that do not conflict with other user space types.
It's still possible that some related build failures remain, but those are likely to be for code that is not already y2038 safe.
Reported-by: Laura Abbott <[email protected]> Fixes: a9beb86ae6e5 ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5 |
|
| #
a9beb86a |
| 02-Feb-2019 |
Deepa Dinamani <[email protected]> |
sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW
Add new socket timeout options that are y2038 safe.
Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
|
| #
45bdc661 |
| 02-Feb-2019 |
Deepa Dinamani <[email protected]> |
socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval as the time format. struct timeval is not y2038 safe. The subsequent patches in the series add support for new socket timeout options with _NEW suffix that will use y2038 safe data structures. Although the existing struct timeval layout is sufficiently wide to represent timeouts, because of the way libc will interpret time_t based on user defined flag, these new flags provide a way of having a structure that is the same for all architectures consistently. Rename the existing options with _OLD suffix forms so that the right option is enabled for userspace applications according to the architecture and time_t definition of libc.
Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
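From an application's point of view the rename is meant to be invisible: code keeps using the unsuffixed name, and the headers resolve it to the _OLD or _NEW option that matches the libc time_t. A minimal sketch, with hypothetical helper names:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Set the receive timeout in whole seconds using the unsuffixed option
 * name. Returns 0 on success or a negative errno value.
 */
int set_recv_timeout(int fd, long seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -errno;
    return 0;
}

/* Read the receive timeout back. Returns seconds or a negative errno value. */
long get_recv_timeout(int fd)
{
    struct timeval tv;
    socklen_t len = sizeof(tv);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, &len) < 0)
        return -errno;
    return (long)tv.tv_sec;
}
```

Because struct timeval comes from the same libc that defines time_t, the structure passed here always matches the option variant the header selected.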
|
| #
9718475e |
| 02-Feb-2019 |
Deepa Dinamani <[email protected]> |
socket: Add SO_TIMESTAMPING_NEW
Add the SO_TIMESTAMPING_NEW variant of the socket timestamp options. This is the y2038 safe version of SO_TIMESTAMPING_OLD for all architectures.
Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
|
| #
887feae3 |
| 02-Feb-2019 |
Deepa Dinamani <[email protected]> |
socket: Add SO_TIMESTAMP[NS]_NEW
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of socket timestamp options. These are the y2038 safe versions of the SO_TIMESTAMP_OLD and SO_TIMESTAMPNS_OLD for all architectures.
Note that the format of scm_timestamping.ts[0] is not changed in this patch.
Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
|
| #
7f1bc6e9 |
| 02-Feb-2019 |
Deepa Dinamani <[email protected]> |
sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the way they are currently defined, are not y2038 safe. Subsequent patches in the series add new y2038 safe versions of these options which provide 64 bit timestamps on all architectures uniformly. Hence, rename existing options with OLD tag suffixes.
Also note that kernel will not use the untagged SO_TIMESTAMP* and SCM_TIMESTAMP* options internally anymore.
Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.0-rc4, v5.0-rc3 |
|
| #
f5dd3d0c |
| 15-Jan-2019 |
David Herrmann <[email protected]> |
net: introduce SO_BINDTOIFINDEX sockopt
This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similarly to SO_BINDTODEVICE, but takes a network interface index as its argument, rather than the network interface name.
User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device.
In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood.
A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might set up their sockets early, and thus have to resolve the device-name, possibly running into this race-condition.
By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX socket option plugs this race.
Reviewed-by: Tom Gundersen <[email protected]> Signed-off-by: David Herrmann <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
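Binding by index could be done as in the sketch below. The helper name bind_to_ifindex is hypothetical, and the fallback value 62 is the asm-generic number, assumed for headers that predate the option.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_BINDTOIFINDEX
#define SO_BINDTOIFINDEX 62 /* assumption: asm-generic value, older headers lack it */
#endif

/*
 * Bind fd to the interface with the given index, with no name lookup.
 * An index of 0 removes an existing binding.
 * Returns 0 on success or a negative errno value.
 */
int bind_to_ifindex(int fd, unsigned int ifindex)
{
    int idx = (int)ifindex;

    if (setsockopt(fd, SOL_SOCKET, SO_BINDTOIFINDEX, &idx, sizeof(idx)) < 0)
        return -errno;
    return 0;
}
```

A program that already tracks devices by index can pass that index straight through, so a concurrent rename of the interface has no effect on the call.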
|
|
Revision tags: v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4 |
|
| #
80b14dee |
| 03-Jul-2018 |
Richard Cochran <[email protected]> |
net: Add a new socket option for a future transmit time.
This patch introduces SO_TXTIME. User space enables this option in order to pass a desired future transmit time in a CMSG when calling sendmsg(2). The argument to this socket option is an 8-byte struct provided by the uapi header net_tstamp.h, defined as:
struct sock_txtime {
	clockid_t clockid;
	u32 flags;
};
Note that new fields were added to struct sock by filling a 2-byte hole found in the struct. For that reason, neither the struct size nor the number of cachelines was altered.
Signed-off-by: Richard Cochran <[email protected]> Signed-off-by: Jesus Sanchez-Palencia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
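The two steps, enabling the option and then attaching a transmit time to each message, could be sketched as follows. The struct is a local mirror of the one quoted above, the helper names are hypothetical, the fallback value 61 is the asm-generic number, and the control message payload is assumed to be a 64-bit time in nanoseconds on the chosen clock.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_TXTIME
#define SO_TXTIME 61 /* assumption: asm-generic value, older headers lack it */
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif

/* Local 8-byte mirror of struct sock_txtime; clockid_t is assumed to be int. */
struct txtime_cfg {
    int32_t clockid;
    uint32_t flags;
};

/* Enable SO_TXTIME on fd for the given clock. Returns 0 or a negative errno. */
int enable_txtime(int fd, int clockid)
{
    struct txtime_cfg cfg = { .clockid = clockid, .flags = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg)) < 0)
        return -errno;
    return 0;
}

/*
 * Point msg at ctrl and fill in one SCM_TXTIME record. ctrl must be aligned
 * for struct cmsghdr and hold at least CMSG_SPACE(sizeof(uint64_t)) bytes.
 * Returns 0, or -1 if the buffer is too small.
 */
int attach_txtime(struct msghdr *msg, void *ctrl, size_t ctrl_len, uint64_t txtime_ns)
{
    struct cmsghdr *c;

    if (ctrl_len < CMSG_SPACE(sizeof(txtime_ns)))
        return -1;
    memset(ctrl, 0, CMSG_SPACE(sizeof(txtime_ns)));
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

    c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_TXTIME;
    c->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
    memcpy(CMSG_DATA(c), &txtime_ns, sizeof(txtime_ns));
    return 0;
}
```

The msghdr prepared this way is then handed to sendmsg(2) on the socket that had the option enabled.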
|
|
Revision tags: v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8 |
|
| #
6f52b16c |
| 01-Nov-2017 |
Greg Kroah-Hartman <[email protected]> |
License cleanup: add SPDX license identifier to uapi header files with no license
Many user space API headers are missing licensing information, which makes it hard for compliance tools to determine the correct license.
By default, files without license information fall under the default license of the kernel, which is GPLv2. Marking them GPLv2 would exclude them from being included in non-GPLv2 code, which is obviously not intended. The user space API headers fall under the syscall exception, which is in the kernel's COPYING file:
NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work".
otherwise syscall usage would not be possible.
Update the files which contain no license information with an SPDX license identifier. The chosen identifier is 'GPL-2.0 WITH Linux-syscall-note', which is the officially assigned identifier for the Linux syscall exception. SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched.
Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Philippe Ombredanne <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Revision tags: v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4 |
|
| #
76851d12 |
| 03-Aug-2017 |
Willem de Bruijn <[email protected]> |
sock: add SOCK_ZEROCOPY sockopt
The send call ignores unknown flags. Legacy applications may already unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a socket opts in to zerocopy.
Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing. Processes can also query this socket option to detect kernel support for the feature. Older kernels will return ENOPROTOOPT.
Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
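The opt-in and the support probe described above might look like this in user space. This is a sketch only: the helper names `zerocopy_state` and `zerocopy_enable` are hypothetical, and the fallback value 60 is the asm-generic number, used only if the installed headers predate the option.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60 /* asm-generic value; assumption for old headers */
#endif

/* Hypothetical helper: query the option to detect kernel support.
 * Returns 1 if enabled, 0 if supported but disabled, or a negative errno
 * value (per the commit message, -ENOPROTOOPT on older kernels). */
static int zerocopy_state(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, &len) < 0)
        return -errno;
    return val != 0;
}

/* Hypothetical helper: opt the socket in so that MSG_ZEROCOPY passed to
 * send calls is honored rather than ignored.
 * Returns 0 on success or a negative errno value. */
static int zerocopy_enable(int fd)
{
    int one = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) < 0)
        return -errno;
    return 0;
}
```

An application could call `zerocopy_state()` at startup and fall back to ordinary copying sends when it returns a negative value.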
|
|
Revision tags: v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7 |
|
| #
28b5ba2a |
| 21-Jun-2017 |
David Herrmann <[email protected]> |
net: introduce SO_PEERGROUPS getsockopt
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to retrieve the auxiliary groups of the remote peer. It is designed to naturally extend SO_PEERCRED. That is, the underlying data is from the same credentials. Regarding its syntax, it is based on SO_PEERSEC. That is, if the provided buffer is too small, ERANGE is returned and @optlen is updated. Otherwise, the information is copied, @optlen is set to the actual size, and 0 is returned.
While SO_PEERCRED (and thus `struct ucred') already returns the primary group, it lacks the auxiliary group vector. However, nearly all access controls (including kernel side VFS and SYSVIPC, but also user-space polkit, DBus, ...) consider the entire set of groups, rather than just the primary group. But this is currently not possible with pure SO_PEERCRED. Instead, user-space has to work around this and query the system database for the auxiliary groups of a UID retrieved via SO_PEERCRED.
Unfortunately, there is no race-free way to query the auxiliary groups of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space solution is to use getgrouplist(3p), which itself falls back to NSS and whatever is configured in nsswitch.conf(5). This effectively checks which groups we *would* assign to the user if it logged in *now*. On normal systems it is as easy as reading /etc/group, but with NSS it can resort to querying network databases (e.g., LDAP), using IPC or network communication.
Long story short: Whenever we want to use auxiliary groups for access checks on IPC, we need further IPC to talk to the user/group databases, rather than just relying on SO_PEERCRED and the incoming socket. This is unfortunate, and might even result in deadlocks if the database query uses the same IPC as the original request.
So far, those recursions / deadlocks have been avoided by using primitive IPC for all crucial NSS modules. However, we want to avoid re-inventing the wheel for each NSS module that might be involved in user/group queries. Hence, we would preferably make DBus (and other IPC that supports access-management based on groups) work without resorting to the user/group database. This new SO_PEERGROUPS socket option would allow us to make dbus-daemon work without ever calling into NSS.
Cc: Michal Sekletar <[email protected]> Cc: Simon McVittie <[email protected]> Reviewed-by: Tom Gundersen <[email protected]> Signed-off-by: David Herrmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
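The ERANGE protocol described in the commit message lends itself to a grow-and-retry loop. The following is a sketch under that description, not code from the patch: the helper name `peer_groups` is hypothetical, and the fallback value 59 is the asm-generic number, used only if the installed headers predate the option.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef SO_PEERGROUPS
#define SO_PEERGROUPS 59 /* asm-generic value; assumption for old headers */
#endif

/* Hypothetical helper: fetch the peer's auxiliary groups.
 * On success returns the number of groups and stores a malloc'd array in
 * *out (NULL when the peer has no auxiliary groups); the caller frees it.
 * On failure returns a negative errno value and leaves *out as NULL. */
static int peer_groups(int fd, gid_t **out)
{
    gid_t *buf = NULL;
    socklen_t len = 0;

    *out = NULL;
    for (;;) {
        socklen_t want = len;
        gid_t *tmp;

        if (getsockopt(fd, SOL_SOCKET, SO_PEERGROUPS, buf, &want) == 0) {
            *out = buf;
            return (int)(want / sizeof(gid_t));
        }
        if (errno != ERANGE || want <= len) {
            /* Real error, or ERANGE without a larger size hint. */
            int err = errno;
            free(buf);
            return -err;
        }
        /* Buffer too small: the kernel updated want to the required size. */
        tmp = realloc(buf, want);
        if (tmp == NULL) {
            free(buf);
            return -ENOMEM;
        }
        buf = tmp;
        len = want;
    }
}
```

Starting with a zero-length buffer lets the first call act as a size query, so no group-count guess is needed up front.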
|