|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
1f615422 |
| 25-Mar-2025 |
Jakub Kicinski <[email protected]> |
Revert "udp_tunnel: GRO optimizations"
Revert "udp_tunnel: use static call for GRO hooks when possible" This reverts commit 311b36574ceaccfa3f91b74054a09cd4bb877702.
Revert "udp_tunnel: create a fa
Revert "udp_tunnel: GRO optimizations"
Revert "udp_tunnel: use static call for GRO hooks when possible" This reverts commit 311b36574ceaccfa3f91b74054a09cd4bb877702.
Revert "udp_tunnel: create a fastpath GRO lookup." This reverts commit 8d4880db378350f8ed8969feea13bdc164564fc1.
There are multiple small issues with the series. In the interest of unblocking the merge window let's opt for a revert.
Link: https://lore.kernel.org/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14, v6.14-rc7 |
|
| #
8d4880db |
| 11-Mar-2025 |
Paolo Abeni <[email protected]> |
udp_tunnel: create a fastpath GRO lookup.
Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single
udp_tunnel: create a fastpath GRO lookup.
Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single tunnel device per namespace.
Track in each namespace the UDP tunnel socket respecting the above. When only a single one is present, store a reference in the netns.
When such reference is not NULL, UDP tunnel GRO lookup just need to match the incoming packet destination port vs the socket local port.
The tunnel socket never sets the reuse[port] flag[s]. When bound to no address and interface, no other socket can exist in the same netns matching the specified local port.
Matching packets with non-local destination addresses will be aggregated, and eventually segmented as needed - no behavior changes intended.
Note that the UDP tunnel socket reference is stored into struct netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep all the fastpath-related netns fields in the same struct and allow cacheline-based optimization. Currently both the IPv4 and IPv6 socket pointer share the same cacheline as the `udp_table` field.
Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/4d5c319c4471161829f50cb8436841de81a5edae.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12 |
|
| #
1b29a730 |
| 14-Nov-2024 |
Philo Lu <[email protected]> |
ipv6/udp: Add 4-tuple hash for connected socket
Implement ipv6 udp hash4 like that in ipv4. The major difference is that the hash value should be calculated with udp6_ehashfn(). Besides, ipv4-mapped
ipv6/udp: Add 4-tuple hash for connected socket
Implement ipv6 udp hash4 like that in ipv4. The major difference is that the hash value should be calculated with udp6_ehashfn(). Besides, ipv4-mapped ipv6 address is handled before hash() and rehash(). Export udp_ehashfn because now we use it in udpv6 rehash.
Core procedures of hash/unhash/rehash are same as ipv4, and udpv4 and udpv6 share the same udptable, so some functions in ipv4 hash4 can also be shared.
Co-developed-by: Cambda Zhu <[email protected]> Signed-off-by: Cambda Zhu <[email protected]> Co-developed-by: Fred Chen <[email protected]> Signed-off-by: Fred Chen <[email protected]> Co-developed-by: Yubing Qiu <[email protected]> Signed-off-by: Yubing Qiu <[email protected]> Signed-off-by: Philo Lu <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
78c91ae2 |
| 14-Nov-2024 |
Philo Lu <[email protected]> |
ipv4/udp: Add 4-tuple hash for connected socket
Currently, the udp_table has two hash table, the port hash and portaddr hash. Usually for UDP servers, all sockets have the same local port and addr,
ipv4/udp: Add 4-tuple hash for connected socket
Currently, the udp_table has two hash table, the port hash and portaddr hash. Usually for UDP servers, all sockets have the same local port and addr, so they are all on the same hash slot within a reuseport group.
In some applications, UDP servers use connect() to manage clients. In particular, when firstly receiving from an unseen 4 tuple, a new socket is created and connect()ed to the remote addr:port, and then the fd is used exclusively by the client.
Once there are connected sks in a reuseport group, udp has to score all sks in the same hash2 slot to find the best match. This could be inefficient with a large number of connections, resulting in high softirq overhead.
To solve the problem, this patch implement 4-tuple hash for connected udp sockets. During connect(), hash4 slot is updated, as well as a corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(), hslot4 will be searched firstly if the counter is non-zero. Otherwise, hslot2 is used like before. Note that only connected sockets enter this hash4 path, while un-connected ones are not affected.
hlist_nulls is used for hash4, because we probably move to another hslot wrongly when lookup with concurrent rehash. Then we check nulls at the list end to see if we should restart lookup. Because udp does not use SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt when lookup.
Stress test results (with 1 cpu fully used) are shown below, in pps: (1) _un-connected_ socket as server [a] w/o hash4: 1,825176 [b] w/ hash4: 1,831750 (+0.36%)
(2) 500 _connected_ sockets as server [c] w/o hash4: 290860 (only 16% of [a]) [d] w/ hash4: 1,889658 (+3.1% compared with [b])
With hash4, compute_score is skipped when lookup, so [d] is slightly better than [b].
Co-developed-by: Cambda Zhu <[email protected]> Signed-off-by: Cambda Zhu <[email protected]> Co-developed-by: Fred Chen <[email protected]> Signed-off-by: Fred Chen <[email protected]> Co-developed-by: Yubing Qiu <[email protected]> Signed-off-by: Yubing Qiu <[email protected]> Signed-off-by: Philo Lu <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
dab78a17 |
| 14-Nov-2024 |
Philo Lu <[email protected]> |
net/udp: Add 4-tuple hash list basis
Add a new hash list, hash4, in udp table. It will be used to implement 4-tuple hash for connected udp sockets. This patch adds the hlist to table, and implements
net/udp: Add 4-tuple hash list basis
Add a new hash list, hash4, in udp table. It will be used to implement 4-tuple hash for connected udp sockets. This patch adds the hlist to table, and implements helpers and the initialization. 4-tuple hash is implemented in the following patch.
hash4 uses hlist_nulls to avoid moving wrongly onto another hlist due to concurrent rehash, because rehash() can happen with lookup().
Co-developed-by: Cambda Zhu <[email protected]> Signed-off-by: Cambda Zhu <[email protected]> Co-developed-by: Fred Chen <[email protected]> Signed-off-by: Fred Chen <[email protected]> Co-developed-by: Yubing Qiu <[email protected]> Signed-off-by: Yubing Qiu <[email protected]> Signed-off-by: Philo Lu <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
accdd51d |
| 14-Nov-2024 |
Philo Lu <[email protected]> |
net/udp: Add a new struct for hash2 slot
Preparing for udp 4-tuple hash (uhash4 for short).
To implement uhash4 without cache line missing when lookup, hslot2 is used to record the number of hashed
net/udp: Add a new struct for hash2 slot
Preparing for udp 4-tuple hash (uhash4 for short).
To implement uhash4 without cache line missing when lookup, hslot2 is used to record the number of hashed sockets in hslot4. Thus adding a new struct udp_hslot_main with field hash4_cnt, which is used by hash2. The new struct is used to avoid doubling the size of udp_hslot.
Before uhash4 lookup, firstly checking hash4_cnt to see if there are hashed sks in hslot4. Because hslot2 is always used in lookup, there is no cache line miss.
Related helpers are updated, and use the helpers as possible.
uhash4 is implemented in following patches.
Signed-off-by: Philo Lu <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
70d0bb45 |
| 22-Aug-2024 |
Simon Horman <[email protected]> |
net: Correct spelling in headers
Correct spelling in Networking headers. As reported by codespell.
Signed-off-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/20240822-net-spell-v
net: Correct spelling in headers
Correct spelling in Networking headers. As reported by codespell.
Signed-off-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
87d973e8 |
| 02-Aug-2024 |
Eric Dumazet <[email protected]> |
ipv6: udp: constify 'struct net' parameter of socket lookups
Following helpers do not touch their 'struct net' argument.
- udp6_lib_lookup() - __udp6_lib_lookup()
Signed-off-by: Eric Dumazet <edum
ipv6: udp: constify 'struct net' parameter of socket lookups
Following helpers do not touch their 'struct net' argument.
- udp6_lib_lookup() - __udp6_lib_lookup()
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
b9abcbb1 |
| 02-Aug-2024 |
Eric Dumazet <[email protected]> |
udp: constify 'struct net' parameter of socket lookups
Following helpers do not touch their 'struct net' argument.
- udp_sk_bound_dev_eq() - udp4_lib_lookup() - __udp4_lib_lookup()
Signed-off-by:
udp: constify 'struct net' parameter of socket lookups
Following helpers do not touch their 'struct net' argument.
- udp_sk_bound_dev_eq() - udp4_lib_lookup() - __udp4_lib_lookup()
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3 |
|
| #
e8205119 |
| 07-Apr-2024 |
Al Viro <[email protected]> |
new helper: copy_to_iter_full()
... and convert copy_linear_skb() to using that.
Signed-off-by: Al Viro <[email protected]>
|
|
Revision tags: v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5 |
|
| #
cc97777c |
| 05-Aug-2023 |
Yue Haibing <[email protected]> |
udp/udplite: Remove unused function declarations udp{,lite}_get_port()
Commit 6ba5a3c52da0 ("[UDP]: Make full use of proto.h.udp_hash innovation.") removed these implementations but leave declaratio
udp/udplite: Remove unused function declarations udp{,lite}_get_port()
Commit 6ba5a3c52da0 ("[UDP]: Make full use of proto.h.udp_hash innovation.") removed these implementations but leave declarations.
Signed-off-by: Yue Haibing <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
9e63a99c |
| 01-Aug-2023 |
Yue Haibing <[email protected]> |
udp: Remove unused function declaration udp_bpf_get_proto()
commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") left behind this.
Signed-off-by: Yue Haibing <yuehaibing@huaw
udp: Remove unused function declaration udp_bpf_get_proto()
commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") left behind this.
Signed-off-by: Yue Haibing <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6 |
|
| #
e1d001fa |
| 09-Jun-2023 |
Breno Leitao <[email protected]> |
net: ioctl: Use kernel memory on protocol ioctl callbacks
Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl
net: ioctl: Use kernel memory on protocol ioctl callbacks
Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers.
Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocol's ioctl callbacks is adapted to operate on a kernel memory other than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback).
This changes the "struct proto" ioctl format in the following way:
int (*ioctl)(struct sock *sk, int cmd, - unsigned long arg); + int *karg);
(Important to say that this patch does not touch the "struct proto_ops" protocols)
So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. There is not one-size-fits-all format (that is I am using 'may' above), but basically, there are three type of ioctls:
1) Do not read from userspace, returns a result to userspace 2) Read an input parameter from userspace, and does not return anything to userspace 3) Read an input from userspace, and return a buffer to userspace.
The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions:
* Protocol RAW: * cmd = SIOCGETVIFCNT: * input and output = struct sioc_vif_req * cmd = SIOCGETSGCNT * input and output = struct sioc_sg_req * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace.
* Protocol RAW6: * cmd = SIOCGETMIFCNT_IN6 * input and output = struct sioc_mif_req6 * cmd = SIOCGETSGCNT_IN6 * input and output = struct sioc_sg_req6
* Protocol PHONET: * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE * input int (4 bytes) * Nothing is copied back to userspace.
For the exception cases, functions sock_sk_ioctl_inout() will copy the userspace input, and copy it back to kernel space.
The wrapper that prepare the buffer and put the buffer back to user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now calls sk_ioctl(), which will handle all cases.
Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
d457a0e3 |
| 08-Jun-2023 |
Eric Dumazet <[email protected]> |
net: move gso declarations and functions to their own files
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <[email protected]> Cc: Stanislav Fom
net: move gso declarations and functions to their own files
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <[email protected]> Cc: Stanislav Fomichev <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
1d7e4538 |
| 07-Jun-2023 |
David Howells <[email protected]> |
ipv4, ipv6: Use splice_eof() to flush
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splic
ipv4, ipv6: Use splice_eof() to flush
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE.
For UDP, a pending packet will not be emitted if the socket is closed before it is flushed; with this change, it be flushed by ->splice_eof().
For TCP, it's not clear that MSG_MORE is actually effective.
Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> cc: Kuniyuki Iwashima <[email protected]> cc: Willem de Bruijn <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3 |
|
| #
e4fe1bf1 |
| 19-May-2023 |
Aditi Ghag <[email protected]> |
udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
This is a preparatory commit to remove the field. The field was previously shared between proc fs and BPF UDP socket iterators. As the follow
udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
This is a preparatory commit to remove the field. The field was previously shared between proc fs and BPF UDP socket iterators. As the follow-up commits will decouple the implementation for the iterators, remove the field. As for BPF socket iterator, filtering of sockets is exepected to be done in BPF programs.
Suggested-by: Martin KaFai Lau <[email protected]> Signed-off-by: Aditi Ghag <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2 |
|
| #
8a3854c7 |
| 20-Oct-2022 |
Paolo Abeni <[email protected]> |
udp: track the forward memory release threshold in an hot cacheline
When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf
udp: track the forward memory release threshold in an hot cacheline
When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf, as the latter shares the same cacheline with sk_forward_alloc, written by the BH.
With this patch, UDP tracks the rcvbuf value and its update via custom SOL_SOCKET socket options, and copies the forward memory threshold value used by udp_rmem_release() in a different cacheline, already accessed by the above function and uncontended.
Since the UDP socket init operation grown a bit, factor out the common code between v4 and v6 in a shared helper.
Overall the above give a 10% peek throughput increase under UDP flood.
Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc1 |
|
| #
d38afeec |
| 06-Oct-2022 |
Kuniyuki Iwashima <[email protected]> |
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during th
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak:
setsockopt(IPV6_ADDRFORM) sendmsg() +-----------------------+ +-------+ - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...) - sockopt_lock_sock(sk) ^._ called via udpv6_prot - lock_sock(sk) before WRITE_ONCE() - WRITE_ONCE(sk->sk_prot, &tcp_prot) - inet6_destroy_sock() - if (!corkreq) - sockopt_release_sock(sk) - ip6_make_skb(sk, ...) - release_sock(sk) ^._ lockless fast path for the non-corking case
- __ip6_append_data(sk, ...) - ipv6_local_rxpmtu(sk, ...) - xchg(&np->rxpmtu, skb) ^._ rxpmtu is never freed.
- goto out_no_dst;
- lock_sock(sk)
For now, rxpmtu is only the case, but not to miss the future change and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that we can clean up IPv6 resources finally.
We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next.
Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8 |
|
| #
3d72bb41 |
| 18-Jul-2022 |
Kuniyuki Iwashima <[email protected]> |
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("n
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc7 |
|
| #
11052589 |
| 13-Jul-2022 |
Kuniyuki Iwashima <[email protected]> |
tcp/udp: Make early_demux back namespacified.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we intr
tcp/udp: Make early_demux back namespacified.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns.
We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob.
This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time.
Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3 |
|
| #
965b57b4 |
| 15-Jun-2022 |
Cong Wang <[email protected]> |
net: Introduce a new proto_ops ->read_skb()
Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_soc
net: Introduce a new proto_ops ->read_skb()
Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors.
For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb().
Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
|
Revision tags: v5.19-rc2 |
|
| #
0defbb0a |
| 09-Jun-2022 |
Eric Dumazet <[email protected]> |
net: add per_cpu_fw_alloc field to struct proto
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use.
Instead of having reserved b
net: add per_cpu_fw_alloc field to struct proto
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use.
Instead of having reserved bytes per socket, we want to have per-cpu reserves.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3 |
|
| #
ec095263 |
| 11-Apr-2022 |
Oliver Hartkopp <[email protected]> |
net: remove noblock parameter from recvmsg() entities
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to
net: remove noblock parameter from recvmsg() entities
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to commit f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()") this patch removes the separate 'noblock' parameter for recvmsg().
Analogue to the referenced patch for skb_recv_datagram() the 'flags' and 'noblock' parameters are unnecessarily split up with e.g.
err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len);
or in
err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len);
instead of simply using only flags all the time and check for MSG_DONTWAIT where needed (to preserve for the formerly separated no(n)block condition).
Signed-off-by: Oliver Hartkopp <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2 |
|
| #
4721031c |
| 15-Nov-2021 |
Eric Dumazet <[email protected]> |
net: move gro definitions to include/net/gro.h
include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David
net: move gro definitions to include/net/gro.h
include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc1, v5.15, v5.15-rc7 |
|
| #
9122a70a |
| 24-Oct-2021 |
Cyril Strejc <[email protected]> |
net: multicast: calculate csum of looped-back and forwarded packets
During a testing of an user-space application which transmits UDP multicast datagrams and utilizes multicast routing to send the U
net: multicast: calculate csum of looped-back and forwarded packets
During a testing of an user-space application which transmits UDP multicast datagrams and utilizes multicast routing to send the UDP datagrams out of defined network interfaces, I've found a multicast router does not fill-in UDP checksum into locally produced, looped-back and forwarded UDP datagrams, if an original output NIC the datagrams are sent to has UDP TX checksum offload enabled.
The datagrams are sent malformed out of the NIC the datagrams have been forwarded to.
It is because:
1. If TX checksum offload is enabled on the output NIC, UDP checksum is not calculated by kernel and is not filled into skb data.
2. dev_loopback_xmit(), which is called solely by ip_mc_finish_output(), sets skb->ip_summed = CHECKSUM_UNNECESSARY unconditionally.
3. Since 35fc92a9 ("[NET]: Allow forwarding of ip_summed except CHECKSUM_COMPLETE"), the ip_summed value is preserved during forwarding.
4. If ip_summed != CHECKSUM_PARTIAL, checksum is not calculated during a packet egress.
The minimum fix in dev_loopback_xmit():
1. Preserves skb->ip_summed CHECKSUM_PARTIAL. This is the case when the original output NIC has TX checksum offload enabled. The effects are:
a) If the forwarding destination interface supports TX checksum offloading, the NIC driver is responsible to fill-in the checksum.
b) If the forwarding destination interface does NOT support TX checksum offloading, checksums are filled-in by kernel before skb is submitted to the NIC driver.
c) For local delivery, checksum validation is skipped as in the case of CHECKSUM_UNNECESSARY, thanks to skb_csum_unnecessary().
2. Translates ip_summed CHECKSUM_NONE to CHECKSUM_UNNECESSARY. It means, for CHECKSUM_NONE, the behavior is unmodified and is there to skip a looped-back packet local delivery checksum validation.
Signed-off-by: Cyril Strejc <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|