|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
4618e195 |
| 05-Mar-2025 |
Ilpo Järvinen <[email protected]> |
tcp: add new TCP_TW_ACK_OOW state and allow ECN bits in TOS
ECN bits in TOS are always cleared when sending in ACKs in TW. Clearing them is problematic for TCP flows that used Accurate ECN because E
tcp: add new TCP_TW_ACK_OOW state and allow ECN bits in TOS
ECN bits in TOS are always cleared when sending in ACKs in TW. Clearing them is problematic for TCP flows that used Accurate ECN because ECN bits decide which service queue the packet is placed into (L4S vs Classic). Effectively, TW ACKs are always downgraded from L4S to Classic queue which might impact, e.g., delay the ACK will experience on the path compared with the other packets of the flow.
Change the TW ACK sending code to differentiate: - In tcp_v4_send_reset(), commit ba9e04a7ddf4f ("ip: fix tos reflection in ack and reset packets") cleans ECN bits for TW reset and this is not affected. - In tcp_v4_timewait_ack(), ECN bits for all TW ACKs are cleaned. But now only ECN bits of ACKs for oow data or paws_reject are cleaned, and ECN bits of other ACKs will not be cleaned. - In tcp_v4_reqsk_send_ack(), commit 66b13d99d96a1 ("ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT") did not clean ECN bits of ACKs for oow data or paws_reject. But now the ECN bits rae cleaned for these ACKs.
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: Chia-Yu Chang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
148721f8 |
| 16-Dec-2024 |
Guillaume Nault <[email protected]> |
ipv4: Use inet_sk_init_flowi4() in __ip_queue_xmit().
Use inet_sk_init_flowi4() to automatically initialise the flowi4 structure in __ip_queue_xmit() instead of passing parameters manually to ip_rou
ipv4: Use inet_sk_init_flowi4() in __ip_queue_xmit().
Use inet_sk_init_flowi4() to automatically initialise the flowi4 structure in __ip_queue_xmit() instead of passing parameters manually to ip_route_output_ports().
Override ->flowi4_tos with the value passed as parameter since that's required by SCTP.
Signed-off-by: Guillaume Nault <[email protected]> Link: https://patch.msgid.link/37e64ffbd9adac187b14aa9097b095f5c86e85be.1734357769.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3 |
|
| #
a32f3e9d |
| 13-Dec-2024 |
Anna Emese Nyiri <[email protected]> |
sock: support SO_PRIORITY cmsg
The Linux socket API currently allows setting SO_PRIORITY at the socket level, applying a uniform priority to all packets sent through that socket. The exception to th
sock: support SO_PRIORITY cmsg
The Linux socket API currently allows setting SO_PRIORITY at the socket level, applying a uniform priority to all packets sent through that socket. The exception to this is IP_TOS, when the priority value is calculated during the handling of ancillary data, as implemented in commit f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data"). However, this is a computed value, and there is currently no mechanism to set a custom priority via control messages prior to this patch.
According to this patch, if SO_PRIORITY is specified as ancillary data, the packet is sent with the priority value set through sockc->priority, overriding the socket-level values set via the traditional setsockopt() method. This is analogous to the existing support for SO_MARK, as implemented in commit c6af0c227a22 ("ip: support SO_MARK cmsg").
If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that takes precedence is the last one in the cmsg list.
This patch has the side effect that raw_send_hdrinc now interprets cmsg IP_TOS.
Reviewed-by: Willem de Bruijn <[email protected]> Suggested-by: Ferenc Fejes <[email protected]> Signed-off-by: Anna Emese Nyiri <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2 |
|
| #
5204ccbf |
| 03-Dec-2024 |
Eric Dumazet <[email protected]> |
inet: add indirect call wrapper for getfrag() calls
UDP send path suffers from one indirect call to ip_generic_getfrag()
We can use INDIRECT_CALL_1() to avoid it.
Signed-off-by: Eric Dumazet <edum
inet: add indirect call wrapper for getfrag() calls
UDP send path suffers from one indirect call to ip_generic_getfrag()
We can use INDIRECT_CALL_1() to avoid it.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Brian Vazquez <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
79636038 |
| 10-Oct-2024 |
Eric Dumazet <[email protected]> |
ipv4: tcp: give socket pointer to control skbs
ip_send_unicast_reply() send orphaned 'control packets'.
These are RST packets and also ACK packets sent from TIME_WAIT.
Some eBPF programs would pre
ipv4: tcp: give socket pointer to control skbs
ip_send_unicast_reply() send orphaned 'control packets'.
These are RST packets and also ACK packets sent from TIME_WAIT.
Some eBPF programs would prefer to have a meaningful skb->sk pointer as much as possible.
This means that TCP can now attach TIME_WAIT sockets to outgoing skbs.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Brian Vazquez <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc2 |
|
| #
822b5bc6 |
| 01-Oct-2024 |
Vadim Fedorenko <[email protected]> |
net_tstamp: add SCM_TS_OPT_ID for RAW sockets
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW sockets. To add new option this patch converts all callers (direct and indirect)
net_tstamp: add SCM_TS_OPT_ID for RAW sockets
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW sockets. To add new option this patch converts all callers (direct and indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of tsflags. And while here fix __sock_tx_timestamp to receive tsflags as __u32 instead of __u16.
Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Jason Xing <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
4aecca4c |
| 01-Oct-2024 |
Vadim Fedorenko <[email protected]> |
net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps and packets sent via socket. Unfortunately, there
net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps and packets sent via socket. Unfortunately, there is no way to reliably predict socket timestamp ID value in case of error returned by sendmsg. For UDP sockets it's impossible because of lockless nature of UDP transmit, several threads may send packets in parallel. In case of RAW sockets MSG_MORE option makes things complicated. More details are in the conversation [1]. This patch adds new control message type to give user-space software an opportunity to control the mapping between packets and values by providing ID with each sendmsg for UDP sockets. The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Jason Xing <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc1, v6.11, v6.11-rc7 |
|
| #
71f1fea4 |
| 03-Sep-2024 |
Ido Schimmel <[email protected]> |
ipv4: Unmask upper DSCP bits in __ip_queue_xmit()
The function is passed the full DS field in its 'tos' argument by its two callers. It then masks the upper DSCP bits using RT_TOS() when passing it
ipv4: Unmask upper DSCP bits in __ip_queue_xmit()
The function is passed the full DS field in its 'tos' argument by its two callers. It then masks the upper DSCP bits using RT_TOS() when passing it to ip_route_output_ports().
Unmask the upper DSCP bits when passing 'tos' to ip_route_output_ports() so that in the future it could perform the FIB lookup according to the full DSCP value.
Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc6 |
|
| #
13f6538d |
| 29-Aug-2024 |
Ido Schimmel <[email protected]> |
ipv4: Unmask upper DSCP bits in ip_send_unicast_reply()
The function calls flowi4_init_output() to initialize an IPv4 flow key with which it then performs a FIB lookup using ip_route_output_flow().
ipv4: Unmask upper DSCP bits in ip_send_unicast_reply()
The function calls flowi4_init_output() to initialize an IPv4 flow key with which it then performs a FIB lookup using ip_route_output_flow().
'arg->tos' with which the TOS value in the IPv4 flow key (flowi4_tos) is initialized contains the full DS field. Unmask the upper DSCP bits so that in the future the FIB lookup could be performed according to the full DSCP value.
Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5 |
|
| #
bc3dd9ed |
| 19-Aug-2024 |
James Chapman <[email protected]> |
l2tp: use skb_queue_purge in l2tp_ip_destroy_sock
Recent commit ed8ebee6def7 ("l2tp: have l2tp_ip_destroy_sock use ip_flush_pending_frames") was incorrect in that l2tp_ip does not use socket cork an
l2tp: use skb_queue_purge in l2tp_ip_destroy_sock
Recent commit ed8ebee6def7 ("l2tp: have l2tp_ip_destroy_sock use ip_flush_pending_frames") was incorrect in that l2tp_ip does not use socket cork and ip_flush_pending_frames is for sockets that do. Use __skb_queue_purge instead and remove the unnecessary lock.
Also unexport ip_flush_pending_frames since it was originally exported in commit 4ff8863419cd ("ipv4: export ip_flush_pending_frames") for l2tp and is not used by other modules.
Suggested-by: [email protected] Signed-off-by: James Chapman <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
4ff88634 |
| 29-Jul-2024 |
James Chapman <[email protected]> |
ipv4: export ip_flush_pending_frames
To avoid protocol modules implementing their own, export ip_flush_pending_frames.
Signed-off-by: James Chapman <[email protected]> Signed-off-by: Tom Parkin
ipv4: export ip_flush_pending_frames
To avoid protocol modules implementing their own, export ip_flush_pending_frames.
Signed-off-by: James Chapman <[email protected]> Signed-off-by: Tom Parkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9 |
|
| #
1693c5db |
| 09-May-2024 |
Abhishek Chauhan <[email protected]> |
net: Add additional bit to support clockid_t timestamp type
tstamp_type is now set based on actual clockid_t compressed into 2 bits.
To make the design scalable for future needs this commit bring i
net: Add additional bit to support clockid_t timestamp type
tstamp_type is now set based on actual clockid_t compressed into 2 bits.
To make the design scalable for future needs this commit bring in the change to extend the tstamp_type:1 to tstamp_type:2 to support other clockid_t timestamp.
We now support CLOCK_TAI as part of tstamp_type as part of this commit with existing support CLOCK_MONOTONIC and CLOCK_REALTIME.
Signed-off-by: Abhishek Chauhan <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
show more ...
|
| #
4d25ca2d |
| 09-May-2024 |
Abhishek Chauhan <[email protected]> |
net: Rename mono_delivery_time to tstamp_type for scalabilty
mono_delivery_time was added to check if skb->tstamp has delivery time in mono clock base (i.e. EDT) otherwise skb->tstamp has timestamp
net: Rename mono_delivery_time to tstamp_type for scalabilty
mono_delivery_time was added to check if skb->tstamp has delivery time in mono clock base (i.e. EDT) otherwise skb->tstamp has timestamp in ingress and delivery_time at egress.
Renaming the bitfield from mono_delivery_time to tstamp_type is for extensibilty for other timestamps such as userspace timestamp (i.e. SO_TXTIME) set via sock opts.
As we are renaming the mono_delivery_time to tstamp_type, it makes sense to start assigning tstamp_type based on enum defined in this commit.
Earlier we used bool arg flag to check if the tstamp is mono in function skb_set_delivery_time, Now the signature of the functions accepts tstamp_type to distinguish between mono and real time.
Also skb_set_delivery_type_by_clockid is a new function which accepts clockid to determine the tstamp_type.
In future tstamp_type:1 can be extended to support userspace timestamp by increasing the bitfield.
Signed-off-by: Abhishek Chauhan <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc7 |
|
| #
fc1092f5 |
| 30-Apr-2024 |
Shigeru Yoshida <[email protected]> |
ipv4: Fix uninit-value access in __ip_make_skb()
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb() tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause
ipv4: Fix uninit-value access in __ip_make_skb()
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb() tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL while __ip_make_skb() is running, the function will access icmphdr in the skb even if it is not included. This causes the issue reported by KMSAN.
Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL on the socket.
Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These are union in struct flowi4 and are implicitly initialized by flowi4_init_output(), but we should not rely on specific union layout.
Initialize these explicitly in raw_sendmsg().
[1] BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 ip_finish_skb include/net/ip.h:243 [inline] ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508 raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] __ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128 ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365 raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
Fixes: 99e5acae193e ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()") Reported-by: syzkaller <[email protected]> Signed-off-by: Shigeru Yoshida <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
05d6d492 |
| 29-Apr-2024 |
Eric Dumazet <[email protected]> |
inet: introduce dst_rtable() helper
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper")
This patch does a similar change for IPv4.
Instead of (struct rtable *)d
inet: introduce dst_rtable() helper
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper")
This patch does a similar change for IPv4.
Instead of (struct rtable *)dst casts, we can use :
#define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst)
Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
35c3e279 |
| 14-Mar-2024 |
Abhishek Chauhan <[email protected]> |
Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets"
This reverts commit 885c36e59f46375c138de18ff1692f18eff67b7f.
The patch currently broke the bpf selftest test_tc_dti
Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets"
This reverts commit 885c36e59f46375c138de18ff1692f18eff67b7f.
The patch currently broke the bpf selftest test_tc_dtime because uapi field __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono with the original fix as the bit was re-used for userspace timestamp as well to avoid tstamp reset in the forwarding path. To solve this we need to keep mono_delivery_time as is and introduce another bit called user_delivery_time and fall back to the initial proposal of setting the user_delivery_time bit based on sk_clockid set from userspace.
Fixes: 885c36e59f46 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets") Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Abhishek Chauhan <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8, v6.8-rc7 |
|
| #
885c36e5 |
| 01-Mar-2024 |
Abhishek Chauhan <[email protected]> |
net: Re-use and set mono_delivery_time bit for userspace tstamp packets
Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc c
net: Re-use and set mono_delivery_time bit for userspace tstamp packets
Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc checks the packet coming from userspace and encounters to be 0 thereby dropping time sensitive packets. These changes will allow userspace timestamps packets to be forwarded from the bridge to NIC drivers.
Setting the same bit (mono_delivery_time) to avoid dropping of userspace tstamp packets in the forwarding path.
Existing functionality of mono_delivery_time remains unaltered here, instead just extended with userspace tstamp support for bridge forwarding path.
Signed-off-by: Abhishek Chauhan <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc6, v6.8-rc5 |
|
| #
488b6d91 |
| 13-Feb-2024 |
Vadim Fedorenko <[email protected]> |
net-timestamp: make sk_tskey more predictable in error path
When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams, the sk_tskey can become unpredictable in case of any error happen
net-timestamp: make sk_tskey more predictable in error path
When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams, the sk_tskey can become unpredictable in case of any error happened during sendmsg(). Move increment later in the code and make decrement of sk_tskey in error path. This solution is still racy in case of multiple threads doing snedmsg() over the very same socket in parallel, but still makes error path much more predictable.
Fixes: 09c2d251b707 ("net-timestamp: add key to disambiguate concurrent datagrams") Reported-by: Andy Lutomirski <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc4 |
|
| #
a3522a2e |
| 09-Feb-2024 |
Guillaume Nault <[email protected]> |
ipv4: Set the routing scope properly in ip_route_output_ports().
Set scope automatically in ip_route_output_ports() (using the socket SOCK_LOCALROUTE flag). This way, callers don't have to overload
ipv4: Set the routing scope properly in ip_route_output_ports().
Set scope automatically in ip_route_output_ports() (using the socket SOCK_LOCALROUTE flag). This way, callers don't have to overload the tos with the RTO_ONLINK flag, like RT_CONN_FLAGS() does.
For callers that don't pass a struct sock, this doesn't change anything as the scope is still set to RT_SCOPE_UNIVERSE when sk is NULL.
Callers that passed a struct sock and used RT_CONN_FLAGS(sk) or RT_CONN_FLAGS_TOS(sk, tos) for the tos are modified to use ip_sock_tos(sk) and RT_TOS(tos) respectively, as overloading tos with the RTO_ONLINK flag now becomes unnecessary.
In drivers/net/amt.c, all ip_route_output_ports() calls use a 0 tos parameter, ignoring the SOCK_LOCALROUTE flag of the socket. But the sk parameter is a kernel socket, which doesn't have any configuration path for setting SOCK_LOCALROUTE anyway. Therefore, ip_route_output_ports() will continue to initialise scope with RT_SCOPE_UNIVERSE and amt.c doesn't need to be modified.
Also, remove RT_CONN_FLAGS() and RT_CONN_FLAGS_TOS() from route.h as these macros are now unused.
The objective is to eventually remove RTO_ONLINK entirely to allow converting ->flowi4_tos to dscp_t. This will ensure proper isolation between the DSCP and ECN bits, thus minimising the risk of introducing bugs where TOS values interfere with ECN.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/dacfd2ab40685e20959ab7b53c427595ba229e7d.1707496938.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc3 |
|
| #
5dee6d69 |
| 29-Jan-2024 |
Zhipeng Lu <[email protected]> |
net: ipv4: fix a memleak in ip_setup_cork
When inetdev_valid_mtu fails, cork->opt should be freed if it is allocated in ip_setup_cork. Otherwise there could be a memleak.
Fixes: 501a90c94510 ("inet
net: ipv4: fix a memleak in ip_setup_cork
When inetdev_valid_mtu fails, cork->opt should be freed if it is allocated in ip_setup_cork. Otherwise there could be a memleak.
Fixes: 501a90c94510 ("inet: protect against too small mtu values.") Signed-off-by: Zhipeng Lu <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
b4a11b20 |
| 19-Oct-2023 |
Heng Guo <[email protected]> |
net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.
Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.16
net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.
Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.168.122.207 MTU 1500 VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500 VM3: eth0 ip: 192.168.123.240 MTU 1500
Reproduce: VM1 send 1400 bytes UDP data to VM3 using tools scapy with flags=0. scapy command: send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1, inter=1.000000)
Result: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- "ForwDatagrams" increase from 4 to 5 and "OutRequests" also increase from 7 to 8.
Issue description and patch: IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS ("OutOctets") in ip_finish_output2(). According to RFC 4293, it is "OutOctets" counted with "OutTransmits" but not "OutRequests". "OutRequests" does not include any datagrams counted in "ForwDatagrams". ipSystemStatsOutOctets OBJECT-TYPE DESCRIPTION "The total number of octets in IP datagrams delivered to the lower layers for transmission. Octets from datagrams counted in ipIfStatsOutTransmits MUST be counted here. ipSystemStatsOutRequests OBJECT-TYPE DESCRIPTION "The total number of IP datagrams that local IP user- protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipSystemStatsOutForwDatagrams. So do patch to define IPSTATS_MIB_OUTPKTS to "OutTransmits" and add IPSTATS_MIB_OUTREQUESTS for "OutRequests". Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6.
Test result with patch: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0 ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0 ---------------------------------------------------------------------- "ForwDatagrams" increase from 1 to 2 and "OutRequests" is keeping 3. "OutTransmits" increase from 4 to 5 and "OutOctets" increase 1428.
Signed-off-by: Heng Guo <[email protected]> Reviewed-by: Kun Song <[email protected]> Reviewed-by: Filip Pudak <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3 |
|
| #
e08d0b3d |
| 22-Sep-2023 |
Eric Dumazet <[email protected]> |
inet: implement lockless IP_TOS
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
S
inet: implement lockless IP_TOS
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
ceaa7141 |
| 22-Sep-2023 |
Eric Dumazet <[email protected]> |
inet: implement lockless IP_MTU_DISCOVER
inet->pmtudisc can be read locklessly.
Implement proper lockless reads and writes to inet->pmtudisc
ip_sock_set_mtu_discover() can now be called from arbit
inet: implement lockless IP_MTU_DISCOVER
inet->pmtudisc can be read locklessly.
Implement proper lockless reads and writes to inet->pmtudisc
ip_sock_set_mtu_discover() can now be called from arbitrary contexts.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
c9746e6a |
| 22-Sep-2023 |
Eric Dumazet <[email protected]> |
inet: implement lockless IP_MULTICAST_TTL
inet->mc_ttl can be read locklessly.
Implement proper lockless reads and writes to inet->mc_ttl
Signed-off-by: Eric Dumazet <[email protected]> Reviewed
inet: implement lockless IP_MULTICAST_TTL
inet->mc_ttl can be read locklessly.
Implement proper lockless reads and writes to inet->mc_ttl
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
10bbf165 |
| 21-Sep-2023 |
Eric Dumazet <[email protected]> |
net: implement lockless SO_PRIORITY
This is a followup of 8bf43be799d4 ("net: annotate data-races around sk->sk_priority").
sk->sk_priority can be read and written without holding the socket lock.
net: implement lockless SO_PRIORITY
This is a followup of 8bf43be799d4 ("net: annotate data-races around sk->sk_priority").
sk->sk_priority can be read and written without holding the socket lock.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|