|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4 |
|
| #
eacb1160 |
| 19-Feb-2025 |
Xiao Liang <[email protected]> |
net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This prepares for rtnetlink core to create device in target netns direct
net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This prepares for rtnetlink core to create device in target netns directly, in which case the two namespaces may be different.
Convert common ip_tunnel_newlink() to accept an extra link netns argument.
Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
bb5e62f2 |
| 19-Feb-2025 |
Gal Pressman <[email protected]> |
net: Add options as a flexible array to struct ip_tunnel_info
Remove the hidden assumption that options are allocated at the end of the struct, and teach the compiler about them using a flexible arr
net: Add options as a flexible array to struct ip_tunnel_info
Remove the hidden assumption that options are allocated at the end of the struct, and teach the compiler about them using a flexible array.
With this, we can revert the unsafe_memcpy() call we have in tun_dst_unclone() [1], and resolve the false field-spanning write warning caused by the memcpy() in ip_tunnel_info_opts_set().
The layout of struct ip_tunnel_info remains the same with this patch. Before this patch, there was an implicit padding at the end of the struct, options would be written at 'info + 1' which is after the padding. This will remain the same as this patch explicitly aligns 'options'. The alignment is needed as the options are later casted to different structs, and might result in unaligned memory access.
Pahole output before this patch: struct ip_tunnel_info { struct ip_tunnel_key key; /* 0 64 */
/* XXX last struct has 1 byte of padding */
/* --- cacheline 1 boundary (64 bytes) --- */ struct ip_tunnel_encap encap; /* 64 8 */ struct dst_cache dst_cache; /* 72 16 */ u8 options_len; /* 88 1 */ u8 mode; /* 89 1 */
/* size: 96, cachelines: 2, members: 5 */ /* padding: 6 */ /* paddings: 1, sum paddings: 1 */ /* last cacheline: 32 bytes */ };
Pahole output after this patch: struct ip_tunnel_info { struct ip_tunnel_key key; /* 0 64 */
/* XXX last struct has 1 byte of padding */
/* --- cacheline 1 boundary (64 bytes) --- */ struct ip_tunnel_encap encap; /* 64 8 */ struct dst_cache dst_cache; /* 72 16 */ u8 options_len; /* 88 1 */ u8 mode; /* 89 1 */
/* XXX 6 bytes hole, try to pack */
u8 options[] __attribute__((__aligned__(16))); /* 96 0 */
/* size: 96, cachelines: 2, members: 6 */ /* sum members: 90, holes: 1, sum holes: 6 */ /* paddings: 1, sum paddings: 1 */ /* forced alignments: 1, forced holes: 1, sum forced holes: 6 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(16)));
[1] Commit 13cfd6a6d7ac ("net: Silence false field-spanning write warning in metadata_dst memcpy")
Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Kees Cook <[email protected]> Reviewed-by: Cosmin Ratiu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Gal Pressman <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
ba3fa6e8 |
| 19-Feb-2025 |
Gal Pressman <[email protected]> |
ip_tunnel: Use ip_tunnel_info() helper instead of 'info + 1'
Tunnel options should not be accessed directly, use the ip_tunnel_info() accessor instead.
Signed-off-by: Gal Pressman <[email protected]>
ip_tunnel: Use ip_tunnel_info() helper instead of 'info + 1'
Tunnel options should not be accessed directly, use the ip_tunnel_info() accessor instead.
Signed-off-by: Gal Pressman <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
ad4a3ca6 |
| 22-Oct-2024 |
Ido Schimmel <[email protected]> |
ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_init_flow()
There are code paths from which the function is called without holding the RCU read lock, resulting in a suspicious RCU usa
ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_init_flow()
There are code paths from which the function is called without holding the RCU read lock, resulting in a suspicious RCU usage warning [1].
Fix by using l3mdev_master_upper_ifindex_by_index() which will acquire the RCU read lock before calling l3mdev_master_upper_ifindex_by_index_rcu().
[1] WARNING: suspicious RCU usage 6.12.0-rc3-custom-gac8f72681cf2 #141 Not tainted ----------------------------- net/core/dev.c:876 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/361: #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60
stack backtrace: CPU: 3 UID: 0 PID: 361 Comm: ip Not tainted 6.12.0-rc3-custom-gac8f72681cf2 #141 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl+0xba/0x110 lockdep_rcu_suspicious.cold+0x4f/0xd6 dev_get_by_index_rcu+0x1d3/0x210 l3mdev_master_upper_ifindex_by_index_rcu+0x2b/0xf0 ip_tunnel_bind_dev+0x72f/0xa00 ip_tunnel_newlink+0x368/0x7a0 ipgre_newlink+0x14c/0x170 __rtnl_newlink+0x1173/0x19c0 rtnl_newlink+0x6c/0xa0 rtnetlink_rcv_msg+0x3cc/0xf60 netlink_rcv_skb+0x171/0x450 netlink_unicast+0x539/0x7f0 netlink_sendmsg+0x8c1/0xd80 ____sys_sendmsg+0x8f9/0xc20 ___sys_sendmsg+0x197/0x1e0 __sys_sendmsg+0x122/0x1f0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: db53cd3d88dc ("net: Handle l3mdev in ip_tunnel_init_flow") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc4, v6.12-rc3 |
|
| #
9990ddf4 |
| 09-Oct-2024 |
Menglong Dong <[email protected]> |
net: tunnel: make skb_vlan_inet_prepare() return drop reasons
Make skb_vlan_inet_prepare return the skb drop reasons, which is just what pskb_may_pull_reason() returns. Meanwhile, adjust all the cal
net: tunnel: make skb_vlan_inet_prepare() return drop reasons
Make skb_vlan_inet_prepare return the skb drop reasons, which is just what pskb_may_pull_reason() returns. Meanwhile, adjust all the call of it.
Signed-off-by: Menglong Dong <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
7f20dbd7 |
| 09-Oct-2024 |
Menglong Dong <[email protected]> |
net: tunnel: add pskb_inet_may_pull_reason() helper
Introduce the function pskb_inet_may_pull_reason() and make pskb_inet_may_pull a simple inline call to it. The drop reasons of it just come from p
net: tunnel: add pskb_inet_may_pull_reason() helper
Introduce the function pskb_inet_may_pull_reason() and make pskb_inet_may_pull a simple inline call to it. The drop reasons of it just come from pskb_may_pull_reason().
Signed-off-by: Menglong Dong <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
d0193b16 |
| 22-Aug-2024 |
Simon Horman <[email protected]> |
ip_tunnel: Correct spelling in ip_tunnels.h
Correct spelling in ip_tunnels.h As reported by codespell.
Cc: David Ahern <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: http
ip_tunnel: Correct spelling in ip_tunnels.h
Correct spelling in ip_tunnels.h As reported by codespell.
Cc: David Ahern <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10 |
|
| #
db5271d5 |
| 13-Jul-2024 |
Asbjørn Sloth Tønnesen <[email protected]> |
flow_dissector: cleanup FLOW_DISSECTOR_KEY_ENC_FLAGS
Now that TCA_FLOWER_KEY_ENC_FLAGS is unused, as it's former data is stored behind TCA_FLOWER_KEY_ENC_CONTROL, then remove the last bits of FLOW_D
flow_dissector: cleanup FLOW_DISSECTOR_KEY_ENC_FLAGS
Now that TCA_FLOWER_KEY_ENC_FLAGS is unused, as it's former data is stored behind TCA_FLOWER_KEY_ENC_CONTROL, then remove the last bits of FLOW_DISSECTOR_KEY_ENC_FLAGS.
FLOW_DISSECTOR_KEY_ENC_FLAGS is unreleased, and have been in net-next since 2024-06-04.
Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]> Tested-by: Davide Caratti <[email protected]> Reviewed-by: Davide Caratti <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
668b6a2e |
| 30-May-2024 |
Davide Caratti <[email protected]> |
flow_dissector: add support for tunnel control flags
Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata. This is a prerequisite for matching these control flags using TC flowe
flow_dissector: add support for tunnel control flags
Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata. This is a prerequisite for matching these control flags using TC flower.
Suggested-by: Ilya Maximets <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
c6ae073f |
| 06-Jun-2024 |
Gal Pressman <[email protected]> |
geneve: Fix incorrect inner network header offset when innerprotoinherit is set
When innerprotoinherit is set, the tunneled packets do not have an inner Ethernet header. Change 'maclen' to not alway
geneve: Fix incorrect inner network header offset when innerprotoinherit is set
When innerprotoinherit is set, the tunneled packets do not have an inner Ethernet header. Change 'maclen' to not always assume the header length is ETH_HLEN, as there might not be a MAC header.
This resolves issues with drivers (e.g. mlx5, in mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset to be correct, and use it for TX offloads.
Fixes: d8a6213d70ac ("geneve: fix header validation in geneve[6]_xmit_skb") Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3 |
|
| #
d8a6213d |
| 05-Apr-2024 |
Eric Dumazet <[email protected]> |
geneve: fix header validation in geneve[6]_xmit_skb
syzbot is able to trigger an uninit-value in geneve_xmit() [1]
Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield()) uses skb_pro
geneve: fix header validation in geneve[6]_xmit_skb
syzbot is able to trigger an uninit-value in geneve_xmit() [1]
Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield()) uses skb_protocol(skb, true), pskb_inet_may_pull() is only using skb->protocol.
If anything else than ETH_P_IPV6 or ETH_P_IP is found in skb->protocol, pskb_inet_may_pull() does nothing at all.
If a vlan tag was provided by the caller (af_packet in the syzbot case), the network header might not point to the correct location, and skb linear part could be smaller than expected.
Add skb_vlan_inet_prepare() to perform a complete mac validation.
Use this in geneve for the moment, I suspect we need to adopt this more broadly.
v4 - Jakub reported v3 broke l2_tos_ttl_inherit.sh selftest - Only call __vlan_get_protocol() for vlan types. Link: https://lore.kernel.org/netdev/[email protected]/
v2,v3 - Addressed Sabrina comments on v1 and v2 Link: https://lore.kernel.org/netdev/Zg1l9L2BNoZWZDZG@hog/
[1]
BUG: KMSAN: uninit-value in geneve_xmit_skb drivers/net/geneve.c:910 [inline] BUG: KMSAN: uninit-value in geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030 geneve_xmit_skb drivers/net/geneve.c:910 [inline] geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030 __netdev_start_xmit include/linux/netdevice.h:4903 [inline] netdev_start_xmit include/linux/netdevice.h:4917 [inline] xmit_one net/core/dev.c:3531 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547 __dev_queue_xmit+0x348d/0x52c0 net/core/dev.c:4335 dev_queue_xmit include/linux/netdevice.h:3091 [inline] packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3081 [inline] packet_sendmsg+0x8bb0/0x9ef0 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795 packet_alloc_skb net/packet/af_packet.c:2930 [inline] packet_snd net/packet/af_packet.c:3024 [inline] packet_sendmsg+0x722d/0x9ef0 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 0 PID: 5033 Comm: syz-executor346 Not tainted 6.9.0-rc1-syzkaller-00005-g928a87efa423 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Fixes: d13f048dd40e ("net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb") Reported-by: [email protected] Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Eric Dumazet <[email protected]> Cc: Phillip Potter <[email protected]> Cc: Sabrina Dubroca <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Reviewed-by: Phillip Potter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc2 |
|
| #
6dd514f4 |
| 27-Mar-2024 |
Michal Swiatkowski <[email protected]> |
pfcp: always set pfcp metadata
In PFCP receive path set metadata needed by flower code to do correct classification based on this metadata.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@lin
pfcp: always set pfcp metadata
In PFCP receive path set metadata needed by flower code to do correct classification based on this metadata.
Signed-off-by: Michal Swiatkowski <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
5832c4a7 |
| 27-Mar-2024 |
Alexander Lobakin <[email protected]> |
ip_tunnel: convert __be16 tunnel flags to bitmaps
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no m
ip_tunnel: convert __be16 tunnel flags to bitmaps
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice.
Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_* occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made.
Most noticeable bloat-o-meter difference (.text):
vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [*] ip_tunnel.ko: 390/-900 (-510) [**] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [*] ip6_tunnel.ko: 118/-10 (108)
[*] gre_flags_to_tnl_flags() grew, but still is inlined [**] ip_tunnel_find() got uninlined, hence such decrease
The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_*() calls here into direct operations on scalars.
Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
117aef12 |
| 27-Mar-2024 |
Alexander Lobakin <[email protected]> |
ip_tunnel: use a separate struct to store tunnel params in the kernel
Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure to store params inside the kernel, IPv4 tunnel code uses th
ip_tunnel: use a separate struct to store tunnel params in the kernel
Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure to store params inside the kernel, IPv4 tunnel code uses the same ip_tunnel_parm which is being used to talk with the userspace. This makes it difficult to alter or add any fields or use a different format for whatever data. Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for now, and use it throughout the code. Define the pieces, where the copy user <-> kernel happens, as standalone functions, and copy the data there field-by-field, so that the kernel-side structure could be easily modified later on and the users wouldn't have to care about this.
Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4 |
|
| #
9b5b3637 |
| 06-Feb-2024 |
Eric Dumazet <[email protected]> |
ip_tunnel: use exit_batch_rtnl() method
exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list.
This saves one rtnl_lock()/rtnl_unlock() p
ip_tunnel: use exit_batch_rtnl() method
exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list.
This saves one rtnl_lock()/rtnl_unlock() pair and one unregister_netdevice_many() call.
This patch takes care of ipip, ip_vti, and ip_gre tunnels.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Antoine Tenart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2 |
|
| #
c6e9dba3 |
| 14-Nov-2023 |
Alce Lafranque <[email protected]> |
vxlan: add support for flowlabel inherit
By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with an option for a fixed value. This commits add the ability to inherit the flow label
vxlan: add support for flowlabel inherit
By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with an option for a fixed value. This commits add the ability to inherit the flow label from the inner packet, like for other tunnel implementations. This enables devices using only L3 headers for ECMP to correctly balance VXLAN-encapsulated IPv6 packets.
``` $ ./ip/ip link add dummy1 type dummy $ ./ip/ip addr add 2001:db8::2/64 dev dummy1 $ ./ip/ip link set up dev dummy1 $ ./ip/ip link add vxlan1 type vxlan id 100 flowlabel inherit remote 2001:db8::1 local 2001:db8::2 $ ./ip/ip link set up dev vxlan1 $ ./ip/ip addr add 2001:db8:1::2/64 dev vxlan1 $ ./ip/ip link set arp off dev vxlan1 $ ping -q 2001:db8:1::1 & $ tshark -d udp.port==8472,vxlan -Vpni dummy1 -c1 [...] Internet Protocol Version 6, Src: 2001:db8::2, Dst: 2001:db8::1 0110 .... = Version: 6 .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0) .... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb [...] Virtual eXtensible Local Area Network Flags: 0x0800, VXLAN Network ID (VNI) Group Policy ID: 0 VXLAN Network Identifier (VNI): 100 [...] Internet Protocol Version 6, Src: 2001:db8:1::2, Dst: 2001:db8:1::1 0110 .... = Version: 6 .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0) .... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb ```
Signed-off-by: Alce Lafranque <[email protected]> Co-developed-by: Vincent Bernat <[email protected]> Signed-off-by: Vincent Bernat <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1 |
|
| #
9b271eba |
| 05-Sep-2023 |
Eric Dumazet <[email protected]> |
ip_tunnels: use DEV_STATS_INC()
syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1]
This can run from multiple cpus without mutual exclusion.
Adopt SMP safe DEV_STATS_INC() to update dev
ip_tunnels: use DEV_STATS_INC()
syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1]
This can run from multiple cpus without mutual exclusion.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
[1] BUG: KCSAN: data-race in iptunnel_xmit / iptunnel_xmit
read-write to 0xffff8881353df170 of 8 bytes by task 30263 on cpu 1: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read-write to 0xffff8881353df170 of 8 bytes by task 30249 on cpu 0: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000018830 -> 0x0000000000018831
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 30249 Comm: syz-executor.4 Not tainted 6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0
Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3 |
|
| #
8bb5e825 |
| 17-Jul-2023 |
Ido Schimmel <[email protected]> |
ip_tunnels: Add nexthop ID field to ip_tunnel_key
Extend the ip_tunnel_key structure with a field indicating the ID of the nexthop object via which the skb should be routed.
The field is going to b
ip_tunnels: Add nexthop ID field to ip_tunnel_key
Extend the ip_tunnel_key structure with a field indicating the ID of the nexthop object via which the skb should be routed.
The field is going to be populated in subsequent patches by the bridge driver in order to indicate to the VXLAN driver which FDB nexthop object to use in order to reach the target host.
Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6 |
|
| #
ac931d4c |
| 07-Apr-2023 |
Christian Ehrig <[email protected]> |
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices
Today ipip devices in collect-metadata mode don't allow for sending FOU or GUE encapsulated packets. This patch lifts the r
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices
Today ipip devices in collect-metadata mode don't allow for sending FOU or GUE encapsulated packets. This patch lifts the restriction by adding a struct ip_tunnel_encap to the tunnel metadata.
On the egress path, the members of this struct can be set by the bpf_skb_set_fou_encap kfunc via a BPF tc-hook. Instead of dropping packets wishing to use additional UDP encapsulation, ip_md_tunnel_xmit now evaluates the contents of this struct and adds the corresponding FOU or GUE header. Furthermore, it is making sure that additional header bytes are taken into account for PMTU discovery.
On the ingress path, an ipip device in collect-metadata mode will fill this struct and a BPF tc-hook can obtain the information via a call to the bpf_skb_get_fou_encap kfunc.
The minor change to ip_tunnel_encap, which now takes a pointer to struct ip_tunnel_encap instead of struct ip_tunnel, allows us to control FOU encap type and parameters on a per packet-level.
Signed-off-by: Christian Ehrig <[email protected]> Link: https://lore.kernel.org/r/cfea47de655d0f870248abf725932f851b53960a.1680874078.git.cehrig@cloudflare.com Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc5, v6.3-rc4, v6.3-rc3 |
|
| #
bc9d003d |
| 16-Mar-2023 |
Gavin Li <[email protected]> |
ip_tunnel: Preserve pointer const in ip_tunnel_info_opts
Change ip_tunnel_info_opts( ) from static function to macro to cast return value and preserve the const-ness of the pointer.
Signed-off-by:
ip_tunnel: Preserve pointer const in ip_tunnel_info_opts
Change ip_tunnel_info_opts( ) from static function to macro to cast return value and preserve the const-ness of the pointer.
Signed-off-by: Gavin Li <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0 |
|
| #
b86fca80 |
| 29-Sep-2022 |
Liu Jian <[email protected]> |
net: Add helper function to parse netlink msg of ip_tunnel_parm
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm. Reduces duplicate code, no actual functional changes.
Signed-off-
net: Add helper function to parse netlink msg of ip_tunnel_parm
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm. Reduces duplicate code, no actual functional changes.
Signed-off-by: Liu Jian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
537dd2d9 |
| 29-Sep-2022 |
Liu Jian <[email protected]> |
net: Add helper function to parse netlink msg of ip_tunnel_encap
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap. Reduces duplicate code, no actual functional changes.
Sig
net: Add helper function to parse netlink msg of ip_tunnel_encap
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap. Reduces duplicate code, no actual functional changes.
Signed-off-by: Liu Jian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2 |
|
| #
7ec9fce4 |
| 18-Aug-2022 |
Eyal Birger <[email protected]> |
ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels
Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a "flow_flags" member to struct ip_tunnel_key which was
ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels
Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a "flow_flags" member to struct ip_tunnel_key which was later used by the commit in the fixes tag to avoid dropping packets with sources that aren't locally configured when set in bpf_set_tunnel_key().
VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE were not.
This commit fixes this omission by making ip_tunnel_init_flow() receive the flow flags from the tunnel key in the relevant collect_md paths.
Fixes: b8fff748521c ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key") Signed-off-by: Eyal Birger <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Paul Chaignon <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
|
Revision tags: v6.0-rc1, v5.19 |
|
| #
451ef36b |
| 25-Jul-2022 |
Paul Chaignon <[email protected]> |
ip_tunnels: Add new flow flags field to ip_tunnel_key
This commit extends the ip_tunnel_key struct with a new field for the flow flags, to pass them to the route lookups. This new field will be popu
ip_tunnels: Add new flow flags field to ip_tunnel_key
This commit extends the ip_tunnel_key struct with a new field for the flow flags, to pass them to the route lookups. This new field will be populated and used in subsequent commits.
Signed-off-by: Paul Chaignon <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/f8bfd4983bd06685a59b1e3ba76ca27496f51ef3.1658759380.git.paul@isovalent.com
show more ...
|
|
Revision tags: v5.19-rc8 |
|
| #
7074732c |
| 21-Jul-2022 |
Matthias May <[email protected]> |
ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN
The current code allows for VXLAN and GENEVE to inherit the TOS respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6. However wh
ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN
The current code allows for VXLAN and GENEVE to inherit the TOS respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6. However when the payload is VLAN encapsulated, then this inheriting does not work, because the visible skb-protocol is of type ETH_P_8021Q or ETH_P_8021AD.
Instead of skb->protocol use skb_protocol().
Signed-off-by: Matthias May <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|