History log of /linux-6.15/include/net/ip_tunnels.h (Results 1 – 25 of 120)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4
# eacb1160 19-Feb-2025 Xiao Liang <[email protected]>

net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops

When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns direct

net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops

When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Convert common ip_tunnel_newlink() to accept an extra link netns
argument.

Signed-off-by: Xiao Liang <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# bb5e62f2 19-Feb-2025 Gal Pressman <[email protected]>

net: Add options as a flexible array to struct ip_tunnel_info

Remove the hidden assumption that options are allocated at the end of
the struct, and teach the compiler about them using a flexible arr

net: Add options as a flexible array to struct ip_tunnel_info

Remove the hidden assumption that options are allocated at the end of
the struct, and teach the compiler about them using a flexible array.

With this, we can revert the unsafe_memcpy() call we have in
tun_dst_unclone() [1], and resolve the false field-spanning write
warning caused by the memcpy() in ip_tunnel_info_opts_set().

The layout of struct ip_tunnel_info remains the same with this patch.
Before this patch, there was an implicit padding at the end of the
struct, options would be written at 'info + 1' which is after the
padding.
This will remain the same as this patch explicitly aligns 'options'.
The alignment is needed as the options are later casted to different
structs, and might result in unaligned memory access.

Pahole output before this patch:
struct ip_tunnel_info {
struct ip_tunnel_key key; /* 0 64 */

/* XXX last struct has 1 byte of padding */

/* --- cacheline 1 boundary (64 bytes) --- */
struct ip_tunnel_encap encap; /* 64 8 */
struct dst_cache dst_cache; /* 72 16 */
u8 options_len; /* 88 1 */
u8 mode; /* 89 1 */

/* size: 96, cachelines: 2, members: 5 */
/* padding: 6 */
/* paddings: 1, sum paddings: 1 */
/* last cacheline: 32 bytes */
};

Pahole output after this patch:
struct ip_tunnel_info {
struct ip_tunnel_key key; /* 0 64 */

/* XXX last struct has 1 byte of padding */

/* --- cacheline 1 boundary (64 bytes) --- */
struct ip_tunnel_encap encap; /* 64 8 */
struct dst_cache dst_cache; /* 72 16 */
u8 options_len; /* 88 1 */
u8 mode; /* 89 1 */

/* XXX 6 bytes hole, try to pack */

u8 options[] __attribute__((__aligned__(16))); /* 96 0 */

/* size: 96, cachelines: 2, members: 6 */
/* sum members: 90, holes: 1, sum holes: 6 */
/* paddings: 1, sum paddings: 1 */
/* forced alignments: 1, forced holes: 1, sum forced holes: 6 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(16)));

[1] Commit 13cfd6a6d7ac ("net: Silence false field-spanning write warning in metadata_dst memcpy")

Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: Cosmin Ratiu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Gal Pressman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# ba3fa6e8 19-Feb-2025 Gal Pressman <[email protected]>

ip_tunnel: Use ip_tunnel_info() helper instead of 'info + 1'

Tunnel options should not be accessed directly, use the ip_tunnel_info()
accessor instead.

Signed-off-by: Gal Pressman <[email protected]>

ip_tunnel: Use ip_tunnel_info() helper instead of 'info + 1'

Tunnel options should not be accessed directly, use the ip_tunnel_info()
accessor instead.

Signed-off-by: Gal Pressman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5
# ad4a3ca6 22-Oct-2024 Ido Schimmel <[email protected]>

ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_init_flow()

There are code paths from which the function is called without holding
the RCU read lock, resulting in a suspicious RCU usa

ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_init_flow()

There are code paths from which the function is called without holding
the RCU read lock, resulting in a suspicious RCU usage warning [1].

Fix by using l3mdev_master_upper_ifindex_by_index() which will acquire
the RCU read lock before calling
l3mdev_master_upper_ifindex_by_index_rcu().

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gac8f72681cf2 #141 Not tainted
-----------------------------
net/core/dev.c:876 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/361:
#0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 3 UID: 0 PID: 361 Comm: ip Not tainted 6.12.0-rc3-custom-gac8f72681cf2 #141
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0xba/0x110
lockdep_rcu_suspicious.cold+0x4f/0xd6
dev_get_by_index_rcu+0x1d3/0x210
l3mdev_master_upper_ifindex_by_index_rcu+0x2b/0xf0
ip_tunnel_bind_dev+0x72f/0xa00
ip_tunnel_newlink+0x368/0x7a0
ipgre_newlink+0x14c/0x170
__rtnl_newlink+0x1173/0x19c0
rtnl_newlink+0x6c/0xa0
rtnetlink_rcv_msg+0x3cc/0xf60
netlink_rcv_skb+0x171/0x450
netlink_unicast+0x539/0x7f0
netlink_sendmsg+0x8c1/0xd80
____sys_sendmsg+0x8f9/0xc20
___sys_sendmsg+0x197/0x1e0
__sys_sendmsg+0x122/0x1f0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: db53cd3d88dc ("net: Handle l3mdev in ip_tunnel_init_flow")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.12-rc4, v6.12-rc3
# 9990ddf4 09-Oct-2024 Menglong Dong <[email protected]>

net: tunnel: make skb_vlan_inet_prepare() return drop reasons

Make skb_vlan_inet_prepare return the skb drop reasons, which is just
what pskb_may_pull_reason() returns. Meanwhile, adjust all the cal

net: tunnel: make skb_vlan_inet_prepare() return drop reasons

Make skb_vlan_inet_prepare return the skb drop reasons, which is just
what pskb_may_pull_reason() returns. Meanwhile, adjust all the call of
it.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


# 7f20dbd7 09-Oct-2024 Menglong Dong <[email protected]>

net: tunnel: add pskb_inet_may_pull_reason() helper

Introduce the function pskb_inet_may_pull_reason() and make
pskb_inet_may_pull a simple inline call to it. The drop reasons of it just
come from p

net: tunnel: add pskb_inet_may_pull_reason() helper

Introduce the function pskb_inet_may_pull_reason() and make
pskb_inet_may_pull a simple inline call to it. The drop reasons of it just
come from pskb_may_pull_reason().

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# d0193b16 22-Aug-2024 Simon Horman <[email protected]>

ip_tunnel: Correct spelling in ip_tunnels.h

Correct spelling in ip_tunnels.h
As reported by codespell.

Cc: David Ahern <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: http

ip_tunnel: Correct spelling in ip_tunnels.h

Correct spelling in ip_tunnels.h
As reported by codespell.

Cc: David Ahern <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10
# db5271d5 13-Jul-2024 Asbjørn Sloth Tønnesen <[email protected]>

flow_dissector: cleanup FLOW_DISSECTOR_KEY_ENC_FLAGS

Now that TCA_FLOWER_KEY_ENC_FLAGS is unused, as it's
former data is stored behind TCA_FLOWER_KEY_ENC_CONTROL,
then remove the last bits of FLOW_D

flow_dissector: cleanup FLOW_DISSECTOR_KEY_ENC_FLAGS

Now that TCA_FLOWER_KEY_ENC_FLAGS is unused, as it's
former data is stored behind TCA_FLOWER_KEY_ENC_CONTROL,
then remove the last bits of FLOW_DISSECTOR_KEY_ENC_FLAGS.

FLOW_DISSECTOR_KEY_ENC_FLAGS is unreleased, and have been
in net-next since 2024-06-04.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Tested-by: Davide Caratti <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 668b6a2e 30-May-2024 Davide Caratti <[email protected]>

flow_dissector: add support for tunnel control flags

Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata.
This is a prerequisite for matching these control flags using TC flowe

flow_dissector: add support for tunnel control flags

Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata.
This is a prerequisite for matching these control flags using TC flower.

Suggested-by: Ilya Maximets <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

show more ...


# c6ae073f 06-Jun-2024 Gal Pressman <[email protected]>

geneve: Fix incorrect inner network header offset when innerprotoinherit is set

When innerprotoinherit is set, the tunneled packets do not have an inner
Ethernet header.
Change 'maclen' to not alway

geneve: Fix incorrect inner network header offset when innerprotoinherit is set

When innerprotoinherit is set, the tunneled packets do not have an inner
Ethernet header.
Change 'maclen' to not always assume the header length is ETH_HLEN, as
there might not be a MAC header.

This resolves issues with drivers (e.g. mlx5, in
mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset
to be correct, and use it for TX offloads.

Fixes: d8a6213d70ac ("geneve: fix header validation in geneve[6]_xmit_skb")
Signed-off-by: Gal Pressman <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3
# d8a6213d 05-Apr-2024 Eric Dumazet <[email protected]>

geneve: fix header validation in geneve[6]_xmit_skb

syzbot is able to trigger an uninit-value in geneve_xmit() [1]

Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield())
uses skb_pro

geneve: fix header validation in geneve[6]_xmit_skb

syzbot is able to trigger an uninit-value in geneve_xmit() [1]

Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield())
uses skb_protocol(skb, true), pskb_inet_may_pull() is only using
skb->protocol.

If anything else than ETH_P_IPV6 or ETH_P_IP is found in skb->protocol,
pskb_inet_may_pull() does nothing at all.

If a vlan tag was provided by the caller (af_packet in the syzbot case),
the network header might not point to the correct location, and skb
linear part could be smaller than expected.

Add skb_vlan_inet_prepare() to perform a complete mac validation.

Use this in geneve for the moment, I suspect we need to adopt this
more broadly.

v4 - Jakub reported v3 broke l2_tos_ttl_inherit.sh selftest
- Only call __vlan_get_protocol() for vlan types.
Link: https://lore.kernel.org/netdev/[email protected]/

v2,v3 - Addressed Sabrina comments on v1 and v2
Link: https://lore.kernel.org/netdev/Zg1l9L2BNoZWZDZG@hog/

[1]

BUG: KMSAN: uninit-value in geneve_xmit_skb drivers/net/geneve.c:910 [inline]
BUG: KMSAN: uninit-value in geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
geneve_xmit_skb drivers/net/geneve.c:910 [inline]
geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
__dev_queue_xmit+0x348d/0x52c0 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8bb0/0x9ef0 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
__sys_sendto+0x685/0x830 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75

Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1318 [inline]
alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x722d/0x9ef0 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
__sys_sendto+0x685/0x830 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75

CPU: 0 PID: 5033 Comm: syz-executor346 Not tainted 6.9.0-rc1-syzkaller-00005-g928a87efa423 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024

Fixes: d13f048dd40e ("net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb")
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Phillip Potter <[email protected]>
Cc: Sabrina Dubroca <[email protected]>
Reviewed-by: Sabrina Dubroca <[email protected]>
Reviewed-by: Phillip Potter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.9-rc2
# 6dd514f4 27-Mar-2024 Michal Swiatkowski <[email protected]>

pfcp: always set pfcp metadata

In PFCP receive path set metadata needed by flower code to do correct
classification based on this metadata.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@lin

pfcp: always set pfcp metadata

In PFCP receive path set metadata needed by flower code to do correct
classification based on this metadata.

Signed-off-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Marcin Szycik <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


# 5832c4a7 27-Mar-2024 Alexander Lobakin <[email protected]>

ip_tunnel: convert __be16 tunnel flags to bitmaps

Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
have been defined as __be16. Now all of those 16 bits are occupied
and there's no m

ip_tunnel: convert __be16 tunnel flags to bitmaps

Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
have been defined as __be16. Now all of those 16 bits are occupied
and there's no more free space for new flags.
It can't be simply switched to a bigger container with no
adjustments to the values, since it's an explicit Endian storage,
and on LE systems (__be16)0x0001 equals to
(__be64)0x0001000000000000.
We could probably define new 64-bit flags depending on the
Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
LE, but that would introduce an Endianness dependency and spawn a
ton of Sparse warnings. To mitigate them, all of those places which
were adjusted with this change would be touched anyway, so why not
define stuff properly if there's no choice.

Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
value already coded and a fistful of <16 <-> bitmap> converters and
helpers. The two flags which have a different bit position are
SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
__cpu_to_be16(), but as (__force __be16), i.e. had different
positions on LE and BE. Now they both have strongly defined places.
Change all __be16 fields which were used to store those flags, to
IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
unsigned long[1] for now, and replace all TUNNEL_* occurrences to
their bitmap counterparts. Use the converters in the places which talk
to the userspace, hardware (NFP) or other hosts (GRE header). The rest
must explicitly use the new flags only. This must be done at once,
otherwise there will be too many conversions throughout the code in
the intermediate commits.
Finally, disable the old __be16 flags for use in the kernel code
(except for the two 'irregular' flags mentioned above), to prevent
any accidental (mis)use of them. For the userspace, nothing is
changed, only additions were made.

Most noticeable bloat-o-meter difference (.text):

vmlinux: 307/-1 (306)
gre.ko: 62/0 (62)
ip_gre.ko: 941/-217 (724) [*]
ip_tunnel.ko: 390/-900 (-510) [**]
ip_vti.ko: 138/0 (138)
ip6_gre.ko: 534/-18 (516) [*]
ip6_tunnel.ko: 118/-10 (108)

[*] gre_flags_to_tnl_flags() grew, but still is inlined
[**] ip_tunnel_find() got uninlined, hence such decrease

The average code size increase in non-extreme case is 100-200 bytes
per module, mostly due to sizeof(long) > sizeof(__be16), as
%__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers
are able to expand the majority of bitmap_*() calls here into direct
operations on scalars.

Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


# 117aef12 27-Mar-2024 Alexander Lobakin <[email protected]>

ip_tunnel: use a separate struct to store tunnel params in the kernel

Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure
to store params inside the kernel, IPv4 tunnel code uses th

ip_tunnel: use a separate struct to store tunnel params in the kernel

Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure
to store params inside the kernel, IPv4 tunnel code uses the same
ip_tunnel_parm which is being used to talk with the userspace.
This makes it difficult to alter or add any fields or use a
different format for whatever data.
Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for
now, and use it throughout the code. Define the pieces, where the copy
user <-> kernel happens, as standalone functions, and copy the data
there field-by-field, so that the kernel-side structure could be easily
modified later on and the users wouldn't have to care about this.

Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4
# 9b5b3637 06-Feb-2024 Eric Dumazet <[email protected]>

ip_tunnel: use exit_batch_rtnl() method

exit_batch_rtnl() is called while RTNL is held,
and devices to be unregistered can be queued in the dev_kill_list.

This saves one rtnl_lock()/rtnl_unlock() p

ip_tunnel: use exit_batch_rtnl() method

exit_batch_rtnl() is called while RTNL is held,
and devices to be unregistered can be queued in the dev_kill_list.

This saves one rtnl_lock()/rtnl_unlock() pair
and one unregister_netdevice_many() call.

This patch takes care of ipip, ip_vti, and ip_gre tunnels.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Antoine Tenart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2
# c6e9dba3 14-Nov-2023 Alce Lafranque <[email protected]>

vxlan: add support for flowlabel inherit

By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with
an option for a fixed value. This commits add the ability to inherit the
flow label

vxlan: add support for flowlabel inherit

By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with
an option for a fixed value. This commits add the ability to inherit the
flow label from the inner packet, like for other tunnel implementations.
This enables devices using only L3 headers for ECMP to correctly balance
VXLAN-encapsulated IPv6 packets.

```
$ ./ip/ip link add dummy1 type dummy
$ ./ip/ip addr add 2001:db8::2/64 dev dummy1
$ ./ip/ip link set up dev dummy1
$ ./ip/ip link add vxlan1 type vxlan id 100 flowlabel inherit remote 2001:db8::1 local 2001:db8::2
$ ./ip/ip link set up dev vxlan1
$ ./ip/ip addr add 2001:db8:1::2/64 dev vxlan1
$ ./ip/ip link set arp off dev vxlan1
$ ping -q 2001:db8:1::1 &
$ tshark -d udp.port==8472,vxlan -Vpni dummy1 -c1
[...]
Internet Protocol Version 6, Src: 2001:db8::2, Dst: 2001:db8::1
0110 .... = Version: 6
.... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
.... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
.... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
.... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb
[...]
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 100
[...]
Internet Protocol Version 6, Src: 2001:db8:1::2, Dst: 2001:db8:1::1
0110 .... = Version: 6
.... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
.... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
.... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
.... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb
```

Signed-off-by: Alce Lafranque <[email protected]>
Co-developed-by: Vincent Bernat <[email protected]>
Signed-off-by: Vincent Bernat <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1
# 9b271eba 05-Sep-2023 Eric Dumazet <[email protected]>

ip_tunnels: use DEV_STATS_INC()

syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1]

This can run from multiple cpus without mutual exclusion.

Adopt SMP safe DEV_STATS_INC() to update dev

ip_tunnels: use DEV_STATS_INC()

syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1]

This can run from multiple cpus without mutual exclusion.

Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.

[1]
BUG: KCSAN: data-race in iptunnel_xmit / iptunnel_xmit

read-write to 0xffff8881353df170 of 8 bytes by task 30263 on cpu 1:
iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline]
iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87
ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560
__dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
__bpf_tx_skb net/core/filter.c:2129 [inline]
__bpf_redirect_no_mac net/core/filter.c:2159 [inline]
__bpf_redirect+0x723/0x9c0 net/core/filter.c:2182
____bpf_clone_redirect net/core/filter.c:2453 [inline]
bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425
___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954
__bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195
bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline]
__bpf_prog_run include/linux/filter.h:609 [inline]
bpf_prog_run include/linux/filter.h:616 [inline]
bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423
bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045
bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996
__sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353
__do_sys_bpf kernel/bpf/syscall.c:5439 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5437 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read-write to 0xffff8881353df170 of 8 bytes by task 30249 on cpu 0:
iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline]
iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87
ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560
__dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
__bpf_tx_skb net/core/filter.c:2129 [inline]
__bpf_redirect_no_mac net/core/filter.c:2159 [inline]
__bpf_redirect+0x723/0x9c0 net/core/filter.c:2182
____bpf_clone_redirect net/core/filter.c:2453 [inline]
bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425
___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954
__bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195
bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline]
__bpf_prog_run include/linux/filter.h:609 [inline]
bpf_prog_run include/linux/filter.h:616 [inline]
bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423
bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045
bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996
__sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353
__do_sys_bpf kernel/bpf/syscall.c:5439 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5437 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x0000000000018830 -> 0x0000000000018831

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 30249 Comm: syz-executor.4 Not tainted 6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0

Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3
# 8bb5e825 17-Jul-2023 Ido Schimmel <[email protected]>

ip_tunnels: Add nexthop ID field to ip_tunnel_key

Extend the ip_tunnel_key structure with a field indicating the ID of the
nexthop object via which the skb should be routed.

The field is going to b

ip_tunnels: Add nexthop ID field to ip_tunnel_key

Extend the ip_tunnel_key structure with a field indicating the ID of the
nexthop object via which the skb should be routed.

The field is going to be populated in subsequent patches by the bridge
driver in order to indicate to the VXLAN driver which FDB nexthop object
to use in order to reach the target host.

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6
# ac931d4c 07-Apr-2023 Christian Ehrig <[email protected]>

ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices

Today ipip devices in collect-metadata mode don't allow for sending FOU
or GUE encapsulated packets. This patch lifts the r

ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices

Today ipip devices in collect-metadata mode don't allow for sending FOU
or GUE encapsulated packets. This patch lifts the restriction by adding
a struct ip_tunnel_encap to the tunnel metadata.

On the egress path, the members of this struct can be set by the
bpf_skb_set_fou_encap kfunc via a BPF tc-hook. Instead of dropping packets
wishing to use additional UDP encapsulation, ip_md_tunnel_xmit now
evaluates the contents of this struct and adds the corresponding FOU or
GUE header. Furthermore, it is making sure that additional header bytes
are taken into account for PMTU discovery.

On the ingress path, an ipip device in collect-metadata mode will fill this
struct and a BPF tc-hook can obtain the information via a call to the
bpf_skb_get_fou_encap kfunc.

The minor change to ip_tunnel_encap, which now takes a pointer to
struct ip_tunnel_encap instead of struct ip_tunnel, allows us to control
FOU encap type and parameters on a per packet-level.

Signed-off-by: Christian Ehrig <[email protected]>
Link: https://lore.kernel.org/r/cfea47de655d0f870248abf725932f851b53960a.1680874078.git.cehrig@cloudflare.com
Signed-off-by: Alexei Starovoitov <[email protected]>

show more ...


Revision tags: v6.3-rc5, v6.3-rc4, v6.3-rc3
# bc9d003d 16-Mar-2023 Gavin Li <[email protected]>

ip_tunnel: Preserve pointer const in ip_tunnel_info_opts

Change ip_tunnel_info_opts( ) from static function to macro to cast return
value and preserve the const-ness of the pointer.

Signed-off-by:

ip_tunnel: Preserve pointer const in ip_tunnel_info_opts

Change ip_tunnel_info_opts( ) from static function to macro to cast return
value and preserve the const-ness of the pointer.

Signed-off-by: Gavin Li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0
# b86fca80 29-Sep-2022 Liu Jian <[email protected]>

net: Add helper function to parse netlink msg of ip_tunnel_parm

Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm.
Reduces duplicate code, no actual functional changes.

Signed-off-

net: Add helper function to parse netlink msg of ip_tunnel_parm

Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm.
Reduces duplicate code, no actual functional changes.

Signed-off-by: Liu Jian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


# 537dd2d9 29-Sep-2022 Liu Jian <[email protected]>

net: Add helper function to parse netlink msg of ip_tunnel_encap

Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap.
Reduces duplicate code, no actual functional changes.

Sig

net: Add helper function to parse netlink msg of ip_tunnel_encap

Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap.
Reduces duplicate code, no actual functional changes.

Signed-off-by: Liu Jian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2
# 7ec9fce4 18-Aug-2022 Eyal Birger <[email protected]>

ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels

Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key")
added a "flow_flags" member to struct ip_tunnel_key which was

ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels

Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key")
added a "flow_flags" member to struct ip_tunnel_key which was later used by
the commit in the fixes tag to avoid dropping packets with sources that
aren't locally configured when set in bpf_set_tunnel_key().

VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE
were not.

This commit fixes this omission by making ip_tunnel_init_flow() receive
the flow flags from the tunnel key in the relevant collect_md paths.

Fixes: b8fff748521c ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key")
Signed-off-by: Eyal Birger <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Paul Chaignon <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

show more ...


Revision tags: v6.0-rc1, v5.19
# 451ef36b 25-Jul-2022 Paul Chaignon <[email protected]>

ip_tunnels: Add new flow flags field to ip_tunnel_key

This commit extends the ip_tunnel_key struct with a new field for the
flow flags, to pass them to the route lookups. This new field will be
popu

ip_tunnels: Add new flow flags field to ip_tunnel_key

This commit extends the ip_tunnel_key struct with a new field for the
flow flags, to pass them to the route lookups. This new field will be
populated and used in subsequent commits.

Signed-off-by: Paul Chaignon <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/f8bfd4983bd06685a59b1e3ba76ca27496f51ef3.1658759380.git.paul@isovalent.com

show more ...


Revision tags: v5.19-rc8
# 7074732c 21-Jul-2022 Matthias May <[email protected]>

ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN

The current code allows for VXLAN and GENEVE to inherit the TOS
respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6.
However wh

ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN

The current code allows for VXLAN and GENEVE to inherit the TOS
respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6.
However when the payload is VLAN encapsulated, then this inheriting
does not work, because the visible skb-protocol is of type
ETH_P_8021Q or ETH_P_8021AD.

Instead of skb->protocol use skb_protocol().

Signed-off-by: Matthias May <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


12345