|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
8ef890df |
| 07-Mar-2025 |
Jakub Kicinski <[email protected]> |
net: move misc netdev_lock flavors to a separate header
Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuildin
net: move misc netdev_lock flavors to a separate header
Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers).
The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h.
Acked-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5 |
|
| #
1c5bf4de |
| 25-Feb-2025 |
Alexander Lobakin <[email protected]> |
veth: use napi_skb_cache_get_bulk() instead of xdp_alloc_skb_bulk()
Now that we can bulk-allocate skbs from the NAPI cache, use that function to do that in veth as well instead of direct allocation
veth: use napi_skb_cache_get_bulk() instead of xdp_alloc_skb_bulk()
Now that we can bulk-allocate skbs from the NAPI cache, use that function to do that in veth as well instead of direct allocation from the kmem caches. veth uses NAPI and GRO, so this is both context-safe and beneficial.
Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4 |
|
| #
cf517ac1 |
| 19-Feb-2025 |
Xiao Liang <[email protected]> |
net: Use link/peer netns in newlink() of rtnl_link_ops
Add two helper functions - rtnl_newlink_link_net() and rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back to link netns, a
net: Use link/peer netns in newlink() of rtnl_link_ops
Add two helper functions - rtnl_newlink_link_net() and rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back to link netns, and link netns falls back to source netns.
Convert the use of params->net in netdevice drivers to one of the helper functions for clarity.
Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
69c7be1b |
| 19-Feb-2025 |
Xiao Liang <[email protected]> |
rtnetlink: Pack newlink() params into struct
There are 4 net namespaces involved when creating links:
- source netns - where the netlink socket resides, - target netns - where to put the device b
rtnetlink: Pack newlink() params into struct
There are 4 net namespaces involved when creating links:
- source netns - where the netlink socket resides, - target netns - where to put the device being created, - link netns - netns associated with the device (backend), - peer netns - netns of peer device.
Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request.
+------------+-------------------+---------+---------+ | peer netns | IFLA_LINK_NETNSID | src_net | dev_net | +------------+-------------------+---------+---------+ | | absent | source | target | | absent +-------------------+---------+---------+ | | present | link | link | +------------+-------------------+---------+---------+ | | absent | peer | target | | present +-------------------+---------+---------+ | | present | peer | link | +------------+-------------------+---------+---------+
When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning.
On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead.
To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. The use of src_net are converted to params->net trivially.
Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
56d95b0a |
| 11-Dec-2024 |
Alexander Lobakin <[email protected]> |
xdp: get rid of xdp_frame::mem.id
Initially, xdp_frame::mem.id was used to search for the corresponding &page_pool to return the page correctly. However, after that struct page was extended to have
xdp: get rid of xdp_frame::mem.id
Initially, xdp_frame::mem.id was used to search for the corresponding &page_pool to return the page correctly. However, after that struct page was extended to have a direct pointer to its PP (netmem has it as well), further keeping of this field makes no sense. xdp_return_frame_bulk() still used it to do a lookup, and this leftover is now removed. Remove xdp_frame::mem and replace it with ::mem_type, as only memory type still matters and we need to know it to be able to free the frame correctly. As a cute side effect, we can now make every scalar field in &xdp_frame of 4 byte width, speeding up accesses to them.
Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2, v6.13-rc1 |
|
| #
48327566 |
| 29-Nov-2024 |
Cong Wang <[email protected]> |
rtnetlink: fix double call of rtnl_link_get_net_ifla()
Currently rtnl_link_get_net_ifla() gets called twice when we create peer devices, once in rtnl_add_peer_net() and once in each ->newlink() impl
rtnetlink: fix double call of rtnl_link_get_net_ifla()
Currently rtnl_link_get_net_ifla() gets called twice when we create peer devices, once in rtnl_add_peer_net() and once in each ->newlink() implementation.
This looks safer, however, it leads to a classic Time-of-Check to Time-of-Use (TOCTOU) bug since IFLA_NET_NS_PID is very dynamic. And because of the lack of checking error pointer of the second call, it also leads to a kernel crash as reported by syzbot.
Fix this by getting rid of the second call, which already becomes redudant after Kuniyuki's work. We have to propagate the result of the first rtnl_link_get_net_ifla() down to each ->newlink().
Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=21ba4d5adff0b6a7cfc6 Fixes: 0eb87b02a705 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.") Fixes: 6b84e558e95d ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.") Fixes: fefd5d082172 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.") Cc: Kuniyuki Iwashima <[email protected]> Signed-off-by: Cong Wang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.12, v6.12-rc7 |
|
| #
0eb87b02 |
| 08-Nov-2024 |
Kuniyuki Iwashima <[email protected]> |
veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.
For per-netns RTNL, we need to prefetch the peer device's netns.
Let's set rtnl_link_ops.peer_type and accordingly remove duplicated validation
veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.
For per-netns RTNL, we need to prefetch the peer device's netns.
Let's set rtnl_link_ops.peer_type and accordingly remove duplicated validation in ->newlink().
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
00d066a4 |
| 29-Aug-2024 |
Alexander Lobakin <[email protected]> |
netdev_features: convert NETIF_F_LLTX to dev->lltx
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_featur
netdev_features: convert NETIF_F_LLTX to dev->lltx
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3 |
|
| #
45160ceb |
| 05-Aug-2024 |
Breno Leitao <[email protected]> |
net: veth: Disable netpoll support
The current implementation of netpoll in veth devices leads to suboptimal behavior, as it triggers warnings due to the invocation of __netif_rx() within a softirq
net: veth: Disable netpoll support
The current implementation of netpoll in veth devices leads to suboptimal behavior, as it triggers warnings due to the invocation of __netif_rx() within a softirq context. This is not compliant with expected practices, as __netif_rx() has the following statement:
lockdep_assert_once(hardirq_count() | softirq_count());
Given that veth devices typically do not benefit from the functionalities provided by netpoll, Disable netpoll for veth interfaces.
Signed-off-by: Breno Leitao <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
173e7622 |
| 02-May-2024 |
Mina Almasry <[email protected]> |
Revert "net: mirror skb frag ref/unref helpers"
This reverts commit a580ea994fd37f4105028f5a85c38ff6508a2b25.
This revert is to resolve Dragos's report of page_pool leak here: https://lore.kernel.o
Revert "net: mirror skb frag ref/unref helpers"
This reverts commit a580ea994fd37f4105028f5a85c38ff6508a2b25.
This revert is to resolve Dragos's report of page_pool leak here: https://lore.kernel.org/lkml/[email protected]/
The reverted patch interacts very badly with commit 2cc3aeb5eccc ("skbuff: Fix a potential race while recycling page_pool packets"). The reverted commit hopes that the pp_recycle + is_pp_page variables do not change between the skb_frag_ref and skb_frag_unref operation. If such a change occurs, the skb_frag_ref/unref will not operate on the same reference type. In the case of Dragos's report, the grabbed ref was a pp ref, but the unref was a page ref, because the pp_recycle setting on the skb was changed.
Attempting to fix this issue on the fly is risky. Lets revert and I hope to reland this with better understanding and testing to ensure we don't regress some edge case while streamlining skb reffing.
Fixes: a580ea994fd3 ("net: mirror skb frag ref/unref helpers") Reported-by: Dragos Tatulea <[email protected]> Signed-off-by: Mina Almasry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
a580ea99 |
| 10-Apr-2024 |
Mina Almasry <[email protected]> |
net: mirror skb frag ref/unref helpers
Refactor some of the skb frag ref/unref helpers for improved clarity.
Implement napi_pp_get_page() to be the mirror counterpart of napi_pp_put_page().
Implem
net: mirror skb frag ref/unref helpers
Refactor some of the skb frag ref/unref helpers for improved clarity.
Implement napi_pp_get_page() to be the mirror counterpart of napi_pp_put_page().
Implement skb_page_ref() to be the mirror of skb_page_unref().
Improve __skb_frag_ref() to become a mirror counterpart of __skb_frag_unref(). Previously unref could handle pp & non-pp pages, while the ref could only handle non-pp pages. Now both the ref & unref helpers can correctly handle both pp & non-pp pages.
Now that __skb_frag_ref() can handle both pp & non-pp pages, remove skb_pp_frag_ref(), and use __skb_frag_ref() instead. This lets us remove pp specific handling from skb_try_coalesce.
Additionally, since __skb_frag_ref() can now handle both pp & non-pp pages, a latent issue in skb_shift() should now be fixed. Previously this function would do a non-pp ref & pp unref on potential pp frags (fragfrom). After this patch, skb_shift() should correctly do a pp ref/unref on pp frags.
Signed-off-by: Mina Almasry <[email protected]> Reviewed-by: Dragos Tatulea <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
f6d827b1 |
| 10-Apr-2024 |
Mina Almasry <[email protected]> |
net: move skb ref helpers to new header
Add a new header, linux/skbuff_ref.h, which contains all the skb_*_ref() helpers. Many of the consumers of skbuff.h do not actually use any of the skb ref hel
net: move skb ref helpers to new header
Add a new header, linux/skbuff_ref.h, which contains all the skb_*_ref() helpers. Many of the consumers of skbuff.h do not actually use any of the skb ref helpers, and we can speed up compilation a bit by minimizing this header file.
Additionally in the later patch in the series we add page_pool support to skb_frag_ref(), which requires some page_pool dependencies. We can now add these dependencies to skbuff_ref.h instead of a very ubiquitous skbuff.h
Signed-off-by: Mina Almasry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
d7db7775 |
| 13-Mar-2024 |
Ignat Korchagin <[email protected]> |
net: veth: do not manipulate GRO when using XDP
Commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP") tried to fix the fact that GRO was not possible without XDP, because veth did not u
net: veth: do not manipulate GRO when using XDP
Commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP") tried to fix the fact that GRO was not possible without XDP, because veth did not use NAPI without XDP. However, it also introduced the behaviour that GRO is always enabled, when XDP is enabled.
While it might be desired for most cases, it is confusing for the user at best as the GRO flag suddenly changes, when an XDP program is attached. It also introduces some complexities in state management as was partially addressed in commit fe9f801355f0 ("net: veth: clear GRO when clearing XDP even when down").
But the biggest problem is that it is not possible to disable GRO at all, when an XDP program is attached, which might be needed for some use cases.
Fix this by not touching the GRO flag on XDP enable/disable as the code already supports switching to NAPI if either GRO or XDP is requested.
Link: https://lore.kernel.org/lkml/[email protected]/ Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") Fixes: fe9f801355f0 ("net: veth: clear GRO when clearing XDP even when down") Signed-off-by: Ignat Korchagin <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8, v6.8-rc7, v6.8-rc6 |
|
| #
1ce7d306 |
| 23-Feb-2024 |
Jakub Kicinski <[email protected]> |
veth: try harder when allocating queue memory
struct veth_rq is pretty large, 832B total without debug options enabled. Since commit under Fixes we try to pre-allocate enough queues for every possib
veth: try harder when allocating queue memory
struct veth_rq is pretty large, 832B total without debug options enabled. Since commit under Fixes we try to pre-allocate enough queues for every possible CPU. Miao Wang reports that this may lead to order-5 allocations which will fail in production.
Let the allocation fallback to vmalloc() and try harder. These are the same flags we pass to netdev queue allocation.
Reported-and-tested-by: Miao Wang <[email protected]> Fixes: 9d3684c24a52 ("veth: create by default nr_possible_cpus queues") Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
e353ea9c |
| 22-Feb-2024 |
Eric Dumazet <[email protected]> |
rtnetlink: prepare nla_put_iflink() to run under RCU
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future.
This patch prepares dev_get_iflink() and nla_pu
rtnetlink: prepare nla_put_iflink() to run under RCU
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future.
This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held.
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
fe9f8013 |
| 21-Feb-2024 |
Jakub Kicinski <[email protected]> |
net: veth: clear GRO when clearing XDP even when down
veth sets NETIF_F_GRO automatically when XDP is enabled, because both features use the same NAPI machinery.
The logic to clear NETIF_F_GRO sits
net: veth: clear GRO when clearing XDP even when down
veth sets NETIF_F_GRO automatically when XDP is enabled, because both features use the same NAPI machinery.
The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which is called both on ndo_stop and when XDP is turned off. To avoid the flag from being cleared when the device is brought down, the clearing is skipped when IFF_UP is not set. Bringing the device down should indeed not modify its features.
Unfortunately, this means that clearing is also skipped when XDP is disabled _while_ the device is down. And there's nothing on the open path to bring the device features back into sync. IOW if user enables XDP, disables it and then brings the device up we'll end up with a stray GRO flag set but no NAPI instances.
We don't depend on the GRO flag on the datapath, so the datapath won't crash. We will crash (or hang), however, next time features are sync'ed (either by user via ethtool or peer changing its config). The GRO flag will go away, and veth will try to disable the NAPIs. But the open path never created them since XDP was off, the GRO flag was a stray. If NAPI was initialized before we'll hang in napi_disable(). If it never was we'll crash trying to stop uninitialized hrtimer.
Move the GRO flag updates to the XDP enable / disable paths, instead of mixing them with the ndo_open / ndo_close paths.
Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") Reported-by: Thomas Gleixner <[email protected]> Reported-by: [email protected] Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc5 |
|
| #
27accb3c |
| 12-Feb-2024 |
Lorenzo Bianconi <[email protected]> |
veth: rely on skb_pp_cow_data utility routine
Rely on skb_pp_cow_data utility routine and remove duplicated code.
Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Toke Hoiland-Jorgen
veth: rely on skb_pp_cow_data utility routine
Rely on skb_pp_cow_data utility routine and remove duplicated code.
Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Toke Hoiland-Jorgensen <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/029cc14cce41cb242ee7efdcf32acc81f1ce4e9f.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
0bef5120 |
| 12-Feb-2024 |
Eric Dumazet <[email protected]> |
net: add netdev_lockdep_set_classes() to virtual drivers
Based on a syzbot report, it appears many virtual drivers do not yet use netdev_lockdep_set_classes(), triggerring lockdep false positives.
net: add netdev_lockdep_set_classes() to virtual drivers
Based on a syzbot report, it appears many virtual drivers do not yet use netdev_lockdep_set_classes(), triggerring lockdep false positives.
WARNING: possible recursive locking detected 6.8.0-rc4-next-20240212-syzkaller #0 Not tainted
syz-executor.0/19016 is trying to acquire lock: ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340
but task is already holding lock: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 lock(_xmit_ETHER#2); lock(_xmit_ETHER#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
9 locks held by syz-executor.0/19016: #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x82c/0x1040 net/core/rtnetlink.c:6603 #1: ffffc90000a08c00 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x600 kernel/time/timer.c:1697 #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325 #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325
stack backtrace: CPU: 1 PID: 19016 Comm: syz-executor.0 Not tainted 6.8.0-rc4-next-20240212-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_deadlock kernel/locking/lockdep.c:3062 [inline] validate_chain+0x15c1/0x58e0 kernel/locking/lockdep.c:3856 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __netif_tx_lock include/linux/netdevice.h:4452 [inline] sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 iptunnel_xmit+0x540/0x9b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x20ee/0x2960 net/ipv4/ip_tunnel.c:831 erspan_xmit+0x9de/0x1460 net/ipv4/ip_gre.c:720 __netdev_start_xmit include/linux/netdevice.h:4989 [inline] netdev_start_xmit include/linux/netdevice.h:5003 [inline] xmit_one net/core/dev.c:3555 [inline] dev_hard_start_xmit+0x242/0x770 net/core/dev.c:3571 sch_direct_xmit+0x2b6/0x5f0 net/sched/sch_generic.c:342 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 igmpv3_send_cr net/ipv4/igmp.c:723 [inline] igmp_ifc_timer_expire+0xb71/0xd90 net/ipv4/igmp.c:813 call_timer_fn+0x17e/0x600 kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1751 [inline] __run_timers+0x621/0x830 kernel/time/timer.c:2038 run_timer_softirq+0x67/0xf0 kernel/time/timer.c:2051 __do_softirq+0x2bc/0x943 kernel/softirq.c:554 invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1076 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1076 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0010:resched_offsets_ok kernel/sched/core.c:10127 [inline] RIP: 0010:__might_resched+0x16f/0x780 kernel/sched/core.c:10142 Code: 00 4c 89 e8 48 c1 e8 03 48 ba 00 00 00 00 00 fc ff df 48 89 44 24 38 0f b6 04 10 84 c0 0f 85 87 04 00 00 41 8b 45 00 c1 e0 08 <01> d8 44 39 e0 0f 85 d6 00 00 00 44 89 64 24 1c 48 8d bc 24 a0 00 RSP: 0018:ffffc9000ee069e0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8880296a9e00 RDX: dffffc0000000000 RSI: ffff8880296a9e00 RDI: ffffffff8bfe8fa0 RBP: ffffc9000ee06b00 R08: ffffffff82326877 R09: 1ffff11002b5ad1b R10: dffffc0000000000 R11: ffffed1002b5ad1c R12: 0000000000000000 R13: ffff8880296aa23c R14: 000000000000062a R15: 1ffff92001dc0d44 down_write+0x19/0x50 kernel/locking/rwsem.c:1578 kernfs_activate fs/kernfs/dir.c:1403 [inline] kernfs_add_one+0x4af/0x8b0 fs/kernfs/dir.c:819 __kernfs_create_file+0x22e/0x2e0 fs/kernfs/file.c:1056 sysfs_add_file_mode_ns+0x24a/0x310 fs/sysfs/file.c:307 create_files fs/sysfs/group.c:64 [inline] internal_create_group+0x4f4/0xf20 fs/sysfs/group.c:152 internal_create_groups fs/sysfs/group.c:192 [inline] sysfs_create_groups+0x56/0x120 fs/sysfs/group.c:218 create_dir lib/kobject.c:78 [inline] kobject_add_internal+0x472/0x8d0 lib/kobject.c:240 kobject_add_varg lib/kobject.c:374 [inline] kobject_init_and_add+0x124/0x190 lib/kobject.c:457 netdev_queue_add_kobject net/core/net-sysfs.c:1706 [inline] netdev_queue_update_kobjects+0x1f3/0x480 net/core/net-sysfs.c:1758 register_queue_kobjects net/core/net-sysfs.c:1819 [inline] netdev_register_kobject+0x265/0x310 net/core/net-sysfs.c:2059 register_netdevice+0x1191/0x19c0 net/core/dev.c:10298 bond_newlink+0x3b/0x90 drivers/net/bonding/bond_netlink.c:576 rtnl_newlink_create net/core/rtnetlink.c:3506 [inline] __rtnl_newlink net/core/rtnetlink.c:3726 [inline] rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3739 rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6606 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3c/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 __sys_sendto+0x3a4/0x4f0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2199 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fc3fa87fa9c
Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5 |
|
| #
fca78379 |
| 05-Dec-2023 |
Larysa Zaremba <[email protected]> |
veth: Implement VLAN tag XDP hint
In order to test VLAN tag hint in hardware-independent selftests, implement newly added hint in veth driver.
Acked-by: Stanislav Fomichev <[email protected]> Signed-o
veth: Implement VLAN tag XDP hint
In order to test VLAN tag hint in hardware-independent selftests, implement newly added hint in veth driver.
Acked-by: Stanislav Fomichev <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
| #
a61f46e1 |
| 04-Dec-2023 |
Lorenzo Bianconi <[email protected]> |
net: veth: fix packet segmentation in veth_convert_skb_to_xdp_buff
Based on the previous allocated packet, page_offset can be not null in veth_convert_skb_to_xdp_buff routine. Take into account page
net: veth: fix packet segmentation in veth_convert_skb_to_xdp_buff
Based on the previous allocated packet, page_offset can be not null in veth_convert_skb_to_xdp_buff routine. Take into account page fragment offset during the skb paged area copy in veth_convert_skb_to_xdp_buff().
Fixes: 2d0de67da51a ("net: veth: use newly added page pool API for veth with xdp") Signed-off-by: Lorenzo Bianconi <[email protected]> Reviewed-by: Yunsheng Lin <[email protected]> Link: https://lore.kernel.org/r/eddfe549e7e626870071930964ac3c38a1dc8068.1701702000.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc4, v6.7-rc3 |
|
| #
818ad9cc |
| 21-Nov-2023 |
Lorenzo Bianconi <[email protected]> |
net: veth: fix ethtool stats reporting
Fix a possible misalignment between page_pool stats and tx xdp_stats reported in veth_get_ethtool_stats routine. The issue can be reproduced configuring the ve
net: veth: fix ethtool stats reporting
Fix a possible misalignment between page_pool stats and tx xdp_stats reported in veth_get_ethtool_stats routine. The issue can be reproduced configuring the veth pair with the following tx/rx queues:
$ip link add v0 numtxqueues 2 numrxqueues 4 type veth peer name v1 \ numtxqueues 1 numrxqueues 1
and loading a simple XDP program on v0 that just returns XDP_PASS. In this case on v0 the page_pool stats overwrites tx xdp_stats for queue 1. Fix the issue incrementing pp_idx of dev->real_num_tx_queues * VETH_TQ_STATS_LEN since we always report xdp_stats for all tx queues in ethtool.
Fixes: 4fc418053ec7 ("net: veth: add page_pool stats") Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/c5b5d0485016836448453f12846c7c4ab75b094a.1700593593.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc2 |
|
| #
6f2684bf |
| 14-Nov-2023 |
Peilin Ye <[email protected]> |
veth: Use tstats per-CPU traffic counters
Currently veth devices use the lstats per-CPU traffic counters, which only cover TX traffic. veth_get_stats64() actually populates RX stats of a veth device
veth: Use tstats per-CPU traffic counters
Currently veth devices use the lstats per-CPU traffic counters, which only cover TX traffic. veth_get_stats64() actually populates RX stats of a veth device from its peer's TX counters, based on the assumption that a veth device can _only_ receive packets from its peer, which is no longer true:
For example, recent CNIs (like Cilium) can use the bpf_redirect_peer() BPF helper to redirect traffic from NIC's tc ingress to veth's tc ingress (in a different netns), skipping veth's peer device. Unfortunately, this kind of traffic isn't currently accounted for in veth's RX stats.
In preparation for the fix, use tstats (instead of lstats) to maintain both RX and TX counters for each veth device. We'll use RX counters for bpf_redirect_peer() traffic, and keep using TX counters for the usual "peer-to-peer" traffic. In veth_get_stats64(), calculate RX stats by _adding_ RX count to peer's TX count, in order to cover both kinds of traffic.
veth_stats_rx() might need a name change (perhaps to "veth_stats_xdp()") for less confusion, but let's leave it to another patch to keep the fix minimal.
Signed-off-by: Peilin Ye <[email protected]> Co-developed-by: Daniel Borkmann <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
show more ...
|
| #
34d21de9 |
| 14-Nov-2023 |
Daniel Borkmann <[email protected]> |
net: Move {l,t,d}stats allocation to core and convert veth & vrf
Move {l,t,d}stats allocation to the core and let netdevs pick the stats type they need. That way the driver doesn't have to bother wi
net: Move {l,t,d}stats allocation to core and convert veth & vrf
Move {l,t,d}stats allocation to the core and let netdevs pick the stats type they need. That way the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc) - all happening in the core.
Co-developed-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Cc: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
2d0de67d |
| 20-Oct-2023 |
Yunsheng Lin <[email protected]> |
net: veth: use newly added page pool API for veth with xdp
Use page_pool_alloc() API to allocate memory with least memory utilization and performance penalty.
Signed-off-by: Yunsheng Lin <linyunshe
net: veth: use newly added page pool API for veth with xdp
Use page_pool_alloc() API to allocate memory with least memory utilization and performance penalty.
Signed-off-by: Yunsheng Lin <[email protected]> CC: Lorenzo Bianconi <[email protected]> CC: Alexander Duyck <[email protected]> CC: Liang Chen <[email protected]> CC: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2 |
|
| #
7a6102aa |
| 11-Sep-2023 |
Toke Høiland-Jørgensen <[email protected]> |
veth: Update XDP feature set when bringing up device
There's an early return in veth_set_features() if the device is in a down state, which leads to the XDP feature flags not being updated when enab
veth: Update XDP feature set when bringing up device
There's an early return in veth_set_features() if the device is in a down state, which leads to the XDP feature flags not being updated when enabling GRO while the device is down. Which in turn leads to XDP_REDIRECT not working, because the redirect code now checks the flags.
Fix this by updating the feature flags after bringing the device up.
Before this patch:
NETDEV_XDP_ACT_BASIC: yes NETDEV_XDP_ACT_REDIRECT: yes NETDEV_XDP_ACT_NDO_XMIT: no NETDEV_XDP_ACT_XSK_ZEROCOPY: no NETDEV_XDP_ACT_HW_OFFLOAD: no NETDEV_XDP_ACT_RX_SG: yes NETDEV_XDP_ACT_NDO_XMIT_SG: no
After this patch:
NETDEV_XDP_ACT_BASIC: yes NETDEV_XDP_ACT_REDIRECT: yes NETDEV_XDP_ACT_NDO_XMIT: yes NETDEV_XDP_ACT_XSK_ZEROCOPY: no NETDEV_XDP_ACT_HW_OFFLOAD: no NETDEV_XDP_ACT_RX_SG: yes NETDEV_XDP_ACT_NDO_XMIT_SG: yes
Fixes: fccca038f300 ("veth: take into account device reconfiguration for xdp_features flag") Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|