|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
04efcee6 |
| 04-Apr-2025 |
Stanislav Fomichev <[email protected]> |
net: hold instance lock during NETDEV_CHANGE
Cosmin reports an issue with ipv6_add_dev being called from NETDEV_CHANGE notifier:
[ 3455.008776] ? ipv6_add_dev+0x370/0x620 [ 3455.010097] ipv6_find
net: hold instance lock during NETDEV_CHANGE
Cosmin reports an issue with ipv6_add_dev being called from NETDEV_CHANGE notifier:
[ 3455.008776] ? ipv6_add_dev+0x370/0x620 [ 3455.010097] ipv6_find_idev+0x96/0xe0 [ 3455.010725] addrconf_add_dev+0x1e/0xa0 [ 3455.011382] addrconf_init_auto_addrs+0xb0/0x720 [ 3455.013537] addrconf_notify+0x35f/0x8d0 [ 3455.014214] notifier_call_chain+0x38/0xf0 [ 3455.014903] netdev_state_change+0x65/0x90 [ 3455.015586] linkwatch_do_dev+0x5a/0x70 [ 3455.016238] rtnl_getlink+0x241/0x3e0 [ 3455.019046] rtnetlink_rcv_msg+0x177/0x5e0
Similarly, linkwatch might get to ipv6_add_dev without ops lock: [ 3456.656261] ? ipv6_add_dev+0x370/0x620 [ 3456.660039] ipv6_find_idev+0x96/0xe0 [ 3456.660445] addrconf_add_dev+0x1e/0xa0 [ 3456.660861] addrconf_init_auto_addrs+0xb0/0x720 [ 3456.661803] addrconf_notify+0x35f/0x8d0 [ 3456.662236] notifier_call_chain+0x38/0xf0 [ 3456.662676] netdev_state_change+0x65/0x90 [ 3456.663112] linkwatch_do_dev+0x5a/0x70 [ 3456.663529] __linkwatch_run_queue+0xeb/0x200 [ 3456.663990] linkwatch_event+0x21/0x30 [ 3456.664399] process_one_work+0x211/0x610 [ 3456.664828] worker_thread+0x1cc/0x380 [ 3456.665691] kthread+0xf4/0x210
Reclassify NETDEV_CHANGE as a notifier that consistently runs under the instance lock.
Link: https://lore.kernel.org/netdev/[email protected]/ Reported-by: Cosmin Ratiu <[email protected]> Tested-by: Cosmin Ratiu <[email protected]> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
79c61899 |
| 04-Feb-2025 |
Antoine Tenart <[email protected]> |
net-sysfs: remove rtnl_trylock from device attributes
There is an ABBA deadlock between net device unregistration and sysfs files being accessed[1][2]. To prevent this from happening all paths takin
net-sysfs: remove rtnl_trylock from device attributes
There is an ABBA deadlock between net device unregistration and sysfs files being accessed[1][2]. To prevent this from happening all paths taking the rtnl lock after the sysfs one (actually kn->active refcount) use rtnl_trylock and return early (using restart_syscall)[3], which can make syscalls to spin for a long time when there is contention on the rtnl lock[4].
There are not many possibilities to improve the above: - Rework the entire net/ locking logic. - Invert two locks in one of the paths — not possible.
But here it's actually possible to drop one of the locks safely: the kernfs_node refcount. More details in the code itself, which comes with lots of comments.
Note that we check the device is alive in the added sysfs_rtnl_lock helper to disallow sysfs operations to run after device dismantle has started. This also help keeping the same behavior as before. Because of this calls to dev_isalive in sysfs ops were removed.
[1] https://lore.kernel.org/netdev/[email protected]/ [2] https://lore.kernel.org/netdev/[email protected]/ [3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/ [4] https://lore.kernel.org/all/[email protected]/T/
Signed-off-by: Antoine Tenart <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6 |
|
| #
7bd72a4a |
| 04-Jan-2025 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Add rtnl_net_lock_killable().
rtnl_lock_killable() is used only in register_netdev() and will be converted to per-netns RTNL.
Let's unexport it and add the corresponding helper.
Signed-
rtnetlink: Add rtnl_net_lock_killable().
rtnl_lock_killable() is used only in register_netdev() and will be converted to per-netns RTNL.
Let's unexport it and add the corresponding helper.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
53a6d891 |
| 09-Dec-2024 |
Eric Dumazet <[email protected]> |
rtnetlink: remove pad field in ndo_fdb_dump_context
I chose to remove this field in a separate patch to ease potential bisection, in case one ndo_fdb_dump() is still using the old way (cb->args[2] i
rtnetlink: remove pad field in ndo_fdb_dump_context
I chose to remove this field in a separate patch to ease potential bisection, in case one ndo_fdb_dump() is still using the old way (cb->args[2] instead of ctx->fdb_idx)
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
53970a05 |
| 09-Dec-2024 |
Eric Dumazet <[email protected]> |
rtnetlink: switch rtnl_fdb_dump() to for_each_netdev_dump()
This is the last netdev iterator still using net->dev_index_head[].
Convert to modern for_each_netdev_dump() for better scalability, and
rtnetlink: switch rtnl_fdb_dump() to for_each_netdev_dump()
This is the last netdev iterator still using net->dev_index_head[].
Convert to modern for_each_netdev_dump() for better scalability, and use common patterns in our stack.
Following patch in this series removes the pad field in struct ndo_fdb_dump_context.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
be325f08 |
| 09-Dec-2024 |
Eric Dumazet <[email protected]> |
rtnetlink: add ndo_fdb_dump_context
rtnl_fdb_dump() and various ndo_fdb_dump() helpers share a hidden layout of cb->ctx.
Before switching rtnl_fdb_dump() to for_each_netdev_dump() in the following
rtnetlink: add ndo_fdb_dump_context
rtnl_fdb_dump() and various ndo_fdb_dump() helpers share a hidden layout of cb->ctx.
Before switching rtnl_fdb_dump() to for_each_netdev_dump() in the following patch, make this more explicit.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2 |
|
| #
3f330db3 |
| 05-Dec-2024 |
Jakub Kicinski <[email protected]> |
net: reformat kdoc return statements
kernel-doc -Wall warns about missing Return: statement for non-void functions. We have a number of kdocs in our headers which are missing the colon, IOW they use
net: reformat kdoc return statements
kernel-doc -Wall warns about missing Return: statement for non-void functions. We have a number of kdocs in our headers which are missing the colon, IOW they use * Return some value or * Returns some value
Having the colon makes some sense, it should help kdoc parser avoid false positives. So add them. This is mostly done with a sed script, and removing the unnecessary cases (mostly the comments which aren't kdoc).
Acked-by: Johannes Berg <[email protected]> Acked-by: Richard Cochran <[email protected]> Acked-by: Sergey Ryazanov <[email protected]> Reviewed-by: Edward Cree <[email protected]> Acked-by: Alexandra Winter <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
d1c81818 |
| 21-Oct-2024 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Define rtnl_net_trylock().
We will need the per-netns version of rtnl_trylock().
rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock() successfully holds RTNL.
When RTNL i
rtnetlink: Define rtnl_net_trylock().
We will need the per-netns version of rtnl_trylock().
rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock() successfully holds RTNL.
When RTNL is removed, we will use mutex_trylock() for per-netns RTNL.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
9cb7e40d |
| 21-Oct-2024 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Make per-netns RTNL dereference helpers to macro.
When CONFIG_DEBUG_NET_SMALL_RTNL is off, rtnl_net_dereference() is the static inline wrapper of rtnl_dereference() returning a plain (voi
rtnetlink: Make per-netns RTNL dereference helpers to macro.
When CONFIG_DEBUG_NET_SMALL_RTNL is off, rtnl_net_dereference() is the static inline wrapper of rtnl_dereference() returning a plain (void *) pointer to make sure net is always evaluated as requested in [0].
But, it makes sparse complain [1] when the pointer has __rcu annotation:
net/ipv4/devinet.c:674:47: sparse: warning: incorrect type in argument 2 (different address spaces) net/ipv4/devinet.c:674:47: sparse: expected void *p net/ipv4/devinet.c:674:47: sparse: got struct in_ifaddr [noderef] __rcu *
Also, if we evaluate net as (void *) in a macro, then the compiler in turn fails to build due to -Werror=unused-value.
#define rtnl_net_dereference(net, p) \ ({ \ (void *)net; \ rtnl_dereference(p); \ })
net/ipv4/devinet.c: In function ‘inet_rtm_deladdr’: ./include/linux/rtnetlink.h:154:17: error: statement with no effect [-Werror=unused-value] 154 | (void *)net; \ net/ipv4/devinet.c:674:21: note: in expansion of macro ‘rtnl_net_dereference’ 674 | (ifa = rtnl_net_dereference(net, *ifap)) != NULL; | ^~~~~~~~~~~~~~~~~~~~
Let's go back to the original simplest macro.
Note that checkpatch complains about this approach, but it's one-shot and less noisy than the other two.
WARNING: Argument 'net' is not used in function-like macro #76: FILE: include/linux/rtnetlink.h:142: +#define rtnl_net_dereference(net, p) \ + rtnl_dereference(p)
Fixes: 844e5e7e656d ("rtnetlink: Add assertion helpers for per-netns RTNL.") Link: https://lore.kernel.org/netdev/[email protected]/ [0] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ [1] Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2 |
|
| #
844e5e7e |
| 04-Oct-2024 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Add assertion helpers for per-netns RTNL.
Once an RTNL scope is converted with rtnl_net_lock(), we will replace RTNL helper functions inside the scope with the following per-netns alterna
rtnetlink: Add assertion helpers for per-netns RTNL.
Once an RTNL scope is converted with rtnl_net_lock(), we will replace RTNL helper functions inside the scope with the following per-netns alternatives:
ASSERT_RTNL() -> ASSERT_RTNL_NET(net) rcu_dereference_rtnl(p) -> rcu_dereference_rtnl_net(net, p)
Note that the per-netns helpers are equivalent to the conventional helpers unless CONFIG_DEBUG_NET_SMALL_RTNL is enabled.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
76aed953 |
| 04-Oct-2024 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Add per-netns RTNL.
The goal is to break RTNL down into per-netns mutex.
This patch adds per-netns mutex and its helper functions, rtnl_net_lock() and rtnl_net_unlock().
rtnl_net_lock()
rtnetlink: Add per-netns RTNL.
The goal is to break RTNL down into per-netns mutex.
This patch adds per-netns mutex and its helper functions, rtnl_net_lock() and rtnl_net_unlock().
rtnl_net_lock() acquires the global RTNL and per-netns RTNL mutex, and rtnl_net_unlock() releases them.
We will replace 800+ rtnl_lock() with rtnl_net_lock() and finally removes rtnl_lock() in rtnl_net_lock().
When we need to nest per-netns RTNL mutex, we will use __rtnl_net_lock(), and its locking order is defined by rtnl_net_lock_cmp_fn() as follows:
1. init_net is first 2. netns address ascending order
Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL with LOCKDEP so that we can carefully add the extra mutex without slowing down RTNL operations during conversion.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
ec763c23 |
| 04-Oct-2024 |
Kuniyuki Iwashima <[email protected]> |
Revert "rtnetlink: add guard for RTNL"
This reverts commit 464eb03c4a7cfb32cb3324249193cf6bb5b35152.
Once we have a per-netns RTNL, we won't use guard(rtnl).
Also, there's no users for now.
$ g
Revert "rtnetlink: add guard for RTNL"
This reverts commit 464eb03c4a7cfb32cb3324249193cf6bb5b35152.
Once we have a per-netns RTNL, we won't use guard(rtnl).
Also, there's no users for now.
$ grep -rnI "guard(rtnl" || true $
Suggested-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/netdev/CANn89i+KoYzUH+VPLdGmLABYf5y4TW0hrM4UAeQQJ9AREty0iw@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
464eb03c |
| 28-Mar-2024 |
Johannes Berg <[email protected]> |
rtnetlink: add guard for RTNL
The new guard/scoped_gard can be useful for the RTNL as well, so add a guard definition for it. It gets used like
{ guard(rtnl)(); // RTNL held until end of blo
rtnetlink: add guard for RTNL
The new guard/scoped_gard can be useful for the RTNL as well, so add a guard definition for it. It gets used like
{ guard(rtnl)(); // RTNL held until end of block }
or
scoped_guard(rtnl) { // RTNL held in this block }
as with any other guard/scoped_guard.
Signed-off-by: Johannes Berg <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5 |
|
| #
6a2968ee |
| 13-Feb-2024 |
Eric Dumazet <[email protected]> |
net: add netdev_set_operstate() helper
dev_base_lock is going away, add netdev_set_operstate() helper so that hsr does not have to know core internals.
Remove dev_base_lock acquisition from rfc2863
net: add netdev_set_operstate() helper
dev_base_lock is going away, add netdev_set_operstate() helper so that hsr does not have to know core internals.
Remove dev_base_lock acquisition from rfc2863_policy()
v3: use an "unsigned int" for dev->operstate, so that try_cmpxchg() can work on all arches. ( https://lore.kernel.org/oe-kbuild-all/[email protected]/ )
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc4, v6.8-rc3 |
|
| #
ffabe98c |
| 02-Feb-2024 |
Eric Dumazet <[email protected]> |
net: make dev_unreg_count global
We can use a global dev_unreg_count counter instead of a per netns one.
As a bonus we can factorize the changes done on it for bulk device removals.
Signed-off-by:
net: make dev_unreg_count global
We can use a global dev_unreg_count counter instead of a per netns one.
As a bonus we can factorize the changes done on it for bulk device removals.
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
32da0f00 |
| 15-Dec-2023 |
Jamal Hadi Salim <[email protected]> |
net: rtnl: introduce rcu_replace_pointer_rtnl
Introduce the rcu_replace_pointer_rtnl helper to lockdep check rtnl lock rcu replacements, alongside the already existing helpers.
This is a quality of
net: rtnl: introduce rcu_replace_pointer_rtnl
Introduce the rcu_replace_pointer_rtnl helper to lockdep check rtnl lock rcu replacements, alongside the already existing helpers.
This is a quality of life helper so instead of using: rcu_replace_pointer(rp, p, lockdep_rtnl_is_held()) .. or the open coded.. rtnl_dereference() / rcu_assign_pointer() .. or the lazy check version .. rcu_replace_pointer(rp, p, 1) Use: rcu_replace_pointer_rtnl(rp, p)
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: Pedro Tammela <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc5 |
|
| #
ddb6b284 |
| 08-Dec-2023 |
Pedro Tammela <[email protected]> |
rtnl: add helper to send if skb is not null
This is a convenience helper for routines handling conditional rtnl events, that is code that might send a notification depending on rtnl_has_listeners/rt
rtnl: add helper to send if skb is not null
This is a convenience helper for routines handling conditional rtnl events, that is code that might send a notification depending on rtnl_has_listeners/rtnl_notify_needed.
Instead of: if (skb) rtnetlink_send(...)
Use: rtnetlink_maybe_send(...)
Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pedro Tammela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
8439109b |
| 08-Dec-2023 |
Victor Nogueira <[email protected]> |
rtnl: add helper to check if a notification is needed
Building on the rtnl_has_listeners helper, add the rtnl_notify_needed helper to check if we can bail out early in the notification routines.
Re
rtnl: add helper to check if a notification is needed
Building on the rtnl_has_listeners helper, add the rtnl_notify_needed helper to check if we can bail out early in the notification routines.
Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: Pedro Tammela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
c5e2a973 |
| 08-Dec-2023 |
Jamal Hadi Salim <[email protected]> |
rtnl: add helper to check if rtnl group has listeners
As of today, rtnl code creates a new skb and unconditionally fills and broadcasts it to the relevant group. For most operations this is okay and
rtnl: add helper to check if rtnl group has listeners
As of today, rtnl code creates a new skb and unconditionally fills and broadcasts it to the relevant group. For most operations this is okay and doesn't waste resources in general.
When operations are done without the rtnl_lock, as in tc-flower, such skb allocation, message fill and no-op broadcasting can happen in all cores of the system, which contributes to system pressure and wastes precious cpu cycles when no one will receive the built message.
Introduce this helper so rtnetlink operations can simply check if someone is listening and then proceed if necessary.
Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: Pedro Tammela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7 |
|
| #
59d3efd2 |
| 11-Apr-2023 |
Martin Willi <[email protected]> |
rtnetlink: Restore RTM_NEW/DELLINK notification behavior
The commits referenced below allows userspace to use the NLM_F_ECHO flag for RTM_NEW/DELLINK operations to receive unicast notifications for
rtnetlink: Restore RTM_NEW/DELLINK notification behavior
The commits referenced below allows userspace to use the NLM_F_ECHO flag for RTM_NEW/DELLINK operations to receive unicast notifications for the affected link. Prior to these changes, applications may have relied on multicast notifications to learn the same information without specifying the NLM_F_ECHO flag.
For such applications, the mentioned commits changed the behavior for requests not using NLM_F_ECHO. Multicast notifications are still received, but now use the portid of the requester and the sequence number of the request instead of zero values used previously. For the application, this message may be unexpected and likely handled as a response to the NLM_F_ACKed request, especially if it uses the same socket to handle requests and notifications.
To fix existing applications relying on the old notification behavior, set the portid and sequence number in the notification only if the request included the NLM_F_ECHO flag. This restores the old behavior for applications not using it, but allows unicasted notifications for others.
Fixes: f3a63cce1b4f ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link") Fixes: d88e136cab37 ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_create") Signed-off-by: Martin Willi <[email protected]> Acked-by: Guillaume Nault <[email protected]> Acked-by: Hangbin Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc6, v6.3-rc5, v6.3-rc4 |
|
| #
fe602c87 |
| 21-Mar-2023 |
Eric Dumazet <[email protected]> |
net: remove rcu_dereference_bh_rtnl()
This helper is no longer used in the tree.
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3 |
|
| #
1d997f10 |
| 28-Oct-2022 |
Hangbin Liu <[email protected]> |
rtnetlink: pass netlink message header and portid to rtnl_configure_link()
This patch pass netlink message header and portid to rtnl_configure_link() All the functions in this call chain need to add
rtnetlink: pass netlink message header and portid to rtnl_configure_link()
This patch pass netlink message header and portid to rtnl_configure_link() All the functions in this call chain need to add the parameters so we can use them in the last call rtnl_notify(), and notify the userspace about the new link info if NLM_F_ECHO flag is set.
- rtnl_configure_link() - __dev_notify_flags() - rtmsg_ifinfo() - rtmsg_ifinfo_event() - rtmsg_ifinfo_build_skb() - rtmsg_ifinfo_send() - rtnl_notify()
Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub suggested.
Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3 |
|
| #
2f1e85b1 |
| 15-Apr-2022 |
Tonghao Zhang <[email protected]> |
net: sched: use queue_mapping to pick tx queue
This patch fixes issue: * If we install tc filters with act_skbedit in clsact hook. It doesn't work, because netdev_core_pick_tx() overwrites queue
net: sched: use queue_mapping to pick tx queue
This patch fixes issue: * If we install tc filters with act_skbedit in clsact hook. It doesn't work, because netdev_core_pick_tx() overwrites queue_mapping.
$ tc filter ... action skbedit queue_mapping 1
And this patch is useful: * We can use FQ + EDT to implement efficient policies. Tx queues are picked by xps, ndo_select_queue of netdev driver, or skb hash in netdev_core_pick_tx(). In fact, the netdev driver, and skb hash are _not_ under control. xps uses the CPUs map to select Tx queues, but we can't figure out which task_struct of pod/containter running on this cpu in most case. We can use clsact filters to classify one pod/container traffic to one Tx queue. Why ?
In containter networking environment, there are two kinds of pod/ containter/net-namespace. One kind (e.g. P1, P2), the high throughput is key in these applications. But avoid running out of network resource, the outbound traffic of these pods is limited, using or sharing one dedicated Tx queues assigned HTB/TBF/FQ Qdisc. Other kind of pods (e.g. Pn), the low latency of data access is key. And the traffic is not limited. Pods use or share other dedicated Tx queues assigned FIFO Qdisc. This choice provides two benefits. First, contention on the HTB/FQ Qdisc lock is significantly reduced since fewer CPUs contend for the same queue. More importantly, Qdisc contention can be eliminated completely if each CPU has its own FIFO Qdisc for the second kind of pods.
There must be a mechanism in place to support classifying traffic based on pods/container to different Tx queues. Note that clsact is outside of Qdisc while Qdisc can run a classifier to select a sub-queue under the lock.
In general recording the decision in the skb seems a little heavy handed. This patch introduces a per-CPU variable, suggested by Eric.
The xmit.skip_txqueue flag is firstly cleared in __dev_queue_xmit(). - Tx Qdisc may install that skbedit actions, then xmit.skip_txqueue flag is set in qdisc->enqueue() though tx queue has been selected in netdev_tx_queue_mapping() or netdev_core_pick_tx(). That flag is cleared firstly in __dev_queue_xmit(), is useful: - Avoid picking Tx queue with netdev_tx_queue_mapping() in next netdev in such case: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy: For example, eth0, macvlan in pod, which root Qdisc install skbedit queue_mapping, send packets to eth0.3, vlan in host. In __dev_queue_xmit() of eth0.3, clear the flag, does not select tx queue according to skb->queue_mapping because there is no filters in clsact or tx Qdisc of this netdev. Same action taked in eth0, ixgbe in Host. - Avoid picking Tx queue for next packet. If we set xmit.skip_txqueue in tx Qdisc (qdisc->enqueue()), the proper way to clear it is clearing it in __dev_queue_xmit when processing next packets.
For performance reasons, use the static key. If user does not config the NET_EGRESS, the patch will not be compiled.
+----+ +----+ +----+ | P1 | | P2 | | Pn | +----+ +----+ +----+ | | | +-----------+-----------+ | | clsact/skbedit | MQ v +-----------+-----------+ | q0 | q1 | qn v v v HTB/FQ HTB/FQ ... FIFO
Cc: Jamal Hadi Salim <[email protected]> Cc: Cong Wang <[email protected]> Cc: Jiri Pirko <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jonathan Lemon <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Talal Ahmad <[email protected]> Cc: Kevin Hao <[email protected]> Cc: Ilias Apalodimas <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Antoine Tenart <[email protected]> Cc: Wei Wang <[email protected]> Cc: Arnd Bergmann <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7 |
|
| #
5fd0b838 |
| 02-Mar-2022 |
Petr Machata <[email protected]> |
net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS
The offloaded HW stats are designed to allow per-netdevice enablement and disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTAT
net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS
The offloaded HW stats are designed to allow per-netdevice enablement and disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS, which should be carried by the RTM_SETSTATS message, and expresses a desire to toggle L3 offload xstats on or off.
As part of the above, add an exported function rtnl_offload_xstats_notify() that drivers can use when they have installed or deinstalled the counters backing the HW stats.
At this point, it is possible to enable, disable and query L3 offload xstats on netdevices. (However there is no driver actually implementing these.)
Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6 |
|
| #
3a7d0d07 |
| 24-Sep-2018 |
Vlad Buslov <[email protected]> |
net: sched: extend Qdisc with rcu
Currently, Qdisc API functions assume that users have rtnl lock taken. To implement rtnl unlocked classifiers update interface, Qdisc API must be extended with func
net: sched: extend Qdisc with rcu
Currently, Qdisc API functions assume that users have rtnl lock taken. To implement rtnl unlocked classifiers update interface, Qdisc API must be extended with functions that do not require rtnl lock.
Extend Qdisc structure with rcu. Implement special version of put function qdisc_put_unlocked() that is called without rtnl lock taken. This function only takes rtnl lock if Qdisc reference counter reached zero and is intended to be used as optimization.
Signed-off-by: Vlad Buslov <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|