|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3 |
|
| #
4798cfa2 |
| 15-Apr-2025 |
Jakub Kicinski <[email protected]> |
net: don't try to ops lock uninitialized devs
We need to be careful when operating on dev while in rtnl_create_link(). Some devices (vxlan) initialize netdev_ops only later, in ->newlink. Avoid using netdev_lock_ops(): the device isn't registered, so we cannot legally call its ops or generate any notifications for it.
netdev_ops_assert_locked_or_invisible() is safe to use, it checks registration status first.
Reported-by: [email protected] Fixes: 04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE") Acked-by: Stanislav Fomichev <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
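The point about checking registration status before touching the ops can be illustrated with a small user-space sketch. It is not kernel code: `toy_dev_ops`, `toy_fresh_dev` and `toy_locked_or_invisible()` are hypothetical names invented for this example, and the lock is modelled as a plain counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver ops table. */
struct toy_dev_ops {
	bool wants_lock;	/* driver opted in to instance locking */
};

/* Hypothetical stand-in for a device that may not be fully set up yet. */
struct toy_fresh_dev {
	const struct toy_dev_ops *ops;	/* may still be NULL before ->newlink runs */
	bool registered;		/* set once the device is visible to others */
	int lock_depth;			/* toy model of the instance lock */
};

/*
 * Safe on a half-initialized device: registration is checked first,
 * so dev->ops is never dereferenced while it may still be NULL.
 */
static bool toy_locked_or_invisible(const struct toy_fresh_dev *dev)
{
	if (!dev->registered)
		return true;	/* invisible: nobody else can reach it yet */
	return !dev->ops->wants_lock || dev->lock_depth > 0;
}
```

Calling the helper on a zero-initialized `struct toy_fresh_dev` returns true without touching `ops`, which is the property the commit message relies on.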
|
|
Revision tags: v6.15-rc2 |
|
| #
445e99bd |
| 07-Apr-2025 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Fix bad unlock balance in do_setlink().
When validate_linkmsg() fails in do_setlink(), we jump to the errout label and call netdev_unlock_ops() even though we have not called netdev_lock_ops(), as reported by syzbot. [0]
Let's return an error directly in such a case.
[0]
WARNING: bad unlock balance detected!
6.14.0-syzkaller-12504-g8bc251e5d874 #0 Not tainted
syz-executor814/5834 is trying to release lock (&dev_instance_lock_key) at:
[<ffffffff89f41f56>] netdev_unlock include/linux/netdevice.h:2756 [inline]
[<ffffffff89f41f56>] netdev_unlock_ops include/net/netdev_lock.h:48 [inline]
[<ffffffff89f41f56>] do_setlink+0xc26/0x43a0 net/core/rtnetlink.c:3406
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syz-executor814/5834:
 #0: ffffffff900fc408 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
 #0: ffffffff900fc408 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
 #0: ffffffff900fc408 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0xd68/0x1fe0 net/core/rtnetlink.c:4064

stack backtrace:
CPU: 0 UID: 0 PID: 5834 Comm: syz-executor814 Not tainted 6.14.0-syzkaller-12504-g8bc251e5d874 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_unlock_imbalance_bug+0x185/0x1a0 kernel/locking/lockdep.c:5296
 __lock_release kernel/locking/lockdep.c:5535 [inline]
 lock_release+0x1ed/0x3e0 kernel/locking/lockdep.c:5887
 __mutex_unlock_slowpath+0xee/0x800 kernel/locking/mutex.c:907
 netdev_unlock include/linux/netdevice.h:2756 [inline]
 netdev_unlock_ops include/net/netdev_lock.h:48 [inline]
 do_setlink+0xc26/0x43a0 net/core/rtnetlink.c:3406
 rtnl_group_changelink net/core/rtnetlink.c:3783 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3937 [inline]
 rtnl_newlink+0x1619/0x1fe0 net/core/rtnetlink.c:4065
 rtnetlink_rcv_msg+0x80f/0xd70 net/core/rtnetlink.c:6955
 netlink_rcv_skb+0x208/0x480 net/netlink/af_netlink.c:2534
 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
 netlink_unicast+0x7f8/0x9a0 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x8c3/0xcd0 net/netlink/af_netlink.c:1883
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:727
 ____sys_sendmsg+0x523/0x860 net/socket.c:2566
 ___sys_sendmsg net/socket.c:2620 [inline]
 __sys_sendmsg+0x271/0x360 net/socket.c:2652
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8427b614a9
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff9b59f3a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fff9b59f578 RCX: 00007f8427b614a9
RDX: 0000000000000000 RSI: 0000200000000300 RDI: 0000000000000004
RBP: 00007f8427bd4610 R08: 000000000000000c R09: 00007fff9b59f578
R10: 000000000000001b R11: 0000000000000246 R12: 0000000000000001
R13:
Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP") Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=45016fe295243a7882d3 Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
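The shape of this fix (return directly when nothing has been locked yet, and reserve the errout label for paths that hold the lock) can be sketched in user-space C. This is only an illustration under stated assumptions: `toy_setlink()`, `toy_validate()` and the counter-based lock are hypothetical and do not reproduce the real do_setlink().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy device: lock_depth tracks lock/unlock balance. */
struct toy_dev {
	int lock_depth;
};

static void toy_lock_ops(struct toy_dev *dev)
{
	dev->lock_depth++;
}

static void toy_unlock_ops(struct toy_dev *dev)
{
	/* Releasing a lock that is not held is the "bad unlock balance". */
	assert(dev->lock_depth > 0);
	dev->lock_depth--;
}

/* Hypothetical validation step that runs before any lock is taken. */
static int toy_validate(bool msg_ok)
{
	return msg_ok ? 0 : -EINVAL;
}

static int toy_setlink(struct toy_dev *dev, bool msg_ok, bool apply_ok)
{
	int err;

	err = toy_validate(msg_ok);
	if (err < 0)
		return err;	/* no "goto errout": nothing to unlock yet */

	toy_lock_ops(dev);

	if (!apply_ok) {
		err = -EBUSY;
		goto errout;	/* lock is held, so errout may unlock */
	}
	err = 0;

errout:
	toy_unlock_ops(dev);
	return err;
}
```

Every path through `toy_setlink()` leaves `lock_depth` at zero; had the validation failure jumped to `errout`, the assertion in `toy_unlock_ops()` would fire, which is the toy analogue of the lockdep warning above.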
|
|
Revision tags: v6.15-rc1 |
|
| #
04efcee6 |
| 04-Apr-2025 |
Stanislav Fomichev <[email protected]> |
net: hold instance lock during NETDEV_CHANGE
Cosmin reports an issue with ipv6_add_dev being called from NETDEV_CHANGE notifier:
[ 3455.008776] ? ipv6_add_dev+0x370/0x620
[ 3455.010097] ipv6_find_idev+0x96/0xe0
[ 3455.010725] addrconf_add_dev+0x1e/0xa0
[ 3455.011382] addrconf_init_auto_addrs+0xb0/0x720
[ 3455.013537] addrconf_notify+0x35f/0x8d0
[ 3455.014214] notifier_call_chain+0x38/0xf0
[ 3455.014903] netdev_state_change+0x65/0x90
[ 3455.015586] linkwatch_do_dev+0x5a/0x70
[ 3455.016238] rtnl_getlink+0x241/0x3e0
[ 3455.019046] rtnetlink_rcv_msg+0x177/0x5e0
Similarly, linkwatch might get to ipv6_add_dev without ops lock:
[ 3456.656261] ? ipv6_add_dev+0x370/0x620
[ 3456.660039] ipv6_find_idev+0x96/0xe0
[ 3456.660445] addrconf_add_dev+0x1e/0xa0
[ 3456.660861] addrconf_init_auto_addrs+0xb0/0x720
[ 3456.661803] addrconf_notify+0x35f/0x8d0
[ 3456.662236] notifier_call_chain+0x38/0xf0
[ 3456.662676] netdev_state_change+0x65/0x90
[ 3456.663112] linkwatch_do_dev+0x5a/0x70
[ 3456.663529] __linkwatch_run_queue+0xeb/0x200
[ 3456.663990] linkwatch_event+0x21/0x30
[ 3456.664399] process_one_work+0x211/0x610
[ 3456.664828] worker_thread+0x1cc/0x380
[ 3456.665691] kthread+0xf4/0x210
Reclassify NETDEV_CHANGE as a notifier that consistently runs under the instance lock.
Link: https://lore.kernel.org/netdev/[email protected]/ Reported-by: Cosmin Ratiu <[email protected]> Tested-by: Cosmin Ratiu <[email protected]> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
4c975fd7 |
| 01-Apr-2025 |
Stanislav Fomichev <[email protected]> |
net: hold instance lock during NETDEV_REGISTER/UP
Callers of inetdev_init can come from several places with inconsistent expectations about the netdev instance lock. Grab the instance lock during REGISTER (plus UP). Also solve the inconsistency with UNREGISTER, where it was locked only in the netns move path.
WARNING: CPU: 10 PID: 1479 at ./include/net/netdev_lock.h:54 __netdev_update_features+0x65f/0xca0
__warn+0x81/0x180
__netdev_update_features+0x65f/0xca0
report_bug+0x156/0x180
handle_bug+0x4f/0x90
exc_invalid_op+0x13/0x60
asm_exc_invalid_op+0x16/0x20
__netdev_update_features+0x65f/0xca0
netif_disable_lro+0x30/0x1d0
inetdev_init+0x12f/0x1f0
inetdev_event+0x48b/0x870
notifier_call_chain+0x38/0xf0
register_netdevice+0x741/0x8b0
register_netdev+0x1f/0x40
mlx5e_probe+0x4e3/0x8e0 [mlx5_core]
auxiliary_bus_probe+0x3f/0x90
really_probe+0xc3/0x3a0
__driver_probe_device+0x80/0x150
driver_probe_device+0x1f/0x90
__device_attach_driver+0x7d/0x100
bus_for_each_drv+0x80/0xd0
__device_attach+0xb4/0x1c0
bus_probe_device+0x91/0xa0
device_add+0x657/0x870
Reviewed-by: Jakub Kicinski <[email protected]> Reported-by: Cosmin Ratiu <[email protected]> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
23f00807 |
| 25-Mar-2025 |
Mark Zhang <[email protected]> |
rtnetlink: Allocate vfinfo size for VF GUIDs when supported
Commit 30aad41721e0 ("net/core: Add support for getting VF GUIDs") added support for getting VF port and node GUIDs in netlink ifinfo messages, but their size was not taken into consideration in the function that allocates the netlink message, causing the following warning when a netlink message is filled with many VF port and node GUIDs:
# echo 64 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_numvfs
# ip link show dev ib0
RTNETLINK answers: Message too long
Cannot send link get request: Message too long
Kernel warning:
------------[ cut here ]------------ WARNING: CPU: 2 PID: 1930 at net/core/rtnetlink.c:4151 rtnl_getlink+0x586/0x5a0 Modules linked in: xt_conntrack xt_MASQUERADE nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay mlx5_ib macsec mlx5_core tls rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm iw_cm ib_ipoib fuse ib_cm ib_core CPU: 2 UID: 0 PID: 1930 Comm: ip Not tainted 6.14.0-rc2+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:rtnl_getlink+0x586/0x5a0 Code: cb 82 e8 3d af 0a 00 4d 85 ff 0f 84 08 ff ff ff 4c 89 ff 41 be ea ff ff ff e8 66 63 5b ff 49 c7 07 80 4f cb 82 e9 36 fc ff ff <0f> 0b e9 16 fe ff ff e8 de a0 56 00 66 66 2e 0f 1f 84 00 00 00 00 RSP: 0018:ffff888113557348 EFLAGS: 00010246 RAX: 00000000ffffffa6 RBX: ffff88817e87aa34 RCX: dffffc0000000000 RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff88817e87afb8 RBP: 0000000000000009 R08: ffffffff821f44aa R09: 0000000000000000 R10: ffff8881260f79a8 R11: ffff88817e87af00 R12: ffff88817e87aa00 R13: ffffffff8563d300 R14: 00000000ffffffa6 R15: 00000000ffffffff FS: 00007f63a5dbf280(0000) GS:ffff88881ee00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f63a5ba4493 CR3: 00000001700fe002 CR4: 0000000000772eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0xa5/0x230 ? rtnl_getlink+0x586/0x5a0 ? report_bug+0x22d/0x240 ? handle_bug+0x53/0xa0 ? exc_invalid_op+0x14/0x50 ? asm_exc_invalid_op+0x16/0x20 ? skb_trim+0x6a/0x80 ? rtnl_getlink+0x586/0x5a0 ? __pfx_rtnl_getlink+0x10/0x10 ? rtnetlink_rcv_msg+0x1e5/0x860 ? __pfx___mutex_lock+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? __pfx_lock_acquire+0x10/0x10 ? stack_trace_save+0x90/0xd0 ? filter_irq_stacks+0x1d/0x70 ? kasan_save_stack+0x30/0x40 ? kasan_save_stack+0x20/0x40 ? 
kasan_save_track+0x10/0x30 rtnetlink_rcv_msg+0x21c/0x860 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e ? __pfx_rtnetlink_rcv_msg+0x10/0x10 ? arch_stack_walk+0x9e/0xf0 ? rcu_is_watching+0x34/0x60 ? lock_acquire+0xd5/0x410 ? rcu_is_watching+0x34/0x60 netlink_rcv_skb+0xe0/0x210 ? __pfx_rtnetlink_rcv_msg+0x10/0x10 ? __pfx_netlink_rcv_skb+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? __pfx___netlink_lookup+0x10/0x10 ? lock_release+0x62/0x200 ? netlink_deliver_tap+0xfd/0x290 ? rcu_is_watching+0x34/0x60 ? lock_release+0x62/0x200 ? netlink_deliver_tap+0x95/0x290 netlink_unicast+0x31f/0x480 ? __pfx_netlink_unicast+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? lock_acquire+0xd5/0x410 netlink_sendmsg+0x369/0x660 ? lock_release+0x62/0x200 ? __pfx_netlink_sendmsg+0x10/0x10 ? import_ubuf+0xb9/0xf0 ? __import_iovec+0x254/0x2b0 ? lock_release+0x62/0x200 ? __pfx_netlink_sendmsg+0x10/0x10 ____sys_sendmsg+0x559/0x5a0 ? __pfx_____sys_sendmsg+0x10/0x10 ? __pfx_copy_msghdr_from_user+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? do_read_fault+0x213/0x4a0 ? rcu_is_watching+0x34/0x60 ___sys_sendmsg+0xe4/0x150 ? __pfx____sys_sendmsg+0x10/0x10 ? do_fault+0x2cc/0x6f0 ? handle_pte_fault+0x2e3/0x3d0 ? __pfx_handle_pte_fault+0x10/0x10 ? preempt_count_sub+0x14/0xc0 ? __down_read_trylock+0x150/0x270 ? __handle_mm_fault+0x404/0x8e0 ? __pfx___handle_mm_fault+0x10/0x10 ? lock_release+0x62/0x200 ? __rcu_read_unlock+0x65/0x90 ? rcu_is_watching+0x34/0x60 __sys_sendmsg+0xd5/0x150 ? __pfx___sys_sendmsg+0x10/0x10 ? __up_read+0x192/0x480 ? lock_release+0x62/0x200 ? __rcu_read_unlock+0x65/0x90 ? 
rcu_is_watching+0x34/0x60 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f63a5b13367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007fff8c726bc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000067b687c2 RCX: 00007f63a5b13367 RDX: 0000000000000000 RSI: 00007fff8c726c30 RDI: 0000000000000004 RBP: 00007fff8c726cb8 R08: 0000000000000000 R09: 0000000000000034 R10: 00007fff8c726c7c R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 00007fff8c726cd0 R15: 00007fff8c726cd0 </TASK> irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff813f9e58>] copy_process+0xd08/0x2830 softirqs last enabled at (0): [<ffffffff813f9e58>] copy_process+0xd08/0x2830 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace 0000000000000000 ]---
Thus, when calculating ifinfo message size, take VF GUIDs sizes into account when supported.
Fixes: 30aad41721e0 ("net/core: Add support for getting VF GUIDs") Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Maher Sanalla <[email protected]> Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
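The underlying rule (the size estimate used for allocation must count every attribute the fill path may emit) can be shown with a toy calculation. Everything below is an assumption made for illustration: the `TOY_*` constants and helper names are hypothetical, and the payload sizes are placeholders rather than the kernel's real structure sizes.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_ATTR_HDRLEN		4u	/* toy attribute header size */
#define TOY_ATTR_ALIGN(len)	(((len) + 3u) & ~(size_t)3u)
#define TOY_VF_BASE_PAYLOAD	64u	/* placeholder for non-GUID per-VF data */
#define TOY_VF_GUID_PAYLOAD	16u	/* placeholder for one GUID attribute */

/* Bytes one attribute occupies: header plus payload, padded to 4 bytes. */
static size_t toy_attr_size(size_t payload)
{
	return TOY_ATTR_ALIGN(TOY_ATTR_HDRLEN + payload);
}

/* Bytes needed for num_vfs entries, optionally with port and node GUIDs. */
static size_t toy_vfinfo_bytes(unsigned int num_vfs, bool with_guids)
{
	size_t per_vf = toy_attr_size(TOY_VF_BASE_PAYLOAD);

	if (with_guids)
		per_vf += 2 * toy_attr_size(TOY_VF_GUID_PAYLOAD);
	return (size_t)num_vfs * per_vf;
}

/* True when the buffer sized by the estimate can hold what gets filled in. */
static bool toy_reply_fits(unsigned int num_vfs, bool dev_reports_guids,
			   bool size_counts_guids)
{
	return toy_vfinfo_bytes(num_vfs, dev_reports_guids) <=
	       toy_vfinfo_bytes(num_vfs, size_counts_guids);
}
```

With 64 VFs on a device that reports GUIDs, `toy_reply_fits(64, true, false)` is false, which mirrors the "Message too long" failure; once the estimate counts the GUID attributes the reply fits.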
|
|
Revision tags: v6.14, v6.14-rc7 |
|
| #
6dd13251 |
| 12-Mar-2025 |
Stanislav Fomichev <[email protected]> |
net: reorder dev_addr_sem lock
Lockdep complains about circular lock in 1 -> 2 -> 3 (see below).
Change the lock ordering to be:
- rtnl_lock
- dev_addr_sem
- netdev_ops (only for lower devices!)
- team_lock (or other per-upper device lock)
1. rtnl_lock -> netdev_ops -> dev_addr_sem
rtnl_setlink rtnl_lock do_setlink IFLA_ADDRESS on lower netdev_ops dev_addr_sem
2. rtnl_lock -> team_lock -> netdev_ops
rtnl_newlink rtnl_lock do_setlink IFLA_MASTER on lower do_set_master team_add_slave team_lock team_port_add dev_set_mtu netdev_ops
3. rtnl_lock -> dev_addr_sem -> team_lock
rtnl_newlink rtnl_lock do_setlink IFLA_ADDRESS on upper dev_addr_sem netif_set_mac_address team_set_mac_address team_lock
4. rtnl_lock -> netdev_ops -> dev_addr_sem
rtnl_lock dev_ifsioc dev_set_mac_address_user
__tun_chr_ioctl rtnl_lock dev_set_mac_address_user
tap_ioctl rtnl_lock dev_set_mac_address_user
dev_set_mac_address_user netdev_lock_ops netif_set_mac_address_user dev_addr_sem
v2: - move lock reorder to happen after kmalloc (Kuniyuki)
Cc: Kohei Enju <[email protected]> Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Tested-by: Lei Yang <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
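The circular dependency across chains 1 to 3 can be checked mechanically. The sketch below is a user-space illustration, not lockdep: it records each chain as edges in a small graph and looks for a cycle with a depth-first search. All `TOY_*` names and helpers are hypothetical, and the reordered form of chain 1 (rtnl_lock -> dev_addr_sem -> netdev_ops) is an assumption derived from the ordering listed in the commit message.

```c
#include <stdbool.h>

enum { TOY_RTNL, TOY_OPS, TOY_SEM, TOY_TEAM, TOY_NLOCKS };

struct toy_lock_graph {
	/* edge[a][b]: lock b was taken while lock a was held */
	bool edge[TOY_NLOCKS][TOY_NLOCKS];
};

/* Record one acquisition chain as edges between consecutive locks. */
static void toy_add_chain(struct toy_lock_graph *g, const int *chain, int n)
{
	for (int i = 0; i + 1 < n; i++)
		g->edge[chain[i]][chain[i + 1]] = true;
}

/* Depth-first search; state 1 = on the current path, 2 = fully explored. */
static bool toy_visit(const struct toy_lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < TOY_NLOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: circular dependency */
		if (state[next] == 0 && toy_visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool toy_has_cycle(const struct toy_lock_graph *g)
{
	int state[TOY_NLOCKS] = { 0 };

	for (int n = 0; n < TOY_NLOCKS; n++)
		if (state[n] == 0 && toy_visit(g, n, state))
			return true;
	return false;
}
```

Feeding in chains 1, 2 and 3 as written yields the cycle netdev_ops -> dev_addr_sem -> team_lock -> netdev_ops; swapping chain 1 so that dev_addr_sem is taken before netdev_ops leaves the graph acyclic.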
|
| #
8033d2ae |
| 12-Mar-2025 |
Stanislav Fomichev <[email protected]> |
Revert "net: replace dev_addr_sem with netdev instance lock"
This reverts commit df43d8bf10316a7c3b1e47e3cc0057a54df4a5b8.
Cc: Kohei Enju <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Tested-by: Lei Yang <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.14-rc6 |
|
| #
8ef890df |
| 07-Mar-2025 |
Jakub Kicinski <[email protected]> |
net: move misc netdev_lock flavors to a separate header
Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers).
The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h.
Acked-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
df43d8bf |
| 05-Mar-2025 |
Stanislav Fomichev <[email protected]> |
net: replace dev_addr_sem with netdev instance lock
Lockdep reports possible circular dependency in [0]. Instead of fixing the ordering, replace global dev_addr_sem with netdev instance lock. Most of the paths that set/get mac are RTNL protected. Two places where it's not, convert to explicit locking:
- sysfs address_show
- dev_get_mac_address via dev_ioctl
0: https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/993321/24-router-bridge-1d-lag-sh/stderr
Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
7e4d784f |
| 05-Mar-2025 |
Stanislav Fomichev <[email protected]> |
net: hold netdev instance lock during rtnetlink operations
To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock is software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries:
- (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c)
- new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference)
This avoids touching most of the existing ioctl/sysfs/drivers paths.
Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. At the same time, they use dev_xxx APIs, so the lower device has to be unlocked.
Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now.
Cc: Saeed Mahameed <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
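The dev_xxx / netif_xxx split described above can be pictured with a small user-space sketch: an external entry point takes the lock on the caller's behalf, and an internal one expects it to be held already. The names `toy_netdev`, `toy_dev_set_mtu()` and `toy_netif_set_mtu()` are hypothetical and only mimic the naming convention; the lock is a counter, not a real mutex.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_netdev {
	bool needs_instance_lock;	/* hypothetical opt-in flag */
	int lock_depth;			/* toy model of the instance lock */
	unsigned int mtu;
};

static void toy_netdev_lock_ops(struct toy_netdev *dev)
{
	if (dev->needs_instance_lock)
		dev->lock_depth++;
}

static void toy_netdev_unlock_ops(struct toy_netdev *dev)
{
	if (dev->needs_instance_lock)
		dev->lock_depth--;
}

/* Internal flavour: the caller must already hold the instance lock. */
static int toy_netif_set_mtu(struct toy_netdev *dev, unsigned int mtu)
{
	assert(!dev->needs_instance_lock || dev->lock_depth > 0);
	dev->mtu = mtu;
	return 0;
}

/* External flavour: grabs the lock transparently, then calls the internal one. */
static int toy_dev_set_mtu(struct toy_netdev *dev, unsigned int mtu)
{
	int err;

	toy_netdev_lock_ops(dev);
	err = toy_netif_set_mtu(dev, mtu);
	toy_netdev_unlock_ops(dev);
	return err;
}
```

A caller outside the core stack would use `toy_dev_set_mtu()` and never touch the lock; code that already holds the lock calls `toy_netif_set_mtu()` directly, which avoids taking the same lock twice.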
|
|
Revision tags: v6.14-rc5 |
|
| #
12b6f706 |
| 28-Feb-2025 |
Nicolas Dichtel <[email protected]> |
net: plumb extack in __dev_change_net_namespace()
It could be hard to understand why the netlink command fails. For example, if dev->netns_immutable is set, the error is "Invalid argument".
Signed-off-by: Nicolas Dichtel <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
|
| #
4754affe |
| 28-Feb-2025 |
Nicolas Dichtel <[email protected]> |
net: advertise netns_immutable property via netlink
Since commit 05c1280a2bcf ("netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local"), there is no way to see if the netns_immutable property is set on a device. Let's add a netlink attribute to advertise it.
Signed-off-by: Nicolas Dichtel <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.14-rc4 |
|
| #
7ca486d0 |
| 19-Feb-2025 |
Xiao Liang <[email protected]> |
rtnetlink: Create link directly in target net namespace
Make rtnl_newlink_create() create device in target namespace directly. Avoid extra netns change when link netns is provided.
Device drivers have been converted to be aware of link netns, that is, they no longer assume that device netns and link netns are the same when ops->newlink() is called.
Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
9c0fc091 |
| 19-Feb-2025 |
Xiao Liang <[email protected]> |
rtnetlink: Remove "net" from newlink params
Now that devices have been converted to use the specific netns instead of ambiguous "net", let's remove it from newlink parameters.
Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
69c7be1b |
| 19-Feb-2025 |
Xiao Liang <[email protected]> |
rtnetlink: Pack newlink() params into struct
There are 4 net namespaces involved when creating links:
- source netns - where the netlink socket resides,
- target netns - where to put the device being created,
- link netns - netns associated with the device (backend),
- peer netns - netns of peer device.
Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request.
+------------+-------------------+---------+---------+
| peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
+------------+-------------------+---------+---------+
|            | absent            | source  | target  |
| absent     +-------------------+---------+---------+
|            | present           | link    | link    |
+------------+-------------------+---------+---------+
|            | absent            | peer    | target  |
| present    +-------------------+---------+---------+
|            | present           | peer    | link    |
+------------+-------------------+---------+---------+
When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning.
On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead.
To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with the real source netns, and will be deprecated later. Uses of src_net are converted to params->net trivially.
Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
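The pre-patch selection of "src_net" and "dev_net" from the table in this commit message can be written as a small decision function. This is a sketch of the table only, with hypothetical names (`toy_netns`, `toy_newlink_nets`, `toy_pick_nets()`); it does not reproduce the kernel's actual code path.

```c
#include <stdbool.h>

/* The four namespaces named in the commit message. */
enum toy_netns {
	TOY_NS_SOURCE,
	TOY_NS_TARGET,
	TOY_NS_LINK,
	TOY_NS_PEER,
};

struct toy_newlink_nets {
	enum toy_netns src_net;
	enum toy_netns dev_net;
};

/* Encodes the four table rows: which netns each parameter ends up naming. */
static struct toy_newlink_nets toy_pick_nets(bool has_peer_netns,
					     bool has_link_netnsid)
{
	struct toy_newlink_nets nets;

	if (has_peer_netns)
		nets.src_net = TOY_NS_PEER;
	else
		nets.src_net = has_link_netnsid ? TOY_NS_LINK : TOY_NS_SOURCE;

	/* dev_net depends only on whether IFLA_LINK_NETNSID is present. */
	nets.dev_net = has_link_netnsid ? TOY_NS_LINK : TOY_NS_TARGET;
	return nets;
}
```

Written this way, the ambiguity the message describes is easy to see: `src_net` can name three different namespaces depending on the request, while `dev_net` is the link netns whenever IFLA_LINK_NETNSID is present, which is why the device had to be moved to the target netns afterwards.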
|
| #
ec061546 |
| 19-Feb-2025 |
Xiao Liang <[email protected]> |
rtnetlink: Lookup device in target netns when creating link
When creating a link, look up the existing device in the target net namespace instead of the current one. For example, two links created by:
# ip link add dummy1 type dummy
# ip link add netns ns1 dummy1 type dummy
should have no conflict since they are in different namespaces.
Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
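The behaviour described here (checking for a name conflict in the namespace where the device will live) can be modelled in a few lines of user-space C. `toy_net`, `toy_name_exists()` and `toy_newlink()` are hypothetical names for this illustration; a namespace is reduced to a fixed-size list of device names.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_DEVS 8

/* Toy namespace: just the names of the devices it contains. */
struct toy_net {
	const char *names[TOY_MAX_DEVS];
	int count;
};

static bool toy_name_exists(const struct toy_net *net, const char *name)
{
	for (int i = 0; i < net->count; i++)
		if (strcmp(net->names[i], name) == 0)
			return true;
	return false;
}

/* The conflict check runs against the target namespace, not the caller's. */
static int toy_newlink(struct toy_net *target, const char *name)
{
	if (toy_name_exists(target, name))
		return -EEXIST;
	if (target->count == TOY_MAX_DEVS)
		return -ENOSPC;
	target->names[target->count++] = name;
	return 0;
}
```

Creating "dummy1" in the current namespace and then "dummy1" in a second namespace both succeed in this model; only a second "dummy1" in the same namespace returns -EEXIST.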
|
|
Revision tags: v6.14-rc3, v6.14-rc2 |
|
| #
1438f5d0 |
| 05-Feb-2025 |
Nicolas Dichtel <[email protected]> |
rtnetlink: fix netns leak with rtnl_setlink()
A call to rtnl_nets_destroy() is needed to release references taken on netns put in rtnl_nets.
CC: [email protected] Fixes: 636af13f213b ("rtnetlink: Register rtnl_dellink() and rtnl_setlink() with RTNL_FLAG_DOIT_PERNET_WIP.") Signed-off-by: Nicolas Dichtel <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
79c61899 |
| 04-Feb-2025 |
Antoine Tenart <[email protected]> |
net-sysfs: remove rtnl_trylock from device attributes
There is an ABBA deadlock between net device unregistration and sysfs files being accessed[1][2]. To prevent this from happening, all paths taking the rtnl lock after the sysfs one (actually kn->active refcount) use rtnl_trylock and return early (using restart_syscall)[3], which can make syscalls spin for a long time when there is contention on the rtnl lock[4].
There are not many possibilities to improve the above:
- Rework the entire net/ locking logic.
- Invert two locks in one of the paths: not possible.
But here it's actually possible to drop one of the locks safely: the kernfs_node refcount. More details in the code itself, which comes with lots of comments.
Note that we check the device is alive in the added sysfs_rtnl_lock helper to prevent sysfs operations from running after device dismantle has started. This also helps keep the same behavior as before. Because of this, calls to dev_isalive in sysfs ops were removed.
[1] https://lore.kernel.org/netdev/[email protected]/
[2] https://lore.kernel.org/netdev/[email protected]/
[3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[4] https://lore.kernel.org/all/[email protected]/T/
Signed-off-by: Antoine Tenart <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6 |
|
| #
7bd72a4a |
| 04-Jan-2025 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Add rtnl_net_lock_killable().
rtnl_lock_killable() is used only in register_netdev() and will be converted to per-netns RTNL.
Let's unexport it and add the corresponding helper.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.13-rc5, v6.13-rc4 |
|
| #
954a2b40 |
| 16-Dec-2024 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().
Xiao Liang reported that the cited commit changed netns handling in newlink() of netkit, veth, and vxcan.
Before the patch, if we didn't find a netns attribute in the peer device attributes, we tried to find another netns attribute in the outer netlink attributes by passing it to rtnl_link_get_net().
Let's restore the original behaviour.
Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()") Reported-by: Xiao Liang <[email protected]> Closes: https://lore.kernel.org/netdev/CABAhCORBVVU8P6AHcEkENMj+gD2d3ce9t=A_o48E0yOQp8_wUQ@mail.gmail.com/#t Signed-off-by: Kuniyuki Iwashima <[email protected]> Tested-by: Xiao Liang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.13-rc3 |
|
| #
53970a05 |
| 09-Dec-2024 |
Eric Dumazet <[email protected]> |
rtnetlink: switch rtnl_fdb_dump() to for_each_netdev_dump()
This is the last netdev iterator still using net->dev_index_head[].
Convert to modern for_each_netdev_dump() for better scalability, and use common patterns in our stack.
Following patch in this series removes the pad field in struct ndo_fdb_dump_context.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
be325f08 |
| 09-Dec-2024 |
Eric Dumazet <[email protected]> |
rtnetlink: add ndo_fdb_dump_context
rtnl_fdb_dump() and various ndo_fdb_dump() helpers share a hidden layout of cb->ctx.
Before switching rtnl_fdb_dump() to for_each_netdev_dump() in the following patch, make this more explicit.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
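As a rough illustration of what making a shared cb->ctx layout explicit can look like, here is a minimal userspace sketch. struct dump_cb, struct fdb_dump_ctx, and their fields are hypothetical stand-ins chosen for the example; they are not the kernel definitions.

```c
/* Hypothetical stand-in for a dump callback with opaque scratch space. */
struct dump_cb {
    unsigned long ctx[6];
};

/* Hypothetical typed view of that scratch space; field names are illustrative. */
struct fdb_dump_ctx {
    unsigned long dev_cursor; /* where to resume the device walk */
    unsigned long fdb_idx;    /* entries already dumped so far */
};

/* Fail the build if the typed view ever outgrows the scratch space. */
_Static_assert(sizeof(struct fdb_dump_ctx) <=
               sizeof(((struct dump_cb *)0)->ctx),
               "fdb_dump_ctx must fit in dump_cb.ctx");

struct fdb_dump_ctx *fdb_dump_ctx_get(struct dump_cb *cb)
{
    return (struct fdb_dump_ctx *)cb->ctx;
}

/*
 * Emit up to 'budget' of 'total' entries, resuming from the saved index.
 * Returns the number of entries emitted by this call.
 */
unsigned long fdb_dump_some(struct dump_cb *cb, unsigned long total,
                            unsigned long budget)
{
    struct fdb_dump_ctx *ctx = fdb_dump_ctx_get(cb);
    unsigned long emitted = 0;

    while (ctx->fdb_idx < total && emitted < budget) {
        ctx->fdb_idx++;
        emitted++;
    }
    return emitted;
}
```

With a named struct, every helper that touches the scratch space agrees on the same fields, and the size check catches a layout that no longer fits.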
|
|
Revision tags: v6.13-rc2 |
|
| #
09310cfd |
| 06-Dec-2024 |
Dan Carpenter <[email protected]> |
rtnetlink: fix error code in rtnl_newlink()
If rtnl_get_peer_net() fails, then propagate the error code. Don't return success.
Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
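The class of bug fixed here (an error pointer is detected, but the return variable still holds 0) can be sketched in userspace C. The err_ptr(), ptr_err(), and is_err() helpers below are simplified stand-ins modelled on the kernel's ERR_PTR helpers, and lookup_peer_net() and create_link() are hypothetical names for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
void *err_ptr(long error) { return (void *)(intptr_t)error; }
long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup that fails with -EINVAL on request. */
void *lookup_peer_net(int should_fail)
{
    static int dummy_net;

    return should_fail ? err_ptr(-EINVAL) : (void *)&dummy_net;
}

int create_link(int should_fail)
{
    int ret = 0;
    void *peer_net = lookup_peer_net(should_fail);

    if (is_err(peer_net)) {
        /* Propagate the error; jumping out with ret == 0 reports success. */
        ret = (int)ptr_err(peer_net);
        goto out;
    }
    /* ... link creation would go here ... */
out:
    return ret;
}
```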
|
|
Revision tags: v6.13-rc1 |
|
| #
48327566 |
| 29-Nov-2024 |
Cong Wang <[email protected]> |
rtnetlink: fix double call of rtnl_link_get_net_ifla()
Currently rtnl_link_get_net_ifla() gets called twice when we create peer devices, once in rtnl_add_peer_net() and once in each ->newlink() implementation.
This looks safer, however, it leads to a classic Time-of-Check to Time-of-Use (TOCTOU) bug since IFLA_NET_NS_PID is very dynamic. And because of the lack of checking error pointer of the second call, it also leads to a kernel crash as reported by syzbot.
Fix this by getting rid of the second call, which has already become redundant after Kuniyuki's work. We have to propagate the result of the first rtnl_link_get_net_ifla() down to each ->newlink().
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=21ba4d5adff0b6a7cfc6
Fixes: 0eb87b02a705 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.")
Fixes: 6b84e558e95d ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.")
Fixes: fefd5d082172 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.")
Cc: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
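The TOCTOU shape described in this commit, and the "resolve once, pass the result down" fix, can be sketched in userspace C. Everything here is hypothetical: struct pid_table simulates a pid-to-netns mapping that changes between two lookups, and plain integers stand in for netns references.

```c
/* Hypothetical lookup state; 'calls' lets the answer change over time. */
struct pid_table {
    int calls;
};

/* Simulated dynamic lookup: valid on the first call, an error afterwards. */
int resolve_netns_by_pid(struct pid_table *t, int pid)
{
    t->calls++;
    return t->calls == 1 ? pid + 100 : -1;
}

/* Racy shape: validate with one lookup, then act on a second, unchecked one. */
int create_peer_racy(struct pid_table *t, int pid)
{
    if (resolve_netns_by_pid(t, pid) < 0) /* time of check */
        return -1;
    return resolve_netns_by_pid(t, pid);  /* time of use: may differ */
}

/* Stand-in for a ->newlink() that receives the already-resolved netns. */
int newlink_with_net(int netns)
{
    return netns;
}

/* Fixed shape: look up once, check once, and hand the result down. */
int create_peer_fixed(struct pid_table *t, int pid)
{
    int netns = resolve_netns_by_pid(t, pid);

    if (netns < 0)
        return -1;
    return newlink_with_net(netns);
}
```

In the racy variant the value that was validated is not the value that is used, which is the same pattern that let an unchecked error pointer through in the reported crash.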
|
| #
9b234a97 |
| 21-Nov-2024 |
Eric Dumazet <[email protected]> |
rtnetlink: fix rtnl_dump_ifinfo() error path
syzbot found that rtnl_dump_ifinfo() could return with a lock held [1]
Move code around so that rtnl_link_ops_put() and put_net() can be called at the end of this function.
[1]
WARNING: lock held when returning to user space!
6.12.0-rc7-syzkaller-01681-g38f83a57aa8e #0 Not tainted
syz-executor399/5841 is leaving the kernel with locks still held!
1 lock held by syz-executor399/5841:
 #0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
 #0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
 #0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rtnl_link_ops_get+0x22/0x250 net/core/rtnetlink.c:555
Fixes: 43c7ce69d28e ("rtnetlink: Protect struct rtnl_link_ops with SRCU.")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Joe Damato <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
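The general idea of releasing references at a single exit point, so that no early return can leave one held, can be sketched in userspace C. struct ref and the function names are hypothetical; a counter stands in for the SRCU read lock and netns reference mentioned above.

```c
#include <errno.h>

/* Hypothetical counted resource standing in for a held lock or reference. */
struct ref {
    int held;
};

void ref_get(struct ref *r) { r->held++; }
void ref_put(struct ref *r) { r->held--; }

/*
 * Every path after ref_get() leaves through 'out', so the matching
 * ref_put() runs whether the dump succeeds or fails early.
 */
int dump_with_single_exit(struct ref *ops, int fail_early)
{
    int err = 0;

    ref_get(ops);

    if (fail_early) {
        err = -EINVAL;
        goto out; /* not a bare return, which would leak the reference */
    }
    /* ... the actual dump work would go here ... */
out:
    ref_put(ops);
    return err;
}
```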
|