History log of /linux-6.15/include/net/ip6_fib.h (Results 1 – 25 of 195)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3
# e60ea454 09-Oct-2024 Eric Dumazet <[email protected]>

ipv6: use READ_ONCE()/WRITE_ONCE() on fib6_table->fib_seq

Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.

Writes are protected by RTNL.
We can use READ_ONCE() when reading it.

C

ipv6: use READ_ONCE()/WRITE_ONCE() on fib6_table->fib_seq

Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.

Writes are protected by RTNL.
We can use READ_ONCE() when reading it.

Constify 'struct net' argument of fib6_tables_seq_read() and
fib6_rules_seq_read().

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6
# e8dfd42c 26-Apr-2024 Eric Dumazet <[email protected]>

ipv6: introduce dst_rt6_info() helper

Instead of (struct rt6_info *)dst casts, we can use :

#define dst_rt6_info(_ptr) \
container_of_const(_ptr, struct rt6_info, dst)

Some places needed

ipv6: introduce dst_rt6_info() helper

Instead of (struct rt6_info *)dst casts, we can use :

#define dst_rt6_info(_ptr) \
container_of_const(_ptr, struct rt6_info, dst)

Some places needed missing const qualifiers :

ip6_confirm_neigh(), ipv6_anycast_destination(),
ipv6_unicast_destination(), has_gateway()

v2: added missing parts (David Ahern)

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


# b5327b9a 26-Apr-2024 Eric Dumazet <[email protected]>

ipv6: use call_rcu_hurry() in fib6_info_release()

This is a followup of commit c4e86b4363ac ("net: add two more
call_rcu_hurry()")

fib6_info_destroy_rcu() is calling nexthop_put() or fib6_nh_releas

ipv6: use call_rcu_hurry() in fib6_info_release()

This is a followup of commit c4e86b4363ac ("net: add two more
call_rcu_hurry()")

fib6_info_destroy_rcu() is calling nexthop_put() or fib6_nh_release()

We must not delay it too much or risk unregister_netdevice/ref_tracker
traces because references to netdev are not released in time.

This should speedup device/netns dismantles when CONFIG_RCU_LAZY=y

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4
# 5eb902b8 08-Feb-2024 Kui-Feng Lee <[email protected]>

net/ipv6: Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is bi

net/ipv6: Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.

Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.8-rc3, v6.8-rc2
# 20df28fb 22-Jan-2024 Breno Leitao <[email protected]>

net/ipv6: resolve warning in ip6_fib.c

In some configurations, the 'iter' variable in function
fib6_repair_tree() is unused, resulting the following warning when
compiled with W=1.

net/ipv6/ip6

net/ipv6: resolve warning in ip6_fib.c

In some configurations, the 'iter' variable in function
fib6_repair_tree() is unused, resulting the following warning when
compiled with W=1.

net/ipv6/ip6_fib.c:1781:6: warning: variable 'iter' set but not used [-Wunused-but-set-variable]
1781 | int iter = 0;
| ^

It is unclear what is the advantage of this RT6_TRACE() macro[1], since
users can control pr_debug() in runtime, which is better than at
compilation time. pr_debug() has no overhead when disabled.

Remove the RT6_TRACE() in favor of simple pr_debug() helpers.

[1] Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7
# 5a78a812 19-Dec-2023 David Ahern <[email protected]>

net/ipv6: Remove gc_link warn on in fib6_info_release

A revert of
3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes")
was sent for net-next. Revert the remainder of 5

net/ipv6: Remove gc_link warn on in fib6_info_release

A revert of
3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes")
was sent for net-next. Revert the remainder of 5a08d0065a915
which added a warn on if a fib entry is still on the gc_link list
to avoid compile failures when net is merged to net-next

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

show more ...


# dade3f6a 19-Dec-2023 David Ahern <[email protected]>

net/ipv6: Revert remove expired routes with a separated list of routes

This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87.

The commit has some race conditions given how expires is managed

net/ipv6: Revert remove expired routes with a separated list of routes

This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87.

The commit has some race conditions given how expires is managed on a
fib6_info in relation to gc start, adding the entry to the gc list and
setting the timer value leading to UAF. Revert the commit and try again
in a later release.

Fixes: 3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes")
Cc: Kui-Feng Lee <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

show more ...


Revision tags: v6.7-rc6, v6.7-rc5
# a3c205d0 07-Dec-2023 Eric Dumazet <[email protected]>

ipv6: do not check fib6_has_expires() in fib6_info_release()

My prior patch went a bit too far, because apparently fib6_has_expires()
could be true while f6i->gc_link is not hashed yet.

fib6_set_ex

ipv6: do not check fib6_has_expires() in fib6_info_release()

My prior patch went a bit too far, because apparently fib6_has_expires()
could be true while f6i->gc_link is not hashed yet.

fib6_set_expires_locked() can indeed set RTF_EXPIRES
while f6i->fib6_table is NULL.

Original syzbot reports were about corruptions caused
by dangling f6i->gc_link.

Fixes: 5a08d0065a91 ("ipv6: add debug checks in fib6_info_release()")
Reported-by: [email protected]
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Kui-Feng Lee <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# 5a08d006 05-Dec-2023 Eric Dumazet <[email protected]>

ipv6: add debug checks in fib6_info_release()

Some elusive syzbot reports are hinting to fib6_info_release(),
with a potential dangling f6i->gc_link anchor.

Add debug checks so that syzbot can catc

ipv6: add debug checks in fib6_info_release()

Some elusive syzbot reports are hinting to fib6_info_release(),
with a potential dangling f6i->gc_link anchor.

Add debug checks so that syzbot can catch the issue earlier eventually.

BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_init include/linux/list.h:1016 [inline]
BUG: KASAN: slab-use-after-free in fib6_clean_expires_locked include/net/ip6_fib.h:533 [inline]
BUG: KASAN: slab-use-after-free in fib6_purge_rt+0x986/0x9c0 net/ipv6/ip6_fib.c:1064
Write of size 8 at addr ffff88802805a840 by task syz-executor.1/10057

CPU: 1 PID: 10057 Comm: syz-executor.1 Not tainted 6.7.0-rc2-syzkaller-00029-g9b6de136b5f0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc4/0x620 mm/kasan/report.c:475
kasan_report+0xda/0x110 mm/kasan/report.c:588
__hlist_del include/linux/list.h:990 [inline]
hlist_del_init include/linux/list.h:1016 [inline]
fib6_clean_expires_locked include/net/ip6_fib.h:533 [inline]
fib6_purge_rt+0x986/0x9c0 net/ipv6/ip6_fib.c:1064
fib6_del_route net/ipv6/ip6_fib.c:1993 [inline]
fib6_del+0xa7a/0x1750 net/ipv6/ip6_fib.c:2038
__ip6_del_rt net/ipv6/route.c:3866 [inline]
ip6_del_rt+0xf7/0x200 net/ipv6/route.c:3881
ndisc_router_discovery+0x295b/0x3560 net/ipv6/ndisc.c:1372
ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856
icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979
ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438
ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492
ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586
dst_input include/net/dst.h:461 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529
__netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643
netif_receive_skb_internal net/core/dev.c:5729 [inline]
netif_receive_skb+0x133/0x700 net/core/dev.c:5788
tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x64f/0xdf0 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f38e387b82f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 b9 80 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c 81 02 00 48
RSP: 002b:00007f38e45c9090 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f38e399bf80 RCX: 00007f38e387b82f
RDX: 00000000000003b6 RSI: 0000000020000680 RDI: 00000000000000c8
RBP: 00007f38e38c847a R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000003b6 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000000000b R14: 00007f38e399bf80 R15: 00007f38e3abfa48
</TASK>

Allocated by task 10044:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
kasan_kmalloc include/linux/kasan.h:198 [inline]
__do_kmalloc_node mm/slab_common.c:1007 [inline]
__kmalloc+0x59/0x90 mm/slab_common.c:1020
kmalloc include/linux/slab.h:604 [inline]
kzalloc include/linux/slab.h:721 [inline]
fib6_info_alloc+0x40/0x160 net/ipv6/ip6_fib.c:155
ip6_route_info_create+0x337/0x1e70 net/ipv6/route.c:3749
ip6_route_add+0x26/0x150 net/ipv6/route.c:3843
rt6_add_route_info+0x2e7/0x4b0 net/ipv6/route.c:4316
rt6_route_rcv+0x76c/0xbf0 net/ipv6/route.c:985
ndisc_router_discovery+0x138b/0x3560 net/ipv6/ndisc.c:1529
ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856
icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979
ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438
ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492
ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586
dst_input include/net/dst.h:461 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529
__netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643
netif_receive_skb_internal net/core/dev.c:5729 [inline]
netif_receive_skb+0x133/0x700 net/core/dev.c:5788
tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x64f/0xdf0 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Freed by task 5123:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x15b/0x1b0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:164 [inline]
slab_free_hook mm/slub.c:1800 [inline]
slab_free_freelist_hook+0x114/0x1e0 mm/slub.c:1826
slab_free mm/slub.c:3809 [inline]
__kmem_cache_free+0xc0/0x180 mm/slub.c:3822
rcu_do_batch kernel/rcu/tree.c:2158 [inline]
rcu_core+0x819/0x1680 kernel/rcu/tree.c:2431
__do_softirq+0x21a/0x8de kernel/softirq.c:553

Last potentially related work creation:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:492
__call_rcu_common.constprop.0+0x9a/0x7a0 kernel/rcu/tree.c:2681
fib6_info_release include/net/ip6_fib.h:332 [inline]
fib6_info_release include/net/ip6_fib.h:329 [inline]
rt6_route_rcv+0xa4e/0xbf0 net/ipv6/route.c:997
ndisc_router_discovery+0x138b/0x3560 net/ipv6/ndisc.c:1529
ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856
icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979
ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438
ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492
ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586
dst_input include/net/dst.h:461 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529
__netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643
netif_receive_skb_internal net/core/dev.c:5729 [inline]
netif_receive_skb+0x133/0x700 net/core/dev.c:5788
tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x64f/0xdf0 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Second to last potentially related work creation:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:492
insert_work+0x38/0x230 kernel/workqueue.c:1647
__queue_work+0xcdc/0x11f0 kernel/workqueue.c:1803
call_timer_fn+0x193/0x590 kernel/time/timer.c:1700
expire_timers kernel/time/timer.c:1746 [inline]
__run_timers+0x585/0xb20 kernel/time/timer.c:2022
run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035
__do_softirq+0x21a/0x8de kernel/softirq.c:553

The buggy address belongs to the object at ffff88802805a800
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 64 bytes inside of
freed 512-byte region [ffff88802805a800, ffff88802805aa00)

The buggy address belongs to the physical page:
page:ffffea0000a01600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x28058
head:ffffea0000a01600 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888013041c80 ffffea0001e02600 dead000000000002
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 18706, tgid 18699 (syz-executor.2), ts 999991973280, free_ts 996884464281
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1537
prep_new_page mm/page_alloc.c:1544 [inline]
get_page_from_freelist+0xa25/0x36d0 mm/page_alloc.c:3312
__alloc_pages+0x22e/0x2420 mm/page_alloc.c:4568
alloc_pages_mpol+0x258/0x5f0 mm/mempolicy.c:2133
alloc_slab_page mm/slub.c:1870 [inline]
allocate_slab mm/slub.c:2017 [inline]
new_slab+0x283/0x3c0 mm/slub.c:2070
___slab_alloc+0x979/0x1500 mm/slub.c:3223
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3322
__slab_alloc_node mm/slub.c:3375 [inline]
slab_alloc_node mm/slub.c:3468 [inline]
__kmem_cache_alloc_node+0x131/0x310 mm/slub.c:3517
__do_kmalloc_node mm/slab_common.c:1006 [inline]
__kmalloc+0x49/0x90 mm/slab_common.c:1020
kmalloc include/linux/slab.h:604 [inline]
kzalloc include/linux/slab.h:721 [inline]
copy_splice_read+0x1ac/0x8f0 fs/splice.c:338
vfs_splice_read fs/splice.c:992 [inline]
vfs_splice_read+0x2ea/0x3b0 fs/splice.c:962
splice_direct_to_actor+0x2a5/0xa30 fs/splice.c:1069
do_splice_direct+0x1af/0x280 fs/splice.c:1194
do_sendfile+0xb3e/0x1310 fs/read_write.c:1254
__do_sys_sendfile64 fs/read_write.c:1322 [inline]
__se_sys_sendfile64 fs/read_write.c:1308 [inline]
__x64_sys_sendfile64+0x1d6/0x220 fs/read_write.c:1308
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1137 [inline]
free_unref_page_prepare+0x4fa/0xaa0 mm/page_alloc.c:2347
free_unref_page_list+0xe6/0xb40 mm/page_alloc.c:2533
release_pages+0x32a/0x14f0 mm/swap.c:1042
tlb_batch_pages_flush+0x9a/0x190 mm/mmu_gather.c:98
tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
tlb_flush_mmu mm/mmu_gather.c:300 [inline]
tlb_finish_mmu+0x14b/0x6f0 mm/mmu_gather.c:392
exit_mmap+0x38b/0xa70 mm/mmap.c:3321
__mmput+0x12a/0x4d0 kernel/fork.c:1349
mmput+0x62/0x70 kernel/fork.c:1371
exit_mm kernel/exit.c:567 [inline]
do_exit+0x9ad/0x2ae0 kernel/exit.c:858
do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
get_signal+0x23be/0x2790 kernel/signal.c:2904
arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204
irqentry_exit_to_user_mode+0xa/0x40 kernel/entry/common.c:309
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1
# 8aae7625 30-Aug-2023 Florian Westphal <[email protected]>

net: fib: avoid warn splat in flow dissector

New skbs allocated via nf_send_reset() have skb->dev == NULL.

fib*_rules_early_flow_dissect helpers already have a 'struct net'
argument but its not pas

net: fib: avoid warn splat in flow dissector

New skbs allocated via nf_send_reset() have skb->dev == NULL.

fib*_rules_early_flow_dissect helpers already have a 'struct net'
argument but its not passed down to the flow dissector core, which
will then WARN as it can't derive a net namespace to use:

WARNING: CPU: 0 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0
[..]
ip_route_me_harder+0x143/0x330
nf_send_reset+0x17c/0x2d0 [nf_reject_ipv4]
nft_reject_inet_eval+0xa9/0xf2 [nft_reject_inet]
nft_do_chain+0x198/0x5d0 [nf_tables]
nft_do_chain_inet+0xa4/0x110 [nf_tables]
nf_hook_slow+0x41/0xc0
ip_local_deliver+0xce/0x110
..

Cc: Stanislav Fomichev <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ido Schimmel <[email protected]>
Fixes: 812fa71f0d96 ("netfilter: Dissect flow after packet mangling")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217826
Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

show more ...


Revision tags: v6.5, v6.5-rc7
# 3dec89b1 15-Aug-2023 Kui-Feng Lee <[email protected]>

net/ipv6: Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is bi

net/ipv6: Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.

Signed-off-by: Kui-Feng Lee <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5
# 8cdc3223 27-Mar-2023 Kuniyuki Iwashima <[email protected]>

ipv6: Remove in6addr_any alternatives.

Some code defines the IPv6 wildcard address as a local variable and
use it with memcmp() or ipv6_addr_equal().

Let's use in6addr_any and ipv6_addr_any() inste

ipv6: Remove in6addr_any alternatives.

Some code defines the IPv6 wildcard address as a local variable and
use it with memcmp() or ipv6_addr_equal().

Let's use in6addr_any and ipv6_addr_any() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.3-rc4
# d288a162 23-Mar-2023 Wangyang Guo <[email protected]>

net: dst: Prevent false sharing vs. dst_entry:: __refcnt

dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_

net: dst: Prevent false sharing vs. dst_entry:: __refcnt

dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.

Aside of the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.

perf top identified two affected read accesses. dst_entry::lwtstate and
rtable::rt_genid.

dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into
a seperate cacheline vs. the read mostly members located at the beginning
of the struct.

That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also

dst_entry::lwtstate

which is located after the reference count and in the same cache line. This
member is read after a reference count has been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which
follow the dst member share the same cache line as dst_entry::__refcnt.
Especially

rtable::rt_genid

is also read by the contexts which have a reference count acquired
already.

When dst_entry:__refcnt is incremented or decremented via an atomic
operation these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremly.

Move the rt[6i]_uncached[_list] members out of struct rtable and struct
rt6_info into struct dst_entry to provide padding and move the lwtstate
member after that so it ends up in the same cache line.

The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5
# d95d6320 16-Feb-2022 Eric Dumazet <[email protected]>

ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt

Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
ne

ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt

Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().

BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt

read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
__ip6_del_rt net/ipv6/route.c:3876 [inline]
ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
__ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
__ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
__sock_release net/socket.c:650 [inline]
sock_close+0x6c/0x150 net/socket.c:1318
__fput+0x295/0x520 fs/file_table.c:280
____fput+0x11/0x20 fs/file_table.c:313
task_work_run+0x8e/0x110 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x2c7/0x2e0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30

value changed: 0x22 -> 0x2a

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 0c5fcf9e249e ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e44 ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Amit Cohen <[email protected]>
Cc: Ido Schimmel <[email protected]>
Reported-by: syzbot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v5.17-rc4
# 2d4feb2c 10-Feb-2022 Eric Dumazet <[email protected]>

ipv6: get rid of net->ipv6.rt6_stats->fib_rt_uncache

This counter has never been visible, there is little point
trying to maintain it.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-b

ipv6: get rid of net->ipv6.rt6_stats->fib_rt_uncache

This counter has never been visible, there is little point
trying to maintain it.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v5.17-rc3, v5.17-rc2, v5.17-rc1
# aafc2e32 20-Jan-2022 Eric Dumazet <[email protected]>

ipv6: annotate accesses to fn->fn_sernum

struct fib6_node's fn_sernum field can be
read while other threads change it.

Add READ_ONCE()/WRITE_ONCE() annotations.

Do not change existing smp barriers

ipv6: annotate accesses to fn->fn_sernum

struct fib6_node's fn_sernum field can be
read while other threads change it.

Add READ_ONCE()/WRITE_ONCE() annotations.

Do not change existing smp barriers in fib6_get_cookie_safe()
and __fib6_update_sernum_upto_root()

syzbot reported:

BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket

write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
__fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
addrconf_dad_work+0x908/0x1170
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:359
ret_from_fork+0x1f/0x30

read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
rt6_get_cookie include/net/ip6_fib.h:306 [inline]
ip6_dst_store include/net/ip6_route.h:234 [inline]
inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
__tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
__tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
mptcp_push_release net/mptcp/protocol.c:1491 [inline]
__mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
kernel_sendmsg+0x97/0xd0 net/socket.c:745
sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
kernel_sendpage+0x187/0x200 net/socket.c:3492
sock_sendpage+0x5a/0x70 net/socket.c:1007
pipe_to_sendpage+0x128/0x160 fs/splice.c:364
splice_from_pipe_feed fs/splice.c:418 [inline]
__splice_from_pipe+0x207/0x500 fs/splice.c:562
splice_from_pipe fs/splice.c:597 [inline]
generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
do_splice_from fs/splice.c:767 [inline]
direct_splice_actor+0x80/0xa0 fs/splice.c:936
splice_direct_to_actor+0x345/0x650 fs/splice.c:891
do_splice_direct+0x106/0x190 fs/splice.c:979
do_sendfile+0x675/0xc40 fs/read_write.c:1245
__do_sys_sendfile64 fs/read_write.c:1310 [inline]
__se_sys_sendfile64 fs/read_write.c:1296 [inline]
__x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000026f -> 0x00000271

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

The Fixes tag I chose is probably arbitrary, I do not think
we need to backport this patch to older kernels.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v5.16, v5.16-rc8
# 3b80b73a 30-Dec-2021 Jakub Kicinski <[email protected]>

net: Add includes masked by netdevice.h including uapi/bpf.h

Add missing includes unmasked by the subsequent change.

Mostly network drivers missing an include for XDP_PACKET_HEADROOM.

Signed-off-b

net: Add includes masked by netdevice.h including uapi/bpf.h

Add missing includes unmasked by the subsequent change.

Mostly network drivers missing an include for XDP_PACKET_HEADROOM.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

show more ...


Revision tags: v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3
# 8837cbbf 22-Nov-2021 Nikolay Aleksandrov <[email protected]>

net: ipv6: add fib6_nh_release_dsts stub

We need a way to release a fib6_nh's per-cpu dsts when replacing
nexthops otherwise we can end up with stale per-cpu dsts which hold net
device references, s

net: ipv6: add fib6_nh_release_dsts stub

We need a way to release a fib6_nh's per-cpu dsts when replacing
nexthops otherwise we can end up with stale per-cpu dsts which hold net
device references, so add a new IPv6 stub called fib6_nh_release_dsts.
It must be used after an RCU grace period, so no new dsts can be created
through a group's nexthop entry.
Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed
so it doesn't need a dummy stub when IPv6 is not enabled.

Fixes: 7bf4796dd099 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14
# 446e7f21 23-Aug-2021 zhang kai <[email protected]>

ipv6: correct comments about fib6_node sernum

correct comments in set and get fn_sernum

Signed-off-by: zhang kai <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: D

ipv6: correct comments about fib6_node sernum

correct comments in set and get fn_sernum

Signed-off-by: zhang kai <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7
# 0c5fcf9e 07-Feb-2021 Amit Cohen <[email protected]>

IPv6: Add "offload failed" indication to routes

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily

IPv6: Add "offload failed" indication to routes

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv6 routes, so that
users will have better visibility into the offload process.

'struct fib6_info' is extended with new field that indicates if route
offload failed. Note that the new field is added using unused bit and
therefore there is no need to increase struct size.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


# 907eea48 01-Feb-2021 Amit Cohen <[email protected]>

net: ipv6: Emit notification when fib hardware flags are changed

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel,
but

net: ipv6: Emit notification when fib hardware flags are changed

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel,
but not necessarily in hardware.

The asynchronous nature of route installation in hardware can lead
to a routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

It is also possible for a route already installed in hardware to change
its action and therefore its flags. For example, a host route that is
trapping packets can be "promoted" to perform decapsulation following
the installation of an IPinIP/VXLAN tunnel.

Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. The aim is to provide an indication to user-space
(e.g., routing daemons) about the state of the route in hardware.

Introduce a sysctl that controls this behavior.

Keep the default value at 0 (i.e., do not emit notifications) for several
reasons:
- Multiple RTM_NEWROUTE notification per-route might confuse existing
routing daemons.
- Convergence reasons in routing daemons.
- The extra notifications will negatively impact the insertion rate.
- Not all users are interested in these notifications.

Move fib6_info_hw_flags_set() to C file because it is no longer a short
function.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# fbaca8f8 01-Feb-2021 Amit Cohen <[email protected]>

net: Pass 'net' struct as first argument to fib6_info_hw_flags_set()

The next patch will emit notification when hardware flags are changed,
in case that fib_notify_on_flag_change sysctl is set to 1.

net: Pass 'net' struct as first argument to fib6_info_hw_flags_set()

The next patch will emit notification when hardware flags are changed,
in case that fib_notify_on_flag_change sysctl is set to 1.

To know sysctl values, net struct is needed.
This change is consistent with the IPv4 version, which gets 'net' struct
as its first argument.

Currently, the only callers of this function are mlxsw and netdevsim.
Patch the callers to pass net.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6
# 4b48b0a3 15-Jul-2020 Randy Dunlap <[email protected]>

net: ip6_fib.h: drop duplicate word in comment

Drop doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: netdev@vger.

net: ip6_fib.h: drop duplicate word in comment

Drop doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v5.8-rc5, v5.8-rc4, v5.8-rc3
# 55cced4f 23-Jun-2020 Brian Vazquez <[email protected]>

ipv6: fib6: avoid indirect calls from fib6_rule_lookup

It was reported that a considerable amount of cycles were spent on the
expensive indirect calls on fib6_rule_lookup. This patch introduces an
i

ipv6: fib6: avoid indirect calls from fib6_rule_lookup

It was reported that a considerable amount of cycles were spent on the
expensive indirect calls on fib6_rule_lookup. This patch introduces an
inline helper called pol_route_func that uses the indirect_call_wrappers
to avoid the indirect calls.

This patch saves around 50ns per call.

Performance was measured on the receiver by checking the amount of
syncookies that server was able to generate under a synflood load.

Traffic was generated using trafgen[1] which was pushing around 1Mpps on
a single queue. Receiver was using only one rx queue which help to
create a bottle neck and make the experiment rx-bounded.

These are the syncookies generated over 10s from the different runs:

Whithout the patch:
TcpExtSyncookiesSent 3553749 0.0
TcpExtSyncookiesSent 3550895 0.0
TcpExtSyncookiesSent 3553845 0.0
TcpExtSyncookiesSent 3541050 0.0
TcpExtSyncookiesSent 3539921 0.0
TcpExtSyncookiesSent 3557659 0.0
TcpExtSyncookiesSent 3526812 0.0
TcpExtSyncookiesSent 3536121 0.0
TcpExtSyncookiesSent 3529963 0.0
TcpExtSyncookiesSent 3536319 0.0

With the patch:
TcpExtSyncookiesSent 3611786 0.0
TcpExtSyncookiesSent 3596682 0.0
TcpExtSyncookiesSent 3606878 0.0
TcpExtSyncookiesSent 3599564 0.0
TcpExtSyncookiesSent 3601304 0.0
TcpExtSyncookiesSent 3609249 0.0
TcpExtSyncookiesSent 3617437 0.0
TcpExtSyncookiesSent 3608765 0.0
TcpExtSyncookiesSent 3620205 0.0
TcpExtSyncookiesSent 3601895 0.0

Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with
the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an
estimate of 50 ns per packet.

[1] http://netsniff-ng.org/

Changelog since v1:
- Change ordering in the ICW (Paolo Abeni)

Cc: Luigi Rizzo <[email protected]>
Cc: Paolo Abeni <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7
# 38428d68 22-May-2020 Roopa Prabhu <[email protected]>

nexthop: support for fdb ecmp nexthops

This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is

nexthop: support for fdb ecmp nexthops

This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <[email protected]>

Signed-off-by: Roopa Prabhu <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


12345678