|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
9f7f3ebe |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Namespacify fib_info hash tables.
We will convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL. Then, we need to have per-netns hash tables for struct fib_info.
Let's allocate the hash tables per netns.
fib_info_hash, fib_info_hash_bits, and fib_info_cnt are now moved to struct netns_ipv4 and accessed with net->ipv4.fib_XXX.
Also, the netns checks are removed from fib_find_info_nh() and fib_find_info().
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
b79bcaf7 |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Add fib_info_hash_grow().
When the number of struct fib_info exceeds the hash table size in fib_create_info(), we try to allocate a new hash table with the doubled size.
The allocation is done in fib_create_info(), and if successful, each struct fib_info is moved to the new hash table by fib_info_hash_move().
Let's integrate the allocation and fib_info_hash_move() as fib_info_hash_grow() to make the following change cleaner.
While at it, fib_info_hash_grow() is placed near other hash-table-specific functions.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
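The grow step described above can be sketched in user space. This is a minimal illustration, not the kernel code: `struct entry`, `struct table`, `table_grow()` and the other names are hypothetical stand-ins for `struct fib_info`, the hlist buckets and `fib_info_hash_grow()`, and `calloc()` stands in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct fib_info or hlist types. */
struct entry {
	uint32_t key;
	struct entry *next;
};

struct table {
	struct entry **buckets;
	unsigned int bits;	/* the table holds (1 << bits) buckets */
	unsigned int cnt;	/* number of entries currently stored */
};

/* Multiplicative hash keeping the top 'bits' bits; bits must be 1..31. */
static unsigned int bucket_of(uint32_t key, unsigned int bits)
{
	return (uint32_t)(key * 0x61C88647u) >> (32 - bits);
}

static int table_init(struct table *t, unsigned int bits)
{
	t->buckets = calloc((size_t)1 << bits, sizeof(*t->buckets));
	t->bits = bits;
	t->cnt = 0;
	return t->buckets ? 0 : -1;
}

/* Allocate a table of twice the size and move every entry into it. */
static int table_grow(struct table *t)
{
	unsigned int new_bits = t->bits + 1;
	struct entry **nb = calloc((size_t)1 << new_bits, sizeof(*nb));
	unsigned int i;

	if (!nb)
		return -1;	/* allocation failed: keep using the old table */

	for (i = 0; i < (1u << t->bits); i++) {
		struct entry *e = t->buckets[i];

		while (e) {
			struct entry *next = e->next;
			unsigned int b = bucket_of(e->key, new_bits);

			e->next = nb[b];
			nb[b] = e;
			e = next;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->bits = new_bits;
	return 0;
}

/* Grow (best effort) once the entry count reaches the bucket count. */
static void table_insert(struct table *t, struct entry *e)
{
	unsigned int b;

	if (t->cnt >= (1u << t->bits))
		table_grow(t);
	b = bucket_of(e->key, t->bits);
	e->next = t->buckets[b];
	t->buckets[b] = e;
	t->cnt++;
}

static struct entry *table_find(const struct table *t, uint32_t key)
{
	struct entry *e = t->buckets[bucket_of(key, t->bits)];

	while (e && e->key != key)
		e = e->next;
	return e;
}
```

Folding the allocation and the move into one function means the caller only has to decide when to grow, which is the separation the commit message is aiming for.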
|
| #
d6306b9d |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Remove fib_info_hash_size.
We will allocate the fib_info hash tables per netns.
There are 5 global variables for fib_info hash tables: fib_info_hash, fib_info_laddrhash, fib_info_hash_size, fib_info_hash_bits, fib_info_cnt.
However, fib_info_laddrhash and fib_info_hash_size can be easily calculated from fib_info_hash and fib_info_hash_bits.
Let's remove fib_info_hash_size and use (1 << fib_info_hash_bits) instead.
Now we need not pass the new hash table size to fib_info_hash_move().
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
0dbca8c2 |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Remove fib_info_laddrhash pointer.
We will allocate the fib_info hash tables per netns.
There are 5 global variables for fib_info hash tables: fib_info_hash, fib_info_laddrhash, fib_info_hash_size, fib_info_hash_bits, fib_info_cnt.
However, fib_info_laddrhash and fib_info_hash_size can be easily calculated from fib_info_hash and fib_info_hash_bits.
Let's remove the fib_info_laddrhash pointer and instead use fib_info_hash + (1 << fib_info_hash_bits).
While at it, fib_info_laddrhash_bucket() is moved near other hash-table-specific functions.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
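A rough user-space sketch of the layout these patches converge on: one allocation holding both tables, with the size and the second table's base derived from the bit count. `struct bucket`, `struct hash_state` and the helper names are assumptions made for illustration, and `calloc()` stands in for `kvcalloc()`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct hlist_head. */
struct bucket {
	void *first;
};

/* Hypothetical container; the kernel keeps these fields elsewhere. */
struct hash_state {
	struct bucket *hash;	/* 2 * (1 << bits) buckets, one allocation */
	unsigned int bits;
};

/* Allocate both tables back to back, zero-initialised. */
static int hash_state_init(struct hash_state *s, unsigned int bits)
{
	s->hash = calloc((size_t)2 << bits, sizeof(*s->hash));
	if (!s->hash)
		return -1;
	s->bits = bits;
	return 0;
}

/* Replaces a stored size variable: the size follows from the bit count. */
static size_t hash_size(const struct hash_state *s)
{
	return (size_t)1 << s->bits;
}

/* Replaces a stored second pointer: the second table starts after the first. */
static struct bucket *laddrhash_base(const struct hash_state *s)
{
	return s->hash + hash_size(s);
}
```

With this layout only the base pointer and the bit count need to be stored, which is what makes moving the state into a per-netns structure cheap.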
|
| #
84c75e94 |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Make fib_info_hashfn() return struct hlist_head.
Every time fib_info_hashfn() returns a hash value, we fetch &fib_info_hash[hash].
Let's return the hlist_head pointer from fib_info_hashfn() and rename it to fib_info_hash_bucket() to match a similar function, fib_info_laddrhash_bucket().
Note that we need to move the fib_info_hash assignment earlier in fib_info_hash_move() to use fib_info_hash_bucket() in the for loop.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
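The refactoring can be illustrated with a small before/after pair. The names `info_hashfn()` and `info_hash_bucket()` and the single-key hash are simplifications assumed here; the real functions hash several `struct fib_info` fields.

```c
#include <stdint.h>

/* Stand-in for struct hlist_head. */
struct bucket {
	void *first;
};

/* Multiplicative hash keeping the top 'bits' bits; bits must be 1..31. */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * 0x61C88647u) >> (32 - bits);
}

/* Before: the helper returns an index and every caller indexes the table. */
static uint32_t info_hashfn(uint32_t key, unsigned int bits)
{
	return hash32(key, bits);
}

/* After: the helper returns the bucket itself. */
static struct bucket *info_hash_bucket(struct bucket *table,
				       unsigned int bits, uint32_t key)
{
	return &table[hash32(key, bits)];
}
```

Because the second form reads the table pointer inside the helper, a rehash loop that calls it must already see the new table, which is why the message notes that the table assignment has to move earlier.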
|
| #
cfc47029 |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Allocate fib_info_hash[] during netns initialisation.
We will allocate fib_info_hash[] and fib_info_laddrhash[] for each netns.
Currently, fib_info_hash[] is allocated when the first route is added.
Let's move the first allocation to a new __net_init function.
Note that we must call fib4_semantics_exit() in fib_net_exit_batch() because ->exit() is called earlier than ->exit_batch().
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
fa336adc |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Allocate fib_info_hash[] and fib_info_laddrhash[] by kvcalloc().
Both fib_info_hash[] and fib_info_laddrhash[] are hash tables for struct fib_info and are allocated by kvzalloc() separately.
Let's replace the two kvzalloc() calls with kvcalloc() to remove the fib_info_laddrhash pointer later.
Note that fib_info_hash_alloc() allocates a new hash table based on fib_info_hash_bits because we will remove fib_info_hash_size later.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2 |
|
| #
a3f5f4c2 |
| 04-Oct-2024 |
Eric Dumazet <[email protected]> |
ipv4: remove fib_info_devhash[]
Upcoming per-netns RTNL conversion needs to get rid of shared hash tables.
fib_info_devhash[] is one of them.
It is unclear why we used a hash table, because a single hlist_head per net device was cheaper and scalable.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
143ca845 |
| 04-Oct-2024 |
Eric Dumazet <[email protected]> |
ipv4: remove fib_info_lock
After the prior patch, fib_info_lock became redundant because all of its users are holding RTNL.
BH protection is not needed.
Remove the READ_ONCE()/WRITE_ONCE() annotations around fib_info_cnt, since it is protected by RTNL.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
fc38b283 |
| 04-Oct-2024 |
Eric Dumazet <[email protected]> |
ipv4: use rcu in ip_fib_check_default()
fib_info_devhash[] is not resized in fib_info_hash_move().
fib_nh structs are already freed after an rcu grace period.
This will allow to remove fib_info_lock in the following patch.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
8a0f62fd |
| 04-Oct-2024 |
Eric Dumazet <[email protected]> |
ipv4: remove fib_devindex_hashfn()
fib_devindex_hashfn() converts a 32bit ifindex value to a 8bit hash.
It makes no sense doing this from fib_info_hashfn() and fib_find_info_nh().
It is better to keep as many bits as possible to let fib_info_hashfn_result() have better spread.
Only fib_info_devhash_bucket() needs to make this operation, we can 'inline' trivial fib_devindex_hashfn() in it.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
9b8ca048 |
| 01-Oct-2024 |
Alexandre Ferrieux <[email protected]> |
ipv4: avoid quadratic behavior in FIB insertion of common address
Mix netns into all IPv4 FIB hashes to avoid massive collision when inserting the same address in many netns.
Signed-off-by: Alexandre Ferrieux <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
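To see why mixing the namespace into the hash matters, consider this hedged sketch. `ns_mix` is a hypothetical per-namespace salt, and XOR before hashing is one simple way to mix it in; the kernel's exact mixing may differ.

```c
#include <stdint.h>

/* Multiplicative hash keeping the top 'bits' bits; bits must be 1..31. */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * 0x61C88647u) >> (32 - bits);
}

/* Without a namespace salt, the same address always lands in one bucket. */
static uint32_t addr_bucket_plain(uint32_t addr, unsigned int bits)
{
	return hash32(addr, bits);
}

/* With a salt, the same address spreads across buckets per namespace. */
static uint32_t addr_bucket_mixed(uint32_t ns_mix, uint32_t addr,
				  unsigned int bits)
{
	return hash32(addr ^ ns_mix, bits);
}
```

With the plain variant, inserting one address in N namespaces builds a single chain of length N, so the total insertion cost grows quadratically; the salted variant distributes those entries over the table.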
|
|
Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
4c180887 |
| 22-Aug-2024 |
Li Zetao <[email protected]> |
ipv4: delete redundant judgment statements
The initial value of err is -ENOBUFS, and err is guaranteed to be less than 0 before all goto errout. Therefore, on the error path of errout, there is no need to repeatedly judge that err is less than 0, and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v6.11-rc4 |
|
| #
1fa3314c |
| 14-Aug-2024 |
Ido Schimmel <[email protected]> |
ipv4: Centralize TOS matching
The TOS field in the IPv4 flow information structure ('flowi4_tos') is matched by the kernel against the TOS selector in IPv4 rules and routes. The field is initialized differently by different call sites. Some treat it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC 791 TOS and initialize it using IPTOS_RT_MASK.
What is common to all these call sites is that they all initialize the lower three DSCP bits, which fits the TOS definition in the initial IPv4 specification (RFC 791).
Therefore, the kernel only allows configuring IPv4 FIB rules that match on the lower three DSCP bits which are always guaranteed to be initialized by all call sites:
 # ip -4 rule add tos 0x1c table 100
 # ip -4 rule add tos 0x3c table 100
 Error: Invalid tos.
While this works, it is unlikely to be very useful. RFC 791 that initially defined the TOS and IP precedence fields was updated by RFC 2474 over twenty five years ago where these fields were replaced by a single six bits DSCP field.
Extending FIB rules to match on DSCP can be done by adding a new DSCP selector while maintaining the existing semantics of the TOS selector for applications that rely on that.
A prerequisite for allowing FIB rules to match on DSCP is to adjust all the call sites to initialize the high order DSCP bits and remove their masking along the path to the core where the field is matched on.
However, making this change alone will result in a behavior change. For example, a forwarded IPv4 packet with a DS field of 0xfc will no longer match a FIB rule that was configured with 'tos 0x1c'.
This behavior change can be avoided by masking the upper three DSCP bits in 'flowi4_tos' before comparing it against the TOS selectors in FIB rules and routes.
Implement the above by adding a new function that checks whether a given DSCP value matches the one specified in the IPv4 flow information structure and invoke it from the three places that currently match on 'flowi4_tos'.
Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK since the latter is not uAPI and we should be able to remove it at some point.
Include <linux/ip.h> in <linux/in_route.h> since the former defines IPTOS_TOS_MASK which is used in the definition of RT_TOS() in <linux/in_route.h>.
No regressions in FIB tests:
 # ./fib_tests.sh
 [...]
 Tests passed: 218
 Tests failed: 0
And FIB rule tests:
 # ./fib_rule_tests.sh
 [...]
 Tests passed: 116
 Tests failed: 0
Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
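The masking described in this commit can be sketched as follows. The constants mirror IPTOS_TOS_MASK (0x1e, used by RT_TOS()) and the six-bit DSCP mask (0xfc); the `SKETCH_` names and `tos_selector_matches()` are hypothetical and not the kernel helper's actual name or signature.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TOS_MASK  0x1e	/* mirrors IPTOS_TOS_MASK, as used by RT_TOS() */
#define SKETCH_DSCP_MASK 0xfc	/* upper six bits of the DS field */

/*
 * Compare a configured TOS selector against the flow's TOS after masking.
 * The two masks together keep only the lower three DSCP bits (0x1c), so
 * the upper three DSCP bits never influence the match.
 */
static bool tos_selector_matches(uint8_t selector, uint8_t flow_tos)
{
	uint8_t rt_tos = flow_tos & SKETCH_TOS_MASK;
	uint8_t masked = rt_tos & SKETCH_DSCP_MASK;

	return selector == masked;
}
```

For the example in the message, a DS field of 0xfc masks down to 0x1c and therefore still matches a rule configured with 'tos 0x1c', even once call sites stop masking the field themselves.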
|
|
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10 |
|
| #
68073523 |
| 10-Jul-2024 |
Nicolas Dichtel <[email protected]> |
ipv4: fix source address selection with route leak
By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address.
Let's add a check against the output interface and call the appropriate function to select the source address.
CC: [email protected] Fixes: 8cbb512c923d ("net: Add source address lookup op for VRF") Signed-off-by: Nicolas Dichtel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
61e2bbaf |
| 31-May-2024 |
Jason Xing <[email protected]> |
net: remove NULL-pointer net parameter in ip_metrics_convert
When I was doing some experiments, I found that using the first parameter of ip_metrics_convert(), namely struct net, always triggers a NULL pointer crash. I then dug into this part and realized that the parameter serves no purpose, so we can remove it.
Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6 |
|
| #
c4e86b43 |
| 23-Apr-2024 |
Eric Dumazet <[email protected]> |
net: add two more call_rcu_hurry()
I had failures with pmtu.sh selftests lately, with netns dismantles firing ref_tracking alerts [1].
After much debugging, I found that some queued rcu callbacks were delayed by minutes, because of CONFIG_RCU_LAZY=y option.
Joel Fernandes had a similar issue in the past, fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() for dst_release()")
In this commit, I make sure nexthop_free_rcu() and free_fib_info_rcu() are not delayed too much because they both can release device references.
tools/testing/selftests/net/pmtu.sh no longer fails.
Traces were:
[ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 3/5 users at dst_alloc+0x76/0x160 ip6_dst_alloc+0x25/0x80 ip6_pol_route+0x2a8/0x450 ip6_pol_route_output+0x1f/0x30 fib6_rule_lookup+0x163/0x270 ip6_route_output_flags+0xda/0x190 ip6_dst_lookup_tail.constprop.0+0x1d0/0x260 ip6_dst_lookup_flow+0x47/0xa0 udp_tunnel6_dst_lookup+0x158/0x210 vxlan_xmit_one+0x4c2/0x1550 [vxlan] vxlan_xmit+0x52d/0x14f0 [vxlan] dev_hard_start_xmit+0x7b/0x1e0 __dev_queue_xmit+0x20b/0xe40 ip6_finish_output2+0x2ea/0x6e0 ip6_finish_output+0x143/0x320 ip6_output+0x74/0x140
[ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at netdev_get_by_index+0xc0/0xe0 fib6_nh_init+0x1a9/0xa90 rtm_new_nexthop+0x6fa/0x1580 rtnetlink_rcv_msg+0x155/0x3e0 netlink_rcv_skb+0x61/0x110 rtnetlink_rcv+0x19/0x20 netlink_unicast+0x23f/0x380 netlink_sendmsg+0x1fc/0x430 ____sys_sendmsg+0x2ef/0x320 ___sys_sendmsg+0x86/0xd0 __sys_sendmsg+0x67/0xc0 __x64_sys_sendmsg+0x21/0x30 x64_sys_call+0x252/0x2030 do_syscall_64+0x6c/0x190 entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at ipv6_add_dev+0x136/0x530 addrconf_notify+0x19d/0x770 notifier_call_chain+0x65/0xd0 raw_notifier_call_chain+0x1a/0x20 call_netdevice_notifiers_info+0x54/0x90 register_netdevice+0x61e/0x790 veth_newlink+0x230/0x440 __rtnl_newlink+0x7d2/0xaa0 rtnl_newlink+0x4c/0x70 rtnetlink_rcv_msg+0x155/0x3e0 netlink_rcv_skb+0x61/0x110 rtnetlink_rcv+0x19/0x20 netlink_unicast+0x23f/0x380 netlink_sendmsg+0x1fc/0x430 ____sys_sendmsg+0x2ef/0x320 ___sys_sendmsg+0x86/0xd0 .... [ 1079.316024] ? show_regs+0x68/0x80 [ 1079.316087] ? __warn+0x8c/0x140 [ 1079.316103] ? ref_tracker_free+0x1a0/0x270 [ 1079.316117] ? report_bug+0x196/0x1c0 [ 1079.316135] ? handle_bug+0x42/0x80 [ 1079.316149] ? exc_invalid_op+0x1c/0x70 [ 1079.316162] ? asm_exc_invalid_op+0x1f/0x30 [ 1079.316193] ? ref_tracker_free+0x1a0/0x270 [ 1079.316208] ? _raw_spin_unlock+0x1a/0x40 [ 1079.316222] ? free_unref_page+0x126/0x1a0 [ 1079.316239] ? destroy_large_folio+0x69/0x90 [ 1079.316251] ? __folio_put+0x99/0xd0 [ 1079.316276] dst_dev_put+0x69/0xd0 [ 1079.316308] fib6_nh_release_dsts.part.0+0x3d/0x80 [ 1079.316327] fib6_nh_release+0x45/0x70 [ 1079.316340] nexthop_free_rcu+0x131/0x170 [ 1079.316356] rcu_do_batch+0x1ee/0x820 [ 1079.316370] ? rcu_do_batch+0x179/0x820 [ 1079.316388] rcu_core+0x1aa/0x4d0 [ 1079.316405] rcu_core_si+0x12/0x20 [ 1079.316417] __do_softirq+0x13a/0x3dc [ 1079.316435] __irq_exit_rcu+0xa3/0x110 [ 1079.316449] irq_exit_rcu+0x12/0x30 [ 1079.316462] sysvec_apic_timer_interrupt+0x5b/0xe0 [ 1079.316474] asm_sysvec_apic_timer_interrupt+0x1f/0x30 [ 1079.316569] RIP: 0033:0x7f06b65c63f0
Signed-off-by: Eric Dumazet <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Paul E. McKenney <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
195374d8 |
| 17-Oct-2023 |
Eric Dumazet <[email protected]> |
ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
syzbot reported a data-race while accessing nh->nh_saddr_genid [1]
Add annotations, but leave the code lazy as intended.
[1] BUG: KCSAN: data-race in fib_select_path / fib_select_path
write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1: fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline] fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline] fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0: fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline] fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
value changed: 0x959d3217 -> 0x959d3218
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker
Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3 |
|
| #
4b2b6060 |
| 22-Sep-2023 |
Hangbin Liu <[email protected]> |
ipv4/fib: send notify when delete source address routes
After deleting an interface address in fib_del_ifaddr(), the function scans the fib_info list for stray entries and calls fib_flush() and fib_table_flush(). Then the stray entries will be deleted silently and no RTM_DELROUTE notification will be sent.
This lack of notification can make routing daemons, or monitor like `ip monitor route` miss the routing changes. e.g.
 + ip link add dummy1 type dummy
 + ip link add dummy2 type dummy
 + ip link set dummy1 up
 + ip link set dummy2 up
 + ip addr add 192.168.5.5/24 dev dummy1
 + ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5
 + ip -4 route
 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
 + ip monitor route
 + ip addr del 192.168.5.5/24 dev dummy1
 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
As Ido reminded, fib_table_flush() isn't only called when an address is deleted, but also when an interface is deleted or put down. The lack of notification in these cases is deliberate. And commit 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down") introduced a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send the route delete notify blindly in fib_table_flush().
To fix this issue, let's add a new flag in "struct fib_info" to track the deleted prefer source address routes, and only send notify for them.
After update:
 + ip monitor route
 + ip addr del 192.168.5.5/24 dev dummy1
 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
 Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
Suggested-by: Thomas Haller <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.6-rc2, v6.6-rc1 |
|
| #
fce92af1 |
| 30-Aug-2023 |
Eric Dumazet <[email protected]> |
ipv4: annotate data-races around fi->fib_dead
syzbot complained about a data-race in fib_table_lookup() [1]
Add appropriate annotations to document it.
[1] BUG: KCSAN: data-race in fib_release_info / fib_table_lookup
write to 0xffff888150f31744 of 1 bytes by task 1189 on cpu 0: fib_release_info+0x3a0/0x460 net/ipv4/fib_semantics.c:281 fib_table_delete+0x8d2/0x900 net/ipv4/fib_trie.c:1777 fib_magic+0x1c1/0x1f0 net/ipv4/fib_frontend.c:1106 fib_del_ifaddr+0x8cf/0xa60 net/ipv4/fib_frontend.c:1317 fib_inetaddr_event+0x77/0x200 net/ipv4/fib_frontend.c:1448 notifier_call_chain kernel/notifier.c:93 [inline] blocking_notifier_call_chain+0x90/0x200 kernel/notifier.c:388 __inet_del_ifa+0x4df/0x800 net/ipv4/devinet.c:432 inet_del_ifa net/ipv4/devinet.c:469 [inline] inetdev_destroy net/ipv4/devinet.c:322 [inline] inetdev_event+0x553/0xaf0 net/ipv4/devinet.c:1606 notifier_call_chain kernel/notifier.c:93 [inline] raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461 call_netdevice_notifiers_info net/core/dev.c:1962 [inline] call_netdevice_notifiers_mtu+0xd2/0x130 net/core/dev.c:2037 dev_set_mtu_ext+0x30b/0x3e0 net/core/dev.c:8673 do_setlink+0x5be/0x2430 net/core/rtnetlink.c:2837 rtnl_setlink+0x255/0x300 net/core/rtnetlink.c:3177 rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6445 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2549 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6463 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] sock_write_iter+0x1aa/0x230 net/socket.c:1129 do_iter_write+0x4b4/0x7b0 fs/read_write.c:860 vfs_writev+0x1a8/0x320 fs/read_write.c:933 do_writev+0xf8/0x220 fs/read_write.c:976 __do_sys_writev fs/read_write.c:1049 [inline] __se_sys_writev fs/read_write.c:1046 [inline] __x64_sys_writev+0x45/0x50 fs/read_write.c:1046 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888150f31744 of 1 bytes by task 21839 on cpu 1: fib_table_lookup+0x2bf/0xd50 net/ipv4/fib_trie.c:1585 fib_lookup include/net/ip_fib.h:383 [inline] ip_route_output_key_hash_rcu+0x38c/0x12c0 net/ipv4/route.c:2751 ip_route_output_key_hash net/ipv4/route.c:2641 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2869 send4+0x1e7/0x500 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work+0x434/0x860 kernel/workqueue.c:2600 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2751 kthread+0x1d7/0x210 kernel/kthread.c:389 ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 21839 Comm: kworker/u4:18 Tainted: G W 6.5.0-syzkaller #0
Fixes: dccd9ecc3744 ("ipv4: Do not use dead fib_info entries.") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4 |
|
| #
09eed119 |
| 21-Mar-2023 |
Eric Dumazet <[email protected]> |
neighbour: switch to standard rcu, instead of rcu_bh
rcu_bh is no longer a win, especially for objects freed with standard call_rcu().
Switch neighbour code to no longer disable BH when not necessary.
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc3 |
|
| #
b071af52 |
| 13-Mar-2023 |
Eric Dumazet <[email protected]> |
neighbour: annotate lockless accesses to n->nud_state
We have many lockless accesses to n->nud_state.
Before adding another one in the following patch, add annotations to readers and writers.
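The annotation pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: `READ_ONCE()` and `WRITE_ONCE()` below are simplified stand-ins for the kernel macros of the same name, and `struct neigh_sketch`, `NUD_SKETCH_REACHABLE`, and both helper functions are hypothetical names that do not come from the patch.

```c
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * Each forces exactly one volatile access, so the compiler cannot tear,
 * fuse, or repeat the load/store. The real kernel macros do more checking.
 * __typeof__ is a GCC/Clang extension.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, trimmed-down neighbour entry (not the kernel's struct). */
struct neigh_sketch {
	uint8_t nud_state;
};

/* Illustrative state bit for this sketch only. */
#define NUD_SKETCH_REACHABLE 0x02

/* Writer side: annotated store to the state field. */
static void neigh_sketch_set_state(struct neigh_sketch *n, uint8_t state)
{
	WRITE_ONCE(n->nud_state, state);
}

/* Lockless reader side: annotated load, then test the bit. */
static int neigh_sketch_is_reachable(struct neigh_sketch *n)
{
	return (READ_ONCE(n->nud_state) & NUD_SKETCH_REACHABLE) != 0;
}
```

The annotations do not add ordering or locking; they document that the access is intentionally lockless and keep the compiler from transforming it, which is also what data-race detectors such as KCSAN look for.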
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5 |
|
| #
5e9398a2 |
| 20-Jan-2023 |
Eric Dumazet <[email protected]> |
ipv4: prevent potential spectre v1 gadget in fib_metrics_match()
    if (!type)
        continue;
    if (type > RTAX_MAX)
        return false;
    ...
    fi_val = fi->fib_metrics->metrics[type - 1];
@type being used as an array index, we need to prevent cpu speculation or risk leaking kernel memory content.
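A userspace sketch of the mitigation idea follows. The kernel provides `array_index_nospec()` in `<linux/nospec.h>` for this purpose; the code below is not that macro and not the patch itself. `index_nospec_sketch()`, `metric_lookup_sketch()`, and `RTAX_MAX_SKETCH` are hypothetical names, and the mask computation assumes that right-shifting a negative `long` is an arithmetic shift, which holds for GCC and Clang but is implementation-defined in C.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bound for this sketch; not taken from kernel headers. */
#define RTAX_MAX_SKETCH 17

/*
 * Branch-free clamp: returns index when index < size, otherwise 0.
 * Because no conditional branch decides the result, a mispredicted
 * bounds check cannot steer a speculative load out of range.
 * Assumes size <= LONG_MAX and arithmetic right shift of signed long.
 */
static unsigned long index_nospec_sketch(unsigned long index,
					 unsigned long size)
{
	long mask = ~(long)(index | (size - 1UL - index)) >>
		    (sizeof(long) * CHAR_BIT - 1);

	return index & (unsigned long)mask;
}

/* Look up metric number `type` (1-based), as in the snippet above. */
static bool metric_lookup_sketch(const uint32_t *metrics,
				 unsigned long type, uint32_t *val)
{
	unsigned long idx;

	if (!type || type > RTAX_MAX_SKETCH)
		return false;

	/* Clamp the zero-based index after the architectural check. */
	idx = index_nospec_sketch(type - 1, RTAX_MAX_SKETCH);
	*val = metrics[idx];
	return true;
}
```

The ordinary bounds check still rejects bad input on the architectural path; the clamp only matters when the CPU speculates past that check.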
Fixes: 5f9ae3d9e7e4 ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8 |
|
| #
f96a3d74 |
| 04-Dec-2022 |
Ido Schimmel <[email protected]> |
ipv4: Fix incorrect route flushing when source address is deleted
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
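The effect of including the table ID in the comparison can be sketched as below. This is a simplified model, not the kernel code: `struct fib_info_sketch` and `fib_info_same_sketch()` are hypothetical, and the field set is a small assumed subset of what a real FIB info lookup compares.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a FIB info structure. */
struct fib_info_sketch {
	uint32_t tb_id;     /* routing table the info belongs to (e.g. a VRF table) */
	uint32_t prefsrc;   /* preferred source address */
	uint32_t priority;  /* route metric */
	uint8_t  protocol;  /* routing protocol that installed the route */
};

/*
 * Two infos may be consolidated only when every compared field matches.
 * Comparing tb_id is the point of the fix: without it, otherwise identical
 * infos from different tables would be treated as one shared object.
 */
static bool fib_info_same_sketch(const struct fib_info_sketch *a,
				 const struct fib_info_sketch *b)
{
	return a->tb_id == b->tb_id &&
	       a->prefsrc == b->prefsrc &&
	       a->priority == b->priority &&
	       a->protocol == b->protocol;
}
```

With the table ID compared, a route in the default table and an identical route in a VRF table keep separate infos, so deleting an address in one VRF no longer marks the other VRF's route for flushing.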
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
    Regular FIB info
    TEST: Route removed from VRF when source address deleted [ OK ]
    TEST: Route in default VRF not removed [ OK ]
    TEST: Route removed in default VRF when source address deleted [ OK ]
    TEST: Route in VRF is not removed by address delete [ OK ]
    Identical FIB info with different table ID
    TEST: Route removed from VRF when source address deleted [FAIL]
    TEST: Route in default VRF not removed [ OK ]
RTNETLINK answers: File exists
    TEST: Route removed in default VRF when source address deleted [ OK ]
    TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6
Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
    Regular FIB info
    TEST: Route removed from VRF when source address deleted [ OK ]
    TEST: Route in default VRF not removed [ OK ]
    TEST: Route removed in default VRF when source address deleted [ OK ]
    TEST: Route in VRF is not removed by address delete [ OK ]
    Identical FIB info with different table ID
    TEST: Route removed from VRF when source address deleted [ OK ]
    TEST: Route in default VRF not removed [ OK ]
    TEST: Route removed in default VRF when source address deleted [ OK ]
    TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8
Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.1-rc7 |
|
| #
d5082d38 |
| 24-Nov-2022 |
Ido Schimmel <[email protected]> |
ipv4: Fix route deletion when nexthop info is not specified
When the kernel receives a route deletion request from user space it tries to delete a route that matches the route attributes specified in the request.
If only prefix information is specified in the request, the kernel should delete the first matching FIB alias regardless of its associated FIB info. However, an error is currently returned when the FIB info is backed by a nexthop object:
# ip nexthop add id 1 via 192.0.2.2 dev dummy10
# ip route add 198.51.100.0/24 nhid 1
# ip route del 198.51.100.0/24
RTNETLINK answers: No such process
Fix by matching on such a FIB info when legacy nexthop attributes are not specified in the request. An earlier check already covers the case where a nexthop ID is specified in the request.
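The matching rule described above can be modelled roughly as follows. This is a sketch under assumptions, not the kernel's matching function: `struct del_request_sketch`, `struct route_sketch`, and `del_matches_sketch()` are hypothetical, "legacy nexthop attributes" is collapsed into a single flag, and the comparison of legacy attributes against a legacy route is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a deletion request from user space. */
struct del_request_sketch {
	uint32_t nh_id;       /* nexthop object ID, 0 when not specified */
	bool has_legacy_nh;   /* any legacy nexthop attribute given (gateway, device, multipath) */
};

/* Hypothetical view of an installed route's FIB info. */
struct route_sketch {
	uint32_t nh_id;       /* backing nexthop object ID, 0 for a legacy route */
};

static bool del_matches_sketch(const struct del_request_sketch *req,
			       const struct route_sketch *rt)
{
	/* Earlier check: a nexthop ID in the request must match exactly. */
	if (req->nh_id)
		return rt->nh_id == req->nh_id;

	/*
	 * Route backed by a nexthop object: match when the request names
	 * no legacy nexthop attributes (the prefix-only case from the fix).
	 */
	if (rt->nh_id)
		return !req->has_legacy_nh;

	/*
	 * Legacy route: the real code compares the legacy attributes here.
	 * This sketch omits that comparison and simply reports a match.
	 */
	return true;
}
```

In this model, the `ip route del 198.51.100.0/24` request from the example carries neither a nexthop ID nor legacy attributes, so it matches the route installed with `nhid 1`.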
Add tests that cover these flows. Before the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [FAIL]
Tests passed: 11
Tests failed: 1
After the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [ OK ]
Tests passed: 12
Tests failed: 0
No regressions in other tests:
# ./fib_nexthops.sh
...
Tests passed: 228
Tests failed: 0
# ./fib_tests.sh
...
Tests passed: 186
Tests failed: 0
Cc: [email protected]
Reported-by: Jonas Gorski <[email protected]>
Tested-by: Jonas Gorski <[email protected]>
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Fixes: 6bf92d70e690 ("net: ipv4: fix route with nexthop object delete warning")
Fixes: 61b91eb33a69 ("ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|