History log of /linux-6.15/net/ipv4/fib_semantics.c (Results 1 – 25 of 299)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5
# 9f7f3ebe 28-Feb-2025 Kuniyuki Iwashima <[email protected]>

ipv4: fib: Namespacify fib_info hash tables.

We will convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL.
Then, we need to have per-netns hash tables for struct fib_info.

Let's allocate the has

ipv4: fib: Namespacify fib_info hash tables.

We will convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL.
Then, we need to have per-netns hash tables for struct fib_info.

Let's allocate the hash tables per netns.

fib_info_hash, fib_info_hash_bits, and fib_info_cnt are now moved
to struct netns_ipv4 and accessed with net->ipv4.fib_XXX.

Also, the netns checks are removed from fib_find_info_nh() and
fib_find_info().

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# b79bcaf7 28-Feb-2025 Kuniyuki Iwashima <[email protected]>

ipv4: fib: Add fib_info_hash_grow().

When the number of struct fib_info exceeds the hash table size in
fib_create_info(), we try to allocate a new hash table with the
doubled size.

The allocation i

ipv4: fib: Add fib_info_hash_grow().

When the number of struct fib_info exceeds the hash table size in
fib_create_info(), we try to allocate a new hash table with the
doubled size.

The allocation is done in fib_create_info(), and if successful, each
struct fib_info is moved to the new hash table by fib_info_hash_move().

Let's integrate the allocation and fib_info_hash_move() as
fib_info_hash_grow() to make the following change cleaner.

While at it, fib_info_hash_grow() is placed near other hash-table-specific
functions.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# d6306b9d 28-Feb-2025 Kuniyuki Iwashima <[email protected]>

ipv4: fib: Remove fib_info_hash_size.

We will allocate the fib_info hash tables per netns.

There are 5 global variables for fib_info hash tables:
fib_info_hash, fib_info_laddrhash, fib_info_hash_si

ipv4: fib: Remove fib_info_hash_size.

We will allocate the fib_info hash tables per netns.

There are 5 global variables for fib_info hash tables:
fib_info_hash, fib_info_laddrhash, fib_info_hash_size,
fib_info_hash_bits, fib_info_cnt.

However, fib_info_laddrhash and fib_info_hash_size can be
easily calculated from fib_info_hash and fib_info_hash_bits.

Let's remove fib_info_hash_size and use (1 << fib_info_hash_bits)
instead.

Now we need not pass the new hash table size to fib_info_hash_move().

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# 0dbca8c2 28-Feb-2025 Kuniyuki Iwashima <[email protected]>

ipv4: fib: Remove fib_info_laddrhash pointer.

We will allocate the fib_info hash tables per netns.

There are 5 global variables for fib_info hash tables:
fib_info_hash, fib_info_laddrhash, fib_info

ipv4: fib: Remove fib_info_laddrhash pointer.

We will allocate the fib_info hash tables per netns.

There are 5 global variables for fib_info hash tables:
fib_info_hash, fib_info_laddrhash, fib_info_hash_size,
fib_info_hash_bits, fib_info_cnt.

However, fib_info_laddrhash and fib_info_hash_size can be
easily calculated from fib_info_hash and fib_info_hash_bits.

Let's remove the fib_info_laddrhash pointer and instead use
fib_info_hash + (1 << fib_info_hash_bits).

While at it, fib_info_laddrhash_bucket() is moved near other
hash-table-specific functions.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# 84c75e94 28-Feb-2025 Kuniyuki Iwashima <[email protected]>

ipv4: fib: Make fib_info_hashfn() return struct hlist_head.

Every time fib_info_hashfn() returns a hash value, we fetch
&fib_info_hash[hash].

Let's return the hlist_head pointer from fib_info_hashf

ipv4: fib: Make fib_info_hashfn() return struct hlist_head.

Every time fib_info_hashfn() returns a hash value, we fetch
&fib_info_hash[hash].

Let's return the hlist_head pointer from fib_info_hashfn() and
rename it to fib_info_hash_bucket() to match a similar function,
fib_info_laddrhash_bucket().

Note that we need to move the fib_info_hash assignment earlier in
fib_info_hash_move() to use fib_info_hash_bucket() in the for loop.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# cfc47029 28-Feb-2025 Kuniyuki Iwashima <[email protected]>

ipv4: fib: Allocate fib_info_hash[] during netns initialisation.

We will allocate fib_info_hash[] and fib_info_laddrhash[] for each netns.

Currently, fib_info_hash[] is allocated when the first rou

ipv4: fib: Allocate fib_info_hash[] during netns initialisation.

We will allocate fib_info_hash[] and fib_info_laddrhash[] for each netns.

Currently, fib_info_hash[] is allocated when the first route is added.

Let's move the first allocation to a new __net_init function.

Note that we must call fib4_semantics_exit() in fib_net_exit_batch()
because ->exit() is called earlier than ->exit_batch().

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# fa336adc 28-Feb-2025 Kuniyuki Iwashima <[email protected]>

ipv4: fib: Allocate fib_info_hash[] and fib_info_laddrhash[] by kvcalloc().

Both fib_info_hash[] and fib_info_laddrhash[] are hash tables for
struct fib_info and are allocated by kvzmalloc() separat

ipv4: fib: Allocate fib_info_hash[] and fib_info_laddrhash[] by kvcalloc().

Both fib_info_hash[] and fib_info_laddrhash[] are hash tables for
struct fib_info and are allocated by kvzmalloc() separately.

Let's replace the two kvzmalloc() calls with kvcalloc() to remove
the fib_info_laddrhash pointer later.

Note that fib_info_hash_alloc() allocates a new hash table based on
fib_info_hash_bits because we will remove fib_info_hash_size later.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2
# a3f5f4c2 04-Oct-2024 Eric Dumazet <[email protected]>

ipv4: remove fib_info_devhash[]

Upcoming per-netns RTNL conversion needs to get rid
of shared hash tables.

fib_info_devhash[] is one of them.

It is unclear why we used a hash table, because
a sing

ipv4: remove fib_info_devhash[]

Upcoming per-netns RTNL conversion needs to get rid
of shared hash tables.

fib_info_devhash[] is one of them.

It is unclear why we used a hash table, because
a single hlist_head per net device was cheaper and scalable.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# 143ca845 04-Oct-2024 Eric Dumazet <[email protected]>

ipv4: remove fib_info_lock

After the prior patch, fib_info_lock became redundant
because all of its users are holding RTNL.

BH protection is not needed.

Remove the READ_ONCE()/WRITE_ONCE() annotat

ipv4: remove fib_info_lock

After the prior patch, fib_info_lock became redundant
because all of its users are holding RTNL.

BH protection is not needed.

Remove the READ_ONCE()/WRITE_ONCE() annotations around fib_info_cnt,
since it is protected by RTNL.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# fc38b283 04-Oct-2024 Eric Dumazet <[email protected]>

ipv4: use rcu in ip_fib_check_default()

fib_info_devhash[] is not resized in fib_info_hash_move().

fib_nh structs are already freed after an rcu grace period.

This will allow to remove fib_info_lo

ipv4: use rcu in ip_fib_check_default()

fib_info_devhash[] is not resized in fib_info_hash_move().

fib_nh structs are already freed after an rcu grace period.

This will allow to remove fib_info_lock in the following patch.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# 8a0f62fd 04-Oct-2024 Eric Dumazet <[email protected]>

ipv4: remove fib_devindex_hashfn()

fib_devindex_hashfn() converts a 32bit ifindex value to a 8bit hash.

It makes no sense doing this from fib_info_hashfn() and
fib_find_info_nh().

It is better to

ipv4: remove fib_devindex_hashfn()

fib_devindex_hashfn() converts a 32bit ifindex value to a 8bit hash.

It makes no sense doing this from fib_info_hashfn() and
fib_find_info_nh().

It is better to keep as many bits as possible to let
fib_info_hashfn_result() have better spread.

Only fib_info_devhash_bucket() needs to make this operation,
we can 'inline' trivial fib_devindex_hashfn() in it.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


# 9b8ca048 01-Oct-2024 Alexandre Ferrieux <[email protected]>

ipv4: avoid quadratic behavior in FIB insertion of common address

Mix netns into all IPv4 FIB hashes to avoid massive collision when
inserting the same address in many netns.

Signed-off-by: Alexand

ipv4: avoid quadratic behavior in FIB insertion of common address

Mix netns into all IPv4 FIB hashes to avoid massive collision when
inserting the same address in many netns.

Signed-off-by: Alexandre Ferrieux <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# 4c180887 22-Aug-2024 Li Zetao <[email protected]>

ipv4: delete redundant judgment statements

The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no n

ipv4: delete redundant judgment statements

The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.

Signed-off-by: Li Zetao <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.11-rc4
# 1fa3314c 14-Aug-2024 Ido Schimmel <[email protected]>

ipv4: Centralize TOS matching

The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized

ipv4: Centralize TOS matching

The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.

What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).

Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:

# ip -4 rule add tos 0x1c table 100
# ip -4 rule add tos 0x3c table 100
Error: Invalid tos.

While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.

Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.

A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.

However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.

This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.

Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.

Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.

Include <linux/ip.h> in <linux/in_route.h> since the former defines
IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
<linux/in_route.h>.

No regressions in FIB tests:

# ./fib_tests.sh
[...]
Tests passed: 218
Tests failed: 0

And FIB rule tests:

# ./fib_rule_tests.sh
[...]
Tests passed: 116
Tests failed: 0

Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

show more ...


Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10
# 68073523 10-Jul-2024 Nicolas Dichtel <[email protected]>

ipv4: fix source address selection with route leak

By default, an address assigned to the output interface is selected when
the source address is not specified. This is problematic when a route,
con

ipv4: fix source address selection with route leak

By default, an address assigned to the output interface is selected when
the source address is not specified. This is problematic when a route,
configured in a vrf, uses an interface from another vrf (aka route leak).
The original vrf does not own the selected source address.

Let's add a check against the output interface and call the appropriate
function to select the source address.

CC: [email protected]
Fixes: 8cbb512c923d ("net: Add source address lookup op for VRF")
Signed-off-by: Nicolas Dichtel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 61e2bbaf 31-May-2024 Jason Xing <[email protected]>

net: remove NULL-pointer net parameter in ip_metrics_convert

When I was doing some experiments, I found that when using the first
parameter, namely, struct net, in ip_metrics_convert() always trigge

net: remove NULL-pointer net parameter in ip_metrics_convert

When I was doing some experiments, I found that when using the first
parameter, namely, struct net, in ip_metrics_convert() always triggers NULL
pointer crash. Then I digged into this part, realizing that we can remove
this one due to its uselessness.

Signed-off-by: Jason Xing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

show more ...


Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6
# c4e86b43 23-Apr-2024 Eric Dumazet <[email protected]>

net: add two more call_rcu_hurry()

I had failures with pmtu.sh selftests lately,
with netns dismantles firing ref_tracking alerts [1].

After much debugging, I found that some queued
rcu callbacks w

net: add two more call_rcu_hurry()

I had failures with pmtu.sh selftests lately,
with netns dismantles firing ref_tracking alerts [1].

After much debugging, I found that some queued
rcu callbacks were delayed by minutes, because
of CONFIG_RCU_LAZY=y option.

Joel Fernandes had a similar issue in the past,
fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry()
for dst_release()")

In this commit, I make sure nexthop_free_rcu()
and free_fib_info_rcu() are not delayed too much
because they both can release device references.

tools/testing/selftests/net/pmtu.sh no longer fails.

Traces were:

[ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 3/5 users at
dst_alloc+0x76/0x160
ip6_dst_alloc+0x25/0x80
ip6_pol_route+0x2a8/0x450
ip6_pol_route_output+0x1f/0x30
fib6_rule_lookup+0x163/0x270
ip6_route_output_flags+0xda/0x190
ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
ip6_dst_lookup_flow+0x47/0xa0
udp_tunnel6_dst_lookup+0x158/0x210
vxlan_xmit_one+0x4c2/0x1550 [vxlan]
vxlan_xmit+0x52d/0x14f0 [vxlan]
dev_hard_start_xmit+0x7b/0x1e0
__dev_queue_xmit+0x20b/0xe40
ip6_finish_output2+0x2ea/0x6e0
ip6_finish_output+0x143/0x320
ip6_output+0x74/0x140

[ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at
netdev_get_by_index+0xc0/0xe0
fib6_nh_init+0x1a9/0xa90
rtm_new_nexthop+0x6fa/0x1580
rtnetlink_rcv_msg+0x155/0x3e0
netlink_rcv_skb+0x61/0x110
rtnetlink_rcv+0x19/0x20
netlink_unicast+0x23f/0x380
netlink_sendmsg+0x1fc/0x430
____sys_sendmsg+0x2ef/0x320
___sys_sendmsg+0x86/0xd0
__sys_sendmsg+0x67/0xc0
__x64_sys_sendmsg+0x21/0x30
x64_sys_call+0x252/0x2030
do_syscall_64+0x6c/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e

[ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at
ipv6_add_dev+0x136/0x530
addrconf_notify+0x19d/0x770
notifier_call_chain+0x65/0xd0
raw_notifier_call_chain+0x1a/0x20
call_netdevice_notifiers_info+0x54/0x90
register_netdevice+0x61e/0x790
veth_newlink+0x230/0x440
__rtnl_newlink+0x7d2/0xaa0
rtnl_newlink+0x4c/0x70
rtnetlink_rcv_msg+0x155/0x3e0
netlink_rcv_skb+0x61/0x110
rtnetlink_rcv+0x19/0x20
netlink_unicast+0x23f/0x380
netlink_sendmsg+0x1fc/0x430
____sys_sendmsg+0x2ef/0x320
___sys_sendmsg+0x86/0xd0
....
[ 1079.316024] ? show_regs+0x68/0x80
[ 1079.316087] ? __warn+0x8c/0x140
[ 1079.316103] ? ref_tracker_free+0x1a0/0x270
[ 1079.316117] ? report_bug+0x196/0x1c0
[ 1079.316135] ? handle_bug+0x42/0x80
[ 1079.316149] ? exc_invalid_op+0x1c/0x70
[ 1079.316162] ? asm_exc_invalid_op+0x1f/0x30
[ 1079.316193] ? ref_tracker_free+0x1a0/0x270
[ 1079.316208] ? _raw_spin_unlock+0x1a/0x40
[ 1079.316222] ? free_unref_page+0x126/0x1a0
[ 1079.316239] ? destroy_large_folio+0x69/0x90
[ 1079.316251] ? __folio_put+0x99/0xd0
[ 1079.316276] dst_dev_put+0x69/0xd0
[ 1079.316308] fib6_nh_release_dsts.part.0+0x3d/0x80
[ 1079.316327] fib6_nh_release+0x45/0x70
[ 1079.316340] nexthop_free_rcu+0x131/0x170
[ 1079.316356] rcu_do_batch+0x1ee/0x820
[ 1079.316370] ? rcu_do_batch+0x179/0x820
[ 1079.316388] rcu_core+0x1aa/0x4d0
[ 1079.316405] rcu_core_si+0x12/0x20
[ 1079.316417] __do_softirq+0x13a/0x3dc
[ 1079.316435] __irq_exit_rcu+0xa3/0x110
[ 1079.316449] irq_exit_rcu+0x12/0x30
[ 1079.316462] sysvec_apic_timer_interrupt+0x5b/0xe0
[ 1079.316474] asm_sysvec_apic_timer_interrupt+0x1f/0x30
[ 1079.316569] RIP: 0033:0x7f06b65c63f0

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7
# 195374d8 17-Oct-2023 Eric Dumazet <[email protected]>

ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr

syzbot reported a data-race while accessing nh->nh_saddr_genid [1]

Add annotations, but leave the code lazy as intended.

[1]
BU

ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr

syzbot reported a data-race while accessing nh->nh_saddr_genid [1]

Add annotations, but leave the code lazy as intended.

[1]
BUG: KCSAN: data-race in fib_select_path / fib_select_path

write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1:
fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline]
fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline]
fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269
ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0:
fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline]
fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269
ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x959d3217 -> 0x959d3218

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker

Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3
# 4b2b6060 22-Sep-2023 Hangbin Liu <[email protected]>

ipv4/fib: send notify when delete source address routes

After deleting an interface address in fib_del_ifaddr(), the function
scans the fib_info list for stray entries and calls fib_flush() and
fib_

ipv4/fib: send notify when delete source address routes

After deleting an interface address in fib_del_ifaddr(), the function
scans the fib_info list for stray entries and calls fib_flush() and
fib_table_flush(). Then the stray entries will be deleted silently and no
RTM_DELROUTE notification will be sent.

This lack of notification can make routing daemons, or monitor like
`ip monitor route` miss the routing changes. e.g.

+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 192.168.5.5/24 dev dummy1
+ ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5
+ ip -4 route
7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5

As Ido reminded, fib_table_flush() isn't only called when an address is
deleted, but also when an interface is deleted or put down. The lack of
notification in these cases is deliberate. And commit 7c6bb7d2faaf
("net/ipv6: Add knob to skip DELROUTE message on device down") introduced
a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send
the route delete notify blindly in fib_table_flush().

To fix this issue, let's add a new flag in "struct fib_info" to track the
deleted prefer source address routes, and only send notify for them.

After update:
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5

Suggested-by: Thomas Haller <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

show more ...


Revision tags: v6.6-rc2, v6.6-rc1
# fce92af1 30-Aug-2023 Eric Dumazet <[email protected]>

ipv4: annotate data-races around fi->fib_dead

syzbot complained about a data-race in fib_table_lookup() [1]

Add appropriate annotations to document it.

[1]
BUG: KCSAN: data-race in fib_release_inf

ipv4: annotate data-races around fi->fib_dead

syzbot complained about a data-race in fib_table_lookup() [1]

Add appropriate annotations to document it.

[1]
BUG: KCSAN: data-race in fib_release_info / fib_table_lookup

write to 0xffff888150f31744 of 1 bytes by task 1189 on cpu 0:
fib_release_info+0x3a0/0x460 net/ipv4/fib_semantics.c:281
fib_table_delete+0x8d2/0x900 net/ipv4/fib_trie.c:1777
fib_magic+0x1c1/0x1f0 net/ipv4/fib_frontend.c:1106
fib_del_ifaddr+0x8cf/0xa60 net/ipv4/fib_frontend.c:1317
fib_inetaddr_event+0x77/0x200 net/ipv4/fib_frontend.c:1448
notifier_call_chain kernel/notifier.c:93 [inline]
blocking_notifier_call_chain+0x90/0x200 kernel/notifier.c:388
__inet_del_ifa+0x4df/0x800 net/ipv4/devinet.c:432
inet_del_ifa net/ipv4/devinet.c:469 [inline]
inetdev_destroy net/ipv4/devinet.c:322 [inline]
inetdev_event+0x553/0xaf0 net/ipv4/devinet.c:1606
notifier_call_chain kernel/notifier.c:93 [inline]
raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461
call_netdevice_notifiers_info net/core/dev.c:1962 [inline]
call_netdevice_notifiers_mtu+0xd2/0x130 net/core/dev.c:2037
dev_set_mtu_ext+0x30b/0x3e0 net/core/dev.c:8673
do_setlink+0x5be/0x2430 net/core/rtnetlink.c:2837
rtnl_setlink+0x255/0x300 net/core/rtnetlink.c:3177
rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6445
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2549
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6463
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
sock_write_iter+0x1aa/0x230 net/socket.c:1129
do_iter_write+0x4b4/0x7b0 fs/read_write.c:860
vfs_writev+0x1a8/0x320 fs/read_write.c:933
do_writev+0xf8/0x220 fs/read_write.c:976
__do_sys_writev fs/read_write.c:1049 [inline]
__se_sys_writev fs/read_write.c:1046 [inline]
__x64_sys_writev+0x45/0x50 fs/read_write.c:1046
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff888150f31744 of 1 bytes by task 21839 on cpu 1:
fib_table_lookup+0x2bf/0xd50 net/ipv4/fib_trie.c:1585
fib_lookup include/net/ip_fib.h:383 [inline]
ip_route_output_key_hash_rcu+0x38c/0x12c0 net/ipv4/route.c:2751
ip_route_output_key_hash net/ipv4/route.c:2641 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2869
send4+0x1e7/0x500 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work+0x434/0x860 kernel/workqueue.c:2600
worker_thread+0x5f2/0xa10 kernel/workqueue.c:2751
kthread+0x1d7/0x210 kernel/kthread.c:389
ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x00 -> 0x01

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 21839 Comm: kworker/u4:18 Tainted: G W 6.5.0-syzkaller #0

Fixes: dccd9ecc3744 ("ipv4: Do not use dead fib_info entries.")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

show more ...


Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4
# 09eed119 21-Mar-2023 Eric Dumazet <[email protected]>

neighbour: switch to standard rcu, instead of rcu_bh

rcu_bh is no longer a win, especially for objects freed
with standard call_rcu().

Switch neighbour code to no longer disable BH when not necessa

neighbour: switch to standard rcu, instead of rcu_bh

rcu_bh is no longer a win, especially for objects freed
with standard call_rcu().

Switch neighbour code to no longer disable BH when not necessary.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.3-rc3
# b071af52 13-Mar-2023 Eric Dumazet <[email protected]>

neighbour: annotate lockless accesses to n->nud_state

We have many lockless accesses to n->nud_state.

Before adding another one in the following patch,
add annotations to readers and writers.

Sign

neighbour: annotate lockless accesses to n->nud_state

We have many lockless accesses to n->nud_state.

Before adding another one in the following patch,
add annotations to readers and writers.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5
# 5e9398a2 20-Jan-2023 Eric Dumazet <[email protected]>

ipv4: prevent potential spectre v1 gadget in fib_metrics_match()

if (!type)
continue;
if (type > RTAX_MAX)
return false;
...
fi_val = fi->fib_metrics->metrics[type - 1];

ipv4: prevent potential spectre v1 gadget in fib_metrics_match()

if (!type)
continue;
if (type > RTAX_MAX)
return false;
...
fi_val = fi->fib_metrics->metrics[type - 1];

@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.

Fixes: 5f9ae3d9e7e4 ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8
# f96a3d74 04-Dec-2022 Ido Schimmel <[email protected]>

ipv4: Fix incorrect route flushing when source address is deleted

Cited commit added the table ID to the FIB info structure, but did not
prevent structures with different table IDs from being consol

ipv4: Fix incorrect route flushing when source address is deleted

Cited commit added the table ID to the FIB info structure, but did not
prevent structures with different table IDs from being consolidated.
This can lead to routes being flushed from a VRF when an address is
deleted from a different VRF.

Fix by taking the table ID into account when looking for a matching FIB
info. This is already done for FIB info structures backed by a nexthop
object in fib_find_info_nh().

Add test cases that fail before the fix:

# ./fib_tests.sh -t ipv4_del_addr

IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [FAIL]
TEST: Route in default VRF not removed [ OK ]
RTNETLINK answers: File exists
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [FAIL]

Tests passed: 6
Tests failed: 2

And pass after:

# ./fib_tests.sh -t ipv4_del_addr

IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]

Tests passed: 8
Tests failed: 0

Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


Revision tags: v6.1-rc7
# d5082d38 24-Nov-2022 Ido Schimmel <[email protected]>

ipv4: Fix route deletion when nexthop info is not specified

When the kernel receives a route deletion request from user space it
tries to delete a route that matches the route attributes specified i

ipv4: Fix route deletion when nexthop info is not specified

When the kernel receives a route deletion request from user space it
tries to delete a route that matches the route attributes specified in
the request.

If only prefix information is specified in the request, the kernel
should delete the first matching FIB alias regardless of its associated
FIB info. However, an error is currently returned when the FIB info is
backed by a nexthop object:

# ip nexthop add id 1 via 192.0.2.2 dev dummy10
# ip route add 198.51.100.0/24 nhid 1
# ip route del 198.51.100.0/24
RTNETLINK answers: No such process

Fix by matching on such a FIB info when legacy nexthop attributes are
not specified in the request. An earlier check already covers the case
where a nexthop ID is specified in the request.

Add tests that cover these flows. Before the fix:

# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [FAIL]

Tests passed: 11
Tests failed: 1

After the fix:

# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [ OK ]

Tests passed: 12
Tests failed: 0

No regressions in other tests:

# ./fib_nexthops.sh
...
Tests passed: 228
Tests failed: 0

# ./fib_tests.sh
...
Tests passed: 186
Tests failed: 0

Cc: [email protected]
Reported-by: Jonas Gorski <[email protected]>
Tested-by: Jonas Gorski <[email protected]>
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Fixes: 6bf92d70e690 ("net: ipv4: fix route with nexthop object delete warning")
Fixes: 61b91eb33a69 ("ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

show more ...


12345678910>>...12