|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10 |
|
| #
252442f2 |
| 10-Jul-2024 |
Nicolas Dichtel <[email protected]> |
ipv6: fix source address selection with route leak
By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, con
ipv6: fix source address selection with route leak
By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address.
Let's add a check against the output interface and call the appropriate function to select the source address.
CC: [email protected] Fixes: 0d240e7811c4 ("net: vrf: Implement get_saddr for IPv6") Signed-off-by: Nicolas Dichtel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6 |
|
| #
e8dfd42c |
| 26-Apr-2024 |
Eric Dumazet <[email protected]> |
ipv6: introduce dst_rt6_info() helper
Instead of (struct rt6_info *)dst casts, we can use :
#define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst)
Some places needed
ipv6: introduce dst_rt6_info() helper
Instead of (struct rt6_info *)dst casts, we can use :
#define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst)
Some places needed missing const qualifiers :
ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway()
v2: added missing parts (David Ahern)
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7 |
|
| #
e7135f48 |
| 28-Feb-2024 |
Eric Dumazet <[email protected]> |
ipv6: annotate data-races around cnf.mtu6
idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <[email protected]> Review
ipv6: annotate data-races around cnf.mtu6
idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4 |
|
| #
129e406e |
| 08-Feb-2024 |
Kui-Feng Lee <[email protected]> |
net/ipv6: set expires in rt6_add_dflt_router().
Pass the duration of a lifetime (in seconds) to the function rt6_add_dflt_router() so that it can properly set the expiration time.
The function ndis
net/ipv6: set expires in rt6_add_dflt_router().
Pass the duration of a lifetime (in seconds) to the function rt6_add_dflt_router() so that it can properly set the expiration time.
The function ndisc_router_discovery() is the only one that calls rt6_add_dflt_router(), and it will later set the expiration time for the route created by rt6_add_dflt_router(). However, there is a gap of time between calling rt6_add_dflt_router() and setting the expiration time in ndisc_router_discovery(). During this period, there is a possibility that a new route may be removed from the routing table. By setting the correct expiration time in rt6_add_dflt_router(), we can prevent this from happening. The reason for setting RTF_EXPIRES in rt6_add_dflt_router() is to start the Garbage Collection (GC) timer, as it only activates when a route with RTF_EXPIRES is added to a table.
Suggested-by: David Ahern <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Kui-Feng Lee <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3 |
|
| #
fa17a6d8 |
| 18-Sep-2023 |
Eric Dumazet <[email protected]> |
ipv6: lockless IPV6_ADDR_PREFERENCES implementation
We have data-races while reading np->srcprefs
Switch the field to a plain byte, add READ_ONCE() and WRITE_ONCE() annotations where needed, and IP
ipv6: lockless IPV6_ADDR_PREFERENCES implementation
We have data-races while reading np->srcprefs
Switch the field to a plain byte, add READ_ONCE() and WRITE_ONCE() annotations where needed, and IPV6_ADDR_PREFERENCES setsockopt() can now be lockless.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc2 |
|
| #
6b724bc4 |
| 12-Sep-2023 |
Eric Dumazet <[email protected]> |
ipv6: lockless IPV6_MTU_DISCOVER implementation
Most np->pmtudisc reads are racy.
Move this 3bit field on a full byte, add annotations and make IPV6_MTU_DISCOVER setsockopt() lockless.
Signed-off-
ipv6: lockless IPV6_MTU_DISCOVER implementation
Most np->pmtudisc reads are racy.
Move this 3bit field on a full byte, add annotations and make IPV6_MTU_DISCOVER setsockopt() lockless.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4 |
|
| #
7f6c4039 |
| 26-Jul-2023 |
Hangbin Liu <[email protected]> |
IPv6: add extack info for IPv6 address add/delete
Add extack info for IPv6 address add/delete, which would be useful for users to understand the problem without having to read kernel code.
Suggeste
IPv6: add extack info for IPv6 address add/delete
Add extack info for IPv6 address add/delete, which would be useful for users to understand the problem without having to read kernel code.
Suggested-by: Beniamino Galvani <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4 |
|
| #
d288a162 |
| 23-Mar-2023 |
Wangyang Guo <[email protected]> |
net: dst: Prevent false sharing vs. dst_entry:: __refcnt
dst_entry::__refcnt is highly contended in scenarios where many connections happen from and to the same IP. The reference count is an atomic_
net: dst: Prevent false sharing vs. dst_entry:: __refcnt
dst_entry::__refcnt is highly contended in scenarios where many connections happen from and to the same IP. The reference count is an atomic_t, so the reference count operations have to take the cache-line exclusive.
Aside of the unavoidable reference count contention there is another significant problem which is caused by that: False sharing.
perf top identified two affected read accesses. dst_entry::lwtstate and rtable::rt_genid.
dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into a seperate cacheline vs. the read mostly members located at the beginning of the struct.
That prevents false sharing vs. the struct members in the first 64 bytes of the structure, but there is also
dst_entry::lwtstate
which is located after the reference count and in the same cache line. This member is read after a reference count has been acquired.
struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a size of 112 bytes, which means that the struct members of rtable which follow the dst member share the same cache line as dst_entry::__refcnt. Especially
rtable::rt_genid
is also read by the contexts which have a reference count acquired already.
When dst_entry:__refcnt is incremented or decremented via an atomic operation these read accesses stall. This was found when analysing the memtier benchmark in 1:100 mode, which amplifies the problem extremly.
Move the rt[6i]_uncached[_list] members out of struct rtable and struct rt6_info into struct dst_entry to provide padding and move the lwtstate member after that so it ends up in the same cache line.
The resulting improvement depends on the micro-architecture and the number of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached benchmark.
[ tglx: Rearrange struct ]
Signed-off-by: Wangyang Guo <[email protected]> Signed-off-by: Arjan van de Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6 |
|
| #
90317bcd |
| 23-Jan-2023 |
Guillaume Nault <[email protected]> |
ipv6: Make ip6_route_output_flags_noref() static.
This function is only used in net/ipv6/route.c and has no reason to be visible outside of it.
Signed-off-by: Guillaume Nault <[email protected]> Re
ipv6: Make ip6_route_output_flags_noref() static.
This function is only used in net/ipv6/route.c and has no reason to be visible outside of it.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/50706db7f675e40b3594d62011d9363dce32b92e.1674495822.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8 |
|
| #
949d6b40 |
| 20-Jul-2022 |
Jakub Kicinski <[email protected]> |
net: add missing includes and forward declarations under net/
This patch adds missing includes to headers under include/net. All these problems are currently masked by the existing users including t
net: add missing includes and forward declarations under net/
This patch adds missing includes to headers under include/net. All these problems are currently masked by the existing users including the missing dependency before the broken header.
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2 |
|
| #
8d22679d |
| 19-Nov-2021 |
Eric Dumazet <[email protected]> |
ipv6: ip6_skb_dst_mtu() cleanups
Use const attribute where we can, and cache skb_dst()
Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/20211119022355.2985984-1-eri
ipv6: ip6_skb_dst_mtu() cleanups
Use const attribute where we can, and cache skb_dst()
Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5 |
|
| #
40391467 |
| 03-Aug-2021 |
Antoine Tenart <[email protected]> |
net: ipv6: fix returned variable type in ip6_skb_dst_mtu
The patch fixing the returned value of ip6_skb_dst_mtu (int -> unsigned int) was rebased between its initial review and the version applied.
net: ipv6: fix returned variable type in ip6_skb_dst_mtu
The patch fixing the returned value of ip6_skb_dst_mtu (int -> unsigned int) was rebased between its initial review and the version applied. In the meantime fade56410c22 was applied, which added a new variable (int) used as the returned value. This lead to a mismatch between the function prototype and the variable used as the return value.
Fixes: 40fc3054b458 ("net: ipv6: fix return value of ip6_skb_dst_mtu") Cc: Vadim Fedorenko <[email protected]> Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.14-rc4, v5.14-rc3 |
|
| #
427faee1 |
| 20-Jul-2021 |
Vadim Fedorenko <[email protected]> |
net: ipv6: introduce ip6_dst_mtu_maybe_forward
Replace ip6_dst_mtu_forward with ip6_dst_mtu_maybe_forward and reuse this code in ip6_mtu. Actually these two functions were almost duplicates, this ch
net: ipv6: introduce ip6_dst_mtu_maybe_forward
Replace ip6_dst_mtu_forward with ip6_dst_mtu_maybe_forward and reuse this code in ip6_mtu. Actually these two functions were almost duplicates, this change will simplify the maintaince of mtu calculation code.
Signed-off-by: Vadim Fedorenko <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.14-rc2, v5.14-rc1 |
|
| #
40fc3054 |
| 01-Jul-2021 |
Vadim Fedorenko <[email protected]> |
net: ipv6: fix return value of ip6_skb_dst_mtu
Commit 628a5c561890 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") introduced ip6_skb_dst_mtu with return value of signed int which is inconsistent with actuall
net: ipv6: fix return value of ip6_skb_dst_mtu
Commit 628a5c561890 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") introduced ip6_skb_dst_mtu with return value of signed int which is inconsistent with actually returned values. Also 2 users of this function actually assign its value to unsigned int variable and only __xfrm6_output assigns result of this function to signed variable but actually uses as unsigned in further comparisons and calls. Change this function to return unsigned int value.
Fixes: 628a5c561890 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") Reviewed-by: David Ahern <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.13 |
|
| #
fade5641 |
| 25-Jun-2021 |
Vadim Fedorenko <[email protected]> |
net: lwtunnel: handle MTU calculation in forwading
Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved fragmentation logic away from lwtunnel by carry encap headroom and use it in outp
net: lwtunnel: handle MTU calculation in forwading
Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved fragmentation logic away from lwtunnel by carry encap headroom and use it in output MTU calculation. But the forwarding part was not covered and created difference in MTU for output and forwarding and further to silent drops on ipv4 forwarding path. Fix it by taking into account lwtunnel encap headroom.
The same commit also introduced difference in how to treat RTAX_MTU in IPv4 and IPv6 where latter explicitly removes lwtunnel encap headroom from route MTU. Make IPv4 version do the same.
Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation") Suggested-by: David Ahern <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6 |
|
| #
6b2e04bc |
| 25-Jan-2021 |
Praveen Chaudhary <[email protected]> |
net: allow user to set metric on default route learned via Router Advertisement
For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces
net: allow user to set metric on default route learned via Router Advertisement
For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used.
Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This patch adds sysctl for the same.
Logs:
For IPv4:
Config in etc/network/interfaces: auto eth0 iface eth0 inet dhcp metric 4261413864
IPv4 Kernel Route Table: $ ip route list default via 172.21.47.1 dev eth0 metric 4261413864
FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.] Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route
S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03 K 0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m
i.e. User can prefer Default Router learned via Routing Protocol in IPv4. Similar behavior is not possible for IPv6, without this fix.
After fix [for IPv6]: sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705
IP monitor: [When IPv6 RA is received] default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 pref high
Kernel IPv6 routing table $ ip -6 route list default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high
FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.] Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route
S>* ::/0 [20/0] is directly connected, eth0, 00:00:06 K ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m
If the metric is changed later, the effect will be seen only when next IPv6 RA is received, because the default route must be fully controlled by RA msg. Below metric is changed from 1996489705 to 1996489704.
$ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704 net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704
IP monitor: [On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric]
Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high
Signed-off-by: Praveen Chaudhary <[email protected]> Signed-off-by: Zhenggen Xu <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7 |
|
| #
7c1552da |
| 18-May-2020 |
Christoph Hellwig <[email protected]> |
ipv6: lift copy_from_user out of ipv6_route_ioctl
Prepare for better compat ioctl handling by moving the user copy out of ipv6_route_ioctl.
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-
ipv6: lift copy_from_user out of ipv6_route_ioctl
Prepare for better compat ioctl handling by moving the user copy out of ipv6_route_ioctl.
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.7-rc6, v5.7-rc5, v5.7-rc4 |
|
| #
11dd74b3 |
| 27-Apr-2020 |
Roopa Prabhu <[email protected]> |
net: ipv6: new arg skip_notify to ip6_rt_del
Used in subsequent work to skip route delete notifications on nexthop deletes.
Suggested-by: David Ahern <[email protected]> Signed-off-by: Roopa Prabhu
net: ipv6: new arg skip_notify to ip6_rt_del
Used in subsequent work to skip route delete notifications on nexthop deletes.
Suggested-by: David Ahern <[email protected]> Signed-off-by: Roopa Prabhu <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.7-rc3, v5.7-rc2, v5.7-rc1 |
|
| #
03e2a984 |
| 03-Apr-2020 |
Tim Stallard <[email protected]> |
net: ipv6: do not consider routes via gateways for anycast address check
The behaviour for what is considered an anycast address changed in commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes a
net: ipv6: do not consider routes via gateways for anycast address check
The behaviour for what is considered an anycast address changed in commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception"). This now considers the first address in a subnet where there is a route via a gateway to be an anycast address.
This breaks path MTU discovery and traceroutes when a host in a remote network uses the address at the start of a prefix (eg 2600:: advertised as 2600::/48 in the DFZ) as ICMP errors will not be sent to anycast addresses.
This patch excludes any routes with a gateway, or via point to point links, like the behaviour previously from rt6_is_gw_or_nonexthop in net/ipv6/route.c.
This can be tested with: ip link add v1 type veth peer name v2 ip netns add test ip netns exec test ip link set lo up ip link set v2 netns test ip link set v1 up ip netns exec test ip link set v2 up ip addr add 2001:db8::1/64 dev v1 nodad ip addr add 2001:db8:100:: dev lo nodad ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad ip netns exec test ip route add unreachable 2001:db8:1::1 ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1 ip netns exec test sysctl net.ipv6.conf.all.forwarding=1 ip route add 2001:db8:1::1 via 2001:db8::2 ping -I 2001:db8::1 2001:db8:1::1 -c1 ping -I 2001:db8:100:: 2001:db8:1::1 -c1 ip addr delete 2001:db8:100:: dev lo ip netns delete test
Currently the first ping will get back a destination unreachable ICMP error, but the second will never get a response, with "icmp6_send: acast source" logged. After this patch, both get destination unreachable ICMP replies.
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Tim Stallard <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4 |
|
| #
207644f5 |
| 29-Feb-2020 |
Gustavo A. R. Silva <[email protected]> |
net: ip6_route: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to de
net: ip6_route: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:
struct foo { int stuff; struct boo array[]; };
By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this change:
"Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7 |
|
| #
9b1c1ef1 |
| 24-Jun-2019 |
Nicolas Dichtel <[email protected]> |
ipv6: constify rt6_nexthop()
There is no functional change in this patch, it only prepares the next one.
rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const variables.
Signed-of
ipv6: constify rt6_nexthop()
There is no functional change in this patch, it only prepares the next one.
rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const variables.
Signed-off-by: Nicolas Dichtel <[email protected]> Reported-by: kbuild test robot <[email protected]> Acked-by: Nick Desaulniers <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.2-rc6 |
|
| #
1e47b483 |
| 21-Jun-2019 |
Stefano Brivio <[email protected]> |
ipv6: Dump route exceptions if requested
Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"), route exceptions reside in a separate hash table, and won't be found by walki
ipv6: Dump route exceptions if requested
Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"), route exceptions reside in a separate hash table, and won't be found by walking the FIB, so they won't be dumped to userspace on a RTM_GETROUTE message.
This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to have no function anymore:
# ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium # ip -6 route list cache # ip -6 route flush cache # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium
because iproute2 lists cached routes using RTM_GETROUTE, and flushes them by listing all the routes, and deleting them with RTM_DELROUTE one by one.
If cached routes are requested using the RTM_F_CLONED flag together with strict checking, or if no strict checking is requested (and hence we can't consistently apply filters), look up exceptions in the hash table associated with the current fib6_info in rt6_dump_route(), and, if present and not expired, add them to the dump.
We might be unable to dump all the entries for a given node in a single message, so keep track of how many entries were handled for the current node in fib6_walker, and skip that amount in case we start from the same partially dumped node.
When a partial dump restarts, as the starting node might change when 'sernum' changes, we have no guarantee that we need to skip the same amount of in-node entries. Therefore, we need two counters, and we need to zero the in-node counter if the node from which the dump is resumed differs.
Note that, with the current version of iproute2, this only fixes the 'ip -6 route list cache': on a flush command, iproute2 doesn't pass RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is still unable to fetch the routes to be flushed. This will be addressed in a patch for iproute2.
To flush cached routes, a procfs entry could be introduced instead: that's how it works for IPv4. We already have a rt6_flush_exception() function ready to be wired to it. However, this would not solve the issue for listing.
Versions of iproute2 and kernel tested:
iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 5.1.0, patched 3.18 list + + + + + + flush + + + + + + 4.4 list + + + + + + flush + + + + + + 4.9 list + + + + + + flush + + + + + + 4.14 list + + + + + + flush + + + + + + 4.15 list flush 4.19 list flush 5.0 list flush 5.1 list flush with list + + + + + + fix flush + + + +
v7: - Explain usage of "skip" counters in commit message (suggested by David Ahern)
v6: - Rebase onto net-next, use recently introduced nexthop walker - Make rt6_nh_dump_exceptions() a separate function (suggested by David Ahern)
v5: - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH, update test results (flushing works with iproute2 < 5.0.0 now)
v4: - Split NLM_F_MATCH and strict check handling in separate patches - Filter routes using RTM_F_CLONED: if it's not set, only return non-cached routes, and if it's set, only return cached routes: change requested by David Ahern and Martin Lau. This implies that iproute2 needs a separate patch to be able to flush IPv6 cached routes. This is not ideal because we can't fix the breakage caused by 2b760fcf5cfb entirely in kernel. However, two years have passed since then, and this makes it more tolerable
v3: - More descriptive comment about expired exceptions in rt6_dump_route() - Swap return values of rt6_dump_route() (suggested by Martin Lau) - Don't zero skip_in_node in case we don't dump anything in a given pass (also suggested by Martin Lau) - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic, it's just a flag to indicate the route was cloned, not to filter on routes
v2: Add tracking of number of entries to be skipped in current node after a partial dump. As we restart from the same node, if not all the exceptions for a given node fit in a single message, the dump will not terminate, as suggested by Martin Lau. This is a concrete possibility, setting up a big number of exceptions for the same route actually causes the issue, suggested by David Ahern.
Reported-by: Jianlin Shi <[email protected]> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Stefano Brivio <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
7d9e5f42 |
| 21-Jun-2019 |
Wei Wang <[email protected]> |
ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF
For tx path, in most cases, we still have to take refcnt on the dst cause the caller is caching the dst somewhere. But it still is beneficia
ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF
For tx path, in most cases, we still have to take refcnt on the dst cause the caller is caching the dst somewhere. But it still is beneficial to make use of RT6_LOOKUP_F_DST_NOREF flag while doing the route lookup. It is cause this flag prevents manipulating refcnt on net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each routing table. The null_entry is a shared object and constant updates on it cause false sharing.
We converted the current major lookup function ip6_route_output_flags() to make use of RT6_LOOKUP_F_DST_NOREF.
Together with the change in the rx path, we see noticable performance boost: I ran synflood tests between 2 hosts under the same switch. Both hosts have 20G mlx NIC, and 8 tx/rx queues. Sender sends pure SYN flood with random src IPs and ports using trafgen. Receiver has a simple TCP listener on the target port. Both hosts have multiple custom rules: - For incoming packets, only local table is traversed. - For outgoing packets, 3 tables are traversed to find the route. The packet processing rate on the receiver is as follows: - Before the fix: 3.78Mpps - After the fix: 5.50Mpps
Signed-off-by: Wei Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
d64a1f57 |
| 21-Jun-2019 |
Wei Wang <[email protected]> |
ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic
This patch specifically converts the rule lookup logic to honor this flag and not release refcnt when traversing each rule and calling lookup(
ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic
This patch specifically converts the rule lookup logic to honor this flag and not release refcnt when traversing each rule and calling lookup() on each routing table. Similar to previous patch, we also need some special handling of dst entries in uncached list because there is always 1 refcnt taken for them even if RT6_LOOKUP_F_DST_NOREF flag is set.
Signed-off-by: Wei Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
0e09edcc |
| 21-Jun-2019 |
Wei Wang <[email protected]> |
ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route()
This new flag is to instruct the route lookup function to not take refcnt on the dst entry. The user which does route lookup with this
ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route()
This new flag is to instruct the route lookup function to not take refcnt on the dst entry. The user which does route lookup with this flag must properly use rcu protection. ip6_pol_route() is the major route lookup function for both tx and rx path. In this function: Do not take refcnt on dst if RT6_LOOKUP_F_DST_NOREF flag is set, and directly return the route entry. The caller should be holding rcu lock when using this flag, and decide whether to take refcnt or not.
One note on the dst cache in the uncached_list: As uncached_list does not consume refcnt, one refcnt is always returned back to the caller even if RT6_LOOKUP_F_DST_NOREF flag is set. Uncached dst is only possible in the output path. So in such call path, caller MUST check if the dst is in the uncached_list before assuming that there is no refcnt taken on the returned dst.
Signed-off-by: Wei Wang <[email protected]> Acked-by: Eric Dumazet <[email protected]> Acked-by: Mahesh Bandewar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|