|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
cfc47029 |
| 28-Feb-2025 |
Kuniyuki Iwashima <[email protected]> |
ipv4: fib: Allocate fib_info_hash[] during netns initialisation.
We will allocate fib_info_hash[] and fib_info_laddrhash[] for each netns.
Currently, fib_info_hash[] is allocated when the first route is added.
Let's move the first allocation to a new __net_init function.
Note that we must call fib4_semantics_exit() in fib_net_exit_batch() because ->exit() is called earlier than ->exit_batch().
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7 |
|
| #
37653a0b |
| 07-Nov-2024 |
Menglong Dong <[email protected]> |
net: ip: make fib_validate_source() support drop reasons
In this commit, we make fib_validate_source() and __fib_validate_source() return -reason instead of errno on error.
The return value of fib_validate_source can be -errno, 0, and 1. It's hard to make fib_validate_source() return drop reasons directly.
fib_validate_source() returns 1 if the scope of the source (reverse) route is HOST. In this case, combined with some other conditions, __mkroute_input() marks the skb with IPSKB_DOREDIRECT, and ip_forward() then sends an ICMP redirect if this flag is set. We can't pass this information to __mkroute_input() if we make fib_validate_source() return drop reasons.
Therefore, we introduce the wrapper fib_validate_source_reason() for fib_validate_source(), which will return the drop reasons on error.
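The wrapper pattern described above can be sketched in user-space C. This is only an illustration under stated assumptions: the enum values and every `_sketch` name are hypothetical stand-ins, not the kernel's definitions, and the validation logic is reduced to two flags.

```c
/* Hypothetical stand-ins for a few enum skb_drop_reason values. */
enum drop_reason_sketch {
	NOT_DROPPED_YET_SKETCH = 0,
	DROP_IP_RPFILTER_SKETCH,
	DROP_IP_LOCAL_SOURCE_SKETCH,
	DROP_IP_INVALID_SOURCE_SKETCH,
};

/*
 * Stand-in for fib_validate_source(): returns -reason on error,
 * otherwise 0 or 1 (1 meaning the reverse route has host scope).
 */
int validate_source_sketch(int rpf_ok, int host_scope)
{
	if (!rpf_ok)
		return -DROP_IP_RPFILTER_SKETCH;
	return host_scope ? 1 : 0;
}

/*
 * Wrapper for callers that only care about the drop reason: a negative
 * return is turned back into a reason, and both 0 and 1 count as success.
 */
enum drop_reason_sketch validate_source_reason_sketch(int rpf_ok, int host_scope)
{
	int err = validate_source_sketch(rpf_ok, host_scope);

	if (err < 0)
		return (enum drop_reason_sketch)-err;
	return NOT_DROPPED_YET_SKETCH;
}
```

Callers that still need the 0/1 distinction keep calling the underlying function, while callers that only drop or accept can use the wrapper.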
In the original logic, LINUX_MIB_IPRPFILTER is counted if fib_validate_source() returns -EXDEV. Now we need to adjust this by checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take effect only after the patch "net: ip: make ip_route_input_noref() return drop reasons", as we can't pass the drop reasons from fib_validate_source() to ip_rcv_finish_core() in this patch.
The following new drop reasons are added in this patch:
  SKB_DROP_REASON_IP_LOCAL_SOURCE
  SKB_DROP_REASON_IP_INVALID_SOURCE
Signed-off-by: Menglong Dong <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
16207384 |
| 09-Oct-2024 |
Eric Dumazet <[email protected]> |
ipv4: use READ_ONCE()/WRITE_ONCE() on net->ipv4.fib_seq
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.
Writes are protected by RTNL. We can use READ_ONCE() when reading it.
Constify 'struct net' argument of fib4_rules_seq_read()
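As a rough user-space illustration of the access pattern, the sketch below approximates the two macros with volatile casts. The macro definitions and the `netns_sketch` type are assumptions made for this example; the kernel's own macros handle more cases, and the sketch relies on the GNU `__typeof__` extension.

```c
/* Simplified approximations of READ_ONCE()/WRITE_ONCE() (assumption). */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the per-netns state holding the counter. */
struct netns_sketch {
	unsigned int fib_seq;
};

/* Writer side: in the kernel this path is serialized by RTNL. */
void fib_seq_bump_sketch(struct netns_sketch *net)
{
	WRITE_ONCE_SKETCH(net->fib_seq, net->fib_seq + 1);
}

/* Lockless reader: takes a const pointer since it never modifies the netns. */
unsigned int fib_seq_read_sketch(const struct netns_sketch *net)
{
	return READ_ONCE_SKETCH(net->fib_seq);
}
```

The point of the pairing is that the reader needs no lock: the annotated load keeps the compiler from tearing or re-reading the value, while writers remain serialized elsewhere.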
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
d36236ab |
| 07-Oct-2024 |
Guillaume Nault <[email protected]> |
ipv4: Convert fib_validate_source() to dscp_t.
Pass a dscp_t variable to fib_validate_source(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
All callers of fib_validate_source() already have a dscp_t variable to pass as parameter. We just need to remove the inet_dscp_to_dsfield() conversions.
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/08612a4519bc5a3578bb493fbaad82437ebb73dc.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7 |
|
| #
1083d733 |
| 03-Sep-2024 |
Ido Schimmel <[email protected]> |
ipv4: Fix user space build failure due to header change
RT_TOS() from include/uapi/linux/in_route.h is defined using IPTOS_TOS_MASK from include/uapi/linux/ip.h. This is problematic for files such as include/net/ip_fib.h that want to use RT_TOS() as without including both header files kernel compilation fails:
  In file included from ./include/net/ip_fib.h:25,
                   from ./include/net/route.h:27,
                   from ./include/net/lwtunnel.h:9,
                   from net/core/dst.c:24:
  ./include/net/ip_fib.h: In function ‘fib_dscp_masked_match’:
  ./include/uapi/linux/in_route.h:31:32: error: ‘IPTOS_TOS_MASK’ undeclared (first use in this function)
     31 | #define RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)
        |                            ^~~~~~~~~~~~~~
  ./include/net/ip_fib.h:440:45: note: in expansion of macro ‘RT_TOS’
    440 | return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));
Therefore, the cited commit changed linux/in_route.h to include linux/ip.h. However, as reported by David, this breaks iproute2 compilation due to overlapping definitions between linux/ip.h and /usr/include/netinet/ip.h:
  In file included from ../include/uapi/linux/in_route.h:5,
                   from iproute.c:19:
  ../include/uapi/linux/ip.h:25:9: warning: "IPTOS_TOS" redefined
     25 | #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
        |         ^~~~~~~~~
  In file included from iproute.c:17:
  /usr/include/netinet/ip.h:222:9: note: this is the location of the previous definition
    222 | #define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)
Fix by changing include/net/ip_fib.h to include linux/ip.h. Note that usage of RT_TOS() should not spread further in the kernel due to recent work in this area.
Fixes: 1fa3314c14c6 ("ipv4: Centralize TOS matching")
Reported-by: David Ahern <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
1fa3314c |
| 14-Aug-2024 |
Ido Schimmel <[email protected]> |
ipv4: Centralize TOS matching
The TOS field in the IPv4 flow information structure ('flowi4_tos') is matched by the kernel against the TOS selector in IPv4 rules and routes. The field is initialized differently by different call sites. Some treat it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC 791 TOS and initialize it using IPTOS_RT_MASK.
What is common to all these call sites is that they all initialize the lower three DSCP bits, which fits the TOS definition in the initial IPv4 specification (RFC 791).
Therefore, the kernel only allows configuring IPv4 FIB rules that match on the lower three DSCP bits which are always guaranteed to be initialized by all call sites:
  # ip -4 rule add tos 0x1c table 100
  # ip -4 rule add tos 0x3c table 100
  Error: Invalid tos.
While this works, it is unlikely to be very useful. RFC 791, which initially defined the TOS and IP precedence fields, was updated by RFC 2474 over twenty-five years ago, replacing these fields with a single six-bit DSCP field.
Extending FIB rules to match on DSCP can be done by adding a new DSCP selector while maintaining the existing semantics of the TOS selector for applications that rely on that.
A prerequisite for allowing FIB rules to match on DSCP is to adjust all the call sites to initialize the high order DSCP bits and remove their masking along the path to the core where the field is matched on.
However, making this change alone will result in a behavior change. For example, a forwarded IPv4 packet with a DS field of 0xfc will no longer match a FIB rule that was configured with 'tos 0x1c'.
This behavior change can be avoided by masking the upper three DSCP bits in 'flowi4_tos' before comparing it against the TOS selectors in FIB rules and routes.
Implement the above by adding a new function that checks whether a given DSCP value matches the one specified in the IPv4 flow information structure and invoke it from the three places that currently match on 'flowi4_tos'.
Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK since the latter is not uAPI and we should be able to remove it at some point.
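A minimal user-space sketch of the masked comparison follows. It assumes the 0x1E value of IPTOS_TOS_MASK from linux/ip.h and treats the upper six bits of the DS field as the DSCP; the `_sketch` names are hypothetical and the kernel's typed `dscp_t` handling is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPTOS_TOS_MASK_SKETCH 0x1E /* assumed value of IPTOS_TOS_MASK */
#define DSCP_MASK_SKETCH      0xFC /* upper six bits of the DS field */

/* Mirrors RT_TOS(): keep only the legacy TOS bits of the flow's field. */
static inline uint8_t rt_tos_sketch(uint8_t tos)
{
	return tos & IPTOS_TOS_MASK_SKETCH;
}

/*
 * Compare a rule or route selector against the flow's TOS after masking.
 * The masking drops the three high-order DSCP bits, so a fully
 * initialized DS field still matches legacy 'tos' selectors.
 */
bool dscp_masked_match_sketch(uint8_t selector_dscp, uint8_t flow_tos)
{
	return selector_dscp == (rt_tos_sketch(flow_tos) & DSCP_MASK_SKETCH);
}
```

With these masks, a flow carrying a DS field of 0xfc still matches a selector of 0x1c, which is the behavior the commit message says must be preserved.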
Include <linux/ip.h> in <linux/in_route.h> since the former defines IPTOS_TOS_MASK which is used in the definition of RT_TOS() in <linux/in_route.h>.
No regressions in FIB tests:
  # ./fib_tests.sh
  [...]
  Tests passed: 218
  Tests failed: 0
And FIB rule tests:
  # ./fib_rule_tests.sh
  [...]
  Tests passed: 116
  Tests failed: 0
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
| #
338bb57e |
| 15-Jul-2024 |
Ido Schimmel <[email protected]> |
ipv4: Fix incorrect TOS in route get reply
The TOS value that is returned to user space in the route get reply is the one with which the lookup was performed ('fl4->flowi4_tos'). This is fine when the matched route is configured with a TOS as it would not match if its TOS value did not match the one with which the lookup was performed.
However, matching on TOS is only performed when the route's TOS is not zero. It is therefore possible to have the kernel incorrectly return a non-zero TOS:
  # ip link add name dummy1 up type dummy
  # ip address add 192.0.2.1/24 dev dummy1
  # ip route get 192.0.2.2 tos 0xfc
  192.0.2.2 tos 0x1c dev dummy1 src 192.0.2.1 uid 0
      cache
Fix by adding a DSCP field to the FIB result structure (inside an existing 4 bytes hole), populating it in the route lookup and using it when filling the route get reply.
Output after the patch:
  # ip link add name dummy1 up type dummy
  # ip address add 192.0.2.1/24 dev dummy1
  # ip route get 192.0.2.2 tos 0xfc
  192.0.2.2 dev dummy1 src 192.0.2.1 uid 0
      cache
Fixes: 1a00fee4ffb2 ("ipv4: Remove rt_key_{src,dst,tos} from struct rtable.")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
4ee2a8ca |
| 07-Jun-2024 |
Petr Machata <[email protected]> |
net: ipv4: Add a sysctl to set multipath hash seed
When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(), which uses a randomly generated seed. That's a fine default, but some deployments need tighter control over the seed used.
In this patch, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Like those, the multipath hash seed is a per-netns variable.
Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables.
The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key.
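The expansion of a 32-bit seed into a 128-bit key can be sketched as below. The struct layout and function name are assumptions for illustration; the commit message only states that the 32-bit value is repeated four times, not how the kernel stores the result.

```c
#include <stdint.h>

/* Hypothetical 128-bit key made of two 64-bit words. */
struct siphash_key_sketch {
	uint64_t key[2];
};

/* Repeat the 32-bit user seed four times to fill the 128-bit key. */
struct siphash_key_sketch seed_to_key_sketch(uint32_t user_seed)
{
	uint64_t half = ((uint64_t)user_seed << 32) | user_seed;
	struct siphash_key_sketch k = { { half, half } };

	return k;
}
```

Keeping the user-visible seed at 32 bits means the key width could change later without changing the sysctl interface.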
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
3e453ca1 |
| 07-Jun-2024 |
Petr Machata <[email protected]> |
net: ipv4,ipv6: Pass multipath hash computation through a helper
The following patches will add a sysctl to control multipath hash seed. In order to centralize the hash computation, add a helper, fib_multipath_hash_from_keys(), and have all IPv4 and IPv6 route.c invocations of flow_hash_from_keys() go through this helper instead.
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6 |
|
| #
22e36ea9 |
| 22-Feb-2024 |
Eric Dumazet <[email protected]> |
inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
Add a new field into struct fib_dump_filter, to let callers tell if they use RTNL locking or RCU.
This is used in the following patch, when inet_dump_fib() no longer holds RTNL.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3 |
|
| #
4b2b6060 |
| 22-Sep-2023 |
Hangbin Liu <[email protected]> |
ipv4/fib: send notify when delete source address routes
After deleting an interface address in fib_del_ifaddr(), the function scans the fib_info list for stray entries and calls fib_flush() and fib_table_flush(). Then the stray entries will be deleted silently and no RTM_DELROUTE notification will be sent.
This lack of notification can make routing daemons, or monitors like `ip monitor route`, miss the routing changes. For example:
  + ip link add dummy1 type dummy
  + ip link add dummy2 type dummy
  + ip link set dummy1 up
  + ip link set dummy2 up
  + ip addr add 192.168.5.5/24 dev dummy1
  + ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5
  + ip -4 route
  7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
  192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
  + ip monitor route
  + ip addr del 192.168.5.5/24 dev dummy1
  Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
  Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
  Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
As Ido pointed out, fib_table_flush() isn't only called when an address is deleted, but also when an interface is deleted or put down. The lack of notification in these cases is deliberate. And commit 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down") introduced a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send the route delete notification blindly in fib_table_flush().
To fix this issue, let's add a new flag in "struct fib_info" to track routes whose preferred source address was deleted, and only send notifications for them.
After update:
  + ip monitor route
  + ip addr del 192.168.5.5/24 dev dummy1
  Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
  Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
  Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
  Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
Suggested-by: Thomas Haller <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
| #
5b98fd5d |
| 22-Sep-2023 |
Kees Cook <[email protected]> |
ipv4: Annotate struct fib_info with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).
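To illustrate the annotation outside the kernel, here is a small sketch of a flexible array member tied to its element count. The struct, field, and macro names are hypothetical; the macro expands to the compiler attribute only when the compiler reports support for it, and to nothing otherwise.

```c
#include <stdlib.h>

/* Use the attribute only where the compiler advertises it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY_SKETCH(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY_SKETCH
# define COUNTED_BY_SKETCH(member)
#endif

struct hop_sketch {
	int ifindex;
};

/* The flexible array's valid length is declared to be 'nhops'. */
struct route_info_sketch {
	int nhops;
	struct hop_sketch hops[] COUNTED_BY_SKETCH(nhops);
};

struct route_info_sketch *route_info_alloc_sketch(int nhops)
{
	struct route_info_sketch *ri;

	if (nhops < 0)
		return NULL;
	ri = calloc(1, sizeof(*ri) + (size_t)nhops * sizeof(ri->hops[0]));
	if (ri)
		ri->nhops = nhops; /* set the count before touching hops[] */
	return ri;
}
```

The ordering in the allocator matters: bounds checks consult the counter, so it has to be initialized before the first array access.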
As found with Coccinelle[1], add __counted_by for struct fib_info.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: David Ahern <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.6-rc2, v6.6-rc1 |
|
| #
8aae7625 |
| 30-Aug-2023 |
Florian Westphal <[email protected]> |
net: fib: avoid warn splat in flow dissector
New skbs allocated via nf_send_reset() have skb->dev == NULL.
fib*_rules_early_flow_dissect helpers already have a 'struct net' argument but it's not passed down to the flow dissector core, which will then WARN as it can't derive a net namespace to use:
  WARNING: CPU: 0 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0
  [..]
  ip_route_me_harder+0x143/0x330
  nf_send_reset+0x17c/0x2d0 [nf_reject_ipv4]
  nft_reject_inet_eval+0xa9/0xf2 [nft_reject_inet]
  nft_do_chain+0x198/0x5d0 [nf_tables]
  nft_do_chain_inet+0xa4/0x110 [nf_tables]
  nf_hook_slow+0x41/0xc0
  ip_local_deliver+0xce/0x110
  ..
Cc: Stanislav Fomichev <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ido Schimmel <[email protected]>
Fixes: 812fa71f0d96 ("netfilter: Dissect flow after packet mangling")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217826
Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2 |
|
| #
568a3f33 |
| 08-Apr-2022 |
Guillaume Nault <[email protected]> |
ipv4: Use dscp_t in struct fib_entry_notifier_info
Use the new dscp_t type to replace the tos field of struct fib_entry_notifier_info. This ensures ECN bits are ignored and makes it compatible with the dscp field of struct fib_rt_info.
This also allows sparse to flag potential incorrect uses of DSCP and ECN bits.
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
888ade8f |
| 08-Apr-2022 |
Guillaume Nault <[email protected]> |
ipv4: Use dscp_t in struct fib_rt_info
Use the new dscp_t type to replace the tos field of struct fib_rt_info. This ensures ECN bits are ignored and makes it compatible with the fa_dscp field of struct fib_alias.
This also allows sparse to flag potential incorrect uses of DSCP and ECN bits.
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3 |
|
| #
f55fbb6a |
| 04-Feb-2022 |
Guillaume Nault <[email protected]> |
ipv4: Reject routes specifying ECN bits in rtm_tos
Use the new dscp_t type to replace the fc_tos field of fib_config, to ensure IPv4 routes aren't influenced by ECN bits when configured with non-zero rtm_tos.
Before this patch, IPv4 routes specifying an rtm_tos with some of the ECN bits set were accepted. However they wouldn't work (never match) as IPv4 normally clears the ECN bits with IPTOS_RT_MASK before doing a FIB lookup (although a few buggy code paths don't).
After this patch, IPv4 routes specifying an rtm_tos with any ECN bit set are rejected.
Note: IPv6 routes ignore rtm_tos altogether, any rtm_tos is accepted, but treated as if it were 0.
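The IPv4 validation rule can be sketched as a small user-space check. The function name and the plain `uint8_t` output are assumptions for illustration; the kernel uses the typed `dscp_t` and its own conversion helpers, and the error code here is only illustrative.

```c
#include <errno.h>
#include <stdint.h>

#define ECN_MASK_SKETCH 0x03 /* two low-order bits of the former TOS byte */

/*
 * Validate a route's rtm_tos: any ECN bit makes the request invalid;
 * otherwise the value is a pure DSCP and can be stored as such.
 */
int check_rtm_tos_sketch(uint8_t rtm_tos, uint8_t *dscp_out)
{
	if (rtm_tos & ECN_MASK_SKETCH)
		return -EINVAL;
	*dscp_out = rtm_tos; /* ECN bits are known to be zero here */
	return 0;
}
```

Rejecting the request up front is friendlier than the previous behavior described above, where such a route was accepted but could never match.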
Signed-off-by: Guillaume Nault <[email protected]>
Acked-by: David Ahern <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5 |
|
| #
e44b14eb |
| 07-Dec-2021 |
Eric Dumazet <[email protected]> |
inet: add net device refcount tracker to struct fib_nh_common
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v5.16-rc4 |
|
| #
213f5f8f |
| 02-Dec-2021 |
Eric Dumazet <[email protected]> |
ipv4: convert fib_num_tclassid_users to atomic_t
Before commit faa041a40b9f ("ipv4: Create cleanup helper for fib_nh") changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.
After the change, this is no longer the case, as free_fib_info_rcu() runs after rcu grace period, without rtnl being held.
Fixes: faa041a40b9f ("ipv4: Create cleanup helper for fib_nh")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: David Ahern <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3 |
|
| #
597aa16c |
| 23-Sep-2021 |
Xiao Liang <[email protected]> |
net: ipv4: Fix rtnexthop len when RTA_FLOW is present
Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop() to get the length of rtnexthop correct.
Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info")
Signed-off-by: Xiao Liang <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4 |
|
| #
79976892 |
| 29-Jul-2021 |
Yajun Deng <[email protected]> |
net: convert fib_treeref from int to refcount_t
The refcount_t type should be used instead of int when fib_treeref is used as a reference counter, to avoid use-after-free risks.
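Why a dedicated counter type helps can be shown with a simplified C11 sketch. It is not the kernel's refcount_t: the names are hypothetical, the saturation point is simply UINT_MAX, and the real type also emits warnings. The sketch only keeps the two properties relevant to use-after-free: a zero count is never revived, and the count never wraps around.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refcount_sketch {
	atomic_uint refs;
};

void refcount_set_sketch(struct refcount_sketch *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Take a reference; refuse if the object is already dead or saturated. */
bool refcount_inc_sketch(struct refcount_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the caller released the last one. */
bool refcount_dec_and_test_sketch(struct refcount_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

A plain int offers neither guarantee: an extra decrement goes negative and an overflow wraps, and both can lead to an object being freed while still in use.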
Signed-off-by: Yajun Deng <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3 |
|
| #
ce5c9c20 |
| 17-May-2021 |
Ido Schimmel <[email protected]> |
ipv4: Add a sysctl to control multipath hash fields
A subsequent patch will add a new multipath hash policy where the packet fields used for multipath hash calculation are determined by user space. This patch adds a sysctl that allows user space to set these fields.
The packet fields are represented using a bitmask and are common between IPv4 and IPv6 to allow user space to use the same numbering across both protocols. For example, to hash based on standard 5-tuple:
  # sysctl -w net.ipv4.fib_multipath_hash_fields=0x0037
  net.ipv4.fib_multipath_hash_fields = 0x0037
The kernel rejects unknown fields, for example:
  # sysctl -w net.ipv4.fib_multipath_hash_fields=0x1000
  sysctl: setting key "net.ipv4.fib_multipath_hash_fields": Invalid argument
More fields can be added in the future, if needed.
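The bitmask handling can be sketched as follows. Only two facts come from the commit message: 0x0037 selects the standard 5-tuple and 0x1000 is rejected. The individual bit assignments and the 0x0fff "all known fields" mask are assumptions based on the sysctl documentation, and the `_SKETCH` names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed bit assignments for the outer-header fields. */
#define HASH_FIELD_SRC_IP_SKETCH   0x0001u
#define HASH_FIELD_DST_IP_SKETCH   0x0002u
#define HASH_FIELD_IP_PROTO_SKETCH 0x0004u
#define HASH_FIELD_SRC_PORT_SKETCH 0x0010u
#define HASH_FIELD_DST_PORT_SKETCH 0x0020u

/* Assumed mask covering every field the kernel currently knows. */
#define HASH_FIELD_ALL_MASK_SKETCH 0x0fffu

#define HASH_FIELD_5TUPLE_SKETCH \
	(HASH_FIELD_SRC_IP_SKETCH | HASH_FIELD_DST_IP_SKETCH | \
	 HASH_FIELD_IP_PROTO_SKETCH | HASH_FIELD_SRC_PORT_SKETCH | \
	 HASH_FIELD_DST_PORT_SKETCH)

/* Reject any value that sets a bit outside the known fields. */
int hash_fields_validate_sketch(uint32_t fields)
{
	if (fields & ~HASH_FIELD_ALL_MASK_SKETCH)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown bits now is what allows new fields to be added later without old kernels silently accepting values they cannot honor.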
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7 |
|
| #
36c5100e |
| 07-Feb-2021 |
Amit Cohen <[email protected]> |
IPv4: Add "offload failed" indication to routes
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware.
The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware.
To avoid such cases, a previous patch set added the ability to emit RTM_NEWROUTE notifications whenever the RTM_F_OFFLOAD/RTM_F_TRAP flags are changed. This behavior is controlled by a sysctl.
With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come.
This patch adds an "offload_failed" indication to IPv4 routes, so that users will have better visibility into the offload process.
'struct fib_alias' and 'struct fib_rt_info' are extended with a new field that indicates whether the route offload failed. Note that the new field uses an unused bit, so there is no need to increase the size of either struct.
Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7 |
|
| #
1fd1c768 |
| 26-May-2020 |
David Ahern <[email protected]> |
ipv4: nexthop version of fib_info_nh_uses_dev
Similar to the last patch, fib_info_nh_uses_dev needs to be fixed for external nexthops to avoid referencing multiple nh_grp structs. Move the device check in fib_info_nh_uses_dev to a helper and create a nexthop version that is called if the fib_info uses an external nexthop.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
| #
af7888ad |
| 26-May-2020 |
David Ahern <[email protected]> |
ipv4: Refactor nhc evaluation in fib_table_lookup
FIB lookups can return an entry that references an external nexthop. While walking the nexthop struct we do not want to make multiple calls into the nexthop code, which can result in two different structs being accessed: one call returning the number of paths, and the rest of the loop seeing a different nh_grp struct. If the nexthop group shrank, the result is an attempt to access a fib_nh_common that does not exist for the new nh_grp struct but did for the old one.
To fix that, move the device evaluation code to a helper that can be used for the inline fib_nh path as well as external nexthops.
Update the existing check for fi->nh in fib_table_lookup to call a new helper, nexthop_get_nhc_lookup, which walks the external nexthop with a single rcu dereference.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.7-rc7 |
|
| #
41b4bd98 |
| 20-May-2020 |
Sabrina Dubroca <[email protected]> |
net: don't return invalid table id error when we fall back to PF_UNSPEC
In case we can't find a ->dumpit callback for the requested (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're in the same situation as if userspace had requested a PF_UNSPEC dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all the registered RTM_GETROUTE handlers.
The requested table id may or may not exist for all of those families. commit ae677bbb4441 ("net: Don't return invalid table id error when dumping all families") fixed the problem when userspace explicitly requests a PF_UNSPEC dump, but missed the fallback case.
For example, when we pass ipv6.disable=1 to a kernel with CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y, the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in rtnl_dump_all, and listing IPv6 routes will unexpectedly print:
  # ip -6 r
  Error: ipv4: MR table does not exist.
  Dump terminated
commit ae677bbb4441 introduced the dump_all_families variable, which gets set when userspace requests a PF_UNSPEC dump. However, we can't simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the fallback case to get dump_all_families == true, because some message types (for example RTM_GETRULE and RTM_GETNEIGH) only register the PF_UNSPEC handler and use the family to filter in the kernel what is dumped to userspace. We would then export more entries, that userspace would have to filter. iproute does that, but other programs may not.
Instead, this patch removes dump_all_families and updates the RTM_GETROUTE handlers to check if the family that is being dumped is their own. When it's not, which covers both the intentional PF_UNSPEC dumps (as dump_all_families did) and the fallback case, ignore the missing table id error.
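The decision each handler makes can be sketched as below. The function name, the return values, and the family constants (Linux values for AF_UNSPEC, AF_INET and AF_INET6) are assumptions for illustration; the real handlers work on netlink callback state rather than two integers.

```c
#include <errno.h>

/* Linux address-family values, repeated here to stay self-contained. */
enum {
	FAMILY_UNSPEC_SKETCH = 0,
	FAMILY_INET_SKETCH   = 2,
	FAMILY_INET6_SKETCH  = 10,
};

/*
 * Called when the requested table id does not exist for this handler.
 * If the dump was not addressed to the handler's own family (an explicit
 * PF_UNSPEC dump, or a fallback from another family), the missing table
 * is skipped quietly; otherwise it is reported as an error.
 */
int dump_missing_table_sketch(int requested_family, int handler_family)
{
	if (requested_family != handler_family)
		return 0;
	return -ENOENT; /* illustrative error code */
}
```

Comparing against the handler's own family covers both the intentional PF_UNSPEC dump and the fallback case with a single test, which is why the separate dump_all_families flag is no longer needed.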
Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
Signed-off-by: Sabrina Dubroca <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|