|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
46930855 |
| 05-Feb-2025 |
Eric Dumazet <[email protected]> |
ipv4: add RCU protection to ip4_dst_hoplimit()
ip4_dst_hoplimit() must use RCU protection to make sure the net structure it reads does not disappear.
Fixes: fa50d974d104 ("ipv4: Namespaceify ip_def
ipv4: add RCU protection to ip4_dst_hoplimit()
ip4_dst_hoplimit() must use RCU protection to make sure the net structure it reads does not disappear.
Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
1dbdce30 |
| 16-Dec-2024 |
Guillaume Nault <[email protected]> |
ipv4: Define inet_sk_init_flowi4() and use it in inet_sk_rebuild_header().
IPv4 code commonly has to initialise a flowi4 structure from an IPv4 socket. This requires looking at potential IPv4 option
ipv4: Define inet_sk_init_flowi4() and use it in inet_sk_rebuild_header().
IPv4 code commonly has to initialise a flowi4 structure from an IPv4 socket. This requires looking at potential IPv4 options to set the proper destination address, call flowi4_init_output() with the correct set of parameters and run the sk_classify_flow security hook.
Instead of reimplementing these operations in different parts of the stack, let's define inet_sk_init_flowi4() which does all these operations.
The first user is inet_sk_rebuild_header(), where inet_sk_init_flowi4() replaces ip_route_output_ports(). Unlike ip_route_output_ports(), which sets the flowi4 structure and performs the route lookup in one go, inet_sk_init_flowi4() only initialises the flow. The route lookup is then done by ip_route_output_flow(). Decoupling flow initialisation from route lookup makes this new interface applicable more broadly as it will allow some users to overwrite specific struct flowi4 members before the route lookup.
Signed-off-by: Guillaume Nault <[email protected]> Link: https://patch.msgid.link/fd416275262b1f518d5abfcef740ce4f4a1a6522.1734357769.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
29b54079 |
| 18-Dec-2024 |
Guillaume Nault <[email protected]> |
gre: Drop ip_route_output_gre().
We already have enough variants of ip_route_output*() functions. We don't need a GRE specific one in the generic route.h header file.
Furthermore, ip_route_output_g
gre: Drop ip_route_output_gre().
We already have enough variants of ip_route_output*() functions. We don't need a GRE specific one in the generic route.h header file.
Furthermore, ip_route_output_gre() is only used once, in ipgre_open(), where it can be easily replaced by a simple call to ip_route_output_key().
While there, and for clarity, explicitly set .flowi4_scope to RT_SCOPE_UNIVERSE instead of relying on the implicit zero initialisation.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Link: https://patch.msgid.link/ab7cba47b8558cd4bfe2dc843c38b622a95ee48e.1734527729.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7 |
|
| #
479aed04 |
| 07-Nov-2024 |
Menglong Dong <[email protected]> |
net: ip: make ip_route_use_hint() return drop reasons
In this commit, we make ip_route_use_hint() return drop reasons. The drop reasons that we return are similar to what we do in ip_route_input_slo
net: ip: make ip_route_use_hint() return drop reasons
In this commit, we make ip_route_use_hint() return drop reasons. The drop reasons that we return are similar to what we do in ip_route_input_slow(), and no drop reasons are added in this commit.
Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
50038bf3 |
| 07-Nov-2024 |
Menglong Dong <[email protected]> |
net: ip: make ip_route_input() return drop reasons
In this commit, we make ip_route_input() return skb drop reasons that come from ip_route_input_noref().
Meanwhile, adjust all the call to it.
Sig
net: ip: make ip_route_input() return drop reasons
In this commit, we make ip_route_input() return skb drop reasons that come from ip_route_input_noref().
Meanwhile, adjust all the call to it.
Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
82d9983e |
| 07-Nov-2024 |
Menglong Dong <[email protected]> |
net: ip: make ip_route_input_noref() return drop reasons
In this commit, we make ip_route_input_noref() return drop reasons, which come from ip_route_input_rcu().
We need adjust the callers of ip_r
net: ip: make ip_route_input_noref() return drop reasons
In this commit, we make ip_route_input_noref() return drop reasons, which come from ip_route_input_rcu().
We need adjust the callers of ip_route_input_noref() to make sure the return value of ip_route_input_noref() is used properly.
The errno that ip_route_input_noref() returns comes from ip_route_input and bpf_lwt_input_reroute in the origin logic, and we make them return -EINVAL on error instead. In the following patch, we will make ip_route_input() returns drop reasons too.
Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
d46f8270 |
| 07-Nov-2024 |
Menglong Dong <[email protected]> |
net: ip: make ip_mc_validate_source() return drop reason
Make ip_mc_validate_source() return drop reason, and adjust the call of it in ip_route_input_mc().
Another caller of it is ip_rcv_finish_cor
net: ip: make ip_mc_validate_source() return drop reason
Make ip_mc_validate_source() return drop reason, and adjust the call of it in ip_route_input_mc().
Another caller of it is ip_rcv_finish_core->udp_v4_early_demux, and the errno is not checked in detail, so we don't do more adjustment for it.
The drop reason "SKB_DROP_REASON_IP_LOCALNET" is added in this commit.
Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
48171c65 |
| 06-Nov-2024 |
Guillaume Nault <[email protected]> |
ipv4: Prepare ip_route_output() to future .flowi4_tos conversion.
Convert the "tos" parameter of ip_route_output() to dscp_t. This way we'll have a dscp_t value directly available when .flowi4_tos w
ipv4: Prepare ip_route_output() to future .flowi4_tos conversion.
Convert the "tos" parameter of ip_route_output() to dscp_t. This way we'll have a dscp_t value directly available when .flowi4_tos will eventually be converted to dscp_t.
All ip_route_output() callers but one set this "tos" parameter to 0 and therefore don't need to be adapted to the new prototype.
Only br_nf_pre_routing_finish() needs conversion. It can just use ip4h_dscp() to get the DSCP field from the IPv4 header.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://patch.msgid.link/0f10d031dd44c70aae9bc6e19391cb30d5c2fe71.1730928699.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
d3297640 |
| 07-Oct-2024 |
Guillaume Nault <[email protected]> |
ipv4: Convert ip_mc_validate_source() to dscp_t.
Pass a dscp_t variable to ip_mc_validate_source(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_m
ipv4: Convert ip_mc_validate_source() to dscp_t.
Pass a dscp_t variable to ip_mc_validate_source(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_mc_validate_source() to consider are:
* ip_route_input_mc() which already has a dscp_t variable to pass as parameter. We just need to remove the inet_dscp_to_dsfield() conversion.
* udp_v4_early_demux() which gets the DSCP directly from the IPv4 header and can simply use the ip4h_dscp() helper.
Also, stop including net/inet_dscp.h in udp.c as we don't use any of its declarations anymore.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/c91b2cca04718b7ee6cf5b9c1d5b40507d65a8d4.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
2b78d306 |
| 07-Oct-2024 |
Guillaume Nault <[email protected]> |
ipv4: Convert ip_route_use_hint() to dscp_t.
Pass a dscp_t variable to ip_route_use_hint(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Only ip_rcv_finish_core
ipv4: Convert ip_route_use_hint() to dscp_t.
Pass a dscp_t variable to ip_route_use_hint(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Only ip_rcv_finish_core() actually calls ip_route_use_hint(). Use the ip4h_dscp() helper to get the DSCP from the IPv4 header.
While there, modify the declaration of ip_route_use_hint() in include/net/route.h so that it matches the prototype of its implementation in net/ipv4/route.c.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/c40994fdf804db7a363d04fdee01bf48dddda676.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc2 |
|
| #
66fb6386 |
| 01-Oct-2024 |
Guillaume Nault <[email protected]> |
ipv4: Convert ip_route_input_noref() to dscp_t.
Pass a dscp_t variable to ip_route_input_noref(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_rou
ipv4: Convert ip_route_input_noref() to dscp_t.
Pass a dscp_t variable to ip_route_input_noref(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input_noref() to consider are:
* arp_process() in net/ipv4/arp.c. This function sets the tos parameter to 0, which is already a valid dscp_t value, so it doesn't need to be adjusted for the new prototype.
* ip_route_input(), which already has a dscp_t variable to pass as parameter. We just need to remove the inet_dscp_to_dsfield() conversion.
* ipvlan_l3_rcv(), bpf_lwt_input_reroute(), ip_expire(), ip_rcv_finish_core(), xfrm4_rcv_encap_finish() and xfrm4_rcv_encap(), which get the DSCP directly from IPv4 headers and can simply use the ip4h_dscp() helper.
While there, declare the IPv4 header pointers as const in ipvlan_l3_rcv() and bpf_lwt_input_reroute(). Also, modify the declaration of ip_route_input_noref() in include/net/route.h so that it matches the prototype of its implementation in net/ipv4/route.c.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/a8a747bed452519c4d0cc06af32c7e7795d7b627.1727807926.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
7e863e5d |
| 01-Oct-2024 |
Guillaume Nault <[email protected]> |
ipv4: Convert ip_route_input() to dscp_t.
Pass a dscp_t variable to ip_route_input(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input() t
ipv4: Convert ip_route_input() to dscp_t.
Pass a dscp_t variable to ip_route_input(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input() to consider are:
* input_action_end_dx4_finish() and input_action_end_dt4() in net/ipv6/seg6_local.c. These functions set the tos parameter to 0, which is already a valid dscp_t value, so they don't need to be adjusted for the new prototype.
* icmp_route_lookup(), which already has a dscp_t variable to pass as parameter. We just need to remove the inet_dscp_to_dsfield() conversion.
* br_nf_pre_routing_finish(), ip_options_rcv_srr() and ip4ip6_err(), which get the DSCP directly from IPv4 headers. Define a helper to read the .tos field of struct iphdr as dscp_t, so that these function don't have to do the conversion manually.
While there, declare *iph as const in br_nf_pre_routing_finish(), declare its local variables in reverse-christmas-tree order and move the "err = ip_route_input()" assignment out of the conditional to avoid checkpatch warning.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://patch.msgid.link/e9d40781d64d3d69f4c79ac8a008b8d67a033e8d.1727807926.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
b261b2c6 |
| 29-Aug-2024 |
Ido Schimmel <[email protected]> |
xfrm: Unmask upper DSCP bits in xfrm_get_tos()
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain:
xfrm_bundl
xfrm: Unmask upper DSCP bits in xfrm_get_tos()
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain:
xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4)
Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value.
Remove IPTOS_RT_MASK since it is no longer used.
Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
ff95cb5e |
| 29-Aug-2024 |
Ido Schimmel <[email protected]> |
ipv4: Unmask upper DSCP bits in ip_sock_rt_tos()
The function is used to read the DS field that was stored in IPv4 sockets via the IP_TOS socket option so that it could be used to initialize the flo
ipv4: Unmask upper DSCP bits in ip_sock_rt_tos()
The function is used to read the DS field that was stored in IPv4 sockets via the IP_TOS socket option so that it could be used to initialize the flowi4_tos field before resolving an output route.
Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value.
Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
05d6d492 |
| 29-Apr-2024 |
Eric Dumazet <[email protected]> |
inet: introduce dst_rtable() helper
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper")
This patch does a similar change for IPv4.
Instead of (struct rtable *)d
inet: introduce dst_rtable() helper
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper")
This patch does a similar change for IPv4.
Instead of (struct rtable *)dst casts, we can use :
#define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst)
Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
5618603f |
| 10-Apr-2024 |
Guillaume Nault <[email protected]> |
ipv4: Remove RTO_ONLINK.
RTO_ONLINK was a flag used in ->flowi4_tos that allowed to alter the scope of an IPv4 route lookup. Setting this flag was equivalent to specifying RT_SCOPE_LINK in ->flowi4_
ipv4: Remove RTO_ONLINK.
RTO_ONLINK was a flag used in ->flowi4_tos that allowed to alter the scope of an IPv4 route lookup. Setting this flag was equivalent to specifying RT_SCOPE_LINK in ->flowi4_scope.
With commit ec20b2830093 ("ipv4: Set scope explicitly in ip_route_output()."), the last users of RTO_ONLINK have been removed. Therefore, we can now drop the code that checked this bit and stop modifying ->flowi4_scope in ip_route_output_key_hash().
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/57de760565cab55df7b129f523530ac6475865b2.1712754146.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc3 |
|
| #
ec20b283 |
| 05-Apr-2024 |
Guillaume Nault <[email protected]> |
ipv4: Set scope explicitly in ip_route_output().
Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local s
ipv4: Set scope explicitly in ip_route_output().
Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local scope.
This will allow converting flowi4_tos to dscp_t in the future, thus allowing static analysers to flag invalid interactions between "tos" (the DSCP bits) and ECN.
Only three users ask for local scope (bonding, arp and atm). The others continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn users about the limitations of ip_route_output().
Signed-off-by: Guillaume Nault <[email protected]> Acked-by: Leon Romanovsky <[email protected]> # infiniband Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4 |
|
| #
a3522a2e |
| 09-Feb-2024 |
Guillaume Nault <[email protected]> |
ipv4: Set the routing scope properly in ip_route_output_ports().
Set scope automatically in ip_route_output_ports() (using the socket SOCK_LOCALROUTE flag). This way, callers don't have to overload
ipv4: Set the routing scope properly in ip_route_output_ports().
Set scope automatically in ip_route_output_ports() (using the socket SOCK_LOCALROUTE flag). This way, callers don't have to overload the tos with the RTO_ONLINK flag, like RT_CONN_FLAGS() does.
For callers that don't pass a struct sock, this doesn't change anything as the scope is still set to RT_SCOPE_UNIVERSE when sk is NULL.
Callers that passed a struct sock and used RT_CONN_FLAGS(sk) or RT_CONN_FLAGS_TOS(sk, tos) for the tos are modified to use ip_sock_tos(sk) and RT_TOS(tos) respectively, as overloading tos with the RTO_ONLINK flag now becomes unnecessary.
In drivers/net/amt.c, all ip_route_output_ports() calls use a 0 tos parameter, ignoring the SOCK_LOCALROUTE flag of the socket. But the sk parameter is a kernel socket, which doesn't have any configuration path for setting SOCK_LOCALROUTE anyway. Therefore, ip_route_output_ports() will continue to initialise scope with RT_SCOPE_UNIVERSE and amt.c doesn't need to be modified.
Also, remove RT_CONN_FLAGS() and RT_CONN_FLAGS_TOS() from route.h as these macros are now unused.
The objective is to eventually remove RTO_ONLINK entirely to allow converting ->flowi4_tos to dscp_t. This will ensure proper isolation between the DSCP and ECN bits, thus minimising the risk of introducing bugs where TOS values interfere with ECN.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/dacfd2ab40685e20959ab7b53c427595ba229e7d.1707496938.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
bf3fcbf7 |
| 16-Oct-2023 |
Beniamino Galvani <[email protected]> |
ipv4: rename and move ip_route_output_tunnel()
At the moment ip_route_output_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function need
ipv4: rename and move ip_route_output_tunnel()
At the moment ip_route_output_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports.
Prepare for these changes by renaming the function to udp_tunnel_dst_lookup() and move it to file net/ipv4/udp_tunnel_core.c.
Suggested-by: Guillaume Nault <[email protected]> Signed-off-by: Beniamino Galvani <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3 |
|
| #
e08d0b3d |
| 22-Sep-2023 |
Eric Dumazet <[email protected]> |
inet: implement lockless IP_TOS
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
S
inet: implement lockless IP_TOS
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7 |
|
| #
4bd0623f |
| 16-Aug-2023 |
Eric Dumazet <[email protected]> |
inet: move inet->transparent to inet->inet_flags
IP_TRANSPARENT socket option can now be set/read without locking the socket.
v2: removed unused issk variable in mptcp_setsockopt_sol_ip_set_transpa
inet: move inet->transparent to inet->inet_flags
IP_TRANSPARENT socket option can now be set/read without locking the socket.
v2: removed unused issk variable in mptcp_setsockopt_sol_ip_set_transparent() v4: rebased after commit 3f326a821b99 ("mptcp: change the mpc check helper to return a sk")
Signed-off-by: Eric Dumazet <[email protected]> Cc: Paolo Abeni <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4 |
|
| #
3c5b4d69 |
| 28-Jul-2023 |
Eric Dumazet <[email protected]> |
net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-b
net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc3, v6.5-rc2 |
|
| #
8d6eba33 |
| 11-Jul-2023 |
Guillaume Nault <[email protected]> |
ipv4: Constify the sk parameter of ip_route_output_*().
These functions don't need to modify the socket, so let's allow the callers to pass a const struct sock *.
Signed-off-by: Guillaume Nault <gn
ipv4: Constify the sk parameter of ip_route_output_*().
These functions don't need to modify the socket, so let's allow the callers to pass a const struct sock *.
Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5 |
|
| #
3f06760c |
| 01-Jun-2023 |
Guillaume Nault <[email protected]> |
ipv4: Drop tos parameter from flowi4_update_output()
Callers of flowi4_update_output() never try to update ->flowi4_tos:
* ip_route_connect() updates ->flowi4_tos with its own current value.
ipv4: Drop tos parameter from flowi4_update_output()
Callers of flowi4_update_output() never try to update ->flowi4_tos:
* ip_route_connect() updates ->flowi4_tos with its own current value.
* ip_route_newports() has two users: tcp_v4_connect() and dccp_v4_connect. Both initialise fl4 with ip_route_connect(), which in turn sets ->flowi4_tos with RT_TOS(inet_sk(sk)->tos) and ->flowi4_scope based on SOCK_LOCALROUTE.
Then ip_route_newports() updates ->flowi4_tos with RT_CONN_FLAGS(sk), which is the same as RT_TOS(inet_sk(sk)->tos), unless SOCK_LOCALROUTE is set on the socket. In that case, the lowest order bit is set to 1, to eventually inform ip_route_output_key_hash() to restrict the scope to RT_SCOPE_LINK. This is equivalent to properly setting ->flowi4_scope as ip_route_connect() did.
* ip_vs_xmit.c initialises ->flowi4_tos with memset(0), then calls flowi4_update_output() with tos=0.
* sctp_v4_get_dst() uses the same RT_CONN_FLAGS_TOS() when initialising ->flowi4_tos and when calling flowi4_update_output().
In the end, ->flowi4_tos never changes. So let's just drop the tos parameter. This will simplify the conversion of ->flowi4_tos from __u8 to dscp_t.
Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4 |
|
| #
d288a162 |
| 23-Mar-2023 |
Wangyang Guo <[email protected]> |
net: dst: Prevent false sharing vs. dst_entry:: __refcnt
dst_entry::__refcnt is highly contended in scenarios where many connections happen from and to the same IP. The reference count is an atomic_
net: dst: Prevent false sharing vs. dst_entry:: __refcnt
dst_entry::__refcnt is highly contended in scenarios where many connections happen from and to the same IP. The reference count is an atomic_t, so the reference count operations have to take the cache-line exclusive.
Aside of the unavoidable reference count contention there is another significant problem which is caused by that: False sharing.
perf top identified two affected read accesses. dst_entry::lwtstate and rtable::rt_genid.
dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into a seperate cacheline vs. the read mostly members located at the beginning of the struct.
That prevents false sharing vs. the struct members in the first 64 bytes of the structure, but there is also
dst_entry::lwtstate
which is located after the reference count and in the same cache line. This member is read after a reference count has been acquired.
struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a size of 112 bytes, which means that the struct members of rtable which follow the dst member share the same cache line as dst_entry::__refcnt. Especially
rtable::rt_genid
is also read by the contexts which have a reference count acquired already.
When dst_entry:__refcnt is incremented or decremented via an atomic operation these read accesses stall. This was found when analysing the memtier benchmark in 1:100 mode, which amplifies the problem extremly.
Move the rt[6i]_uncached[_list] members out of struct rtable and struct rt6_info into struct dst_entry to provide padding and move the lwtstate member after that so it ends up in the same cache line.
The resulting improvement depends on the micro-architecture and the number of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached benchmark.
[ tglx: Rearrange struct ]
Signed-off-by: Wangyang Guo <[email protected]> Signed-off-by: Arjan van de Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|