|
Revision tags: release/12.4.0, release/13.1.0 |
|
| #
a7e7700f |
| 26-Dec-2021 |
Bjoern A. Zeeb <[email protected]> |
IPv4: fix redirect sending conditions
RFC792,1009,1122 state the original conditions for sending a redirect. RFC1812 further refine these. ip_forward() still sepcifies the checks originally implemen
IPv4: fix redirect sending conditions
RFC792,1009,1122 state the original conditions for sending a redirect. RFC1812 further refine these. ip_forward() still sepcifies the checks originally implemented for these (we do slightly more/different than suggested as makes sense). The implementation added in 8ad114c082a159c0dde95aa35d2e3e108aa30a75 to ip_tryforward() however is flawed and may send a "multi-hop" redirects (to a host not on the directly connected network).
Do proper checks in ip_tryforward() to stop us from sending redirects in situations we may not. Keep as much logic out of ip_tryforward() and in ip_redir_alloc() and only do the mbuf copy once we are sure we will send a redirect.
While here enhance and fix comments as to which conditions are handled for sending redirects in various places.
Reported by: pi (on net@ 2021-12-04) Sponsored by: Dr.-Ing. Nepustil & Co. GmbH Reviewed by: cy, others (earlier versions) Differential Revision: https://reviews.freebsd.org/D33274
(cherry picked from commit f389439f50fc4c27d15d3017b622270e25ba71c7)
show more ...
|
|
Revision tags: release/12.3.0 |
|
| #
e8df60a6 |
| 22-Aug-2021 |
Zhenlei Huang <[email protected]> |
routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).
Implement kernel support for RFC 5549/8950.
* Relax control plane restrictions and allow specifying IPv6 gateways for IPv4 routes. T
routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).
Implement kernel support for RFC 5549/8950.
* Relax control plane restrictions and allow specifying IPv6 gateways for IPv4 routes. This behavior is controlled by the net.route.rib_route_ipv6_nexthop sysctl (on by default).
* Always pass final destination in ro->ro_dst in ip_forward().
* Use ro->ro_dst to exract packet family inside if_output() routines. Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.
* Pass extracted family to nd6_resolve() to get the LLE with proper encap. It leverages recent lltable changes committed in c541bd368f86.
Presence of the functionality can be checked using ipv4_rfc5549_support feature(3). Example usage: route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0
Differential Revision: https://reviews.freebsd.org/D30398
(cherry picked from commit 62e1a437f3285e785d9b35a476d36a469a90028d)
show more ...
|
| #
7da8312f |
| 18-May-2021 |
Zhenlei Huang <[email protected]> |
Do not forward datagrams originated by link-local addresses
The current implement of ip_input() reject packets destined for 169.254.0.0/16, but not those original from 169.254.0.0/16 link-local addr
Do not forward datagrams originated by link-local addresses
The current implement of ip_input() reject packets destined for 169.254.0.0/16, but not those original from 169.254.0.0/16 link-local addresses.
Fix to fully respect RFC 3927 section 2.7.
PR: 255388 Reviewed by: donner, rgrimes, karels Differential Revision: https://reviews.freebsd.org/D29968 Reviewed by: rgrimes, donner, karels, marcus, emaste Differential Revision: https://reviews.freebsd.org/D30374
(cherry picked from commit 3d846e48227e2e78c1e7b35145f57353ffda56ba) (cherry picked from commit 03b0505b8fe848f33f2f38fe89dd5538908c847e)
show more ...
|
|
Revision tags: release/13.0.0 |
|
| #
8aafa7a0 |
| 08-Mar-2021 |
Alexander V. Chernikov <[email protected]> |
Flush remaining routes from the routing table during VNET shutdown.
Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET.
Loopback teardown order is `SI_SUB_INIT_IF`,
Flush remaining routes from the routing table during VNET shutdown.
Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET.
Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
Test Plan: ``` set_skip:set_skip_group_lo -> passed [0.053s] tail -n 200 /var/log/messages | grep rtentry ```
PR: 253998 Reported by: rashey at superbox.pl Reviewed By: kp Differential Revision: https://reviews.freebsd.org/D29116
(cherry picked from commit b1d63265ac399112b3bca36c3d75df1a3c2c8102)
show more ...
|
| #
d1d941c5 |
| 29-Nov-2020 |
Alexander V. Chernikov <[email protected]> |
Remove RADIX_MPATH config option.
ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option.
Remove RADIX_MPATH config option.
ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option.
Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244
show more ...
|
| #
360d1232 |
| 13-Nov-2020 |
Ed Maste <[email protected]> |
ip_fastfwd: style(9) tidy for r367628
Discussed with: gnn MFC with: r367628
|
| #
8ad114c0 |
| 12-Nov-2020 |
George V. Neville-Neil <[email protected]> |
An earlier commit effectively turned out the fast forwading path due to its lack of support for ICMP redirects. The following commit adds redirects to the fastforward path, again allowing for decent
An earlier commit effectively turned out the fast forwading path due to its lack of support for ICMP redirects. The following commit adds redirects to the fastforward path, again allowing for decent forwarding performance in the kernel.
Reviewed by: ae, melifaro Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
show more ...
|
|
Revision tags: release/12.2.0 |
|
| #
662c1305 |
| 01-Sep-2020 |
Mateusz Guzik <[email protected]> |
net: clean up empty lines in .c and .h files
|
| #
d16a2e47 |
| 01-Jul-2020 |
Mark Johnston <[email protected]> |
Fix a possible next-hop refcount leak when handling IPSec traffic.
It may be possible to fix this by deferring the lookup, but let's keep the initial change simple to make MFCs easier.
PR: 246951
Fix a possible next-hop refcount leak when handling IPSec traffic.
It may be possible to fix this by deferring the lookup, but let's keep the initial change simple to make MFCs easier.
PR: 246951 Reviewed by: melifaro MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25519
show more ...
|
|
Revision tags: release/11.4.0 |
|
| #
4043ee3c |
| 28-Apr-2020 |
Alexander V. Chernikov <[email protected]> |
Convert rtalloc_mpath_fib() users to the new KPI.
New fib[46]_lookup() functions support multipath transparently. Given that, switch the last rtalloc_mpath_fib() calls to dib4_lookup() and eliminat
Convert rtalloc_mpath_fib() users to the new KPI.
New fib[46]_lookup() functions support multipath transparently. Given that, switch the last rtalloc_mpath_fib() calls to dib4_lookup() and eliminate the function itself.
Note: proper flowid generation (especially for the outbound traffic) is a bigger topic and will be handled in a separate review. This change leaves flowid generation intact.
Differential Revision: https://reviews.freebsd.org/D24595
show more ...
|
| #
983066f0 |
| 25-Apr-2020 |
Alexander V. Chernikov <[email protected]> |
Convert route caching to nexthop caching.
This change is build on top of nexthop objects introduced in r359823.
Nexthops are separate datastructures, containing all necessary information to perfor
Convert route caching to nexthop caching.
This change is build on top of nexthop objects introduced in r359823.
Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms.
Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn.
Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same.
Differential Revision: https://reviews.freebsd.org/D24340
show more ...
|
| #
c012cfe6 |
| 27-Mar-2020 |
Ed Maste <[email protected]> |
sys/netinet: remove spurious doubled ;s
|
| #
7029da5c |
| 26-Feb-2020 |
Pawel Biernacki <[email protected]> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly mark
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
show more ...
|
| #
481be5de |
| 12-Feb-2020 |
Randall Stewart <[email protected]> |
White space cleanup -- remove trailing tab's or spaces from any line.
Sponsored by: Netflix Inc.
|
|
Revision tags: release/12.1.0 |
|
| #
b8a6e03f |
| 07-Oct-2019 |
Gleb Smirnoff <[email protected]> |
Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex cover
Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier.
On output path we already enter epoch quite early - in the ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111
show more ...
|
|
Revision tags: release/11.3.0 |
|
| #
6c1c6ae5 |
| 04-Apr-2019 |
Rodney W. Grimes <[email protected]> |
Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code
There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter
Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code
There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage.
Reviewed by: karels, kristof Approved by: bde (mentor) MFC after: 3 weeks Sponsored by: John Gilmore Differential Revision: https://reviews.freebsd.org/D19317
show more ...
|
| #
b252313f |
| 31-Jan-2019 |
Gleb Smirnoff <[email protected]> |
New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) intern
New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses.
In nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together.
New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters.
Now it possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interface would pride their filtering points.
Another future feature is possiblity to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated.
Differential Revision: https://reviews.freebsd.org/D18951
show more ...
|
| #
a68cc388 |
| 09-Jan-2019 |
Gleb Smirnoff <[email protected]> |
Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro
Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively.
Discussed with: jtl, gallatin
show more ...
|
|
Revision tags: release/12.0.0 |
|
| #
1a5995cc |
| 27-Oct-2018 |
Eugene Grosbein <[email protected]> |
Prevent ip_input() from panicing due to unprotected access to INADDR_HASH.
PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and
Prevent ip_input() from panicing due to unprotected access to INADDR_HASH.
PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others
show more ...
|
| #
62484790 |
| 14-Aug-2018 |
Andrey V. Elsukov <[email protected]> |
Restore ability to send ICMP and ICMPv6 redirects.
It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled only when sending redirects for corresponding IP version is disabled via
Restore ability to send ICMP and ICMPv6 redirects.
It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled only when sending redirects for corresponding IP version is disabled via sysctl. Otherwise will be used default forwarding function.
PR: 221137 Submitted by: mckay@ MFC after: 2 weeks
show more ...
|
| #
6040822c |
| 30-Jul-2018 |
Alan Somers <[email protected]> |
Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have
Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have grown at least 28 syscalls that use timespecs in some way, leading many programs both inside and outside of the base system to redefine those macros. It's better just to make the definitions public.
Our kernel currently defines two-argument versions of timespecadd and timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define three-argument versions. Solaris also defines a three-argument version, but only in its kernel. This revision changes our definition to match the common three-argument version.
Bump _FreeBSD_version due to the breaking KPI change.
Discussed with: cem, jilles, ian, bde Differential Revision: https://reviews.freebsd.org/D14725
show more ...
|
| #
5f901c92 |
| 24-Jul-2018 |
Andrew Turner <[email protected]> |
Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables.
Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147
|
|
Revision tags: release/11.2.0 |
|
| #
4f6c66cc |
| 23-May-2018 |
Matt Macy <[email protected]> |
UDP: further performance improvements on tx
Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps
Single stream throughput incre
UDP: further performance improvements on tx
Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps
Single stream throughput increases from 910kpps to 1.18Mpps
Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg
- Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg
- Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg
- Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg
A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW.
Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409
show more ...
|
| #
d7c5a620 |
| 18-May-2018 |
Matt Macy <[email protected]> |
ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@
gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I
ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@
gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace.
When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch:
InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32
After the patch
InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52
Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch
Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366
show more ...
|
| #
effaab88 |
| 23-Mar-2018 |
Kristof Provost <[email protected]> |
netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue
netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf.
Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded.
Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715
show more ...
|