History log of /freebsd-13.1/sys/netinet/ip_input.c (Results 1 – 25 of 503)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: release/12.4.0, release/13.1.0
# a7e7700f 26-Dec-2021 Bjoern A. Zeeb <[email protected]>

IPv4: fix redirect sending conditions

RFC792,1009,1122 state the original conditions for sending a redirect.
RFC1812 further refine these.
ip_forward() still sepcifies the checks originally implemen

IPv4: fix redirect sending conditions

RFC792,1009,1122 state the original conditions for sending a redirect.
RFC1812 further refine these.
ip_forward() still sepcifies the checks originally implemented for these
(we do slightly more/different than suggested as makes sense).
The implementation added in 8ad114c082a159c0dde95aa35d2e3e108aa30a75
to ip_tryforward() however is flawed and may send a "multi-hop"
redirects (to a host not on the directly connected network).

Do proper checks in ip_tryforward() to stop us from sending redirects
in situations we may not. Keep as much logic out of ip_tryforward()
and in ip_redir_alloc() and only do the mbuf copy once we are sure we
will send a redirect.

While here enhance and fix comments as to which conditions are handled
for sending redirects in various places.

Reported by: pi (on net@ 2021-12-04)
Sponsored by: Dr.-Ing. Nepustil & Co. GmbH
Reviewed by: cy, others (earlier versions)
Differential Revision: https://reviews.freebsd.org/D33274

(cherry picked from commit f389439f50fc4c27d15d3017b622270e25ba71c7)

show more ...


Revision tags: release/12.3.0
# e8df60a6 22-Aug-2021 Zhenlei Huang <[email protected]>

routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).

Implement kernel support for RFC 5549/8950.

* Relax control plane restrictions and allow specifying IPv6 gateways
for IPv4 routes. T

routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).

Implement kernel support for RFC 5549/8950.

* Relax control plane restrictions and allow specifying IPv6 gateways
for IPv4 routes. This behavior is controlled by the
net.route.rib_route_ipv6_nexthop sysctl (on by default).

* Always pass final destination in ro->ro_dst in ip_forward().

* Use ro->ro_dst to exract packet family inside if_output() routines.
Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.

* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
It leverages recent lltable changes committed in c541bd368f86.

Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0

Differential Revision: https://reviews.freebsd.org/D30398

(cherry picked from commit 62e1a437f3285e785d9b35a476d36a469a90028d)

show more ...


# 7da8312f 18-May-2021 Zhenlei Huang <[email protected]>

Do not forward datagrams originated by link-local addresses

The current implement of ip_input() reject packets destined for
169.254.0.0/16, but not those original from 169.254.0.0/16 link-local
addr

Do not forward datagrams originated by link-local addresses

The current implement of ip_input() reject packets destined for
169.254.0.0/16, but not those original from 169.254.0.0/16 link-local
addresses.

Fix to fully respect RFC 3927 section 2.7.

PR: 255388
Reviewed by: donner, rgrimes, karels
Differential Revision: https://reviews.freebsd.org/D29968
Reviewed by: rgrimes, donner, karels, marcus, emaste
Differential Revision: https://reviews.freebsd.org/D30374

(cherry picked from commit 3d846e48227e2e78c1e7b35145f57353ffda56ba)
(cherry picked from commit 03b0505b8fe848f33f2f38fe89dd5538908c847e)

show more ...


Revision tags: release/13.0.0
# 8aafa7a0 08-Mar-2021 Alexander V. Chernikov <[email protected]>

Flush remaining routes from the routing table during VNET shutdown.

Summary:
This fixes rtentry leak for the cloned interfaces created inside the
VNET.

Loopback teardown order is `SI_SUB_INIT_IF`,

Flush remaining routes from the routing table during VNET shutdown.

Summary:
This fixes rtentry leak for the cloned interfaces created inside the
VNET.

Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.

Test Plan:
```
set_skip:set_skip_group_lo -> passed [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```

PR: 253998
Reported by: rashey at superbox.pl
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D29116

(cherry picked from commit b1d63265ac399112b3bca36c3d75df1a3c2c8102)

show more ...


# d1d941c5 29-Nov-2020 Alexander V. Chernikov <[email protected]>

Remove RADIX_MPATH config option.

ROUTE_MPATH is the new config option controlling new multipath routing
implementation. Remove the last pieces of RADIX_MPATH-related code and
the config option.

Remove RADIX_MPATH config option.

ROUTE_MPATH is the new config option controlling new multipath routing
implementation. Remove the last pieces of RADIX_MPATH-related code and
the config option.

Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D27244

show more ...


# 360d1232 13-Nov-2020 Ed Maste <[email protected]>

ip_fastfwd: style(9) tidy for r367628

Discussed with: gnn
MFC with: r367628


# 8ad114c0 12-Nov-2020 George V. Neville-Neil <[email protected]>

An earlier commit effectively turned out the fast forwading path
due to its lack of support for ICMP redirects. The following commit
adds redirects to the fastforward path, again allowing for decent

An earlier commit effectively turned out the fast forwading path
due to its lack of support for ICMP redirects. The following commit
adds redirects to the fastforward path, again allowing for decent
forwarding performance in the kernel.

Reviewed by: ae, melifaro
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")

show more ...


Revision tags: release/12.2.0
# 662c1305 01-Sep-2020 Mateusz Guzik <[email protected]>

net: clean up empty lines in .c and .h files


# d16a2e47 01-Jul-2020 Mark Johnston <[email protected]>

Fix a possible next-hop refcount leak when handling IPSec traffic.

It may be possible to fix this by deferring the lookup, but let's
keep the initial change simple to make MFCs easier.

PR: 246951

Fix a possible next-hop refcount leak when handling IPSec traffic.

It may be possible to fix this by deferring the lookup, but let's
keep the initial change simple to make MFCs easier.

PR: 246951
Reviewed by: melifaro
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25519

show more ...


Revision tags: release/11.4.0
# 4043ee3c 28-Apr-2020 Alexander V. Chernikov <[email protected]>

Convert rtalloc_mpath_fib() users to the new KPI.

New fib[46]_lookup() functions support multipath transparently.
Given that, switch the last rtalloc_mpath_fib() calls to
dib4_lookup() and eliminat

Convert rtalloc_mpath_fib() users to the new KPI.

New fib[46]_lookup() functions support multipath transparently.
Given that, switch the last rtalloc_mpath_fib() calls to
dib4_lookup() and eliminate the function itself.

Note: proper flowid generation (especially for the outbound traffic) is a
bigger topic and will be handled in a separate review.
This change leaves flowid generation intact.

Differential Revision: https://reviews.freebsd.org/D24595

show more ...


# 983066f0 25-Apr-2020 Alexander V. Chernikov <[email protected]>

Convert route caching to nexthop caching.

This change is build on top of nexthop objects introduced in r359823.

Nexthops are separate datastructures, containing all necessary information
to perfor

Convert route caching to nexthop caching.

This change is build on top of nexthop objects introduced in r359823.

Nexthops are separate datastructures, containing all necessary information
to perform packet forwarding such as gateway interface and mtu. Nexthops
are shared among the routes, providing more pre-computed cache-efficient
data while requiring less memory. Splitting the LPM code and the attached
data solves multiple long-standing problems in the routing layer,
drastically reduces the coupling with outher parts of the stack and allows
to transparently introduce faster lookup algorithms.

Route caching was (re)introduced to minimise (slow) routing lookups, allowing
for notably better performance for large TCP senders. Caching works by
acquiring rtentry reference, which is protected by per-rtentry mutex.
If the routing table is changed (checked by comparing the rtable generation id)
or link goes down, cache record gets withdrawn.

Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nextop as a
cached object, which is mostly mechanical. Other moving parts like cache
cleanup on rtable change remains the same.

Differential Revision: https://reviews.freebsd.org/D24340

show more ...


# c012cfe6 27-Mar-2020 Ed Maste <[email protected]>

sys/netinet: remove spurious doubled ;s


# 7029da5c 26-Feb-2020 Pawel Biernacki <[email protected]>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly mark

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718

show more ...


# 481be5de 12-Feb-2020 Randall Stewart <[email protected]>

White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by: Netflix Inc.


Revision tags: release/12.1.0
# b8a6e03f 07-Oct-2019 Gleb Smirnoff <[email protected]>

Widen NET_EPOCH coverage.

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex cover

Widen NET_EPOCH coverage.

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111

show more ...


Revision tags: release/11.3.0
# 6c1c6ae5 04-Apr-2019 Rodney W. Grimes <[email protected]>

Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code

There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter

Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code

There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros. Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by: karels, kristof
Approved by: bde (mentor)
MFC after: 3 weeks
Sponsored by: John Gilmore
Differential Revision: https://reviews.freebsd.org/D19317

show more ...


# b252313f 31-Jan-2019 Gleb Smirnoff <[email protected]>

New pfil(9) KPI together with newborn pfil API and control utility.

The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) intern

New pfil(9) KPI together with newborn pfil API and control utility.

The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision: https://reviews.freebsd.org/D18951

show more ...


# a68cc388 09-Jan-2019 Gleb Smirnoff <[email protected]>

Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro

Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with: jtl, gallatin

show more ...


Revision tags: release/12.0.0
# 1a5995cc 27-Oct-2018 Eugene Grosbein <[email protected]>

Prevent ip_input() from panicing due to unprotected access to INADDR_HASH.

PR: 220078
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12457
Tested-by: Cassiano Peixoto and

Prevent ip_input() from panicing due to unprotected access to INADDR_HASH.

PR: 220078
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12457
Tested-by: Cassiano Peixoto and others

show more ...


# 62484790 14-Aug-2018 Andrey V. Elsukov <[email protected]>

Restore ability to send ICMP and ICMPv6 redirects.

It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled
only when sending redirects for corresponding IP version is disabled via

Restore ability to send ICMP and ICMPv6 redirects.

It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled
only when sending redirects for corresponding IP version is disabled via
sysctl. Otherwise will be used default forwarding function.

PR: 221137
Submitted by: mckay@
MFC after: 2 weeks

show more ...


# 6040822c 30-Jul-2018 Alan Somers <[email protected]>

Make timespecadd(3) and friends public

The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have

Make timespecadd(3) and friends public

The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725

show more ...


# 5f901c92 24-Jul-2018 Andrew Turner <[email protected]>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


Revision tags: release/11.2.0
# 4f6c66cc 23-May-2018 Matt Macy <[email protected]>

UDP: further performance improvements on tx

Cumulative throughput while running 64
netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput incre

UDP: further performance improvements on tx

Cumulative throughput while running 64
netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15409

show more ...


# d7c5a620 18-May-2018 Matt Macy <[email protected]>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32

After the patch

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366

show more ...


# effaab88 23-Mar-2018 Kristof Provost <[email protected]>

netpfil: Introduce PFIL_FWD flag

Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue

netpfil: Introduce PFIL_FWD flag

Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by: ae, kevans
Differential Revision: https://reviews.freebsd.org/D13715

show more ...


12345678910>>...21