ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs  OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  4.98  0.00  4.42  0.00 4235592      33 83.80  4720653 2149771 1235 247.32
  4.73  0.00  4.20  0.00 4025260      33 82.99  4724900 2139833 1204 247.32
  4.72  0.00  4.20  0.00 4035252      33 82.14  4719162 2132023 1264 247.32
  4.71  0.00  4.21  0.00 4073206      33 83.68  4744973 2123317 1347 247.32
  4.72  0.00  4.21  0.00 4061118      33 80.82  4713615 2188091 1490 247.32
  4.72  0.00  4.21  0.00 4051675      33 85.29  4727399 2109011 1205 247.32
  4.73  0.00  4.21  0.00 4039056      33 84.65  4724735 2102603 1053 247.32

After the patch:

InMpps OMpps InGbs  OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  5.43  0.00  4.20  0.00 3313143      33 84.96  5434214 1900162 2656 245.51
  5.43  0.00  4.20  0.00 3308527      33 85.24  5439695 1809382 2521 245.51
  5.42  0.00  4.19  0.00 3316778      33 87.54  5416028 1805835 2256 245.51
  5.42  0.00  4.19  0.00 3317673      33 90.44  5426044 1763056 2332 245.51
  5.42  0.00  4.19  0.00 3314839      33 88.11  5435732 1792218 2499 245.52
  5.44  0.00  4.19  0.00 3293228      33 91.84  5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after
the patch.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366
sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
sys/dev: use our nitems() macro when it is available through param.h.

No functional change, only trivial cases are done in this sweep.
Drivers that can get further enhancements will be done independently.

Discussed in: freebsd-current
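As a rough illustration of the kind of trivial conversion this sweep describes, the sketch below iterates over a device table with nitems() instead of an open-coded sizeof division. The table, the IDs and the names example_dev, example_devs and example_lookup are hypothetical and do not come from any driver; the fallback macro is only there so the sketch builds outside the kernel.

```c
#include <stddef.h>

/* Userland fallback; in the kernel, nitems() comes from <sys/param.h>. */
#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical device table; the IDs and names are made up. */
struct example_dev {
	unsigned short	vendor;
	unsigned short	device;
	const char	*name;
};

static const struct example_dev example_devs[] = {
	{ 0x1234, 0x0001, "Example NIC A" },
	{ 0x1234, 0x0002, "Example NIC B" },
	{ 0x5678, 0x0010, "Example NIC C" },
};

/* Return the matching name, or NULL when the IDs are not in the table. */
const char *
example_lookup(unsigned short vendor, unsigned short device)
{
	size_t i;

	/* nitems() replaces sizeof(example_devs) / sizeof(example_devs[0]). */
	for (i = 0; i < nitems(example_devs); i++) {
		if (example_devs[i].vendor == vendor &&
		    example_devs[i].device == device)
			return (example_devs[i].name);
	}
	return (NULL);
}
```

The loop bound follows the table automatically when entries are added or removed, which is why such conversions carry no functional change.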
Fix variable assignment.

Found by: PVS-Studio
Mechanically convert to if_inc_counter().
Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by: Nginx, Inc.
Fix various NIC drivers to properly cleanup static DMA resources.
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.

In many cases these bugs could result in leaks when a driver was detached.

Reviewed by: yongari
MFC after: 2 weeks
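The teardown rule described above can be sketched in userland as follows. This is only a model of the decision logic, not driver code: struct fake_dma_res and the fake_* helpers are hypothetical stand-ins, with fake_unload() and fake_free() marking the places where a driver would call bus_dmamap_unload() and bus_dmamem_free().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one statically allocated DMA resource. */
struct fake_dma_res {
	void		*vaddr;		/* non-NULL once the allocation succeeded */
	uint64_t	paddr;		/* non-zero once the load succeeded */
	int		unloads;	/* bookkeeping for this sketch only */
	int		frees;		/* bookkeeping for this sketch only */
};

/* Marks the spot where a driver would call bus_dmamap_unload(). */
static void
fake_unload(struct fake_dma_res *r)
{
	r->unloads++;
}

/* Marks the spot where a driver would call bus_dmamem_free(). */
static void
fake_free(struct fake_dma_res *r)
{
	r->frees++;
}

/*
 * Tear down a resource by looking at the addresses, never at the map:
 * a non-zero bus address means the load succeeded, a non-NULL virtual
 * address means the allocation succeeded.
 */
void
fake_dma_teardown(struct fake_dma_res *r)
{
	if (r->paddr != 0) {
		fake_unload(r);
		r->paddr = 0;
	}
	if (r->vaddr != NULL) {
		fake_free(r);
		r->vaddr = NULL;
	}
}
```

Clearing both addresses after release makes the teardown safe to call again, for example from both an attach failure path and detach.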
The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI
command register. The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR. Thus, the bit is no longer
a reliable indication of capability, and should not be checked. This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.

This change fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.

Submitted by: jhb
Reviewed by: jfv, marius, achadd, achim
MFC after: 1 day
Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
Remove duplicate const specifiers in many drivers (I hope I got all of
them, please let me know if not). Most of these are of the form:

static const struct bzzt_type {
      [...list of members...]
} const bzzt_devs[] = {
      [...list of initializers...]
};

The second const is unnecessary, as arrays cannot be modified anyway,
and if the elements are const, the whole thing is const automatically
(e.g. it is placed in .rodata).

I have verified this does not change the binary output of a full kernel
build (except for build timestamps embedded in the object files).

Reviewed by: yongari, marius
MFC after: 1 week
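For illustration, a filled-in version of the cleaned-up form might look like the sketch below. The members, the initializers and the helpers bzzt_count() and bzzt_name_of() are hypothetical; only the shape of the declaration follows the commit message.

```c
#include <stddef.h>

/*
 * Form before the cleanup, with the redundant second qualifier
 * (compilers may warn about the repeated const):
 *
 *   static const struct bzzt_type { ... } const bzzt_devs[] = { ... };
 *
 * Form after the cleanup: a single const on the element type already
 * makes every array element read-only.
 */
static const struct bzzt_type {
	unsigned short	bzzt_id;	/* hypothetical member */
	const char	*bzzt_name;	/* hypothetical member */
} bzzt_devs[] = {
	{ 0x0001, "bzzt one" },
	{ 0x0002, "bzzt two" },
};

/* Number of entries in the table. */
size_t
bzzt_count(void)
{
	return (sizeof(bzzt_devs) / sizeof(bzzt_devs[0]));
}

/* Look up a name by ID; returns NULL when the ID is unknown. */
const char *
bzzt_name_of(unsigned short id)
{
	size_t i;

	for (i = 0; i < bzzt_count(); i++) {
		if (bzzt_devs[i].bzzt_id == id)
			return (bzzt_devs[i].bzzt_name);
	}
	return (NULL);
}
```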
ether_ifattach() sets if_mtu to ETHERMTU, don't bother set it again

Reviewed by: yongari
s,KOBJMETHOD_END,DEVMETHOD_END,g in order to fully hide the explicit mention
of kobj(9) from device drivers.
- Import the common MII bitbang'ing code from NetBSD and convert drivers to
  take advantage of it instead of duplicating it. This reduces the size of
  the i386 GENERIC kernel by about 4k. The only potential in-tree user left
  unconverted is xe(4), which generally should be changed to use miibus(4)
  instead of implementing PHY handling on its own, as otherwise it makes not
  much sense to add a dependency on miibus(4)/mii_bitbang(4) to xe(4) just
  for the MII bitbang'ing code. The common MII bitbang'ing code also is
  useful in the embedded space for using GPIO pins to implement MII access.
- Based on lessons learnt with dc(4) (see r185750), add bus barriers to the
  MII bitbang read and write functions of the other drivers converted in
  order to ensure the intended ordering. Given that register access via an
  index register as well as register bank/window switching is subject to the
  same problem, also add bus barriers to the respective functions of smc(4),
  tl(4) and xl(4).
- Sprinkle some const.

Thanks to the following testers:
Andrew Bliznak (nge(4)), nwhitehorn@ (bm(4)), yongari@ (sis(4) and ste(4))
Thanks to Hans-Joerg Sirtl for supplying hardware to test stge(4).

Reviewed by: yongari (subset of drivers)
Obtained from: NetBSD (partially)
Prefer KOBJMETHOD_END.
Allocate the DMA memory shared between the host and the controller as
coherent.

MFC after: 2 weeks
o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control
  support in mii(4):
  - Merge generic flow control advertisement (which can be enabled by
    passing MIIF_DOPAUSE to mii_attach(9)) and parsing support from
    NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
    IFM_FLOW isn't implemented as a global option via the "don't care
    mask" but instead as a media specific option. This has the following
    advantages:
    o allows flow control advertisement with autonegotiation to be
      turned on and off via ifconfig(8) with the default typically
      being off (though MIIF_FORCEPAUSE has been added causing flow
      control to be always advertised, allowing to easily MFC this
      change for drivers that previously used home-grown support for
      flow control that behaved that way without breaking POLA)
    o allows to deal with PHY drivers where flow control advertisement
      with manual selection doesn't work or at least isn't implemented,
      like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
      by setting MIIF_NOMANPAUSE
    o the available combinations of media options are readily available
      from the `ifconfig -m` output
  - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
    and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
    these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
  - Change IFM_ETH_MASTER from being implemented as a global option via
    the "don't care mask" to a media specific one as it actually is only
    applicable to IFM_1000_T to date.
  - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
    actually configure manually selected slave mode (like we also do in
    the PHY specific implementations).
  - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
    is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with
  brgphy(4), e1000phy(4) and ip1000phy(4) to use the generic flow
  control support instead of home-grown solutions via IFM_FLAGs.
  This includes changing these PHY drivers and smcphy(4) to no longer
  unconditionally advertise support for flow control but only if the
  selected media has IFM_FLOW set (or MIIF_FORCEPAUSE is set) and
  implemented for these media variants, i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report
  and set IFM_1000_T master mode via IFM_ETH_MASTER instead of via
  IFF_LINK0 and some IFM_FLAGn.
o Switch brgphy(4) to add at least the supported copper media based on
  the contents of the BMSR via mii_phy_add_media() instead of hardcoding
  them. The latter approach seems to have developed historically,
  besides causing unnecessary code duplication it was also undesirable
  because brgphy_mii_phy_auto() already based the capability
  advertisement on the contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
  just BCM5701. Apparently this was a misinterpretation of a workaround
  in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
  BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
  this doesn't mean we can't set these as well on other PHYs for manual
  media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER
  so IFM_1000_T master mode support now is generally available with all
  PHY drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
  not applicable there.

Reviewed by: yongari (plus additional testing)
Obtained from: NetBSD (partially), OpenBSD (partially)
MFC after: 2 weeks
Convert the PHY drivers to honor the mii_flags passed down and convert
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
  addresses; we now let mii_attach() only probe the PHY at the desired
  address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
  off from, partly even based on grabbing and using the softc of the
  parent; we now pass these flags down from the NIC to the PHY drivers
  via mii_attach(). This got us rid of all such hacks except those of
  brgphy(4) in combination with bce(4) and bge(4), which is way beyond
  what can be expressed with simple flags.

While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).

Reviewed by: jhb, yongari
Fix build breakage introduced in r212972.
Remove unnecessary controller reinitialization.

PR: kern/87506
The NetBSD Foundation has granted permission to remove clause 3 and 4 from
their software.

Obtained from: NetBSD
Take a step towards removing if_watchdog/if_timer. Don't explicitly set
if_watchdog/if_timer to NULL/0 when initializing an ifnet. if_alloc()
sets those members to NULL/0 already.
Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by: re (kib)
MFC after: 6 weeks
When user_frac in the polling subsystem is low it is going to busy the
CPU for a longer period than necessary. Additionally, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in a proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.

In order to implement such a mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).

Bump __FreeBSD_version in order to signal such situation.

Reviewed by: emaste
Sponsored by: Sandvine Incorporated
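The handler change described above, returning the number of packets processed, might be modelled in userland roughly as below. This is a sketch of the idea only: struct fake_ring and fake_poll() are hypothetical, and the real polling handler takes different arguments.

```c
/* Hypothetical receive ring: just a count of packets waiting. */
struct fake_ring {
	int	pending;
};

/*
 * Process at most 'count' packets and report how many were actually
 * handled, so that a caller could stop polling once the ring is empty
 * instead of spending the whole tick on it.
 */
int
fake_poll(struct fake_ring *ring, int count)
{
	int done = 0;

	while (done < count && ring->pending > 0) {
		ring->pending--;	/* stands in for handling one packet */
		done++;
	}
	return (done);
}
```

A return value smaller than the requested count tells the caller that no more packets are available, which is the information the previous void handler could not convey.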
Use m_collapse(9) to collapse mbuf chains instead of relying on the
shortest possible chain of mbufs of m_defrag(9). What we want is
chains of mbufs that can be safely stored to a Tx descriptor which
can have up to STGE_MAXTXSEGS mbufs. The ethernet controller does
not need to align Tx buffers on a 32bit boundary. So the use of
m_defrag(9) was a waste of time.