mxge: replace 65536 with IP_MAXPACKET in tso settings.
mxge: choose appropriate values for hw tso
mxge: Add SIOCGI2C support for devices with SFP/XFP cages
mxge: fix panic at module unload

r333175 (multicast changes) exposed a bug where mxge was not checking to see if the driver was being unloaded while handling ioctls that touch hardware. As a result, now that in6m_disconnect() is run from an async gtaskq, it was busy-waiting in mxge_send_cmd() while the mcast list was destroyed.

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace.

When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  4.98  0.00  4.42 0.00 4235592      33 83.80  4720653 2149771 1235 247.32
  4.73  0.00  4.20 0.00 4025260      33 82.99  4724900 2139833 1204 247.32
  4.72  0.00  4.20 0.00 4035252      33 82.14  4719162 2132023 1264 247.32
  4.71  0.00  4.21 0.00 4073206      33 83.68  4744973 2123317 1347 247.32
  4.72  0.00  4.21 0.00 4061118      33 80.82  4713615 2188091 1490 247.32
  4.72  0.00  4.21 0.00 4051675      33 85.29  4727399 2109011 1205 247.32
  4.73  0.00  4.21 0.00 4039056      33 84.65  4724735 2102603 1053 247.32

After the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  5.43  0.00  4.20 0.00 3313143      33 84.96  5434214 1900162 2656 245.51
  5.43  0.00  4.20 0.00 3308527      33 85.24  5439695 1809382 2521 245.51
  5.42  0.00  4.19 0.00 3316778      33 87.54  5416028 1805835 2256 245.51
  5.42  0.00  4.19 0.00 3317673      33 90.44  5426044 1763056 2332 245.51
  5.42  0.00  4.19 0.00 3314839      33 88.11  5435732 1792218 2499 245.52
  5.44  0.00  4.19 0.00 3293228      33 91.84  5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366

mxge(4) should pass unhandled ioctls to ether_ioctl()

Panasas discovered that ioctl(SIOCGLAGGPORT) returns ENOTTY for mxge(4) when the NIC is not a member of a lagg. This came as a surprise, because the SIOCGLAGGPORT handler in if_lagg.c only returns ENOENT (if run against the laggX interface, rather than a physical port) or EINVAL (if run against a non-member physical port). This behavior was not seen with other drivers, such as bge(4), igb(4), and cxl(4). When I compared their respective ioctl handlers, I found that they all called ether_ioctl() for the default (i.e. unhandled) case; by contrast, mxge(4) only calls ether_ioctl() for two specific cases, and returns ENOTTY for the default case.

Remove the two cases which explicitly call ether_ioctl(), and let the default case call it instead. This matches what the vast majority of the NIC drivers do.

Reviewed by: kmacy
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14381

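The dispatch change described in this entry can be illustrated with a small userland sketch. This is not the driver's code: the command codes, `generic_ioctl()` (a stand-in for ether_ioctl()), and both handler names are hypothetical, chosen only to show how a `default` case that returns ENOTTY hides commands the generic layer would have answered.

```c
#include <errno.h>

/* Hypothetical command codes; real SIOC* values come from kernel headers. */
enum { CMD_SET_MTU = 1, CMD_SET_ADDR = 2, CMD_GET_LAGG_PORT = 3 };

/* Stand-in for ether_ioctl(): the generic layer handles commands the
 * driver itself knows nothing about. */
int generic_ioctl(int cmd)
{
    switch (cmd) {
    case CMD_SET_ADDR:
        return 0;
    case CMD_GET_LAGG_PORT:
        return EINVAL;          /* e.g. port is not a lagg member */
    default:
        return ENOTTY;
    }
}

/* Old shape: only selected cases are forwarded; the rest get ENOTTY. */
int driver_ioctl_old(int cmd)
{
    switch (cmd) {
    case CMD_SET_MTU:
        return 0;               /* handled by the driver */
    case CMD_SET_ADDR:
        return generic_ioctl(cmd);
    default:
        return ENOTTY;          /* generic layer never sees the command */
    }
}

/* New shape: everything the driver does not handle is forwarded. */
int driver_ioctl_new(int cmd)
{
    switch (cmd) {
    case CMD_SET_MTU:
        return 0;               /* handled by the driver */
    default:
        return generic_ioctl(cmd);
    }
}
```

With these stand-ins, `driver_ioctl_old(CMD_GET_LAGG_PORT)` yields ENOTTY while `driver_ioctl_new(CMD_GET_LAGG_PORT)` yields the generic layer's EINVAL.
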
Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler.

Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level.

Reported by: wosch
PR: 225197

Fix build after r327949.Reported by: Cy Schubert
dev: make some use of mallocarray(9).

Focus on code where we are doing multiplications within malloc(9). None of these is likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values.

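The concern behind replacing a multiplication inside malloc(9) is that `nmemb * size` can wrap around. A minimal userland sketch of that kind of check follows; `alloc_array()` is a hypothetical helper written for illustration and is not the kernel's mallocarray(9), which has a different signature and its own failure behavior.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userland helper: refuse the allocation when
 * nmemb * size would not fit in a size_t. */
void *alloc_array(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;            /* the product would overflow */
    return malloc(nmemb * size);
}
```

A caller that previously wrote `malloc(n * sizeof(*p))` would call `alloc_array(n, sizeof(*p))` and handle a NULL return, which is why the sweep was limited to call sites that already coped with NULL.
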
sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task.

The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.

mxge: Setup mbuf flowid before calling tcp_lro_rx().

Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6320

sys/dev: use our nitems() macro when it is available through param.h.

No functional change, only trivial cases are done in this sweep. Drivers that can get further enhancements will be done independently.

Discussed in: freebsd-current

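For readers unfamiliar with the macro, the sketch below shows the kind of trivial substitution such a sweep makes. The macro is defined locally here so the example builds outside the kernel; the table and function names are hypothetical.

```c
#include <stddef.h>

/* Local definition so the sketch builds in userland; in the kernel the
 * macro is picked up from param.h. */
#ifndef nitems
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical table of supported link speeds in Mbit/s. */
static const int speeds[] = { 1000, 10000, 25000 };

/* Before: sizeof(speeds) / sizeof(speeds[0]); after: nitems(speeds). */
size_t speed_count(void)
{
    return nitems(speeds);
}
```

The macro only works on true arrays; applied to a pointer it silently computes the wrong count, which is one reason only trivial cases were converted mechanically.
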
tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication

And factor out tcp_lro_rx_done, which deduplicates the same logic with netinet/tcp_lro.c

Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725

Use uintmax_t (typedef'd to rman_res_t type) for rman ranges.

On some architectures, u_long isn't large enough for resource definitions. Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but type `long' is only 32-bit. This extends rman's resources to uintmax_t. With this change, any resource can feasibly be placed anywhere in physical memory (within the constraints of the driver).

Why uintmax_t and not something machine dependent, or uint64_t? Though it's possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on 32-bit architectures. 64-bit architectures should have plenty of RAM to absorb the increase on resource sizes if and when this occurs, and the number of resources on memory-constrained systems should be sufficiently small as to not pose a drastic overhead. That being said, uintmax_t was chosen for source clarity. If it's specified as uint64_t, all printf()-like calls would either need casts to uintmax_t, or be littered with PRI*64 macros. Casts to uintmax_t aren't horrible, but it would also bake into the API for resource_list_print_type() either a hidden assumption that entries get cast to uintmax_t for printing, or these calls would need the PRI*64 macros. Since source code is meant to be read more often than written, I chose the clearest path of simply using uintmax_t.

Tested on a PowerPC p5020-based board, which places all device resources in 0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)

Special thanks to bz for his testing on ARM.

Reviewed By: bz, jhb (previous)
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544

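The source-clarity argument above comes down to how the two types print. The sketch below is a userland illustration only: `res_t` is a hypothetical stand-in for rman_res_t, and `format_range()` is an invented helper, not part of the rman API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for rman_res_t. */
typedef uintmax_t res_t;

/* Format a resource range. With uintmax_t the 'j' length modifier is
 * enough: no casts and no PRIx64 macros in the format string. */
int format_range(char *buf, size_t len, res_t start, res_t end)
{
    return snprintf(buf, len, "%#jx-%#jx", start, end);
}
```

Had the type been uint64_t, the same call would need either `"%#" PRIx64` in the format string or a cast of each argument to uintmax_t. A 36-bit address such as 0xf00000000 prints correctly here, whereas it would not survive storage in a 32-bit `long`.
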
Replace several bus_alloc_resource() calls using default arguments with bus_alloc_resource_any()

Since these calls only use default arguments, bus_alloc_resource_any() is the right call.

Differential Revision: https://reviews.freebsd.org/D5306

Add optimizing LRO wrapper:

- Add optimizing LRO wrapper which pre-sorts all incoming packets according to the hash type and flowid. This prevents exhaustion of the LRO entries due to too many connections at the same time. Testing using a larger number of higher bandwidth TCP connections showed that the incoming ACK packet aggregation rate increased from ~1.3:1 to almost 3:1. Another test showed that for a number of TCP connections greater than 16 per hardware receive ring, where 8 TCP connections was the LRO active entry limit, there was a significant improvement in throughput due to being able to fully aggregate more than 8 TCP streams. For very few very high bandwidth TCP streams, the optimizing LRO wrapper will add CPU usage instead of reducing CPU usage. This is expected. Network drivers which want to use the optimizing LRO wrapper need to call "tcp_lro_queue_mbuf()" instead of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of "tcp_lro_flush()". Further the LRO control structure must be initialized using "tcp_lro_init_args()" passing a non-zero number into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for statistics which can be prone to wrap-around. Fix this while at it and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying an LRO control structure, especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to change of the LRO control structure size.

Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914

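The pre-sorting idea in the first item can be sketched in userland C. This is an illustration of the technique only, not the tcp_lro code: `struct pkt`, `sort_by_flow()`, and `count_flow_runs()` are hypothetical names, and qsort() stands in for whatever sort the kernel uses. Sorting by hash type and flowid makes packets of one flow adjacent, so an aggregator with few active entries sees one flow at a time instead of an interleaved mix.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-packet record; seq is the arrival order. */
struct pkt {
    uint32_t hash_type;
    uint32_t flowid;
    uint32_t seq;
};

static int cmp_u32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Order by hash type, then flowid; arrival order breaks ties because
 * qsort() is not guaranteed to be stable. */
static int pkt_cmp(const void *pa, const void *pb)
{
    const struct pkt *a = pa, *b = pb;
    int c;

    if ((c = cmp_u32(a->hash_type, b->hash_type)) != 0)
        return c;
    if ((c = cmp_u32(a->flowid, b->flowid)) != 0)
        return c;
    return cmp_u32(a->seq, b->seq);
}

void sort_by_flow(struct pkt *p, size_t n)
{
    qsort(p, n, sizeof(*p), pkt_cmp);
}

/* Count contiguous runs of the same flow: a rough proxy for how often
 * an aggregator would have to switch entries. */
size_t count_flow_runs(const struct pkt *p, size_t n)
{
    size_t runs = (n > 0) ? 1 : 0;

    for (size_t i = 1; i < n; i++)
        if (p[i].hash_type != p[i - 1].hash_type ||
            p[i].flowid != p[i - 1].flowid)
            runs++;
    return runs;
}
```

For four packets arriving as flows 10, 20, 10, 20, `count_flow_runs()` reports four runs before sorting and two after, with each flow's packets still in arrival order.
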
Move zlib.c from net to libkern.

It is not network-specific code and would be better as part of libkern instead.

Move zlib.h and zutil.h from net/ to sys/
Update includes to use sys/zlib.h and sys/zutil.h instead of net/

Submitted by: Steve Kiernan [email protected]
Obtained from: Juniper Networks, Inc.
GitHub Pull Request: https://github.com/freebsd/freebsd/pull/28
Relnotes: yes

Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after: 1 month
Sponsored by: Mellanox Technologies

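The transmit-side check described above can be sketched as follows. The structure, the `HASHTYPE_*` constants, and `pick_tx_queue()` are hypothetical userland stand-ins; in the kernel the corresponding names are the mbuf packet header, "M_HASHTYPE_GET(m)", "M_HASHTYPE_NONE", and "M_HASHTYPE_OPAQUE" from "sys/mbuf.h", and the constant values used here are assumptions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's hash type values. */
enum { HASHTYPE_NONE = 0, HASHTYPE_OPAQUE = 1 };

/* Hypothetical, reduced packet header. */
struct pkt_hdr {
    uint8_t  rsstype;   /* says whether flowid is meaningful */
    uint32_t flowid;
};

/* Pick a transmit queue: trust flowid only when a hash type is set,
 * otherwise fall back to queue 0. */
unsigned pick_tx_queue(const struct pkt_hdr *h, unsigned nqueues)
{
    if (h->rsstype != HASHTYPE_NONE)
        return h->flowid % nqueues;
    return 0;
}
```

The point of the change is that validity travels with the hash type field rather than with a separate flag bit, so a sender that sets an opaque hash type has implicitly declared its flowid valid.
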
Fix multiple incorrect SYSCTL arguments in the kernel:

- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs.
- Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function.
- Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after: 3 days
Sponsored by: Mellanox Technologies

Whitespace cleanup.
- Provide mxge_get_counter() to return counters that are not collected, but taken from hardware.
- Mechanically convert to if_inc_counter() the rest of counters.

Remove ifq_drops from struct ifqueue. Now queue drops are accounted in struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops is simply removed from them. There was no API to read this statistic.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.

Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.

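The arithmetic behind the opening sentence is easy to check: 10 Gbit/s is 10,000,000,000 bits per second, which exceeds the largest 32-bit unsigned value, 4,294,967,295. The helpers below are hypothetical and exist only to demonstrate the truncation.

```c
#include <stdint.h>

/* 10 Gbit/s expressed in bits per second. */
#define BAUD_10G UINT64_C(10000000000)

/* Does the value survive storage in a 32-bit field? */
int fits_in_u32(uint64_t v)
{
    return v <= UINT32_MAX;
}

/* What a 32-bit baud rate field would actually hold. */
uint32_t truncate_to_u32(uint64_t v)
{
    return (uint32_t)v;         /* keeps only the low 32 bits */
}
```

Storing `BAUD_10G` in a 32-bit field leaves 1,410,065,408 (the value modulo 2^32), so the interface would appear to run at roughly 1.4 Gbit/s.
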
Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this shifts into the sign bit. Instead use (1U << 31) which gets the expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases.

A similar change was made in OpenBSD.

Discussed with: -arch, rdivacky
Reviewed by: cperciva

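A short sketch of the fix, together with one possible way around the 32-bit int assumption it mentions. The function names are hypothetical, and the fixed-width variant is an alternative offered here for illustration, not something the commit itself did.

```c
#include <stdint.h>

/* The committed form: well-defined when unsigned int is at least
 * 32 bits wide, which is the assumption noted in the commit message. */
uint32_t high_bit_unsigned(void)
{
    return 1U << 31;
}

/* Alternative sketch: start from a constant that is at least 32 bits
 * wide, so bit 31 is never the sign bit of the promoted type. */
uint32_t high_bit_fixed_width(void)
{
    return UINT32_C(1) << 31;
}
```

By contrast, `1 << 31` with a 32-bit signed `int` shifts a one into the sign bit, which the C standard leaves undefined even though most compilers produce the expected bit pattern.
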
The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare for this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.