|
Revision tags: release/12.4.0, release/13.1.0, release/12.3.0 |
|
| #
132894ca |
| 06-Aug-2021 |
John Baldwin <[email protected]> |
cxgbei: Support for ISO (iSCSI segmentation offload).
ISO can be disabled before establishing a connection by setting dev.tNnex.N.toe.iso to 0.
Sponsored by: Chelsio Communications Differential Rev
cxgbei: Support for ISO (iSCSI segmentation offload).
ISO can be disabled before establishing a connection by setting dev.tNnex.N.toe.iso to 0.
Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D31223
(cherry picked from commit 5b27e4b27caae840bd79ccc5cb7811a0c9acc656)
show more ...
|
|
Revision tags: release/13.0.0 |
|
| #
4e4ec8a9 |
| 26-Mar-2021 |
John Baldwin <[email protected]> |
cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload transmit queues. Currently it only holds a work queue but will include additional state
cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload transmit queues. Currently it only holds a work queue but will include additional state in future changes.
Sponsored by: Chelsio Communications
(cherry picked from commit 077ba6a845fab8f1d3bd83e07f61730f202a46fc)
show more ...
|
| #
0082e479 |
| 03-Dec-2020 |
John Baldwin <[email protected]> |
Clear TLS offload mode if a TLS socket hangs without receiving data.
By default, if a TOE TLS socket stops receiving data for more than 5 seconds, revert the connection back to plain TOE mode. This
Clear TLS offload mode if a TLS socket hangs without receiving data.
By default, if a TOE TLS socket stops receiving data for more than 5 seconds, revert the connection back to plain TOE mode. This provides a fallback if the userland SSL library does not support KTLS. In addition, for client TLS 1.3 sockets using connect(), the TOE socket blocks before the handshake has completed since the socket option is only invoked for the final handshake.
The timeout defaults to 5 seconds, but can be changed at boot via the hw.cxgbe.toe.tls_rx_timeout tunable or for an individual interface via the dev.<nexus>.toe.tls_rx_timeout sysctl.
Reviewed by: np MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27470
show more ...
|
|
Revision tags: release/12.2.0 |
|
| #
56fb710f |
| 06-Oct-2020 |
John Baldwin <[email protected]> |
Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag
Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag).
In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query.
Reviewed by: gallatin, hselasky, np, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26689
show more ...
|
|
Revision tags: release/11.4.0 |
|
| #
b0dede77 |
| 19-May-2020 |
Navdeep Parhar <[email protected]> |
cxgbe/iw_cxgbe: Add an async callback to notify iw_cxgbe in case of a fatal error.
Submitted by: Krishnamraju Eraparaju @ Chelsio MFC after: 2 weeks Sponsored by: Chelsio Communications
|
| #
bddf7343 |
| 21-Nov-2019 |
John Baldwin <[email protected]> |
NIC KTLS for Chelsio T6 adapters.
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters. Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE connections.
NIC KTLS on T6 is n
NIC KTLS for Chelsio T6 adapters.
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters. Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE connections.
NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment the encrypted TLS frames output by the crypto engine. Instead, the TOE is placed into a special setup to permit "dummy" connections to be associated with regular sockets using KTLS. This permits using the TOE to segment the encrypted TLS records. However, this approach does have some limitations:
1) Regular TOE sockets cannot be used when the TOE is in this special mode. One can use either TOE and TOE-based KTLS or NIC KTLS, but not both at the same time.
2) In NIC KTLS mode, the TOE is only able to accept a per-connection timestamp offset that varies in the upper 4 bits. Put another way, only connections whose timestamp offset has the 28 lower bits cleared can use NIC KTLS and generate correct timestamps. The driver will refuse to enable NIC KTLS on connections with a timestamp offset with any of the lower 28 bits set. To use NIC KTLS, users can either disable TCP timestamps by setting the net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the tcp_new_ts_offset() function to clear the lower 28 bits of the generated offset.
3) Because the TCP segmentation relies on fields mirrored in a TCB in the TOE, not all fields in a TCP packet can be sent in the TCP segments generated from a TLS record. Specifically, for packets containing TCP options other than timestamps, the driver will inject an "empty" TCP packet holding the requested options (e.g. a SACK scoreboard) along with the segments from the TLS record. These empty TCP packets are counted by the dev.cc.N.txq.M.kern_tls_options sysctls.
Unlike TOE TLS which is able to buffer encrypted TLS records in on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS records for retransmit requests as well as non-retransmit requests that do not include the start of a TLS record but do include the trailer. The T6 NIC KTLS code tries to optimize some of the cases for requests to transmit partial TLS records. In particular it attempts to minimize sending "waste" bytes that have to be given as input to the crypto engine but are not needed on the wire to satisfy mbufs sent from the TCP stack down to the driver.
TCP packets for TLS requests are broken down into the following classes (with associated counters):
- Mbufs that send an entire TLS record in full do not have any waste bytes (dev.cc.N.txq.M.kern_tls_full).
- Mbufs that send a short TLS record that ends before the end of the trailer (dev.cc.N.txq.M.kern_tls_short). For sockets using AES-CBC, the encryption must always start at the beginning, so if the mbuf starts at an offset into the TLS record, the offset bytes will be "waste" bytes. For sockets using AES-GCM, the encryption can start at the 16 byte block before the starting offset capping the waste at 15 bytes.
- Mbufs that send a partial TLS record that has a non-zero starting offset but ends at the end of the trailer (dev.cc.N.txq.M.kern_tls_partial). In order to compute the authentication hash stored in the trailer, the entire TLS record must be sent as input to the crypto engine, so the bytes before the offset are always "waste" bytes.
In addition, other per-txq sysctls are provided:
- dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq using AES-CBC.
- dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq using AES-GCM.
- dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to compensate for the TOE engine not being able to set FIN on the last segment of a TLS record if the TLS record mbuf had FIN set.
- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this txq including full, short, and partial records.
- dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header and payload) sent for TLS record requests.
- dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS record requests.
To enable NIC KTLS with T6, set the following tunables prior to loading the cxgbe(4) driver:
hw.cxgbe.config_file=kern_tls hw.cxgbe.kern_tls=1
Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21962
show more ...
|
|
Revision tags: release/12.1.0 |
|
| #
e38a50e8 |
| 22-Oct-2019 |
John Baldwin <[email protected]> |
Split Chelsio send tags into a generic base tag and a ratelimit tag.
NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a distinct tag from a ratelimit tag. To support this, refactor cxg
Split Chelsio send tags into a generic base tag and a ratelimit tag.
NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a distinct tag from a ratelimit tag. To support this, refactor cxgbe_snd_tag to be a simple send tag with a type and convert the existing ratelimit tag to a new cxgbe_rate_tag structure.
Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22072
show more ...
|
|
Revision tags: release/11.3.0 |
|
| #
be09e82a |
| 29-Mar-2019 |
Navdeep Parhar <[email protected]> |
cxgbe/t4_tom: Catch up with r344433, which removed tcb_autorcvbuf_inc.
The declaration in tcp_var.h is still around so t4_tom continued to compile but wouldn't load. A separate commit will fix tcp_
cxgbe/t4_tom: Catch up with r344433, which removed tcb_autorcvbuf_inc.
The declaration in tcp_var.h is still around so t4_tom continued to compile but wouldn't load. A separate commit will fix tcp_var.h
Reported By: Dustin Marquess (dmarquess at gmail)
Sponsored by: Chelsio Communications
show more ...
|
|
Revision tags: release/12.0.0 |
|
| #
51347c3f |
| 15-Aug-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Use two hashes instead of a table to keep track of hashfilters. Two because the driver needs to look up a hashfilter by its 4-tuple or tid.
A couple of fixes while here: - Reject attempts
cxgbe(4): Use two hashes instead of a table to keep track of hashfilters. Two because the driver needs to look up a hashfilter by its 4-tuple or tid.
A couple of fixes while here: - Reject attempts to add duplicate hashfilters. - Do not assume that any part of the 4-tuple that isn't specified is 0. This makes it consistent with all other mandatory parameters that already require explicit user input.
MFC after: 2 weeks Sponsored by: Chelsio Communications
show more ...
|
| #
5fc0f72f |
| 09-Aug-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Add support for high priority filters on T6+. They have their own region in the TCAM starting with T6, unlike previous chips where they were in the same region as normal filters.
These fi
cxgbe(4): Add support for high priority filters on T6+. They have their own region in the TCAM starting with T6, unlike previous chips where they were in the same region as normal filters.
These filters "hit" before anything else in the LE's lookup. The exact order is: a) High priority filters b) TOE's active region (TCAM and/or hash) c) Servers (TOE hw listeners) d) Normal filters
MFC after: 1 week Sponsored by: Chelsio Communications
show more ...
|
| #
0c71c9cc |
| 02-Aug-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Improvements in TID management.
- Ignore any type of TID where the start/end values are not in the correct order. There are situations where the firmware isn't able to reserve room fo
cxgbe(4): Improvements in TID management.
- Ignore any type of TID where the start/end values are not in the correct order. There are situations where the firmware isn't able to reserve room for the number requested in the config file but doesn't report a failure during configuration and instead sets end <= start.
- Track start/end in tid_tab and remove some redundant copies from adapter->params.
- Move all the start/end and other read-only parameters to a quiet part of tid_tab, away from the tid locks.
MFC after: 1 week Sponsored by: Chelsio Communications
show more ...
|
|
Revision tags: release/11.2.0 |
|
| #
786099de |
| 24-May-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Data path for rate-limited tx.
This is hardware support for the SO_MAX_PACING_RATE sockopt (see setsockopt(2)), which is available in kernels built with "options RATELIMIT".
Relnotes: Yes
cxgbe(4): Data path for rate-limited tx.
This is hardware support for the SO_MAX_PACING_RATE sockopt (see setsockopt(2)), which is available in kernels built with "options RATELIMIT".
Relnotes: Yes Sponsored by: Chelsio Communications
show more ...
|
| #
67e07112 |
| 18-May-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Implement ifnet callbacks that deal with send tags.
An etid (ethoffload tid) is allocated for a send tag and it acquires a reference on the traffic class that matches the send parameters a
cxgbe(4): Implement ifnet callbacks that deal with send tags.
An etid (ethoffload tid) is allocated for a send tag and it acquires a reference on the traffic class that matches the send parameters associated with the tag.
Sponsored by: Chelsio Communications
show more ...
|
| #
89f651e7 |
| 09-May-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be configured via a new "hashfilter" subcommand in cxgbetool. Hash and normal TCAM filt
cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be configured via a new "hashfilter" subcommand in cxgbetool. Hash and normal TCAM filters can be used together. The hardware does an exact-match of packet fields for hash filters, unlike the masked match performed for TCAM filters. Any T5/T6 card with memory can support at least half a million hash filters. The sample config file with the driver configures 512K of these, it is possible to double this to 1 million+ in some cases.
The chip does an exact-match of fields of incoming datagrams with hash filters and performs the action configured for the filter if it matches. The fields to match are specified in a "filter mask" in the firmware config file. The filter mask always includes the 5-tuple (sip, dip, sport, dport, ipproto). It can, optionally, also include any subset of the filter mode (see filterMode and filterMask in the firmware config file).
For example: filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe filterMask = protocol, port, vlan
Exact values of the 5-tuple, the physical port, and VLAN tag would have to be provided while setting up a hash filter with the chip configuration above.
Hash filters support all actions supported by TCAM filters. A packet that hits a hash filter can be dropped, let through (with optional steering to a specific queue or RSS region), switched out of another port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed. (Support for some of these will show up in the driver in a follow-up commit very shortly).
Sponsored by: Chelsio Communications
show more ...
|
| #
111638bf |
| 30-Apr-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Convert ACT_OPEN_RPL to a shared CPL.
Reserve 3b in the 14b atid to identify the owner and use it to dispatch the CPL. This allows all CPLs that use an atid to be used as shared CPLs, alt
cxgbe(4): Convert ACT_OPEN_RPL to a shared CPL.
Reserve 3b in the 14b atid to identify the owner and use it to dispatch the CPL. This allows all CPLs that use an atid to be used as shared CPLs, although ACT_OPEN_RPL is the only one being converted in this revision.
Sponsored by: Chelsio Communications
show more ...
|
| #
1131c927 |
| 14-Apr-2018 |
Navdeep Parhar <[email protected]> |
cxgbe(4): Add support for Connection Offload Policy (aka COP).
COP allows fine-grained control on whether to offload a TCP connection using t4_tom, and what settings to apply to a connection selecte
cxgbe(4): Add support for Connection Offload Policy (aka COP).
COP allows fine-grained control on whether to offload a TCP connection using t4_tom, and what settings to apply to a connection selected for offload. t4_tom must still be loaded and IFCAP_TOE must still be enabled for full TCP offload to take place on an interface. The difference is that IFCAP_TOE used to be the only knob and would enable TOE for all new connections on the inteface, but now the driver will also consult the COP, if any, before offloading to the hardware TOE.
A policy is a plain text file with any number of rules, one per line. Each rule has a "match" part consisting of a socket-type (L = listen, A = active open, P = passive open, D = don't care) and a pcap-filter(7) expression, and a "settings" part that specifies whether to offload the connection or not and the parameters to use if so. The general format of a rule is: [socket-type] expr => settings
Example. See cxgbetool(8) for more information. [L] ip && port http => offload [L] port 443 => !offload [L] port ssh => offload [P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno [P] dst port ssh => offload !nagle ecn cong tahoe [P] dst port http => offload [A] dst port 443 => offload tls [A] dst net 192.168/16 => offload !timestamp cong highspeed
The driver processes the rules for each new listen, active open, or passive open and stops at the first match. There is an implicit rule at the end of every policy that prohibits offload when no rule in the policy matches: [D] all => !offload
This is a reworked and expanded version of a patch submitted by Krishnamraju Eraparaju @ Chelsio.
Sponsored by: Chelsio Communications
show more ...
|
| #
1e9538d2 |
| 13-Mar-2018 |
John Baldwin <[email protected]> |
Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS encryption and TCP segmentation for offloaded connections. Sockets using
Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS encryption and TCP segmentation for offloaded connections. Sockets using TLS are required to use a set of custom socket options to upload RX and TX keys to the NIC and to enable RX processing. Currently these socket options are implemented as TCP options in the vendor specific range. A patched OpenSSL library will be made available in a port / package for use with the TLS TOE support.
TOE sockets can either offload both transmit and reception of TLS records or just transmit. TLS offload (both RX and TX) is enabled by setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be enabled on the relevant interface. Transmit offload can be used on any "normal" or TLS TOE socket by using the custom socket option to program a transmit key. This permits most TOE sockets to transparently offload TLS when applications use a patched SSL library (e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL library). Receive offload can only be used with TOE sockets using the TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a list of TCP port numbers. Any connection with either a local or remote port number in that list will be created as a TLS socket rather than a plain TOE socket. Note that although this sysctl accepts an arbitrary list of port numbers, the sysctl(8) tool is only able to set sysctl nodes to a single value. A TLS socket will hang without receiving data if used by an application that is not using a patched SSL library. Thus, the tls_rx_ports node should be used with care. For a server mostly concerned with offloading TLS transmit, this node is not needed as plain TOE sockets will fall back to software crypto when using an unpatched SSL library.
New per-interface statistics nodes are added giving counts of TLS packets and payload bytes (payload bytes do not include TLS headers or authentication tags/MACs) offloaded via the TOE engine, e.g.:
dev.cc.0.stats.rx_tls_octets: 149 dev.cc.0.stats.rx_tls_records: 13 dev.cc.0.stats.tx_tls_octets: 26501823 dev.cc.0.stats.tx_tls_records: 1620
TLS transmit work requests are constructed by a new variant of t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.
TLS transmit work requests require a buffer containing IVs. If the IVs are too large to fit into the work request, a separate buffer is allocated when constructing a work request. This buffer is associated with the transmit descriptor and freed when the descriptor is ACKed by the adapter.
Received TLS frames use two new CPL messages. The first message is a CPL_TLS_DATA containing the decryped payload of a single TLS record. The handler places the mbuf containing the received payload on an mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message which includes a copy of the TLS header and indicates if there were any errors. The handler for this message places the TLS header into the socket buffer followed by the saved mbuf with the payload data. Both of these handlers are contained in tom/t4_tls.c.
A few routines were exposed from t4_cpl_io.c for use by t4_tls.c including send_rx_credits(), a new send_rx_modulate(), and t4_close_conn().
TLS keys for both transmit and receive are stored in onboard memory in the NIC in the "TLS keys" memory region.
In some cases a TLS socket can hang with pending data available in the NIC that is not delivered to the host. As a workaround, TLS sockets are more aggressive about sending CPL_RX_DATA_ACK messages anytime that any data is read from a TLS socket. In addition, a fallback timer will periodically send CPL_RX_DATA_ACK messages to the NIC for connections that are still in the handshake phase. Once the connection has finished the handshake and programmed RX keys via the socket option, the timer is stopped.
A new function select_ulp_mode() is used to determine what sub-mode a given TOE socket should use (plain TOE, DDP, or TLS). The existing set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and handles initialization of TLS-specific state when necessary in addition to DDP-specific state.
Since TLS sockets do not receive individual TCP segments but always receive full TLS records, they can receive more data than is available in the current window (e.g. if a 16k TLS record is received but the socket buffer is itself 16k). To cope with this, just drop the window to 0 when this happens, but track the overage and "eat" the overage as it is read from the socket buffer not opening the window (or adding rx_credits) for the overage bytes.
Reviewed by: np (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14529
show more ...
|
| #
198729ea |
| 26-Feb-2018 |
John Baldwin <[email protected]> |
Fetch TLS key parameters from the firmware.
The parameters describe how much of the adapter's memory is reserved for storing TLS keys. The 'meminfo' sysctl now lists this region of adapter memory a
Fetch TLS key parameters from the firmware.
The parameters describe how much of the adapter's memory is reserved for storing TLS keys. The 'meminfo' sysctl now lists this region of adapter memory as 'TLS keys' if present.
Sponsored by: Chelsio Communications
show more ...
|
| #
718cf2cc |
| 27-Nov-2017 |
Pedro F. Giffuni <[email protected]> |
sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error
sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
show more ...
|
| #
5c2bacde |
| 07-Nov-2017 |
Navdeep Parhar <[email protected]> |
Update the iw_cxgbe bits in the projects branch.
Submitted by: Krishnamraju Eraparaju @ Chelsio Sponsored by: Chelsio Communications
|
|
Revision tags: release/10.4.0 |
|
| #
3ef74299 |
| 31-Aug-2017 |
Navdeep Parhar <[email protected]> |
cxgbe/t4_tom: Add a knob to select the congestion control algorigthm used by the TOE hardware for fully offloaded connections. The knob affects new connections only.
MFC after: 2 weeks Sponsored by
cxgbe/t4_tom: Add a knob to select the congestion control algorigthm used by the TOE hardware for fully offloaded connections. The knob affects new connections only.
MFC after: 2 weeks Sponsored by: Chelsio Communications
show more ...
|
|
Revision tags: release/11.1.0 |
|
| #
0a600b63 |
| 13-Apr-2017 |
Navdeep Parhar <[email protected]> |
cxgbe: Query some more RDMA related parameters from the firmware.
MFC after: 3 days Sponsored by: Chelsio Communications
|
|
Revision tags: release/11.0.1, release/11.0.0 |
|
| #
7cba15b1 |
| 01-Sep-2016 |
Navdeep Parhar <[email protected]> |
cxgbe/cxgbei: Retire all DDP related code from cxgbei and switch to routines available in t4_tom to manage the iSCSI DDP page pod region.
This adds the ability to use multiple DDP page sizes to the
cxgbe/cxgbei: Retire all DDP related code from cxgbei and switch to routines available in t4_tom to manage the iSCSI DDP page pod region.
This adds the ability to use multiple DDP page sizes to the iSCSI driver, among other improvements.
Sponsored by: Chelsio Communications
show more ...
|
| #
07159830 |
| 27-Jul-2016 |
John Baldwin <[email protected]> |
Add support for zero-copy aio_write() on TOE sockets.
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now DMA directly from the user-supplied buffer. This is implemented by wiring
Add support for zero-copy aio_write() on TOE sockets.
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now DMA directly from the user-supplied buffer. This is implemented by wiring the pages backing the user-supplied buffer and queueing special mbufs backed by raw VM pages to the socket buffer. The TOE code recognizes these special mbufs and builds a sglist from the VM page array associated with the mbuf when queueing a work request to the TOE.
Because these mbufs do not have an associated virtual address, m_data is not valid. Thus, the AIO handler does not invoke sosend() directly for these mbufs but instead inlines portions of sosend_generic() and tcp_usr_send().
An aiotx_buffer structure is used to describe the user buffer (e.g. it holds the array of VM pages and a reference to the AIO job). The special mbufs reference this structure via m_ext. Note that a single job might be split across multiple mbufs (e.g. if it is larger than the socket buffer size). The 'ext_arg2' member of each mbuf gives an offset relative to the backing aiotx_buffer. The AIO job associated with an aiotx_buffer structure is completed when the last reference to the structure is released.
Zero-copy aio_write()'s for connections associated with a given adapter can be enabled/disabled at runtime via the 'dev.t[45]nex.N.toe.tx_zcopy' sysctl.
MFC after: 1 month Relnotes: yes Sponsored by: Chelsio Communications
show more ...
|
| #
dc964385 |
| 07-May-2016 |
John Baldwin <[email protected]> |
Use DDP to implement zerocopy TCP receive with aio_read().
Chelsio's TCP offload engine supports direct DMA of received TCP payload into wired user buffers. This feature is known as Direct-Data Pla
Use DDP to implement zerocopy TCP receive with aio_read().
Chelsio's TCP offload engine supports direct DMA of received TCP payload into wired user buffers. This feature is known as Direct-Data Placement. However, to scale well the adapter needs to prepare buffers for DDP before data arrives. aio_read() is more amenable to this requirement than read() as applications often call read() only after data is available in the socket buffer.
When DDP is enabled, TOE sockets use the recently added pru_aio_queue protocol hook to claim aio_read(2) requests instead of letting them use the default AIO socket logic. The DDP feature supports scheduling DMA to two buffers at a time so that the second buffer is ready for use after the first buffer is filled. The aio/DDP code optimizes the case of an application ping-ponging between two buffers (similar to the zero-copy bpf(4) code) by keeping the two most recently used AIO buffers wired. If a buffer is reused, the aio/DDP code is able to reuse the vm_page_t array as well as page pod mappings (a kind of MMU mapping the Chelsio NIC uses to describe user buffers). The generation of the vmspace of the calling process is used in conjunction with the user buffer's address and length to determine if a user buffer matches a previously used buffer. If an application queues a buffer for AIO that does not match a previously used buffer then the least recently used buffer is unwired before the new buffer is wired. This ensures that no more than two user buffers per socket are ever wired.
Note that this feature is best suited to applications sending a steady stream of data vs short bursts of traffic.
Discussed with: np Relnotes: yes Sponsored by: Chelsio Communications
show more ...
|