..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019 Intel Corporation.

.. include:: <isonum.txt>

IOAT Rawdev Driver
===================

.. warning::
    As of DPDK 21.11 the rawdev implementation of the IOAT driver has been deprecated.
    Please use the dmadev library instead.

The ``ioat`` rawdev driver provides a poll-mode driver (PMD) for Intel\ |reg|
Data Streaming Accelerator `(Intel DSA)
<https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator>`_ and for Intel\ |reg|
QuickData Technology, part of Intel\ |reg| I/O Acceleration Technology
`(Intel I/OAT)
<https://www.intel.com/content/www/us/en/wireless-network/accel-technology.html>`_.
This PMD, when used on supported hardware, allows data copies, for example,
cloning packet data, to be accelerated by that hardware rather than having to
be done by software, freeing up CPU cycles for other tasks.

Hardware Requirements
----------------------

The ``dpdk-devbind.py`` script, included with DPDK,
can be used to show the presence of supported hardware.
Running ``dpdk-devbind.py --status-dev misc`` will show all the miscellaneous,
or rawdev-based, devices on the system.
For Intel\ |reg| QuickData Technology devices, the hardware will often be listed as "Crystal Beach DMA",
or "CBDMA".
Intel\ |reg| DSA devices currently (at time of writing) appear as devices with type "0b25",
due to the absence of pci-id database entries for them at this point.

Compilation
------------

For builds using ``meson`` and ``ninja``, the driver will be built when the target platform is x86-based.
No additional compilation steps are necessary.

.. note::
    Since the addition of the dmadev library, the ``ioat`` and ``idxd`` parts of this driver
    will only be built if their ``dmadev`` counterparts are not built.
    The following can be used to disable the ``dmadev`` drivers,
    if the raw drivers are to be used instead::

        $ meson -Ddisable_drivers=dma/* <build_dir>

Device Setup
-------------

Depending on support provided by the PMD, HW devices can either use the kernel configured driver
or be bound to a user-space IO driver for use.
For example, Intel\ |reg| DSA devices can use the IDXD kernel driver or DPDK-supported drivers,
such as ``vfio-pci``.

Intel\ |reg| DSA devices using idxd kernel driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use an Intel\ |reg| DSA device bound to the IDXD kernel driver, the device must first be configured.
The `accel-config <https://github.com/intel/idxd-config>`_ utility library can be used for configuration.

.. note::
    The device configuration can also be done by directly interacting with the sysfs nodes.
    An example of how this may be done can be seen in the script ``dpdk_idxd_cfg.py``
    included in the driver source directory.

There are some mandatory configuration steps before being able to use a device with an application.
The internal engines, which do the copies or other operations,
and the work-queues, which are used by applications to assign work to the device,
need to be assigned to groups, and the various other configuration options,
such as priority or queue depth, need to be set for each queue.

To assign an engine to a group::

    $ accel-config config-engine dsa0/engine0.0 --group-id=0
    $ accel-config config-engine dsa0/engine0.1 --group-id=1

To assign work queues to groups for passing descriptors to the engines, a similar accel-config command can be used.
However, the work queues also need to be configured depending on the use case.
Some configuration options include:

* mode (Dedicated/Shared): Indicates whether a WQ may accept jobs from multiple queues simultaneously.
* priority: WQ priority between 1 and 15. Larger value means higher priority.
* wq-size: the size of the WQ. Sum of all WQ sizes must be less than the total-size defined by the device.
* type: WQ type (kernel/mdev/user). Determines how the device is presented.
* name: identifier given to the WQ.

Example configuration for a work queue::

    $ accel-config config-wq dsa0/wq0.0 --group-id=0 \
       --mode=dedicated --priority=10 --wq-size=8 \
       --type=user --name=dpdk_app1

Once the devices have been configured, they need to be enabled::

    $ accel-config enable-device dsa0
    $ accel-config enable-wq dsa0/wq0.0

Check the device configuration::

    $ accel-config list

Devices using VFIO/UIO drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The HW devices to be used will need to be bound to a user-space IO driver for use.
The ``dpdk-devbind.py`` script can be used to view the state of the devices
and to bind them to a suitable DPDK-supported driver, such as ``vfio-pci``.
For example::

    $ dpdk-devbind.py -b vfio-pci 00:04.0 00:04.1

Device Probing and Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For devices bound to a suitable DPDK-supported VFIO/UIO driver, the HW devices will
be found as part of the device scan done at application initialization time without
the need to pass parameters to the application.

For Intel\ |reg| DSA devices, DPDK will automatically configure the device with the
maximum number of workqueues available on it, partitioning all resources equally
among the queues.
If fewer workqueues are required, then the ``max_queues`` parameter may be passed to
the device driver on the EAL commandline, via the ``allowlist`` or ``-a`` flag, e.g.::

    $ dpdk-test -a <b:d:f>,max_queues=4

For devices bound to the IDXD kernel driver,
the DPDK ioat driver will automatically perform a scan for available workqueues to use.
Any workqueues found listed in ``/dev/dsa`` on the system will be checked in ``/sys``,
and any which have the ``dpdk_`` prefix in their name will be automatically probed by the
driver to make them available to the application.
Alternatively, to support use by multiple DPDK processes simultaneously,
the value used as the DPDK ``--file-prefix`` parameter may be used as a workqueue name prefix,
instead of ``dpdk_``,
allowing each DPDK application instance to only use a subset of configured queues.

Once probed successfully, irrespective of kernel driver, the device will appear as a ``rawdev``,
that is a "raw device type" inside DPDK, and can be accessed using APIs from the
``rte_rawdev`` library.

Using IOAT Rawdev Devices
--------------------------

To use the devices from an application, the rawdev API can be used, along
with definitions taken from the device-specific header file
``rte_ioat_rawdev.h``. This header is needed to get the definition of
structure parameters used by some of the rawdev APIs for IOAT rawdev
devices, as well as providing key functions for using the device for memory
copies.

Getting Device Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Basic information about each rawdev device can be queried using the
``rte_rawdev_info_get()`` API. For most applications, this API will be
needed to verify that the rawdev in question is of the expected type. For
example, the following code snippet can be used to identify an IOAT
rawdev device for use by an application:

.. code-block:: C

    /* Scan the rawdevs and stop at the first one bound to the IOAT driver. */
    for (i = 0; i < count && !found; i++) {
        struct rte_rawdev_info info = { .dev_private = NULL };

        /* Match on the driver name reported for this device. */
        found = (rte_rawdev_info_get(i, &info, 0) == 0 &&
                strcmp(info.driver_name,
                    IOAT_PMD_RAWDEV_NAME_STR) == 0);
    }

When calling the ``rte_rawdev_info_get()`` API for an IOAT rawdev device,
the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
be NULL, or else be set to point to a structure of type
``rte_ioat_rawdev_config``, in which case the size of the configured device
input ring will be returned in that structure.

Device Configuration
~~~~~~~~~~~~~~~~~~~~~

Configuring an IOAT rawdev device is done using the
``rte_rawdev_configure()`` API, which takes the same structure parameters
as the previously referenced ``rte_rawdev_info_get()`` API. The main
difference is that, because the parameter is used as input rather than
output, the ``dev_private`` structure element cannot be NULL, and must
point to a valid ``rte_ioat_rawdev_config`` structure, containing the ring
size to be used by the device. The ring size must be a power of two,
between 64 and 4096.
If it is not needed, the tracking by the driver of user-provided completion
handles may also be disabled by setting the ``hdls_disable`` flag in
the configuration structure.

The following code shows how the device is configured in
``test_ioat_rawdev.c``:

.. code-block:: C

    #define IOAT_TEST_RINGSIZE 512
    struct rte_ioat_rawdev_config p = { .ring_size = -1 };
    struct rte_rawdev_info info = { .dev_private = &p };

    /* ... */

    /* Set the requested ring size, then apply the configuration. */
    p.ring_size = IOAT_TEST_RINGSIZE;
    if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
        printf("Error with rte_rawdev_configure()\n");
        return -1;
    }

Once configured, the device can then be made ready for use by calling the
``rte_rawdev_start()`` API.
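
The ring-size rule given in this section (a power of two, between 64 and
4096) can be checked by the application before ``rte_rawdev_configure()`` is
called. The following is a minimal, self-contained sketch of such a check.
The helper ``ioat_ring_size_is_valid()`` and the two limit macros are
hypothetical names introduced for illustration only; they are not part of the
driver API, and the sketch assumes that both bounds are inclusive:

.. code-block:: C

    #include <stdbool.h>
    #include <stdint.h>

    /* Limits taken from the text above; the macro names are illustrative. */
    #define APP_IOAT_RING_MIN 64
    #define APP_IOAT_RING_MAX 4096

    /* Hypothetical application-side helper, not a DPDK function. */
    static inline bool
    ioat_ring_size_is_valid(uint32_t ring_size)
    {
        /* Reject sizes outside the documented range (bounds assumed inclusive). */
        if (ring_size < APP_IOAT_RING_MIN || ring_size > APP_IOAT_RING_MAX)
            return false;
        /* A power of two has exactly one bit set, so n & (n - 1) is zero. */
        return (ring_size & (ring_size - 1)) == 0;
    }

With this helper, the value 512 used by the test code is accepted, whereas
100 (not a power of two) and 8192 (out of range) are rejected.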

Performing Data Copies
~~~~~~~~~~~~~~~~~~~~~~~

To perform data copies using IOAT rawdev devices, the functions
``rte_ioat_enqueue_copy()`` and ``rte_ioat_perform_ops()`` should be used.
Once copies have been completed, the completion will be reported back when
the application calls ``rte_ioat_completed_ops()``.

The ``rte_ioat_enqueue_copy()`` function enqueues a single copy to the
device ring for copying at a later point. The parameters to that function
include the IOVA addresses of both the source and destination buffers,
as well as two "handles" to be returned to the user when the copy is
completed. These handles can be arbitrary values, but two are provided so
that the library can track handles for both source and destination on
behalf of the user, e.g. virtual addresses for the buffers, or mbuf
pointers if packet data is being copied.

While the ``rte_ioat_enqueue_copy()`` function enqueues a copy operation on
the device ring, the copy will not actually be performed until after the
application calls the ``rte_ioat_perform_ops()`` function. This function
informs the device hardware of the elements enqueued on the ring, and the
device will begin to process them. It is expected that, for efficiency
reasons, a burst of operations will be enqueued to the device via multiple
enqueue calls between calls to the ``rte_ioat_perform_ops()`` function.

The following code from ``test_ioat_rawdev.c`` demonstrates how to enqueue
a burst of copies to the device and start the hardware processing of them:

.. code-block:: C

    struct rte_mbuf *srcs[32], *dsts[32];
    unsigned int j;

    for (i = 0; i < RTE_DIM(srcs); i++) {
        char *src_data;

        srcs[i] = rte_pktmbuf_alloc(pool);
        dsts[i] = rte_pktmbuf_alloc(pool);
        srcs[i]->data_len = srcs[i]->pkt_len = length;
        dsts[i]->data_len = dsts[i]->pkt_len = length;
        src_data = rte_pktmbuf_mtod(srcs[i], char *);

        /* Fill the source buffer with random data. */
        for (j = 0; j < length; j++)
            src_data[j] = rand() & 0xFF;

        /* Enqueue the copy, passing the mbuf pointers as the handles. */
        if (rte_ioat_enqueue_copy(dev_id,
                srcs[i]->buf_iova + srcs[i]->data_off,
                dsts[i]->buf_iova + dsts[i]->data_off,
                length,
                (uintptr_t)srcs[i],
                (uintptr_t)dsts[i]) != 1) {
            printf("Error with rte_ioat_enqueue_copy for buffer %u\n",
                    i);
            return -1;
        }
    }
    /* Notify the hardware of the whole burst with a single call. */
    rte_ioat_perform_ops(dev_id);

To retrieve information about completed copies, the API
``rte_ioat_completed_ops()`` should be used. This API will return to the
application a set of completion handles passed in when the relevant copies
were enqueued.

The following code from ``test_ioat_rawdev.c`` shows the test code
retrieving information about the completed copies and validating the data
is correct before freeing the data buffers using the returned handles:

.. code-block:: C

    /*
     * completed_src[] and completed_dst[] are arrays of struct rte_mbuf *
     * large enough to hold the 64 handles requested here.
     */
    if (rte_ioat_completed_ops(dev_id, 64, (void *)completed_src,
            (void *)completed_dst) != RTE_DIM(srcs)) {
        printf("Error with rte_ioat_completed_ops\n");
        return -1;
    }
    for (i = 0; i < RTE_DIM(srcs); i++) {
        char *src_data, *dst_data;

        /* Handles must come back in the order they were enqueued. */
        if (completed_src[i] != srcs[i]) {
            printf("Error with source pointer %u\n", i);
            return -1;
        }
        if (completed_dst[i] != dsts[i]) {
            printf("Error with dest pointer %u\n", i);
            return -1;
        }

        /* Compare the copied data byte by byte. */
        src_data = rte_pktmbuf_mtod(srcs[i], char *);
        dst_data = rte_pktmbuf_mtod(dsts[i], char *);
        for (j = 0; j < length; j++)
            if (src_data[j] != dst_data[j]) {
                printf("Error with copy of packet %u, byte %u\n",
                        i, j);
                return -1;
            }
        rte_pktmbuf_free(srcs[i]);
        rte_pktmbuf_free(dsts[i]);
    }


Filling an Area of Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~

The IOAT driver also has support for the ``fill`` operation, where an area
of memory is overwritten, or filled, with a short pattern of data.
Fill operations can be performed in much the same way as copy operations
described above, just using the ``rte_ioat_enqueue_fill()`` function rather
than the ``rte_ioat_enqueue_copy()`` function.


Querying Device Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The statistics from the IOAT rawdev device can be retrieved via the xstats
functions in the ``rte_rawdev`` library, i.e.
``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and
``rte_rawdev_xstats_by_name_get()``. The statistics returned for each device
instance are:

* ``failed_enqueues``
* ``successful_enqueues``
* ``copies_started``
* ``copies_completed``
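
Once the names and values of the statistics have been fetched, an application
will usually want to pick out individual counters by name. The following is a
minimal sketch of that post-processing step only. It assumes the names and
values are held in two parallel arrays (as is the case when the ids requested
from ``rte_rawdev_xstats_get()`` are simply ``0..n-1``), and it uses a local
stand-in structure so that it builds without the DPDK headers. The names
``app_xstat_name``, ``app_xstat_lookup()`` and ``app_copies_outstanding()`` are
hypothetical, and reading the difference between ``copies_started`` and
``copies_completed`` as the number of outstanding copies is an assumption:

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * Stand-in for the name entries filled in by
     * rte_rawdev_xstats_names_get(); defined locally, with an assumed
     * size, so that the sketch does not depend on the DPDK headers.
     */
    #define APP_XSTATS_NAME_SIZE 64
    struct app_xstat_name {
        char name[APP_XSTATS_NAME_SIZE];
    };

    /*
     * Hypothetical helper: return the value of the statistic called
     * "name", or "fallback" if it is not present. names[] and values[]
     * are parallel arrays of length n.
     */
    static uint64_t
    app_xstat_lookup(const struct app_xstat_name *names,
            const uint64_t *values, size_t n,
            const char *name, uint64_t fallback)
    {
        size_t i;

        for (i = 0; i < n; i++)
            if (strncmp(names[i].name, name, APP_XSTATS_NAME_SIZE) == 0)
                return values[i];
        return fallback;
    }

    /*
     * Copies started but not yet reported as completed, assuming both
     * counters were sampled at the same time.
     */
    static uint64_t
    app_copies_outstanding(const struct app_xstat_name *names,
            const uint64_t *values, size_t n)
    {
        uint64_t started = app_xstat_lookup(names, values, n,
                "copies_started", 0);
        uint64_t completed = app_xstat_lookup(names, values, n,
                "copies_completed", 0);

        return started >= completed ? started - completed : 0;
    }

In a real application the two arrays would be populated by the xstats
functions listed above rather than by hand, and the stand-in structure would
be replaced by the name type defined in the ``rte_rawdev`` library.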