..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019 Intel Corporation.

.. include:: <isonum.txt>

IOAT Rawdev Driver
===================

The ``ioat`` rawdev driver provides a poll-mode driver (PMD) for Intel\ |reg|
Data Streaming Accelerator `(Intel DSA)
<https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator>`_ and for Intel\ |reg|
QuickData Technology, part of Intel\ |reg| I/O Acceleration Technology
`(Intel I/OAT)
<https://www.intel.com/content/www/us/en/wireless-network/accel-technology.html>`_.
This PMD, when used on supported hardware, allows data copies, for example,
cloning packet data, to be accelerated by that hardware rather than having to
be done by software, freeing up CPU cycles for other tasks.

Hardware Requirements
----------------------

The ``dpdk-devbind.py`` script, included with DPDK,
can be used to show the presence of supported hardware.
Running ``dpdk-devbind.py --status-dev misc`` will show all the miscellaneous,
or rawdev-based, devices on the system.
For Intel\ |reg| QuickData Technology devices, the hardware will often be listed as "Crystal Beach DMA",
or "CBDMA".
Intel\ |reg| DSA devices currently (at time of writing) appear as devices with type "0b25",
due to the absence of pci-id database entries for them at this point.

Compilation
------------

For builds using ``meson`` and ``ninja``, the driver will be built when the target platform is x86-based.
No additional compilation steps are necessary.

Device Setup
-------------

Depending on support provided by the PMD, HW devices can either use the kernel-configured driver
or be bound to a user-space IO driver for use.
For example, Intel\ |reg| DSA devices can use the IDXD kernel driver or DPDK-supported drivers,
such as ``vfio-pci``.

Intel\ |reg| DSA devices using idxd kernel driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use an Intel\ |reg| DSA device bound to the IDXD kernel driver, the device must first be configured.
The `accel-config <https://github.com/intel/idxd-config>`_ utility library can be used for configuration.

.. note::
        The device configuration can also be done by directly interacting with the sysfs nodes.
        An example of how this may be done can be seen in the script ``dpdk_idxd_cfg.py``
        included in the driver source directory.

There are some mandatory configuration steps before being able to use a device with an application.
The internal engines, which do the copies or other operations,
and the work-queues, which are used by applications to assign work to the device,
need to be assigned to groups, and the various other configuration options,
such as priority or queue depth, need to be set for each queue.

To assign an engine to a group::

        $ accel-config config-engine dsa0/engine0.0 --group-id=0
        $ accel-config config-engine dsa0/engine0.1 --group-id=1

To assign work queues to groups for passing descriptors to the engines, a similar accel-config command can be used.
However, the work queues also need to be configured depending on the use-case.
Some configuration options include:

* mode (Dedicated/Shared): Indicates whether a WQ may accept jobs from multiple queues simultaneously.
* priority: WQ priority between 1 and 15. Larger value means higher priority.
* wq-size: the size of the WQ. Sum of all WQ sizes must be less than the total-size defined by the device.
* type: WQ type (kernel/mdev/user). Determines how the device is presented.
* name: identifier given to the WQ.

Example configuration for a work queue::

        $ accel-config config-wq dsa0/wq0.0 --group-id=0 \
           --mode=dedicated --priority=10 --wq-size=8 \
           --type=user --name=app1

Once the devices have been configured, they need to be enabled::

        $ accel-config enable-device dsa0
        $ accel-config enable-wq dsa0/wq0.0

Check the device configuration::

        $ accel-config list

Devices using VFIO/UIO drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The HW devices to be used will need to be bound to a user-space IO driver for use.
The ``dpdk-devbind.py`` script can be used to view the state of the devices
and to bind them to a suitable DPDK-supported driver, such as ``vfio-pci``.
For example::

        $ dpdk-devbind.py -b vfio-pci 00:04.0 00:04.1

Device Probing and Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For devices bound to a suitable DPDK-supported VFIO/UIO driver, the HW devices will
be found as part of the device scan done at application initialization time without
the need to pass parameters to the application.

If the device is bound to the IDXD kernel driver (and previously configured with sysfs),
then a specific work queue needs to be passed to the application via a vdev parameter.
This vdev parameter takes the driver name and work queue name as parameters.
For example, to use work queue 0 on Intel\ |reg| DSA instance 0::

        $ dpdk-test --no-pci --vdev=rawdev_idxd,wq=0.0

Once probed successfully, the device will appear as a ``rawdev``, that is a
"raw device type" inside DPDK, and can be accessed using APIs from the
``rte_rawdev`` library.

Using IOAT Rawdev Devices
--------------------------

To use the devices from an application, the rawdev API can be used, along
with definitions taken from the device-specific header file
``rte_ioat_rawdev.h``. This header is needed to get the definition of
structure parameters used by some of the rawdev APIs for IOAT rawdev
devices, as well as providing key functions for using the device for memory
copies.

Getting Device Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Basic information about each rawdev device can be queried using the
``rte_rawdev_info_get()`` API. For most applications, this API will be
needed to verify that the rawdev in question is of the expected type. For
example, the following code snippet can be used to identify an IOAT
rawdev device for use by an application:

.. code-block:: C

        for (i = 0; i < count && !found; i++) {
                struct rte_rawdev_info info = { .dev_private = NULL };
                found = (rte_rawdev_info_get(i, &info, 0) == 0 &&
                                strcmp(info.driver_name,
                                                IOAT_PMD_RAWDEV_NAME_STR) == 0);
        }

When calling the ``rte_rawdev_info_get()`` API for an IOAT rawdev device,
the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
be NULL, or else be set to point to a structure of type
``rte_ioat_rawdev_config``, in which case the size of the configured device
input ring will be returned in that structure.

Device Configuration
~~~~~~~~~~~~~~~~~~~~~

Configuring an IOAT rawdev device is done using the
``rte_rawdev_configure()`` API, which takes the same structure parameters
as the previously referenced ``rte_rawdev_info_get()`` API. The main
difference is that, because the parameter is used as input rather than
output, the ``dev_private`` structure element cannot be NULL, and must
point to a valid ``rte_ioat_rawdev_config`` structure, containing the ring
size to be used by the device. The ring size must be a power of two,
between 64 and 4096.
If it is not needed, the tracking by the driver of user-provided completion
handles may also be disabled by setting the ``hdls_disable`` flag in
the configuration structure.
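
An application that takes the ring size from user input may wish to check it
against these constraints before calling ``rte_rawdev_configure()``. The
following is a minimal sketch of such a check, using only the limits stated
above; the helper ``ring_size_is_valid()`` and the ``EXAMPLE_*`` macros are
hypothetical names, not part of the driver API.

.. code-block:: C

        #include <stdbool.h>
        #include <stdint.h>

        /* Limits taken from the text above; the macro names are illustrative only. */
        #define EXAMPLE_RING_SIZE_MIN 64u
        #define EXAMPLE_RING_SIZE_MAX 4096u

        /* Hypothetical helper: true if n is a power of two in [64, 4096]. */
        static bool
        ring_size_is_valid(uint32_t n)
        {
                if (n < EXAMPLE_RING_SIZE_MIN || n > EXAMPLE_RING_SIZE_MAX)
                        return false;
                /* A power of two has exactly one bit set, so n & (n - 1) is zero. */
                return (n & (n - 1)) == 0;
        }

With this helper, 64, 512 and 4096 are accepted, while 32, 100 and 8192 are
rejected.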

The following code shows how the device is configured in
``test_ioat_rawdev.c``:

.. code-block:: C

        #define IOAT_TEST_RINGSIZE 512
        struct rte_ioat_rawdev_config p = { .ring_size = -1 };
        struct rte_rawdev_info info = { .dev_private = &p };

        /* ... */

        p.ring_size = IOAT_TEST_RINGSIZE;
        if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
                printf("Error with rte_rawdev_configure()\n");
                return -1;
        }

Once configured, the device can then be made ready for use by calling the
``rte_rawdev_start()`` API.

Performing Data Copies
~~~~~~~~~~~~~~~~~~~~~~~

To perform data copies using IOAT rawdev devices, the functions
``rte_ioat_enqueue_copy()`` and ``rte_ioat_perform_ops()`` should be used.
Once copies have been completed, the completion will be reported back when
the application calls ``rte_ioat_completed_ops()``.

The ``rte_ioat_enqueue_copy()`` function enqueues a single copy to the
device ring for copying at a later point. The parameters to that function
include the IOVA addresses of both the source and destination buffers,
as well as two "handles" to be returned to the user when the copy is
completed. These handles can be arbitrary values, but two are provided so
that the library can track handles for both source and destination on
behalf of the user, e.g. virtual addresses for the buffers, or mbuf
pointers if packet data is being copied.

While the ``rte_ioat_enqueue_copy()`` function enqueues a copy operation on
the device ring, the copy will not actually be performed until after the
application calls the ``rte_ioat_perform_ops()`` function. This function
informs the device hardware of the elements enqueued on the ring, and the
device will begin to process them. It is expected that, for efficiency
reasons, a burst of operations will be enqueued to the device via multiple
enqueue calls between calls to the ``rte_ioat_perform_ops()`` function.

The following code from ``test_ioat_rawdev.c`` demonstrates how to enqueue
a burst of copies to the device and start the hardware processing of them:

.. code-block:: C

        struct rte_mbuf *srcs[32], *dsts[32];
        unsigned int j;

        for (i = 0; i < RTE_DIM(srcs); i++) {
                char *src_data;

                srcs[i] = rte_pktmbuf_alloc(pool);
                dsts[i] = rte_pktmbuf_alloc(pool);
                srcs[i]->data_len = srcs[i]->pkt_len = length;
                dsts[i]->data_len = dsts[i]->pkt_len = length;
                src_data = rte_pktmbuf_mtod(srcs[i], char *);

                for (j = 0; j < length; j++)
                        src_data[j] = rand() & 0xFF;

                if (rte_ioat_enqueue_copy(dev_id,
                                srcs[i]->buf_iova + srcs[i]->data_off,
                                dsts[i]->buf_iova + dsts[i]->data_off,
                                length,
                                (uintptr_t)srcs[i],
                                (uintptr_t)dsts[i]) != 1) {
                        printf("Error with rte_ioat_enqueue_copy for buffer %u\n",
                                        i);
                        return -1;
                }
        }
        rte_ioat_perform_ops(dev_id);

To retrieve information about completed copies, the API
``rte_ioat_completed_ops()`` should be used. This API will return to the
application a set of completion handles passed in when the relevant copies
were enqueued.

The following code from ``test_ioat_rawdev.c`` shows the test code
retrieving information about the completed copies and validating the data
is correct before freeing the data buffers using the returned handles:

.. code-block:: C

        if (rte_ioat_completed_ops(dev_id, 64, (void *)completed_src,
                        (void *)completed_dst) != RTE_DIM(srcs)) {
                printf("Error with rte_ioat_completed_ops\n");
                return -1;
        }
        for (i = 0; i < RTE_DIM(srcs); i++) {
                char *src_data, *dst_data;

                if (completed_src[i] != srcs[i]) {
                        printf("Error with source pointer %u\n", i);
                        return -1;
                }
                if (completed_dst[i] != dsts[i]) {
                        printf("Error with dest pointer %u\n", i);
                        return -1;
                }

                src_data = rte_pktmbuf_mtod(srcs[i], char *);
                dst_data = rte_pktmbuf_mtod(dsts[i], char *);
                for (j = 0; j < length; j++)
                        if (src_data[j] != dst_data[j]) {
                                printf("Error with copy of packet %u, byte %u\n",
                                                i, j);
                                return -1;
                        }
                rte_pktmbuf_free(srcs[i]);
                rte_pktmbuf_free(dsts[i]);
        }
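
The split between enqueueing, submitting and collecting completions can be
illustrated with a small software model. The sketch below is a toy stand-in
written for illustration, not driver code: all ``model_*`` names are
hypothetical, and it assumes that every submitted job completes immediately,
whereas real hardware completes jobs asynchronously.

.. code-block:: C

        #include <assert.h>
        #include <stddef.h>
        #include <stdint.h>

        #define MODEL_RING_SIZE 64u     /* power of two, so masking wraps indices */

        /* Toy stand-in for a device ring; not the driver's real data structure. */
        struct model_ring {
                uintptr_t src_hdls[MODEL_RING_SIZE];
                uintptr_t dst_hdls[MODEL_RING_SIZE];
                unsigned int next_write;  /* next slot to fill on enqueue */
                unsigned int submitted;   /* jobs made visible to the "device" */
                unsigned int next_read;   /* next completion to hand back */
        };

        /* Queue one job and remember its handles; returns 1 on success, 0 if full. */
        static int
        model_enqueue(struct model_ring *r, uintptr_t src_hdl, uintptr_t dst_hdl)
        {
                assert(r != NULL);
                if (r->next_write - r->next_read == MODEL_RING_SIZE)
                        return 0;
                r->src_hdls[r->next_write & (MODEL_RING_SIZE - 1)] = src_hdl;
                r->dst_hdls[r->next_write & (MODEL_RING_SIZE - 1)] = dst_hdl;
                r->next_write++;
                return 1;
        }

        /* Submit: everything enqueued so far becomes eligible for processing. */
        static void
        model_perform_ops(struct model_ring *r)
        {
                assert(r != NULL);
                r->submitted = r->next_write;
        }

        /* Hand back up to max handle pairs for submitted jobs, in enqueue order. */
        static unsigned int
        model_completed_ops(struct model_ring *r, unsigned int max,
                        uintptr_t *src_hdls, uintptr_t *dst_hdls)
        {
                unsigned int n = 0;

                assert(r != NULL);
                while (n < max && r->next_read != r->submitted) {
                        unsigned int idx = r->next_read & (MODEL_RING_SIZE - 1);

                        src_hdls[n] = r->src_hdls[idx];
                        dst_hdls[n] = r->dst_hdls[idx];
                        r->next_read++;
                        n++;
                }
                return n;
        }

In this model, calling ``model_completed_ops()`` after two enqueues but before
``model_perform_ops()`` returns 0; after ``model_perform_ops()`` it returns 2
and hands back both pairs of handles in enqueue order.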


Filling an Area of Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~

The IOAT driver also has support for the ``fill`` operation, where an area
of memory is overwritten, or filled, with a short pattern of data.
Fill operations can be performed in much the same way as copy operations
described above, just using the ``rte_ioat_enqueue_fill()`` function rather
than the ``rte_ioat_enqueue_copy()`` function.
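
For intuition, the effect of a fill can be modelled in software. The sketch
below assumes an 8-byte pattern repeated across the destination, with any
trailing partial pattern truncated; the function name ``model_fill()``, the
pattern width and the tail behaviour are all assumptions made for
illustration, and the actual behaviour should be checked against
``rte_ioat_rawdev.h``.

.. code-block:: C

        #include <stddef.h>
        #include <stdint.h>
        #include <string.h>

        /* Hypothetical software model of a fill; not a driver function. */
        static void
        model_fill(void *dst, uint64_t pattern, size_t length)
        {
                uint8_t *p = dst;
                size_t off;

                /* Copy whole 8-byte patterns, then whatever part fits at the end. */
                for (off = 0; off < length; off += sizeof(pattern)) {
                        size_t chunk = length - off;

                        if (chunk > sizeof(pattern))
                                chunk = sizeof(pattern);
                        /* Pattern bytes are laid out in host byte order. */
                        memcpy(p + off, &pattern, chunk);
                }
        }

Filling with a pattern of zero behaves like ``memset(dst, 0, length)`` in this
model.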


Querying Device Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The statistics from the IOAT rawdev device can be retrieved via the xstats
functions in the ``rte_rawdev`` library, i.e.
``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and
``rte_rawdev_xstats_by_name_get()``. The statistics returned for each device
instance are:

* ``failed_enqueues``
* ``successful_enqueues``
* ``copies_started``
* ``copies_completed``
311