xref: /f-stack/dpdk/doc/guides/sample_app_ug/ioat.rst (revision 2d9fd380)
..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019 Intel Corporation.

.. include:: <isonum.txt>

Packet copying using Intel\ |reg| QuickData Technology
======================================================

Overview
--------

This sample is intended as a demonstration of the basic components of a DPDK
forwarding application and an example of how to use the IOAT driver API to
make copies of packets.

While forwarding, the application also modifies the packet MAC addresses
as follows:

*   The source MAC address is replaced by the TX port MAC address

*   The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID

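The MAC rewrite described above can be sketched as follows. This is a
simplified, hypothetical illustration: it uses a plain Ethernet-header struct
instead of DPDK's ``rte_ether_hdr``, and the helper name merely mirrors the
``update_mac_addrs()`` function that the sample calls in its Tx path:

```c
#include <stdint.h>
#include <string.h>

/* Simplified Ethernet header; DPDK's real struct is rte_ether_hdr. */
struct eth_hdr {
    uint8_t dst[6];
    uint8_t src[6];
    uint16_t ether_type;
};

/* Apply the rewrite described above: the source MAC becomes the TX port
 * MAC, the destination MAC becomes 02:00:00:00:00:<TX_PORT_ID>. */
static void update_mac_addrs(struct eth_hdr *eth,
        const uint8_t port_mac[6], uint16_t tx_port_id)
{
    memcpy(eth->src, port_mac, 6);
    const uint8_t dst[6] = { 0x02, 0x00, 0x00, 0x00, 0x00,
            (uint8_t)tx_port_id };
    memcpy(eth->dst, dst, 6);
}
```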
This application can be used to compare the performance of a software packet
copy with a copy done using a DMA device, for different sizes of packets.
The example prints out statistics each second. The statistics show
received/sent packets and packets dropped or failed to copy.

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``ioat`` sub-directory.


Running the Application
-----------------------

In order to run the hardware copy application, the copying device
needs to be bound to a user-space IO driver.

Refer to the "IOAT Rawdev Driver" chapter in the "Rawdev Drivers" document
for information on using the driver.

The application requires a number of command line options:

.. code-block:: console

    ./<build_dir>/examples/dpdk-ioat [EAL options] -- [-p MASK] [-q NQ] [-s RS] [-c <sw|hw>]
        [--[no-]mac-updating]

where,

*   p MASK: A hexadecimal bitmask of the ports to configure (default is all)

*   q NQ: Number of Rx queues used per port, equivalent to CBDMA channels
    per port (default is 1)

*   c CT: Type of packet copy performed: software (sw) or hardware using
    DMA (hw) (default is hw)

*   s RS: Size of the IOAT rawdev ring for hardware copy mode, or of the
    rte_ring for software copy mode (default is 2048)

*   --[no-]mac-updating: Whether the MAC addresses of packets should be
    changed or not (default is mac-updating)

The application can be launched in various configurations depending on the
provided parameters. The application can use up to 2 lcores: one of them
receives incoming traffic and makes a copy of each packet. The second lcore
then updates the MAC addresses and sends the copies. If one lcore per port
is used, both operations are done sequentially. For each configuration an
additional lcore is needed since the main lcore does not handle traffic but
is responsible for configuration, statistics printing and safe shutdown of
all ports and devices.

The application can use a maximum of 8 ports.

To run the application in a Linux environment with 3 lcores (the main lcore,
plus two forwarding cores), a single port (port 0), software copying and MAC
updating, issue the command:

.. code-block:: console

    $ ./<build_dir>/examples/dpdk-ioat -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw

To run the application in a Linux environment with 2 lcores (the main lcore,
plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no
MAC updating, issue the command:

.. code-block:: console

    $ ./<build_dir>/examples/dpdk-ioat -l 0-1 -n 1 -- -p 0x3 --no-mac-updating -c hw

Refer to the *DPDK Getting Started Guide* for general information on
running applications and the Environment Abstraction Layer (EAL) options.

Explanation
-----------

The following sections provide an explanation of the main components of the
code.

All DPDK library functions used in the sample code are prefixed with
``rte_`` and are explained in detail in the *DPDK API Documentation*.


The Main Function
~~~~~~~~~~~~~~~~~

The ``main()`` function performs the initialization and calls the execution
threads for each lcore.

The first task is to initialize the Environment Abstraction Layer (EAL).
The ``argc`` and ``argv`` arguments are provided to the ``rte_eal_init()``
function. The value returned is the number of parsed arguments:

.. code-block:: c

    /* init EAL */
    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");


The ``main()`` function also allocates a mempool to hold the mbufs (Message
Buffers) used by the application:

.. code-block:: c

    nb_mbufs = RTE_MAX(rte_eth_dev_count_avail() * (nb_rxd + nb_txd
        + MAX_PKT_BURST + rte_lcore_count() * MEMPOOL_CACHE_SIZE),
        MIN_POOL_SIZE);

    /* Create the mbuf pool */
    ioat_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", nb_mbufs,
        MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
        rte_socket_id());
    if (ioat_pktmbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");

Mbufs are the packet buffer structure used by DPDK. They are explained in
detail in the "Mbuf Library" section of the *DPDK Programmer's Guide*.

The ``main()`` function also initializes the ports:

.. code-block:: c

    /* Initialise each port */
    RTE_ETH_FOREACH_DEV(portid) {
        port_init(portid, ioat_pktmbuf_pool);
    }

Each port is configured using the ``port_init()`` function. The Ethernet
ports are configured with local settings using the ``rte_eth_dev_configure()``
function and the ``port_conf`` struct. RSS is enabled so that multiple Rx
queues can be used for packet receiving and copying by multiple CBDMA
channels per port:

.. code-block:: c

    /* configuring port to use RSS for multiple RX queues */
    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .mq_mode        = ETH_MQ_RX_RSS,
            .max_rx_pkt_len = RTE_ETHER_MAX_LEN
        },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,
                .rss_hf = ETH_RSS_PROTO_MASK,
            }
        }
    };

For this example the ports are set up with the number of Rx queues provided
with the ``-q`` option and 1 Tx queue, using the ``rte_eth_rx_queue_setup()``
and ``rte_eth_tx_queue_setup()`` functions.

The Ethernet port is then started:

.. code-block:: c

    ret = rte_eth_dev_start(portid);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_dev_start:err=%d, port=%u\n",
            ret, portid);


Finally the Rx port is set in promiscuous mode:

.. code-block:: c

    rte_eth_promiscuous_enable(portid);


After that, the application assigns the resources needed for each port:

.. code-block:: c

    check_link_status(ioat_enabled_port_mask);

    if (!cfg.nb_ports) {
        rte_exit(EXIT_FAILURE,
            "All available ports are disabled. Please set portmask.\n");
    }

    /* Check if there are enough lcores for all ports. */
    cfg.nb_lcores = rte_lcore_count() - 1;
    if (cfg.nb_lcores < 1)
        rte_exit(EXIT_FAILURE,
            "There should be at least one worker lcore.\n");

    ret = 0;

    if (copy_mode == COPY_MODE_IOAT_NUM) {
        assign_rawdevs();
    } else /* copy_mode == COPY_MODE_SW_NUM */ {
        assign_rings();
    }

Depending on the mode set (whether the copy is done by software or by
hardware), special structures are assigned to each port. If software copy
was chosen, the application has to assign ring structures for exchanging
packets between the lcores assigned to the ports:

.. code-block:: c

    static void
    assign_rings(void)
    {
        uint32_t i;

        for (i = 0; i < cfg.nb_ports; i++) {
            char ring_name[20];

            snprintf(ring_name, 20, "rx_to_tx_ring_%u", i);
            /* Create ring for inter core communication */
            cfg.ports[i].rx_to_tx_ring = rte_ring_create(
                    ring_name, ring_size,
                    rte_socket_id(), RING_F_SP_ENQ);

            if (cfg.ports[i].rx_to_tx_ring == NULL)
                rte_exit(EXIT_FAILURE, "%s\n",
                        rte_strerror(rte_errno));
        }
    }


When using hardware copy, each Rx queue of the port is assigned an
IOAT device (``assign_rawdevs()``) using IOAT Rawdev Driver API
functions:

.. code-block:: c

    static void
    assign_rawdevs(void)
    {
        uint16_t nb_rawdev = 0, rdev_id = 0;
        uint32_t i, j;

        for (i = 0; i < cfg.nb_ports; i++) {
            for (j = 0; j < cfg.ports[i].nb_queues; j++) {
                struct rte_rawdev_info rdev_info = { 0 };

                do {
                    if (rdev_id == rte_rawdev_count())
                        goto end;
                    rte_rawdev_info_get(rdev_id++, &rdev_info, 0);
                } while (strcmp(rdev_info.driver_name,
                    IOAT_PMD_RAWDEV_NAME_STR) != 0);

                cfg.ports[i].ioat_ids[j] = rdev_id - 1;
                configure_rawdev_queue(cfg.ports[i].ioat_ids[j]);
                ++nb_rawdev;
            }
        }
    end:
        if (nb_rawdev < cfg.nb_ports * cfg.ports[0].nb_queues)
            rte_exit(EXIT_FAILURE,
                "Not enough IOAT rawdevs (%u) for all queues (%u).\n",
                nb_rawdev, cfg.nb_ports * cfg.ports[0].nb_queues);
        RTE_LOG(INFO, IOAT, "Number of used rawdevs: %u.\n", nb_rawdev);
    }


The initialization of the hardware device is done by the
``rte_rawdev_configure()`` function using the ``rte_rawdev_info`` struct.
After configuration the device is started using the ``rte_rawdev_start()``
function. Each of the above operations is done in
``configure_rawdev_queue()``:

.. code-block:: c

    static void
    configure_rawdev_queue(uint32_t dev_id)
    {
        struct rte_ioat_rawdev_config dev_config = { .ring_size = ring_size };
        struct rte_rawdev_info info = { .dev_private = &dev_config };

        if (rte_rawdev_configure(dev_id, &info, sizeof(dev_config)) != 0) {
            rte_exit(EXIT_FAILURE,
                "Error with rte_rawdev_configure()\n");
        }
        if (rte_rawdev_start(dev_id) != 0) {
            rte_exit(EXIT_FAILURE,
                "Error with rte_rawdev_start()\n");
        }
    }

If initialization is successful, memory for hardware device
statistics is allocated.

Finally, the ``main()`` function starts all packet handling lcores and starts
printing statistics in a loop on the main lcore. The application can be
interrupted and closed using ``Ctrl-C``. The main lcore waits for
all worker lcores to finish, deallocates resources and exits.
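
The ``Ctrl-C`` shutdown described above follows the usual DPDK sample
pattern: a volatile flag set from a signal handler and polled by the lcore
loops until they return. A minimal sketch (the names ``force_quit`` and
``signal_handler`` are assumptions for illustration, not code quoted from
this sample):

```c
#include <signal.h>
#include <stdbool.h>

/* Flag polled by the processing loops; set asynchronously on Ctrl-C. */
static volatile bool force_quit;

static void signal_handler(int signum)
{
    if (signum == SIGINT || signum == SIGTERM)
        force_quit = true;
}
```

The main lcore would install the handler with ``signal(SIGINT,
signal_handler)`` before launching workers, and each worker loop exits when
``force_quit`` becomes true.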

The functions that launch the processing lcores are described below.

The Lcores Launching Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described above, the ``main()`` function invokes the
``start_forwarding_cores()`` function in order to start processing on each
lcore:

.. code-block:: c

    static void start_forwarding_cores(void)
    {
        uint32_t lcore_id = rte_lcore_id();

        RTE_LOG(INFO, IOAT, "Entering %s on lcore %u\n",
                __func__, rte_lcore_id());

        if (cfg.nb_lcores == 1) {
            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)rxtx_main_loop,
                NULL, lcore_id);
        } else if (cfg.nb_lcores > 1) {
            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)rx_main_loop,
                NULL, lcore_id);

            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)tx_main_loop, NULL,
                lcore_id);
        }
    }

The function launches the Rx/Tx processing functions on the configured lcores
using ``rte_eal_remote_launch()``. The configured ports, their number
and the number of assigned lcores are stored in the user-defined
``rxtx_transmission_config`` struct:

.. code-block:: c

    struct rxtx_transmission_config {
        struct rxtx_port_config ports[RTE_MAX_ETHPORTS];
        uint16_t nb_ports;
        uint16_t nb_lcores;
    };

The structure is initialized in the ``main()`` function with the values
corresponding to the ports and lcores configuration provided by the user.

The Lcores Processing Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For receiving packets on each port, the ``ioat_rx_port()`` function is used.
The function receives packets on each configured Rx queue. Depending on the
mode the user chose, it will enqueue packets to IOAT rawdev channels and
then invoke the copy process (hardware copy), or perform a software copy of
each packet using the ``pktmbuf_sw_copy()`` function and enqueue them to an
rte_ring:

.. code-block:: c

    /* Receive packets on one port and enqueue to IOAT rawdev or rte_ring. */
    static void
    ioat_rx_port(struct rxtx_port_config *rx_config)
    {
        uint32_t nb_rx, nb_enq, i, j;
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
        for (i = 0; i < rx_config->nb_queues; i++) {

            nb_rx = rte_eth_rx_burst(rx_config->rxtx_port, i,
                pkts_burst, MAX_PKT_BURST);

            if (nb_rx == 0)
                continue;

            port_statistics.rx[rx_config->rxtx_port] += nb_rx;

            if (copy_mode == COPY_MODE_IOAT_NUM) {
                /* Perform packet hardware copy */
                nb_enq = ioat_enqueue_packets(pkts_burst,
                    nb_rx, rx_config->ioat_ids[i]);
                if (nb_enq > 0)
                    rte_ioat_perform_ops(rx_config->ioat_ids[i]);
            } else {
                /* Perform packet software copy, free source packets */
                int ret;
                struct rte_mbuf *pkts_burst_copy[MAX_PKT_BURST];

                ret = rte_mempool_get_bulk(ioat_pktmbuf_pool,
                    (void *)pkts_burst_copy, nb_rx);

                if (unlikely(ret < 0))
                    rte_exit(EXIT_FAILURE,
                        "Unable to allocate memory.\n");

                for (j = 0; j < nb_rx; j++)
                    pktmbuf_sw_copy(pkts_burst[j],
                        pkts_burst_copy[j]);

                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)pkts_burst, nb_rx);

                nb_enq = rte_ring_enqueue_burst(
                    rx_config->rx_to_tx_ring,
                    (void *)pkts_burst_copy, nb_rx, NULL);

                /* Free any not enqueued packets. */
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)&pkts_burst_copy[nb_enq],
                    nb_rx - nb_enq);
            }

            port_statistics.copy_dropped[rx_config->rxtx_port] +=
                (nb_rx - nb_enq);
        }
    }

The packets are received in burst mode using the ``rte_eth_rx_burst()``
function. When using hardware copy mode the packets are enqueued in the
copying device's buffer using ``ioat_enqueue_packets()``, which calls
``rte_ioat_enqueue_copy()``. When all received packets are in the
buffer the copy operations are started by calling ``rte_ioat_perform_ops()``.
The ``rte_ioat_enqueue_copy()`` function operates on the physical address of
the packet. The ``rte_mbuf`` structure contains only the physical address of
the start of the data buffer (``buf_iova``). Thus the address is adjusted
by the ``addr_offset`` value in order to get the address of the
``rearm_data`` member of ``rte_mbuf``. That way both the packet data and
metadata can be copied in a single operation. This method can be used
because the mbufs are direct mbufs allocated by the application. If another
application uses external buffers, or indirect mbufs, then multiple copy
operations must be used.

.. code-block:: c

    static uint32_t
    ioat_enqueue_packets(struct rte_mbuf **pkts,
        uint32_t nb_rx, uint16_t dev_id)
    {
        int ret;
        uint32_t i;
        struct rte_mbuf *pkts_copy[MAX_PKT_BURST];

        const uint64_t addr_offset = RTE_PTR_DIFF(pkts[0]->buf_addr,
            &pkts[0]->rearm_data);

        ret = rte_mempool_get_bulk(ioat_pktmbuf_pool,
                (void *)pkts_copy, nb_rx);

        if (unlikely(ret < 0))
            rte_exit(EXIT_FAILURE, "Unable to allocate memory.\n");

        for (i = 0; i < nb_rx; i++) {
            /* Perform data copy */
            ret = rte_ioat_enqueue_copy(dev_id,
                pkts[i]->buf_iova
                    - addr_offset,
                pkts_copy[i]->buf_iova
                    - addr_offset,
                rte_pktmbuf_data_len(pkts[i])
                    + addr_offset,
                (uintptr_t)pkts[i],
                (uintptr_t)pkts_copy[i],
                0 /* nofence */);

            if (ret != 1)
                break;
        }

        ret = i;
        /* Free any not enqueued packets. */
        rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts[i], nb_rx - i);
        rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts_copy[i],
            nb_rx - i);

        return ret;
    }


All completed copies are processed by the ``ioat_tx_port()`` function. When
using hardware copy mode the function invokes ``rte_ioat_completed_ops()``
on each assigned IOAT channel to gather the copied packets. If software copy
mode is used the function dequeues the copied packets from the rte_ring. Then
the MAC address of each packet is changed if MAC updating is enabled. After
that the copies are sent in burst mode using ``rte_eth_tx_burst()``.


.. code-block:: c

    /* Transmit packets from IOAT rawdev/rte_ring for one port. */
    static void
    ioat_tx_port(struct rxtx_port_config *tx_config)
    {
        uint32_t i, j, nb_dq = 0;
        struct rte_mbuf *mbufs_src[MAX_PKT_BURST];
        struct rte_mbuf *mbufs_dst[MAX_PKT_BURST];

        for (i = 0; i < tx_config->nb_queues; i++) {
            if (copy_mode == COPY_MODE_IOAT_NUM) {
                /* Dequeue the mbufs from IOAT device. */
                nb_dq = rte_ioat_completed_ops(
                    tx_config->ioat_ids[i], MAX_PKT_BURST,
                    (void *)mbufs_src, (void *)mbufs_dst);
            } else {
                /* Dequeue the mbufs from rx_to_tx_ring. */
                nb_dq = rte_ring_dequeue_burst(
                    tx_config->rx_to_tx_ring, (void *)mbufs_dst,
                    MAX_PKT_BURST, NULL);
            }

            if (nb_dq == 0)
                return;

            if (copy_mode == COPY_MODE_IOAT_NUM)
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)mbufs_src, nb_dq);

            /* Update macs if enabled */
            if (mac_updating) {
                for (j = 0; j < nb_dq; j++)
                    update_mac_addrs(mbufs_dst[j],
                        tx_config->rxtx_port);
            }

            const uint16_t nb_tx = rte_eth_tx_burst(
                tx_config->rxtx_port, 0,
                (void *)mbufs_dst, nb_dq);

            port_statistics.tx[tx_config->rxtx_port] += nb_tx;

            /* Free any unsent packets. */
            if (unlikely(nb_tx < nb_dq))
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)&mbufs_dst[nb_tx],
                    nb_dq - nb_tx);
        }
    }

The Packet Copying Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Packet copying is performed by the user-defined function
``pktmbuf_sw_copy()``. It copies a whole packet by copying the
metadata from the source packet to a new mbuf, and then copying the
data chunk of the source packet. Both memory copies are done using
``rte_memcpy()``:

.. code-block:: c

    static inline void
    pktmbuf_sw_copy(struct rte_mbuf *src, struct rte_mbuf *dst)
    {
        /* Copy packet metadata */
        rte_memcpy(&dst->rearm_data,
            &src->rearm_data,
            offsetof(struct rte_mbuf, cacheline1)
                - offsetof(struct rte_mbuf, rearm_data));

        /* Copy packet data */
        rte_memcpy(rte_pktmbuf_mtod(dst, char *),
            rte_pktmbuf_mtod(src, char *), src->data_len);
    }

The metadata in this example is copied from the ``rearm_data`` member of
the ``rte_mbuf`` struct up to ``cacheline1``.
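
The single-``memcpy`` metadata copy works because the copied fields sit
contiguously between ``rearm_data`` and ``cacheline1``. The idea can be
shown with a self-contained sketch using a simplified stand-in struct (the
field set here is illustrative only and does not reproduce the real
``rte_mbuf`` layout):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for rte_mbuf: everything between rearm_data and
 * cacheline1 forms one contiguous metadata block. */
struct mini_mbuf {
    void *buf_addr;      /* per-mbuf pointer, deliberately not copied */
    uint64_t rearm_data; /* metadata block starts here */
    uint16_t data_off;
    uint16_t data_len;
    uint32_t pkt_len;
    char cacheline1;     /* marker: metadata block ends here */
};

/* Copy all metadata fields in one memcpy, mirroring how
 * pktmbuf_sw_copy() handles the real struct. */
static void copy_metadata(struct mini_mbuf *dst, const struct mini_mbuf *src)
{
    memcpy(&dst->rearm_data, &src->rearm_data,
        offsetof(struct mini_mbuf, cacheline1)
            - offsetof(struct mini_mbuf, rearm_data));
}
```

Note that ``buf_addr`` lies before ``rearm_data``, so each mbuf keeps its
own buffer pointer while all fields in the metadata range are overwritten.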

In order to understand why software packet copying is done as shown
above, please refer to the "Mbuf Library" section of the
*DPDK Programmer's Guide*.
582