..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019 Intel Corporation.

.. include:: <isonum.txt>

Packet copying using Intel\ |reg| QuickData Technology
======================================================

Overview
--------

This sample is intended as a demonstration of the basic components of a DPDK
forwarding application and an example of how to use the IOAT driver API to
make copies of packets.

Also while forwarding, the MAC addresses are affected as follows:

* The source MAC address is replaced by the TX port MAC address

* The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID

This application can be used to compare the performance of copying packets
in software with copying done by a DMA device, for different packet sizes.
The example prints out statistics every second. The statistics show
received/sent packets and packets dropped or failed to copy.

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``ioat`` sub-directory.


Running the Application
-----------------------

In order to run the hardware copy application, the copying device
needs to be bound to a user-space IO driver.

Refer to the "IOAT Rawdev Driver" chapter in the "Rawdev Drivers" document
for information on using the driver.

The application requires a number of command line options:

.. code-block:: console

    ./<build_dir>/examples/dpdk-ioat [EAL options] -- [-p MASK] [-q NQ] [-s RS] [-c <sw|hw>]
        [--[no-]mac-updating]

where,

* p MASK: A hexadecimal bitmask of the ports to configure (default is all)

* q NQ: Number of Rx queues used per port, equivalent to CBDMA channels
  per port (default is 1)

* c CT: Type of packet copy performed: software (sw) or hardware using
  DMA (hw) (default is hw)

* s RS: Size of the IOAT rawdev ring for hardware copy mode, or of the
  rte_ring for software copy mode (default is 2048)

* --[no-]mac-updating: Whether the MAC addresses of packets should be
  updated or not (default is mac-updating)

The application can be launched in various configurations depending on the
provided parameters. The app can use up to 2 lcores: one of them receives
incoming traffic and makes a copy of each packet. The second lcore then
updates the MAC address and sends the copy. If one lcore per port is used,
both operations are done sequentially. For each configuration an additional
lcore is needed since the main lcore does not handle traffic but is
responsible for configuration, statistics printing and safe shutdown of
all ports and devices.

The application can use a maximum of 8 ports.

To run the application in a Linux environment with 3 lcores (the main lcore,
plus two forwarding cores), a single port (port 0), software copying and MAC
updating issue the command:

.. code-block:: console

    $ ./<build_dir>/examples/dpdk-ioat -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw

To run the application in a Linux environment with 2 lcores (the main lcore,
plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
updating issue the command:

.. code-block:: console

    $ ./<build_dir>/examples/dpdk-ioat -l 0-1 -n 1 -- -p 0x3 --no-mac-updating -c hw

Refer to the *DPDK Getting Started Guide* for general information on
running applications and the Environment Abstraction Layer (EAL) options.

Explanation
-----------

The following sections provide an explanation of the main components of the
code.

All DPDK library functions used in the sample code are prefixed with
``rte_`` and are explained in detail in the *DPDK API Documentation*.


The Main Function
~~~~~~~~~~~~~~~~~

The ``main()`` function performs the initialization and calls the execution
threads for each lcore.

The first task is to initialize the Environment Abstraction Layer (EAL).
The ``argc`` and ``argv`` arguments are provided to the ``rte_eal_init()``
function. The value returned is the number of parsed arguments:

.. code-block:: c

    /* init EAL */
    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");


The ``main()`` function also allocates a mempool to hold the mbufs (Message
Buffers) used by the application:

.. code-block:: c

    nb_mbufs = RTE_MAX(rte_eth_dev_count_avail() * (nb_rxd + nb_txd
        + MAX_PKT_BURST + rte_lcore_count() * MEMPOOL_CACHE_SIZE),
        MIN_POOL_SIZE);

    /* Create the mbuf pool */
    ioat_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", nb_mbufs,
        MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
        rte_socket_id());
    if (ioat_pktmbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");

Mbufs are the packet buffer structure used by DPDK. They are explained in
detail in the "Mbuf Library" section of the *DPDK Programmer's Guide*.

The ``main()`` function also initializes the ports:

.. code-block:: c

    /* Initialise each port */
    RTE_ETH_FOREACH_DEV(portid) {
        port_init(portid, ioat_pktmbuf_pool);
    }

Each port is configured using the ``port_init()`` function. The Ethernet
ports are configured with local settings using the ``rte_eth_dev_configure()``
function and the ``port_conf`` struct. RSS is enabled so that
multiple Rx queues can be used for packet receiving and copying by
multiple CBDMA channels per port:

.. code-block:: c

    /* configuring port to use RSS for multiple RX queues */
    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .mq_mode = ETH_MQ_RX_RSS,
            .max_rx_pkt_len = RTE_ETHER_MAX_LEN
        },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,
                .rss_hf = ETH_RSS_PROTO_MASK,
            }
        }
    };

For this example the ports are set up with the number of Rx queues provided
with the -q option and 1 Tx queue, using the ``rte_eth_rx_queue_setup()``
and ``rte_eth_tx_queue_setup()`` functions.

The Ethernet port is then started:

.. code-block:: c

    ret = rte_eth_dev_start(portid);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_dev_start:err=%d, port=%u\n",
            ret, portid);


Finally the Rx port is set in promiscuous mode:

.. code-block:: c

    rte_eth_promiscuous_enable(portid);


After that, the application assigns the resources needed for each port.

.. code-block:: c

    check_link_status(ioat_enabled_port_mask);

    if (!cfg.nb_ports) {
        rte_exit(EXIT_FAILURE,
            "All available ports are disabled. Please set portmask.\n");
    }

    /* Check if there is enough lcores for all ports. */
    cfg.nb_lcores = rte_lcore_count() - 1;
    if (cfg.nb_lcores < 1)
        rte_exit(EXIT_FAILURE,
            "There should be at least one worker lcore.\n");

    ret = 0;

    if (copy_mode == COPY_MODE_IOAT_NUM) {
        assign_rawdevs();
    } else /* copy_mode == COPY_MODE_SW_NUM */ {
        assign_rings();
    }

Depending on the mode set (whether the copy should be done in software or by
hardware), special structures are assigned to each port. If software copy was
chosen, the application has to assign ring structures for exchanging packets
between the lcores assigned to the ports.

.. code-block:: c

    static void
    assign_rings(void)
    {
        uint32_t i;

        for (i = 0; i < cfg.nb_ports; i++) {
            char ring_name[20];

            snprintf(ring_name, 20, "rx_to_tx_ring_%u", i);
            /* Create ring for inter core communication */
            cfg.ports[i].rx_to_tx_ring = rte_ring_create(
                ring_name, ring_size,
                rte_socket_id(), RING_F_SP_ENQ);

            if (cfg.ports[i].rx_to_tx_ring == NULL)
                rte_exit(EXIT_FAILURE, "%s\n",
                    rte_strerror(rte_errno));
        }
    }


When using hardware copy, each Rx queue of the port is assigned an
IOAT device (``assign_rawdevs()``) using IOAT Rawdev Driver API
functions:

.. code-block:: c

    static void
    assign_rawdevs(void)
    {
        uint16_t nb_rawdev = 0, rdev_id = 0;
        uint32_t i, j;

        for (i = 0; i < cfg.nb_ports; i++) {
            for (j = 0; j < cfg.ports[i].nb_queues; j++) {
                struct rte_rawdev_info rdev_info = { 0 };

                do {
                    if (rdev_id == rte_rawdev_count())
                        goto end;
                    rte_rawdev_info_get(rdev_id++, &rdev_info, 0);
                } while (strcmp(rdev_info.driver_name,
                    IOAT_PMD_RAWDEV_NAME_STR) != 0);

                cfg.ports[i].ioat_ids[j] = rdev_id - 1;
                configure_rawdev_queue(cfg.ports[i].ioat_ids[j]);
                ++nb_rawdev;
            }
        }
    end:
        if (nb_rawdev < cfg.nb_ports * cfg.ports[0].nb_queues)
            rte_exit(EXIT_FAILURE,
                "Not enough IOAT rawdevs (%u) for all queues (%u).\n",
                nb_rawdev, cfg.nb_ports * cfg.ports[0].nb_queues);
        RTE_LOG(INFO, IOAT, "Number of used rawdevs: %u.\n", nb_rawdev);
    }


The initialization of the hardware device is done by the
``rte_rawdev_configure()`` function using the ``rte_rawdev_info`` struct.
After configuration the device is started using the ``rte_rawdev_start()``
function. Each of the above operations is done in
``configure_rawdev_queue()``.

.. code-block:: c

    static void
    configure_rawdev_queue(uint32_t dev_id)
    {
        struct rte_ioat_rawdev_config dev_config = { .ring_size = ring_size };
        struct rte_rawdev_info info = { .dev_private = &dev_config };

        if (rte_rawdev_configure(dev_id, &info, sizeof(dev_config)) != 0) {
            rte_exit(EXIT_FAILURE,
                "Error with rte_rawdev_configure()\n");
        }
        if (rte_rawdev_start(dev_id) != 0) {
            rte_exit(EXIT_FAILURE,
                "Error with rte_rawdev_start()\n");
        }
    }

If initialization is successful, memory for the hardware device
statistics is allocated.

Finally the ``main()`` function starts all packet handling lcores (the
functions that launch the processing lcores are described in the next
sections) and begins printing statistics in a loop on the main lcore. The
application can be interrupted and closed using ``Ctrl-C``. The main lcore
then waits for all worker lcores to finish, deallocates resources and exits.
A minimal sketch of this shutdown sequence is shown below.
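
The sketch below is illustrative rather than a verbatim copy of the sample
code. It assumes a ``force_quit`` flag set by a ``SIGINT``/``SIGTERM``
handler and a ``print_stats()`` helper (both assumptions for illustration);
the remaining names (``cfg``, ``copy_mode``, the per-port fields) are those
used elsewhere in this guide:

.. code-block:: c

    uint32_t i, j;

    /* Main lcore: print stats until the signal handler sets force_quit. */
    while (!force_quit) {
        sleep(1);
        print_stats();
    }

    /* Wait for all worker lcores started with rte_eal_remote_launch(). */
    rte_eal_mp_wait_lcore();

    for (i = 0; i < cfg.nb_ports; i++) {
        if (copy_mode == COPY_MODE_IOAT_NUM) {
            /* Hardware copy: stop the rawdev channels of this port. */
            for (j = 0; j < cfg.ports[i].nb_queues; j++)
                rte_rawdev_stop(cfg.ports[i].ioat_ids[j]);
        } else {
            /* Software copy: free the inter-core ring of this port. */
            rte_ring_free(cfg.ports[i].rx_to_tx_ring);
        }

        /* Stop and close the Ethernet port. */
        rte_eth_dev_stop(cfg.ports[i].rxtx_port);
        rte_eth_dev_close(cfg.ports[i].rxtx_port);
    }

In hardware copy mode the rawdev channels are stopped before the ports are
closed; in software copy mode the inter-core rings are freed instead.
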

The Lcores Launching Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described above, the ``main()`` function invokes the
``start_forwarding_cores()`` function in order to start processing for each
lcore:

.. code-block:: c

    static void start_forwarding_cores(void)
    {
        uint32_t lcore_id = rte_lcore_id();

        RTE_LOG(INFO, IOAT, "Entering %s on lcore %u\n",
            __func__, rte_lcore_id());

        if (cfg.nb_lcores == 1) {
            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)rxtx_main_loop,
                NULL, lcore_id);
        } else if (cfg.nb_lcores > 1) {
            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)rx_main_loop,
                NULL, lcore_id);

            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)tx_main_loop, NULL,
                lcore_id);
        }
    }

The function launches the Rx/Tx processing functions on the configured lcores
using ``rte_eal_remote_launch()``. The configured ports, their number
and the number of assigned lcores are stored in the user-defined
``rxtx_transmission_config`` struct:

.. code-block:: c

    struct rxtx_transmission_config {
        struct rxtx_port_config ports[RTE_MAX_ETHPORTS];
        uint16_t nb_ports;
        uint16_t nb_lcores;
    };

The structure is initialized in the ``main()`` function with the values
corresponding to the ports and lcores configuration provided by the user.

The Lcores Processing Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For receiving packets on each port, the ``ioat_rx_port()`` function is used.
The function receives packets on each configured Rx queue. Depending on the
mode the user chose, it will enqueue packets to IOAT rawdev channels and
then invoke the copy process (hardware copy), or perform a software copy of
each packet using the ``pktmbuf_sw_copy()`` function and enqueue them to an
rte_ring:

.. code-block:: c

    /* Receive packets on one port and enqueue to IOAT rawdev or rte_ring. */
    static void
    ioat_rx_port(struct rxtx_port_config *rx_config)
    {
        uint32_t nb_rx, nb_enq, i, j;
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];

        for (i = 0; i < rx_config->nb_queues; i++) {

            nb_rx = rte_eth_rx_burst(rx_config->rxtx_port, i,
                pkts_burst, MAX_PKT_BURST);

            if (nb_rx == 0)
                continue;

            port_statistics.rx[rx_config->rxtx_port] += nb_rx;

            if (copy_mode == COPY_MODE_IOAT_NUM) {
                /* Perform packet hardware copy */
                nb_enq = ioat_enqueue_packets(pkts_burst,
                    nb_rx, rx_config->ioat_ids[i]);
                if (nb_enq > 0)
                    rte_ioat_perform_ops(rx_config->ioat_ids[i]);
            } else {
                /* Perform packet software copy, free source packets */
                int ret;
                struct rte_mbuf *pkts_burst_copy[MAX_PKT_BURST];

                ret = rte_mempool_get_bulk(ioat_pktmbuf_pool,
                    (void *)pkts_burst_copy, nb_rx);

                if (unlikely(ret < 0))
                    rte_exit(EXIT_FAILURE,
                        "Unable to allocate memory.\n");

                for (j = 0; j < nb_rx; j++)
                    pktmbuf_sw_copy(pkts_burst[j],
                        pkts_burst_copy[j]);

                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)pkts_burst, nb_rx);

                nb_enq = rte_ring_enqueue_burst(
                    rx_config->rx_to_tx_ring,
                    (void *)pkts_burst_copy, nb_rx, NULL);

                /* Free any not enqueued packets. */
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)&pkts_burst_copy[nb_enq],
                    nb_rx - nb_enq);
            }

            port_statistics.copy_dropped[rx_config->rxtx_port] +=
                (nb_rx - nb_enq);
        }
    }

The packets are received in burst mode using the ``rte_eth_rx_burst()``
function. When using hardware copy mode the packets are enqueued in the
copying device's buffer using ``ioat_enqueue_packets()``, which calls
``rte_ioat_enqueue_copy()``. When all received packets are in the
buffer the copy operations are started by calling ``rte_ioat_perform_ops()``.
The ``rte_ioat_enqueue_copy()`` function operates on the physical address of
the packet. The ``rte_mbuf`` structure contains only the physical address of
the start of the data buffer (``buf_iova``). Thus the address is adjusted
by the ``addr_offset`` value in order to get the address of the
``rearm_data`` member of the ``rte_mbuf``. That way both the packet data and
metadata can be copied in a single operation. This method can be used because
the mbufs are direct mbufs allocated by the application. If another
application uses external buffers, or indirect mbufs, then multiple copy
operations must be used.

.. code-block:: c

    static uint32_t
    ioat_enqueue_packets(struct rte_mbuf **pkts,
        uint32_t nb_rx, uint16_t dev_id)
    {
        int ret;
        uint32_t i;
        struct rte_mbuf *pkts_copy[MAX_PKT_BURST];

        const uint64_t addr_offset = RTE_PTR_DIFF(pkts[0]->buf_addr,
            &pkts[0]->rearm_data);

        ret = rte_mempool_get_bulk(ioat_pktmbuf_pool,
            (void *)pkts_copy, nb_rx);

        if (unlikely(ret < 0))
            rte_exit(EXIT_FAILURE, "Unable to allocate memory.\n");

        for (i = 0; i < nb_rx; i++) {
            /* Perform data copy */
            ret = rte_ioat_enqueue_copy(dev_id,
                pkts[i]->buf_iova
                - addr_offset,
                pkts_copy[i]->buf_iova
                - addr_offset,
                rte_pktmbuf_data_len(pkts[i])
                + addr_offset,
                (uintptr_t)pkts[i],
                (uintptr_t)pkts_copy[i],
                0 /* nofence */);

            if (ret != 1)
                break;
        }

        ret = i;
        /* Free any not enqueued packets. */
        rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts[i], nb_rx - i);
        rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts_copy[i],
            nb_rx - i);

        return ret;
    }


All completed copies are processed by the ``ioat_tx_port()`` function. When
using hardware copy mode the function invokes ``rte_ioat_completed_ops()``
on each assigned IOAT channel to gather the copied packets. If software copy
mode is used the function dequeues the copied packets from the rte_ring. Then
the MAC address of each packet is changed if MAC updating was enabled. After
that the copies are sent in burst mode using ``rte_eth_tx_burst()``.


.. code-block:: c

    /* Transmit packets from IOAT rawdev/rte_ring for one port. */
    static void
    ioat_tx_port(struct rxtx_port_config *tx_config)
    {
        uint32_t i, j, nb_dq = 0;
        struct rte_mbuf *mbufs_src[MAX_PKT_BURST];
        struct rte_mbuf *mbufs_dst[MAX_PKT_BURST];

        for (i = 0; i < tx_config->nb_queues; i++) {
            if (copy_mode == COPY_MODE_IOAT_NUM) {
                /* Deque the mbufs from IOAT device. */
                nb_dq = rte_ioat_completed_ops(
                    tx_config->ioat_ids[i], MAX_PKT_BURST,
                    (void *)mbufs_src, (void *)mbufs_dst);
            } else {
                /* Deque the mbufs from rx_to_tx_ring. */
                nb_dq = rte_ring_dequeue_burst(
                    tx_config->rx_to_tx_ring, (void *)mbufs_dst,
                    MAX_PKT_BURST, NULL);
            }

            if (nb_dq == 0)
                return;

            if (copy_mode == COPY_MODE_IOAT_NUM)
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)mbufs_src, nb_dq);

            /* Update macs if enabled */
            if (mac_updating) {
                for (j = 0; j < nb_dq; j++)
                    update_mac_addrs(mbufs_dst[j],
                        tx_config->rxtx_port);
            }

            const uint16_t nb_tx = rte_eth_tx_burst(
                tx_config->rxtx_port, 0,
                (void *)mbufs_dst, nb_dq);

            port_statistics.tx[tx_config->rxtx_port] += nb_tx;

            /* Free any unsent packets. */
            if (unlikely(nb_tx < nb_dq))
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)&mbufs_dst[nb_tx],
                    nb_dq - nb_tx);
        }
    }

The Packet Copying Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to perform a packet copy, the user-defined function
``pktmbuf_sw_copy()`` is used. It copies a whole packet by copying the
metadata from the source packet to the new mbuf, and then copying the data
chunk of the source packet. Both memory copies are done using
``rte_memcpy()``:

.. code-block:: c

    static inline void
    pktmbuf_sw_copy(struct rte_mbuf *src, struct rte_mbuf *dst)
    {
        /* Copy packet metadata */
        rte_memcpy(&dst->rearm_data,
            &src->rearm_data,
            offsetof(struct rte_mbuf, cacheline1)
            - offsetof(struct rte_mbuf, rearm_data));

        /* Copy packet data */
        rte_memcpy(rte_pktmbuf_mtod(dst, char *),
            rte_pktmbuf_mtod(src, char *), src->data_len);
    }

The metadata in this example is copied from the ``rearm_data`` member of the
``rte_mbuf`` struct up to ``cacheline1``.

In order to understand why software packet copying is done as shown
above please refer to the "Mbuf Library" section of the
*DPDK Programmer's Guide*.
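
The ``update_mac_addrs()`` helper invoked from ``ioat_tx_port()`` is not
listed above. A minimal sketch matching the behaviour described in the
Overview (the source MAC address is taken from the TX port, the destination
MAC address becomes 02:00:00:00:00:TX_PORT_ID) could look like the following;
the ``ioat_ports_eth_addr[]`` array holding each port's MAC address is an
assumption used here for illustration:

.. code-block:: c

    static void
    update_mac_addrs(struct rte_mbuf *m, uint32_t dest_portid)
    {
        struct rte_ether_hdr *eth;

        eth = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);

        /* Destination MAC address: 02:00:00:00:00:TX_PORT_ID. */
        eth->d_addr.addr_bytes[0] = 0x02;
        eth->d_addr.addr_bytes[1] = 0x00;
        eth->d_addr.addr_bytes[2] = 0x00;
        eth->d_addr.addr_bytes[3] = 0x00;
        eth->d_addr.addr_bytes[4] = 0x00;
        eth->d_addr.addr_bytes[5] = dest_portid & 0xFF;

        /* Source MAC address: the MAC address of the TX port. */
        rte_ether_addr_copy(&ioat_ports_eth_addr[dest_portid], &eth->s_addr);
    }
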