..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver
which, upon receiving the IOCTL call, accesses the DPDK's FIFO queues to
receive packets from and transmit packets to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This approach:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Allows the DPDK ports to be used with standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets

The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.

Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the support of the kernel module.
For a physical NIC port, one thread reads from the port and writes to the KNI devices,
and another thread reads from the KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port is intended only for performance testing,
or for future use together with VMDq support.
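
The following is a minimal, self-contained sketch of how the per-port thread roles described above could be looked up for a given lcore.
The ``struct port_lcores`` type, the ``enum lcore_role`` values and ``lcore_role_lookup()`` are hypothetical names introduced only for illustration;
they are not part of the DPDK API or of the sample's source, which keeps this information in ``kni_port_params_array[]``.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical per-port lcore assignment, mirroring one
     * (port,lcore_rx,lcore_tx) tuple of the --config option. */
    struct port_lcores {
        unsigned port_id;
        unsigned lcore_rx; /* reads from the NIC port, writes to KNI */
        unsigned lcore_tx; /* reads from KNI, writes to the NIC port */
    };

    /* Hypothetical role of an lcore in the forwarding path. */
    enum lcore_role {
        ROLE_NONE,        /* lcore not assigned to any port */
        ROLE_PORT_TO_KNI, /* ingress: NIC port -> KNI devices */
        ROLE_KNI_TO_PORT  /* egress: KNI devices -> NIC port */
    };

    /* Scan the configured ports and report what lcore_id should do.
     * On a match, the served port is stored in *port_out (if non-NULL). */
    enum lcore_role
    lcore_role_lookup(const struct port_lcores *cfg, size_t n,
                      unsigned lcore_id, unsigned *port_out)
    {
        size_t i;

        assert(cfg != NULL || n == 0);
        for (i = 0; i < n; i++) {
            enum lcore_role role = ROLE_NONE;

            if (cfg[i].lcore_rx == lcore_id)
                role = ROLE_PORT_TO_KNI;
            else if (cfg[i].lcore_tx == lcore_id)
                role = ROLE_KNI_TO_PORT;

            if (role != ROLE_NONE) {
                if (port_out != NULL)
                    *port_out = cfg[i].port_id;
                return role;
            }
        }
        return ROLE_NONE;
    }

With the configuration ``(0,4,6),(1,5,7)``, this sketch reports lcore 4 as reading from port 0 and writing to its KNI devices,
lcore 7 as reading from the KNI devices of port 1 and writing to port 1, and any other lcore as unassigned.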

The packet flow through the Kernel NIC Interface application is shown in the following figure.

.. _figure_2:

**Figure 2. Kernel NIC Application Packet Flow**

|kernel_nic|

Compiling the Application
-------------------------

Compile the application as follows:

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/kni

#.  Set the target (a default target is used if not specified):

    .. note::

        This application is intended as a linuxapp only.

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application:

    .. code-block:: console

        make

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
In this case, only one kernel thread is created to receive packets on the kernel side for all KNI devices:

.. code-block:: console

    #insmod rte_kni.ko

The kernel thread can be pinned to a specific core using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -l kni_thread | awk '{print $1}'`

This command pins the kni_thread kernel thread to lcore 20 (mask 0x100000, since lcore numbering starts at 0),
so first check that this lcore exists on the system.
Run this command after the application has been launched, because insmod alone does not start the kni thread.

For optimum performance,
the lcore in the mask should be on the same socket as the lcores used in the KNI application.
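
The mask passed to taskset is a bitmap in which bit N selects CPU N, so the hexadecimal mask 100000 selects lcore 20.
A minimal sketch of that arithmetic follows. ``lcore_to_cpumask()`` and ``format_cpumask()`` are hypothetical helpers, not part of DPDK;
the sketch assumes lcore IDs map one-to-one to Linux CPU IDs (the EAL default when lcores are selected with ``-c``) and covers only the first 64 CPUs.

.. code-block:: c

    #include <assert.h>
    #include <inttypes.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: affinity mask with only bit lcore_id set.
     * Assumes lcore ID == CPU ID and lcore_id < 64. */
    uint64_t lcore_to_cpumask(unsigned lcore_id)
    {
        assert(lcore_id < 64);
        return UINT64_C(1) << lcore_id;
    }

    /* Hypothetical helper: write the mask as bare hexadecimal (no "0x"),
     * the form taskset accepts. Returns the snprintf() result, i.e. the
     * number of characters produced (or that would have been produced). */
    int format_cpumask(unsigned lcore_id, char *buf, size_t len)
    {
        return snprintf(buf, len, "%" PRIx64, lcore_to_cpumask(lcore_id));
    }

For example, ``format_cpumask(20, buf, sizeof(buf))`` stores ``100000`` in ``buf``, the mask used in the taskset command for lcore 20.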

For performance tuning, the KNI kernel module,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread, shared by all KNI devices, for receiving packets on the kernel side.
    This single kernel thread mode is the default.
    The core affinity of this kernel thread can be set with the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates one kernel thread per KNI device for receiving packets on the kernel side.
    The core affinity of each kernel thread is set when the KNI device is created,
    using the lcore IDs given on the command line that launches the application.
    Multiple kernel thread mode can provide higher, more scalable performance.

To measure the throughput in loopback mode, the KNI kernel module,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the lo_mode parameter as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode involves ring enqueue/dequeue operations and sk_buff copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]":
    Determines which lcores are used for RX, TX and the kernel threads of each port.

Refer to the *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports given in --config, no more and no fewer.

Zero, one or more lcore IDs can be given as lcore_kthread in --config.
In multiple kernel thread mode, if none is given, one KNI device is allocated for each port
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are given, one KNI device is allocated per lcore ID for each port,
and each kernel thread is bound to its corresponding lcore.
In single kernel thread mode, if none is given, one KNI device is allocated for each port.
If one or more lcore IDs are given,
one KNI device is allocated per lcore ID for each port, but
no lcore affinity is set, because a single kernel thread serves all KNI devices.

For example, to run the application with two ports served by six lcores, that is, one RX lcore, one TX lcore
and one kernel thread lcore for each port:

.. code-block:: console

    ./build/kni -c 0xf0 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

KNI Operations
--------------

Once the KNI application is started, standard Linux* commands can be used to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device is paired with the physical device.
Operations on the other KNI devices do not affect the physical port handled in the userspace application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

Explanation
-----------

The following sections provide an explanation of the code.

Initialization
~~~~~~~~~~~~~~

The setup of the mbuf pool, driver and queues is similar to the setup done in the L2 Forwarding sample application
(see Chapter 9 "L2 Forwarding Sample Application (in Real and Virtualized Environments)" for details).
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for creating the kernel NIC interface for a specific port is as follows:

.. code-block:: c

    kni = rte_kni_create(port, MAX_PACKET_SZ, pktmbuf_pool, &kni_ops);
    if (kni == NULL)
        rte_exit(EXIT_FAILURE, "Fail to create kni dev "
                 "for port: %d\n", port);

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint8_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        /* One KNI device per kernel thread lcore, or a single one if none */
        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ?
                                  params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));
                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                         "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }
        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads:

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores on which the kernel threads are pinned, one by one

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint8_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        while (((p = strchr(p0, '(')) != NULL) &&
               nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;

            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            rte_snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint8_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n",
                       port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] =
                (struct kni_port_params *)rte_zmalloc("KNI_port_params",
                    sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            if (kni_port_params_array[port_id] == NULL) {
                printf("Failed to allocate parameters for port %u\n", port_id);
                goto fail;
            }
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE ||
                kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                       "exceed the maximum %u\n",
                       kni_port_params_array[port_id]->lcore_rx,
                       kni_port_params_array[port_id]->lcore_tx,
                       RTE_MAX_LCORE);
                goto fail;
            }

            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        print_config();

        return 0;

    fail:
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;
    }

Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user-provided lcore_rx and lcore_tx
to determine whether this lcore reads from or writes to the kernel NIC interfaces.

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
packet reception is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").
Packet transmission is done by sending mbufs into the kernel NIC interfaces with ``rte_kni_tx_burst()``.
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */
    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case, which reads from the kernel NIC interfaces and writes to a physical NIC port,
packets are retrieved by reading mbufs from the kernel NIC interfaces with ``rte_kni_rx_burst()``.
Packet transmission is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").

.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */
    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute in user space the PMD operations requested by some Linux* commands,
callbacks must be implemented and registered in the ``struct rte_kni_ops`` structure.
Currently, setting a new MTU and configuring the network interface up or down are supported.

.. code-block:: c

    /* Prototypes, so that kni_ops can refer to the callbacks defined below */
    static int kni_change_mtu(uint8_t port_id, unsigned new_mtu);
    static int kni_config_network_interface(uint8_t port_id, uint8_t if_up);

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint8_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE +
                                     KNI_ENET_FCS_SIZE;

        /* Enable jumbo frames when the resulting frame exceeds a standard one */
        if (conf.rxmode.max_rx_pkt_len > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint8_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }

.. |kernel_nic| image:: img/kernel_nic.png
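
The MTU callback derives the maximum receive frame length from the requested MTU by adding the Ethernet header and FCS lengths.
The following sketch isolates that arithmetic so it can be checked on its own. It assumes a 14-byte Ethernet header, a 4-byte FCS
and a 1518-byte standard maximum frame, taken here to match ``KNI_ENET_HEADER_SIZE``, ``KNI_ENET_FCS_SIZE`` and ``ETHER_MAX_LEN``;
as a choice of this sketch, jumbo frames are enabled by comparing the resulting frame length, not the MTU, against that maximum.
The ``SKETCH_*`` macros, ``struct rx_frame_cfg`` and ``mtu_to_rx_cfg()`` are hypothetical names used only here.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed values: Ethernet header, FCS and standard maximum frame. */
    #define SKETCH_ENET_HEADER_SIZE 14u
    #define SKETCH_ENET_FCS_SIZE    4u
    #define SKETCH_ETHER_MAX_LEN    1518u

    /* Hypothetical result of translating an MTU request. */
    struct rx_frame_cfg {
        uint32_t max_rx_pkt_len; /* largest frame the port should accept */
        bool jumbo_frame;        /* true if that frame exceeds a standard one */
    };

    /* mtu + length of header + length of FCS = max pkt length */
    struct rx_frame_cfg
    mtu_to_rx_cfg(uint32_t new_mtu)
    {
        struct rx_frame_cfg cfg;

        /* Guard against wrap-around in this sketch's uint32_t arithmetic */
        assert(new_mtu <= UINT32_MAX - SKETCH_ENET_HEADER_SIZE - SKETCH_ENET_FCS_SIZE);
        cfg.max_rx_pkt_len = new_mtu + SKETCH_ENET_HEADER_SIZE + SKETCH_ENET_FCS_SIZE;
        cfg.jumbo_frame = cfg.max_rx_pkt_len > SKETCH_ETHER_MAX_LEN;
        return cfg;
    }

Under these assumptions, an MTU of 1500 gives a 1518-byte frame and leaves jumbo frames disabled,
while an MTU of 9000 gives a 9018-byte frame and enables them.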