..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which upon receiving the IOCTL call accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets.

The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.

Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the kernel module's support.
For a physical NIC port, one thread reads from the port and writes to KNI devices,
and another thread reads from KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port
is intended only for performance testing, or for use together with VMDq support in the future.
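
The two-thread split described above can be pictured as a per-lcore role lookup.
The following is only a minimal sketch, not code from the sample application:
the ``port_lcores`` structure, the ``lcore_role`` enum and ``role_for_lcore()`` are hypothetical names
introduced for illustration, and the sketch ignores kernel thread lcores entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-port parameters. */
struct port_lcores {
    unsigned port_id;
    unsigned lcore_rx; /* lcore that reads the NIC port and writes to KNI */
    unsigned lcore_tx; /* lcore that reads KNI and writes to the NIC port */
};

/* Hypothetical role names for this sketch. */
enum lcore_role { ROLE_NONE, ROLE_INGRESS, ROLE_EGRESS };

/* Return the role a given lcore plays for one port. */
static enum lcore_role
role_for_lcore(const struct port_lcores *p, unsigned lcore)
{
    if (p == NULL)
        return ROLE_NONE;
    if (lcore == p->lcore_rx)
        return ROLE_INGRESS; /* NIC port -> KNI device(s) */
    if (lcore == p->lcore_tx)
        return ROLE_EGRESS;  /* KNI device(s) -> NIC port */
    return ROLE_NONE;        /* this lcore does not serve the port */
}
```

With a port served by lcore 4 for RX and lcore 6 for TX, lcore 4 would take the ingress role,
lcore 6 the egress role, and any other lcore would stay idle for that port.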

The packet flow through the Kernel NIC Interface application is as shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow


Compiling the Application
-------------------------

Compile the application as follows:

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/kni

#.  Set the target (a default target is used if not specified)

    .. note::

        This application is intended as a linuxapp only.

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application:

    .. code-block:: console

        make

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
This way, only one kernel thread is created for all KNI devices for packet receiving on the kernel side:

.. code-block:: console

    #insmod rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command line tries to pin the kni_thread to lcore 20
(the mask 100000 is hexadecimal and has only bit 20 set; lcore numbering starts at 0),
so first check that this lcore is available on the board.
This command must be run after the application has been launched, as insmod does not start the kni thread.

For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.
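
The relationship between a core number and the hexadecimal mask passed to taskset can be sketched as follows.
This is an illustrative helper only; ``core_to_mask()`` and ``format_core_mask()`` are hypothetical names
that are not part of the DPDK or of taskset, and the sketch assumes at most 64 cores.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Return a mask with only the bit for `core` set (0 if core >= 64). */
static uint64_t
core_to_mask(unsigned core)
{
    return (core < 64) ? (UINT64_C(1) << core) : 0;
}

/* Write the mask for `core` as a hexadecimal string without a 0x prefix,
 * which is a form taskset accepts. Returns the snprintf() result. */
static int
format_core_mask(unsigned core, char *buf, size_t len)
{
    return snprintf(buf, len, "%" PRIx64, core_to_mask(core));
}
```

For core 20 the helper produces the string ``100000``, matching the mask used in the taskset command above.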

To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread for all KNI devices for packet receiving on the kernel side.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set by using the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates a kernel thread for each KNI device for packet receiving on the kernel side.
    The core affinity of each kernel thread is set when creating the KNI device.
    The lcore ID for each kernel thread is provided in the command line of launching the application.
    Multiple kernel thread mode can provide scalable higher performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode will involve ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode will involve ring enqueue/dequeue operations and sk buffer copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]":
    Determines which lcores of RX, TX, kernel thread are mapped to which ports.

Refer to *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports indicated by the port fields in --config, neither more nor less.

The lcore_kthread field in --config can be given zero, one or more lcore IDs.
In multiple kernel thread mode, if no lcore ID is given, a KNI device will be allocated for each port,
while no specific lcore affinity will be set for its kernel thread.
If one or more lcore IDs are given, one or more KNI devices will be allocated for each port,
and the specified lcore affinity will be set for each kernel thread.
In single kernel thread mode, if no lcore ID is given, a KNI device will be allocated for each port.
If one or more lcore IDs are given,
one or more KNI devices will be allocated for each port, while
no lcore affinity will be set as there is only one kernel thread for all KNI devices.

For example, to run the application with two ports served by six lcores, one lcore of RX, one lcore of TX,
and one lcore of kernel thread for each port:

.. code-block:: console

    ./build/kni -c 0xf0 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

KNI Operations
--------------

Once the KNI application is started, one can use different Linux* commands to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device will be paired to the physical device.
Operations on other KNI devices will not affect the physical port handled in the user space application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

Explanation
-----------

The following sections provide some explanation of the code.

Initialization
~~~~~~~~~~~~~~

Setup of mbuf pool, driver and queues is similar to the setup done in the L2 Forwarding sample application
(see Chapter 9 "L2 Forwarding Sample Application (in Real and Virtualized Environments)" for details).
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint8_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ? params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {

            /* Clear conf at first */

            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */

            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));

                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                        "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }
        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads.

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores on which the kernel threads are pinned, one by one

This is done by using the `kni_port_params_array[]` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint8_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;

            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            rte_snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint8_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n", port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] = (struct kni_port_params*)rte_zmalloc("KNI_port_params", sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE || kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx, kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }

            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        print_config();

        return 0;

    fail:

        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;

    }

Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to kernel NIC interfaces.

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").
The packet transmission is done by sending mbufs into the kernel NIC interfaces by rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */

    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case that reads from kernel NIC interfaces and writes to a physical NIC port, packets are retrieved by reading
mbufs from kernel NIC interfaces by `rte_kni_rx_burst()`.
The packet transmission is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").

.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */

    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */

            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);

            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and registered in the rte_kni_ops structure.
Currently, setting a new MTU and configuring the network interface (up/down) are supported.

.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */

    static int
    kni_change_mtu(uint8_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */

        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */

        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */

        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */

        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */

    static int
    kni_config_network_interface(uint8_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }