..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which upon receiving the IOCTL call accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets.

The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.

Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the kernel module's support.
For a physical NIC port, one thread reads from the port and writes to KNI devices,
and another thread reads from KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port is intended only for performance testing,
or for working together with VMDq support in the future.
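
The pointer-passing FIFO idea described in the introduction can be illustrated with a small sketch.
It is not the layout used by the KNI library or kernel module: the ``sketch_fifo`` type and its functions
are hypothetical names, and the sketch is single-threaded, so it omits the memory barriers that a queue
shared between user space and the kernel would need.
It only shows that the queue carries pointers to packet buffers rather than copies of the packet data.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical names throughout; this is not the rte_kni FIFO layout. */
    #define SKETCH_FIFO_SLOTS 8u    /* must be a power of two */

    struct sketch_fifo {
        unsigned write;                 /* next slot the producer fills */
        unsigned read;                  /* next slot the consumer drains */
        void *slot[SKETCH_FIFO_SLOTS];  /* pointers to packets, not copies */
    };

    void
    sketch_fifo_init(struct sketch_fifo *f)
    {
        f->write = 0;
        f->read = 0;
    }

    /* Enqueue up to n packet pointers; returns how many were accepted. */
    unsigned
    sketch_fifo_put(struct sketch_fifo *f, void **pkts, unsigned n)
    {
        unsigned i;

        for (i = 0; i < n; i++) {
            /* One slot stays empty so that "full" differs from "empty". */
            unsigned next = (f->write + 1) & (SKETCH_FIFO_SLOTS - 1);

            if (next == f->read)
                break;                  /* queue is full */
            f->slot[f->write] = pkts[i];
            f->write = next;
        }
        return i;
    }

    /* Dequeue up to n packet pointers; returns how many were retrieved. */
    unsigned
    sketch_fifo_get(struct sketch_fifo *f, void **pkts, unsigned n)
    {
        unsigned i;

        for (i = 0; i < n; i++) {
            if (f->read == f->write)
                break;                  /* queue is empty */
            pkts[i] = f->slot[f->read];
            f->read = (f->read + 1) & (SKETCH_FIFO_SLOTS - 1);
        }
        return i;
    }

In this sketch the producer would be the thread that reads from the NIC port and the consumer the kernel side,
with a second queue of the same shape carrying packets in the opposite direction.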

The packet flow through the Kernel NIC Interface application is as shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``kni`` sub-directory.

.. note::

        This application is intended as a linuxapp only.

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
In this case, only one kernel thread is created to receive packets on the kernel side for all KNI devices:

.. code-block:: console

    #insmod rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command tries to pin kni_thread to lcore 20
(the mask 100000 is hexadecimal, so only bit 20 is set, and lcore numbering starts at 0),
so first check that this lcore is available on the board.
This command must be sent after the application has been launched, as insmod does not start the kni thread.

For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.

To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread to receive packets on the kernel side for all KNI devices.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set with the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates one kernel thread per KNI device to receive packets on the kernel side.
    The core affinity of each kernel thread is set when creating the KNI device.
    The lcore ID for each kernel thread is provided on the command line when launching the application.
    Multiple kernel thread mode can provide scalable, higher performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode will involve ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode will involve ring enqueue/dequeue operations and sk buffer copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx, lcore_tx[,lcore_kthread, ...]) [, port,lcore_rx, lcore_tx[,lcore_kthread, ...]]":
    Determines which RX, TX and kernel thread lcores are mapped to which ports.

Refer to *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.
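
As an illustration of how a PORTMASK value relates to port numbers, the following minimal sketch builds the bitmask
from a list of port IDs and tests whether a given port is enabled in a mask.
The helper names ``sketch_portmask()`` and ``sketch_port_enabled()`` are hypothetical and are not part of the sample application,
and the 32-bit mask width is an assumption made for the sketch.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: set bit N of the mask for every port ID N. */
    uint32_t
    sketch_portmask(const unsigned *ports, size_t n)
    {
        uint32_t mask = 0;
        size_t i;

        for (i = 0; i < n; i++) {
            if (ports[i] < 32)      /* ignore IDs that do not fit the mask */
                mask |= UINT32_C(1) << ports[i];
        }
        return mask;
    }

    /* Hypothetical helper: returns 1 if the port's bit is set, else 0. */
    int
    sketch_port_enabled(uint32_t mask, unsigned port)
    {
        return port < 32 && ((mask >> port) & 1u);
    }

With this sketch, ports 0 and 1 give the mask 0x3, and ports 0, 2 and 3 give 0xd;
the value is passed to ``-p`` in hexadecimal.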

The -c coremask or -l corelist parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports named in --config, neither more nor less.

The lcore_kthread field in --config can be configured with zero, one or more lcore IDs.
In multiple kernel thread mode, if no lcore ID is configured, a KNI device will be allocated for each port,
while no specific lcore affinity will be set for its kernel thread.
If one or more lcore IDs are configured, one or more KNI devices will be allocated for each port,
and the specified lcore affinity will be set for each kernel thread.
In single kernel thread mode, if no lcore ID is configured, a KNI device will be allocated for each port.
If one or more lcore IDs are configured,
one or more KNI devices will be allocated for each port, but
no lcore affinity will be set as there is only one kernel thread for all KNI devices.

For example, to run the application with two ports served by six lcores, one RX lcore, one TX lcore,
and one kernel thread lcore for each port:

.. code-block:: console

    ./build/kni -l 4-7 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

KNI Operations
--------------

Once the KNI application is started, one can use different Linux* commands to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device will be paired to the physical device.
Operations on other KNI devices will not affect the physical port handled in the user space application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

Explanation
-----------

The following sections provide some explanation of the code.

Initialization
~~~~~~~~~~~~~~

Setup of mbuf pool, driver and queues is similar to the setup done in the :doc:`l2_forward_real_virtual`.
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint16_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ? params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {

            /* Clear conf at first */

            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */

            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));

                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                        "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }
        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads.

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores for pinning the kernel threads on one by one

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint16_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;

            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            /* port_id is a uint16_t, so cast to the same width */
            port_id = (uint16_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n", port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] = (struct kni_port_params*)rte_zmalloc("KNI_port_params", sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE || kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx, kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }

            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        print_config();

        return 0;

    fail:

        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;

    }

Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to kernel NIC interfaces.

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
The packet transmission is done by sending mbufs into the kernel NIC interfaces with rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */

    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case that reads from kernel NIC interfaces and writes to a physical NIC port, packets are retrieved by reading
mbufs from kernel NIC interfaces with ``rte_kni_rx_burst()``.
The packet transmission is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).

.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */

    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */

            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);

            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and filled in the ``struct rte_kni_ops`` structure.
Currently, setting a new MTU and configuring the network interface (up/down) are supported.

.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */

    static int
    kni_change_mtu(uint16_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */

        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */

        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */

        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */

        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */

    static int
    kni_config_network_interface(uint16_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }
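
The length arithmetic in the MTU callback can be checked in isolation with a minimal sketch.
The constant values below are assumptions, since this section does not show the definitions of
``KNI_ENET_HEADER_SIZE``, ``KNI_ENET_FCS_SIZE`` or ``ETHER_MAX_LEN``:
a 14-byte Ethernet header, a 4-byte FCS and a 1518-byte maximum standard frame are used here.
The ``sketch_`` names are hypothetical, and the jumbo test simply mirrors the comparison made in the callback.

.. code-block:: c

    #include <stdint.h>

    /* Assumed values; the real definitions are not shown in this section. */
    #define SKETCH_ENET_HEADER_SIZE 14u   /* stands in for KNI_ENET_HEADER_SIZE */
    #define SKETCH_ENET_FCS_SIZE     4u   /* stands in for KNI_ENET_FCS_SIZE */
    #define SKETCH_ETHER_MAX_LEN  1518u   /* stands in for ETHER_MAX_LEN */

    /* Hypothetical result type holding the two fields the callback sets. */
    struct sketch_rx_len {
        uint32_t max_rx_pkt_len;
        int jumbo_frame;
    };

    struct sketch_rx_len
    sketch_rx_len_for_mtu(unsigned new_mtu)
    {
        struct sketch_rx_len r;

        /* Same comparison as in the callback: MTU against ETHER_MAX_LEN. */
        r.jumbo_frame = new_mtu > SKETCH_ETHER_MAX_LEN ? 1 : 0;

        /* mtu + length of header + length of FCS = max pkt length */
        r.max_rx_pkt_len = new_mtu + SKETCH_ENET_HEADER_SIZE + SKETCH_ENET_FCS_SIZE;
        return r;
    }

Under these assumptions, an MTU of 1500 gives a maximum packet length of 1518 with jumbo frames disabled,
and an MTU of 9000 gives 9018 with jumbo frames enabled.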