..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which upon receiving the IOCTL call accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets.

The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.
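To illustrate the idea of FIFO queues that carry pointers to packets rather than packet copies,
the following is a minimal sketch of a fixed-size pointer FIFO.
It is not the KNI implementation: the names ``ptr_fifo``, ``ptr_fifo_put``, ``ptr_fifo_get`` and
``SKETCH_FIFO_LEN`` are hypothetical, and the sketch omits the memory barriers that a queue
shared between user space and the kernel would need.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical queue length; a power of two lets us wrap with a mask. */
    #define SKETCH_FIFO_LEN 8u

    static_assert((SKETCH_FIFO_LEN & (SKETCH_FIFO_LEN - 1)) == 0,
                  "FIFO length must be a power of two");

    /* Hypothetical pointer FIFO: only pointers are stored, never packet data. */
    struct ptr_fifo {
        unsigned int write;            /* next slot to fill */
        unsigned int read;             /* next slot to drain */
        void *slots[SKETCH_FIFO_LEN];
    };

    /* Enqueue up to num pointers; returns how many were accepted. */
    static unsigned int
    ptr_fifo_put(struct ptr_fifo *f, void **objs, unsigned int num)
    {
        unsigned int i;

        for (i = 0; i < num; i++) {
            unsigned int next = (f->write + 1) & (SKETCH_FIFO_LEN - 1);

            if (next == f->read)       /* full: one slot always stays empty */
                break;
            f->slots[f->write] = objs[i];
            f->write = next;
        }
        return i;
    }

    /* Dequeue up to num pointers; returns how many were retrieved. */
    static unsigned int
    ptr_fifo_get(struct ptr_fifo *f, void **objs, unsigned int num)
    {
        unsigned int i;

        for (i = 0; i < num; i++) {
            if (f->read == f->write)   /* empty */
                break;
            objs[i] = f->slots[f->read];
            f->read = (f->read + 1) & (SKETCH_FIFO_LEN - 1);
        }
        return i;
    }

Because only pointers move through the queue, the cost of an enqueue or dequeue does not depend on the packet size.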

Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the kernel module's support.
For a physical NIC port, one thread reads from the port and writes to KNI devices,
and another thread reads from KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port
is only intended for performance testing, or for working together with VMDq support in the future.

The packet flow through the Kernel NIC Interface application is as shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``kni`` sub-directory.

.. note::

    This application is intended for linuxapp only.

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
This way, only one kernel thread is created for all KNI devices for packet receiving on the kernel side:

.. code-block:: console

    #insmod rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command line tries to pin the kni_thread to lcore 20 (the hexadecimal mask 100000 has only bit 20 set, and lcore numbering starts at 0),
so first check that this lcore is available on the board.
This command must be sent after the application has been launched, as insmod does not start the kni thread.
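The relation between an lcore ID and the hexadecimal mask passed to taskset can be sketched as follows.
This is only an illustration: the helpers ``lcore_to_cpumask`` and ``format_taskset_mask`` are hypothetical
and are not part of the DPDK or of the sample application, and the sketch assumes lcore IDs below 64.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: build the affinity mask that selects one lcore. */
    static uint64_t
    lcore_to_cpumask(unsigned int lcore_id)
    {
        /* A 64-bit mask can only describe lcores 0..63. */
        assert(lcore_id < 64);
        return UINT64_C(1) << lcore_id;
    }

    /*
     * Hypothetical helper: write the mask as a hexadecimal string
     * (without a 0x prefix), the form used in the taskset example.
     * Returns the number of characters written, as snprintf() does.
     */
    static int
    format_taskset_mask(unsigned int lcore_id, char *buf, size_t len)
    {
        return snprintf(buf, len, "%llx",
                        (unsigned long long)lcore_to_cpumask(lcore_id));
    }

For lcore 20 the mask has only bit 20 set, which is the value ``100000`` used in the taskset command.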

For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.

To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread for all KNI devices for packet receiving on the kernel side.
    Single kernel thread mode is the default.
    The core affinity of this kernel thread can be set with the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates a kernel thread for each KNI device for packet receiving on the kernel side.
    The core affinity of each kernel thread is set when creating the KNI device.
    The lcore ID for each kernel thread is provided on the command line when launching the application.
    Multiple kernel thread mode can provide scalable, higher performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode involves ring enqueue/dequeue operations and sk buffer copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]":
    Determines which lcores for RX, TX and kernel threads are mapped to which ports.

Refer to *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask or -l corelist parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports indicated by port in --config, neither more nor less.

The lcore_kthread field in --config can hold zero, one or more lcore IDs.
In multiple kernel thread mode, if no lcore ID is given, one KNI device is allocated for each port,
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are given, one KNI device is allocated per lcore ID for each port,
and each kernel thread is bound to the corresponding lcore.
In single kernel thread mode, if no lcore ID is given, one KNI device is allocated for each port.
If one or more lcore IDs are given,
one or more KNI devices are allocated for each port, but
no lcore affinity is set, as there is only one kernel thread for all KNI devices.

For example, to run the application with two ports served by six lcores, one lcore of RX, one lcore of TX,
and one lcore of kernel thread for each port:

.. code-block:: console

    ./build/kni -l 4-7 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

KNI Operations
--------------

Once the KNI application is started, one can use different Linux* commands to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device is paired to the physical device.
Operations on the other KNI devices do not affect the physical port handled in the user space application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

Changing the MAC address:

.. code-block:: console

    #ifconfig vEth0_0 hw ether 0C:01:02:03:04:08

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

Explanation
-----------

The following sections provide some explanation of the code.

Initialization
~~~~~~~~~~~~~~

Setup of the mbuf pool, driver and queues is similar to the setup done in the :doc:`l2_forward_real_virtual`.
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint16_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        /* One KNI device per kernel thread lcore, or a single device if none given */
        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ?
                params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                /* Get the interface default mac address */
                rte_eth_macaddr_get(port_id, (struct ether_addr *)&conf.mac_addr);

                memset(&ops, 0, sizeof(ops));

                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;
                ops.config_mac_address = kni_config_mac_address;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                        "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }
        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads:

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores to pin the kernel threads to, one lcore per kernel thread

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint16_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        /* Each "(...)" group describes one port */
        while (((p = strchr(p0, '(')) != NULL) &&
                nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;

            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            /* port, lcore_rx and lcore_tx are mandatory */
            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint16_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n",
                        port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] =
                    (struct kni_port_params*)rte_zmalloc("KNI_port_params",
                            sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE ||
                    kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx,
                        kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }

            /* Any remaining tokens are kernel thread lcore IDs */
            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        print_config();

        return 0;

    fail:
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;
    }

Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to kernel NIC interfaces.

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
The packet transmission is done by sending mbufs into the kernel NIC interfaces with rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */
    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case that reads from kernel NIC interfaces and writes to a physical NIC port, packets are retrieved by reading
mbufs from kernel NIC interfaces with ``rte_kni_rx_burst()``.
The packet transmission is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).

.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */
    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and filled in the struct rte_kni_ops structure.
Currently, setting a new MTU, changing the MAC address, configuring promiscuous mode and
configuring the network interface (up/down) are supported.
A default implementation of the following callbacks is available in the rte_kni library,
so an application may choose not to implement them:

- ``config_mac_address``
- ``config_promiscusity``


.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
        .config_mac_address = kni_config_mac_address,
        .config_promiscusity = kni_config_promiscusity,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint16_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */
        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint16_t port_id, uint8_t if_up)
    {
        int ret = 0;

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }

    /* Callback for request of configuring device mac address */
    static int
    kni_config_mac_address(uint16_t port_id, uint8_t mac_addr[])
    {
        .....
    }

    /* Callback for request of configuring promiscuous mode */
    static int
    kni_config_promiscusity(uint16_t port_id, uint8_t to_on)
    {
        .....
    }
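
The frame length arithmetic used in the MTU callback above ("mtu + length of header + length of FCS")
can be checked in isolation with a small sketch.
The constants below are assumptions based on standard Ethernet framing, not values taken from the sample application,
and the helper names are hypothetical.
Note also that this sketch compares the full frame length with the standard limit,
whereas the callback above compares new_mtu itself with ETHER_MAX_LEN.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Assumed values for standard Ethernet framing (labelled SKETCH_ to
     * make clear they are not the sample application's own macros). */
    #define SKETCH_ENET_HEADER_SIZE 14u   /* dst MAC + src MAC + EtherType */
    #define SKETCH_ENET_FCS_SIZE     4u   /* frame check sequence */
    #define SKETCH_ETHER_MAX_LEN  1518u   /* largest standard (non-jumbo) frame */

    /* Hypothetical helper: largest frame the port must accept for a given MTU. */
    static uint32_t
    mtu_to_max_rx_pkt_len(uint32_t mtu)
    {
        /* Guard against wrap-around of the 32-bit sum. */
        assert(mtu <= UINT32_MAX - SKETCH_ENET_HEADER_SIZE - SKETCH_ENET_FCS_SIZE);
        return mtu + SKETCH_ENET_HEADER_SIZE + SKETCH_ENET_FCS_SIZE;
    }

    /* Hypothetical helper: non-zero if the resulting frame exceeds the standard size. */
    static int
    frame_needs_jumbo(uint32_t mtu)
    {
        return mtu_to_max_rx_pkt_len(mtu) > SKETCH_ETHER_MAX_LEN;
    }

Under these assumptions, an MTU of 1500 maps to a 1518-byte frame and needs no jumbo frame support, while an MTU of 9000 does.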