..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space.
The KNI kernel loadable module re-maps this address space into the kernel address space
and saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver.
Upon receiving the IOCTL call, it accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Allows standard Linux* userspace net tools (tcpdump, ftp, and so on) to be used with the DPDK

*   Eliminates the copy_to_user and copy_from_user operations on packets.

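The idea of queuing only packet pointers, never packet data, can be illustrated with a
single-producer/single-consumer ring of pointers.
The sketch below is a simplified, hypothetical model. ``ptr_fifo``, ``ptr_fifo_put()`` and
``ptr_fifo_get()`` are illustrative names, not the internal FIFO API of the KNI library.
It assumes a power-of-two size with exactly one writer and one reader per queue.

.. code-block:: c

    #include <stdatomic.h>

    /* Hypothetical pointer FIFO: only pointers are queued, data is never copied. */
    #define PTR_FIFO_SIZE 8u                   /* must be a power of two */
    #define PTR_FIFO_MASK (PTR_FIFO_SIZE - 1u)

    struct ptr_fifo {
        _Atomic unsigned write;   /* next slot to fill, advanced by the producer only */
        _Atomic unsigned read;    /* next slot to drain, advanced by the consumer only */
        void *buffer[PTR_FIFO_SIZE];
    };

    /* Enqueue up to num pointers; returns how many were actually queued. */
    static unsigned
    ptr_fifo_put(struct ptr_fifo *f, void **data, unsigned num)
    {
        unsigned w = atomic_load_explicit(&f->write, memory_order_relaxed);
        unsigned r = atomic_load_explicit(&f->read, memory_order_acquire);
        unsigned i;

        for (i = 0; i < num; i++) {
            unsigned next = (w + 1) & PTR_FIFO_MASK;
            if (next == r)            /* full: one slot is always kept empty */
                break;
            f->buffer[w] = data[i];
            w = next;
        }
        /* Publish the new entries to the consumer. */
        atomic_store_explicit(&f->write, w, memory_order_release);
        return i;
    }

    /* Dequeue up to num pointers; returns how many were actually dequeued. */
    static unsigned
    ptr_fifo_get(struct ptr_fifo *f, void **data, unsigned num)
    {
        unsigned r = atomic_load_explicit(&f->read, memory_order_relaxed);
        unsigned w = atomic_load_explicit(&f->write, memory_order_acquire);
        unsigned i;

        for (i = 0; i < num; i++) {
            if (r == w)               /* empty */
                break;
            data[i] = f->buffer[r];
            r = (r + 1) & PTR_FIFO_MASK;
        }
        /* Hand the drained slots back to the producer. */
        atomic_store_explicit(&f->read, r, memory_order_release);
        return i;
    }

Keeping one slot empty lets a full FIFO be told apart from an empty one,
so with ``PTR_FIFO_SIZE`` set to 8 at most 7 pointers can be queued at a time.
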
The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.

Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the kernel module's support.
For a physical NIC port, one thread reads from the port and writes to the KNI devices,
and another thread reads from the KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port is intended only for performance testing,
or for possible future use together with VMDq support.

The packet flow through the Kernel NIC Interface application is shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``kni`` sub-directory.

.. note::

   This application is intended as a linuxapp only.

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
In this case, only one kernel thread is created to receive packets on the kernel side for all KNI devices:

.. code-block:: console

    #insmod rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command line tries to pin the kni_thread on lcore 20 (lcore numbering starts at 0,
and bit 20 is the only bit set in the hexadecimal mask 0x100000),
so make sure that this lcore is available on the system.
This command must be run after the application has been launched, as insmod does not start the kni thread.

For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.

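The mask given to taskset is a hexadecimal CPU bitmap in which bit N selects lcore N,
so pinning to lcore 20 means setting only bit 20: ``1 << 20`` is ``0x100000``.
A small sketch of that arithmetic follows. ``lcore_to_mask()`` and ``print_taskset_mask()`` are
hypothetical helpers, limited to lcores 0 to 63 for simplicity.

.. code-block:: c

    #include <stdio.h>

    /* Affinity bitmap with only bit `lcore` set (0 if out of range for this sketch). */
    static unsigned long long
    lcore_to_mask(unsigned lcore)
    {
        return lcore < 64 ? 1ULL << lcore : 0;
    }

    /* Print the mask in the hexadecimal form taskset expects. */
    static void
    print_taskset_mask(unsigned lcore)
    {
        printf("taskset -p %llx <pid>\n", lcore_to_mask(lcore));
    }

For example, ``print_taskset_mask(20)`` prints ``taskset -p 100000 <pid>``,
which matches the mask used in the command above.
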
For flexibility in tuning performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread to receive packets on the kernel side for all KNI devices.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set using the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates one kernel thread per KNI device to receive packets on the kernel side.
    The core affinity of each kernel thread is set when the KNI device is created.
    The lcore ID for each kernel thread is provided on the command line used to launch the application.
    Multiple kernel thread mode can provide scalable higher performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode involves ring enqueue/dequeue operations and socket buffer (sk_buff) copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]":
    Determines which lcores are used for RX, TX and the kernel threads of each port.

Refer to *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask or -l corelist parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads on.
The -p PORTMASK parameter should include exactly the ports indicated by port in --config, neither more nor less.

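The two consistency rules above can be checked mechanically once --config has been parsed.
The sketch below is illustrative only. ``struct port_cfg``, ``portmask_matches_config()`` and
``coremask_covers_config()`` are hypothetical names, and both masks are limited to 64 bits for simplicity.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, already-parsed form of one --config tuple. */
    struct port_cfg {
        uint16_t port;
        unsigned lcore_rx;
        unsigned lcore_tx;
    };

    /* True if bit `bit` is set; bits beyond 63 are treated as unset. */
    static bool
    bit_set(uint64_t mask, unsigned bit)
    {
        return bit < 64 && ((mask >> bit) & 1u);
    }

    /* -p PORTMASK must select exactly the ports named in --config. */
    static bool
    portmask_matches_config(uint64_t portmask, const struct port_cfg *cfg,
                            unsigned nb)
    {
        uint64_t from_config = 0;
        unsigned i;

        for (i = 0; i < nb; i++) {
            if (cfg[i].port >= 64)
                return false;         /* out of range for this sketch */
            from_config |= UINT64_C(1) << cfg[i].port;
        }
        return from_config == portmask;
    }

    /* The EAL core mask must contain every lcore_rx and lcore_tx;
     * lcore_kthread values are deliberately not checked. */
    static bool
    coremask_covers_config(uint64_t coremask, const struct port_cfg *cfg,
                           unsigned nb)
    {
        unsigned i;

        for (i = 0; i < nb; i++)
            if (!bit_set(coremask, cfg[i].lcore_rx) ||
                !bit_set(coremask, cfg[i].lcore_tx))
                return false;
        return true;
    }

With the example command line shown later in this guide (``-l 4-7``, i.e. an EAL core mask of ``0xF0``,
``-p 0x3`` and ``--config="(0,4,6,8),(1,5,7,9)"``), both checks pass. Lcores 8 and 9 only pin
kernel threads, so they are not checked against the core mask.
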
The lcore_kthread field in --config can list zero, one or more lcore IDs.
In multiple kernel thread mode, if none is configured, one KNI device is allocated for each port
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are configured, one KNI device is allocated per lcore ID for each port,
and the kernel thread of each device is pinned to the corresponding lcore.
In single kernel thread mode, if none is configured, one KNI device is allocated for each port.
If one or more lcore IDs are configured,
one KNI device is allocated per lcore ID for each port, but
no lcore affinity is set, as there is only one kernel thread for all KNI devices.

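These allocation rules can be summarized as a small decision table.
In the sketch below, the enum and function names are hypothetical. ``kni_devices_per_port()``
mirrors the ``nb_lcore_k ? nb_lcore_k : 1`` expression used by the sample's ``kni_alloc()`` function,
and ``kthread_pinned()`` encodes the affinity rules described above.

.. code-block:: c

    #include <stdbool.h>

    enum kthread_mode { KTHREAD_SINGLE, KTHREAD_MULTIPLE };

    /* One KNI device per configured lcore_kthread, or a single one if none is given. */
    static unsigned
    kni_devices_per_port(unsigned nb_lcore_k)
    {
        return nb_lcore_k ? nb_lcore_k : 1;
    }

    /* Kernel threads are pinned only in multiple mode, and only when lcores are given.
     * The shared thread of single mode can be pinned with taskset instead. */
    static bool
    kthread_pinned(enum kthread_mode mode, unsigned nb_lcore_k)
    {
        return mode == KTHREAD_MULTIPLE && nb_lcore_k > 0;
    }
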
For example, to run the application with two ports served by six lcores, that is, one lcore for RX,
one lcore for TX and one lcore for the kernel thread of each port:

.. code-block:: console

    ./build/kni -l 4-7 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

Here lcores 4 and 5 are used for RX, lcores 6 and 7 for TX, and lcores 8 and 9 for the kernel threads,
which is why lcores 8 and 9 are not part of the EAL core list.

KNI Operations
--------------

Once the KNI application is started, one can use different Linux* commands to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device will be paired to the physical device.
Operations on other KNI devices will not affect the physical port handled in the user space application.

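The interface names used in the commands below, such as vEth0_0, come from the naming scheme
in the sample's ``kni_alloc()`` function: ``vEth<port>_<index>`` when lcore_kthread values are configured,
and ``vEth<port>`` otherwise. The device with index 0 is the one paired with the physical port.
A sketch of that rule, in which ``kni_ifname()`` is a hypothetical helper and ``KNI_NAME_LEN``
is an assumed stand-in for ``RTE_KNI_NAMESIZE``:

.. code-block:: c

    #include <stdio.h>

    #define KNI_NAME_LEN 32   /* assumed stand-in for RTE_KNI_NAMESIZE */

    /* Build the name of device `idx` of port `port` as kni_alloc() does.
     * Returns the length of the generated name. */
    static int
    kni_ifname(char name[KNI_NAME_LEN], unsigned port, unsigned idx,
               unsigned nb_lcore_k)
    {
        if (nb_lcore_k)
            return snprintf(name, KNI_NAME_LEN, "vEth%u_%u", port, idx);
        return snprintf(name, KNI_NAME_LEN, "vEth%u", port);
    }

With the example ``--config="(0,4,6,8),(1,5,7,9)"``, each port has one lcore_kthread value,
so the single device of port 0 is named ``vEth0_0``.
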
Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

Changing the MAC address:

.. code-block:: console

    #ifconfig vEth0_0 hw ether 0C:01:02:03:04:08

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

Explanation
-----------

The following sections provide some explanation of the code.

Initialization
~~~~~~~~~~~~~~

Setup of the mbuf pool, driver and queues is similar to the setup done in the :doc:`l2_forward_real_virtual`.
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint16_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        /* One KNI device per configured kernel thread lcore, or one if none */
        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ? params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));

            if (params[port_id]->nb_lcore_k) {
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);

            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             *   The first KNI device associated to a port
             *   is the master, for multiple kernel thread
             *   environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                /* Get the interface default mac address */
                rte_eth_macaddr_get(port_id, (struct ether_addr *)&conf.mac_addr);

                memset(&ops, 0, sizeof(ops));

                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;
                ops.config_mac_address = kni_config_mac_address;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                        "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }

        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads:

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores, one per kernel thread, to pin the kernel threads on

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint16_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;

            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            snprintf(s, sizeof(s), "%.*s", (int)size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint16_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n", port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] = (struct kni_port_params*)rte_zmalloc("KNI_port_params", sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            if (kni_port_params_array[port_id] == NULL) {
                printf("Cannot allocate parameters for port %u\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE || kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx, kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }

            /* Any remaining fields are lcore IDs for the kernel threads */
            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        print_config();

        return 0;

    fail:
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;
    }

Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the ``main_loop()`` function is run on each lcore.
This function first checks the lcore_id against the user-provided lcore_rx and lcore_tx
to see whether this lcore reads from or writes to kernel NIC interfaces.

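The role check described above can be sketched as a simple lookup.
This is a simplified model, not the sample's actual ``main_loop()``: the ``port_lcores`` structure
and the ``lcore_role()`` function are hypothetical. In this model, an lcore with the RX role would then
repeatedly run ``kni_ingress()`` for its port, and an lcore with the TX role ``kni_egress()``.

.. code-block:: c

    #include <stddef.h>

    enum fwd_role { ROLE_NONE, ROLE_RX, ROLE_TX };

    /* Hypothetical, trimmed-down view of the per-port lcore assignment. */
    struct port_lcores {
        unsigned port_id;
        unsigned lcore_rx;   /* reads from the NIC port, writes to KNI */
        unsigned lcore_tx;   /* reads from KNI, writes to the NIC port */
    };

    /* Decide what the calling lcore should do, and for which port. */
    static enum fwd_role
    lcore_role(unsigned lcore_id, const struct port_lcores *cfg, size_t nb_ports,
               unsigned *port_out)
    {
        size_t i;

        for (i = 0; i < nb_ports; i++) {
            if (cfg[i].lcore_rx == lcore_id) {
                *port_out = cfg[i].port_id;
                return ROLE_RX;
            }
            if (cfg[i].lcore_tx == lcore_id) {
                *port_out = cfg[i].port_id;
                return ROLE_TX;
            }
        }
        return ROLE_NONE;    /* not a forwarding lcore: nothing to do */
    }
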
For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
The packet transmission is done by sending mbufs into the kernel NIC interfaces with ``rte_kni_tx_burst()``.
The KNI library automatically frees the mbufs after the kernel has successfully copied them.
Mbufs that could not be sent to a KNI device are freed by the application and counted as dropped.

.. code-block:: c

    /**
     *   Interface to burst rx and enqueue mbufs into rx_q
     */

    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni;
        uint16_t port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;

            /* Handle pending requests from the kernel (e.g. MTU change) */
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case that reads from kernel NIC interfaces and writes to a physical NIC port, packets are retrieved by reading
mbufs from kernel NIC interfaces with ``rte_kni_rx_burst()``.
The packet transmission is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).

.. code-block:: c

    /**
     *   Interface to dequeue mbufs from tx_q and burst tx
     */

    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni;
        uint16_t port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and set in the ``struct rte_kni_ops`` structure.
Currently, setting a new MTU, changing the MAC address, configuring promiscuous mode and
configuring the network interface (up/down) are supported.
The rte_kni library provides default implementations of the following callbacks,
so an application may choose not to implement them:

- ``config_mac_address``
- ``config_promiscusity``

.. code-block:: c

    /* Forward declarations so that kni_ops can refer to the callbacks */
    static int kni_change_mtu(uint16_t port_id, unsigned new_mtu);
    static int kni_config_network_interface(uint16_t port_id, uint8_t if_up);
    static int kni_config_mac_address(uint16_t port_id, uint8_t mac_addr[]);
    static int kni_config_promiscusity(uint16_t port_id, uint8_t to_on);

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
        .config_mac_address = kni_config_mac_address,
        .config_promiscusity = kni_config_promiscusity,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint16_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */
        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint16_t port_id, uint8_t if_up)
    {
        int ret = 0;

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else {
            /* Configure network interface down */
            rte_eth_dev_stop(port_id);
        }

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }

    /* Callback for request of configuring device mac address */
    static int
    kni_config_mac_address(uint16_t port_id, uint8_t mac_addr[])
    {
        .....
    }

    /* Callback for request of configuring promiscuous mode */
    static int
    kni_config_promiscusity(uint16_t port_id, uint8_t to_on)
    {
        .....
    }

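As a worked example of the length arithmetic in ``kni_change_mtu()``, the sketch below assumes
a 14-byte Ethernet header, a 4-byte FCS and a 1518-byte ``ETHER_MAX_LEN`` (the standard Ethernet values).
``ENET_HEADER_SIZE``, ``ENET_FCS_SIZE`` and ``ENET_MAX_LEN`` are hypothetical stand-ins for the sample's macros,
whose values are not shown in this guide. Under these assumptions a 1500-byte MTU gives a
``max_rx_pkt_len`` of 1518 bytes, and a 9000-byte MTU gives 9018 bytes.

.. code-block:: c

    #include <stdbool.h>

    /* Assumed values: standard Ethernet header, FCS and maximum frame length. */
    #define ENET_HEADER_SIZE   14u
    #define ENET_FCS_SIZE       4u
    #define ENET_MAX_LEN     1518u

    /* mtu + length of header + length of FCS = max pkt length */
    static unsigned
    max_rx_pkt_len(unsigned mtu)
    {
        return mtu + ENET_HEADER_SIZE + ENET_FCS_SIZE;
    }

    /* Same comparison as kni_change_mtu(): the MTU itself against the max length. */
    static bool
    needs_jumbo(unsigned mtu)
    {
        return mtu > ENET_MAX_LEN;
    }
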