..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which, upon receiving the IOCTL call, accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets

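To illustrate why exchanging pointers avoids copying packet data, the following is a minimal,
self-contained sketch of a FIFO that stores pointers only.
It is not the KNI implementation: the names ``ptr_fifo``, ``fifo_put`` and ``fifo_get`` and the capacity
are hypothetical, and the sketch omits the shared-memory mapping and memory barriers
that a real FIFO shared between user space and the kernel would need.

.. code-block:: c

    #include <stddef.h>

    #define FIFO_SIZE 8u   /* hypothetical capacity, must be a power of two */

    /* Hypothetical FIFO holding pointers to packet buffers, not the packets. */
    struct ptr_fifo {
        unsigned write;            /* total number of pointers stored so far */
        unsigned read;             /* total number of pointers retrieved so far */
        void *slots[FIFO_SIZE];
    };

    /* Enqueue up to num pointers; returns how many were actually stored. */
    static unsigned
    fifo_put(struct ptr_fifo *f, void **pkts, unsigned num)
    {
        unsigned i;

        for (i = 0; i < num; i++) {
            if (f->write - f->read == FIFO_SIZE)
                break;              /* FIFO full */
            f->slots[f->write++ & (FIFO_SIZE - 1)] = pkts[i];
        }
        return i;
    }

    /* Dequeue up to num pointers; returns how many were retrieved. */
    static unsigned
    fifo_get(struct ptr_fifo *f, void **pkts, unsigned num)
    {
        unsigned i;

        for (i = 0; i < num; i++) {
            if (f->read == f->write)
                break;              /* FIFO empty */
            pkts[i] = f->slots[f->read++ & (FIFO_SIZE - 1)];
        }
        return i;
    }

Only the pointer values move through the queue, so the cost of passing a packet does not depend on its length.
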
The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux* tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.

Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the kernel module's support.
For a physical NIC port, one thread reads from the port and writes to KNI devices,
and another thread reads from KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port
is intended only for performance testing, or for working together with VMDq support in the future.

The packet flow through the Kernel NIC Interface application is shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow

Compiling the Application
-------------------------

To compile the sample application, see :doc:`compiling`.

The application is located in the ``kni`` sub-directory.

.. note::

        This application is intended for the linuxapp environment only.

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
In this case, only one kernel thread is created for all KNI devices to receive packets on the kernel side:

.. code-block:: console

    #insmod rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command tries to pin the kni_thread to lcore 20:
the hexadecimal mask 100000 has only bit 20 set, and lcore numbering starts at 0.
Check that this lcore is available on the board before using it.
The command must be issued after the application has been launched, as insmod does not start the kni thread.

For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.

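The relationship between an lcore ID and the hexadecimal mask passed to taskset can be sketched as follows.
This is only an illustration: the helper names ``core_to_mask`` and ``format_taskset_mask`` are hypothetical,
and the sketch assumes core IDs below 64.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: affinity mask with only the bit for core_id set. */
    static uint64_t
    core_to_mask(unsigned core_id)
    {
        return core_id < 64 ? UINT64_C(1) << core_id : 0;
    }

    /* Write the mask as a hexadecimal string without a prefix, as used with taskset -p. */
    static int
    format_taskset_mask(char *buf, size_t len, unsigned core_id)
    {
        return snprintf(buf, len, "%llx",
                (unsigned long long)core_to_mask(core_id));
    }

For lcore 20 the helper produces the string ``100000``, which is the mask used in the taskset command above.
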
To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   ``#insmod rte_kni.ko kthread_mode=single``

    This mode creates only one kernel thread for all KNI devices to receive packets on the kernel side.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set with the Linux* taskset command.

*   ``#insmod rte_kni.ko kthread_mode=multiple``

    This mode creates a kernel thread for each KNI device to receive packets on the kernel side.
    The core affinity of each kernel thread is set when the KNI device is created.
    The lcore ID for each kernel thread is provided on the command line used to launch the application.
    Multiple kernel thread mode can provide higher, scalable performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   ``#insmod rte_kni.ko lo_mode=lo_mode_fifo``

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   ``#insmod rte_kni.ko lo_mode=lo_mode_fifo_skb``

    This loopback mode involves ring enqueue/dequeue operations and sk buffer copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]":
    Determines which RX, TX and kernel thread lcores are mapped to which ports.

Refer to *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask or -l corelist parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports indicated by port in --config, neither more nor less.

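As an illustration of these two rules, the sketch below derives a port mask and a core mask from a list of ``--config`` tuples.
The ``port_cfg`` structure and both helper functions are hypothetical and not part of the sample application;
the sketch assumes port IDs below 32 and lcore IDs below 64.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical, simplified view of one --config tuple. */
    struct port_cfg {
        unsigned port;
        unsigned lcore_rx;
        unsigned lcore_tx;
    };

    /* Port mask for -p: exactly the ports named in --config. */
    static uint32_t
    build_portmask(const struct port_cfg *cfg, unsigned n)
    {
        uint32_t mask = 0;
        unsigned i;

        for (i = 0; i < n; i++)
            mask |= UINT32_C(1) << cfg[i].port;
        return mask;
    }

    /* Core mask for -c: RX and TX lcores only; kernel thread lcores are left out. */
    static uint64_t
    build_coremask(const struct port_cfg *cfg, unsigned n)
    {
        uint64_t mask = 0;
        unsigned i;

        for (i = 0; i < n; i++)
            mask |= (UINT64_C(1) << cfg[i].lcore_rx) |
                    (UINT64_C(1) << cfg[i].lcore_tx);
        return mask;
    }

For the tuples (0,4,6) and (1,5,7), the helpers return a port mask of 0x3 and a core mask of 0xf0, that is, lcores 4 to 7.
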
The lcore_kthread field in --config can list zero, one or more lcore IDs.
In multiple kernel thread mode, if none is configured, one KNI device is allocated for each port,
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are configured, one KNI device per lcore ID is allocated for each port,
and each kernel thread is pinned to the corresponding lcore.
In single kernel thread mode, if none is configured, one KNI device is allocated for each port.
If one or more lcore IDs are configured,
one KNI device per lcore ID is allocated for each port, but
no lcore affinity is set, as there is only one kernel thread for all KNI devices.

For example, to run the application with two ports served by six lcores
(one RX lcore, one TX lcore and one kernel thread lcore for each port):

.. code-block:: console

    ./build/kni -l 4-7 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

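The lcore_kthread rules described in this section can be summarized in a few lines of C.
This is a sketch of the documented behavior only, not code from the application:
the ``kni_plan`` structure and the ``plan_kni`` function are hypothetical names.

.. code-block:: c

    /* Hypothetical summary of what --config implies for one port. */
    struct kni_plan {
        unsigned nb_kni;    /* KNI devices allocated for the port */
        int pin_kthread;    /* non-zero if each kernel thread is pinned to an lcore */
    };

    static struct kni_plan
    plan_kni(unsigned nb_lcore_k, int multiple_mode)
    {
        struct kni_plan plan;

        /* With no lcore_kthread entries, one KNI device is still allocated. */
        plan.nb_kni = nb_lcore_k ? nb_lcore_k : 1;
        /* Affinity only applies when each device has its own kernel thread. */
        plan.pin_kthread = multiple_mode && nb_lcore_k > 0;
        return plan;
    }

With the example command line above, each port lists one kernel thread lcore,
so each port gets one KNI device, and its kernel thread is pinned only if the module was loaded in multiple kernel thread mode.
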
KNI Operations
--------------

Once the KNI application is started, one can use different Linux* commands to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device is paired with the physical device.
Operations on the other KNI devices do not affect the physical port handled in the user space application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

Explanation
-----------

The following sections provide some explanation of the code.

Initialization
~~~~~~~~~~~~~~

Setup of the mbuf pool, driver and queues is similar to the setup done in the :doc:`l2_forward_real_virtual`.
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint16_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        /* One KNI device per kernel thread lcore, or a single one if none is given */
        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ?
                                  params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else {
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            }
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             *   The first KNI device associated to a port
             *   is the master, for multiple kernel thread
             *   environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));

                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else {
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);
            }

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                        "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }

        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads.

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores, one per kernel thread, to pin the kernel threads on

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint16_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        /* Each "(...)" group describes one port */
        while (((p = strchr(p0, '(')) != NULL) &&
                nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;

            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            /* Copy the group and split it into comma-separated fields */
            snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            /* port, lcore_rx and lcore_tx are mandatory */
            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint16_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n",
                        port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] =
                    (struct kni_port_params *)rte_zmalloc("KNI_port_params",
                            sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE ||
                    kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx,
                        kni_port_params_array[port_id]->lcore_tx,
                        RTE_MAX_LCORE);
                goto fail;
            }

            /* Remaining fields, if any, are the kernel thread lcores */
            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        print_config();

        return 0;

    fail:

        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;
    }

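The tokenizing step can be tried outside the DPDK with a reduced version that relies on the C library only.
The sketch below is not the application's parser: ``parse_tuple`` and ``MAX_FIELDS`` are hypothetical,
and it handles a single ``(port,lcore_rx,lcore_tx[,lcore_kthread,...])`` group at a time.

.. code-block:: c

    #include <errno.h>
    #include <stdlib.h>

    #define MAX_FIELDS 8   /* hypothetical limit on fields per group */

    /*
     * Parse one "(a,b,c,...)" group starting at *pp.
     * Returns the number of fields stored in fld[], or -1 on malformed input.
     * On success *pp points just past the closing ')'.
     */
    static int
    parse_tuple(const char **pp, unsigned long fld[MAX_FIELDS])
    {
        const char *p = *pp;
        char *end;
        int n = 0;

        if (*p != '(')
            return -1;
        p++;
        for (;;) {
            if (n == MAX_FIELDS)
                return -1;          /* too many fields */
            errno = 0;
            fld[n] = strtoul(p, &end, 0);
            if (errno != 0 || end == p)
                return -1;          /* not a number */
            n++;
            p = end;
            if (*p == ')')
                break;
            if (*p != ',')
                return -1;
            p++;
        }
        *pp = p + 1;
        return n >= 3 ? n : -1;     /* port, lcore_rx and lcore_tx are mandatory */
    }

Calling the function on ``(0,4,6,8),(1,5,7,9)`` yields four fields for the first group and leaves the cursor on the comma,
so the caller can skip the separator and parse the next group in the same way.
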
Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to kernel NIC interfaces.

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
The packet transmission is done by sending mbufs into the kernel NIC interfaces with rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     *   Interface to burst rx and enqueue mbufs into rx_q
     */
    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case, which reads from kernel NIC interfaces and writes to a physical NIC port,
packets are retrieved by reading mbufs from the kernel NIC interfaces with ``rte_kni_rx_burst()``.
The packet transmission is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).

.. code-block:: c

    /**
     *   Interface to dequeue mbufs from tx_q and burst tx
     */
    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

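Both functions follow the same pattern when a burst is only partly accepted:
the accepted buffers now belong to the receiver, and the remainder is freed and counted as dropped.
The sketch below reproduces that pattern with stand-in types so that it builds without the DPDK.
``fake_mbuf``, ``fwd_stats``, ``burst_free`` and ``finish_burst`` are hypothetical names,
and ``free()`` stands in for the mbuf release call ``rte_pktmbuf_free()``.

.. code-block:: c

    #include <stdlib.h>

    /* Stand-in for struct rte_mbuf, so the sketch builds without DPDK. */
    struct fake_mbuf {
        unsigned len;
    };

    struct fwd_stats {
        unsigned long sent;
        unsigned long dropped;
    };

    /* Free every buffer the transmit call did not accept. */
    static void
    burst_free(struct fake_mbuf **pkts, unsigned num)
    {
        unsigned i;

        if (pkts == NULL)
            return;
        for (i = 0; i < num; i++) {
            free(pkts[i]);
            pkts[i] = NULL;
        }
    }

    /*
     * Account for a burst of nb_rx buffers of which the transmit call
     * accepted only nb_tx; the rest are freed and counted as dropped.
     */
    static void
    finish_burst(struct fake_mbuf **pkts, unsigned nb_rx, unsigned nb_tx,
            struct fwd_stats *stats)
    {
        stats->sent += nb_tx;
        if (nb_tx < nb_rx) {
            burst_free(&pkts[nb_tx], nb_rx - nb_tx);
            stats->dropped += nb_rx - nb_tx;
        }
    }

Freeing only the tail of the array, starting at index ``nb_tx``, avoids releasing buffers that the receiver already owns.
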
Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and filled in the struct rte_kni_ops structure.
Currently, setting a new MTU and configuring the network interface (up/down) are supported.

.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint16_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */
        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint16_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else {
            /* Configure network interface down */
            rte_eth_dev_stop(port_id);
        }

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }

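The length arithmetic in the MTU callback can be checked in isolation.
The sketch below assumes a 14-byte Ethernet header, a 4-byte FCS and a 1518-byte maximum standard frame;
the macro and function names are hypothetical and do not come from the application.
Note that ``needs_jumbo`` compares the resulting frame length with the standard maximum,
which is a choice made for this illustration and differs from the callback, where the MTU itself is compared with ETHER_MAX_LEN.

.. code-block:: c

    #include <stdint.h>

    /* Assumed sizes, in bytes */
    #define ENET_HEADER_SIZE  14u
    #define ENET_FCS_SIZE      4u
    #define STD_MAX_FRAME_LEN 1518u

    /* mtu + length of header + length of FCS = max pkt length */
    static uint32_t
    max_rx_pkt_len_for_mtu(uint32_t mtu)
    {
        return mtu + ENET_HEADER_SIZE + ENET_FCS_SIZE;
    }

    /* Non-zero if the resulting frame is longer than a standard Ethernet frame. */
    static int
    needs_jumbo(uint32_t mtu)
    {
        return max_rx_pkt_len_for_mtu(mtu) > STD_MAX_FRAME_LEN;
    }

Under these assumptions, an MTU of 1500 gives a maximum packet length of 1518 bytes and an MTU of 9000 gives 9018 bytes.
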