..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which, upon receiving the IOCTL call, accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets

The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux* tools (ethtool, ifconfig, tcpdump) with the DPDK ports
and also the exchange of packets between the DPDK application and the Linux* kernel.

Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the support of the kernel module.
For a physical NIC port, one thread reads from the port and writes to the KNI devices,
and another thread reads from the KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port is intended only for performance testing,
or for working together with VMDq support in the future.

The packet flow through the Kernel NIC Interface application is shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow


Compiling the Application
-------------------------

Compile the application as follows:

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/kni

#.  Set the target (a default target is used if not specified):

    .. note::

        This application is intended to run on linuxapp targets only.

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application:

    .. code-block:: console

        make

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
In this mode, only one kernel thread is created to receive packets on the kernel side for all KNI devices:

.. code-block:: console

    #insmod rte_kni.ko

The kernel thread can be pinned to a specific core with a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

The mask 100000 is hexadecimal and has only bit 20 set, so this command tries to pin kni_thread to lcore 20
(lcore numbering starts at 0). Check that this lcore is available on the board before using it.
The command must be issued after the application has been launched, as insmod does not start the kni thread.

For optimum performance,
the lcore in the mask must be on the same socket as the lcores used in the KNI application.

To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread to receive packets on the kernel side for all KNI devices.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set with the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates one kernel thread for each KNI device to receive packets on the kernel side.
    The core affinity of each kernel thread is set when the KNI device is created.
    The lcore ID for each kernel thread is provided on the command line used to launch the application.
    Multiple kernel thread mode can provide higher, scalable performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode involves ring enqueue/dequeue operations and sk buffer copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]":
    Determines which lcores are used for RX, TX and the kernel thread of each port.

Refer to the *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports that appear in --config, neither more nor less.

The lcore_kthread field in --config accepts zero, one or more lcore IDs.
In multiple kernel thread mode, if no lcore ID is given, one KNI device is allocated for each port
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are given, one KNI device per lcore ID is allocated for each port
and each kernel thread is given the corresponding lcore affinity.
In single kernel thread mode, if no lcore ID is given, one KNI device is allocated for each port.
If one or more lcore IDs are given,
one or more KNI devices are allocated for each port, but
no lcore affinity is set, as there is only one kernel thread for all KNI devices.

For example, to run the application with two ports served by six lcores, with one lcore for RX, one lcore for TX,
and one lcore for the kernel thread of each port:

.. code-block:: console

    ./build/kni -c 0xf0 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

KNI Operations
--------------

Once the KNI application is started, standard Linux* commands can be used to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device is paired with the physical device.
Operations on the other KNI devices do not affect the physical port handled by the user space application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

Explanation
-----------

The following sections explain the main parts of the code.

Initialization
~~~~~~~~~~~~~~

Setup of the mbuf pool, driver and queues is similar to the setup done in the L2 Forwarding sample application
(see Chapter 9 "L2 Forwarding Sample Application (in Real and Virtualized Environments)" for details).
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint8_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        /* One KNI device per kernel-thread lcore, or a single one if none was given */
        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ? params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {

            /* Clear conf at first */

            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);

            /* Set for every KNI device, whichever branch was taken above */
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             *   The first KNI device associated to a port
             *   is the master, for multiple kernel thread
             *   environment.
             */

            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));

                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                        "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }

        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads:

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores on which the kernel threads are pinned, one kernel thread per lcore

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint8_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        /* Each iteration handles one "(port,lcore_rx,lcore_tx[,lcore_kthread,...])" tuple */
        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;

            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            rte_snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            /* port, lcore_rx and lcore_tx are mandatory */
            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint8_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n", port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] = (struct kni_port_params*)rte_zmalloc("KNI_port_params", sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE || kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx, kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }

            /* Any remaining fields are kernel-thread lcore IDs */
            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        print_config();

        return 0;

    fail:

        /* Release everything allocated so far */
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;

    }

Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user-provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to kernel NIC interfaces.

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").
The packet transmission is done by sending mbufs into the kernel NIC interfaces with rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     *   Interface to burst rx and enqueue mbufs into rx_q
     */

    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case, which reads from the kernel NIC interfaces and writes to a physical NIC port,
packets are retrieved by reading mbufs from the kernel NIC interfaces with ``rte_kni_rx_burst()``.
The packet transmission is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").

.. code-block:: c

    /**
     *   Interface to dequeue mbufs from tx_q and burst tx
     */

    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space when they are requested by Linux* commands,
callbacks must be implemented and registered in the struct rte_kni_ops structure.
Currently, setting a new MTU and configuring the network interface (up/down) are supported.

.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */

    static int
    kni_change_mtu(uint8_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */

        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */

        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */

        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */

        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */

    static int
    kni_config_network_interface(uint8_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up: restart the port */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }
