..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2015 Intel Corporation.

.. _kni:

Kernel NIC Interface
====================

The DPDK Kernel NIC Interface (KNI) allows userspace applications access to the Linux* control plane.

The benefits of using the DPDK KNI are:

*   Faster than existing Linux TUN/TAP interfaces
    (by eliminating system calls and copy_to_user()/copy_from_user() operations).

*   Allows management of DPDK ports using standard Linux net tools such as ethtool, ifconfig and tcpdump.

*   Allows an interface with the kernel network stack.

The components of an application using the DPDK Kernel NIC Interface are shown in :numref:`figure_kernel_nic_intf`.

.. _figure_kernel_nic_intf:

.. figure:: img/kernel_nic_intf.*

   Components of a DPDK KNI Application


The DPDK KNI Kernel Module
--------------------------

The KNI kernel loadable module ``rte_kni`` provides the kernel interface
for DPDK applications.

When the ``rte_kni`` module is loaded, it will create a device ``/dev/kni``
that is used by the DPDK KNI API functions to control and communicate with
the kernel module.

The ``rte_kni`` kernel module contains several optional parameters which
can be specified when the module is loaded to control its behavior:

.. code-block:: console

    # modinfo rte_kni.ko
    <snip>
    parm:           lo_mode: KNI loopback mode (default=lo_mode_none):
                    lo_mode_none        Kernel loopback disabled
                    lo_mode_fifo        Enable kernel loopback with fifo
                    lo_mode_fifo_skb    Enable kernel loopback with fifo and skb buffer
                     (charp)
    parm:           kthread_mode: Kernel thread mode (default=single):
                    single    Single kernel thread mode enabled.
                    multiple  Multiple kernel thread mode enabled.
                     (charp)
    parm:           carrier: Default carrier state for KNI interface (default=off):
                    off   Interfaces will be created with carrier state set to off.
                    on    Interfaces will be created with carrier state set to on.
                     (charp)

Loading the ``rte_kni`` kernel module without any optional parameters is
the typical way a DPDK application gets packets into and out of the kernel
network stack. Without any parameters, only one kernel thread is created
to receive packets on the kernel side for all KNI devices, loopback mode is
disabled, and the default carrier state of KNI interfaces is set to *off*.

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko

.. _kni_loopback_mode:

Loopback Mode
~~~~~~~~~~~~~

For testing, the ``rte_kni`` kernel module can be loaded in loopback mode
by specifying the ``lo_mode`` parameter:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko lo_mode=lo_mode_fifo

The ``lo_mode_fifo`` loopback option will loop back ring enqueue/dequeue
operations in kernel space.

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko lo_mode=lo_mode_fifo_skb

The ``lo_mode_fifo_skb`` loopback option will loop back ring enqueue/dequeue
operations and sk buffer copies in kernel space.

If the ``lo_mode`` parameter is not specified, loopback mode is disabled.

.. _kni_kernel_thread_mode:

Kernel Thread Mode
~~~~~~~~~~~~~~~~~~

To provide flexibility of performance, the ``rte_kni`` KNI kernel module
can be loaded with the ``kthread_mode`` parameter. The ``rte_kni`` kernel
module supports two options: "single kernel thread" mode and "multiple
kernel thread" mode.

Single kernel thread mode is enabled as follows:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko kthread_mode=single

This mode will create only one kernel thread for all KNI interfaces to
receive data on the kernel side.
By default, this kernel thread is not
bound to any particular core, but the user can set the core affinity for
this kernel thread by setting the ``core_id`` and ``force_bind`` parameters
in ``struct rte_kni_conf`` when the first KNI interface is created.

For optimum performance, the kernel thread should be bound to a core
on the same socket as the DPDK lcores used in the application.

The KNI kernel module can also be configured to start a separate kernel
thread for each KNI interface created by the DPDK application. Multiple
kernel thread mode is enabled as follows:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko kthread_mode=multiple

This mode will create a separate kernel thread for each KNI interface to
receive data on the kernel side. The core affinity of each ``kni_thread``
kernel thread can be specified by setting the ``core_id`` and ``force_bind``
parameters in ``struct rte_kni_conf`` when each KNI interface is created.

Multiple kernel thread mode can provide scalable, higher performance if
sufficient unused cores are available on the host system.

If the ``kthread_mode`` parameter is not specified, the "single kernel
thread" mode is used.

.. _kni_default_carrier_state:

Default Carrier State
~~~~~~~~~~~~~~~~~~~~~

The default carrier state of KNI interfaces created by the ``rte_kni``
kernel module is controlled via the ``carrier`` option when the module
is loaded.

If ``carrier=off`` is specified, the kernel module will leave the carrier
state of the interface *down* when the interface is management enabled.
The DPDK application can set the carrier state of the KNI interface using the
``rte_kni_update_link()`` function. This is useful for DPDK applications
which require that the carrier state of the KNI interface reflect the
actual link state of the corresponding physical NIC port.
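
The carrier update described above can be sketched as follows. This is a
minimal sketch, not code from the KNI sample application: the helper name
``kni_sync_carrier()`` is hypothetical, and it assumes that ``kni`` was
returned by ``rte_kni_alloc()`` and that ``port_id`` identifies the physical
port paired with the KNI interface.

.. code-block:: c

    #include <string.h>

    #include <rte_ethdev.h>
    #include <rte_kni.h>

    /* Hypothetical helper: mirror the link state of a physical port onto
     * the carrier state of its KNI interface. */
    static int
    kni_sync_carrier(struct rte_kni *kni, uint16_t port_id)
    {
        struct rte_eth_link link;

        /* Start from "link down" so a failed query leaves the carrier off. */
        memset(&link, 0, sizeof(link));
        rte_eth_link_get_nowait(port_id, &link);

        /* Returns the previous carrier state (0 or 1), or -1 on failure. */
        return rte_kni_update_link(kni, link.link_status);
    }

An application might call such a helper periodically, or from its link status
change handling, so that the carrier state follows the physical link.
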

If ``carrier=on`` is specified, the kernel module will automatically set
the carrier state of the interface to *up* when the interface is management
enabled. This is useful for DPDK applications which use the KNI interface as
a purely virtual interface that does not correspond to any physical hardware
and do not wish to explicitly set the carrier state of the interface with
``rte_kni_update_link()``. It is also useful for testing in loopback mode
where the NIC port may not be physically connected to anything.

To set the default carrier state to *on*:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko carrier=on

To set the default carrier state to *off*:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko carrier=off

If the ``carrier`` parameter is not specified, the default carrier state
of KNI interfaces will be set to *off*.

KNI Creation and Deletion
-------------------------

Before any KNI interfaces can be created, the ``rte_kni`` kernel module must
be loaded into the kernel and configured with the ``rte_kni_init()`` function.

The KNI interfaces are created by a DPDK application dynamically via the
``rte_kni_alloc()`` function.

The ``struct rte_kni_conf`` structure contains fields which allow the
user to specify the interface name, set the MTU size, set an explicit or
random MAC address and control the affinity of the kernel Rx thread(s)
(both single and multi-threaded modes).
By default, the KNI sample application gets the MTU from the matching device;
in the case of the KNI PMD, it is derived from the mbuf buffer length.

The ``struct rte_kni_ops`` structure contains pointers to functions to
handle requests from the ``rte_kni`` kernel module.
These functions
allow DPDK applications to perform actions when the KNI interfaces are
manipulated by control commands or functions external to the application.

For example, the DPDK application may wish to enable/disable a physical
NIC port when a user enables/disables a KNI interface with
``ip link set [up|down] dev <ifaceX>``. The DPDK application can register a
callback for ``config_network_if`` which will be called when the interface
management state changes.

There are currently five callbacks for which the user can register
application functions:

``config_network_if``:

    Called when the management state of the KNI interface changes.
    For example, when the user runs ``ip link set [up|down] dev <ifaceX>``.

``change_mtu``:

    Called when the user changes the MTU size of the KNI
    interface. For example, when the user runs
    ``ip link set mtu <size> dev <ifaceX>``.

``config_mac_address``:

    Called when the user changes the MAC address of the KNI interface.
    For example, when the user runs
    ``ip link set address <MAC> dev <ifaceX>``. If the user sets this
    callback function to NULL, but sets the ``port_id`` field to a value
    other than -1, a default callback handler in the rte_kni library,
    ``kni_config_mac_address()``, will be called, which calls
    ``rte_eth_dev_default_mac_addr_set()`` on the specified ``port_id``.

``config_promiscusity``:

    Called when the user changes the promiscuity state of the KNI
    interface. For example, when the user runs
    ``ip link set promisc [on|off] dev <ifaceX>``. If the user sets this
    callback function to NULL, but sets the ``port_id`` field to a value
    other than -1, a default callback handler in the rte_kni library,
    ``kni_config_promiscusity()``, will be called, which calls
    ``rte_eth_promiscuous_<enable|disable>()`` on the specified ``port_id``.

``config_allmulticast``:

    Called when the user changes the allmulticast state of the KNI interface.
    For example, when the user runs ``ifconfig <ifaceX> [-]allmulti``. If the
    user sets this callback function to NULL, but sets the ``port_id`` field to
    a value other than -1, a default callback handler in the rte_kni library
    ``kni_config_allmulticast()`` will be called which calls
    ``rte_eth_allmulticast_<enable|disable>()`` on the specified ``port_id``.

In order to run these callbacks, the application must periodically call
the ``rte_kni_handle_request()`` function. Any user callback function
registered will be called directly from ``rte_kni_handle_request()`` so
care must be taken to prevent deadlock and to not block any DPDK fastpath
tasks. Typically DPDK applications which use these callbacks will need
to create a separate thread or secondary process to periodically call
``rte_kni_handle_request()``.

The KNI interfaces can be deleted by a DPDK application with
``rte_kni_release()``. All KNI interfaces not explicitly deleted will be
deleted when the ``/dev/kni`` device is closed, either explicitly with
``rte_kni_close()`` or when the DPDK application is closed.

DPDK mbuf Flow
--------------

To minimize the amount of DPDK code running in kernel space, the mbuf mempool is managed in userspace only.
The kernel module will be aware of mbufs,
but all mbuf allocation and free operations will be handled by the DPDK application only.

:numref:`figure_pkt_flow_kni` shows a typical scenario with packets sent in both directions.

.. _figure_pkt_flow_kni:

.. figure:: img/pkt_flow_kni.*

   Packet Flow via mbufs in the DPDK KNI


Use Case: Ingress
-----------------

On the DPDK RX side, the mbuf is allocated by the PMD in the RX thread context.
This thread enqueues the mbuf in the rx_q FIFO,
and the next pointers in the mbuf chain are converted to physical addresses.
The KNI thread polls the rx_q of all active KNI devices.
If an mbuf is dequeued, it is converted to an sk_buff and sent to the net stack via netif_rx().
The dequeued mbuf must be freed, so the same pointer is sent back in the free_q FIFO.
Any next pointers must be converted back to virtual addresses before the mbuf is put in the free_q FIFO.

The RX thread, in the same main loop, polls this FIFO and frees the mbuf after dequeuing it.
The next pointers are converted so that chained mbufs located
in different hugepage segments do not cause a kernel crash.

Use Case: Egress
----------------

For packet egress the DPDK application must first enqueue several mbufs to create an mbuf cache on the kernel side.

The packet is received from the Linux net stack by calling the kni_net_tx() callback.
The mbuf is dequeued (without waiting, due to the cache) and filled with data from the sk_buff.
The sk_buff is then freed and the mbuf is sent in the tx_q FIFO.

The DPDK TX thread dequeues the mbuf and sends it to the PMD via ``rte_eth_tx_burst()``.
It then puts the mbuf back in the cache.

IOVA = VA: Support
------------------

KNI operates in the IOVA_VA scheme when:

- LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0) and
- the EAL option ``--iova-mode=va`` is passed, or the bus IOVA scheme in the DPDK
  is selected as RTE_IOVA_VA.

Due to IOVA to KVA address translations, there can be a performance impact,
depending on the KNI use case. For mitigation, IOVA can be forced to PA via
the EAL ``--iova-mode=pa`` option. The IOVA_DC bus IOMMU scheme can also
result in IOVA as PA.

Ethtool
-------

Ethtool is a Linux-specific tool with corresponding support in the kernel.
The current version of KNI provides minimal ethtool functionality
including querying version and link state. It does not support link
control, statistics, or dumping device registers.