..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2015 Intel Corporation.

.. _kni:

Kernel NIC Interface
====================

The DPDK Kernel NIC Interface (KNI) allows userspace applications to access the Linux* control plane.

The benefits of using the DPDK KNI are:

* Faster than existing Linux TUN/TAP interfaces
  (by eliminating system calls and ``copy_to_user()``/``copy_from_user()`` operations).

* Allows management of DPDK ports using standard Linux net tools such as ethtool, ifconfig and tcpdump.

* Allows an interface with the kernel network stack.

The components of an application using the DPDK Kernel NIC Interface are shown in :numref:`figure_kernel_nic_intf`.

.. _figure_kernel_nic_intf:

.. figure:: img/kernel_nic_intf.*

   Components of a DPDK KNI Application


The DPDK KNI Kernel Module
--------------------------

The KNI kernel loadable module ``rte_kni`` provides the kernel interface
for DPDK applications.

When the ``rte_kni`` module is loaded, it will create a device ``/dev/kni``
that is used by the DPDK KNI API functions to control and communicate with
the kernel module.

The ``rte_kni`` kernel module contains several optional parameters which
can be specified when the module is loaded to control its behavior:

.. code-block:: console

    # modinfo rte_kni.ko
    <snip>
    parm:           lo_mode: KNI loopback mode (default=lo_mode_none):
                    lo_mode_none        Kernel loopback disabled
                    lo_mode_fifo        Enable kernel loopback with fifo
                    lo_mode_fifo_skb    Enable kernel loopback with fifo and skb buffer
                     (charp)
    parm:           kthread_mode: Kernel thread mode (default=single):
                    single    Single kernel thread mode enabled.
                    multiple  Multiple kernel thread mode enabled.
                     (charp)
    parm:           carrier: Default carrier state for KNI interface (default=off):
                    off   Interfaces will be created with carrier state set to off.
                    on    Interfaces will be created with carrier state set to on.
                     (charp)
    parm:           enable_bifurcated: Enable request processing support for
                    bifurcated drivers, which means releasing rtnl_lock before calling
                    userspace callback and supporting async requests (default=off):
                    on    Enable request processing support for bifurcated drivers.
                     (charp)
    parm:           min_scheduling_interval: KNI thread min scheduling interval (default=100 microseconds)
                     (long)
    parm:           max_scheduling_interval: KNI thread max scheduling interval (default=200 microseconds)
                     (long)


Loading the ``rte_kni`` kernel module without any optional parameters is
the typical way a DPDK application gets packets into and out of the kernel
network stack. Without any parameters, only one kernel thread is created
to receive packets on the kernel side for all KNI devices, loopback mode is
disabled, and the default carrier state of KNI interfaces is set to *off*.

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko

.. _kni_loopback_mode:

Loopback Mode
~~~~~~~~~~~~~

For testing, the ``rte_kni`` kernel module can be loaded in loopback mode
by specifying the ``lo_mode`` parameter:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko lo_mode=lo_mode_fifo

The ``lo_mode_fifo`` loopback option will loop back ring enqueue/dequeue
operations in kernel space.

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko lo_mode=lo_mode_fifo_skb

The ``lo_mode_fifo_skb`` loopback option will loop back ring enqueue/dequeue
operations and sk buffer copies in kernel space.

If the ``lo_mode`` parameter is not specified, loopback mode is disabled.

.. _kni_kernel_thread_mode:

Kernel Thread Mode
~~~~~~~~~~~~~~~~~~

To provide performance flexibility, the ``rte_kni`` KNI kernel module
can be loaded with the ``kthread_mode`` parameter.
The ``rte_kni`` kernel module supports two options: "single kernel thread"
mode and "multiple kernel thread" mode.

Single kernel thread mode is enabled as follows:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko kthread_mode=single

This mode will create only one kernel thread for all KNI interfaces to
receive data on the kernel side. By default, this kernel thread is not
bound to any particular core, but the user can set the core affinity for
this kernel thread by setting the ``core_id`` and ``force_bind`` parameters
in ``struct rte_kni_conf`` when the first KNI interface is created.

For optimum performance, the kernel thread should be bound to a core
on the same socket as the DPDK lcores used in the application.

The KNI kernel module can also be configured to start a separate kernel
thread for each KNI interface created by the DPDK application. Multiple
kernel thread mode is enabled as follows:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko kthread_mode=multiple

This mode will create a separate kernel thread for each KNI interface to
receive data on the kernel side. The core affinity of each ``kni_thread``
kernel thread can be specified by setting the ``core_id`` and ``force_bind``
parameters in ``struct rte_kni_conf`` when each KNI interface is created.

Multiple kernel thread mode can provide higher, scalable performance if
sufficient unused cores are available on the host system.

If the ``kthread_mode`` parameter is not specified, the "single kernel
thread" mode is used.

.. _kni_default_carrier_state:

Default Carrier State
~~~~~~~~~~~~~~~~~~~~~

The default carrier state of KNI interfaces created by the ``rte_kni``
kernel module is controlled via the ``carrier`` option when the module
is loaded.

If ``carrier=off`` is specified, the kernel module will leave the carrier
state of the interface *down* when the interface is management enabled.
The DPDK application can set the carrier state of the KNI interface using the
``rte_kni_update_link()`` function. This is useful for DPDK applications
which require that the carrier state of the KNI interface reflect the
actual link state of the corresponding physical NIC port.

If ``carrier=on`` is specified, the kernel module will automatically set
the carrier state of the interface to *up* when the interface is management
enabled. This is useful for DPDK applications which use the KNI interface as
a purely virtual interface that does not correspond to any physical hardware
and which do not wish to explicitly set the carrier state of the interface with
``rte_kni_update_link()``. It is also useful for testing in loopback mode,
where the NIC port may not be physically connected to anything.

To set the default carrier state to *on*:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko carrier=on

To set the default carrier state to *off*:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko carrier=off

If the ``carrier`` parameter is not specified, the default carrier state
of KNI interfaces will be set to *off*.

.. _kni_bifurcated_device_support:

Bifurcated Device Support
~~~~~~~~~~~~~~~~~~~~~~~~~

User callbacks are executed while the kernel module holds the ``rtnl`` lock;
this causes a deadlock when callbacks run control commands on another Linux
kernel network interface.

Bifurcated devices have a kernel network driver part, so the
``enable_bifurcated`` parameter is used to prevent this deadlock for them.

To enable bifurcated device support:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko enable_bifurcated=on

Enabling bifurcated device support releases the ``rtnl`` lock before calling
a callback and reacquires it after the callback returns. It also enables
asynchronous requests, to support callbacks that require the ``rtnl`` lock
to work (such as bringing the interface down).

KNI Kthread Scheduling
~~~~~~~~~~~~~~~~~~~~~~

The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
control the rescheduling interval of the KNI kthreads.

This can be useful for use cases that require improved latency or
performance for control plane traffic.

The implementation is backed by Linux High Precision Timers and uses ``usleep_range``.
Hence, it has the same granularity constraints as this Linux subsystem.

For details on Linux High Precision Timers, see: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_

To set the ``min_scheduling_interval`` to a value of 100 microseconds:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100

To set the ``max_scheduling_interval`` to a value of 200 microseconds:

.. code-block:: console

    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200

If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
not specified, the default interval limits will be set to *100* and *200* microseconds respectively.

KNI Creation and Deletion
-------------------------

Before any KNI interfaces can be created, the ``rte_kni`` kernel module must
be loaded into the kernel and the KNI subsystem initialized with the ``rte_kni_init()`` function.

The KNI interfaces are created by a DPDK application dynamically via the
``rte_kni_alloc()`` function.
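
``rte_kni_alloc()`` also takes a ``struct rte_kni_ops``, whose callbacks are
later run from ``rte_kni_handle_request()`` as the rest of this section
describes. The self-contained C model below is only a sketch of that dispatch
pattern, not the library's code: ``struct ops_model``, ``struct request``,
``dispatch_request()``, ``NO_PORT``, ``NOT_HANDLED`` and the other names are
hypothetical, only two callbacks are modelled, and plain ``int`` replaces the
real field types. It shows a registered callback running synchronously in the
caller's thread, and the fallback to a default handler when
``config_mac_address`` is NULL but ``port_id`` is not -1.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical model only: names and types are illustrative, not DPDK's. */
    #define NO_PORT     (-1) /* port_id value meaning "no ethdev port" */
    #define NOT_HANDLED (-1) /* model's result when nothing handles a request */

    enum req_type { REQ_CFG_NETWORK_IF, REQ_CHANGE_MAC_ADDR };

    /* One request forwarded from the kernel side (model). */
    struct request {
        enum req_type type;
        int if_up;                 /* used by REQ_CFG_NETWORK_IF */
        unsigned char mac_addr[6]; /* used by REQ_CHANGE_MAC_ADDR */
    };

    /* Cut-down stand-in for struct rte_kni_ops: a port id plus optional
     * callbacks; a NULL pointer means "not registered". */
    struct ops_model {
        int port_id;
        int (*config_network_if)(int port_id, int if_up);
        int (*config_mac_address)(int port_id, const unsigned char *mac_addr);
    };

    static int default_mac_calls; /* how often the fallback handler ran */
    static int last_if_up = -1;   /* last state the app callback was asked for */

    /* Stand-in for the library's default MAC handler, which would call
     * rte_eth_dev_default_mac_addr_set() on port_id. */
    static int default_config_mac_address(int port_id, const unsigned char *mac_addr)
    {
        (void)port_id;
        (void)mac_addr;
        default_mac_calls++;
        return 0;
    }

    /* Example application callback: a real one might start or stop the
     * physical port here. */
    static int app_config_network_if(int port_id, int if_up)
    {
        (void)port_id;
        last_if_up = if_up;
        return 0;
    }

    /* Handle one request. The callback runs synchronously in the calling
     * thread, so a callback that blocks also blocks this caller. */
    static int dispatch_request(const struct ops_model *ops, const struct request *req)
    {
        assert(ops != NULL && req != NULL);

        switch (req->type) {
        case REQ_CFG_NETWORK_IF:
            if (ops->config_network_if != NULL)
                return ops->config_network_if(ops->port_id, req->if_up);
            return NOT_HANDLED;
        case REQ_CHANGE_MAC_ADDR:
            if (ops->config_mac_address != NULL)
                return ops->config_mac_address(ops->port_id, req->mac_addr);
            if (ops->port_id != NO_PORT) /* NULL callback, real port: fallback */
                return default_config_mac_address(ops->port_id, req->mac_addr);
            return NOT_HANDLED;
        }
        return NOT_HANDLED;
    }

In a real application the equivalent structure is a ``struct rte_kni_ops``
passed to ``rte_kni_alloc()``, and the dispatch happens inside
``rte_kni_handle_request()``, which is why that function is best called from a
thread that is allowed to block rather than from a fastpath lcore.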

The ``struct rte_kni_conf`` structure contains fields which allow the
user to specify the interface name, set the MTU size, set an explicit or
random MAC address and control the affinity of the kernel Rx thread(s)
(both single and multi-threaded modes).
By default, the KNI sample application gets the MTU from the matching device;
in the case of the KNI PMD, the MTU is derived from the mbuf buffer length.

The ``struct rte_kni_ops`` structure contains pointers to functions to
handle requests from the ``rte_kni`` kernel module. These functions
allow DPDK applications to perform actions when the KNI interfaces are
manipulated by control commands or functions external to the application.

For example, the DPDK application may wish to enable/disable a physical
NIC port when a user enables/disables a KNI interface with ``ip link set
[up|down] dev <ifaceX>``. The DPDK application can register a callback for
``config_network_if`` which will be called when the interface management
state changes.

There are currently five callbacks for which the user can register
application functions:

``config_network_if``:

    Called when the management state of the KNI interface changes.
    For example, when the user runs ``ip link set [up|down] dev <ifaceX>``.

``change_mtu``:

    Called when the user changes the MTU size of the KNI
    interface. For example, when the user runs ``ip link set mtu <size>
    dev <ifaceX>``.

``config_mac_address``:

    Called when the user changes the MAC address of the KNI interface.
    For example, when the user runs ``ip link set address <MAC>
    dev <ifaceX>``. If the user sets this callback function to NULL,
    but sets the ``port_id`` field to a value other than -1, a default
    callback handler in the rte_kni library ``kni_config_mac_address()``
    will be called which calls ``rte_eth_dev_default_mac_addr_set()``
    on the specified ``port_id``.

``config_promiscusity``:

    Called when the user changes the promiscuity state of the KNI
    interface. For example, when the user runs ``ip link set promisc
    [on|off] dev <ifaceX>``. If the user sets this callback function to
    NULL, but sets the ``port_id`` field to a value other than -1, a default
    callback handler in the rte_kni library ``kni_config_promiscusity()``
    will be called which calls ``rte_eth_promiscuous_<enable|disable>()``
    on the specified ``port_id``.

``config_allmulticast``:

    Called when the user changes the allmulticast state of the KNI interface.
    For example, when the user runs ``ifconfig <ifaceX> [-]allmulti``. If the
    user sets this callback function to NULL, but sets the ``port_id`` field to
    a value other than -1, a default callback handler in the rte_kni library
    ``kni_config_allmulticast()`` will be called which calls
    ``rte_eth_allmulticast_<enable|disable>()`` on the specified ``port_id``.

In order to run these callbacks, the application must periodically call
the ``rte_kni_handle_request()`` function. Any registered user callback
function will be called directly from ``rte_kni_handle_request()``, so
care must be taken to prevent deadlock and to not block any DPDK fastpath
tasks. Typically, DPDK applications which use these callbacks will need
to create a separate thread or secondary process to periodically call
``rte_kni_handle_request()``.

The KNI interfaces can be deleted by a DPDK application with
``rte_kni_release()``. All KNI interfaces not explicitly deleted will be
deleted when the ``/dev/kni`` device is closed, either explicitly with
``rte_kni_close()`` or when the DPDK application is closed.

DPDK mbuf Flow
--------------

To minimize the amount of DPDK code running in kernel space, the mbuf mempool is managed in userspace only.
The kernel module will be aware of mbufs,
but all mbuf allocation and free operations will be handled by the DPDK application only.

:numref:`figure_pkt_flow_kni` shows a typical scenario with packets sent in both directions.

.. _figure_pkt_flow_kni:

.. figure:: img/pkt_flow_kni.*

   Packet Flow via mbufs in the DPDK KNI


Use Case: Ingress
-----------------

On the DPDK RX side, the mbuf is allocated by the PMD in the RX thread context.
This thread enqueues the mbuf in the ``rx_q`` FIFO,
converting the next pointers in the mbuf chain to physical addresses.
The KNI thread polls the ``rx_q`` of all active KNI devices.
If an mbuf is dequeued, it is converted to an ``sk_buff`` and sent to the net stack via ``netif_rx()``.
The dequeued mbuf must be freed, so the same pointer is sent back in the ``free_q`` FIFO;
any next pointers must be converted back to virtual addresses before the mbuf is put in the ``free_q`` FIFO.

The RX thread, in the same main loop, polls this FIFO and frees the mbuf after dequeuing it.
The address conversion of the next pointers prevents chained mbufs
that span different hugepage segments from causing a kernel crash.

Use Case: Egress
----------------

For packet egress, the DPDK application must first enqueue several mbufs to create an mbuf cache on the kernel side.

The packet is received from the Linux net stack by calling the ``kni_net_tx()`` callback.
The mbuf is dequeued (without waiting, thanks to the cache) and filled with data from the ``sk_buff``.
The ``sk_buff`` is then freed and the mbuf sent in the ``tx_q`` FIFO.

The DPDK TX thread dequeues the mbuf and sends it to the PMD via ``rte_eth_tx_burst()``.
It then puts the mbuf back in the cache.
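
The ownership rule above (mbufs are allocated and freed only in userspace, and
the kernel side hands each pointer back through ``free_q`` instead of freeing
it) can be illustrated with a self-contained, single-threaded C model of a
pointer FIFO. It is a sketch under stated assumptions, not the KNI FIFO code:
``struct fifo``, ``fifo_put()``, ``fifo_get()``, ``ingress_round_trip()`` and
``FIFO_SIZE`` are hypothetical names, the ring size is assumed to be a power of
two with one slot kept empty to tell "full" from "empty", and both the memory
barriers a FIFO shared between user and kernel space needs and the next-pointer
address conversion are omitted.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical ring size; must be a power of two. One slot always stays
     * empty, so the ring holds at most FIFO_SIZE - 1 pointers. */
    #define FIFO_SIZE 8u

    /* Single-producer/single-consumer ring of pointers (model only: no
     * memory barriers, so it is not safe across real threads). */
    struct fifo {
        unsigned write;          /* next slot to fill (producer only) */
        unsigned read;           /* next slot to drain (consumer only) */
        void *buffer[FIFO_SIZE];
    };

    /* Enqueue up to num pointers; returns how many were actually queued. */
    static unsigned fifo_put(struct fifo *f, void **data, unsigned num)
    {
        unsigned i;

        for (i = 0; i < num; i++) {
            unsigned next = (f->write + 1) & (FIFO_SIZE - 1);

            if (next == f->read)
                break;                  /* ring is full */
            f->buffer[f->write] = data[i];
            f->write = next;
        }
        return i;
    }

    /* Dequeue up to num pointers; returns how many were actually taken. */
    static unsigned fifo_get(struct fifo *f, void **data, unsigned num)
    {
        unsigned i;

        for (i = 0; i < num; i++) {
            if (f->read == f->write)
                break;                  /* ring is empty */
            data[i] = f->buffer[f->read];
            f->read = (f->read + 1) & (FIFO_SIZE - 1);
        }
        return i;
    }

    /* One ingress round trip, assuming both FIFOs start empty. Returns the
     * number of mbufs that came back to userspace and could be freed. */
    static unsigned ingress_round_trip(struct fifo *rx_q, struct fifo *free_q,
                                       void **mbufs, unsigned nb)
    {
        void *burst[FIFO_SIZE];
        unsigned taken, returned;

        assert(rx_q != NULL && free_q != NULL);

        /* Userspace RX thread: hand received mbufs to the kernel side. */
        fifo_put(rx_q, mbufs, nb);

        /* Kernel KNI thread: dequeue them (it would build sk_buffs here)
         * and return the very same pointers instead of freeing them. */
        taken = fifo_get(rx_q, burst, FIFO_SIZE);
        returned = fifo_put(free_q, burst, taken);

        /* Userspace RX thread: drain free_q; only here would the
         * application actually free the mbufs. */
        return fifo_get(free_q, burst, returned);
    }

With an empty ring, a burst of 10 pointers is clipped to ``FIFO_SIZE - 1``
(7 here), which is one reason the userspace side has to poll the FIFOs
repeatedly from its main loop rather than assume a burst is always accepted.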

IOVA = VA: Support
------------------

KNI operates in the IOVA_VA scheme when:

- ``LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0)``, and
- the EAL option ``--iova-mode=va`` is passed, or the bus IOVA scheme in DPDK
  is selected as ``RTE_IOVA_VA``.

Depending on the KNI use case, the IOVA to KVA address translations can have
a performance impact. To mitigate this, IOVA can be forced to PA with the EAL
``--iova-mode=pa`` option; the IOVA_DC bus IOMMU scheme can also result in
IOVA as PA.

Ethtool
-------

Ethtool is a Linux-specific tool with corresponding support in the kernel.
The current version of KNI provides minimal ethtool functionality,
including querying version and link state. It does not support link
control, statistics, or dumping device registers.