..  SPDX-License-Identifier: BSD-3-Clause
    Copyright 2015 6WIND S.A.
    Copyright 2015 Mellanox Technologies, Ltd

.. include:: <isonum.txt>

MLX5 poll mode driver
=====================

The MLX5 poll mode driver library (**librte_net_mlx5**) provides support
for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx**, **Mellanox
ConnectX-5**, **Mellanox ConnectX-6**, **Mellanox ConnectX-6 Dx**, **Mellanox
ConnectX-6 Lx**, **Mellanox BlueField** and **Mellanox BlueField-2** families
of 10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF)
in SR-IOV context.

Information and documentation about these adapters can be found on the
`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the
`Mellanox community <http://community.mellanox.com/welcome>`__.

There is also a `section dedicated to this poll mode driver
<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__.


Design
------

Besides its dependency on libibverbs (that implies libmlx5 and associated
kernel support), librte_net_mlx5 relies heavily on system calls for control
operations such as querying/updating the MTU and flow control parameters.

For security reasons and robustness, this driver only deals with virtual
memory addresses. The way resource allocations are handled by the kernel,
combined with hardware specifications that allow handling virtual memory
addresses directly, ensures that DPDK applications cannot access random
physical memory (or memory that does not belong to the current process).

This capability allows the PMD to coexist with kernel network interfaces
which remain functional, although they stop receiving unicast packets as
long as they share the same MAC address.
This means legacy Linux control tools (for example: ethtool, ifconfig and
more) can operate on the same network interfaces owned by the DPDK
application.

The PMD can use libibverbs and libmlx5 to access the device firmware
or directly the hardware components.
There are different levels of objects and bypassing abilities
to get the best performance:

- Verbs is a complete high-level generic API.
- Direct Verbs is a device-specific API.
- DevX allows access to firmware objects.
- Direct Rules manages flow steering at the low-level hardware layer.

Enabling librte_net_mlx5 causes DPDK applications to be linked against
libibverbs.

Features
--------

- Multi arch support: x86_64, POWER8, ARMv8, i686.
- Multiple TX and RX queues.
- Support for scattered TX frames.
- Advanced support for scattered Rx frames with tunable buffer attributes.
- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
- RSS using different combinations of fields: L3 only, L4 only or both,
  and source only, destination only or both.
- Several RSS hash keys, one for each flow type.
- Default RSS operation with no hash key specification.
- Configurable RETA table.
- Link flow control (pause frame).
- Support for multiple MAC addresses.
- VLAN filtering.
- RX VLAN stripping.
- TX VLAN insertion.
- RX CRC stripping configuration.
- Promiscuous mode on PF and VF.
- Multicast promiscuous mode on PF and VF.
- Hardware checksum offloads.
- Flow director (RTE_FDIR_MODE_PERFECT, RTE_FDIR_MODE_PERFECT_MAC_VLAN and
  RTE_ETH_FDIR_REJECT).
- Flow API, including :ref:`flow_isolated_mode`.
- Multiple process.
- KVM and VMware ESX SR-IOV modes are supported.


- RSS hash result is supported.
- Hardware TSO for generic IP or UDP tunnel, including VXLAN and GRE.
- Hardware checksum Tx offload for generic IP or UDP tunnel, including VXLAN and GRE.
- RX interrupts.
- Statistics query including Basic, Extended and per queue.
- Rx HW timestamp.
- Tunnel types: VXLAN, L3 VXLAN, VXLAN-GPE, GRE, MPLSoGRE, MPLSoUDP, IP-in-IP, Geneve, GTP.
- Tunnel HW offloads: packet type, inner/outer RSS, IP and UDP checksum verification.
- NIC HW offloads: encapsulation (vxlan, gre, mplsoudp, mplsogre), NAT, routing, TTL
  increment/decrement, count, drop, mark. For details please see :ref:`mlx5_offloads_support`.
- Flow insertion rate of more than a million flows per second, when using Direct Rules.
- Support for multiple rte_flow groups.
- Per packet no-inline hint flag to disable packet data copying into Tx descriptors.
- Hardware LRO.
- Hairpin.
- Multiple-thread flow insertion.

Limitations
-----------

- For secondary process:

  - Forked secondary process not supported.
  - External memory unregistered in EAL memseg list cannot be used for DMA
    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
    primary process and remapped to the same virtual address in secondary
    process. If the external memory is registered by primary process but has
    a different virtual address in secondary process, unexpected errors may happen.

- When using Verbs flow engine (``dv_flow_en`` = 0), a flow pattern without any
  specific VLAN will match VLAN packets as well:

  When VLAN spec is not specified in the pattern, the matching rule will be created with VLAN as a wild card.
  Meaning, the flow rule::

        flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ...

  will only match VLAN packets with vid=3, and the flow rule::

        flow create 0 ingress pattern eth / ipv4 / end ...

  will match any IPv4 packet (VLAN included).

- When using Verbs flow engine (``dv_flow_en`` = 0), multi-tagged (QinQ) match is not supported.

- When using DV flow engine (``dv_flow_en`` = 1), a flow pattern with any VLAN
  specification will match only single-tagged packets unless the ETH item ``type``
  field is 0x88A8 or the VLAN item ``has_more_vlan`` field is 1.
  The flow rule::

        flow create 0 ingress pattern eth / ipv4 / end ...

  will match any IPv4 packet.
  The flow rules::

        flow create 0 ingress pattern eth / vlan / end ...
        flow create 0 ingress pattern eth has_vlan is 1 / end ...
        flow create 0 ingress pattern eth type is 0x8100 / end ...

  will match single-tagged packets only, with any VLAN ID value.
  The flow rules::

        flow create 0 ingress pattern eth type is 0x88A8 / end ...
        flow create 0 ingress pattern eth / vlan has_more_vlan is 1 / end ...

  will match multi-tagged packets only, with any VLAN ID value.

- A flow pattern with 2 sequential VLAN items is not supported.

- VLAN pop offload command:

  - Flow rules having a VLAN pop offload command as one of their actions and
    lacking a match on VLAN as one of their items are not supported.
  - The command is not supported on egress traffic.

- VLAN push offload is not supported on ingress traffic.

- VLAN set PCP offload is not supported on existing headers.

- A multi-segment packet must not have more segments than reported by
  ``dev_infos_get()`` in the ``tx_desc_lim.nb_seg_max`` field.
  This value depends on the maximal supported Tx descriptor size and the
  ``txq_inline_min`` settings and may be from 2 (worst case forced by maximal
  inline settings) to 58. The limit can be checked as shown in the sketch below.

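
  A minimal, illustrative sketch of this check (the helper name is made up here;
  only ``rte_eth_dev_info_get()`` and ``tx_desc_lim.nb_seg_max`` come from the
  ethdev API)::

     #include <rte_ethdev.h>

     /* Return nonzero if a packet with nb_segs segments fits the Tx limit. */
     static int
     tx_segments_ok(uint16_t port_id, uint16_t nb_segs)
     {
         struct rte_eth_dev_info dev_info;

         if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
             return 0;
         return nb_segs <= dev_info.tx_desc_lim.nb_seg_max;
     }
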

- Flows with a VXLAN Network Identifier equal (or ending up being equal)
  to 0 are not supported.

- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP.

- Match on Geneve header supports the following fields only:

  - VNI
  - OAM
  - protocol type
  - options length
    Currently, the only supported options length value is 0.

- VF: flow rules created on VF devices can only match traffic targeted at the
  configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``).

- Match on GTP tunnel header item supports the following fields only:

  - v_pt_rsv_flags: E flag, S flag, PN flag
  - msg_type
  - teid

- No Tx metadata goes to the E-Switch steering domain for flow group 0.
  Flows within group 0 using the set metadata action are rejected by hardware.

.. note::

   MAC addresses not already present in the bridge table of the associated
   kernel network device will be added and cleaned up by the PMD when closing
   the device. In case of ungraceful program termination, some entries may
   remain present and should be removed manually by other means.

- Buffer split offload is supported with regular Rx burst routine only,
  no MPRQ feature or vectorized code can be engaged.

- When Multi-Packet Rx queue is configured (``mprq_en``), an Rx packet can be
  externally attached to a user-provided mbuf with EXT_ATTACHED_MBUF set in
  ol_flags. As the mempool for the external buffer is managed by the PMD, all the
  Rx mbufs must be freed before the device is closed. Otherwise, the mempool of
  the external buffers will be freed by the PMD and the application which still
  holds the external buffers may be corrupted.

- If Multi-Packet Rx queue is configured (``mprq_en``) and Rx CQE compression is
  enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully
  supported. Some Rx packets may not have PKT_RX_RSS_HASH.

- IPv6 multicast messages are not supported on VM while promiscuous mode
  and allmulticast mode are both set to off.
  To receive IPv6 multicast messages on VM, explicitly set the relevant
  MAC address using the rte_eth_dev_mac_addr_add() API.

- To support a mixed traffic pattern (some buffers from local host memory, some
  buffers from other devices) with high bandwidth, an mbuf flag is used.

  The application hints the PMD whether or not it should try to inline the
  given mbuf data buffer. The PMD makes a best effort to act upon this request.

  The hint flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE`` is dynamic,
  registered by the application with rte_mbuf_dynflag_register(). This flag is
  purely driver-specific and declared in the PMD specific header ``rte_pmd_mlx5.h``,
  which is intended to be used by the application.

  To query the supported specific flags in runtime,
  the function ``rte_pmd_mlx5_get_dyn_flag_names`` returns the array of
  currently (over present hardware and configuration) supported specific flags.

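
  A minimal sketch of this registration (the helper names below are illustrative;
  the flag name macro comes from ``rte_pmd_mlx5.h`` and the registration API from
  ``rte_mbuf_dyn.h``)::

     #include <rte_mbuf.h>
     #include <rte_mbuf_dyn.h>
     #include <rte_pmd_mlx5.h>

     static uint64_t no_inline_mask; /* ol_flags bit of the registered flag */

     static int
     register_no_inline_hint(void)
     {
         const struct rte_mbuf_dynflag desc = {
             .name = RTE_PMD_MLX5_FINE_GRANULARITY_INLINE,
         };
         int bit = rte_mbuf_dynflag_register(&desc);

         if (bit < 0)
             return -1; /* the flag could not be registered */
         no_inline_mask = 1ULL << bit;
         return 0;
     }

     static void
     hint_no_inline(struct rte_mbuf *m)
     {
         /* ask the PMD not to copy this mbuf's data into the Tx descriptor */
         m->ol_flags |= no_inline_mask;
     }
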

  The "not inline hint" feature operating flow is the following one:

  - application starts
  - probe the devices, ports are created
  - query the port capabilities
  - if port supporting the feature is found
  - register dynamic flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE``
  - application starts the ports
  - on ``dev_start()`` PMD checks whether the feature flag is registered and
    enables the feature support in datapath
  - application might set the registered flag bit in the ``ol_flags`` field
    of the mbuf being sent and the PMD will handle it appropriately.

- The number of descriptors in a Tx queue may be limited by data inline settings.
  Inline data requires more descriptor building blocks and the overall block
  amount may exceed the hardware supported limits. The application should
  reduce the requested Tx size or adjust the data inline settings with the
  ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys.

- To provide packet send scheduling on mbuf timestamps the ``tx_pp``
  parameter should be specified.
  When the PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on a packet
  being sent, it tries to synchronize the time of the packet appearing on
  the wire with the specified packet timestamp. If the specified timestamp
  is in the past it should be ignored, if it is in the distant future
  it should be capped with some reasonable value (in the range of seconds).
  These specific cases ("too late" and "distant future") can be optionally
  reported via device xstats to help applications detect the
  time-related problems.

  The timestamp upper "too-distant-future" limit
  at the moment of invoking the Tx burst routine
  can be estimated as the ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
  Please note, for the testpmd txonly mode,
  the limit is deduced from the expression::

        (n_tx_descriptors / burst_size + 1) * inter_burst_gap

  No packet reordering according to timestamps is performed, neither within
  a packet burst nor between packets; it is entirely the application's
  responsibility to generate packets and their timestamps in the desired order.
  The timestamp can be put only in the first packet of a burst, providing
  scheduling for the entire burst.

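
  A minimal sketch of preparing a packet for scheduled sending, assuming the
  generic dynamic field/flag API and the standard timestamp field/flag names
  from ``rte_mbuf_dyn.h`` (helper names are illustrative)::

     #include <rte_mbuf.h>
     #include <rte_mbuf_dyn.h>

     static int ts_off;       /* offset of the timestamp dynamic field */
     static uint64_t ts_mask; /* ol_flags bit requesting scheduled send */

     static int
     register_tx_timestamp(void)
     {
         const struct rte_mbuf_dynfield field_desc = {
             .name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
             .size = sizeof(uint64_t),
             .align = __alignof__(uint64_t),
         };
         const struct rte_mbuf_dynflag flag_desc = {
             .name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
         };
         int bit;

         ts_off = rte_mbuf_dynfield_register(&field_desc);
         bit = rte_mbuf_dynflag_register(&flag_desc);
         if (ts_off < 0 || bit < 0)
             return -1;
         ts_mask = 1ULL << bit;
         return 0;
     }

     static void
     schedule_tx(struct rte_mbuf *m, uint64_t wire_time)
     {
         /* desired time of the packet on the wire (see tx_pp and tx_skew) */
         *RTE_MBUF_DYNFIELD(m, ts_off, uint64_t *) = wire_time;
         m->ol_flags |= ts_mask;
     }
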

- E-Switch decapsulation Flow:

  - can be applied to PF port only.
  - must specify VF port action (packet redirection from PF to VF).
  - optionally may specify tunnel inner source and destination MAC addresses.

- E-Switch encapsulation Flow:

  - can be applied to VF ports only.
  - must specify PF port action (packet redirection from VF to PF).

- Raw encapsulation:

  - The input buffer, used as outer header, is not validated.

- Raw decapsulation:

  - The decapsulation is always done up to the outermost tunnel detected by the HW.
  - The input buffer, providing the removal size, is not validated.
  - The buffer size must match the length of the headers to be removed.

- ICMP (code/type/identifier/sequence number) / ICMP6 (code/type) matching, IP-in-IP and MPLS flow matching are all
  mutually exclusive features which cannot be supported together
  (see :ref:`mlx5_firmware_config`).

- LRO:

  - Requires DevX and DV flow to be enabled.
  - KEEP_CRC offload cannot be supported with LRO.
  - The first mbuf length, without head-room, must be big enough to include the
    TCP header (122B).
  - Rx queue with LRO offload enabled, receiving a non-LRO packet, can forward
    it with size limited to max LRO size, not to max RX packet length.
  - LRO can be used with outer header of TCP packets of the standard format:
    eth (with or without vlan) / ipv4 or ipv6 / tcp / payload

    Other TCP packets (e.g. with MPLS label) received on Rx queue with LRO enabled, will be received with bad checksum.
  - LRO packet aggregation is performed by HW only for packet sizes larger than
    ``lro_min_mss_size``. This value is reported on device start, when debug
    mode is enabled.

- CRC:

  - ``DEV_RX_OFFLOAD_KEEP_CRC`` cannot be supported with decapsulation
    for some NICs (such as ConnectX-6 Dx, ConnectX-6 Lx, and BlueField-2).
    The capability bit ``scatter_fcs_w_decap_disable`` shows NIC support.

- Sample flow:

  - Supports ``RTE_FLOW_ACTION_TYPE_SAMPLE`` action only within NIC Rx and E-Switch steering domain.
  - The E-Switch Sample flow must have the eswitch_manager VPORT destination (PF or ECPF) and no additional actions.
  - For ConnectX-5, the ``RTE_FLOW_ACTION_TYPE_SAMPLE`` is typically used as the first action
    in an E-Switch egress flow when combined with header modify or encapsulation actions.

- The IPv6 header item 'proto' field, indicating the next header protocol, should
  not be set as an extension header.
  In case the next header is an extension header, it should not be specified in
  the IPv6 header item 'proto' field.
  The last extension header item 'next header' field can specify the following
  header protocol type.

- Hairpin:

  - Hairpin between two ports supports only manual binding and explicit Tx flow mode.
    For single port hairpin, all the combinations of auto/manual binding and
    explicit/implicit Tx flow mode are supported.
  - Hairpin in switchdev SR-IOV mode is not supported.

Statistics
----------

MLX5 supports various methods to report statistics:

Port statistics can be queried using ``rte_eth_stats_get()``. The received and
sent statistics are collected in software only and count the number of packets
received or sent successfully by the PMD. The imissed counter is the number of
packets that could not be delivered to SW because a queue was full. Packets not
received due to congestion in the bus or on the NIC can be queried via the
rx_discards_phy xstats counter.

Extended statistics can be queried using ``rte_eth_xstats_get()``. The extended
statistics expose a wider set of counters counted by the device. The extended
port statistics count the number of packets received or sent successfully by
the port. As Mellanox NICs are using the :ref:`Bifurcated Linux Driver
<linux_gsg_linux_drivers>`, these counters also count packets received or sent
by the Linux kernel. The counters with ``_phy`` suffix count the total events
on the physical port, therefore not valid for VF.

Finally, per-flow statistics can be queried using ``rte_flow_query`` when
attaching a count action to a specific flow. The flow counter counts the number
of packets received successfully by the port and matching the specific flow.

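
As an illustration, a hedged sketch of reading the counter of a flow rule that
was created with a ``COUNT`` action (the helper name is made up; error handling
is reduced to the essentials)::

   #include <inttypes.h>
   #include <stdio.h>
   #include <rte_flow.h>

   static int
   print_flow_count(uint16_t port_id, struct rte_flow *flow)
   {
       /* the action type selects which query is performed */
       const struct rte_flow_action count = { .type = RTE_FLOW_ACTION_TYPE_COUNT };
       struct rte_flow_query_count stats = { .reset = 0 };
       struct rte_flow_error err;

       if (rte_flow_query(port_id, flow, &count, &stats, &err) != 0)
           return -1; /* see err.message for details */
       printf("hits: %" PRIu64 " bytes: %" PRIu64 "\n", stats.hits, stats.bytes);
       return 0;
   }
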

Configuration
-------------

Compilation options
~~~~~~~~~~~~~~~~~~~

The ibverbs libraries can be linked with this PMD in a number of ways,
configured by the ``ibverbs_link`` build option:

- ``shared`` (default): the PMD depends on some .so files.

- ``dlopen``: Split the dependencies glue in a separate library
  loaded when needed by dlopen.
  It makes dependencies on libibverbs and libmlx5 optional,
  and has no performance impact.

- ``static``: Embed static flavor of the dependencies libibverbs and libmlx5
  in the PMD shared library or the executable static binary.

Environment variables
~~~~~~~~~~~~~~~~~~~~~

- ``MLX5_GLUE_PATH``

  A list of directories in which to search for the rdma-core "glue" plug-in,
  separated by colons or semi-colons.

- ``MLX5_SHUT_UP_BF``

  Configures HW Tx doorbell register as IO-mapped.

  By default, the HW Tx doorbell is configured as a write-combining register.
  The register would be flushed to HW usually when the write-combining buffer
  becomes full, but it depends on CPU design.

  Except for vectorized Tx burst routines, a write memory barrier is enforced
  after updating the register so that the update can be immediately visible to
  HW.

  When vectorized Tx burst is called, the barrier is set only if the burst size
  is not aligned to MLX5_VPMD_TX_MAX_BURST. However, setting this environment
  variable will bring better latency even though the maximum throughput can
  slightly decline.

Run-time configuration
~~~~~~~~~~~~~~~~~~~~~~

- librte_net_mlx5 brings kernel network interfaces up during initialization
  because it is affected by their state. Forcing them down prevents packet
  reception.

- **ethtool** operations on related kernel interfaces also affect the PMD.

Run as non-root
^^^^^^^^^^^^^^^

In order to run as a non-root user,
some capabilities must be granted to the application::

   setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_ipc_lock+ep <dpdk-app>

Below are the reasons each capability is needed:

``cap_sys_admin``
   When using physical addresses (PA mode), with Linux >= 4.0,
   for access to ``/proc/self/pagemap``.

``cap_net_admin``
   For device configuration.

``cap_net_raw``
   For raw ethernet queue allocation through kernel driver.

``cap_ipc_lock``
   For DMA memory pinning.

Driver options
^^^^^^^^^^^^^^

- ``rxq_cqe_comp_en`` parameter [int]

  A nonzero value enables the compression of CQE on RX side. This feature
  allows saving PCI bandwidth and improves performance. Enabled by default.
  Different compression formats are supported in order to achieve the best
  performance for different traffic patterns. Hash RSS format is the default.

  Specifying 2 as a ``rxq_cqe_comp_en`` value selects Flow Tag format for
  better compression rate in case of RTE Flow Mark traffic.
  Specifying 3 as a ``rxq_cqe_comp_en`` value selects Checksum format.
  Specifying 4 as a ``rxq_cqe_comp_en`` value selects L3/L4 Header format for
  better compression rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.

  Supported on:

  - x86_64 with ConnectX-4, ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
    ConnectX-6 Lx, BlueField and BlueField-2.
  - POWER9 and ARMv8 with ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
    ConnectX-6 Lx, BlueField and BlueField-2.

- ``rxq_cqe_pad_en`` parameter [int]

  A nonzero value enables 128B padding of CQE on RX side. The size of CQE
  is aligned with the size of a cacheline of the core. If cacheline size is
  128B, the CQE size is configured to be 128B even though the device writes
  only 64B data on the cacheline.


  This is to avoid unnecessary cache invalidation by the device's two consecutive
  writes onto one cacheline.
  However, on some architectures, it is more beneficial to update the entire
  cacheline by padding the remaining 64B rather than striding because
  a read-modify-write could drop performance a lot. On the other hand,
  writing extra data will consume more PCIe bandwidth and could also drop
  the maximum throughput. It is recommended to set this parameter empirically.
  Disabled by default.

  Supported on:

  - CPU having 128B cacheline with ConnectX-5 and BlueField.

- ``rxq_pkt_pad_en`` parameter [int]

  A nonzero value enables padding Rx packet to the size of cacheline on PCI
  transaction. This feature would waste PCI bandwidth but could improve
  performance by avoiding partial cacheline write which may cause costly
  read-modify-write in memory transaction on some architectures. Disabled by
  default.

  Supported on:

  - x86_64 with ConnectX-4, ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
    ConnectX-6 Lx, BlueField and BlueField-2.
  - POWER8 and ARMv8 with ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
    ConnectX-6 Lx, BlueField and BlueField-2.

- ``mprq_en`` parameter [int]

  A nonzero value enables configuring Multi-Packet Rx queues. An Rx queue is
  configured as Multi-Packet RQ if the total number of Rx queues is
  ``rxqs_min_mprq`` or more. Disabled by default.

  Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe bandwidth
  by posting a single large buffer for multiple packets. Instead of posting one
  buffer per packet, one large buffer is posted in order to receive multiple
  packets on the buffer. An MPRQ buffer consists of multiple fixed-size strides
  and each stride receives one packet. MPRQ can improve throughput for
  small-packet traffic.

  When MPRQ is enabled, max_rx_pkt_len can be larger than the size of the
  user-provided mbuf even if DEV_RX_OFFLOAD_SCATTER isn't enabled. The PMD will
  configure a stride size large enough to accommodate max_rx_pkt_len as long as
  the device allows. Note that this can waste system memory compared to enabling
  Rx scatter and multi-segment packet.

- ``mprq_log_stride_num`` parameter [int]

  Log 2 of the number of strides for Multi-Packet Rx queue. Configuring more
  strides can reduce PCIe traffic further. If the configured value is not in the
  range of device capability, the default value will be set with a warning
  message. The default value is 4 which is 16 strides per buffer, valid only
  if ``mprq_en`` is set.

  The size of Rx queue should be bigger than the number of strides.

- ``mprq_log_stride_size`` parameter [int]

  Log 2 of the size of a stride for Multi-Packet Rx queue. Configuring a smaller
  stride size can save some memory and reduce the probability of depleting all
  available strides due to packets not being released by the application. If the
  configured value is not in the range of device capability, the default value
  will be set with a warning message. The default value is 11 which is 2048 bytes
  per stride, valid only if ``mprq_en`` is set. With ``mprq_log_stride_size`` set
  it is possible for a packet to span across multiple strides. This mode allows
  support of jumbo frames (9K) with MPRQ.


  The memcopy of some packets (or part of a packet if Rx scatter is configured)
  may be required in case there is no space left for headroom at the end of a
  stride, which incurs some performance penalty.

- ``mprq_max_memcpy_len`` parameter [int]

  The maximum length of packet to memcpy in case of Multi-Packet Rx queue. An Rx
  packet is mem-copied to a user-provided mbuf if the size of the Rx packet is less
  than or equal to this parameter. Otherwise, the PMD will attach the Rx packet to
  the mbuf by external buffer attachment - ``rte_pktmbuf_attach_extbuf()``.
  A mempool for external buffers will be allocated and managed by the PMD. If the
  Rx packet is externally attached, the ol_flags field of the mbuf will have
  EXT_ATTACHED_MBUF and this flag must be preserved. ``RTE_MBUF_HAS_EXTBUF()``
  checks the flag. The default value is 128, valid only if ``mprq_en`` is set.

- ``rxqs_min_mprq`` parameter [int]

  Configure Rx queues as Multi-Packet RQ if the total number of Rx queues is
  greater or equal to this value. The default value is 12, valid only if
  ``mprq_en`` is set.

- ``txq_inline`` parameter [int]

  Amount of data to be inlined during TX operations. This parameter is
  deprecated and converted to the new parameter ``txq_inline_max`` providing
  partial compatibility.

- ``txqs_min_inline`` parameter [int]

  Enable inline data send only when the number of TX queues is greater or equal
  to this value.

  This option should be used in combination with ``txq_inline_max`` and
  ``txq_inline_mpw`` below and does not affect ``txq_inline_min`` settings above.

  If this option is not specified the default value 16 is used for BlueField
  and 8 for other platforms.

  Data inlining consumes CPU cycles, so this option is intended to enable
  inline data automatically when there are enough Tx queues, which means there
  are enough CPU cores, PCI bandwidth is becoming more critical and the CPU
  is no longer expected to be the bottleneck.

  Copying data into the WQE improves latency and can improve PPS performance
  when PCI back pressure is detected and may be useful for scenarios involving
  heavy traffic on many queues.

  Because additional software logic is necessary to handle this mode, this
  option should be used with care, as it may lower performance when back
  pressure is not expected.

  If inline data is enabled it may affect the maximal size of a Tx queue in
  descriptors because the inline data increases the descriptor size and the
  queue size limits supported by hardware may be exceeded.

- ``txq_inline_min`` parameter [int]

  Minimal amount of data to be inlined into WQE during Tx operations. NICs
  may require this minimal data amount to operate correctly. The exact value
  may depend on NIC operation mode, requested offloads, etc. It is strongly
  recommended to omit this parameter and use the default values. Anyway,
  applications using this parameter should take into consideration that
  specifying an inconsistent value may prevent the NIC from sending packets.

  If the ``txq_inline_min`` key is present, the specified value (may be aligned
  by the driver in order not to exceed the limits and provide better descriptor
  space utilization) will be used by the driver and it is guaranteed that the
  requested amount of data bytes is inlined into the WQE beside other inline
  settings.


  This key also may update the ``txq_inline_max`` value (default
  or specified explicitly in devargs) to reserve the space for inline data.

  If the ``txq_inline_min`` key is not present, the value may be queried by the
  driver from the NIC via DevX if this feature is available. If there is no DevX
  enabled/supported, the value 18 (supposing L2 header including VLAN) is set
  for ConnectX-4 and ConnectX-4 Lx, and 0 is set by default for ConnectX-5
  and newer NICs. If a packet is shorter than the ``txq_inline_min`` value, the
  entire packet is inlined.

  For ConnectX-4 NICs, the driver does not allow specifying a value below 18
  (minimal L2 header, including VLAN), an error will be raised.

  For ConnectX-4 Lx NICs, it is allowed to specify values below 18, but
  it is not recommended and may prevent the NIC from sending packets over
  some configurations.

  Please note that this minimal data inlining disengages the eMPW feature
  (Enhanced Multi-Packet Write), because the latter does not support partial
  packet inlining. This is not very critical, since minimal data inlining is
  mostly required by ConnectX-4 and ConnectX-4 Lx, which do not support the
  eMPW feature.

- ``txq_inline_max`` parameter [int]

  Specifies the maximal packet length to be completely inlined into the WQE
  Ethernet Segment for ordinary SEND method. If a packet is larger than the
  specified value, the packet data won't be copied by the driver at all, the
  data buffer is addressed with a pointer. If the packet length is less or equal,
  all packet data will be copied into the WQE. This may improve PCI bandwidth
  utilization for short packets significantly but requires extra CPU cycles.

  The data inline feature is controlled by the number of Tx queues. If the number
  of Tx queues is larger than the ``txqs_min_inline`` key parameter, the inline
  feature is engaged; if there are not enough Tx queues (which means not enough
  CPU cores and CPU resources are scarce), data inline is not performed by the
  driver. Assigning ``txqs_min_inline`` with zero always enables the data inline.

  The default ``txq_inline_max`` value is 290. The specified value may be adjusted
  by the driver in order not to exceed the limit (930 bytes) and to provide better
  WQE space filling without gaps, the adjustment is reflected in the debug log.
  Also, the default value (290) may be decreased in run-time if a large transmit
  queue size is requested and hardware does not support enough descriptor
  amount, in this case a warning is emitted. If the ``txq_inline_max`` key is
  specified and the requested inline settings cannot be satisfied, an error
  will be raised.

- ``txq_inline_mpw`` parameter [int]

  Specifies the maximal packet length to be completely inlined into the WQE for
  Enhanced MPW method. If a packet is larger than the specified value, the packet
  data won't be copied, and the data buffer is addressed with a pointer. If the
  packet length is less or equal, all packet data will be copied into the WQE.
  This may improve PCI bandwidth utilization for short packets significantly
  but requires extra CPU cycles.

  The data inline feature is controlled by the number of TX queues. If the number
  of Tx queues is larger than the ``txqs_min_inline`` key parameter, the inline
  feature is engaged; if there are not enough Tx queues (which means not enough
  CPU cores and CPU resources are scarce), data inline is not performed by the
  driver.


  Assigning ``txqs_min_inline`` with zero always enables the data inline.

  The default ``txq_inline_mpw`` value is 268. The specified value may be adjusted
  by the driver in order not to exceed the limit (930 bytes) and to provide better
  WQE space filling without gaps, the adjustment is reflected in the debug log.
  Because multiple packets may be included in the same WQE with the Enhanced
  Multi-Packet Write method and the overall WQE size is limited, it is not
  recommended to specify large values for ``txq_inline_mpw``. Also, the default
  value (268) may be decreased in run-time if a large transmit queue size is
  requested and hardware does not support enough descriptor amount, in this case
  a warning is emitted. If the ``txq_inline_mpw`` key is specified and the
  requested inline settings cannot be satisfied, an error will be raised.

- ``txqs_max_vec`` parameter [int]

  Enable vectorized Tx only when the number of TX queues is less than or
  equal to this value. This parameter is deprecated and ignored, kept
  for compatibility so as not to prevent the driver from probing.

- ``txq_mpw_hdr_dseg_en`` parameter [int]

  A nonzero value enables including two pointers in the first block of TX
  descriptor. The parameter is deprecated and ignored, kept for compatibility.

- ``txq_max_inline_len`` parameter [int]

  Maximum size of packet to be inlined. This limits the size of packet to
  be inlined. If the size of a packet is larger than the configured value, the
  packet isn't inlined even though there's enough space remaining in the
  descriptor. Instead, the packet is included with a pointer. This parameter
  is deprecated and converted directly to ``txq_inline_mpw`` providing full
  compatibility. Valid only if the eMPW feature is engaged.

- ``txq_mpw_en`` parameter [int]

  A nonzero value enables Enhanced Multi-Packet Write (eMPW) for ConnectX-5,
  ConnectX-6, ConnectX-6 Dx, ConnectX-6 Lx, BlueField, BlueField-2.
  eMPW allows the Tx burst function to pack up multiple packets
  in a single descriptor session in order to save PCI bandwidth
  and improve performance at the cost of a slightly higher CPU usage.
  When ``txq_inline_mpw`` is set along with ``txq_mpw_en``,
  the Tx burst function copies entire packet data onto the Tx descriptor
  instead of including a pointer to the packet.

  The Enhanced Multi-Packet Write feature is enabled by default if the NIC
  supports it, and can be disabled by explicitly specifying 0 for the
  ``txq_mpw_en`` option. Also, if minimal data inlining is requested by a
  non-zero ``txq_inline_min`` option or reported by the NIC, the eMPW feature
  is disengaged.

- ``tx_db_nc`` parameter [int]

  The rdma core library can map the doorbell register in two ways, depending on
  the environment variable "MLX5_SHUT_UP_BF":

  - As regular cached memory (usually with write combining attribute), if the
    variable is either missing or set to zero.
  - As non-cached memory, if the variable is present and set to a non-"0" value.

  The type of mapping may slightly affect the Tx performance, the optimal choice
  strongly depends on the host architecture and should be determined empirically.


  If ``tx_db_nc`` is set to zero, the doorbell is forced to be mapped to regular
  memory (with write combining), the PMD will perform the extra write memory barrier
  after writing to the doorbell, it might increase the needed CPU clocks per packet
  to send, but latency might be improved.

  If ``tx_db_nc`` is set to one, the doorbell is forced to be mapped to non-cached
  memory, the PMD will not perform the extra write memory barrier
  after writing to the doorbell, on some architectures it might improve the
  performance.

  If ``tx_db_nc`` is set to two, the doorbell is forced to be mapped to regular
  memory, the PMD will use heuristics to decide whether a write memory barrier
  should be performed. For bursts with a size that is a multiple of the recommended
  one (64 packets) it is assumed the next burst is coming and there is no need to
  issue the extra memory barrier (it is supposed to be issued in the next coming
  burst, at least after descriptor writing). It might increase latency (on some
  hosts, until the next packets are transmitted) and should be used with care.

  If ``tx_db_nc`` is omitted or set to zero, the preset (if any) environment
  variable "MLX5_SHUT_UP_BF" value is used. If there is no "MLX5_SHUT_UP_BF",
  the default ``tx_db_nc`` value is zero for ARM64 hosts and one for others.

- ``tx_pp`` parameter [int]

  If a nonzero value is specified the driver creates all necessary internal
  objects to provide accurate packet send scheduling on mbuf timestamps.
  A positive value specifies the scheduling granularity in nanoseconds,
  packet sending will be accurate up to the specified granularity. The allowed
  range is from 500 to 1 million nanoseconds. A negative value specifies the
  modulus of the granularity and engages a special test mode to check the
  scheduling rate. By default (if ``tx_pp`` is not specified) the send
  scheduling on timestamps feature is disabled.

- ``tx_skew`` parameter [int]

  The parameter adjusts the send packet scheduling on timestamps and represents
  the average delay between the beginning of the transmitting descriptor processing
  by the hardware and the appearance of the actual packet data on the wire. The
  value should be provided in nanoseconds and is valid only if the ``tx_pp``
  parameter is specified. The default value is zero.

- ``tx_vec_en`` parameter [int]

  A nonzero value enables Tx vector on ConnectX-5, ConnectX-6, ConnectX-6 Dx,
  ConnectX-6 Lx, BlueField and BlueField-2 NICs
  if the number of global Tx queues on the port is less than ``txqs_max_vec``.
  The parameter is deprecated and ignored.

- ``rx_vec_en`` parameter [int]

  A nonzero value enables Rx vector if the port is not configured in
  multi-segment mode, otherwise this parameter is ignored.

  Enabled by default.

- ``vf_nl_en`` parameter [int]

  A nonzero value enables Netlink requests from the VF to add/remove MAC
  addresses or/and enable/disable promiscuous/all multicast on the Netdevice.
  Otherwise the relevant configuration must be run with Linux iproute2 tools.
  This is a prerequisite to receive this kind of traffic.

  Enabled by default, valid only on VF devices, ignored otherwise.

- ``l3_vxlan_en`` parameter [int]

  A nonzero value allows L3 VXLAN and VXLAN-GPE flow creation. To enable
  L3 VXLAN or VXLAN-GPE, users have to configure the firmware and enable this
  parameter. This is a prerequisite to receive this kind of traffic.


  Disabled by default.

- ``dv_xmeta_en`` parameter [int]

  A nonzero value enables extensive flow metadata support if the device is
  capable and the driver supports it. This can enable extensive support of
  the ``MARK`` and ``META`` items of ``rte_flow``. The newly introduced
  ``SET_TAG`` and ``SET_META`` actions do not depend on ``dv_xmeta_en``.

  There are some possible configurations, depending on the parameter value:

  - 0, this is the default value, defines the legacy mode, the ``MARK`` and
    ``META`` related actions and items operate only within NIC Tx and
    NIC Rx steering domains, no ``MARK`` and ``META`` information crosses
    the domain boundaries. The ``MARK`` item is 24 bits wide, the ``META``
    item is 32 bits wide and match is supported on egress only.

  - 1, this engages extensive metadata mode, the ``MARK`` and ``META``
    related actions and items operate within all supported steering domains,
    including FDB, ``MARK`` and ``META`` information may cross the domain
    boundaries. The ``MARK`` item is 24 bits wide, the ``META`` item width
    depends on kernel and firmware configurations and might be 0, 16 or
    32 bits. Within NIC Tx domain the ``META`` data width is 32 bits for
    compatibility, the actual width of data transferred to the FDB domain
    depends on kernel configuration and may vary. The actual supported
    width can be retrieved at runtime by a series of rte_flow_validate()
    trials.

  - 2, this engages extensive metadata mode, the ``MARK`` and ``META``
    related actions and items operate within all supported steering domains,
    including FDB, ``MARK`` and ``META`` information may cross the domain
    boundaries. The ``META`` item is 32 bits wide, the ``MARK`` item width
    depends on kernel and firmware configurations and might be 0, 16 or
    24 bits. The actual supported width can be retrieved at runtime by a
    series of rte_flow_validate() trials.

  - 3, this engages tunnel offload mode. In E-Switch configuration, that
    mode implicitly activates ``dv_xmeta_en=1``.

  +------+-----------+-----------+-------------+-------------+
  | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
  +======+===========+===========+=============+=============+
  | 0    | 24 bits   | 32 bits   | 32 bits     | no          |
  +------+-----------+-----------+-------------+-------------+
  | 1    | 24 bits   | vary 0-32 | 32 bits     | yes         |
  +------+-----------+-----------+-------------+-------------+
  | 2    | vary 0-32 | 32 bits   | 32 bits     | yes         |
  +------+-----------+-----------+-------------+-------------+

  If there is no E-Switch configuration the ``dv_xmeta_en`` parameter is
  ignored and the device is configured to operate in legacy mode (0).

  Disabled by default (set to 0).

  The Direct Verbs/Rules (engaged with ``dv_flow_en`` = 1) supports all
  of the extensive metadata features. The legacy Verbs supports FLAG and
  MARK metadata actions over NIC Rx steering domain only.

- ``dv_flow_en`` parameter [int]

  A nonzero value enables the DV flow steering assuming it is supported
  by the driver (RDMA Core library version is rdma-core-24.0 or higher).

  Enabled by default if supported.

- ``dv_esw_en`` parameter [int]

  A nonzero value enables E-Switch using Direct Rules.

  Enabled by default if supported.

- ``lacp_by_user`` parameter [int]

  A nonzero value enables the control of LACP traffic by the user application.


  When a bond exists in the driver, by default it should be managed by the
  kernel and therefore LACP traffic should be steered to the kernel.
  If this devarg is set to 1, it allows the user to manage the bond
  and LACP traffic is not steered to the kernel.

  Disabled by default (set to 0).

- ``mr_ext_memseg_en`` parameter [int]

  A nonzero value enables extending memseg when registering DMA memory. If
  enabled, the number of entries in the MR (Memory Region) lookup table on the
  datapath is minimized and it benefits performance. On the other hand, it
  worsens memory utilization because registered memory is pinned by the kernel
  driver. Even if a page in the extended chunk is freed, that doesn't become
  reusable until the entire memory is freed.

  Enabled by default.

- ``representor`` parameter [list]

  This parameter can be used to instantiate DPDK Ethernet devices from
  existing port (or VF) representors configured on the device.

  It is a standard parameter whose format is described in
  :ref:`ethernet_device_standard_device_arguments`.

  For instance, to probe port representors 0 through 2::

    representor=[0-2]

- ``max_dump_files_num`` parameter [int]

  The maximum number of files per PMD entity that may be created for debug information.
  The files will be created in the /var/log directory or in the current directory.

  Set to 128 by default.

- ``lro_timeout_usec`` parameter [int]

  The maximum allowed duration of an LRO session, in micro-seconds.
  The PMD will set the nearest value supported by HW, which is not bigger than
  the input ``lro_timeout_usec`` value.
  If this parameter is not specified, by default the PMD will set
  the smallest value supported by HW.

- ``hp_buf_log_sz`` parameter [int]

  The total data buffer size of a hairpin queue (logarithmic form), in bytes.
  The PMD will set the data buffer size to 2 ** ``hp_buf_log_sz``, both for RX & TX.
  The valid range of the value is specified by the firmware and initialization
  will fail if the value is out of range.
  The range of the value is from 11 to 19 right now, and the supported frame
  size of a single packet for hairpin is from 512B to 128KB. It might change if
  a different firmware release is used. Using a small value could reduce
  memory consumption but will not work with a large frame. If the value is
  too large, the memory consumption will be high and some potential performance
  degradation will be introduced.
  By default, the PMD will set this value to 16, which means that 9KB jumbo
  frames will be supported.

- ``reclaim_mem_mode`` parameter [int]

  Caching some resources on flow destroy helps make flow recreation more
  efficient, while some systems may require that all the resources can be
  reclaimed after a flow is destroyed.
  The parameter ``reclaim_mem_mode`` provides the option for the user to
  configure whether the resource cache is needed or not.

  There are three options to choose from:

  - 0. It means the flow resources will be cached as usual. The resources will
    be cached, helpful with flow insertion rate.

  - 1. It will only enable the DPDK PMD level resources reclaim.

  - 2. Both DPDK PMD level and rdma-core low level will be configured as
    reclaimed mode.

  By default, the PMD will set this value to 0.


- ``sys_mem_en`` parameter [int]

  A non-zero value makes the PMD memory management allocate memory
  from the system by default, without the explicit rte memory flag.

  By default, the PMD will set this value to 0.

- ``decap_en`` parameter [int]

  Some devices do not support FCS (frame checksum) scattering for
  tunnel-decapsulated packets.
  If set to 0, this option forces the FCS feature and rejects tunnel
  decapsulation in the flow engine for such devices.

  By default, the PMD will set this value to 1.

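
All the driver options above are passed to the PMD as device arguments on the
EAL command line. As an illustration only (the PCI address and the chosen
values are placeholders), enabling Multi-Packet RQ together with Flow Tag CQE
compression when running testpmd could look like::

   dpdk-testpmd -a 0000:03:00.0,mprq_en=1,rxqs_min_mprq=2,rxq_cqe_comp_en=2 -- -i
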

.. _mlx5_firmware_config:

Firmware configuration
~~~~~~~~~~~~~~~~~~~~~~

Firmware features can be configured as key/value pairs.

The command to set a value is::

  mlxconfig -d <device> set <key>=<value>

The command to query a value is::

  mlxconfig -d <device> query | grep <key>

The device name for the command ``mlxconfig`` can be either the PCI address,
or the mst device name found with::

  mst status

Some firmware configurations are listed below.

- link type::

    LINK_TYPE_P1
    LINK_TYPE_P2
    value: 1=Infiniband 2=Ethernet 3=VPI(auto-sense)

- enable SR-IOV::

    SRIOV_EN=1

- maximum number of SR-IOV virtual functions::

    NUM_OF_VFS=<max>

- enable DevX (required by Direct Rules and other features)::

    UCTX_EN=1

- aggressive CQE zipping::

    CQE_COMPRESSION=1

- L3 VXLAN and VXLAN-GPE destination UDP port::

    IP_OVER_VXLAN_EN=1
    IP_OVER_VXLAN_PORT=<udp dport>

- enable VXLAN-GPE tunnel flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=2

- enable IP-in-IP tunnel flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0

- enable MPLS flow matching::

    FLEX_PARSER_PROFILE_ENABLE=1

- enable ICMP (code/type/identifier/sequence number) / ICMP6 (code/type) fields matching::

    FLEX_PARSER_PROFILE_ENABLE=2

- enable Geneve flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=1

- enable GTP flow matching::

    FLEX_PARSER_PROFILE_ENABLE=3

- enable eCPRI flow matching::

    FLEX_PARSER_PROFILE_ENABLE=4
    PROG_PARSE_GRAPH=1

Prerequisites
-------------

This driver relies on external libraries and kernel drivers for resources
allocations and initialization. The following dependencies are not part of
DPDK and must be installed separately:

- **libibverbs**

  User space Verbs framework used by librte_net_mlx5. This library provides
  a generic interface between the kernel and low-level user space drivers
  such as libmlx5.

  It allows slow and privileged operations (context initialization, hardware
  resources allocations) to be managed by the kernel and fast operations to
  never leave user space.

- **libmlx5**

  Low-level user space driver library for Mellanox
  ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices, it is automatically loaded
  by libibverbs.

  This library basically implements send/receive calls to the hardware
  queues.

- **Kernel modules**

  They provide the kernel-side Verbs API and low level device drivers that
  manage actual hardware initialization and resources sharing with user
  space processes.

  Unlike most other PMDs, these modules must remain loaded and bound to
  their devices:

  - mlx5_core: hardware driver managing Mellanox
    ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices and related Ethernet kernel
    network devices.
  - mlx5_ib: InfiniBand device driver.
  - ib_uverbs: user space driver for Verbs (entry point for libibverbs).

- **Firmware update**

  Mellanox OFED/EN releases include firmware updates for
  ConnectX-4/ConnectX-5/ConnectX-6/BlueField adapters.

  Because each release provides new features, these updates must be applied to
  match the kernel modules and libraries they come with.

.. note::

   Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
   licensed.

Installation
~~~~~~~~~~~~

Either RDMA Core library with a recent enough Linux kernel release
(recommended) or Mellanox OFED/EN, which provides compatibility with older
releases.

RDMA Core with Linux Kernel
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
- Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
  (see `RDMA Core installation documentation`_)
- When building for i686 use:

  - rdma-core version 18.0 or above built with 32bit support.
  - Kernel version 4.14.41 or above.

- Starting with rdma-core v21, static libraries can be built::

    cd build
    CFLAGS=-fPIC cmake -DIN_PLACE=1 -DENABLE_STATIC=1 -GNinja ..
    ninja

.. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
.. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md


Mellanox OFED/EN
^^^^^^^^^^^^^^^^

- Mellanox OFED version: **4.5** and above /
  Mellanox EN version: **4.5** and above
- firmware version:

  - ConnectX-4: **12.21.1000** and above.
  - ConnectX-4 Lx: **14.21.1000** and above.
  - ConnectX-5: **16.21.1000** and above.
  - ConnectX-5 Ex: **16.21.1000** and above.
  - ConnectX-6: **20.27.0090** and above.
  - ConnectX-6 Dx: **22.27.0090** and above.
  - BlueField: **18.25.1010** and above.

While these libraries and kernel modules are available on OpenFabrics
Alliance's `website <https://www.openfabrics.org/>`__ and provided by package
managers on most distributions, this PMD requires Ethernet extensions that
may not be supported at the moment (this is a work in progress).

`Mellanox OFED
<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__ and
`Mellanox EN
<http://www.mellanox.com/page/products_dyn?product_family=27&mtag=linux>`__
include the necessary support and should be used in the meantime. For DPDK,
only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are
required from that distribution.

.. note::

   Several versions of Mellanox OFED/EN are available. Installing the version
   this DPDK release was developed and tested against is strongly
   recommended. Please check the `prerequisites`_.


Supported NICs
--------------

The following Mellanox device families are supported by the same mlx5 driver:

  - ConnectX-4
  - ConnectX-4 Lx
  - ConnectX-5
  - ConnectX-5 Ex
  - ConnectX-6
  - ConnectX-6 Dx
  - ConnectX-6 Lx
  - BlueField
  - BlueField-2

Below are detailed device names:

* Mellanox\ |reg| ConnectX\ |reg|-4 10G MCX4111A-XCAT (1x10G)
* Mellanox\ |reg| ConnectX\ |reg|-4 10G MCX412A-XCAT (2x10G)
* Mellanox\ |reg| ConnectX\ |reg|-4 25G MCX4111A-ACAT (1x25G)
* Mellanox\ |reg| ConnectX\ |reg|-4 25G MCX412A-ACAT (2x25G)
* Mellanox\ |reg| ConnectX\ |reg|-4 40G MCX413A-BCAT (1x40G)
* Mellanox\ |reg| ConnectX\ |reg|-4 40G MCX4131A-BCAT (1x40G)
* Mellanox\ |reg| ConnectX\ |reg|-4 40G MCX415A-BCAT (1x40G)
* Mellanox\ |reg| ConnectX\ |reg|-4 50G MCX413A-GCAT (1x50G)
* Mellanox\ |reg| ConnectX\ |reg|-4 50G MCX4131A-GCAT (1x50G)
* Mellanox\ |reg| ConnectX\ |reg|-4 50G MCX414A-BCAT (2x50G)
* Mellanox\ |reg| ConnectX\ |reg|-4 50G MCX415A-GCAT (1x50G)
* Mellanox\ |reg| ConnectX\ |reg|-4 50G MCX416A-BCAT (2x50G)
* Mellanox\ |reg| ConnectX\ |reg|-4 50G MCX416A-GCAT (2x50G)
* Mellanox\ |reg| ConnectX\ |reg|-4 50G MCX415A-CCAT (1x100G)
* Mellanox\ |reg| ConnectX\ |reg|-4 100G MCX416A-CCAT (2x100G)
* Mellanox\ |reg| ConnectX\ |reg|-4 Lx 10G MCX4111A-XCAT (1x10G)
* Mellanox\ |reg| ConnectX\ |reg|-4 Lx 10G MCX4121A-XCAT (2x10G)
* Mellanox\ |reg| ConnectX\ |reg|-4 Lx 25G MCX4111A-ACAT (1x25G)
* Mellanox\ |reg| ConnectX\ |reg|-4 Lx 25G MCX4121A-ACAT (2x25G)
* Mellanox\ |reg| ConnectX\ |reg|-4 Lx 40G MCX4131A-BCAT (1x40G)
* Mellanox\ |reg| ConnectX\ |reg|-5 100G MCX556A-ECAT (2x100G)
* Mellanox\ |reg| ConnectX\ |reg|-5 Ex EN 100G MCX516A-CDAT (2x100G)
* Mellanox\ |reg| ConnectX\ |reg|-6 200G MCX654106A-HCAT (2x200G)
* Mellanox\ |reg| ConnectX\ |reg|-6 Dx EN 100G MCX623106AN-CDAT (2x100G)
* Mellanox\ |reg| ConnectX\ |reg|-6 Dx EN 200G MCX623105AN-VDAT (1x200G)
* Mellanox\ |reg| ConnectX\ |reg|-6 Lx EN 25G MCX631102AN-ADAT (2x25G)

Quick Start Guide on OFED/EN
----------------------------

1. Download latest Mellanox OFED/EN. For more info check the `prerequisites`_.


2. Install the required libraries and kernel modules either by installing
   only the required set, or by installing the entire Mellanox OFED/EN::

      ./mlnxofedinstall --upstream-libs --dpdk

3. Verify the firmware is the correct one::

      ibv_devinfo

4. Verify all port links are set to Ethernet::

      mlxconfig -d <mst device> query | grep LINK_TYPE
      LINK_TYPE_P1                        ETH(2)
      LINK_TYPE_P2                        ETH(2)

   Link types may have to be configured to Ethernet::

      mlxconfig -d <mst device> set LINK_TYPE_P1/2=1/2/3

   * LINK_TYPE_P1=<1|2|3> , 1=Infiniband 2=Ethernet 3=VPI(auto-sense)

   For hypervisors, verify SR-IOV is enabled on the NIC::

      mlxconfig -d <mst device> query | grep SRIOV_EN
      SRIOV_EN                            True(1)

   If needed, configure SR-IOV::

      mlxconfig -d <mst device> set SRIOV_EN=1 NUM_OF_VFS=16
      mlxfwreset -d <mst device> reset


5. Restart the driver::

      /etc/init.d/openibd restart

   or::

      service openibd restart

   If the link type was changed, firmware must be reset as well::

      mlxfwreset -d <mst device> reset

   For hypervisors, after reset write the sysfs number of virtual functions
   needed for the PF.

   To dynamically instantiate a given number of virtual functions (VFs)::

      echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

6. Install DPDK and you are ready to go.
   See :doc:`compilation instructions <../linux_gsg/build_dpdk>`.

Enable switchdev mode
---------------------

Switchdev mode is a mode in E-Switch that binds between a representor and a VF.
A representor is a port in DPDK that is connected to a VF in such a way that,
assuming there are no offload flows, each packet that is sent from the VF
will be received by the corresponding representor, while each packet that is
sent to a representor will be received by the VF.
This is very useful in case of SR-IOV mode, where the first packet that is sent
by the VF will be received by the DPDK application, which will decide if this
flow should be offloaded to the E-Switch. After the flow is offloaded, packets
from the VF that match the flow will not be received any more by
the DPDK application.

1. Enable SR-IOV mode::

      mlxconfig -d <mst device> set SRIOV_EN=true

2. Configure the max number of VFs::

      mlxconfig -d <mst device> set NUM_OF_VFS=<num of vfs>

3. Reset the FW::

      mlxfwreset -d <mst device> reset

4. Configure the actual number of VFs::

      echo <num of vfs> > /sys/class/net/<net device>/device/sriov_numvfs

5. Unbind the device (it can be rebound after the switchdev mode)::

      echo -n "<device pci address>" > /sys/bus/pci/drivers/mlx5_core/unbind

6. Enable switchdev mode::

      echo switchdev > /sys/class/net/<net device>/compat/devlink/mode

Performance tuning
------------------

1. Configure aggressive CQE Zipping for maximum performance::

      mlxconfig -d <mst device> s CQE_COMPRESSION=1

   To set it back to the default CQE Zipping mode use::

      mlxconfig -d <mst device> s CQE_COMPRESSION=0

2. In case of virtualization:

   - Make sure that the hypervisor kernel is 3.16 or newer.
   - Configure boot with ``iommu=pt``.
   - Use 1G huge pages.
   - Make sure to allocate a VM on huge pages.
   - Make sure to set CPU pinning.

3. Use the CPU near the local NUMA node to which the PCIe adapter is connected,
   for better performance. For VMs, verify that the right CPU
   and NUMA node are pinned according to the above. Run::

      lstopo-no-graphics

   to identify the NUMA node to which the PCIe adapter is connected.

4. If more than one adapter is used, and root complex capabilities allow
   putting both adapters on the same NUMA node without PCI bandwidth degradation,
   it is recommended to locate both adapters on the same NUMA node.
   This is in order to forward packets from one to the other without
   a NUMA performance penalty.

5. Disable pause frames::

      ethtool -A <netdev> rx off tx off

6. Verify IO non-posted prefetch is disabled by default. This can be checked
   via the BIOS configuration. Please contact your server provider for more
   information about the settings.

Performance tuning
------------------

1. Configure aggressive CQE Zipping for maximum performance::

      mlxconfig -d <mst device> s CQE_COMPRESSION=1

   To set it back to the default CQE Zipping mode use::

      mlxconfig -d <mst device> s CQE_COMPRESSION=0

2. In case of virtualization:

   - Make sure that the hypervisor kernel is 3.16 or newer.
   - Configure boot with ``iommu=pt``.
   - Use 1G huge pages.
   - Make sure to allocate a VM on huge pages.
   - Make sure to set CPU pinning.

3. Use CPUs from the NUMA node local to the PCIe adapter for better
   performance. For VMs, verify that the right CPUs and NUMA node are pinned
   according to the above. Run::

      lstopo-no-graphics

   to identify the NUMA node to which the PCIe adapter is connected.

4. If more than one adapter is used, and the root complex capabilities allow
   putting both adapters on the same NUMA node without PCI bandwidth
   degradation, it is recommended to locate both adapters on the same NUMA
   node in order to forward packets from one to the other without a NUMA
   performance penalty.

5. Disable pause frames::

      ethtool -A <netdev> rx off tx off

6. Verify that I/O non-posted prefetch is disabled by default. This can be
   checked via the BIOS configuration. Please contact your server vendor for
   more information about this setting.

.. note::

   On some machines, depending on the machine integrator, it is beneficial
   to set the PCI max read request parameter to 1K. This can be
   done in the following way:

   To query the read request size use::

      setpci -s <NIC PCI address> 68.w

   If the output is different from 3XXX, set it by::

      setpci -s <NIC PCI address> 68.w=3XXX

   The XXX can be different on different systems. Make sure to configure
   according to the setpci output.

7. To minimize the overhead of searching Memory Regions:

   - ``--socket-mem`` is recommended to reserve a predictable amount of memory
     per socket.
   - Configure a per-lcore cache when creating mempools for packet buffers.
   - Refrain from dynamically allocating/freeing memory at run time.

Rx burst functions
------------------

There are multiple Rx burst functions with different advantages and
limitations. A devargs example that makes the MPRQ Rx path eligible is shown
after the table.

.. table:: Rx burst functions

   +-------------------+------------------------+---------+-----------------+------+-------+
   || Function Name    || Enabler               || Scatter|| Error Recovery || CQE || Large|
   |                   |                        |         |                 || comp|| MTU  |
   +===================+========================+=========+=================+======+=======+
   | rx_burst          | rx_vec_en=0            | Yes     | Yes             | Yes  | Yes   |
   +-------------------+------------------------+---------+-----------------+------+-------+
   | rx_burst_vec      | rx_vec_en=1 (default)  | No      | if CQE comp off | Yes  | No    |
   +-------------------+------------------------+---------+-----------------+------+-------+
   | rx_burst_mprq     || mprq_en=1             | No      | Yes             | Yes  | Yes   |
   |                   || RxQs >= rxqs_min_mprq |         |                 |      |       |
   +-------------------+------------------------+---------+-----------------+------+-------+
   | rx_burst_mprq_vec || rx_vec_en=1 (default) | No      | if CQE comp off | Yes  | Yes   |
   |                   || mprq_en=1             |         |                 |      |       |
   |                   || RxQs >= rxqs_min_mprq |         |                 |      |       |
   +-------------------+------------------------+---------+-----------------+------+-------+
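For example, to make the vectorized MPRQ Rx function eligible, the device
arguments listed in the table can be combined on the testpmd command line.
This is only an illustrative sketch: the PCI address is a placeholder and
``rxqs_min_mprq=4`` assumes at least four Rx queues are configured::

   testpmd -l 1-5 -n 4 -a <PCI address>,rx_vec_en=1,mprq_en=1,rxqs_min_mprq=4 -- --rxq=4 --txq=4 -i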
.. _mlx5_offloads_support:

Supported hardware offloads
---------------------------

.. table:: Minimal SW/HW versions for queue offloads

   ============== ===== ===== ========= ===== ========== =============
   Offload        DPDK  Linux rdma-core OFED  firmware   hardware
   ============== ===== ===== ========= ===== ========== =============
   common base    17.11 4.14  16        4.2-1 12.21.1000 ConnectX-4
   checksums      17.11 4.14  16        4.2-1 12.21.1000 ConnectX-4
   Rx timestamp   17.11 4.14  16        4.2-1 12.21.1000 ConnectX-4
   TSO            17.11 4.14  16        4.2-1 12.21.1000 ConnectX-4
   LRO            19.08 N/A   N/A       4.6-4 16.25.6406 ConnectX-5
   Buffer Split   20.11 N/A   N/A       5.1-2 22.28.2006 ConnectX-6 Dx
   ============== ===== ===== ========= ===== ========== =============

.. table:: Minimal SW/HW versions for rte_flow offloads

   +-----------------------+-----------------+-----------------+
   | Offload               | with E-Switch   | with NIC        |
   +=======================+=================+=================+
   | Count                 | | DPDK 19.05    | | DPDK 19.02    |
   |                       | | OFED 4.6      | | OFED 4.6      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Drop                  | | DPDK 19.05    | | DPDK 18.11    |
   |                       | | OFED 4.6      | | OFED 4.5      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-4    |
   +-----------------------+-----------------+-----------------+
   | Queue / RSS           | |               | | DPDK 18.11    |
   |                       | | N/A           | | OFED 4.5      |
   |                       | |               | | rdma-core 23  |
   |                       | |               | | ConnectX-4    |
   +-----------------------+-----------------+-----------------+
   | RSS shared action     | |               | | DPDK 20.11    |
   |                       | | N/A           | | OFED 5.2      |
   |                       | |               | | rdma-core 33  |
   |                       | |               | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | | VLAN                | | DPDK 19.11    | | DPDK 19.11    |
   | | (of_pop_vlan /      | | OFED 4.7-1    | | OFED 4.7-1    |
   | | of_push_vlan /      | | ConnectX-5    | | ConnectX-5    |
   | | of_set_vlan_pcp /   | |               | |               |
   | | of_set_vlan_vid)    | |               | |               |
   +-----------------------+-----------------+-----------------+
   | Encapsulation         | | DPDK 19.05    | | DPDK 19.02    |
   | (VXLAN / NVGRE / RAW) | | OFED 4.7-1    | | OFED 4.6      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Encapsulation         | | DPDK 19.11    | | DPDK 19.11    |
   | GENEVE                | | OFED 4.7-3    | | OFED 4.7-3    |
   |                       | | rdma-core 27  | | rdma-core 27  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Tunnel Offload        | | DPDK 20.11    | | DPDK 20.11    |
   |                       | | OFED 5.1-2    | | OFED 5.1-2    |
   |                       | | rdma-core 32  | | N/A           |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | | Header rewrite      | | DPDK 19.05    | | DPDK 19.02    |
   | | (set_ipv4_src /     | | OFED 4.7-1    | | OFED 4.7-1    |
   | | set_ipv4_dst /      | | rdma-core 24  | | rdma-core 24  |
   | | set_ipv6_src /      | | ConnectX-5    | | ConnectX-5    |
   | | set_ipv6_dst /      | |               | |               |
   | | set_tp_src /        | |               | |               |
   | | set_tp_dst /        | |               | |               |
   | | dec_ttl /           | |               | |               |
   | | set_ttl /           | |               | |               |
   | | set_mac_src /       | |               | |               |
   | | set_mac_dst)        | |               | |               |
   +-----------------------+-----------------+-----------------+
   | | Header rewrite      | | DPDK 20.02    | | DPDK 20.02    |
   | | (set_dscp)          | | OFED 5.0      | | OFED 5.0      |
   | |                     | | rdma-core 24  | | rdma-core 24  |
   | |                     | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Jump                  | | DPDK 19.05    | | DPDK 19.02    |
   |                       | | OFED 4.7-1    | | OFED 4.7-1    |
   |                       | | rdma-core 24  | | N/A           |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Mark / Flag           | | DPDK 19.05    | | DPDK 18.11    |
   |                       | | OFED 4.6      | | OFED 4.5      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-4    |
   +-----------------------+-----------------+-----------------+
   | Meta data             | | DPDK 19.11    | | DPDK 19.11    |
   |                       | | OFED 4.7-3    | | OFED 4.7-3    |
   |                       | | rdma-core 26  | | rdma-core 26  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Port ID               | | DPDK 19.05    | | N/A           |
   |                       | | OFED 4.7-1    | | N/A           |
   |                       | | rdma-core 24  | | N/A           |
   |                       | | ConnectX-5    | | N/A           |
   +-----------------------+-----------------+-----------------+
   | Hairpin               | |               | | DPDK 19.11    |
   |                       | | N/A           | | OFED 4.7-3    |
   |                       | |               | | rdma-core 26  |
   |                       | |               | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | 2-port Hairpin        | |               | | DPDK 20.11    |
   |                       | | N/A           | | OFED 5.1-2    |
   |                       | |               | | N/A           |
   |                       | |               | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Metering              | | DPDK 19.11    | | DPDK 19.11    |
   |                       | | OFED 4.7-3    | | OFED 4.7-3    |
   |                       | | rdma-core 26  | | rdma-core 26  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Sampling              | | DPDK 20.11    | | DPDK 20.11    |
   |                       | | OFED 5.1-2    | | OFED 5.1-2    |
   |                       | | rdma-core 32  | | N/A           |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Age shared action     | | DPDK 20.11    | | DPDK 20.11    |
   |                       | | OFED 5.2      | | OFED 5.2      |
   |                       | | rdma-core 32  | | rdma-core 32  |
   |                       | | ConnectX-6 Dx | | ConnectX-6 Dx |
   +-----------------------+-----------------+-----------------+
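As an illustration of the rte_flow offloads listed above, a rule combining the
Mark and Queue actions can be created from the testpmd CLI. This is a minimal
sketch; the port number, mark value and queue index are arbitrary::

   testpmd> flow create 0 ingress pattern eth / ipv4 / udp / end actions mark id 42 / queue index 1 / end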
Notes for metadata
------------------

MARK and META items are interrelated with the datapath - they may be moved
between the application and the PMD in mbuf fields. Hence, the zero value has
a special meaning for these items: it means "no metadata is provided", while
non-zero values are treated by the application and the PMD as valid ones.

Moreover, in the flow engine domain the value zero is acceptable to match and
to set, so zero values should be allowed as rte_flow parameters for the META
and MARK items and actions. At the same time, a zero mask has no meaning and
should be rejected at the validation stage.

Notes for rte_flow
------------------

Flows are not cached in the driver.
When a device port is stopped, all the flows created on this port by the
application are flushed automatically in the background.
After the port is stopped, all flows on this port become invalid and are
no longer represented in the system.
All references to these flows held by the application should be discarded
directly, but neither destroyed nor flushed.

The application should re-create the flows as required after the port restart,
as illustrated below.
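The following testpmd sequence sketches this behavior; the flow rule and port
number are only placeholders. The rule created before the restart is flushed
automatically when the port is stopped and must be created again afterwards::

   testpmd> flow create 0 ingress pattern eth / ipv4 / end actions queue index 0 / end
   testpmd> port stop 0
   testpmd> port start 0
   testpmd> flow create 0 ingress pattern eth / ipv4 / end actions queue index 0 / end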
Notes for testpmd
-----------------

Compared to librte_net_mlx4, which implements a single RSS configuration per
port, librte_net_mlx5 supports per-protocol RSS configuration.

Since ``testpmd`` defaults to IP RSS mode and there is currently no
command-line parameter to enable additional protocols (UDP and TCP as well
as IP), the following commands must be entered from its CLI to get the same
behavior as librte_net_mlx4::

   > port stop all
   > port config all rss all
   > port start all

Usage example
-------------

This section demonstrates how to launch **testpmd** with Mellanox
ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices managed by librte_net_mlx5.

#. Load the kernel modules::

      modprobe -a ib_uverbs mlx5_core mlx5_ib

   Alternatively, if MLNX_OFED/MLNX_EN is fully installed, the following script
   can be run::

      /etc/init.d/openibd restart

   .. note::

      User space I/O kernel modules (uio and igb_uio) are not used and do
      not have to be loaded.

#. Make sure Ethernet interfaces are in working order and linked to kernel
   verbs. Related sysfs entries should be present::

      ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5

   Example output::

      eth30
      eth31
      eth32
      eth33

#. Optionally, retrieve their PCI bus addresses to be used with the allow list::

      {
          for intf in eth2 eth3 eth4 eth5;
          do
              (cd "/sys/class/net/${intf}/device/" && pwd -P);
          done;
      } |
      sed -n 's,.*/\(.*\),-a \1,p'

   Example output::

      -a 0000:05:00.1
      -a 0000:06:00.0
      -a 0000:06:00.1
      -a 0000:05:00.0

#. Request huge pages::

      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#. Start testpmd with basic parameters::

      testpmd -l 8-15 -n 4 -a 05:00.0 -a 05:00.1 -a 06:00.0 -a 06:00.1 -- --rxq=2 --txq=2 -i

   Example output::

      [...]
      EAL: PCI device 0000:05:00.0 on NUMA socket 0
      EAL: probe driver: 15b3:1013 librte_net_mlx5
      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_0" (VF: false)
      PMD: librte_net_mlx5: 1 port(s) detected
      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe
      EAL: PCI device 0000:05:00.1 on NUMA socket 0
      EAL: probe driver: 15b3:1013 librte_net_mlx5
      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_1" (VF: false)
      PMD: librte_net_mlx5: 1 port(s) detected
      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff
      EAL: PCI device 0000:06:00.0 on NUMA socket 0
      EAL: probe driver: 15b3:1013 librte_net_mlx5
      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_2" (VF: false)
      PMD: librte_net_mlx5: 1 port(s) detected
      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa
      EAL: PCI device 0000:06:00.1 on NUMA socket 0
      EAL: probe driver: 15b3:1013 librte_net_mlx5
      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_3" (VF: false)
      PMD: librte_net_mlx5: 1 port(s) detected
      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb
      Interactive-mode selected
      Configuring Port 0 (socket 0)
      PMD: librte_net_mlx5: 0x8cba80: TX queues number update: 0 -> 2
      PMD: librte_net_mlx5: 0x8cba80: RX queues number update: 0 -> 2
      Port 0: E4:1D:2D:E7:0C:FE
      Configuring Port 1 (socket 0)
      PMD: librte_net_mlx5: 0x8ccac8: TX queues number update: 0 -> 2
      PMD: librte_net_mlx5: 0x8ccac8: RX queues number update: 0 -> 2
      Port 1: E4:1D:2D:E7:0C:FF
      Configuring Port 2 (socket 0)
      PMD: librte_net_mlx5: 0x8cdb10: TX queues number update: 0 -> 2
      PMD: librte_net_mlx5: 0x8cdb10: RX queues number update: 0 -> 2
      Port 2: E4:1D:2D:E7:0C:FA
      Configuring Port 3 (socket 0)
      PMD: librte_net_mlx5: 0x8ceb58: TX queues number update: 0 -> 2
      PMD: librte_net_mlx5: 0x8ceb58: RX queues number update: 0 -> 2
      Port 3: E4:1D:2D:E7:0C:FB
      Checking link statuses...
      Port 0 Link Up - speed 40000 Mbps - full-duplex
      Port 1 Link Up - speed 40000 Mbps - full-duplex
      Port 2 Link Up - speed 10000 Mbps - full-duplex
      Port 3 Link Up - speed 10000 Mbps - full-duplex
      Done
      testpmd>
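From the interactive prompt, forwarding can then be started and basic port
statistics checked as a quick smoke test::

   testpmd> start
   testpmd> show port stats all
   testpmd> stop
   testpmd> quit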
How to dump flows
-----------------

This section demonstrates how to dump flows. Currently, it is possible to dump
all flows with the assistance of external tools.

#. There are two ways to obtain the raw flow file:

   - Using the testpmd CLI:

     .. code-block:: console

        testpmd> flow dump <port> <output_file>

   - Calling the ``rte_flow_dev_dump()`` API:

     .. code-block:: c

        rte_flow_dev_dump(port, file, NULL);

#. Dump human-readable flows from the raw file:

   Get the flow parsing tool from: https://github.com/Mellanox/mlx_steering_dump

   .. code-block:: console

      mlx_steering_dump.py -f <output_file>
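Putting both steps together, a minimal end-to-end session might look as
follows; the flow rule and the file path are illustrative only::

   testpmd> flow create 0 ingress pattern eth / ipv4 / end actions queue index 0 / end
   testpmd> flow dump 0 /tmp/mlx5_flows.bin

The resulting raw file is then converted to a human-readable form on the host::

   mlx_steering_dump.py -f /tmp/mlx5_flows.bin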