..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2015 Intel Corporation.

Performance Thread Sample Application
=====================================

The performance thread sample application is a derivative of the standard L3
forwarding application that demonstrates different threading models.

Overview
--------

For a general description of the L3 forwarding application's capabilities,
please refer to the documentation of the standard application in
:doc:`l3_forward`.

The performance thread sample application differs from the standard L3
forwarding example in that it divides the TX and RX processing between
different threads, and makes it possible to assign individual threads to
different cores.

Three threading models are considered:

#. When there is one EAL thread per physical core.
#. When there are multiple EAL threads per physical core.
#. When there are multiple lightweight threads per EAL thread.

Since DPDK release 2.0 it is possible to launch applications using the
``--lcores`` EAL parameter, specifying cpu-sets for a physical core. With the
performance thread sample application it is now also possible to assign
individual RX and TX functions to different cores.

As an alternative to dividing the L3 forwarding work between different EAL
threads, the performance thread sample introduces the possibility to run the
application threads as lightweight threads (L-threads) within one or
more EAL threads.

In order to facilitate this threading model the example includes a primitive
cooperative scheduler (L-thread) subsystem. More details of the L-thread
subsystem can be found in :ref:`lthread_subsystem`.

**Note:** Whilst theoretically possible, it is not anticipated that multiple
L-thread schedulers would be run on the same physical core. This mode of
operation should not be expected to yield useful performance and is considered
invalid.

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``performance-thread/l3fwd-thread`` sub-directory.

Running the Application
-----------------------

The application has a number of command line options::

    ./<build_dir>/examples/dpdk-l3fwd-thread [EAL options] --
        -p PORTMASK [-P]
        --rx (port,queue,lcore,thread)[,(port,queue,lcore,thread)]
        --tx (lcore,thread)[,(lcore,thread)]
        [--enable-jumbo [--max-pkt-len PKTLEN]] [--no-numa]
        [--hash-entry-num] [--ipv6] [--no-lthreads] [--stat-lcore lcore]
        [--parse-ptype]

Where:

* ``-p PORTMASK``: Hexadecimal bitmask of ports to configure.

* ``-P``: optional, sets all ports to promiscuous mode so that packets are
  accepted regardless of the packet's Ethernet MAC destination address.
  Without this option, only packets with the Ethernet MAC destination address
  set to the Ethernet address of the port are accepted.

* ``--rx (port,queue,lcore,thread)[,(port,queue,lcore,thread)]``: the list of
  NIC RX ports and queues handled by the RX lcores and threads. The parameters
  are explained below.

* ``--tx (lcore,thread)[,(lcore,thread)]``: the list of TX threads identifying
  the lcore the thread runs on, and the id of the RX thread with which it is
  associated. The parameters are explained below.

* ``--enable-jumbo``: optional, enables jumbo frames.

* ``--max-pkt-len``: optional, maximum packet length in decimal (64-9600).

* ``--no-numa``: optional, disables NUMA awareness.

* ``--hash-entry-num``: optional, specifies the number of hash entries, in
  hex, to be set up.

* ``--ipv6``: optional, set it if running IPv6 packets.

* ``--no-lthreads``: optional, disables the L-thread model and uses the EAL
  threading model. See below.

* ``--stat-lcore``: optional, run CPU load stats collector on the specified
  lcore.

* ``--parse-ptype``: optional, set to use software to analyze packet type.
  Without this option, hardware will check the packet type.
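
As an illustration of the ``-p PORTMASK`` encoding only, the following Python
sketch decodes a hexadecimal mask into the port numbers it selects. The helper
name ``ports_from_mask`` is hypothetical and is not part of the application.

.. code-block:: python

    def ports_from_mask(portmask_hex):
        """Return the port numbers whose bits are set in a hex port mask."""
        mask = int(portmask_hex, 16)
        return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]

    print(ports_from_mask("3"))   # [0, 1]

With ``-p 3``, bits 0 and 1 are set, so ports 0 and 1 are configured.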

The parameters of the ``--rx`` and ``--tx`` options are:

* ``--rx`` parameters

   .. _table_l3fwd_rx_parameters:

   +--------+------------------------------------------------------+
   | port   | RX port                                              |
   +--------+------------------------------------------------------+
   | queue  | RX queue that will be read on the specified RX port  |
   +--------+------------------------------------------------------+
   | lcore  | Core to use for the thread                           |
   +--------+------------------------------------------------------+
   | thread | Thread id (continuously from 0 to N)                 |
   +--------+------------------------------------------------------+


* ``--tx`` parameters

   .. _table_l3fwd_tx_parameters:

   +--------+------------------------------------------------------+
   | lcore  | Core to use for L3 route match and transmit          |
   +--------+------------------------------------------------------+
   | thread | Id of RX thread to be associated with this TX thread |
   +--------+------------------------------------------------------+

The ``l3fwd-thread`` application allows you to start packet processing in two
threading models: L-Threads (default) and EAL Threads (when the
``--no-lthreads`` parameter is used). For consistency all parameters are used
in the same way for both models.
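
To make the tuple layout of ``--rx`` and ``--tx`` concrete, the sketch below
splits the argument strings into named fields and pairs each TX entry with the
RX thread id it refers to. It is written in Python purely for illustration and
is not the application's parser; the names ``parse_tuples``, ``RX_FIELDS`` and
``TX_FIELDS`` are assumptions made for this example.

.. code-block:: python

    import re

    RX_FIELDS = ("port", "queue", "lcore", "thread")
    TX_FIELDS = ("lcore", "thread")

    def parse_tuples(arg, fields):
        """Split a string such as "(0,0,0,0)(1,0,1,1)" into dicts keyed by fields."""
        result = []
        for group in re.findall(r"\(([^()]*)\)", arg):
            values = [int(v) for v in group.split(",")]
            if len(values) != len(fields):
                raise ValueError(f"expected {len(fields)} values in ({group})")
            result.append(dict(zip(fields, values)))
        return result

    rx = parse_tuples("(0,0,0,0)(1,0,1,1)", RX_FIELDS)
    tx = parse_tuples("(2,0)(3,1)", TX_FIELDS)

    # Pair each TX entry with the RX thread id it is associated with.
    for t in tx:
        r = next(r for r in rx if r["thread"] == t["thread"])
        print(f"RX port {r['port']} queue {r['queue']} on lcore {r['lcore']}"
              f" -> TX on lcore {t['lcore']}")

For these arguments the RX thread reading port 0 on lcore 0 feeds the TX thread
on lcore 2, and the RX thread reading port 1 on lcore 1 feeds the TX thread on
lcore 3.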


Running with L-threads
~~~~~~~~~~~~~~~~~~~~~~

When the L-thread model is used (default option), the lcore and thread
parameters in ``--rx`` and ``--tx`` are used to affinitize threads to the
selected scheduler.

For example, the following places each L-thread on a different lcore::

   dpdk-l3fwd-thread -l 0-7 -n 2 -- -P -p 3 \
                --rx="(0,0,0,0)(1,0,1,1)" \
                --tx="(2,0)(3,1)"

The following places RX L-threads on lcore 0 and TX L-threads on lcores 1 and 2
and so on::

   dpdk-l3fwd-thread -l 0-7 -n 2 -- -P -p 3 \
                --rx="(0,0,0,0)(1,0,0,1)" \
                --tx="(1,0)(2,1)"


Running with EAL threads
~~~~~~~~~~~~~~~~~~~~~~~~

When the ``--no-lthreads`` parameter is used, the L-threading model is turned
off and EAL threads are used for all processing. EAL threads are enumerated in
the same way as L-threads, but the ``--lcores`` EAL parameter is used to
affinitize threads to the selected cpu-set (scheduler). Thus it is possible to
place every RX and TX thread on a different lcore.

For example, the following places each EAL thread on a different lcore::

   dpdk-l3fwd-thread -l 0-7 -n 2 -- -P -p 3 \
                --rx="(0,0,0,0)(1,0,1,1)" \
                --tx="(2,0)(3,1)" \
                --no-lthreads


To affinitize two or more EAL threads to one cpu-set, the EAL ``--lcores``
parameter is used.

The following places RX EAL threads on lcore 0 and TX EAL threads on lcores 1
and 2 and so on::

   dpdk-l3fwd-thread -l 0-7 -n 2 --lcores="(0,1)@0,(2,3)@1" -- -P -p 3 \
                --rx="(0,0,0,0)(1,0,1,1)" \
                --tx="(2,0)(3,1)" \
                --no-lthreads


Examples
~~~~~~~~

For selected scenarios, the command line configuration of the application for
L-threads and the corresponding EAL threads command line are as follows:

a) Start every thread on a different scheduler (1:1)::

      dpdk-l3fwd-thread -l 0-7 -n 2 -- -P -p 3 \
                   --rx="(0,0,0,0)(1,0,1,1)" \
                   --tx="(2,0)(3,1)"

   EAL thread equivalent::

      dpdk-l3fwd-thread -l 0-7 -n 2 -- -P -p 3 \
                   --rx="(0,0,0,0)(1,0,1,1)" \
                   --tx="(2,0)(3,1)" \
                   --no-lthreads

b) Start all threads on one core (N:1).

   Start 4 L-threads on lcore 0::

      dpdk-l3fwd-thread -l 0-7 -n 2 -- -P -p 3 \
                   --rx="(0,0,0,0)(1,0,0,1)" \
                   --tx="(0,0)(0,1)"

   Start 4 EAL threads on cpu-set 0::

      dpdk-l3fwd-thread -l 0-7 -n 2 --lcores="(0-3)@0" -- -P -p 3 \
                   --rx="(0,0,0,0)(1,0,0,1)" \
                   --tx="(2,0)(3,1)" \
                   --no-lthreads

c) Start threads on different cores (N:M).

   Start 2 L-threads for RX on lcore 0, and 2 L-threads for TX on lcore 1::

      dpdk-l3fwd-thread -l 0-7 -n 2 -- -P -p 3 \
                   --rx="(0,0,0,0)(1,0,0,1)" \
                   --tx="(1,0)(1,1)"

   Start 2 EAL threads for RX on cpu-set 0, and 2 EAL threads for TX on
   cpu-set 1::

      dpdk-l3fwd-thread -l 0-7 -n 2 --lcores="(0-1)@0,(2-3)@1" -- -P -p 3 \
                   --rx="(0,0,0,0)(1,0,1,1)" \
                   --tx="(2,0)(3,1)" \
                   --no-lthreads

Explanation
-----------

To a great extent the sample application differs little from the standard L3
forwarding application, and readers are advised to familiarize themselves with
the material covered in the :doc:`l3_forward` documentation before proceeding.

The following explanation is focused on the way threading is handled in the
performance thread example.


Mode of operation with EAL threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The performance thread sample application splits the RX and TX functionality
into two different threads, and the RX and TX threads are
interconnected via software rings. With respect to these rings the RX threads
are producers and the TX threads are consumers.

On initialization the TX and RX threads are started according to the command
line parameters.

The RX threads poll the network interface queues and post received packets to a
TX thread via a corresponding software ring.

The TX threads poll software rings, perform the L3 forwarding hash/LPM match,
and assemble packet bursts before performing burst transmit on the network
interface.

As with the standard L3 forward application, burst draining of residual packets
is performed periodically, with the period calculated from elapsed time using
the timestamp counter.

The diagram below illustrates a case with two RX threads and three TX threads.

.. _figure_performance_thread_1:

.. figure:: img/performance_thread_1.*
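
The producer/consumer relationship described in this section can be sketched in
a few lines of Python. This is a single-threaded simulation for illustration
only: a ``deque`` stands in for the software ring, the route lookup is a
placeholder, and ``BURST_SIZE`` and all function names are assumptions rather
than values or symbols taken from the application.

.. code-block:: python

    from collections import deque

    BURST_SIZE = 4        # hypothetical burst size for this sketch
    ring = deque()        # stands in for the software ring between RX and TX
    tx_buffer = []        # packets waiting to be sent as one burst
    sent_bursts = []      # record of bursts "transmitted" on the port

    def rx_poll(nic_queue):
        """RX side (producer): move every received packet onto the ring."""
        while nic_queue:
            ring.append(nic_queue.popleft())

    def tx_poll():
        """TX side (consumer): dequeue, 'route', and send each full burst."""
        while ring:
            pkt = ring.popleft()
            tx_buffer.append(("routed", pkt))   # placeholder for hash/LPM match
            if len(tx_buffer) == BURST_SIZE:
                sent_bursts.append(list(tx_buffer))
                tx_buffer.clear()

    def drain():
        """Periodic drain: flush a partial burst of residual packets."""
        if tx_buffer:
            sent_bursts.append(list(tx_buffer))
            tx_buffer.clear()

    rx_poll(deque(range(6)))                # six packets arrive
    tx_poll()
    print([len(b) for b in sent_bursts])    # [4]
    drain()
    print([len(b) for b in sent_bursts])    # [4, 2]

Without the drain step the two residual packets would stay in the transmit
buffer until more traffic arrived to complete the burst.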


Mode of operation with L-threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As in the EAL thread configuration, the application splits the RX and TX
functionality into different threads, and the pairs of RX and TX threads are
interconnected via software rings.

On initialization an L-thread scheduler is started on every EAL thread. On all
but the main EAL thread only a dummy L-thread is initially started.
The L-thread started on the main EAL thread then spawns other L-threads on
different L-thread schedulers according to the command line parameters.

The RX threads poll the network interface queues and post received packets
to a TX thread via the corresponding software ring.

The ring interface is augmented by means of an L-thread condition variable that
enables the TX thread to be suspended when the TX ring is empty. The RX thread
signals the condition whenever it posts to the TX ring, causing the TX thread
to be resumed.

Additionally the TX L-thread spawns a worker L-thread to take care of
polling the software rings, whilst it handles burst draining of the transmit
buffer.

The worker threads poll the software rings, perform L3 route lookup and
assemble packet bursts. If the TX ring is empty the worker thread suspends
itself by waiting on the condition variable associated with the ring.

Burst draining of residual packets, less than the burst size, is performed by
the TX thread which sleeps (using an L-thread sleep function) and resumes
periodically to flush the TX buffer.

This design means that L-threads that have no work can yield the CPU to other
L-threads and avoid having to constantly poll the software rings.
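
The suspend-and-signal behaviour of the ring can be illustrated with Python
generators standing in for L-threads. This is only a sketch of the idea under
simplifying assumptions: one scheduler, one ring, and a list of waiters in
place of a real condition variable. None of the names below come from the
L-thread API.

.. code-block:: python

    from collections import deque

    ready = deque()      # FIFO ready queue of runnable threads
    waiters = deque()    # threads suspended on the ring's condition
    ring = deque()       # software ring between RX and the TX worker
    log = []

    def rx_thread(packets):
        for pkt in packets:
            ring.append(pkt)
            if waiters:                      # signal: wake one suspended thread
                ready.append(waiters.popleft())
            log.append(f"rx posted {pkt}")
            yield "yield"

    def tx_worker():
        while True:
            if not ring:
                log.append("tx suspends")
                yield "wait"                 # suspend until signalled
                continue
            log.append(f"tx forwards {ring.popleft()}")
            yield "yield"

    def run():
        while ready:
            thread = ready.popleft()
            try:
                request = next(thread)
            except StopIteration:
                continue                     # thread finished
            (waiters if request == "wait" else ready).append(thread)

    ready.extend([tx_worker(), rx_thread(["p0", "p1"])])
    run()
    print(log)

The TX worker runs only when the RX thread has posted something, so it
consumes no scheduler time while the ring is empty.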

The diagram below illustrates a case with two RX threads and three TX functions
(each comprising a thread that processes forwarding and a thread that
periodically drains the output buffer of residual packets).

.. _figure_performance_thread_2:

.. figure:: img/performance_thread_2.*


CPU load statistics
~~~~~~~~~~~~~~~~~~~

It is possible to display statistics showing estimated CPU load on each core.
The statistics indicate the percentage of CPU time spent processing
received packets (forwarding), polling queues/rings (waiting for work),
and doing any other processing (context switch and other overhead).

When enabled, statistics are gathered by having the application threads set and
clear flags when they enter and exit pertinent code sections. The flags are
then sampled in real time by a statistics collector thread running on another
core. This thread displays the data in real time on the console.

This feature is enabled by designating a statistics collector core, using the
``--stat-lcore`` parameter.
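
The arithmetic behind such a sampled estimate is straightforward. The Python
sketch below assumes a collector has already recorded one flag value per
sample for a single worker; the state names and the ``load_percentages``
helper are hypothetical and do not reflect the application's data structures.

.. code-block:: python

    from collections import Counter

    # Hypothetical flag values a worker sets on entering each code section.
    BUSY, POLLING, OTHER = "busy", "polling", "other"

    def load_percentages(samples):
        """Turn a list of sampled flag values into per-state percentages."""
        counts = Counter(samples)
        total = len(samples)
        return {state: 100.0 * counts[state] / total
                for state in (BUSY, POLLING, OTHER)}

    # Ten samples of one worker's flag, as seen from the collector core.
    samples = [BUSY] * 6 + [POLLING] * 3 + [OTHER] * 1
    print(load_percentages(samples))

Because the figures come from sampling, they are estimates whose accuracy
depends on how often the collector reads the flags.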


.. _lthread_subsystem:

The L-thread subsystem
----------------------

The L-thread subsystem resides in the ``examples/performance-thread/common``
directory and is built and linked automatically when building the
``l3fwd-thread`` example.

The subsystem provides a simple cooperative scheduler to enable arbitrary
functions to run as cooperative threads within a single EAL thread.
The subsystem provides a pthread-like API that is intended to assist in
reuse of legacy code written for POSIX pthreads.

The following sections provide some detail on the features, constraints,
performance and porting considerations when using L-threads.


.. _comparison_between_lthreads_and_pthreads:

Comparison between L-threads and POSIX pthreads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The fundamental difference between the L-thread and pthread models is the
way in which threads are scheduled. The simplest way to think about this is to
consider the case of a processor with a single CPU. To run multiple threads
on a single CPU, the scheduler must frequently switch between the threads,
so that each thread is able to make timely progress.
This is the basis of any multitasking operating system.

This section explores the differences between the pthread model and the
L-thread model as implemented in the provided L-thread subsystem. If needed, a
theoretical discussion of preemptive vs cooperative multi-threading can be
found in any good text on operating system design.


Scheduling and context switching
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The POSIX pthread library provides an application programming interface to
create and synchronize threads. Scheduling policy is determined by the host OS,
and may be configurable. The OS may use sophisticated rules to determine which
thread should be run next, threads may suspend themselves or make other threads
ready, and the scheduler may employ a time slice giving each thread a maximum
time quantum after which it will be preempted in favor of another thread that
is ready to run. To complicate matters further, threads may be assigned
different scheduling priorities.

By contrast the L-thread subsystem is considerably simpler. Logically the
L-thread scheduler performs the same multiplexing function for L-threads
within a single pthread as the OS scheduler does for pthreads within an
application process. The L-thread scheduler is simply the main loop of a
pthread, and in so far as the host OS is concerned it is a regular pthread
just like any other. The host OS is unaware of the existence of L-threads and
is not involved in scheduling them.

The other and most significant difference between the two models is that
L-threads are scheduled cooperatively. L-threads cannot preempt each
other, nor can the L-thread scheduler preempt a running L-thread (i.e.
there is no time slicing). The consequence is that programs implemented with
L-threads must possess frequent rescheduling points, meaning that they must
explicitly and of their own volition return to the scheduler at frequent
intervals, in order to allow other L-threads an opportunity to proceed.

In both models switching between threads requires that the current CPU
context is saved and a new context (belonging to the next thread ready to run)
is restored. With pthreads this context switching is handled transparently,
and the set of CPU registers that must be preserved between context switches
is as per an interrupt handler.

An L-thread context switch is achieved by the thread itself making a function
call to the L-thread scheduler. Thus it is only necessary to preserve the
callee-saved registers. The caller is responsible for saving any other
registers it is using before a function call and restoring them on return,
and this is handled by the compiler. For ``X86_64`` on both Linux and BSD the
System V calling convention is used. This defines registers RBX, RSP, RBP, and
R12-R15 as callee-saved registers (for more detailed discussion a good reference
is `X86 Calling Conventions <https://en.wikipedia.org/wiki/X86_calling_conventions>`_).

Taking advantage of this, and due to the absence of preemption, an L-thread
context switch is achieved with fewer than 20 load/store instructions.

The scheduling policy for L-threads is fixed: there is no prioritization of
L-threads, all L-threads are equal, and scheduling is based on a FIFO
ready queue.

An L-thread is a struct containing the CPU context of the thread
(saved on context switch) and other useful items. The ready queue contains
pointers to threads that are ready to run. The L-thread scheduler is a simple
loop that polls the ready queue and reads from it the next thread ready to run,
which it resumes by saving the current context (the current position in the
scheduler loop) and restoring the context of the next thread from its thread
struct. Thus an L-thread is always resumed at the last place it yielded.

A well behaved L-thread will call the context switch regularly (at least once
in its main loop), thus returning to the scheduler's own main loop. Yielding
inserts the current thread at the back of the ready queue, and the process of
servicing the ready queue is repeated. Thus the system runs by flipping back
and forth between the L-threads and the scheduler loop.
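
The scheduler loop and FIFO ready queue described here can be modelled with
Python generators, where each ``yield`` plays the role of a rescheduling point.
This is a conceptual sketch only: it does not save or restore CPU registers,
and the names ``lthread`` and ``scheduler`` are chosen for this example rather
than taken from the subsystem.

.. code-block:: python

    from collections import deque

    def lthread(name, iterations, trace):
        """A well behaved cooperative thread: yields once per loop iteration."""
        for i in range(iterations):
            trace.append(f"{name}:{i}")
            yield                         # rescheduling point

    def scheduler(threads):
        """Run threads from a FIFO ready queue until all have finished."""
        ready = deque(threads)
        while ready:
            current = ready.popleft()     # next thread that is ready to run
            try:
                next(current)             # resume it where it last yielded
            except StopIteration:
                continue                  # thread finished, drop it
            ready.append(current)         # yielding re-queues it at the back

    trace = []
    scheduler([lthread("A", 2, trace), lthread("B", 3, trace)])
    print(trace)   # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']

The two threads alternate strictly in queue order. A thread that never
reached its ``yield`` would keep the scheduler from running anything else.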

In the case of pthreads, the preemptive scheduling, time slicing, and support
for thread prioritization mean that progress is normally possible for any
thread that is ready to run. This comes at the price of a relatively heavier
context switch and scheduling overhead.

With L-threads the progress of any particular thread is determined by the
frequency of rescheduling opportunities in the other L-threads. This means that
an errant L-thread monopolizing the CPU might cause scheduling of other threads
to be stalled. Due to the lower cost of context switching, however, voluntary
rescheduling to ensure progress of other threads, if managed sensibly, is not
a prohibitive overhead, and overall performance can exceed that of an
application using pthreads.


Mutual exclusion
^^^^^^^^^^^^^^^^

With pthreads, preemption means that threads that share data must observe
some form of mutual exclusion protocol.

The fact that L-threads cannot preempt each other means that in many cases
mutual exclusion devices can be completely avoided.

Locking to protect shared data can be a significant bottleneck in
multi-threaded applications, so a carefully designed cooperatively scheduled
program can enjoy significant performance advantages.
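
As a small illustration of why locks can often be omitted on a single
scheduler, the Python sketch below lets four cooperative threads update a
shared counter. The read and the write are not separated by a rescheduling
point, so no other thread can run between them. The generator-based threads
and the inline scheduler loop are assumptions of this sketch, not L-thread
code.

.. code-block:: python

    from collections import deque

    counter = 0

    def worker(increments):
        global counter
        for _ in range(increments):
            value = counter        # read
            counter = value + 1    # write: nothing else can run in between
            yield                  # other threads run only from here

    ready = deque(worker(1000) for _ in range(4))
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
            ready.append(thread)
        except StopIteration:
            pass

    print(counter)   # 4000

If a ``yield`` were placed between the read and the write, updates could be
lost, which is the situation a lock would otherwise guard against.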

So far we have considered only the simplistic case of a single core CPU.
When multiple CPUs are considered, things are somewhat more complex.

First of all it is inevitable that there must be multiple L-thread schedulers,
one running on each EAL thread. So long as these schedulers remain isolated
from each other the above assertions about the potential advantages of
cooperative scheduling hold true.

A configuration with isolated cooperative schedulers is less flexible than the
pthread model where threads can be affinitized to run on any CPU. With isolated
schedulers, scaling of applications to utilize fewer or more CPUs according to
system demand is very difficult to achieve.

The L-thread subsystem makes it possible for L-threads to migrate between
schedulers running on different CPUs. Needless to say, if the migration means
that threads that share data end up running on different CPUs then this will
introduce the need for some kind of mutual exclusion system.

Of course ``rte_ring`` software rings can always be used to interconnect
threads running on different cores. However, to protect other kinds of shared
data structures, lock-free constructs or else explicit locking will be
required. This is a consideration for the application design.

In support of this extended functionality, the L-thread subsystem implements
thread-safe mutexes and condition variables.

The cost of affinitizing and of condition variable signaling is significantly
lower than the equivalent pthread operations, and so applications using these
features will see a performance benefit.


Thread local storage
^^^^^^^^^^^^^^^^^^^^

As with applications written for pthreads, an application written for L-threads
can take advantage of thread local storage, in this case local to an L-thread.
An application may save and retrieve a single pointer to application data in
the L-thread struct.

For legacy and backward compatibility reasons two alternative methods are also
offered. The first is modeled directly on the pthread get/set specific APIs.
The second is modeled on the ``RTE_PER_LCORE`` macros, whereby
``PER_LTHREAD`` macros are introduced. In both cases the storage is local to
the L-thread.
a9643ea8Slogwang
a9643ea8Slogwang
a9643ea8Slogwang.. _constraints_and_performance_implications:
a9643ea8Slogwang
a9643ea8SlogwangConstraints and performance implications when using L-threads
a9643ea8Slogwang~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a9643ea8Slogwang
a9643ea8Slogwang
a9643ea8Slogwang.. _API_compatibility:
a9643ea8Slogwang
a9643ea8SlogwangAPI compatibility
a9643ea8Slogwang^^^^^^^^^^^^^^^^^
a9643ea8Slogwang
a9643ea8SlogwangThe L-thread subsystem provides a set of functions that are logically equivalent
a9643ea8Slogwangto the corresponding functions offered by the POSIX pthread library, however not
a9643ea8Slogwangall pthread functions have a corresponding L-thread equivalent, and not all
a9643ea8Slogwangfeatures available to pthreads are implemented for L-threads.
a9643ea8Slogwang
a9643ea8SlogwangThe pthread library offers considerable flexibility via programmable attributes
a9643ea8Slogwangthat can be associated with threads, mutexes, and condition variables.
a9643ea8Slogwang
a9643ea8SlogwangBy contrast the L-thread subsystem has fixed functionality, the scheduler policy
a9643ea8Slogwangcannot be varied, and L-threads cannot be prioritized. There are no variable
a9643ea8Slogwangattributes associated with any L-thread objects. L-threads, mutexes and
a9643ea8Slogwangconditional variables, all have fixed functionality. (Note: reserved parameters
a9643ea8Slogwangare included in the APIs to facilitate possible future support for attributes).
a9643ea8Slogwang
a9643ea8SlogwangThe table below lists the pthread and equivalent L-thread APIs with notes on
a9643ea8Slogwangdifferences and/or constraints. Where there is no L-thread entry in the table,
a9643ea8Slogwangthen the L-thread subsystem provides no equivalent function.
a9643ea8Slogwang
.. _table_lthread_pthread:

.. table:: Pthread and equivalent L-thread APIs.

   +----------------------------+------------------------+-------------------+
   | **Pthread function**       | **L-thread function**  | **Notes**         |
   +============================+========================+===================+
   | pthread_barrier_destroy    |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_barrier_init       |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_barrier_wait       |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_cond_broadcast     | lthread_cond_broadcast | See note 1        |
   +----------------------------+------------------------+-------------------+
   | pthread_cond_destroy       | lthread_cond_destroy   |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_cond_init          | lthread_cond_init      |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_cond_signal        | lthread_cond_signal    | See note 1        |
   +----------------------------+------------------------+-------------------+
   | pthread_cond_timedwait     |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_cond_wait          | lthread_cond_wait      | See note 5        |
   +----------------------------+------------------------+-------------------+
   | pthread_create             | lthread_create         | See notes 2, 3    |
   +----------------------------+------------------------+-------------------+
   | pthread_detach             | lthread_detach         | See note 4        |
   +----------------------------+------------------------+-------------------+
   | pthread_equal              |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_exit               | lthread_exit           |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_getspecific        | lthread_getspecific    |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_getcpuclockid      |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_join               | lthread_join           |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_key_create         | lthread_key_create     |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_key_delete         | lthread_key_delete     |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_mutex_destroy      | lthread_mutex_destroy  |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_mutex_init         | lthread_mutex_init     |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_mutex_lock         | lthread_mutex_lock     | See note 6        |
   +----------------------------+------------------------+-------------------+
   | pthread_mutex_trylock      | lthread_mutex_trylock  | See note 6        |
   +----------------------------+------------------------+-------------------+
   | pthread_mutex_timedlock    |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_mutex_unlock       | lthread_mutex_unlock   |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_once               |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_destroy     |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_init        |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_rdlock      |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_timedrdlock |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_timedwrlock |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_tryrdlock   |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_trywrlock   |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_unlock      |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_rwlock_wrlock      |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_self               | lthread_current        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_setspecific        | lthread_setspecific    |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_spin_init          |                        | See note 10       |
   +----------------------------+------------------------+-------------------+
   | pthread_spin_destroy       |                        | See note 10       |
   +----------------------------+------------------------+-------------------+
   | pthread_spin_lock          |                        | See note 10       |
   +----------------------------+------------------------+-------------------+
   | pthread_spin_trylock       |                        | See note 10       |
   +----------------------------+------------------------+-------------------+
   | pthread_spin_unlock        |                        | See note 10       |
   +----------------------------+------------------------+-------------------+
   | pthread_cancel             | lthread_cancel         |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_setcancelstate     |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_setcanceltype      |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_testcancel         |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_getschedparam      |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_setschedparam      |                        |                   |
   +----------------------------+------------------------+-------------------+
   | pthread_yield              | lthread_yield          | See note 7        |
   +----------------------------+------------------------+-------------------+
   | pthread_setaffinity_np     | lthread_set_affinity   | See notes 2, 3, 8 |
   +----------------------------+------------------------+-------------------+
   |                            | lthread_sleep          | See note 9        |
   +----------------------------+------------------------+-------------------+
   |                            | lthread_sleep_clks     | See note 9        |
   +----------------------------+------------------------+-------------------+


**Note 1**:

Neither ``lthread_cond_signal()`` nor ``lthread_cond_broadcast()`` may be
called concurrently by L-threads running on different schedulers, although
multiple L-threads running in the same scheduler may freely perform signal or
broadcast operations. L-threads running on the same or different schedulers
may always safely wait on a condition variable.


**Note 2**:

Pthread attributes may be used to affinitize a pthread with a cpu-set. The
L-thread subsystem does not support a cpu-set. An L-thread may be affinitized
only with a single CPU at any time.


**Note 3**:

If an L-thread is intended to run on a different NUMA node than the node that
creates the thread, it is advantageous to specify the destination core as a
parameter of ``lthread_create()``. See
:ref:`memory_allocation_and_NUMA_awareness` for details.


**Note 4**:

An L-thread can only detach itself, and cannot detach other L-threads.


**Note 5**:

A wait operation on a pthread condition variable is always associated with and
protected by a mutex which must be owned by the thread at the time it invokes
``pthread_cond_wait()``. By contrast, L-thread condition variables are thread
safe (for waiters) and do not use an associated mutex. Multiple L-threads
(including L-threads running on other schedulers) can safely wait on an
L-thread condition variable. As a consequence, an L-thread condition variable
is typically an order of magnitude faster than its pthread counterpart.


**Note 6**:

Recursive locking is not supported with L-threads. Attempts to take a lock
recursively will be detected and rejected.


**Note 7**:

``lthread_yield()`` will save the current context, insert the current thread
at the back of the ready queue, and resume the next ready thread. Yielding
increases ready queue backlog; see :ref:`ready_queue_backlog` for more details
about the implications of this.

N.B. The context switch time, as measured from immediately before the call to
``lthread_yield()`` to the point at which the next ready thread is resumed,
can be an order of magnitude faster than the same measurement for
``pthread_yield()``.


**Note 8**:

``lthread_set_affinity()`` is similar to a yield apart from the fact that the
yielding thread is inserted into a peer ready queue of another scheduler.
The peer ready queue is actually a separate thread safe queue, which means that
threads appearing in the peer ready queue can jump any backlog in the local
ready queue on the destination scheduler.

The context switch time as measured from the time just before the call to
``lthread_set_affinity()`` to just after the same thread is resumed on the new
scheduler can be orders of magnitude faster than the same measurement for
``pthread_setaffinity_np()``.


**Note 9**:

Although there is no ``pthread_sleep()`` function, ``lthread_sleep()`` and
``lthread_sleep_clks()`` can be used wherever ``sleep()``, ``usleep()`` or
``nanosleep()`` might ordinarily be used. The L-thread sleep functions suspend
the current thread, start an ``rte_timer`` and resume the thread when the
timer matures. The ``rte_timer_manage()`` entry point is called on every pass
of the scheduler loop. This means that the worst case jitter on timer expiry
is determined by the longest period between context switches of any running
L-threads.

In a synthetic test with many threads sleeping and resuming, the measured
jitter is typically orders of magnitude lower than the same measurement made
for ``nanosleep()``.


**Note 10**:

Spin locks are not provided because they are problematic in a cooperative
environment. See :ref:`porting_locks_and_spinlocks` for a more detailed
discussion on how to avoid spin locks.


.. _Thread_local_storage_performance:

Thread local storage
^^^^^^^^^^^^^^^^^^^^

Of the three L-thread local storage options, the simplest and most efficient is
storing a single application data pointer in the L-thread struct.

The ``PER_LTHREAD`` macros involve a run time computation to obtain the address
of the variable being saved/retrieved and also require that the accesses are
dereferenced via a pointer. This means that code that uses ``RTE_PER_LCORE``
macros might need some slight adjustment when ported to L-threads (see
:ref:`porting_thread_local_storage` for hints about porting code that makes
use of thread local storage).

The get/set specific APIs are consistent with their pthread counterparts both
in use and in performance.


.. _memory_allocation_and_NUMA_awareness:

Memory allocation and NUMA awareness
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All memory allocation is from DPDK huge pages, and is NUMA aware. Each
scheduler maintains its own caches of objects: lthreads, their stacks, TLS,
mutexes and condition variables. These caches are implemented as unbounded lock
free MPSC queues. When objects are created they are always allocated from the
caches on the local core (current EAL thread).

If an L-thread has been affinitized to a different scheduler, then it can
always safely free resources to the caches from which they originated (because
the caches are MPSC queues).

If the L-thread has been affinitized to a different NUMA node then the memory
resources associated with it may incur longer access latency.

The commonly used pattern of setting affinity on entry to a thread after it has
started means that memory allocation for both the stack and TLS will have been
made from caches on the NUMA node on which the thread's creator is running.
This has the side effect that access latency will be sub-optimal after
affinitizing.

This side effect can be mitigated to some extent (although not completely) by
specifying the destination CPU as a parameter of ``lthread_create()``. This
causes the L-thread's stack and TLS to be allocated when it is first scheduled
on the destination scheduler. If the destination is on another NUMA node, this
results in a more optimal memory allocation.

Note that the lthread struct itself remains allocated from memory on the
creating node. This is unavoidable because an L-thread is known everywhere by
the address of this struct.


.. _object_cache_sizing:

Object cache sizing
^^^^^^^^^^^^^^^^^^^

The per lcore object caches pre-allocate objects in bulk whenever a request to
allocate an object finds a cache empty. By default 100 objects are
pre-allocated; this is defined by ``LTHREAD_PREALLOC`` in the public API
header file ``lthread_api.h``. This means that the caches constantly grow to
meet system demand.

In the present implementation there is no mechanism to reduce the cache sizes
if system demand reduces. Thus the caches will remain at their maximum extent
indefinitely.

A consequence of the bulk pre-allocation of objects is that every 100 (default
value) additional new object create operations result in a call to
``rte_malloc()``. Creating objects such as L-threads, which trigger the
allocation of even more objects (i.e. their stacks and TLS), can therefore
cause outliers in scheduling performance.

If this is a problem the simplest mitigation strategy is to dimension the
system by setting the bulk object pre-allocation size to some large number
that you do not expect to be exceeded. This means the caches will be populated
once only, the very first time a thread is created.
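
The effect of the pre-allocation size can be illustrated with a small
stand-alone model. The sketch below is not taken from the L-thread sources:
``struct cache_model`` and ``refills_for()`` are hypothetical names, and the
model assumes that objects are never returned to the cache. It only counts how
many bulk refills (each standing for one call to ``rte_malloc()``) are needed
to create a given number of objects.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one object cache: every time an allocation finds the
 * cache empty, `prealloc` objects are added in a single bulk refill.
 * Objects are never handed back in this model, so it shows the worst case.
 */
struct cache_model {
    size_t prealloc;   /* bulk refill size (100 by default in the text) */
    size_t available;  /* objects currently sitting in the cache */
    size_t refills;    /* number of bulk refills performed so far */
};

static void cache_model_init(struct cache_model *c, size_t prealloc)
{
    assert(prealloc > 0);
    c->prealloc = prealloc;
    c->available = 0;
    c->refills = 0;
}

/* Take one object from the cache, refilling in bulk when it is empty. */
static void cache_model_alloc(struct cache_model *c)
{
    if (c->available == 0) {
        c->available = c->prealloc;
        c->refills++;
    }
    c->available--;
}

/* Count the bulk refills needed to create `n_objects` objects. */
static size_t refills_for(size_t n_objects, size_t prealloc)
{
    struct cache_model c;
    size_t i;

    cache_model_init(&c, prealloc);
    for (i = 0; i < n_objects; i++)
        cache_model_alloc(&c);
    return c.refills;
}
```

Under these assumptions, creating 1000 objects with a pre-allocation size of
100 triggers 10 refills, whereas a size of 4096 triggers a single refill on
the first create.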


.. _Ready_queue_backlog:

Ready queue backlog
^^^^^^^^^^^^^^^^^^^

One of the more subtle performance considerations is managing the ready queue
backlog. The fewer threads that are waiting in the ready queue, the faster
any particular thread will get serviced.

In a naive L-thread application with N L-threads simply looping and yielding,
this backlog will always be equal to the number of L-threads. Thus the cost of
a yield to a particular L-thread will be N times the context switch time.

This side effect can be mitigated by arranging for threads to be suspended and
wait to be resumed, rather than polling for work by constantly yielding.
Blocking on a mutex or condition variable, or even more obviously having a
thread sleep if it has a low frequency workload, are all mechanisms by which a
thread can be excluded from the ready queue until it really does need to be
run. This can have a significant positive impact on performance.
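
The "N times the context switch time" relationship can be reproduced with a
toy model. The sketch below is a plain C simulation, not L-thread code:
``struct ready_queue``, ``MAX_THREADS`` and ``switches_until_resumed()`` are
hypothetical names, and every thread other than thread 0 is assumed to do no
work and yield immediately.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 64   /* illustrative capacity of the model */

/* Hypothetical FIFO ready queue holding thread ids (ring buffer). */
struct ready_queue {
    int ids[MAX_THREADS];
    size_t head;   /* index of the next thread to resume */
    size_t count;  /* number of threads currently ready */
};

static void rq_push(struct ready_queue *q, int id)
{
    assert(q->count < MAX_THREADS);
    q->ids[(q->head + q->count) % MAX_THREADS] = id;
    q->count++;
}

static int rq_pop(struct ready_queue *q)
{
    int id;

    assert(q->count > 0);
    id = q->ids[q->head];
    q->head = (q->head + 1) % MAX_THREADS;
    q->count--;
    return id;
}

/*
 * Thread 0 has just yielded and sits at the back of the queue behind
 * `n_polling - 1` other threads that also loop and yield.
 * Returns the number of context switches until thread 0 runs again.
 */
static size_t switches_until_resumed(size_t n_polling)
{
    struct ready_queue q = { .head = 0, .count = 0 };
    size_t switches = 0;
    size_t i;
    int running;

    assert(n_polling >= 1 && n_polling <= MAX_THREADS);
    for (i = 1; i < n_polling; i++)
        rq_push(&q, (int)i);       /* the other pollers are ahead in line */
    rq_push(&q, 0);                /* thread 0 yields to the back */

    do {
        running = rq_pop(&q);      /* resume the next ready thread */
        switches++;
        if (running != 0)
            rq_push(&q, running);  /* it does no work and yields again */
    } while (running != 0);

    return switches;
}
```

In this model eight polling threads cost eight context switches per round
trip. If six of them were suspended on a mutex, condition variable or sleep
instead, only two would remain in the queue and the cost would drop to two.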


.. _Initialization_and_shutdown_dependencies:

Initialization, shutdown and dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The L-thread subsystem depends on DPDK for huge page allocation and depends on
the ``rte_timer`` subsystem. The DPDK EAL initialization and
``rte_timer_subsystem_init()`` **MUST** be completed before the L-thread
subsystem can be used.

Thereafter initialization of the L-thread subsystem is largely transparent to
the application. Constructor functions ensure that global variables are properly
initialized. Other than global variables each scheduler is initialized
independently the first time that an L-thread is created by a particular EAL
thread.

If the schedulers are to be run as isolated and independent schedulers, with
no intention that L-threads running on different schedulers will migrate between
schedulers or synchronize with L-threads running on other schedulers, then
initialization consists simply of creating an L-thread, and then running the
L-thread scheduler.

If there will be interaction between L-threads running on different schedulers,
then it is important that the starting of schedulers on different EAL threads
is synchronized.

To achieve this an additional initialization step is necessary: set the number
of schedulers by calling the API function
``lthread_num_schedulers_set(n)``, where ``n`` is the number of EAL threads
that will run L-thread schedulers. Setting the number of schedulers to a
number greater than 0 will cause all schedulers to wait until the others have
started before beginning to schedule L-threads.

The L-thread scheduler is started by calling the function ``lthread_run()``,
which should be called from the EAL thread and thus becomes the main loop of
the EAL thread.

The function ``lthread_run()`` will not return until all threads running on
the scheduler have exited, and the scheduler has been explicitly stopped by
calling ``lthread_scheduler_shutdown(lcore)`` or
``lthread_scheduler_shutdown_all()``.

All these functions do is tell the scheduler that it can exit when there are no
longer any running L-threads. Neither function forces any running L-thread to
terminate. Any desired application shutdown behavior must be designed and
built into the application to ensure that L-threads complete in a timely
manner.

**Important Note:** It is assumed when the scheduler exits that the application
is terminating for good. The scheduler does not free resources before exiting,
and running the scheduler a subsequent time will result in undefined behavior.


.. _porting_legacy_code_to_run_on_lthreads:

Porting legacy code to run on L-threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Legacy code originally written for a pthread environment may be ported to
L-threads if the differences in scheduling policy and the constraints
discussed in the previous sections can be accommodated.

This section looks in more detail at some of the issues that may have to be
resolved when porting code.


.. _pthread_API_compatibility:

pthread API compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^

The first step is to establish exactly which pthread APIs the legacy
application uses, and to understand the requirements of those APIs. If there
are corresponding L-thread APIs, and where the default pthread functionality
is used by the application then, notwithstanding the other issues discussed
here, it should be feasible to run the application with L-threads. If the
legacy code modifies the default behavior using attributes then it may be
necessary to make some adjustments to eliminate those requirements.


.. _blocking_system_calls:

Blocking system API calls
^^^^^^^^^^^^^^^^^^^^^^^^^

It is important to understand what other system services the application may be
using, bearing in mind that in a cooperatively scheduled environment a thread
cannot block without stalling the scheduler and with it all other cooperative
threads. Any kind of blocking system call, for example file or socket IO, is a
potential problem. A good tool to analyze the application for this purpose is
the ``strace`` utility.

There are many strategies to resolve these kinds of issues, each with its
merits. Possible solutions include:

* Adopting a polled mode of the system API concerned (if available).

* Arranging for another core to perform the function and synchronizing with
  that core via constructs that will not block the L-thread.

* Affinitizing the thread to another scheduler devoted (as a matter of policy)
  to handling threads wishing to make blocking calls, and then back again when
  finished.
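
As an illustration of the first option, the sketch below turns a blocking
``read()`` into a polled one using the standard POSIX ``O_NONBLOCK`` flag.
It is a minimal sketch rather than part of the sample application:
``coop_yield()`` is a hypothetical stand-in for ``lthread_yield()`` that only
counts its calls so the example builds without DPDK, and ``set_nonblocking()``
and ``polled_read()`` are illustrative helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook: a stand-in for the cooperative yield call, e.g.
 * lthread_yield(). Here it only counts how often it was invoked. */
static unsigned int yield_count;

static void coop_yield(void)
{
    yield_count++;
}

/* Switch a descriptor to non-blocking mode; returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Try to read without ever blocking the calling thread. While no data is
 * available, give other cooperative threads a chance to run, for at most
 * `max_polls` attempts. Returns the byte count, 0 on end of file, or -1
 * with errno set (EAGAIN when the poll budget ran out).
 */
static ssize_t polled_read(int fd, void *buf, size_t len,
                           unsigned int max_polls)
{
    unsigned int i;

    assert(buf != NULL);
    for (i = 0; i < max_polls; i++) {
        ssize_t n = read(fd, buf, len);

        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;      /* a real error, report it */
        coop_yield();       /* nothing to read yet, let others run */
    }
    errno = EAGAIN;
    return -1;
}
```

Polling in this way keeps the scheduler running, but each unsuccessful attempt
still passes through the ready queue, so the poll budget is worth keeping
small.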


.. _porting_locks_and_spinlocks:

Locks and spinlocks
^^^^^^^^^^^^^^^^^^^

Locks and spinlocks are another source of blocking behavior that, for the same
reasons as system calls, will need to be addressed.

If the application design ensures that the contending L-threads will always
run on the same scheduler then it is probably safe to remove locks and spin
locks completely.

The only exception to the above rule is if for some reason the
code performs any kind of context switch whilst holding the lock
(e.g. yield, sleep, or block on a different lock, or on a condition variable).
This will need to be determined before deciding to eliminate a lock.

If a lock cannot be eliminated then an L-thread mutex can be substituted for
either kind of lock.

An L-thread blocking on an L-thread mutex will be suspended and will cause
another ready L-thread to be resumed, thus not blocking the scheduler. When
default behavior is required, it can be used as a direct replacement for a
pthread mutex lock.

Spin locks are typically used when lock contention is likely to be rare and
where the period during which the lock may be held is relatively short.
When the contending L-threads are running on the same scheduler then an
L-thread blocking on a spin lock will enter an infinite loop stopping the
scheduler completely (see :ref:`porting_infinite_loops` below).

If the application design ensures that contending L-threads will always run
on different schedulers then it might be reasonable to leave a short spin lock
that rarely experiences contention in place.

If after all considerations it appears that a spin lock cannot be
eliminated completely, replaced with an L-thread mutex, or left in place as
is, then an alternative is to loop on a flag, with a call to
``lthread_yield()`` inside the loop (n.b. if the contending L-threads might
ever run on different schedulers the flag will need to be manipulated
atomically).

Spinning and yielding is the least preferred solution since it introduces
ready queue backlog (see also :ref:`ready_queue_backlog`).
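
The flag-and-yield fallback described above might look like the following
sketch. It uses the standard C11 ``atomic_flag`` so that the flag is
manipulated atomically. ``coop_yield()`` is a hypothetical stand-in for
``lthread_yield()`` that only counts its calls, and ``struct yield_lock`` and
its helper functions are illustrative names, not part of the L-thread API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: stands in for lthread_yield(). It only counts calls. */
static unsigned int yield_calls;

static void coop_yield(void)
{
    yield_calls++;
}

/* A lock built from one atomic flag; initialise with ATOMIC_FLAG_INIT. */
struct yield_lock {
    atomic_flag busy;
};

/* Single attempt: returns true when the lock was taken. */
static bool yield_lock_try(struct yield_lock *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->busy,
                                              memory_order_acquire);
}

/* Loop on the flag, yielding between attempts instead of spinning hot. */
static void yield_lock_acquire(struct yield_lock *l)
{
    while (!yield_lock_try(l))
        coop_yield();
}

/* Clear the flag so that the next attempt by any thread can succeed. */
static void yield_lock_release(struct yield_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

An uncontended acquire never yields. A contended one yields once per failed
attempt, which is exactly the ready queue backlog cost noted above.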


.. _porting_sleeps_and_delays:

Sleeps and delays
^^^^^^^^^^^^^^^^^

Yet another kind of blocking behavior (albeit momentary) is the use of delay
functions like ``sleep()``, ``usleep()``, ``nanosleep()`` etc. All will have
the consequence of stalling the L-thread scheduler and unless the delay is very
short (e.g. a very short nanosleep) calls to these functions will need to be
eliminated.

The simplest mitigation strategy is to use the L-thread sleep API functions,
of which two variants exist, ``lthread_sleep()`` and ``lthread_sleep_clks()``.
These functions start an ``rte_timer`` against the L-thread, suspend the
L-thread and cause another ready L-thread to be resumed. The suspended
L-thread is resumed when the ``rte_timer`` matures.


.. _porting_infinite_loops:

Infinite loops
^^^^^^^^^^^^^^

Some applications have threads with loops that contain no inherent
rescheduling opportunity, and rely solely on the OS time slicing to share
the CPU. In a cooperative environment this will stop everything dead. These
kinds of loops are not hard to identify: in a debug session you will find the
debugger is always stopping in the same loop.

The simplest solution to this kind of problem is to insert an explicit
``lthread_yield()`` or ``lthread_sleep()`` into the loop. Another solution
might be to include the function performed by the loop into the execution path
of some other loop that does in fact yield, if this is possible.
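
A minimal sketch of the first solution follows. The loop body and the batch
size are invented for illustration: ``coop_yield()`` is a hypothetical
stand-in for ``lthread_yield()`` that only counts its calls, and
``ITEMS_PER_YIELD`` is an assumed tuning value rather than a DPDK constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: stands in for lthread_yield(). It only counts calls. */
static unsigned int yield_calls;

static void coop_yield(void)
{
    yield_calls++;
}

/* Illustrative batch size: how many items to handle between yields. */
#define ITEMS_PER_YIELD 32

/*
 * A long-running loop with an explicit rescheduling point. Without the
 * coop_yield() call it would hold the scheduler until all items are done.
 */
static uint64_t sum_with_yield(const uint32_t *items, size_t n)
{
    uint64_t sum = 0;
    size_t i;

    assert(items != NULL || n == 0);
    for (i = 0; i < n; i++) {
        sum += items[i];
        if ((i + 1) % ITEMS_PER_YIELD == 0)
            coop_yield();
    }
    return sum;
}
```

Yielding once per batch rather than on every iteration is a design choice of
this sketch: it bounds how long other threads wait while limiting the extra
ready queue traffic that each yield introduces.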


.. _porting_thread_local_storage:

Thread local storage
^^^^^^^^^^^^^^^^^^^^

If the application uses thread local storage, the use case should be
studied carefully.

In a legacy pthread application either or both of the ``__thread`` prefix and
the pthread set/get specific APIs may have been used to define storage local
to a pthread.

In some applications it may be a reasonable assumption that the data could,
or in fact most likely should, be placed in L-thread local storage.

If the application (like many DPDK applications) has assumed a certain
relationship between a pthread and the CPU to which it is affinitized, there
is a risk that thread local storage may have been used to save some data items
that are correctly logically associated with the CPU, and other items which
relate to application context for the thread. Only a good understanding of the
application will reveal such cases.

If the application requires that an L-thread be able to move between
schedulers then care should be taken to separate these kinds of data into per
lcore and per L-thread storage. In this way a migrating thread will bring with
it the local data it needs, and pick up the new logical core specific values
from pthread local storage at its new home.
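
The separation described above might look like the following sketch. All of
the ``toy_`` structures and functions are hypothetical, and a plain array
indexed by lcore stands in for pthread local storage so that the example can
run without DPDK. In a real port the first structure would stay in per lcore
(pthread local) storage and the second would be attached to the L-thread.

.. code-block:: c

    #include <stdint.h>

    #define TOY_NUM_LCORES 2

    /* Data logically tied to the CPU: it stays with the scheduler. */
    struct toy_lcore_data {
        unsigned lcore_id;
        uint64_t pkts_handled;
    };

    /* Data tied to the application context: it travels with the L-thread. */
    struct toy_lthread_data {
        unsigned flow_id;
        uint64_t bytes_seen;
        unsigned lcore;     /* scheduler currently hosting this thread */
    };

    /* Stand-in for per lcore storage, one slot per scheduler. */
    static struct toy_lcore_data toy_lcores[TOY_NUM_LCORES] = { {0, 0}, {1, 0} };

    /* Updates the thread's own context and whichever lcore it is running on. */
    static void toy_handle_packet(struct toy_lthread_data *lt, uint64_t len)
    {
        lt->bytes_seen += len;
        toy_lcores[lt->lcore].pkts_handled++;
    }

    /* Migration changes only the hosting lcore; the thread context is untouched. */
    static void toy_migrate(struct toy_lthread_data *lt, unsigned new_lcore)
    {
        lt->lcore = new_lcore;
    }

A thread that handles one packet on lcore 0, migrates, and handles another on
lcore 1 keeps a single running ``bytes_seen`` total, while each lcore counts
only the packet it processed itself.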


.. _pthread_shim:

Pthread shim
~~~~~~~~~~~~

A convenient way to get something working with legacy code can be to use a
shim that adapts pthread API calls to the corresponding L-thread ones.
This approach will not mitigate any of the porting considerations mentioned
in the previous sections, but it will reduce the amount of code churn that
would otherwise have been involved. It is a reasonable approach for evaluating
L-threads before investing effort in porting to the native L-thread APIs.


Overview
^^^^^^^^
The L-thread subsystem includes an example pthread shim. This is a partial
implementation but does contain the API stubs needed to get basic applications
running. There is a simple "hello world" application that demonstrates the
use of the pthread shim.

A subtlety of working with a shim is that the application will still need
to make use of the genuine pthread library functions, at the very least in
order to create the EAL threads in which the L-thread schedulers will run.
This is the case with DPDK initialization, and exit.

To deal with the initialization and shutdown scenarios, the shim is capable of
switching its adaptor functionality on or off. An application can control this
behavior by calling the function ``pt_override_set()``. The default state
is disabled.

The pthread shim uses the dynamic linker loader and saves the loaded addresses
of the genuine pthread API functions in an internal table. When the shim
functionality is enabled it performs the adaptor function; when disabled it
invokes the genuine pthread function.
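
The switching behavior can be pictured with the toy dispatch table below. It
is a sketch under stated assumptions: every ``toy_`` name is invented, both
table entries are filled statically, and the two target functions only return
a marker value. The real shim obtains the genuine addresses through the
dynamic linker and is controlled through ``pt_override_set()``.

.. code-block:: c

    #include <stdbool.h>

    /* Toy stand-ins for a genuine pthread call and its L-thread adaptor. */
    static int toy_genuine_yield(void) { return 1; }
    static int toy_lthread_yield(void) { return 2; }

    /* One table entry per wrapped API function. */
    struct toy_shim_entry {
        int (*genuine)(void);   /* address of the real library function */
        int (*adaptor)(void);   /* L-thread based replacement */
    };

    static struct toy_shim_entry toy_yield_entry = {
        toy_genuine_yield, toy_lthread_yield
    };

    static bool toy_override;   /* disabled by default */

    /* Hypothetical counterpart of the shim's on/off control. */
    static void toy_override_set(bool on)
    {
        toy_override = on;
    }

    /* The stub the application calls in place of the library function. */
    static int toy_sched_yield(void)
    {
        return toy_override ? toy_yield_entry.adaptor()
                            : toy_yield_entry.genuine();
    }

Following the text above, an application would leave the override disabled
while the EAL threads are created, enable it once code is running as
L-threads, and disable it again before shutdown.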

The function ``pthread_exit()`` has additional special handling. The standard
system header file pthread.h declares ``pthread_exit()`` with
``__attribute__((noreturn))``. This is an optimization that is possible
because the pthread is terminating: it enables the compiler to omit the normal
handling of stack and protection of registers since the function is not
expected to return, and in fact the thread is being destroyed. These
optimizations are applied in both the callee and the caller of the
``pthread_exit()`` function.

In our cooperative scheduling environment this behavior is inadmissible. The
pthread is the L-thread scheduler thread, and, although an L-thread is
terminating, there must be a return to the scheduler in order that the system
can continue to run. Further, returning from a function with attribute
``noreturn`` is invalid and may result in undefined behavior.

The solution is to redefine the ``pthread_exit`` function with a macro,
causing it to be mapped to a stub function in the shim that does not have the
``noreturn`` attribute. This macro is defined in the file
``pthread_shim.h``. The stub function is otherwise no different from any of
the other stub functions in the shim, and will switch between the real
``pthread_exit()`` function and the ``lthread_exit()`` function as
required. The only difference is that the mapping to the stub is made by macro
substitution.
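
The macro substitution can be sketched as follows. The stub name
``toy_shim_pthread_exit`` and the call counter are hypothetical, and this toy
stub simply returns so that the redirection can be observed; the stub in the
shim would instead hand off to ``pthread_exit()`` or ``lthread_exit()``.

.. code-block:: c

    #include <pthread.h>
    #include <stddef.h>

    static int toy_exit_calls;

    /* Hypothetical stub: an ordinary function, deliberately without noreturn. */
    static void toy_shim_pthread_exit(void *retval)
    {
        (void)retval;
        toy_exit_calls++;   /* the shim would select the real or L-thread exit */
    }

    /* Defined after <pthread.h> so that only later uses are redirected. */
    #define pthread_exit(retval) toy_shim_pthread_exit(retval)

    /* Legacy-style code: the call below now expands to the stub. */
    static int toy_legacy_body(void)
    {
        pthread_exit(NULL);
        return 42;          /* reached in this toy because the stub returns */
    }

Because the caller is compiled against the stub's declaration rather than the
``noreturn`` one, the compiler keeps the normal call sequence around the call
site. This is also why the header has to be seen when the legacy code is
compiled.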

A consequence of this is that the file ``pthread_shim.h`` must be included in
legacy code wishing to make use of the shim. It also means that dynamic
linkage of a pre-compiled binary that did not include ``pthread_shim.h`` is
not supported.

Given the requirements for porting legacy code outlined in
:ref:`porting_legacy_code_to_run_on_lthreads` most applications will require at
least some minimal adjustment and recompilation to run on L-threads, so
pre-compiled binaries are unlikely to be met in practice.

In summary the shim approach adds some overhead but can be a useful tool to help
establish the feasibility of a code reuse project. It is also a fairly
straightforward task to extend the shim if necessary.

**Note:** Bearing in mind the preceding discussions about the impact of making
blocking calls, switching the shim in and out on the fly to invoke any
pthread API that might block is something that should typically be avoided.


Building and running the pthread shim
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The shim example application is located in the performance-thread folder of
the sample applications.

To build and run the pthread shim example:

#. Build the application:

   To compile the sample application see :doc:`compiling`.

#. Run the pthread_shim example:

   .. code-block:: console

       dpdk-pthread-shim -c core_mask -n number_of_channels

.. _lthread_diagnostics:

L-thread Diagnostics
~~~~~~~~~~~~~~~~~~~~

When debugging you must take account of the fact that the L-threads are run in
a single pthread. The current scheduler is defined by
``RTE_PER_LCORE(this_sched)``, and the current lthread is stored at
``RTE_PER_LCORE(this_sched)->current_lthread``. Thus on a breakpoint in a GDB
session the current lthread can be obtained by displaying the pthread local
variable ``per_lcore_this_sched->current_lthread``.

Another useful diagnostic feature is the possibility to trace significant
events in the life of an L-thread. This feature is enabled by changing the
value of ``LTHREAD_DIAG`` from 0 to 1 in the file ``lthread_diag_api.h``.

Tracing of events can be individually masked, and the mask may be programmed
at run time. An unmasked event results in a callback that provides information
about the event. The default callback simply prints trace information. The
default mask is 0 (all events off); the mask can be modified by calling the
function ``lthread_diagnostic_set_mask()``.

It is possible to register a user callback function to implement more
sophisticated diagnostic functions.
Object creation events (lthread, mutex, and condition variable) accept, and
store in the created object, a user supplied reference value returned by the
callback function.

The lthread reference value is passed back in all subsequent event callbacks,
and APIs are provided to retrieve the reference value from
mutexes and condition variables. This enables a user to monitor, count, or
filter for specific events on specific objects, for example to monitor for a
specific thread signaling a specific condition variable, or to monitor
all timer events. The possibilities and combinations are endless.

The callback function can be set by calling the function
``lthread_diagnostic_enable()`` supplying a callback function pointer and an
event mask.
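
The interplay of mask, callback and reference value can be modeled in a few
lines of C. This is a sketch, not the library's interface: the event numbers,
the callback signature and all ``toy_`` names are assumptions, as is the
convention that a set bit in the mask turns the corresponding event on. The
actual definitions are in ``lthread_diag_api.h``.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical event numbers, for illustration only. */
    enum toy_event { TOY_EV_CREATE, TOY_EV_SLEEP, TOY_EV_COND_SIGNAL };

    #define TOY_MASK(ev) (UINT64_C(1) << (ev))

    /* Hypothetical callback type: receives the event and the object's reference. */
    typedef uint64_t (*toy_diag_cb)(enum toy_event ev, uint64_t ref);

    static toy_diag_cb toy_cb;
    static uint64_t toy_mask;   /* 0 means all events off */

    /* Register a callback together with the set of events it should see. */
    static void toy_diag_enable(toy_diag_cb cb, uint64_t mask)
    {
        toy_cb = cb;
        toy_mask = mask;
    }

    /* Trace point: invokes the callback only when the event is enabled. */
    static uint64_t toy_diag_event(enum toy_event ev, uint64_t ref)
    {
        if (toy_cb != NULL && (toy_mask & TOY_MASK(ev)))
            return toy_cb(ev, ref);
        return ref;
    }

    static unsigned toy_signals_seen;

    /* User callback: tags new objects and counts condition signals. */
    static uint64_t toy_user_cb(enum toy_event ev, uint64_t ref)
    {
        if (ev == TOY_EV_CREATE)
            return 0xC0FFEE;    /* reference value kept by the new object */
        if (ev == TOY_EV_COND_SIGNAL)
            toy_signals_seen++;
        return ref;
    }

Enabling only the create and signal events lets the callback tag an object at
creation and count its signals later, while sleep events pass by without any
callback being made.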

Setting ``LTHREAD_DIAG`` also enables counting of statistics about cache and
queue usage, and these statistics can be displayed by calling the function
``lthread_diag_stats_display()``. This function also performs a consistency
check on the caches and queues. The function should only be called from the
main EAL thread after all worker threads have stopped and returned to the C
main program, otherwise the consistency check will fail.