.. _guiding_task_scheduler_execution:

Guiding Task Scheduler Execution
================================

By default, the task scheduler tries to use all available computing resources. In some cases,
you may want to configure it to use only a subset of them.

.. caution::

   Guiding the execution of the task scheduler may cause composability issues.

|full_name| provides the ``task_arena`` interface to guide task execution within the arena by:

- setting the preferred computation units;
- restricting execution to a subset of the computation units.

Such customizations are encapsulated within the ``task_arena::constraints`` structure.
To set a limitation, customize ``task_arena::constraints`` and then pass it
to the ``task_arena`` instance during construction or initialization.

The ``task_arena::constraints`` structure allows you to specify the following restrictions:

- Preferred NUMA node
- Preferred core type
- The maximum number of logical threads scheduled per single core simultaneously
- The level of ``task_arena`` concurrency

You may use the interfaces from the ``tbb::info`` namespace to construct a ``tbb::task_arena::constraints``
instance. These interfaces respect the process affinity mask. For instance,
if the process affinity mask excludes execution on some of the NUMA nodes, these NUMA nodes are
not returned by the ``tbb::info::numa_nodes()`` interface.

The following examples show how to use these interfaces.

.. rubric:: Setting the preferred NUMA node

Execution on systems with non-uniform memory access (`NUMA <https://en.wikipedia.org/wiki/Non-uniform_memory_access>`_ systems)
may incur a performance penalty if threads on one NUMA node access memory allocated on
a different NUMA node. To reduce this overhead, the work may be divided among several ``task_arena``
instances whose execution preferences are set to different NUMA nodes.
To set the execution preference, assign a NUMA node identifier to the
``task_arena::constraints::numa_id`` field.

::

    std::vector<tbb::numa_node_id> numa_indexes = tbb::info::numa_nodes();
    std::vector<tbb::task_arena> arenas(numa_indexes.size());
    std::vector<tbb::task_group> task_groups(numa_indexes.size());

    for (unsigned j = 0; j < numa_indexes.size(); j++) {
        arenas[j].initialize(tbb::task_arena::constraints(numa_indexes[j]));
        arenas[j].execute([&task_groups, j](){
            task_groups[j].run([](){ /* some parallel work */ });
        });
    }

    for (unsigned j = 0; j < numa_indexes.size(); j++) {
        arenas[j].execute([&task_groups, j](){ task_groups[j].wait(); });
    }

.. rubric:: Setting the preferred core type

Processors with `Intel® Hybrid Technology <https://www.intel.com/content/www/us/en/products/docs/processors/core/core-processors-with-hybrid-technology-brief.html>`_
contain several core types, each suited for different purposes.
In most cases, systems with a hybrid CPU architecture show reasonable performance without additional API calls.
However, in some exceptional scenarios, performance may be tuned by setting the preferred core type.
To set the preferred core type for execution, assign a specific core type identifier to the ``task_arena::constraints::core_type`` field.

The following example shows how to make the most performant core type preferred for work execution:

::

    std::vector<tbb::core_type_id> core_types = tbb::info::core_types();
    tbb::task_arena arena(
        tbb::task_arena::constraints{}.set_core_type(core_types.back())
    );

    arena.execute( [] {
        /* the most performant core type is defined as preferred */
    });

.. rubric:: Limiting the maximum number of threads simultaneously scheduled to one core

Processors with `Intel® Hyper-Threading Technology <https://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html>`_
allow more than one thread to run on each core simultaneously. However, there might be situations
when the number of simultaneously running threads per core needs to be lowered. In such cases,
assign the desired value to the ``task_arena::constraints::max_threads_per_core`` field.

The following example shows how to allow only one thread to run on each core at a time:

::

    tbb::task_arena no_ht_arena( tbb::task_arena::constraints{}.set_max_threads_per_core(1) );
    no_ht_arena.execute( [] {
        /* parallel work */
    });

A more composable way to limit the number of threads executing on cores is to set the maximum
concurrency of the ``tbb::task_arena``:

::

    int no_ht_concurrency = tbb::info::default_concurrency(
        tbb::task_arena::constraints{}.set_max_threads_per_core(1)
    );
    tbb::task_arena arena( no_ht_concurrency );
    arena.execute( [] {
        /* parallel work */
    });

As in the previous example, the number of threads inside the arena is equal to the
number of available cores. However, this approach incurs less overhead and composes better
because it places fewer constraints on execution.