1.. _How_Task_Scheduler_Works.rst: 2 3How Task Scheduler Works 4======================== 5 6 7While the task scheduler is not bound to any particular type of parallelism, 8it was designed to work efficiently for fork-join parallelism with lots of forks. 9This type of parallelism is typical for parallel algorithms such as `oneapi::tbb::parallel_for 10<https://spec.oneapi.io/versions/latest/elements/oneTBB/source/algorithms/functions/parallel_for_func.html>`_. 11 12Let's consider the mapping of fork-join parallelism on the task scheduler in more detail. 13 14The scheduler runs tasks in a way that tries to achieve several targets simultaneously: 15 - Enable as many threads as possible, by creating enough job, to achieve actual parallelism 16 - Preserve data locality to make a single thread execution more efficient 17 - Minimize both memory demands and cross-thread communication to reduce an overhead 18 19To achieve this, a balance between depth-first and breadth-first execution strategies 20must be reached. Assuming that the task graph is finite, depth-first is better for 21a sequential execution because: 22 23- **Strike when the cache is hot**. The deepest tasks are the most recently created tasks and therefore are the hottest in the cache. 24 Also, if they can be completed, tasks that depend on it can continue executing, and though not the hottest in a cache, 25 they are still warmer than the older tasks deeper in the dequeue. 26 27- **Minimize space**. Execution of the shallowest task leads to the breadth-first unfolding of a graph. It creates an exponential 28 number of nodes that co-exist simultaneously. In contrast, depth-first execution creates the same number 29 of nodes, but only a linear number can exists at the same time, since it creates a stack of other ready 30 tasks. 31 32Each thread has its deque of tasks that are ready to run. When a 33thread spawns a task, it pushes it onto the bottom of its deque. 34 35When a thread participates in the evaluation of tasks, it constantly executes 36a task obtained by the first rule that applies from the roughly equivalent ruleset: 37 38- Get the task returned by the previous one, if any. 39 40- Take a task from the bottom of its deque, if any. 41 42- Steal a task from the top of another randomly chosen deque. If the 43 selected deque is empty, the thread tries again to execute this rule until it succeeds. 44 45Rule 1 is described in :doc:`Task Scheduler Bypass <Task_Scheduler_Bypass>`. 46The overall effect of rule 2 is to execute the *youngest* task spawned by the thread, 47which causes the depth-first execution until the thread runs out of work. 48Then rule 3 applies. It steals the *oldest* task spawned by another thread, 49which causes temporary breadth-first execution that converts potential parallelism 50into actual parallelism. 51