.. _Task_API:

Migrating from low-level task API
=================================

The low-level task API of Intel(R) Threading Building Blocks (TBB) was considered complex and hence
error-prone, which was the primary reason it was removed from oneAPI Threading Building Blocks
(oneTBB). This guide helps with the migration from TBB to oneTBB for the use cases where the
low-level task API is used.

Spawning of individual tasks
----------------------------
For most use cases, the spawning of individual tasks can be replaced with the use of either
``oneapi::tbb::task_group`` or ``oneapi::tbb::parallel_invoke``.

For example, suppose that ``RootTask``, ``ChildTask1``, and ``ChildTask2`` are user-side functors
that inherit ``tbb::task`` and implement its interface. Then spawning ``ChildTask1`` and
``ChildTask2``, which can execute in parallel with each other, and waiting on ``RootTask`` is
implemented as:

.. code:: cpp

    #include <tbb/task.h>

    int main() {
        // Assuming RootTask, ChildTask1, and ChildTask2 are defined.
        RootTask& root = *new(tbb::task::allocate_root()) RootTask{};

        ChildTask1& child1 = *new(root.allocate_child()) ChildTask1{/*params*/};
        ChildTask2& child2 = *new(root.allocate_child()) ChildTask2{/*params*/};

        // Two children plus one for the wait.
        root.set_ref_count(3);

        tbb::task::spawn(child1);
        tbb::task::spawn(child2);

        root.wait_for_all();
    }


Using ``oneapi::tbb::task_group``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The code above can be rewritten using ``oneapi::tbb::task_group``:

.. code:: cpp

    #include <oneapi/tbb/task_group.h>

    int main() {
        // Assuming ChildTask1 and ChildTask2 are defined.
        oneapi::tbb::task_group tg;
        tg.run(ChildTask1{/*params*/});
        tg.run(ChildTask2{/*params*/});
        tg.wait();
    }

The code is now more concise. It also enables lambda functions and does not require you to
implement the ``tbb::task`` interface by overriding the ``tbb::task* tbb::task::execute()`` virtual
method. With this new approach, you work with functors in a C++-standard way by implementing
``void operator()() const``:

.. code:: cpp

    struct Functor {
        // Member to be called when an object of this type is passed into
        // the oneapi::tbb::task_group::run() method
        void operator()() const {}
    };


Using ``oneapi::tbb::parallel_invoke``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It is also possible to use ``oneapi::tbb::parallel_invoke`` to rewrite the original code and make it
even more concise:

.. code:: cpp

    #include <oneapi/tbb/parallel_invoke.h>

    int main() {
        // Assuming ChildTask1 and ChildTask2 are defined.
        oneapi::tbb::parallel_invoke(
            ChildTask1{/*params*/},
            ChildTask2{/*params*/}
        );
    }


Adding more work during task execution
--------------------------------------
``oneapi::tbb::parallel_invoke`` follows a blocking style of programming, which means that it
completes only when all functors passed to the parallel pattern complete their execution.

In TBB, cases where the amount of work is not known in advance and work needs to be added during
the execution of a parallel algorithm were mostly covered by the ``tbb::parallel_do`` high-level
parallel pattern. The ``tbb::parallel_do`` algorithm logic may be implemented using the task API as:

.. code:: cpp

    #include <cstddef>
    #include <vector>
    #include <tbb/task.h>

    // Assuming RootTask and OtherWork are defined and implement tbb::task interface.

    struct Task : public tbb::task {
        Task(tbb::task& root, int i)
            : m_root(root), m_i(i)
        {}

        tbb::task* execute() override {
            // ... do some work for item m_i ...

            // add_more_parallel_work is a placeholder condition.
            if (add_more_parallel_work) {
                tbb::task& child = *new(m_root.allocate_child()) OtherWork;
                // Account for the extra child before it is spawned.
                m_root.increment_ref_count();
                tbb::task::spawn(child);
            }
            return nullptr;
        }

        tbb::task& m_root;
        int m_i;
    };

    int main() {
        std::vector<int> items = { 0, 1, 2, 3, 4, 5, 6, 7 };
        RootTask& root = *new(tbb::task::allocate_root()) RootTask{/*params*/};

        root.set_ref_count(items.size() + 1);

        for (std::size_t i = 0; i < items.size(); ++i) {
            Task& task = *new(root.allocate_child()) Task(root, items[i]);
            tbb::task::spawn(task);
        }

        root.wait_for_all();
        return 0;
    }

In oneTBB, the ``tbb::parallel_do`` interface was removed. Instead, the functionality of adding new
work was included in the ``oneapi::tbb::parallel_for_each`` interface.

The previous use case can be rewritten in oneTBB as follows:

.. code:: cpp

    #include <vector>
    #include <oneapi/tbb/parallel_for_each.h>

    int main() {
        std::vector<int> items = { 0, 1, 2, 3, 4, 5, 6, 7 };

        oneapi::tbb::parallel_for_each(
            items.begin(), items.end(),
            [](int& i, oneapi::tbb::feeder<int>& feeder) {

                // ... do some work for item i ...

                // add_more_parallel_work is a placeholder condition.
                if (add_more_parallel_work)
                    feeder.add(i);
            }
        );
    }

Since both TBB and oneTBB support nested expressions, you can run additional functors from within an
already running functor.

The previous use case can be rewritten using ``oneapi::tbb::task_group`` as:

.. code:: cpp

    #include <cstddef>
    #include <vector>
    #include <oneapi/tbb/task_group.h>

    int main() {
        std::vector<int> items = { 0, 1, 2, 3, 4, 5, 6, 7 };

        oneapi::tbb::task_group tg;
        for (std::size_t i = 0; i < items.size(); ++i) {
            tg.run([&item = items[i], &tg] {

                // ... do some work for item ...

                // add_more_parallel_work is a placeholder condition.
                if (add_more_parallel_work) {
                    // Assuming OtherWork is defined.
                    tg.run(OtherWork{});
                }
            });
        }
        tg.wait();
    }


Task recycling
--------------
You can re-run the functor by passing ``*this`` to the ``oneapi::tbb::task_group::run()``
method. The functor will be copied in this case. However, its state can be shared among instances:

.. code:: cpp

    #include <memory>
    #include <oneapi/tbb/task_group.h>

    struct SharedStateFunctor {
        std::shared_ptr<Data> m_shared_data;
        oneapi::tbb::task_group& m_task_group;

        void operator()() const {
            // do some work processing m_shared_data

            // has_more_work is a placeholder condition.
            if (has_more_work)
                m_task_group.run(*this);

            // Note that the re-submitted copy might already be running here
            // and accessing m_shared_data concurrently
        }
    };

    int main() {
        // Assuming Data is defined.
        std::shared_ptr<Data> data = std::make_shared<Data>(/*params*/);
        oneapi::tbb::task_group tg;
        tg.run(SharedStateFunctor{data, tg});
        tg.wait();
    }

Such patterns are particularly useful when the work within a functor is not completed but there is a
need for the task scheduler to react to outer circumstances, such as cancellation of group
execution. To avoid issues with concurrent access, it is recommended to submit the functor for
re-execution as the last step:

.. code:: cpp

    #include <memory>
    #include <oneapi/tbb/task_group.h>

    struct SharedStateFunctor {
        std::shared_ptr<Data> m_shared_data;
        oneapi::tbb::task_group& m_task_group;

        void operator()() const {
            // do some work processing m_shared_data

            // need_to_yield is a placeholder condition.
            if (need_to_yield) {
                m_task_group.run(*this);
                return;
            }
        }
    };

    int main() {
        // Assuming Data is defined.
        std::shared_ptr<Data> data = std::make_shared<Data>(/*params*/);
        oneapi::tbb::task_group tg;
        tg.run(SharedStateFunctor{data, tg});
        tg.wait();
    }


Recycling as child or continuation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In oneTBB, this kind of recycling is done manually. You have to track when it is time to run the
task:

.. code:: cpp

    #include <cstddef>
    #include <vector>
    #include <atomic>
    #include <cassert>
    #include <oneapi/tbb/task_group.h>

    // Assuming produce_item_for is defined.

    struct ContinuationTask {
        ContinuationTask(std::vector<int>& data, int& result)
            : m_data(data), m_result(result)
        {}

        void operator()() const {
            for (const auto& item : m_data)
                m_result += item;
        }

        std::vector<int>& m_data;
        int& m_result;
    };

    struct ChildTask {
        ChildTask(std::vector<int>& data, int& result,
                  std::atomic<std::size_t>& tasks_left, std::atomic<std::size_t>& tasks_done,
                  oneapi::tbb::task_group& tg)
            : m_data(data), m_result(result), m_tasks_left(tasks_left), m_tasks_done(tasks_done), m_tg(tg)
        {}

        void operator()() const {
            std::size_t index = --m_tasks_left;
            m_data[index] = produce_item_for(index);
            std::size_t done_num = ++m_tasks_done;
            if (index % 2 != 0) {
                // Recycling as child
                m_tg.run(*this);
                return;
            } else if (done_num == m_data.size()) {
                assert(m_tasks_left == 0);
                // Spawning a continuation that does reduction
                m_tg.run(ContinuationTask(m_data, m_result));
            }
        }
        std::vector<int>& m_data;
        int& m_result;
        std::atomic<std::size_t>& m_tasks_left;
        std::atomic<std::size_t>& m_tasks_done;
        oneapi::tbb::task_group& m_tg;
    };


    int main() {
        int result = 0;
        std::vector<int> items(10, 0);
        std::atomic<std::size_t> tasks_left{items.size()};
        std::atomic<std::size_t> tasks_done{0};

        oneapi::tbb::task_group tg;
        for (std::size_t i = 0; i < items.size(); i+=2) {
            tg.run(ChildTask(items, result, tasks_left, tasks_done, tg));
        }
        tg.wait();
    }


Scheduler Bypass
----------------

The TBB ``task::execute()`` method can return a pointer to a task that the current thread can execute next.
This might reduce scheduling overhead compared to a direct ``spawn``. As with ``spawn``, the returned task
is not guaranteed to be executed next by the current thread.

.. code:: cpp

    #include <tbb/task.h>

    // Assuming OtherTask is defined.

    struct Task : tbb::task {
        tbb::task* execute() override {
            // some work to do ...

            // parent() returns a pointer to the parent task.
            auto* other_p = new(this->parent()->allocate_child()) OtherTask{};
            this->parent()->increment_ref_count();

            return other_p;
        }
    };

    int main() {
        // Assuming RootTask is defined.
        RootTask& root = *new(tbb::task::allocate_root()) RootTask{};

        Task& child = *new(root.allocate_child()) Task{/*params*/};

        // One child plus one for the wait.
        root.set_ref_count(2);

        tbb::task::spawn(child);

        root.wait_for_all();
    }

In oneTBB, this can be done using ``oneapi::tbb::task_group``.

.. code:: cpp

    #include <oneapi/tbb/task_group.h>

    // Assuming OtherTask is defined.

    int main() {
        oneapi::tbb::task_group tg;

        tg.run([&tg]() {
            // some work to do ...

            return tg.defer(OtherTask{});
        });

        tg.wait();
    }

Here, ``oneapi::tbb::task_group::defer`` adds a new task to ``tg``. However, instead of being put into the
queue of tasks ready for execution, as ``oneapi::tbb::task_group::run`` would do, the task is passed directly
to the executing thread through the function return value.

Deferred task creation
----------------------
The TBB low-level task API separates task creation from the actual spawning. This separation lets you
postpone spawning a task while still preventing the parent task from completing prematurely.
For example, suppose that ``RootTask``, ``ChildTask``, and ``CallBackTask`` are user-side functors that
inherit ``tbb::task`` and implement its interface. Then, blocking the ``RootTask`` from leaving prematurely
and waiting on it is implemented as follows:

.. code:: cpp

    #include <tbb/task.h>

    int main() {
        // Assuming RootTask, ChildTask, CallBackTask, and register_callback are defined.
        RootTask& root = *new(tbb::task::allocate_root()) RootTask{};

        ChildTask&    child    = *new(root.allocate_child()) ChildTask{/*params*/};
        CallBackTask& cb_task  = *new(root.allocate_child()) CallBackTask{/*params*/};

        root.set_ref_count(3);

        tbb::task::spawn(child);

        register_callback([&cb_task]() {
            tbb::task::enqueue(cb_task);
        });

        root.wait_for_all();
        // Control flow will reach here only after both ChildTask and CallBackTask are executed,
        // i.e. after the callback is called
    }

In oneTBB, this can be done using ``oneapi::tbb::task_group``.

.. code:: cpp

    #include <utility>
    #include <oneapi/tbb/task_arena.h>
    #include <oneapi/tbb/task_group.h>

    int main() {
        oneapi::tbb::task_group tg;
        oneapi::tbb::task_arena arena;
        // Assuming ChildTask, CallBackTask, and register_callback are defined.

        auto cb = tg.defer(CallBackTask{/*params*/});

        // The task handle is move-only, so it is moved into the callback
        // and then into enqueue().
        register_callback([c = std::move(cb), &arena]() mutable {
            arena.enqueue(std::move(c));
        });

        tg.run(ChildTask{/*params*/});

        tg.wait();
        // Control flow gets here once both ChildTask and CallBackTask are executed,
        // i.e. after the callback is called
    }

Here, ``oneapi::tbb::task_group::defer`` adds a new task to ``tg``. However, the task is not spawned until
``oneapi::tbb::task_arena::enqueue`` is called.

.. note::
   The call to ``oneapi::tbb::task_group::wait`` will not return control until both ``ChildTask`` and
   ``CallBackTask`` are executed.