.. _parallel_for:

parallel_for
============


Suppose you want to apply a function ``Foo`` to each element of an
array, and it is safe to process each element concurrently. Here is the
sequential code to do this:


::


   void SerialApplyFoo( float a[], size_t n ) {
       for( size_t i=0; i!=n; ++i )
           Foo(a[i]);
   }


The iteration space here is of type ``size_t``, and goes from ``0`` to
``n-1``. The template function ``oneapi::tbb::parallel_for`` breaks this iteration
space into chunks, and schedules the chunks to run on the available
threads. The first step in parallelizing this loop is to convert the loop
body into a form that operates on a chunk. The form is an STL-style
function object, called the *body* object, in which ``operator()``
processes a chunk. The following code declares the body object.

::

   #include "oneapi/tbb.h"

   using namespace oneapi::tbb;

   // Body object: holds the array pointer and applies Foo to one chunk.
   class ApplyFoo {
       float *const my_a;
   public:
       // Called by parallel_for for each chunk [r.begin(), r.end()).
       void operator()( const blocked_range<size_t>& r ) const {
           float *a = my_a;
           for( size_t i=r.begin(); i!=r.end(); ++i )
               Foo(a[i]);
       }
       ApplyFoo( float a[] ) :
           my_a(a)
       {}
   };


The ``using`` directive in the example enables you to use the library
identifiers without having to write out the namespace prefix ``oneapi::tbb``
before each identifier. The rest of the examples assume that such a
``using`` directive is present.


Note the argument to ``operator()``. A ``blocked_range<T>`` is a
template class provided by the library. It describes a one-dimensional
iteration space over type ``T``. Template function ``parallel_for`` works
with other kinds of iteration spaces too. The library provides
``blocked_range2d`` for two-dimensional spaces. You can define your own
spaces as explained in :ref:`Advanced_Topic_Other_Kinds_of_Iteration_Spaces`.


An instance of ``ApplyFoo`` needs member fields that remember all the
local variables that were defined outside the original loop but used
inside it. Usually, the constructor for the body object will initialize
these fields, though ``parallel_for`` does not care how the body object
is created. Template function ``parallel_for`` requires that the body
object have a copy constructor, which is invoked to create a separate
copy (or copies) for each worker thread. It also invokes the destructor
to destroy these copies. In most cases, the implicitly generated copy
constructor and destructor work correctly. If they do not, you almost
always need to define *both* so that they stay consistent, as is usual
in C++.


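As a sketch of the point about member fields, suppose the original loop
also used a local ``scale`` factor. The body then needs one member per
captured variable, and the implicitly generated copy constructor is
enough. The example uses only the standard library so that it runs
without oneTBB: ``IndexRange`` is a hypothetical stand-in for
``blocked_range<size_t>``, and ``ScaleBody`` and ``ScaleInTwoChunks`` are
illustrative names that do not come from the library.
``ScaleInTwoChunks`` imitates a scheduler by running the original body on
one half of the array and a copy on the other half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for blocked_range<size_t>: a half-open [begin, end).
struct IndexRange {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// One member per variable that the original loop used but did not define.
class ScaleBody {
    float* const my_a;
    const float my_scale;
public:
    ScaleBody(float a[], float scale) : my_a(a), my_scale(scale) {}
    // The implicit copy constructor and destructor suffice here.
    void operator()(const IndexRange& r) const {
        assert(r.begin() <= r.end());
        for (std::size_t i = r.begin(); i != r.end(); ++i)
            my_a[i] *= my_scale;
    }
};

// Runs the body and a copy of it on two halves, as a scheduler might.
inline std::vector<float> ScaleInTwoChunks(std::vector<float> v, float scale) {
    ScaleBody body(v.data(), scale);
    ScaleBody copy(body);                  // the copy shares the array pointer
    const std::size_t mid = v.size() / 2;
    body(IndexRange{0, mid});
    copy(IndexRange{mid, v.size()});
    return v;
}
```
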
Because the body object might be copied, its ``operator()`` should not
modify the body. Otherwise the modification might or might not become
visible to the thread that invoked ``parallel_for``, depending upon
whether ``operator()`` is acting on the original or a copy. As a
reminder of this nuance, ``parallel_for`` requires that the body
object's ``operator()`` be declared ``const``.


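The following minimal sketch illustrates how a modification made inside
``operator()`` can be lost. It uses only the standard library, not
oneTBB. ``CountingBody``, ``RunOnCopy`` and ``CallsSeenByCaller`` are
hypothetical names, and ``RunOnCopy`` merely imitates a scheduler that
hands a copy of the body to a worker. The body's ``operator()`` is
deliberately non-``const``, which is exactly what the requirement above
rules out. The copy records every call, while the caller's original
still reports zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical body whose operator() mutates a member (non-const on purpose).
struct CountingBody {
    std::size_t calls = 0;
    void operator()(std::size_t /*index*/) { ++calls; }
};

// Imitates a scheduler that copies the body before running it
// (the body is passed by value). Returns the calls recorded by the copy.
inline std::size_t RunOnCopy(CountingBody body, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i)
        body(i);
    return body.calls;
}

// Returns the caller's view of the counter after the work ran on a copy.
inline std::size_t CallsSeenByCaller(std::size_t n) {
    CountingBody original;
    const std::size_t seen_by_copy = RunOnCopy(original, n);
    assert(seen_by_copy == n);   // the copy counted every call
    (void)seen_by_copy;
    return original.calls;       // the original was never modified
}
```
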
The example ``operator()`` loads ``my_a`` into a local variable ``a``.
Though not necessary, there are two reasons for doing this in the
example:


-  **Style**. It makes the loop body look more like the original.


-  **Performance**. Sometimes putting frequently accessed values into
   local variables helps the compiler optimize the loop better, because
   local variables are often easier for the compiler to track.


Once you have the loop body written as a body object, invoke the
template function ``parallel_for``, as follows:


::


   #include "oneapi/tbb.h"

   void ParallelApplyFoo( float a[], size_t n ) {
       parallel_for(blocked_range<size_t>(0,n), ApplyFoo(a));
   }


The ``blocked_range`` constructed here represents the entire iteration
space from ``0`` to ``n-1``, which ``parallel_for`` divides into subspaces
for each processor. The general form of the constructor is
``blocked_range<T>(begin,end,grainsize)``. The ``T`` specifies the value
type. The arguments ``begin`` and ``end`` specify the iteration space
STL-style as a half-open interval [``begin``,\ ``end``). The argument
``grainsize`` is explained in the :ref:`Controlling_Chunking` section. The
example omits ``grainsize`` and so uses its default value of 1, because by
default ``parallel_for`` applies a heuristic that works well with that
value.

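To make the half-open convention and the role of the grain size
concrete, the following sketch halves an interval recursively until no
piece holds more than ``grainsize`` elements. It is a simplified model
written with the standard library only, not the library's
implementation, and the default heuristic mentioned above may stop
splitting earlier. ``HalfOpen``, ``SplitIntoChunks`` and ``Chunks`` are
hypothetical names. With ``n = 10`` and a grain size of 3, the model
yields the chunks [0,2), [2,5), [5,7) and [7,10): every index from 0 to 9
appears exactly once, and index 10 is never visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a half-open interval [begin, end).
struct HalfOpen {
    std::size_t begin, end;
    std::size_t size() const { return end - begin; }
};

// Halve the interval until each piece has at most `grainsize` elements.
inline void SplitIntoChunks(HalfOpen r, std::size_t grainsize,
                            std::vector<HalfOpen>& out) {
    assert(grainsize > 0);
    if (r.size() <= grainsize) {           // small enough: emit as one chunk
        if (r.size() > 0) out.push_back(r);
        return;
    }
    const std::size_t mid = r.begin + r.size() / 2;
    SplitIntoChunks(HalfOpen{r.begin, mid}, grainsize, out);
    SplitIntoChunks(HalfOpen{mid, r.end}, grainsize, out);
}

// Convenience wrapper: chunks of [0, n) in increasing order.
inline std::vector<HalfOpen> Chunks(std::size_t n, std::size_t grainsize) {
    std::vector<HalfOpen> out;
    SplitIntoChunks(HalfOpen{0, n}, grainsize, out);
    return out;
}
```
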
.. include:: parallel_for_toctree.rst

