.. _parallel_for:

parallel_for
============


Suppose you want to apply a function ``Foo`` to each element of an
array, and it is safe to process each element concurrently. Here is the
sequential code to do this:


::


   void SerialApplyFoo( float a[], size_t n ) {
       for( size_t i=0; i!=n; ++i )
           Foo(a[i]);
   }


The iteration space here is of type ``size_t``, and goes from ``0`` to
``n-1``. The template function ``oneapi::tbb::parallel_for`` breaks this iteration
space into chunks, and runs each chunk on a separate thread. The first
step in parallelizing this loop is to convert the loop body into a form
that operates on a chunk. The form is an STL-style function object,
called the *body* object, in which ``operator()`` processes a chunk. The
following code declares the body object.

::

   #include "oneapi/tbb.h"

   using namespace oneapi::tbb;

   class ApplyFoo {
       float *const my_a;
   public:
       // Processes one chunk [r.begin(), r.end()) of the iteration space.
       void operator()( const blocked_range<size_t>& r ) const {
           float *a = my_a;
           for( size_t i=r.begin(); i!=r.end(); ++i )
               Foo(a[i]);
       }
       ApplyFoo( float a[] ) :
           my_a(a)
       {}
   };


The ``using`` directive in the example enables you to use the library
identifiers without having to write out the namespace prefix ``oneapi::tbb``
before each identifier. The rest of the examples assume that such a
``using`` directive is present.


Note the argument to ``operator()``. A ``blocked_range<T>`` is a
template class provided by the library. It describes a one-dimensional
iteration space over type ``T``. Template function ``parallel_for`` works
with other kinds of iteration spaces too. The library provides
``blocked_range2d`` for two-dimensional spaces. You can define your own
spaces as explained in :ref:`Advanced_Topic_Other_Kinds_of_Iteration_Spaces`.


An instance of ``ApplyFoo`` needs member fields that remember all the
local variables that were defined outside the original loop but used
inside it. Usually, the constructor for the body object will initialize
these fields, though ``parallel_for`` does not care how the body object
is created. Template function ``parallel_for`` requires that the body
object have a copy constructor, which is invoked to create a separate
copy (or copies) for each worker thread. It also invokes the destructor
to destroy these copies. In most cases, the implicitly generated copy
constructor and destructor work correctly. If they do not, it is almost
always the case (as usual in C++) that you must define *both* to be
consistent.


Because the body object might be copied, its ``operator()`` should not
modify the body. Otherwise the modification might or might not become
visible to the thread that invoked ``parallel_for``, depending upon
whether ``operator()`` is acting on the original or a copy. As a
reminder of this nuance, ``parallel_for`` requires that the body
object's ``operator()`` be declared ``const``.


The example ``operator()`` loads ``my_a`` into a local variable ``a``.
Though not necessary, there are two reasons for doing this in the
example:


-  **Style**. It makes the loop body look more like the original.


-  **Performance**. Sometimes putting frequently accessed values into
   local variables helps the compiler optimize the loop better, because
   local variables are often easier for the compiler to track.


Once you have the loop body written as a body object, invoke the
template function ``parallel_for``, as follows:


::


   #include "oneapi/tbb.h"


   void ParallelApplyFoo( float a[], size_t n ) {
       parallel_for(blocked_range<size_t>(0,n), ApplyFoo(a));
   }


The ``blocked_range`` constructed here represents the entire iteration
space from 0 to n-1, which ``parallel_for`` divides into subspaces for
each processor. The general form of the constructor is
``blocked_range<T>(begin,end,grainsize)``. The ``T`` specifies the value
type. The arguments ``begin`` and ``end`` specify the iteration space
STL-style as a half-open interval [``begin``,\ ``end``). The argument
*grainsize* is explained in the :ref:`Controlling_Chunking` section. The
example uses the default grainsize of 1 because by default
``parallel_for`` applies a heuristic that works well with the default
grainsize.

.. include:: parallel_for_toctree.rst