Home
last modified time | relevance | path

Searched refs:is (Results 1 – 25 of 259) sorted by relevance

1234567891011

/oneTBB/doc/
H A DDoxyfile.in6 # All text after a double hash (##) is considered a comment and is placed in
10 # The format is:
21 # that follow. The default is UTF-8 which is also the encoding used for all text
31 # project for which the documentation is generated. This name is used in the
39 # control system is used.
116 # text. Otherwise, the brief description is used as-is. If left blank, the
419 # is 0..9, the default is 0, corresponding to a cache size of 2^16=65536
704 # tag is left empty.
791 # Note: If this tag is empty the current directory is searched.
1212 # 0 represents red, 60 is yellow, 120 is green, 180 is cyan, 240 is blue, 300
[all …]
/oneTBB/doc/main/tbb_userguide/
H A DWhen_Task-Based_Programming_Is_Inappropriate.rst7 Using the task scheduler is usually the best approach to threading for
8 performance, however there are cases when the task scheduler is not
9 appropriate. The task scheduler is intended for high-performance
11 rarely block. However, if threads block frequently, there is a
13 is blocked, it is not working on any tasks. Blocking typically occurs
15 mutexes for long periods, your code is not likely to perform well
17 it is best to use full-blown threads for those. The task scheduler is
H A DConcurrent_Queue_Classes.rst9 push and pop elements from the queue. The queue is unbounded and has no
12 std::queue. The operation ``try_pop`` pops an item if it is available.
39 The equivalent thread-safe |full_name| code is:
54 definition of "first" is uncertain. Use of ``concurrent_queue``
62 or to wait for the queue to become non-empty. Typically this is
66 Template class ``concurrent_bounded_queue<T,Alloc>`` is a variant that
88 ``concurrent_queue`` is empty, and there are ``n`` pending pop
91 ``empty()`` is defined to be true if and only if ``size()`` is not
98 causes ``push`` to block until there is room in the queue. Bounded
99 queues are slower than unbounded queues, so if there is a constraint
[all …]
H A Duse_input_node.rst7 By default, an ``input_node`` is constructed in the inactive state:
29 activated after the entire flow graph is constructed.
33 the ``input_node`` is constructed in the inactive state and activated after
51 squarer is connected. Later, when the edge to cuber is connected, cuber
55 In general it is safest to create your ``input_node`` objects in the inactive
56 state and then activate them after the whole graph is constructed.
62 the overlap of construction and execution. If your graph is a directed
65 edges in reverse topological order; that is, make the edges at the
69 src is activated just after its construction:
105 The above code is safe because the edge from ``func1`` to ``func2`` is made
[all …]
H A DMutual_Exclusion.rst9 exclusion is implemented by *mutexes* and *locks.* A mutex is an object
14 The simplest mutex is ``spin_mutex``. A thread trying to acquire a lock
16 ``spin_mutex`` is appropriate when the lock is held for only a few
51 inside routine ``AllocateNode`` may look unusual. Their role is to keep
59 the example is changed to
65 then the ``scoped_lock`` is destroyed when execution reaches the
93 It is recommended that you add extra braces where possible, to clarify
94 to maintainers which code is protected by the lock.
99 itself. The reason is that the C interface would not be exception safe,
113 type ``M``, the corresponding lock type is ``M::scoped_lock``.
[all …]
H A DControlling_Chunking_os.rst7 Chunking is controlled by a *partitioner* and a *grainsize.*\ To gain
16 argument form of the constructor is
18 ``grainsize`` is 1. It is in units of loop iterations per chunk.
25 The following code is the last example from parallel_for, modified to
48 There is also an intermediate level of control where you specify the
50 ``affinity_partitioner``. An ``auto_partitioner`` is the default
62 Because of the impact of grainsize on parallel loops, it is worth
101 ``grainsize``\ =100,000. The rationale is that each iteration
113 A drawback of setting a grainsize too high is that it can reduce
143 that with a grainsize of one, most of the overhead is parallel
[all …]
H A DNodes.rst7 A node is a class that inherits from oneapi::tbb::flow::graph_node and also
14 While it is possible to define your own node types by inheriting from
15 graph_node, sender and receiver, it is more typical that predefined node
19 A ``function_node`` is a predefined type available in ``flow_graph.h`` and
50 Below is code for creating a simple graph that contains a single
51 function_node. In this example, a node n is constructed that belongs to
53 invocation of the node to occur concurrently. The body is a lambda
56 for the function spin_for is not provided.
71 After the node is constructed in the example above, you can pass
73 invoking its function try_put. Using edges is described in the next
[all …]
H A DPredefined_Node_Types.rst8 class sender and class receiver but it is likely that you can create
10 flow_graph.h. Below is a table that lists all of the predefined types
26 …When activated, it executes a user body to generate its output. Its body is invoked if downstream …
27 …Otherwise, the previous output is temporarily buffered until it is accepted downstream and then th…
29 …ntrollable concurrency level and buffering policy. For each input exactly one output is returned.
31 …utes a user body when it receives N continue_msg objects at its input. N is equal to the numbe…
39is a tuple of these generic types. The node combines one message from each input port to create …
41 …- A single-input, multi-output node. The input type is a tuple of generic types and there is on…
43 …sors. After broadcast, the nodes retain the last message received, so it is available to any f…
47 …l of its successors. The input type is a list of generic types and the output type is a tagged_m…
H A Dappendix_A.rst8 threads. Each logical thread is serviced for a *time slice* by a
14 The most obvious is the time for *context switching* between logical
20 A more subtle cost is *cache cooling*. Processors keep recently accessed
21 data in cache memory, which is very fast, but also relatively small
25 reality of set-associative caches is a bit more complicated, but this is
28 into cache, taking hundreds of cycles. If it is referenced frequently
30 cache, and only take a few cycles. Such data is called "hot in cache".
40 Another cost is *lock preemption.* This happens if a thread acquires a
43 it is now going to hold it for at least as long as it takes for its next
46 effect is called *convoying*, because the threads end up "bumper to
H A DNon-Linear_Pipelines.rst25 parallelism is lost by forcing a linear order on the filters, but in
26 fact the only loss is in the *latency* of the pipeline, not the
27 throughput. The latency is the time it takes a token to flow from the
29 processors, the latency of the original non-linear pipeline is three
30 filters. This is because filters A and B could process the token
42 In the linear pipeline, the latency is five filters. The behavior of
49 throughput is still limited by the throughput of the slowest serial
52 The linear limitation of ``parallel_pipeline`` is a good tradeoff of
H A Dcatching_exceptions.rst8 normally, as you might expect. If an exception is thrown but is not
10 of the graph's nodes are canceled and the exception is rethrown at the
41 that is not caught within the body. This will cause the execution of the
43 g.wait_for_all(). Since it is not handled there either, the program will
62 If the exception is caught and handled in the body, then there is no
77 In this case, the execution of the graph is canceled. For our example,
H A Dcancelling_nested_parallelism.rst7 Nested parallelism is canceled if the inner context is bound to the
8 outer context; otherwise it is not.
11 If the execution of a flow graph is canceled, either explicitly or due
20 graph, it is created with an isolated context by default.
H A DCook_Until_Done_parallel_do.rst7 For some loops, the end of the iteration space is not known in advance,
12 A linked list is an example of an iteration space that is not known in
13 advance. In parallel programming, it is usually better to use dynamic
15 is inherently serial. But if you are limited to linked lists, the items
36 qualified ``operator()``. This is similar to a C++ function object from
52 The parallel form of ``SerialApplyFooToList`` is as follows:
66 ``parallel_for_each`` unscalable, because the fetching of work is
80 a node in a tree is a prerequisite to processing its descendants.
H A DTask-Based_Programming.rst8 poor way to do multithreaded programming. It is much better to formulate
34 when there is exactly one running logical thread per physical thread.
47 The key advantage of tasks versus logical threads is that tasks are much
49 terminating a task is about 18 times faster than starting and
50 terminating a thread. On Windows systems, the ratio is more than 100.
51 This is because a thread has its own copy of a lot of resources, such as
53 id. A task in |full_name|, in contrast, is
59 distribute time slices in a round-robin fashion. This distribution is
61 Thread schedulers are typically fair because it is the safest strategy
70 number of threads, it is important to distribute work evenly across
[all …]
H A DParallelizing_Flow_Graph.rst14 edges. When a node in the graph receives a message, a task is spawned to
22 where a sequence of values is processed as each value passes through the
23 nodes in the graph. In this example, the sequence is created by a
27 completely processed, sum is equal to the sum of the sequence of squares
47 this example, a dependence graph is used to establish a partial ordering
54 together. This is a partial ordering because, for example, it doesn't
74 jar, are shared between ordered steps, it is not explicit in the graph.
75 Instead, only the required ordering of steps is explicit in a dependence
84 the runtime library spawns tasks to exploit the parallelism that is
90 is legal to execute in parallel, but allows the runtime library to
[all …]
H A DDependence_Graph.rst10 execute the body objects when it is legal to do so according to the
33 task to execute their body when this count is equal to the total number
42 complete before D and F start executing. This is a partial ordering
43 because, for example, there is no explicit ordering requirement between
71 The first argument is the graph it belongs to and the second is a
73 continue_node is always assumed to have unlimited concurrency and will
77 The following code snippet is an implementation of the example in this
108 One possible execution of this graph is shown below. The execution of D
126 Again, it is important to note that all execution in the flow graph
133 oneTBB work pool while it is waiting.
[all …]
/oneTBB/doc/main/tbb_userguide/design_patterns/
H A DReference_Counting.rst22 Often it is desirable to destroy an object when it is known that it
23 will not be used in the future. Reference counting is a common serial
33 - If there are cycles of references, basic reference counting is
34 insufficient unless the cycle is explicitly broken.
37 - Atomic counting is relatively expensive in hardware.
46 Thread-safe reference counting is like serial reference counting,
47 except that the increment/decrement is done atomically, and the
48 decrement and test "count is zero?" must act as a single atomic
75 It is incorrect to use a separate read for testing if the count is
91 writes complete before the object is deleted.
[all …]
H A DAgglomeration.rst13 Parallelism is so fine grained that overhead of parallel scheduling
27 if each scalar addition is scheduled as a separate task, most of the
42 - The parallelism is for sake of performance and not required for
62 The choice of block topology is typically driven by two concerns:
76 If the loop is "small", on the order of less than 10,000 clock
101 analysis, be careful to consider that information is transferred in
119 so the solution is to treat subtrees as a groups as shown in the
130 Often such an agglomeration is achieved by recursing serially once
144 There agglomeration is part of a four step **PCAM** design method:
151 #. **C**\ ommunication – figure out what communication is required
[all …]
H A DDivide_and_Conquer.rst22 Divide and conquer is widely used in serial algorithms. Common
36 - Splitting problem or merging solutions is relatively cheap
64 Quicksort is a classic divide-and-conquer algorithm. It divides a
83 provides a simple way to parallelize it. The parallel code is shown
101 Eventually the subsorts become small enough that serial execution is
121 The change is an instance of the Agglomeration pattern.
158 A parallel version is shown below.
191 The recursive walk is parallelized using class ``task_group`` to do
196 is introduced. Because it would be unsafe to update ``Hits``
198 accumulate results. Because it is of type
[all …]
H A DElementwise.rst25 summary information is collected, use the Reduction pattern instead.
34 No information is carried or merged between the computations.
43 If the number of items is known in advance, use
52 If the pattern is followed by a reduction on the same data, consider
54 the combination of the two patterns is accomplished in a single sweep
65 Convolution is often used in signal processing. The convolution of a
66 filter ``c`` and signal ``x`` is computed as:
85 For simplicity, the fragment assumes that ``x`` is a pointer into an
92 fits the elementwise pattern. It is straightforward to render it
109 there is reason to agglomerate explicitly, use the overload of
/oneTBB/
H A DBazel.md3 The main build system of oneTBB is CMake*.
4 [Bazel*](https://bazel.build/) support is community-based.
10 Bazel is not recommended for use by oneTBB maintainers. Thus, it is not used internally.
15 The Bazel oneTBB build is currently only intended for a subset of oneTBB that suffices restricted u…
18 The standard Bazel approach to handling third-party libraries is static linking. It is the best pra…
24 The following file structure is assumed:
45 In the *WORKSPACE* file, the oneTBB GitHub* repository is fetched.
77 The expected output of this program is the current version of oneTBB.
88 This flag is supported by the GNU* Compiler Collection (GCC) version 9.3, Clang* 12, and newer vers…
/oneTBB/examples/parallel_for/polygon_overlay/
H A DREADME.md4 This example is a simple implementation of polygon overlay, as described in Parallelizing the [Poly…
9is split into submaps, with each resulting submap being intersected against the corresponding subm…
11is that the area of the generated sub-polygons are subtracted from the original area of one of the…
15 * the fact that for each submap, the number of polygons is smaller than that for the other two case…
17is approximately 80,000 (400 * 200, where 200 is the average number of polygons examined before st…
25is, if we need `N` polygons, then `N` "boxes" are chosen at random, then one-at-a-time the areas a…
27 One limitation of the program is that if the number of polygons in the source map is greater than t…
35 … on Windows, `x`,`con` on Linux and `mac`,`con` on macOS. The default mode is `con`. See the [comm…
/oneTBB/examples/graph/cholesky/
H A DREADME.md4 …n or threaded implementation depending on the version of the oneMKL library that is linked against.
6 …hat uses the Crout-Cholesky algorithm for factorization. The same approach is parallelized for the…
8is similar to the serial implementation is used to create an unrolled version of the computation. …
10is a small, compact graph that passes tiles along its edges. There is one node per type of oneMKL …
33 …efix>_posdef.txt and <output_prefix>_X.txt; where `X` is the algorithm used. If `output_prefix` is
/oneTBB/examples/parallel_for_each/parallel_preorder/
H A DREADME.md4is called a "cell". Each cell has a value. The value is a matrix. Some of the cells have operators…
8 1. Compute the set of cells that have no inputs. This set is called `root_set`.
9 2. Each cell has an associated field `ref_count` that is an atomic integer. Initialize `ref_count` …
17 The example is using custom synchronization via ref_count atomic variable. Correctness checking too…
19 …te:** It is important to understand that this example is unlikely to show speedup if the cell valu…
42 * `n-of-nodes` - the number of nodes in the graph. Default value is 1000.
43 * `n-of-traversals` - the number of times to evaluate the graph. Default value is 500.
/oneTBB/doc/main/reference/
H A Dreference.rst7 version of oneAPI Specification is 1.0.
26 A preview feature is a component of oneTBB introduced to receive early feedback from
31 - It is off by default and must be explicitly enabled.
32 - It is intended to have a high quality implementation.
33 - There is no guarantee of future existence or compatibility.
38 A preview feature is subject to change in future. It might be removed or significantly
41 is strongly discouraged.

1234567891011