Name Date Size #Lines LOC

..12-Feb-2024-

CMakeLists.txtH A D12-Feb-20241.3 KiB3527

Graph.cppH A D12-Feb-20242.8 KiB8863

Graph.hppH A D12-Feb-20242 KiB8633

Matrix.hppH A D12-Feb-20242 KiB6343

README.mdH A D12-Feb-20242.7 KiB4535

main.cppH A D12-Feb-20243.3 KiB9054

parallel_preorder.cppH A D12-Feb-20241.8 KiB5222

README.md

1# Parallel_preorder sample
2Example that uses `parallel_for_each` to do parallel preorder traversal of a sparse graph.
3
4Each vertex in the graph is called a "cell". Each cell has a value. The value is a matrix. Some of the cells have operators that compute the cell's value, using other cell's values as input. A cell that uses the value of cell `x` is called a successor of `x`.
5
6The algorithm works as follows.
7
81. Compute the set of cells that have no inputs. This set is called `root_set`.
92. Each cell has an associated field `ref_count` that is an atomic integer. Initialize `ref_count` to the number of inputs for the `Cell`.
103. Update each cell in `root_set`, by applying a `parallel_for_each` to a `root_set`.
114. After updating a cell, for each of its successors
12    1. Atomically decrement the successor's ref_count
13    2. If the count became zero, add the cell to the set of cells to be updated, by calling `feeder::add`.
14
15The times printed are for the traversal and update, and do not include time for computing the `root_set`.
16
17The example is using custom synchronization via ref_count atomic variable. Correctness checking tools might not take this into account, and report data races between different tasks that are actually synchronized.
18
19**Note:** It is important to understand that this example is unlikely to show speedup if the cell values are changed to type "float". The reason is twofold.
20* The smaller value type causes each `Cell` to be significantly smaller than a cache line, which leads to false sharing conflicts.
21* The time to update the cells becomes very small, and consequently the overhead of `parallel_for_each` swamps the useful work.
22
23## Building the example
24```
25cmake <path_to_example>
26cmake --build .
27```
28
29## Running the sample
30### Predefined make targets
31* `make run_parallel_preorder` - executes the example with predefined parameters
32* `make perf_run_parallel_preorder` - executes the example with suggested parameters to measure the oneTBB performance
33* `make light_test_parallel_preorder` - executes the example with suggested parameters to reduce execution time.
34
35### Application parameters
36Usage:
37```
38parallel_preorder [n-of-threads=value] [n-of-nodes=value] [n-of-traversals=value] [silent] [-h] [n-of-threads [n-of-nodes [n-of-traversals]]]
39```
40* `-h` - prints the help for command line options.
41* `n-of-threads` - the number of threads to use; a range of the form low\[:high\], where low and optional high are non-negative integers or `auto` for a platform-specific default number.
42* `n-of-nodes` - the number of nodes in the graph. Default value is 1000.
43* `n-of-traversals` - the number of times to evaluate the graph. Default value is 500.
44* `silent` - no output except elapsed time.
45