/oneTBB/test/conformance/
  conformance_concurrent_lru_cache.cpp
    42: preset::cache_type& cache = preset_object.cache;
    45: preset::handle_type h = cache[dummy_key];
    54: preset::cache_type& cache = preset_object.cache;
    60: preset::handle_type bull = cache["bull"];
    79: preset1::cache_type& cache = preset_instance.cache;
    85: handle = cache["handle"];
    87: preset1::handle_type foo = cache["bar"];
    107: preset::cache_type& cache = preset_object.cache;
    110: cache[dummy_key];
    111: cache[dummy_key];
    [all …]
|
/oneTBB/test/tbb/
  test_concurrent_lru_cache.cpp
    43: preset::cache_type& cache = preset_object.cache;
    47: dummy_f(dummy_key) == cache[dummy_key].value(),
    57: preset_object.cache[i];
    70: preset_object.cache[i];
    89: preset_object.cache[0];
    103: preset::handle_type h = preset_object.cache[0];
    117: preset::handle_type h = preset_object.cache[0];
    119: preset::handle_type h1 = preset_object.cache[0];
    135: cache_type cache{foo, 0};
    144: cache_type::handle h = cache[1];
    [all …]
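The tests above exercise `tbb::concurrent_lru_cache` through its constructor, `operator[]`, and the `handle` type it returns. As orientation, here is a minimal sketch of that usage pattern, assuming the preview macro `TBB_PREVIEW_CONCURRENT_LRU_CACHE` and the `oneapi/tbb/concurrent_lru_cache.h` header; `load_value` is a hypothetical value function.

```cpp
#define TBB_PREVIEW_CONCURRENT_LRU_CACHE 1
#include "oneapi/tbb/concurrent_lru_cache.h"

#include <iostream>
#include <string>

// Hypothetical value function: called once per key on a cache miss.
static std::string load_value(int key) {
    return "value-" + std::to_string(key);
}

int main() {
    using cache_type = tbb::concurrent_lru_cache<int, std::string, std::string (*)(int)>;
    cache_type cache(&load_value, /*number_of_lru_history_items=*/8);

    {
        cache_type::handle h = cache[1]; // miss: computes load_value(1)
        std::cout << h.value() << '\n';  // value stays alive while h exists
    } // handle released; the item joins the LRU history and may be evicted

    cache_type::handle h2 = cache[1];    // likely a hit: reuses the cached value
    std::cout << h2.value() << '\n';
}
```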
|
/oneTBB/doc/main/tbb_userguide/
  appendix_A.rst
    20: A more subtle cost is *cache cooling*. Processors keep recently accessed
    21: data in cache memory, which is very fast, but also relatively small
    22: compared to main memory. When the processor runs out of cache memory, it
    23: has to evict items from cache and put them back into main memory.
    24: Typically, it chooses the least recently used items in the cache. (The
    26: not a cache primer.) When a logical thread gets its time slice, as it
    28: into cache, taking hundreds of cycles. If it is referenced frequently
    30: cache, and only take a few cycles. Such data is called "hot in cache".
    33: to evict data that was hot in cache for A, unless both threads need the
    35: evicted data, at the cost of hundreds of cycles for each cache miss. Or
    [all …]
|
  Memory_Allocation.rst
    26: different words that share the same cache line. The problem is that a
    27: cache line is the unit of information interchange between processor
    28: caches. If one processor modifies a cache line and another processor
    29: reads the same cache line, the line must be moved from one processor
    32: cache lines can take hundreds of clocks to move.
    36: a separate cache line. Two objects allocated by
    55: cost in space, because it must allocate at least one cache line’s
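This excerpt describes false sharing and the space trade-off of `tbb::cache_aligned_allocator`, which starts every allocation on its own cache line. A minimal sketch of the pattern it implies, with a hypothetical `Counter` type:

```cpp
#include "oneapi/tbb/cache_aligned_allocator.h"
#include <new>

struct Counter { long value; };

int main() {
    tbb::cache_aligned_allocator<Counter> alloc;

    // Each allocate() call returns memory that begins on its own cache
    // line, so these two counters can never falsely share a line.
    Counter* a = new (alloc.allocate(1)) Counter{0};
    Counter* b = new (alloc.allocate(1)) Counter{0};

    // ... threads may update *a and *b concurrently without the line
    // ping-ponging between processor caches ...

    alloc.deallocate(a, 1);
    alloc.deallocate(b, 1);
}
```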
|
  Bandwidth_and_Cache_Affinity_os.rst
    11: cache. Restructuring to better utilize the cache usually benefits the
    17: grainsize, but also optimizes for cache affinity and tries to distribute
    25: - The data acted upon by the loop fits in cache.
    34: usually provides sufficient cache affinity.
    85: large to be carried in cache between loop invocations. The peak in the
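The page excerpted here covers `tbb::affinity_partitioner`, which records which thread ran each subrange and replays that mapping on later sweeps, so data still warm in a core's cache is revisited by the same thread. A minimal sketch, assuming a simple per-element update; the partitioner object must outlive the loop invocations it correlates:

```cpp
#include "oneapi/tbb/blocked_range.h"
#include "oneapi/tbb/parallel_for.h"
#include "oneapi/tbb/partitioner.h"
#include <cstddef>
#include <vector>

void relax(std::vector<float>& a, int sweeps) {
    // Reusing one affinity_partitioner across sweeps lets oneTBB replay
    // the previous mapping of subranges to threads.
    tbb::affinity_partitioner ap;
    for (int s = 0; s < sweeps; ++s) {
        tbb::parallel_for(
            tbb::blocked_range<std::size_t>(0, a.size()),
            [&](const tbb::blocked_range<std::size_t>& r) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    a[i] = 0.5f * a[i]; // hypothetical update
            },
            ap);
    }
}
```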
|
  Partitioner_Summary.rst
    29: - Automatic chunk size, cache affinity and uniform distribution of iterations.
    32: …- Deterministic chunk size, cache affinity and uniform distribution of iterations without loa…
    57: - A large subrange might use cache inefficiently. For example, suppose
    60: repeatedly referenced memory locations to fit in cache.
|
  When_Not_to_Use_Queues.rst
    23: value (and whatever it references) becomes "cold" in cache. Or worse
    31: cache.
|
  How_Task_Scheduler_Works.rst
    23: - **Strike when the cache is hot**. The deepest tasks are the most recently created tasks and there…
    24: … completed, tasks that depend on it can continue executing, and though not the hottest in a cache,
|
  Throughput_of_pipeline.rst
    32: cache. A good guideline is to try for a large window size that still
    33: fits in cache. You may have to experiment a bit to find a good window
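The "window size" discussed here is the pipeline's token limit: the maximum number of items in flight at once. A minimal sketch, assuming a two-stage pipeline over a hypothetical `data` buffer; `ntokens` is the knob to tune so the in-flight working set still fits in cache:

```cpp
#include "oneapi/tbb/parallel_pipeline.h"
#include <cstddef>
#include <vector>

void run(std::vector<double>& data) {
    const std::size_t ntokens = 8; // hypothetical window size; tune against cache size
    std::size_t next = 0;
    tbb::parallel_pipeline(
        ntokens,
        // Input stage: serial, hands out one item per token.
        tbb::make_filter<void, double*>(
            tbb::filter_mode::serial_in_order,
            [&](tbb::flow_control& fc) -> double* {
                if (next == data.size()) {
                    fc.stop();
                    return nullptr;
                }
                return &data[next++];
            }) &
        // Processing stage: runs in parallel on items in flight.
        tbb::make_filter<double*, void>(
            tbb::filter_mode::parallel,
            [](double* item) { *item *= 2.0; }));
}
```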
|
  Advanced_Topic_Other_Kinds_of_Iteration_Spaces.rst
    72: loop to be "recursively blocked" in a way that improves cache usage.
    73: This nice cache behavior means that using ``parallel_for`` over a
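The "recursive blocking" mentioned here comes from `tbb::blocked_range2d`: `parallel_for` splits both dimensions, so each task lands on a roughly square tile that reuses cache lines better than striding across whole rows. A minimal sketch over a hypothetical row-major matrix:

```cpp
#include "oneapi/tbb/blocked_range2d.h"
#include "oneapi/tbb/parallel_for.h"
#include <cstddef>
#include <vector>

void scale(std::vector<float>& m, std::size_t nrows, std::size_t ncols) {
    tbb::parallel_for(
        tbb::blocked_range2d<std::size_t>(0, nrows, 0, ncols),
        [&](const tbb::blocked_range2d<std::size_t>& r) {
            // Each task sees a 2D tile; iterating it touches a compact
            // block of memory instead of whole rows.
            for (std::size_t i = r.rows().begin(); i != r.rows().end(); ++i)
                for (std::size_t j = r.cols().begin(); j != r.cols().end(); ++j)
                    m[i * ncols + j] *= 2.0f;
        });
}
```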
|
  Mutex_Flavors.rst
    123: - 2 cache lines
    147: - 3 cache lines
|
/oneTBB/test/common/
  concurrent_lru_cache_common.h
    167: cache_type cache;
    171: cache(callback, number_of_lru_history_items) {};
    178: cache_type cache;
    185: cache(&callback, number_of_lru_history_items) {};
    196: cache_type cache;
    200: cache(&counter_type::call, number_of_lru_history_items) {}
    214: cache_type cache;
    219: cache(cloner, number_of_lru_history_items) {}
    233: cache_type cache;
    236: cache(map_searcher_type(objects_map), number_of_lru_history_items) {}
    [all …]
|
/oneTBB/doc/main/tbb_userguide/design_patterns/
  Agglomeration.rst
    68: - Minimizing cache traffic between blocks.
    72: be independent too, and then only cache traffic issues must be
    91: When agglomerating, think about cache effects. Avoid having cache
    102: cache line units. For a given area, the perimeter may be minimized
    103: when the block is square with respect to the underlying grid of cache
    152: between tasks. When using oneTBB, communication is usually cache
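The pattern excerpted here is about batching small work items into blocks large enough to amortize scheduling overhead and cache traffic. One way to express it in oneTBB, sketched under the assumption of a simple per-element loop, is an explicit grainsize with `simple_partitioner`, which keeps chunk sizes between roughly half the grainsize and the grainsize:

```cpp
#include "oneapi/tbb/blocked_range.h"
#include "oneapi/tbb/parallel_for.h"
#include "oneapi/tbb/partitioner.h"
#include <cstddef>

void update(float* a, std::size_t n) {
    const std::size_t grain = 1000; // hypothetical; tune so a block's data fits in cache
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, n, grain),
        [&](const tbb::blocked_range<std::size_t>& r) {
            // One agglomerated block of iterations per task.
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                a[i] += 1.0f;
        },
        tbb::simple_partitioner());
}
```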
|
  Fenced_Data_Transfer.rst
    160: does not read individual values from main memory. It reads cache
    161: lines. The target of a pointer may be in a cache line that has
|
/oneTBB/doc/main/reference/
  concurrent_lru_cache_cls.rst
    9: A Class Template for Least Recently Used cache with concurrent operations.
    91: **Effects**: Constructs an empty cache that can keep up to ``number_of_lru_history_items``
|
/oneTBB/examples/parallel_for_each/parallel_preorder/
  README.md
    20: * The smaller value type causes each `Cell` to be significantly smaller than a cache line, which le…
|
/oneTBB/
  CMakeLists.txt
    26: …# CMAKE_<LANG>_FLAGS_<CONFIG> cache entries and use CMAKE_MSVC_RUNTIME_LIBRARY abstraction instead.
|
/oneTBB/doc/
  Doxyfile.in
    413: # The size of the symbol lookup cache can be set using LOOKUP_CACHE_SIZE. This
    414: # cache is used to resolve symbols given their name and scope. Since this can be
    416: # code, doxygen keeps a cache of pre-resolved symbols. If the cache is too small
    417: # doxygen will become slower. If the cache is too large, memory is wasted. The
    418: # cache size is given by this formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range
    419: # is 0..9, the default is 0, corresponding to a cache size of 2^16=65536
    420: # symbols. At the end of a run doxygen will report the cache usage and suggest
    421: # the optimal cache size from a speed point of view.
|