|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
17dcde5f |
| 18-May-2022 |
AndreyChurbanov <[email protected]> |
[OpenMP][libomp] Allow reset affinity mask after parallel
Added control to reset affinity of primary thread after outermost parallel region to initial affinity encountered before OpenMP runtime was initialized. KMP_AFFINITY environment variable reset/noreset modifier introduced. Default behavior is unchanged.
Differential Revision: https://reviews.llvm.org/D125993
|
| #
a01d274f |
| 19-May-2022 |
AndreyChurbanov <[email protected]> |
[OpenMP][libomp] Fix /dev/shm pollution after forked child process terminates
Made library registration conditional: it is skipped in the __kmp_atfork_child handler and postponed until middle initialization in the child. This fixes the problem for applications that use, e.g., popen/pclose, which terminate the forked child process.
Differential Revision: https://reviews.llvm.org/D125996
|
| #
d4a7b8de |
| 24-Jun-2022 |
Daniel Douglas <[email protected]> |
[OpenMP][libomp] avoid spin wait and yield on arm64 macOS
This patch changes the default behavior to avoid spin waiting and yielding. (See “Don’t Keep Threads Active And Idle” section here: https://developer.apple.com/documentation/apple-silicon/tuning-your-code-s-performance-for-apple-silicon)
We verified using instruments traces that the changes improve scheduling behavior on macOS.
We also collected results using EPCC schedbench (https://github.com/LangdalP/EPCC-OpenMP-micro-benchmarks) that are attached here; they show a reduction in standard deviation and max test run time across all scheduling types. Static scheduling sees dramatic improvements with these changes: a 2-4x average runtime improvement in the benchmark.
Differential Revision: https://reviews.llvm.org/D126510
|
| #
b7b49865 |
| 05-May-2022 |
Jonathan Peyton <[email protected]> |
[OpenMP][libomp] Hold old __kmp_threads arrays until library shutdown
When many nested teams are formed, __kmp_threads may be reallocated to accommodate new threads. This reallocation causes a data race when another existing team's thread simultaneously references __kmp_threads. This patch keeps the old thread arrays around until library shutdown so these lingering references can complete without issue and access to __kmp_threads remains a simple array reference.
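The idea of retiring old arrays instead of freeing them can be sketched as follows. This is a minimal illustration, not libomp's code: `thread_table_t`, `table_grow`, and `table_shutdown` are hypothetical names, and the atomic publication of the new array that a real concurrent runtime needs is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; none of these names come from libomp. */
typedef struct retired_array {
  void **data;               /* an old array that readers may still hold */
  struct retired_array *next;
} retired_array_t;

typedef struct {
  void **slots;              /* current array, accessed as slots[i] */
  size_t capacity;
  retired_array_t *retired;  /* old arrays kept alive until shutdown */
} thread_table_t;

/* Grow the table, keeping the previous array allocated.
   Returns 0 on success, -1 on bad argument or allocation failure.
   Note: a real runtime would publish t->slots atomically or under a lock. */
int table_grow(thread_table_t *t, size_t new_capacity) {
  assert(t != NULL);
  if (new_capacity <= t->capacity)
    return -1;
  void **fresh = calloc(new_capacity, sizeof(void *));
  retired_array_t *node = malloc(sizeof(*node));
  if (fresh == NULL || node == NULL) {
    free(fresh);
    free(node);
    return -1;
  }
  if (t->slots != NULL)
    memcpy(fresh, t->slots, t->capacity * sizeof(void *));
  /* Retire rather than free: a lingering reader may still index the old array. */
  node->data = t->slots;
  node->next = t->retired;
  t->retired = node;
  t->slots = fresh;
  t->capacity = new_capacity;
  return 0;
}

/* Free everything at shutdown, when no reader can remain. */
void table_shutdown(thread_table_t *t) {
  assert(t != NULL);
  while (t->retired != NULL) {
    retired_array_t *node = t->retired;
    t->retired = node->next;
    free(node->data);
    free(node);
  }
  free(t->slots);
  t->slots = NULL;
  t->capacity = 0;
}
```

The cost of this approach is memory that stays allocated until shutdown; in exchange, readers keep using a plain array index with no locking on the access path.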
Fixes: https://github.com/llvm/llvm-project/issues/54708
Differential Revision: https://reviews.llvm.org/D125013
|
| #
c44ba01d |
| 18-May-2022 |
AndreyChurbanov <[email protected]> |
[OpenMP] libomp: honor passive wait policy requested with tasking
Currently the library ignores the requested wait policy in the presence of tasking; threads always actively spin. The patch fixes this problem by making the wait policy passive if this is explicitly requested by the user.
Differential Revision: https://reviews.llvm.org/D123044
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
840c0404 |
| 06-Apr-2022 |
Joseph Huber <[email protected]> |
[OpenMP] Change target memory tests to use allocators
The target allocators have been supported for NVPTX offloading for a while. The tests should use the allocators instead of calling the functions manually. The comments describing these as a preview should also be removed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D123242
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
2e02579a |
| 14-Dec-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] Add use of TPAUSE
Add use of TPAUSE (from WAITPKG) to the runtime for Intel hardware, with an environment variable to turn it on in a particular C-state. TPAUSE is always used if it is selected and the Intel hardware supports WAITPKG; otherwise the runtime falls back to the old way of checking __kmp_use_yield, etc.
Differential Revision: https://reviews.llvm.org/D115758
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
50b68a3d |
| 15-Sep-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP][host runtime] Add support for teams affinity
This patch implements teams affinity on the host. The default is spread. A user can specify either spread, close, or primary using KMP_TEAMS_PROC_BIND environment variable. Unlike OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a list of values. The values follow the same semantics under the OpenMP specification for parallel regions except T is the number of teams in a league instead of the number of threads in a parallel region.
Differential Revision: https://reviews.llvm.org/D109921
|
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
d8e4cb91 |
| 15-Jul-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier
Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier.
This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently.
The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required:
KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist
Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier.
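For a program that cannot rely on its launch environment, the three settings above could be applied from code. The sketch below is only an illustration: `request_dist_barrier` and `dist_barrier_requested` are hypothetical helpers, `setenv` is the POSIX call, and it assumes the runtime reads these variables when it initializes, so the call would have to happen before the first OpenMP construct.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The three variables named in the commit message; all must use "dist". */
static const char *const kBarrierVars[] = {
    "KMP_FORKJOIN_BARRIER_PATTERN",
    "KMP_PLAIN_BARRIER_PATTERN",
    "KMP_REDUCTION_BARRIER_PATTERN",
};
enum { kNumBarrierVars = sizeof(kBarrierVars) / sizeof(kBarrierVars[0]) };

/* Hypothetical helper: returns 1 if all three patterns are "dist,dist". */
int dist_barrier_requested(void) {
  for (int i = 0; i < kNumBarrierVars; ++i) {
    const char *value = getenv(kBarrierVars[i]);
    if (value == NULL || strcmp(value, "dist,dist") != 0)
      return 0;
  }
  return 1;
}

/* Hypothetical helper: sets all three patterns, overwriting existing values.
   Assumed to be called before the OpenMP runtime initializes.
   Returns 0 on success, -1 if setenv() fails. */
int request_dist_barrier(void) {
  for (int i = 0; i < kNumBarrierVars; ++i) {
    if (setenv(kBarrierVars[i], "dist,dist", 1) != 0)
      return -1;
  }
  assert(dist_barrier_requested()); /* sanity check: all three now agree */
  return 0;
}
```

Setting all three together mirrors the requirement that the distributed barrier cannot be mixed with other barrier patterns.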
Patch fixed for ITTNotify disabled builds and non-x86 builds
Co-authored-by: Jonathan Peyton <[email protected]>
Co-authored-by: Vladislav Vinogradov <[email protected]>
Differential Revision: https://reviews.llvm.org/D103121
|
| #
4eb90e89 |
| 29-Jun-2021 |
Johannes Doerfert <[email protected]> |
Revert "[OpenMP] Add Two-level Distributed Barrier"
This reverts commit 25073a4ecfc9b2e3cb76776185e63bfdb094cd98.
This has broken non-x86 OpenMP builds for a while now. Until a solution is ready to be upstreamed, we revert the feature and unblock those builds. See: https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 and https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821
The currently proposed fix (D104788) seems not to be ready yet: https://reviews.llvm.org/D104788#2841928
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
25073a4e |
| 21-May-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] Add Two-level Distributed Barrier
Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier.
This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently.
The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required:
KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist
Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier.
Differential Revision: https://reviews.llvm.org/D103121
|
| #
8ec9aa23 |
| 10-May-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] Add experimental nesting mode feature
Nesting mode is a new experimental feature in the OpenMP runtime. It allows a user to set up nesting for an application in a way that corresponds to the hardware topology levels on the machine an application is being run on. For example, if a machine has 2 sockets, each with 12 cores, then use of nesting mode could set up an outer level of nesting that uses 2 threads per parallel region, and an inner level of nesting that uses 12 threads per parallel region.
Nesting mode is controlled with the KMP_NESTING_MODE environment variable as follows:
1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var is set to 1 (the default -- nesting is off, nested parallel regions are serialized).
2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads will be assigned for each level discovered in the machine topology; max-active-levels-var is set to the number of levels discovered.
3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change or be removed in the future.] Nesting mode is on, and a number of threads will be assigned for each topology level discovered on the machine, up to k<=n levels (since there may be fewer than n levels discovered in the topology), and beyond the kth level, nested parallel regions will be serialized; NOTE: max-active-levels-var is 1 (the default -- nesting is off, and nested parallel regions are serialized until the user changes max-active-levels-var).
If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will override KMP_NESTING_MODE settings for the associated environment variables. The detected topology may be limited by an affinity mask setting on the initial thread, or if the user sets KMP_HW_SUBSET. See also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for nested parallel regions. Note that this feature only sets numbers of threads used at nesting levels. The user should make use of OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those threads, if desired.
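The three KMP_NESTING_MODE cases can be modeled with a small function. This is a sketch of the rules as described above, not the runtime's implementation: `plan_nesting` is a hypothetical name, the topology is assumed to be given as per-level ratios (for example 2 sockets, then 12 cores per socket), and the handling of max-active-levels-var, OMP_NUM_THREADS, and OMP_MAX_ACTIVE_LEVELS is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of KMP_NESTING_MODE.
   ratios[i] is the number of units at topology level i per unit of the
   level above it, e.g. {2, 12} for 2 sockets with 12 cores each.
   Writes the thread count for each nesting level into threads_per_level
   and returns how many levels received a count:
     mode <= 0 : nesting mode off, no levels assigned
     mode == 1 : one nesting level per discovered topology level
     mode  > 1 : at most `mode` levels (fewer if the topology is shallower) */
size_t plan_nesting(int mode, const int *ratios, size_t depth,
                    int *threads_per_level) {
  assert(ratios != NULL && threads_per_level != NULL);
  if (mode <= 0)
    return 0;
  size_t levels = depth;
  if (mode > 1 && (size_t)mode < depth)
    levels = (size_t)mode;
  for (size_t i = 0; i < levels; ++i)
    threads_per_level[i] = ratios[i];
  return levels;
}
```

For the example in the text, `ratios = {2, 12}` with mode 1 yields an outer level of 2 threads and an inner level of 12 threads.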
Differential Revision: https://reviews.llvm.org/D102188
|
| #
9982f33e |
| 16-Apr-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP] Refactor/Rework topology discovery code
This patch does the following:
1) Introduce kmp_topology_t as the runtime-friendly structure (the corresponding global variable is __kmp_topology) to determine the exact machine topology which can vary widely among current and future architectures. The current design is not easy to expand beyond the assumed three layer topology: sockets, cores, and threads so a rework capable of using the existing KMP_AFFINITY mechanisms is required.
This new topology structure has:
* The depth and types of the topology
* Ratio count for each consecutive level (e.g., number of cores per socket, number of threads per core)
* Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
* Equivalent topology layer map (e.g., Numa domain is equivalent to socket, L1/L2 cache equivalent to core)
* Whether it is uniform or not
The hardware threads are represented with the kmp_hw_thread_t structure. This structure contains the ids (e.g., socket 0, core 1, thread 0) and other information grabbed from the previous Address structure. The kmp_topology_t structure contains an array of these.
2) Generalize the KMP_HW_SUBSET environment variable for the new kmp_topology_t structure. The algorithm doesn't assume any order among tiles, NUMA domains, sockets, cores, and threads. Instead it parses the environment variable, makes sure it is consistent with the detected topology (taking equivalent layers into account), and then trims away the unneeded subset of hardware threads. To enable this, a new kmp_hw_subset_t structure is introduced, which contains a vector of items (hardware type, number the user wants, offset). Any keyword within __kmp_hw_get_keyword() can be used as a name and can be shortened as well; e.g., KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.
3) Simplify topology detection functions so they only do the singular task of detecting the machine's topology. Printing, and all canonicalizing functionality is now done afterwards. So many lines of duplicated code are eliminated.
4) Add new ll_caches and numa_domains to OMP_PLACES, and consequently, KMP_AFFINITY's granularity setting. All the names within __kmp_hw_get_keyword() are available for use in OMP_PLACES or KMP_AFFINITY's granularity setting.
5) Simplify and future-proof code where explicit lists of allowed affinity-setting keywords appear inside if() conditions.
6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method so equivalent caches could be detected (in particular for the ll_caches place).
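The first step of item 2, splitting a KMP_HW_SUBSET value such as `1s,2numa,4tile,2c,3t` into (count, name) pairs, might look like the sketch below. It is an assumption-laden illustration rather than the runtime's parser: `hw_item_t` and `parse_hw_subset` are hypothetical names, names are assumed to consist of letters only, and both the offset field and the matching of shortened names against __kmp_hw_get_keyword() are omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define HW_NAME_MAX 16

/* Hypothetical item: how many units of a (possibly abbreviated) layer name. */
typedef struct {
  int count;
  char name[HW_NAME_MAX];
} hw_item_t;

/* Parse "1s,2numa,4tile" into items, in the order given.
   Returns the number of items, or -1 on malformed input or overflow. */
int parse_hw_subset(const char *s, hw_item_t *items, int max_items) {
  assert(s != NULL && items != NULL);
  int n = 0;
  while (*s != '\0') {
    if (n == max_items || !isdigit((unsigned char)*s))
      return -1;
    char *end;
    long count = strtol(s, &end, 10);
    if (count <= 0 || count > INT_MAX)
      return -1;
    size_t len = 0;
    while (isalpha((unsigned char)end[len]))
      ++len;
    if (len == 0 || len >= HW_NAME_MAX)
      return -1;
    memcpy(items[n].name, end, len);
    items[n].name[len] = '\0';
    items[n].count = (int)count;
    ++n;
    s = end + len;
    if (*s == ',') {
      ++s;
      if (*s == '\0')
        return -1; /* trailing comma */
    } else if (*s != '\0') {
      return -1; /* unexpected character after the name */
    }
  }
  return n;
}
```

Because the parser keeps the items in the order written and attaches no meaning to the names, a later pass is free to check them against whatever topology was detected, which matches the order-independent design the item describes.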
Differential Revision: https://reviews.llvm.org/D100997
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3 |
|
| #
97d000cf |
| 05-Mar-2021 |
tlwilmar <[email protected]> |
Added API for "masked" construct via two entrypoints: __kmpc_masked, and __kmpc_end_masked. The "master" construct is deprecated. Changed proc-bind keyword from "master" to "primary". Use of both master construct and master as proc-bind keyword is still allowed, but deprecated.
Remove references to "master" in comments and strings, and replace with "primary" or "primary thread". Function names and variables were not touched, nor were references to deprecated master construct. These can be updated over time. No new code should refer to master.
|
|
Revision tags: llvmorg-12.0.0-rc2 |
|
| #
b6c2f538 |
| 12-Feb-2021 |
Hansang Bae <[email protected]> |
[OpenMP] Add allocator support for target memory
This is a preview of allocator support for target memory that depends on the offload runtime API which allocates memory as described below.
llvm_omp_target_alloc_host(size_t size, int device_num);
-- Returns non-migratable memory owned by host.
-- Memory is accessible by host and device(s).
llvm_omp_target_alloc_shared(size_t size, int device_num);
-- Returns migratable memory owned by host and device.
-- Memory is accessible by host and device.
llvm_omp_target_alloc_device(size_t size, int device_num);
-- Returns memory owned by device.
-- Memory is only accessible by device.
New memory space and predefined allocator names are
-- llvm_omp_target_host_mem_space
-- llvm_omp_target_shared_mem_space
-- llvm_omp_target_device_mem_space
-- llvm_omp_target_host_mem_alloc
-- llvm_omp_target_shared_mem_alloc
-- llvm_omp_target_device_mem_alloc
Differential Revision: https://reviews.llvm.org/D96669
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
| #
d7b12004 |
| 01-Feb-2021 |
AndreyChurbanov <[email protected]> |
[OpenMP] libomp: implement nteams-var and teams-thread-limit-var ICVs
The change includes OMP_NUM_TEAMS, OMP_TEAMS_THREAD_LIMIT env variables, omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit, omp_get_teams_thread_limit routines.
Differential Revision: https://reviews.llvm.org/D95003
|
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init |
|
| #
67773681 |
| 22-Jan-2021 |
Jonathan Peyton <[email protected]> |
[OpenMP] Add environment variable to force monotonic dynamic scheduling
This patch introduces a new environment variable to force monotonic behavior for users that absolutely need it. This is in anticipation of the OpenMP 5.0 change that uses non-monotonic behavior for dynamic scheduling by default. Fixes for that and the actual switch are coming soon.
Differential Revision: https://reviews.llvm.org/D95263
|
|
Revision tags: llvmorg-11.1.0-rc2 |
|
| #
598c590b |
| 15-Jan-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP] Add cpuid leaf 1f topology discovery
This patch adds the new algorithm for topology discovery using cpuid leaf 1f. Only the new die level is detected and integrated into the current affinity mechanisms including KMP_AFFINITY (granularity level and compact/scatter algorithm), OMP_PLACES=dies, and KMP_HW_SUBSET.
Differential Revision: https://reviews.llvm.org/D95157
|
| #
9d64275a |
| 26-Jan-2021 |
Shilei Tian <[email protected]> |
[OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use `pthread_create` to create a new thread; let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just as the RTL does during initialization. After that, it directly calls `__kmpc_fork_call`, as if the initial thread had encountered a parallel region. In the wrapped function for this team, the main thread (the initial thread that we create via `pthread_create` on Linux) waits on a condition variable, which can only be signaled when the RTL is being destroyed. The other worker threads do nothing. The main thread needs to wait there because, in the current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team, which is not what we want.
Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.
Here are some open issues to be discussed: 1. The main thread goes to sleep when the initialization is finished. As Andrey mentioned, we might need it to be awakened from time to time to do some work. What kind of update/check should be put here?
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D77609
show more ...
|
| #
9bf843bd |
| 18-Jan-2021 |
Shilei Tian <[email protected]> |
Revert "[OpenMP] Added the support for hidden helper task in RTL"
This reverts commit ed939f853da1f2266f00ea087f778fda88848f73.
|
| #
ed939f85 |
| 16-Jan-2021 |
Shilei Tian <[email protected]> |
[OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use `pthread_create` to create a new thread; let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just as the RTL does during initialization. After that, it directly calls `__kmpc_fork_call`, as if the initial thread had encountered a parallel region. In the wrapped function for this team, the main thread (the initial thread that we create via `pthread_create` on Linux) waits on a condition variable, which can only be signaled when the RTL is being destroyed. The other worker threads do nothing. The main thread needs to wait there because, in the current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team, which is not what we want.
Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.
Here are some open issues to be discussed: 1. The main thread goes to sleep when the initialization is finished. As Andrey mentioned, we might need it to be awakened from time to time to do some work. What kind of update/check should be put here?
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D77609
|
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
e0665a90 |
| 01-Dec-2020 |
Terry Wilmarth <[email protected]> |
[OpenMP] Add support for Intel's umonitor/umwait
These changes add support for Intel's umonitor/umwait usage in wait code, for architectures that support those intrinsic functions. Usage of umonitor/umwait is off by default, but can be turned on by setting the KMP_USER_LEVEL_MWAIT environment variable.
Differential Revision: https://reviews.llvm.org/D91189
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
5644f734 |
| 20-Nov-2020 |
AndreyChurbanov <[email protected]> |
Revert "[OpenMP] Add support for Intel's umonitor/umwait"
This reverts commit 9cfad5f9c5bfd985f1bc8b0954f58013c5236e58.
|
| #
9cfad5f9 |
| 19-Nov-2020 |
AndreyChurbanov <[email protected]> |
[OpenMP] Add support for Intel's umonitor/umwait
Patch by tlwilmar (Terry Wilmarth)
Differential Revision: https://reviews.llvm.org/D91189
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4 |
|
| #
673e5476 |
| 04-Sep-2019 |
Jonas Hahnfeld <[email protected]> |
[OpenMP] Change initialization of __kmp_global
There's no need to initialize variables with static storage duration because they're implicitly initialized to zero. See https://en.cppreference.com/w/c/language/initialization#Implicit_initialization
I think that's already relied upon because the supplied 0 only sets 'kmp_time_global_t g_time;' in 'struct kmp_base_global'. The other fields are not set in the code, but implicitly initialized by the compiler.
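The language rule behind this change can be checked with a small stand-alone example. The types below are simplified, hypothetical stand-ins for `kmp_time_global_t` and `struct kmp_base_global` (the field set is an assumption made for illustration); the point is only that an object with static storage duration starts out zeroed whether or not a partial initializer is supplied.

```c
#include <stddef.h>

/* Simplified stand-ins; the real libomp structures have different fields. */
typedef struct {
  long ticks;
} time_global_t;

typedef struct {
  time_global_t g_time; /* the only member a supplied 0 would name */
  int g_abort;
  int g_done;
  void *g_owner;
} base_global_t;

/* Explicit form: the 0 initializes g_time; the remaining members are
   implicitly zero-initialized. */
static base_global_t explicit_global = {{0}};

/* Implicit form: no initializer at all; static storage duration still
   guarantees zero (and null pointer) initialization. */
static base_global_t implicit_global;

/* Returns 1 if every member of both objects starts out zero or null. */
int globals_start_zeroed(void) {
  const base_global_t *globals[] = {&explicit_global, &implicit_global};
  for (size_t i = 0; i < sizeof(globals) / sizeof(globals[0]); ++i) {
    const base_global_t *g = globals[i];
    if (g->g_time.ticks != 0 || g->g_abort != 0 || g->g_done != 0 ||
        g->g_owner != NULL)
      return 0;
  }
  return 1;
}
```

Both declarations produce the same initial state, so dropping the explicit initializer changes nothing observable.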
Differential Revision: https://reviews.llvm.org/D66292
llvm-svn: 370943
|