|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
17dcde5f |
| 18-May-2022 |
AndreyChurbanov <[email protected]> |
[OpenMP][libomp] Allow reset affinity mask after parallel
Added control to reset affinity of primary thread after outermost parallel region to initial affinity encountered before OpenMP runtime was
[OpenMP][libomp] Allow reset affinity mask after parallel
Added control to reset affinity of primary thread after outermost parallel region to initial affinity encountered before OpenMP runtime was initialized. KMP_AFFINITY environment variable reset/noreset modifier introduced. Default behavior is unchanged.
Differential Revision: https://reviews.llvm.org/D125993
show more ...
|
| #
f613e6d1 |
| 19-May-2022 |
Jonathan Peyton <[email protected]> |
[OpenMP][libomp] Fix accidental removal of else for core attributes
|
| #
c44ba01d |
| 18-May-2022 |
AndreyChurbanov <[email protected]> |
[OpenMP] libomp: honor passive wait policy requested with tasking
Currently the library ignores requested wait policy in the presence of tasking. Threads always actively spin. The patch fixes this p
[OpenMP] libomp: honor passive wait policy requested with tasking
Currently the library ignores requested wait policy in the presence of tasking. Threads always actively spin. The patch fixes this problem making the wait policy passive if this explicitly requested by user.
Differential Revision: https://reviews.llvm.org/D123044
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
cb1bee47 |
| 12-Feb-2022 |
AndreyChurbanov <[email protected]> |
[OpenMP] libomp: fix UB when LIBOMP_NUM_HIDDEN_HELPER_THREADS=1.
The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads. Thus number of worker hidden helper threads corresponds t
[OpenMP] libomp: fix UB when LIBOMP_NUM_HIDDEN_HELPER_THREADS=1.
The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads. Thus number of worker hidden helper threads corresponds to user request, main thread of helper team excluded as it does not participate in actual work. This also fixes divide-by-0 issue in the code.
Fixes #48656
Differential Revision: https://reviews.llvm.org/D119586
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
2e02579a |
| 14-Dec-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] Add use of TPAUSE
Add use of TPAUSE (from WAITPKG) to the runtime for Intel hardware, with an envirable to turn it on in a particular C-state. Always uses TPAUSE if it is selected and enab
[OpenMP] Add use of TPAUSE
Add use of TPAUSE (from WAITPKG) to the runtime for Intel hardware, with an envirable to turn it on in a particular C-state. Always uses TPAUSE if it is selected and enabled by Intel hardware and presence of WAITPKG, and if not, falls back to old way of checking __kmp_use_yield, etc.
Differential Revision: https://reviews.llvm.org/D115758
show more ...
|
| #
6a556eca |
| 15-Dec-2021 |
Jonathan Peyton <[email protected]> |
[OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET
This patch allows the user to request all resources of a particular layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified so the num
[OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET
This patch allows the user to request all resources of a particular layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified so the number of units requested is optional or can be replaced with an '*' character.
e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3 e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per each socket and 1 thread per core.
Differential Revision: https://reviews.llvm.org/D115826
show more ...
|
| #
97693409 |
| 14-Dec-2021 |
Jonathan Peyton <[email protected]> |
[OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes
Add missing guards around x86-specific code.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D115664
|
| #
df205995 |
| 30-Nov-2021 |
Jonathan Peyton <[email protected]> |
[OpenMP][libomp] Add core attributes to KMP_HW_SUBSET
Allow filtering of resources based on core attributes. There are two new attributes added: 1) Core Type (intel_atom, intel_core) 2) Core Efficie
[OpenMP][libomp] Add core attributes to KMP_HW_SUBSET
Allow filtering of resources based on core attributes. There are two new attributes added: 1) Core Type (intel_atom, intel_core) 2) Core Efficiency (integer) where the higher the efficiency, the more performant the core On hybrid architectures , e.g., Alder Lake, users can specify KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom and first four Big cores. The can also use the efficiency syntax. e.g., KMP_HW_SUBSET=2c:eff0,2c:eff1
Differential Revision: https://reviews.llvm.org/D114901
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
286094af |
| 21-Oct-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP][libomp] Improve Windows Processor Group handling within topology
The current implementation of Windows Processor Groups has a separate topology method to handle them. This patch deprecates
[OpenMP][libomp] Improve Windows Processor Group handling within topology
The current implementation of Windows Processor Groups has a separate topology method to handle them. This patch deprecates that specific method and uses the regular CPUID topology method by default and inserts the Windows Processor Group objects in the topology manually.
Notes: * The preference for processor groups is lowered to a value less than socket so that the user will see sockets in the KMP_AFFINITY=verbose output instead of processor groups when sockets=processor groups. * The topology's capacity is modified to handle additional topology layers without the need for reallocation. * If a user asks for a granularity setting that is "above" the processor group layer, then the granularity is adjusted "down" to the processor group since this is the coarsest layer available for threads.
Differential Revision: https://reviews.llvm.org/D112273
show more ...
|
| #
0808d956 |
| 08-Nov-2021 |
@t-msn <[email protected]> |
[OpenMP] libomp: Fix handling of barrier pattern environment variables
It is better to set all barrier patterns to use "dist" when at least one environment variable specifies "dist". Otherwise if on
[OpenMP] libomp: Fix handling of barrier pattern environment variables
It is better to set all barrier patterns to use "dist" when at least one environment variable specifies "dist". Otherwise if only one environment is set to "dist" and others left blank inadvertently, it would result in mixing dist barrier with default hyper barrier pattern.
Differential Revision: https://reviews.llvm.org/D112597
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
50b68a3d |
| 15-Sep-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP][host runtime] Add support for teams affinity
This patch implements teams affinity on the host. The default is spread. A user can specify either spread, close, or primary using KMP_TEAMS_PRO
[OpenMP][host runtime] Add support for teams affinity
This patch implements teams affinity on the host. The default is spread. A user can specify either spread, close, or primary using KMP_TEAMS_PROC_BIND environment variable. Unlike OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a list of values. The values follow the same semantics under the OpenMP specification for parallel regions except T is the number of teams in a league instead of the number of threads in a parallel region.
Differential Revision: https://reviews.llvm.org/D109921
show more ...
|
| #
5e58b63b |
| 13-Oct-2021 |
AndreyChurbanov <[email protected]> |
[OpenMP] libomp: fix warning on comparison of integer expressions of different signedness
Replaced macro with global variable of correspondent type.
Differential Revision: https://reviews.llvm.org/
[OpenMP] libomp: fix warning on comparison of integer expressions of different signedness
Replaced macro with global variable of correspondent type.
Differential Revision: https://reviews.llvm.org/D111562
show more ...
|
| #
343b9e85 |
| 20-Sep-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP][host runtime] Introduce kmp_cpuinfo_flags_t to replace integer flags
Store CPUID support flags as bits instead of using entire integers.
Differential Revision: https://reviews.llvm.org/D11
[OpenMP][host runtime] Introduce kmp_cpuinfo_flags_t to replace integer flags
Store CPUID support flags as bits instead of using entire integers.
Differential Revision: https://reviews.llvm.org/D110091
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
f5616a98 |
| 17-Aug-2021 |
Martin Storsjö <[email protected]> |
[OpenMP] Fix the usage of sscanf on MinGW
KMP_SSCANF only evaluates to sscanf_s within #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT so we need to pass the sscanf_s specific parameters within a similar
[OpenMP] Fix the usage of sscanf on MinGW
KMP_SSCANF only evaluates to sscanf_s within #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT so we need to pass the sscanf_s specific parameters within a similar condition.
Differential Revision: https://reviews.llvm.org/D108196
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
b4a1f441 |
| 28-Jun-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP] Add a few small fixes
* Add comment to help ensure new construct data are added in two places * Check for division by zero in the loop worksharing code * Check for syntax errors in parrange
[OpenMP] Add a few small fixes
* Add comment to help ensure new construct data are added in two places * Check for division by zero in the loop worksharing code * Check for syntax errors in parrange parsing
Differential Revision: https://reviews.llvm.org/D105929
show more ...
|
| #
6eeb4c1f |
| 13-Jul-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP] Fix incorrect parameters to sscanf_s call
On Windows, the documentation states that when using sscanf_s, each %c and %s specifier must also have additional size parameter. This patch adds t
[OpenMP] Fix incorrect parameters to sscanf_s call
On Windows, the documentation states that when using sscanf_s, each %c and %s specifier must also have additional size parameter. This patch adds the size parameter in the one place where %c is used.
Differential Revision: https://reviews.llvm.org/D105931
show more ...
|
| #
d8e4cb91 |
| 15-Jul-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier
Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in so
[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier
Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier.
This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently.
The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required:
KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist
Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier.
Patch fixed for ITTNotify disabled builds and non-x86 builds
Co-authored-by: Jonathan Peyton <[email protected]> Co-authored-by: Vladislav Vinogradov <[email protected]>
Differential Revision: https://reviews.llvm.org/D103121
show more ...
|
| #
4eb90e89 |
| 29-Jun-2021 |
Johannes Doerfert <[email protected]> |
Revert "[OpenMP] Add Two-level Distributed Barrier"
This reverts commit 25073a4ecfc9b2e3cb76776185e63bfdb094cd98.
This breaks non-x86 OpenMP builds for a while now. Until a solution is ready to be
Revert "[OpenMP] Add Two-level Distributed Barrier"
This reverts commit 25073a4ecfc9b2e3cb76776185e63bfdb094cd98.
This breaks non-x86 OpenMP builds for a while now. Until a solution is ready to be upstreamed we revert the feature and unblock those builds. See: https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 and https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821
The currently proposed fix (D104788) seems not to be ready yet: https://reviews.llvm.org/D104788#2841928
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
| #
5dd4d0d4 |
| 22-Jun-2021 |
AndreyChurbanov <[email protected]> |
[OpenMP] libomp: fix dynamic loop dispatcher
Restructured dynamic loop dispatcher code. Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule: - eliminated possibility of st
[OpenMP] libomp: fix dynamic loop dispatcher
Restructured dynamic loop dispatcher code. Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule: - eliminated possibility of stealing iterations of the wrong loop when victim thread changed its buffer to work on another loop; - fixed race when victim thread changed its buffer to work in nested parallel; - eliminated "static" property of the schedule, that is now a single thread can execute whole loop.
Differential Revision: https://reviews.llvm.org/D103648
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
25073a4e |
| 21-May-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] Add Two-level Distributed Barrier
Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper ba
[OpenMP] Add Two-level Distributed Barrier
Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier.
This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently.
The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required:
KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist
Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier.
Differential Revision: https://reviews.llvm.org/D103121
show more ...
|
| #
f61602b0 |
| 08-Jun-2021 |
Vignesh Balasubramanian <[email protected]> |
[OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs. It contains
[OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs. It contains support code required in "openmp/runtime" for OMPD implementation.
Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100181
show more ...
|
| #
8ec9aa23 |
| 10-May-2021 |
Terry Wilmarth <[email protected]> |
[OpenMP] Add experimental nesting mode feature
Nesting mode is a new experimental feature in the OpenMP runtime. It allows a user to set up nesting for an application in a way that corresponds to th
[OpenMP] Add experimental nesting mode feature
Nesting mode is a new experimental feature in the OpenMP runtime. It allows a user to set up nesting for an application in a way that corresponds to the hardware topology levels on the machine an application is being run on. For example, if a machine has 2 sockets, each with 12 cores, then use of nesting mode could set up an outer level of nesting that uses 2 threads per parallel region, and an inner level of nesting that uses 12 threads per parallel region.
Nesting mode is controlled with the KMP_NESTING_MODE environment variable as follows:
1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var is set to 1 (the default -- nesting is off, nested parallel regions are serialized).
2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads will be assigned for each level discovered in the machine topology; max-active-levels-var is set to the number of levels discovered.
3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change or be removed in the future.] Nesting mode is on, and a number of threads will be assigned for each topology level discovered on the machine, up to k<=n levels (since there may be fewer than n levels discovered in the topology), and beyond the kth level, nested parallel regions will be serialized; NOTE: max-active-levels-var is 1 (the default -- nesting is off, and nested parallel regions are serialized until the user changes max-active-levels-var.
If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will override KMP_NESTING_MODE settings for the associated environment variables. The detected topology may be limited by an affinity mask setting on the initial thread, or if the user sets KMP_HW_SUBSET. See also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for nested parallel regions. Note that this feature only sets numbers of threads used at nesting levels. The user should make use of OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those threads, if desired.
Differential Revision: https://reviews.llvm.org/D102188
show more ...
|
| #
9982f33e |
| 16-Apr-2021 |
Peyton, Jonathan L <[email protected]> |
[OpenMP] Refactor/Rework topology discovery code
This patch does the following:
1) Introduce kmp_topology_t as the runtime-friendly structure (the corresponding global variable is __kmp_topology) t
[OpenMP] Refactor/Rework topology discovery code
This patch does the following:
1) Introduce kmp_topology_t as the runtime-friendly structure (the corresponding global variable is __kmp_topology) to determine the exact machine topology which can vary widely among current and future architectures. The current design is not easy to expand beyond the assumed three layer topology: sockets, cores, and threads so a rework capable of using the existing KMP_AFFINITY mechanisms is required.
This new topology structure has: * The depth and types of the topology * Ratio count for each consecutive level (e.g., number of cores per socket, number of threads per core) * Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads) * Equivalent topology layer map (e.g., Numa domain is equivalent to socket, L1/L2 cache equivalent to core) * Whether it is uniform or not
The hardware threads are represented with the kmp_hw_thread_t structure. This structure contains the ids (e.g., socket 0, core 1, thread 0) and other information grabbed from the previous Address structure. The kmp_topology_t structure contains an array of these.
2) Generalize the KMP_HW_SUBSET envirable for the new kmp_topology_t structure. The algorithm doesn't assume any order with tiles,numa domains,sockets,cores,threads. Instead it just parses the envirable, makes sure it is consistent with the detected topology (including taking into account equivalent layers) and then trims away the unneeded subset of hardware threads. To enable this, a new kmp_hw_subset_t structure is introduced which contains a vector of items (hardware type, number user wants, offset). Any keyword within __kmp_hw_get_keyword() can be used as a name and can be shortened as well. e.g., KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.
3) Simplify topology detection functions so they only do the singular task of detecting the machine's topology. Printing, and all canonicalizing functionality is now done afterwards. So many lines of duplicated code are eliminated.
4) Add new ll_caches and numa_domains to OMP_PLACES, and consequently, KMP_AFFINITY's granularity setting. All the names within __kmp_hw_get_keyword() are available for use in OMP_PLACES or KMP_AFFINITY's granularity setting.
5) Simplify and future-proof code where explicit lists of allowed affinity settings keywords inside if() conditions.
6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method so equivalent caches could be detected (in particular for the ll_caches place).
Differential Revision: https://reviews.llvm.org/D100997
show more ...
|
| #
77dc7b46 |
| 12-Apr-2021 |
Hansang Bae <[email protected]> |
[OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT
Also fixed typo in the verbose message.
Differential Revision: https://reviews.llvm.org/D100414
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
467f3924 |
| 11-Mar-2021 |
Hansang Bae <[email protected]> |
[OpenMP] Misc. changes that add or remove pointer/bound checks
-- Added or moved checks to appropriate places. -- Removed ineffective null check where the pointer is already being dereferenced ar
[OpenMP] Misc. changes that add or remove pointer/bound checks
-- Added or moved checks to appropriate places. -- Removed ineffective null check where the pointer is already being dereferenced around the code. -- Initialized variables that can be used without definitions. -- Added call to dlclose/FreeLibrary in OMPT tool activation. -- Added a new build compiler definition.
Differential Revision: https://reviews.llvm.org/D98584
show more ...
|