.\" Copyright (c) 2016 The FreeBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This documentation was written by
.\" Konstantin Belousov <[email protected]> under sponsorship
.\" from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd November 23, 2020
.Dt _UMTX_OP 2
.Os
.Sh NAME
.Nm _umtx_op
.Nd interface for implementation of userspace threading synchronization primitives
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/umtx.h
.Ft int
.Fn _umtx_op "void *obj" "int op" "u_long val" "void *uaddr" "void *uaddr2"
.Sh DESCRIPTION
The
.Fn _umtx_op
system call provides kernel support for userspace implementation of
the threading synchronization primitives.
The
.Lb libthr
uses the syscall to implement
.St -p1003.1-2001
pthread locks, like mutexes, condition variables and so on.
.Ss STRUCTURES
The operations, performed by the
.Fn _umtx_op
syscall, operate on userspace objects which are described
by the following structures.
Reserved fields and paddings are omitted.
All objects require ABI-mandated alignment, but this is not currently
enforced consistently on all architectures.
.Pp
The following flags are defined for flag fields of all structures:
.Bl -tag -width indent
.It Dv USYNC_PROCESS_SHARED
Allow selection of the process-shared sleep queue for the thread sleep
container, when the lock ownership cannot be granted immediately,
and the operation must sleep.
The process-shared or process-private sleep queue is selected based on
the attributes of the memory mapping which contains the first byte of
the structure, see
.Xr mmap 2 .
Otherwise, if the flag is not specified, the process-private sleep queue
is selected regardless of the memory mapping attributes, as an optimization.
.Pp
See the
.Sx SLEEP QUEUES
subsection below for more details on sleep queues.
.El
.Bl -hang -offset indent
.It Sy Mutex
.Bd -literal
struct umutex {
	volatile lwpid_t m_owner;
	uint32_t m_flags;
	uint32_t m_ceilings[2];
	uintptr_t m_rb_lnk;
};
.Ed
.Pp
The
.Dv m_owner
field is the actual lock.
It contains either the thread identifier of the lock owner in the
locked state, or zero when the lock is unowned.
The highest bit set indicates that there is contention on the lock.
The constants are defined for special values:
.Bl -tag -width indent
.It Dv UMUTEX_UNOWNED
Zero, the value stored in the unowned lock.
.It Dv UMUTEX_CONTESTED
The contention indicator.
.It Dv UMUTEX_RB_OWNERDEAD
A thread owning the robust mutex terminated.
The mutex is in unlocked state.
.It Dv UMUTEX_RB_NOTRECOV
The robust mutex is in a non-recoverable state.
It cannot be locked until reinitialized.
.El
.Pp
The
.Dv m_flags
field may contain the following umutex-specific flags, in addition to
the common flags:
.Bl -tag -width indent
.It Dv UMUTEX_PRIO_INHERIT
Mutex implements
.Em Priority Inheritance
protocol.
.It Dv UMUTEX_PRIO_PROTECT
Mutex implements
.Em Priority Protection
protocol.
.It Dv UMUTEX_ROBUST
Mutex is robust, as described in the
.Sx ROBUST UMUTEXES
section below.
.It Dv UMUTEX_NONCONSISTENT
Robust mutex is in a transient non-consistent state.
Not used by kernel.
.El
.Pp
In the manual page, mutexes not having
.Dv UMUTEX_PRIO_INHERIT
and
.Dv UMUTEX_PRIO_PROTECT
flags set, are called normal mutexes.
Each type of mutex
.Pq normal, priority-inherited, and priority-protected
has a separate sleep queue associated
with the given key.
.Pp
For priority protected mutexes, the
.Dv m_ceilings
array contains priority ceiling values.
The
.Dv m_ceilings[0]
is the ceiling value for the mutex, as specified by
.St -p1003.1-2008
for the
.Em Priority Protected
mutex protocol.
The
.Dv m_ceilings[1]
is used only for the unlock of a priority protected mutex, when
unlock is done in an order other than the reversed lock order.
In this case,
.Dv m_ceilings[1]
must contain the ceiling value for the last locked priority protected
mutex, for proper priority reassignment.
If, instead, the unlocking mutex was the last priority propagated
mutex locked by the thread,
.Dv m_ceilings[1]
should contain \-1.
This is required because kernel does not maintain the ordered lock list.
.It Sy Condition variable
.Bd -literal
struct ucond {
	volatile uint32_t c_has_waiters;
	uint32_t c_flags;
	uint32_t c_clockid;
};
.Ed
.Pp
A non-zero
.Dv c_has_waiters
value indicates that there are in-kernel waiters for the condition,
executing the
.Dv UMTX_OP_CV_WAIT
request.
.Pp
The
.Dv c_flags
field contains flags.
Only the common flags
.Pq Dv USYNC_PROCESS_SHARED
are defined for ucond.
.Pp
The
.Dv c_clockid
member provides the clock identifier to use for timeout, when the
.Dv UMTX_OP_CV_WAIT
request has both the
.Dv CVWAIT_CLOCKID
flag and the timeout specified.
Valid clock identifiers are a subset of those for
.Xr clock_gettime 2 :
.Bl -bullet -compact
.It
.Dv CLOCK_MONOTONIC
.It
.Dv CLOCK_MONOTONIC_FAST
.It
.Dv CLOCK_MONOTONIC_PRECISE
.It
.Dv CLOCK_PROF
.It
.Dv CLOCK_REALTIME
.It
.Dv CLOCK_REALTIME_FAST
.It
.Dv CLOCK_REALTIME_PRECISE
.It
.Dv CLOCK_SECOND
.It
.Dv CLOCK_UPTIME
.It
.Dv CLOCK_UPTIME_FAST
.It
.Dv CLOCK_UPTIME_PRECISE
.It
.Dv CLOCK_VIRTUAL
.El
.It Sy Reader/writer lock
.Bd -literal
struct urwlock {
	volatile int32_t rw_state;
	uint32_t rw_flags;
	uint32_t rw_blocked_readers;
	uint32_t rw_blocked_writers;
};
.Ed
.Pp
The
.Dv rw_state
field is the actual lock.
It contains both the flags and counter of the read locks which were
granted.
The names of the
.Dv rw_state
bits are as follows:
.Bl -tag -width indent
.It Dv URWLOCK_WRITE_OWNER
Write lock was granted.
.It Dv URWLOCK_WRITE_WAITERS
There are write lock waiters.
.It Dv URWLOCK_READ_WAITERS
There are read lock waiters.
.It Dv URWLOCK_READER_COUNT(c)
Returns the count of currently granted read locks.
.El
.Pp
At any given time there may be only one thread to which the writer lock
is granted on the
.Vt struct urwlock ,
and no threads are granted read lock.
Or, at the given time, up to
.Dv URWLOCK_MAX_READERS
threads may be granted the read lock simultaneously, but write lock is
not granted to any thread.
.Pp
The following flags for the
.Dv rw_flags
member of
.Vt struct urwlock
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
If specified, immediately grant read lock requests when
.Dv urwlock
is already read-locked, even in presence of unsatisfied write
lock requests.
By default, if there is a write lock waiter, further read requests are
not granted, to prevent unfair write lock waiter starvation.
.El
.Pp
The
.Dv rw_blocked_readers
and
.Dv rw_blocked_writers
members contain the count of threads which are sleeping in kernel,
waiting for the associated request type to be granted.
The fields are used by kernel to update the
.Dv URWLOCK_READ_WAITERS
and
.Dv URWLOCK_WRITE_WAITERS
flags of the
.Dv rw_state
lock after the requesting thread was woken up.
.It Sy Semaphore
.Bd -literal
struct _usem2 {
	volatile uint32_t _count;
	uint32_t _flags;
};
.Ed
.Pp
The
.Dv _count
word represents a counting semaphore.
A non-zero value indicates an unlocked (posted) semaphore, while zero
represents the locked state.
The maximal supported semaphore count is
.Dv USEM_MAX_COUNT .
.Pp
The
.Dv _count
word, besides the counter of posts (unlocks), also contains the
.Dv USEM_HAS_WAITERS
bit, which indicates that the locked semaphore has waiting threads.
.Pp
The
.Dv USEM_COUNT()
macro, applied to the
.Dv _count
word, returns the current semaphore counter, which is the number of posts
issued on the semaphore.
.Pp
The following bits for the
.Dv _flags
member of
.Vt struct _usem2
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv USEM_NAMED
Flag is ignored by kernel.
.El
.It Sy Timeout parameter
.Bd -literal
struct _umtx_time {
	struct timespec _timeout;
	uint32_t _flags;
	uint32_t _clockid;
};
.Ed
.Pp
Several
.Fn _umtx_op
operations allow the blocking time to be limited, failing the request
if it cannot be satisfied in the specified time period.
The timeout is specified by passing either the address of
.Vt struct timespec ,
or its extended variant,
.Vt struct _umtx_time ,
as the
.Fa uaddr2
argument of
.Fn _umtx_op .
They are distinguished by the
.Fa uaddr
value, which must be equal to the size of the structure pointed to by
.Fa uaddr2 ,
cast to
.Vt uintptr_t .
.Pp
The
.Dv _timeout
member specifies the time when the timeout should occur.
Legal values for clock identifier
.Dv _clockid
are shared with the
.Fa clock_id
argument to the
.Xr clock_gettime 2
function,
and use the same underlying clocks.
The specified clock is used to obtain the current time value.
Interval counting is always performed by the monotonic wall clock.
.Pp
The
.Dv _flags
member allows the following flags to further define the timeout behaviour:
.Bl -tag -width indent
.It Dv UMTX_ABSTIME
The
.Dv _timeout
value is the absolute time.
The thread is unblocked and the request fails when the specified
clock value equals or exceeds
.Dv _timeout .
.Pp
If the flag is absent, the timeout value is relative, that is, the amount
of time measured by the monotonic wall clock from the moment of the request
start.
.El
.El
.Ss SLEEP QUEUES
When a locking request cannot be immediately satisfied, the thread is
typically put to
.Em sleep ,
which is a non-runnable state terminated by the
.Em wake
operation.
Lock operations include a
.Em try
variant which returns an error rather than sleeping if the lock cannot
be obtained.
Also,
.Fn _umtx_op
provides requests which explicitly put the thread to sleep.
.Pp
Wakes need to know which threads to make runnable, so sleeping threads
are grouped into containers called
.Em sleep queues .
A sleep queue is identified by a key, which for
.Fn _umtx_op
is defined as the physical address of some variable.
Note that the
.Em physical
address is used, which means that the same variable mapped multiple
times will give one key value.
This mechanism enables the construction of
.Em process-shared
locks.
.Pp
A related attribute of the key is shareability.
Some requests always interpret keys as private for the current process,
creating sleep queues with the scope of the current process even if
the memory is shared.
Others either select the shareability automatically from the
mapping attributes, or take additional input as the
.Dv USYNC_PROCESS_SHARED
common flag.
This is done as an optimization, allowing the lock scope to be limited
regardless of the kind of backing memory.
.Pp
Only the address of the start byte of the variable specified as key is
important for determining the corresponding sleep queue.
The size of the variable does not matter, so, for example, sleep on the same
address interpreted as
.Vt uint32_t
and
.Vt long
on a little-endian 64-bit platform would collide.
.Pp
The last attribute of the key is the object type.
The sleep queue to which a sleeping thread is assigned is an individual
one for simple wait requests, mutexes, rwlocks, condvars and other
primitives, even when the physical address of the key is the same.
.Pp
When waking up a limited number of threads from a given sleep queue,
the highest priority threads that have been blocked for the longest on
the queue are selected.
.Ss ROBUST UMUTEXES
The
.Em robust umutexes
are provided as a substrate for a userspace library to implement
.Tn POSIX
robust mutexes.
A robust umutex must have the
.Dv UMUTEX_ROBUST
flag set.
.Pp
On thread termination, the kernel walks two lists of mutexes.
The head addresses of the two lists must be provided by a prior
.Dv UMTX_OP_ROBUST_LISTS
request.
The lists are singly-linked.
The link to the next element is provided by the
.Dv m_rb_lnk
member of the
.Vt struct umutex .
.Pp
Robust list processing is aborted if the kernel finds a mutex
with any of the following conditions:
.Bl -dash -offset indent -compact
.It
the
.Dv UMUTEX_ROBUST
flag is not set
.It
not owned by the current thread, except when the mutex is pointed to
by the
.Dv robust_inactive
member of the
.Vt struct umtx_robust_lists_params ,
registered for the current thread
.It
the combination of mutex flags is invalid
.It
read of the umutex memory faults
.It
the list length limit described in
.Xr libthr 3
is reached.
.El
.Pp
Every mutex in both lists is unlocked as if the
.Dv UMTX_OP_MUTEX_UNLOCK
request is performed on it, but instead of the
.Dv UMUTEX_UNOWNED
value, the
.Dv m_owner
field is written with the
.Dv UMUTEX_RB_OWNERDEAD
value.
When a mutex in the
.Dv UMUTEX_RB_OWNERDEAD
state is locked by kernel due to the
.Dv UMTX_OP_MUTEX_TRYLOCK
or
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is granted and the
.Er EOWNERDEAD
error is returned.
.Pp
Also, the kernel handles the
.Dv UMUTEX_RB_NOTRECOV
value of the
.Dv m_owner
field specially, always returning the
.Er ENOTRECOVERABLE
error for lock attempts, without granting the lock.
.Ss OPERATIONS
The following operations, requested by the
.Fa op
argument to the function, are implemented:
.Bl -tag -width indent
.It Dv UMTX_OP_WAIT
Wait.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable of type
.Vt long .
.It Fa val
Current value of the
.Dv *obj .
.El
.Pp
The current value of the variable pointed to by the
.Fa obj
argument is compared with the
.Fa val .
If they are equal, the requesting thread is put to interruptible sleep
until woken up or the optionally specified timeout expires.
.Pp
The comparison and sleep are atomic.
In other words, if another thread writes a new value to
.Dv *obj
and then issues
.Dv UMTX_OP_WAKE ,
the request is guaranteed to not miss the wakeup,
which might otherwise happen between comparison and blocking.
.Pp
The physical address of memory where the
.Fa *obj
variable is located, is used as a key to index sleeping threads.
.Pp
The read of the current value of the
.Dv *obj
variable is not guarded by barriers.
In particular, it is the user's duty to ensure the lock acquire
and release memory semantics, if the
.Dv UMTX_OP_WAIT
and
.Dv UMTX_OP_WAKE
requests are used as a substrate for implementing a simple lock.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.Pp
Optionally, a timeout for the request may be specified.
.It Dv UMTX_OP_WAKE
Wake the threads possibly sleeping due to
.Dv UMTX_OP_WAIT .
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable, used as a key to find sleeping threads.
.It Fa val
Up to
.Fa val
threads are woken up by this request.
Specify
.Dv INT_MAX
to wake up all waiters.
.El
.It Dv UMTX_OP_MUTEX_TRYLOCK
Try to lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Operates the same as the
.Dv UMTX_OP_MUTEX_LOCK
request, but returns
.Er EBUSY
instead of sleeping if the lock cannot be obtained immediately.
.It Dv UMTX_OP_MUTEX_LOCK
Lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Locking is performed by writing the current thread id into the
.Dv m_owner
word of the
.Vt struct umutex .
The write is atomic, preserves the
.Dv UMUTEX_CONTESTED
contention indicator, and provides the acquire barrier for
lock entrance semantic.
.Pp
If the lock cannot be obtained immediately because another thread owns
the lock, the
.Dv UMUTEX_CONTESTED
bit is set and the current thread is then put to sleep.
Upon wake up, the lock conditions are re-tested.
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without timeout specified is always restarted after return
from a signal handler.
.It Dv UMTX_OP_MUTEX_UNLOCK
Unlock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Unlocks the mutex, by writing
.Dv UMUTEX_UNOWNED
(zero) value into
.Dv m_owner
word of the
.Vt struct umutex .
The write is done with a release barrier, to provide lock leave semantic.
.Pp
If there are threads sleeping in the sleep queue associated with the
umutex, one thread is woken up.
If more than one thread sleeps in the sleep queue, the
.Dv UMUTEX_CONTESTED
bit is set together with the write of the
.Dv UMUTEX_UNOWNED
value into
.Dv m_owner .
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
See description of the
.Dv m_ceilings
member of the
.Vt struct umutex
structure for additional details of the request operation on the
priority protected protocol mutex.
.It Dv UMTX_OP_SET_CEILING
Set ceiling for the priority protected umutex.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa obj
Pointer to the umutex.
.It Fa val
New ceiling value.
.It Fa uaddr
Address of a variable of type
.Vt uint32_t .
If not
.Dv NULL
and the update was successful, the previous ceiling value is
written to the location pointed to by
.Fa uaddr .
.El
.Pp
The request locks the umutex pointed to by the
.Fa obj
parameter, waiting for the lock if not immediately available.
After the lock is obtained, the new ceiling value
.Fa val
is written to the
.Dv m_ceilings[0]
member of the
.Vt struct umutex ,
after which the umutex is unlocked.
.Pp
The locking does not adhere to the priority protect protocol,
to conform to the
.Tn POSIX
requirements for the
.Xr pthread_mutex_setprioceiling 3
interface.
.It Dv UMTX_OP_CV_WAIT
Wait for a condition.
The arguments to the request are:
.Bl -tag -width "uaddr2"
.It Fa obj
Pointer to the
.Vt struct ucond .
.It Fa val
Request flags, see below.
.It Fa uaddr
Pointer to the umutex.
.It Fa uaddr2
Optional pointer to a
.Vt struct timespec
for timeout specification.
.El
.Pp
The request must be issued by the thread owning the mutex pointed to
by the
.Fa uaddr
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument, is set to an arbitrary non-zero value, after which the
.Fa uaddr
mutex is unlocked (following the appropriate protocol), and
the current thread is put to sleep on the sleep queue keyed by
the
.Fa obj
argument.
The operations are performed atomically.
It is guaranteed to not miss a wakeup from
.Dv UMTX_OP_CV_SIGNAL
or
.Dv UMTX_OP_CV_BROADCAST
sent between mutex unlock and putting the current thread on the sleep queue.
.Pp
Upon wakeup, if the timeout expired and no other threads are sleeping in
the same sleep queue, the
.Dv c_has_waiters
member is cleared.
After wakeup, the
.Fa uaddr
umutex is not relocked.
.Pp
The following flags are defined:
.Bl -tag -width "CVWAIT_CLOCKID"
.It Dv CVWAIT_ABSTIME
Timeout is absolute.
.It Dv CVWAIT_CLOCKID
Clockid is provided.
.El
.Pp
Optionally, a timeout for the request may be specified.
Unlike other requests, the timeout value is specified directly by a
.Vt struct timespec ,
pointed to by the
.Fa uaddr2
argument.
If the
.Dv CVWAIT_CLOCKID
flag is provided, the timeout uses the clock from the
.Dv c_clockid
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument.
Otherwise,
.Dv CLOCK_REALTIME
is used, regardless of the clock identifier possibly specified in the
.Vt struct _umtx_time .
If the
.Dv CVWAIT_ABSTIME
flag is supplied, the timeout specifies an absolute time value, otherwise
it denotes a relative time interval.
.Pp
The request is not restartable.
An unblocked signal delivered during
the wait always results in sleep interruption and
.Er EINTR
error.
.It Dv UMTX_OP_CV_SIGNAL
Wake up one condition waiter.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up at most one thread sleeping on the sleep queue keyed
by the
.Fa obj
argument.
If the woken up thread was the last on the sleep queue, the
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
.It Dv UMTX_OP_CV_BROADCAST
Wake up all condition waiters.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up all threads sleeping on the sleep queue keyed by the
.Fa obj
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
.It Dv UMTX_OP_WAIT_UINT
Same as
.Dv UMTX_OP_WAIT ,
but the type of the variable pointed to by
.Fa obj
is
.Vt u_int
.Pq a 32-bit integer .
.It Dv UMTX_OP_RW_RDLOCK
Read-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be read-locked.
.It Fa val
Additional flags to augment locking behaviour.
The valid flags in the
.Fa val
argument are:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
.El
.El
.Pp
The request obtains the read lock on the specified
.Vt struct urwlock
by incrementing the count of readers in the
.Dv rw_state
word of the structure.
If the
.Dv URWLOCK_WRITE_OWNER
bit is set in the word
.Dv rw_state ,
the lock was granted to a writer which has not yet relinquished
its ownership.
In this case the current thread is put to sleep until it makes sense to
retry.
.Pp
If the
.Dv URWLOCK_PREFER_READER
flag is set either in the
.Dv rw_flags
word of the structure, or in the
.Fa val
argument of the request, the presence of the threads trying to obtain
the write lock on the same structure does not prevent the current thread
from trying to obtain the read lock.
Otherwise, if the flag is not set, and the
.Dv URWLOCK_WRITE_WAITERS
flag is set in
.Dv rw_state ,
the current thread does not attempt to obtain the read lock.
Instead it sets the
.Dv URWLOCK_READ_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.It Dv UMTX_OP_RW_WRLOCK
Write-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be write-locked.
.El
.Pp
The request obtains a write lock on the specified
.Vt struct urwlock ,
by setting the
.Dv URWLOCK_WRITE_OWNER
bit in the
.Dv rw_state
word of the structure.
If there is already a write lock owner, as indicated by the
.Dv URWLOCK_WRITE_OWNER
bit being set, or there are read lock owners, as indicated
by the read-lock counter, the current thread does not attempt to
obtain the write lock.
Instead it sets the
.Dv URWLOCK_WRITE_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.It Dv UMTX_OP_RW_UNLOCK
Unlock rwlock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be unlocked.
.El
.Pp
The unlock type (read or write) is determined by the
current lock state.
Note that the
.Vt struct urwlock
does not save information about the identity of the thread which
acquired the lock.
.Pp
If there are pending writers after the unlock, and the
.Dv URWLOCK_PREFER_READER
flag is not set in the
.Dv rw_flags
member of the
.Fa *obj
structure, one writer is woken up, selected as described in the
.Sx SLEEP QUEUES
subsection.
If the
.Dv URWLOCK_PREFER_READER
flag is set, a pending writer is woken up only if there are
no pending readers.
.Pp
If there are no pending writers, or if the
.Dv URWLOCK_PREFER_READER
flag is set, all pending readers are woken up by the unlock.
.It Dv UMTX_OP_WAIT_UINT_PRIVATE
Same as
.Dv UMTX_OP_WAIT_UINT ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_WAKE_PRIVATE
Same as
.Dv UMTX_OP_WAKE ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_MUTEX_WAIT
Wait for mutex availability.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Address of the mutex.
.El
.Pp
Similarly to the
.Dv UMTX_OP_MUTEX_LOCK ,
put the requesting thread to sleep if the mutex lock cannot be obtained
immediately.
The
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the mutex to indicate that there is a waiter, before the thread
is added to the sleep queue.
Unlike the
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is not obtained.
.Pp
The operation is not implemented for priority protected and
priority inherited protocol mutexes.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without a timeout automatically restarts if the signal disposition
requested restart via the
.Dv SA_RESTART
flag in
.Vt struct sigaction
member
.Dv sa_flags .
.It Dv UMTX_OP_NWAKE_PRIVATE
Wake up a batch of sleeping threads.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the array of pointers.
.It Fa val
Number of elements in the array pointed to by
.Fa obj .
.El
.Pp
For each element in the array pointed to by
.Fa obj ,
wakes up all threads waiting on the
.Em private
sleep queue with the key
being the byte addressed by the array element.
.It Dv UMTX_OP_MUTEX_WAKE
Check if a normal umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
If the
.Dv m_owner
word of the mutex pointed to by the
.Fa obj
argument indicates unowned mutex, which has its contention indicator bit
.Dv UMUTEX_CONTESTED
set, clear the bit and wake up one waiter in the sleep queue associated
with the byte addressed by the
.Fa obj ,
if any.
Only normal mutexes are supported by the request.
The sleep queue is always one for a normal mutex type.
.Pp
This request is deprecated in favor of
.Dv UMTX_OP_MUTEX_WAKE2
since mutexes using it cannot synchronize their own destruction.
That is, the
.Dv m_owner
word has already been set to
.Dv UMUTEX_UNOWNED
when this request is made,
so that another thread can lock, unlock and destroy the mutex
(if no other thread uses the mutex afterwards).
Clearing the
.Dv UMUTEX_CONTESTED
bit may then modify freed memory.
.It Dv UMTX_OP_MUTEX_WAKE2
Check if a umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.It Fa val
The umutex flags.
.El
.Pp
The request does not read the
.Dv m_flags
member of the
.Vt struct umutex ;
instead, the
.Fa val
argument supplies flag information, in particular, to determine the
sleep queue where the waiters are found for wake up.
.Pp
If the mutex is unowned, one waiter is woken up.
.Pp
If the mutex memory cannot be accessed, all waiters are woken up.
.Pp
If there is more than one waiter on the sleep queue, or there is only
one waiter but the mutex is owned by a thread, the
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the
.Vt struct umutex .
.It Dv UMTX_OP_SEM2_WAIT
Wait until semaphore is available.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.It Fa uaddr
Size of the memory passed in via the
.Fa uaddr2
argument.
.It Fa uaddr2
Optional pointer to a structure of type
.Vt struct _umtx_time ,
which may be followed by a structure of type
.Vt struct timespec .
.El
.Pp
Put the requesting thread onto a sleep queue if the semaphore counter
is zero.
If the thread is put to sleep, the
.Dv USEM_HAS_WAITERS
bit is set in the
.Dv _count
word to indicate waiters.
The function returns either due to
.Dv _count
indicating the semaphore is available (non-zero count due to post),
or due to a wakeup.
The return does not guarantee that the semaphore is available,
nor does it consume the semaphore lock on successful return.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a non-absolute timeout value is not restartable.
An unblocked signal delivered during such a wait interrupts the sleep
and results in an
.Er EINTR
error.
.Pp
If
.Dv UMTX_ABSTIME
was not set, and the operation was interrupted and the caller passed in a
.Fa uaddr2
large enough to hold a
.Vt struct timespec
following the initial
.Vt struct _umtx_time ,
then the
.Vt struct timespec
is updated to contain the unslept amount.
.It Dv UMTX_OP_SEM2_WAKE
Wake up waiters on the semaphore lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.El
.Pp
The request wakes up one waiter for the semaphore lock.
The function does not increment the semaphore lock count.
If the
.Dv USEM_HAS_WAITERS
bit was set in the
.Dv _count
word, and the last sleeping thread was woken up, the bit is cleared.
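.Pp
Since neither
.Dv UMTX_OP_SEM2_WAIT
nor
.Dv UMTX_OP_SEM2_WAKE
changes the semaphore count, userspace code has to consume and post
units itself and use the kernel only for sleeping and waking.
The following is a minimal sketch of that split, not the
.Xr sem_wait 3
implementation.
All
.Li sk_
names are hypothetical, the two hooks stand in for the
.Fn _umtx_op
calls, and the mask values are assumed to match
.Dv USEM_HAS_WAITERS
and the count mask in
.In sys/umtx.h .

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed to mirror USEM_HAS_WAITERS and the count mask from
 * <sys/umtx.h>; verify against the header before relying on them.
 */
#define SK_HAS_WAITERS	0x80000000u
#define SK_MAX_COUNT	0x7fffffffu
#define SK_COUNT(c)	((c) & SK_MAX_COUNT)

/* Hypothetical stand-in for the _count word of struct _usem2. */
struct sk_sem {
	_Atomic uint32_t count;
};

/*
 * Hypothetical hooks.  On FreeBSD they would wrap _umtx_op() with
 * UMTX_OP_SEM2_WAIT and UMTX_OP_SEM2_WAKE and return 0 or an errno.
 */
typedef int (*sk_kwait_fn)(struct sk_sem *);
typedef int (*sk_kwake_fn)(struct sk_sem *);

/* Take one unit without sleeping; the waiters bit is left untouched. */
static bool
sk_sem_trywait(struct sk_sem *sem)
{
	uint32_t v = atomic_load(&sem->count);

	while (SK_COUNT(v) > 0) {
		/* On failure, v is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&sem->count, &v, v - 1))
			return (true);
	}
	return (false);
}

/* The kernel only sleeps and wakes; the unit is consumed here. */
static int
sk_sem_wait(struct sk_sem *sem, sk_kwait_fn kwait)
{
	int error;

	while (!sk_sem_trywait(sem)) {
		/* A return of 0 does not promise that a unit is available. */
		error = kwait(sem);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Increment in userspace first, then ask the kernel to wake a sleeper. */
static int
sk_sem_post(struct sk_sem *sem, sk_kwake_fn kwake)
{
	uint32_t old = atomic_fetch_add(&sem->count, 1);

	/* Sketch only: counter overflow is not handled. */
	assert(SK_COUNT(old) < SK_MAX_COUNT);
	if ((old & SK_HAS_WAITERS) != 0)
		return (kwake(sem));
	return (0);
}
```

.Pp
The waiter loops because a return from the wait hook does not guarantee
that a unit is available, and the poster enters the kernel only when the
waiters bit was observed, which keeps the uncontended path free of
system calls.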
.It Dv UMTX_OP_SHM
Manage anonymous
.Tn POSIX
shared memory objects (see
.Xr shm_open 2 ) ,
which can be attached to a byte of physical memory, mapped into the
process address space.
The objects are used to implement process-shared locks in
.Dv libthr .
.Pp
The
.Fa val
argument specifies the sub-request of the
.Dv UMTX_OP_SHM
request:
.Bl -tag -width indent
.It Dv UMTX_SHM_CREAT
Creates the anonymous shared memory object, which can be looked up
with the specified key
.Fa uaddr .
If the object associated with the
.Fa uaddr
key already exists, it is returned instead of creating a new object.
The object's size is one page.
On success, the file descriptor referencing the object is returned.
The descriptor can be used for mapping the object using
.Xr mmap 2 ,
or for other shared memory operations.
.It Dv UMTX_SHM_LOOKUP
Same as the
.Dv UMTX_SHM_CREAT
request, but if there is no shared memory object associated with
the specified key
.Fa uaddr ,
an error is returned, and no new object is created.
.It Dv UMTX_SHM_DESTROY
De-associate the shared object from the specified key
.Fa uaddr .
The object is destroyed after the last open file descriptor is closed
and the last mapping for it is destroyed.
.It Dv UMTX_SHM_ALIVE
Checks whether there is a live shared object associated with the
supplied key
.Fa uaddr .
Returns zero if there is, and an error otherwise.
This request is an optimization of the
.Dv UMTX_SHM_LOOKUP
request.
It is cheaper when only the liveness of the associated object is asked
for, since no file descriptor is installed in the process fd table
on success.
.El
.Pp
The
.Fa uaddr
argument specifies a virtual address; the identity of the physical memory
byte backing that address is used as the key for the anonymous shared
object creation or lookup.
.It Dv UMTX_OP_ROBUST_LISTS
Register the list heads for the current thread's robust mutex lists.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa val
Size of the structure passed in the
.Fa uaddr
argument.
.It Fa uaddr
Pointer to the structure of type
.Vt struct umtx_robust_lists_params .
.El
.Pp
The structure is defined as
.Bd -literal
struct umtx_robust_lists_params {
	uintptr_t	robust_list_offset;
	uintptr_t	robust_priv_list_offset;
	uintptr_t	robust_inact_offset;
};
.Ed
.Pp
The
.Dv robust_list_offset
member contains the address of the first element in the list of locked
robust shared mutexes.
The
.Dv robust_priv_list_offset
member contains the address of the first element in the list of locked
robust private mutexes.
The private and shared robust locked lists are split to allow fast
termination of the shared list on fork, in the child.
.Pp
The
.Dv robust_inact_offset
member contains a pointer to the mutex which might be locked in the
near future, or might have been just unlocked.
It is typically set by the lock or unlock mutex implementation code
around the whole operation, since lists can only be changed race-free
when the thread owns the mutex.
The kernel inspects the
.Dv robust_inact_offset
in addition to walking the shared and private lists.
Also, the mutex pointed to by
.Dv robust_inact_offset
is handled more loosely at thread termination time
than other mutexes on the list.
That mutex is allowed to be not owned by the current thread,
in which case list processing is continued.
See the
.Sx ROBUST UMUTEXES
subsection for details.
.El
.Pp
The
.Fa op
argument may be a bitwise OR of a single command from above with one or more of
the following flags:
.Bl -tag -width indent
.It Dv UMTX_OP__I386
Request i386 ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 32-bit time_t.
.El
.Pp
.Dv UMTX_OP__32BIT
has no effect if this flag is set.
This flag is valid for all architectures, but it is ignored on i386.
.It Dv UMTX_OP__32BIT
Request non-i386, 32-bit ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 64-bit time_t.
.El
.Pp
This flag has no effect if
.Dv UMTX_OP__I386
is set.
This flag is valid for all architectures.
.El
.Pp
Note that if any 32-bit ABI compatibility is being requested, then care must be
taken with robust lists.
A single thread may not mix 32-bit compatible robust lists with native
robust lists.
The first
.Dv UMTX_OP_ROBUST_LISTS
call in a given thread determines which ABI that thread will use for robust
lists going forward.
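.Pp
The precedence between the two compatibility flags can be summarized
in code.
This is a sketch for a 64-bit host only, not kernel code; the
.Li sk_
names are hypothetical, and the flag values are assumed to match
.Dv UMTX_OP__I386
and
.Dv UMTX_OP__32BIT
in
.In sys/umtx.h .

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed to mirror UMTX_OP__I386 and UMTX_OP__32BIT from
 * <sys/umtx.h>; verify against the header before relying on them.
 */
#define SK_OP__I386	0x40000000u
#define SK_OP__32BIT	0x80000000u

/* Hypothetical summary of the layout that an op value requests. */
struct sk_abi {
	int word_bits;		/* words and pointers reached through obj */
	int time_t_bits;	/* time_t inside struct timespec */
};

/* Describe the requested layout, assuming a 64-bit native ABI. */
static struct sk_abi
sk_abi_for_op(uint32_t op)
{
	struct sk_abi abi = { 64, 64 };	/* no compatibility flag set */

	if ((op & SK_OP__I386) != 0) {
		/* The i386 flag wins; the 32-bit flag then has no effect. */
		abi.word_bits = 32;
		abi.time_t_bits = 32;
	} else if ((op & SK_OP__32BIT) != 0) {
		/* Non-i386 32-bit ABIs keep a 64-bit time_t. */
		abi.word_bits = 32;
		abi.time_t_bits = 64;
	}
	return (abi);
}

/* Strip the compatibility flags to recover the command number. */
static uint32_t
sk_op_command(uint32_t op)
{
	return (op & ~(SK_OP__I386 | SK_OP__32BIT));
}
```

.Pp
The only difference between the two flags in this summary is the width of
time_t, which is why a caller emulating a 32-bit ABI has to pick the flag
that matches the layout of the
.Vt struct timespec
it passes in.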
.Sh RETURN VALUES
If successful,
all requests, except the
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests of the
.Dv UMTX_OP_SHM
request, will return zero.
The
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests return a shared memory file descriptor on success.
On error \-1 is returned, and the
.Va errno
variable is set to indicate the error.
.Sh ERRORS
The
.Fn _umtx_op
operations can fail with the following errors:
.Bl -tag -width "[ETIMEDOUT]"
.It Bq Er EFAULT
One of the arguments points to invalid memory.
.It Bq Er EINVAL
The clock identifier, specified for the
.Vt struct _umtx_time
timeout parameter, or in the
.Dv c_clockid
member of
.Vt struct ucond ,
is invalid.
.It Bq Er EINVAL
The type of the mutex, encoded by the
.Dv m_flags
member of
.Vt struct umutex ,
is invalid.
.It Bq Er EINVAL
The
.Dv m_owner
member of the
.Vt struct umutex
has changed the lock owner thread identifier during unlock.
.It Bq Er EINVAL
The
.Dv timeout.tv_sec
or
.Dv timeout.tv_nsec
member of
.Vt struct _umtx_time
is less than zero, or
.Dv timeout.tv_nsec
is greater than or equal to 1000000000.
.It Bq Er EINVAL
The
.Fa op
argument specifies an invalid operation.
.It Bq Er EINVAL
The
.Fa val
argument for the
.Dv UMTX_OP_SHM
request specifies an invalid sub-request.
.It Bq Er EINVAL
The
.Dv UMTX_OP_SET_CEILING
request specifies a mutex that is not priority protected.
.It Bq Er EINVAL
The new ceiling value for the
.Dv UMTX_OP_SET_CEILING
request, or one or more of the values read from the
.Dv m_ceilings
array during lock or unlock operations, is greater than
.Dv RTP_PRIO_MAX .
.It Bq Er EPERM
Unlock attempted on an object not owned by the current thread.
.It Bq Er EOWNERDEAD
The lock was requested on a umutex where the
.Dv m_owner
field was set to the
.Dv UMUTEX_RB_OWNERDEAD
value, indicating a terminated robust mutex.
The lock was granted to the caller, so this error in fact
indicates success with additional conditions.
.It Bq Er ENOTRECOVERABLE
The lock was requested on a umutex whose
.Dv m_owner
field is equal to the
.Dv UMUTEX_RB_NOTRECOV
value, indicating a robust mutex abandoned after termination.
The lock was not granted to the caller.
.It Bq Er ENOTTY
The shared memory object, associated with the address passed to the
.Dv UMTX_SHM_ALIVE
sub-request of the
.Dv UMTX_OP_SHM
request, was destroyed.
.It Bq Er ESRCH
For the
.Dv UMTX_SHM_LOOKUP ,
.Dv UMTX_SHM_DESTROY ,
and
.Dv UMTX_SHM_ALIVE
sub-requests of the
.Dv UMTX_OP_SHM
request, there is no shared memory object associated with the provided key.
.It Bq Er ENOMEM
The
.Dv UMTX_SHM_CREAT
sub-request of the
.Dv UMTX_OP_SHM
request cannot be satisfied, because allocation of the shared memory object
would exceed the
.Dv RLIMIT_UMTXP
resource limit, see
.Xr setrlimit 2 .
.It Bq Er EAGAIN
The maximum number of readers
.Dv ( URWLOCK_MAX_READERS )
were already granted ownership of the given
.Vt struct urwlock
for read.
.It Bq Er EBUSY
A try mutex lock operation was not able to obtain the lock.
.It Bq Er ETIMEDOUT
The request specified a timeout in the
.Fa uaddr
and
.Fa uaddr2
arguments, and timed out before obtaining the lock or being woken up.
.It Bq Er EINTR
A signal was delivered during wait, for a non-restartable operation.
Operations with timeouts are typically non-restartable, but timeouts
specified in absolute time may be restartable.
.It Bq Er ERESTART
A signal was delivered during wait, for a restartable operation.
Mutex lock requests without a timeout specified are restartable.
The error is not returned to userspace code since restart
is handled by the usual adjustment of the instruction counter.
.El
.Sh SEE ALSO
.Xr clock_gettime 2 ,
.Xr mmap 2 ,
.Xr setrlimit 2 ,
.Xr shm_open 2 ,
.Xr sigaction 2 ,
.Xr thr_exit 2 ,
.Xr thr_kill 2 ,
.Xr thr_kill2 2 ,
.Xr thr_new 2 ,
.Xr thr_self 2 ,
.Xr thr_set_name 2 ,
.Xr signal 3
.Sh STANDARDS
The
.Fn _umtx_op
system call is non-standard and is used by the
.Lb libthr
to implement
.St -p1003.1-2001
.Xr pthread 3
functionality.
.Sh BUGS
A window between unlocking a robust mutex and resetting the pointer in the
.Dv robust_inact_offset
member of the registered
.Vt struct umtx_robust_lists_params
allows another thread to destroy the mutex, thus making the kernel inspect
freed or reused memory.
The
.Li libthr
implementation is only vulnerable to this race when operating on
a shared mutex.
A possible fix for the current implementation is to strengthen the checks
for shared mutexes before terminating them, in particular, verifying
that the mutex memory is mapped from a shared memory object allocated
by the
.Dv UMTX_OP_SHM
request.
This is not done because it is believed that the race is adequately
covered by other consistency checks, while adding the check would
prevent alternative implementations of
.Li libpthread .