.\" Copyright (c) 2016 The FreeBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This documentation was written by
.\" Konstantin Belousov <[email protected]> under sponsorship
.\" from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd November 13, 2017
.Dt _UMTX_OP 2
.Os
.Sh NAME
.Nm _umtx_op
.Nd interface for implementation of userspace threading synchronization primitives
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/umtx.h
.Ft int
.Fn _umtx_op "void *obj" "int op" "u_long val" "void *uaddr" "void *uaddr2"
.Sh DESCRIPTION
The
.Fn _umtx_op
system call provides kernel support for userspace implementation of
the threading synchronization primitives.
The
.Lb libthr
uses the syscall to implement
.St -p1003.1-2001
pthread locks, like mutexes, condition variables and so on.
54.Ss STRUCTURES
55The operations, performed by the
56.Fn _umtx_op
57syscall, operate on userspace objects which are described
58by the following structures.
59Reserved fields and paddings are omitted.
60All objects require ABI-mandated alignment, but this is not currently
61enforced consistently on all architectures.
62.Pp
63The following flags are defined for flag fields of all structures:
64.Bl -tag -width indent
65.It Dv USYNC_PROCESS_SHARED
66Allow selection of the process-shared sleep queue for the thread sleep
67container, when the lock ownership cannot be granted immediately,
68and the operation must sleep.
69The process-shared or process-private sleep queue is selected based on
70the attributes of the memory mapping which contains the first byte of
71the structure, see
72.Xr mmap 2 .
73Otherwise, if the flag is not specified, the process-private sleep queue
74is selected regardless of the memory mapping attributes, as an optimization.
75.Pp
76See the
77.Sx SLEEP QUEUES
78subsection below for more details on sleep queues.
79.El
80.Bl -hang -offset indent
81.It Sy Mutex
.Bd -literal
struct umutex {
	volatile lwpid_t m_owner;
	uint32_t         m_flags;
	uint32_t         m_ceilings[2];
	uintptr_t        m_rb_lnk;
};
.Ed
90.Pp
91The
92.Dv m_owner
93field is the actual lock.
94It contains either the thread identifier of the lock owner in the
95locked state, or zero when the lock is unowned.
96The highest bit set indicates that there is contention on the lock.
97The constants are defined for special values:
98.Bl -tag -width indent
99.It Dv UMUTEX_UNOWNED
100Zero, the value stored in the unowned lock.
101.It Dv UMUTEX_CONTESTED
The contention indicator.
103.It Dv UMUTEX_RB_OWNERDEAD
104A thread owning the robust mutex terminated.
105The mutex is in unlocked state.
106.It Dv UMUTEX_RB_NOTRECOV
107The robust mutex is in a non-recoverable state.
108It cannot be locked until reinitialized.
109.El
110.Pp
111The
112.Dv m_flags
113field may contain the following umutex-specific flags, in addition to
114the common flags:
115.Bl -tag -width indent
116.It Dv UMUTEX_PRIO_INHERIT
117Mutex implements
118.Em Priority Inheritance
119protocol.
120.It Dv UMUTEX_PRIO_PROTECT
121Mutex implements
122.Em Priority Protection
123protocol.
124.It Dv UMUTEX_ROBUST
125Mutex is robust, as described in the
126.Sx ROBUST UMUTEXES
127section below.
128.It Dv UMUTEX_NONCONSISTENT
129Robust mutex is in a transient non-consistent state.
130Not used by kernel.
131.El
132.Pp
In this manual page, mutexes that have neither the
.Dv UMUTEX_PRIO_INHERIT
nor the
.Dv UMUTEX_PRIO_PROTECT
flag set are called normal mutexes.
138Each type of mutex
139.Pq normal, priority-inherited, and priority-protected
140has a separate sleep queue associated
141with the given key.
142.Pp
143For priority protected mutexes, the
144.Dv m_ceilings
145array contains priority ceiling values.
146The
147.Dv m_ceilings[0]
148is the ceiling value for the mutex, as specified by
149.St -p1003.1-2008
150for the
151.Em Priority Protected
152mutex protocol.
153The
154.Dv m_ceilings[1]
155is used only for the unlock of a priority protected mutex, when
156unlock is done in an order other than the reversed lock order.
157In this case,
158.Dv m_ceilings[1]
159must contain the ceiling value for the last locked priority protected
160mutex, for proper priority reassignment.
If, instead, the mutex being unlocked was the last priority protected
mutex locked by the thread,
.Dv m_ceilings[1]
should contain \-1.
This is required because the kernel does not maintain an ordered lock list.
166.It Sy Condition variable
.Bd -literal
struct ucond {
	volatile uint32_t c_has_waiters;
	uint32_t          c_flags;
	uint32_t          c_clockid;
};
.Ed
174.Pp
175A non-zero
176.Dv c_has_waiters
177value indicates that there are in-kernel waiters for the condition,
178executing the
179.Dv UMTX_OP_CV_WAIT
180request.
181.Pp
182The
183.Dv c_flags
184field contains flags.
185Only the common flags
186.Pq Dv USYNC_PROCESS_SHARED
187are defined for ucond.
188.Pp
189The
190.Dv c_clockid
191member provides the clock identifier to use for timeout, when the
192.Dv UMTX_OP_CV_WAIT
193request has both the
194.Dv CVWAIT_CLOCKID
195flag and the timeout specified.
196Valid clock identifiers are a subset of those for
197.Xr clock_gettime 2 :
198.Bl -bullet -compact
199.It
200.Dv CLOCK_MONOTONIC
201.It
202.Dv CLOCK_MONOTONIC_FAST
203.It
204.Dv CLOCK_MONOTONIC_PRECISE
205.It
206.Dv CLOCK_PROF
207.It
208.Dv CLOCK_REALTIME
209.It
210.Dv CLOCK_REALTIME_FAST
211.It
212.Dv CLOCK_REALTIME_PRECISE
213.It
214.Dv CLOCK_SECOND
215.It
216.Dv CLOCK_UPTIME
217.It
218.Dv CLOCK_UPTIME_FAST
219.It
220.Dv CLOCK_UPTIME_PRECISE
221.It
222.Dv CLOCK_VIRTUAL
223.El
224.It Sy Reader/writer lock
.Bd -literal
struct urwlock {
	volatile int32_t rw_state;
	uint32_t         rw_flags;
	uint32_t         rw_blocked_readers;
	uint32_t         rw_blocked_writers;
};
.Ed
233.Pp
234The
235.Dv rw_state
236field is the actual lock.
237It contains both the flags and counter of the read locks which were
238granted.
The
.Dv rw_state
bits are named as follows:
242.Bl -tag -width indent
243.It Dv URWLOCK_WRITE_OWNER
244Write lock was granted.
245.It Dv URWLOCK_WRITE_WAITERS
246There are write lock waiters.
247.It Dv URWLOCK_READ_WAITERS
248There are read lock waiters.
249.It Dv URWLOCK_READER_COUNT(c)
250Returns the count of currently granted read locks.
251.El
252.Pp
At any given time, at most one thread may hold the write lock on the
.Vt struct urwlock ,
in which case no thread holds a read lock.
Otherwise, up to
.Dv URWLOCK_MAX_READERS
threads may hold the read lock simultaneously, and the write lock is
not granted to any thread.
261.Pp
262The following flags for the
263.Dv rw_flags
264member of
265.Vt struct urwlock
266are defined, in addition to the common flags:
267.Bl -tag -width indent
268.It Dv URWLOCK_PREFER_READER
269If specified, immediately grant read lock requests when
270.Dv urwlock
271is already read-locked, even in presence of unsatisfied write
272lock requests.
273By default, if there is a write lock waiter, further read requests are
274not granted, to prevent unfair write lock waiter starvation.
275.El
276.Pp
277The
278.Dv rw_blocked_readers
279and
280.Dv rw_blocked_writers
281members contain the count of threads which are sleeping in kernel,
282waiting for the associated request type to be granted.
283The fields are used by kernel to update the
284.Dv URWLOCK_READ_WAITERS
285and
286.Dv URWLOCK_WRITE_WAITERS
287flags of the
288.Dv rw_state
289lock after requesting thread was woken up.
290.It Sy Semaphore
.Bd -literal
struct _usem2 {
	volatile uint32_t _count;
	uint32_t          _flags;
};
.Ed
297.Pp
298The
299.Dv _count
300word represents a counting semaphore.
301A non-zero value indicates an unlocked (posted) semaphore, while zero
302represents the locked state.
303The maximal supported semaphore count is
304.Dv USEM_MAX_COUNT .
305.Pp
306The
307.Dv _count
308word, besides the counter of posts (unlocks), also contains the
309.Dv USEM_HAS_WAITERS
310bit, which indicates that locked semaphore has waiting threads.
311.Pp
312The
313.Dv USEM_COUNT()
314macro, applied to the
315.Dv _count
316word, returns the current semaphore counter, which is the number of posts
317issued on the semaphore.
318.Pp
319The following bits for the
320.Dv _flags
321member of
322.Vt struct _usem2
323are defined, in addition to the common flags:
324.Bl -tag -width indent
325.It Dv USEM_NAMED
326Flag is ignored by kernel.
327.El
328.It Sy Timeout parameter
.Bd -literal
struct _umtx_time {
	struct timespec _timeout;
	uint32_t        _flags;
	uint32_t        _clockid;
};
.Ed
336.Pp
337Several
338.Fn _umtx_op
339operations allow the blocking time to be limited, failing the request
340if it cannot be satisfied in the specified time period.
341The timeout is specified by passing either the address of
342.Vt struct timespec ,
343or its extended variant,
344.Vt struct _umtx_time ,
345as the
346.Fa uaddr2
347argument of
348.Fn _umtx_op .
349They are distinguished by the
350.Fa uaddr
351value, which must be equal to the size of the structure pointed to by
352.Fa uaddr2 ,
cast to
354.Vt uintptr_t .
355.Pp
356The
357.Dv _timeout
358member specifies the time when the timeout should occur.
359Legal values for clock identifier
360.Dv _clockid
361are shared with the
362.Fa clock_id
363argument to the
364.Xr clock_gettime 2
365function,
366and use the same underlying clocks.
367The specified clock is used to obtain the current time value.
368Interval counting is always performed by the monotonic wall clock.
369.Pp
370The
371.Dv _flags
372argument allows the following flags to further define the timeout behaviour:
373.Bl -tag -width indent
374.It Dv UMTX_ABSTIME
375The
376.Dv _timeout
377value is the absolute time.
The thread will be unblocked and the request failed when the specified
clock value equals or exceeds the
.Dv _timeout .
381.Pp
382If the flag is absent, the timeout value is relative, that is the amount
383of time, measured by the monotonic wall clock from the moment of the request
384start.
385.El
386.El
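.Pp
As a rough illustration of the size convention described for the timeout
parameter, the sketch below encodes a structure size as the value passed in
uaddr and tells the two timeout forms apart.
The structure demo_umtx_time is a local stand-in that mirrors the documented
layout of struct _umtx_time, and demo_size_tag() and demo_is_extended() are
hypothetical helper names used only here.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Local stand-in mirroring the documented layout of struct _umtx_time. */
struct demo_umtx_time {
	struct timespec	timeout;
	uint32_t	flags;
	uint32_t	clockid;
};

/* Encode a structure size the way the uaddr argument carries it. */
void *
demo_size_tag(size_t size)
{
	return ((void *)(uintptr_t)size);
}

/* Nonzero if the tag names the extended timeout form. */
int
demo_is_extended(const void *uaddr)
{
	return ((uintptr_t)uaddr == sizeof(struct demo_umtx_time));
}
```

A caller using the extended form would pass
demo_size_tag(sizeof(struct demo_umtx_time)) as uaddr and the address of the
structure as uaddr2.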
387.Ss SLEEP QUEUES
388When a locking request cannot be immediately satisfied, the thread is
389typically put to
390.Em sleep ,
391which is a non-runnable state terminated by the
392.Em wake
393operation.
394Lock operations include a
395.Em try
396variant which returns an error rather than sleeping if the lock cannot
397be obtained.
398Also,
399.Fn _umtx_op
400provides requests which explicitly put the thread to sleep.
401.Pp
402Wakes need to know which threads to make runnable, so sleeping threads
403are grouped into containers called
404.Em sleep queues .
405A sleep queue is identified by a key, which for
406.Fn _umtx_op
407is defined as the physical address of some variable.
408Note that the
409.Em physical
410address is used, which means that same variable mapped multiple
411times will give one key value.
412This mechanism enables the construction of
413.Em process-shared
414locks.
415.Pp
416A related attribute of the key is shareability.
417Some requests always interpret keys as private for the current process,
418creating sleep queues with the scope of the current process even if
419the memory is shared.
420Others either select the shareability automatically from the
421mapping attributes, or take additional input as the
422.Dv USYNC_PROCESS_SHARED
423common flag.
This is done as an optimization, allowing the lock scope to be limited
425regardless of the kind of backing memory.
426.Pp
427Only the address of the start byte of the variable specified as key is
428important for determining corresponding sleep queue.
The size of the variable does not matter, so, for example, sleeps on the same
address interpreted as
431.Vt uint32_t
432and
433.Vt long
434on a little-endian 64-bit platform would collide.
435.Pp
436The last attribute of the key is the object type.
437The sleep queue to which a sleeping thread is assigned is an individual
438one for simple wait requests, mutexes, rwlocks, condvars and other
primitives, even when the physical address of the key is the same.
440.Pp
441When waking up a limited number of threads from a given sleep queue,
442the highest priority threads that have been blocked for the longest on
443the queue are selected.
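.Pp
The selection rule for a limited wake can be pictured with the following
userspace model.
It is only a sketch: struct demo_waiter and demo_pick_waiter() are
hypothetical names, and the convention that a larger prio value means a
higher priority is an assumption of the example, not a statement about
kernel priority encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one sleeping thread in this sketch. */
struct demo_waiter {
	int		prio;		/* larger value = higher priority (assumption) */
	uint64_t	enqueued;	/* sequence number taken when the sleep began */
};

/*
 * Return the index of the waiter to wake first: the highest priority,
 * with ties broken by the earliest enqueue sequence number.
 * Returns -1 for an empty queue.
 */
int
demo_pick_waiter(const struct demo_waiter *q, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (best < 0 ||
		    q[i].prio > q[best].prio ||
		    (q[i].prio == q[best].prio &&
		    q[i].enqueued < q[best].enqueued))
			best = (int)i;
	}
	return (best);
}
```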
444.Ss ROBUST UMUTEXES
445The
446.Em robust umutexes
447are provided as a substrate for a userspace library to implement
448.Tn POSIX
449robust mutexes.
450A robust umutex must have the
451.Dv UMUTEX_ROBUST
452flag set.
453.Pp
454On thread termination, the kernel walks two lists of mutexes.
The head addresses of the two lists must be provided by a prior
.Dv UMTX_OP_ROBUST_LISTS
request.
458The lists are singly-linked.
459The link to next element is provided by the
460.Dv m_rb_lnk
461member of the
462.Vt struct umutex .
463.Pp
464Robust list processing is aborted if the kernel finds a mutex
465with any of the following conditions:
466.Bl -dash -offset indent -compact
467.It
468the
469.Dv UMUTEX_ROBUST
470flag is not set
471.It
472not owned by the current thread, except when the mutex is pointed to
473by the
474.Dv robust_inactive
475member of the
476.Vt struct umtx_robust_lists_params ,
477registered for the current thread
478.It
479the combination of mutex flags is invalid
480.It
481read of the umutex memory faults
482.It
483the list length limit described in
484.Xr libthr 3
485is reached.
486.El
487.Pp
488Every mutex in both lists is unlocked as if the
489.Dv UMTX_OP_MUTEX_UNLOCK
490request is performed on it, but instead of the
491.Dv UMUTEX_UNOWNED
492value, the
493.Dv m_owner
494field is written with the
495.Dv UMUTEX_RB_OWNERDEAD
496value.
497When a mutex in the
498.Dv UMUTEX_RB_OWNERDEAD
499state is locked by kernel due to the
500.Dv UMTX_OP_MUTEX_TRYLOCK
501and
502.Dv UMTX_OP_MUTEX_LOCK
503requests, the lock is granted and
504.Er EOWNERDEAD
505error is returned.
506.Pp
507Also, the kernel handles the
508.Dv UMUTEX_RB_NOTRECOV
value of the
.Dv m_owner
field specially, always returning the
512.Er ENOTRECOVERABLE
513error for lock attempts, without granting the lock.
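.Pp
The outcomes of a lock attempt on a robust umutex can be summarized with a
small single-threaded model.
This is a sketch under stated assumptions: the DEMO_ constants are
illustrative stand-ins whose numeric values are assumed for the example
(the real definitions live in sys/umtx.h), demo_trylock() is a hypothetical
name, and the real kernel path updates the owner word atomically.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-ins; consult <sys/umtx.h> for the real constants. */
#define DEMO_UNOWNED		0x0U
#define DEMO_CONTESTED		0x80000000U
#define DEMO_RB_OWNERDEAD	(DEMO_CONTESTED | 0x10U)
#define DEMO_RB_NOTRECOV	(DEMO_CONTESTED | 0x11U)

/*
 * Model of a try-lock on the owner word.
 * Returns 0 or an errno value, and updates *owner when the lock is granted.
 */
int
demo_trylock(uint32_t *owner, uint32_t tid)
{
	uint32_t cur = *owner;

	if (cur == DEMO_RB_NOTRECOV)
		return (ENOTRECOVERABLE);	/* never granted */
	if (cur == DEMO_RB_OWNERDEAD) {
		*owner = tid | DEMO_CONTESTED;	/* granted, previous owner died */
		return (EOWNERDEAD);
	}
	if ((cur & ~DEMO_CONTESTED) == DEMO_UNOWNED) {
		*owner = tid | (cur & DEMO_CONTESTED);	/* keep contention bit */
		return (0);
	}
	return (EBUSY);				/* owned by another thread */
}
```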
514.Ss OPERATIONS
515The following operations, requested by the
516.Fa op
517argument to the function, are implemented:
518.Bl -tag -width indent
519.It Dv UMTX_OP_WAIT
520Wait.
521The arguments for the request are:
522.Bl -tag -width "obj"
523.It Fa obj
524Pointer to a variable of type
525.Vt long .
526.It Fa val
527Current value of the
528.Dv *obj .
529.El
530.Pp
531The current value of the variable pointed to by the
532.Fa obj
533argument is compared with the
534.Fa val .
535If they are equal, the requesting thread is put to interruptible sleep
536until woken up or the optionally specified timeout expires.
537.Pp
538The comparison and sleep are atomic.
539In other words, if another thread writes a new value to
540.Dv *obj
541and then issues
542.Dv UMTX_OP_WAKE ,
543the request is guaranteed to not miss the wakeup,
544which might otherwise happen between comparison and blocking.
545.Pp
546The physical address of memory where the
547.Fa *obj
548variable is located, is used as a key to index sleeping threads.
549.Pp
550The read of the current value of the
551.Dv *obj
552variable is not guarded by barriers.
553In particular, it is the user's duty to ensure the lock acquire
554and release memory semantics, if the
555.Dv UMTX_OP_WAIT
556and
557.Dv UMTX_OP_WAKE
558requests are used as a substrate for implementing a simple lock.
559.Pp
560The request is not restartable.
561An unblocked signal delivered during the wait always results in sleep
562interruption and
563.Er EINTR
564error.
565.Pp
566Optionally, a timeout for the request may be specified.
567.It Dv UMTX_OP_WAKE
568Wake the threads possibly sleeping due to
569.Dv UMTX_OP_WAIT .
570The arguments for the request are:
571.Bl -tag -width "obj"
572.It Fa obj
573Pointer to a variable, used as a key to find sleeping threads.
574.It Fa val
575Up to
576.Fa val
577threads are woken up by this request.
578Specify
579.Dv INT_MAX
580to wake up all waiters.
581.El
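.Pp
A minimal sketch of how the wait and wake requests pair up is shown below.
The wrappers demo_wait_while() and demo_wake_all() are hypothetical names.
The code calls the system call only when built on FreeBSD; elsewhere it
reports ENOSYS so that the fragment still compiles.
The sketch assumes the usual system call convention of returning 0 on
success and \-1 with errno set on failure.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/umtx.h>
#endif

/*
 * Sleep while *word still equals expected.
 * Returns immediately if the value already differs.
 */
int
demo_wait_while(long *word, long expected)
{
#if defined(__FreeBSD__)
	return (_umtx_op(word, UMTX_OP_WAIT, (u_long)expected, NULL, NULL));
#else
	(void)word;
	(void)expected;
	errno = ENOSYS;		/* not FreeBSD: no such system call */
	return (-1);
#endif
}

/* Wake every thread sleeping on word. */
int
demo_wake_all(long *word)
{
#if defined(__FreeBSD__)
	return (_umtx_op(word, UMTX_OP_WAKE, INT_MAX, NULL, NULL));
#else
	(void)word;
	errno = ENOSYS;
	return (-1);
#endif
}
```

A waker is expected to store the new value into the variable before calling
demo_wake_all(), and a waiter should re-check the variable after
demo_wait_while() returns, since the wait is interruptible.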
582.It Dv UMTX_OP_MUTEX_TRYLOCK
583Try to lock umutex.
584The arguments to the request are:
585.Bl -tag -width "obj"
586.It Fa obj
587Pointer to the umutex.
588.El
589.Pp
Operates the same as the
591.Dv UMTX_OP_MUTEX_LOCK
592request, but returns
593.Er EBUSY
594instead of sleeping if the lock cannot be obtained immediately.
595.It Dv UMTX_OP_MUTEX_LOCK
596Lock umutex.
597The arguments to the request are:
598.Bl -tag -width "obj"
599.It Fa obj
600Pointer to the umutex.
601.El
602.Pp
603Locking is performed by writing the current thread id into the
604.Dv m_owner
605word of the
606.Vt struct umutex .
607The write is atomic, preserves the
608.Dv UMUTEX_CONTESTED
609contention indicator, and provides the acquire barrier for
610lock entrance semantic.
611.Pp
612If the lock cannot be obtained immediately because another thread owns
the lock, the current thread is put to sleep, with the
.Dv UMUTEX_CONTESTED
bit set beforehand.
616Upon wake up, the lock conditions are re-tested.
617.Pp
618The request adheres to the priority protection or inheritance protocol
619of the mutex, specified by the
620.Dv UMUTEX_PRIO_PROTECT
621or
622.Dv UMUTEX_PRIO_INHERIT
623flag, respectively.
624.Pp
625Optionally, a timeout for the request may be specified.
626.Pp
627A request with a timeout specified is not restartable.
628An unblocked signal delivered during the wait always results in sleep
629interruption and
630.Er EINTR
631error.
632A request without timeout specified is always restarted after return
633from a signal handler.
634.It Dv UMTX_OP_MUTEX_UNLOCK
635Unlock umutex.
636The arguments to the request are:
637.Bl -tag -width "obj"
638.It Fa obj
639Pointer to the umutex.
640.El
641.Pp
642Unlocks the mutex, by writing
643.Dv UMUTEX_UNOWNED
644(zero) value into
645.Dv m_owner
646word of the
647.Vt struct umutex .
648The write is done with a release barrier, to provide lock leave semantic.
649.Pp
650If there are threads sleeping in the sleep queue associated with the
651umutex, one thread is woken up.
652If more than one thread sleeps in the sleep queue, the
653.Dv UMUTEX_CONTESTED
654bit is set together with the write of the
655.Dv UMUTEX_UNOWNED
656value into
657.Dv m_owner .
658.Pp
659The request adheres to the priority protection or inheritance protocol
660of the mutex, specified by the
661.Dv UMUTEX_PRIO_PROTECT
662or
663.Dv UMUTEX_PRIO_INHERIT
664flag, respectively.
665See description of the
666.Dv m_ceilings
667member of the
668.Vt struct umutex
669structure for additional details of the request operation on the
670priority protected protocol mutex.
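.Pp
The effect of an unlock on the owner word of a normal mutex can be modelled
as follows.
This is a sketch only: demo_unlock_word() is a hypothetical name, the DEMO_
constants are stand-ins that assume the contention indicator is the highest
bit of a 32-bit word, and priority protocol handling is not modelled.

```c
#include <stdint.h>

#define DEMO_MTX_UNOWNED	0x0U
#define DEMO_MTX_CONTESTED	0x80000000U	/* assumed: highest bit */

/*
 * Return the owner word written by an unlock, given how many threads
 * are asleep on the mutex sleep queue.
 * *nwake receives the number of threads the unlock makes runnable.
 */
uint32_t
demo_unlock_word(unsigned int sleepers, unsigned int *nwake)
{
	*nwake = sleepers > 0 ? 1 : 0;		/* at most one thread is woken */
	if (sleepers > 1)			/* others remain asleep */
		return (DEMO_MTX_UNOWNED | DEMO_MTX_CONTESTED);
	return (DEMO_MTX_UNOWNED);
}
```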
671.It Dv UMTX_OP_SET_CEILING
672Set ceiling for the priority protected umutex.
673The arguments to the request are:
674.Bl -tag -width "uaddr"
675.It Fa obj
676Pointer to the umutex.
677.It Fa val
678New ceiling value.
679.It Fa uaddr
680Address of a variable of type
681.Vt uint32_t .
682If not
683.Dv NULL
684and the update was successful, the previous ceiling value is
685written to the location pointed to by
686.Fa uaddr .
687.El
688.Pp
689The request locks the umutex pointed to by the
690.Fa obj
691parameter, waiting for the lock if not immediately available.
692After the lock is obtained, the new ceiling value
693.Fa val
694is written to the
695.Dv m_ceilings[0]
696member of the
.Vt struct umutex ,
698after which the umutex is unlocked.
699.Pp
700The locking does not adhere to the priority protect protocol,
701to conform to the
702.Tn POSIX
703requirements for the
704.Xr pthread_mutex_setprioceiling 3
705interface.
706.It Dv UMTX_OP_CV_WAIT
707Wait for a condition.
708The arguments to the request are:
709.Bl -tag -width "uaddr2"
710.It Fa obj
711Pointer to the
712.Vt struct ucond .
713.It Fa val
714Request flags, see below.
715.It Fa uaddr
716Pointer to the umutex.
717.It Fa uaddr2
718Optional pointer to a
719.Vt struct timespec
720for timeout specification.
721.El
722.Pp
723The request must be issued by the thread owning the mutex pointed to
724by the
725.Fa uaddr
726argument.
727The
.Dv c_has_waiters
729member of the
730.Vt struct ucond ,
731pointed to by the
732.Fa obj
733argument, is set to an arbitrary non-zero value, after which the
734.Fa uaddr
735mutex is unlocked (following the appropriate protocol), and
736the current thread is put to sleep on the sleep queue keyed by
737the
738.Fa obj
739argument.
740The operations are performed atomically.
741It is guaranteed to not miss a wakeup from
742.Dv UMTX_OP_CV_SIGNAL
743or
744.Dv UMTX_OP_CV_BROADCAST
745sent between mutex unlock and putting the current thread on the sleep queue.
746.Pp
747Upon wakeup, if the timeout expired and no other threads are sleeping in
748the same sleep queue, the
.Dv c_has_waiters
750member is cleared.
751After wakeup, the
752.Fa uaddr
753umutex is not relocked.
754.Pp
755The following flags are defined:
756.Bl -tag -width "CVWAIT_CLOCKID"
757.It Dv CVWAIT_ABSTIME
758Timeout is absolute.
759.It Dv CVWAIT_CLOCKID
760Clockid is provided.
761.El
762.Pp
763Optionally, a timeout for the request may be specified.
764Unlike other requests, the timeout value is specified directly by a
765.Vt struct timespec ,
766pointed to by the
767.Fa uaddr2
768argument.
769If the
770.Dv CVWAIT_CLOCKID
771flag is provided, the timeout uses the clock from the
772.Dv c_clockid
773member of the
774.Vt struct ucond ,
775pointed to by
776.Fa obj
777argument.
778Otherwise,
779.Dv CLOCK_REALTIME
780is used, regardless of the clock identifier possibly specified in the
781.Vt struct _umtx_time .
782If the
783.Dv CVWAIT_ABSTIME
784flag is supplied, the timeout specifies absolute time value, otherwise
785it denotes a relative time interval.
786.Pp
787The request is not restartable.
788An unblocked signal delivered during
789the wait always results in sleep interruption and
790.Er EINTR
791error.
792.It Dv UMTX_OP_CV_SIGNAL
793Wake up one condition waiter.
794The arguments to the request are:
795.Bl -tag -width "obj"
796.It Fa obj
797Pointer to
798.Vt struct ucond .
799.El
800.Pp
801The request wakes up at most one thread sleeping on the sleep queue keyed
802by the
803.Fa obj
804argument.
805If the woken up thread was the last on the sleep queue, the
806.Dv c_has_waiters
807member of the
808.Vt struct ucond
809is cleared.
810.It Dv UMTX_OP_CV_BROADCAST
811Wake up all condition waiters.
812The arguments to the request are:
813.Bl -tag -width "obj"
814.It Fa obj
815Pointer to
816.Vt struct ucond .
817.El
818.Pp
819The request wakes up all threads sleeping on the sleep queue keyed by the
820.Fa obj
821argument.
822The
823.Dv c_has_waiters
824member of the
825.Vt struct ucond
826is cleared.
827.It Dv UMTX_OP_WAIT_UINT
828Same as
829.Dv UMTX_OP_WAIT ,
830but the type of the variable pointed to by
831.Fa obj
832is
833.Vt u_int
834.Pq a 32-bit integer .
835.It Dv UMTX_OP_RW_RDLOCK
836Read-lock a
.Vt struct urwlock
838lock.
839The arguments to the request are:
840.Bl -tag -width "obj"
841.It Fa obj
842Pointer to the lock (of type
.Vt struct urwlock )
844to be read-locked.
845.It Fa val
846Additional flags to augment locking behaviour.
847The valid flags in the
848.Fa val
849argument are:
850.Bl -tag -width indent
851.It Dv URWLOCK_PREFER_READER
852.El
853.El
854.Pp
855The request obtains the read lock on the specified
.Vt struct urwlock
857by incrementing the count of readers in the
858.Dv rw_state
859word of the structure.
860If the
861.Dv URWLOCK_WRITE_OWNER
862bit is set in the word
863.Dv rw_state ,
864the lock was granted to a writer which has not yet relinquished
865its ownership.
866In this case the current thread is put to sleep until it makes sense to
867retry.
868.Pp
869If the
870.Dv URWLOCK_PREFER_READER
871flag is set either in the
872.Dv rw_flags
873word of the structure, or in the
874.Fa val
875argument of the request, the presence of the threads trying to obtain
876the write lock on the same structure does not prevent the current thread
877from trying to obtain the read lock.
878Otherwise, if the flag is not set, and the
879.Dv URWLOCK_WRITE_WAITERS
880flag is set in
881.Dv rw_state ,
882the current thread does not attempt to obtain read-lock.
883Instead it sets the
884.Dv URWLOCK_READ_WAITERS
885in the
886.Dv rw_state
887word and puts itself to sleep on corresponding sleep queue.
888Upon wakeup, the locking conditions are re-evaluated.
889.Pp
890Optionally, a timeout for the request may be specified.
891.Pp
892The request is not restartable.
893An unblocked signal delivered during the wait always results in sleep
894interruption and
895.Er EINTR
896error.
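.Pp
The admission decision for a read lock can be sketched as a single-threaded
model.
The DEMO_RW_ macros are illustrative stand-ins with assumed bit positions,
demo_rdlock_attempt() is a hypothetical name, and setting the read-waiters
bit whenever the caller would sleep is an assumption of the sketch.
Reader count overflow and atomicity are not modelled.

```c
#include <stdint.h>

/* Illustrative bit layout; the real macros come from <sys/umtx.h>. */
#define DEMO_RW_WRITE_OWNER	0x80000000U
#define DEMO_RW_WRITE_WAITERS	0x40000000U
#define DEMO_RW_READ_WAITERS	0x20000000U
#define DEMO_RW_MAX_READERS	0x1fffffffU
#define DEMO_RW_READER_COUNT(s)	((s) & DEMO_RW_MAX_READERS)

/*
 * Model of one read-lock attempt.
 * Returns 1 when the read lock is granted, 0 when the caller would sleep.
 * prefer_reader is nonzero if URWLOCK_PREFER_READER is in effect.
 */
int
demo_rdlock_attempt(uint32_t *state, int prefer_reader)
{
	uint32_t blockers = DEMO_RW_WRITE_OWNER;

	if (!prefer_reader)
		blockers |= DEMO_RW_WRITE_WAITERS;	/* yield to waiting writers */
	if ((*state & blockers) != 0) {
		*state |= DEMO_RW_READ_WAITERS;		/* announce a sleeping reader */
		return (0);
	}
	(*state)++;					/* one more granted read lock */
	return (1);
}
```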
897.It Dv UMTX_OP_RW_WRLOCK
898Write-lock a
.Vt struct urwlock
900lock.
901The arguments to the request are:
902.Bl -tag -width "obj"
903.It Fa obj
904Pointer to the lock (of type
.Vt struct urwlock )
906to be write-locked.
907.El
908.Pp
909The request obtains a write lock on the specified
.Vt struct urwlock ,
911by setting the
912.Dv URWLOCK_WRITE_OWNER
913bit in the
914.Dv rw_state
915word of the structure.
916If there is already a write lock owner, as indicated by the
917.Dv URWLOCK_WRITE_OWNER
918bit being set, or there are read lock owners, as indicated
919by the read-lock counter, the current thread does not attempt to
920obtain the write-lock.
921Instead it sets the
922.Dv URWLOCK_WRITE_WAITERS
923in the
924.Dv rw_state
925word and puts itself to sleep on corresponding sleep queue.
926Upon wakeup, the locking conditions are re-evaluated.
927.Pp
928Optionally, a timeout for the request may be specified.
929.Pp
930The request is not restartable.
931An unblocked signal delivered during the wait always results in sleep
932interruption and
933.Er EINTR
934error.
935.It Dv UMTX_OP_RW_UNLOCK
936Unlock rwlock.
937The arguments to the request are:
938.Bl -tag -width "obj"
939.It Fa obj
940Pointer to the lock (of type
.Vt struct urwlock )
942to be unlocked.
943.El
944.Pp
945The unlock type (read or write) is determined by the
946current lock state.
947Note that the
.Vt struct urwlock
949does not save information about the identity of the thread which
950acquired the lock.
951.Pp
952If there are pending writers after the unlock, and the
953.Dv URWLOCK_PREFER_READER
954flag is not set in the
955.Dv rw_flags
956member of the
957.Fa *obj
958structure, one writer is woken up, selected as described in the
959.Sx SLEEP QUEUES
960subsection.
961If the
962.Dv URWLOCK_PREFER_READER
flag is set, a pending writer is woken up only if there are
no pending readers.
965.Pp
If there are no pending writers, or if the
.Dv URWLOCK_PREFER_READER
flag is set, all pending readers are woken up by the unlock.
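.Pp
The wake policy applied by the unlock can be condensed into a small decision
function.
This is a sketch of the rules as stated above, not kernel code; the enum
demo_rw_wake and demo_rw_unlock_wake() are hypothetical names, and the
writers and readers arguments stand for the counts of threads still blocked
after the unlock.

```c
enum demo_rw_wake {
	DEMO_WAKE_NONE,
	DEMO_WAKE_ONE_WRITER,
	DEMO_WAKE_ALL_READERS
};

/* Decide whom an unlock wakes, given the pending waiter counts. */
enum demo_rw_wake
demo_rw_unlock_wake(unsigned int writers, unsigned int readers,
    int prefer_reader)
{
	/* A writer goes first unless readers are preferred and present. */
	if (writers > 0 && (!prefer_reader || readers == 0))
		return (DEMO_WAKE_ONE_WRITER);
	if (readers > 0)
		return (DEMO_WAKE_ALL_READERS);
	return (DEMO_WAKE_NONE);
}
```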
969.It Dv UMTX_OP_WAIT_UINT_PRIVATE
970Same as
971.Dv UMTX_OP_WAIT_UINT ,
972but unconditionally select the process-private sleep queue.
973.It Dv UMTX_OP_WAKE_PRIVATE
974Same as
975.Dv UMTX_OP_WAKE ,
976but unconditionally select the process-private sleep queue.
977.It Dv UMTX_OP_MUTEX_WAIT
978Wait for mutex availability.
979The arguments to the request are:
980.Bl -tag -width "obj"
981.It Fa obj
982Address of the mutex.
983.El
984.Pp
985Similarly to the
986.Dv UMTX_OP_MUTEX_LOCK ,
987put the requesting thread to sleep if the mutex lock cannot be obtained
988immediately.
989The
990.Dv UMUTEX_CONTESTED
991bit is set in the
992.Dv m_owner
993word of the mutex to indicate that there is a waiter, before the thread
994is added to the sleep queue.
995Unlike the
996.Dv UMTX_OP_MUTEX_LOCK
997request, the lock is not obtained.
998.Pp
999The operation is not implemented for priority protected and
1000priority inherited protocol mutexes.
1001.Pp
1002Optionally, a timeout for the request may be specified.
1003.Pp
1004A request with a timeout specified is not restartable.
1005An unblocked signal delivered during the wait always results in sleep
1006interruption and
1007.Er EINTR
1008error.
1009A request without a timeout automatically restarts if the signal disposition
1010requested restart via the
1011.Dv SA_RESTART
1012flag in
1013.Vt struct sigaction
1014member
1015.Dv sa_flags .
1016.It Dv UMTX_OP_NWAKE_PRIVATE
1017Wake up a batch of sleeping threads.
1018The arguments to the request are:
1019.Bl -tag -width "obj"
1020.It Fa obj
1021Pointer to the array of pointers.
1022.It Fa val
1023Number of elements in the array pointed to by
1024.Fa obj .
1025.El
1026.Pp
1027For each element in the array pointed to by
1028.Fa obj ,
1029wakes up all threads waiting on the
1030.Em private
1031sleep queue with the key
1032being the byte addressed by the array element.
1033.It Dv UMTX_OP_MUTEX_WAKE
1034Check if a normal umutex is unlocked and wake up a waiter.
1035The arguments for the request are:
1036.Bl -tag -width "obj"
1037.It Fa obj
1038Pointer to the umutex.
1039.El
1040.Pp
1041If the
1042.Dv m_owner
1043word of the mutex pointed to by the
1044.Fa obj
1045argument indicates unowned mutex, which has its contention indicator bit
1046.Dv UMUTEX_CONTESTED
1047set, clear the bit and wake up one waiter in the sleep queue associated
1048with the byte addressed by the
1049.Fa obj ,
1050if any.
1051Only normal mutexes are supported by the request.
1052The sleep queue is always one for a normal mutex type.
1053.Pp
1054This request is deprecated in favor of
1055.Dv UMTX_OP_MUTEX_WAKE2
1056since mutexes using it cannot synchronize their own destruction.
1057That is, the
1058.Dv m_owner
1059word has already been set to
1060.Dv UMUTEX_UNOWNED
1061when this request is made,
1062so that another thread can lock, unlock and destroy the mutex
1063(if no other thread uses the mutex afterwards).
1064Clearing the
1065.Dv UMUTEX_CONTESTED
1066bit may then modify freed memory.
1067.It Dv UMTX_OP_MUTEX_WAKE2
1068Check if a umutex is unlocked and wake up a waiter.
1069The arguments for the request are:
1070.Bl -tag -width "obj"
1071.It Fa obj
1072Pointer to the umutex.
1073.It Fa val
1074The umutex flags.
1075.El
1076.Pp
1077The request does not read the
1078.Dv m_flags
1079member of the
1080.Vt struct umutex ;
1081instead, the
1082.Fa val
1083argument supplies flag information, in particular, to determine the
1084sleep queue where the waiters are found for wake up.
1085.Pp
1086If the mutex is unowned, one waiter is woken up.
1087.Pp
1088If the mutex memory cannot be accessed, all waiters are woken up.
1089.Pp
1090If there is more than one waiter on the sleep queue, or there is only
1091one waiter but the mutex is owned by a thread, the
1092.Dv UMUTEX_CONTESTED
1093bit is set in the
1094.Dv m_owner
1095word of the
1096.Vt struct umutex .
1097.It Dv UMTX_OP_SEM2_WAIT
1098Wait until semaphore is available.
1099The arguments to the request are:
1100.Bl -tag -width "obj"
1101.It Fa obj
1102Pointer to the semaphore (of type
1103.Vt struct _usem2 ) .
1104.El
1105.Pp
1106Put the requesting thread onto a sleep queue if the semaphore counter
1107is zero.
1108If the thread is put to sleep, the
1109.Dv USEM_HAS_WAITERS
1110bit is set in the
1111.Dv _count
1112word to indicate waiters.
1113The function returns either due to
1114.Dv _count
1115indicating the semaphore is available (non-zero count due to post),
1116or due to a wakeup.
1117The return does not guarantee that the semaphore is available,
1118nor does it consume the semaphore lock on successful return.
1119.Pp
1120Optionally, a timeout for the request may be specified.
1121.Pp
A request with a non-absolute timeout value is not restartable.
An unblocked signal delivered during such a wait interrupts the sleep
and results in the
.Er EINTR
error.
1127.It Dv UMTX_OP_SEM2_WAKE
1128Wake up waiters on semaphore lock.
1129The arguments to the request are:
1130.Bl -tag -width "obj"
1131.It Fa obj
1132Pointer to the semaphore (of type
1133.Vt struct _usem2 ) .
1134.El
1135.Pp
1136The request wakes up one waiter for the semaphore lock.
1137The function does not increment the semaphore lock count.
1138If the
1139.Dv USEM_HAS_WAITERS
1140bit was set in the
1141.Dv _count
1142word, and the last sleeping thread was woken up, the bit is cleared.
1143.It Dv UMTX_OP_SHM
1144Manage anonymous
1145.Tn POSIX
1146shared memory objects (see
1147.Xr shm_open 2 ) ,
1148which can be attached to a byte of physical memory, mapped into the
1149process address space.
1150The objects are used to implement process-shared locks in
1151.Dv libthr .
1152.Pp
1153The
1154.Fa val
1155argument specifies the sub-request of the
1156.Dv UMTX_OP_SHM
1157request:
1158.Bl -tag -width indent
1159.It Dv UMTX_SHM_CREAT
1160Creates the anonymous shared memory object, which can be looked up
1161with the specified key
.Fa uaddr .
1163If the object associated with the
1164.Fa uaddr
1165key already exists, it is returned instead of creating a new object.
1166The object's size is one page.
1167On success, the file descriptor referencing the object is returned.
1168The descriptor can be used for mapping the object using
1169.Xr mmap 2 ,
1170or for other shared memory operations.
1171.It Dv UMTX_SHM_LOOKUP
Same as the
.Dv UMTX_SHM_CREAT
1174request, but if there is no shared memory object associated with
1175the specified key
1176.Fa uaddr ,
1177an error is returned, and no new object is created.
1178.It Dv UMTX_SHM_DESTROY
Dissociate the shared object from the specified key
.Fa uaddr .
1181The object is destroyed after the last open file descriptor is closed
1182and the last mapping for it is destroyed.
1183.It Dv UMTX_SHM_ALIVE
1184Checks whether there is a live shared object associated with the
1185supplied key
1186.Fa uaddr .
1187Returns zero if there is, and an error otherwise.
1188This request is an optimization of the
1189.Dv UMTX_SHM_LOOKUP
1190request.
1191It is cheaper when only the liveness of the associated object is asked
1192for, since no file descriptor is installed in the process fd table
1193on success.
1194.El
1195.Pp
1196The
1197.Fa uaddr
argument specifies the virtual address whose backing physical memory
1199byte identity is used as a key for the anonymous shared object
1200creation or lookup.
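.Pp
The create-then-map sequence described above could be sketched as follows.
This is an illustration under stated assumptions, not the
.Li libthr
code: the helper name
.Fn sketch_shm_attach
is hypothetical, passing
.Dv NULL
for the unused
.Fa obj
and
.Fa uaddr2
arguments is an assumption, and on systems other than
.Fx
the helper simply fails with
.Er ENOSYS .
.Bd -literal
```c
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/umtx.h>
#endif

/*
 * Create or look up the one-page object keyed by the byte at key and
 * map it.  Returns the mapping, or NULL with errno set.
 */
static void *
sketch_shm_attach(void *key)
{
	if (key == NULL) {
		errno = EINVAL;
		return (NULL);
	}
#ifdef __FreeBSD__
	void *p;
	int fd, saved;

	fd = _umtx_op(NULL, UMTX_OP_SHM, UMTX_SHM_CREAT, key, NULL);
	if (fd == -1)
		return (NULL);
	p = mmap(NULL, (size_t)getpagesize(), PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	saved = errno;
	(void)close(fd);	/* the mapping still references the object */
	errno = saved;
	return (p == MAP_FAILED ? NULL : p);
#else
	errno = ENOSYS;		/* the request exists only on FreeBSD */
	return (NULL);
#endif
}
```
.Ed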
1201.It Dv UMTX_OP_ROBUST_LISTS
1202Register the list heads for the current thread's robust mutex lists.
1203The arguments to the request are:
1204.Bl -tag -width "uaddr"
1205.It Fa val
1206Size of the structure passed in the
1207.Fa uaddr
1208argument.
1209.It Fa uaddr
1210Pointer to the structure of type
1211.Vt struct umtx_robust_lists_params .
1212.El
1213.Pp
1214The structure is defined as
1215.Bd -literal
1216struct umtx_robust_lists_params {
1217	uintptr_t	robust_list_offset;
1218	uintptr_t	robust_priv_list_offset;
1219	uintptr_t	robust_inact_offset;
1220};
1221.Ed
1222.Pp
1223The
1224.Dv robust_list_offset
member contains the address of the first element in the list of locked
1226robust shared mutexes.
1227The
1228.Dv robust_priv_list_offset
member contains the address of the first element in the list of locked
1230robust private mutexes.
1231The private and shared robust locked lists are split to allow fast
1232termination of the shared list on fork, in the child.
1233.Pp
1234The
1235.Dv robust_inact_offset
member contains a pointer to a mutex that might be locked in the near future,
or might have just been unlocked.
It is typically set by the mutex lock or unlock implementation code
around the whole operation, since the lists can only be changed race-free
when the thread owns the mutex.
1241The kernel inspects the
1242.Dv robust_inact_offset
1243in addition to walking the shared and private lists.
1244Also, the mutex pointed to by
1245.Dv robust_inact_offset
is handled more loosely at thread termination time
than the other mutexes on the lists.
1248That mutex is allowed to be not owned by the current thread,
1249in which case list processing is continued.
See the
1251.Sx ROBUST UMUTEXES
1252subsection for details.
1253.El
1254.Sh RETURN VALUES
1255If successful,
1256all requests, except
1257.Dv UMTX_SHM_CREAT
1258and
1259.Dv UMTX_SHM_LOOKUP
1260sub-requests of the
1261.Dv UMTX_OP_SHM
1262request, will return zero.
1263The
1264.Dv UMTX_SHM_CREAT
1265and
1266.Dv UMTX_SHM_LOOKUP
1267return a shared memory file descriptor on success.
1268On error \-1 is returned, and the
1269.Va errno
1270variable is set to indicate the error.
1271.Sh ERRORS
1272The
1273.Fn _umtx_op
1274operations can fail with the following errors:
1275.Bl -tag -width "[ETIMEDOUT]"
1276.It Bq Er EFAULT
One of the arguments points to invalid memory.
1278.It Bq Er EINVAL
1279The clock identifier, specified for the
1280.Vt struct _umtx_time
1281timeout parameter, or in the
1282.Dv c_clockid
1283member of
.Vt struct ucond ,
1285is invalid.
1286.It Bq Er EINVAL
1287The type of the mutex, encoded by the
1288.Dv m_flags
1289member of
1290.Vt struct umutex ,
1291is invalid.
1292.It Bq Er EINVAL
1293The
1294.Dv m_owner
1295member of the
1296.Vt struct umutex
1297has changed the lock owner thread identifier during unlock.
1298.It Bq Er EINVAL
1299The
1300.Dv timeout.tv_sec
1301or
1302.Dv timeout.tv_nsec
1303member of
1304.Vt struct _umtx_time
1305is less than zero, or
1306.Dv timeout.tv_nsec
is greater than or equal to 1000000000.
1308.It Bq Er EINVAL
1309The
1310.Fa op
argument specifies an invalid operation.
1312.It Bq Er EINVAL
1313The
.Fa val
argument for the
.Dv UMTX_OP_SHM
request specifies an invalid sub-request.
1318.It Bq Er EINVAL
1319The
1320.Dv UMTX_OP_SET_CEILING
request specifies a mutex that is not priority protected.
1322.It Bq Er EINVAL
1323The new ceiling value for the
1324.Dv UMTX_OP_SET_CEILING
1325request, or one or more of the values read from the
1326.Dv m_ceilings
1327array during lock or unlock operations, is greater than
1328.Dv RTP_PRIO_MAX .
1329.It Bq Er EPERM
1330Unlock attempted on an object not owned by the current thread.
1331.It Bq Er EOWNERDEAD
The lock was requested on a umutex where the
.Dv m_owner
field was set to the
.Dv UMUTEX_RB_OWNERDEAD
value, indicating a robust mutex whose owner terminated.
1337The lock was granted to the caller, so this error in fact
1338indicates success with additional conditions.
1339.It Bq Er ENOTRECOVERABLE
The lock was requested on a umutex whose
.Dv m_owner
field is equal to the
.Dv UMUTEX_RB_NOTRECOV
value, indicating a robust mutex that was abandoned after its owner terminated.
1345The lock was not granted to the caller.
1346.It Bq Er ENOTTY
1347The shared memory object, associated with the address passed to the
1348.Dv UMTX_SHM_ALIVE
1349sub-request of
1350.Dv UMTX_OP_SHM
1351request, was destroyed.
1352.It Bq Er ESRCH
1353For the
1354.Dv UMTX_SHM_LOOKUP ,
1355.Dv UMTX_SHM_DESTROY ,
1356and
1357.Dv UMTX_SHM_ALIVE
1358sub-requests of the
1359.Dv UMTX_OP_SHM
1360request, there is no shared memory object associated with the provided key.
1361.It Bq Er ENOMEM
1362The
1363.Dv UMTX_SHM_CREAT
1364sub-request of the
1365.Dv UMTX_OP_SHM
1366request cannot be satisfied, because allocation of the shared memory object
1367would exceed the
1368.Dv RLIMIT_UMTXP
1369resource limit, see
1370.Xr setrlimit 2 .
1371.It Bq Er EAGAIN
1372The maximum number of readers
1373.Dv ( URWLOCK_MAX_READERS )
1374were already granted ownership of the given
.Vt struct urwlock
1376for read.
1377.It Bq Er EBUSY
1378A try mutex lock operation was not able to obtain the lock.
1379.It Bq Er ETIMEDOUT
1380The request specified a timeout in the
1381.Fa uaddr
1382and
1383.Fa uaddr2
1384arguments, and timed out before obtaining the lock or being woken up.
1385.It Bq Er EINTR
1386A signal was delivered during wait, for a non-restartable operation.
1387Operations with timeouts are typically non-restartable, but timeouts
1388specified in absolute time may be restartable.
1389.It Bq Er ERESTART
1390A signal was delivered during wait, for a restartable operation.
1391Mutex lock requests without timeout specified are restartable.
1392The error is not returned to userspace code since restart
is handled by the usual adjustment of the instruction counter.
1394.El
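.Pp
Several of the errors above call for different reactions from a caller of
a mutex lock request:
.Er EOWNERDEAD
reports a granted lock, while
.Er EINTR
reports an interrupted wait.
The following sketch shows one possible way to classify the result; the
enumeration and the
.Fn sketch_classify_lock
helper are hypothetical names introduced for illustration and are not part
of any library interface.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome classes for a mutex lock request. */
enum sketch_outcome {
	SKETCH_LOCKED,			/* lock granted */
	SKETCH_LOCKED_OWNERDEAD,	/* lock granted, state needs recovery */
	SKETCH_RETRY,			/* wait was interrupted by a signal */
	SKETCH_FAILED			/* lock not granted */
};

/*
 * Classify the return value rv and the errno value err of a lock request.
 * A caller that retries a relative timed wait would also have to
 * recompute the remaining timeout; that is outside this sketch.
 */
static enum sketch_outcome
sketch_classify_lock(int rv, int err)
{
	assert(rv == 0 || rv == -1);
	if (rv == 0)
		return (SKETCH_LOCKED);
	switch (err) {
	case EOWNERDEAD:
		return (SKETCH_LOCKED_OWNERDEAD);
	case EINTR:
		return (SKETCH_RETRY);
	default:
		/* Includes ENOTRECOVERABLE, EBUSY and ETIMEDOUT. */
		return (SKETCH_FAILED);
	}
}
```
.Ed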
1395.Sh SEE ALSO
1396.Xr clock_gettime 2 ,
1397.Xr mmap 2 ,
1398.Xr setrlimit 2 ,
1399.Xr shm_open 2 ,
1400.Xr sigaction 2 ,
1401.Xr thr_exit 2 ,
1402.Xr thr_kill 2 ,
1403.Xr thr_kill2 2 ,
1404.Xr thr_new 2 ,
1405.Xr thr_self 2 ,
1406.Xr thr_set_name 2 ,
1407.Xr signal 3
1408.Sh STANDARDS
1409The
1410.Fn _umtx_op
1411system call is non-standard and is used by the
1412.Lb libthr
1413to implement
1414.St -p1003.1-2001
1415.Xr pthread 3
1416functionality.
1417.Sh BUGS
A window between unlocking a robust mutex and resetting the pointer in the
1419.Dv robust_inact_offset
1420member of the registered
1421.Vt struct umtx_robust_lists_params
1422allows another thread to destroy the mutex, thus making the kernel inspect
1423freed or reused memory.
1424The
1425.Li libthr
1426implementation is only vulnerable to this race when operating on
1427a shared mutex.
1428A possible fix for the current implementation is to strengthen the checks
1429for shared mutexes before terminating them, in particular, verifying
1430that the mutex memory is mapped from a shared memory object allocated
1431by the
1432.Dv UMTX_OP_SHM
1433request.
1434This is not done because it is believed that the race is adequately
1435covered by other consistency checks, while adding the check would
1436prevent alternative implementations of
1437.Li libpthread .