.\" Copyright (c) 2016 The FreeBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This documentation was written by
.\" Konstantin Belousov <[email protected]> under sponsorship
.\" from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd November 23, 2020
.Dt _UMTX_OP 2
.Os
.Sh NAME
.Nm _umtx_op
.Nd interface for implementation of userspace threading synchronization primitives
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/umtx.h
.Ft int
.Fn _umtx_op "void *obj" "int op" "u_long val" "void *uaddr" "void *uaddr2"
.Sh DESCRIPTION
The
.Fn _umtx_op
system call provides kernel support for userspace implementation of
the threading synchronization primitives.
The
.Lb libthr
uses the syscall to implement
.St -p1003.1-2001
pthread locks, like mutexes, condition variables and so on.
.Ss STRUCTURES
The operations, performed by the
.Fn _umtx_op
syscall, operate on userspace objects which are described
by the following structures.
Reserved fields and paddings are omitted.
All objects require ABI-mandated alignment, but this is not currently
enforced consistently on all architectures.
.Pp
The following flags are defined for flag fields of all structures:
.Bl -tag -width indent
.It Dv USYNC_PROCESS_SHARED
Allow selection of the process-shared sleep queue for the thread sleep
container, when the lock ownership cannot be granted immediately,
and the operation must sleep.
The process-shared or process-private sleep queue is selected based on
the attributes of the memory mapping which contains the first byte of
the structure, see
.Xr mmap 2 .
Otherwise, if the flag is not specified, the process-private sleep queue
is selected regardless of the memory mapping attributes, as an optimization.
.Pp
See the
.Sx SLEEP QUEUES
subsection below for more details on sleep queues.
.El
.Bl -hang -offset indent
.It Sy Mutex
.Bd -literal
struct umutex {
	volatile lwpid_t m_owner;
	uint32_t         m_flags;
	uint32_t         m_ceilings[2];
	uintptr_t        m_rb_lnk;
};
.Ed
.Pp
The
.Dv m_owner
field is the actual lock.
It contains either the thread identifier of the lock owner in the
locked state, or zero when the lock is unowned.
The highest bit, when set, indicates that there is contention on the lock.
The following constants are defined for special values:
.Bl -tag -width indent
.It Dv UMUTEX_UNOWNED
Zero, the value stored in the unowned lock.
.It Dv UMUTEX_CONTESTED
The contention indicator.
.It Dv UMUTEX_RB_OWNERDEAD
A thread owning the robust mutex terminated.
The mutex is in the unlocked state.
.It Dv UMUTEX_RB_NOTRECOV
The robust mutex is in a non-recoverable state.
It cannot be locked until reinitialized.
.El
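.Pp
As a minimal sketch of the encoding described above, the following C helpers
classify an
.Dv m_owner
word.
The numeric values are assumptions that mirror
.In sys/umtx.h
and are repeated only to keep the example self-contained; the enum and
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; on FreeBSD take them from <sys/umtx.h> instead. */
#define UMUTEX_UNOWNED      0x0u
#define UMUTEX_CONTESTED    0x80000000u
#define UMUTEX_RB_OWNERDEAD (UMUTEX_CONTESTED | 0x10u)
#define UMUTEX_RB_NOTRECOV  (UMUTEX_CONTESTED | 0x11u)

/* Hypothetical classification of an m_owner word. */
enum umutex_state { UMS_UNOWNED, UMS_OWNED, UMS_OWNERDEAD, UMS_NOTRECOV };

/* True when the highest bit, the contention indicator, is set. */
static inline bool umutex_is_contested(uint32_t m_owner)
{
	return (m_owner & UMUTEX_CONTESTED) != 0;
}

/* Thread identifier stored in the word, with the contention bit masked off. */
static inline uint32_t umutex_owner_tid(uint32_t m_owner)
{
	return m_owner & ~UMUTEX_CONTESTED;
}

static inline enum umutex_state umutex_classify(uint32_t m_owner)
{
	/* The robust markers are whole-word values, so test them first. */
	if (m_owner == UMUTEX_RB_OWNERDEAD)
		return UMS_OWNERDEAD;
	if (m_owner == UMUTEX_RB_NOTRECOV)
		return UMS_NOTRECOV;
	return umutex_owner_tid(m_owner) == UMUTEX_UNOWNED ?
	    UMS_UNOWNED : UMS_OWNED;
}
```

Note that a word equal to the bare contention indicator classifies as
unowned: the lock is free, but sleepers are still queued on it.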
.Pp
The
.Dv m_flags
field may contain the following umutex-specific flags, in addition to
the common flags:
.Bl -tag -width indent
.It Dv UMUTEX_PRIO_INHERIT
Mutex implements the
.Em Priority Inheritance
protocol.
.It Dv UMUTEX_PRIO_PROTECT
Mutex implements the
.Em Priority Protection
protocol.
.It Dv UMUTEX_ROBUST
Mutex is robust, as described in the
.Sx ROBUST UMUTEXES
section below.
.It Dv UMUTEX_NONCONSISTENT
Robust mutex is in a transient non-consistent state.
Not used by the kernel.
.El
.Pp
In this manual page, mutexes that have neither the
.Dv UMUTEX_PRIO_INHERIT
nor the
.Dv UMUTEX_PRIO_PROTECT
flag set are called normal mutexes.
Each type of mutex
.Pq normal, priority-inherited, and priority-protected
has a separate sleep queue associated
with the given key.
.Pp
For priority protected mutexes, the
.Dv m_ceilings
array contains priority ceiling values.
The
.Dv m_ceilings[0]
is the ceiling value for the mutex, as specified by
.St -p1003.1-2008
for the
.Em Priority Protected
mutex protocol.
The
.Dv m_ceilings[1]
is used only for the unlock of a priority protected mutex, when
unlock is done in an order other than the reversed lock order.
In this case,
.Dv m_ceilings[1]
must contain the ceiling value for the last locked priority protected
mutex, for proper priority reassignment.
If, instead, the mutex being unlocked was the last priority propagated
mutex locked by the thread,
.Dv m_ceilings[1]
should contain \-1.
This is required because the kernel does not maintain the ordered lock list.
.It Sy Condition variable
.Bd -literal
struct ucond {
	volatile uint32_t c_has_waiters;
	uint32_t          c_flags;
	uint32_t          c_clockid;
};
.Ed
.Pp
A non-zero
.Dv c_has_waiters
value indicates that there are in-kernel waiters for the condition,
executing the
.Dv UMTX_OP_CV_WAIT
request.
.Pp
The
.Dv c_flags
field contains flags.
Only the common flags
.Pq Dv USYNC_PROCESS_SHARED
are defined for ucond.
.Pp
The
.Dv c_clockid
member provides the clock identifier to use for timeout, when the
.Dv UMTX_OP_CV_WAIT
request has both the
.Dv CVWAIT_CLOCKID
flag and the timeout specified.
Valid clock identifiers are a subset of those for
.Xr clock_gettime 2 :
.Bl -bullet -compact
.It
.Dv CLOCK_MONOTONIC
.It
.Dv CLOCK_MONOTONIC_FAST
.It
.Dv CLOCK_MONOTONIC_PRECISE
.It
.Dv CLOCK_PROF
.It
.Dv CLOCK_REALTIME
.It
.Dv CLOCK_REALTIME_FAST
.It
.Dv CLOCK_REALTIME_PRECISE
.It
.Dv CLOCK_SECOND
.It
.Dv CLOCK_UPTIME
.It
.Dv CLOCK_UPTIME_FAST
.It
.Dv CLOCK_UPTIME_PRECISE
.It
.Dv CLOCK_VIRTUAL
.El
.It Sy Reader/writer lock
.Bd -literal
struct urwlock {
	volatile int32_t rw_state;
	uint32_t         rw_flags;
	uint32_t         rw_blocked_readers;
	uint32_t         rw_blocked_writers;
};
.Ed
.Pp
The
.Dv rw_state
field is the actual lock.
It contains both the flags and the counter of the read locks which were
granted.
The
.Dv rw_state
bits are named as follows:
.Bl -tag -width indent
.It Dv URWLOCK_WRITE_OWNER
Write lock was granted.
.It Dv URWLOCK_WRITE_WAITERS
There are write lock waiters.
.It Dv URWLOCK_READ_WAITERS
There are read lock waiters.
.It Dv URWLOCK_READER_COUNT(c)
Returns the count of currently granted read locks.
.El
.Pp
At any given time, either the write lock is granted to exactly one thread
on the
.Vt struct urwlock
and no thread is granted the read lock,
or up to
.Dv URWLOCK_MAX_READERS
threads are granted the read lock simultaneously and the write lock is
not granted to any thread.
.Pp
The following flags for the
.Dv rw_flags
member of
.Vt struct urwlock
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
If specified, immediately grant read lock requests when
.Dv urwlock
is already read-locked, even in the presence of unsatisfied write
lock requests.
By default, if there is a write lock waiter, further read requests are
not granted, to prevent unfair write lock waiter starvation.
.El
.Pp
The
.Dv rw_blocked_readers
and
.Dv rw_blocked_writers
members contain the count of threads which are sleeping in the kernel,
waiting for the associated request type to be granted.
The fields are used by the kernel to update the
.Dv URWLOCK_READ_WAITERS
and
.Dv URWLOCK_WRITE_WAITERS
flags of the
.Dv rw_state
lock after the requesting thread is woken up.
.It Sy Semaphore
.Bd -literal
struct _usem2 {
	volatile uint32_t _count;
	uint32_t          _flags;
};
.Ed
.Pp
The
.Dv _count
word represents a counting semaphore.
A non-zero value indicates an unlocked (posted) semaphore, while zero
represents the locked state.
The maximal supported semaphore count is
.Dv USEM_MAX_COUNT .
.Pp
The
.Dv _count
word, besides the counter of posts (unlocks), also contains the
.Dv USEM_HAS_WAITERS
bit, which indicates that the locked semaphore has waiting threads.
.Pp
The
.Dv USEM_COUNT()
macro, applied to the
.Dv _count
word, returns the current semaphore counter, which is the number of posts
issued on the semaphore.
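.Pp
The layout of the
.Dv _count
word suggests how a userspace library might handle the uncontended cases
without entering the kernel.
The sketch below is one possible arrangement, not necessarily what
.Lb libthr
does: the constant values are assumptions mirroring
.In sys/umtx.h ,
and
.Fn usem_trywait
and
.Fn usem_post
are hypothetical helper names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; on FreeBSD take them from <sys/umtx.h> instead. */
#define USEM_HAS_WAITERS 0x80000000u
#define USEM_MAX_COUNT   0x7fffffffu
#define USEM_COUNT(c)    ((c) & USEM_MAX_COUNT)

/* Consume one post without sleeping; returns false if the count is zero. */
static inline bool usem_trywait(_Atomic uint32_t *count)
{
	uint32_t old = atomic_load_explicit(count, memory_order_relaxed);

	while (USEM_COUNT(old) > 0) {
		/* old - 1 only touches the counter; USEM_HAS_WAITERS is kept. */
		if (atomic_compare_exchange_weak_explicit(count, &old, old - 1,
		    memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}

/*
 * Add one post.  Returns -1 if the counter is already USEM_MAX_COUNT,
 * 1 if USEM_HAS_WAITERS was set (the caller would then issue
 * UMTX_OP_SEM2_WAKE), and 0 otherwise.
 */
static inline int usem_post(_Atomic uint32_t *count)
{
	uint32_t old = atomic_load_explicit(count, memory_order_relaxed);

	do {
		if (USEM_COUNT(old) == USEM_MAX_COUNT)
			return -1;
	} while (!atomic_compare_exchange_weak_explicit(count, &old, old + 1,
	    memory_order_release, memory_order_relaxed));
	return (old & USEM_HAS_WAITERS) != 0 ? 1 : 0;
}
```

When
.Fn usem_trywait
fails, a blocking wait would fall back to the
.Dv UMTX_OP_SEM2_WAIT
request and retry after it returns.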
.Pp
The following bits for the
.Dv _flags
member of
.Vt struct _usem2
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv USEM_NAMED
The flag is ignored by the kernel.
.El
.It Sy Timeout parameter
.Bd -literal
struct _umtx_time {
	struct timespec _timeout;
	uint32_t        _flags;
	uint32_t        _clockid;
};
.Ed
.Pp
Several
.Fn _umtx_op
operations allow the blocking time to be limited, failing the request
if it cannot be satisfied in the specified time period.
The timeout is specified by passing either the address of
.Vt struct timespec ,
or its extended variant,
.Vt struct _umtx_time ,
as the
.Fa uaddr2
argument of
.Fn _umtx_op .
They are distinguished by the
.Fa uaddr
value, which must be equal to the size of the structure pointed to by
.Fa uaddr2 ,
cast to
.Vt uintptr_t .
.Pp
The
.Dv _timeout
member specifies the time when the timeout should occur.
Legal values for clock identifier
.Dv _clockid
are shared with the
.Fa clock_id
argument to the
.Xr clock_gettime 2
function,
and use the same underlying clocks.
The specified clock is used to obtain the current time value.
Interval counting is always performed by the monotonic wall clock.
.Pp
The
.Dv _flags
argument allows the following flags to further define the timeout behaviour:
.Bl -tag -width indent
.It Dv UMTX_ABSTIME
The
.Dv _timeout
value is the absolute time.
The thread will be unblocked and the request failed when the specified
clock value is equal to or exceeds the
.Dv _timeout .
.Pp
If the flag is absent, the timeout value is relative, that is, the amount
of time, measured by the monotonic wall clock from the moment of the request
start.
.El
.El
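.Pp
To illustrate how the timeout arguments fit together, the sketch below builds
an absolute deadline from a relative interval and computes the size value
that would be passed in
.Fa uaddr .
It uses a local mirror of
.Vt struct _umtx_time
so that it compiles without
.In sys/umtx.h ;
the mirror structure, the helper names, and the value of
.Dv UMTX_ABSTIME
are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define UMTX_ABSTIME 0x01u	/* assumed value; see <sys/umtx.h> */

/* Local mirror of struct _umtx_time (reserved fields omitted). */
struct umtx_time_model {
	struct timespec timeout;
	uint32_t        flags;
	uint32_t        clockid;
};

/* Add two normalized timespec values, carrying nanoseconds into seconds. */
static inline struct timespec ts_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	assert(a.tv_nsec >= 0 && a.tv_nsec < 1000000000L);
	assert(b.tv_nsec >= 0 && b.tv_nsec < 1000000000L);
	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= 1000000000L) {
		r.tv_sec++;
		r.tv_nsec -= 1000000000L;
	}
	return r;
}

/* Turn "rel from now" into an absolute timeout on the given clock. */
static inline struct umtx_time_model
umtx_deadline(struct timespec now, struct timespec rel, uint32_t clockid)
{
	struct umtx_time_model t;

	t.timeout = ts_add(now, rel);
	t.flags = UMTX_ABSTIME;
	t.clockid = clockid;
	return t;
}

/* The uaddr argument carries the size of the object uaddr2 points to. */
static inline void *umtx_timeout_size_arg(void)
{
	return (void *)(uintptr_t)sizeof(struct umtx_time_model);
}
```

An absolute deadline has the practical advantage that a wait interrupted by a
signal can be reissued with the same structure without recomputing the
remaining interval.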
.Ss SLEEP QUEUES
When a locking request cannot be immediately satisfied, the thread is
typically put to
.Em sleep ,
which is a non-runnable state terminated by the
.Em wake
operation.
Lock operations include a
.Em try
variant which returns an error rather than sleeping if the lock cannot
be obtained.
Also,
.Fn _umtx_op
provides requests which explicitly put the thread to sleep.
.Pp
Wakes need to know which threads to make runnable, so sleeping threads
are grouped into containers called
.Em sleep queues .
A sleep queue is identified by a key, which for
.Fn _umtx_op
is defined as the physical address of some variable.
Note that the
.Em physical
address is used, which means that the same variable mapped multiple
times will give one key value.
This mechanism enables the construction of
.Em process-shared
locks.
.Pp
A related attribute of the key is shareability.
Some requests always interpret keys as private for the current process,
creating sleep queues with the scope of the current process even if
the memory is shared.
Others either select the shareability automatically from the
mapping attributes, or take additional input as the
.Dv USYNC_PROCESS_SHARED
common flag.
This is done as an optimization, allowing the lock scope to be limited
regardless of the kind of backing memory.
.Pp
Only the address of the start byte of the variable specified as key is
important for determining the corresponding sleep queue.
The size of the variable does not matter, so, for example, sleeps on the same
address interpreted as
.Vt uint32_t
and
.Vt long
on a little-endian 64-bit platform would collide.
.Pp
The last attribute of the key is the object type.
The sleep queue to which a sleeping thread is assigned is an individual
one for simple wait requests, mutexes, rwlocks, condvars and other
primitives, even when the physical address of the key is the same.
.Pp
When waking up a limited number of threads from a given sleep queue,
the highest priority threads that have been blocked for the longest on
the queue are selected.
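.Pp
The three key attributes described above can be pictured with a small model.
This is an illustration of the text only, not the kernel's data structure;
the type and field names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object types; each one selects its own family of queues. */
enum sq_type { SQ_WAIT, SQ_MUTEX, SQ_RWLOCK, SQ_CONDVAR };

/* Model of a sleep queue key: start address, scope, and object type. */
struct sq_key {
	uintptr_t    addr;	/* address of the first byte of the variable */
	bool         shared;	/* process-shared or process-private queue */
	enum sq_type type;
};

/* Two sleepers share a queue only if every attribute of the key matches. */
static inline bool sq_key_equal(struct sq_key a, struct sq_key b)
{
	return a.addr == b.addr && a.shared == b.shared && a.type == b.type;
}
```

The model has no size field, which is why waits on the same address through a
.Vt uint32_t
and a
.Vt long
land on the same queue, while a mutex and a condition variable at the same
address do not.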
.Ss ROBUST UMUTEXES
The
.Em robust umutexes
are provided as a substrate for a userspace library to implement
.Tn POSIX
robust mutexes.
A robust umutex must have the
.Dv UMUTEX_ROBUST
flag set.
.Pp
On thread termination, the kernel walks two lists of mutexes.
The head addresses of the two lists must be provided by a prior
.Dv UMTX_OP_ROBUST_LISTS
request.
The lists are singly-linked.
The link to the next element is provided by the
.Dv m_rb_lnk
member of the
.Vt struct umutex .
.Pp
Robust list processing is aborted if the kernel finds a mutex
with any of the following conditions:
.Bl -dash -offset indent -compact
.It
the
.Dv UMUTEX_ROBUST
flag is not set
.It
not owned by the current thread, except when the mutex is pointed to
by the
.Dv robust_inactive
member of the
.Vt struct umtx_robust_lists_params ,
registered for the current thread
.It
the combination of mutex flags is invalid
.It
read of the umutex memory faults
.It
the list length limit described in
.Xr libthr 3
is reached.
.El
.Pp
Every mutex in both lists is unlocked as if the
.Dv UMTX_OP_MUTEX_UNLOCK
request is performed on it, but instead of the
.Dv UMUTEX_UNOWNED
value, the
.Dv m_owner
field is written with the
.Dv UMUTEX_RB_OWNERDEAD
value.
When a mutex in the
.Dv UMUTEX_RB_OWNERDEAD
state is locked by the kernel in response to a
.Dv UMTX_OP_MUTEX_TRYLOCK
or
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is granted and the
.Er EOWNERDEAD
error is returned.
.Pp
Also, the kernel handles the
.Dv UMUTEX_RB_NOTRECOV
value of the
.Dv m_owner
field specially, always returning the
.Er ENOTRECOVERABLE
error for lock attempts, without granting the lock.
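.Pp
The list walk can be modelled in userspace as follows.
This is a simplified simulation of the description above, not kernel code:
it uses a pointer for the link instead of
.Vt uintptr_t ,
ignores the
.Dv robust_inactive
exception, flag validation, and memory faults, and takes the length limit as
a parameter.
The structure name, the function name, and the constant values are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; on FreeBSD take them from <sys/umtx.h> instead. */
#define UMUTEX_ROBUST       0x0010u
#define UMUTEX_CONTESTED    0x80000000u
#define UMUTEX_RB_OWNERDEAD (UMUTEX_CONTESTED | 0x10u)

/* Simplified stand-in for struct umutex. */
struct umutex_model {
	uint32_t             m_owner;
	uint32_t             m_flags;
	struct umutex_model *m_rb_lnk;	/* next mutex on the robust list */
};

/*
 * Walk one robust list on behalf of the terminating thread tid and mark
 * each mutex owner-dead.  Returns the number of mutexes processed; the
 * walk stops early on the abort conditions that the model covers.
 */
static inline int
robust_walk(struct umutex_model *head, uint32_t tid, int limit)
{
	int n = 0;

	for (struct umutex_model *m = head; m != NULL; m = m->m_rb_lnk) {
		if (n >= limit)
			break;		/* list length limit reached */
		if ((m->m_flags & UMUTEX_ROBUST) == 0)
			break;		/* not a robust mutex */
		if ((m->m_owner & ~UMUTEX_CONTESTED) != tid)
			break;		/* not owned by this thread */
		m->m_owner = UMUTEX_RB_OWNERDEAD;
		n++;
	}
	return n;
}
```

Because processing is aborted rather than skipped, a single malformed entry
leaves every mutex after it on the list untouched.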
.Ss OPERATIONS
The following operations, requested by the
.Fa op
argument to the function, are implemented:
.Bl -tag -width indent
.It Dv UMTX_OP_WAIT
Wait.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable of type
.Vt long .
.It Fa val
Current value of the
.Dv *obj .
.El
.Pp
The current value of the variable pointed to by the
.Fa obj
argument is compared with the
.Fa val .
If they are equal, the requesting thread is put to interruptible sleep
until woken up or the optionally specified timeout expires.
.Pp
The comparison and sleep are atomic.
In other words, if another thread writes a new value to
.Dv *obj
and then issues
.Dv UMTX_OP_WAKE ,
the request is guaranteed to not miss the wakeup,
which might otherwise happen between comparison and blocking.
.Pp
The physical address of memory where the
.Dv *obj
variable is located is used as a key to index sleeping threads.
.Pp
The read of the current value of the
.Dv *obj
variable is not guarded by barriers.
In particular, it is the user's duty to ensure the lock acquire
and release memory semantics, if the
.Dv UMTX_OP_WAIT
and
.Dv UMTX_OP_WAKE
requests are used as a substrate for implementing a simple lock.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.Pp
Optionally, a timeout for the request may be specified.
.It Dv UMTX_OP_WAKE
Wake the threads possibly sleeping due to
.Dv UMTX_OP_WAIT .
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable, used as a key to find sleeping threads.
.It Fa val
Up to
.Fa val
threads are woken up by this request.
Specify
.Dv INT_MAX
to wake up all waiters.
.El
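.Pp
As an example of using the wait and wake requests as a substrate for a simple
lock, the sketch below implements a three-state lock word with C11 atomics
supplying the acquire and release semantics that the requests themselves do
not provide.
It uses the
.Dv UMTX_OP_WAIT_UINT_PRIVATE
and
.Dv UMTX_OP_WAKE_PRIVATE
variants described later in this section.
The
.Vt wlock_t
type and the
.Fn wl_*
names are hypothetical, and on systems other than
.Fx
the example falls back to yielding the processor so that it still compiles.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/umtx.h>
#endif

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
typedef struct { _Atomic uint32_t word; } wlock_t;

/* Sleep while the word still holds the expected value. */
static inline void wl_wait(wlock_t *l, uint32_t expected)
{
#ifdef __FreeBSD__
	_umtx_op(&l->word, UMTX_OP_WAIT_UINT_PRIVATE, expected, NULL, NULL);
#else
	(void)l; (void)expected;
	sched_yield();		/* portable stand-in for the kernel sleep */
#endif
}

/* Wake at most one thread sleeping on the word. */
static inline void wl_wake_one(wlock_t *l)
{
#ifdef __FreeBSD__
	_umtx_op(&l->word, UMTX_OP_WAKE_PRIVATE, 1, NULL, NULL);
#else
	(void)l;
#endif
}

static inline void wl_lock(wlock_t *l)
{
	uint32_t c = 0;

	/* Fast path: 0 -> 1 with acquire ordering. */
	if (atomic_compare_exchange_strong_explicit(&l->word, &c, 1,
	    memory_order_acquire, memory_order_relaxed))
		return;
	/* Contended: advertise a waiter, then sleep while the word stays 2. */
	if (c != 2)
		c = atomic_exchange_explicit(&l->word, 2, memory_order_acquire);
	while (c != 0) {
		wl_wait(l, 2);	/* may also return on EINTR; just retry */
		c = atomic_exchange_explicit(&l->word, 2, memory_order_acquire);
	}
}

static inline void wl_unlock(wlock_t *l)
{
	/* Release the lock; enter the kernel only if a waiter was advertised. */
	if (atomic_exchange_explicit(&l->word, 0, memory_order_release) == 2)
		wl_wake_one(l);
}
```

The loop in
.Fn wl_lock
re-examines the word after every return from the wait, so interrupted or
spurious returns are harmless.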
.It Dv UMTX_OP_MUTEX_TRYLOCK
Try to lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Operates the same as the
.Dv UMTX_OP_MUTEX_LOCK
request, but returns
.Er EBUSY
instead of sleeping if the lock cannot be obtained immediately.
.It Dv UMTX_OP_MUTEX_LOCK
Lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Locking is performed by writing the current thread id into the
.Dv m_owner
word of the
.Vt struct umutex .
The write is atomic, preserves the
.Dv UMUTEX_CONTESTED
contention indicator, and provides the acquire barrier for
lock entrance semantics.
.Pp
If the lock cannot be obtained immediately because another thread owns
the lock, the
.Dv UMUTEX_CONTESTED
bit is set and the current thread is put to sleep.
Upon wake up, the lock conditions are re-tested.
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without a timeout specified is always restarted after return
from a signal handler.
.It Dv UMTX_OP_MUTEX_UNLOCK
Unlock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Unlocks the mutex by writing the
.Dv UMUTEX_UNOWNED
(zero) value into the
.Dv m_owner
word of the
.Vt struct umutex .
The write is done with a release barrier, to provide lock leave semantics.
.Pp
If there are threads sleeping in the sleep queue associated with the
umutex, one thread is woken up.
If more than one thread sleeps in the sleep queue, the
.Dv UMUTEX_CONTESTED
bit is set together with the write of the
.Dv UMUTEX_UNOWNED
value into
.Dv m_owner .
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
See the description of the
.Dv m_ceilings
member of the
.Vt struct umutex
structure for additional details of the request operation on the
priority protected protocol mutex.
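.Pp
The transitions of the
.Dv m_owner
word described for the lock and unlock requests can be modelled with C11
atomics as below.
This is a sketch of the word updates for a normal mutex only: it does not
sleep, does not handle the robust states or the priority protocols, and takes
the number of sleepers as a parameter because only the kernel knows it.
The function names and constant values are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; on FreeBSD take them from <sys/umtx.h> instead. */
#define UMUTEX_UNOWNED   0x0u
#define UMUTEX_CONTESTED 0x80000000u

/*
 * One lock attempt: write tid into the word if it is unowned, keeping the
 * contention indicator.  Returns false if another thread owns the lock,
 * which is where TRYLOCK reports EBUSY and LOCK goes to sleep.
 */
static inline bool umutex_try_acquire(_Atomic uint32_t *m_owner, uint32_t tid)
{
	uint32_t old = atomic_load_explicit(m_owner, memory_order_relaxed);

	for (;;) {
		if ((old & ~UMUTEX_CONTESTED) != UMUTEX_UNOWNED)
			return false;
		uint32_t want = tid | (old & UMUTEX_CONTESTED);
		if (atomic_compare_exchange_weak_explicit(m_owner, &old, want,
		    memory_order_acquire, memory_order_relaxed))
			return true;
	}
}

/*
 * The unlock write: store UMUTEX_UNOWNED with release ordering, keeping
 * UMUTEX_CONTESTED when more than one thread sleeps on the queue.
 */
static inline void umutex_release(_Atomic uint32_t *m_owner, unsigned sleepers)
{
	uint32_t v = UMUTEX_UNOWNED;

	if (sleepers > 1)
		v |= UMUTEX_CONTESTED;
	atomic_store_explicit(m_owner, v, memory_order_release);
}
```

Keeping the contention indicator across both transitions is what tells the
next owner that its own unlock must enter the kernel to wake a sleeper.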
.It Dv UMTX_OP_SET_CEILING
Set ceiling for the priority protected umutex.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa obj
Pointer to the umutex.
.It Fa val
New ceiling value.
.It Fa uaddr
Address of a variable of type
.Vt uint32_t .
If not
.Dv NULL
and the update was successful, the previous ceiling value is
written to the location pointed to by
.Fa uaddr .
.El
.Pp
The request locks the umutex pointed to by the
.Fa obj
parameter, waiting for the lock if not immediately available.
After the lock is obtained, the new ceiling value
.Fa val
is written to the
.Dv m_ceilings[0]
member of the
.Vt struct umutex ,
after which the umutex is unlocked.
.Pp
The locking does not adhere to the priority protect protocol,
to conform to the
.Tn POSIX
requirements for the
.Xr pthread_mutex_setprioceiling 3
interface.
.It Dv UMTX_OP_CV_WAIT
Wait for a condition.
The arguments to the request are:
.Bl -tag -width "uaddr2"
.It Fa obj
Pointer to the
.Vt struct ucond .
.It Fa val
Request flags, see below.
.It Fa uaddr
Pointer to the umutex.
.It Fa uaddr2
Optional pointer to a
.Vt struct timespec
for timeout specification.
.El
.Pp
The request must be issued by the thread owning the mutex pointed to
by the
.Fa uaddr
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument, is set to an arbitrary non-zero value, after which the
.Fa uaddr
mutex is unlocked (following the appropriate protocol), and
the current thread is put to sleep on the sleep queue keyed by
the
.Fa obj
argument.
The operations are performed atomically.
It is guaranteed to not miss a wakeup from
.Dv UMTX_OP_CV_SIGNAL
or
.Dv UMTX_OP_CV_BROADCAST
sent between mutex unlock and putting the current thread on the sleep queue.
.Pp
Upon wakeup, if the timeout expired and no other threads are sleeping in
the same sleep queue, the
.Dv c_has_waiters
member is cleared.
After wakeup, the
.Fa uaddr
umutex is not relocked.
.Pp
The following flags are defined:
.Bl -tag -width "CVWAIT_CLOCKID"
.It Dv CVWAIT_ABSTIME
Timeout is absolute.
.It Dv CVWAIT_CLOCKID
Clockid is provided.
.El
.Pp
Optionally, a timeout for the request may be specified.
Unlike other requests, the timeout value is specified directly by a
.Vt struct timespec ,
pointed to by the
.Fa uaddr2
argument.
If the
.Dv CVWAIT_CLOCKID
flag is provided, the timeout uses the clock from the
.Dv c_clockid
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument.
Otherwise,
.Dv CLOCK_REALTIME
is used, regardless of the clock identifier possibly specified in the
.Vt struct _umtx_time .
If the
.Dv CVWAIT_ABSTIME
flag is supplied, the timeout specifies an absolute time value, otherwise
it denotes a relative time interval.
.Pp
The request is not restartable.
An unblocked signal delivered during
the wait always results in sleep interruption and
.Er EINTR
error.
.It Dv UMTX_OP_CV_SIGNAL
Wake up one condition waiter.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up at most one thread sleeping on the sleep queue keyed
by the
.Fa obj
argument.
If the woken up thread was the last on the sleep queue, the
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
.It Dv UMTX_OP_CV_BROADCAST
Wake up all condition waiters.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up all threads sleeping on the sleep queue keyed by the
.Fa obj
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
.It Dv UMTX_OP_WAIT_UINT
Same as
.Dv UMTX_OP_WAIT ,
but the type of the variable pointed to by
.Fa obj
is
.Vt u_int
.Pq a 32-bit integer .
.It Dv UMTX_OP_RW_RDLOCK
Read-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be read-locked.
.It Fa val
Additional flags to augment locking behaviour.
The valid flags in the
.Fa val
argument are:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
.El
.El
.Pp
The request obtains the read lock on the specified
.Vt struct urwlock
by incrementing the count of readers in the
.Dv rw_state
word of the structure.
If the
.Dv URWLOCK_WRITE_OWNER
bit is set in the word
.Dv rw_state ,
the lock was granted to a writer which has not yet relinquished
its ownership.
In this case the current thread is put to sleep until it makes sense to
retry.
.Pp
If the
.Dv URWLOCK_PREFER_READER
flag is set either in the
.Dv rw_flags
word of the structure, or in the
.Fa val
argument of the request, the presence of the threads trying to obtain
the write lock on the same structure does not prevent the current thread
from trying to obtain the read lock.
Otherwise, if the flag is not set, and the
.Dv URWLOCK_WRITE_WAITERS
flag is set in
.Dv rw_state ,
the current thread does not attempt to obtain the read lock.
Instead it sets the
.Dv URWLOCK_READ_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
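.Pp
The grant decision for a read lock can be summarised by the following
single-threaded model of one attempt.
It is a sketch of the rules above, not the kernel implementation: the real
update of
.Dv rw_state
is atomic, the constant values are assumptions mirroring
.In sys/umtx.h ,
and the outcome names (including the separate outcome for a full reader
count, which this section does not describe) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; on FreeBSD take them from <sys/umtx.h> instead. */
#define URWLOCK_PREFER_READER   0x0002u
#define URWLOCK_WRITE_OWNER     0x80000000u
#define URWLOCK_WRITE_WAITERS   0x40000000u
#define URWLOCK_READ_WAITERS    0x20000000u
#define URWLOCK_MAX_READERS     0x1fffffffu
#define URWLOCK_READER_COUNT(c) ((c) & URWLOCK_MAX_READERS)

enum rd_outcome { RD_GRANT, RD_SLEEP, RD_LIMIT };

/* One read-lock attempt on a plain copy of the rw_state word. */
static inline enum rd_outcome
urw_rdlock_step(uint32_t *rw_state, uint32_t rw_flags, uint32_t val_flags)
{
	/* A write owner always blocks readers. */
	uint32_t block = URWLOCK_WRITE_OWNER;

	/* Without reader preference, waiting writers block readers too. */
	if (((rw_flags | val_flags) & URWLOCK_PREFER_READER) == 0)
		block |= URWLOCK_WRITE_WAITERS;

	if ((*rw_state & block) != 0) {
		*rw_state |= URWLOCK_READ_WAITERS;
		return RD_SLEEP;
	}
	if (URWLOCK_READER_COUNT(*rw_state) == URWLOCK_MAX_READERS)
		return RD_LIMIT;
	*rw_state += 1;		/* increment the reader count */
	return RD_GRANT;
}
```

After an
.Dv RD_SLEEP
outcome the real request sleeps and then repeats the same test, which is what
the text above calls re-evaluating the locking conditions.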
.It Dv UMTX_OP_RW_WRLOCK
Write-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be write-locked.
.El
.Pp
The request obtains a write lock on the specified
.Vt struct urwlock ,
by setting the
.Dv URWLOCK_WRITE_OWNER
bit in the
.Dv rw_state
word of the structure.
If there is already a write lock owner, as indicated by the
.Dv URWLOCK_WRITE_OWNER
bit being set, or there are read lock owners, as indicated
by the read-lock counter, the current thread does not attempt to
obtain the write lock.
Instead it sets the
.Dv URWLOCK_WRITE_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.It Dv UMTX_OP_RW_UNLOCK
Unlock rwlock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be unlocked.
.El
.Pp
The unlock type (read or write) is determined by the
current lock state.
Note that the
.Vt struct urwlock
does not save information about the identity of the thread which
acquired the lock.
.Pp
If there are pending writers after the unlock, and the
.Dv URWLOCK_PREFER_READER
flag is not set in the
.Dv rw_flags
member of the
.Fa *obj
structure, one writer is woken up, selected as described in the
.Sx SLEEP QUEUES
subsection.
If the
.Dv URWLOCK_PREFER_READER
flag is set, a pending writer is woken up only if there are
no pending readers.
.Pp
If there are no pending writers, or if the
.Dv URWLOCK_PREFER_READER
flag is set, all pending readers are woken up by the unlock.
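.Pp
The wake-up policy of the unlock request reduces to a small decision
function.
The sketch below restates the rules above in C, using waiter counts as inputs
for clarity; the enum, the function name, and the value of
.Dv URWLOCK_PREFER_READER
are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define URWLOCK_PREFER_READER 0x0002u	/* assumed value; see <sys/umtx.h> */

enum urw_wake { WAKE_NONE, WAKE_ONE_WRITER, WAKE_ALL_READERS };

/* Decide whom an unlock wakes, given the flags and the pending waiters. */
static inline enum urw_wake
urw_unlock_wake(uint32_t rw_flags, uint32_t pending_readers,
    uint32_t pending_writers)
{
	if ((rw_flags & URWLOCK_PREFER_READER) != 0) {
		/* Readers first; a writer only if no reader is pending. */
		if (pending_readers > 0)
			return WAKE_ALL_READERS;
		if (pending_writers > 0)
			return WAKE_ONE_WRITER;
	} else {
		/* Default: one writer first, to avoid writer starvation. */
		if (pending_writers > 0)
			return WAKE_ONE_WRITER;
		if (pending_readers > 0)
			return WAKE_ALL_READERS;
	}
	return WAKE_NONE;
}
```

For example, with three pending readers and one pending writer the default
policy wakes the writer, while a lock created with
.Dv URWLOCK_PREFER_READER
wakes the three readers.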
.It Dv UMTX_OP_WAIT_UINT_PRIVATE
Same as
.Dv UMTX_OP_WAIT_UINT ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_WAKE_PRIVATE
Same as
.Dv UMTX_OP_WAKE ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_MUTEX_WAIT
Wait for mutex availability.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Address of the mutex.
.El
.Pp
Similarly to the
.Dv UMTX_OP_MUTEX_LOCK ,
put the requesting thread to sleep if the mutex lock cannot be obtained
immediately.
The
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the mutex to indicate that there is a waiter, before the thread
is added to the sleep queue.
Unlike the
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is not obtained.
.Pp
The operation is not implemented for priority protected and
priority inherited protocol mutexes.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without a timeout automatically restarts if the signal disposition
requested restart via the
.Dv SA_RESTART
flag in
.Vt struct sigaction
member
.Dv sa_flags .
.It Dv UMTX_OP_NWAKE_PRIVATE
Wake up a batch of sleeping threads.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the array of pointers.
.It Fa val
Number of elements in the array pointed to by
.Fa obj .
.El
.Pp
For each element in the array pointed to by
.Fa obj ,
wakes up all threads waiting on the
.Em private
sleep queue with the key
being the byte addressed by the array element.
.It Dv UMTX_OP_MUTEX_WAKE
Check if a normal umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
If the
.Dv m_owner
word of the mutex pointed to by the
.Fa obj
argument indicates an unowned mutex which has its contention indicator bit
.Dv UMUTEX_CONTESTED
set, clear the bit and wake up one waiter in the sleep queue associated
with the byte addressed by
.Fa obj ,
if any.
Only normal mutexes are supported by the request.
The sleep queue used is always the one for the normal mutex type.
.Pp
This request is deprecated in favor of
.Dv UMTX_OP_MUTEX_WAKE2
since mutexes using it cannot synchronize their own destruction.
That is, the
.Dv m_owner
word has already been set to
.Dv UMUTEX_UNOWNED
when this request is made,
so that another thread can lock, unlock and destroy the mutex
(if no other thread uses the mutex afterwards).
Clearing the
.Dv UMUTEX_CONTESTED
bit may then modify freed memory.
.It Dv UMTX_OP_MUTEX_WAKE2
Check if a umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.It Fa val
The umutex flags.
.El
.Pp
The request does not read the
.Dv m_flags
member of the
.Vt struct umutex ;
instead, the
.Fa val
argument supplies flag information, in particular, to determine the
sleep queue where the waiters are found for wake up.
.Pp
If the mutex is unowned, one waiter is woken up.
.Pp
If the mutex memory cannot be accessed, all waiters are woken up.
.Pp
If there is more than one waiter on the sleep queue, or there is only
one waiter but the mutex is owned by a thread, the
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the
.Vt struct umutex .
.It Dv UMTX_OP_SEM2_WAIT
Wait until the semaphore is available.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.It Fa uaddr
Size of the memory passed in via the
.Fa uaddr2
argument.
.It Fa uaddr2
Optional pointer to a structure of type
.Vt struct _umtx_time ,
which may be followed by a structure of type
.Vt struct timespec .
.El
.Pp
Put the requesting thread onto a sleep queue if the semaphore counter
is zero.
If the thread is put to sleep, the
.Dv USEM_HAS_WAITERS
bit is set in the
.Dv _count
word to indicate waiters.
The function returns either due to
.Dv _count
indicating the semaphore is available (non-zero count due to post),
or due to a wakeup.
The return does not guarantee that the semaphore is available,
nor does it consume the semaphore lock on successful return.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a non-absolute timeout value is not restartable.
An unblocked signal delivered during such a wait interrupts the sleep
and results in the
.Er EINTR
error.
.Pp
If
.Dv UMTX_ABSTIME
was not set, and the operation was interrupted and the caller passed in a
.Fa uaddr2
large enough to hold a
.Vt struct timespec
following the initial
.Vt struct _umtx_time ,
then the
.Vt struct timespec
is updated to contain the unslept amount.
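.Pp
The buffer layout described above can be sketched as follows.
This is an illustration under stated assumptions, not library code.
The
.Li sk_
names are hypothetical, and
.Vt struct sk_umtx_time
is a local mirror of
.Vt struct _umtx_time
whose member order (timeout, flags, clock identifier) is assumed.
The sketch also assumes that a valid
.Dv tv_nsec
lies below 1000000000.
On
.Fx
the returned size would be passed, cast to a pointer, in
.Fa uaddr ,
and the buffer itself in
.Fa uaddr2 .
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Local mirror of struct _umtx_time (assumed layout). */
struct sk_umtx_time {
	struct timespec	timeout;
	uint32_t	flags;
	uint32_t	clockid;
};

/* Buffer for uaddr2: the timeout, optionally followed by a timespec. */
struct sk_sem2_timeout {
	struct sk_umtx_time	tmo;
	struct timespec		remaining;	/* unslept time after EINTR */
};

/* The trailing timespec must start right where the timeout ends. */
static_assert(offsetof(struct sk_sem2_timeout, remaining) ==
    sizeof(struct sk_umtx_time), "timespec must directly follow the timeout");

/*
 * Fill buf with a relative timeout on the given clock.
 * Returns the size to pass in uaddr, or 0 if rel is not a valid timespec.
 */
size_t
sk_sem2_prepare(struct sk_sem2_timeout *buf, const struct timespec *rel,
    uint32_t clockid, int want_remaining)
{
	if (rel->tv_sec < 0 || rel->tv_nsec < 0 || rel->tv_nsec >= 1000000000L)
		return (0);
	memset(buf, 0, sizeof(*buf));
	buf->tmo.timeout = *rel;
	buf->tmo.flags = 0;		/* relative: UMTX_ABSTIME not set */
	buf->tmo.clockid = clockid;
	/* Advertise the trailing timespec only when the caller wants it. */
	return (want_remaining ? sizeof(*buf) : sizeof(buf->tmo));
}
```
.Ed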
.It Dv UMTX_OP_SEM2_WAKE
Wake up waiters on semaphore lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.El
.Pp
The request wakes up one waiter for the semaphore lock.
The function does not increment the semaphore lock count.
If the
.Dv USEM_HAS_WAITERS
bit was set in the
.Dv _count
word, and the last sleeping thread was woken up, the bit is cleared.
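.Pp
Because neither the wait nor the wake request changes the semaphore count,
the counting has to happen in userspace.
The following is a minimal sketch of that userspace half using C11 atomics.
The
.Li sk_
names are hypothetical, the constant is assumed to match
.Dv USEM_HAS_WAITERS ,
and overflow checking is omitted.
.Bd -literal
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match USEM_HAS_WAITERS in <sys/umtx.h>. */
#define SK_USEM_HAS_WAITERS	0x80000000u

/*
 * Post: increment the count.  Returns true when the waiters bit was set,
 * in which case the caller should issue UMTX_OP_SEM2_WAKE.
 */
bool
sk_sem_post(_Atomic uint32_t *count)
{
	uint32_t old;

	assert(count != NULL);
	old = atomic_fetch_add_explicit(count, 1, memory_order_release);
	return ((old & SK_USEM_HAS_WAITERS) != 0);
}

/*
 * Try to consume one unit.  Returns false when the count is zero,
 * in which case the caller may sleep with UMTX_OP_SEM2_WAIT and retry.
 */
bool
sk_sem_trywait(_Atomic uint32_t *count)
{
	uint32_t cur;

	assert(count != NULL);
	cur = atomic_load_explicit(count, memory_order_relaxed);
	while ((cur & ~SK_USEM_HAS_WAITERS) != 0) {
		/* On failure cur is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak_explicit(count, &cur, cur - 1,
		    memory_order_acquire, memory_order_relaxed))
			return (true);
	}
	return (false);
}
```
.Ed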
.It Dv UMTX_OP_SHM
Manage anonymous
.Tn POSIX
shared memory objects (see
.Xr shm_open 2 ) ,
which can be attached to a byte of physical memory, mapped into the
process address space.
The objects are used to implement process-shared locks in
.Dv libthr .
.Pp
The
.Fa val
argument specifies the sub-request of the
.Dv UMTX_OP_SHM
request:
.Bl -tag -width indent
.It Dv UMTX_SHM_CREAT
Creates the anonymous shared memory object, which can be looked up
with the specified key
.Fa uaddr .
If the object associated with the
.Fa uaddr
key already exists, it is returned instead of creating a new object.
The object's size is one page.
On success, the file descriptor referencing the object is returned.
The descriptor can be used for mapping the object using
.Xr mmap 2 ,
or for other shared memory operations.
.It Dv UMTX_SHM_LOOKUP
Same as the
.Dv UMTX_SHM_CREAT
request, but if there is no shared memory object associated with
the specified key
.Fa uaddr ,
an error is returned, and no new object is created.
.It Dv UMTX_SHM_DESTROY
Dissociate the shared object from the specified key
.Fa uaddr .
The object is destroyed after the last open file descriptor is closed
and the last mapping for it is destroyed.
.It Dv UMTX_SHM_ALIVE
Checks whether there is a live shared object associated with the
supplied key
.Fa uaddr .
Returns zero if there is, and an error otherwise.
This request is an optimization of the
.Dv UMTX_SHM_LOOKUP
request.
It is cheaper when only the liveness of the associated object is asked
for, since no file descriptor is installed in the process fd table
on success.
.El
.Pp
The
.Fa uaddr
argument specifies the virtual address whose backing physical memory
byte identity is used as a key for the anonymous shared object
creation or lookup.
.It Dv UMTX_OP_ROBUST_LISTS
Register the list heads for the current thread's robust mutex lists.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa val
Size of the structure passed in the
.Fa uaddr
argument.
.It Fa uaddr
Pointer to the structure of type
.Vt struct umtx_robust_lists_params .
.El
.Pp
The structure is defined as
.Bd -literal
struct umtx_robust_lists_params {
	uintptr_t	robust_list_offset;
	uintptr_t	robust_priv_list_offset;
	uintptr_t	robust_inact_offset;
};
.Ed
.Pp
The
.Dv robust_list_offset
member contains the address of the first element in the list of locked
robust shared mutexes.
The
.Dv robust_priv_list_offset
member contains the address of the first element in the list of locked
robust private mutexes.
The private and shared robust locked lists are split to allow fast
termination of the shared list on fork, in the child.
.Pp
The
.Dv robust_inact_offset
member contains a pointer to the mutex which might be locked in the near
future, or might have just been unlocked.
It is typically set by the lock or unlock mutex implementation code
around the whole operation, since the lists can only be changed race-free
when the thread owns the mutex.
The kernel inspects the
.Dv robust_inact_offset
in addition to walking the shared and private lists.
Also, the mutex pointed to by
.Dv robust_inact_offset
is handled more loosely at thread termination time
than other mutexes on the list.
That mutex need not be owned by the current thread,
in which case list processing is continued.
See the
.Sx ROBUST UMUTEXES
subsection for details.
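.Pp
A registration might be prepared along the following lines.
This is a sketch only.
The
.Li sk_
names and the per-thread bookkeeping structure are hypothetical, and the
choice to store the addresses of the per-thread head words in the parameters
is an assumption modelled on how
.Li libthr
is believed to register its lists.
On
.Fx
the filled structure would then be passed in
.Fa uaddr
with its size in
.Fa val .
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the structure shown above, so the sketch builds anywhere. */
struct sk_robust_lists_params {
	uintptr_t	robust_list_offset;
	uintptr_t	robust_priv_list_offset;
	uintptr_t	robust_inact_offset;
};

/* Hypothetical per-thread bookkeeping kept by a threading library. */
struct sk_thread {
	uintptr_t	shared_head;	/* first locked robust shared mutex */
	uintptr_t	priv_head;	/* first locked robust private mutex */
	uintptr_t	inact;		/* mutex being locked or unlocked */
};

/* Point the parameters at the three per-thread words (assumption). */
void
sk_robust_params(const struct sk_thread *td, struct sk_robust_lists_params *p)
{
	assert(td != NULL && p != NULL);
	p->robust_list_offset = (uintptr_t)&td->shared_head;
	p->robust_priv_list_offset = (uintptr_t)&td->priv_head;
	p->robust_inact_offset = (uintptr_t)&td->inact;
}
```
.Ed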
.El
.Pp
The
.Fa op
argument may be a bitwise OR of a single command from above with one or more of
the following flags:
.Bl -tag -width indent
.It Dv UMTX_OP__I386
Request i386 ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 32-bit time_t.
.El
.Pp
.Dv UMTX_OP__32BIT
has no effect if this flag is set.
This flag is valid for all architectures, but it is ignored on i386.
.It Dv UMTX_OP__32BIT
Request non-i386, 32-bit ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 64-bit time_t.
.El
.Pp
This flag has no effect if
.Dv UMTX_OP__I386
is set.
This flag is valid for all architectures.
.El
.Pp
Note that if any 32-bit ABI compatibility is being requested, then care must be
taken with robust lists.
A single thread may not mix 32-bit compatible robust lists with native
robust lists.
The first
.Dv UMTX_OP_ROBUST_LISTS
call in a given thread determines which ABI that thread will use for robust
lists going forward.
.Sh RETURN VALUES
If successful,
all requests, except
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests of the
.Dv UMTX_OP_SHM
request, will return zero.
The
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
return a shared memory file descriptor on success.
On error \-1 is returned, and the
.Va errno
variable is set to indicate the error.
.Sh ERRORS
The
.Fn _umtx_op
operations can fail with the following errors:
.Bl -tag -width "[ETIMEDOUT]"
.It Bq Er EFAULT
One of the arguments points to invalid memory.
.It Bq Er EINVAL
The clock identifier, specified for the
.Vt struct _umtx_time
timeout parameter, or in the
.Dv c_clockid
member of
.Vt struct ucond ,
is invalid.
.It Bq Er EINVAL
The type of the mutex, encoded by the
.Dv m_flags
member of
.Vt struct umutex ,
is invalid.
.It Bq Er EINVAL
The
.Dv m_owner
member of the
.Vt struct umutex
has changed the lock owner thread identifier during unlock.
.It Bq Er EINVAL
The
.Dv timeout.tv_sec
or
.Dv timeout.tv_nsec
member of
.Vt struct _umtx_time
is less than zero, or
.Dv timeout.tv_nsec
is greater than or equal to 1000000000.
.It Bq Er EINVAL
The
.Fa op
argument specifies an invalid operation.
.It Bq Er EINVAL
The
.Fa val
argument for the
.Dv UMTX_OP_SHM
request specifies an invalid sub-request.
.It Bq Er EINVAL
The
.Dv UMTX_OP_SET_CEILING
request specifies a mutex that is not priority protected.
.It Bq Er EINVAL
The new ceiling value for the
.Dv UMTX_OP_SET_CEILING
request, or one or more of the values read from the
.Dv m_ceilings
array during lock or unlock operations, is greater than
.Dv RTP_PRIO_MAX .
.It Bq Er EPERM
Unlock attempted on an object not owned by the current thread.
.It Bq Er EOWNERDEAD
The lock was requested on a umutex where the
.Dv m_owner
field was set to the
.Dv UMUTEX_RB_OWNERDEAD
value, indicating a terminated robust mutex.
The lock was granted to the caller, so this error in fact
indicates success with additional conditions.
.It Bq Er ENOTRECOVERABLE
The lock was requested on a umutex whose
.Dv m_owner
field is equal to the
.Dv UMUTEX_RB_NOTRECOV
value, indicating a robust mutex abandoned after termination.
The lock was not granted to the caller.
.It Bq Er ENOTTY
The shared memory object, associated with the address passed to the
.Dv UMTX_SHM_ALIVE
sub-request of
.Dv UMTX_OP_SHM
request, was destroyed.
.It Bq Er ESRCH
For the
.Dv UMTX_SHM_LOOKUP ,
.Dv UMTX_SHM_DESTROY ,
and
.Dv UMTX_SHM_ALIVE
sub-requests of the
.Dv UMTX_OP_SHM
request, there is no shared memory object associated with the provided key.
.It Bq Er ENOMEM
The
.Dv UMTX_SHM_CREAT
sub-request of the
.Dv UMTX_OP_SHM
request cannot be satisfied, because allocation of the shared memory object
would exceed the
.Dv RLIMIT_UMTXP
resource limit, see
.Xr setrlimit 2 .
.It Bq Er EAGAIN
The maximum number of readers
.Dv ( URWLOCK_MAX_READERS )
has already been granted ownership of the given
.Vt struct urwlock
for read.
.It Bq Er EBUSY
A try mutex lock operation was not able to obtain the lock.
.It Bq Er ETIMEDOUT
The request specified a timeout in the
.Fa uaddr
and
.Fa uaddr2
arguments, and timed out before obtaining the lock or being woken up.
.It Bq Er EINTR
A signal was delivered during wait, for a non-restartable operation.
Operations with timeouts are typically non-restartable, but timeouts
specified in absolute time may be restartable.
.It Bq Er ERESTART
A signal was delivered during wait, for a restartable operation.
Mutex lock requests without timeout specified are restartable.
The error is not returned to userspace code since restart
is handled by the usual adjustment of the instruction counter.
.El
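.Pp
Since
.Er EOWNERDEAD
reports a granted lock while most other errors do not, callers of mutex lock
requests usually need to tell these outcomes apart.
The following sketch shows one way to do so.
The
.Li sk_
names are hypothetical, and treating
.Er EINTR
as a reason to retry is a policy assumption, not a requirement of the
interface.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>

enum sk_lock_outcome {
	SK_LOCK_HELD,			/* lock acquired */
	SK_LOCK_HELD_OWNERDEAD,		/* acquired; previous owner terminated */
	SK_LOCK_RETRY,			/* interrupted; caller may retry */
	SK_LOCK_FAILED			/* lock not acquired */
};

/*
 * Classify the result of a lock request: rv is the return value
 * (0 or -1) and err is the errno value observed when rv is -1.
 */
enum sk_lock_outcome
sk_classify_lock(int rv, int err)
{
	assert(rv == 0 || rv == -1);
	if (rv == 0)
		return (SK_LOCK_HELD);
	switch (err) {
	case EOWNERDEAD:
		/* Reported as an error, but the lock was granted. */
		return (SK_LOCK_HELD_OWNERDEAD);
	case EINTR:
		return (SK_LOCK_RETRY);
	default:
		/* ENOTRECOVERABLE, EBUSY, ETIMEDOUT, EINVAL and others. */
		return (SK_LOCK_FAILED);
	}
}
```
.Ed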
.Sh SEE ALSO
.Xr clock_gettime 2 ,
.Xr mmap 2 ,
.Xr setrlimit 2 ,
.Xr shm_open 2 ,
.Xr sigaction 2 ,
.Xr thr_exit 2 ,
.Xr thr_kill 2 ,
.Xr thr_kill2 2 ,
.Xr thr_new 2 ,
.Xr thr_self 2 ,
.Xr thr_set_name 2 ,
.Xr signal 3
.Sh STANDARDS
The
.Fn _umtx_op
system call is non-standard and is used by the
.Lb libthr
to implement
.St -p1003.1-2001
.Xr pthread 3
functionality.
.Sh BUGS
A window between unlocking a robust mutex and resetting the pointer in the
.Dv robust_inact_offset
member of the registered
.Vt struct umtx_robust_lists_params
allows another thread to destroy the mutex, thus making the kernel inspect
freed or reused memory.
The
.Li libthr
implementation is only vulnerable to this race when operating on
a shared mutex.
A possible fix for the current implementation is to strengthen the checks
for shared mutexes before terminating them, in particular, verifying
that the mutex memory is mapped from a shared memory object allocated
by the
.Dv UMTX_OP_SHM
request.
This is not done because it is believed that the race is adequately
covered by other consistency checks, while adding the check would
prevent alternative implementations of
.Li libpthread .