.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd July 27, 2018
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t  ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	int64_t   data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
	uint64_t  ext[4];	/* extensions */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.It Fa ext
Extended data passed to and from the kernel.
The use of the
.Fa ext[0]
and
.Fa ext[1]
members is defined by the filter.
If the filter does not use them, the members are copied unchanged.
The
.Fa ext[2]
and
.Fa ext[3]
members are always passed through the kernel as-is,
making additional context available to the application.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
.Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
.Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_EMPTY
Takes a descriptor as the identifier, and returns whenever
there is no remaining data in the write buffer.
.It Dv EVFILT_AIO
Events for this filter are not registered with
.Fn kevent
directly but are registered via the
.Va aio_sigevent
member of an asynchronous I/O request when it is scheduled via an
asynchronous I/O system call such as
.Fn aio_read .
The filter returns under the same conditions as
.Fn aio_error .
For more details on this filter see
.Xr sigevent 3
and
.Xr aio 4 .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the moment to fire the timer (for
.Dv NOTE_ABSTIME )
or the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
or
.Dv NOTE_ABSTIME
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
For non-monotonic timers, this filter automatically sets the
.Dv EV_CLEAR
flag internally.
.Pp
The filter accepts the following flags in the
.Va fflags
argument:
.Bl -tag -width "Dv NOTE_MSECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.It Dv NOTE_ABSTIME
The specified expiration time is absolute.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.Pp
If an existing timer is re-added, the existing timer will be
effectively canceled (throwing away any undelivered record of previous
timer expiration) and re-started using the new parameters contained in
.Va data
and
.Va fflags .
.Pp
There is a system wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e., the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e., if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and errno set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error has no meaningful errno, so use errx(), not err(). */
    if (argc != 2)
	errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
	err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
	err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	0, NULL);
    /* Attach event to the kqueue. */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
	err(EXIT_FAILURE, "kevent register");
    if (event.flags & EV_ERROR)
	errx(EXIT_FAILURE, "Event error: %s", strerror(event.data));

    for (;;) {
	/* Sleep until something happens. */
	ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
	if (ret == -1) {
	    err(EXIT_FAILURE, "kevent wait");
	} else if (ret > 0) {
	    printf("Something was written in '%s'\en", argv[1]);
	}
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
In versions older than
.Fx 12.0 ,
.In sys/event.h
failed to parse without including
.In sys/types.h
manually.
797manually.
798