.\" xref: /freebsd-14.2/lib/libc/sys/kqueue.2 (revision 15eaaf08)
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 18, 2017
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately, even if a
.Fa timeout
is specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling is
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_EMPTY
Takes a descriptor as the identifier, and returns whenever
there is no remaining data in the write buffer.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_notify_kevent_flags
containing the kevent flags which should be
.Dv EV_ONESHOT ,
.Dv EV_CLEAR
or
.Dv EV_DISPATCH ,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the
.Fn aio_*
system call is made, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the
.Fn aio_*
system call.
The filter returns under the same conditions as
.Fn aio_error .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
There is a system-wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.Bl -tag -width "Dv NOTE_USECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user-defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User-defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e. the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e. if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* Use errx(): a usage error has no meaningful errno to report. */
    if (argc != 2)
	errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
	err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
	err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	0, NULL);
    /*
     * Attach event to the kqueue.  With nevents set to zero, a
     * registration error is reported through the -1 return value;
     * the changelist itself is not modified by kevent().
     */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
	err(EXIT_FAILURE, "kevent register");

    for (;;) {
	/* Sleep until something happens. */
	ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
	if (ret == -1) {
	    err(EXIT_FAILURE, "kevent wait");
	} else if (ret > 0) {
	    printf("Something was written in '%s'\en", argv[1]);
	}
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq Mt [email protected] .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
Previous versions of
.In sys/event.h
fail to parse without including
.In sys/types.h
manually.
776