.\" xref: /freebsd-14.2/lib/libc/sys/kqueue.2 (revision f287c3e4)
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd June 22, 2017
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; a kqueue may hold
at most one kevent for each such pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
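.Pp
The three
.Fa timeout
cases described above can be sketched in C as follows.
This is only an illustrative sketch: the helper name
.Fn ms_to_timespec
is hypothetical and is not provided by
.In sys/event.h .
It builds the
.Vt struct timespec
that
.Fn kevent
expects from a non-negative millisecond count; the calls shown in the
trailing comment assume a system that provides kqueue.

```c
#include <time.h>

/*
 * Hypothetical helper (not part of <sys/event.h>): convert a
 * non-negative millisecond count into the struct timespec form
 * used for the kevent() timeout argument.
 */
struct timespec
ms_to_timespec(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;			/* whole seconds */
	ts.tv_nsec = (ms % 1000) * 1000000L;	/* remainder in nanoseconds */
	return (ts);
}

/*
 * Intended use on a system with kqueue (kept as a comment so that the
 * helper itself stays portable; kq, evlist and nev are placeholders):
 *
 *	struct timespec ts = ms_to_timespec(250);
 *	n = kevent(kq, NULL, 0, evlist, nev, &ts);	wait at most 250 ms
 *
 *	struct timespec zero = ms_to_timespec(0);
 *	n = kevent(kq, NULL, 0, evlist, nev, &zero);	poll, do not block
 *
 *	n = kevent(kq, NULL, 0, evlist, nev, NULL);	wait indefinitely
 */
```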
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t  ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	int64_t   data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
	uint64_t  ext[4];	/* extensions */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.It Fa ext
Extended data passed to and from the kernel.
The use of the
.Fa ext[0]
and
.Fa ext[1]
members is defined by the filter.
If the filter does not use them, the members are copied unchanged.
The
.Fa ext[2]
and
.Fa ext[3]
members are always passed through the kernel as-is,
making additional context available to the application.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_EMPTY
Takes a descriptor as the identifier, and returns whenever
there is no remaining data in the write buffer.
.It Dv EVFILT_AIO
Events for this filter are not registered with
.Fn kevent
directly but are registered via the
.Va aio_sigevent
member of an asynchronous I/O request when it is scheduled via an asynchronous I/O
system call such as
.Fn aio_read .
The filter returns under the same conditions as
.Fn aio_error .
For more details on this filter see
.Xr sigevent 3
and
.Xr aio 4 .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the moment to fire the timer (for
.Dv NOTE_ABSTIME )
or the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
or
.Dv NOTE_ABSTIME
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
For non-monotonic timers, this filter automatically sets the
.Dv EV_CLEAR
flag internally.
.Pp
The filter accepts the following flags in the
.Va fflags
argument:
.Bl -tag -width "Dv NOTE_MSECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.It Dv NOTE_ABSTIME
The specified expiration time is absolute.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.Pp
There is a system wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user defined flags in the lower 24 bits.
.El
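.Pp
The
.Dv EVFILT_USER
control flags listed above can be read as operations that combine the
input
.Va fflags
with the user flags already stored for the event.
The following portable C model is a sketch of that reading, not the
kernel implementation.
The function names
.Fn user_fflags_apply
and
.Fn user_fflags_triggers
are hypothetical, and the numeric constants are assumed to mirror
.In sys/event.h ;
on a system with kqueue, use the definitions from that header instead.

```c
#include <stdint.h>

/*
 * Assumed mirrors of the <sys/event.h> values, defined here only so
 * that the model builds on systems without kqueue.
 */
#ifndef NOTE_FFNOP
#define	NOTE_FFNOP	0x00000000u	/* ignore input fflags */
#define	NOTE_FFAND	0x40000000u	/* AND input into stored flags */
#define	NOTE_FFOR	0x80000000u	/* OR input into stored flags */
#define	NOTE_FFCOPY	0xc0000000u	/* replace stored flags */
#define	NOTE_FFCTRLMASK	0xc0000000u	/* selects the operation */
#define	NOTE_FFLAGSMASK	0x00ffffffu	/* lower 24 user-defined bits */
#define	NOTE_TRIGGER	0x01000000u	/* fire the event */
#endif

/*
 * Model: 'stored' holds the event's current user flags and 'input' is
 * the fflags value of a changelist entry.  Returns the new stored flags.
 */
uint32_t
user_fflags_apply(uint32_t stored, uint32_t input)
{
	uint32_t ctrl = input & NOTE_FFCTRLMASK;
	uint32_t bits = input & NOTE_FFLAGSMASK;

	switch (ctrl) {
	case NOTE_FFAND:
		return (stored & bits);
	case NOTE_FFOR:
		return (stored | bits);
	case NOTE_FFCOPY:
		return (bits);
	default:			/* NOTE_FFNOP */
		return (stored);
	}
}

/* Returns 1 if the changelist entry asks for the event to be triggered. */
int
user_fflags_triggers(uint32_t input)
{
	return ((input & NOTE_TRIGGER) != 0);
}
```

Under this model, an entry whose
.Va fflags
is NOTE_TRIGGER | NOTE_FFOR | 0x1 both sets the lowest user bit and
requests delivery of the event.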
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e. the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e. if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
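.Pp
An event loop typically dispatches on the
.Fn kevent
return value as described above.
The following sketch shows one way to do that classification; the
enumeration and the function name
.Fn classify_kevent_result
are hypothetical and exist only for this illustration.
The caller is assumed to pass the value of
.Va errno
saved immediately after the call.

```c
#include <errno.h>

/* Hypothetical outcome codes for one kevent() call. */
enum kevent_outcome {
	KEV_FAILED,		/* -1 with an errno other than EINTR */
	KEV_INTERRUPTED,	/* -1 with errno set to EINTR */
	KEV_NO_EVENTS,		/* 0: time limit expired or nevents was 0 */
	KEV_EVENTS		/* > 0: that many eventlist entries are valid */
};

/*
 * Classify the return value of kevent().  'saved_errno' is only
 * consulted when 'ret' is -1.
 */
enum kevent_outcome
classify_kevent_result(int ret, int saved_errno)
{
	if (ret == -1)
		return (saved_errno == EINTR ? KEV_INTERRUPTED : KEV_FAILED);
	if (ret == 0)
		return (KEV_NO_EVENTS);
	return (KEV_EVENTS);
}
```

When the outcome is KEV_EVENTS, each returned entry should still be
checked for
.Dv EV_ERROR
in
.Va flags ,
since changelist errors are reported per event.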
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error is not an errno failure, so use errx(). */
    if (argc != 2)
	errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
	err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
	err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	0, NULL);
    /* Attach event to the kqueue. */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
	err(EXIT_FAILURE, "kevent register");
    if (event.flags & EV_ERROR)
	errx(EXIT_FAILURE, "Event error: %s", strerror(event.data));

    for (;;) {
	/* Sleep until something happens. */
	ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
	if (ret == -1) {
	    err(EXIT_FAILURE, "kevent wait");
	} else if (ret > 0) {
	    printf("Something was written in '%s'\en", argv[1]);
	}
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with the
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
In versions older than
.Fx 12.0 ,
.In sys/event.h
failed to parse without including
.In sys/types.h
manually.