1.\" Copyright (c) 2000 Jonathan Lemon
2.\" All rights reserved.
3.\"
4.\" Redistribution and use in source and binary forms, with or without
5.\" modification, are permitted provided that the following conditions
6.\" are met:
7.\" 1. Redistributions of source code must retain the above copyright
8.\"    notice, this list of conditions and the following disclaimer.
9.\" 2. Redistributions in binary form must reproduce the above copyright
10.\"    notice, this list of conditions and the following disclaimer in the
11.\"    documentation and/or other materials provided with the distribution.
12.\"
13.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
14.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
16.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
17.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
19.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
20.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
21.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
22.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
23.\" SUCH DAMAGE.
24.\"
25.\" $FreeBSD$
26.\"
27.Dd June 17, 2017
28.Dt KQUEUE 2
29.Os
30.Sh NAME
31.Nm kqueue ,
32.Nm kevent
33.Nd kernel event notification mechanism
34.Sh LIBRARY
35.Lb libc
36.Sh SYNOPSIS
37.In sys/event.h
38.Ft int
39.Fn kqueue "void"
40.Ft int
41.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
42.Fn EV_SET "kev" ident filter flags fflags data udata
43.Sh DESCRIPTION
44The
45.Fn kqueue
46system call
47provides a generic method of notifying the user when an event
48happens or a condition holds, based on the results of small
49pieces of kernel code termed filters.
50A kevent is identified by the (ident, filter) pair; there may only
51be one unique kevent per kqueue.
52.Pp
53The filter is executed upon the initial registration of a kevent
54in order to detect whether a preexisting condition is present, and is also
55executed whenever an event is passed to the filter for evaluation.
56If the filter determines that the condition should be reported,
57then the kevent is placed on the kqueue for the user to retrieve.
58.Pp
59The filter is also run when the user attempts to retrieve the kevent
60from the kqueue.
61If the filter indicates that the condition that triggered
62the event no longer holds, the kevent is removed from the kqueue and
63is not returned.
64.Pp
65Multiple events which trigger the filter do not result in multiple
66kevents being placed on the kqueue; instead, the filter will aggregate
67the events into a single struct kevent.
68Calling
69.Fn close
70on a file descriptor will remove any kevents that reference the descriptor.
71.Pp
72The
73.Fn kqueue
74system call
75creates a new kernel event queue and returns a descriptor.
76The queue is not inherited by a child created with
77.Xr fork 2 .
78However, if
79.Xr rfork 2
80is called without the
81.Dv RFFDG
82flag, then the descriptor table is shared,
83which will allow sharing of the kqueue between two processes.
84.Pp
85The
86.Fn kevent
87system call
88is used to register events with the queue, and return any pending
89events to the user.
90The
91.Fa changelist
92argument
93is a pointer to an array of
94.Va kevent
95structures, as defined in
96.In sys/event.h .
97All changes contained in the
98.Fa changelist
99are applied before any pending events are read from the queue.
100The
101.Fa nchanges
102argument
103gives the size of
104.Fa changelist .
105The
106.Fa eventlist
107argument
108is a pointer to an array of kevent structures.
109The
110.Fa nevents
111argument
112determines the size of
113.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately, even if a
.Fa timeout
is specified, unlike
.Xr select 2 .
122If
123.Fa timeout
124is a non-NULL pointer, it specifies a maximum interval to wait
125for an event, which will be interpreted as a struct timespec.
126If
127.Fa timeout
128is a NULL pointer,
129.Fn kevent
130waits indefinitely.
131To effect a poll, the
132.Fa timeout
133argument should be non-NULL, pointing to a zero-valued
134.Va timespec
135structure.
136The same array may be used for the
137.Fa changelist
138and
139.Fa eventlist .
140.Pp
141The
142.Fn EV_SET
143macro is provided for ease of initializing a
144kevent structure.
145.Pp
146The
147.Va kevent
148structure is defined as:
149.Bd -literal
struct kevent {
	uintptr_t  ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	int64_t   data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
	uint64_t  ext[4];	/* extensions */
};
159.Ed
160.Pp
161The fields of
162.Fa struct kevent
163are:
164.Bl -tag -width "Fa filter"
165.It Fa ident
166Value used to identify this event.
167The exact interpretation is determined by the attached filter,
168but often is a file descriptor.
169.It Fa filter
170Identifies the kernel filter used to process this event.
171The pre-defined
172system filters are described below.
173.It Fa flags
174Actions to perform on the event.
175.It Fa fflags
176Filter-specific flags.
177.It Fa data
178Filter-specific data value.
179.It Fa udata
180Opaque user-defined value passed through the kernel unchanged.
181.It Fa ext
182Extended data passed to and from kernel.
The use of the
.Fa ext[0]
and
.Fa ext[1]
members is defined by the filter.
If the filter does not use them, the members are copied unchanged.
The
.Fa ext[2]
and
.Fa ext[3]
members are always passed through the kernel as-is,
making additional context available to the application.
195.El
196.Pp
197The
198.Va flags
199field can contain the following values:
200.Bl -tag -width EV_DISPATCH
201.It Dv EV_ADD
202Adds the event to the kqueue.
203Re-adding an existing event
204will modify the parameters of the original event, and not result
205in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
208.It Dv EV_ENABLE
209Permit
210.Fn kevent
211to return the event if it is triggered.
212.It Dv EV_DISABLE
213Disable the event so
214.Fn kevent
215will not return it.
216The filter itself is not disabled.
217.It Dv EV_DISPATCH
218Disable the event source immediately after delivery of an event.
219See
220.Dv EV_DISABLE
221above.
222.It Dv EV_DELETE
223Removes the event from the kqueue.
224Events which are attached to
225file descriptors are automatically deleted on the last close of
226the descriptor.
227.It Dv EV_RECEIPT
228This flag is useful for making bulk changes to a kqueue without draining
229any pending events.
230When passed as input, it forces
231.Dv EV_ERROR
232to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
236.It Dv EV_ONESHOT
237Causes the event to return only the first occurrence of the filter
238being triggered.
239After the user retrieves the event from the kqueue,
240it is deleted.
241.It Dv EV_CLEAR
242After the event is retrieved by the user, its state is reset.
243This is useful for filters which report state transitions
244instead of the current state.
245Note that some filters may automatically
246set this flag internally.
247.It Dv EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
249.It Dv EV_ERROR
250See
251.Sx RETURN VALUES
252below.
253.El
254.Pp
255The predefined system filters are listed below.
256Arguments may be passed to and from the filter via the
257.Va fflags
258and
259.Va data
260fields in the kevent structure.
261.Bl -tag -width "Dv EVFILT_PROCDESC"
262.It Dv EVFILT_READ
263Takes a descriptor as the identifier, and returns whenever
264there is data available to read.
265The behavior of the filter is slightly different depending
266on the descriptor type.
267.Bl -tag -width 2n
268.It Sockets
269Sockets which have previously been passed to
270.Fn listen
271return when there is an incoming connection pending.
272.Va data
273contains the size of the listen backlog.
274.Pp
275Other socket descriptors return when there is data to be read,
276subject to the
277.Dv SO_RCVLOWAT
278value of the socket buffer.
279This may be overridden with a per-filter low water mark at the
280time the filter is added by setting the
281.Dv NOTE_LOWAT
282flag in
283.Va fflags ,
284and specifying the new low water mark in
285.Va data .
286On return,
287.Va data
288contains the number of bytes of protocol data available to read.
289.Pp
If the read direction of the socket has been shut down, then the filter
also sets
292.Dv EV_EOF
293in
294.Va flags ,
295and returns the socket error (if any) in
296.Va fflags .
297It is possible for EOF to be returned (indicating the connection is gone)
298while there is still data pending in the socket buffer.
299.It Vnodes
300Returns when the file pointer is not at the end of file.
301.Va data
302contains the offset from current position to end of file,
303and may be negative.
304.Pp
305This behavior is different from
306.Xr poll 2 ,
307where read events are triggered for regular files unconditionally.
308This event can be triggered unconditionally by setting the
309.Dv NOTE_FILE_POLL
310flag in
311.Va fflags .
312.It "Fifos, Pipes"
Returns when there is data to read;
314.Va data
315contains the number of bytes available.
316.Pp
317When the last writer disconnects, the filter will set
318.Dv EV_EOF
319in
320.Va flags .
321This may be cleared by passing in
322.Dv EV_CLEAR ,
323at which point the
324filter will resume waiting for data to become available before
325returning.
326.It "BPF devices"
327Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF device has
329.Dq immediate mode
330enabled and there is any data to read;
331.Va data
332contains the number of bytes available.
333.El
334.It Dv EVFILT_WRITE
335Takes a descriptor as the identifier, and returns whenever
336it is possible to write to the descriptor.
337For sockets, pipes
338and fifos,
339.Va data
340will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
344Note that this filter is not supported for vnodes or BPF devices.
345.Pp
For sockets, the low water mark and socket error handling are
347identical to the
348.Dv EVFILT_READ
349case.
350.It Dv EVFILT_EMPTY
351Takes a descriptor as the identifier, and returns whenever
352there is no remaining data in the write buffer.
353.It Dv EVFILT_AIO
354The sigevent portion of the AIO request is filled in, with
355.Va sigev_notify_kqueue
356containing the descriptor of the kqueue that the event should
357be attached to,
358.Va sigev_notify_kevent_flags
359containing the kevent flags which should be
360.Dv EV_ONESHOT ,
361.Dv EV_CLEAR
362or
363.Dv EV_DISPATCH ,
364.Va sigev_value
365containing the udata value, and
366.Va sigev_notify
367set to
368.Dv SIGEV_KEVENT .
369When the
370.Fn aio_*
371system call is made, the event will be registered
372with the specified kqueue, and the
373.Va ident
374argument set to the
375.Fa struct aiocb
376returned by the
377.Fn aio_*
378system call.
379The filter returns under the same conditions as
380.Fn aio_error .
381.It Dv EVFILT_VNODE
382Takes a file descriptor as the identifier and the events to watch for in
383.Va fflags ,
384and returns when one or more of the requested events occurs on the descriptor.
385The events to monitor are:
386.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
387.It Dv NOTE_ATTRIB
388The file referenced by the descriptor had its attributes changed.
389.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
391The closed file descriptor did not have write access.
392.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
394The closed file descriptor had write access.
395.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
401Instead,
402.Dv NOTE_REVOKE
403is sent for such events.
404.It Dv NOTE_DELETE
405The
406.Fn unlink
407system call was called on the file referenced by the descriptor.
408.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
413The
414.Dv NOTE_EXTEND
415event is not reported when a name is changed inside the directory.
416.It Dv NOTE_LINK
417The link count on the file changed.
418In particular, the
419.Dv NOTE_LINK
420event is reported if a subdirectory was created or deleted inside
421the directory referenced by the descriptor.
422.It Dv NOTE_OPEN
423The file referenced by the descriptor was opened.
424.It Dv NOTE_READ
425A read occurred on the file referenced by the descriptor.
426.It Dv NOTE_RENAME
427The file referenced by the descriptor was renamed.
428.It Dv NOTE_REVOKE
429Access to the file was revoked via
430.Xr revoke 2
431or the underlying file system was unmounted.
432.It Dv NOTE_WRITE
433A write occurred on the file referenced by the descriptor.
434.El
435.Pp
436On return,
437.Va fflags
438contains the events which triggered the filter.
439.It Dv EVFILT_PROC
440Takes the process ID to monitor as the identifier and the events to watch for
441in
442.Va fflags ,
443and returns when the process performs one or more of the requested events.
444If a process can normally see another process, it can attach an event to it.
445The events to monitor are:
446.Bl -tag -width "Dv NOTE_TRACKERR"
447.It Dv NOTE_EXIT
448The process has exited.
449The exit status will be stored in
450.Va data .
451.It Dv NOTE_FORK
452The process has called
453.Fn fork .
454.It Dv NOTE_EXEC
455The process has executed a new process via
456.Xr execve 2
457or a similar call.
458.It Dv NOTE_TRACK
459Follow a process across
460.Fn fork
461calls.
462The parent process registers a new kevent to monitor the child process
463using the same
464.Va fflags
465as the original event.
466The child process will signal an event with
467.Dv NOTE_CHILD
468set in
469.Va fflags
470and the parent PID in
471.Va data .
472.Pp
473If the parent process fails to register a new kevent
474.Pq usually due to resource limitations ,
475it will signal an event with
476.Dv NOTE_TRACKERR
477set in
478.Va fflags ,
479and the child process will not signal a
480.Dv NOTE_CHILD
481event.
482.El
483.Pp
484On return,
485.Va fflags
486contains the events which triggered the filter.
487.It Dv EVFILT_PROCDESC
488Takes the process descriptor created by
489.Xr pdfork 2
490to monitor as the identifier and the events to watch for in
491.Va fflags ,
492and returns when the associated process performs one or more of the
493requested events.
494The events to monitor are:
495.Bl -tag -width "Dv NOTE_EXIT"
496.It Dv NOTE_EXIT
497The process has exited.
498The exit status will be stored in
499.Va data .
500.El
501.Pp
502On return,
503.Va fflags
504contains the events which triggered the filter.
505.It Dv EVFILT_SIGNAL
506Takes the signal number to monitor as the identifier and returns
507when the given signal is delivered to the process.
508This coexists with the
509.Fn signal
510and
511.Fn sigaction
512facilities, and has a lower precedence.
513The filter will record
514all attempts to deliver a signal to a process, even if the signal has
515been marked as
516.Dv SIG_IGN ,
517except for the
518.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
520Event notification happens after normal
521signal delivery processing.
522.Va data
523returns the number of times the signal has occurred since the last call to
524.Fn kevent .
525This filter automatically sets the
526.Dv EV_CLEAR
527flag internally.
528.It Dv EVFILT_TIMER
529Establishes an arbitrary timer identified by
530.Va ident .
531When adding a timer,
532.Va data
533specifies the moment to fire the timer (for
534.Dv NOTE_ABSTIME )
535or the timeout period.
536The timer will be periodic unless
537.Dv EV_ONESHOT
538or
539.Dv NOTE_ABSTIME
540is specified.
541On return,
542.Va data
543contains the number of times the timeout has expired since the last call to
544.Fn kevent .
545For non-monotonic timers, this filter automatically sets the
546.Dv EV_CLEAR
547flag internally.
548.Pp
549The filter accepts the following flags in the
550.Va fflags
551argument:
552.Bl -tag -width "Dv NOTE_MSECONDS"
553.It Dv NOTE_SECONDS
554.Va data
555is in seconds.
556.It Dv NOTE_MSECONDS
557.Va data
558is in milliseconds.
559.It Dv NOTE_USECONDS
560.Va data
561is in microseconds.
562.It Dv NOTE_NSECONDS
563.Va data
564is in nanoseconds.
565.It Dv NOTE_ABSTIME
566The specified expiration time is absolute.
567.El
568.Pp
569If
570.Va fflags
571is not set, the default is milliseconds.
572On return,
573.Va fflags
574contains the events which triggered the filter.
575.Pp
There is a system-wide limit on the number of timers
577which is controlled by the
578.Va kern.kq_calloutmax
579sysctl.
580.It Dv EVFILT_USER
581Establishes a user event identified by
582.Va ident
583which is not associated with any kernel mechanism but is triggered by
584user level code.
585The lower 24 bits of the
586.Va fflags
may be used for user-defined flags and manipulated using the following:
588.Bl -tag -width "Dv NOTE_FFLAGSMASK"
589.It Dv NOTE_FFNOP
590Ignore the input
591.Va fflags .
592.It Dv NOTE_FFAND
593Bitwise AND
594.Va fflags .
595.It Dv NOTE_FFOR
596Bitwise OR
597.Va fflags .
598.It Dv NOTE_FFCOPY
599Copy
600.Va fflags .
601.It Dv NOTE_FFCTRLMASK
602Control mask for
603.Va fflags .
604.It Dv NOTE_FFLAGSMASK
User-defined flag mask for
606.Va fflags .
607.El
608.Pp
609A user event is triggered for output with the following:
610.Bl -tag -width "Dv NOTE_FFLAGSMASK"
611.It Dv NOTE_TRIGGER
612Cause the event to be triggered.
613.El
614.Pp
615On return,
616.Va fflags
contains the user-defined flags in the lower 24 bits.
618.El
619.Sh CANCELLATION BEHAVIOUR
620If
621.Fa nevents
622is non-zero, i.e. the function is potentially blocking, the call
623is a cancellation point.
624Otherwise, i.e. if
625.Fa nevents
626is zero, the call is not cancellable.
627Cancellation can only occur before any changes are made to the kqueue,
628or when the call was blocked and no changes to the queue were requested.
629.Sh RETURN VALUES
630The
631.Fn kqueue
632system call
633creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
636.Pp
637The
638.Fn kevent
639system call
640returns the number of events placed in the
641.Fa eventlist ,
642up to the value given by
643.Fa nevents .
644If an error occurs while processing an element of the
645.Fa changelist
646and there is enough room in the
647.Fa eventlist ,
648then the event will be placed in the
649.Fa eventlist
650with
651.Dv EV_ERROR
652set in
653.Va flags
654and the system error in
655.Va data .
656Otherwise,
657.Dv -1
658will be returned, and
.Va errno
660will be set to indicate the error condition.
661If the time limit expires, then
662.Fn kevent
663returns 0.
664.Sh EXAMPLES
665.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error is not an errno failure, so use errx(). */
    if (argc != 2)
	errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
	err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
	err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	0, NULL);
    /*
     * Attach event to the kqueue.  With nevents of 0 there is no
     * eventlist, so a registration error is reported through the
     * return value and errno.
     */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
	err(EXIT_FAILURE, "kevent register");

    for (;;) {
	/* Sleep until something happens. */
	ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
	if (ret == -1) {
	    err(EXIT_FAILURE, "kevent wait");
	} else if (ret > 0) {
	    printf("Something was written in '%s'\en", argv[1]);
	}
    }
}
711.Ed
712.Sh ERRORS
713The
714.Fn kqueue
715system call fails if:
716.Bl -tag -width Er
717.It Bq Er ENOMEM
718The kernel failed to allocate enough memory for the kernel queue.
719.It Bq Er ENOMEM
720The
721.Dv RLIMIT_KQUEUES
722rlimit
723(see
724.Xr getrlimit 2 )
725for the current user would be exceeded.
726.It Bq Er EMFILE
727The per-process descriptor table is full.
728.It Bq Er ENFILE
729The system file table is full.
730.El
731.Pp
732The
733.Fn kevent
734system call fails if:
735.Bl -tag -width Er
736.It Bq Er EACCES
737The process does not have permission to register a filter.
738.It Bq Er EFAULT
739There was an error reading or writing the
740.Va kevent
741structure.
742.It Bq Er EBADF
743The specified descriptor is invalid.
744.It Bq Er EINTR
745A signal was delivered before the timeout expired and before any
746events were placed on the kqueue for return.
747.It Bq Er EINTR
748A cancellation request was delivered to the thread, but not yet handled.
749.It Bq Er EINVAL
750The specified time limit or filter is invalid.
751.It Bq Er ENOENT
752The event could not be found to be modified or deleted.
753.It Bq Er ENOMEM
754No memory was available to register the event
755or, in the special case of a timer, the maximum number of
756timers has been exceeded.
757This maximum is configurable via the
758.Va kern.kq_calloutmax
759sysctl.
760.It Bq Er ESRCH
761The specified process to attach to does not exist.
762.El
763.Pp
When a
.Fn kevent
call fails with the
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
771.Sh SEE ALSO
772.Xr aio_error 2 ,
773.Xr aio_read 2 ,
774.Xr aio_return 2 ,
775.Xr poll 2 ,
776.Xr read 2 ,
777.Xr select 2 ,
778.Xr sigaction 2 ,
779.Xr write 2 ,
780.Xr pthread_setcancelstate 3 ,
781.Xr signal 3
782.Sh HISTORY
783The
784.Fn kqueue
785and
786.Fn kevent
787system calls first appeared in
788.Fx 4.1 .
789.Sh AUTHORS
790The
791.Fn kqueue
792system and this manual page were written by
793.An Jonathan Lemon Aq Mt [email protected] .
794.Sh BUGS
795The
796.Fa timeout
797value is limited to 24 hours; longer timeouts will be silently
798reinterpreted as 24 hours.
799.Pp
800Previous versions of
801.In sys/event.h
802fail to parse without including
803.In sys/types.h
804manually.
805