.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd May 3, 2016
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has shutdown, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_notify_kevent_flags
containing the kevent flags which should be
.Dv EV_ONESHOT ,
.Dv EV_CLEAR
or
.Dv EV_DISPATCH ,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the
.Fn aio_*
system call is made, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the
.Fn aio_*
system call.
The filter returns under the same conditions as
.Fn aio_error .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, won't be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
There is a system wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.Bl -tag -width "Dv NOTE_USECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user-defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e. the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e. if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* errx() is used here because errno is not meaningful. */
    if (argc != 2)
	errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
	err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
	err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	0, NULL);
    /*
     * Attach event to the kqueue.  No eventlist is supplied, so a
     * failed change is reported through the return value and errno.
     */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
	err(EXIT_FAILURE, "kevent register");

    for (;;) {
	/* Sleep until something happens. */
	ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
	if (ret == -1) {
	    err(EXIT_FAILURE, "kevent wait");
	} else if (ret > 0) {
	    printf("Something was written in '%s'\en", argv[1]);
	}
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
772