.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2017
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_notify_kevent_flags
containing the kevent flags which should be
.Dv EV_ONESHOT ,
.Dv EV_CLEAR
or
.Dv EV_DISPATCH ,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the
.Fn aio_*
system call is made, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the
.Fn aio_*
system call.
The filter returns under the same conditions as
.Fn aio_error .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
There is a system wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.Bl -tag -width "Dv NOTE_USECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e. the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e. if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/types.h>
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* errx(), not err(): a usage error has no meaningful errno. */
    if (argc != 2)
	errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
	err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
	err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	0, NULL);
    /*
     * Attach event to the kqueue.  No eventlist is supplied, so a
     * failed registration is reported as -1 with errno set.
     */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
	err(EXIT_FAILURE, "kevent register");

    for (;;) {
	/* Sleep until something happens. */
	ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
	if (ret == -1) {
	    err(EXIT_FAILURE, "kevent wait");
	} else if (ret > 0) {
	    printf("Something was written in '%s'\en", argv[1]);
	}
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq Mt [email protected] .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
Previous versions of
.In sys/event.h
fail to parse without including
.In sys/types.h
manually.
772manually.
773