.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 14, 2000
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by its (ident, filter) pair; a kqueue holds
at most one kevent for any given pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.  If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.  To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.  The same array may be used for the
.Fa changelist
and
.Fa eventlist .
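.Pp
The polling case described above can be sketched as follows.
This is a minimal illustration, not a reference implementation; the helper name
.Fn poll_empty_kqueue
is hypothetical.
The preprocessor guard is an assumption added so that the sketch also compiles
on systems without kqueue, where the stub fails with
.Er ENOSYS .

```c
#include <sys/types.h>
#include <errno.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Poll a freshly created (and therefore empty) kqueue.
 * Returns the result of kevent(), or -1 if kqueue() fails.
 */
int
poll_empty_kqueue(void)
{
	struct kevent ev;
	struct timespec zero = { 0, 0 };	/* zero-valued: do not block */
	int kq, n;

	kq = kqueue();
	if (kq == -1)
		return (-1);

	/* No changes to apply; room for one event; return immediately. */
	n = kevent(kq, NULL, 0, &ev, 1, &zero);

	close(kq);
	return (n);
}
#else
/* Hypothetical stub: kqueue is not available on this platform. */
int
poll_empty_kqueue(void)
{
	errno = ENOSYS;
	return (-1);
}
#endif
```

.Pp
Passing NULL instead of
.Va &zero
would make the same call wait indefinitely, since nothing is registered.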
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width XXXfilter
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It filter
Identifies the kernel filter used to process this event.  The pre-defined
system filters are described below.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
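.Pp
As an illustration of the fields above, the following sketch fills a
.Va kevent
with
.Fn EV_SET ,
registers it, and checks what comes back.
The function name
.Fn udata_round_trip
and the tag string are hypothetical, and the platform guard with its
.Er ENOSYS
stub is an assumption added for portability of the sketch.

```c
#include <sys/types.h>
#include <errno.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/event.h>
#include <sys/time.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Register EVFILT_READ on the read end of a pipe that already holds
 * five bytes.  Returns 1 if the returned kevent carries the expected
 * ident, filter, data and udata, 0 if not, -1 on setup failure.
 */
int
udata_round_trip(void)
{
	static char tag[] = "pipe-reader";	/* hypothetical user data */
	struct kevent chg, ev;
	struct timespec zero = { 0, 0 };
	int fds[2], kq, n, ok = 0;

	if (pipe(fds) == -1)
		return (-1);
	kq = kqueue();
	if (kq == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}

	/* ident = descriptor, filter = EVFILT_READ, udata = our tag. */
	EV_SET(&chg, fds[0], EVFILT_READ, EV_ADD, 0, 0, tag);

	if (write(fds[1], "hello", 5) == 5) {
		/* The change is applied first, then pending events are read. */
		n = kevent(kq, &chg, 1, &ev, 1, &zero);
		ok = (n == 1 &&
		    ev.ident == (uintptr_t)fds[0] &&
		    ev.filter == EVFILT_READ &&
		    ev.data == 5 &&		/* bytes available to read */
		    ev.udata == tag);		/* passed through unchanged */
	}

	close(kq);
	close(fds[0]);
	close(fds[1]);
	return (ok);
}
#else
/* Hypothetical stub: kqueue is not available on this platform. */
int
udata_round_trip(void)
{
	errno = ENOSYS;
	return (-1);
}
#endif
```

.Pp
Because the data is already in the pipe when the kevent is registered,
the filter reports the preexisting condition in the same call.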
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT
.It EV_ADD
Adds the event to the kqueue.  Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.  Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It EV_DISABLE
Disable the event so
.Fn kevent
will not return it.  The filter itself is not disabled.
.It EV_DELETE
Removes the event from the kqueue.  Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.  After the user retrieves the event from the kqueue,
it is deleted.
.It EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.  Note that some filters may automatically
set this flag internally.
.It EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It EV_ERROR
See
.Sx RETURN VALUES
below.
.El
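.Pp
The interaction of EV_ADD, EV_DISABLE and EV_ENABLE listed above can be
sketched as follows.
The function
.Fn disable_then_enable
and its output parameters are hypothetical names chosen for this example;
the platform guard and
.Er ENOSYS
stub are assumptions added so the sketch compiles where kqueue is absent.

```c
#include <sys/types.h>
#include <errno.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Add a readable pipe to a kqueue in the disabled state, poll, then
 * enable it and poll again.  Each poll result is stored through the
 * pointers.  Returns 0 on success, -1 on setup failure.
 */
int
disable_then_enable(int *while_disabled, int *after_enable)
{
	struct kevent chg, ev;
	struct timespec zero = { 0, 0 };
	int fds[2], kq, rc = -1;

	if (pipe(fds) == -1)
		return (-1);
	kq = kqueue();
	if (kq != -1 && write(fds[1], "x", 1) == 1) {
		/* EV_DISABLE overrides the implicit enable of EV_ADD. */
		EV_SET(&chg, fds[0], EVFILT_READ, EV_ADD | EV_DISABLE,
		    0, 0, NULL);
		*while_disabled = kevent(kq, &chg, 1, &ev, 1, &zero);

		/* Same (ident, filter) pair: modifies the existing kevent. */
		EV_SET(&chg, fds[0], EVFILT_READ, EV_ENABLE, 0, 0, NULL);
		*after_enable = kevent(kq, &chg, 1, &ev, 1, &zero);
		rc = 0;
	}

	if (kq != -1)
		close(kq);
	close(fds[0]);
	close(fds[1]);
	return (rc);
}
#else
/* Hypothetical stub: kqueue is not available on this platform. */
int
disable_then_enable(int *while_disabled, int *after_enable)
{
	(void)while_disabled;
	(void)after_enable;
	errno = ENOSYS;
	return (-1);
}
#endif
```

.Pp
The first poll is expected to report no events even though the pipe is
readable, because the filter keeps running while the event is disabled;
the second poll is expected to return the event.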
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width EVFILT_SIGNAL
.It EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Pp
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets EV_EOF in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set EV_EOF in
.Va flags .
This may be cleared by passing in EV_CLEAR, at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.  For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of EV_CLEAR.
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the EVFILT_READ case.
.It EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to SIGEV_KEVENT.
When the
.Fn aio_*
system call is made, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the
.Fn aio_*
system call.
The filter returns under the same conditions as
.Xr aio_error 2 .
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.  However, this approach will not work on
architectures with 64-bit pointers, and should be considered deprecated.
.It EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It NOTE_DELETE
The
.Fn unlink
system call
was called on the file referenced by the descriptor.
.It NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It NOTE_EXTEND
The file referenced by the descriptor was extended.
.It NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It NOTE_LINK
The link count on the file changed.
.It NOTE_RENAME
The file referenced by the descriptor was renamed.
.It NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It NOTE_EXIT
The process has exited.
.It NOTE_FORK
The process has called
.Fn fork .
.It NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or similar call.
.It NOTE_TRACK
Follow a process across
.Fn fork
calls.  The parent process will return with NOTE_TRACK set in the
.Va fflags
field, while the child process will return with NOTE_CHILD set in
.Va fflags
and the parent PID in
.Va data .
.It NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.  The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as SIG_IGN.  Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.It EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period in milliseconds.
The timer will be periodic unless EV_ONESHOT is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.El
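.Pp
To illustrate how several of the filters above can share one kqueue, the
following sketch registers an EVFILT_SIGNAL and an EVFILT_TIMER kevent in a
single
.Fa changelist
and dispatches on the
.Va filter
field of each returned event.
The function name
.Fn count_signals_until_timer ,
the timer identifier 1, the 10 millisecond period and the two second wait
limit are arbitrary choices for this example; the platform guard and
.Er ENOSYS
stub are assumptions added for portability of the sketch.

```c
#include <sys/types.h>
#include <errno.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/event.h>
#include <sys/time.h>
#include <signal.h>
#include <unistd.h>

/*
 * Send SIGUSR1 to this process twice while it is ignored, then wait
 * for both the signal kevent and a one-shot timer.  Returns the
 * signal count reported in data, or -1 on failure.
 */
int
count_signals_until_timer(void)
{
	struct kevent chg[2], ev;
	struct timespec limit = { 2, 0 };	/* arbitrary upper bound */
	int kq, n, nsig = -1, timer_fired = 0;

	/* Ignored signals are still recorded by the filter. */
	if (signal(SIGUSR1, SIG_IGN) == SIG_ERR)
		return (-1);
	kq = kqueue();
	if (kq == -1)
		return (-1);

	EV_SET(&chg[0], SIGUSR1, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
	/* Timer ident 1, period 10 ms, fires once because of EV_ONESHOT. */
	EV_SET(&chg[1], 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 10, NULL);

	/* nevents is 0: register both changes and return immediately. */
	if (kevent(kq, chg, 2, NULL, 0, NULL) == -1) {
		close(kq);
		return (-1);
	}

	kill(getpid(), SIGUSR1);
	kill(getpid(), SIGUSR1);

	while (nsig < 0 || !timer_fired) {
		n = kevent(kq, NULL, 0, &ev, 1, &limit);
		if (n <= 0)
			break;			/* error or time limit */
		if (ev.filter == EVFILT_SIGNAL)
			nsig = (int)ev.data;	/* deliveries since last call */
		else if (ev.filter == EVFILT_TIMER)
			timer_fired = 1;
	}

	close(kq);
	return (nsig);
}
#else
/* Hypothetical stub: kqueue is not available on this platform. */
int
count_signals_until_timer(void)
{
	errno = ENOSYS;
	return (-1);
}
#endif
```

.Pp
Both deliveries are aggregated into one kevent, so a single returned event
carries the count in
.Va data
rather than two separate events being queued.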
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
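.Pp
The two error-reporting paths described above can be sketched as follows.
The function
.Fn delete_missing_timer
is a hypothetical helper, and timer identifier 42 is an arbitrary value that
was never added to the queue; the platform guard and
.Er ENOSYS
stub are assumptions added for portability of the sketch.

```c
#include <sys/types.h>
#include <errno.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Try to delete a timer that was never registered and return the
 * resulting error number.  With use_eventlist != 0 the error is read
 * from an EV_ERROR kevent; otherwise it is read from errno.
 */
int
delete_missing_timer(int use_eventlist)
{
	struct kevent chg, ev;
	struct timespec zero = { 0, 0 };
	int kq, n, err = 0;

	kq = kqueue();
	if (kq == -1)
		return (errno);

	EV_SET(&chg, 42, EVFILT_TIMER, EV_DELETE, 0, 0, NULL);
	n = kevent(kq, &chg, 1,
	    use_eventlist ? &ev : NULL, use_eventlist ? 1 : 0, &zero);

	if (n == -1)
		err = errno;		/* no room in eventlist */
	else if (n == 1 && (ev.flags & EV_ERROR))
		err = (int)ev.data;	/* error reported in the kevent */

	close(kq);
	return (err);
}
#else
/* Hypothetical stub: kqueue is not available on this platform. */
int
delete_missing_timer(int use_eventlist)
{
	(void)use_eventlist;
	return (ENOSYS);
}
#endif
```

.Pp
Supplying room in the
.Fa eventlist
lets a caller tell which change failed when several are submitted at once,
since each failing change is returned as its own kevent.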
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
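.Pp
Two of the
.Fn kevent
errors listed above can be provoked deliberately, as in the following sketch.
All function names here are hypothetical.
The second helper assumes that the process ID of a child that has already
been reaped is not reused before the registration attempt; the platform
guard and
.Er ENOSYS
stubs are assumptions added for portability of the sketch.

```c
#include <sys/types.h>
#include <errno.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/event.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <stdint.h>
#include <unistd.h>

/* Try to add one kevent; return 0 on success or the errno value. */
static int
try_add(int kq, uintptr_t ident, short filter, unsigned int fflags)
{
	struct kevent chg;

	EV_SET(&chg, ident, filter, EV_ADD, fflags, 0, NULL);
	/* nevents is 0, so a failed change is reported through errno. */
	return (kevent(kq, &chg, 1, NULL, 0, NULL) == -1 ? errno : 0);
}

/* Register EVFILT_READ on a descriptor that has been closed. */
int
errno_for_closed_descriptor(void)
{
	int fds[2], kq, err;

	/* Create the kqueue first so it cannot reuse the closed number. */
	kq = kqueue();
	if (kq == -1)
		return (errno);
	if (pipe(fds) == -1) {
		err = errno;
		close(kq);
		return (err);
	}
	close(fds[0]);
	close(fds[1]);
	err = try_add(kq, (uintptr_t)fds[0], EVFILT_READ, 0);
	close(kq);
	return (err);
}

/* Register EVFILT_PROC on a child that has exited and been reaped. */
int
errno_for_reaped_child(void)
{
	pid_t pid;
	int kq, err;

	pid = fork();
	if (pid == -1)
		return (errno);
	if (pid == 0)
		_exit(0);
	if (waitpid(pid, NULL, 0) == -1)
		return (errno);
	kq = kqueue();
	if (kq == -1)
		return (errno);
	err = try_add(kq, (uintptr_t)pid, EVFILT_PROC, NOTE_EXIT);
	close(kq);
	return (err);
}
#else
/* Hypothetical stubs: kqueue is not available on this platform. */
int
errno_for_closed_descriptor(void)
{
	return (ENOSYS);
}

int
errno_for_reaped_child(void)
{
	return (ENOSYS);
}
#endif
```

.Pp
On a system with kqueue the first helper is expected to yield
.Er EBADF
and the second
.Er ESRCH .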
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon .
.Sh BUGS
It is currently not possible to watch a
.Xr vnode 9
that resides on anything but
a UFS file system.
515