.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 18, 2017
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; a kqueue may hold
only one kevent for each such pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
    uintptr_t ident;    /* identifier for this event */
    short     filter;   /* filter for event */
    u_short   flags;    /* action flags for kqueue */
    u_int     fflags;   /* filter flag value */
    intptr_t  data;     /* filter data value */
    void      *udata;   /* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The predefined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_EMPTY
Takes a descriptor as the identifier, and returns whenever
there is no remaining data in the write buffer.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_notify_kevent_flags
containing the kevent flags which should be
.Dv EV_ONESHOT ,
.Dv EV_CLEAR
or
.Dv EV_DISPATCH ,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the
.Fn aio_*
system call is made, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the
.Fn aio_*
system call.
The filter returns under the same conditions as
.Fn aio_error .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
There is a system-wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.Bl -tag -width "Dv NOTE_USECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
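.Pp
The timer behavior described above might be exercised as in the
following sketch.
It is illustrative only: the helper name
.Fn wait_for_ticks ,
its parameters, and the timer identifier 1 are hypothetical choices
made for this example and are not part of any library.
It arms a periodic timer with
.Dv NOTE_MSECONDS
and adds up the expiration counts reported in
.Va data
until the requested number has been seen.
.Bd -literal
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: arm a periodic timer of period_ms milliseconds
 * and block until at least want expirations have been reported.
 * Returns the number of expirations seen, or -1 on error.
 */
int
wait_for_ticks(int period_ms, int want)
{
    struct kevent kev;
    int kq, n, seen = 0;

    if ((kq = kqueue()) == -1)
        return (-1);

    /* ident 1 is an arbitrary timer identifier chosen for this sketch. */
    EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, NOTE_MSECONDS, period_ms, NULL);
    if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1) {
        close(kq);
        return (-1);
    }

    while (seen < want) {
        /* Block with no timeout until the timer fires. */
        n = kevent(kq, NULL, 0, &kev, 1, NULL);
        if (n == -1) {
            close(kq);
            return (-1);
        }
        /* data holds the expirations since the previous retrieval. */
        if (n > 0)
            seen += (int)kev.data;
    }
    close(kq);
    return (seen);
}
.Ed
.Pp
Because several expirations may be aggregated into one returned kevent,
the result can exceed the requested count.
For simplicity the sketch treats an interrupted call
.Pq Er EINTR
as a failure; a real program would probably retry.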
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user-level code.
The lower 24 bits of the
.Va fflags
may be used for user-defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User-defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e. the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e. if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error is not an errno failure, so use errx(). */
    if (argc != 2)
        errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
        err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
        0, NULL);
    /* Attach event to the kqueue. */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
        err(EXIT_FAILURE, "kevent register");
    if (event.flags & EV_ERROR)
        errx(EXIT_FAILURE, "Event error: %s", strerror(event.data));

    for (;;) {
        /* Sleep until something happens. */
        ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
        if (ret == -1) {
            err(EXIT_FAILURE, "kevent wait");
        } else if (ret > 0) {
            printf("Something was written in '%s'\en", argv[1]);
        }
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with the
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
Previous versions of
.In sys/event.h
fail to parse without including
.In sys/types.h
manually.