.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2017
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_notify_kevent_flags
containing the kevent flags which should be
.Dv EV_ONESHOT ,
.Dv EV_CLEAR
or
.Dv EV_DISPATCH ,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the
.Fn aio_*
system call is made, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the
.Fn aio_*
system call.
The filter returns under the same conditions as
.Fn aio_error .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, won't be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
There is a system-wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.Bl -tag -width "Dv NOTE_USECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e. the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e. if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
	struct kevent event;	/* Event we want to monitor */
	struct kevent tevent;	/* Event triggered */
	int kq, fd, ret;

	/* A usage error is not an errno failure, so use errx(), not err(). */
	if (argc != 2)
		errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

	/* Create kqueue. */
	kq = kqueue();
	if (kq == -1)
		err(EXIT_FAILURE, "kqueue() failed");

	/* Initialize kevent structure. */
	EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	    0, NULL);
	/* Attach event to the kqueue. */
	ret = kevent(kq, &event, 1, NULL, 0, NULL);
	if (ret == -1)
		err(EXIT_FAILURE, "kevent register");
	if (event.flags & EV_ERROR)
		errx(EXIT_FAILURE, "Event error: %s", strerror(event.data));

	for (;;) {
		/* Sleep until something happens. */
		ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
		if (ret == -1) {
			err(EXIT_FAILURE, "kevent wait");
		} else if (ret > 0) {
			printf("Something was written in '%s'\en", argv[1]);
		}
	}
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When the
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq Mt [email protected] .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
Previous versions of
.In sys/event.h
fail to parse without including
.In sys/types.h
manually.
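.Pp
Because the 24-hour limit on
.Fa timeout
is applied silently, a caller may prefer to cap the value itself and learn
when that happened.
The following is a minimal sketch under that assumption; the helper name
ms_to_kevent_timeout and the macro KQ_TIMEOUT_MAX_SEC are hypothetical and
are not defined by any system header.

```c
#include <stdint.h>
#include <time.h>

/* Assumed cap, taken from the 24-hour limit described in BUGS. */
#define KQ_TIMEOUT_MAX_SEC	(24 * 60 * 60)

/*
 * Hypothetical helper: convert a millisecond interval into a timespec
 * suitable for the kevent() timeout argument.  Negative intervals are
 * treated as zero (a poll).  Intervals longer than 24 hours are cut down
 * to exactly 24 hours.
 *
 * Returns 1 if the interval was shortened, 0 otherwise.
 */
int
ms_to_kevent_timeout(int64_t ms, struct timespec *ts)
{
	if (ms < 0)
		ms = 0;

	ts->tv_sec = (time_t)(ms / 1000);
	ts->tv_nsec = (long)(ms % 1000) * 1000000L;

	if (ts->tv_sec > KQ_TIMEOUT_MAX_SEC ||
	    (ts->tv_sec == KQ_TIMEOUT_MAX_SEC && ts->tv_nsec > 0)) {
		ts->tv_sec = KQ_TIMEOUT_MAX_SEC;
		ts->tv_nsec = 0;
		return (1);
	}
	return (0);
}
```

When the helper returns 1 and the subsequent
kevent call returns 0, the caller can subtract 24 hours from the remaining
interval and wait again, rather than treating the early return as the full
timeout.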