.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd May 1, 2020
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
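.Pp
The aggregation rule above can be sketched as follows.
This is a minimal illustration under stated assumptions, not part of the
interface:
.Fn count_aggregated_events
is a hypothetical helper, and the
.Dv __FreeBSD__
guard with its
.Er ENOSYS
fallback is an assumption added only so that the sketch still builds on
systems without
.Fn kqueue .
The helper registers one read filter on a pipe, writes to the pipe twice,
and then polls with a zero-valued timeout.
Under the rule above, both writes should be reported through a single
kevent whose
.Va data
covers all pending bytes.

```c
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#ifdef __FreeBSD__
#include <sys/event.h>
#endif

/*
 * Hypothetical helper (not part of any library): write to a pipe twice
 * and report how many kevents a single poll returns.  On success the
 * byte count reported by the first event is stored in *bytes.
 * Returns the number of events, or -1 on failure.
 */
int
count_aggregated_events(long *bytes)
{
#ifdef __FreeBSD__
    struct kevent change, events[4];
    struct timespec nowait = { 0, 0 };  /* zero-valued: poll, do not block */
    int fds[2], kq, n = -1;

    if (pipe(fds) == -1)
        return (-1);
    kq = kqueue();
    if (kq == -1)
        goto out;

    /* One kevent for the (read end, EVFILT_READ) pair. */
    EV_SET(&change, fds[0], EVFILT_READ, EV_ADD, 0, 0, NULL);
    if (kevent(kq, &change, 1, NULL, 0, NULL) == -1)
        goto out;

    /* Two separate triggers of the same filter. */
    if (write(fds[1], "abc", 3) != 3 || write(fds[1], "de", 2) != 2)
        goto out;

    /* Room for four events is offered; the filter decides how many exist. */
    n = kevent(kq, NULL, 0, events, 4, &nowait);
    if (n > 0)
        *bytes = (long)events[0].data;
out:
    if (kq != -1)
        close(kq);
    close(fds[0]);
    close(fds[1]);
    return (n);
#else
    /* Assumption: no kqueue on this platform, so report ENOSYS. */
    (void)bytes;
    errno = ENOSYS;
    return (-1);
#endif
}
```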
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
        uintptr_t  ident;       /* identifier for this event */
        short      filter;      /* filter for event */
        u_short    flags;       /* action flags for kqueue */
        u_int      fflags;      /* filter flag value */
        int64_t    data;        /* filter data value */
        void      *udata;       /* opaque user data identifier */
        uint64_t   ext[4];      /* extensions */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.It Fa ext
Extended data passed to and from the kernel.
The use of the
.Fa ext[0]
and
.Fa ext[1]
members is defined by the filter.
If the filter does not use them, the members are copied unchanged.
The
.Fa ext[2]
and
.Fa ext[3]
members are always passed through the kernel as-is,
making additional context available to the application.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
Note that if this flag is encountered and there is no remaining space in
.Fa eventlist
to hold the
.Dv EV_ERROR
event, then subsequent changes will not get processed.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This will be cleared by the filter when a new writer connects,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for the fifo case, this will be cleared
when a new reader connects.
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_EMPTY
Takes a descriptor as the identifier, and returns whenever
there is no remaining data in the write buffer.
.It Dv EVFILT_AIO
Events for this filter are not registered with
.Fn kevent
directly but are registered via the
.Va aio_sigevent
member of an asynchronous I/O request when it is scheduled via an
asynchronous I/O system call such as
.Fn aio_read .
The filter returns under the same conditions as
.Fn aio_error .
For more details on this filter see
.Xr sigevent 3
and
.Xr aio 4 .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
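.Pp
Waiting for
.Dv NOTE_EXIT
on a child might look roughly like the sketch below.
It is an illustration under stated assumptions rather than a reference
implementation:
.Fn exit_code_via_kqueue
is a hypothetical helper, the pipe used as a gate is one possible way to
make sure the child is still running when the event is registered, and
.Va data
is assumed to hold a
.Xr wait 2
style status that can be decoded with
.Dv WIFEXITED
and
.Dv WEXITSTATUS .
The
.Dv __FreeBSD__
guard and its
.Er ENOSYS
fallback are likewise assumptions, added only so that the sketch builds on
systems without
.Fn kqueue .

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#ifdef __FreeBSD__
#include <sys/event.h>
#endif

/*
 * Hypothetical helper: fork a child that exits with `code` and learn
 * about the exit through EVFILT_PROC instead of SIGCHLD.
 * Returns the decoded exit code, or -1 on failure.
 */
int
exit_code_via_kqueue(int code)
{
#ifdef __FreeBSD__
    struct kevent change, event;
    int gate[2], kq, registered, result = -1;
    pid_t pid;
    char c;

    if (pipe(gate) == -1)
        return (-1);
    kq = kqueue();
    pid = (kq == -1) ? -1 : fork();
    if (pid == -1) {
        if (kq != -1)
            close(kq);
        close(gate[0]);
        close(gate[1]);
        return (-1);
    }
    if (pid == 0) {
        /* Child: wait until the parent closes the gate, then exit. */
        close(gate[1]);
        (void)read(gate[0], &c, 1);
        _exit(code);
    }
    close(gate[0]);

    /* The identifier is the process ID; NOTE_EXIT is the event wanted. */
    EV_SET(&change, pid, EVFILT_PROC, EV_ADD | EV_ONESHOT, NOTE_EXIT,
        0, NULL);
    registered = kevent(kq, &change, 1, NULL, 0, NULL) != -1;
    close(gate[1]);     /* EOF on the gate lets the child run to _exit() */

    /* Block until the exit is reported, then decode the status. */
    if (registered && kevent(kq, NULL, 0, &event, 1, NULL) == 1 &&
        (event.fflags & NOTE_EXIT) != 0 && WIFEXITED((int)event.data))
        result = WEXITSTATUS((int)event.data);

    (void)waitpid(pid, NULL, 0);    /* the child must still be reaped */
    close(kq);
    return (result);
#else
    /* Assumption: no kqueue on this platform, so report ENOSYS. */
    (void)code;
    errno = ENOSYS;
    return (-1);
#endif
}
```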
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the moment to fire the timer (for
.Dv NOTE_ABSTIME )
or the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
or
.Dv NOTE_ABSTIME
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
For non-monotonic timers, this filter automatically sets the
.Dv EV_CLEAR
flag internally.
.Pp
The filter accepts the following flags in the
.Va fflags
argument:
.Bl -tag -width "Dv NOTE_MSECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.It Dv NOTE_ABSTIME
The specified expiration time is absolute.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.Pp
If an existing timer is re-added, the existing timer will be
effectively canceled (throwing away any undelivered record of previous
timer expiration) and re-started using the new parameters contained in
.Va data
and
.Va fflags .
.Pp
There is a system wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e., the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e., if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error is not an errno condition, so use errx(). */
    if (argc != 2)
        errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
        err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
        0, NULL);
    /* Attach event to the kqueue. */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
        err(EXIT_FAILURE, "kevent register");
    if (event.flags & EV_ERROR)
        errx(EXIT_FAILURE, "Event error: %s", strerror(event.data));

    for (;;) {
        /* Sleep until something happens. */
        ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
        if (ret == -1) {
            err(EXIT_FAILURE, "kevent wait");
        } else if (ret > 0) {
            printf("Something was written in '%s'\en", argv[1]);
        }
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When the
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Rs
.%A Jonathan Lemon
.%T "Kqueue: A Generic and Scalable Event Notification Facility"
.%I USENIX Association
.%B Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference
.%D June 25-30, 2001
.\".http://www.usenix.org/event/usenix01/freenix01/full_papers/lemon/lemon.pdf
.Re
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq Mt [email protected] .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
In versions older than
.Fx 12.0 ,
.In sys/event.h
failed to parse without including
.In sys/types.h
manually.
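.Pp
Given the 24 hour limit on
.Fa timeout
noted above, a caller that computes timeouts from a millisecond count may
want to learn when the requested value would be shortened.
The following sketch shows one way to build the
.Vt struct timespec
and report that case.
.Fn timeout_from_ms
and
.Dv KQ_TIMEOUT_MAX_SEC
are hypothetical names introduced for illustration, and the limit is taken
from the text above rather than from any header.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical constant: the 24 hour limit described in BUGS, in seconds. */
#define KQ_TIMEOUT_MAX_SEC  (24 * 60 * 60)

/*
 * Hypothetical helper: fill *ts from a millisecond count for use as the
 * kevent() timeout argument.  Negative input is treated as zero, which
 * yields a zero-valued timespec, i.e. a poll.
 * Returns 1 if the value was clamped to 24 hours, 0 otherwise.
 */
int
timeout_from_ms(int64_t ms, struct timespec *ts)
{
    const int64_t max_ms = (int64_t)KQ_TIMEOUT_MAX_SEC * 1000;
    int clamped = 0;

    if (ms < 0)
        ms = 0;
    if (ms > max_ms) {
        ms = max_ms;
        clamped = 1;
    }
    ts->tv_sec = (time_t)(ms / 1000);
    ts->tv_nsec = (long)(ms % 1000) * 1000000L;
    return (clamped);
}
```

A caller that really needs a longer wait could call
.Fn kevent
again while the return value is 0 and the deadline has not been reached.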