.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd July 27, 2018
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t  ident;	/* identifier for this event */
	short	   filter;	/* filter for event */
	u_short	   flags;	/* action flags for kqueue */
	u_int	   fflags;	/* filter flag value */
	int64_t	   data;	/* filter data value */
	void	   *udata;	/* opaque user data identifier */
	uint64_t   ext[4];	/* extensions */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.It Fa ext
Extended data passed to and from kernel.
The use of the
.Fa ext[0]
and
.Fa ext[1]
members is defined by the filter.
If the filter does not use them, the members are copied unchanged.
The
.Fa ext[2]
and
.Fa ext[3]
members are always passed through the kernel as-is,
making additional context available to the application.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_EMPTY
Takes a descriptor as the identifier, and returns whenever
there is no remaining data in the write buffer.
.It Dv EVFILT_AIO
Events for this filter are not registered with
.Fn kevent
directly but are registered via the
.Va aio_sigevent
member of an asynchronous I/O request when it is scheduled via an
asynchronous I/O system call such as
.Fn aio_read .
The filter returns under the same conditions as
.Fn aio_error .
For more details on this filter see
.Xr sigevent 3
and
.Xr aio 4 .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed,
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the moment to fire the timer (for
.Dv NOTE_ABSTIME )
or the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
or
.Dv NOTE_ABSTIME
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
For non-monotonic timers, this filter automatically sets the
.Dv EV_CLEAR
flag internally.
.Pp
The filter accepts the following flags in the
.Va fflags
argument:
.Bl -tag -width "Dv NOTE_MSECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.It Dv NOTE_ABSTIME
The specified expiration time is absolute.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.Pp
If an existing timer is re-added, the existing timer will be
effectively canceled (throwing away any undelivered record of previous
timer expiration) and re-started using the new parameters contained in
.Va data
and
.Va fflags .
.Pp
There is a system wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e., the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e., if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and errno is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error is not an errno failure, so use errx(), not err(). */
    if (argc != 2)
        errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
        err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
        0, NULL);
    /* Attach event to the kqueue. */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
        err(EXIT_FAILURE, "kevent register");
    if (event.flags & EV_ERROR)
        errx(EXIT_FAILURE, "Event error: %s", strerror(event.data));

    for (;;) {
        /* Sleep until something happens. */
        ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
        if (ret == -1) {
            err(EXIT_FAILURE, "kevent wait");
        } else if (ret > 0) {
            printf("Something was written in '%s'\en", argv[1]);
        }
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
In versions older than
.Fx 12.0 ,
.In sys/event.h
failed to parse without including
.In sys/types.h
manually.
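.Pp
A caller that needs to wait longer than the 24-hour limit on
.Fa timeout
noted above can wait in slices of at most 24 hours each.
The following is a minimal sketch of that arithmetic, not a definitive
implementation.
The names
.Fn next_wait_slice
and
.Dv KQ_MAX_WAIT_SEC
are hypothetical and are not part of any system interface, and the helper
assumes a normalized, non-negative
.Vt struct timespec .

```c
#include <stdbool.h>
#include <time.h>

/* Longest interval to hand to kevent() in one call: 24 hours (see BUGS). */
#define KQ_MAX_WAIT_SEC (24 * 60 * 60)

/*
 * Hypothetical helper, not a system API.
 * Moves at most 24 hours from *remaining into *slice.
 * Assumes *remaining is normalized and non-negative.
 * Returns true while *slice is non-zero, i.e. there is still time to wait.
 */
bool
next_wait_slice(struct timespec *remaining, struct timespec *slice)
{
	if (remaining->tv_sec >= KQ_MAX_WAIT_SEC) {
		/* Take a full 24-hour slice and keep the rest for later. */
		slice->tv_sec = KQ_MAX_WAIT_SEC;
		slice->tv_nsec = 0;
		remaining->tv_sec -= KQ_MAX_WAIT_SEC;
	} else {
		/* Less than 24 hours left: hand out everything. */
		*slice = *remaining;
		remaining->tv_sec = 0;
		remaining->tv_nsec = 0;
	}
	return (slice->tv_sec != 0 || slice->tv_nsec != 0);
}
```

.Pp
A caller would loop while
.Fn next_wait_slice
returns true, passing each slice as the
.Fa timeout
argument of
.Fn kevent
and stopping as soon as
.Fn kevent
returns a non-zero value.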