.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd June 17, 2017
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t  ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	int64_t	  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
	uint64_t  ext[4];	/* extensions */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.It Fa ext
Extended data passed to and from the kernel.
The use of the
.Fa ext[0]
and
.Fa ext[1]
members is defined by the filter.
If the filter does not use them, the members are copied unchanged.
The
.Fa ext[2]
and
.Fa ext[3]
members are always passed through the kernel as-is,
making additional context available to the application.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added the
.Va data
field will be zero.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling is
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_EMPTY
Takes a descriptor as the identifier, and returns whenever
there is no remaining data in the write buffer.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_notify_kevent_flags
containing the kevent flags which should be
.Dv EV_ONESHOT ,
.Dv EV_CLEAR
or
.Dv EV_DISPATCH ,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the
.Fn aio_*
system call is made, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the
.Fn aio_*
system call.
The filter returns under the same conditions as
.Fn aio_error .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, won't be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the moment to fire the timer (for
.Dv NOTE_ABSTIME )
or the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
or
.Dv NOTE_ABSTIME
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
For non-monotonic timers, this filter automatically sets the
.Dv EV_CLEAR
flag internally.
.Pp
The filter accepts the following flags in the
.Va fflags
argument:
.Bl -tag -width "Dv NOTE_MSECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.It Dv NOTE_ABSTIME
The specified expiration time is absolute.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.Pp
There is a system-wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user level code.
The lower 24 bits of the
.Va fflags
may be used for user defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e. the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e. if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and errno is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/event.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error has no errno to report, so use errx(), not err(). */
    if (argc != 2)
        errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
        err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
        0, NULL);
    /* Attach event to the kqueue. */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
        err(EXIT_FAILURE, "kevent register");
    if (event.flags & EV_ERROR)
        errx(EXIT_FAILURE, "Event error: %s", strerror((int)event.data));

    for (;;) {
        /* Sleep until something happens. */
        ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
        if (ret == -1) {
            err(EXIT_FAILURE, "kevent wait");
        } else if (ret > 0) {
            printf("Something was written in '%s'\en", argv[1]);
        }
    }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When the
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq Mt [email protected] .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
.Pp
Previous versions of
.In sys/event.h
fail to parse without including
.In sys/types.h
manually.