.\" Copyright (c) 2003, David G. Lawrence
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice unmodified, this list of conditions, and the following
.\"    disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 30, 2020
.Dt SENDFILE 2
.Os
.Sh NAME
.Nm sendfile
.Nd send a file to a socket
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In sys/uio.h
.Ft int
.Fo sendfile
.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
.Fc
.Sh DESCRIPTION
The
.Fn sendfile
system call
sends a regular file or shared memory object specified by descriptor
.Fa fd
out a stream socket specified by descriptor
.Fa s .
.Pp
The
.Fa offset
argument specifies where to begin in the file.
Should
.Fa offset
fall beyond the end of file, the system will return
success and report 0 bytes sent as described below.
The
.Fa nbytes
argument specifies how many bytes of the file should be sent, with 0 having
the special meaning of sending until the end of file has been reached.
.Pp
An optional header and/or trailer can be sent before and after the file data
by specifying a pointer to a
.Vt "struct sf_hdtr" ,
which has the following structure:
.Pp
.Bd -literal -offset indent -compact
struct sf_hdtr {
	struct iovec *headers;	/* pointer to header iovecs */
	int hdr_cnt;		/* number of header iovecs */
	struct iovec *trailers;	/* pointer to trailer iovecs */
	int trl_cnt;		/* number of trailer iovecs */
};
.Ed
.Pp
The
.Fa headers
and
.Fa trailers
pointers, if
.Pf non- Dv NULL ,
point to arrays of
.Vt "struct iovec"
structures.
See the
.Fn writev
system call for information on the iovec structure.
The number of iovecs in these
arrays is specified by
.Fa hdr_cnt
and
.Fa trl_cnt .
.Pp
If
.Pf non- Dv NULL ,
the system will write the total number of bytes sent on the socket to the
variable pointed to by
.Fa sbytes .
.Pp
The least significant 16 bits of the
.Fa flags
argument are a bitmap of these values:
.Bl -tag -offset indent -width "SF_USER_READAHEAD"
.It Dv SF_NODISKIO
This flag causes
.Nm
to return
.Er EBUSY
instead of blocking when a busy page is encountered.
This rare situation can happen if some other process is currently working
with the same region of the file.
It is advised to retry the operation after a short period.
.Pp
Note that in older
.Fx
versions the
.Dv SF_NODISKIO
flag had a slightly different meaning.
The flag prevented
.Nm
from running I/O operations if an invalid (not cached) page was encountered,
thus avoiding blocking on I/O.
Starting with
.Fx 11 ,
.Nm
sending files off the
.Xr ffs 7
filesystem does not block on I/O
(see
.Sx IMPLEMENTATION NOTES ) ,
so the condition no longer applies.
However, it is safe if an application utilizes
.Dv SF_NODISKIO
and on
.Er EBUSY
performs the same action as it did in
older
.Fx
versions, e.g.,
.Xr aio_read 2 ,
.Xr read 2
or
.Nm
in a different context.
.It Dv SF_NOCACHE
The data sent to the socket will not be cached by the virtual memory system,
and will be freed directly to the pool of free pages.
.It Dv SF_SYNC
.Nm
sleeps until the network stack no longer references the VM pages
of the file, making subsequent modifications to it safe.
Please note that this is not a guarantee that the data has actually
been sent.
.It Dv SF_USER_READAHEAD
.Nm
has some internal heuristics to do readahead when sending data.
This flag forces
.Nm
to override any heuristically calculated readahead and use exactly the
application-specified readahead.
See
.Sx SETTING READAHEAD
for more details on readahead.
.El
.Pp
When using a socket marked for non-blocking I/O,
.Fn sendfile
may send fewer bytes than requested.
In this case, the number of bytes successfully
written is returned in
.Fa *sbytes
(if specified),
and the error
.Er EAGAIN
is returned.
.Sh SETTING READAHEAD
.Nm
uses internal heuristics based on request size and file system layout
to do readahead.
Additionally, an application may request extra readahead.
The most significant 16 bits of
.Fa flags
specify the number of pages that
.Nm
may read ahead when reading the file.
A macro
.Fn SF_FLAGS
is provided to combine the readahead amount and the flags.
An example specifying a readahead of 16 pages and the
.Dv SF_NOCACHE
flag:
.Pp
.Bd -literal -offset indent -compact
	SF_FLAGS(16, SF_NOCACHE)
.Ed
.Pp
.Nm
will use either the application-specified readahead or the internally
calculated one, whichever is bigger.
Setting the flag
.Dv SF_USER_READAHEAD
would turn off any heuristics and set the maximum possible readahead length to
the number of pages specified via flags.
.Sh IMPLEMENTATION NOTES
The
.Fx
implementation of
.Fn sendfile
does not block on disk I/O when it sends a file off the
.Xr ffs 7
filesystem.
The syscall returns success before the actual I/O completes, and data
is put into the socket later unattended.
However, the order of data in the socket is preserved, so it is safe
to do further writes to the socket.
.Pp
The
.Fx
implementation of
.Fn sendfile
is
.Dq zero-copy ,
meaning that it has been optimized so that copying of the file data is avoided.
.Sh TUNING
.Ss physical paging buffers
.Fn sendfile
uses the vnode pager to read file pages into memory.
The pager uses a pool of physical buffers to run its I/O operations.
When the system runs out of pbufs,
.Fn sendfile
will block and report state
.Dq Li zonelimit .
The size of the pool can be tuned with the
.Va vm.vnode_pbufs
.Xr loader.conf 5
tunable and can be checked with the
.Xr sysctl 8
OID of the same name at runtime.
.Ss sendfile(2) buffers
On some architectures, this system call internally uses a special
.Fn sendfile
buffer
.Pq Vt "struct sf_buf"
to handle sending file data to the client.
If the sending socket is
blocking, and there are not enough
.Fn sendfile
buffers available,
.Fn sendfile
will block and report a state of
.Dq Li sfbufa .
If the sending socket is non-blocking and there are not enough
.Fn sendfile
buffers available, the call will block and wait for the
necessary buffers to become available before finishing the call.
.Pp
The number of
.Vt sf_buf Ns 's
allocated should be proportional to the number of nmbclusters used to
send data to a client via
.Fn sendfile .
Tune accordingly to avoid blocking!
Busy installations that make extensive use of
.Fn sendfile
may want to increase these values to be in line with their
.Va kern.ipc.nmbclusters
(see
.Xr tuning 7
for details).
.Pp
The number of
.Fn sendfile
buffers available is determined at boot time by either the
.Va kern.ipc.nsfbufs
.Xr loader.conf 5
variable or the
.Dv NSFBUFS
kernel configuration tunable.
The number of
.Fn sendfile
buffers scales with
.Va kern.maxusers .
The
.Va kern.ipc.nsfbufsused
and
.Va kern.ipc.nsfbufspeak
read-only
.Xr sysctl 8
variables show current and peak
.Fn sendfile
buffer usage respectively.
These values may also be viewed through
.Nm netstat Fl m .
.Pp
If the
.Xr sysctl 8
OID
.Va kern.ipc.nsfbufs
does not exist, your architecture does not need to use
.Fn sendfile
buffers because their task can be efficiently performed
by the generic virtual memory structures.
.Sh RETURN VALUES
.Rv -std sendfile
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EAGAIN
The socket is marked for non-blocking I/O and not all data was sent due to
the socket buffer being filled.
If specified, the number of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EBADF
The
.Fa fd
argument
is not a valid file descriptor.
.It Bq Er EBADF
The
.Fa s
argument
is not a valid socket descriptor.
.It Bq Er EBUSY
A busy page was encountered and
.Dv SF_NODISKIO
had been specified.
Partial data may have been sent.
.It Bq Er EFAULT
An invalid address was specified for an argument.
.It Bq Er EINTR
A signal interrupted
.Fn sendfile
before it could be completed.
If specified, the number
of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EINVAL
The
.Fa fd
argument
is not a regular file.
.It Bq Er EINVAL
The
.Fa s
argument
is not a SOCK_STREAM type socket.
.It Bq Er EINVAL
The
.Fa offset
argument
is negative.
.It Bq Er EIO
An error occurred while reading from
.Fa fd .
.It Bq Er EINTEGRITY
Corrupted data was detected while reading from
.Fa fd .
.It Bq Er ENOTCAPABLE
The
.Fa fd
or the
.Fa s
argument has insufficient rights.
.It Bq Er ENOBUFS
The system was unable to allocate an internal buffer.
.It Bq Er ENOTCONN
The
.Fa s
argument
points to an unconnected socket.
.It Bq Er ENOTSOCK
The
.Fa s
argument
is not a socket.
.It Bq Er EOPNOTSUPP
The file system for descriptor
.Fa fd
does not support
.Fn sendfile .
.It Bq Er EPIPE
The socket peer has closed the connection.
.El
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr open 2 ,
.Xr send 2 ,
.Xr socket 2 ,
.Xr writev 2 ,
.Xr loader.conf 5 ,
.Xr tuning 7 ,
.Xr sysctl 8
.Rs
.%A K. Elmeleegy
.%A A. Chanda
.%A A. L. Cox
.%A W. Zwaenepoel
.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
.%J The Proceedings of the 2005 USENIX Annual Technical Conference
.%P pp 223-236
.%D 2005
.Re
.Sh HISTORY
The
.Fn sendfile
system call
first appeared in
.Fx 3.0 .
This manual page first appeared in
.Fx 3.1 .
In
.Fx 10
support for sending shared memory descriptors was introduced.
In
.Fx 11
a non-blocking implementation was introduced.
.Sh AUTHORS
The initial implementation of the
.Fn sendfile
system call
and this manual page were written by
.An David G. Lawrence Aq Mt [email protected] .
The
.Fx 11
implementation was written by
.An Gleb Smirnoff Aq Mt [email protected] .
.Sh BUGS
The
.Fn sendfile
system call will not fail, i.e., return
.Dv -1
and set
.Va errno
to
.Er EFAULT ,
if provided an invalid address for
.Fa sbytes .
The
.Fn sendfile
system call does not support SCTP sockets;
it will return
.Dv -1
and set
.Va errno
to
.Er EINVAL .
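.Sh EXAMPLES
The layout of the
.Fa flags
argument described in
.Sx SETTING READAHEAD
can be illustrated with a small sketch: the readahead page count occupies the
most significant 16 bits and the flag bitmap the least significant 16 bits.
The names
.Fn ex_pack ,
.Fn ex_readahead ,
.Fn ex_flags
and
.Dv EX_FLAG
are hypothetical stand-ins used only for illustration, and the value of
.Dv EX_FLAG
is arbitrary; real programs should use the
.Fn SF_FLAGS
macro and the
.Dv SF_*
constants from the system headers.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical flag bit with an arbitrary value, for illustration only. */
#define EX_FLAG 0x0010u

/* Pack a readahead page count (upper 16 bits) with flag bits (lower 16). */
static unsigned int
ex_pack(unsigned int readahead, unsigned int flags)
{
	/* Each half of the word holds at most 16 bits. */
	assert(readahead <= 0xffffu && flags <= 0xffffu);
	return ((readahead << 16) | flags);
}

/* Extract the readahead page count from a packed word. */
static unsigned int
ex_readahead(unsigned int word)
{
	return (word >> 16);
}

/* Extract the flag bitmap from a packed word. */
static unsigned int
ex_flags(unsigned int word)
{
	return (word & 0xffffu);
}
```
.Ed
.Pp
Packing a readahead of 16 pages with one flag bit and then unpacking the
result returns the two original values unchanged.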