.\" Copyright (c) 2003, David G. Lawrence
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice unmodified, this list of conditions, and the following
.\"    disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd January 25, 2019
.Dt SENDFILE 2
.Os
.Sh NAME
.Nm sendfile
.Nd send a file to a socket
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In sys/uio.h
.Ft int
.Fo sendfile
.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
.Fc
.Sh DESCRIPTION
The
.Fn sendfile
system call
sends a regular file or shared memory object specified by descriptor
.Fa fd
out a stream socket specified by descriptor
.Fa s .
.Pp
The
.Fa offset
argument specifies where to begin in the file.
Should
.Fa offset
fall beyond the end of file, the system will return
success and report 0 bytes sent as described below.
The
.Fa nbytes
argument specifies how many bytes of the file should be sent, with 0 having the special
meaning of send until the end of file has been reached.
.Pp
An optional header and/or trailer can be sent before and after the file data by specifying
a pointer to a
.Vt "struct sf_hdtr" ,
which has the following structure:
.Pp
.Bd -literal -offset indent -compact
struct sf_hdtr {
	struct iovec *headers;	/* pointer to header iovecs */
	int hdr_cnt;		/* number of header iovecs */
	struct iovec *trailers;	/* pointer to trailer iovecs */
	int trl_cnt;		/* number of trailer iovecs */
};
.Ed
.Pp
The
.Fa headers
and
.Fa trailers
pointers, if
.Pf non- Dv NULL ,
point to arrays of
.Vt "struct iovec"
structures.
See the
.Fn writev
system call for information on the iovec structure.
The number of iovecs in these
arrays is specified by
.Fa hdr_cnt
and
.Fa trl_cnt .
.Pp
If
.Fa sbytes
is
.Pf non- Dv NULL ,
the system will write the total number of bytes sent on the socket to the
variable it points to.
.Pp
The least significant 16 bits of the
.Fa flags
argument are a bitmap of these values:
.Bl -tag -offset indent -width "SF_USER_READAHEAD"
.It Dv SF_NODISKIO
This flag causes
.Nm
to return
.Er EBUSY
instead of blocking when a busy page is encountered.
This rare situation can happen if some other process is currently working
with the same region of the file.
It is advised to retry the operation after a short period.
.Pp
Note that in older
.Fx
versions the
.Dv SF_NODISKIO
flag had a slightly different meaning.
The flag prevented
.Nm
from performing I/O operations when an invalid (not cached) page was encountered,
thus avoiding blocking on I/O.
Starting with
.Fx 11 ,
.Nm
sending files off the
.Xr ffs 7
filesystem does not block on I/O
(see
.Sx IMPLEMENTATION NOTES
), so the condition no longer applies.
However, it is safe for an application to use
.Dv SF_NODISKIO
and, on
.Er EBUSY ,
perform the same action as it did in
older
.Fx
versions, e.g.,
.Xr aio_read 2 ,
.Xr read 2
or
.Nm
in a different context.
.It Dv SF_NOCACHE
The data sent to the socket will not be cached by the virtual memory system,
and will be freed directly to the pool of free pages.
.It Dv SF_SYNC
.Nm
sleeps until the network stack no longer references the VM pages
of the file, making subsequent modifications to it safe.
Please note that this is not a guarantee that the data has actually
been sent.
.It Dv SF_USER_READAHEAD
.Nm
has some internal heuristics to do readahead when sending data.
This flag forces
.Nm
to override any heuristically calculated readahead and use exactly the
application-specified readahead.
See
.Sx SETTING READAHEAD
for more details on readahead.
.El
.Pp
When using a socket marked for non-blocking I/O,
.Fn sendfile
may send fewer bytes than requested.
In this case, the number of bytes successfully
written is returned in
.Fa *sbytes
(if specified),
and the error
.Er EAGAIN
is returned.
.Sh SETTING READAHEAD
.Nm
uses internal heuristics based on request size and file system layout
to do readahead.
Additionally, an application may request extra readahead.
The most significant 16 bits of
.Fa flags
specify the number of pages that
.Nm
may read ahead when reading the file.
A macro
.Fn SF_FLAGS
is provided to combine the readahead amount and flags.
An example specifying a readahead of 16 pages and the
.Dv SF_NOCACHE
flag:
.Pp
.Bd -literal -offset indent -compact
	SF_FLAGS(16, SF_NOCACHE)
.Ed
.Pp
.Nm
will use either the application-specified readahead or the internally
calculated one, whichever is larger.
Setting the
.Dv SF_USER_READAHEAD
flag turns off any heuristics and sets the maximum possible readahead length to
the number of pages specified via flags.
.Sh IMPLEMENTATION NOTES
The
.Fx
implementation of
.Fn sendfile
does not block on disk I/O when it sends a file off the
.Xr ffs 7
filesystem.
The syscall returns success before the actual I/O completes, and the data
is put into the socket later, unattended.
However, the order of data in the socket is preserved, so it is safe
to do further writes to the socket.
.Pp
The
.Fx
implementation of
.Fn sendfile
is "zero-copy", meaning that it has been optimized so that copying of the file data is avoided.
.Sh TUNING
On some architectures, this system call internally uses a special
.Fn sendfile
buffer
.Pq Vt "struct sf_buf"
to handle sending file data to the client.
If the sending socket is
blocking, and there are not enough
.Fn sendfile
buffers available,
.Fn sendfile
will block and report a state of
.Dq Li sfbufa .
If the sending socket is non-blocking and there are not enough
.Fn sendfile
buffers available, the call will block and wait for the
necessary buffers to become available before finishing the call.
.Pp
The number of
.Vt sf_buf Ns 's
allocated should be proportional to the number of nmbclusters used to
send data to a client via
.Fn sendfile .
Tune accordingly to avoid blocking!
Busy installations that make extensive use of
.Fn sendfile
may want to increase these values to be in line with their
.Va kern.ipc.nmbclusters
(see
.Xr tuning 7
for details).
.Pp
The number of
.Fn sendfile
buffers available is determined at boot time by either the
.Va kern.ipc.nsfbufs
.Xr loader.conf 5
variable or the
.Dv NSFBUFS
kernel configuration tunable.
The number of
.Fn sendfile
buffers scales with
.Va kern.maxusers .
The
.Va kern.ipc.nsfbufsused
and
.Va kern.ipc.nsfbufspeak
read-only
.Xr sysctl 8
variables show current and peak
.Fn sendfile
buffer usage, respectively.
These values may also be viewed through
.Nm netstat Fl m .
.Pp
If a value of zero is reported for
.Va kern.ipc.nsfbufs ,
your architecture does not need to use
.Fn sendfile
buffers because their task can be efficiently performed
by the generic virtual memory structures.
.Sh RETURN VALUES
.Rv -std sendfile
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EAGAIN
The socket is marked for non-blocking I/O and not all data was sent due to
the socket buffer being filled.
If specified, the number of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EBADF
The
.Fa fd
argument
is not a valid file descriptor.
.It Bq Er EBADF
The
.Fa s
argument
is not a valid socket descriptor.
.It Bq Er EBUSY
A busy page was encountered and
.Dv SF_NODISKIO
was specified.
Partial data may have been sent.
.It Bq Er EFAULT
An invalid address was specified for an argument.
.It Bq Er EINTR
A signal interrupted
.Fn sendfile
before it could be completed.
If specified, the number
of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EINVAL
The
.Fa fd
argument
is not a regular file.
.It Bq Er EINVAL
The
.Fa s
argument
is not a SOCK_STREAM type socket.
.It Bq Er EINVAL
The
.Fa offset
argument
is negative.
.It Bq Er EIO
An error occurred while reading from
.Fa fd .
.It Bq Er ENOTCAPABLE
The
.Fa fd
or the
.Fa s
argument has insufficient rights.
.It Bq Er ENOBUFS
The system was unable to allocate an internal buffer.
.It Bq Er ENOTCONN
The
.Fa s
argument
points to an unconnected socket.
.It Bq Er ENOTSOCK
The
.Fa s
argument
is not a socket.
.It Bq Er EOPNOTSUPP
The file system for descriptor
.Fa fd
does not support
.Fn sendfile .
.It Bq Er EPIPE
The socket peer has closed the connection.
.El
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr open 2 ,
.Xr send 2 ,
.Xr socket 2 ,
.Xr writev 2 ,
.Xr tuning 7
.Rs
.%A K. Elmeleegy
.%A A. Chanda
.%A A. L. Cox
.%A W. Zwaenepoel
.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
.%J The Proceedings of the 2005 USENIX Annual Technical Conference
.%P pp 223-236
.%D 2005
.Re
.Sh HISTORY
The
.Fn sendfile
system call
first appeared in
.Fx 3.0 .
This manual page first appeared in
.Fx 3.1 .
In
.Fx 10
support for sending shared memory descriptors was introduced.
In
.Fx 11
a non-blocking implementation was introduced.
.Sh AUTHORS
The initial implementation of the
.Fn sendfile
system call
and this manual page were written by
.An David G. Lawrence Aq Mt [email protected] .
The
.Fx 11
implementation was written by
.An Gleb Smirnoff Aq Mt [email protected] .
.Sh BUGS
The
.Fn sendfile
system call will not fail, i.e., return
.Dv -1
and set
.Va errno
to
.Er EFAULT ,
if provided an invalid address for
.Fa sbytes .
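.Sh EXAMPLES
The partial-send behaviour described for
.Er EAGAIN
and
.Er EINTR
means a caller has to advance
.Fa offset
and shrink
.Fa nbytes
by the value reported in
.Fa *sbytes
before retrying.
The following is a minimal sketch of that bookkeeping only, not a complete
sender.
It assumes that
.Fa hdtr
is
.Dv NULL ,
so that
.Fa *sbytes
counts file data only.
The names
.Vt "struct sf_progress"
and
.Fn sf_advance
are hypothetical helpers introduced for illustration and are not part of the
system interface.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for one file region; not part of any system API. */
struct sf_progress {
	off_t	offset;		/* next file offset to pass to sendfile() */
	size_t	remaining;	/* bytes still to send; 0 means "until EOF" */
};

/*
 * Fold the outcome of one sendfile() call into the progress record.
 * rc is the return value, err the errno observed when rc is -1, and
 * sbytes the value written through the sbytes argument.
 * Assumes no headers or trailers were passed, so sbytes is file data only.
 * Returns 1 to retry with the updated values, 0 when done, -1 on a hard error.
 */
int
sf_advance(struct sf_progress *p, int rc, int err, off_t sbytes)
{
	int bounded = (p->remaining != 0);

	assert(sbytes >= 0);
	p->offset += sbytes;
	if (bounded) {
		if ((size_t)sbytes >= p->remaining) {
			/* Stop here: a later nbytes of 0 would mean "until EOF". */
			p->remaining = 0;
			return (0);
		}
		p->remaining -= (size_t)sbytes;
	}
	if (rc == 0)
		return (0);
	if (err == EAGAIN || err == EINTR)
		return (1);
	return (-1);
}
```
.Ed
.Pp
A caller would typically pass the updated
.Va offset
and
.Va remaining
fields to the next
.Fn sendfile
call, and after
.Er EAGAIN
would likely wait for the socket to become writable, for example with
.Xr poll 2 ,
before retrying.
Handling of
.Er EBUSY
is left out of this sketch.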