.\" Copyright (c) 2003, David G. Lawrence
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice unmodified, this list of conditions, and the following
.\"    disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd January 25, 2019
.Dt SENDFILE 2
.Os
.Sh NAME
.Nm sendfile
.Nd send a file to a socket
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In sys/uio.h
.Ft int
.Fo sendfile
.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
.Fc
.Sh DESCRIPTION
The
.Fn sendfile
system call
sends a regular file or shared memory object specified by descriptor
.Fa fd
out a stream socket specified by descriptor
.Fa s .
.Pp
The
.Fa offset
argument specifies where to begin in the file.
Should
.Fa offset
fall beyond the end of file, the system will return
success and report 0 bytes sent as described below.
The
.Fa nbytes
argument specifies how many bytes of the file should be sent, with 0 having the special
meaning of send until the end of file has been reached.
.Pp
An optional header and/or trailer can be sent before and after the file data by specifying
a pointer to a
.Vt "struct sf_hdtr" ,
which has the following structure:
.Pp
.Bd -literal -offset indent -compact
struct sf_hdtr {
	struct iovec *headers;	/* pointer to header iovecs */
	int hdr_cnt;		/* number of header iovecs */
	struct iovec *trailers;	/* pointer to trailer iovecs */
	int trl_cnt;		/* number of trailer iovecs */
};
.Ed
.Pp
The
.Fa headers
and
.Fa trailers
pointers, if
.Pf non- Dv NULL ,
point to arrays of
.Vt "struct iovec"
structures.
See the
.Fn writev
system call for information on the iovec structure.
The number of iovecs in these
arrays is specified by
.Fa hdr_cnt
and
.Fa trl_cnt .
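.Pp
As a minimal sketch of how the structure might be filled in, the C fragment
below frames a file with one header and one trailer.
The helper names
.Fn iov_total
and
.Fn send_with_framing
are hypothetical and not part of any library, and the call itself is guarded
because the seven-argument form shown in the synopsis is specific to
.Fx .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Hypothetical helper: sum the lengths of an iovec array. */
size_t
iov_total(const struct iovec *iov, int cnt)
{
	size_t total = 0;

	for (int i = 0; i < cnt; i++)
		total += iov[i].iov_len;
	return (total);
}

#ifdef __FreeBSD__
/*
 * Hypothetical wrapper: send the whole file on fd to socket s,
 * preceded by hdr and followed by trl.  Returns what sendfile() returns.
 */
int
send_with_framing(int fd, int s, char *hdr, char *trl, off_t *sbytes)
{
	struct iovec h, t;
	struct sf_hdtr hdtr;

	h.iov_base = hdr;
	h.iov_len = strlen(hdr);
	t.iov_base = trl;
	t.iov_len = strlen(trl);

	hdtr.headers = &h;
	hdtr.hdr_cnt = 1;
	hdtr.trailers = &t;
	hdtr.trl_cnt = 1;

	/* offset 0 and nbytes 0: send from the start until end of file. */
	return (sendfile(fd, s, 0, 0, &hdtr, sbytes, 0));
}
#endif
```
.Ed
.Pp
Because
.Fa sbytes
reports the total number of bytes sent on the socket, a caller could compare it
with the header and trailer lengths from
.Fn iov_total
plus the file size to detect a short send.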
.Pp
If
.Fa sbytes
is
.Pf non- Dv NULL ,
the system will write the total number of bytes sent on the socket to the
variable it points to.
.Pp
The least significant 16 bits of the
.Fa flags
argument are a bitmap of these values:
.Bl -tag -offset indent -width "SF_USER_READAHEAD"
.It Dv SF_NODISKIO
This flag causes
.Nm
to return
.Er EBUSY
instead of blocking when a busy page is encountered.
This rare situation can happen if some other process is currently working
with the same region of the file.
It is advised to retry the operation after a short period.
.Pp
Note that in older
.Fx
versions
.Dv SF_NODISKIO
had a slightly different meaning.
The flag prevented
.Nm
from running I/O operations when an invalid (not cached) page was encountered,
thus avoiding blocking on I/O.
Starting with
.Fx 11 ,
.Nm
does not block on I/O when sending files off the
.Xr ffs 7
filesystem
(see
.Sx IMPLEMENTATION NOTES ) ,
so the condition no longer applies.
However, it is safe for an application to use
.Dv SF_NODISKIO
and, on
.Er EBUSY ,
perform the same action as it did in
older
.Fx
versions, e.g.,
.Xr aio_read 2 ,
.Xr read 2
or
.Nm
in a different context.
.It Dv SF_NOCACHE
The data sent to the socket will not be cached by the virtual memory system,
and will be freed directly to the pool of free pages.
.It Dv SF_SYNC
.Nm
sleeps until the network stack no longer references the VM pages
of the file, making subsequent modifications to it safe.
Please note that this is not a guarantee that the data has actually
been sent.
.It Dv SF_USER_READAHEAD
.Nm
has some internal heuristics to do readahead when sending data.
This flag forces
.Nm
to override any heuristically calculated readahead and use exactly the
application specified readahead.
See
.Sx SETTING READAHEAD
for more details on readahead.
.El
.Pp
When using a socket marked for non-blocking I/O,
.Fn sendfile
may send fewer bytes than requested.
In this case, the number of bytes successfully
written is returned in
.Fa *sbytes
(if specified),
and the error
.Er EAGAIN
is returned.
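.Pp
The partial-send behaviour described above can be sketched as a retry loop.
This is an illustration under stated assumptions rather than a reference
implementation: the names
.Fn after_sendfile
and
.Fn send_range
are hypothetical, the 10 millisecond pause after
.Er EBUSY
is an arbitrary choice, and the caller is assumed to pass a non-zero byte count.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>

enum next_step {
	STEP_DONE, STEP_WAIT_SOCKET, STEP_RETRY_NOW, STEP_RETRY_LATER, STEP_FAIL
};

/*
 * Hypothetical helper: account for the bytes one sendfile() call
 * reported in sent, then decide what the caller should do next.
 */
enum next_step
after_sendfile(int rc, int err, off_t sent, off_t *offset, size_t *remaining)
{
	/* Progress may be reported even when the call fails. */
	if (sent > 0) {
		*offset += sent;
		if ((size_t)sent >= *remaining)
			*remaining = 0;
		else
			*remaining -= (size_t)sent;
	}
	if (rc == 0)
		return (STEP_DONE);
	if (err == EAGAIN)
		return (STEP_WAIT_SOCKET);	/* socket buffer is full */
	if (err == EINTR)
		return (STEP_RETRY_NOW);	/* interrupted by a signal */
	if (err == EBUSY)
		return (STEP_RETRY_LATER);	/* busy page with SF_NODISKIO */
	return (STEP_FAIL);
}

#ifdef __FreeBSD__
/* Hypothetical driver: send nbytes of fd over non-blocking socket s. */
int
send_range(int fd, int s, off_t offset, size_t nbytes)
{
	struct pollfd pfd;

	pfd.fd = s;
	pfd.events = POLLOUT;
	while (nbytes > 0) {
		off_t sent = 0;
		int rc = sendfile(fd, s, offset, nbytes, NULL, &sent,
		    SF_NODISKIO);

		switch (after_sendfile(rc, errno, sent, &offset, &nbytes)) {
		case STEP_DONE:
			return (0);
		case STEP_WAIT_SOCKET:
			(void)poll(&pfd, 1, -1);	/* wait for buffer space */
			break;
		case STEP_RETRY_LATER:
			(void)poll(NULL, 0, 10);	/* assumed 10 ms pause */
			break;
		case STEP_RETRY_NOW:
			break;
		case STEP_FAIL:
			return (-1);
		}
	}
	return (0);
}
#endif
```
.Ed
.Pp
Keeping the bookkeeping in
.Fn after_sendfile
separate from the system call is only a design choice that makes the offset
arithmetic easy to check in isolation.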
.Sh SETTING READAHEAD
.Nm
uses internal heuristics based on request size and file system layout
to do readahead.
Additionally, an application may request extra readahead.
The most significant 16 bits of
.Fa flags
specify the number of pages that
.Nm
may read ahead when reading the file.
A macro
.Fn SF_FLAGS
is provided to combine the readahead amount and flags.
The following example specifies a readahead of 16 pages and the
.Dv SF_NOCACHE
flag:
.Pp
.Bd -literal -offset indent -compact
	SF_FLAGS(16, SF_NOCACHE)
.Ed
.Pp
.Nm
will use either the application specified readahead or the internally
calculated one, whichever is bigger.
Setting the
.Dv SF_USER_READAHEAD
flag turns off any heuristics and sets the maximum possible readahead length to
the number of pages specified via flags.
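.Pp
The split of
.Fa flags
into two 16-bit halves can be sketched in C as follows.
The helpers
.Fn make_sf_flags
and
.Fn sf_readahead_pages
are hypothetical, and the fallback definitions are assumptions that exist only
so the fragment builds where
.In sys/socket.h
lacks these names; the system header remains authoritative.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * ASSUMPTION: fallbacks for systems whose header lacks these names.
 * The shift follows the 16/16 bit split described in the text; the
 * flag value is believed to match FreeBSD but should be verified.
 */
#ifndef SF_FLAGS
#define SF_FLAGS(rh, flags)	(((rh) << 16) | (flags))
#endif
#ifndef SF_NOCACHE
#define SF_NOCACHE	0x00000010
#endif

/* Hypothetical helper: combine a readahead page count with flag bits. */
int
make_sf_flags(int readahead_pages, int flag_bits)
{
	/* Only 16 bits are available for each half of the argument. */
	unsigned int rh = (unsigned int)readahead_pages & 0xffffu;
	unsigned int fl = (unsigned int)flag_bits & 0xffffu;

	return ((int)SF_FLAGS(rh, fl));
}

/* Hypothetical helper: recover the readahead page count from flags. */
int
sf_readahead_pages(int flags)
{
	return ((int)(((unsigned int)flags >> 16) & 0xffffu));
}
```
.Ed
.Pp
With these helpers,
.Li make_sf_flags(16, SF_NOCACHE)
yields the same value as the
.Fn SF_FLAGS
expression in the example above.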
.Sh IMPLEMENTATION NOTES
The
.Fx
implementation of
.Fn sendfile
does not block on disk I/O when it sends a file off the
.Xr ffs 7
filesystem.
The syscall returns success before the actual I/O completes, and data
is put into the socket later unattended.
However, the order of data in the socket is preserved, so it is safe
to do further writes to the socket.
.Pp
The
.Fx
implementation of
.Fn sendfile
is "zero-copy", meaning that it has been optimized so that copying of the file data is avoided.
.Sh TUNING
On some architectures, this system call internally uses a special
.Fn sendfile
buffer
.Pq Vt "struct sf_buf"
to handle sending file data to the client.
If the sending socket is
blocking, and there are not enough
.Fn sendfile
buffers available,
.Fn sendfile
will block and report a state of
.Dq Li sfbufa .
If the sending socket is non-blocking and there are not enough
.Fn sendfile
buffers available, the call will block and wait for the
necessary buffers to become available before finishing the call.
.Pp
The number of
.Vt sf_buf Ns 's
allocated should be proportional to the number of nmbclusters used to
send data to a client via
.Fn sendfile .
Tune accordingly to avoid blocking!
Busy installations that make extensive use of
.Fn sendfile
may want to increase these values to be in line with their
.Va kern.ipc.nmbclusters
(see
.Xr tuning 7
for details).
.Pp
The number of
.Fn sendfile
buffers available is determined at boot time by either the
.Va kern.ipc.nsfbufs
.Xr loader.conf 5
variable or the
.Dv NSFBUFS
kernel configuration tunable.
The number of
.Fn sendfile
buffers scales with
.Va kern.maxusers .
The
.Va kern.ipc.nsfbufsused
and
.Va kern.ipc.nsfbufspeak
read-only
.Xr sysctl 8
variables show current and peak
.Fn sendfile
buffer usage, respectively.
These values may also be viewed through
.Nm netstat Fl m .
.Pp
If a value of zero is reported for
.Va kern.ipc.nsfbufs ,
your architecture does not need to use
.Fn sendfile
buffers because their task can be efficiently performed
by the generic virtual memory structures.
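.Pp
A program could also inspect these variables itself.
The sketch below is only an illustration: the helper names
.Fn sfbuf_percent_used ,
.Fn read_int_sysctl
and
.Fn current_sfbuf_percent
are hypothetical, and it assumes that the variables are integer-sized and
readable with
.Fn sysctlbyname ,
which should be verified on the target system.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/sysctl.h>
#endif
#include <stddef.h>

/*
 * Hypothetical helper: percentage of sendfile buffers in use.
 * Returns -1 when total is 0, meaning no sendfile buffers are needed.
 */
int
sfbuf_percent_used(int total, int used)
{
	if (total <= 0)
		return (-1);
	return ((int)(((long long)used * 100) / total));
}

#ifdef __FreeBSD__
/* Hypothetical helper: read one int sysctl (ASSUMPTION: int-sized). */
int
read_int_sysctl(const char *name, int *value)
{
	size_t len = sizeof(*value);

	return (sysctlbyname(name, value, &len, NULL, 0));
}

/* Hypothetical helper: current usage in percent, or -1 if unavailable. */
int
current_sfbuf_percent(void)
{
	int total, used;

	if (read_int_sysctl("kern.ipc.nsfbufs", &total) != 0 ||
	    read_int_sysctl("kern.ipc.nsfbufsused", &used) != 0)
		return (-1);
	return (sfbuf_percent_used(total, used));
}
#endif
```
.Ed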
.Sh RETURN VALUES
.Rv -std sendfile
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EAGAIN
The socket is marked for non-blocking I/O and not all data was sent due to
the socket buffer being filled.
If specified, the number of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EBADF
The
.Fa fd
argument
is not a valid file descriptor.
.It Bq Er EBADF
The
.Fa s
argument
is not a valid socket descriptor.
.It Bq Er EBUSY
A busy page was encountered and
.Dv SF_NODISKIO
had been specified.
Partial data may have been sent.
.It Bq Er EFAULT
An invalid address was specified for an argument.
.It Bq Er EINTR
A signal interrupted
.Fn sendfile
before it could be completed.
If specified, the number
of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EINVAL
The
.Fa fd
argument
is not a regular file.
.It Bq Er EINVAL
The
.Fa s
argument
is not a SOCK_STREAM type socket.
.It Bq Er EINVAL
The
.Fa offset
argument
is negative.
.It Bq Er EIO
An error occurred while reading from
.Fa fd .
.It Bq Er ENOTCAPABLE
The
.Fa fd
or the
.Fa s
argument has insufficient rights.
.It Bq Er ENOBUFS
The system was unable to allocate an internal buffer.
.It Bq Er ENOTCONN
The
.Fa s
argument
points to an unconnected socket.
.It Bq Er ENOTSOCK
The
.Fa s
argument
is not a socket.
.It Bq Er EOPNOTSUPP
The file system for descriptor
.Fa fd
does not support
.Fn sendfile .
.It Bq Er EPIPE
The socket peer has closed the connection.
.El
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr open 2 ,
.Xr send 2 ,
.Xr socket 2 ,
.Xr writev 2 ,
.Xr tuning 7
.Rs
.%A K. Elmeleegy
.%A A. Chanda
.%A A. L. Cox
.%A W. Zwaenepoel
.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
.%J The Proceedings of the 2005 USENIX Annual Technical Conference
.%P pp 223-236
.%D 2005
.Re
.Sh HISTORY
The
.Fn sendfile
system call
first appeared in
.Fx 3.0 .
This manual page first appeared in
.Fx 3.1 .
In
.Fx 10
support for sending shared memory descriptors was introduced.
In
.Fx 11
a non-blocking implementation was introduced.
.Sh AUTHORS
The initial implementation of the
.Fn sendfile
system call
and this manual page were written by
.An David G. Lawrence Aq Mt [email protected] .
The
.Fx 11
implementation was written by
.An Gleb Smirnoff Aq Mt [email protected] .
.Sh BUGS
The
.Fn sendfile
system call will not fail, i.e., return
.Dv -1
and set
.Va errno
to
.Er EFAULT ,
if provided an invalid address for
.Fa sbytes .
416.Fa sbytes .
417