|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
ba082202 |
| 07-Mar-2025 |
K Prateek Nayak <[email protected]> |
fs/pipe: Use pipe_buf() helper to retrieve pipe buffer
Use pipe_buf() helper to retrieve the pipe buffer throughout the file replacing the open-coded the logic.
Suggested-by: Oleg Nesterov <oleg@re
fs/pipe: Use pipe_buf() helper to retrieve pipe buffer
Use pipe_buf() helper to retrieve the pipe buffer throughout the file replacing the open-coded the logic.
Suggested-by: Oleg Nesterov <[email protected]> Signed-off-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
cf3d0c54 |
| 07-Mar-2025 |
K Prateek Nayak <[email protected]> |
fs/pipe: Limit the slots in pipe_resize_ring()
Limit the number of slots in pipe_resize_ring() to the maximum value representable by pipe->{head,tail}. Values beyond the max limit can lead to incorr
fs/pipe: Limit the slots in pipe_resize_ring()
Limit the number of slots in pipe_resize_ring() to the maximum value representable by pipe->{head,tail}. Values beyond the max limit can lead to incorrect pipe occupancy related calculations where the pipe will never appear full.
Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Oleg Nesterov <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
00a7d398 |
| 07-Mar-2025 |
Linus Torvalds <[email protected]> |
fs/pipe: add simpler helpers for common cases
The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the i
fs/pipe: add simpler helpers for common cases
The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the involved types.
It turns out that we don't have _that_ many places that access these fields directly and were affected, but we have more than we strictly should have, because our low-level helper functions have been designed to have intimate knowledge of how the pipes work.
And as a result, that random noise of direct 'pipe->head' and 'pipe->tail' accesses makes it harder to pinpoint any actual potential problem spots remaining.
For example, we didn't have a "is the pipe full" helper function, but instead had a "given these pipe buffer indexes and this pipe size, is the pipe full". That's because some low-level pipe code does actually want that much more complicated interface.
But most other places literally just want a "is the pipe full" helper, and not having it meant that those places ended up being unnecessarily much too aware of this all.
It would have been much better if only the very core pipe code that cared had been the one aware of this all.
So let's fix it - better late than never. This just introduces the trivial wrappers for "is this pipe full or empty" and to get how many pipe buffers are used, so that instead of writing
if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
the places that literally just want to know if a pipe is full can just say
if (pipe_is_full(pipe))
instead. The existing trivial cases were converted with a 'sed' script.
This cuts down on the places that access pipe->head and pipe->tail directly outside of the pipe code (and core splice code) quite a lot.
The splice code in particular still revels in doing the direct low-level accesses, and the fuse fuse_dev_splice_write() code also seems a bit unnecessarily eager to go very low-level, but it's at least a bit better than it used to be.
Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
d810d4c2 |
| 06-Mar-2025 |
Linus Torvalds <[email protected]> |
fs/pipe: do not open-code pipe head/tail logic in FIONREAD
Rasmus points out that we do indeed have other cases of breakage from the type changes that were introduced on 32-bit targets in order to r
fs/pipe: do not open-code pipe head/tail logic in FIONREAD
Rasmus points out that we do indeed have other cases of breakage from the type changes that were introduced on 32-bit targets in order to read the pipe head and tail values atomically (commit 3d252160b818: "fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex").
Fix it up by using the proper helper functions that now deal with the pipe buffer index types properly. This makes the code simpler and more obvious.
The compiler does the CSE and loop hoisting of the pipe ring size masking that we used to do manually, so open-coding this was never a good idea.
Reported-by: Rasmus Villemoes <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg Nesterov <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Swapnil Sapkal <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
3d252160 |
| 04-Mar-2025 |
Linus Torvalds <[email protected]> |
fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex
pipe_readable(), pipe_writable(), and pipe_poll() can read "pipe->head" and "pipe->tail" outside of "pipe->mutex" critical section. Whe
fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex
pipe_readable(), pipe_writable(), and pipe_poll() can read "pipe->head" and "pipe->tail" outside of "pipe->mutex" critical section. When the head and the tail are read individually in that order, there is a window for interruption between the two reads in which both the head and the tail can be updated by concurrent readers and writers.
One of the problematic scenarios observed with hackbench running multiple groups on a large server on a particular pipe inode is as follows:
pipe->head = 36 pipe->tail = 36
hackbench-118762 [057] ..... 1029.550548: pipe_write: *wakes up: pipe not full* hackbench-118762 [057] ..... 1029.550548: pipe_write: head: 36 -> 37 [tail: 36] hackbench-118762 [057] ..... 1029.550548: pipe_write: *wake up next reader 118740* hackbench-118762 [057] ..... 1029.550548: pipe_write: *wake up next writer 118768*
hackbench-118768 [206] ..... 1029.55055X: pipe_write: *writer wakes up* hackbench-118768 [206] ..... 1029.55055X: pipe_write: head = READ_ONCE(pipe->head) [37] ... CPU 206 interrupted (exact wakeup was not traced but 118768 did read head at 37 in traces)
hackbench-118740 [057] ..... 1029.550558: pipe_read: *reader wakes up: pipe is not empty* hackbench-118740 [057] ..... 1029.550558: pipe_read: tail: 36 -> 37 [head = 37] hackbench-118740 [057] ..... 1029.550559: pipe_read: *pipe is empty; wakeup writer 118768* hackbench-118740 [057] ..... 1029.550559: pipe_read: *sleeps*
hackbench-118766 [185] ..... 1029.550592: pipe_write: *New writer comes in* hackbench-118766 [185] ..... 1029.550592: pipe_write: head: 37 -> 38 [tail: 37] hackbench-118766 [185] ..... 1029.550592: pipe_write: *wakes up reader 118766*
hackbench-118740 [185] ..... 1029.550598: pipe_read: *reader wakes up; pipe not empty* hackbench-118740 [185] ..... 1029.550599: pipe_read: tail: 37 -> 38 [head: 38] hackbench-118740 [185] ..... 1029.550599: pipe_read: *pipe is empty* hackbench-118740 [185] ..... 1029.550599: pipe_read: *reader sleeps; wakeup writer 118768*
... CPU 206 switches back to writer hackbench-118768 [206] ..... 1029.550601: pipe_write: tail = READ_ONCE(pipe->tail) [38] hackbench-118768 [206] ..... 1029.550601: pipe_write: pipe_full()? (u32)(37 - 38) >= 16? Yes hackbench-118768 [206] ..... 1029.550601: pipe_write: *writer goes back to sleep*
[ Tasks 118740 and 118768 can then indefinitely wait on each other. ]
The unsigned arithmetic in pipe_occupancy() wraps around when "pipe->tail > pipe->head" leading to pipe_full() returning true despite the pipe being empty.
The case of genuine wraparound of "pipe->head" is handled since pipe buffer has data allowing readers to make progress until the pipe->tail wraps too after which the reader will wakeup a sleeping writer, however, mistaking the pipe to be full when it is in fact empty can lead to readers and writers waiting on each other indefinitely.
This issue became more problematic and surfaced as a hang in hackbench after the optimization in commit aaec5a95d596 ("pipe_read: don't wake up the writer if the pipe is still full") significantly reduced the number of spurious wakeups of writers that had previously helped mask the issue.
To avoid missing any updates between the reads of "pipe->head" and "pipe->write", unionize the two with a single unsigned long "pipe->head_tail" member that can be loaded atomically.
Using "pipe->head_tail" to read the head and the tail ensures the lockless checks do not miss any updates to the head or the tail and since those two are only updated under "pipe->mutex", it ensures that the head is always ahead of, or equal to the tail resulting in correct calculations.
[ prateek: commit log, testing on x86 platforms. ]
Reported-and-debugged-by: Swapnil Sapkal <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Reported-by: Alexey Gladkov <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length") Tested-by: Swapnil Sapkal <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Tested-by: Alexey Gladkov <[email protected]> Signed-off-by: K Prateek Nayak <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
46af8e24 |
| 03-Mar-2025 |
Mateusz Guzik <[email protected]> |
pipe: cache 2 pages instead of 1
User data is kept in a circular buffer backed by pages allocated as needed. Only having space for one spare is still prone to having to resort to allocation / freein
pipe: cache 2 pages instead of 1
User data is kept in a circular buffer backed by pages allocated as needed. Only having space for one spare is still prone to having to resort to allocation / freeing.
In my testing this decreases page allocs by 60% during a kernel build.
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
a40cd584 |
| 03-Mar-2025 |
Mateusz Guzik <[email protected]> |
pipe: drop an always true check in anon_pipe_write()
The check operates on the stale value of 'head' and always loops back.
Just do it unconditionally. No functional changes.
Signed-off-by: Mateus
pipe: drop an always true check in anon_pipe_write()
The check operates on the stale value of 'head' and always loops back.
Just do it unconditionally. No functional changes.
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
| #
ee5eda8e |
| 10-Feb-2025 |
Oleg Nesterov <[email protected]> |
pipe: change pipe_write() to never add a zero-sized buffer
a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot") changed pipe_write() to increment pipe->head in advance. IIU
pipe: change pipe_write() to never add a zero-sized buffer
a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot") changed pipe_write() to increment pipe->head in advance. IIUC to avoid the race with the post_one_notification()-like code which can add another buffer under pipe->rd_wait.lock without pipe->mutex.
This is no longer necessary after c73be61cede5 ("pipe: Add general notification queue support"), pipe_write() checks pipe_has_watch_queue() and returns -EXDEV at the start. And can't help in any case, pipe_write() no longer takes this rd_wait.lock spinlock.
Change pipe_write() to call copy_page_from_iter() first and do nothing if it fails. This way pipe_write() can't add a zero-sized buffer and we can simplify pipe_read() which currently has to take care of this very unlikely case.
Also, with this patch we can probably kill eat_empty_buffer() and more "is this buffer empty" checks in fs/splice.c later.
Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: K Prateek Nayak <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc2 |
|
| #
2a42754b |
| 03-Feb-2025 |
Amir Goldstein <[email protected]> |
fsnotify: disable notification by default for all pseudo files
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events.
Disable notifications to all
fsnotify: disable notification by default for all pseudo files
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events.
Disable notifications to all files allocated with alloc_file_pseudo() and enable legacy inotify events for the specific cases of pipe and socket, which have known users of inotify events.
Pre-content events are also kept disabled for sockets and pipes.
Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <[email protected]> Closes: https://lore.kernel.org/linux-fsdevel/[email protected]/ Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/ Tested-by: Alex Williamson <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
f017b0a4 |
| 05-Feb-2025 |
Oleg Nesterov <[email protected]> |
pipe: don't update {a,c,m}time for anonymous pipes
These numbers are visible in fstat() but hopefully nobody uses this information and file_accessed/file_update_time are not that cheap. Stupid test-
pipe: don't update {a,c,m}time for anonymous pipes
These numbers are visible in fstat() but hopefully nobody uses this information and file_accessed/file_update_time are not that cheap. Stupid test-case:
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <assert.h> #include <sys/ioctl.h> #include <sys/time.h>
static char buf[17 * 4096]; static struct timeval TW, TR;
int wr(int fd, int size) { int c, r; struct timeval t0, t1;
gettimeofday(&t0, NULL); for (c = 0; (r = write(fd, buf, size)) > 0; c += r); gettimeofday(&t1, NULL); timeradd(&TW, &t1, &TW); timersub(&TW, &t0, &TW);
return c; }
int rd(int fd, int size) { int c, r; struct timeval t0, t1;
gettimeofday(&t0, NULL); for (c = 0; (r = read(fd, buf, size)) > 0; c += r); gettimeofday(&t1, NULL); timeradd(&TR, &t1, &TR); timersub(&TR, &t0, &TR);
return c; }
int main(int argc, const char *argv[]) { int fd[2], nb = 1, loop, size;
assert(argc == 3); loop = atoi(argv[1]); size = atoi(argv[2]);
assert(pipe(fd) == 0); assert(ioctl(fd[0], FIONBIO, &nb) == 0); assert(ioctl(fd[1], FIONBIO, &nb) == 0);
assert(size <= sizeof(buf)); while (loop--) assert(wr(fd[1], size) == rd(fd[0], size));
struct timeval tt; timeradd(&TW, &TR, &tt); printf("TW = %lu.%03lu TR = %lu.%03lu TT = %lu.%03lu\n", TW.tv_sec, TW.tv_usec/1000, TR.tv_sec, TR.tv_usec/1000, tt.tv_sec, tt.tv_usec/1000);
return 0; }
Before: # for i in 1 2 3; do /host/tmp/test 10000 100; done TW = 8.047 TR = 5.845 TT = 13.893 TW = 8.091 TR = 5.872 TT = 13.963 TW = 8.083 TR = 5.885 TT = 13.969 After: # for i in 1 2 3; do /host/tmp/test 10000 100; done TW = 4.752 TR = 4.664 TT = 9.416 TW = 4.684 TR = 4.608 TT = 9.293 TW = 4.736 TR = 4.652 TT = 9.388
Signed-off-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: K Prateek Nayak <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
262b2fa9 |
| 05-Feb-2025 |
Oleg Nesterov <[email protected]> |
pipe: introduce struct file_operations pipeanon_fops
So that fifos and anonymous pipes could have different f_op methods. Preparation to simplify the next patch.
Signed-off-by: Oleg Nesterov <oleg@
pipe: introduce struct file_operations pipeanon_fops
So that fifos and anonymous pipes could have different f_op methods. Preparation to simplify the next patch.
Signed-off-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: K Prateek Nayak <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1 |
|
| #
1751f872 |
| 28-Jan-2025 |
Joel Granados <[email protected]> |
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysc
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command: Spatch: virtual patch
@ depends on !(file in "net") disable optional_qualifier @
identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@
+ const struct ctl_table table_name [] = { ... };
sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c
Reviewed-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/ Reviewed-by: Martin K. Petersen <[email protected]> # SCSI Reviewed-by: Darrick J. Wong <[email protected]> # xfs Acked-by: Jani Nikula <[email protected]> Acked-by: Corey Minyard <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Ashutosh Dixit <[email protected]> Acked-by: Anna Schumaker <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6 |
|
| #
aaec5a95 |
| 02-Jan-2025 |
Oleg Nesterov <[email protected]> |
pipe_read: don't wake up the writer if the pipe is still full
wake_up(pipe->wr_wait) makes no sense if pipe_full() is still true after the reading, the writer sleeping in wait_event(wr_wait, pipe_wr
pipe_read: don't wake up the writer if the pipe is still full
wake_up(pipe->wr_wait) makes no sense if pipe_full() is still true after the reading, the writer sleeping in wait_event(wr_wait, pipe_writable()) will check the pipe_writable() == !pipe_full() condition and sleep again.
Only wake the writer if we actually released a pipe buf, and the pipe was full before we did so.
Signed-off-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Reported-by: WangYuli <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
cb787f4a |
| 27-Sep-2024 |
Al Viro <[email protected]> |
[tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical
[tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe.
Signed-off-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
5a957bba |
| 30-Aug-2024 |
Christian Brauner <[email protected]> |
pipe: use f_pipe
Pipes use f_version to defer poll notifications until a write has been observed. Since multiple file's refer to the same struct pipe_inode_info in their ->private_data moving it int
pipe: use f_pipe
Pipes use f_version to defer poll notifications until a write has been observed. Since multiple file's refer to the same struct pipe_inode_info in their ->private_data moving it into their isn't feasible since we would need to introduce an additional pointer indirection.
However, since pipes don't require f_pos_lock we placed a new f_pipe member into a union with f_pos_lock that pipes can use. This is similar to what we already do for struct inode where we have additional fields per file type. This will allow us to fully remove f_version in the next step.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
33d8525d |
| 04-Sep-2024 |
Kienan Stewart <[email protected]> |
fs/pipe: Correct imprecise wording in comment
The comment inaccurately describes what pipefs is - that is, a file system.
Signed-off-by: Kienan Stewart <[email protected]> Link: https://lore.ke
fs/pipe: Correct imprecise wording in comment
The comment inaccurately describes what pipefs is - that is, a file system.
Signed-off-by: Kienan Stewart <[email protected]> Link: https://lore.kernel.org/r/20240904-pipe-correct_imprecise_wording-v1-1-2b07843472c2@efficios.com Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
| #
78eb4ea2 |
| 24-Jul-2024 |
Joel Granados <[email protected]> |
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ct
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified.
This patch has been generated by the following coccinelle script:
``` virtual patch
@r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@ identifier func, ctl, write, buffer, lenp, ppos; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... }
@r3@ identifier func; @@
int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *);
@r4@ identifier func, ctl; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *);
@r5@ identifier func, write, buffer, lenp, ppos; @@
int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration.
Co-developed-by: Thomas Weißschuh <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]> Co-developed-by: Joel Granados <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2 |
|
| #
85f273a6 |
| 27-Jan-2024 |
Kent Overstreet <[email protected]> |
fs/pipe: Convert to lockdep_cmp_fn
*_lock_nested() is fundamentally broken; lockdep needs to check lock ordering, but we cannot device a total ordering on an unbounded number of elements with only a
fs/pipe: Convert to lockdep_cmp_fn
*_lock_nested() is fundamentally broken; lockdep needs to check lock ordering, but we cannot device a total ordering on an unbounded number of elements with only a few subclasses.
the replacement is to define lock ordering with a proper comparison function.
fs/pipe.c was already doing everything correctly otherwise, nothing much changes here.
Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
9d5b9475 |
| 21-Nov-2023 |
Joel Granados <[email protected]> |
fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels
fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)
Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size.
Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Acked-by: Christian Brauner <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
show more ...
|
| #
e95aada4 |
| 01-Dec-2023 |
Lukas Schauer <[email protected]> |
pipe: wakeup wr_wait after setting max_usage
Commit c73be61cede5 ("pipe: Add general notification queue support") a regression was introduced that would lock up resized pipes under certain condition
pipe: wakeup wr_wait after setting max_usage
Commit c73be61cede5 ("pipe: Add general notification queue support") a regression was introduced that would lock up resized pipes under certain conditions. See the reproducer in [1].
The commit resizing the pipe ring size was moved to a different function, doing that moved the wakeup for pipe->wr_wait before actually raising pipe->max_usage. If a pipe was full before the resize occured it would result in the wakeup never actually triggering pipe_write.
Set @max_usage and @nr_accounted before waking writers if this isn't a watch queue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212295 [1] Link: https://lore.kernel.org/r/20231201-orchideen-modewelt-e009de4562c6@brauner Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reviewed-by: David Howells <[email protected]> Cc: <[email protected]> Signed-off-by: Lukas Schauer <[email protected]> [Christian Brauner <[email protected]>: rewrite to account for watch queues] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
055ca835 |
| 24-Nov-2023 |
Jann Horn <[email protected]> |
fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
When you try to splice between a normal pipe and a notification pipe, get_pipe_info(..., true) fails, so splice() falls back to treatin
fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
When you try to splice between a normal pipe and a notification pipe, get_pipe_info(..., true) fails, so splice() falls back to treating the notification pipe like a normal pipe - so we end up in iter_file_splice_write(), which first locks the input pipe, then calls vfs_iter_write(), which locks the output pipe.
Lockdep complains about that, because we're taking a pipe lock while already holding another pipe lock.
I think this probably (?) can't actually lead to deadlocks, since you'd need another way to nest locking a normal pipe into locking a watch_queue pipe, but the lockdep annotations don't make that clear.
Bail out earlier in pipe_write() for notification pipes, before taking the pipe lock.
Reported-and-tested-by: <[email protected]> Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Jann Horn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3 |
|
| #
478dbf12 |
| 21-Sep-2023 |
Max Kellermann <[email protected]> |
fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
If there is no watch_queue, holding the pipe mutex is enough to prevent concurrent writes, and we can avoid the spinlock.
O_NOTIF
fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
If there is no watch_queue, holding the pipe mutex is enough to prevent concurrent writes, and we can avoid the spinlock.
O_NOTIFICATION_QUEUE is an exotic and rarely used feature, and of all the pipes that exist at any given time, only very few actually have a watch_queue, therefore it appears worthwile to optimize the common case.
This patch does not optimize pipe_resize_ring() where the spinlocks could be avoided as well; that does not seem like a worthwile optimization because this function is not called often.
Related commits:
- commit 8df441294dd3 ("pipe: Check for ring full inside of the spinlock in pipe_write()") - commit b667b8673443 ("pipe: Advance tail pointer inside of wait spinlock in pipe_read()") - commit 189b0ddc2451 ("pipe: Fix missing lock in pipe_resize_ring()")
Signed-off-by: Max Kellermann <[email protected]> Message-Id: <[email protected]> Reviewed-by: David Howells <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
dfaabf91 |
| 21-Sep-2023 |
Max Kellermann <[email protected]> |
fs/pipe: remove unnecessary spinlock from pipe_write()
This reverts commit 8df441294dd3 ("pipe: Check for ring full inside of the spinlock in pipe_write()") which was obsoleted by commit c73be61cede
fs/pipe: remove unnecessary spinlock from pipe_write()
This reverts commit 8df441294dd3 ("pipe: Check for ring full inside of the spinlock in pipe_write()") which was obsoleted by commit c73be61cede ("pipe: Add general notification queue support") because now pipe_write() fails early with -EXDEV if there is a watch_queue.
Without a watch_queue, no notifications can be posted to the pipe and mutex protection is enough, as can be seen in splice_pipe_to_pipe() which does not use the spinlock either.
Signed-off-by: Max Kellermann <[email protected]> Message-Id: <[email protected]> Reviewed-by: David Howells <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
b4bd6b4b |
| 21-Sep-2023 |
Max Kellermann <[email protected]> |
fs/pipe: move check to pipe_has_watch_queue()
This declutters the code by reducing the number of #ifdefs and makes the watch_queue checks simpler. This has no runtime effect; the machine code is id
fs/pipe: move check to pipe_has_watch_queue()
This declutters the code by reducing the number of #ifdefs and makes the watch_queue checks simpler. This has no runtime effect; the machine code is identical.
Signed-off-by: Max Kellermann <[email protected]> Message-Id: <[email protected]> Reviewed-by: David Howells <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
68279f9c |
| 11-Oct-2023 |
Alexey Dobriyan <[email protected]> |
treewide: mark stuff as __ro_after_init
__read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1.
Also, mark some stuff as "
treewide: mark stuff as __ro_after_init
__read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1.
Also, mark some stuff as "const" and "__init" while I'm at it.
[[email protected]: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning] [[email protected]: coding-style cleanups] Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183 Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|