|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13 |
|
| #
344af277 |
| 13-Jan-2025 |
Christophe Leroy <[email protected]> |
select: Fix unbalanced user_access_end()
While working on implementing user access validation on powerpc I got the following warnings on a pmac32_defconfig build:
CC fs/select.o fs/select.
select: Fix unbalanced user_access_end()
While working on implementing user access validation on powerpc I got the following warnings on a pmac32_defconfig build:
CC fs/select.o fs/select.o: warning: objtool: sys_pselect6+0x1bc: redundant UACCESS disable fs/select.o: warning: objtool: sys_pselect6_time32+0x1bc: redundant UACCESS disable
On powerpc/32s, user_read_access_begin/end() are no-ops, but the failure path has a user_access_end() instead of user_read_access_end() which means an access end without any prior access begin.
Replace that user_access_end() by user_read_access_end().
Fixes: 7e71609f64ec ("pselect6() and friends: take handling the combined 6th/7th args into helper") Signed-off-by: Christophe Leroy <[email protected]> Link: https://lore.kernel.org/r/a7139e28d767a13e667ee3c79599a8047222ef36.1736751221.git.christophe.leroy@csgroup.eu Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
89359897 |
| 07-Jun-2024 |
Al Viro <[email protected]> |
do_pollfd(): convert to CLASS(fd)
lift setting ->revents into the caller, so that failure exits (including the early one) would be plain returns.
We need the scope of our struct fd to end before th
do_pollfd(): convert to CLASS(fd)
lift setting ->revents into the caller, so that failure exits (including the early one) would be plain returns.
We need the scope of our struct fd to end before the store to ->revents, since that's shared with the failure exits prior to the point where we can do fdget().
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
d000e073 |
| 23-Jul-2024 |
Al Viro <[email protected]> |
convert do_select()
take the logics from fdget() to fdput() into an inlined helper - with existing wait_key_set() subsumed into that.
Reviewed-by: Christian Brauner <[email protected]> Signed-off-
convert do_select()
take the logics from fdget() to fdput() into an inlined helper - with existing wait_key_set() subsumed into that.
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
193b7279 |
| 08-Aug-2024 |
Thorsten Blum <[email protected]> |
fs/select: Annotate struct poll_list with __counted_by()
Add the __counted_by compiler attribute to the flexible array member entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CO
fs/select: Annotate struct poll_list with __counted_by()
Add the __counted_by compiler attribute to the flexible array member entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.
Signed-off-by: Thorsten Blum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
ed4fb6d7 |
| 14-Aug-2024 |
Felix Moessbauer <[email protected]> |
hrtimer: Use and report correct timerslack values for realtime tasks
The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple tim
hrtimer: Use and report correct timerslack values for realtime tasks
The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple timers in a single interrupt. This is a performance optimization. Timers of realtime tasks (having a realtime scheduling policy) should not be delayed.
This logic was inconsitently applied to the hrtimers, leading to delays of realtime tasks which used timed waits for events (e.g. condition variables). Due to the downstream override of the slack for rt tasks, the procfs reported incorrect (non-zero) timerslack_ns values.
This is changed by setting the timer_slack_ns task attribute to 0 for all tasks with a rt policy. By that, downstream users do not need to specially handle rt tasks (w.r.t. the slack), and the procfs entry shows the correct value of "0". Setting non-zero slack values (either via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored, as stated in "man 2 PR_SET_TIMERSLACK":
Timer slack is not applied to threads that are scheduled under a real-time scheduling policy (see sched_setscheduler(2)).
The special handling of timerslack on rt tasks in downstream users is removed as well.
Signed-off-by: Felix Moessbauer <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
show more ...
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
2865baf5 |
| 09-Apr-2024 |
Linus Torvalds <[email protected]> |
x86: support user address masking instead of non-speculative conditional
The Spectre-v1 mitigations made "access_ok()" much more expensive, since it has to serialize execution with the test for a va
x86: support user address masking instead of non-speculative conditional
The Spectre-v1 mitigations made "access_ok()" much more expensive, since it has to serialize execution with the test for a valid user address.
All the normal user copy routines avoid this by just masking the user address with a data-dependent mask instead, but the fast "unsafe_user_read()" kind of patterms that were supposed to be a fast case got slowed down.
This introduces a notion of using
src = masked_user_access_begin(src);
to do the user address sanity using a data-dependent mask instead of the more traditional conditional
if (user_read_access_begin(src, len)) {
model.
This model only works for dense accesses that start at 'src' and on architectures that have a guard region that is guaranteed to fault in between the user space and the kernel space area.
With this, the user access doesn't need to be manually checked, because a bad address is guaranteed to fault (by some architecture masking trick: on x86-64 this involves just turning an invalid user address into all ones, since we don't map the top of address space).
This only converts a couple of examples for now. Example x86-64 code generation for loading two words from user space:
stac mov %rax,%rcx sar $0x3f,%rcx or %rax,%rcx mov (%rcx),%r13 mov 0x8(%rcx),%r14 clac
where all the error handling and -EFAULT is now purely handled out of line by the exception path.
Of course, if the micro-architecture does badly at 'clac' and 'stac', the above is still pitifully slow. But at least we did as well as we could.
Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
1da91ea8 |
| 31-May-2024 |
Al Viro <[email protected]> |
introduce fd_file(), convert all accessors to it.
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are ve
introduce fd_file(), convert all accessors to it.
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict]
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
ae04f69d |
| 10-Jun-2024 |
Qais Yousef <[email protected]> |
sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()
Some find the name realtime overloaded. Use rt_or_dl() as an alternative, hopefully better, name.
Suggested-by: Daniel Bristot de
sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()
Some find the name realtime overloaded. Use rt_or_dl() as an alternative, hopefully better, name.
Suggested-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
130fd056 |
| 10-Jun-2024 |
Qais Yousef <[email protected]> |
sched/rt: Clean up usage of rt_task()
rt_task() checks if a task has RT priority. But depends on your dictionary, this could mean it belongs to RT class, or is a 'realtime' task, which includes RT a
sched/rt: Clean up usage of rt_task()
rt_task() checks if a task has RT priority. But depends on your dictionary, this could mean it belongs to RT class, or is a 'realtime' task, which includes RT and DL classes.
Since this has caused some confusion already on discussion [1], it seemed a clean up is due.
I define the usage of rt_task() to be tasks that belong to RT class. Make sure that it returns true only for RT class and audit the users and replace the ones required the old behavior with the new realtime_task() which returns true for RT and DL classes. Introduce similar realtime_prio() to create similar distinction to rt_prio() and update the users that required the old behavior to use the new function.
Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.
Document the functions to make it more obvious what is the difference between them. PI-boosted tasks is a factor that must be taken into account when choosing which function to use.
Rename task_is_realtime() to realtime_task_policy() as the old name is confusing against the new realtime_task().
No functional changes were intended.
[1] https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Phil Auld <[email protected]> Reviewed-by: "Steven Rostedt (Google)" <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5 |
|
| #
ddb9fd7a |
| 16-Feb-2024 |
Arnd Bergmann <[email protected]> |
fs/select: rework stack allocation hack for clang
A while ago, we changed the way that select() and poll() preallocate a temporary buffer just under the size of the static warning limit of 1024 byte
fs/select: rework stack allocation hack for clang
A while ago, we changed the way that select() and poll() preallocate a temporary buffer just under the size of the static warning limit of 1024 bytes, as clang was frequently going slightly above that limit.
The warnings have recently returned and I took another look. As it turns out, clang is not actually inherently worse at reserving stack space, it just happens to inline do_select() into core_sys_select(), while gcc never inlines it.
Annotate do_select() to never be inlined and in turn remove the special case for the allocation size. This should give the same behavior for both clang and gcc all the time and once more avoids those warnings.
Fixes: ad312f95d41c ("fs/select: avoid clang stack usage warning") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc4, v6.8-rc3 |
|
| #
c67ef897 |
| 29-Jan-2024 |
Kees Cook <[email protected]> |
select: Avoid wrap-around instrumentation in do_sys_poll()
The mix of int, unsigned int, and unsigned long used by struct poll_list::len, todo, len, and j meant that the signed overflow sanitizer go
select: Avoid wrap-around instrumentation in do_sys_poll()
The mix of int, unsigned int, and unsigned long used by struct poll_list::len, todo, len, and j meant that the signed overflow sanitizer got worried it needed to instrument several places where arithmetic happens between these variables. Since all of the variables are always positive and bounded by unsigned int, use a single type in all places. Additionally expand the zero-test into an explicit range check before updating "todo".
This keeps sanitizer instrumentation[1] out of a UACCESS path:
vmlinux.o: warning: objtool: do_sys_poll+0x285: call to __ubsan_handle_sub_overflow() with UACCESS enabled
Link: https://github.com/KSPP/linux/issues/26 [1] Cc: Christian Brauner <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Jan Kara <[email protected]> Cc: <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1 |
|
| #
68514dac |
| 10-Jan-2022 |
Jan Kara <[email protected]> |
select: Fix indefinitely sleeping task in poll_schedule_timeout()
A task can end up indefinitely sleeping in do_select() -> poll_schedule_timeout() when the following race happens:
TASK1 (thread1
select: Fix indefinitely sleeping task in poll_schedule_timeout()
A task can end up indefinitely sleeping in do_select() -> poll_schedule_timeout() when the following race happens:
TASK1 (thread1) TASK2 TASK1 (thread2) do_select() setup poll_wqueues table with 'fd' write data to 'fd' pollwake() table->triggered = 1 closes 'fd' thread1 is waiting for poll_schedule_timeout() - sees table->triggered table->triggered = 0 return -EINTR loop back in do_select()
But at this point when TASK1 loops back, the fdget() in the setup of poll_wqueues fails. So now so we never find 'fd' is ready for reading and sleep in poll_schedule_timeout() indefinitely.
Treat an fd that got closed as a fd on which some event happened. This makes sure cannot block indefinitely in do_select().
Another option would be to return -EBADF in this case but that has a potential of subtly breaking applications that excercise this behavior and it happens to work for them. So returning fd as active seems like a safer choice.
Suggested-by: Linus Torvalds <[email protected]> CC: [email protected] Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.16, v5.16-rc8 |
|
| #
b6459415 |
| 29-Dec-2021 |
Jakub Kicinski <[email protected]> |
net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of
net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k.
There's a lot of missing includes this was masking. Primarily in networking tho, this time.
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Marc Kleine-Budde <[email protected]> Acked-by: Florian Fainelli <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Acked-by: Stefano Garzarella <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
|
Revision tags: v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1 |
|
| #
0bcfe68b |
| 07-Sep-2021 |
Linus Torvalds <[email protected]> |
Revert "memcg: enable accounting for pollfd and select bits arrays"
This reverts commit b655843444152c0a14b749308e4cb35d91cbcf0b.
Just like with the memcg lock accounting, the kernel test robot rep
Revert "memcg: enable accounting for pollfd and select bits arrays"
This reverts commit b655843444152c0a14b749308e4cb35d91cbcf0b.
Just like with the memcg lock accounting, the kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting.
People already have suggestions on how to do that, but it's "future work".
So revert it for now.
[ Note: the first link below is for this same commit but a different commit ID, because it's the kernel test robot ended up noticing it in Andrew Morton's patch queue ]
Link: https://lore.kernel.org/lkml/20210905132732.GC15026@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
b6558434 |
| 02-Sep-2021 |
Vasily Averin <[email protected]> |
memcg: enable accounting for pollfd and select bits arrays
User can call select/poll system calls with a large number of assigned file descriptors and force kernel to allocate up to several pages of
memcg: enable accounting for pollfd and select bits arrays
User can call select/poll system calls with a large number of assigned file descriptors and force kernel to allocate up to several pages of memory till end of these sleeping system calls. We have here long-living unaccounted per-task allocations.
It makes sense to account for these allocations to restrict the host's memory consumption from inside the memcg-limited container.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vasily Averin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Yutian Yang <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7 |
|
| #
5abbe51a |
| 01-Feb-2021 |
Oleg Nesterov <[email protected]> |
kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT.
Add a new helper which sets restart_block->fn and calls
kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT.
Add a new helper which sets restart_block->fn and calls a dummy arch_set_restart_data() helper.
Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code") Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3 |
|
| #
ef0ba055 |
| 07-Jan-2021 |
Linus Torvalds <[email protected]> |
poll: fix performance regression due to out-of-line __put_user()
The kernel test robot reported a -5.8% performance regression on the "poll2" test of will-it-scale, and bisected it to commit d55564c
poll: fix performance regression due to out-of-line __put_user()
The kernel test robot reported a -5.8% performance regression on the "poll2" test of will-it-scale, and bisected it to commit d55564cfc222 ("x86: Make __put_user() generate an out-of-line call").
I didn't expect an out-of-line __put_user() to matter, because no normal core code should use that non-checking legacy version of user access any more. But I had overlooked the very odd poll() usage, which does a __put_user() to update the 'revents' values of the poll array.
Now, Al Viro correctly points out that instead of updating just the 'revents' field, it would be much simpler to just copy the _whole_ pollfd entry, and then we could just use "copy_to_user()" on the whole array of entries, the same way we use "copy_from_user()" a few lines earlier to get the original values.
But that is not what we've traditionally done, and I worry that threaded applications might be concurrently modifying the other fields of the pollfd array. So while Al's suggestion is simpler - and perhaps worth trying in the future - this instead keeps the "just update revents" model.
To fix the performance regression, use the modern "unsafe_put_user()" instead of __put_user(), with the proper "user_write_access_begin()" guarding in place. This improves code generation enormously.
Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/ Reported-by: kernel test robot <[email protected]> Tested-by: Oliver Sang <[email protected]> Cc: Al Viro <[email protected]> Cc: David Laight <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4 |
|
| #
5e01fdff |
| 31-Aug-2020 |
Gustavo A. R. Silva <[email protected]> |
fs: Replace zero-length array with flexible-array member
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel
fs: Replace zero-length array with flexible-array member
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].
[1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays
Signed-off-by: Gustavo A. R. Silva <[email protected]>
show more ...
|
|
Revision tags: v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3 |
|
| #
7e71609f |
| 19-Feb-2020 |
Al Viro <[email protected]> |
pselect6() and friends: take handling the combined 6th/7th args into helper
... and use unsafe_get_user(), while we are at it.
Signed-off-by: Al Viro <[email protected]>
|
|
Revision tags: v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5 |
|
| #
75d319c0 |
| 25-Oct-2019 |
Arnd Bergmann <[email protected]> |
y2038: syscalls: change remaining timeval to __kernel_old_timeval
All of the remaining syscalls that pass a timeval (gettimeofday, utime, futimesat) can trivially be changed to pass a __kernel_old_t
y2038: syscalls: change remaining timeval to __kernel_old_timeval
All of the remaining syscalls that pass a timeval (gettimeofday, utime, futimesat) can trivially be changed to pass a __kernel_old_timeval instead, which has a compatible layout, but avoids ambiguity with the timeval type in user space.
Acked-by: Christian Brauner <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
show more ...
|
|
Revision tags: v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1 |
|
| #
43e11fa2 |
| 16-Jul-2019 |
Gustavo A. R. Silva <[email protected]> |
fs/select.c: use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory
fs/select.c: use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:
struct foo { int stuff; struct boo entry[]; };
size = sizeof(struct foo) + count * sizeof(struct boo); instance = kmalloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:
instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
Also, notice that variable size is unnecessary, hence it is removed.
This code was detected with the help of Coccinelle.
Link: http://lkml.kernel.org/r/20190604164226.GA13823@embeddedor Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
ac301020 |
| 16-Jul-2019 |
Oleg Nesterov <[email protected]> |
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
Now that restore_saved_sigmask_unless() is always called with the same argument right before poll_select_copy_remaining
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
Now that restore_saved_sigmask_unless() is always called with the same argument right before poll_select_copy_remaining() we can move it into poll_select_copy_remaining() and make it the only caller of restore() in fs/select.c.
The patch also renames poll_select_copy_remaining(), poll_select_finish() looks better after this change.
kern_select() doesn't use set_user_sigmask(), so in this case poll_select_finish() does restore_saved_sigmask_unless() "for no reason". But this won't hurt, and WARN_ON(!TIF_SIGPENDING) is still valid.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Al Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: David Laight <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Deepa Dinamani <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Eric Wong <[email protected]> Cc: Jason Baron <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
8cf8b553 |
| 16-Jul-2019 |
Oleg Nesterov <[email protected]> |
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
do_poll() returns -EINTR if interrupted and after that all its callers have to translate it into -ERESTARTNOHAND. Change do_pol
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
do_poll() returns -EINTR if interrupted and after that all its callers have to translate it into -ERESTARTNOHAND. Change do_poll() to return -ERESTARTNOHAND and update (simplify) the callers.
Note that this also unifies all users of restore_saved_sigmask_unless(), see the next patch.
Linus:
: The *right* return value will actually be then chosen by : poll_select_copy_remaining(), which will turn ERESTARTNOHAND to EINTR : when it can't update the timeout. : : Except for the cases that use restart_block and do that instead and : don't have the whole timeout restart issue as a result.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Al Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: David Laight <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Deepa Dinamani <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Eric Wong <[email protected]> Cc: Jason Baron <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
b772434b |
| 16-Jul-2019 |
Oleg Nesterov <[email protected]> |
signal: simplify set_user_sigmask/restore_user_sigmask
task->saved_sigmask and ->restore_sigmask are only used in the ret-from- syscall paths. This means that set_user_sigmask() can save ->blocked
signal: simplify set_user_sigmask/restore_user_sigmask
task->saved_sigmask and ->restore_sigmask are only used in the ret-from- syscall paths. This means that set_user_sigmask() can save ->blocked in ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked was modified.
This way the callers do not need 2 sigset_t's passed to set/restore and restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns into the trivial helper which just calls restore_saved_sigmask().
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Deepa Dinamani <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Eric Wong <[email protected]> Cc: Jason Baron <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Al Viro <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: David Laight <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.2, v5.2-rc7 |
|
| #
97abc889 |
| 28-Jun-2019 |
Oleg Nesterov <[email protected]> |
signal: remove the wrong signal_pending() check in restore_user_sigmask()
This is the minimal fix for stable, I'll send cleanups later.
Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") in
signal: remove the wrong signal_pending() check in restore_user_sigmask()
This is the minimal fix for stable, I'll send cleanups later.
Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced the visible change which breaks user-space: a signal temporary unblocked by set_user_sigmask() can be delivered even if the caller returns success or timeout.
Change restore_user_sigmask() to accept the additional "interrupted" argument which should be used instead of signal_pending() check, and update the callers.
Eric said:
: For clarity. I don't think this is required by posix, or fundamentally to : remove the races in select. It is what linux has always done and we have : applications who care so I agree this fix is needed. : : Further in any case where the semantic change that this patch rolls back : (aka where allowing a signal to be delivered and the select like call to : complete) would be advantage we can do as well if not better by using : signalfd. : : Michael is there any chance we can get this guarantee of the linux : implementation of pselect and friends clearly documented. The guarantee : that if the system call completes successfully we are guaranteed that no : signal that is unblocked by using sigmask will be delivered?
Link: http://lkml.kernel.org/r/[email protected] Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Oleg Nesterov <[email protected]> Reported-by: Eric Wong <[email protected]> Tested-by: Eric Wong <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Deepa Dinamani <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Jason Baron <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Al Viro <[email protected]> Cc: David Laight <[email protected]> Cc: <[email protected]> [5.0+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|