|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
9ae7e5a1 |
| 19-Mar-2025 |
Mateusz Guzik <[email protected]> |
fs: predict not reaching the limit in alloc_empty_file()
Eliminates a jump over a call to capable() in the common case.
By default the limit is not even set, in which case the check can't even fail
fs: predict not reaching the limit in alloc_empty_file()
Eliminates a jump over a call to capable() in the common case.
By default the limit is not even set, in which case the check can't even fail to begin with. It remains unset at least on Debian and Ubuntu. For this cases this can probably become a static branch instead.
In the meantime tidy it up.
I note the check separate from the bump makes the entire thing racy.
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc7, v6.14-rc6 |
|
| #
e8358845 |
| 05-Mar-2025 |
Mateusz Guzik <[email protected]> |
file: add fput and file_ref_put routines optimized for use when closing a fd
Vast majority of the time closing a file descriptor also operates on the last reference, where a regular fput usage will
file: add fput and file_ref_put routines optimized for use when closing a fd
Vast majority of the time closing a file descriptor also operates on the last reference, where a regular fput usage will result in 2 atomics. This can be changed to only suffer 1.
See commentary above file_ref_put_close() for more information.
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
711f9b8f |
| 03-Feb-2025 |
Amir Goldstein <[email protected]> |
fsnotify: disable pre-content and permission events by default
After introducing pre-content events, we had a regression related to disabling huge faults on files that should never have pre-content
fsnotify: disable pre-content and permission events by default
After introducing pre-content events, we had a regression related to disabling huge faults on files that should never have pre-content events enabled.
This happened because the default f_mode of allocated files (0) does not disable pre-content events.
Pre-content events are disabled in file_set_fsnotify_mode_by_watchers() but internal files may not get to call this helper.
Initialize f_mode to disable permission and pre-content events for all files and if needed they will be enabled for the callers of file_set_fsnotify_mode_by_watchers().
Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <[email protected]> Closes: https://lore.kernel.org/linux-fsdevel/[email protected]/ Tested-by: Alex Williamson <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
2a42754b |
| 03-Feb-2025 |
Amir Goldstein <[email protected]> |
fsnotify: disable notification by default for all pseudo files
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events.
Disable notifications to all
fsnotify: disable notification by default for all pseudo files
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events.
Disable notifications to all files allocated with alloc_file_pseudo() and enable legacy inotify events for the specific cases of pipe and socket, which have known users of inotify events.
Pre-content events are also kept disabled for sockets and pipes.
Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <[email protected]> Closes: https://lore.kernel.org/linux-fsdevel/[email protected]/ Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/ Tested-by: Alex Williamson <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1 |
|
| #
1751f872 |
| 28-Jan-2025 |
Joel Granados <[email protected]> |
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysc
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command: Spatch: virtual patch
@ depends on !(file in "net") disable optional_qualifier @
identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@
+ const struct ctl_table table_name [] = { ... };
sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c
Reviewed-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/ Reviewed-by: Martin K. Petersen <[email protected]> # SCSI Reviewed-by: Darrick J. Wong <[email protected]> # xfs Acked-by: Jani Nikula <[email protected]> Acked-by: Corey Minyard <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Ashutosh Dixit <[email protected]> Acked-by: Anna Schumaker <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
| #
c1feab95 |
| 24-Jan-2025 |
Al Viro <[email protected]> |
add a string-to-qstr constructor
Quite a few places want to build a struct qstr by given string; it would be convenient to have a primitive doing that, rather than open-coding it via QSTR_INIT().
T
add a string-to-qstr constructor
Quite a few places want to build a struct qstr by given string; it would be convenient to have a primitive doing that, rather than open-coding it via QSTR_INIT().
The closest approximation was in bcachefs, but that expands to initializer list - {.len = strlen(string), .name = string}. It would be more useful to have it as compound literal - (struct qstr){.len = strlen(string), .name = string}.
Unlike initializer list it's a valid expression. What's more, it's a valid lvalue - it's an equivalent of anonymous local variable with such initializer, so the things like path->dentry = d_alloc_pseudo(mnt->mnt_sb, &QSTR(name)); are valid. It can also be used as initializer, with identical effect - struct qstr x = (struct qstr){.name = s, .len = strlen(s)}; is equivalent to struct qstr anon_variable = {.name = s, .len = strlen(s)}; struct qstr x = anon_variable; // anon_variable is never used after that point and any even remotely sane compiler will manage to collapse that into struct qstr x = {.name = s, .len = strlen(s)};
What compound literals can't be used for is initialization of global variables, but those are covered by QSTR_INIT().
This commit lifts definition(s) of QSTR() into linux/dcache.h, converts it to compound literal (all bcachefs users are fine with that) and converts assorted open-coded instances to using that.
Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
9b7da575 |
| 23-Oct-2024 |
shao mingyin <[email protected]> |
file: flush delayed work in delayed fput()
The fput() of file rcS might not have completed causing issues when executing the file.
rcS is opened in do_populate_rootfs before executed. At the end of
file: flush delayed work in delayed fput()
The fput() of file rcS might not have completed causing issues when executing the file.
rcS is opened in do_populate_rootfs before executed. At the end of do_populate_rootfs() flush_delayed_fput() is called. Now do_populate_rootfs() assumes that all fput()s caused by do_populate_rootfs() have completed.
But flush_delayed_fput() can only ensure that fput() on the current delayed_fput_list has finished. Any file that has been removed from delayed_fput_list asynchronously in the meantime might not have completed causing the exec to fail.
do_populate_rootfs delayed_fput_list delayed_fput execve fput() a fput() a->b fput() a->b->rcS __fput(a) fput() c fput() c->d __fput(b) flush_delayed_fput __fput(c) __fput(d) __fput(b) __fput(b) execve(rcS)
Ensure that all delayed work is done by calling flush_delayed_work() in flush_delayed_fput() explicitly.
Signed-off-by: Chen Lin <[email protected]> Signed-off-by: Shao Mingyin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Yang Yang <[email protected]> Cc: Yang Tao <[email protected]> Cc: Xu Xin <[email protected]> [brauner: rewrite commit message] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
d727935c |
| 24-Nov-2024 |
Jinliang Zheng <[email protected]> |
fs: fix proc_handler for sysctl_nr_open
Use proc_douintvec_minmax() instead of proc_dointvec_minmax() to handle sysctl_nr_open, because its data type is unsigned int, not int.
Fixes: 9b80a184eaad (
fs: fix proc_handler for sysctl_nr_open
Use proc_douintvec_minmax() instead of proc_dointvec_minmax() to handle sysctl_nr_open, because its data type is unsigned int, not int.
Fixes: 9b80a184eaad ("fs/file: more unsigned file descriptors") Signed-off-by: Jinliang Zheng <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc4, v6.12-rc3 |
|
| #
90ee6ed7 |
| 07-Oct-2024 |
Christian Brauner <[email protected]> |
fs: port files to file_ref
Port files to rely on file_ref reference to improve scaling and gain overflow protection.
- We continue to WARN during get_file() in case a file that is already marked
fs: port files to file_ref
Port files to rely on file_ref reference to improve scaling and gain overflow protection.
- We continue to WARN during get_file() in case a file that is already marked dead is revived as get_file() is only valid if the caller already holds a reference to the file. This hasn't changed just the check changes.
- The semantics for epoll and ttm's dmabuf usage have changed. Both epoll and ttm synchronize with __fput() to prevent the underlying file from beeing freed.
(1) epoll
Explaining epoll is straightforward using a simple diagram. Essentially, the mutex of the epoll instance needs to be taken in both __fput() and around epi_fget() preventing the file from being freed while it is polled or preventing the file from being resurrected.
CPU1 CPU2 fput(file) -> __fput(file) -> eventpoll_release(file) -> eventpoll_release_file(file) mutex_lock(&ep->mtx) epi_item_poll() -> epi_fget() -> file_ref_get(file) mutex_unlock(&ep->mtx) mutex_lock(&ep->mtx); __ep_remove() mutex_unlock(&ep->mtx); -> kmem_cache_free(file)
(2) ttm dmabuf
This explanation is a bit more involved. A regular dmabuf file stashed the dmabuf in file->private_data and the file in dmabuf->file:
file->private_data = dmabuf; dmabuf->file = file;
The generic release method of a dmabuf file handles file specific things:
f_op->release::dma_buf_file_release()
while the generic dentry release method of a dmabuf handles dmabuf freeing including driver specific things:
dentry->d_release::dma_buf_release()
During ttm dmabuf initialization in ttm_object_device_init() the ttm driver copies the provided struct dma_buf_ops into a private location:
struct ttm_object_device { spinlock_t object_lock; struct dma_buf_ops ops; void (*dmabuf_release)(struct dma_buf *dma_buf); struct idr idr; };
ttm_object_device_init(const struct dma_buf_ops *ops) { // copy original dma_buf_ops in private location tdev->ops = *ops;
// stash the release method of the original struct dma_buf_ops tdev->dmabuf_release = tdev->ops.release;
// override the release method in the copy of the struct dma_buf_ops // with ttm's own dmabuf release method tdev->ops.release = ttm_prime_dmabuf_release; }
When a new dmabuf is created the struct dma_buf_ops with the overriden release method set to ttm_prime_dmabuf_release is passed in exp_info.ops:
DEFINE_DMA_BUF_EXPORT_INFO(exp_info); exp_info.ops = &tdev->ops; exp_info.size = prime->size; exp_info.flags = flags; exp_info.priv = prime;
The call to dma_buf_export() then sets
mutex_lock_interruptible(&prime->mutex); dma_buf = dma_buf_export(&exp_info) { dmabuf->ops = exp_info->ops; } mutex_unlock(&prime->mutex);
which creates a new dmabuf file and then install a file descriptor to it in the callers file descriptor table:
ret = dma_buf_fd(dma_buf, flags);
When that dmabuf file is closed we now get:
fput(file) -> __fput(file) -> f_op->release::dma_buf_file_release() -> dput() -> d_op->d_release::dma_buf_release() -> dmabuf->ops->release::ttm_prime_dmabuf_release() mutex_lock(&prime->mutex); if (prime->dma_buf == dma_buf) prime->dma_buf = NULL; mutex_unlock(&prime->mutex);
Where we can see that prime->dma_buf is set to NULL. So when we have the following diagram:
CPU1 CPU2 fput(file) -> __fput(file) -> f_op->release::dma_buf_file_release() -> dput() -> d_op->d_release::dma_buf_release() -> dmabuf->ops->release::ttm_prime_dmabuf_release() ttm_prime_handle_to_fd() mutex_lock_interruptible(&prime->mutex) dma_buf = prime->dma_buf dma_buf && get_dma_buf_unless_doomed(dma_buf) -> file_ref_get(dma_buf->file) mutex_unlock(&prime->mutex);
mutex_lock(&prime->mutex); if (prime->dma_buf == dma_buf) prime->dma_buf = NULL; mutex_unlock(&prime->mutex); -> kmem_cache_free(file)
The logic of the mechanism is the same as for epoll: sync with __fput() preventing the file from being freed. Here the synchronization happens through the ttm instance's prime->mutex. Basically, the lifetime of the dma_buf and the file are tighly coupled.
Both (1) and (2) used to call atomic_inc_not_zero() to check whether the file has already been marked dead and then refuse to revive it.
This is only safe because both (1) and (2) sync with __fput() and thus prevent kmem_cache_free() on the file being called and thus prevent the file from being immediately recycled due to SLAB_TYPESAFE_BY_RCU.
Both (1) and (2) have been ported from atomic_inc_not_zero() to file_ref_get(). That means a file that is already in the process of being marked as FILE_REF_DEAD:
file_ref_put() cnt = atomic_long_dec_return() -> __file_ref_put(cnt) if (cnt == FIlE_REF_NOREF) atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)
can be revived again:
CPU1 CPU2 file_ref_put() cnt = atomic_long_dec_return() -> __file_ref_put(cnt) if (cnt == FIlE_REF_NOREF) file_ref_get() // Brings reference back to FILE_REF_ONEREF atomic_long_add_negative() atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)
This is fine and inherent to the file_ref_get()/file_ref_put() semantics. For both (1) and (2) this is safe because __fput() is prevented from making progress if file_ref_get() fails due to the aforementioned synchronization mechanisms.
Two cases need to be considered that affect both (1) epoll and (2) ttm dmabuf:
(i) fput()'s file_ref_put() and marks the file as FILE_REF_NOREF but before that fput() can mark the file as FILE_REF_DEAD someone manages to sneak in a file_ref_get() and brings the refcount back from FILE_REF_NOREF to FILE_REF_ONEREF. In that case the original fput() doesn't call __fput(). For epoll the poll will finish and for ttm dmabuf the file can be used again. For ttm dambuf this is actually an advantage because it avoids immediately allocating a new dmabuf object.
CPU1 CPU2 file_ref_put() cnt = atomic_long_dec_return() -> __file_ref_put(cnt) if (cnt == FIlE_REF_NOREF) file_ref_get() // Brings reference back to FILE_REF_ONEREF atomic_long_add_negative() atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)
(ii) fput()'s file_ref_put() marks the file FILE_REF_NOREF and also suceeds in actually marking it FILE_REF_DEAD and then calls into __fput() to free the file.
When either (1) or (2) call file_ref_get() they fail as atomic_long_add_negative() will return true.
At the same time, both (1) and (2) all file_ref_get() under mutexes that __fput() must also acquire preventing kmem_cache_free() from freeing the file.
So while this might be treated as a change in semantics for (1) and (2) it really isn't. It if should end up causing issues this can be fixed by adding a helper that does something like:
long cnt = atomic_long_read(&ref->refcnt); do { if (cnt < 0) return false; } while (!atomic_long_try_cmpxchg(&ref->refcnt, &cnt, cnt + 1)); return true;
which would block FILE_REF_NOREF to FILE_REF_ONEREF transitions.
- Jann correctly pointed out that kmem_cache_zalloc() cannot be used anymore once files have been ported to file_ref_t.
The kmem_cache_zalloc() call will memset() the whole struct file to zero when it is reallocated. This will also set file->f_ref to zero which mens that a concurrent file_ref_get() can return true:
CPU1 CPU2 __get_file_rcu() rcu_dereference_raw() close() [frees file] alloc_empty_file() kmem_cache_zalloc() [reallocates same file] memset(..., 0, ...) file_ref_get() [increments 0->1, returns true] init_file() file_ref_init(..., 1) [sets to 0] rcu_dereference_raw() fput() file_ref_put() [decrements 0->FILE_REF_NOREF, frees file] [UAF]
causing a concurrent __get_file_rcu() call to acquire a reference to the file that is about to be reallocated and immediately freeing it on realizing that it has been recycled. This causes a UAF for the task that reallocated/recycled the file.
This is prevented by switching from kmem_cache_zalloc() to kmem_cache_alloc() and initializing the fields manually. With file->f_ref initialized last.
Note that a memset() also isn't guaranteed to atomically update an unsigned long so it's theoretically possible to see torn and therefore bogus counter values.
Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
8b1bc259 |
| 07-Oct-2024 |
Christian Brauner <[email protected]> |
fs: protect backing files with rcu
Currently backing files are not under any form of rcu protection. Switching to file_ref requires rcu protection and so does the speculative vma lookup. Switch back
fs: protect backing files with rcu
Currently backing files are not under any form of rcu protection. Switching to file_ref requires rcu protection and so does the speculative vma lookup. Switch backing files to the same rcu slab type as regular files. There should be no additional magic required as the lifetime of a backing file is always tied to a regular file.
Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
be5498ca |
| 03-Jun-2024 |
Al Viro <[email protected]> |
remove pointless includes of <linux/fdtable.h>
some of those used to be needed, some had been cargo-culted for no reason...
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Vir
remove pointless includes of <linux/fdtable.h>
some of those used to be needed, some had been cargo-culted for no reason...
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
5e9b50de |
| 30-Aug-2024 |
Christian Brauner <[email protected]> |
fs: add f_pipe
Only regular files with FMODE_ATOMIC_POS and directories need f_pos_lock. Place a new f_pipe member in a union with f_pos_lock that they can use and make them stop abusing f_version i
fs: add f_pipe
Only regular files with FMODE_ATOMIC_POS and directories need f_pos_lock. Place a new f_pipe member in a union with f_pos_lock that they can use and make them stop abusing f_version in follow-up patches.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
5f7d2566 |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
file: port to struct kmem_cache_args
Port filp_cache to struct kmem_cache_args.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Mike Rapoport (Micros
file: port to struct kmem_cache_args
Port filp_cache to struct kmem_cache_args.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
show more ...
|
| #
ea566e18 |
| 28-Aug-2024 |
Christian Brauner <[email protected]> |
fs: use kmem_cache_create_rcu()
Switch to the new kmem_cache_create_rcu() helper which allows us to use a custom free pointer offset avoiding the need to have an external free pointer which would gr
fs: use kmem_cache_create_rcu()
Switch to the new kmem_cache_create_rcu() helper which allows us to use a custom free pointer offset avoiding the need to have an external free pointer which would grow struct file behind our backs.
Link: https://lore.kernel.org/r/[email protected] Acked-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
1934b212 |
| 09-Aug-2024 |
Christian Brauner <[email protected]> |
file: reclaim 24 bytes from f_owner
We do embedd struct fown_struct into struct file letting it take up 32 bytes in total. We could tweak struct fown_struct to be more compact but really it shouldn'
file: reclaim 24 bytes from f_owner
We do embedd struct fown_struct into struct file letting it take up 32 bytes in total. We could tweak struct fown_struct to be more compact but really it shouldn't even be embedded in struct file in the first place.
Instead, actual users of struct fown_struct should allocate the struct on demand. This frees up 24 bytes in struct file.
That will have some potentially user-visible changes for the ownership fcntl()s. Some of them can now fail due to allocation failures. Practically, that probably will almost never happen as the allocations are small and they only happen once per file.
The fown_struct is used during kill_fasync() which is used by e.g., pipes to generate a SIGIO signal. Sending of such signals is conditional on userspace having set an owner for the file using one of the F_OWNER fcntl()s. Such users will be unaffected if struct fown_struct is allocated during the fcntl() call.
There are a few subsystems that call __f_setown() expecting file->f_owner to be allocated:
(1) tun devices file->f_op->fasync::tun_chr_fasync() -> __f_setown()
There are no callers of tun_chr_fasync().
(2) tty devices
file->f_op->fasync::tty_fasync() -> __tty_fasync() -> __f_setown()
tty_fasync() has no additional callers but __tty_fasync() has. Note that __tty_fasync() only calls __f_setown() if the @on argument is true. It's called from:
file->f_op->release::tty_release() -> tty_release() -> __tty_fasync() -> __f_setown()
tty_release() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of tty_release() are safe as well.
file->f_op->release::tty_open() -> tty_release() -> __tty_fasync() -> __f_setown()
__tty_hangup() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of __tty_hangup() are safe as well.
From the callchains it's obvious that (1) and (2) end up getting called via file->f_op->fasync(). That can happen either through the F_SETFL fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If FASYNC is requested and the file isn't already FASYNC then file->f_op->fasync() is called with @on true which ends up causing both (1) and (2) to call __f_setown().
(1) and (2) are the only subsystems that call __f_setown() from the file->f_op->fasync() handler. So both (1) and (2) have been updated to allocate a struct fown_struct prior to calling fasync_helper() to register with the fasync infrastructure. That's safe as they both call fasync_helper() which also does allocations if @on is true.
The other interesting case are file leases:
(3) file leases lease_manager_ops->lm_setup::lease_setup() -> __f_setown()
Which in turn is called from:
generic_add_lease() -> lease_manager_ops->lm_setup::lease_setup() -> __f_setown()
So here again we can simply make generic_add_lease() allocate struct fown_struct prior to the lease_manager_ops->lm_setup::lease_setup() which happens under a spinlock.
With that the two remaining subsystems that call __f_setown() are:
(4) dnotify (5) sockets
Both have their own custom ioctls to set struct fown_struct and both have been converted to allocate a struct fown_struct on demand from their respective ioctls.
Interactions with O_PATH are fine as well e.g., when opening a /dev/tty as O_PATH then no file->f_op->open() happens thus no file->f_owner is allocated. That's fine as no file operation will be set for those and the device has never been opened. fcntl()s called on such things will just allocate a ->f_owner on demand. Although I have zero idea why'd you care about f_owner on an O_PATH fd.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
de7007f2 |
| 27-Jul-2024 |
Mohit0404 <[email protected]> |
Fixed: fs: file_table_c: Missing blank line warnings and struct declaration improved
Fixed- WARNING: Missing a blank line after declarations WARNING: Missing a blank line after declarations Decla
Fixed: fs: file_table_c: Missing blank line warnings and struct declaration improved
Fixed- WARNING: Missing a blank line after declarations WARNING: Missing a blank line after declarations Declaration format: improved struct file declaration format
Signed-off-by: Mohit0404 <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
78eb4ea2 |
| 24-Jul-2024 |
Joel Granados <[email protected]> |
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ct
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified.
This patch has been generated by the following coccinelle script:
``` virtual patch
@r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@ identifier func, ctl, write, buffer, lenp, ppos; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... }
@r3@ identifier func; @@
int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *);
@r4@ identifier func, ctl; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *);
@r5@ identifier func, write, buffer, lenp, ppos; @@
int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration.
Co-developed-by: Thomas Weißschuh <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]> Co-developed-by: Joel Granados <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4 |
|
| #
bac0a9e5 |
| 08-Feb-2024 |
Christian Brauner <[email protected]> |
file: add alloc_file_pseudo_noaccount()
When we open block devices as files we want to make sure to not charge them against the open file limit of the caller as that can cause spurious failures.
Li
file: add alloc_file_pseudo_noaccount()
When we open block devices as files we want to make sure to not charge them against the open file limit of the caller as that can cause spurious failures.
Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
0873add0 |
| 08-Feb-2024 |
Christian Brauner <[email protected]> |
file: prepare for new helper
In order to add a helper to open files that aren't accounted split alloc_file() and parts of alloc_file_pseudo() into helpers. One to prepare a path, another one to setu
file: prepare for new helper
In order to add a helper to open files that aren't accounted split alloc_file() and parts of alloc_file_pseudo() into helpers. One to prepare a path, another one to setup the file.
Suggested-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
cd3cec0a |
| 15-Feb-2024 |
Roberto Sassu <[email protected]> |
ima: Move to LSM infrastructure
Move hardcoded IMA function calls (not appraisal-specific functions) from various places in the kernel to the LSM infrastructure, by introducing a new LSM named 'ima'
ima: Move to LSM infrastructure
Move hardcoded IMA function calls (not appraisal-specific functions) from various places in the kernel to the LSM infrastructure, by introducing a new LSM named 'ima' (at the end of the LSM list and always enabled like 'integrity').
Having IMA before EVM in the Makefile is sufficient to preserve the relative order of the new 'ima' LSM in respect to the upcoming 'evm' LSM, and thus the order of IMA and EVM function calls as when they were hardcoded.
Make moved functions as static (except ima_post_key_create_or_update(), which is not in ima_main.c), and register them as implementation of the respective hooks in the new function init_ima_lsm().
Select CONFIG_SECURITY_PATH, to ensure that the path-based LSM hook path_post_mknod is always available and ima_post_path_mknod() is always executed to mark files as new, as before the move.
A slight difference is that IMA and EVM functions registered for the inode_post_setattr, inode_post_removexattr, path_post_mknod, inode_post_create_tmpfile, inode_post_set_acl and inode_post_remove_acl won't be executed for private inodes. Since those inodes are supposed to be fs-internal, they should not be of interest to IMA or EVM. The S_PRIVATE flag is used for anonymous inodes, hugetlbfs, reiserfs xattrs, XFS scrub and kernel-internal tmpfs files.
Conditionally register ima_post_key_create_or_update() if CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS is enabled. Also, conditionally register ima_kernel_module_request() if CONFIG_INTEGRITY_ASYMMETRIC_KEYS is enabled.
Finally, add the LSM_ID_IMA case in lsm_list_modules_test.c.
Signed-off-by: Roberto Sassu <[email protected]> Acked-by: Chuck Lever <[email protected]> Acked-by: Casey Schaufler <[email protected]> Acked-by: Christian Brauner <[email protected]> Reviewed-by: Stefan Berger <[email protected]> Reviewed-by: Mimi Zohar <[email protected]> Acked-by: Mimi Zohar <[email protected]> Signed-off-by: Paul Moore <[email protected]>
show more ...
|
| #
f09068b5 |
| 15-Feb-2024 |
Roberto Sassu <[email protected]> |
security: Introduce file_release hook
In preparation for moving IMA and EVM to the LSM infrastructure, introduce the file_release hook.
IMA calculates at file close the new digest of the file conte
security: Introduce file_release hook
In preparation for moving IMA and EVM to the LSM infrastructure, introduce the file_release hook.
IMA calculates at file close the new digest of the file content and writes it to security.ima, so that appraisal at next file access succeeds.
The new hook cannot return an error and cannot cause the operation to be reverted.
Signed-off-by: Roberto Sassu <[email protected]> Acked-by: Christian Brauner <[email protected]> Reviewed-by: Stefan Berger <[email protected]> Reviewed-by: Mimi Zohar <[email protected]> Signed-off-by: Paul Moore <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
9d5b9475 |
| 21-Nov-2023 |
Joel Granados <[email protected]> |
fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels
fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)
Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size.
Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Acked-by: Christian Brauner <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
show more ...
|
| #
372a34e6 |
| 30-Nov-2023 |
Christian Brauner <[email protected]> |
fs: replace f_rcuhead with f_task_work
The naming is actively misleading since we switched to SLAB_TYPESAFE_BY_RCU. rcu_head is #define callback_head. Use callback_head directly and rename f_rcuhead
fs: replace f_rcuhead with f_task_work
The naming is actively misleading since we switched to SLAB_TYPESAFE_BY_RCU. rcu_head is #define callback_head. Use callback_head directly and rename f_rcuhead to f_task_work.
Add comments in there to explain what it's used for.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
7cb537b6 |
| 26-Nov-2023 |
Al Viro <[email protected]> |
file: massage cleanup of files that failed to open
A file that has never gotten FMODE_OPENED will never have RCU-accessed references, its final fput() is equivalent to file_free() and if it doesn't
file: massage cleanup of files that failed to open
A file that has never gotten FMODE_OPENED will never have RCU-accessed references, its final fput() is equivalent to file_free() and if it doesn't have FMODE_BACKING either, it can be done from any context and won't need task_work treatment.
Now that we have SLAB_TYPESAFE_BY_RCU we can simplify this and have other callers benefit. All of that can be achieved easier is to make fput() recoginze that case and call file_free() directly.
No need to introduce a special primitive for that. It also allowed things like failing dentry_open() could benefit from that as well.
Signed-off-by: Al Viro <[email protected]> [Christian Brauner <[email protected]>: massage commit message] Link: https://lore.kernel.org/r/20231126020834.GC38156@ZenIV Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc2, v6.7-rc1 |
|
| #
9024b4c9 |
| 12-Nov-2023 |
Al Viro <[email protected]> |
d_alloc_pseudo(): move setting ->d_op there from the (sole) caller
Signed-off-by: Al Viro <[email protected]>
|