History log of /linux-6.15/fs/file_table.c (Results 1 – 25 of 178)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14
# 9ae7e5a1 19-Mar-2025 Mateusz Guzik <[email protected]>

fs: predict not reaching the limit in alloc_empty_file()

Eliminates a jump over a call to capable() in the common case.

By default the limit is not even set, in which case the check can't even
fail

fs: predict not reaching the limit in alloc_empty_file()

Eliminates a jump over a call to capable() in the common case.

By default the limit is not even set, in which case the check can't even
fail to begin with. It remains unset at least on Debian and Ubuntu.
For this cases this can probably become a static branch instead.

In the meantime tidy it up.

I note the check separate from the bump makes the entire thing racy.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.14-rc7, v6.14-rc6
# e8358845 05-Mar-2025 Mateusz Guzik <[email protected]>

file: add fput and file_ref_put routines optimized for use when closing a fd

Vast majority of the time closing a file descriptor also operates on the
last reference, where a regular fput usage will

file: add fput and file_ref_put routines optimized for use when closing a fd

Vast majority of the time closing a file descriptor also operates on the
last reference, where a regular fput usage will result in 2 atomics.
This can be changed to only suffer 1.

See commentary above file_ref_put_close() for more information.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2
# 711f9b8f 03-Feb-2025 Amir Goldstein <[email protected]>

fsnotify: disable pre-content and permission events by default

After introducing pre-content events, we had a regression related to
disabling huge faults on files that should never have pre-content

fsnotify: disable pre-content and permission events by default

After introducing pre-content events, we had a regression related to
disabling huge faults on files that should never have pre-content events
enabled.

This happened because the default f_mode of allocated files (0) does
not disable pre-content events.

Pre-content events are disabled in file_set_fsnotify_mode_by_watchers()
but internal files may not get to call this helper.

Initialize f_mode to disable permission and pre-content events for all
files and if needed they will be enabled for the callers of
file_set_fsnotify_mode_by_watchers().

Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson <[email protected]>
Closes: https://lore.kernel.org/linux-fsdevel/[email protected]/
Tested-by: Alex Williamson <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 2a42754b 03-Feb-2025 Amir Goldstein <[email protected]>

fsnotify: disable notification by default for all pseudo files

Most pseudo files are not applicable for fsnotify events at all,
let alone to the new pre-content events.

Disable notifications to all

fsnotify: disable notification by default for all pseudo files

Most pseudo files are not applicable for fsnotify events at all,
let alone to the new pre-content events.

Disable notifications to all files allocated with alloc_file_pseudo()
and enable legacy inotify events for the specific cases of pipe and
socket, which have known users of inotify events.

Pre-content events are also kept disabled for sockets and pipes.

Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson <[email protected]>
Closes: https://lore.kernel.org/linux-fsdevel/[email protected]/
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/
Tested-by: Alex Williamson <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.14-rc1
# 1751f872 28-Jan-2025 Joel Granados <[email protected]>

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysc

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
virtual patch

@
depends on !(file in "net")
disable optional_qualifier
@

identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@

+ const
struct ctl_table table_name [] = { ... };

sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c

Reviewed-by: Song Liu <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/
Reviewed-by: Martin K. Petersen <[email protected]> # SCSI
Reviewed-by: Darrick J. Wong <[email protected]> # xfs
Acked-by: Jani Nikula <[email protected]>
Acked-by: Corey Minyard <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Bill O'Donnell <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Ashutosh Dixit <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


# c1feab95 24-Jan-2025 Al Viro <[email protected]>

add a string-to-qstr constructor

Quite a few places want to build a struct qstr by given string;
it would be convenient to have a primitive doing that, rather
than open-coding it via QSTR_INIT().

T

add a string-to-qstr constructor

Quite a few places want to build a struct qstr by given string;
it would be convenient to have a primitive doing that, rather
than open-coding it via QSTR_INIT().

The closest approximation was in bcachefs, but that expands to
initializer list - {.len = strlen(string), .name = string}.
It would be more useful to have it as compound literal -
(struct qstr){.len = strlen(string), .name = string}.

Unlike initializer list it's a valid expression. What's more,
it's a valid lvalue - it's an equivalent of anonymous local
variable with such initializer, so the things like
path->dentry = d_alloc_pseudo(mnt->mnt_sb, &QSTR(name));
are valid. It can also be used as initializer, with identical
effect -
struct qstr x = (struct qstr){.name = s, .len = strlen(s)};
is equivalent to
struct qstr anon_variable = {.name = s, .len = strlen(s)};
struct qstr x = anon_variable;
// anon_variable is never used after that point
and any even remotely sane compiler will manage to collapse that
into
struct qstr x = {.name = s, .len = strlen(s)};

What compound literals can't be used for is initialization of
global variables, but those are covered by QSTR_INIT().

This commit lifts definition(s) of QSTR() into linux/dcache.h,
converts it to compound literal (all bcachefs users are fine
with that) and converts assorted open-coded instances to using
that.

Signed-off-by: Al Viro <[email protected]>

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5
# 9b7da575 23-Oct-2024 shao mingyin <[email protected]>

file: flush delayed work in delayed fput()

The fput() of file rcS might not have completed causing issues when
executing the file.

rcS is opened in do_populate_rootfs before executed. At the end of

file: flush delayed work in delayed fput()

The fput() of file rcS might not have completed causing issues when
executing the file.

rcS is opened in do_populate_rootfs before executed. At the end of
do_populate_rootfs() flush_delayed_fput() is called. Now
do_populate_rootfs() assumes that all fput()s caused by
do_populate_rootfs() have completed.

But flush_delayed_fput() can only ensure that fput() on the current
delayed_fput_list has finished. Any file that has been removed from
delayed_fput_list asynchronously in the meantime might not have
completed causing the exec to fail.

do_populate_rootfs delayed_fput_list delayed_fput execve
fput() a
fput() a->b
fput() a->b->rcS
__fput(a)
fput() c
fput() c->d
__fput(b)
flush_delayed_fput
__fput(c)
__fput(d)
__fput(b)
__fput(b) execve(rcS)

Ensure that all delayed work is done by calling flush_delayed_work() in
flush_delayed_fput() explicitly.

Signed-off-by: Chen Lin <[email protected]>
Signed-off-by: Shao Mingyin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Yang Yang <[email protected]>
Cc: Yang Tao <[email protected]>
Cc: Xu Xin <[email protected]>
[brauner: rewrite commit message]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# d727935c 24-Nov-2024 Jinliang Zheng <[email protected]>

fs: fix proc_handler for sysctl_nr_open

Use proc_douintvec_minmax() instead of proc_dointvec_minmax() to handle
sysctl_nr_open, because its data type is unsigned int, not int.

Fixes: 9b80a184eaad (

fs: fix proc_handler for sysctl_nr_open

Use proc_douintvec_minmax() instead of proc_dointvec_minmax() to handle
sysctl_nr_open, because its data type is unsigned int, not int.

Fixes: 9b80a184eaad ("fs/file: more unsigned file descriptors")
Signed-off-by: Jinliang Zheng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.12-rc4, v6.12-rc3
# 90ee6ed7 07-Oct-2024 Christian Brauner <[email protected]>

fs: port files to file_ref

Port files to rely on file_ref reference to improve scaling and gain
overflow protection.

- We continue to WARN during get_file() in case a file that is already
marked

fs: port files to file_ref

Port files to rely on file_ref reference to improve scaling and gain
overflow protection.

- We continue to WARN during get_file() in case a file that is already
marked dead is revived as get_file() is only valid if the caller
already holds a reference to the file. This hasn't changed just the
check changes.

- The semantics for epoll and ttm's dmabuf usage have changed. Both
epoll and ttm synchronize with __fput() to prevent the underlying file
from beeing freed.

(1) epoll

Explaining epoll is straightforward using a simple diagram.
Essentially, the mutex of the epoll instance needs to be taken in both
__fput() and around epi_fget() preventing the file from being freed
while it is polled or preventing the file from being resurrected.

CPU1 CPU2
fput(file)
-> __fput(file)
-> eventpoll_release(file)
-> eventpoll_release_file(file)
mutex_lock(&ep->mtx)
epi_item_poll()
-> epi_fget()
-> file_ref_get(file)
mutex_unlock(&ep->mtx)
mutex_lock(&ep->mtx);
__ep_remove()
mutex_unlock(&ep->mtx);
-> kmem_cache_free(file)

(2) ttm dmabuf

This explanation is a bit more involved. A regular dmabuf file stashed
the dmabuf in file->private_data and the file in dmabuf->file:

file->private_data = dmabuf;
dmabuf->file = file;

The generic release method of a dmabuf file handles file specific
things:

f_op->release::dma_buf_file_release()

while the generic dentry release method of a dmabuf handles dmabuf
freeing including driver specific things:

dentry->d_release::dma_buf_release()

During ttm dmabuf initialization in ttm_object_device_init() the ttm
driver copies the provided struct dma_buf_ops into a private location:

struct ttm_object_device {
spinlock_t object_lock;
struct dma_buf_ops ops;
void (*dmabuf_release)(struct dma_buf *dma_buf);
struct idr idr;
};

ttm_object_device_init(const struct dma_buf_ops *ops)
{
// copy original dma_buf_ops in private location
tdev->ops = *ops;

// stash the release method of the original struct dma_buf_ops
tdev->dmabuf_release = tdev->ops.release;

// override the release method in the copy of the struct dma_buf_ops
// with ttm's own dmabuf release method
tdev->ops.release = ttm_prime_dmabuf_release;
}

When a new dmabuf is created the struct dma_buf_ops with the overriden
release method set to ttm_prime_dmabuf_release is passed in exp_info.ops:

DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
exp_info.ops = &tdev->ops;
exp_info.size = prime->size;
exp_info.flags = flags;
exp_info.priv = prime;

The call to dma_buf_export() then sets

mutex_lock_interruptible(&prime->mutex);
dma_buf = dma_buf_export(&exp_info)
{
dmabuf->ops = exp_info->ops;
}
mutex_unlock(&prime->mutex);

which creates a new dmabuf file and then install a file descriptor to
it in the callers file descriptor table:

ret = dma_buf_fd(dma_buf, flags);

When that dmabuf file is closed we now get:

fput(file)
-> __fput(file)
-> f_op->release::dma_buf_file_release()
-> dput()
-> d_op->d_release::dma_buf_release()
-> dmabuf->ops->release::ttm_prime_dmabuf_release()
mutex_lock(&prime->mutex);
if (prime->dma_buf == dma_buf)
prime->dma_buf = NULL;
mutex_unlock(&prime->mutex);

Where we can see that prime->dma_buf is set to NULL. So when we have
the following diagram:

CPU1 CPU2
fput(file)
-> __fput(file)
-> f_op->release::dma_buf_file_release()
-> dput()
-> d_op->d_release::dma_buf_release()
-> dmabuf->ops->release::ttm_prime_dmabuf_release()
ttm_prime_handle_to_fd()
mutex_lock_interruptible(&prime->mutex)
dma_buf = prime->dma_buf
dma_buf && get_dma_buf_unless_doomed(dma_buf)
-> file_ref_get(dma_buf->file)
mutex_unlock(&prime->mutex);

mutex_lock(&prime->mutex);
if (prime->dma_buf == dma_buf)
prime->dma_buf = NULL;
mutex_unlock(&prime->mutex);
-> kmem_cache_free(file)

The logic of the mechanism is the same as for epoll: sync with
__fput() preventing the file from being freed. Here the
synchronization happens through the ttm instance's prime->mutex.
Basically, the lifetime of the dma_buf and the file are tighly
coupled.

Both (1) and (2) used to call atomic_inc_not_zero() to check whether
the file has already been marked dead and then refuse to revive it.

This is only safe because both (1) and (2) sync with __fput() and thus
prevent kmem_cache_free() on the file being called and thus prevent
the file from being immediately recycled due to SLAB_TYPESAFE_BY_RCU.

Both (1) and (2) have been ported from atomic_inc_not_zero() to
file_ref_get(). That means a file that is already in the process of
being marked as FILE_REF_DEAD:

file_ref_put()
cnt = atomic_long_dec_return()
-> __file_ref_put(cnt)
if (cnt == FIlE_REF_NOREF)
atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)

can be revived again:

CPU1 CPU2
file_ref_put()
cnt = atomic_long_dec_return()
-> __file_ref_put(cnt)
if (cnt == FIlE_REF_NOREF)
file_ref_get()
// Brings reference back to FILE_REF_ONEREF
atomic_long_add_negative()
atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)

This is fine and inherent to the file_ref_get()/file_ref_put()
semantics. For both (1) and (2) this is safe because __fput() is
prevented from making progress if file_ref_get() fails due to the
aforementioned synchronization mechanisms.

Two cases need to be considered that affect both (1) epoll and (2) ttm
dmabuf:

(i) fput()'s file_ref_put() and marks the file as FILE_REF_NOREF but
before that fput() can mark the file as FILE_REF_DEAD someone
manages to sneak in a file_ref_get() and brings the refcount back
from FILE_REF_NOREF to FILE_REF_ONEREF. In that case the original
fput() doesn't call __fput(). For epoll the poll will finish and
for ttm dmabuf the file can be used again. For ttm dambuf this is
actually an advantage because it avoids immediately allocating
a new dmabuf object.

CPU1 CPU2
file_ref_put()
cnt = atomic_long_dec_return()
-> __file_ref_put(cnt)
if (cnt == FIlE_REF_NOREF)
file_ref_get()
// Brings reference back to FILE_REF_ONEREF
atomic_long_add_negative()
atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)

(ii) fput()'s file_ref_put() marks the file FILE_REF_NOREF and
also suceeds in actually marking it FILE_REF_DEAD and then calls
into __fput() to free the file.

When either (1) or (2) call file_ref_get() they fail as
atomic_long_add_negative() will return true.

At the same time, both (1) and (2) all file_ref_get() under
mutexes that __fput() must also acquire preventing
kmem_cache_free() from freeing the file.

So while this might be treated as a change in semantics for (1) and
(2) it really isn't. It if should end up causing issues this can be
fixed by adding a helper that does something like:

long cnt = atomic_long_read(&ref->refcnt);
do {
if (cnt < 0)
return false;
} while (!atomic_long_try_cmpxchg(&ref->refcnt, &cnt, cnt + 1));
return true;

which would block FILE_REF_NOREF to FILE_REF_ONEREF transitions.

- Jann correctly pointed out that kmem_cache_zalloc() cannot be used
anymore once files have been ported to file_ref_t.

The kmem_cache_zalloc() call will memset() the whole struct file to
zero when it is reallocated. This will also set file->f_ref to zero
which mens that a concurrent file_ref_get() can return true:

CPU1 CPU2
__get_file_rcu()
rcu_dereference_raw()
close()
[frees file]
alloc_empty_file()
kmem_cache_zalloc()
[reallocates same file]
memset(..., 0, ...)
file_ref_get()
[increments 0->1, returns true]
init_file()
file_ref_init(..., 1)
[sets to 0]
rcu_dereference_raw()
fput()
file_ref_put()
[decrements 0->FILE_REF_NOREF, frees file]
[UAF]

causing a concurrent __get_file_rcu() call to acquire a reference to
the file that is about to be reallocated and immediately freeing it
on realizing that it has been recycled. This causes a UAF for the
task that reallocated/recycled the file.

This is prevented by switching from kmem_cache_zalloc() to
kmem_cache_alloc() and initializing the fields manually. With
file->f_ref initialized last.

Note that a memset() also isn't guaranteed to atomically update an
unsigned long so it's theoretically possible to see torn and
therefore bogus counter values.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 8b1bc259 07-Oct-2024 Christian Brauner <[email protected]>

fs: protect backing files with rcu

Currently backing files are not under any form of rcu protection.
Switching to file_ref requires rcu protection and so does the
speculative vma lookup. Switch back

fs: protect backing files with rcu

Currently backing files are not under any form of rcu protection.
Switching to file_ref requires rcu protection and so does the
speculative vma lookup. Switch backing files to the same rcu slab type
as regular files. There should be no additional magic required as the
lifetime of a backing file is always tied to a regular file.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# be5498ca 03-Jun-2024 Al Viro <[email protected]>

remove pointless includes of <linux/fdtable.h>

some of those used to be needed, some had been cargo-culted for
no reason...

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Vir

remove pointless includes of <linux/fdtable.h>

some of those used to be needed, some had been cargo-culted for
no reason...

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# 5e9b50de 30-Aug-2024 Christian Brauner <[email protected]>

fs: add f_pipe

Only regular files with FMODE_ATOMIC_POS and directories need
f_pos_lock. Place a new f_pipe member in a union with f_pos_lock
that they can use and make them stop abusing f_version i

fs: add f_pipe

Only regular files with FMODE_ATOMIC_POS and directories need
f_pos_lock. Place a new f_pipe member in a union with f_pos_lock
that they can use and make them stop abusing f_version in follow-up
patches.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 5f7d2566 05-Sep-2024 Christian Brauner <[email protected]>

file: port to struct kmem_cache_args

Port filp_cache to struct kmem_cache_args.

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Mike Rapoport (Micros

file: port to struct kmem_cache_args

Port filp_cache to struct kmem_cache_args.

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

show more ...


# ea566e18 28-Aug-2024 Christian Brauner <[email protected]>

fs: use kmem_cache_create_rcu()

Switch to the new kmem_cache_create_rcu() helper which allows us to use
a custom free pointer offset avoiding the need to have an external free
pointer which would gr

fs: use kmem_cache_create_rcu()

Switch to the new kmem_cache_create_rcu() helper which allows us to use
a custom free pointer offset avoiding the need to have an external free
pointer which would grow struct file behind our backs.

Link: https://lore.kernel.org/r/[email protected]
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 1934b212 09-Aug-2024 Christian Brauner <[email protected]>

file: reclaim 24 bytes from f_owner

We do embedd struct fown_struct into struct file letting it take up 32
bytes in total. We could tweak struct fown_struct to be more compact but
really it shouldn'

file: reclaim 24 bytes from f_owner

We do embedd struct fown_struct into struct file letting it take up 32
bytes in total. We could tweak struct fown_struct to be more compact but
really it shouldn't even be embedded in struct file in the first place.

Instead, actual users of struct fown_struct should allocate the struct
on demand. This frees up 24 bytes in struct file.

That will have some potentially user-visible changes for the ownership
fcntl()s. Some of them can now fail due to allocation failures.
Practically, that probably will almost never happen as the allocations
are small and they only happen once per file.

The fown_struct is used during kill_fasync() which is used by e.g.,
pipes to generate a SIGIO signal. Sending of such signals is conditional
on userspace having set an owner for the file using one of the F_OWNER
fcntl()s. Such users will be unaffected if struct fown_struct is
allocated during the fcntl() call.

There are a few subsystems that call __f_setown() expecting
file->f_owner to be allocated:

(1) tun devices
file->f_op->fasync::tun_chr_fasync()
-> __f_setown()

There are no callers of tun_chr_fasync().

(2) tty devices

file->f_op->fasync::tty_fasync()
-> __tty_fasync()
-> __f_setown()

tty_fasync() has no additional callers but __tty_fasync() has. Note
that __tty_fasync() only calls __f_setown() if the @on argument is
true. It's called from:

file->f_op->release::tty_release()
-> tty_release()
-> __tty_fasync()
-> __f_setown()

tty_release() calls __tty_fasync() with @on false
=> __f_setown() is never called from tty_release().
=> All callers of tty_release() are safe as well.

file->f_op->release::tty_open()
-> tty_release()
-> __tty_fasync()
-> __f_setown()

__tty_hangup() calls __tty_fasync() with @on false
=> __f_setown() is never called from tty_release().
=> All callers of __tty_hangup() are safe as well.

From the callchains it's obvious that (1) and (2) end up getting called
via file->f_op->fasync(). That can happen either through the F_SETFL
fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If
FASYNC is requested and the file isn't already FASYNC then
file->f_op->fasync() is called with @on true which ends up causing both
(1) and (2) to call __f_setown().

(1) and (2) are the only subsystems that call __f_setown() from the
file->f_op->fasync() handler. So both (1) and (2) have been updated to
allocate a struct fown_struct prior to calling fasync_helper() to
register with the fasync infrastructure. That's safe as they both call
fasync_helper() which also does allocations if @on is true.

The other interesting case are file leases:

(3) file leases
lease_manager_ops->lm_setup::lease_setup()
-> __f_setown()

Which in turn is called from:

generic_add_lease()
-> lease_manager_ops->lm_setup::lease_setup()
-> __f_setown()

So here again we can simply make generic_add_lease() allocate struct
fown_struct prior to the lease_manager_ops->lm_setup::lease_setup()
which happens under a spinlock.

With that the two remaining subsystems that call __f_setown() are:

(4) dnotify
(5) sockets

Both have their own custom ioctls to set struct fown_struct and both
have been converted to allocate a struct fown_struct on demand from
their respective ioctls.

Interactions with O_PATH are fine as well e.g., when opening a /dev/tty
as O_PATH then no file->f_op->open() happens thus no file->f_owner is
allocated. That's fine as no file operation will be set for those and
the device has never been opened. fcntl()s called on such things will
just allocate a ->f_owner on demand. Although I have zero idea why'd you
care about f_owner on an O_PATH fd.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# de7007f2 27-Jul-2024 Mohit0404 <[email protected]>

Fixed: fs: file_table_c: Missing blank line warnings and struct declaration improved

Fixed-
WARNING: Missing a blank line after declarations
WARNING: Missing a blank line after declarations
Decla

Fixed: fs: file_table_c: Missing blank line warnings and struct declaration improved

Fixed-
WARNING: Missing a blank line after declarations
WARNING: Missing a blank line after declarations
Declaration format: improved struct file declaration format

Signed-off-by: Mohit0404 <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 78eb4ea2 24-Jul-2024 Joel Granados <[email protected]>

sysctl: treewide: constify the ctl_table argument of proc_handlers

const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ct

sysctl: treewide: constify the ctl_table argument of proc_handlers

const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.

This patch has been generated by the following coccinelle script:

```
virtual patch

@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);

@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }

@r3@
identifier func;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);

@r4@
identifier func, ctl;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);

@r5@
identifier func, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);

```

* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.

* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.

Co-developed-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Thomas Weißschuh <[email protected]>
Co-developed-by: Joel Granados <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4
# bac0a9e5 08-Feb-2024 Christian Brauner <[email protected]>

file: add alloc_file_pseudo_noaccount()

When we open block devices as files we want to make sure to not charge
them against the open file limit of the caller as that can cause
spurious failures.

Li

file: add alloc_file_pseudo_noaccount()

When we open block devices as files we want to make sure to not charge
them against the open file limit of the caller as that can cause
spurious failures.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 0873add0 08-Feb-2024 Christian Brauner <[email protected]>

file: prepare for new helper

In order to add a helper to open files that aren't accounted split
alloc_file() and parts of alloc_file_pseudo() into helpers. One to
prepare a path, another one to setu

file: prepare for new helper

In order to add a helper to open files that aren't accounted split
alloc_file() and parts of alloc_file_pseudo() into helpers. One to
prepare a path, another one to setup the file.

Suggested-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# cd3cec0a 15-Feb-2024 Roberto Sassu <[email protected]>

ima: Move to LSM infrastructure

Move hardcoded IMA function calls (not appraisal-specific functions) from
various places in the kernel to the LSM infrastructure, by introducing a
new LSM named 'ima'

ima: Move to LSM infrastructure

Move hardcoded IMA function calls (not appraisal-specific functions) from
various places in the kernel to the LSM infrastructure, by introducing a
new LSM named 'ima' (at the end of the LSM list and always enabled like
'integrity').

Having IMA before EVM in the Makefile is sufficient to preserve the
relative order of the new 'ima' LSM in respect to the upcoming 'evm' LSM,
and thus the order of IMA and EVM function calls as when they were
hardcoded.

Make moved functions as static (except ima_post_key_create_or_update(),
which is not in ima_main.c), and register them as implementation of the
respective hooks in the new function init_ima_lsm().

Select CONFIG_SECURITY_PATH, to ensure that the path-based LSM hook
path_post_mknod is always available and ima_post_path_mknod() is always
executed to mark files as new, as before the move.

A slight difference is that IMA and EVM functions registered for the
inode_post_setattr, inode_post_removexattr, path_post_mknod,
inode_post_create_tmpfile, inode_post_set_acl and inode_post_remove_acl
won't be executed for private inodes. Since those inodes are supposed to be
fs-internal, they should not be of interest to IMA or EVM. The S_PRIVATE
flag is used for anonymous inodes, hugetlbfs, reiserfs xattrs, XFS scrub
and kernel-internal tmpfs files.

Conditionally register ima_post_key_create_or_update() if
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS is enabled. Also, conditionally register
ima_kernel_module_request() if CONFIG_INTEGRITY_ASYMMETRIC_KEYS is enabled.

Finally, add the LSM_ID_IMA case in lsm_list_modules_test.c.

Signed-off-by: Roberto Sassu <[email protected]>
Acked-by: Chuck Lever <[email protected]>
Acked-by: Casey Schaufler <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Reviewed-by: Mimi Zohar <[email protected]>
Acked-by: Mimi Zohar <[email protected]>
Signed-off-by: Paul Moore <[email protected]>

show more ...


# f09068b5 15-Feb-2024 Roberto Sassu <[email protected]>

security: Introduce file_release hook

In preparation for moving IMA and EVM to the LSM infrastructure, introduce
the file_release hook.

IMA calculates at file close the new digest of the file conte

security: Introduce file_release hook

In preparation for moving IMA and EVM to the LSM infrastructure, introduce
the file_release hook.

IMA calculates at file close the new digest of the file content and writes
it to security.ima, so that appraisal at next file access succeeds.

The new hook cannot return an error and cannot cause the operation to be
reverted.

Signed-off-by: Roberto Sassu <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Reviewed-by: Mimi Zohar <[email protected]>
Signed-off-by: Paul Moore <[email protected]>

show more ...


Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3
# 9d5b9475 21-Nov-2023 Joel Granados <[email protected]>

fs: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels

fs: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)

Remove sentinel elements ctl_table struct. Special attention was placed in
making sure that an empty directory for fs/verity was created when
CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the
register sysctl call that expects a size.

Signed-off-by: Joel Granados <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>

show more ...


# 372a34e6 30-Nov-2023 Christian Brauner <[email protected]>

fs: replace f_rcuhead with f_task_work

The naming is actively misleading since we switched to
SLAB_TYPESAFE_BY_RCU. rcu_head is #define callback_head. Use
callback_head directly and rename f_rcuhead

fs: replace f_rcuhead with f_task_work

The naming is actively misleading since we switched to
SLAB_TYPESAFE_BY_RCU. rcu_head is #define callback_head. Use
callback_head directly and rename f_rcuhead to f_task_work.

Add comments in there to explain what it's used for.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 7cb537b6 26-Nov-2023 Al Viro <[email protected]>

file: massage cleanup of files that failed to open

A file that has never gotten FMODE_OPENED will never have RCU-accessed
references, its final fput() is equivalent to file_free() and if it
doesn't

file: massage cleanup of files that failed to open

A file that has never gotten FMODE_OPENED will never have RCU-accessed
references, its final fput() is equivalent to file_free() and if it
doesn't have FMODE_BACKING either, it can be done from any context and
won't need task_work treatment.

Now that we have SLAB_TYPESAFE_BY_RCU we can simplify this and have
other callers benefit. All of that can be achieved easier is to make
fput() recoginze that case and call file_free() directly.

No need to introduce a special primitive for that. It also allowed
things like failing dentry_open() could benefit from that as well.

Signed-off-by: Al Viro <[email protected]>
[Christian Brauner <[email protected]>: massage commit message]
Link: https://lore.kernel.org/r/20231126020834.GC38156@ZenIV
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.7-rc2, v6.7-rc1
# 9024b4c9 12-Nov-2023 Al Viro <[email protected]>

d_alloc_pseudo(): move setting ->d_op there from the (sole) caller

Signed-off-by: Al Viro <[email protected]>


12345678