|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2 |
|
| #
5be1fa8a |
| 08-Dec-2024 |
Al Viro <[email protected]> |
Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies f
Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed.
It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all.
It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances.
This commit only changes the calling conventions; making use of supplied values is left to followups.
NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs).
One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len.
Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4 |
|
| #
197231da |
| 18-Oct-2024 |
Thorsten Blum <[email protected]> |
proc: Fix W=1 build kernel-doc warning
Building the kernel with W=1 generates the following warning:
fs/proc/fd.c:81: warning: This comment starts with '/**', but isn't a kerne
proc: Fix W=1 build kernel-doc warning
Building the kernel with W=1 generates the following warning:
fs/proc/fd.c:81: warning: This comment starts with '/**', but isn't a kernel-doc comment.
Use a normal comment for the helper function proc_fdinfo_permission().
Signed-off-by: Thorsten Blum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
8fd3395e |
| 31-Jul-2024 |
Al Viro <[email protected]> |
get rid of ...lookup...fdget_rcu() family
Once upon a time, predecessors of those used to do file lookup without bumping a refcount, provided that caller held rcu_read_lock() across the lookup and w
get rid of ...lookup...fdget_rcu() family
Once upon a time, predecessors of those used to do file lookup without bumping a refcount, provided that caller held rcu_read_lock() across the lookup and whatever it wanted to read from the struct file found. When struct file allocation switched to SLAB_TYPESAFE_BY_RCU, that stopped being feasible and these primitives started to bump the file refcount for lookup result, requiring the caller to call fput() afterwards.
But that turned them pointless - e.g. rcu_read_lock(); file = lookup_fdget_rcu(fd); rcu_read_unlock(); is equivalent to file = fget_raw(fd); and all callers of lookup_fdget_rcu() are of that form. Similarly, task_lookup_fdget_rcu() calls can be replaced with calling fget_task(). task_lookup_next_fdget_rcu() doesn't have direct counterparts, but its callers would be happier if we replaced it with an analogue that deals with RCU internally.
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
698e7d16 |
| 09-Sep-2024 |
Yan Zhen <[email protected]> |
proc: Fix typo in the comment
The deference here confuses me.
Maybe here want to say that because show_fd_locks() does not dereference the files pointer, using the stale value of the files pointer
proc: Fix typo in the comment
The deference here confuses me.
Maybe here want to say that because show_fd_locks() does not dereference the files pointer, using the stale value of the files pointer is safe.
Correctly spelled comments make it easier for the reader to understand the code.
replace 'deferences' with 'dereferences' in the comment & replace 'inialized' with 'initialized' in the comment.
Signed-off-by: Yan Zhen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
cf71eaa1 |
| 06-Aug-2024 |
Christian Brauner <[email protected]> |
proc: block mounting on top of /proc/<pid>/fdinfo/*
Entries under /proc/<pid>/fdinfo/* are ephemeral and may go away before the process dies. As such allowing them to be used as mount points creates
proc: block mounting on top of /proc/<pid>/fdinfo/*
Entries under /proc/<pid>/fdinfo/* are ephemeral and may go away before the process dies. As such allowing them to be used as mount points creates the ability to leak mounts that linger until the process dies with no ability to unmount them until then. Don't allow using them as mountpoints.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
74ce2080 |
| 06-Aug-2024 |
Christian Brauner <[email protected]> |
proc: block mounting on top of /proc/<pid>/fd/*
Entries under /proc/<pid>/fd/* are ephemeral and may go away before the process dies. As such allowing them to be used as mount points creates the abi
proc: block mounting on top of /proc/<pid>/fd/*
Entries under /proc/<pid>/fd/* are ephemeral and may go away before the process dies. As such allowing them to be used as mount points creates the ability to leak mounts that linger until the process dies with no ability to unmount them until then. Don't allow using them as mountpoints.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
55d4860d |
| 06-Aug-2024 |
Christian Brauner <[email protected]> |
proc: proc_readfdinfo() -> proc_fdinfo_iterate()
Give the method to iterate through the fdinfo directory a better name.
Link: https://lore.kernel.org/r/20240806-work-procfs-v1-2-fb04e1d09f0c@kernel
proc: proc_readfdinfo() -> proc_fdinfo_iterate()
Give the method to iterate through the fdinfo directory a better name.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
b69181b8 |
| 06-Aug-2024 |
Christian Brauner <[email protected]> |
proc: proc_readfd() -> proc_fd_iterate()
Give the method to iterate through the fd directory a better name.
Link: https://lore.kernel.org/r/[email protected] Reviewe
proc: proc_readfd() -> proc_fd_iterate()
Give the method to iterate through the fd directory a better name.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
0a960ba4 |
| 01-May-2024 |
Tyler Hicks (Microsoft) <[email protected]> |
proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
The following commits loosened the permissions of /proc/<PID>/fdinfo/ directory, as well as the files within it, from 05
proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
The following commits loosened the permissions of /proc/<PID>/fdinfo/ directory, as well as the files within it, from 0500 to 0555 while also introducing a PTRACE_MODE_READ check between the current task and <PID>'s task:
- commit 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ") - commit 1927e498aee1 ("procfs: prevent unprivileged processes accessing fdinfo dir")
Before those changes, inode based system calls like inotify_add_watch(2) would fail when the current task didn't have sufficient read permissions:
[...] lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0 inotify_add_watch(64, "/proc/1/task/1/fdinfo", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE| IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = -1 EACCES (Permission denied) [...]
This matches the documented behavior in the inotify_add_watch(2) man page:
ERRORS EACCES Read access to the given file is not permitted.
After those changes, inotify_add_watch(2) started succeeding despite the current task not having PTRACE_MODE_READ privileges on the target task:
[...] lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 inotify_add_watch(64, "/proc/1/task/1/fdinfo", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE| IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = 1757 openat(AT_FDCWD, "/proc/1/task/1/fdinfo", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 EACCES (Permission denied) [...]
This change in behavior broke .NET prior to v7. See the github link below for the v7 commit that inadvertently/quietly (?) fixed .NET after the kernel changes mentioned above.
Return to the old behavior by moving the PTRACE_MODE_READ check out of the file .open operation and into the inode .permission operation:
[...] lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 inotify_add_watch(64, "/proc/1/task/1/fdinfo", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE| IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = -1 EACCES (Permission denied) [...]
Reported-by: Kevin Parsons (Microsoft) <[email protected]> Link: https://github.com/dotnet/runtime/commit/89e5469ac591b82d38510fe7de98346cce74ad4f Link: https://stackoverflow.com/questions/75379065/start-self-contained-net6-build-exe-as-service-on-raspbian-system-unauthorizeda Fixes: 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ") Cc: [email protected] Cc: Christian Brauner <[email protected]> Cc: Christian König <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Hardik Garg <[email protected]> Cc: Allen Pais <[email protected]> Signed-off-by: Tyler Hicks (Microsoft) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7 |
|
| #
f60d374d |
| 05-Jan-2024 |
Al Viro <[email protected]> |
close_on_exec(): pass files_struct instead of fdtable
both callers are happier that way...
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
|
|
Revision tags: v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4 |
|
| #
0ede61d8 |
| 29-Sep-2023 |
Christian Brauner <[email protected]> |
file: convert to SLAB_TYPESAFE_BY_RCU
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which
file: convert to SLAB_TYPESAFE_BY_RCU
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based freeing for files completely. This is a pretty sensitive change overall but it might actually be worth doing.
The main downside is the subtlety. The other one is that we should really wait for Jann's patch to land that enables KASAN to handle SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this exists.
With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times which requires a few changes. So it isn't sufficient anymore to just acquire a reference to the file in question under rcu using atomic_long_inc_not_zero() since the file might have already been recycled and someone else might have bumped the reference.
In other words, callers might see reference count bumps from newer users. For this reason it is necessary to verify that the pointer is the same before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu().
In addition, it isn't possible to access or check fields in struct file without first aqcuiring a reference on it. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a reference under rcu or they must hold the files_lock of the fdtable. Failing to do either one of this is a bug.
Thanks to Jann for pointing out that we need to ensure memory ordering between reallocations and pointer check by ensuring that all subsequent loads have a dependency on the second load in get_file_rcu() and providing a fixup that was folded into this patch.
Cc: Jann Horn <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6 |
|
| #
0d72b928 |
| 07-Aug-2023 |
Jeff Layton <[email protected]> |
fs: pass the request_mask to generic_fillattr
generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE
fs: pass the request_mask to generic_fillattr
generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps.
Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr.
Acked-by: Joseph Qi <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: "Paulo Alcantara (SUSE)" <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4 |
|
| #
4609e1f1 |
| 13-Jan-2023 |
Christian Brauner <[email protected]> |
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is j
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap.
Acked-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
show more ...
|
| #
b74d24f7 |
| 13-Jan-2023 |
Christian Brauner <[email protected]> |
fs: port ->getattr() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just
fs: port ->getattr() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap.
Acked-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6 |
|
| #
5970e15d |
| 20-Nov-2022 |
Jeff Layton <[email protected]> |
filelock: move file locking definitions to separate header file
The file locking definitions have lived in fs.h since the dawn of time, but they are only used by a small subset of the source files t
filelock: move file locking definitions to separate header file
The file locking definitions have lived in fs.h since the dawn of time, but they are only used by a small subset of the source files that include it.
Move the file locking definitions to a new header file, and add the appropriate #include directives to the source files that need them. By doing this we trim down fs.h a bit and limit the amount of rebuilding that has to be done when we make changes to the file locking APIs.
Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: Christian Brauner (Microsoft) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: David Howells <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Acked-by: Chuck Lever <[email protected]> Acked-by: Joseph Qi <[email protected]> Acked-by: Steve French <[email protected]> Acked-by: Al Viro <[email protected]> Acked-by: Darrick J. Wong <[email protected]> Signed-off-by: Jeff Layton <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7 |
|
| #
f1f1f256 |
| 22-Sep-2022 |
Ivan Babrou <[email protected]> |
proc: report open files as size in stat() for /proc/pid/fd
Many monitoring tools include open file count as a metric. Currently the only way to get this number is to enumerate the files in /proc/pi
proc: report open files as size in stat() for /proc/pid/fd
Many monitoring tools include open file count as a metric. Currently the only way to get this number is to enumerate the files in /proc/pid/fd.
The problem with the current approach is that it does many things people generally don't care about when they need one number for a metric. In our tests for cadvisor, which reports open file counts per cgroup, we observed that reading the number of open files is slow. Out of 35.23% of CPU time spent in `proc_readfd_common`, we see 29.43% spent in `proc_fill_cache`, which is responsible for filling dentry info. Some of this extra time is spinlock contention, but it's a contention for the lock we don't want to take to begin with.
We considered putting the number of open files in /proc/pid/status. Unfortunately, counting the number of fds involves iterating the open_files bitmap, which has a linear complexity in proportion with the number of open files (bitmap slots really, but it's close). We don't want to make /proc/pid/status any slower, so instead we put this info in /proc/pid/fd as a size member of the stat syscall result. Previously the reported number was zero, so there's very little risk of breaking anything, while still providing a somewhat logical way to count the open files with a fallback if it's zero.
RFC for this patch included iterating open fds under RCU. Thanks to Frank Hofmann for the suggestion to use the bitmap instead.
Previously:
``` $ sudo stat /proc/1/fd | head -n2 File: /proc/1/fd Size: 0 Blocks: 0 IO Block: 1024 directory ```
With this patch:
``` $ sudo stat /proc/1/fd | head -n2 File: /proc/1/fd Size: 65 Blocks: 0 IO Block: 1024 directory ```
Correctness check:
``` $ sudo ls /proc/1/fd | wc -l 65 ```
I added the docs for /proc/<pid>/fd while I'm at it.
[[email protected]: use bitmap_weight() to count the bits] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: include linux/bitmap.h for bitmap_weight()] [[email protected]: return errno from proc_fd_getattr() instead of setting negative size] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ivan Babrou <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Al Viro <[email protected]> Cc: Christoph Anton Mitterer <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Laight <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Theodore Ts'o <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7 |
|
| #
1927e498 |
| 10-May-2022 |
Kalesh Singh <[email protected]> |
procfs: prevent unprivileged processes accessing fdinfo dir
The file permissions on the fdinfo dir from were changed from S_IRUSR|S_IXUSR to S_IRUGO|S_IXUGO, and a PTRACE_MODE_READ check was added f
procfs: prevent unprivileged processes accessing fdinfo dir
The file permissions on the fdinfo dir from were changed from S_IRUSR|S_IXUSR to S_IRUGO|S_IXUGO, and a PTRACE_MODE_READ check was added for opening the fdinfo files [1]. However, the ptrace permission check was not added to the directory, allowing anyone to get the open FD numbers by reading the fdinfo directory.
Add the missing ptrace permission check for opening the fdinfo directory.
[1] https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected] Fixes: 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ") Signed-off-by: Kalesh Singh <[email protected]> Cc: Kees Cook <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Hridya Valsaraju <[email protected]> Cc: Jann Horn <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1 |
|
| #
3845f256 |
| 01-Jul-2021 |
Kalesh Singh <[email protected]> |
procfs/dmabuf: add inode number to /proc/*/fdinfo
And 'ino' field to /proc/<pid>/fdinfo/<FD> and /proc/<pid>/task/<tid>/fdinfo/<FD>.
The inode numbers can be used to uniquely identify DMA buffers i
procfs/dmabuf: add inode number to /proc/*/fdinfo
And 'ino' field to /proc/<pid>/fdinfo/<FD> and /proc/<pid>/task/<tid>/fdinfo/<FD>.
The inode numbers can be used to uniquely identify DMA buffers in user space and avoids a dependency on /proc/<pid>/fd/* when accounting per-process DMA buffer sizes.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kalesh Singh <[email protected]> Acked-by: Randy Dunlap <[email protected]> Acked-by: Christian König <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Kees Cook <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Hridya Valsaraju <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: Szabolcs Nagy <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Bernd Edlinger <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Helge Deller <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
7bc3fa01 |
| 01-Jul-2021 |
Kalesh Singh <[email protected]> |
procfs: allow reading fdinfo with PTRACE_MODE_READ
Android captures per-process system memory state when certain low memory events (e.g a foreground app kill) occur, to identify potential memory hog
procfs: allow reading fdinfo with PTRACE_MODE_READ
Android captures per-process system memory state when certain low memory events (e.g a foreground app kill) occur, to identify potential memory hoggers. In order to measure how much memory a process actually consumes, it is necessary to include the DMA buffer sizes for that process in the memory accounting. Since the handle to DMA buffers are raw FDs, it is important to be able to identify which processes have FD references to a DMA buffer.
Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and /proc/<pid>/fdinfo -- both are only readable by the process owner, as follows:
1. Do a readlink on each FD. 2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD. 3. stat the file to get the dmabuf inode number. 4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size.
Accessing other processes' fdinfo requires root privileges. This limits the use of the interface to debugging environments and is not suitable for production builds. Granting root privileges even to a system process increases the attack surface and is highly undesirable.
Since fdinfo doesn't permit reading process memory and manipulating process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kalesh Singh <[email protected]> Suggested-by: Jann Horn <[email protected]> Acked-by: Christian König <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Bernd Edlinger <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hridya Valsaraju <[email protected]> Cc: James Morris <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Szabolcs Nagy <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5 |
|
| #
549c7297 |
| 21-Jan-2021 |
Christian Brauner <[email protected]> |
fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has b
fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches.
As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods.
Link: https://lore.kernel.org/r/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
47291baa |
| 21-Jan-2021 |
Christian Brauner <[email protected]> |
namei: make permission helpers idmapped mount aware
The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller
namei: make permission helpers idmapped mount aware
The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then peform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before.
Link: https://lore.kernel.org/r/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: James Morris <[email protected]> Acked-by: Serge Hallyn <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5 |
|
| #
775e0656 |
| 20-Nov-2020 |
Eric W. Biederman <[email protected]> |
proc/fd: In fdinfo seq_show don't use get_files_struct
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_file
proc/fd: In fdinfo seq_show don't use get_files_struct
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance.
Instead hold task_lock for the duration that task->files needs to be stable in seq_show. The task_lock was already taken in get_files_struct, and so skipping get_files_struct performs less work overall, and avoids the problems with the files_struct reference count.
[1] https://lkml.kernel.org/r/[email protected] Suggested-by: Oleg Nesterov <[email protected]> Acked-by: Christian Brauner <[email protected]> v1: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
show more ...
|
| #
5b17b618 |
| 20-Nov-2020 |
Eric W. Biederman <[email protected]> |
proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_
proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance.
Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving the checking for the maximum file descritor into the generic code, and by remvoing the need for capturing and releasing a reference on files_struct.
As task_lookup_fd_rcu may update the fd ctx->pos has been changed to be the fd +2 after task_lookup_fd_rcu returns.
[1] https://lkml.kernel.org/r/[email protected] Suggested-by: Oleg Nesterov <[email protected]> Tested-by: Andy Lavr <[email protected]> v1: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
show more ...
|
| #
64eb661f |
| 20-Nov-2020 |
Eric W. Biederman <[email protected]> |
proc/fd: In tid_fd_mode use task_lookup_fd_rcu
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct
proc/fd: In tid_fd_mode use task_lookup_fd_rcu
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance.
Instead of manually coding finding the files struct for a task and then calling files_lookup_fd_rcu, use the helper task_lookup_fd_rcu that combines those to steps. Making the code simpler and removing the need to get a reference on a files_struct.
[1] https://lkml.kernel.org/r/[email protected] Suggested-by: Oleg Nesterov <[email protected]> Acked-by: Christian Brauner <[email protected]> v1: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
show more ...
|
| #
f36c2943 |
| 20-Nov-2020 |
Eric W. Biederman <[email protected]> |
file: Replace fcheck_files with files_lookup_fd_rcu
This change renames fcheck_files to files_lookup_fd_rcu. All of the remaining callers take the rcu_read_lock before calling this function so the
file: Replace fcheck_files with files_lookup_fd_rcu
This change renames fcheck_files to files_lookup_fd_rcu. All of the remaining callers take the rcu_read_lock before calling this function so the _rcu suffix is appropriate. This change also tightens up the debug check to verify that all callers hold the rcu_read_lock.
All callers that used to call files_check with the files->file_lock held have now been changed to call files_lookup_fd_locked.
This change of name has helped remind me of which locks and which guarantees are in place helping me to catch bugs later in the patchset.
The need for better names became apparent in the last round of discussion of this set of changes[1].
[1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
show more ...
|