|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3 |
|
| #
c86b300b |
| 14-Apr-2025 |
Christian Brauner <[email protected]> |
fs: add kern_path_locked_negative()
The audit code relies on the fact that kern_path_locked() returned a path even for a negative dentry. If it doesn't find a valid dentry it immediately calls:
fs: add kern_path_locked_negative()
The audit code relies on the fact that kern_path_locked() returned a path even for a negative dentry. If it doesn't find a valid dentry it immediately calls:
audit_find_parent(d_backing_inode(parent_path.dentry));
which assumes that parent_path.dentry is still valid. But it isn't since kern_path_locked() has been changed to path_put() also for a negative dentry.
Fix this by adding a helper that implements the required audit semantics and allows us to fix the immediate bleeding. We can find a unified solution for this afterwards.
Link: https://lore.kernel.org/20250414-rennt-wimmeln-f186c3a780f1@brauner Fixes: 1c3cb50b58c3 ("VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry") Reported-and-tested-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
2c3230fb |
| 06-Feb-2025 |
NeilBrown <[email protected]> |
VFS: repack LOOKUP_ bit flags.
The LOOKUP_ bits are not in order, which can make it awkward when adding new bits. Two bits have recently been added to the end which makes them look like "scoping fl
VFS: repack LOOKUP_ bit flags.
The LOOKUP_ bits are not in order, which can make it awkward when adding new bits. Two bits have recently been added to the end which makes them look like "scoping flags", but in fact they aren't.
Also LOOKUP_PARENT is described as "internal use only" but is used in fs/nfs/
This patch: - Moves these three flags into the "pathwalk mode" section - changes all bits to use the BIT(n) macro - Allocates bits in order leaving gaps between the sections, and documents those gaps.
Signed-off-by: NeilBrown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
dff60734 |
| 04-Jun-2024 |
Mateusz Guzik <[email protected]> |
vfs: retire user_path_at_empty and drop empty arg from getname_flags
No users after do_readlinkat started doing the job on its own.
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lo
vfs: retire user_path_at_empty and drop empty arg from getname_flags
No users after do_readlinkat started doing the job on its own.
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
42bd2af5 |
| 11-Apr-2024 |
Linus Torvalds <[email protected]> |
vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
"The definition of insanity is doing the same thing over and over again and expecting different results”
We've tried to do this
vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
"The definition of insanity is doing the same thing over and over again and expecting different results”
We've tried to do this before, most recently with commit bb2314b47996 ("fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink") about a decade ago.
But the effort goes back even further than that, eg this thread back from 1998 that is so old that we don't even have it archived in lore:
https://lkml.org/lkml/1998/3/10/108
which also points out some of the reasons why it's dangerous.
Or, how about then in 2003:
https://lkml.org/lkml/2003/4/6/112
where we went through some of the same arguments, just wirh different people involved.
In particular, having access to a file descriptor does not necessarily mean that you have access to the path that was used for lookup, and there may be very good reasons why you absolutely must not have access to a path to said file.
For example, if we were passed a file descriptor from the outside into some limited environment (think chroot, but also user namespaces etc) a 'flink()' system call could now make that file visible inside a context where it's not supposed to be visible.
In the process the user may also be able to re-open it with permissions that the original file descriptor did not have (eg a read-only file descriptor may be associated with an underlying file that is writable).
Another variation on this is if somebody else (typically root) opens a file in a directory that is not accessible to others, and passes the file descriptor on as a read-only file. Again, the access to the file descriptor does not imply that you should have access to a path to the file in the filesystem.
So while we have tried this several times in the past, it never works.
The last time we did this, that commit bb2314b47996 quickly got reverted again in commit f0cc6ffb8ce8 (Revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink"), with a note saying "We may re-do this once the whole discussion about the interface is done".
Well, the discussion is long done, and didn't come to any resolution. There's no question that 'flink()' would be a useful operation, but it's a dangerous one.
However, it does turn out that since 2008 (commit d76b0d9b2d87: "CRED: Use creds in file structs") we have had a fairly straightforward way to check whether the file descriptor was opened by the same credentials as the credentials of the flink().
That allows the most common patterns that people want to use, which tend to be to either open the source carefully (ie using the openat2() RESOLVE_xyz flags, and/or checking ownership with fstat() before linking), or to use O_TMPFILE and fill in the file contents before it's exposed to the world with linkat().
But it also means that if the file descriptor was opened by somebody else, or we've gone through a credentials change since, the operation no longer works (unless we have CAP_DAC_READ_SEARCH capabilities in the opener's user namespace, as before).
Note that the credential equality check is done by using pointer equality, which means that it's not enough that you have effectively the same user - they have to be literally identical, since our credentials are using copy-on-write semantics.
So you can't change your credentials to something else and try to change it back to the same ones between the open() and the linkat(). This is not meant to be some kind of generic permission check, this is literally meant as a "the open and link calls are 'atomic' wrt user credentials" check.
It also means that you can't just move things between namespaces, because the credentials aren't just a list of uid's and gid's: they includes the pointer to the user_ns that the capabilities are relative to.
So let's try this one more time and see if maybe this approach ends up being workable after all.
Cc: Andrew Lutomirski <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Peter Anvin <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected] [brauner: relax capability check to opener of the file] Link: https://lore.kernel.org/all/20231113-undenkbar-gediegen-efde5f1c34bc@brauner Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2 |
|
| #
74d016ec |
| 16-Nov-2023 |
Al Viro <[email protected]> |
new helper: user_path_locked_at()
Equivalent of kern_path_locked() taking dfd/userland name. User introduced in the next commit.
Signed-off-by: Al Viro <[email protected]>
|
|
Revision tags: v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5 |
|
| #
95e93d17 |
| 04-Oct-2023 |
Mateusz Guzik <[email protected]> |
vfs: predict the error in retry_estale as unlikely
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian
vfs: predict the error in retry_estale as unlikely
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc4, v6.6-rc3, v6.6-rc2 |
|
| #
5aa8fd9c |
| 12-Sep-2023 |
Jeff Layton <[email protected]> |
fs: add a new SB_I_NOUMASK flag
SB_POSIXACL must be set when a filesystem supports POSIX ACLs, but NFSv4 also sets this flag to prevent the VFS from applying the umask on newly-created files. NFSv4
fs: add a new SB_I_NOUMASK flag
SB_POSIXACL must be set when a filesystem supports POSIX ACLs, but NFSv4 also sets this flag to prevent the VFS from applying the umask on newly-created files. NFSv4 doesn't support POSIX ACLs however, which causes confusion when other subsystems try to test for them.
Add a new SB_I_NOUMASK flag that allows filesystems to opt-in to umask stripping without advertising support for POSIX ACLs. Set the new flag on NFSv4 instead of SB_POSIXACL.
Also, move mode_strip_umask to namei.h and convert init_mknod and init_mkdir to use it.
Signed-off-by: Jeff Layton <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3 |
|
| #
74d7970f |
| 21-Apr-2023 |
Namjae Jeon <[email protected]> |
ksmbd: fix racy issue from using ->d_parent and ->d_name
Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_renam
ksmbd: fix racy issue from using ->d_parent and ->d_name
Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child() to lock stable parent while underlying rename racy. Introduce vfs_path_parent_lookup helper to avoid out of share access and export vfs functions like the following ones to use vfs_path_parent_lookup(). - rename __lookup_hash() to lookup_one_qstr_excl(). - export lookup_one_qstr_excl(). - export getname_kernel() and putname().
vfs_path_parent_lookup() is used for parent lookup of destination file using absolute pathname given from FILE_RENAME_INFORMATION request.
Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3 |
|
| #
9bc37e04 |
| 15-Mar-2023 |
Al Viro <[email protected]> |
fs: introduce lock_rename_child() helper
Pass the dentry of a source file and the dentry of a destination directory to lock parent inodes for rename. As soon as this function returns, ->d_parent of
fs: introduce lock_rename_child() helper
Pass the dentry of a source file and the dentry of a destination directory to lock parent inodes for rename. As soon as this function returns, ->d_parent of the source file dentry is stable and inodes are properly locked for calling vfs-rename. This helper is needed for ksmbd server. rename request of SMB protocol has to rename an opened file, no matter which directory it's in.
Signed-off-by: Al Viro <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
211db0ac |
| 15-Mar-2023 |
Namjae Jeon <[email protected]> |
ksmbd: remove internal.h include
Since vfs_path_lookup is exported, It should not be internal. Move vfs_path_lookup prototype in internal.h to linux/namei.h.
Suggested-by: Al Viro <[email protected]
ksmbd: remove internal.h include
Since vfs_path_lookup is exported, It should not be internal. Move vfs_path_lookup prototype in internal.h to linux/namei.h.
Suggested-by: Al Viro <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1 |
|
| #
e1f19857 |
| 07-Dec-2022 |
Richard Weinberger <[email protected]> |
fs: namei: Allow follow_down() to uncover auto mounts
This function is only used by NFSD to cross mount points. If a mount point is of type auto mount, follow_down() will not uncover it. Add LOOKUP_
fs: namei: Allow follow_down() to uncover auto mounts
This function is only used by NFSD to cross mount points. If a mount point is of type auto mount, follow_down() will not uncover it. Add LOOKUP_AUTOMOUNT to the lookup flags to have ->d_automount() called when NFSD walks down the mount tree.
Signed-off-by: Richard Weinberger <[email protected]> Reviewed-by: Ian Kent <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Acked-by: Al Viro <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
show more ...
|
| #
4609e1f1 |
| 13-Jan-2023 |
Christian Brauner <[email protected]> |
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is j
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap.
Acked-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1 |
|
| #
ea4af4aa |
| 04-Aug-2022 |
Al Viro <[email protected]> |
nd_jump_link(): constify path
Reviewed-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Al Viro <[email protected]>
|
|
Revision tags: v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2 |
|
| #
00675017 |
| 04-Apr-2022 |
Christian Brauner <[email protected]> |
fs: add two trivial lookup helpers
Similar to the addition of lookup_one() add a version of lookup_one_unlocked() and lookup_one_positive_unlocked() that take idmapped mounts into account. This is r
fs: add two trivial lookup helpers
Similar to the addition of lookup_one() add a version of lookup_one_unlocked() and lookup_one_positive_unlocked() that take idmapped mounts into account. This is required to port overlay to support idmapped base layers.
Cc: <[email protected]> Tested-by: Giuseppe Scrivano <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4 |
|
| #
c2fd68b6 |
| 27-Jul-2021 |
Christian Brauner <[email protected]> |
namei: add mapping aware lookup helper
Various filesystems rely on the lookup_one_len() helper to lookup a single path component relative to a well-known starting point. Allow such filesystems to su
namei: add mapping aware lookup helper
Various filesystems rely on the lookup_one_len() helper to lookup a single path component relative to a well-known starting point. Allow such filesystems to support idmapped mounts by adding a version of this helper to take the idmap into account when calling inode_permission(). This change is a required to let btrfs (and other filesystems) support idmapped mounts.
Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: [email protected] Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: David Sterba <[email protected]>
show more ...
|
|
Revision tags: v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6 |
|
| #
bcba1e7d |
| 02-Apr-2021 |
Al Viro <[email protected]> |
take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space
Separate field in nameidata (nd->state) holding the flags that should be internal-only - that way we both get some spare bits in LOOKUP
take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space
Separate field in nameidata (nd->state) holding the flags that should be internal-only - that way we both get some spare bits in LOOKUP_... and get simpler rules for nd->root lifetime rules, since we can set the replacement of LOOKUP_ROOT (ND_ROOT_PRESET) at the same time we set nd->root.
Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1 |
|
| #
6c6ec2b0 |
| 17-Dec-2020 |
Jens Axboe <[email protected]> |
fs: add support for LOOKUP_CACHED
io_uring always punts opens to async context, since there's no control over whether the lookup blocks or not. Add LOOKUP_CACHED to support just doing the fast RCU b
fs: add support for LOOKUP_CACHED
io_uring always punts opens to async context, since there's no control over whether the lookup blocks or not. Add LOOKUP_CACHED to support just doing the fast RCU based lookups, which we know will not block. If we can do a cached path resolution of the filename, then we don't have to always punt lookups for a worker.
During path resolution, we always do LOOKUP_RCU first. If that fails and we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.
Cc: Al Viro <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7 |
|
| #
b4c03536 |
| 19-Jan-2020 |
Al Viro <[email protected]> |
sanitize handling of nd->last_type, kill LAST_BIND
->last_type values are set in 3 places: path_init() (sets to LAST_ROOT), link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).
The are
sanitize handling of nd->last_type, kill LAST_BIND
->last_type values are set in 3 places: path_init() (sets to LAST_ROOT), link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).
The are checked in walk_component(), lookup_last() and do_last(). They also get copied to the caller by filename_parentat(). In the last 3 cases the value is what we had at the return from link_path_walk(). In case of walk_component() it's either directly downstream from assignment in link_path_walk() or, when called by lookup_last(), the value we have at the return from link_path_walk().
The value at the entry into link_path_walk() can survive to return only if the pathname contains nothing but slashes. Note that pick_link() never returns such - pure jumps are handled directly. So for the calls of link_path_walk() for trailing symlinks it does not matter what value had been there at the entry; the value at the return won't depend upon it.
There are 3 call chains that might have pick_link() storing LAST_BIND:
1) pick_link() from step_into() from walk_component() from link_path_walk(). In that case we will either be parsing the next component immediately after return into link_path_walk(), which will overwrite the ->last_type before anyone has a chance to look at it, or we'll fail, in which case nobody will be looking at ->last_type at all.
2) pick_link() from step_into() from walk_component() from lookup_last(). The value is never looked at due to the above; it won't affect the value seen at return from any link_path_walk().
3) pick_link() from step_into() from do_last(). Ditto.
In other words, assignemnt in pick_link() is pointless, and so is LAST_BIND itself; nothing ever looks at that value. Kill it off. And make link_path_walk() _always_ assign ->last_type - in the only case when the value at the entry might survive to the return that value is always LAST_ROOT, inherited from path_init(). Move that assignment from path_init() into the beginning of link_path_walk(), to consolidate the things.
Historical note: LAST_BIND used to be used for the kludge with trailing pure jump symlinks (extra iteration through the top-level loop). No point keeping it anymore...
Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v5.5-rc6 |
|
| #
161aff1d |
| 12-Jan-2020 |
Al Viro <[email protected]> |
LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
New LOOKUP flag, telling path_lookupat() to act as path_mountpointat(). IOW, traverse mounts at the final point and skip revalidation
LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
New LOOKUP flag, telling path_lookupat() to act as path_mountpointat(). IOW, traverse mounts at the final point and skip revalidation of the location where it ends up.
Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
c64cd6e3 |
| 10-Jan-2020 |
Al Viro <[email protected]> |
reimplement path_mountpoint() with less magic
... and get rid of a bunch of bugs in it. Background: the reason for path_mountpoint() is that umount() really doesn't want attempts to revalidate the
reimplement path_mountpoint() with less magic
... and get rid of a bunch of bugs in it. Background: the reason for path_mountpoint() is that umount() really doesn't want attempts to revalidate the root of what it's trying to umount. The thing we want to avoid actually happen from complete_walk(); solution was to do something parallel to normal path_lookupat() and it both went overboard and got the boilerplate subtly (and not so subtly) wrong.
A better solution is to do pretty much what the normal path_lookupat() does, but instead of complete_walk() do unlazy_walk(). All it takes to avoid that ->d_weak_revalidate() call... mountpoint_last() goes away, along with everything it got wrong, and so does the magic around LOOKUP_NO_REVAL.
Another source of bugs is that when we traverse mounts at the final location (and we need to do that - umount . expects to get whatever's overmounting ., if any, out of the lookup) we really ought to take care of ->d_manage() - as it is, manual umount of autofs automount in progress can lead to unpleasant surprises for the daemon. Easily solved by using handle_lookup_down() instead of follow_mount().
Tested-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1 |
|
| #
8db52c7e |
| 06-Dec-2019 |
Aleksa Sarai <[email protected]> |
namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
/* Background. */ Container runtimes or other administrative management processes will often interact with root filesystems while in the host mou
namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
/* Background. */ Container runtimes or other administrative management processes will often interact with root filesystems while in the host mount namespace, because the cost of doing a chroot(2) on every operation is too prohibitive (especially in Go, which cannot safely use vfork). However, a malicious program can trick the management process into doing operations on files outside of the root filesystem through careful crafting of symlinks.
Most programs that need this feature have attempted to make this process safe, by doing all of the path resolution in userspace (with symlinks being scoped to the root of the malicious root filesystem). Unfortunately, this method is prone to foot-guns and usually such implementations have subtle security bugs.
Thus, what userspace needs is a way to resolve a path as though it were in a chroot(2) -- with all absolute symlinks being resolved relative to the dirfd root (and ".." components being stuck under the dirfd root). It is much simpler and more straight-forward to provide this functionality in-kernel (because it can be done far more cheaply and correctly).
More classical applications that also have this problem (which have their own potentially buggy userspace path sanitisation code) include web servers, archive extraction tools, network file servers, and so on.
/* Userspace API. */ LOOKUP_IN_ROOT will be exposed to userspace through openat2(2).
/* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_IN_ROOT applies to all components of the path.
With LOOKUP_IN_ROOT, any path component which attempts to cross the starting point of the pathname lookup (the dirfd passed to openat) will remain at the starting point. Thus, all absolute paths and symlinks will be scoped within the starting point.
There is a slight change in behaviour regarding pathnames -- if the pathname is absolute then the dirfd is still used as the root of resolution of LOOKUP_IN_ROOT is specified (this is to avoid obvious foot-guns, at the cost of a minor API inconsistency).
As with LOOKUP_BENEATH, Jann's security concern about ".."[1] applies to LOOKUP_IN_ROOT -- therefore ".." resolution is blocked. This restriction will be lifted in a future patch, but requires more work to ensure that permitting ".." is done safely.
Magic-link jumps are also blocked, because they can beam the path lookup across the starting point. It would be possible to detect and block only the "bad" crossings with path_is_under() checks, but it's unclear whether it makes sense to permit magic-links at all. However, userspace is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that magic-link crossing is entirely disabled.
/* Testing. */ LOOKUP_IN_ROOT is tested as part of the openat2(2) selftests.
[1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/
Cc: Christian Brauner <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
adb21d2b |
| 06-Dec-2019 |
Aleksa Sarai <[email protected]> |
namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
/* Background. */ There are many circumstances when userspace wants to resolve a path and ensure that it doesn't go outside of a particular ro
namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
/* Background. */ There are many circumstances when userspace wants to resolve a path and ensure that it doesn't go outside of a particular root directory during resolution. Obvious examples include archive extraction tools, as well as other security-conscious userspace programs. FreeBSD spun out O_BENEATH from their Capsicum project[1,2], so it also seems reasonable to implement similar functionality for Linux.
This is part of a refresh of Al's AT_NO_JUMPS patchset[3] (which was a variation on David Drysdale's O_BENEATH patchset[4], which in turn was based on the Capsicum project[5]).
/* Userspace API. */ LOOKUP_BENEATH will be exposed to userspace through openat2(2).
/* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_BENEATH applies to all components of the path.
With LOOKUP_BENEATH, any path component which attempts to "escape" the starting point of the filesystem lookup (the dirfd passed to openat) will yield -EXDEV. Thus, all absolute paths and symlinks are disallowed.
Due to a security concern brought up by Jann[6], any ".." path components are also blocked. This restriction will be lifted in a future patch, but requires more work to ensure that permitting ".." is done safely.
Magic-link jumps are also blocked, because they can beam the path lookup across the starting point. It would be possible to detect and block only the "bad" crossings with path_is_under() checks, but it's unclear whether it makes sense to permit magic-links at all. However, userspace is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that magic-link crossing is entirely disabled.
/* Testing. */ LOOKUP_BENEATH is tested as part of the openat2(2) selftests.
[1]: https://reviews.freebsd.org/D2808 [2]: https://reviews.freebsd.org/D17547 [3]: https://lore.kernel.org/lkml/[email protected]/ [4]: https://lore.kernel.org/lkml/[email protected]/ [5]: https://lore.kernel.org/lkml/[email protected]/ [6]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/
Cc: Christian Brauner <[email protected]> Suggested-by: David Drysdale <[email protected]> Suggested-by: Al Viro <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
72ba2929 |
| 06-Dec-2019 |
Aleksa Sarai <[email protected]> |
namei: LOOKUP_NO_XDEV: block mountpoint crossing
/* Background. */ The need to contain path operations within a mountpoint has been a long-standing usecase that userspace has historically implemente
namei: LOOKUP_NO_XDEV: block mountpoint crossing
/* Background. */ The need to contain path operations within a mountpoint has been a long-standing usecase that userspace has historically implemented manually with liberal usage of stat(). find, rsync, tar and many other programs implement these semantics -- but it'd be much simpler to have a fool-proof way of refusing to open a path if it crosses a mountpoint.
This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a variation on David Drysdale's O_BENEATH patchset[2], which in turn was based on the Capsicum project[3]).
/* Userspace API. */ LOOKUP_NO_XDEV will be exposed to userspace through openat2(2).
/* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_XDEV applies to all components of the path.
With LOOKUP_NO_XDEV, any path component which crosses a mount-point during path resolution (including "..") will yield an -EXDEV. Absolute paths, absolute symlinks, and magic-links will only yield an -EXDEV if the jump involved changing mount-points.
/* Testing. */ LOOKUP_NO_XDEV is tested as part of the openat2(2) selftests.
[1]: https://lore.kernel.org/lkml/[email protected]/ [2]: https://lore.kernel.org/lkml/[email protected]/ [3]: https://lore.kernel.org/lkml/[email protected]/
Cc: Christian Brauner <[email protected]> Suggested-by: David Drysdale <[email protected]> Suggested-by: Al Viro <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
4b99d499 |
| 06-Dec-2019 |
Aleksa Sarai <[email protected]> |
namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
/* Background. */ There has always been a special class of symlink-like objects in procfs (and a few other pseudo-filesystems) which allow fo
namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
/* Background. */ There has always been a special class of symlink-like objects in procfs (and a few other pseudo-filesystems) which allow for non-lexical resolution of paths using nd_jump_link(). These "magic-links" do not follow traditional mount namespace boundaries, and have been used consistently in container escape attacks because they can be used to trick unsuspecting privileged processes into resolving unexpected paths.
It is also non-trivial for userspace to unambiguously avoid resolving magic-links, because they do not have a reliable indication that they are a magic-link (in order to verify them you'd have to manually open the path given by readlink(2) and then verify that the two file descriptors reference the same underlying file, which is plagued with possible race conditions or supplementary attack scenarios).
It would therefore be very helpful for userspace to be able to avoid these symlinks easily, thus hopefully removing a tool from attackers' toolboxes.
This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a variation on David Drysdale's O_BENEATH patchset[2], which in turn was based on the Capsicum project[3]).
/* Userspace API. */ LOOKUP_NO_MAGICLINKS will be exposed to userspace through openat2(2).
/* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_MAGICLINKS applies to all components of the path.
With LOOKUP_NO_MAGICLINKS, any magic-link path component encountered during path resolution will yield -ELOOP. The handling of ~LOOKUP_FOLLOW for a trailing magic-link is identical to LOOKUP_NO_SYMLINKS.
LOOKUP_NO_SYMLINKS implies LOOKUP_NO_MAGICLINKS.
/* Testing. */ LOOKUP_NO_MAGICLINKS is tested as part of the openat2(2) selftests.
[1]: https://lore.kernel.org/lkml/[email protected]/ [2]: https://lore.kernel.org/lkml/[email protected]/ [3]: https://lore.kernel.org/lkml/[email protected]/
Cc: Christian Brauner <[email protected]> Suggested-by: David Drysdale <[email protected]> Suggested-by: Al Viro <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
27812141 |
| 06-Dec-2019 |
Aleksa Sarai <[email protected]> |
namei: LOOKUP_NO_SYMLINKS: block symlink resolution
/* Background. */ Userspace cannot easily resolve a path without resolving symlinks, and would have to manually resolve each path component with O
namei: LOOKUP_NO_SYMLINKS: block symlink resolution
/* Background. */ Userspace cannot easily resolve a path without resolving symlinks, and would have to manually resolve each path component with O_PATH and O_NOFOLLOW. This is clearly inefficient, and can be fairly easy to screw up (resulting in possible security bugs). Linus has mentioned that Git has a particular need for this kind of flag[1]. It also resolves a fairly long-standing perceived deficiency in O_NOFOLLOw -- that it only blocks the opening of trailing symlinks.
This is part of a refresh of Al's AT_NO_JUMPS patchset[2] (which was a variation on David Drysdale's O_BENEATH patchset[3], which in turn was based on the Capsicum project[4]).
/* Userspace API. */ LOOKUP_NO_SYMLINKS will be exposed to userspace through openat2(2).
/* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_SYMLINKS applies to all components of the path.
With LOOKUP_NO_SYMLINKS, any symlink path component encountered during path resolution will yield -ELOOP. If the trailing component is a symlink (and no other components were symlinks), then O_PATH|O_NOFOLLOW will not error out and will instead provide a handle to the trailing symlink -- without resolving it.
/* Testing. */ LOOKUP_NO_SYMLINKS is tested as part of the openat2(2) selftests.
[1]: https://lore.kernel.org/lkml/CA+55aFyOKM7DW7+0sdDFKdZFXgptb5r1id9=Wvhd8AgSP7qjwQ@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/[email protected]/ [3]: https://lore.kernel.org/lkml/[email protected]/ [4]: https://lore.kernel.org/lkml/[email protected]/
Cc: Christian Brauner <[email protected]> Suggested-by: Al Viro <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|