History log of /linux-6.15/include/linux/namei.h (Results 1 – 25 of 109)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3
# c86b300b 14-Apr-2025 Christian Brauner <[email protected]>

fs: add kern_path_locked_negative()

The audit code relies on the fact that kern_path_locked() returned a
path even for a negative dentry. If it doesn't find a valid dentry it
immediately calls:

fs: add kern_path_locked_negative()

The audit code relies on the fact that kern_path_locked() returned a
path even for a negative dentry. If it doesn't find a valid dentry it
immediately calls:

audit_find_parent(d_backing_inode(parent_path.dentry));

which assumes that parent_path.dentry is still valid. But it isn't since
kern_path_locked() has been changed to path_put() also for a negative
dentry.

Fix this by adding a helper that implements the required audit semantics
and allows us to fix the immediate bleeding. We can find a unified
solution for this afterwards.

Link: https://lore.kernel.org/20250414-rennt-wimmeln-f186c3a780f1@brauner
Fixes: 1c3cb50b58c3 ("VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry")
Reported-and-tested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2
# 2c3230fb 06-Feb-2025 NeilBrown <[email protected]>

VFS: repack LOOKUP_ bit flags.

The LOOKUP_ bits are not in order, which can make it awkward when adding
new bits. Two bits have recently been added to the end which makes them
look like "scoping fl

VFS: repack LOOKUP_ bit flags.

The LOOKUP_ bits are not in order, which can make it awkward when adding
new bits. Two bits have recently been added to the end which makes them
look like "scoping flags", but in fact they aren't.

Also LOOKUP_PARENT is described as "internal use only" but is used in
fs/nfs/

This patch:
- Moves these three flags into the "pathwalk mode" section
- changes all bits to use the BIT(n) macro
- Allocates bits in order leaving gaps between the sections,
and documents those gaps.

Signed-off-by: NeilBrown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# dff60734 04-Jun-2024 Mateusz Guzik <[email protected]>

vfs: retire user_path_at_empty and drop empty arg from getname_flags

No users after do_readlinkat started doing the job on its own.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lo

vfs: retire user_path_at_empty and drop empty arg from getname_flags

No users after do_readlinkat started doing the job on its own.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4
# 42bd2af5 11-Apr-2024 Linus Torvalds <[email protected]>

vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements

"The definition of insanity is doing the same thing over and over
again and expecting different results”

We've tried to do this

vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements

"The definition of insanity is doing the same thing over and over
again and expecting different results”

We've tried to do this before, most recently with commit bb2314b47996
("fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink") about a
decade ago.

But the effort goes back even further than that, eg this thread back
from 1998 that is so old that we don't even have it archived in lore:

https://lkml.org/lkml/1998/3/10/108

which also points out some of the reasons why it's dangerous.

Or, how about then in 2003:

https://lkml.org/lkml/2003/4/6/112

where we went through some of the same arguments, just wirh different
people involved.

In particular, having access to a file descriptor does not necessarily
mean that you have access to the path that was used for lookup, and
there may be very good reasons why you absolutely must not have access
to a path to said file.

For example, if we were passed a file descriptor from the outside into
some limited environment (think chroot, but also user namespaces etc) a
'flink()' system call could now make that file visible inside a context
where it's not supposed to be visible.

In the process the user may also be able to re-open it with permissions
that the original file descriptor did not have (eg a read-only file
descriptor may be associated with an underlying file that is writable).

Another variation on this is if somebody else (typically root) opens a
file in a directory that is not accessible to others, and passes the
file descriptor on as a read-only file. Again, the access to the file
descriptor does not imply that you should have access to a path to the
file in the filesystem.

So while we have tried this several times in the past, it never works.

The last time we did this, that commit bb2314b47996 quickly got reverted
again in commit f0cc6ffb8ce8 (Revert "fs: Allow unprivileged linkat(...,
AT_EMPTY_PATH) aka flink"), with a note saying "We may re-do this once
the whole discussion about the interface is done".

Well, the discussion is long done, and didn't come to any resolution.
There's no question that 'flink()' would be a useful operation, but it's
a dangerous one.

However, it does turn out that since 2008 (commit d76b0d9b2d87: "CRED:
Use creds in file structs") we have had a fairly straightforward way to
check whether the file descriptor was opened by the same credentials as
the credentials of the flink().

That allows the most common patterns that people want to use, which tend
to be to either open the source carefully (ie using the openat2()
RESOLVE_xyz flags, and/or checking ownership with fstat() before
linking), or to use O_TMPFILE and fill in the file contents before it's
exposed to the world with linkat().

But it also means that if the file descriptor was opened by somebody
else, or we've gone through a credentials change since, the operation no
longer works (unless we have CAP_DAC_READ_SEARCH capabilities in the
opener's user namespace, as before).

Note that the credential equality check is done by using pointer
equality, which means that it's not enough that you have effectively the
same user - they have to be literally identical, since our credentials
are using copy-on-write semantics.

So you can't change your credentials to something else and try to change
it back to the same ones between the open() and the linkat(). This is
not meant to be some kind of generic permission check, this is literally
meant as a "the open and link calls are 'atomic' wrt user credentials"
check.

It also means that you can't just move things between namespaces,
because the credentials aren't just a list of uid's and gid's: they
includes the pointer to the user_ns that the capabilities are relative
to.

So let's try this one more time and see if maybe this approach ends up
being workable after all.

Cc: Andrew Lutomirski <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Peter Anvin <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[brauner: relax capability check to opener of the file]
Link: https://lore.kernel.org/all/20231113-undenkbar-gediegen-efde5f1c34bc@brauner
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2
# 74d016ec 16-Nov-2023 Al Viro <[email protected]>

new helper: user_path_locked_at()

Equivalent of kern_path_locked() taking dfd/userland name.
User introduced in the next commit.

Signed-off-by: Al Viro <[email protected]>


Revision tags: v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5
# 95e93d17 04-Oct-2023 Mateusz Guzik <[email protected]>

vfs: predict the error in retry_estale as unlikely

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian

vfs: predict the error in retry_estale as unlikely

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.6-rc4, v6.6-rc3, v6.6-rc2
# 5aa8fd9c 12-Sep-2023 Jeff Layton <[email protected]>

fs: add a new SB_I_NOUMASK flag

SB_POSIXACL must be set when a filesystem supports POSIX ACLs, but NFSv4
also sets this flag to prevent the VFS from applying the umask on
newly-created files. NFSv4

fs: add a new SB_I_NOUMASK flag

SB_POSIXACL must be set when a filesystem supports POSIX ACLs, but NFSv4
also sets this flag to prevent the VFS from applying the umask on
newly-created files. NFSv4 doesn't support POSIX ACLs however, which
causes confusion when other subsystems try to test for them.

Add a new SB_I_NOUMASK flag that allows filesystems to opt-in to umask
stripping without advertising support for POSIX ACLs. Set the new flag
on NFSv4 instead of SB_POSIXACL.

Also, move mode_strip_umask to namei.h and convert init_mknod and
init_mkdir to use it.

Signed-off-by: Jeff Layton <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3
# 74d7970f 21-Apr-2023 Namjae Jeon <[email protected]>

ksmbd: fix racy issue from using ->d_parent and ->d_name

Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name
in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_renam

ksmbd: fix racy issue from using ->d_parent and ->d_name

Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name
in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child()
to lock stable parent while underlying rename racy.
Introduce vfs_path_parent_lookup helper to avoid out of share access and
export vfs functions like the following ones to use
vfs_path_parent_lookup().
- rename __lookup_hash() to lookup_one_qstr_excl().
- export lookup_one_qstr_excl().
- export getname_kernel() and putname().

vfs_path_parent_lookup() is used for parent lookup of destination file
using absolute pathname given from FILE_RENAME_INFORMATION request.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

show more ...


Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3
# 9bc37e04 15-Mar-2023 Al Viro <[email protected]>

fs: introduce lock_rename_child() helper

Pass the dentry of a source file and the dentry of a destination directory
to lock parent inodes for rename. As soon as this function returns,
->d_parent of

fs: introduce lock_rename_child() helper

Pass the dentry of a source file and the dentry of a destination directory
to lock parent inodes for rename. As soon as this function returns,
->d_parent of the source file dentry is stable and inodes are properly
locked for calling vfs-rename. This helper is needed for ksmbd server.
rename request of SMB protocol has to rename an opened file, no matter
which directory it's in.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# 211db0ac 15-Mar-2023 Namjae Jeon <[email protected]>

ksmbd: remove internal.h include

Since vfs_path_lookup is exported, It should not be internal.
Move vfs_path_lookup prototype in internal.h to linux/namei.h.

Suggested-by: Al Viro <[email protected]

ksmbd: remove internal.h include

Since vfs_path_lookup is exported, It should not be internal.
Move vfs_path_lookup prototype in internal.h to linux/namei.h.

Suggested-by: Al Viro <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1
# e1f19857 07-Dec-2022 Richard Weinberger <[email protected]>

fs: namei: Allow follow_down() to uncover auto mounts

This function is only used by NFSD to cross mount points.
If a mount point is of type auto mount, follow_down() will
not uncover it. Add LOOKUP_

fs: namei: Allow follow_down() to uncover auto mounts

This function is only used by NFSD to cross mount points.
If a mount point is of type auto mount, follow_down() will
not uncover it. Add LOOKUP_AUTOMOUNT to the lookup flags
to have ->d_automount() called when NFSD walks down the
mount tree.

Signed-off-by: Richard Weinberger <[email protected]>
Reviewed-by: Ian Kent <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

show more ...


# 4609e1f1 13-Jan-2023 Christian Brauner <[email protected]>

fs: port ->permission() to pass mnt_idmap

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is j

fs: port ->permission() to pass mnt_idmap

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>

show more ...


Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1
# ea4af4aa 04-Aug-2022 Al Viro <[email protected]>

nd_jump_link(): constify path

Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
Signed-off-by: Al Viro <[email protected]>


Revision tags: v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2
# 00675017 04-Apr-2022 Christian Brauner <[email protected]>

fs: add two trivial lookup helpers

Similar to the addition of lookup_one() add a version of
lookup_one_unlocked() and lookup_one_positive_unlocked() that take
idmapped mounts into account. This is r

fs: add two trivial lookup helpers

Similar to the addition of lookup_one() add a version of
lookup_one_unlocked() and lookup_one_positive_unlocked() that take
idmapped mounts into account. This is required to port overlay to
support idmapped base layers.

Cc: <[email protected]>
Tested-by: Giuseppe Scrivano <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>

show more ...


Revision tags: v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4
# c2fd68b6 27-Jul-2021 Christian Brauner <[email protected]>

namei: add mapping aware lookup helper

Various filesystems rely on the lookup_one_len() helper to lookup a
single path component relative to a well-known starting point. Allow
such filesystems to su

namei: add mapping aware lookup helper

Various filesystems rely on the lookup_one_len() helper to lookup a
single path component relative to a well-known starting point. Allow
such filesystems to support idmapped mounts by adding a version of this
helper to take the idmap into account when calling inode_permission().
This change is a required to let btrfs (and other filesystems) support
idmapped mounts.

Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: [email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: David Sterba <[email protected]>

show more ...


Revision tags: v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6
# bcba1e7d 02-Apr-2021 Al Viro <[email protected]>

take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space

Separate field in nameidata (nd->state) holding the flags that
should be internal-only - that way we both get some spare bits
in LOOKUP

take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space

Separate field in nameidata (nd->state) holding the flags that
should be internal-only - that way we both get some spare bits
in LOOKUP_... and get simpler rules for nd->root lifetime rules,
since we can set the replacement of LOOKUP_ROOT (ND_ROOT_PRESET)
at the same time we set nd->root.

Signed-off-by: Al Viro <[email protected]>

show more ...


Revision tags: v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1
# 6c6ec2b0 17-Dec-2020 Jens Axboe <[email protected]>

fs: add support for LOOKUP_CACHED

io_uring always punts opens to async context, since there's no control
over whether the lookup blocks or not. Add LOOKUP_CACHED to support
just doing the fast RCU b

fs: add support for LOOKUP_CACHED

io_uring always punts opens to async context, since there's no control
over whether the lookup blocks or not. Add LOOKUP_CACHED to support
just doing the fast RCU based lookups, which we know will not block. If
we can do a cached path resolution of the filename, then we don't have
to always punt lookups for a worker.

During path resolution, we always do LOOKUP_RCU first. If that fails and
we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.

Cc: Al Viro <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7
# b4c03536 19-Jan-2020 Al Viro <[email protected]>

sanitize handling of nd->last_type, kill LAST_BIND

->last_type values are set in 3 places: path_init() (sets to LAST_ROOT),
link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).

The are

sanitize handling of nd->last_type, kill LAST_BIND

->last_type values are set in 3 places: path_init() (sets to LAST_ROOT),
link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).

The are checked in walk_component(), lookup_last() and do_last().
They also get copied to the caller by filename_parentat(). In the last
3 cases the value is what we had at the return from link_path_walk().
In case of walk_component() it's either directly downstream from
assignment in link_path_walk() or, when called by lookup_last(), the
value we have at the return from link_path_walk().

The value at the entry into link_path_walk() can survive to return only
if the pathname contains nothing but slashes. Note that pick_link()
never returns such - pure jumps are handled directly. So for the calls
of link_path_walk() for trailing symlinks it does not matter what value
had been there at the entry; the value at the return won't depend upon it.

There are 3 call chains that might have pick_link() storing LAST_BIND:

1) pick_link() from step_into() from walk_component() from
link_path_walk(). In that case we will either be parsing the next
component immediately after return into link_path_walk(), which will
overwrite the ->last_type before anyone has a chance to look at it,
or we'll fail, in which case nobody will be looking at ->last_type at all.

2) pick_link() from step_into() from walk_component() from lookup_last().
The value is never looked at due to the above; it won't affect the value
seen at return from any link_path_walk().

3) pick_link() from step_into() from do_last(). Ditto.

In other words, assignemnt in pick_link() is pointless, and so is
LAST_BIND itself; nothing ever looks at that value. Kill it off.
And make link_path_walk() _always_ assign ->last_type - in the only
case when the value at the entry might survive to the return that value
is always LAST_ROOT, inherited from path_init(). Move that assignment
from path_init() into the beginning of link_path_walk(), to consolidate
the things.

Historical note: LAST_BIND used to be used for the kludge with trailing
pure jump symlinks (extra iteration through the top-level loop).
No point keeping it anymore...

Signed-off-by: Al Viro <[email protected]>

show more ...


Revision tags: v5.5-rc6
# 161aff1d 12-Jan-2020 Al Viro <[email protected]>

LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()

New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
IOW, traverse mounts at the final point and skip revalidation

LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()

New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
IOW, traverse mounts at the final point and skip revalidation of the
location where it ends up.

Signed-off-by: Al Viro <[email protected]>

show more ...


# c64cd6e3 10-Jan-2020 Al Viro <[email protected]>

reimplement path_mountpoint() with less magic

... and get rid of a bunch of bugs in it. Background:
the reason for path_mountpoint() is that umount() really doesn't
want attempts to revalidate the

reimplement path_mountpoint() with less magic

... and get rid of a bunch of bugs in it. Background:
the reason for path_mountpoint() is that umount() really doesn't
want attempts to revalidate the root of what it's trying to umount.
The thing we want to avoid actually happen from complete_walk();
solution was to do something parallel to normal path_lookupat()
and it both went overboard and got the boilerplate subtly
(and not so subtly) wrong.

A better solution is to do pretty much what the normal path_lookupat()
does, but instead of complete_walk() do unlazy_walk(). All it takes
to avoid that ->d_weak_revalidate() call... mountpoint_last() goes
away, along with everything it got wrong, and so does the magic around
LOOKUP_NO_REVAL.

Another source of bugs is that when we traverse mounts at the final
location (and we need to do that - umount . expects to get whatever's
overmounting ., if any, out of the lookup) we really ought to take
care of ->d_manage() - as it is, manual umount of autofs automount
in progress can lead to unpleasant surprises for the daemon. Easily
solved by using handle_lookup_down() instead of follow_mount().

Tested-by: Ian Kent <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


Revision tags: v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1
# 8db52c7e 06-Dec-2019 Aleksa Sarai <[email protected]>

namei: LOOKUP_IN_ROOT: chroot-like scoped resolution

/* Background. */
Container runtimes or other administrative management processes will
often interact with root filesystems while in the host mou

namei: LOOKUP_IN_ROOT: chroot-like scoped resolution

/* Background. */
Container runtimes or other administrative management processes will
often interact with root filesystems while in the host mount namespace,
because the cost of doing a chroot(2) on every operation is too
prohibitive (especially in Go, which cannot safely use vfork). However,
a malicious program can trick the management process into doing
operations on files outside of the root filesystem through careful
crafting of symlinks.

Most programs that need this feature have attempted to make this process
safe, by doing all of the path resolution in userspace (with symlinks
being scoped to the root of the malicious root filesystem).
Unfortunately, this method is prone to foot-guns and usually such
implementations have subtle security bugs.

Thus, what userspace needs is a way to resolve a path as though it were
in a chroot(2) -- with all absolute symlinks being resolved relative to
the dirfd root (and ".." components being stuck under the dirfd root).
It is much simpler and more straight-forward to provide this
functionality in-kernel (because it can be done far more cheaply and
correctly).

More classical applications that also have this problem (which have
their own potentially buggy userspace path sanitisation code) include
web servers, archive extraction tools, network file servers, and so on.

/* Userspace API. */
LOOKUP_IN_ROOT will be exposed to userspace through openat2(2).

/* Semantics. */
Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW),
LOOKUP_IN_ROOT applies to all components of the path.

With LOOKUP_IN_ROOT, any path component which attempts to cross the
starting point of the pathname lookup (the dirfd passed to openat) will
remain at the starting point. Thus, all absolute paths and symlinks will
be scoped within the starting point.

There is a slight change in behaviour regarding pathnames -- if the
pathname is absolute then the dirfd is still used as the root of
resolution of LOOKUP_IN_ROOT is specified (this is to avoid obvious
foot-guns, at the cost of a minor API inconsistency).

As with LOOKUP_BENEATH, Jann's security concern about ".."[1] applies to
LOOKUP_IN_ROOT -- therefore ".." resolution is blocked. This restriction
will be lifted in a future patch, but requires more work to ensure that
permitting ".." is done safely.

Magic-link jumps are also blocked, because they can beam the path lookup
across the starting point. It would be possible to detect and block
only the "bad" crossings with path_is_under() checks, but it's unclear
whether it makes sense to permit magic-links at all. However, userspace
is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that
magic-link crossing is entirely disabled.

/* Testing. */
LOOKUP_IN_ROOT is tested as part of the openat2(2) selftests.

[1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/

Cc: Christian Brauner <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# adb21d2b 06-Dec-2019 Aleksa Sarai <[email protected]>

namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution

/* Background. */
There are many circumstances when userspace wants to resolve a path and
ensure that it doesn't go outside of a particular ro

namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution

/* Background. */
There are many circumstances when userspace wants to resolve a path and
ensure that it doesn't go outside of a particular root directory during
resolution. Obvious examples include archive extraction tools, as well as
other security-conscious userspace programs. FreeBSD spun out O_BENEATH
from their Capsicum project[1,2], so it also seems reasonable to
implement similar functionality for Linux.

This is part of a refresh of Al's AT_NO_JUMPS patchset[3] (which was a
variation on David Drysdale's O_BENEATH patchset[4], which in turn was
based on the Capsicum project[5]).

/* Userspace API. */
LOOKUP_BENEATH will be exposed to userspace through openat2(2).

/* Semantics. */
Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW),
LOOKUP_BENEATH applies to all components of the path.

With LOOKUP_BENEATH, any path component which attempts to "escape" the
starting point of the filesystem lookup (the dirfd passed to openat)
will yield -EXDEV. Thus, all absolute paths and symlinks are disallowed.

Due to a security concern brought up by Jann[6], any ".." path
components are also blocked. This restriction will be lifted in a future
patch, but requires more work to ensure that permitting ".." is done
safely.

Magic-link jumps are also blocked, because they can beam the path lookup
across the starting point. It would be possible to detect and block
only the "bad" crossings with path_is_under() checks, but it's unclear
whether it makes sense to permit magic-links at all. However, userspace
is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that
magic-link crossing is entirely disabled.

/* Testing. */
LOOKUP_BENEATH is tested as part of the openat2(2) selftests.

[1]: https://reviews.freebsd.org/D2808
[2]: https://reviews.freebsd.org/D17547
[3]: https://lore.kernel.org/lkml/[email protected]/
[4]: https://lore.kernel.org/lkml/[email protected]/
[5]: https://lore.kernel.org/lkml/[email protected]/
[6]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/

Cc: Christian Brauner <[email protected]>
Suggested-by: David Drysdale <[email protected]>
Suggested-by: Al Viro <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# 72ba2929 06-Dec-2019 Aleksa Sarai <[email protected]>

namei: LOOKUP_NO_XDEV: block mountpoint crossing

/* Background. */
The need to contain path operations within a mountpoint has been a
long-standing usecase that userspace has historically implemente

namei: LOOKUP_NO_XDEV: block mountpoint crossing

/* Background. */
The need to contain path operations within a mountpoint has been a
long-standing usecase that userspace has historically implemented
manually with liberal usage of stat(). find, rsync, tar and
many other programs implement these semantics -- but it'd be much
simpler to have a fool-proof way of refusing to open a path if it
crosses a mountpoint.

This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a
variation on David Drysdale's O_BENEATH patchset[2], which in turn was
based on the Capsicum project[3]).

/* Userspace API. */
LOOKUP_NO_XDEV will be exposed to userspace through openat2(2).

/* Semantics. */
Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW),
LOOKUP_NO_XDEV applies to all components of the path.

With LOOKUP_NO_XDEV, any path component which crosses a mount-point
during path resolution (including "..") will yield an -EXDEV. Absolute
paths, absolute symlinks, and magic-links will only yield an -EXDEV if
the jump involved changing mount-points.

/* Testing. */
LOOKUP_NO_XDEV is tested as part of the openat2(2) selftests.

[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://lore.kernel.org/lkml/[email protected]/

Cc: Christian Brauner <[email protected]>
Suggested-by: David Drysdale <[email protected]>
Suggested-by: Al Viro <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# 4b99d499 06-Dec-2019 Aleksa Sarai <[email protected]>

namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution

/* Background. */
There has always been a special class of symlink-like objects in procfs
(and a few other pseudo-filesystems) which allow fo

namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution

/* Background. */
There has always been a special class of symlink-like objects in procfs
(and a few other pseudo-filesystems) which allow for non-lexical
resolution of paths using nd_jump_link(). These "magic-links" do not
follow traditional mount namespace boundaries, and have been used
consistently in container escape attacks because they can be used to
trick unsuspecting privileged processes into resolving unexpected paths.

It is also non-trivial for userspace to unambiguously avoid resolving
magic-links, because they do not have a reliable indication that they
are a magic-link (in order to verify them you'd have to manually open
the path given by readlink(2) and then verify that the two file
descriptors reference the same underlying file, which is plagued with
possible race conditions or supplementary attack scenarios).

It would therefore be very helpful for userspace to be able to avoid
these symlinks easily, thus hopefully removing a tool from attackers'
toolboxes.

This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a
variation on David Drysdale's O_BENEATH patchset[2], which in turn was
based on the Capsicum project[3]).

/* Userspace API. */
LOOKUP_NO_MAGICLINKS will be exposed to userspace through openat2(2).

/* Semantics. */
Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW),
LOOKUP_NO_MAGICLINKS applies to all components of the path.

With LOOKUP_NO_MAGICLINKS, any magic-link path component encountered
during path resolution will yield -ELOOP. The handling of ~LOOKUP_FOLLOW
for a trailing magic-link is identical to LOOKUP_NO_SYMLINKS.

LOOKUP_NO_SYMLINKS implies LOOKUP_NO_MAGICLINKS.

/* Testing. */
LOOKUP_NO_MAGICLINKS is tested as part of the openat2(2) selftests.

[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://lore.kernel.org/lkml/[email protected]/

Cc: Christian Brauner <[email protected]>
Suggested-by: David Drysdale <[email protected]>
Suggested-by: Al Viro <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# 27812141 06-Dec-2019 Aleksa Sarai <[email protected]>

namei: LOOKUP_NO_SYMLINKS: block symlink resolution

/* Background. */
Userspace cannot easily resolve a path without resolving symlinks, and
would have to manually resolve each path component with O

namei: LOOKUP_NO_SYMLINKS: block symlink resolution

/* Background. */
Userspace cannot easily resolve a path without resolving symlinks, and
would have to manually resolve each path component with O_PATH and
O_NOFOLLOW. This is clearly inefficient, and can be fairly easy to screw
up (resulting in possible security bugs). Linus has mentioned that Git
has a particular need for this kind of flag[1]. It also resolves a
fairly long-standing perceived deficiency in O_NOFOLLOw -- that it only
blocks the opening of trailing symlinks.

This is part of a refresh of Al's AT_NO_JUMPS patchset[2] (which was a
variation on David Drysdale's O_BENEATH patchset[3], which in turn was
based on the Capsicum project[4]).

/* Userspace API. */
LOOKUP_NO_SYMLINKS will be exposed to userspace through openat2(2).

/* Semantics. */
Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW),
LOOKUP_NO_SYMLINKS applies to all components of the path.

With LOOKUP_NO_SYMLINKS, any symlink path component encountered during
path resolution will yield -ELOOP. If the trailing component is a
symlink (and no other components were symlinks), then O_PATH|O_NOFOLLOW
will not error out and will instead provide a handle to the trailing
symlink -- without resolving it.

/* Testing. */
LOOKUP_NO_SYMLINKS is tested as part of the openat2(2) selftests.

[1]: https://lore.kernel.org/lkml/CA+55aFyOKM7DW7+0sdDFKdZFXgptb5r1id9=Wvhd8AgSP7qjwQ@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://lore.kernel.org/lkml/[email protected]/
[4]: https://lore.kernel.org/lkml/[email protected]/

Cc: Christian Brauner <[email protected]>
Suggested-by: Al Viro <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


12345