History log of /linux-6.15/fs/stat.c (Results 1 – 25 of 116)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3
# 777d0961 17-Apr-2025 Christoph Hellwig <[email protected]>

fs: move the bdex_statx call to vfs_getattr_nosec

Currently bdex_statx is only called from the very high-level
vfs_statx_path function, and thus bypassing it for in-kernel calls
to vfs_getattr or vf

fs: move the bdex_statx call to vfs_getattr_nosec

Currently bdex_statx is only called from the very high-level
vfs_statx_path function, and thus bypassing it for in-kernel calls
to vfs_getattr or vfs_getattr_nosec.

This breaks querying the block ѕize of the underlying device in the
loop driver and also is a pitfall for any other new kernel caller.

Move the call into the lowest level helper to ensure all callers get
the right results.

Fixes: 2d985f8c6b91 ("vfs: support STATX_DIOALIGN on block devices")
Fixes: f4774e92aab8 ("loop: take the file system minimum dio alignment into account")
Reported-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13
# 0fac3ed4 19-Jan-2025 Su Hui <[email protected]>

fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()

Clang static checker(scan-build) warning:
fs/stat.c:287:21: warning: The left expression of the compound assignment is
an uninitia

fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()

Clang static checker(scan-build) warning:
fs/stat.c:287:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
287 | stat->result_mask |= STATX_MNT_ID_UNIQUE;
| ~~~~~~~~~~~~~~~~~ ^
fs/stat.c:290:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
290 | stat->result_mask |= STATX_MNT_ID;

When vfs_getattr() failed because of security_inode_getattr(), 'stat' is
uninitialized. In this case, there is a harmless garbage problem in
vfs_statx_path(). It's better to return error directly when
vfs_getattr() failed, avoiding garbage value and more clearly.

Signed-off-by: Su Hui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.13-rc7
# 7ed6cbe0 09-Jan-2025 Christoph Hellwig <[email protected]>

fs: add STATX_DIO_READ_ALIGN

Add a separate dio read align field, as many out of place write
file systems can easily do reads aligned to the device sector size,
but require bigger alignment for writ

fs: add STATX_DIO_READ_ALIGN

Add a separate dio read align field, as many out of place write
file systems can easily do reads aligned to the device sector size,
but require bigger alignment for writes.

This is usually papered over by falling back to buffered I/O for smaller
writes and doing read-modify-write cycles, but performance for this
sucks, so applications benefit from knowing the actual write alignment.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: John Garry <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6
# 95f567f8 01-Nov-2024 Stefan Berger <[email protected]>

fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag

Commit 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function")' introduced the AT_GETATTR_NOSEC flag to e

fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag

Commit 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function")' introduced the AT_GETATTR_NOSEC flag to ensure that the
call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr.
Now, simplify the getattr interface functions of filesystems where the flag
AT_GETATTR_NOSEC is checked.

There is only a single caller of inode_operations getattr function and it
is located in fs/stat.c in vfs_getattr_nosec. The caller there is the only
one from which the AT_GETATTR_NOSEC flag is passed from.

Two filesystems are checking this flag in .getattr and the flag is always
passed to them unconditionally from only vfs_getattr_nosec:

- ecryptfs: Simplify by always calling vfs_getattr_nosec in
ecryptfs_getattr. From there the flag is passed to no other
function and this function is not called otherwise.

- overlayfs: Simplify by always calling vfs_getattr_nosec in
ovl_getattr. From there the flag is passed to no other
function and this function is not called otherwise.

The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from
any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not
important inside this function. Also, since no filesystem is checking the
flag anymore, remove the flag entirely now, including the BUG_ON check that
never triggered.

The net change of the changes here combined with the original commit is
that ecryptfs and overlayfs do not call vfs_getattr but only
vfs_getattr_nosec.

Fixes: 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
Reported-by: Al Viro <[email protected]>
Closes: https://lore.kernel.org/linux-fsdevel/20241101011724.GN1350452@ZenIV/T/#u
Cc: Tyler Hicks <[email protected]>
Cc: [email protected]
Cc: Miklos Szeredi <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: [email protected]
Cc: Christian Brauner <[email protected]>
Cc: [email protected]
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


Revision tags: v6.12-rc5, v6.12-rc4
# 0dd4fb73 20-Oct-2024 Al Viro <[email protected]>

fs/stat.c: switch to CLASS(fd_raw)

... and use fd_empty() consistently

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>


Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1
# 88a20626 20-Sep-2024 Al Viro <[email protected]>

kill getname_statx_lookup_flags()

LOOKUP_EMPTY is ignored by the only remaining user, and without
that 'getname_' prefix makes no sense.

Remove LOOKUP_EMPTY part, rename to statx_lookup_flags() and

kill getname_statx_lookup_flags()

LOOKUP_EMPTY is ignored by the only remaining user, and without
that 'getname_' prefix makes no sense.

Remove LOOKUP_EMPTY part, rename to statx_lookup_flags() and make
static. It most likely is _not_ statx() specific, either, but
that's the next step.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# e896474f 08-Oct-2024 Al Viro <[email protected]>

getname_maybe_null() - the third variant of pathname copy-in

Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH
it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string,

getname_maybe_null() - the third variant of pathname copy-in

Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH
it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string,
ERR_PTR(-EFAULT) on NULL), with AT_EMPTY_PATH both empty string and
NULL are accepted.

Calling conventions: getname_maybe_null(user_pointer, flags) returns
* pointer to struct filename when non-empty string had been
successfully read
* ERR_PTR(...) on error
* NULL if an empty string or NULL pointer had been given
with AT_EMPTY_PATH in the flags argument.

It tries to avoid allocation in the last case; it's not always
able to do so, in which case the temporary struct filename instance
is freed and NULL returned anyway.

Fast path is inlined.

Signed-off-by: Al Viro <[email protected]>

show more ...


# c86e3c47 02-Oct-2024 Jeff Layton <[email protected]>

fs: tracepoints around multigrain timestamp events

Add some tracepoints around various multigrain timestamp events.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Darrick J. Wong <djw

fs: tracepoints around multigrain timestamp events

Add some tracepoints around various multigrain timestamp events.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Tested-by: Randy Dunlap <[email protected]> # documentation bits
Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 4e40eff0 02-Oct-2024 Jeff Layton <[email protected]>

fs: add infrastructure for multigrain timestamps

The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to o

fs: add infrastructure for multigrain timestamps

The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide when to invalidate the cache. Even with NFSv4, a lot of
exported filesystems don't properly support a change attribute and are
subject to the same problems with timestamp granularity. Other
applications have similar issues with timestamps (e.g backup
applications).

If fine-grained timestamps were always used, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.

What is needed is a way to only use fine-grained timestamps when they
are being actively queried. Use the (unused) top bit in
inode->i_ctime_nsec as a flag that indicates whether the current
timestamps have been queried via stat() or the like. When it's set,
allow the update to use a fine-grained timestamp iff it's necessary to
make the ctime show a different value.

If it has been queried, then first see whether the current coarse time
is later than the existing ctime. If it is, accept that value. If it
isn't, then get a fine-grained timestamp and attempt to stamp the inode
ctime with that value. If that races with another concurrent stamp, then
abandon the update and take the new value without retrying.

Filesystems can opt into this by setting the FS_MGTIME fstype flag.
Others should be unaffected (other than being subject to the same floor
value as multigrain filesystems).

Tested-by: Randy Dunlap <[email protected]> # documentation bits
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 1da91ea8 31-May-2024 Al Viro <[email protected]>

introduce fd_file(), convert all accessors to it.

For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are ve

introduce fd_file(), convert all accessors to it.

For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).

NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).

[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


# 0ef625bb 25-Jun-2024 Mateusz Guzik <[email protected]>

vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)

The newly used helper also checks for empty ("") paths.

NULL paths with any flag value other than AT_EMPTY_PATH go the usual
route and end up with

vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)

The newly used helper also checks for empty ("") paths.

NULL paths with any flag value other than AT_EMPTY_PATH go the usual
route and end up with -EFAULT to retain compatibility (Rust is abusing
calls of the sort to detect availability of statx).

This avoids path lookup code, lockref management, memory allocation and
in case of NULL path userspace memory access (which can be quite
expensive with SMAP on x86_64).

Benchmarked with statx(..., AT_EMPTY_PATH, ...) running on Sapphire
Rapids, with the "" path for the first two cases and NULL for the last
one.

Results in ops/s:
stock: 4231237
pre-check: 5944063 (+40%)
NULL path: 6601619 (+11%/+56%)

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Xi Ruoyao <[email protected]>
[brauner: use path_mounted() and other tweaks]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.10-rc1, v6.9, v6.9-rc7
# 27a2d0cb 30-Apr-2024 Christian Brauner <[email protected]>

stat: use vfs_empty_path() helper

Use the newly added helper for this.

Signed-off-by: Christian Brauner <[email protected]>


# 9abcfbd2 20-Jun-2024 Prasad Singamsetty <[email protected]>

block: Add atomic write support for statx

Extend statx system call to return additional info for atomic write support
support if the specified file is a block device.

Reviewed-by: Martin K. Peterse

block: Add atomic write support for statx

Extend statx system call to return additional info for atomic write support
support if the specified file is a block device.

Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Prasad Singamsetty <[email protected]>
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# 0f9ca80f 20-Jun-2024 Prasad Singamsetty <[email protected]>

fs: Add initial atomic write support info to statx

Extend statx system call to return additional info for atomic write support
support for a file.

Helper function generic_fill_statx_atomic_writes()

fs: Add initial atomic write support info to statx

Extend statx system call to return additional info for atomic write support
support for a file.

Helper function generic_fill_statx_atomic_writes() can be used by FSes to
fill in the relevant statx fields. For now atomic_write_segments_max will
always be 1, otherwise some rules would need to be imposed on iovec length
and alignment, which we don't want now.

Signed-off-by: Prasad Singamsetty <[email protected]>
jpg: relocate bdev support to another patch
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# dff60734 04-Jun-2024 Mateusz Guzik <[email protected]>

vfs: retire user_path_at_empty and drop empty arg from getname_flags

No users after do_readlinkat started doing the job on its own.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lo

vfs: retire user_path_at_empty and drop empty arg from getname_flags

No users after do_readlinkat started doing the job on its own.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 969ce92d 04-Jun-2024 Mateusz Guzik <[email protected]>

vfs: stop using user_path_at_empty in do_readlinkat

It is the only consumer and it saddles getname_flags with an argument set
to NULL by everyone else.

Instead the routine can do the empty check on

vfs: stop using user_path_at_empty in do_readlinkat

It is the only consumer and it saddles getname_flags with an argument set
to NULL by everyone else.

Instead the routine can do the empty check on its own.

Then user_path_at_empty can get retired and getname_flags can lose the
argument.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8
# 2a82bb02 08-Mar-2024 Kent Overstreet <[email protected]>

statx: stx_subvol

Add a new statx field for (sub)volume identifiers, as implemented by
btrfs and bcachefs.

This includes bcachefs support; we'll definitely want btrfs support as
well.

Link: https:

statx: stx_subvol

Add a new statx field for (sub)volume identifiers, as implemented by
btrfs and bcachefs.

This includes bcachefs support; we'll definitely want btrfs support as
well.

Link: https://lore.kernel.org/linux-fsdevel/2uvhm6gweyl7iyyp2xpfryvcu2g3padagaeqcbiavjyiis6prl@yjm725bizncq/
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6
# 376870aa 15-Dec-2023 Alexander Mikhalitsyn <[email protected]>

fs: fix doc comment typo fs tree wide

Do the replacement:
s/simply passs @nop_mnt_idmap/simply pass @nop_mnt_idmap/
in the fs/ tree.

Found by chance while working on support for idmapped mounts in

fs: fix doc comment typo fs tree wide

Do the replacement:
s/simply passs @nop_mnt_idmap/simply pass @nop_mnt_idmap/
in the fs/ tree.

Found by chance while working on support for idmapped mounts in fuse.

Cc: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6
# 98d2b430 25-Oct-2023 Miklos Szeredi <[email protected]>

add unique mount ID

If a mount is released then its mnt_id can immediately be reused. This is
bad news for user interfaces that want to uniquely identify a mount.

Implementing a unique mount ID is

add unique mount ID

If a mount is released then its mnt_id can immediately be reused. This is
bad news for user interfaces that want to uniquely identify a mount.

Implementing a unique mount ID is trivial (use a 64bit counter).
Unfortunately userspace assumes 32bit size and would overflow after the
counter reaches 2^32.

Introduce a new 64bit ID alongside the old one. Initialize the counter to
2^32, this guarantees that the old and new IDs are never mixed up.

Signed-off-by: Miklos Szeredi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Ian Kent <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.6-rc7, v6.6-rc6, v6.6-rc5
# 8a924db2 02-Oct-2023 Stefan Berger <[email protected]>

fs: Pass AT_GETATTR_NOSEC flag to getattr interface function

When vfs_getattr_nosec() calls a filesystem's getattr interface function
then the 'nosec' should propagate into this function so that
vfs

fs: Pass AT_GETATTR_NOSEC flag to getattr interface function

When vfs_getattr_nosec() calls a filesystem's getattr interface function
then the 'nosec' should propagate into this function so that
vfs_getattr_nosec() can again be called from the filesystem's gettattr
rather than vfs_getattr(). The latter would add unnecessary security
checks that the initial vfs_getattr_nosec() call wanted to avoid.
Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass
with the new getattr_flags parameter to the getattr interface function.
In overlayfs and ecryptfs use this flag to determine which one of the
two functions to call.

In a recent code change introduced to IMA vfs_getattr_nosec() ended up
calling vfs_getattr() in overlayfs, which in turn called
security_inode_getattr() on an exiting process that did not have
current->fs set anymore, which then caused a kernel NULL pointer
dereference. With this change the call to security_inode_getattr() can
be avoided, thus avoiding the NULL pointer dereference.

Reported-by: <[email protected]>
Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
Cc: Alexander Viro <[email protected]>
Cc: <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: Mimi Zohar <[email protected]>
Suggested-by: Christian Brauner <[email protected]>
Co-developed-by: Amir Goldstein <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 16a94965 04-Oct-2023 Jeff Layton <[email protected]>

fs: convert core infrastructure to new timestamp accessors

Convert the core vfs code to use the new timestamp accessor functions.

Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.

fs: convert core infrastructure to new timestamp accessors

Convert the core vfs code to use the new timestamp accessor functions.

Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.6-rc4, v6.6-rc3
# 647aa768 20-Sep-2023 Christian Brauner <[email protected]>

Revert "fs: add infrastructure for multigrain timestamps"

This reverts commit ffb6cf19e06334062744b7e3493f71e500964f8e.

Users reported regressions due to enabling multi-grained timestamps
unconditi

Revert "fs: add infrastructure for multigrain timestamps"

This reverts commit ffb6cf19e06334062744b7e3493f71e500964f8e.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.6-rc2, v6.6-rc1
# 42aadec8 03-Sep-2023 Linus Torvalds <[email protected]>

stat: remove no-longer-used helper macros

The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd97

stat: remove no-longer-used helper macros

The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd971f947 ("vfs: de-crapify "cp_new_stat()" function").

Then a decade later Mikulas noticed that said inconsistency had been a
mistake in the early x86-64 port, and shouldn't have existed in the
first place. So commit 932aba1e1690 ("stat: fix inconsistency between
struct stat and struct compat_stat") removed the uses of the helpers.

But the helpers remained around, unused.

Get rid of them.

Signed-off-by: Linus Torvalds <[email protected]>

show more ...


# 9013c51c 03-Sep-2023 Linus Torvalds <[email protected]>

vfs: mostly undo glibc turning 'fstat()' into 'fstatat(AT_EMPTY_PATH)'

Mateusz reports that glibc turns 'fstat()' calls into 'fstatat()', and
that seems to have been going on for quite a long time d

vfs: mostly undo glibc turning 'fstat()' into 'fstatat(AT_EMPTY_PATH)'

Mateusz reports that glibc turns 'fstat()' calls into 'fstatat()', and
that seems to have been going on for quite a long time due to glibc
having tried to simplify its stat logic into just one point.

This turns out to cause completely unnecessary overhead, where we then
go off and allocate the kernel side pathname, and actually look up the
empty path. Sure, our path lookup is quite optimized, but it still
causes a fair bit of allocation overhead and a couple of completely
unnecessary rounds of lockref accesses etc.

This is all hopefully getting fixed in user space, and there is a patch
floating around for just having glibc use the native fstat() system
call. But even with the current situation we can at least improve on
things by catching the situation and short-circuiting it.

Note that this is still measurably slower than just a plain 'fstat()',
since just checking that the filename is actually empty is somewhat
expensive due to inevitable user space access overhead from the kernel
(ie verifying pointers, and SMAP on x86). But it's still quite a bit
faster than actually looking up the path for real.

To quote numers from Mateusz:
"Sapphire Rapids, will-it-scale, ops/s

stock fstat 5088199
patched fstat 7625244 (+49%)
real fstat 8540383 (+67% / +12%)"

where that 'stock fstat' is the glibc translation of fstat into
fstatat() with an empty path, the 'patched fstat' is with this short
circuiting of the path lookup, and the 'real fstat' is the actual native
fstat() system call with none of this overhead.

Link: https://lore.kernel.org/lkml/20230903204858.lv7i3kqvw6eamhgz@f/
Reported-by: Mateusz Guzik <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

show more ...


Revision tags: v6.5, v6.5-rc7, v6.5-rc6
# ffb6cf19 07-Aug-2023 Jeff Layton <[email protected]>

fs: add infrastructure for multigrain timestamps

The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optim

fs: add infrastructure for multigrain timestamps

The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot metadata updates, down to around 1 per jiffy,
even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache. Even with NFSv4, a lot of
exported filesystems don't properly support a change attribute and are
subject to the same problems with timestamp granularity. Other
applications have similar issues with timestamps (e.g backup
applications).

If we were to always use fine-grained timestamps, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried.

POSIX generally mandates that when the the mtime changes, the ctime must
also change. The kernel always stores normalized ctime values, so only
the first 30 bits of the tv_nsec field are ever used.

Use the 31st bit of the ctime tv_nsec field to indicate that something
has queried the inode for the mtime or ctime. When this flag is set,
on the next mtime or ctime update, the kernel will fetch a fine-grained
timestamp instead of the usual coarse-grained one.

Filesytems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.

Later patches will convert individual filesystems to use the new
infrastructure.

Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


12345