History log of /linux-6.15/fs/ceph/snap.c (Results 1 – 25 of 94)
Revision   Date   Author   Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7
# 38d46409 12-Jun-2023 Xiubo Li <[email protected]>

ceph: print cluster fsid and client global_id in all debug logs

Multiple CephFS mounts on a host are increasingly common, so
disambiguating messages like this is necessary and will make it easier
to debug issues.

At the same time this will improve the debug logs to make them easier
to use when troubleshooting issues, such as printing the ino# instead
of only printing the memory addresses of the corresponding inodes, and
printing the dentry names instead of the corresponding memory
addresses for the dentries, etc.
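
For illustration only, a client-aware debug macro along the following
lines would produce such a prefix; the exact helper added by this
series may differ, and the names below are assumptions, not the
committed code:

    /* hypothetical sketch: prefix every debug line with the cluster
     * fsid and the client's global_id so messages from different
     * CephFS mounts on the same host can be told apart */
    #define doutc(cl, fmt, ...)                                     \
            dout("[%pU %llu]: " fmt, &(cl)->fsid,                   \
                 (cl)->monc.auth->global_id, ##__VA_ARGS__)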

Link: https://tracker.ceph.com/issues/61590
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Patrick Donnelly <[email protected]>
Reviewed-by: Milind Changire <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 5995d90d 12-Jun-2023 Xiubo Li <[email protected]>

ceph: rename _to_client() to _to_fs_client()

We need to convert the inode to ceph_client in the following commit
and will add a new helper for that; here we rename the old helper
to _to_fs_client().

Link: https://tracker.ceph.com/issues/61590
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Patrick Donnelly <[email protected]>
Reviewed-by: Milind Changire <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



Revision tags: v6.4-rc6
# 197b7d79 09-Jun-2023 Xiubo Li <[email protected]>

ceph: pass the mdsc to several helpers

We will use the 'mdsc' to get the global_id in the following commits.

Link: https://tracker.ceph.com/issues/61590
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Patrick Donnelly <[email protected]>
Reviewed-by: Milind Changire <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# c453bdb5 04-Oct-2023 Jeff Layton <[email protected]>

ceph: convert to new timestamp accessors

Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>



Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1
# e3dfcab2 21-Dec-2022 Xiubo Li <[email protected]>

ceph: drop messages from MDS when unmounting

When unmounting, all the dirty buffers will be flushed, and after the
last osd request is finished the last reference of the i_count will be
released. The client will then flush the dirty caps/snaps to the MDSs,
but the unmount won't wait for the possible acks, which would ihold
the inodes when updating the metadata locally even though that no
longer makes sense at this point. This will make evict_inodes() skip
these inodes.

If encryption is enabled, the kernel generates a warning when removing
the encryption keys while the skipped inodes still hold the keyring:

WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0
CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0
RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00
RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000
RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000
R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40
R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000
FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
generic_shutdown_super+0x47/0x120
kill_anon_super+0x14/0x30
ceph_kill_sb+0x36/0x90 [ceph]
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x67/0xb0
exit_to_user_mode_prepare+0x23d/0x240
syscall_exit_to_user_mode+0x25/0x60
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd83dc39e9b

Later the kernel will crash when iput()-ing the inodes and
dereferencing "sb->s_master_keys", which has already been released by
generic_shutdown_super().

Link: https://tracker.ceph.com/issues/59162
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-and-tested-by: Luís Henriques <[email protected]>
Reviewed-by: Milind Changire <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 7795aef0 05-Jul-2023 Jeff Layton <[email protected]>

ceph: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
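
As a rough sketch of what such a conversion looks like (the helper
below is made up purely for illustration; only the accessor calls are
the point):

    /* before: raw field access, after: go through the accessors */
    static void example_copy_ctime(struct inode *dst, struct inode *src)
    {
            /* old:  dst->i_ctime = src->i_ctime; */
            inode_set_ctime_to_ts(dst, inode_get_ctime(src));
    }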

Reviewed-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>



# 2d12ad95 11-May-2023 Xiubo Li <[email protected]>

ceph: trigger to flush the buffer when making snapshot

The 'i_wr_ref' is used to track the 'Fb' caps, but whenever the 'Fb'
cap is taken the kclient will always take the 'Fw' cap at the same
time. That means the corresponding check in __ceph_finish_cap_snap()
will always be false.

When writing to the buffer, the kclient will take both 'Fb|Fw' caps,
write the contents to the buffer pages while increasing the
'i_wrbuffer_ref', and then just release both 'Fb|Fw'. This differs
from the user-space libcephfs, which keeps holding 'Fb' and uses
'i_wr_ref' instead of 'i_wrbuffer_ref' to track this until the buffer
is flushed to Rados.

We need to defer flushing the capsnap until the corresponding buffer
pages are all flushed to Rados, and at the same time trigger flushing
of the buffer pages immediately.

Link: https://tracker.ceph.com/issues/48640
Link: https://tracker.ceph.com/issues/59343
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Milind Changire <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 409e873e 01-Jun-2023 Xiubo Li <[email protected]>

ceph: fix use-after-free bug for inodes when flushing capsnaps

There is a race between capsnaps flush and removing the inode from
'mdsc->snap_flush_list' list:

     == Thread A ==                     == Thread B ==
ceph_queue_cap_snap()
 -> allocate 'capsnapA'
 -> ihold('&ci->vfs_inode')
 -> add 'capsnapA' to 'ci->i_cap_snaps'
 -> add 'ci' to 'mdsc->snap_flush_list'
                                        ...
                                             == Thread C ==
                                        ceph_flush_snaps()
                                         -> __ceph_flush_snaps()
                                          -> __send_flush_snap()
                                        handle_cap_flushsnap_ack()
                                         -> iput('&ci->vfs_inode')
                                            this also will release 'ci'
                                        ...
                                             == Thread D ==
                                        ceph_handle_snap()
                                         -> flush_snaps()
                                          -> iterate 'mdsc->snap_flush_list'
                                          -> get the stale 'ci'
 -> remove 'ci' from                      -> ihold(&ci->vfs_inode) this
    'mdsc->snap_flush_list'                  will WARNING

To fix this we will increase the inode's i_count ref when adding 'ci'
to the 'mdsc->snap_flush_list' list.

[ idryomov: need_put int -> bool ]
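
A rough sketch of the resulting pattern (not the literal patch; the
list is assumed to be protected by mdsc->snap_flush_lock, and 'inode'
is the VFS inode backing 'ci'):

    /* producer: pin the inode before publishing 'ci' on the shared list */
    spin_lock(&mdsc->snap_flush_lock);
    if (list_empty(&ci->i_snap_flush_item)) {
            ihold(inode);        /* reference now owned by snap_flush_list */
            list_add_tail(&ci->i_snap_flush_item, &mdsc->snap_flush_list);
    }
    spin_unlock(&mdsc->snap_flush_lock);

    /* flush side: drop that reference only after the entry is unlinked */
    spin_lock(&mdsc->snap_flush_lock);
    list_del_init(&ci->i_snap_flush_item);
    spin_unlock(&mdsc->snap_flush_lock);
    iput(inode);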

Cc: [email protected]
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2209299
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Milind Changire <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 4cafd040 18-May-2023 Xiubo Li <[email protected]>

ceph: force updating the msg pointer in non-split case

When the MClientSnap request's op is not CEPH_SNAP_OP_SPLIT, the
request may still contain a list of 'split_realms', and we need to
skip it anyway. Otherwise it will be parsed as a corrupt snaptrace.

Cc: [email protected]
Link: https://tracker.ceph.com/issues/61200
Reported-by: Frank Schilder <[email protected]>
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# a68e564a 01-Feb-2023 Xiubo Li <[email protected]>

ceph: blocklist the kclient when receiving corrupted snap trace

When receiving a corrupted snap trace we don't know what exactly has
happened on the MDS side, and we shouldn't continue issuing IO and
metadata access to the MDS, which may corrupt data or return incorrect
contents.

This patch will just block all the further IO/MDS requests
immediately and then evict the kclient itself.

The reason why we still need to evict the kclient just after
blocking all the further IOs is that the MDS could revoke the caps
faster.

Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Venky Shankar <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



Revision tags: v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5
# 51884d15 09-Nov-2022 Xiubo Li <[email protected]>

ceph: avoid putting the realm twice when decoding snaps fails

When decoding the snaps fails, it may leave 'first_realm' and 'realm'
pointing to the same snaprealm memory. It will then be put twice,
which can cause random use-after-free, BUG_ON, etc. issues.

Cc: [email protected]
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



Revision tags: v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2
# 874c8ca1 09-Jun-2022 David Howells <[email protected]>

netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context

While randstruct was satisfied with using an open-coded "void *" offset
cast for the netfs_i_context <-> inode casting, __builtin_object_size() as
used by FORTIFY_SOURCE was not as easily fooled. This was causing the
following complaint[1] from gcc v12:

In file included from include/linux/string.h:253,
from include/linux/ceph/ceph_debug.h:7,
from fs/ceph/inode.c:2:
In function 'fortify_memset_chk',
inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
242 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by embedding a struct inode into struct netfs_i_context (which
should perhaps be renamed to struct netfs_inode). The struct inode
vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
structs and vfs_inode is then simply changed to "netfs.inode" in those
filesystems.

Further, rename netfs_i_context to netfs_inode, get rid of the
netfs_inode() function that converted a netfs_i_context pointer to an
inode pointer (that can now be done with &ctx->inode) and rename the
netfs_i_context() function to netfs_inode() (which is now a wrapper
around container_of()).
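
A simplified sketch of the layout and wrapper this describes (not the
exact upstream definitions):

    struct netfs_inode {
            struct inode inode;             /* VFS inode, now embedded */
            /* ... netfs bookkeeping ... */
    };

    struct ceph_inode_info {
            struct netfs_inode netfs;       /* must come first */
            /* ... ceph-specific fields ... */
    };

    static inline struct netfs_inode *netfs_inode(struct inode *inode)
    {
            return container_of(inode, struct netfs_inode, inode);
    }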

Most of the changes were done with:

perl -p -i -e 's/vfs_inode/netfs.inode/'g \
`git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`

Kees suggested doing it with a pair structure[2] and a special
declarator to insert that into the network filesystem's inode
wrapper[3], but I think it's cleaner to embed it - and then it doesn't
matter if struct randomisation reorders things.

Dave Chinner suggested using a filesystem-specific VFS_I() function in
each filesystem to convert that filesystem's own inode wrapper struct
into the VFS inode struct[4].

Version #2:
- Fix a couple of missed name changes due to a disabled cifs option.
- Rename nfs_i_context to nfs_inode
- Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
structs.

[ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily
disable '-Wattribute-warning' for now") that is no longer needed ]

Fixes: bc899ee1c898 ("netfs: Add a netfs inode context")
Reported-by: Jeff Layton <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
cc: Jonathan Corbet <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Marc Dionne <[email protected]>
cc: Ilya Dryomov <[email protected]>
cc: Steve French <[email protected]>
cc: William Kucharski <[email protected]>
cc: "Matthew Wilcox (Oracle)" <[email protected]>
cc: Dave Chinner <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/[email protected]/ [3]
Link: https://lore.kernel.org/r/[email protected]/ [4]
Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
Signed-off-by: Linus Torvalds <[email protected]>



Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6
# ad5255c1 23-Feb-2022 Xiubo Li <[email protected]>

ceph: misc fix for code style and logs

To make the logs more readable, such as for logs like:

ceph: will move 00000000a42b796b to split realm 100000003ed 000000007146df45

With this it will always show the inode numbers instead of the inode
addresses.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 1ab36c9d 23-Feb-2022 Xiubo Li <[email protected]>

ceph: allocate capsnap memory outside of ceph_queue_cap_snap()

This will avoid the frequent but unnecessary memory allocation/freeing
in this loop.
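
A hedged sketch of the idea, assuming the callee takes ownership
through a double-pointer parameter (locking and error handling are
omitted; the names follow the surrounding messages but are
illustrative here, not the committed hunk):

    struct ceph_cap_snap *capsnap = NULL;

    list_for_each_entry(ci, &realm->inodes_with_caps, i_snap_realm_item) {
            if (!capsnap)
                    capsnap = kmem_cache_zalloc(ceph_cap_snap_cachep, GFP_NOFS);
            if (!capsnap)
                    break;
            /* sets *pcapsnap to NULL only when it actually queues it */
            ceph_queue_cap_snap(ci, &capsnap);
    }
    if (capsnap)
            kmem_cache_free(ceph_cap_snap_cachep, capsnap);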

URL: https://tracker.ceph.com/issues/44100
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 5ed91587 23-Feb-2022 Xiubo Li <[email protected]>

ceph: do not release the global snaprealm until unmounting

The global snaprealm would be created and then destroyed immediately
every time it was updated.

URL: https://tracker.ceph.com/issues/54362
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



Revision tags: v5.17-rc5
# 74a31df4 19-Feb-2022 Xiubo Li <[email protected]>

ceph: eliminate the recursion when rebuilding the snap context

Use a list instead of recursion to avoid possible stack overflow.
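
A generic sketch of the technique (the types and helpers here are made
up; the point is the explicit work list replacing recursion so the
stack depth stays constant):

    static void rebuild_snapcs_iterative(struct example_realm *root)
    {
            LIST_HEAD(work);

            list_add_tail(&root->rebuild_item, &work);
            while (!list_empty(&work)) {
                    struct example_realm *realm =
                            list_first_entry(&work, struct example_realm,
                                             rebuild_item);
                    struct example_realm *child;

                    list_del_init(&realm->rebuild_item);
                    rebuild_one_snap_context(realm);  /* the per-realm work */

                    /* children are queued instead of recursed into */
                    list_for_each_entry(child, &realm->children, child_item)
                            list_add_tail(&child->rebuild_item, &work);
            }
    }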

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 2e586641 19-Feb-2022 Xiubo Li <[email protected]>

ceph: do not update snapshot context when there is no new snapshot

We will only track the uppermost parent snapshot realm from which we
need to rebuild the snapshot contexts _downward_ in the hierarchy. For
all the others having no new snapshot we will do nothing.

This fix will avoid calling ceph_queue_cap_snap() on some inodes
inappropriately. For example, with the code in mainline, suppose there
are 2 directory hierarchies (with 6 directories total), like this:

/dir_X1/dir_X2/dir_X3/
/dir_Y1/dir_Y2/dir_Y3/

Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
root snapshot under /.snap/root_snap. Every time we make snapshots under
/dir_Y1/..., the kclient will always try to rebuild the snap context for
snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
and dir_Y3, which makes no sense.

That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
creating a new snapshot under /dir_Y1/... the new seq will be 4, and
the mds will send the kclient a snapshot backtrace in _downward_
order: seqs 4, 3.

When ceph_update_snap_trace() is called, it will always rebuild from
the last realm, that is, the root_snap realm. So later when rebuilding
the snap context, the current logic will always cause it to rebuild
the snap_X2 realm and then try to queue cap snaps for all the related
inodes in that realm, even though that's not necessary.

This is accompanied by a lot of these sorts of dout messages:

"ceph: queue_cap_snap 00000000a42b796b nothing dirty|writing"

Fix the logic to avoid this situation.

Also, the 'invalidate' word is not precise here. In actuality, it will
cause a rebuild of the existing snapshot contexts or just build
non-existent ones. Rename it to 'rebuild_snapcs'.

URL: https://tracker.ceph.com/issues/44100
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# ab58a5a1 15-Feb-2022 Xiubo Li <[email protected]>

ceph: move to a dedicated slabcache for ceph_cap_snap

There could be a huge number of capsnaps around at any given time. On
x86_64 the structure is 248 bytes, which will be rounded up to 256
bytes by kzalloc. Move this to a dedicated slabcache to save 8 bytes
for each.

[ jlayton: use kmem_cache_zalloc ]
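
A hedged sketch of the pattern (the cache variable name and flags are
illustrative, not necessarily the committed code):

    /* created once, e.g. at module init */
    ceph_cap_snap_cachep = KMEM_CACHE(ceph_cap_snap, 0);

    /* allocation and free sites then become */
    capsnap = kmem_cache_zalloc(ceph_cap_snap_cachep, GFP_NOFS);
    kmem_cache_free(ceph_cap_snap_cachep, capsnap);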

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



Revision tags: v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5
# 0ba92e1c 02-Aug-2021 Jeff Layton <[email protected]>

ceph: add ceph_change_snap_realm() helper

Consolidate some fiddly code for changing an inode's snap_realm
into a new helper function, and change the callers to use it.

While we're in here, nothing uses the i_snap_realm_counter field, so
remove that from the inode.

Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# b2f9fa1f 18-Aug-2021 Xiubo Li <[email protected]>

ceph: correctly handle releasing an embedded cap flush

The ceph_cap_flush structures are usually dynamically allocated, but
the ceph_cap_snap has an embedded one.

When force umounting, the client will try to remove all the session
caps. During this, it will free them, but that should not be done
with the ones embedded in a capsnap.

Fix this by adding a new boolean that indicates that the cap flush is
embedded in a capsnap, and skip freeing it if that's set.

At the same time, switch to using list_del_init() when detaching the
i_list and g_list heads. It's possible for a forced umount to remove
these objects but then handle_cap_flushsnap_ack() races in and does the
list_del_init() again, corrupting memory.
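
A simplified sketch of the two changes described above (field and
helper names may differ from the real code):

    struct ceph_cap_flush {
            /* ... */
            bool is_capsnap;            /* true if embedded in a ceph_cap_snap */
            struct list_head i_list;    /* per-inode list linkage */
            struct list_head g_list;    /* global list linkage */
    };

    static void ceph_free_cap_flush(struct ceph_cap_flush *cf)
    {
            if (cf && !cf->is_capsnap)  /* never free the embedded one */
                    kmem_cache_free(ceph_cap_flush_cachep, cf);
    }

    /* detach with list_del_init() so that a racing second removal by
     * handle_cap_flushsnap_ack() becomes a harmless no-op */
    list_del_init(&cf->i_list);
    list_del_init(&cf->g_list);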

Cc: [email protected]
URL: https://tracker.ceph.com/issues/52283
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 8434ffe7 03-Aug-2021 Jeff Layton <[email protected]>

ceph: take snap_empty_lock atomically with snaprealm refcount change

There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it when it's still in-use.

Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.

Because these objects can also undergo a 0->1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0->1 transition.
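
Condensed, the put side then looks roughly like this (a simplified
sketch of the described pattern, with tracing and sanity checks
omitted):

    void ceph_put_snap_realm(struct ceph_mds_client *mdsc,
                             struct ceph_snap_realm *realm)
    {
            /* only takes snap_empty_lock when the count really hits zero,
             * so the 1->0 transition and the empty-list handling are atomic */
            if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock))
                    return;

            if (down_write_trylock(&mdsc->snap_rwsem)) {
                    spin_unlock(&mdsc->snap_empty_lock);
                    __destroy_snap_realm(mdsc, realm);
                    up_write(&mdsc->snap_rwsem);
            } else {
                    list_add(&realm->empty_item, &mdsc->snap_empty);
                    spin_unlock(&mdsc->snap_empty_lock);
            }
    }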

With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.

Cc: [email protected]
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



Revision tags: v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5
# 23c2c76e 04-Jun-2021 Jeff Layton <[email protected]>

ceph: eliminate ceph_async_iput()

Now that we don't need to hold session->s_mutex or the snap_rwsem when
calling ceph_check_caps, we can eliminate ceph_async_iput and just use
normal iput calls.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# 7732fe16 14-Jun-2021 Jeff Layton <[email protected]>

ceph: don't take s_mutex in ceph_flush_snaps

Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>


# df2c0cb7 01-Jun-2021 Jeff Layton <[email protected]>

ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm

They both say that the snap_rwsem must be held for write, but I don't
see any real reason for it, and it's not currently always called that
way.

The lookup is just walking the rbtree, so holding it for read should be
fine there. The "get" is bumping the refcount and (possibly) removing
it from the empty list. I see no need to hold the snap_rwsem for write
for that.

Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>



# a6862e67 01-Jun-2021 Jeff Layton <[email protected]>

ceph: add some lockdep assertions around snaprealm handling

Turn some comments into lockdep asserts.
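
For illustration, a locking comment like this:

    /* caller must hold snap_rwsem */

becomes an assertion that lockdep can actually enforce:

    lockdep_assert_held(&mdsc->snap_rwsem);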

Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>


