|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7 |
|
| #
38d46409 |
| 12-Jun-2023 |
Xiubo Li <[email protected]> |
ceph: print cluster fsid and client global_id in all debug logs
Multiple CephFS mounts on a host are increasingly common, so disambiguating messages like this is necessary and will make it easier to debug issues.
At the same time this will improve the debug logs to make troubleshooting easier, such as printing the inode number instead of only the memory address of the corresponding inode, and printing dentry names instead of the corresponding dentry memory addresses, etc.
Link: https://tracker.ceph.com/issues/61590 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Patrick Donnelly <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
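As a rough sketch of the idea (the macro name and the exact fields used here are assumptions, not lifted from the patch), a client-aware debug helper could prefix every message with the cluster fsid and the client's global_id:

    /* illustrative only: prefix each debug message with "[fsid global_id]" */
    #define doutc(client, fmt, ...)                                     \
        pr_debug("[%pU %llu] " fmt, &(client)->fsid,                    \
                 (client)->monc.auth->global_id, ##__VA_ARGS__)

    /* usage (hypothetical call site): doutc(fsc->client, "mds%d reconnect start\n", mds); */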
|
| #
5995d90d |
| 12-Jun-2023 |
Xiubo Li <[email protected]> |
ceph: rename _to_client() to _to_fs_client()
We need to convert the inode to a ceph_client in the following commit and will add one new helper for that; here we rename the old helper to _to_fs_client().
Link: https://tracker.ceph.com/issues/61590 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Patrick Donnelly <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
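One plausible shape of the renamed helper and of the client-returning helper the series then adds (a sketch under assumptions, not the actual super.h definitions):

    static inline struct ceph_fs_client *ceph_inode_to_fs_client(struct inode *inode)
    {
        return (struct ceph_fs_client *)inode->i_sb->s_fs_info;
    }

    /* follow-up helper: jump straight to the libceph client */
    static inline struct ceph_client *ceph_inode_to_client(struct inode *inode)
    {
        return ceph_inode_to_fs_client(inode)->client;
    }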
|
|
Revision tags: v6.4-rc6 |
|
| #
197b7d79 |
| 09-Jun-2023 |
Xiubo Li <[email protected]> |
ceph: pass the mdsc to several helpers
We will use the 'mdsc' to get the global_id in the following commits.
Link: https://tracker.ceph.com/issues/61590 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Patrick Donnelly <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
| #
c453bdb5 |
| 04-Oct-2023 |
Jeff Layton <[email protected]> |
ceph: convert to new timestamp accessors
Convert to using the new inode timestamp accessor functions.
Signed-off-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
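For context, the new VFS accessors replace direct stores to inode->i_mtime/i_atime; a minimal sketch of the conversion (illustrative, not taken from the patch itself):

    #include <linux/fs.h>

    static void example_touch_times(struct inode *inode)
    {
        struct timespec64 ts = inode_get_ctime(inode);

        /* was: inode->i_mtime = ts; inode->i_atime = ts; */
        inode_set_mtime_to_ts(inode, ts);
        inode_set_atime_to_ts(inode, ts);
    }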
|
|
Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1 |
|
| #
e3dfcab2 |
| 21-Dec-2022 |
Xiubo Li <[email protected]> |
ceph: drop messages from MDS when unmounting
When unmounting, all the dirty buffers will be flushed, and after the last OSD request finishes the last i_count reference will be released. Then the dirty caps/snaps will be flushed to the MDSs, but the unmount won't wait for the possible acks, which would ihold the inodes when updating the metadata locally but no longer make any sense. This will make evict_inodes() skip these inodes.
If encryption is enabled, the kernel generates a warning when removing the encryption keys while the skipped inodes still hold the keyring:
WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0
CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0
RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00
RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000
RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000
R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40
R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000
FS:  00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 generic_shutdown_super+0x47/0x120
 kill_anon_super+0x14/0x30
 ceph_kill_sb+0x36/0x90 [ceph]
 deactivate_locked_super+0x29/0x60
 cleanup_mnt+0xb8/0x140
 task_work_run+0x67/0xb0
 exit_to_user_mode_prepare+0x23d/0x240
 syscall_exit_to_user_mode+0x25/0x60
 do_syscall_64+0x40/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd83dc39e9b
Later the kernel will crash when iput()-ing these inodes and dereferencing "sb->s_master_keys", which has already been released by generic_shutdown_super().
Link: https://tracker.ceph.com/issues/59162 Signed-off-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
| #
7795aef0 |
| 05-Jul-2023 |
Jeff Layton <[email protected]> |
ceph: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime.
Reviewed-by: Xiubo Li <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
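A minimal sketch of what such a conversion looks like (illustrative; 'mds_ts' stands in for a timestamp received from the MDS):

    #include <linux/fs.h>

    static void example_update_ctime(struct inode *inode, struct timespec64 mds_ts)
    {
        /* was: inode->i_ctime = current_time(inode); */
        inode_set_ctime_current(inode);

        /* was: inode->i_ctime = mds_ts; */
        inode_set_ctime_to_ts(inode, mds_ts);
    }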
|
| #
2d12ad95 |
| 11-May-2023 |
Xiubo Li <[email protected]> |
ceph: trigger to flush the buffer when making snapshot
The 'i_wr_ref' is used to track the 'Fb' caps, but whenever the 'Fb' caps are taken the kclient will always take the 'Fw' caps at the same time. That means the check in __ceph_finish_cap_snap() will always be false.
When writing to the buffer, the kclient will take both 'Fb|Fw' caps, write the contents to the buffer pages while increasing 'i_wrbuffer_ref', and then just release both 'Fb|Fw'. This differs from the user space libcephfs, which keeps 'Fb' held and uses 'i_wr_ref' instead of 'i_wrbuffer_ref' to track this until the buffer is flushed to Rados.
We need to defer flushing the capsnap until the corresponding buffer pages are all flushed to Rados, and at the same time trigger flushing the buffer pages immediately.
Link: https://tracker.ceph.com/issues/48640 Link: https://tracker.ceph.com/issues/59343 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
| #
409e873e |
| 01-Jun-2023 |
Xiubo Li <[email protected]> |
ceph: fix use-after-free bug for inodes when flushing capsnaps
There is a race between capsnaps flush and removing the inode from 'mdsc->snap_flush_list' list:
== Thread A ==
ceph_queue_cap_snap()
 -> allocate 'capsnapA'
 -> ihold('&ci->vfs_inode')
 -> add 'capsnapA' to 'ci->i_cap_snaps'
 -> add 'ci' to 'mdsc->snap_flush_list'

== Thread B ==
 ...

== Thread C ==
ceph_flush_snaps()
 -> __ceph_flush_snaps()
 -> __send_flush_snap()
handle_cap_flushsnap_ack()
 -> iput('&ci->vfs_inode')
    this also will release 'ci'
 ...

== Thread D ==
ceph_handle_snap()
 -> flush_snaps()
 -> iterate 'mdsc->snap_flush_list'
 -> get the stale 'ci'
 -> remove 'ci' from this 'mdsc->snap_flush_list'
 -> ihold(&ci->vfs_inode), this will WARNING
To fix this we will increase the inode's i_count ref when adding 'ci' to the 'mdsc->snap_flush_list' list.
[ idryomov: need_put int -> bool ]
Cc: [email protected] Link: https://bugzilla.redhat.com/show_bug.cgi?id=2209299 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Milind Changire <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
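A sketch of the resulting lifetime rule (names follow fs/ceph, but this is an illustration rather than the literal diff; 'ci', 'mdsc' and 'session' are assumed to be in scope): the i_count reference is owned by the snap_flush_list entry and is dropped only after the inode has been taken back off the list:

    /* queueing side: the list owns one i_count reference */
    spin_lock(&mdsc->snap_flush_lock);
    if (list_empty(&ci->i_snap_flush_item))
        ihold(&ci->netfs.inode);
    list_add_tail(&ci->i_snap_flush_item, &mdsc->snap_flush_list);
    spin_unlock(&mdsc->snap_flush_lock);

    /* draining side: drop the list's reference after the flush */
    spin_lock(&mdsc->snap_flush_lock);
    while (!list_empty(&mdsc->snap_flush_list)) {
        ci = list_first_entry(&mdsc->snap_flush_list,
                              struct ceph_inode_info, i_snap_flush_item);
        list_del_init(&ci->i_snap_flush_item);
        spin_unlock(&mdsc->snap_flush_lock);
        ceph_flush_snaps(ci, &session);
        iput(&ci->netfs.inode);
        spin_lock(&mdsc->snap_flush_lock);
    }
    spin_unlock(&mdsc->snap_flush_lock);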
|
| #
4cafd040 |
| 18-May-2023 |
Xiubo Li <[email protected]> |
ceph: force updating the msg pointer in non-split case
When the MClientSnap request's op is not CEPH_SNAP_OP_SPLIT, the request may still contain a list of 'split_realms', and we need to skip it anyway. Otherwise it will be parsed as a corrupt snaptrace.
Cc: [email protected] Link: https://tracker.ceph.com/issues/61200 Reported-by: Frank Schilder <[email protected]> Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
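The assumed shape of the skip (an illustrative fragment, not the literal diff), where 'p' is the decode cursor and the counts were read earlier from the message header:

    if (op != CEPH_SNAP_OP_SPLIT) {
        /* step over split_inos[] and split_realms[] so the snap trace
         * that follows is decoded from the correct offset */
        p += num_split_inos * sizeof(u64);
        p += num_split_realms * sizeof(u64);
    }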
|
| #
a68e564a |
| 01-Feb-2023 |
Xiubo Li <[email protected]> |
ceph: blocklist the kclient when receiving corrupted snap trace
When we receive a corrupted snap trace we don't know what exactly has happened on the MDS side, and we shouldn't continue IO and metadata access to the MDS, which may corrupt data or return incorrect contents.
This patch will just block all the further IO/MDS requests immediately and then evict the kclient itself.
The reason we still need to evict the kclient just after blocking all further IO is that this lets the MDS revoke the caps faster.
Link: https://tracker.ceph.com/issues/57686 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Venky Shankar <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Revision tags: v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5 |
|
| #
51884d15 |
| 09-Nov-2022 |
Xiubo Li <[email protected]> |
ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails, it may leave 'first_realm' and 'realm' pointing to the same snaprealm memory, and then it will put it twice, which could cause random use-after-free, BUG_ON, etc. issues.
Cc: [email protected] Link: https://tracker.ceph.com/issues/57686 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
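The shape of the fix, roughly (an illustrative error path, not the literal diff): make sure the same realm is never put through both pointers:

    fail:
        if (realm && realm != first_realm)
            ceph_put_snap_realm(mdsc, realm);
        if (first_realm)
            ceph_put_snap_realm(mdsc, first_realm);
        return err;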
|
|
Revision tags: v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2 |
|
| #
874c8ca1 |
| 09-Jun-2022 |
David Howells <[email protected]> |
netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context
While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12:
In file included from include/linux/string.h:253,
                 from include/linux/ceph/ceph_debug.h:7,
                 from fs/ceph/inode.c:2:
In function 'fortify_memset_chk',
    inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
    inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
  242 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems.
Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()).
Most of the changes were done with:
    perl -p -i -e 's/vfs_inode/netfs.inode/'g \
        `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`
Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things.
Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4].
Version #2:
 - Fix a couple of missed name changes due to a disabled cifs option.
 - Rename nfs_i_context to nfs_inode
 - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs.
[ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ]
Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Xiubo Li <[email protected]> cc: Jonathan Corbet <[email protected]> cc: Eric Van Hensbergen <[email protected]> cc: Latchesar Ionkov <[email protected]> cc: Dominique Martinet <[email protected]> cc: Christian Schoenebeck <[email protected]> cc: Marc Dionne <[email protected]> cc: Ilya Dryomov <[email protected]> cc: Steve French <[email protected]> cc: William Kucharski <[email protected]> cc: "Matthew Wilcox (Oracle)" <[email protected]> cc: Dave Chinner <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ [1] Link: https://lore.kernel.org/r/[email protected]/ [2] Link: https://lore.kernel.org/r/[email protected]/ [3] Link: https://lore.kernel.org/r/[email protected]/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <[email protected]>
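A simplified view of the layout described above (the real structure in include/linux/netfs.h carries more fields):

    struct netfs_inode {
        struct inode inode;     /* the VFS inode is embedded, not pointed to */
        /* ...netfs bookkeeping fields... */
    };

    /* the old netfs_i_context() helper becomes a container_of() wrapper */
    static inline struct netfs_inode *netfs_inode(struct inode *inode)
    {
        return container_of(inode, struct netfs_inode, inode);
    }

    /* a filesystem's own inode wrapper embeds it in turn, e.g.: */
    struct example_fs_inode {
        struct netfs_inode netfs;   /* was: struct inode vfs_inode */
        /* fs-private fields */
    };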
|
|
Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6 |
|
| #
ad5255c1 |
| 23-Feb-2022 |
Xiubo Li <[email protected]> |
ceph: misc fix for code style and logs
To make the logs more readable, for logs like:
ceph: will move 00000000a42b796b to split realm 100000003ed 000000007146df45
With this it will always show the inode numbers instead of the inode addresses.
Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
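For example (illustrative, using the existing ceph_vinop() helper, which expands to the ino and snap id for a "%llx.%llx" format):

    dout("will move %llx.%llx to split realm %llx %p\n",
         ceph_vinop(inode), realm->ino, realm);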
|
| #
1ab36c9d |
| 23-Feb-2022 |
Xiubo Li <[email protected]> |
ceph: allocate capsnap memory outside of ceph_queue_cap_snap()
This will avoid very likely, but unnecessary, frequent memory allocation/freeing in this loop.
URL: https://tracker.ceph.com/issues/44100 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
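A sketch of the pattern (the helper signature and re-allocation handling are assumptions): allocate one capsnap up front, let the queueing helper consume it, and only re-allocate when it was actually used:

    capsnap = kmem_cache_zalloc(ceph_cap_snap_cachep, GFP_NOFS);
    if (!capsnap)
        return;

    /* inodes_with_caps_lock held in the real code */
    list_for_each_entry(ci, &realm->inodes_with_caps, i_snap_realm_item) {
        ceph_queue_cap_snap(ci, &capsnap);   /* sets capsnap to NULL if consumed */
        if (!capsnap) {
            capsnap = kmem_cache_zalloc(ceph_cap_snap_cachep, GFP_NOFS);
            if (!capsnap)
                break;
        }
    }
    if (capsnap)
        kmem_cache_free(ceph_cap_snap_cachep, capsnap);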
|
| #
5ed91587 |
| 23-Feb-2022 |
Xiubo Li <[email protected]> |
ceph: do not release the global snaprealm until unmounting
The global snaprealm would be created and then destroyed immediately every time it was updated.
URL: https://tracker.ceph.com/issues/54362 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Revision tags: v5.17-rc5 |
|
| #
74a31df4 |
| 19-Feb-2022 |
Xiubo Li <[email protected]> |
ceph: eliminate the recursion when rebuilding the snap context
Use a list instead of recursion to avoid possible stack overflow.
Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
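A sketch of the iterative pattern (the list member name and the rebuild helper are assumptions): push realms onto a work list and queue children from the loop instead of recursing into them:

    LIST_HEAD(realm_queue);
    struct ceph_snap_realm *realm, *child;

    list_add_tail(&root_realm->rebuild_item, &realm_queue);
    while (!list_empty(&realm_queue)) {
        realm = list_first_entry(&realm_queue, struct ceph_snap_realm,
                                 rebuild_item);
        list_del_init(&realm->rebuild_item);

        rebuild_snap_context(realm);    /* rebuild this realm's snap context */

        /* queue the children instead of recursing into them */
        list_for_each_entry(child, &realm->children, child_item)
            list_add_tail(&child->rebuild_item, &realm_queue);
    }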
|
| #
2e586641 |
| 19-Feb-2022 |
Xiubo Li <[email protected]> |
ceph: do not update snapshot context when there is no new snapshot
We will only track the uppermost parent snapshot realm, from which we need to rebuild the snapshot contexts _downward_ in the hierarchy. For all the others that have no new snapshot we will do nothing.
This fix will avoid calling ceph_queue_cap_snap() on some inodes inappropriately. For example, with the code in mainline, suppose there are 2 directory hierarchies (with 6 directories total), like this:
/dir_X1/dir_X2/dir_X3/ /dir_Y1/dir_Y2/dir_Y3/
Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a root snapshot under /.snap/root_snap. Every time we make snapshots under /dir_Y1/..., the kclient will always try to rebuild the snap context for snap_X2 realm and finally will always try to queue cap snaps for dir_Y2 and dir_Y3, which makes no sense.
That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when creating a new snapshot under /dir_Y1/... the new seq will be 4, and the mds will send the kclient a snapshot backtrace in _downward_ order: seqs 4, 3.
When ceph_update_snap_trace() is called, it will always rebuild from the last realm, that is, the root_snap realm. So later when rebuilding the snap context, the current logic will always cause it to rebuild the snap_X2 realm and then try to queue cap snaps for all the inodes related to that realm, even though that's not necessary.
This is accompanied by a lot of these sorts of dout messages:
"ceph: queue_cap_snap 00000000a42b796b nothing dirty|writing"
Fix the logic to avoid this situation.
Also, the 'invalidate' word is not precise here. In actuality, it will cause a rebuild of the existing snapshot contexts or just build non-existent ones. Rename it to 'rebuild_snapcs'.
URL: https://tracker.ceph.com/issues/44100 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
| #
ab58a5a1 |
| 15-Feb-2022 |
Xiubo Li <[email protected]> |
ceph: move to a dedicated slabcache for ceph_cap_snap
There could be a huge number of capsnaps around at any given time. On x86_64 the structure is 248 bytes, which will be rounded up to 256 bytes by kzalloc. Move this to a dedicated slabcache to save 8 bytes for each.
[ jlayton: use kmem_cache_zalloc ]
Signed-off-by: Xiubo Li <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
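A minimal sketch of the mechanics (cache and helper names are illustrative): a dedicated cache is sized exactly for the structure instead of kzalloc() rounding 248 bytes up to a 256-byte bucket:

    static struct kmem_cache *ceph_cap_snap_cachep;

    static int __init ceph_caches_init(void)
    {
        ceph_cap_snap_cachep = KMEM_CACHE(ceph_cap_snap, SLAB_MEM_SPREAD);
        return ceph_cap_snap_cachep ? 0 : -ENOMEM;
    }

    /* allocation site: zeroed, like the previous kzalloc() */
    capsnap = kmem_cache_zalloc(ceph_cap_snap_cachep, GFP_NOFS);

    /* free site */
    kmem_cache_free(ceph_cap_snap_cachep, capsnap);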
|
|
Revision tags: v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5 |
|
| #
0ba92e1c |
| 02-Aug-2021 |
Jeff Layton <[email protected]> |
ceph: add ceph_change_snap_realm() helper
Consolidate some fiddly code for changing an inode's snap_realm into a new helper function, and change the callers to use it.
While we're in here, nothing uses the i_snap_realm_counter field, so remove that from the inode.
Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Luis Henriques <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
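A condensed sketch of what such a helper does (assumed shape; the real function may differ in detail): detach the inode from the old realm, drop that reference, then attach it to the new one:

    static void example_change_snap_realm(struct inode *inode,
                                          struct ceph_snap_realm *realm)
    {
        struct ceph_inode_info *ci = ceph_inode(inode);
        struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
        struct ceph_snap_realm *oldrealm = ci->i_snap_realm;

        if (oldrealm) {
            spin_lock(&oldrealm->inodes_with_caps_lock);
            list_del_init(&ci->i_snap_realm_item);
            spin_unlock(&oldrealm->inodes_with_caps_lock);
            ceph_put_snap_realm(mdsc, oldrealm);
        }

        ci->i_snap_realm = realm;

        if (realm) {
            spin_lock(&realm->inodes_with_caps_lock);
            list_add(&ci->i_snap_realm_item, &realm->inodes_with_caps);
            spin_unlock(&realm->inodes_with_caps_lock);
            ceph_get_snap_realm(mdsc, realm);
        }
    }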
|
| #
b2f9fa1f |
| 18-Aug-2021 |
Xiubo Li <[email protected]> |
ceph: correctly handle releasing an embedded cap flush
The ceph_cap_flush structures are usually dynamically allocated, but the ceph_cap_snap has an embedded one.
When force umounting, the client will try to remove all the session caps. During this, it will free them, but that should not be done with the ones embedded in a capsnap.
Fix this by adding a new boolean that indicates that the cap flush is embedded in a capsnap, and skip freeing it if that's set.
At the same time, switch to using list_del_init() when detaching the i_list and g_list heads. It's possible for a forced umount to remove these objects but then handle_cap_flushsnap_ack() races in and does the list_del_init() again, corrupting memory.
Cc: [email protected] URL: https://tracker.ceph.com/issues/52283 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
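A sketch of the idea (the flag name is an assumption): the free helper refuses to release a cap flush that lives inside a capsnap, and detaching uses list_del_init() so a racing second removal is harmless:

    struct ceph_cap_flush {
        struct list_head i_list;    /* per-inode list */
        struct list_head g_list;    /* global list */
        bool is_capsnap;            /* embedded in a struct ceph_cap_snap */
        /* ... */
    };

    static void example_free_cap_flush(struct ceph_cap_flush *cf)
    {
        if (cf && !cf->is_capsnap)
            kmem_cache_free(ceph_cap_flush_cachep, cf);
    }

    static void example_detach_cap_flush(struct ceph_cap_flush *cf)
    {
        /* keeps the heads valid if handle_cap_flushsnap_ack() races in */
        list_del_init(&cf->i_list);
        list_del_init(&cf->g_list);
    }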
|
| #
8434ffe7 |
| 03-Aug-2021 |
Jeff Layton <[email protected]> |
ceph: take snap_empty_lock atomically with snaprealm refcount change
There is a race in ceph_put_snap_realm. The change to the nref and the spinlock acquisition are not done atomically, so you could decrement nref, and before you take the spinlock, the nref is incremented again. At that point, you end up putting it on the empty list when it shouldn't be there. Eventually __cleanup_empty_realms runs and frees it when it's still in-use.
Fix this by protecting the 1->0 transition with atomic_dec_and_lock, and just drop the spinlock if we can get the rwsem.
Because these objects can also undergo a 0->1 refcount transition, we must protect that change as well with the spinlock. Increment locklessly unless the value is at 0, in which case we take the spinlock, increment and then take it off the empty list if it did the 0->1 transition.
With these changes, I'm removing the dout() messages from these functions, as well as in __put_snap_realm. They've always been racy, and it's better to not print values that may be misleading.
Cc: [email protected] URL: https://tracker.ceph.com/issues/46419 Reported-by: Mark Nelson <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Luis Henriques <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
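A simplified sketch of the resulting protocol (close to the description above, but not the verbatim code):

    static void example_put_snap_realm(struct ceph_mds_client *mdsc,
                                       struct ceph_snap_realm *realm)
    {
        /* only take snap_empty_lock when this put is the final one */
        if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock))
            return;

        if (down_write_trylock(&mdsc->snap_rwsem)) {
            spin_unlock(&mdsc->snap_empty_lock);
            __destroy_snap_realm(mdsc, realm);
            up_write(&mdsc->snap_rwsem);
        } else {
            /* parked on the empty list; reaped later under snap_rwsem */
            list_add(&realm->empty_item, &mdsc->snap_empty);
            spin_unlock(&mdsc->snap_empty_lock);
        }
    }

    static void example_get_snap_realm(struct ceph_mds_client *mdsc,
                                       struct ceph_snap_realm *realm)
    {
        /* lockless unless this is a 0->1 transition */
        if (atomic_inc_not_zero(&realm->nref))
            return;

        spin_lock(&mdsc->snap_empty_lock);
        if (atomic_inc_return(&realm->nref) == 1)
            list_del_init(&realm->empty_item);
        spin_unlock(&mdsc->snap_empty_lock);
    }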
|
|
Revision tags: v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5 |
|
| #
23c2c76e |
| 04-Jun-2021 |
Jeff Layton <[email protected]> |
ceph: eliminate ceph_async_iput()
Now that we don't need to hold session->s_mutex or the snap_rwsem when calling ceph_check_caps, we can eliminate ceph_async_iput and just use normal iput calls.
Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
| #
7732fe16 |
| 14-Jun-2021 |
Jeff Layton <[email protected]> |
ceph: don't take s_mutex in ceph_flush_snaps
Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Luis Henriques <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
| #
df2c0cb7 |
| 01-Jun-2021 |
Jeff Layton <[email protected]> |
ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm
They both say that the snap_rwsem must be held for write, but I don't see any real reason for it, and it's not currently always called that way.
The lookup is just walking the rbtree, so holding it for read should be fine there. The "get" is bumping the refcount and (possibly) removing it from the empty list. I see no need to hold the snap_rwsem for write for that.
Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
|
| #
a6862e67 |
| 01-Jun-2021 |
Jeff Layton <[email protected]> |
ceph: add some lockdep assertions around snaprealm handling
Turn some comments into lockdep asserts.
Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
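For example (a simplified illustration of the pattern, not a specific hunk from the patch):

    static void __example_snaprealm_op(struct ceph_mds_client *mdsc,
                                       struct ceph_snap_realm *realm)
    {
        /* was: "caller must hold snap_rwsem for write" in a comment */
        lockdep_assert_held_write(&mdsc->snap_rwsem);

        /* ...body unchanged... */
    }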
|