|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4 |
|
| #
c0e473a0 |
| 23-Apr-2025 |
Darrick J. Wong <[email protected]> |
block: fix race between set_blocksize and read paths
With the new large sector size support, it's now the case that set_blocksize can change i_blksize and the folio order in a manner that conflicts
block: fix race between set_blocksize and read paths
With the new large sector size support, it's now the case that set_blocksize can change i_blksize and the folio order in a manner that conflicts with a concurrent reader and causes a kernel crash.
Specifically, let's say that udev-worker calls libblkid to detect the labels on a block device. The read call can create an order-0 folio to read the first 4096 bytes from the disk. But then udev is preempted.
Next, someone tries to mount an 8k-sectorsize filesystem from the same block device. The filesystem calls set_blksize, which sets i_blksize to 8192 and the minimum folio order to 1.
Now udev resumes, still holding the order-0 folio it allocated. It then tries to schedule a read bio and do_mpage_readahead tries to create bufferheads for the folio. Unfortunately, blocks_per_folio == 0 because the page size is 4096 but the blocksize is 8192 so no bufferheads are attached and the bh walk never sets bdev. We then submit the bio with a NULL block device and crash.
Therefore, truncate the page cache after flushing but before updating i_blksize. However, that's not enough -- we also need to lock out file IO and page faults during the update. Take both the i_rwsem and the invalidate_lock in exclusive mode for invalidations, and in shared mode for read/write operations.
I don't know if this is the correct fix, but xfs/259 found it.
Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Tested-by: Shin'ichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/r/174543795699.4139148.2086129139322431423.stgit@frogsfrogsfrogs Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
1ebd4a3c |
| 04-Feb-2025 |
Eric Biggers <[email protected]> |
blk-crypto: add ioctls to create and prepare hardware-wrapped keys
Until this point, the kernel can use hardware-wrapped keys to do encryption if userspace provides one -- specifically a key in ephe
blk-crypto: add ioctls to create and prepare hardware-wrapped keys
Until this point, the kernel can use hardware-wrapped keys to do encryption if userspace provides one -- specifically a key in ephemerally-wrapped form. However, no generic way has been provided for userspace to get such a key in the first place.
Getting such a key is a two-step process. First, the key needs to be imported from a raw key or generated by the hardware, producing a key in long-term wrapped form. This happens once in the whole lifetime of the key. Second, the long-term wrapped key needs to be converted into ephemerally-wrapped form. This happens each time the key is "unlocked".
In Android, these operations are supported in a generic way through KeyMint, a userspace abstraction layer. However, that method is Android-specific and can't be used on other Linux systems, may rely on proprietary libraries, and also misleads people into supporting KeyMint features like rollback resistance that make sense for other KeyMint keys but don't make sense for hardware-wrapped inline encryption keys.
Therefore, this patch provides a generic kernel interface for these operations by introducing new block device ioctls:
- BLKCRYPTOIMPORTKEY: convert a raw key to long-term wrapped form.
- BLKCRYPTOGENERATEKEY: have the hardware generate a new key, then return it in long-term wrapped form.
- BLKCRYPTOPREPAREKEY: convert a key from long-term wrapped form to ephemerally-wrapped form.
These ioctls are implemented using new operations in blk_crypto_ll_ops.
Signed-off-by: Eric Biggers <[email protected]> Tested-by: Bartosz Golaszewski <[email protected]> # sm8650 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
50c52250 |
| 11-Sep-2024 |
Pavel Begunkov <[email protected]> |
block: implement async io_uring discard cmd
io_uring allows implementing custom file specific asynchronous operations via the fops->uring_cmd callback, a.k.a. IORING_OP_URING_CMD requests or just io
block: implement async io_uring discard cmd
io_uring allows implementing custom file specific asynchronous operations via the fops->uring_cmd callback, a.k.a. IORING_OP_URING_CMD requests or just io_uring commands. Use it to add support for async discards.
Normally, it first tries to queue up bios in a non-blocking context, and if that fails, we'd retry from a blocking context by returning -EAGAIN to the core io_uring. We always get the result from bios asynchronously by setting a custom bi_end_io callback, at which point we drag the request into the task context to either reissue or complete it and post a completion to the user.
Unlike ioctl(BLKDISCARD) with stronger guarantees against races, we only do a best effort attempt to invalidate page cache, and it can race with any writes and reads and leave page cache stale. It's the same kind of races we allow to direct writes.
Also, apart from cases where discarding is not allowed at all, e.g. discards are not supported or the file/device is read only, the user should assume that the sector range on disk is not valid anymore, even when an error was returned to the user.
Suggested-by: Conrad Meyer <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/2b5210443e4fa0257934f73dfafcc18a77cd0e09.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
7a07210b |
| 11-Sep-2024 |
Pavel Begunkov <[email protected]> |
block: introduce blk_validate_byte_range()
In preparation to further changes extract a helper function out of blk_ioctl_discard() that validates if we can do IO against the given range of disk byte
block: introduce blk_validate_byte_range()
In preparation to further changes extract a helper function out of blk_ioctl_discard() that validates if we can do IO against the given range of disk byte addresses.
Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/19a7779323c71e742a2f511e4cf49efcfd68cfd4.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc7 |
|
| #
697ba0b6 |
| 03-Sep-2024 |
Alexey Dobriyan <[email protected]> |
block: fix integer overflow in BLKSECDISCARD
I independently rediscovered
commit 22d24a544b0d49bbcbd61c8c0eaf77d3c9297155 block: fix overflow in blk_ioctl_discard()
but for secure erase.
Same p
block: fix integer overflow in BLKSECDISCARD
I independently rediscovered
commit 22d24a544b0d49bbcbd61c8c0eaf77d3c9297155 block: fix overflow in blk_ioctl_discard()
but for secure erase.
Same problem:
uint64_t r[2] = {512, 18446744073709551104ULL}; ioctl(fd, BLKSECDISCARD, r);
will enter near infinite loop inside blkdev_issue_secure_erase():
a.out: attempt to access beyond end of device loop0: rw=5, sector=3399043073, nr_sectors = 1024 limit=2048 bio_check_eod: 3286214 callbacks suppressed
Signed-off-by: Alexey Dobriyan <[email protected]> Link: https://lore.kernel.org/r/9e64057f-650a-46d1-b9f7-34af391536ef@p183 Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7 |
|
| #
bf86bcdb |
| 01-Jul-2024 |
Christoph Hellwig <[email protected]> |
blk-lib: check for kill signal in ioctl BLKZEROOUT
Zeroout can access a significant capacity and take longer than the user expected. A user may change their mind about wanting to run that command a
blk-lib: check for kill signal in ioctl BLKZEROOUT
Zeroout can access a significant capacity and take longer than the user expected. A user may change their mind about wanting to run that command and attempt to kill the process and do something else with their device. But since the task is uninterruptable, they have to wait for it to finish, which could be many hours.
Add a new BLKDEV_ZERO_KILLABLE flag for blkdev_issue_zeroout that checks for a fatal signal at each iteration so the user doesn't have to wait for their regretted operation to complete naturally.
Heavily based on an earlier patch from Keith Busch.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9 |
|
| #
719c15a7 |
| 06-May-2024 |
Christoph Hellwig <[email protected]> |
blk-lib: check for kill signal in ioctl BLKDISCARD
Discards can access a significant capacity and take longer than the user expected. A user may change their mind about wanting to run that command
blk-lib: check for kill signal in ioctl BLKDISCARD
Discards can access a significant capacity and take longer than the user expected. A user may change their mind about wanting to run that command and attempt to kill the process and do something else with their device. But since the task is uninterruptable, they have to wait for it to finish, which could be many hours.
Open code blkdev_issue_discard in the BLKDISCARD ioctl handler and check for a fatal signal at each iteration so the user doesn't have to wait for their regretted operation to complete naturally.
Heavily based on an earlier patch from Keith Busch.
Reported-by: Conrad Meyer <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
30f1e724 |
| 06-May-2024 |
Christoph Hellwig <[email protected]> |
block: move discard checks into the ioctl handler
Most bio operations get basic sanity checking in submit_bio and anything more complicated than that is done in the callers. Discards are a bit diff
block: move discard checks into the ioctl handler
Most bio operations get basic sanity checking in submit_bio and anything more complicated than that is done in the callers. Discards are a bit different from that in that a lot of checking is done in __blkdev_issue_discard, and the specific errnos for that are returned to userspace. Move the checks that require specific errnos to the ioctl handler instead, and just leave the basic sanity checking in submit_bio for the other handlers. This introduces two changes in behavior:
1) the logical block size alignment check of the start and len is lost for non-ioctl callers. This matches what is done for other operations including reads and writes. We should probably verify this for all bios, but for now make discards match the normal flow. 2) for non-ioctl callers all errors are reported on I/O completion now instead of synchronously. Callers in general mostly ignore or log errors so this will actually simplify the code once cleaned up
Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
ccb326b5 |
| 07-May-2024 |
Justin Stitt <[email protected]> |
block/ioctl: prefer different overflow check
Running syzkaller with the newly reintroduced signed integer overflow sanitizer shows this report:
[ 62.982337] ------------[ cut here ]------------ [
block/ioctl: prefer different overflow check
Running syzkaller with the newly reintroduced signed integer overflow sanitizer shows this report:
[ 62.982337] ------------[ cut here ]------------ [ 62.985692] cgroup: Invalid name [ 62.986211] UBSAN: signed-integer-overflow in ../block/ioctl.c:36:46 [ 62.989370] 9pnet_fd: p9_fd_create_tcp (7343): problem connecting socket to 127.0.0.1 [ 62.992992] 9223372036854775807 + 4095 cannot be represented in type 'long long' [ 62.997827] 9pnet_fd: p9_fd_create_tcp (7345): problem connecting socket to 127.0.0.1 [ 62.999369] random: crng reseeded on system resumption [ 63.000634] GUP no longer grows the stack in syz-executor.2 (7353): 20002000-20003000 (20001000) [ 63.000668] CPU: 0 PID: 7353 Comm: syz-executor.2 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1 [ 63.000677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 63.000682] Call Trace: [ 63.000686] <TASK> [ 63.000731] dump_stack_lvl+0x93/0xd0 [ 63.000919] __get_user_pages+0x903/0xd30 [ 63.001030] __gup_longterm_locked+0x153e/0x1ba0 [ 63.001041] ? _raw_read_unlock_irqrestore+0x17/0x50 [ 63.001072] ? try_get_folio+0x29c/0x2d0 [ 63.001083] internal_get_user_pages_fast+0x1119/0x1530 [ 63.001109] iov_iter_extract_pages+0x23b/0x580 [ 63.001206] bio_iov_iter_get_pages+0x4de/0x1220 [ 63.001235] iomap_dio_bio_iter+0x9b6/0x1410 [ 63.001297] __iomap_dio_rw+0xab4/0x1810 [ 63.001316] iomap_dio_rw+0x45/0xa0 [ 63.001328] ext4_file_write_iter+0xdde/0x1390 [ 63.001372] vfs_write+0x599/0xbd0 [ 63.001394] ksys_write+0xc8/0x190 [ 63.001403] do_syscall_64+0xd4/0x1b0 [ 63.001421] ? arch_exit_to_user_mode_prepare+0x3a/0x60 [ 63.001479] entry_SYSCALL_64_after_hwframe+0x6f/0x77 [ 63.001535] RIP: 0033:0x7f7fd3ebf539 [ 63.001551] Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 [ 63.001562] RSP: 002b:00007f7fd32570c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 63.001584] RAX: ffffffffffffffda RBX: 00007f7fd3ff3f80 RCX: 00007f7fd3ebf539 [ 63.001590] RDX: 4db6d1e4f7e43360 RSI: 0000000020000000 RDI: 0000000000000004 [ 63.001595] RBP: 00007f7fd3f1e496 R08: 0000000000000000 R09: 0000000000000000 [ 63.001599] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 63.001604] R13: 0000000000000006 R14: 00007f7fd3ff3f80 R15: 00007ffd415ad2b8 ... [ 63.018142] ---[ end trace ]---
Historically, the signed integer overflow sanitizer did not work in the kernel due to its interaction with `-fwrapv` but this has since been changed [1] in the newest version of Clang; It was re-enabled in the kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow sanitizer").
Let's rework this overflow checking logic to not actually perform an overflow during the check itself, thus avoiding the UBSAN splat.
[1]: https://github.com/llvm/llvm-project/pull/82432
Signed-off-by: Justin Stitt <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
881494ed |
| 11-Apr-2024 |
Al Viro <[email protected]> |
blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/[email protected]
blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
224941e8 |
| 11-Apr-2024 |
Al Viro <[email protected]> |
use ->bd_mapping instead of ->bd_inode->i_mapping
Just the low-hanging fruit...
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/20240411145346.2516848-2-viro@zeniv.
use ->bd_mapping instead of ->bd_inode->i_mapping
Just the low-hanging fruit...
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
01e198f0 |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_read_only to ->__bd_flags
Signed-off-by: Al Viro <[email protected]>
|
| #
ead083ae |
| 18-Apr-2024 |
Al Viro <[email protected]> |
set_blocksize(): switch to passing struct file *
Signed-off-by: Al Viro <[email protected]>
|
| #
752863bd |
| 17-Apr-2024 |
Christoph Hellwig <[email protected]> |
block: propagate partition scanning errors to the BLKRRPART ioctl
Commit 4601b4b130de ("block: reopen the device in blkdev_reread_part") lost the propagation of I/O errors from the low-level read of
block: propagate partition scanning errors to the BLKRRPART ioctl
Commit 4601b4b130de ("block: reopen the device in blkdev_reread_part") lost the propagation of I/O errors from the low-level read of the partition table to the user space caller of the BLKRRPART.
Apparently some user space relies on, so restore the propagation. This isn't exactly pretty as other block device open calls explicitly do not are about these errors, so add a new BLK_OPEN_STRICT_SCAN to opt into the error propagation.
Fixes: 4601b4b130de ("block: reopen the device in blkdev_reread_part") Reported-by: Saranya Muruganandam <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Shin'ichiro Kawasaki <[email protected]> Tested-by: Shin'ichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2 |
|
| #
22d24a54 |
| 29-Mar-2024 |
Li Nan <[email protected]> |
block: fix overflow in blk_ioctl_discard()
There is no check for overflow of 'start + len' in blk_ioctl_discard(). Hung task occurs if submit an discard ioctl with the following param: start = 0x8
block: fix overflow in blk_ioctl_discard()
There is no check for overflow of 'start + len' in blk_ioctl_discard(). Hung task occurs if submit an discard ioctl with the following param: start = 0x80000000000ff000, len = 0x8000000000fff000; Add the overflow validation now.
Signed-off-by: Li Nan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1, v6.8 |
|
| #
b9355185 |
| 05-Mar-2024 |
Li Lingfeng <[email protected]> |
block: move capacity validation to blkpg_do_ioctl()
Commit 6d4e80db4ebe ("block: add capacity validation in bdev_add_partition()") add check of partition's start and end sectors to prevent exceeding
block: move capacity validation to blkpg_do_ioctl()
Commit 6d4e80db4ebe ("block: add capacity validation in bdev_add_partition()") add check of partition's start and end sectors to prevent exceeding the size of the disk when adding partitions. However, there is still no check for resizing partitions now. Move the check to blkpg_do_ioctl() to cover resizing partitions.
Signed-off-by: Li Lingfeng <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2 |
|
| #
e5ca9d39 |
| 23-Jan-2024 |
Christian Brauner <[email protected]> |
block/ioctl: port blkdev_bszset() to file
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <jack@
block/ioctl: port blkdev_bszset() to file
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc1 |
|
| #
7777f47f |
| 18-Jan-2024 |
Li Lingfeng <[email protected]> |
block: Move checking GENHD_FL_NO_PART to bdev_add_partition()
Commit 1a721de8489f ("block: don't add or resize partition on the disk with GENHD_FL_NO_PART") prevented all operations about partitions
block: Move checking GENHD_FL_NO_PART to bdev_add_partition()
Commit 1a721de8489f ("block: don't add or resize partition on the disk with GENHD_FL_NO_PART") prevented all operations about partitions on disks with GENHD_FL_NO_PART in blkpg_do_ioctl() since they are meaningless. However, it changed error code in some scenarios. So move checking GENHD_FL_NO_PART to bdev_add_partition() to eliminate impact.
Fixes: 1a721de8489f ("block: don't add or resize partition on the disk with GENHD_FL_NO_PART") Reported-by: Allison Karlitskaya <[email protected]> Closes: https://lore.kernel.org/all/CAOYeF9VsmqKMcQjo1k6YkGNujwN-nzfxY17N3F-CMikE1tYp+w@mail.gmail.com/ Signed-off-by: Li Lingfeng <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1 |
|
| #
6f64f866 |
| 29-Jun-2023 |
Min Li <[email protected]> |
block: add check that partition length needs to be aligned with block size
Before calling add partition or resize partition, there is no check on whether the length is aligned with the logical block
block: add check that partition length needs to be aligned with block size
Before calling add partition or resize partition, there is no check on whether the length is aligned with the logical block size. If the logical block size of the disk is larger than 512 bytes, then the partition size maybe not the multiple of the logical block size, and when the last sector is read, bio_truncate() will adjust the bio size, resulting in an IO error if the size of the read command is smaller than the logical block size.If integrity data is supported, this will also result in a null pointer dereference when calling bio_integrity_free.
Cc: <[email protected]> Signed-off-by: Min Li <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
fd146410 |
| 18-Oct-2023 |
Jan Kara <[email protected]> |
fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock
The implementation of bdev holder operations such as fs_bdev_mark_dead() and fs_bdev_sync() grab sb->s_umount semaphore under bdev->bd_hold
fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock
The implementation of bdev holder operations such as fs_bdev_mark_dead() and fs_bdev_sync() grab sb->s_umount semaphore under bdev->bd_holder_lock. This is problematic because it leads to disk->open_mutex -> sb->s_umount lock ordering which is counterintuitive (usually we grab higher level (e.g. filesystem) locks first and lower level (e.g. block layer) locks later) and indeed makes lockdep complain about possible locking cycles whenever we open a block device while holding sb->s_umount semaphore. Implement a function bdev_super_lock_shared() which safely transitions from holding bdev->bd_holder_lock to holding sb->s_umount on alive superblock without introducing the problematic lock dependency. We use this function fs_bdev_sync() and fs_bdev_mark_dead().
Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
acb083b5 |
| 27-Sep-2023 |
Jan Kara <[email protected]> |
block: Use bdev_open_by_dev() in disk_scan_partitions() and blkdev_bszset()
Convert disk_scan_partitions() and blkdev_bszset() to use bdev_open_by_dev().
Acked-by: Christoph Hellwig <[email protected]> Re
block: Use bdev_open_by_dev() in disk_scan_partitions() and blkdev_bszset()
Convert disk_scan_partitions() and blkdev_bszset() to use bdev_open_by_dev().
Acked-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
1a721de8 |
| 31-Aug-2023 |
Li Lingfeng <[email protected]> |
block: don't add or resize partition on the disk with GENHD_FL_NO_PART
Commit a33df75c6328 ("block: use an xarray for disk->part_tbl") remove disk_expand_part_tbl() in add_partition(), which means a
block: don't add or resize partition on the disk with GENHD_FL_NO_PART
Commit a33df75c6328 ("block: use an xarray for disk->part_tbl") remove disk_expand_part_tbl() in add_partition(), which means all kinds of devices will support extended dynamic `dev_t`. However, some devices with GENHD_FL_NO_PART are not expected to add or resize partition. Fix this by adding check of GENHD_FL_NO_PART before add or resize partition.
Fixes: a33df75c6328 ("block: use an xarray for disk->part_tbl") Signed-off-by: Li Lingfeng <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
2142b88c |
| 11-Aug-2023 |
Christoph Hellwig <[email protected]> |
block: call into the file system for ioctl BLKFLSBUF
BLKFLSBUF is a historic ioctl that is called on a file handle to a block device and syncs either the file system mounted on that block device if
block: call into the file system for ioctl BLKFLSBUF
BLKFLSBUF is a historic ioctl that is called on a file handle to a block device and syncs either the file system mounted on that block device if there is one, or otherwise the just the data on the block device.
Replace the get_super based syncing with a holder operation to remove the last usage of get_super, and to also support syncing the file system if the block device is not the main block device stored in s_dev.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.4, v6.4-rc7 |
|
| #
9a72a024 |
| 13-Jun-2023 |
Jingbo Xu <[email protected]> |
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
Allow of unprivileged Persistent Reservation operations on devices if the write permission check on the device node has passed.
brw-rw-
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
Allow of unprivileged Persistent Reservation operations on devices if the write permission check on the device node has passed.
brw-rw---- 1 root disk 259, 0 Jun 13 07:09 /dev/nvme0n1
In the example above, the "disk" group of nvme0n1 is also allowed to make reservations on the device even without CAP_SYS_ADMIN.
Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
12629621 |
| 13-Jun-2023 |
Jingbo Xu <[email protected]> |
block: disallow Persistent Reservation on partitions
Refuse Persistent Reservation operations on partitions as reservation on partitions doesn't make sense.
Besides, introduce blkdev_pr_allowed() h
block: disallow Persistent Reservation on partitions
Refuse Persistent Reservation operations on partitions as reservation on partitions doesn't make sense.
Besides, introduce blkdev_pr_allowed() helper, where more policies could be placed here later.
Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|