History log of /linux-6.15/block/ioctl.c (Results 1 – 25 of 162)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4
# c0e473a0 23-Apr-2025 Darrick J. Wong <[email protected]>

block: fix race between set_blocksize and read paths

With the new large sector size support, it's now the case that
set_blocksize can change i_blksize and the folio order in a manner that
conflicts

block: fix race between set_blocksize and read paths

With the new large sector size support, it's now the case that
set_blocksize can change i_blksize and the folio order in a manner that
conflicts with a concurrent reader and causes a kernel crash.

Specifically, let's say that udev-worker calls libblkid to detect the
labels on a block device. The read call can create an order-0 folio to
read the first 4096 bytes from the disk. But then udev is preempted.

Next, someone tries to mount an 8k-sectorsize filesystem from the same
block device. The filesystem calls set_blksize, which sets i_blksize to
8192 and the minimum folio order to 1.

Now udev resumes, still holding the order-0 folio it allocated. It then
tries to schedule a read bio and do_mpage_readahead tries to create
bufferheads for the folio. Unfortunately, blocks_per_folio == 0 because
the page size is 4096 but the blocksize is 8192 so no bufferheads are
attached and the bh walk never sets bdev. We then submit the bio with a
NULL block device and crash.

Therefore, truncate the page cache after flushing but before updating
i_blksize. However, that's not enough -- we also need to lock out file
IO and page faults during the update. Take both the i_rwsem and the
invalidate_lock in exclusive mode for invalidations, and in shared mode
for read/write operations.

I don't know if this is the correct fix, but xfs/259 found it.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/174543795699.4139148.2086129139322431423.stgit@frogsfrogsfrogs
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2
# 1ebd4a3c 04-Feb-2025 Eric Biggers <[email protected]>

blk-crypto: add ioctls to create and prepare hardware-wrapped keys

Until this point, the kernel can use hardware-wrapped keys to do
encryption if userspace provides one -- specifically a key in
ephe

blk-crypto: add ioctls to create and prepare hardware-wrapped keys

Until this point, the kernel can use hardware-wrapped keys to do
encryption if userspace provides one -- specifically a key in
ephemerally-wrapped form. However, no generic way has been provided for
userspace to get such a key in the first place.

Getting such a key is a two-step process. First, the key needs to be
imported from a raw key or generated by the hardware, producing a key in
long-term wrapped form. This happens once in the whole lifetime of the
key. Second, the long-term wrapped key needs to be converted into
ephemerally-wrapped form. This happens each time the key is "unlocked".

In Android, these operations are supported in a generic way through
KeyMint, a userspace abstraction layer. However, that method is
Android-specific and can't be used on other Linux systems, may rely on
proprietary libraries, and also misleads people into supporting KeyMint
features like rollback resistance that make sense for other KeyMint keys
but don't make sense for hardware-wrapped inline encryption keys.

Therefore, this patch provides a generic kernel interface for these
operations by introducing new block device ioctls:

- BLKCRYPTOIMPORTKEY: convert a raw key to long-term wrapped form.

- BLKCRYPTOGENERATEKEY: have the hardware generate a new key, then
return it in long-term wrapped form.

- BLKCRYPTOPREPAREKEY: convert a key from long-term wrapped form to
ephemerally-wrapped form.

These ioctls are implemented using new operations in blk_crypto_ll_ops.

Signed-off-by: Eric Biggers <[email protected]>
Tested-by: Bartosz Golaszewski <[email protected]> # sm8650
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11
# 50c52250 11-Sep-2024 Pavel Begunkov <[email protected]>

block: implement async io_uring discard cmd

io_uring allows implementing custom file specific asynchronous
operations via the fops->uring_cmd callback, a.k.a. IORING_OP_URING_CMD
requests or just io

block: implement async io_uring discard cmd

io_uring allows implementing custom file specific asynchronous
operations via the fops->uring_cmd callback, a.k.a. IORING_OP_URING_CMD
requests or just io_uring commands. Use it to add support for async
discards.

Normally, it first tries to queue up bios in a non-blocking context,
and if that fails, we'd retry from a blocking context by returning
-EAGAIN to the core io_uring. We always get the result from bios
asynchronously by setting a custom bi_end_io callback, at which point
we drag the request into the task context to either reissue or complete
it and post a completion to the user.

Unlike ioctl(BLKDISCARD) with stronger guarantees against races, we only
do a best effort attempt to invalidate page cache, and it can race with
any writes and reads and leave page cache stale. It's the same kind of
races we allow to direct writes.

Also, apart from cases where discarding is not allowed at all, e.g.
discards are not supported or the file/device is read only, the user
should assume that the sector range on disk is not valid anymore, even
when an error was returned to the user.

Suggested-by: Conrad Meyer <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/2b5210443e4fa0257934f73dfafcc18a77cd0e09.1726072086.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# 7a07210b 11-Sep-2024 Pavel Begunkov <[email protected]>

block: introduce blk_validate_byte_range()

In preparation to further changes extract a helper function out of
blk_ioctl_discard() that validates if we can do IO against the given
range of disk byte

block: introduce blk_validate_byte_range()

In preparation to further changes extract a helper function out of
blk_ioctl_discard() that validates if we can do IO against the given
range of disk byte addresses.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/19a7779323c71e742a2f511e4cf49efcfd68cfd4.1726072086.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.11-rc7
# 697ba0b6 03-Sep-2024 Alexey Dobriyan <[email protected]>

block: fix integer overflow in BLKSECDISCARD

I independently rediscovered

commit 22d24a544b0d49bbcbd61c8c0eaf77d3c9297155
block: fix overflow in blk_ioctl_discard()

but for secure erase.

Same p

block: fix integer overflow in BLKSECDISCARD

I independently rediscovered

commit 22d24a544b0d49bbcbd61c8c0eaf77d3c9297155
block: fix overflow in blk_ioctl_discard()

but for secure erase.

Same problem:

uint64_t r[2] = {512, 18446744073709551104ULL};
ioctl(fd, BLKSECDISCARD, r);

will enter near infinite loop inside blkdev_issue_secure_erase():

a.out: attempt to access beyond end of device
loop0: rw=5, sector=3399043073, nr_sectors = 1024 limit=2048
bio_check_eod: 3286214 callbacks suppressed

Signed-off-by: Alexey Dobriyan <[email protected]>
Link: https://lore.kernel.org/r/9e64057f-650a-46d1-b9f7-34af391536ef@p183
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7
# bf86bcdb 01-Jul-2024 Christoph Hellwig <[email protected]>

blk-lib: check for kill signal in ioctl BLKZEROOUT

Zeroout can access a significant capacity and take longer than the user
expected. A user may change their mind about wanting to run that
command a

blk-lib: check for kill signal in ioctl BLKZEROOUT

Zeroout can access a significant capacity and take longer than the user
expected. A user may change their mind about wanting to run that
command and attempt to kill the process and do something else with their
device. But since the task is uninterruptable, they have to wait for it
to finish, which could be many hours.

Add a new BLKDEV_ZERO_KILLABLE flag for blkdev_issue_zeroout that checks
for a fatal signal at each iteration so the user doesn't have to wait for
their regretted operation to complete naturally.

Heavily based on an earlier patch from Keith Busch.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9
# 719c15a7 06-May-2024 Christoph Hellwig <[email protected]>

blk-lib: check for kill signal in ioctl BLKDISCARD

Discards can access a significant capacity and take longer than the user
expected. A user may change their mind about wanting to run that command

blk-lib: check for kill signal in ioctl BLKDISCARD

Discards can access a significant capacity and take longer than the user
expected. A user may change their mind about wanting to run that command
and attempt to kill the process and do something else with their device.
But since the task is uninterruptable, they have to wait for it to
finish, which could be many hours.

Open code blkdev_issue_discard in the BLKDISCARD ioctl handler and check
for a fatal signal at each iteration so the user doesn't have to wait
for their regretted operation to complete naturally.

Heavily based on an earlier patch from Keith Busch.

Reported-by: Conrad Meyer <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# 30f1e724 06-May-2024 Christoph Hellwig <[email protected]>

block: move discard checks into the ioctl handler

Most bio operations get basic sanity checking in submit_bio and anything
more complicated than that is done in the callers. Discards are a bit
diff

block: move discard checks into the ioctl handler

Most bio operations get basic sanity checking in submit_bio and anything
more complicated than that is done in the callers. Discards are a bit
different from that in that a lot of checking is done in
__blkdev_issue_discard, and the specific errnos for that are returned
to userspace. Move the checks that require specific errnos to the ioctl
handler instead, and just leave the basic sanity checking in submit_bio
for the other handlers. This introduces two changes in behavior:

1) the logical block size alignment check of the start and len is lost
for non-ioctl callers.
This matches what is done for other operations including reads and
writes. We should probably verify this for all bios, but for now
make discards match the normal flow.
2) for non-ioctl callers all errors are reported on I/O completion now
instead of synchronously. Callers in general mostly ignore or log
errors so this will actually simplify the code once cleaned up

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# ccb326b5 07-May-2024 Justin Stitt <[email protected]>

block/ioctl: prefer different overflow check

Running syzkaller with the newly reintroduced signed integer overflow
sanitizer shows this report:

[ 62.982337] ------------[ cut here ]------------
[

block/ioctl: prefer different overflow check

Running syzkaller with the newly reintroduced signed integer overflow
sanitizer shows this report:

[ 62.982337] ------------[ cut here ]------------
[ 62.985692] cgroup: Invalid name
[ 62.986211] UBSAN: signed-integer-overflow in ../block/ioctl.c:36:46
[ 62.989370] 9pnet_fd: p9_fd_create_tcp (7343): problem connecting socket to 127.0.0.1
[ 62.992992] 9223372036854775807 + 4095 cannot be represented in type 'long long'
[ 62.997827] 9pnet_fd: p9_fd_create_tcp (7345): problem connecting socket to 127.0.0.1
[ 62.999369] random: crng reseeded on system resumption
[ 63.000634] GUP no longer grows the stack in syz-executor.2 (7353): 20002000-20003000 (20001000)
[ 63.000668] CPU: 0 PID: 7353 Comm: syz-executor.2 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
[ 63.000677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 63.000682] Call Trace:
[ 63.000686] <TASK>
[ 63.000731] dump_stack_lvl+0x93/0xd0
[ 63.000919] __get_user_pages+0x903/0xd30
[ 63.001030] __gup_longterm_locked+0x153e/0x1ba0
[ 63.001041] ? _raw_read_unlock_irqrestore+0x17/0x50
[ 63.001072] ? try_get_folio+0x29c/0x2d0
[ 63.001083] internal_get_user_pages_fast+0x1119/0x1530
[ 63.001109] iov_iter_extract_pages+0x23b/0x580
[ 63.001206] bio_iov_iter_get_pages+0x4de/0x1220
[ 63.001235] iomap_dio_bio_iter+0x9b6/0x1410
[ 63.001297] __iomap_dio_rw+0xab4/0x1810
[ 63.001316] iomap_dio_rw+0x45/0xa0
[ 63.001328] ext4_file_write_iter+0xdde/0x1390
[ 63.001372] vfs_write+0x599/0xbd0
[ 63.001394] ksys_write+0xc8/0x190
[ 63.001403] do_syscall_64+0xd4/0x1b0
[ 63.001421] ? arch_exit_to_user_mode_prepare+0x3a/0x60
[ 63.001479] entry_SYSCALL_64_after_hwframe+0x6f/0x77
[ 63.001535] RIP: 0033:0x7f7fd3ebf539
[ 63.001551] Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[ 63.001562] RSP: 002b:00007f7fd32570c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 63.001584] RAX: ffffffffffffffda RBX: 00007f7fd3ff3f80 RCX: 00007f7fd3ebf539
[ 63.001590] RDX: 4db6d1e4f7e43360 RSI: 0000000020000000 RDI: 0000000000000004
[ 63.001595] RBP: 00007f7fd3f1e496 R08: 0000000000000000 R09: 0000000000000000
[ 63.001599] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 63.001604] R13: 0000000000000006 R14: 00007f7fd3ff3f80 R15: 00007ffd415ad2b8
...
[ 63.018142] ---[ end trace ]---

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang; It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Let's rework this overflow checking logic to not actually perform an
overflow during the check itself, thus avoiding the UBSAN splat.

[1]: https://github.com/llvm/llvm-project/pull/82432

Signed-off-by: Justin Stitt <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4
# 881494ed 11-Apr-2024 Al Viro <[email protected]>

blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...

Signed-off-by: Al Viro <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...

Signed-off-by: Al Viro <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 224941e8 11-Apr-2024 Al Viro <[email protected]>

use ->bd_mapping instead of ->bd_inode->i_mapping

Just the low-hanging fruit...

Signed-off-by: Al Viro <[email protected]>
Link: https://lore.kernel.org/r/20240411145346.2516848-2-viro@zeniv.

use ->bd_mapping instead of ->bd_inode->i_mapping

Just the low-hanging fruit...

Signed-off-by: Al Viro <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 01e198f0 12-Apr-2024 Al Viro <[email protected]>

bdev: move ->bd_read_only to ->__bd_flags

Signed-off-by: Al Viro <[email protected]>


# ead083ae 18-Apr-2024 Al Viro <[email protected]>

set_blocksize(): switch to passing struct file *

Signed-off-by: Al Viro <[email protected]>


# 752863bd 17-Apr-2024 Christoph Hellwig <[email protected]>

block: propagate partition scanning errors to the BLKRRPART ioctl

Commit 4601b4b130de ("block: reopen the device in blkdev_reread_part")
lost the propagation of I/O errors from the low-level read of

block: propagate partition scanning errors to the BLKRRPART ioctl

Commit 4601b4b130de ("block: reopen the device in blkdev_reread_part")
lost the propagation of I/O errors from the low-level read of the
partition table to the user space caller of the BLKRRPART.

Apparently some user space relies on, so restore the propagation. This
isn't exactly pretty as other block device open calls explicitly do not
are about these errors, so add a new BLK_OPEN_STRICT_SCAN to opt into
the error propagation.

Fixes: 4601b4b130de ("block: reopen the device in blkdev_reread_part")
Reported-by: Saranya Muruganandam <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Shin'ichiro Kawasaki <[email protected]>
Tested-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.9-rc3, v6.9-rc2
# 22d24a54 29-Mar-2024 Li Nan <[email protected]>

block: fix overflow in blk_ioctl_discard()

There is no check for overflow of 'start + len' in blk_ioctl_discard().
Hung task occurs if submit an discard ioctl with the following param:
start = 0x8

block: fix overflow in blk_ioctl_discard()

There is no check for overflow of 'start + len' in blk_ioctl_discard().
Hung task occurs if submit an discard ioctl with the following param:
start = 0x80000000000ff000, len = 0x8000000000fff000;
Add the overflow validation now.

Signed-off-by: Li Nan <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.9-rc1, v6.8
# b9355185 05-Mar-2024 Li Lingfeng <[email protected]>

block: move capacity validation to blkpg_do_ioctl()

Commit 6d4e80db4ebe ("block: add capacity validation in
bdev_add_partition()") add check of partition's start and end sectors to
prevent exceeding

block: move capacity validation to blkpg_do_ioctl()

Commit 6d4e80db4ebe ("block: add capacity validation in
bdev_add_partition()") add check of partition's start and end sectors to
prevent exceeding the size of the disk when adding partitions. However,
there is still no check for resizing partitions now.
Move the check to blkpg_do_ioctl() to cover resizing partitions.

Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2
# e5ca9d39 23-Jan-2024 Christian Brauner <[email protected]>

block/ioctl: port blkdev_bszset() to file

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <jack@

block/ioctl: port blkdev_bszset() to file

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.8-rc1
# 7777f47f 18-Jan-2024 Li Lingfeng <[email protected]>

block: Move checking GENHD_FL_NO_PART to bdev_add_partition()

Commit 1a721de8489f ("block: don't add or resize partition on the disk
with GENHD_FL_NO_PART") prevented all operations about partitions

block: Move checking GENHD_FL_NO_PART to bdev_add_partition()

Commit 1a721de8489f ("block: don't add or resize partition on the disk
with GENHD_FL_NO_PART") prevented all operations about partitions on disks
with GENHD_FL_NO_PART in blkpg_do_ioctl() since they are meaningless.
However, it changed error code in some scenarios. So move checking
GENHD_FL_NO_PART to bdev_add_partition() to eliminate impact.

Fixes: 1a721de8489f ("block: don't add or resize partition on the disk with GENHD_FL_NO_PART")
Reported-by: Allison Karlitskaya <[email protected]>
Closes: https://lore.kernel.org/all/CAOYeF9VsmqKMcQjo1k6YkGNujwN-nzfxY17N3F-CMikE1tYp+w@mail.gmail.com/
Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1
# 6f64f866 29-Jun-2023 Min Li <[email protected]>

block: add check that partition length needs to be aligned with block size

Before calling add partition or resize partition, there is no check
on whether the length is aligned with the logical block

block: add check that partition length needs to be aligned with block size

Before calling add partition or resize partition, there is no check
on whether the length is aligned with the logical block size.
If the logical block size of the disk is larger than 512 bytes,
then the partition size maybe not the multiple of the logical block size,
and when the last sector is read, bio_truncate() will adjust the bio size,
resulting in an IO error if the size of the read command is smaller than
the logical block size.If integrity data is supported, this will also
result in a null pointer dereference when calling bio_integrity_free.

Cc: <[email protected]>
Signed-off-by: Min Li <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# fd146410 18-Oct-2023 Jan Kara <[email protected]>

fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock

The implementation of bdev holder operations such as fs_bdev_mark_dead()
and fs_bdev_sync() grab sb->s_umount semaphore under
bdev->bd_hold

fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock

The implementation of bdev holder operations such as fs_bdev_mark_dead()
and fs_bdev_sync() grab sb->s_umount semaphore under
bdev->bd_holder_lock. This is problematic because it leads to
disk->open_mutex -> sb->s_umount lock ordering which is counterintuitive
(usually we grab higher level (e.g. filesystem) locks first and lower
level (e.g. block layer) locks later) and indeed makes lockdep complain
about possible locking cycles whenever we open a block device while
holding sb->s_umount semaphore. Implement a function
bdev_super_lock_shared() which safely transitions from holding
bdev->bd_holder_lock to holding sb->s_umount on alive superblock without
introducing the problematic lock dependency. We use this function
fs_bdev_sync() and fs_bdev_mark_dead().

Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# acb083b5 27-Sep-2023 Jan Kara <[email protected]>

block: Use bdev_open_by_dev() in disk_scan_partitions() and blkdev_bszset()

Convert disk_scan_partitions() and blkdev_bszset() to use
bdev_open_by_dev().

Acked-by: Christoph Hellwig <[email protected]>
Re

block: Use bdev_open_by_dev() in disk_scan_partitions() and blkdev_bszset()

Convert disk_scan_partitions() and blkdev_bszset() to use
bdev_open_by_dev().

Acked-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 1a721de8 31-Aug-2023 Li Lingfeng <[email protected]>

block: don't add or resize partition on the disk with GENHD_FL_NO_PART

Commit a33df75c6328 ("block: use an xarray for disk->part_tbl") remove
disk_expand_part_tbl() in add_partition(), which means a

block: don't add or resize partition on the disk with GENHD_FL_NO_PART

Commit a33df75c6328 ("block: use an xarray for disk->part_tbl") remove
disk_expand_part_tbl() in add_partition(), which means all kinds of
devices will support extended dynamic `dev_t`.
However, some devices with GENHD_FL_NO_PART are not expected to add or
resize partition.
Fix this by adding check of GENHD_FL_NO_PART before add or resize
partition.

Fixes: a33df75c6328 ("block: use an xarray for disk->part_tbl")
Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# 2142b88c 11-Aug-2023 Christoph Hellwig <[email protected]>

block: call into the file system for ioctl BLKFLSBUF

BLKFLSBUF is a historic ioctl that is called on a file handle to a
block device and syncs either the file system mounted on that block
device if

block: call into the file system for ioctl BLKFLSBUF

BLKFLSBUF is a historic ioctl that is called on a file handle to a
block device and syncs either the file system mounted on that block
device if there is one, or otherwise the just the data on the block
device.

Replace the get_super based syncing with a holder operation to remove
the last usage of get_super, and to also support syncing the file system
if the block device is not the main block device stored in s_dev.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.4, v6.4-rc7
# 9a72a024 13-Jun-2023 Jingbo Xu <[email protected]>

block: fine-granular CAP_SYS_ADMIN for Persistent Reservation

Allow of unprivileged Persistent Reservation operations on devices
if the write permission check on the device node has passed.

brw-rw-

block: fine-granular CAP_SYS_ADMIN for Persistent Reservation

Allow of unprivileged Persistent Reservation operations on devices
if the write permission check on the device node has passed.

brw-rw---- 1 root disk 259, 0 Jun 13 07:09 /dev/nvme0n1

In the example above, the "disk" group of nvme0n1 is also allowed to
make reservations on the device even without CAP_SYS_ADMIN.

Signed-off-by: Jingbo Xu <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# 12629621 13-Jun-2023 Jingbo Xu <[email protected]>

block: disallow Persistent Reservation on partitions

Refuse Persistent Reservation operations on partitions as reservation
on partitions doesn't make sense.

Besides, introduce blkdev_pr_allowed() h

block: disallow Persistent Reservation on partitions

Refuse Persistent Reservation operations on partitions as reservation
on partitions doesn't make sense.

Besides, introduce blkdev_pr_allowed() helper, where more policies could
be placed here later.

Signed-off-by: Jingbo Xu <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


1234567