|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4 |
|
| #
5f33b522 |
| 23-Apr-2025 |
Christoph Hellwig <[email protected]> |
block: don't autoload drivers on stat
blkdev_get_no_open can trigger the legacy autoload of block drivers. A simple stat of a block device has not historically done that, so disable this behavior a
block: don't autoload drivers on stat
blkdev_get_no_open can trigger the legacy autoload of block drivers. A simple stat of a block device has not historically done that, so disable this behavior again.
Fixes: 9abcfbd235f5 ("block: Add atomic write support for statx") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
d13b7090 |
| 23-Apr-2025 |
Christoph Hellwig <[email protected]> |
block: remove the backing_inode variable in bdev_statx
backing_inode is only used once, so remove it and update the comment describing the bdev lookup to be a bit more clear.
Signed-off-by: Christo
block: remove the backing_inode variable in bdev_statx
backing_inode is only used once, so remove it and update the comment describing the bdev lookup to be a bit more clear.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
e03463d2 |
| 23-Apr-2025 |
Darrick J. Wong <[email protected]> |
block: hoist block size validation code to a separate function
Hoist the block size validation code to bdev_validate_blocksize so that we can call it from filesystems that don't care about the bdev
block: hoist block size validation code to a separate function
Hoist the block size validation code to bdev_validate_blocksize so that we can call it from filesystems that don't care about the bdev pagecache manipulations of set_blocksize.
Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/174543795720.4139148.840349813093799165.stgit@frogsfrogsfrogs Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
c0e473a0 |
| 23-Apr-2025 |
Darrick J. Wong <[email protected]> |
block: fix race between set_blocksize and read paths
With the new large sector size support, it's now the case that set_blocksize can change i_blksize and the folio order in a manner that conflicts
block: fix race between set_blocksize and read paths
With the new large sector size support, it's now the case that set_blocksize can change i_blksize and the folio order in a manner that conflicts with a concurrent reader and causes a kernel crash.
Specifically, let's say that udev-worker calls libblkid to detect the labels on a block device. The read call can create an order-0 folio to read the first 4096 bytes from the disk. But then udev is preempted.
Next, someone tries to mount an 8k-sectorsize filesystem from the same block device. The filesystem calls set_blksize, which sets i_blksize to 8192 and the minimum folio order to 1.
Now udev resumes, still holding the order-0 folio it allocated. It then tries to schedule a read bio and do_mpage_readahead tries to create bufferheads for the folio. Unfortunately, blocks_per_folio == 0 because the page size is 4096 but the blocksize is 8192 so no bufferheads are attached and the bh walk never sets bdev. We then submit the bio with a NULL block device and crash.
Therefore, truncate the page cache after flushing but before updating i_blksize. However, that's not enough -- we also need to lock out file IO and page faults during the update. Take both the i_rwsem and the invalidate_lock in exclusive mode for invalidations, and in shared mode for read/write operations.
I don't know if this is the correct fix, but xfs/259 found it.
Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Tested-by: Shin'ichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/r/174543795699.4139148.2086129139322431423.stgit@frogsfrogsfrogs Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.15-rc3 |
|
| #
777d0961 |
| 17-Apr-2025 |
Christoph Hellwig <[email protected]> |
fs: move the bdex_statx call to vfs_getattr_nosec
Currently bdex_statx is only called from the very high-level vfs_statx_path function, and thus bypassing it for in-kernel calls to vfs_getattr or vf
fs: move the bdex_statx call to vfs_getattr_nosec
Currently bdex_statx is only called from the very high-level vfs_statx_path function, and thus bypassing it for in-kernel calls to vfs_getattr or vfs_getattr_nosec.
This breaks querying the block ѕize of the underlying device in the loop driver and also is a pitfall for any other new kernel caller.
Move the call into the lowest level helper to ensure all callers get the right results.
Fixes: 2d985f8c6b91 ("vfs: support STATX_DIOALIGN on block devices") Fixes: f4774e92aab8 ("loop: take the file system minimum dio alignment into account") Reported-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
a64e5a59 |
| 07-Mar-2025 |
Luis Chamberlain <[email protected]> |
bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()
The commit titled "block/bdev: lift block size restrictions to 64k" lifted the block layer's max supported block size to 64k ins
bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()
The commit titled "block/bdev: lift block size restrictions to 64k" lifted the block layer's max supported block size to 64k inside the helper blk_validate_block_size() now that we support large folios. However in lifting the block size we also removed the silly use cases many filesystems have to use sb_set_blocksize() to *verify* that the block size <= PAGE_SIZE. The call to sb_set_blocksize() was used to check the block size <= PAGE_SIZE since historically we've always supported userspace to create for example 64k block size filesystems even on 4k page size systems, but what we didn't allow was mounting them. Older filesystems have been using the check with sb_set_blocksize() for years.
While, we could argue that such checks should be filesystem specific, there are much more users of sb_set_blocksize() than LBS enabled filesystem on upstream, so just do the easier thing and bring back the PAGE_SIZE check for sb_set_blocksize() users and only skip it for LBS enabled filesystems.
This will ensure that tests such as generic/466 when run in a loop against say, ext4, won't try to try to actually mount a filesystem with a block size larger than your filesystem supports given your PAGE_SIZE and in the worst case crash.
Cc: Kent Overstreet <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4 |
|
| #
425fbcd6 |
| 21-Feb-2025 |
Luis Chamberlain <[email protected]> |
bdev: use bdev_io_min() for statx block size
You can use lsblk to query for a block device block device block size:
lsblk -o MIN-IO /dev/nvme0n1 MIN-IO 4096
The min-io is the minimum IO the block
bdev: use bdev_io_min() for statx block size
You can use lsblk to query for a block device block device block size:
lsblk -o MIN-IO /dev/nvme0n1 MIN-IO 4096
The min-io is the minimum IO the block device prefers for optimal performance. In turn we map this to the block device block size. The current block size exposed even for block devices with an LBA format of 16k is 4k. Likewise devices which support 4k LBA format but have a larger Indirection Unit of 16k have an exposed block size of 4k.
This incurs read-modify-writes on direct IO against devices with a min-io larger than the page size. To fix this, use the block device min io, which is the minimal optimal IO the device prefers.
With this we now get:
lsblk -o MIN-IO /dev/nvme0n1 MIN-IO 16384
And so userspace gets the appropriate information it needs for optimal performance. This is verified with blkalgn against mkfs against a device with LBA format of 4k but an NPWG of 16k (min io size)
mkfs.xfs -f -b size=16k /dev/nvme3n1 blkalgn -d nvme3n1 --ops Write
Block size : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 0 | | 128 -> 255 : 0 | | 256 -> 511 : 0 | | 512 -> 1023 : 0 | | 1024 -> 2047 : 0 | | 2048 -> 4095 : 0 | | 4096 -> 8191 : 0 | | 8192 -> 16383 : 0 | | 16384 -> 32767 : 66 |****************************************| 32768 -> 65535 : 0 | | 65536 -> 131071 : 0 | | 131072 -> 262143 : 2 |* | Block size: 14 - 66 Block size: 17 - 2
Algn size : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 0 | | 128 -> 255 : 0 | | 256 -> 511 : 0 | | 512 -> 1023 : 0 | | 1024 -> 2047 : 0 | | 2048 -> 4095 : 0 | | 4096 -> 8191 : 0 | | 8192 -> 16383 : 0 | | 16384 -> 32767 : 66 |****************************************| 32768 -> 65535 : 0 | | 65536 -> 131071 : 0 | | 131072 -> 262143 : 2 |* | Algn size: 14 - 66 Algn size: 17 - 2
Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: John Garry <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
47dd6753 |
| 21-Feb-2025 |
Luis Chamberlain <[email protected]> |
block/bdev: lift block size restrictions to 64k
We now can support blocksizes larger than PAGE_SIZE, so in theory we should be able to lift the restriction up to the max supported page cache order.
block/bdev: lift block size restrictions to 64k
We now can support blocksizes larger than PAGE_SIZE, so in theory we should be able to lift the restriction up to the max supported page cache order. However bound ourselves to what we can currently validate and test. Through blktests and fstest we can validate up to 64k today.
Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
3c209171 |
| 21-Feb-2025 |
Hannes Reinecke <[email protected]> |
block/bdev: enable large folio support for large logical block sizes
Call mapping_set_folio_min_order() when modifying the logical block size to ensure folios are allocated with the correct size.
R
block/bdev: enable large folio support for large logical block sizes
Call mapping_set_folio_min_order() when modifying the logical block size to ensure folios are allocated with the correct size.
Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]> Signed-off-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: John Garry <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
26fff8a4 |
| 18-Dec-2024 |
Luis Chamberlain <[email protected]> |
block/bdev: use helper for max block size check
We already have a helper for checking the limits on the block size both low and high, just use that.
No functional changes.
Reviewed-by: John Garry
block/bdev: use helper for max block size check
We already have a helper for checking the limits on the block size both low and high, just use that.
No functional changes.
Reviewed-by: John Garry <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
aa3d8a36 |
| 26-Aug-2024 |
NeilBrown <[email protected]> |
block: change wait on bd_claiming to use a var_waitqueue
bd_prepare_to_claim() waits for a var to change, not for a bit to be cleared. Change from bit_waitqueue() to __var_waitqueue() and correspond
block: change wait on bd_claiming to use a var_waitqueue
bd_prepare_to_claim() waits for a var to change, not for a bit to be cleared. Change from bit_waitqueue() to __var_waitqueue() and correspondingly use wake_up_var(). This will allow a future patch which change the "bit" function to expect an "unsigned long *" instead of "void *".
Signed-off-by: NeilBrown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
b55d26bd |
| 03-Aug-2024 |
Deven Bowers <[email protected]> |
block,lsm: add LSM blob and new LSM hooks for block devices
This patch introduces a new LSM blob to the block_device structure, enabling the security subsystem to store security-sensitive data relat
block,lsm: add LSM blob and new LSM hooks for block devices
This patch introduces a new LSM blob to the block_device structure, enabling the security subsystem to store security-sensitive data related to block devices. Currently, for a device mapper's mapped device containing a dm-verity target, critical security information such as the roothash and its signing state are not readily accessible. Specifically, while the dm-verity volume creation process passes the dm-verity roothash and its signature from userspace to the kernel, the roothash is stored privately within the dm-verity target, and its signature is discarded post-verification. This makes it extremely hard for the security subsystem to utilize these data.
With the addition of the LSM blob to the block_device structure, the security subsystem can now retain and manage important security metadata such as the roothash and the signing state of a dm-verity by storing them inside the blob. Access decisions can then be based on these stored data.
The implementation follows the same approach used for security blobs in other structures like struct file, struct inode, and struct superblock. The initialization of the security blob occurs after the creation of the struct block_device, performed by the security subsystem. Similarly, the security blob is freed by the security subsystem before the struct block_device is deallocated or freed.
This patch also introduces a new hook security_bdev_setintegrity() to save block device's integrity data to the new LSM blob. For example, for dm-verity, it can use this hook to expose its roothash and signing state to LSMs, then LSMs can save these data into the LSM blob.
Please note that the new hook should be invoked every time the security information is updated to keep these data current. For example, in dm-verity, if the mapping table is reloaded and configured to use a different dm-verity target with a new roothash and signing information, the previously stored data in the LSM blob will become obsolete. It is crucial to re-invoke the hook to refresh these data and ensure they are up to date. This necessity arises from the design of device-mapper, where a device-mapper device is first created, and then targets are subsequently loaded into it. These targets can be modified multiple times during the device's lifetime. Therefore, while the LSM blob is allocated during the creation of the block device, its actual contents are not initialized at this stage and can change substantially over time. This includes alterations from data that the LSM 'trusts' to those it does not, making it essential to handle these changes correctly. Failure to address this dynamic aspect could potentially allow for bypassing LSM checks.
Signed-off-by: Deven Bowers <[email protected]> Signed-off-by: Fan Wu <[email protected]> [PM: merge fuzz, subject line tweaks] Signed-off-by: Paul Moore <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5 |
|
| #
9abcfbd2 |
| 20-Jun-2024 |
Prasad Singamsetty <[email protected]> |
block: Add atomic write support for statx
Extend statx system call to return additional info for atomic write support support if the specified file is a block device.
Reviewed-by: Martin K. Peterse
block: Add atomic write support for statx
Extend statx system call to return additional info for atomic write support support if the specified file is a block device.
Reviewed-by: Martin K. Petersen <[email protected]> Signed-off-by: Prasad Singamsetty <[email protected]> Signed-off-by: John Garry <[email protected]> Reviewed-by: Keith Busch <[email protected]> Acked-by: Darrick J. Wong <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc4 |
|
| #
d9c23321 |
| 14-Jun-2024 |
Jiapeng Chong <[email protected]> |
bdev: make blockdev_mnt static
The blockdev_mnt are not used outside the file bdev.c, so the modification is defined as static.
block/bdev.c:377:17: warning: symbol 'blockdev_mnt' was not declared.
bdev: make blockdev_mnt static
The blockdev_mnt are not used outside the file bdev.c, so the modification is defined as static.
block/bdev.c:377:17: warning: symbol 'blockdev_mnt' was not declared. Should it be static?
Reported-by: Abaci Robot <[email protected]> jpg: Remove closes bugzilla link Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: John Garry <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Fixes: 8f3a608827d1 ("bdev: open block device as files") Tested-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
203c1ce0 |
| 29-Apr-2024 |
Al Viro <[email protected]> |
RIP ->bd_inode
Signed-off-by: Al Viro <[email protected]>
|
| #
df65f166 |
| 28-Apr-2024 |
Al Viro <[email protected]> |
block/bdev.c: use the knowledge of inode/bdev coallocation
Here we know that bdevfs inodes are coallocated with struct block_device and we can get to ->bd_inode value without any dereferencing. Int
block/bdev.c: use the knowledge of inode/bdev coallocation
Here we know that bdevfs inodes are coallocated with struct block_device and we can get to ->bd_inode value without any dereferencing. Introduce an inlined helper (static, *not* exported, purely internal for bdev.c) that gets an associated inode by block_device - BD_INODE(bdev).
NOTE: leave it static; nobody outside of block/bdev.c has any business playing with that.
Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
224941e8 |
| 11-Apr-2024 |
Al Viro <[email protected]> |
use ->bd_mapping instead of ->bd_inode->i_mapping
Just the low-hanging fruit...
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/20240411145346.2516848-2-viro@zeniv.
use ->bd_mapping instead of ->bd_inode->i_mapping
Just the low-hanging fruit...
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
e33aef2c |
| 11-Apr-2024 |
Al Viro <[email protected]> |
block_device: add a pointer to struct address_space (page cache of bdev)
points to ->i_data of coallocated inode.
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/20
block_device: add a pointer to struct address_space (page cache of bdev)
points to ->i_data of coallocated inode.
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
2638c208 |
| 28-Apr-2024 |
Al Viro <[email protected]> |
missing helpers: bdev_unhash(), bdev_drop()
bdev_unhash(): make block device invisible to lookups by device number bdev_drop(): drop reference to associated inode.
Both are internal, for use by gen
missing helpers: bdev_unhash(), bdev_drop()
bdev_unhash(): make block device invisible to lookups by device number bdev_drop(): drop reference to associated inode.
Both are internal, for use by genhd and partition-related code - similar to bdev_add(). The logics in there (especially the lifetime-related parts of it) ought to be cleaned up, but that's a separate story; here we just encapsulate getting to associated inode.
Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
186ddac2 |
| 11-Apr-2024 |
Yu Kuai <[email protected]> |
block: move two helpers into bdev.c
disk_live() and block_size() access bd_inode directly, prepare to remove the field bd_inode from block_device, and only access bd_inode in block layer.
Signed-of
block: move two helpers into bdev.c
disk_live() and block_size() access bd_inode directly, prepare to remove the field bd_inode from block_device, and only access bd_inode in block layer.
Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
| #
ac2b6f9d |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_has_subit_bio to ->__bd_flags
In bdev_alloc() we have all flags initialized to false, so assignment to ->bh_has_submit_bio n there is a no-op unless we have partno != 0 and flag alre
bdev: move ->bd_has_subit_bio to ->__bd_flags
In bdev_alloc() we have all flags initialized to false, so assignment to ->bh_has_submit_bio n there is a no-op unless we have partno != 0 and flag already set on entire device.
In device_add_disk() we have just allocated the block_device in question and it had been a full-device one, so the flag is guaranteed to be still clear when we get to assignment.
Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
4c80105e |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_write_holder into ->__bd_flags
Signed-off-by: Al Viro <[email protected]>
|
| #
1116b9fa |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: infrastructure for flags
Replace bd_partno with a 32bit field (__bd_flags). The lower 8 bits contain the partition number, the upper 24 are for flags.
Helpers: bdev_{test,set,clear}_flag(bde
bdev: infrastructure for flags
Replace bd_partno with a 32bit field (__bd_flags). The lower 8 bits contain the partition number, the upper 24 are for flags.
Helpers: bdev_{test,set,clear}_flag(bdev, flag), with atomic_or() and atomic_andnot() used to set/clear.
NOTE: this commit does not actually move any flags over there - they are still bool fields. As the result, it shifts the fields wrt cacheline boundaries; that's going to be restored once the first 3 flags are dealt with.
Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
d18a8679 |
| 02-May-2024 |
Al Viro <[email protected]> |
make set_blocksize() fail unless block device is opened exclusive
Signed-off-by: Al Viro <[email protected]>
|
| #
ead083ae |
| 18-Apr-2024 |
Al Viro <[email protected]> |
set_blocksize(): switch to passing struct file *
Signed-off-by: Al Viro <[email protected]>
|