|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
4208c562 |
| 17-Sep-2024 |
Kanchan Joshi <[email protected]> |
block: remove bogus union
The union around bi_integrity field is pointless. Remove it.
Signed-off-by: Kanchan Joshi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
b55d26bd |
| 03-Aug-2024 |
Deven Bowers <[email protected]> |
block,lsm: add LSM blob and new LSM hooks for block devices
This patch introduces a new LSM blob to the block_device structure, enabling the security subsystem to store security-sensitive data related to block devices. Currently, for a device mapper's mapped device containing a dm-verity target, critical security information such as the roothash and its signing state are not readily accessible. Specifically, while the dm-verity volume creation process passes the dm-verity roothash and its signature from userspace to the kernel, the roothash is stored privately within the dm-verity target, and its signature is discarded post-verification. This makes it extremely hard for the security subsystem to utilize these data.
With the addition of the LSM blob to the block_device structure, the security subsystem can now retain and manage important security metadata such as the roothash and the signing state of a dm-verity by storing them inside the blob. Access decisions can then be based on these stored data.
The implementation follows the same approach used for security blobs in other structures like struct file, struct inode, and struct superblock. The initialization of the security blob occurs after the creation of the struct block_device, performed by the security subsystem. Similarly, the security blob is freed by the security subsystem before the struct block_device is deallocated or freed.
This patch also introduces a new hook security_bdev_setintegrity() to save a block device's integrity data to the new LSM blob. For example, dm-verity can use this hook to expose its roothash and signing state to LSMs, which can then save these data into the LSM blob.
Please note that the new hook should be invoked every time the security information is updated to keep these data current. For example, in dm-verity, if the mapping table is reloaded and configured to use a different dm-verity target with a new roothash and signing information, the previously stored data in the LSM blob will become obsolete. It is crucial to re-invoke the hook to refresh these data and ensure they are up to date. This necessity arises from the design of device-mapper, where a device-mapper device is first created, and then targets are subsequently loaded into it. These targets can be modified multiple times during the device's lifetime. Therefore, while the LSM blob is allocated during the creation of the block device, its actual contents are not initialized at this stage and can change substantially over time. This includes alterations from data that the LSM 'trusts' to those it does not, making it essential to handle these changes correctly. Failure to address this dynamic aspect could potentially allow for bypassing LSM checks.
Signed-off-by: Deven Bowers <[email protected]> Signed-off-by: Fan Wu <[email protected]> [PM: merge fuzz, subject line tweaks] Signed-off-by: Paul Moore <[email protected]>
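A minimal sketch of how a producer such as dm-verity might call the new hook; the helper name and the enum constant are illustrative assumptions, and the signature follows the description in this series rather than a verified in-tree definition:

```c
/*
 * Hedged sketch (not the exact in-tree code): publishing integrity data
 * to LSMs via security_bdev_setintegrity(). The function and constant
 * names below are assumptions for illustration.
 */
#include <linux/blkdev.h>
#include <linux/security.h>

static int example_expose_roothash(struct block_device *bdev,
				   const u8 *root_digest, size_t digest_size)
{
	/*
	 * LSMs copy the data into the per-bdev security blob; callers must
	 * re-invoke the hook whenever the integrity data changes (e.g. on a
	 * device-mapper table reload) to keep the blob current.
	 */
	return security_bdev_setintegrity(bdev, LSM_INT_DMVERITY_ROOTHASH,
					  root_digest, digest_size);
}
```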
|
|
Revision tags: v6.11-rc1 |
|
| #
6fa99325 |
| 19-Jul-2024 |
John Garry <[email protected]> |
block: Catch possible entries missing from cmd_flag_name[]
Add a BUILD_BUG_ON() call to ensure that we are not missing entries in cmd_flag_name[].
Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
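A sketch of the compile-time check pattern this describes, following the naming conventions of blk-mq-debugfs.c; the array contents are abbreviated:

```c
/*
 * Hedged sketch: the build fails if cmd_flag_name[] does not have one
 * entry per request flag bit (__REQ_NR_BITS from blk_types.h).
 */
#define CMD_FLAG_NAME(name) [__REQ_##name] = #name
static const char *const cmd_flag_name[] = {
	CMD_FLAG_NAME(FAILFAST_DEV),
	CMD_FLAG_NAME(SYNC),
	CMD_FLAG_NAME(META),
	/* ... one entry per REQ_* command flag ... */
};
#undef CMD_FLAG_NAME

static inline void cmd_flag_name_check(void)
{
	/* Fails the build when a new flag is added without a name entry */
	BUILD_BUG_ON(ARRAY_SIZE(cmd_flag_name) != __REQ_NR_BITS);
}
```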
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5 |
|
| #
9da3d1e9 |
| 20-Jun-2024 |
John Garry <[email protected]> |
block: Add core atomic write support
Add atomic write support, as follows:
- add helper functions to get request_queue atomic write limits
- report request_queue atomic write support limits to sysfs and update Doc
- support to safely merge atomic writes
- deal with splitting atomic writes
- misc helper functions
- add a per-request atomic write flag
New request_queue limits are added, as follows:
- atomic_write_hw_max is set by the block driver and is the maximum length of an atomic write which the device may support. It is not necessarily a power-of-2.
- atomic_write_max_sectors is derived from atomic_write_hw_max_sectors and max_hw_sectors. It is always a power-of-2. Atomic writes may be merged, and atomic_write_max_sectors would be the limit on a merged atomic write request size. This value is not capped at max_sectors, as the value in max_sectors can be controlled from userspace, and it would only cause trouble if userspace could limit atomic_write_unit_max_bytes and the other atomic write limits.
- atomic_write_hw_unit_{min,max} are set by the block driver and are the min/max length of an atomic write unit which the device may support. They both must be a power-of-2. Typically atomic_write_hw_unit_max will hold the same value as atomic_write_hw_max.
- atomic_write_unit_{min,max} are derived from atomic_write_hw_unit_{min,max}, max_hw_sectors, and block core limits. Both min and max values must be a power-of-2.
- atomic_write_hw_boundary is set by the block driver. If non-zero, it indicates an LBA space boundary which an atomic write may not straddle and still be executed atomically by the disk. The value must be a power-of-2. Note that it would be acceptable to enforce a rule that atomic_write_hw_boundary_sectors is a multiple of atomic_write_hw_unit_max, but the resultant code would be more complicated.
All atomic writes limits are by default set 0 to indicate no atomic write support. Even though it is assumed by Linux that a logical block can always be atomically written, we ignore this as it is not of particular interest. Stacked devices are just not supported either for now.
An atomic write must always be submitted to the block driver as part of a single request. As such, only a single BIO must be submitted to the block layer for an atomic write. When a single atomic write BIO is submitted, it cannot be split. As such, atomic_write_unit_{max, min}_bytes are limited by the maximum guaranteed BIO size which will not be required to be split. This max size is calculated by request_queue max segments and the number of bvecs a BIO can fit, BIO_MAX_VECS. Currently we rely on userspace issuing a write with iovcnt=1 for pwritev2() - as such, we can rely on each segment containing PAGE_SIZE of data, apart from the first+last, which each can fit logical block size of data. The first+last will be LBS length/aligned as we rely on direct IO alignment rules also.
New sysfs files are added to report the following atomic write limits:
- atomic_write_unit_max_bytes - same as atomic_write_unit_max_sectors in bytes
- atomic_write_unit_min_bytes - same as atomic_write_unit_min_sectors in bytes
- atomic_write_boundary_bytes - same as atomic_write_hw_boundary_sectors in bytes
- atomic_write_max_bytes - same as atomic_write_max_sectors in bytes
Atomic writes may only be merged with other atomic writes and only under the following conditions:
- total resultant request length <= atomic_write_max_bytes
- the merged write does not straddle a boundary
Helper function bdev_can_atomic_write() is added to indicate whether atomic writes may be issued to a bdev. If a bdev is a partition, the partition start must be aligned with both atomic_write_unit_min_sectors and atomic_write_hw_boundary_sectors.
FSes will rely on the block layer to validate that an atomic write BIO submitted will be of valid size, so add blk_validate_atomic_write_op_size() for this purpose. Userspace expects an atomic write which is of invalid size to be rejected with -EINVAL, so add BLK_STS_INVAL for this. Also use BLK_STS_INVAL for when a BIO needs to be split, as this should mean an invalid size BIO.
Flag REQ_ATOMIC is used for indicating an atomic write.
Co-developed-by: Himanshu Madhani <[email protected]> Signed-off-by: Himanshu Madhani <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Signed-off-by: John Garry <[email protected]> Reviewed-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
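A hedged userspace sketch of the submission path this enables: a single-iovec pwritev2() with the RWF_ATOMIC flag from the same series, with the buffer assumed to fall within the atomic_write_unit_{min,max}_bytes limits exported in sysfs. Device path, size, and alignment are illustrative; error handling is abbreviated.

```c
/* Requires uapi headers new enough to define RWF_ATOMIC. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/sda", O_WRONLY | O_DIRECT);
	struct iovec iov;
	void *buf;

	if (fd < 0)
		return 1;
	/* 4 KiB, assumed to lie within the device's atomic write unit limits */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = 4096;

	/* iovcnt=1 as the commit requires; the write lands fully or not at all */
	if (pwritev2(fd, &iov, 1, 0, RWF_ATOMIC) < 0)
		return 1;
	return 0;
}
```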
|
|
Revision tags: v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
a4184174 |
| 13-Dec-2023 |
Arnd Bergmann <[email protected]> |
alpha: drop pre-EV56 support
All EV4 machines are already gone, and the remaining EV5 based machines all support the slightly more modern EV56 generation as well. Debian only supports EV56 and later.
Drop both of these and build kernels optimized for EV56 and higher when the "generic" option is selected, tuning for an out-of-order EV6 pipeline, same as Debian userspace.
Since this was the only supported architecture without 8-bit and 16-bit stores, common kernel code no longer has to worry about aligning struct members, and existing workarounds from the block and tty layers can be removed.
The alpha memory management code no longer needs an abstraction for the differences between EV4 and EV5+.
Link: https://lists.debian.org/debian-alpha/2023/05/msg00009.html Acked-by: Paul E. McKenney <[email protected]> Acked-by: Matt Turner <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
|
| #
203c1ce0 |
| 29-Apr-2024 |
Al Viro <[email protected]> |
RIP ->bd_inode
Signed-off-by: Al Viro <[email protected]>
|
| #
e33aef2c |
| 11-Apr-2024 |
Al Viro <[email protected]> |
block_device: add a pointer to struct address_space (page cache of bdev)
Points to the ->i_data of the co-allocated inode.
Signed-off-by: Al Viro <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
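A minimal sketch of what the cached pointer buys: code can reach the bdev page cache directly instead of dereferencing ->bd_inode. The wrapper is an illustrative assumption; the pointer is set when the bdev and its inode are allocated together.

```c
#include <linux/blkdev.h>

static inline struct address_space *example_bdev_mapping(struct block_device *bdev)
{
	return bdev->bd_mapping;	/* == &co-allocated inode's i_data */
}
```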
|
| #
811ba89a |
| 28-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_make_it_fail to ->__bd_flags
Signed-off-by: Al Viro <[email protected]>
|
| #
49a43dae |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_ro_warned to ->__bd_flags
Signed-off-by: Al Viro <[email protected]>
|
| #
ac2b6f9d |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_has_submit_bio to ->__bd_flags
In bdev_alloc() we have all flags initialized to false, so the assignment to ->bd_has_submit_bio there is a no-op unless we have partno != 0 and the flag already set on the entire device.
In device_add_disk() we have just allocated the block_device in question and it had been a full-device one, so the flag is guaranteed to be still clear when we get to assignment.
Signed-off-by: Al Viro <[email protected]>
|
| #
4c80105e |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_write_holder into ->__bd_flags
Signed-off-by: Al Viro <[email protected]>
|
| #
01e198f0 |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: move ->bd_read_only to ->__bd_flags
Signed-off-by: Al Viro <[email protected]>
|
| #
1116b9fa |
| 12-Apr-2024 |
Al Viro <[email protected]> |
bdev: infrastructure for flags
Replace bd_partno with a 32bit field (__bd_flags). The lower 8 bits contain the partition number, the upper 24 are for flags.
Helpers: bdev_{test,set,clear}_flag(bdev, flag), with atomic_or() and atomic_andnot() used to set/clear.
NOTE: this commit does not actually move any flags over there - they are still bool fields. As a result, it shifts the fields wrt cacheline boundaries; that's going to be restored once the first 3 flags are dealt with.
Signed-off-by: Al Viro <[email protected]>
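A minimal sketch of the helpers described, assuming __bd_flags is an atomic_t; the mask name is illustrative and the exact in-tree definitions may differ:

```c
#include <linux/atomic.h>
#include <linux/blkdev.h>

#define BD_PARTNO_MASK	0xff	/* lower 8 bits: partition number */

static inline bool bdev_test_flag(const struct block_device *bdev, unsigned flag)
{
	return atomic_read(&bdev->__bd_flags) & flag;
}

static inline void bdev_set_flag(struct block_device *bdev, unsigned flag)
{
	atomic_or(flag, &bdev->__bd_flags);	/* atomic read-modify-write */
}

static inline void bdev_clear_flag(struct block_device *bdev, unsigned flag)
{
	atomic_andnot(flag, &bdev->__bd_flags);
}

static inline u8 bdev_partno(const struct block_device *bdev)
{
	return atomic_read(&bdev->__bd_flags) & BD_PARTNO_MASK;
}
```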
|
| #
02ccd7c3 |
| 08-Apr-2024 |
Damien Le Moal <[email protected]> |
block: Remove zone write locking
Zone write locking is now unused and replaced with zone write plugging. Remove all code that was implementing zone write locking, that is, the various helper functions controlling request zone write locking and the gendisk attached zone bitmaps.
Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Hans Holmberg <[email protected]> Tested-by: Dennis Maisenbacher <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
| #
63b5385e |
| 08-Apr-2024 |
Damien Le Moal <[email protected]> |
block: Remove BLK_STS_ZONE_RESOURCE
The zone append emulation of the scsi disk driver was the only driver using BLK_STS_ZONE_RESOURCE. With this code removed, BLK_STS_ZONE_RESOURCE is now unused. Remove this macro definition and simplify blk_mq_dispatch_rq_list() where this status code was handled.
Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Tested-by: Hans Holmberg <[email protected]> Tested-by: Dennis Maisenbacher <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
| #
9b1ce7f0 |
| 08-Apr-2024 |
Damien Le Moal <[email protected]> |
block: Implement zone append emulation
Given that zone write plugging manages all writes to zones of a zoned block device and tracks the write pointer position of all zones that are not full nor empty, emulating zone append operations using regular writes can be implemented generically, without relying on the underlying device driver to implement such emulation. This is needed for devices that do not natively support the zone append command (e.g. SMR hard-disks).
A device may request zone append emulation by setting its max_zone_append_sectors queue limit to 0. For such device, the function blk_zone_wplug_prepare_bio() changes zone append BIOs into non-mergeable regular write BIOs. Modified zone append BIOs are flagged with the new BIO flag BIO_EMULATES_ZONE_APPEND. This flag is checked on completion of the BIO in blk_zone_write_plug_bio_endio() to restore the original REQ_OP_ZONE_APPEND operation code of the BIO.
The block layer internal inline helper function bio_is_zone_append() is added to test if a BIO is either a native zone append operation (REQ_OP_ZONE_APPEND operation code) or if it is flagged with BIO_EMULATES_ZONE_APPEND. Given that both native and emulated zone append BIO completion handling should be similar, the functions blk_update_request() and blk_zone_complete_request_bio() are modified to use bio_is_zone_append() to execute blk_zone_update_request_bio() for both native and emulated zone append operations.
This commit contains contributions from Christoph Hellwig <[email protected]>.
Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Tested-by: Hans Holmberg <[email protected]> Tested-by: Dennis Maisenbacher <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
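A sketch of the helper as the commit describes it; a BIO counts as a zone append if it carries the native operation code or is a regular write flagged as emulating one:

```c
#include <linux/bio.h>

static inline bool bio_is_zone_append(struct bio *bio)
{
	return bio_op(bio) == REQ_OP_ZONE_APPEND ||
	       bio_flagged(bio, BIO_EMULATES_ZONE_APPEND);
}
```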
|
| #
dd291d77 |
| 08-Apr-2024 |
Damien Le Moal <[email protected]> |
block: Introduce zone write plugging
Zone write plugging implements a per-zone "plug" for write operations to control the submission and execution order of write operations to sequential write required zones of a zoned block device. Per-zone plugging guarantees that at any time there is at most only one write request per zone being executed. This mechanism is intended to replace zone write locking which implements a similar per-zone write throttling at the scheduler level, but is implemented only by mq-deadline.
Unlike zone write locking which operates on requests, zone write plugging operates on BIOs. A zone write plug is simply a BIO list that is atomically manipulated using a spinlock and a kblockd submission work. A write BIO to a zone is "plugged" to delay its execution if a write BIO for the same zone was already issued, that is, if a write request for the same zone is being executed. The next plugged BIO is unplugged and issued once the write request completes.
This mechanism makes it possible to:
- Untangle zone write ordering from block IO schedulers. This allows removing the restriction on using mq-deadline for writing to zoned block devices. Any block IO scheduler, including "none", can be used.
- Operate on BIOs instead of requests. Plugged BIOs waiting for execution thus do not hold scheduling tags and thus are not preventing other BIOs from executing (reads or writes to other zones). Depending on the workload, this can significantly improve the device use (higher queue depth operation) and performance.
- Use zone write plugging with both blk-mq (request based) zoned devices and BIO-based zoned devices (e.g. device mapper). It is mandatory for the former but optional for the latter. BIO-based drivers can use zone write plugging to implement write ordering guarantees, or the drivers can implement their own if needed.
- Keep the code less invasive in the block layer: it is mostly limited to blk-zoned.c, with some small changes in blk-mq.c, blk-merge.c and bio.c.
Zone write plugging is implemented using struct blk_zone_wplug. This structure includes a spinlock, a BIO list and a work structure to handle the submission of plugged BIOs. Zone write plugs structures are managed using a per-disk hash table.
Plugging of zone write BIOs is done using the function blk_zone_write_plug_bio() which returns false if a BIO execution does not need to be delayed and true otherwise. This function is called from blk_mq_submit_bio() after a BIO is split to avoid large BIOs spanning multiple zones which would cause mishandling of zone write plugs. This change enables zone write plugging by default for any mq request-based block device. BIO-based device drivers can also use zone write plugging by explicitly calling blk_zone_write_plug_bio() in their ->submit_bio method. For such devices, the driver must ensure that a BIO passed to blk_zone_write_plug_bio() is already split and not straddling zone boundaries.
Only write and write zeroes BIOs are plugged. Zone write plugging does not introduce any significant overhead for other operations. A BIO that is being handled through zone write plugging is flagged using the new BIO flag BIO_ZONE_WRITE_PLUGGING. A request handling a BIO flagged with this new flag is flagged with the new RQF_ZONE_WRITE_PLUGGING flag. Completion of flagged BIOs and requests triggers calls to the functions blk_zone_write_bio_endio() and blk_zone_write_complete_request(), respectively. The latter function is used to trigger submission of the next plugged BIO using the zone plug work. blk_zone_write_bio_endio() does the same for BIO-based devices. This ensures that at any time, at most one request (blk-mq devices) or one BIO (BIO-based devices) is being executed for any zone. The handling of zone write plugs using a per-zone plug spinlock maximizes parallelism and device usage by allowing multiple zones to be written simultaneously without lock contention.
Zone write plugging ignores flush BIOs without data. However, any flush BIO that has data is always plugged so that the write part of the flush sequence is serialized with other regular writes.
Given that any BIO handled through zone write plugging will be the only BIO in flight for the target zone when it is executed, the unplugging and submission of a BIO will have no chance of successfully merging with plugged requests or requests in the scheduler. To overcome this potential performance degradation, blk_mq_submit_bio() calls the function blk_zone_write_plug_attempt_merge() to try to merge other plugged BIOs with the one just unplugged and submitted. Successful merging is signaled using blk_zone_write_plug_bio_merged(), called from bio_attempt_back_merge(). Furthermore, to avoid recalculating the number of segments of plugged BIOs to attempt merging, the number of segments of a plugged BIO is saved using the new struct bio field __bi_nr_segments. To avoid growing the size of struct bio, this field is added as a union with the bio_cookie field. This is safe to do as polling is always disabled for plugged BIOs.
When BIOs are plugged in a zone write plug, the device request queue usage counter is always incremented. This reference is kept and reused for blk-mq devices when the plugged BIO is unplugged and submitted again using submit_bio_noacct_nocheck(). For this case, the unplugged BIO is already flagged with BIO_ZONE_WRITE_PLUGGING and blk_mq_submit_bio() proceeds directly to allocating a new request for the BIO, re-using the usage reference count taken when the BIO was plugged. This extra reference count is dropped in blk_zone_write_plug_attempt_merge() for any plugged BIO that is successfully merged. Given that BIO-based devices will not take this path, the extra reference is dropped after a plugged BIO is unplugged and submitted.
Zone write plugs are dynamically allocated and managed using a hash table (an array of struct hlist_head) with RCU protection. A zone write plug is allocated when a write BIO is received for the zone and not freed until the zone is fully written, reset or finished. To detect when a zone write plug can be freed, the write state of each zone is tracked using a write pointer offset which corresponds to the offset of a zone write pointer relative to the zone start. Write operations always increment this write pointer offset. Zone reset operations set it to 0 and zone finish operations set it to the zone size.
If a write error happens, the wp_offset value of a zone write plug may become incorrect and out of sync with the device managed write pointer. This is handled using the zone write plug flag BLK_ZONE_WPLUG_ERROR. The function blk_zone_wplug_handle_error() is called from the new disk zone write plug work when this flag is set. This function executes a report zone to update the zone write pointer offset to the current value as indicated by the device. The disk zone write plug work is scheduled whenever a BIO flagged with BIO_ZONE_WRITE_PLUGGING completes with an error or when blk_zone_wplug_prepare_bio() detects an unaligned write. Once scheduled, the disk zone write plugs work keeps running until all zone errors are handled.
To match the new data structures used for zoned disks, the function disk_free_zone_bitmaps() is renamed to the more generic disk_free_zone_resources(). The function disk_init_zone_resources() is also introduced to initialize zone write plugs resources when a gendisk is allocated.
In order to guarantee that the user can simultaneously write up to a number of zones equal to a device max active zone limit or max open zone limit, zone write plugs are allocated using a mempool sized to the maximum of these 2 device limits. For a device that does not have active and open zone limits, 128 is used as the default mempool size.
If a change to the device active and open zone limits is detected, the disk mempool is resized when blk_revalidate_disk_zones() is executed.
This commit contains contributions from Christoph Hellwig <[email protected]>.
Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Hans Holmberg <[email protected]> Tested-by: Dennis Maisenbacher <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
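A hedged sketch of the per-zone plug as the commit text describes it: a spinlock, flags, a plugged-BIO list, a submission work item, and the tracked write pointer offset. The field set follows the description above; the in-tree structure has additional fields (reference counting, RCU, a back-pointer to the disk), and the struct name here is illustrative.

```c
#include <linux/bio.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct example_blk_zone_wplug {
	struct hlist_node	node;		/* per-disk hash table linkage */
	spinlock_t		lock;		/* protects flags and bio_list */
	unsigned int		flags;		/* e.g. BLK_ZONE_WPLUG_ERROR */
	unsigned int		zone_no;	/* zone number on the disk */
	unsigned int		wp_offset;	/* write pointer offset in the zone */
	struct bio_list		bio_list;	/* plugged (delayed) write BIOs */
	struct work_struct	bio_work;	/* submits the next plugged BIO */
};
```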
|
| #
44981351 |
| 02-Feb-2024 |
Bart Van Assche <[email protected]> |
block, fs: Restore the per-bio/request data lifetime fields
Restore support for passing data lifetime information from filesystems to block drivers. This patch reverts commit b179c98f7697 ("block: Remove request.write_hint") and commit c75e707fe1aa ("block: remove the per-bio/request write hint").
This patch does not modify the size of struct bio because the new bi_write_hint member fills a hole in struct bio. pahole reports the following for struct bio on an x86_64 system with this patch applied:
/* size: 112, cachelines: 2, members: 20 */
/* sum members: 110, holes: 1, sum holes: 2 */
/* last cacheline: 48 bytes */
Reviewed-by: Kanchan Joshi <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
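A minimal sketch of the restored plumbing: a filesystem propagating an inode's lifetime hint into a BIO via the bi_write_hint member. The helper name is an illustrative assumption; the fields are as described above.

```c
#include <linux/bio.h>
#include <linux/fs.h>

static void example_bio_set_lifetime(struct bio *bio, struct inode *inode)
{
	/* enum rw_hint: WRITE_LIFE_NOT_SET, WRITE_LIFE_SHORT, ... */
	bio->bi_write_hint = inode->i_write_hint;
}
```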
|
| #
c4e47bbb |
| 16-Jan-2024 |
Jens Axboe <[email protected]> |
block: move cgroup time handling code into blk.h
In preparation for moving time keeping into blk.h, move the cgroup related code for timestamps in here too. This will help avoid a circular dependency, and also moves it into a more appropriate header as this one is private to the block layer code.
Leave struct bio_issue in blk_types.h as it's a proper time definition.
Signed-off-by: Jens Axboe <[email protected]>
|
| #
1c042f8d |
| 21-Dec-2023 |
Christoph Hellwig <[email protected]> |
block: reject invalid operation in submit_bio_noacct
submit_bio_noacct allows completely invalid operations, or operations that are not supported in the bio path. Extend the existing switch statement to reject all invalid types.
Move the code point for REQ_OP_ZONE_APPEND so that it's not right in the middle of the zone management operations and the switch statement can follow the numerical order of the operations.
Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
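A hedged sketch of the validation pattern described: a switch over bio_op() rejecting operations the bio path cannot handle. The cases shown are abbreviated relative to the in-tree switch, and the wrapper name is illustrative.

```c
#include <linux/bio.h>
#include <linux/blkdev.h>

static bool example_bio_op_supported(struct bio *bio)
{
	struct block_device *bdev = bio->bi_bdev;

	switch (bio_op(bio)) {
	case REQ_OP_READ:
	case REQ_OP_WRITE:
		return true;
	case REQ_OP_FLUSH:
		/* only synthesized in struct request by the flush machinery */
		return false;
	case REQ_OP_DISCARD:
		return bdev_max_discard_sectors(bdev) != 0;
	case REQ_OP_ZONE_APPEND:
		return bdev_is_zoned(bdev);
	default:
		/* completely invalid op values are rejected too */
		return false;
	}
}
```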
|
|
Revision tags: v6.7-rc5, v6.7-rc4 |
|
| #
67d995e0 |
| 28-Nov-2023 |
Yu Kuai <[email protected]> |
block: warn once for each partition in bio_check_ro()
Commit 1b0a151c10a6 ("blk-core: use pr_warn_ratelimited() in bio_check_ro()") fixed the message storm by limiting the rate; however, there will still be lots of messages over the long term. Fix it better by warning once for each partition.
Signed-off-by: Yu Kuai <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
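A hedged sketch of the warn-once-per-partition pattern: a per-bdev flag makes the read-only warning fire a single time instead of repeating under rate limiting. Simplified from what bio_check_ro() does; the wrapper name is illustrative.

```c
#include <linux/bio.h>
#include <linux/blkdev.h>

static void example_bio_check_ro(struct bio *bio)
{
	struct block_device *bdev = bio->bi_bdev;

	if (op_is_write(bio_op(bio)) && bdev_read_only(bdev)) {
		if (bdev->bd_ro_warned)
			return;		/* already warned for this partition */
		bdev->bd_ro_warned = true;
		pr_warn("Trying to write to read-only block-device %pg\n", bdev);
	}
}
```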
|
| #
fad907cf |
| 28-Nov-2023 |
Ming Lei <[email protected]> |
block: move .bd_inode into 1st cacheline of block_device
The .bd_inode field of block_device is used in the IO fast path of blkdev_write_iter() and blkdev_llseek(), so it is more efficient to keep it in the 1st cacheline.
.bd_openers is only touched in open()/close(), and .bd_size_lock is only for updating bdev capacity, which is in slow path too.
So swap .bd_inode layout with .bd_openers & .bd_size_lock to move .bd_inode into the 1st cache line.
Cc: Yu Kuai <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
|
|
Revision tags: v6.7-rc3, v6.7-rc2, v6.7-rc1 |
|
| #
ed5cc702 |
| 01-Nov-2023 |
Jan Kara <[email protected]> |
block: Add config option to not allow writing to mounted devices
Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore, syzbot comes up with more and more involved examples of how to corrupt a block device under a mounted filesystem, leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens are allowed for block devices opened with the BLK_OPEN_RESTRICT_WRITES flag. We will make filesystems use this flag for the devices they use.
Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes.
Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening.
Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
|
|
Revision tags: v6.6 |
|
| #
90f95dc4 |
| 24-Oct-2023 |
Christian Brauner <[email protected]> |
super: remove bd_fsfreeze_sb
Remove bd_fsfreeze_sb as it's now unused and can be removed. Also move bd_fsfreeze_count down to not have it weirdly placed in the middle of the holder fields.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Jan Kara <[email protected]> Suggested-by: Jan Kara <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
|
|
Revision tags: v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4 |
|
| #
49ef8832 |
| 27-Sep-2023 |
Christian Brauner <[email protected]> |
bdev: implement freeze and thaw holder operations
The old method of implementing block device freeze and thaw operations required us to rely on get_active_super() to walk the list of all superblocks on the system to find any superblock that might use the block device. This is wasteful and not very pleasant overall.
Now that we can finally go straight from block device to owning superblock things become way simpler.
Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
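A hedged sketch of the holder-operations flow described: freezing routes straight through the bdev's holder ops to the owning superblock, instead of scanning all superblocks with get_active_super(). The wrapper name is an illustrative assumption; the filesystem installs the ops when it opens the device.

```c
#include <linux/blkdev.h>

static int example_bdev_freeze(struct block_device *bdev)
{
	const struct blk_holder_ops *ops = bdev->bd_holder_ops;

	if (ops && ops->freeze)
		return ops->freeze(bdev);	/* e.g. the fs freeze callback */
	return 0;	/* no holder: nothing to freeze */
}
```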
|