|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
| #
3d44e1d1 |
| 15-Feb-2025 |
Yu Kuai <[email protected]> |
md: switch personalities to use md_submodule_head
Remove the global list 'pers_list' and switch to md_submodule_head, which is managed by an xarray. Prepare to unify registration and unregistration for all submodules.
Link: https://lore.kernel.org/linux-raid/[email protected] Signed-off-by: Yu Kuai <[email protected]>
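A minimal sketch of the xarray-backed registration this prepares for; the struct layout and helper names are assumptions based on the commit message, not the actual patch:

	/* assumed shape of the shared submodule header */
	struct md_submodule_head {
		int		id;	/* e.g. the personality level */
		char		name[16];
		struct module	*owner;
	};

	static DEFINE_XARRAY(md_submodule);	/* replaces the global pers_list */

	int register_md_submodule(struct md_submodule_head *head)
	{
		return xa_insert(&md_submodule, head->id, head, GFP_KERNEL);
	}

	void unregister_md_submodule(struct md_submodule_head *head)
	{
		xa_erase(&md_submodule, head->id);
	}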
|
| #
fbe8f2fa |
| 12-Feb-2025 |
Bart Van Assche <[email protected]> |
md/raid*: Fix the set_queue_limits implementations
queue_limits_cancel_update() must only be called if queue_limits_start_update() is called first. Remove the queue_limits_cancel_update() calls from the raid*_set_limits() functions because there is no corresponding queue_limits_start_update() call.
Cc: Christoph Hellwig <[email protected]> Fixes: c6e56cf6b2e7 ("block: move integrity information into queue_limits") Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/linux-raid/[email protected]/ Signed-off-by: Yu Kuai <[email protected]>
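A minimal sketch of the pairing rule this fix enforces, not the md patch itself: cancel is only legal after a start, while a from-scratch limits build (as in raid*_set_limits()) never starts an update and therefore must not cancel one:

	static int example_set_limits(struct request_queue *q, unsigned int new_max)
	{
		struct queue_limits lim = queue_limits_start_update(q);

		if (!new_max) {
			/* legal only because queue_limits_start_update() ran above */
			queue_limits_cancel_update(q);
			return -EINVAL;
		}
		lim.max_hw_sectors = new_max;
		return queue_limits_commit_update(q, &lim);
	}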
|
|
Revision tags: v6.14-rc2, v6.14-rc1, v6.13 |
|
| #
6a7e17b2 |
| 16-Jan-2025 |
John Garry <[email protected]> |
block: Add common atomic writes enable flag
Currently only stacked devices need to explicitly enable atomic writes by setting the BLK_FEAT_ATOMIC_WRITES_STACKED flag.
This does not work well for device mapper stacking devices, where many sets of limits are stacked and which device is the 'bottom' and which is the 'top' can be swapped. This means that BLK_FEAT_ATOMIC_WRITES_STACKED needs to be set for many queue limits, which is messy.
Generalize atomic write enabling by ensuring that all devices must explicitly set a flag - that includes NVMe, SCSI sd, and md raid.
Signed-off-by: John Garry <[email protected]> Reviewed-by: Mike Snitzer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
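As a hedged illustration, the opt-in this series introduces is a feature bit set in the driver's queue limits; the surrounding setup is assumed:

	struct queue_limits lim;

	blk_set_stacking_limits(&lim);		/* or the driver's usual defaults */
	lim.features |= BLK_FEAT_ATOMIC_WRITES;	/* explicit per-driver opt-in */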
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
fa6fec82 |
| 18-Nov-2024 |
John Garry <[email protected]> |
md/raid0: Atomic write support
Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes. All other stacked-device request queue limits should automatically be set properly. As for the atomic write max bytes limit, it will be set from hw_max_sectors, which is limited by the stripe width - the behaviour we want.
Reviewed-by: Yu Kuai <[email protected]> Signed-off-by: John Garry <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
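A hedged fragment of what the raid0 side plausibly looks like while the limits are being built:

	/* in raid0's queue-limits setup */
	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
	/* the atomic write max bytes limit then follows max_hw_sectors,
	 * which the stripe width already bounds */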
|
|
Revision tags: v6.12 |
|
| #
74538fda |
| 11-Nov-2024 |
John Garry <[email protected]> |
md/raid0: Handle bio_split() errors
Add proper bio_split() error handling. For any error, set bi_status, end the bio, and return.
Reviewed-by: Yu Kuai <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
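A hedged sketch of the described pattern, assuming bio_split() returns an ERR_PTR on failure:

	split = bio_split(bio, sectors, GFP_NOIO, &mddev->bio_set);
	if (IS_ERR(split)) {
		/* for any error: set bi_status, end the bio, and return */
		bio->bi_status = errno_to_blk_status(PTR_ERR(split));
		bio_endio(bio);
		return true;	/* raid0's make_request hook returns bool */
	}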
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6 |
|
| #
573d5abf |
| 26-Jun-2024 |
Christoph Hellwig <[email protected]> |
md: set md-specific flags for all queue limits
The md driver wants to enforce a number of flags for all devices, even when not inheriting them from the underlying devices. To make sure these flags survive the queue_limits_set calls that md uses to update the queue limits without deriving them from the previous limits, add a new md_init_stacking_limits helper that calls blk_set_stacking_limits and sets these flags.
Fixes: 1122c0c1cc71 ("block: move cache control settings out of queue->flags") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
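A hedged sketch of the helper; the exact md-enforced flag set is an assumption, not copied from the patch:

	void md_init_stacking_limits(struct queue_limits *lim)
	{
		blk_set_stacking_limits(lim);
		/* flags md always wants regardless of the members (assumed set) */
		lim->features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
	}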
|
|
Revision tags: v6.10-rc5, v6.10-rc4 |
|
| #
c6e56cf6 |
| 13-Jun-2024 |
Christoph Hellwig <[email protected]> |
block: move integrity information into queue_limits
Move the integrity information into the queue limits so that it can be set atomically with other queue limits, and so that the sysfs changes to the read_verify and write_generate flags are properly synchronized. This also makes it possible to provide a more useful helper to stack the integrity fields, although it still is separate from the main stacking function as not all stackable devices want to inherit the integrity settings. Even with that it greatly simplifies the code in md and dm.
Note that the integrity field is moved as-is into the queue limits. While there are good arguments for removing the separate blk_integrity structure, this would cause a lot of churn and might better be done at a later time if desired. However, the integrity field in the queue_limits structure is now unconditional, so that various ifdefs can be avoided or replaced with IS_ENABLED(). Given its tiny size, that seems like a worthwhile trade-off.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
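A hedged sketch of the consequence: integrity can now be staged and committed atomically with the other limits. The field and enum names here are assumptions:

	int example_enable_pi(struct request_queue *q)
	{
		struct queue_limits lim = queue_limits_start_update(q);

		lim.integrity.csum_type = BLK_INTEGRITY_CSUM_CRC;	/* assumed */
		lim.integrity.tuple_size = 8;				/* assumed */
		return queue_limits_commit_update(q, &lim);
	}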
|
| #
d11854ed |
| 13-Jun-2024 |
Christoph Hellwig <[email protected]> |
md/raid0: don't free conf on raid0_run failure
The core md code calls the ->free method which already frees conf.
Fixes: 0c031fd37f69 ("md: Move alloc/free acct bioset in to personality") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
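A hedged sketch of the shape of the fix in raid0_run()'s error path; the exact diff is assumed:

	ret = raid0_set_limits(mddev);
	if (ret)
		return ret;	/* previously also freed conf here, a double
				 * free since the md core calls ->free() */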
|
|
Revision tags: v6.10-rc3 |
|
| #
35f20aca |
| 04-Jun-2024 |
Christoph Hellwig <[email protected]> |
md/raid0: don't free conf on raid0_run failure
The core md code calls the ->free method which already frees conf.
Fixes: 0c031fd37f69 ("md: Move alloc/free acct bioset in to personality") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7 |
|
| #
396799eb |
| 03-Mar-2024 |
Christoph Hellwig <[email protected]> |
md: remove mddev->queue
Just use the request_queue from the gendisk pointer in the relatively few places that still need it.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Song Liu <[email protected]> Tested-by: Song Liu <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
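The pattern change, as a one-line hedged illustration:

	struct request_queue *q = mddev->gendisk->queue;	/* was: mddev->queue */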
|
| #
56cf22d6 |
| 03-Mar-2024 |
Christoph Hellwig <[email protected]> |
md/raid0: use the atomic queue limit update APIs
Build the queue limits outside the queue and apply them using queue_limits_set. To make the code more obvious also split the queue limits handling into a separate helper function.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Song Liu <[email protected]> Tested-by: Song Liu <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
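A hedged sketch of the split-out helper; the specific limit fields are illustrative, not the exact patch:

	static int raid0_set_limits(struct mddev *mddev)
	{
		struct queue_limits lim;

		blk_set_stacking_limits(&lim);
		lim.max_hw_sectors = mddev->chunk_sectors;
		lim.max_write_zeroes_sectors = mddev->chunk_sectors;
		lim.io_min = mddev->chunk_sectors << 9;
		lim.io_opt = lim.io_min * mddev->raid_disks;
		return queue_limits_set(mddev->gendisk->queue, &lim);
	}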
|
| #
176df894 |
| 03-Mar-2024 |
Christoph Hellwig <[email protected]> |
md: add a mddev_is_dm helper
Add a helper to check for a DM-mapped MD device instead of using the obfuscated ->gendisk or ->queue NULL checks.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Song Liu <[email protected]> Tested-by: Song Liu <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
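A hedged sketch consistent with the description; a DM-mapped md device has no gendisk of its own:

	static inline bool mddev_is_dm(struct mddev *mddev)
	{
		return !mddev->gendisk;
	}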
|
| #
c396b90e |
| 03-Mar-2024 |
Christoph Hellwig <[email protected]> |
md: add a mddev_trace_remap helper
Add a helper to trace bio remapping that hides some argument dereferences and the check for a DM-mapped MD device.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Song Liu <[email protected]> Tested-by: Song Liu <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
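A hedged sketch of the helper, built on mddev_is_dm() from the previous patch:

	static inline void mddev_trace_remap(struct mddev *mddev, struct bio *bio,
					     sector_t sector)
	{
		if (!mddev_is_dm(mddev))
			trace_block_bio_remap(bio, disk_devt(mddev->gendisk),
					      sector);
	}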
|
|
Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7 |
|
| #
cc22b540 |
| 16-Aug-2023 |
David Jeffery <[email protected]> |
md: raid0: account for split bio in iostat accounting
When a bio is split by md raid0, the newly created bio will not be tracked by md for I/O accounting. Only the portion of I/O still assigned to the original bio which was reduced by the split will be accounted for. This results in md iostat data sometimes showing I/O values far below the actual amount of data being sent through md.
md_account_bio() needs to be called for all bio generated by the bio split.
A simple example of the issue was generated using a raid0 device on partitions of the same device. Since all raid0 I/O then goes to one device, it makes it easy to see a gap between the md device and its sd storage. Reading an lvm device on top of the md device, the iostat output (some 0 columns and extra devices removed to make the data more compact) was:
Device            tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read
md2              0.00         0.00         0.00         0.00          0
sde              0.00         0.00         0.00         0.00          0
md2           1364.00    411496.00         0.00         0.00     411496
sde           1734.00    646144.00         0.00         0.00     646144
md2           1699.00    510680.00         0.00         0.00     510680
sde           2155.00    802784.00         0.00         0.00     802784
md2            803.00    241480.00         0.00         0.00     241480
sde           1016.00    377888.00         0.00         0.00     377888
md2              0.00         0.00         0.00         0.00          0
sde              0.00         0.00         0.00         0.00          0
I/O was generated doing large direct I/O reads (12M) with dd to a linear lvm volume on top of the 4 leg raid0 device.
The md2 reads were showing as roughly 2/3 of the reads to the sde device containing all of md2's raid partitions. The sum of reads to sde was 1826816 kB, which was the expected amount as it was the amount read by dd. With the patch, the total reads from md will match the reads from sde and be consistent with the amount of I/O generated.
Fixes: 10764815ff47 ("md: add io accounting for raid0 and raid5") Signed-off-by: David Jeffery <[email protected]> Tested-by: Laurence Oberman <[email protected]> Reviewed-by: Laurence Oberman <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
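A hedged sketch of the idea rather than the literal patch: every bio produced by the split has to pass through md_account_bio() before submission:

	if (sectors < bio_sectors(bio)) {
		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
					      &mddev->bio_set);

		bio_chain(split, bio);
		md_account_bio(mddev, &bio);	/* the remainder is tracked too */
		submit_bio_noacct(bio);
		bio = split;
	}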
|
| #
319ff40a |
| 14-Aug-2023 |
Jan Kara <[email protected]> |
md/raid0: Fix performance regression for large sequential writes
Commit f00d7c85be9e ("md/raid0: fix up bio splitting.") among other things changed how a bio that needs to be split is submitted. Before this commit, we split the bio, mapped and submitted each part. After this commit, we map only the first part of the split bio and submit the second part unmapped. Due to bio sorting in __submit_bio_noacct() this results in the following request ordering:
9,0 18 1181 0.525037895 15995 Q WS 1479315464 + 63392
Split off chunk-sized (1024 sectors) request:
9,0 18 1182 0.629019647 15995 X WS 1479315464 / 1479316488
Request is unaligned to the chunk so it's split in raid0_make_request(). This is the first part mapped and punted to bio_list:
8,0 18 7053 0.629020455 15995 A WS 739921928 + 1016 <- (9,0) 1479315464
Now raid0_make_request() returns, second part is postponed on bio_list. __submit_bio_noacct() resorts the bio_list, mapped request is submitted to the underlying device:
8,0 18 7054 0.629022782 15995 G WS 739921928 + 1016
Now we take another request from the bio_list which is the remainder of the original huge request. Split off another chunk-sized bit from it and the situation repeats:
9,0  18 1183  0.629024499 15995  X  WS 1479316488 / 1479317512
8,16 18 6998  0.629025110 15995  A  WS 739921928 + 1016 <- (9,0) 1479316488
8,16 18 6999  0.629026728 15995  G  WS 739921928 + 1016
...
9,0  18 1184  0.629032940 15995  X  WS 1479317512 / 1479318536 [libnetacq-write]
8,0  18 7059  0.629033294 15995  A  WS 739922952 + 1016 <- (9,0) 1479317512
8,0  18 7060  0.629033902 15995  G  WS 739922952 + 1016
...
This repeats until we consume the whole original huge request. Now we finally get to processing the second parts of the split off requests (in reverse order):
8,16 18 7181  0.629161384 15995  A  WS 739952640 + 8 <- (9,0) 1479377920
8,0  18 7239  0.629162140 15995  A  WS 739952640 + 8 <- (9,0) 1479376896
8,16 18 7186  0.629163881 15995  A  WS 739951616 + 8 <- (9,0) 1479375872
8,0  18 7242  0.629164421 15995  A  WS 739951616 + 8 <- (9,0) 1479374848
...
I guess it is obvious that this IO pattern is an extremely inefficient way to perform sequential IO. It also makes the bio_list grow rather long.
Change raid0_make_request() to map both parts of the split bio. Since we know we are provided with at most chunk-sized bios, we will always need to split the incoming bio at most once.
Fixes: f00d7c85be9e ("md/raid0: fix up bio splitting.") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Song Liu <[email protected]>
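A hedged fragment of the fixed flow, using the map-and-submit helper factored out in the companion patch below:

	if (sectors < bio_sectors(bio)) {
		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
					      &mddev->bio_set);

		bio_chain(split, bio);
		raid0_map_submit_bio(mddev, bio);	/* map the remainder now,
							 * don't re-queue it */
		bio = split;
	}
	raid0_map_submit_bio(mddev, bio);		/* chunk-sized part */
	return true;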
|
| #
af50e20a |
| 14-Aug-2023 |
Jan Kara <[email protected]> |
md/raid0: Factor out helper for mapping and submitting a bio
Factor a helper function for mapping and submitting a bio out of raid0_make_request(). We will use it later for submitting both parts of a split bio.
Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Song Liu <[email protected]>
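A hedged condensation of what the helper plausibly contains, based on the mapping steps that already live in raid0_make_request():

	static void raid0_map_submit_bio(struct mddev *mddev, struct bio *bio)
	{
		struct r0conf *conf = mddev->private;
		struct strip_zone *zone;
		struct md_rdev *tmp_dev;
		sector_t sector = bio->bi_iter.bi_sector;

		md_account_bio(mddev, &bio);

		zone = find_zone(conf, &sector);
		tmp_dev = map_sector(mddev, zone, sector, &sector);
		bio_set_dev(bio, tmp_dev->bdev);
		bio->bi_iter.bi_sector = sector + zone->dev_start +
			tmp_dev->data_offset;
		submit_bio_noacct(bio);
	}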
|
|
Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4 |
|
| #
c567c86b |
| 21-Jun-2023 |
Yu Kuai <[email protected]> |
md: move initialization and destruction of 'io_acct_set' to md.c
'io_acct_set' is only used for raid0 and raid456. Prepare to use it for raid1 and raid10, so that io accounting from different levels can be consistent.
By the way, follow-up patches will also use this io clone mechanism to make sure 'active_io' represents in-flight io, not io that is being dispatched, so that mddev_suspend will wait for io to be done as designed.
Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Xiao Ni <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
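A hedged sketch of the centralized initialization; the pool size and padding arguments are illustrative:

	/* in md.c, once for every personality that wants io accounting */
	err = bioset_init(&mddev->io_acct_set, BIO_POOL_SIZE,
			  offsetof(struct md_io_acct, bio_clone), 0);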
|
| #
e8360070 |
| 23-Jun-2023 |
Jason Baron <[email protected]> |
md/raid0: add discard support for the 'original' layout
We've found that using raid0 with the 'original' layout and discard enabled with different disk sizes (such that at least two zones are created) can result in data corruption. This is due to the fact that the discard handling in 'raid0_handle_discard()' assumes the 'alternate' layout. We've seen this corruption using ext4 but other filesystems are likely susceptible as well.
More specifically, while multiple zones are necessary to create the corruption, the corruption may not occur with multiple zones if they lay out in such a way that the layout matches what the 'alternate' layout would have produced. Thus, not all raid0 devices with the 'original' layout, different size disks and discard enabled will encounter this corruption.
The 3.14 kernel inadvertently changed the raid0 disk layout for different size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the same raid0 array could corrupt data. This led to the creation of the 'original' layout (to match the pre-3.14 layout) and the 'alternate' layout (to match the post-3.14 layout) in the 5.4 kernel time frame, and an option to tell the kernel which layout to use (since it couldn't be autodetected). However, when the 'original' layout was added back in 5.4, discard support for the 'original' layout was not added, leading to this issue.
I've been able to reliably reproduce the corruption with the following test case:
1. create raid0 array with different size disks using original layout
2. mkfs
3. mount -o discard
4. create lots of files
5. remove 1/2 the files
6. fstrim -a (or just the mount point for the raid0 array)
7. umount
8. fsck -fn /dev/md0 (spews all sorts of corruptions)
Let's fix this by adding proper discard support to the 'original' layout. The fix 'maps' the 'original' layout disks to the order in which they are read/written such that we can compare the disks in the same way that the current 'alternate' layout does. A 'disk_shift' field is added to 'struct strip_zone'. This could be computed on the fly in raid0_handle_discard() but by adding this field, we save some computation in the discard path.
Note we could also potentially fix this by re-ordering the disks in the zones that follow the first one, and then always read/writing them using the 'alternate' layout. However, that is seen as a more substantial change, and we are attempting the least invasive fix at this time to remedy the corruption.
I've verified the change using the reproducer mentioned above. Typically, the corruption is seen after less than 3 iterations, while the patch has run 500+ iterations.
Cc: NeilBrown <[email protected]> Cc: Song Liu <[email protected]> Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.") Cc: [email protected] Signed-off-by: Jason Baron <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
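A hedged sketch of the data-structure change described above, with the existing fields abbreviated:

	struct strip_zone {
		sector_t zone_end;	/* start of the next zone */
		sector_t dev_start;	/* zone offset within each device */
		int	 nb_dev;	/* devices in this zone */
		int	 disk_shift;	/* new: precomputed read/write order
					 * shift for the 'original' layout */
	};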
|
|
Revision tags: v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2 |
|
| #
c31fea2f |
| 06-Mar-2023 |
Mariusz Tkaczyk <[email protected]> |
md: add error_handlers for raid0 and linear
After commit 9631abdbf406c ("md: Set MD_BROKEN for RAID1 and RAID10"), MD_BROKEN must be set if the array has failed, because state_store() checks it. If it is set, then -EBUSY is returned to userspace.
For raid0 and linear, MD_BROKEN is not set by error_handler(). As a result, mdadm is unable to trigger clean-up actions. It is a regression.
This patch adds an appropriate error_handler for raid0 and linear. The error handler sets MD_BROKEN for this device.
Reviewed-by: Xiao Ni <[email protected]> Signed-off-by: Mariusz Tkaczyk <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
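A hedged sketch of such a handler for raid0; the log text is illustrative:

	static void raid0_error(struct mddev *mddev, struct md_rdev *rdev)
	{
		if (!test_and_set_bit(MD_BROKEN, &mddev->flags))
			pr_crit("md/raid0%s: disk failure on %pg, failing array\n",
				mdname(mddev), rdev->bdev);
	}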
|
|
Revision tags: v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4 |
|
| #
8e1a2279 |
| 02-Nov-2022 |
Xiao Ni <[email protected]> |
md/raid0, raid10: Don't set discard sectors for request queue
It should use disk_stack_limits to get a proper max_discard_sectors rather than having the stacking drivers set a value themselves.
There is also a bug: raid0/raid10 set max_discard_sectors even if all member disks are rotational devices. So although the member devices are not ssd/nvme, raid0/raid10 export the wrong value, and __blkdev_issue_discard reports warning messages when running mkfs.xfs, like this:
[ 4616.022599] ------------[ cut here ]------------
[ 4616.027779] WARNING: CPU: 4 PID: 99634 at block/blk-lib.c:50 __blkdev_issue_discard+0x16a/0x1a0
[ 4616.140663] RIP: 0010:__blkdev_issue_discard+0x16a/0x1a0
[ 4616.146601] Code: 24 4c 89 20 31 c0 e9 fe fe ff ff c1 e8 09 8d 48 ff 4c 89 f0 4c 09 e8 48 85 c1 0f 84 55 ff ff ff b8 ea ff ff ff e9 df fe ff ff <0f> 0b 48 8d 74 24 08 e8 ea d6 00 00 48 c7 c6 20 1e 89 ab 48 c7 c7
[ 4616.167567] RSP: 0018:ffffaab88cbffca8 EFLAGS: 00010246
[ 4616.173406] RAX: ffff9ba1f9e44678 RBX: 0000000000000000 RCX: ffff9ba1c9792080
[ 4616.181376] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ba1c9792080
[ 4616.189345] RBP: 0000000000000cc0 R08: ffffaab88cbffd10 R09: 0000000000000000
[ 4616.197317] R10: 0000000000000012 R11: 0000000000000000 R12: 0000000000000000
[ 4616.205288] R13: 0000000000400000 R14: 0000000000000cc0 R15: ffff9ba1c9792080
[ 4616.213259] FS: 00007f9a5534e980(0000) GS:ffff9ba1b7c80000(0000) knlGS:0000000000000000
[ 4616.222298] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4616.228719] CR2: 000055a390a4c518 CR3: 0000000123e40006 CR4: 00000000001706e0
[ 4616.236689] Call Trace:
[ 4616.239428]  blkdev_issue_discard+0x52/0xb0
[ 4616.244108]  blkdev_common_ioctl+0x43c/0xa00
[ 4616.248883]  blkdev_ioctl+0x116/0x280
[ 4616.252977]  __x64_sys_ioctl+0x8a/0xc0
[ 4616.257163]  do_syscall_64+0x5c/0x90
[ 4616.261164]  ? handle_mm_fault+0xc5/0x2a0
[ 4616.265652]  ? do_user_addr_fault+0x1d8/0x690
[ 4616.270527]  ? do_syscall_64+0x69/0x90
[ 4616.274717]  ? exc_page_fault+0x62/0x150
[ 4616.279097]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 4616.284748] RIP: 0033:0x7f9a55398c6b
Signed-off-by: Xiao Ni <[email protected]> Reported-by: Yi Zhang <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Song Liu <[email protected]>
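As a hedged illustration, letting the stacking helper derive the discard limit instead of assigning one directly:

	/* per member rdev, while assembling the array */
	disk_stack_limits(mddev->gendisk, rdev->bdev,
			  rdev->data_offset << 9);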
|
|
Revision tags: v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3 |
|
| #
1727fd50 |
| 23-Aug-2022 |
Saurabh Sengar <[email protected]> |
md: Replace snprintf with scnprintf
Current code produces a warning as shown below when total characters in the constituent block device names plus the slashes exceeds 200. snprintf() returns the number of characters generated from the given input, which could cause the expression “200 – len” to wrap around to a large positive number. Fix this by using scnprintf() instead, which returns the actual number of characters written into the buffer.
[ 1513.267938] ------------[ cut here ]------------
[ 1513.267943] WARNING: CPU: 15 PID: 37247 at <snip>/lib/vsprintf.c:2509 vsnprintf+0x2c8/0x510
[ 1513.267944] Modules linked in: <snip>
[ 1513.267969] CPU: 15 PID: 37247 Comm: mdadm Not tainted 5.4.0-1085-azure #90~18.04.1-Ubuntu
[ 1513.267969] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[ 1513.267971] RIP: 0010:vsnprintf+0x2c8/0x510
<-snip->
[ 1513.267982] Call Trace:
[ 1513.267986]  snprintf+0x45/0x70
[ 1513.267990]  ? disk_name+0x71/0xa0
[ 1513.267993]  dump_zones+0x114/0x240 [raid0]
[ 1513.267996]  ? _cond_resched+0x19/0x40
[ 1513.267998]  raid0_run+0x19e/0x270 [raid0]
[ 1513.268000]  md_run+0x5e0/0xc50
[ 1513.268003]  ? security_capable+0x3f/0x60
[ 1513.268005]  do_md_run+0x19/0x110
[ 1513.268006]  md_ioctl+0x195e/0x1f90
[ 1513.268007]  blkdev_ioctl+0x91f/0x9f0
[ 1513.268010]  block_ioctl+0x3d/0x50
[ 1513.268012]  do_vfs_ioctl+0xa9/0x640
[ 1513.268014]  ? __fput+0x162/0x260
[ 1513.268016]  ksys_ioctl+0x75/0x80
[ 1513.268017]  __x64_sys_ioctl+0x1a/0x20
[ 1513.268019]  do_syscall_64+0x5e/0x200
[ 1513.268021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 766038846e875 ("md/raid0: replace printk() with pr_*()") Reviewed-by: Michael Kelley <[email protected]> Acked-by: Guoqing Jiang <[email protected]> Signed-off-by: Saurabh Sengar <[email protected]> Signed-off-by: Song Liu <[email protected]>
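A hedged fragment illustrating the difference, with a 200-byte budget as in the warning scenario; the variable names are assumed:

	/* before: snprintf() returns what it would have written, so len can
	 * exceed 200 and the next "200 - len" wraps to a huge value */
	len += snprintf(line + len, 200 - len, "%s/", name);

	/* after: scnprintf() returns what was actually stored, so the
	 * remaining-space computation stays bounded */
	len += scnprintf(line + len, 200 - len, "%s/", name);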
|
|
Revision tags: v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7 |
|
| #
0f2571ad |
| 12-May-2022 |
Xiao Ni <[email protected]> |
md: Don't set mddev private to NULL in raid0 pers->free
The normal stop process works like this:

	do_md_stop
	  |
	__md_stop (pers->free(); mddev->private = NULL)
	  |
	md_free (frees mddev)

__md_stop sets mddev->private to NULL after pers->free. The raid device is stopped and the mddev memory is freed. But in reshape, the mddev is not freed and will still be used by the new raid.
In reshape, md first sets mddev->private to new_pers and then runs old_pers->free(). Since raid0 sets mddev->private to NULL in raid0_free, the new raid can't work anymore: it panics with a NULL pointer dereference when dereferencing mddev->private.
It can panic like this:
[63010.814972] kernel BUG at drivers/md/raid10.c:928!
[63010.819778] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[63010.825011] CPU: 3 PID: 44437 Comm: md0_resync Kdump: loaded Not tainted 5.14.0-86.el9.x86_64 #1
[63010.833789] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.15.0 09/11/2020
[63010.841440] RIP: 0010:raise_barrier+0x161/0x170 [raid10]
[63010.865508] RSP: 0018:ffffc312408bbc10 EFLAGS: 00010246
[63010.870734] RAX: 0000000000000000 RBX: ffffa00bf7d39800 RCX: 0000000000000000
[63010.877866] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffa00bf7d39800
[63010.884999] RBP: 0000000000000000 R08: fffffa4945e74400 R09: 0000000000000000
[63010.892132] R10: ffffa00eed02f798 R11: 0000000000000000 R12: ffffa00bbc435200
[63010.899266] R13: ffffa00bf7d39800 R14: 0000000000000400 R15: 0000000000000003
[63010.906399] FS: 0000000000000000(0000) GS:ffffa00eed000000(0000) knlGS:0000000000000000
[63010.914485] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63010.920229] CR2: 00007f5cfbe99828 CR3: 0000000105efe000 CR4: 00000000003506e0
[63010.927363] Call Trace:
[63010.929822]  ? bio_reset+0xe/0x40
[63010.933144]  ? raid10_alloc_init_r10buf+0x60/0xa0 [raid10]
[63010.938629]  raid10_sync_request+0x756/0x1610 [raid10]
[63010.943770]  md_do_sync.cold+0x3e4/0x94c
[63010.947698]  md_thread+0xab/0x160
[63010.951024]  ? md_write_inc+0x50/0x50
[63010.954688]  kthread+0x149/0x170
[63010.957923]  ? set_kthread_struct+0x40/0x40
[63010.962107]  ret_from_fork+0x22/0x30
Removing the code that sets mddev->private to NULL in raid0 fixes the problem.
Fixes: 0c031fd37f69 (md: Move alloc/free acct bioset in to personality) Reported-by: Fine Fan <[email protected]> Signed-off-by: Xiao Ni <[email protected]> Signed-off-by: Song Liu <[email protected]>
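A hedged sketch of the resulting callback:

	static void raid0_free(struct mddev *mddev, void *priv)
	{
		struct r0conf *conf = priv;

		free_conf(mddev, conf);
		/* no longer: mddev->private = NULL - a reshaped array
		 * still needs a valid private pointer */
	}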
|
| #
913cce5a |
| 12-May-2022 |
Christoph Hellwig <[email protected]> |
md: remove most calls to bdevname
Use the %pg format specifier to save on stack consumption and code size.
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Song Liu <[email protected]>
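A hedged illustration of the specifier change; the message text is made up:

	char b[BDEVNAME_SIZE];

	pr_warn("md: %s has an invalid sb\n", bdevname(rdev->bdev, b));	/* old */
	pr_warn("md: %pg has an invalid sb\n", rdev->bdev);		/* new */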
|
|
Revision tags: v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3 |
|
| #
ea23994e |
| 13-Apr-2022 |
Pascal Hambourg <[email protected]> |
md/raid0: Ignore RAID0 layout if the second zone has only one device
The RAID0 layout is irrelevant if all members have the same size so the array has only one zone. It is *also* irrelevant if the array has two zones and the second zone has only one device, for example if the array has two members of different sizes.
So in that case it makes sense to allow assembly even when the layout is undefined, like what is done when the array has only one zone.
Reviewed-by: NeilBrown <[email protected]> Signed-off-by: Pascal Hambourg <[email protected]> Signed-off-by: Song Liu <[email protected]>
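A hedged sketch of the relaxed assembly-time check; the field names follow raid0's existing conf layout:

	/* the layout is irrelevant with one zone, or with two zones where
	 * the second contains a single device */
	bool layout_irrelevant =
		conf->nr_strip_zones <= 1 ||
		(conf->nr_strip_zones == 2 &&
		 conf->strip_zone[1].nb_dev == 1);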
|
| #
70200574 |
| 15-Apr-2022 |
Christoph Hellwig <[email protected]> |
block: remove QUEUE_FLAG_DISCARD
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes.
The only place that needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Acked-by: Christoph Böhmwalder <[email protected]> [drbd] Acked-by: Jan Höppner <[email protected]> [s390] Acked-by: Coly Li <[email protected]> [bcache] Acked-by: David Sterba <[email protected]> [btrfs] Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
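A hedged illustration of the new idiom for checking discard support:

	/* instead of testing QUEUE_FLAG_DISCARD */
	bool supports_discard = bdev_max_discard_sectors(rdev->bdev) != 0;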
|