|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
896b02d0 |
| 18-Mar-2025 |
Ojaswin Mujoo <[email protected]> |
ext4: Make sb update interval tunable
Currently, outside error paths, we auto commit the super block after 1 hour has passed and 16MB worth of updates have been written since last commit. This is a
ext4: Make sb update interval tunable
Currently, outside error paths, we auto commit the super block after 1 hour has passed and 16MB worth of updates have been written since last commit. This is a policy decision so make this tunable while keeping the defaults same. This is useful if user wants to tweak the superblock behavior or for debugging the codepath by allowing to trigger it more frequently.
We can now tweak the super block update using sb_update_sec and sb_update_kb files in /sys/fs/ext4/<dev>/
Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Ritesh Harjani (IBM) <[email protected]> Reviewed-by: Baokun Li <[email protected]> Signed-off-by: Ojaswin Mujoo <[email protected]> Link: https://patch.msgid.link/950fb8c9b2905620e16f02a3b9eeea5a5b6cb87e.1742279837.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
63bfe841 |
| 19-Mar-2024 |
Baokun Li <[email protected]> |
ext4: add positive int attr pointer to avoid sysfs variables overflow
The following variables controlled by the sysfs interface are of type int and are normally used in the range [0, INT_MAX], but a
ext4: add positive int attr pointer to avoid sysfs variables overflow
The following variables controlled by the sysfs interface are of type int and are normally used in the range [0, INT_MAX], but are declared as attr_pointer_ui, and thus may be set to values that exceed INT_MAX and result in overflows to get negative values.
err_ratelimit_burst msg_ratelimit_burst warning_ratelimit_burst err_ratelimit_interval_ms msg_ratelimit_interval_ms warning_ratelimit_interval_ms
Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a value range of 0-INT_MAX to avoid overflow.
Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
b7b2a579 |
| 19-Mar-2024 |
Baokun Li <[email protected]> |
ext4: add new attr pointer attr_mb_order
The s_mb_best_avail_max_trim_order is of type unsigned int, and has a range of values well beyond the normal use of the mb_order. Although the mballoc code i
ext4: add new attr pointer attr_mb_order
The s_mb_best_avail_max_trim_order is of type unsigned int, and has a range of values well beyond the normal use of the mb_order. Although the mballoc code is careful enough that large numbers don't matter there, but this can mislead the sysadmin into thinking that it's normal to set such values. Hence add a new attr_id attr_mb_order with values in the range [0, 64] to avoid storing garbage values and make us more resilient to surprises in the future.
Suggested-by: Jan Kara <[email protected]> Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
13df4d44 |
| 19-Mar-2024 |
Baokun Li <[email protected]> |
ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
We can trigger a slab-out-of-bounds with the following commands:
mkfs.ext4 -F /dev/$disk 10G mount /dev/$disk /tmp/t
ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
We can trigger a slab-out-of-bounds with the following commands:
mkfs.ext4 -F /dev/$disk 10G mount /dev/$disk /tmp/test echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc echo test > /tmp/test/file && sync
================================================================== BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4] Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11 CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521 Call Trace: dump_stack_lvl+0x2c/0x50 kasan_report+0xb6/0xf0 ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4] ext4_mb_regular_allocator+0x19e9/0x2370 [ext4] ext4_mb_new_blocks+0x88a/0x1370 [ext4] ext4_ext_map_blocks+0x14f7/0x2390 [ext4] ext4_map_blocks+0x569/0xea0 [ext4] ext4_do_writepages+0x10f6/0x1bc0 [ext4] [...] ==================================================================
The flow of issue triggering is as follows:
// Set s_mb_group_prealloc to 2147483647 via sysfs ext4_mb_new_blocks ext4_mb_normalize_request ext4_mb_normalize_group_request ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc ext4_mb_regular_allocator ext4_mb_choose_next_group ext4_mb_choose_next_group_best_avail mb_avg_fragment_size_order order = fls(len) - 2 = 29 ext4_mb_find_good_group_avg_frag_lists frag_list = &sbi->s_mb_avg_fragment_size[order] if (list_empty(frag_list)) // Trigger SOOB!
At 4k block size, the length of the s_mb_avg_fragment_size list is 14, but an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds to be triggered by an attempt to access an element at index 29.
Add a new attr_id attr_clusters_in_group with values in the range [0, sbi->s_clusters_per_group] and declare mb_group_prealloc as that type to fix the issue. In addition avoid returning an order from mb_avg_fragment_size_order() greater than MB_NUM_ORDERS(sb) and reduce some useless loops.
Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)") CC: [email protected] Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Ojaswin Mujoo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
57341fe3 |
| 19-Mar-2024 |
Baokun Li <[email protected]> |
ext4: refactor out ext4_generic_attr_show()
Refactor out the function ext4_generic_attr_show() to handle the reading of values of various common types, with no functional changes.
Signed-off-by: Ba
ext4: refactor out ext4_generic_attr_show()
Refactor out the function ext4_generic_attr_show() to handle the reading of values of various common types, with no functional changes.
Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
f536808a |
| 19-Mar-2024 |
Baokun Li <[email protected]> |
ext4: refactor out ext4_generic_attr_store()
Refactor out the function ext4_generic_attr_store() to handle the setting of values of various common types, with no functional changes.
Signed-off-by:
ext4: refactor out ext4_generic_attr_store()
Refactor out the function ext4_generic_attr_store() to handle the setting of values of various common types, with no functional changes.
Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
9e8e819f |
| 19-Mar-2024 |
Baokun Li <[email protected]> |
ext4: avoid overflow when setting values via sysfs
When setting values of type unsigned int through sysfs, we use kstrtoul() to parse it and then truncate part of it as the final set value, when the
ext4: avoid overflow when setting values via sysfs
When setting values of type unsigned int through sysfs, we use kstrtoul() to parse it and then truncate part of it as the final set value, when the set value is greater than UINT_MAX, the set value will not match what we see because of the truncation. As follows:
$ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups $ cat /sys/fs/ext4/sda/mb_max_linear_groups 0
So we use kstrtouint() to parse the attr_pointer_ui type to avoid the inconsistency described above. In addition, a judgment is added to avoid setting s_resv_clusters less than 0.
Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5 |
|
| #
f52f3d2b |
| 30-May-2023 |
Ojaswin Mujoo <[email protected]> |
ext4: Give symbolic names to mballoc criterias
mballoc criterias have historically been called by numbers like CR0, CR1... however this makes it confusing to understand what each criteria is about.
ext4: Give symbolic names to mballoc criterias
mballoc criterias have historically been called by numbers like CR0, CR1... however this makes it confusing to understand what each criteria is about.
Change these criterias from numbers to symbolic names and add relevant comments. While we are at it, also reformat and add some comments to ext4_seq_mb_stats_show() for better readability.
Additionally, define CR_FAST which signifies the criteria below which we can make quicker decisions like: * quitting early if (free block < requested len) * avoiding to scan free extents smaller than required len. * avoiding to initialize buddy cache and work with existing cache * limiting prefetches
Suggested-by: Jan Kara <[email protected]> Signed-off-by: Ojaswin Mujoo <[email protected]> Link: https://lore.kernel.org/r/a2dc6ec5aea5e5e68cf8e788c2a964ffead9c8b0.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
7e170922 |
| 30-May-2023 |
Ojaswin Mujoo <[email protected]> |
ext4: Add allocation criteria 1.5 (CR1_5)
CR1_5 aims to optimize allocations which can't be satisfied in CR1. The fact that we couldn't find a group in CR1 suggests that it would be difficult to fin
ext4: Add allocation criteria 1.5 (CR1_5)
CR1_5 aims to optimize allocations which can't be satisfied in CR1. The fact that we couldn't find a group in CR1 suggests that it would be difficult to find a continuous extent to compleltely satisfy our allocations. So before falling to the slower CR2, in CR1.5 we proactively trim the the preallocations so we can find a group with (free / fragments) big enough. This speeds up our allocation at the cost of slightly reduced preallocation.
The patch also adds a new sysfs tunable:
* /sys/fs/ext4/<partition>/mb_cr1_5_max_trim_order
This controls how much CR1.5 can trim a request before falling to CR2. For example, for a request of order 7 and max trim order 2, CR1.5 can trim this upto order 5.
Suggested-by: Ritesh Harjani (IBM) <[email protected]> Signed-off-by: Ojaswin Mujoo <[email protected]> Reviewed-by: Ritesh Harjani (IBM) <[email protected]> Link: https://lore.kernel.org/r/150fdf65c8e4cc4dba71e020ce0859bcf636a5ff.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4 |
|
| #
361eb69f |
| 25-Mar-2023 |
Ojaswin Mujoo <[email protected]> |
ext4: Remove the logic to trim inode PAs
Earlier, inode PAs were stored in a linked list. This caused a need to periodically trim the list down inorder to avoid growing it to a very large size, as t
ext4: Remove the logic to trim inode PAs
Earlier, inode PAs were stored in a linked list. This caused a need to periodically trim the list down inorder to avoid growing it to a very large size, as this would severly affect performance during list iteration.
Recent patches changed this list to an rbtree, and since the tree scales up much better, we no longer need to have the trim functionality, hence remove it.
Signed-off-by: Ojaswin Mujoo <[email protected]> Reviewed-by: Ritesh Harjani (IBM) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/c409addceaa3ade4b40328e28e3b54b2f259689e.1679731817.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8 |
|
| #
60db00e0 |
| 09-Feb-2023 |
Thomas Weißschuh <[email protected]> |
ext4: make kobj_type structures constant
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type.
Take advantage of this to co
ext4: make kobj_type structures constant
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definitions to prevent modification at runtime.
Signed-off-by: Thomas Weißschuh <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3 |
|
| #
d99a55a9 |
| 04-Jan-2023 |
Kees Cook <[email protected]> |
ext4: fix function prototype mismatch for ext4_feat_ktype
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function point
ext4: fix function prototype mismatch for ext4_feat_ktype
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed.
ext4_feat_ktype was setting the "release" handler to "kfree", which doesn't have a matching function prototype. Add a simple wrapper with the correct prototype.
This was found as a result of Clang's new -Wcast-function-type-strict flag, which is more sensitive than the simpler -Wcast-function-type, which only checks for type width mismatches.
Note that this code is only reached when ext4 is a loadable module and it is being unloaded:
CFI failure at kobject_put+0xbb/0x1b0 (target: kfree+0x0/0x180; expected type: 0x7c4aa698) ... RIP: 0010:kobject_put+0xbb/0x1b0 ... Call Trace: <TASK> ext4_exit_sysfs+0x14/0x60 [ext4] cleanup_module+0x67/0xedb [ext4]
Fixes: b99fee58a20a ("ext4: create ext4_feat kobject dynamically") Cc: Theodore Ts'o <[email protected]> Cc: Eric Biggers <[email protected]> Cc: [email protected] Build-tested-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
118901ad |
| 04-Jan-2023 |
Kees Cook <[email protected]> |
ext4: Fix function prototype mismatch for ext4_feat_ktype
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function point
ext4: Fix function prototype mismatch for ext4_feat_ktype
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed.
ext4_feat_ktype was setting the "release" handler to "kfree", which doesn't have a matching function prototype. Add a simple wrapper with the correct prototype.
This was found as a result of Clang's new -Wcast-function-type-strict flag, which is more sensitive than the simpler -Wcast-function-type, which only checks for type width mismatches.
Note that this code is only reached when ext4 is a loadable module and it is being unloaded:
CFI failure at kobject_put+0xbb/0x1b0 (target: kfree+0x0/0x180; expected type: 0x7c4aa698) ... RIP: 0010:kobject_put+0xbb/0x1b0 ... Call Trace: <TASK> ext4_exit_sysfs+0x14/0x60 [ext4] cleanup_module+0x67/0xedb [ext4]
Fixes: b99fee58a20a ("ext4: create ext4_feat kobject dynamically") Cc: Theodore Ts'o <[email protected]> Cc: Eric Biggers <[email protected]> Cc: [email protected] Build-tested-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1 |
|
| #
5298d4bf |
| 18-Jan-2022 |
Christoph Hellwig <[email protected]> |
unicode: clean up the Kconfig symbol confusion
Turn the CONFIG_UNICODE symbol into a tristate that generates some always built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.
Note
unicode: clean up the Kconfig symbol confusion
Turn the CONFIG_UNICODE symbol into a tristate that generates some always built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.
Note that a lot of the IS_ENABLED() checks could be turned from cpp statements into normal ifs, but this change is intended to be fairly mechanic, so that should be cleaned up later.
Fixes: 2b3d04787012 ("unicode: Add utf8-data module") Reported-by: Linus Torvalds <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
show more ...
|
|
Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1 |
|
| #
4a69aecb |
| 03-Nov-2021 |
Lukas Czerner <[email protected]> |
ext4: allow to change s_last_trim_minblks via sysfs
Ext4 has an optimization mechanism for batched disacrd (FITRIM) that should help speed up subsequent calls of FITRIM ioctl by skipping the groups
ext4: allow to change s_last_trim_minblks via sysfs
Ext4 has an optimization mechanism for batched disacrd (FITRIM) that should help speed up subsequent calls of FITRIM ioctl by skipping the groups that were previously trimmed. However because the FITRIM allows to set the minimum size of an extent to trim, ext4 stores the last minimum extent size and only avoids trimming the group if it was previously trimmed with minimum extent size equal to, or smaller than the current call.
There is currently no way to bypass the optimization without umount/mount cycle. This becomes a problem when the file system is live migrated to a different storage, because the optimization will prevent possibly useful discard calls to the storage.
Fix it by exporting the s_last_trim_minblks via sysfs interface which will allow us to set the minimum size to the number of blocks larger than subsequent FITRIM call, effectively bypassing the optimization.
By setting the s_last_trim_minblks to ULONG_MAX the optimization will be effectively cleared regardless of the previous state, or file system configuration.
For example: getconf ULONG_MAX > /sys/fs/ext4/dm-1/last_trim_minblks
Signed-off-by: Lukas Czerner <[email protected]> Reported-by: Laurent GUERBY <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v5.15, v5.15-rc7, v5.15-rc6 |
|
| #
dfac1a16 |
| 13-Oct-2021 |
Qing Wang <[email protected]> |
ext4: replace snprintf in show functions with sysfs_emit
coccicheck complains about the use of snprintf() in sysfs show functions.
Fix the coccicheck warning: WARNING: use scnprintf or sprintf.
Us
ext4: replace snprintf in show functions with sysfs_emit
coccicheck complains about the use of snprintf() in sysfs show functions.
Fix the coccicheck warning: WARNING: use scnprintf or sprintf.
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Signed-off-by: Qing Wang <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6 |
|
| #
d578b994 |
| 11-Jun-2021 |
Jonathan Davies <[email protected]> |
ext4: notify sysfs on errors_count value change
After s_error_count is incremented, signal the change in the corresponding sysfs attribute via sysfs_notify. This allows userspace to poll() on change
ext4: notify sysfs on errors_count value change
After s_error_count is incremented, signal the change in the corresponding sysfs attribute via sysfs_notify. This allows userspace to poll() on changes to /sys/fs/ext4/*/errors_count.
[ Moved call of ext4_notify_error_sysfs() to flush_stashed_error_work() to avoid BUG's caused by calling sysfs_notify trying to sleep after being called from an invalid context. -- TYT ]
Signed-off-by: Jonathan Davies <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v5.13-rc5 |
|
| #
e71f99f2 |
| 03-Jun-2021 |
Daniel Rosenberg <[email protected]> |
ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
Encrypted casefolding is only supported when both encryption and casefolding are both enabled in the config.
Fixes: 4
ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
Encrypted casefolding is only supported when both encryption and casefolding are both enabled in the config.
Fixes: 471fbbea7ff7 ("ext4: handle casefolding with encryption") Cc: [email protected] # 5.13+ Signed-off-by: Daniel Rosenberg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6 |
|
| #
f68f4063 |
| 01-Apr-2021 |
Harshad Shirwadkar <[email protected]> |
ext4: add proc files to monitor new structures
This patch adds a new file "mb_structs_summary" which allows us to see the summary of the new allocator structures added in this series. Here's the sam
ext4: add proc files to monitor new structures
This patch adds a new file "mb_structs_summary" which allows us to see the summary of the new allocator structures added in this series. Here's the sample output of file:
optimize_scan: 1 max_free_order_lists: list_order_0_groups: 0 list_order_1_groups: 0 list_order_2_groups: 0 list_order_3_groups: 0 list_order_4_groups: 0 list_order_5_groups: 0 list_order_6_groups: 0 list_order_7_groups: 0 list_order_8_groups: 0 list_order_9_groups: 0 list_order_10_groups: 0 list_order_11_groups: 0 list_order_12_groups: 0 list_order_13_groups: 40 fragment_size_tree: tree_min: 16384 tree_max: 32768 tree_nodes: 40
Signed-off-by: Harshad Shirwadkar <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Ritesh Harjani <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
196e402a |
| 01-Apr-2021 |
Harshad Shirwadkar <[email protected]> |
ext4: improve cr 0 / cr 1 group scanning
Instead of traversing through groups linearly, scan groups in specific orders at cr 0 and cr 1. At cr 0, we want to find groups that have the largest free or
ext4: improve cr 0 / cr 1 group scanning
Instead of traversing through groups linearly, scan groups in specific orders at cr 0 and cr 1. At cr 0, we want to find groups that have the largest free order >= the order of the request. So, with this patch, we maintain lists for each possible order and insert each group into a list based on the largest free order in its buddy bitmap. During cr 0 allocation, we traverse these lists in the increasing order of largest free orders. This allows us to find a group with the best available cr 0 match in constant time. If nothing can be found, we fallback to cr 1 immediately.
At CR1, the story is slightly different. We want to traverse in the order of increasing average fragment size. For CR1, we maintain a rb tree of groupinfos which is sorted by average fragment size. Instead of traversing linearly, at CR1, we traverse in the order of increasing average fragment size, starting at the most optimal group. This brings down cr 1 search complexity to log(num groups).
For cr >= 2, we just perform the linear search as before. Also, in case of lock contention, we intermittently fallback to linear search even in CR 0 and CR 1 cases. This allows us to proceed during the allocation path even in case of high contention.
There is an opportunity to do optimization at CR2 too. That's because at CR2 we only consider groups where bb_free counter (number of free blocks) is greater than the request extent size. That's left as future work.
All the changes introduced in this patch are protected under a new mount option "mb_optimize_scan".
With this patchset, following experiment was performed:
Created a highly fragmented disk of size 65TB. The disk had no contiguous 2M regions. Following command was run consecutively for 3 times:
time dd if=/dev/urandom of=file bs=2M count=10
Here are the results with and without cr 0/1 optimizations introduced in this patch:
|---------+------------------------------+---------------------------| | | Without CR 0/1 Optimizations | With CR 0/1 Optimizations | |---------+------------------------------+---------------------------| | 1st run | 5m1.871s | 2m47.642s | | 2nd run | 2m28.390s | 0m0.611s | | 3rd run | 2m26.530s | 0m1.255s | |---------+------------------------------+---------------------------|
Signed-off-by: Harshad Shirwadkar <[email protected]> Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
| #
a6c75eaf |
| 01-Apr-2021 |
Harshad Shirwadkar <[email protected]> |
ext4: add mballoc stats proc file
Add new stats for measuring the performance of mballoc. This patch is forked from Artem Blagodarenko's work that can be found here:
https://github.com/lustre/lustr
ext4: add mballoc stats proc file
Add new stats for measuring the performance of mballoc. This patch is forked from Artem Blagodarenko's work that can be found here:
https://github.com/lustre/lustre-release/blob/master/ldiskfs/kernel_patches/patches/rhel8/ext4-simple-blockalloc.patch
This patch reorganizes the stats by cr level. This is how the output looks like:
mballoc: reqs: 0 success: 0 groups_scanned: 0 cr0_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr1_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr2_stats: hits: 0 groups_considered: 0 useless_loops: 0 cr3_stats: hits: 0 groups_considered: 0 useless_loops: 0 extents_scanned: 0 goal_hits: 0 2^n_hits: 0 breaks: 0 lost: 0 buddies_generated: 0/40 buddies_time_used: 0 preallocated: 0 discarded: 0
Signed-off-by: Harshad Shirwadkar <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Ritesh Harjani <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v5.12-rc5, v5.12-rc4 |
|
| #
471fbbea |
| 19-Mar-2021 |
Daniel Rosenberg <[email protected]> |
ext4: handle casefolding with encryption
This adds support for encryption with casefolding.
Since the name on disk is case preserving, and also encrypted, we can no longer just recompute the hash o
ext4: handle casefolding with encryption
This adds support for encryption with casefolding.
Since the name on disk is case preserving, and also encrypted, we can no longer just recompute the hash on the fly. Additionally, to avoid leaking extra information from the hash of the unencrypted name, we use siphash via an fscrypt v2 policy.
The hash is stored at the end of the directory entry for all entries inside of an encrypted and casefolded directory apart from those that deal with '.' and '..'. This way, the change is backwards compatible with existing ext4 filesystems.
[ Changed to advertise this feature via the file: /sys/fs/ext4/features/encrypted_casefold -- TYT ]
Signed-off-by: Daniel Rosenberg <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse |
|
| #
efc61345 |
| 18-Feb-2021 |
Eric Whitney <[email protected]> |
ext4: shrink race window in ext4_should_retry_alloc()
When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it fails at significant rates on the two test scenarios that disable delaye
ext4: shrink race window in ext4_should_retry_alloc()
When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it fails at significant rates on the two test scenarios that disable delayed allocation (ext3conv and data_journal) and force actual block allocation for the fallocate and pwrite functions in the test. The failure rate on 5.10 for both ext3conv and data_journal on one test system typically runs about 85%. On 5.11, the failure rate on ext3conv sometimes drops to as low as 1% while the rate on data_journal increases to nearly 100%.
The observed failures are largely due to ext4_should_retry_alloc() cutting off block allocation retries when s_mb_free_pending (used to indicate that a transaction in progress will free blocks) is 0. However, free space is usually available when this occurs during runs of generic/371. It appears that a thread attempting to allocate blocks is just missing transaction commits in other threads that increase the free cluster count and reset s_mb_free_pending while the allocating thread isn't running. Explicitly testing for free space availability avoids this race.
The current code uses a post-increment operator in the conditional expression that determines whether the retry limit has been exceeded. This means that the conditional expression uses the value of the retry counter before it's increased, resulting in an extra retry cycle. The current code actually retries twice before hitting its retry limit rather than once.
Increasing the retry limit to 3 from the current actual maximum retry count of 2 in combination with the change described above reduces the observed failure rate to less that 0.1% on both ext3conv and data_journal with what should be limited impact on users sensitive to the overhead caused by retries.
A per filesystem percpu counter exported via sysfs is added to allow users or developers to track the number of times the retry limit is exceeded without resorting to debugging methods. This should provide some insight into worst case retry behavior.
Signed-off-by: Eric Whitney <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
show more ...
|
|
Revision tags: v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6 |
|
| #
8446fe92 |
| 24-Nov-2020 |
Christoph Hellwig <[email protected]> |
block: switch partition lookup to use struct block_device
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path.
Signed-off-by: Chris
block: switch partition lookup to use struct block_device
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path.
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Coly Li <[email protected]> [bcache] Acked-by: Chao Yu <[email protected]> [f2fs] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2 |
|
| #
6694875e |
| 28-Oct-2020 |
Theodore Ts'o <[email protected]> |
ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
Signed-off-by: Theodore Ts'o <[email protected]>
|