History log of /linux-6.15/fs/fs-writeback.c (Results 1 – 25 of 412)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7
# 94eed61d 11-Jan-2025 Kaixiong Yu <[email protected]>

fs: fs-writeback: move sysctl to fs/fs-writeback.c

The dirtytime_expire_interval belongs to fs/fs-writeback.c, move it to
fs/fs-writeback.c from /kernel/sysctl.c. And remove the useless extern
varia

fs: fs-writeback: move sysctl to fs/fs-writeback.c

The dirtytime_expire_interval belongs to fs/fs-writeback.c, move it to
fs/fs-writeback.c from /kernel/sysctl.c. And remove the useless extern
variable declaration and the function declaration from
include/linux/writeback.h

Signed-off-by: Kaixiong Yu <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12
# 8182a8b3 12-Nov-2024 Christoph Hellwig <[email protected]>

writeback: wbc_attach_fdatawrite_inode out of line

This allows exporting this high-level interface only while keeping
wbc_attach_and_unlock_inode private in fs-writeback.c and unexporting
__inode_at

writeback: wbc_attach_fdatawrite_inode out of line

This allows exporting this high-level interface only while keeping
wbc_attach_and_unlock_inode private in fs-writeback.c and unexporting
__inode_attach_wb.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 4d7485cf 12-Nov-2024 Christoph Hellwig <[email protected]>

writeback: add a __releases annoation to wbc_attach_and_unlock_inode

This shuts up a sparse lock context tracking warning.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.or

writeback: add a __releases annoation to wbc_attach_and_unlock_inode

This shuts up a sparse lock context tracking warning.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1
# 30dac24e 26-Sep-2024 Pankaj Raghav <[email protected]>

fs/writeback: convert wbc_account_cgroup_owner to take a folio

Most of the callers of wbc_account_cgroup_owner() are converting a folio
to page before calling the function. wbc_account_cgroup_owner(

fs/writeback: convert wbc_account_cgroup_owner to take a folio

Most of the callers of wbc_account_cgroup_owner() are converting a folio
to page before calling the function. wbc_account_cgroup_owner() is
converting the page back to a folio to call mem_cgroup_css_from_folio().

Convert wbc_account_cgroup_owner() to take a folio instead of a page,
and convert all callers to pass a folio directly except f2fs.

Convert the page to folio for all the callers from f2fs as they were the
only callers calling wbc_account_cgroup_owner() with a page. As f2fs is
already in the process of converting to folios, these call sites might
also soon be calling wbc_account_cgroup_owner() with a folio directly in
the future.

No functional changes. Only compile tested.

Signed-off-by: Pankaj Raghav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: David Sterba <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# 532980cb 23-Aug-2024 Christian Brauner <[email protected]>

inode: port __I_SYNC to var event

Port the __I_SYNC mechanism to use the new var event mechanism.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jos

inode: port __I_SYNC to var event

Port the __I_SYNC mechanism to use the new var event mechanism.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.11-rc4
# 57510c58 13-Aug-2024 Mateusz Guzik <[email protected]>

vfs: drop one lock trip in evict()

Most commonly neither I_LRU_ISOLATING nor I_SYNC are set, but the stock
kernel takes a back-to-back relock trip to check for them.

It probably can be avoided alto

vfs: drop one lock trip in evict()

Most commonly neither I_LRU_ISOLATING nor I_SYNC are set, but the stock
kernel takes a back-to-back relock trip to check for them.

It probably can be avoided altogether, but for now massage things back
to just one lock acquire.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Zhihao Cheng <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1
# c393eaa8 26-Jul-2024 Haifeng Xu <[email protected]>

fs: don't flush in-flight wb switches for superblocks without cgroup writeback

When deactivating any type of superblock, it had to wait for the in-flight
wb switches to be completed. wb switches are

fs: don't flush in-flight wb switches for superblocks without cgroup writeback

When deactivating any type of superblock, it had to wait for the in-flight
wb switches to be completed. wb switches are executed in inode_switch_wbs_work_fn()
which needs to acquire the wb_switch_rwsem and races against sync_inodes_sb().
If there are too much dirty data in the superblock, the waiting time may increase
significantly.

For superblocks without cgroup writeback such as tmpfs, they have nothing to
do with the wb swithes, so the flushing can be avoided.

Signed-off-by: Haifeng Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 78eb4ea2 24-Jul-2024 Joel Granados <[email protected]>

sysctl: treewide: constify the ctl_table argument of proc_handlers

const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ct

sysctl: treewide: constify the ctl_table argument of proc_handlers

const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.

This patch has been generated by the following coccinelle script:

```
virtual patch

@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);

@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }

@r3@
identifier func;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);

@r4@
identifier func, ctl;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);

@r5@
identifier func, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);

```

* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.

* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.

Co-developed-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Thomas Weißschuh <[email protected]>
Co-developed-by: Joel Granados <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7
# 6a1ee871 28-Feb-2024 Kemeng Shi <[email protected]>

fs/writeback: remove unnecessary return in writeback_inodes_sb

writeback_inodes_sb doesn't have return value, just remove unnecessary
return in it.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.

fs/writeback: remove unnecessary return in writeback_inodes_sb

writeback_inodes_sb doesn't have return value, just remove unnecessary
return in it.

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# ba679de9 28-Feb-2024 Kemeng Shi <[email protected]>

fs/writeback: correct comment of __wakeup_flusher_threads_bdi

Commit e8e8a0c6c9bfc ("writeback: move nr_pages == 0 logic to one
location") removed parameter nr_pages of __wakeup_flusher_threads_bdi

fs/writeback: correct comment of __wakeup_flusher_threads_bdi

Commit e8e8a0c6c9bfc ("writeback: move nr_pages == 0 logic to one
location") removed parameter nr_pages of __wakeup_flusher_threads_bdi
and we try to writeback all dirty pages in __wakeup_flusher_threads_bdi
now. Just correct stale comment.

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 639924ab 28-Feb-2024 Kemeng Shi <[email protected]>

fs/writeback: only calculate dirtied_before when b_io is empty

The dirtied_before is only used when b_io is not empty, so only calculate
when b_io is not empty.

Signed-off-by: Kemeng Shi <shikemeng

fs/writeback: only calculate dirtied_before when b_io is empty

The dirtied_before is only used when b_io is not empty, so only calculate
when b_io is not empty.

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 2ddc9346 28-Feb-2024 Kemeng Shi <[email protected]>

fs/writeback: remove unused parameter wb of finish_writeback_work

Remove unused parameter wb of finish_writeback_work.

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kerne

fs/writeback: remove unused parameter wb of finish_writeback_work

Remove unused parameter wb of finish_writeback_work.

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Tim Chen <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# d9210989 28-Feb-2024 Kemeng Shi <[email protected]>

fs/writeback: bail out if there is no more inodes for IO and queued once

For case there is no more inodes for IO in io list from last wb_writeback,
We may bail out early even there is inode in dirty

fs/writeback: bail out if there is no more inodes for IO and queued once

For case there is no more inodes for IO in io list from last wb_writeback,
We may bail out early even there is inode in dirty list should be written
back. Only bail out when we queued once to avoid missing dirtied inode.

This is from code reading...

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
[[email protected]: fold in memory corruption fix from Jan in [1]]
Link: https://lore.kernel.org/r/20240405132346.bid7gibby3lxxhez@quack3 [1]
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# ac0c18f2 28-Feb-2024 Kemeng Shi <[email protected]>

fs/writeback: avoid to writeback non-expired inode in kupdate writeback

In kupdate writeback, only expired inode (have been dirty for longer than
dirty_expire_interval) is supposed to be written bac

fs/writeback: avoid to writeback non-expired inode in kupdate writeback

In kupdate writeback, only expired inode (have been dirty for longer than
dirty_expire_interval) is supposed to be written back. However, kupdate
writeback will writeback non-expired inode left in b_io or b_more_io from
last wb_writeback. As a result, writeback will keep being triggered
unexpected when we keep dirtying pages even dirty memory is under
threshold and inode is not expired. To be more specific:
Assume dirty background threshold is > 1G and dirty_expire_centisecs is
> 60s. When we running fio -size=1G -invalidate=0 -ioengine=libaio
--time_based -runtime=60... (keep dirtying), the writeback will keep
being triggered as following:
wb_workfn
wb_do_writeback
wb_check_background_flush
/*
* Wb dirty background threshold starts at 0 if device was idle and
* grows up when bandwidth of wb is updated. So a background
* writeback is triggered.
*/
wb_over_bg_thresh
/*
* Dirtied inode will be written back and added to b_more_io list
* after slice used up (because we keep dirtying the inode).
*/
wb_writeback

Writeback is triggered per dirty_writeback_centisecs as following:
wb_workfn
wb_do_writeback
wb_check_old_data_flush
/*
* Write back inode left in b_io and b_more_io from last wb_writeback
* even the inode is non-expired and it will be added to b_more_io
* again as slice will be used up (because we keep dirtying the
* inode)
*/
wb_writeback

Fix this by moving non-expired inode to dirty list instead of more io
list for kupdate writeback in requeue_inode.

Test as following:
/* make it more easier to observe the issue */
echo 300000 > /proc/sys/vm/dirty_expire_centisecs
echo 100 > /proc/sys/vm/dirty_writeback_centisecs
/* create a idle device */
mkfs.ext4 -F /dev/vdb
mount /dev/vdb /bdi1/
/* run buffer write with fio */
fio -name test -filename=/bdi1/file -size=800M -ioengine=libaio -bs=4K \
-iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0

Fio result before fix (run three tests):
1360MB/s
1329MB/s
1455MB/s

Fio result after fix (run three tests):
1737MB/s
1729MB/s
1789MB/s

Writeback for non-expired inode is gone as expeted. Observe this with trace
writeback_start and writeback_written as following:
echo 1 > /sys/kernel/debug/tracing/events/writeback/writeback_start/enab
echo 1 > /sys/kernel/debug/tracing/events/writeback/writeback_written/enable
cat /sys/kernel/tracing/trace_pipe

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1
# 12f7900c 18-Jan-2024 Kemeng Shi <[email protected]>

writeback: move wb_wakeup_delayed defination to fs-writeback.c

The wb_wakeup_delayed is only used in fs-writeback.c. Move it to
fs-writeback.c after defination of wb_wakeup and make it static.

Sign

writeback: move wb_wakeup_delayed defination to fs-writeback.c

The wb_wakeup_delayed is only used in fs-writeback.c. Move it to
fs-writeback.c after defination of wb_wakeup and make it static.

Signed-off-by: Kemeng Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4
# c9c4ff12 27-Nov-2023 David Howells <[email protected]>

netfs: Move pinning-for-writeback from fscache to netfs

Move the resource pinning-for-writeback from fscache code to netfslib code.
This is used to keep a cache backing object pinned whilst we have

netfs: Move pinning-for-writeback from fscache to netfs

Move the resource pinning-for-writeback from fscache code to netfslib code.
This is used to keep a cache backing object pinned whilst we have dirty
pages on the netfs inode in the pagecache such that VM writeback will be
able to reach it.

Whilst we're at it, switch the parameters of netfs_unpin_writeback() to
match ->write_inode() so that it can be used for that directly.

Note that this mechanism could be more generically useful than that for
network filesystems. Quite often they have to keep around other resources
(e.g. authentication tokens or network connections) until the writeback is
complete.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]

show more ...


Revision tags: v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6
# 6654408a 14-Oct-2023 Jingbo Xu <[email protected]>

writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs

The cgwb cleanup routine will try to release the dying cgwb by switching
the attached inodes. It fetches the attached i

writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs

The cgwb cleanup routine will try to release the dying cgwb by switching
the attached inodes. It fetches the attached inodes from wb->b_attached
list, omitting the fact that inodes only with dirty timestamps reside in
wb->b_dirty_time list, which is the case when lazytime is enabled. This
causes enormous zombie memory cgroup when lazytime is enabled, as inodes
with dirty timestamps can not be switched to a live cgwb for a long time.

It is reasonable not to switch cgwb for inodes with dirty data, as
otherwise it may break the bandwidth restrictions. However since the
writeback of inode metadata is not accounted for, let's also switch
inodes with dirty timestamps to avoid zombie memory and block cgroups
when laztytime is enabled.

Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Jingbo Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2
# be049c3a 16-Sep-2023 Chunhai Guo <[email protected]>

fs-writeback: do not requeue a clean inode having skipped pages

When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback it

fs-writeback: do not requeue a clean inode having skipped pages

When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback iteration, a
clean inode is requeued to the wb->b_dirty queue due to non-zero
pages_skipped, without anything actually being written. This causes an
infinite loop and prevents the plug from being flushed, resulting in a
deadlock. We now avoid requeuing the clean inode to prevent this issue.

wb_writeback fsync (inode-Y)
blk_start_plug(&plug)
for (;;) {
iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
filemap_fdatawrite
__filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
do_writepages
f2fs_write_data_pages
__f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
f2fs_write_cache_pages
f2fs_write_single_data_page
f2fs_do_write_data_page
f2fs_outplace_write_data
f2fs_update_data_blkaddr
f2fs_wait_on_page_writeback
wait_on_page_writeback // wait for f2fs node page-X
iter i:
progress = __writeback_inodes_wb(wb, work)
. writeback_sb_inodes
. __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
. . do_writepages
. . f2fs_write_data_pages
. . . __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
. . . wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
. if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
. total_wrote++; // total_wrote = 1
. requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
if (progress) // progress = 1
continue;
iter i+1:
queue_io
// similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug) // flush plug won't be called

Signed-off-by: Chunhai Guo <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.6-rc1, v6.5, v6.5-rc7
# d8ce82ef 18-Aug-2023 Christian Brauner <[email protected]>

super: make locking naming consistent

Make the naming consistent with the earlier introduced
super_lock_{read,write}() helpers.

Reviewed-by: Jan Kara <[email protected]>
Message-Id: <20230818-vfs-super-

super: make locking naming consistent

Make the naming consistent with the earlier introduced
super_lock_{read,write}() helpers.

Reviewed-by: Jan Kara <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3
# 2816ea2a 21-Apr-2023 Yosry Ahmed <[email protected]>

writeback: move wb_over_bg_thresh() call outside lock section

Patch series "cgroup: eliminate atomic rstat flushing", v5.

A previous patch series [1] changed most atomic rstat flushing contexts to

writeback: move wb_over_bg_thresh() call outside lock section

Patch series "cgroup: eliminate atomic rstat flushing", v5.

A previous patch series [1] changed most atomic rstat flushing contexts to
become non-atomic. This was done to avoid an expensive operation that
scales with # cgroups and # cpus to happen with irqs disabled and
scheduling not permitted. There were two remaining atomic flushing
contexts after that series. This series tries to eliminate them as well,
eliminating atomic rstat flushing completely.

The two remaining atomic flushing contexts are:
(a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
(b) mem_cgroup_threshold()->mem_cgroup_usage()

For (a), flushing needs to be atomic as wb_writeback() calls
wb_over_bg_thresh() with a spinlock held. However, it seems like the call
to wb_over_bg_thresh() doesn't need to be protected by that spinlock, so
this series proposes a refactoring that moves the call outside the lock
criticial section and makes the stats flushing in mem_cgroup_wb_stats()
non-atomic.

For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
with irqs disabled. We only flush the stats when calculating the root
usage, as it is approximated as the sum of some memcg stats (file, anon,
and optionally swap) instead of the conventional page counter. This
series proposes changing this calculation to use the global stats instead,
eliminating the need for a memcg stat flush.

After these 2 contexts are eliminated, we no longer need
mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can
remove them and simplify the code.

[1] https://lore.kernel.org/linux-mm/[email protected]/


This patch (of 5):

wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
flush, which can be expensive on large systems. Currently,
wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
have to do the rstat flush atomically. On systems with a lot of
cpus and/or cgroups, this can cause us to disable irqs for a long time,
potentially causing problems.

Move the call to wb_over_bg_thresh() outside the lock section in
preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
The list_empty(&wb->work_list) check should be okay outside the lock
section of wb->list_lock as it is protected by a separate lock
(wb->work_lock), and wb_over_bg_thresh() doesn't seem like it is
modifying any of wb->b_* lists the wb->list_lock is protecting.
Also, the loop seems to be already releasing and reacquring the
lock, so this refactoring looks safe.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Reviewed-by: Michal Koutný <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5
# 3e46c89c 19-Jan-2023 Maxim Korotkov <[email protected]>

writeback: fix call of incorrect macro

the variable 'history' is of type u16, it may be an error
that the hweight32 macro was used for it
I guess macro hweight16 should be used

Found by Linux Ve

writeback: fix call of incorrect macro

the variable 'history' is of type u16, it may be an error
that the hweight32 macro was used for it
I guess macro hweight16 should be used

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 2a81490811d0 ("writeback: implement foreign cgroup inode detection")
Signed-off-by: Maxim Korotkov <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# 1ba1199e 10-Apr-2023 Baokun Li <[email protected]>

writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs

KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi

writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs

KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================

The race that causes the above issue is as follows:

cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref

bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.

This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.

To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: yangerkun <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 75376c6f 16-Jan-2023 Matthew Wilcox (Oracle) <[email protected]>

mm: convert mem_cgroup_css_from_page() to mem_cgroup_css_from_folio()

Only one caller doesn't have a folio, so move the page_folio() call to
that one caller from mem_cgroup_css_from_folio().

Link:

mm: convert mem_cgroup_css_from_page() to mem_cgroup_css_from_folio()

Only one caller doesn't have a folio, so move the page_folio() call to
that one caller from mem_cgroup_css_from_folio().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 9cfb816b 16-Jan-2023 Matthew Wilcox (Oracle) <[email protected]>

mm/fs: convert inode_attach_wb() to take a folio

Patch series "Writeback folio conversions".

Remove more calls to compound_head() by passing folios around instead of
pages.


This patch (of 2):

Th

mm/fs: convert inode_attach_wb() to take a folio

Patch series "Writeback folio conversions".

Remove more calls to compound_head() by passing folios around instead of
pages.


This patch (of 2):

The only caller of inode_attach_wb() which doesn't pass NULL already has a
folio, so convert the whole call-chain to take folios.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1
# 23e188a1 10-Dec-2022 Miaohe Lin <[email protected]>

writeback: remove obsolete macro EXPIRE_DIRTY_ATIME

EXPIRE_DIRTY_ATIME is not used anymore. Remove it.

Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: ht

writeback: remove obsolete macro EXPIRE_DIRTY_ATIME

EXPIRE_DIRTY_ATIME is not used anymore. Remove it.

Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


12345678910>>...17