|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
6340584e | 11-Feb-2025 | Bart Van Assche <[email protected]>
mm/vmstat: revert "fix a W=1 clang compiler warning"
Commit 75b5ab134bb5 enabled -Wenum-enum-conversion in W=1 builds. Commit 30c2de0a267c ("mm/vmstat: fix a W=1 clang compiler warning") fixed a -Wenum-enum-conversion warning. Commit 8f6629c004b1 removed the -Wenum-enum-conversion option again from W=1 builds. Since the W=1 compiler warning fix is no longer necessary, revert it.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bart Van Assche <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ivan Shapovalov <[email protected]> Cc: David Laight <[email protected]> Cc: Nathan Chancellor <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7 |
|
b8974b89 | 11-Jan-2025 | Kaixiong Yu <[email protected]>
mm: vmstat: move sysctls to mm/vmstat.c
This moves all vmstat-related sysctls to their own file, removes useless extern variable declarations, and does some related clean-ups. To avoid compiler warnings when CONFIG_PROC_FS is not defined, add the macro definition CONFIG_PROC_FS ahead of CONFIG_NUMA in vmstat.c.
Signed-off-by: Kaixiong Yu <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Joel Granados <[email protected]>
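As an aside, the per-file registration pattern such a move typically relies on looks roughly like the sketch below; the table layout and init hook names are assumptions for illustration, not code quoted from the patch (stat_interval is the existing /proc/sys/vm knob handled in mm/vmstat.c).

```
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/sysctl.h>

/* Illustrative stand-in for the real sysctl_stat_interval in mm/vmstat.c. */
static int vmstat_interval = HZ;	/* stored in jiffies */

static struct ctl_table vmstat_sysctl_table[] = {
	{
		.procname	= "stat_interval",
		.data		= &vmstat_interval,
		.maxlen		= sizeof(vmstat_interval),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_jiffies,
	},
};

static int __init vmstat_sysctl_init(void)
{
	/* Attaches the table under /proc/sys/vm/ from within mm/vmstat.c. */
	register_sysctl_init("vm", vmstat_sysctl_table);
	return 0;
}
subsys_initcall(vmstat_sysctl_init);
```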
|
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
30c2de0a | 12-Dec-2024 | Bart Van Assche <[email protected]>
mm/vmstat: fix a W=1 clang compiler warning
Fix the following clang compiler warning that is reported if the kernel is built with W=1:
./include/linux/vmstat.h:518:36: error: arithmetic between different enumeration types
    ('enum node_stat_item' and 'enum lru_list') [-Werror,-Wenum-enum-conversion]
  518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
      |                               ~~~~~~~~~~~ ^ ~~~
Link: https://lkml.kernel.org/r/[email protected] Fixes: 9d7ea9a297e6 ("mm/vmstat: add helpers to get vmstat item names for each enum type") Signed-off-by: Bart Van Assche <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Nathan Chancellor <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
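For reference, the warning is triggered by adding an enum node_stat_item base to an enum lru_list index. A hedged sketch of the usual way to silence it — do the arithmetic on a plain integer and convert back explicitly — is below; this shows the general pattern rather than the exact hunk from commit 30c2de0a267c.

```
/* Sketch of include/linux/vmstat.h: casting lru to int keeps the addition
 * out of mixed-enum arithmetic, and the outer cast makes the conversion back
 * to enum node_stat_item explicit. */
static inline const char *lru_list_name(enum lru_list lru)
{
	return node_stat_name((enum node_stat_item)(NR_LRU_BASE + (int)lru)) + 3; /* skip "nr_" */
}
```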
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
f77f0c75 | 14-Aug-2024 | Kaiyang Zhao <[email protected]>
mm,memcg: provide per-cgroup counters for NUMA balancing operations
The ability to observe the demotion and promotion decisions made by the kernel on a per-cgroup basis is important for monitoring and tuning containerized workloads on machines equipped with tiered memory.
Different containers in the system may experience drastically different memory tiering actions that cannot be distinguished from the global counters alone.
For example, a container running a workload with much hotter memory accesses will likely see more promotions and fewer demotions, potentially depriving a colocated container of top-tier memory to such an extent that its performance degrades unacceptably.
For another example, some containers may exhibit longer periods between data reuse, causing far more numa_hint_faults than numa_pages_migrated. In this case, tuning hot_threshold_ms may be appropriate, but the signal can easily be lost if only global counters are available.
In the long term, we hope to introduce per-cgroup control of promotion and demotion actions to implement memory placement policies in tiering.
This patch set adds seven counters to memory.stat in a cgroup: numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd, pgdemote_khugepaged, pgdemote_direct and pgpromote_success. pgdemote_* and pgpromote_success are also available in memory.numa_stat.
count_memcg_events_mm() is added to count multiple event occurrences at once, and get_mem_cgroup_from_folio() is added because we need to get a reference to the memcg of a folio before it's migrated to track numa_pages_migrated. The accounting of PGDEMOTE_* is moved to shrink_inactive_list() before being changed to per-cgroup.
[[email protected]: add documentation of the memcg counters in cgroup-v2.rst] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kaiyang Zhao <[email protected]> Cc: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Wei Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
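A hedged sketch of how the batched helper mentioned above might be used at a NUMA-hint accounting site; the call site and the count_memcg_events_mm() signature are assumptions based on the description, not copied from the patch.

```
#include <linux/memcontrol.h>
#include <linux/mm_types.h>
#include <linux/vmstat.h>

/* Sketch: charge a batch of NUMA PTE updates to both the global vm_event
 * counter and the mm's memcg in one call each. */
static void numa_account_pte_updates(struct mm_struct *mm,
				     unsigned long nr_updated)
{
	count_vm_numa_events(NUMA_PTE_UPDATES, nr_updated);
	count_memcg_events_mm(mm, NUMA_PTE_UPDATES, nr_updated);
}
```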
|
|
Revision tags: v6.11-rc3 |
|
9d857311 | 08-Aug-2024 | Pasha Tatashin <[email protected]>
mm: don't account memmap per-node
Fix invalid access to pgdat during hot-remove operation: ndctl users reported a GPF when trying to destroy a namespace:

  $ ndctl destroy-namespace all -r all -f
  Segmentation fault

dmesg:
  Oops: general protection fault, probably for non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: probably user-memory-access in range [0x000000000002b280-0x000000000002b287]
  CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
  Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.20.1 09/13/2023
  RIP: 0010:mod_node_page_state+0x2a/0x110
cxl-test users report a GPF when trying to unload the test module:

  $ modprobe -r cxl-test

dmesg:
  BUG: unable to handle page fault for address: 0000000000004200
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 UID: 0 PID: 1076 Comm: modprobe Tainted: G O N 6.11.0-rc1 #197
  Tainted: [O]=OOT_MODULE, [N]=TEST
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/15
  RIP: 0010:mod_node_page_state+0x6/0x90
Currently, when memory is hot-plugged or hot-removed the accounting is done based on the assumption that memmap is allocated from the same node as the hot-plugged/hot-removed memory, which is not always the case.
In addition, there are challenges with keeping the node id of the memory that is being removed until the time when memmap accounting is actually performed: this is done after remove_pfn_range_from_zone(), and also after remove_memory_block_devices(), meaning that we can use neither pgdat nor a walk through memblocks to get the nid.
Given all of that, account the memmap overhead system wide instead.
For this we are going to use global atomic counters; given that the memmap size is rarely modified, and normally changes either during early boot when there is only one CPU or under the global hotplug mutex, there is no need for per-cpu optimizations.
Also, while we are here, rename nr_memmap to nr_memmap_pages, and nr_memmap_boot to nr_memmap_boot_pages, to make it self-explanatory that the unit is pages.
[[email protected]: address a few nits from David Hildenbrand] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 15995a352474 ("mm: report per-page metadata information") Signed-off-by: Pasha Tatashin <[email protected]> Reported-by: Yi Zhang <[email protected]> Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com Reported-by: Alison Schofield <[email protected]> Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t Tested-by: Dan Williams <[email protected]> Tested-by: Alison Schofield <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: David Rientjes <[email protected]> Tested-by: Yi Zhang <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: Fan Ni <[email protected]> Cc: Joel Granados <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Li Zhijian <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sourav Panda <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
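A minimal sketch of the system-wide accounting described above, assuming plain global atomics; the helper names are illustrative rather than taken from the patch.

```
#include <linux/atomic.h>

/* Totals in pages. These change rarely (early boot or under the hotplug
 * lock), so plain global atomics suffice and no per-cpu batching is needed. */
static atomic_long_t nr_memmap_pages;
static atomic_long_t nr_memmap_boot_pages;

static void memmap_pages_add(long delta)
{
	atomic_long_add(delta, &nr_memmap_pages);
}

static void memmap_boot_pages_add(long delta)
{
	atomic_long_add(delta, &nr_memmap_boot_pages);
}
```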
|
f4cb78af | 08-Aug-2024 | Pasha Tatashin <[email protected]>
mm: add system wide stats items category
/proc/vmstat contains events and stats, events can only grow, but stats can grow and shrink.
vmstat has the following:
-------------------------
NR_VM_ZONE_STAT_ITEMS: per-zone stats
NR_VM_NUMA_EVENT_ITEMS: per-numa events
NR_VM_NODE_STAT_ITEMS: per-numa stats
NR_VM_WRITEBACK_STAT_ITEMS: system-wide background-writeback and dirty-throttling thresholds.
NR_VM_EVENT_ITEMS: system-wide events
-------------------------
Rename NR_VM_WRITEBACK_STAT_ITEMS to NR_VM_STAT_ITEMS to track system-wide stats; we are going to add per-page metadata stats to this category in the next patch.
Also delete unused writeback_stat_name().
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 15995a352474 ("mm: report per-page metadata information") Signed-off-by: Pasha Tatashin <[email protected]> Suggested-by: Yosry Ahmed <[email protected]> Tested-by: Alison Schofield <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Dan Williams <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: Joel Granados <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Li Zhijian <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sourav Panda <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yi Zhang <[email protected]> Cc: Fan Ni <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
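Roughly, the renamed category can be pictured as below; the two dirty-threshold enumerators are the existing writeback items, while the enum tag itself is an assumption for illustration.

```
/* Sketch of the system-wide stats category after the rename; the follow-up
 * patch adds the per-page metadata items to this enum. */
enum vm_stat_item {
	NR_DIRTY_THRESHOLD,
	NR_DIRTY_BG_THRESHOLD,
	NR_VM_STAT_ITEMS,		/* formerly NR_VM_WRITEBACK_STAT_ITEMS */
};
```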
|
|
Revision tags: v6.11-rc2, v6.11-rc1 |
|
78eb4ea2 | 24-Jul-2024 | Joel Granados <[email protected]>
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch

@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int write, void *buffer, size_t *lenp, loff_t *ppos);

@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }

@r3@
identifier func;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
  ,int , void *, size_t *, loff_t *);

@r4@
identifier func, ctl;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int , void *, size_t *, loff_t *);

@r5@
identifier func, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
  ,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax were adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration.
Co-developed-by: Thomas Weißschuh <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]> Co-developed-by: Joel Granados <[email protected]> Signed-off-by: Joel Granados <[email protected]>
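For an individual handler the change is only the signature; as an example, vmstat's own refresh handler would go from the first prototype to the second, mirroring the coccinelle rules above (sketch, not the literal hunk).

```
/* Before */
int vmstat_refresh(struct ctl_table *table, int write,
		   void *buffer, size_t *lenp, loff_t *ppos);

/* After: handlers can no longer write through the table pointer, which is
 * what allows the static ctl_table structs to move into .rodata later. */
int vmstat_refresh(const struct ctl_table *table, int write,
		   void *buffer, size_t *lenp, loff_t *ppos);
```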
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
15995a35 | 05-Jun-2024 | Sourav Panda <[email protected]>
mm: report per-page metadata information
Today, we do not have any observability of per-page metadata and how much it takes away from the machine capacity. Thus, we want to describe the amount of memory that is going towards per-page metadata, which can vary depending on build configuration, machine architecture, and system use.
This patch adds 2 fields to /proc/vmstat that can be used as shown below:
Accounting per-page metadata allocated by boot-allocator: /proc/vmstat:nr_memmap_boot * PAGE_SIZE
Accounting per-page metadata allocated by buddy-allocator: /proc/vmstat:nr_memmap * PAGE_SIZE
Accounting total per-page metadata allocated on the machine: (/proc/vmstat:nr_memmap_boot + /proc/vmstat:nr_memmap) * PAGE_SIZE
Utility for userspace:
Observability: Describe the amount of memory overhead that is going to per-page metadata on the system at any given time since this overhead is not currently observable.
Debugging: Tracking the changes or absolute value in struct pages can help detect anomalies as they can be correlated with other metrics in the machine (e.g., memtotal, number of huge pages, etc).
page_ext overheads: Some kernel features that use page_ext, such as page_owner and page_table_check, can be optionally enabled via kernel parameters. Having the total per-page metadata information helps users precisely measure their impact. Furthermore, page-metadata metrics will reflect the amount of struct pages relinquished (or overhead reduced) when hugetlbfs pages are reserved, which will vary depending on whether hugetlb vmemmap optimization is enabled or not.
For background and results see: lore.kernel.org/all/[email protected]
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sourav Panda <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Chen Linxuan <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Muchun Song <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tomas Mudrunka <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Xu <[email protected]> Cc: Yang Yang <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
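A small userspace sketch of the accounting formulas above; the field names come from this description (the later commits higher up in this history rename them to nr_memmap_pages and nr_memmap_boot_pages and make them system-wide), and error handling is kept minimal.

```
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sums nr_memmap and nr_memmap_boot from /proc/vmstat and reports the
 * total per-page metadata overhead in bytes. */
int main(void)
{
	char name[128];
	unsigned long long val, pages = 0;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 1;
	while (fscanf(f, "%127s %llu", name, &val) == 2) {
		if (!strcmp(name, "nr_memmap") || !strcmp(name, "nr_memmap_boot"))
			pages += val;
	}
	fclose(f);
	printf("per-page metadata: %llu bytes\n",
	       pages * (unsigned long long)sysconf(_SC_PAGESIZE));
	return 0;
}
```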
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
e0932b6c | 20-Mar-2024 | Johannes Weiner <[email protected]>
mm: page_alloc: consolidate free page accounting
Free page accounting currently happens a bit too high up the call stack, where it has to deal with guard pages, compaction capturing, block stealing and even page isolation. This is subtle and fragile, and makes it difficult to hack on the code.
Now that type violations on the freelists have been fixed, push the accounting down to where pages enter and leave the freelist.
[[email protected]: undo unrelated drive-by line wrap] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: remove unused page parameter from account_freepages()] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix free page accounting] Link: https://lkml.kernel.org/r/a2a48baca69f103aa431fd201f8a06e3b95e203d.1712648441.git.baolin.wang@linux.alibaba.com [[email protected]: avoid defining unused function] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Tested-by: Baolin Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
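A hedged sketch of what "push the accounting down to where pages enter and leave the freelist" can look like: one small helper called from the freelist add/remove paths, with isolated pageblocks excluded from NR_FREE_PAGES. Names and details are illustrative, not the literal patch.

```
#include <linux/mmzone.h>
#include <linux/page-isolation.h>
#include <linux/vmstat.h>

/* Sketch of a mm/page_alloc.c helper: adjust NR_FREE_PAGES (and the CMA
 * counter) in exactly one place, where pages join or leave a freelist. */
static void account_freepages(struct zone *zone, int nr_pages,
			      int migratetype)
{
	if (is_migrate_isolate(migratetype))
		return;

	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);

	if (is_migrate_cma(migratetype))
		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
}
```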
|
|
Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8 |
|
c701123b | 28-Dec-2023 | Matthew Wilcox (Oracle) <[email protected]>
mm/memcontrol: remove __mod_lruvec_page_state()
There are no more callers of __mod_lruvec_page_state(), so convert the implementation to __lruvec_stat_mod_folio(), removing two calls to compound_head() (one explicit, one hidden inside page_memcg()).
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Zi Yan <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
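For callers the conversion is mechanical; a hedged before/after sketch (the wrapper function here is made up for illustration):

```
#include <linux/mm.h>
#include <linux/vmstat.h>

/* Sketch: page-based lruvec accounting becomes folio-based. */
static void charge_file_pages(struct folio *folio, int nr_pages)
{
	/* old: __mod_lruvec_page_state(&folio->page, NR_FILE_PAGES, nr_pages); */
	__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr_pages);
}
```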
|
e435ca87 | 28-Dec-2023 | Matthew Wilcox (Oracle) <[email protected]>
mm: remove inc/dec lruvec page state functions
Patch series "Remove some lruvec page accounting functions", v2.
Some functions are now unused; remove them. Make __mod_lruvec_page_state() unused and then remove it.
This patch (of 6):
All callers of these have been converted to their folio equivalents.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1 |
|
52f23865 | 27-Feb-2023 | Suren Baghdasaryan <[email protected]>
mm: introduce per-VMA lock statistics
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra statistics about handling page faults under the VMA lock.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
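A plausible shape for the hook such a config option gates is sketched below, following the usual vmstat conventions; the macro and event names are assumptions rather than text quoted from the patch.

```
/* Sketch for include/linux/vmstat.h: with CONFIG_PER_VMA_LOCK_STATS the page
 * fault path bumps dedicated vm_event counters; without it the hook
 * compiles away entirely. */
#ifdef CONFIG_PER_VMA_LOCK_STATS
#define count_vm_vma_lock_event(x) count_vm_event(x)
#else
#define count_vm_vma_lock_event(x) do {} while (0)
#endif
```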
|
|
Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5 |
|
7964cf8c | 06-Sep-2022 | Liam R. Howlett <[email protected]>
mm: remove vmacache
By using the maple tree and the maple tree state, the vmacache is no longer beneficial and is complicating the VMA code. Remove the vmacache to reduce the work in keeping it up to date and code complexity.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1 |
|
0995d7e5 | 29-Apr-2021 | Matthew Wilcox (Oracle) <[email protected]>
mm/workingset: Convert workingset_refault() to take a folio
This nets us 178 bytes of savings from removing calls to compound_head. The three callers all grow a little, but each of them will be converted to use folios soon, so that's fine.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: David Howells <[email protected]> Acked-by: Vlastimil Babka <[email protected]>
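After the conversion the entry point simply takes the folio; a hedged sketch of a call site (the wrapper function is invented for illustration):

```
#include <linux/mm.h>
#include <linux/swap.h>

/* Sketch: a pagecache read path hands the newly allocated folio and the
 * shadow entry found in the cache slot straight to workingset_refault(). */
static void note_refault(struct folio *folio, void *shadow)
{
	if (shadow)
		workingset_refault(folio, shadow);
}
```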
|
|
Revision tags: v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5 |
|
a53e17e4 | 18-Jan-2021 | Matthew Wilcox (Oracle) <[email protected]>
mm/vmstat: Add functions to account folio statistics
Allow page counters to be more readily modified by callers which have a folio. Name these wrappers with 'stat' instead of 'state' as requested by Linus here: https://lore.kernel.org/linux-mm/CAHk-=wj847SudR-kt+46fT3+xFFgiwpgThvm7DJWGdi4cVrbnQ@mail.gmail.com/ No change to generated code.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Jeff Layton <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: David Howells <[email protected]> Acked-by: Mike Rapoport <[email protected]>
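Each wrapper is a thin folio front-end over an existing page-based counter update; a hedged sketch of the idea (the function name here is hypothetical so as not to clash with the real wrappers in include/linux/vmstat.h):

```
#include <linux/mm.h>
#include <linux/vmstat.h>

/* Sketch: account a node stat against whichever node the folio lives on. */
static inline void folio_mod_node_stat(struct folio *folio,
				       enum node_stat_item item, long nr)
{
	mod_node_page_state(folio_pgdat(folio), item, nr);
}
```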
|
3e23060b | 29-Jun-2021 | Mel Gorman <[email protected]>
mm/page_alloc: batch the accounting updates in the bulk allocator
Now that the zone_statistics are simple counters that do not require special protection, the bulk allocator accounting updates can be batch updated without adding too much complexity with protected RMW updates or using xchg.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
3ac44a34 | 29-Jun-2021 | Mel Gorman <[email protected]>
mm/vmstat: inline NUMA event counter updates
__count_numa_event is small enough to be treated similarly to __count_vm_event so inline it.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
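A hedged sketch of the inlined form, mirroring how __count_vm_event() uses raw per-cpu operations and the per_cpu_zonestat layout introduced earlier in this series; the exact body in include/linux/vmstat.h may differ.

```
/* Sketch: bump a zone's NUMA event counter without a function call, the same
 * way __count_vm_event() handles vm_events. */
static inline void __count_numa_event(struct zone *zone,
				      enum numa_stat_item item)
{
	struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;

	raw_cpu_inc(pzstats->vm_numa_event[item]);
}
```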
|
f19298b9 | 29-Jun-2021 | Mel Gorman <[email protected]>
mm/vmstat: convert NUMA statistics to basic NUMA counters
NUMA statistics are maintained at the zone level for hits, misses, foreign etc., but nothing relies on them being perfectly accurate for functional correctness. The counters are used by userspace to get a general overview of a workload's NUMA behaviour, but the page allocator incurs a high cost to maintain perfect accuracy similar to what is required for a vmstat like NR_FREE_PAGES. There is even a sysctl vm.numa_stat to allow userspace to turn off the collection of NUMA statistics like NUMA_HIT.
This patch converts NUMA_HIT and friends to be NUMA events with similar accuracy to VM events. There is a possibility that slight errors will be introduced but the overall trend as seen by userspace will be similar. The counters are no longer updated from vmstat_refresh context as it is unnecessary overhead for counters that may never be read by userspace. Note that counters could be maintained at the node level to save space but it would have a user-visible impact due to /proc/zoneinfo.
[[email protected]: Fix misplaced closing brace for !CONFIG_NUMA]
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
28f836b6 | 29-Jun-2021 | Mel Gorman <[email protected]>
mm/page_alloc: split per cpu page lists and zone stats
The PCP (per-cpu page allocator in page_alloc.c) shares locking requirements with vmstat and the zone lock which is inconvenient and causes some issues. For example, the PCP list and vmstat share the same per-cpu space meaning that it's possible that vmstat updates dirty cache lines holding per-cpu lists across CPUs unless padding is used. Second, PREEMPT_RT does not want to disable IRQs for too long in the page allocator.
This series splits the locking requirements and uses locks types more suitable for PREEMPT_RT, reduces the time when special locking is required for stats and reduces the time when IRQs need to be disabled on !PREEMPT_RT kernels.
Why local_lock? PREEMPT_RT considers the following sequence to be unsafe as documented in Documentation/locking/locktypes.rst
  local_irq_disable();
  spin_lock(&lock);
The pcp allocator has this sequence for rmqueue_pcplist (local_irq_save) -> __rmqueue_pcplist -> rmqueue_bulk (spin_lock). While it's possible to separate this out, it generally means there are points where we enable IRQs and reenable them again immediately. To prevent a migration and the per-cpu pointer going stale, migrate_disable is also needed. That is a custom lock that is similar to, but worse than, local_lock. Furthermore, on PREEMPT_RT, it's undesirable to leave IRQs disabled for too long. By converting to local_lock, which disables migration on PREEMPT_RT, the locking requirements can be separated and we can start moving the protections for PCP, stats and the zone lock to PREEMPT_RT-safe equivalent locking. As a bonus, local_lock also means that PROVE_LOCKING does something useful.
After that, it's obvious that zone_statistics incurs too much overhead and leaves IRQs disabled for longer than necessary on !PREEMPT_RT kernels. zone_statistics uses perfectly accurate counters requiring IRQs be disabled for parallel RMW sequences when inaccurate ones like vm_events would do. The series makes the NUMA statistics (NUMA_HIT and friends) inaccurate counters that then require no special protection on !PREEMPT_RT.
The bulk page allocator can then do stat updates in bulk with IRQs enabled which should improve the efficiency. Technically, this could have been done without the local_lock and vmstat conversion work and the order simply reflects the timing of when different series were implemented.
Finally, there are places where we conflate IRQs being disabled for the PCP with the IRQ-safe zone spinlock. The remainder of the series reduces the scope of what is protected by disabled IRQs on !PREEMPT_RT kernels. By the end of the series, page_alloc.c does not call local_irq_save so the locking scope is a bit clearer. The one exception is that modifying NR_FREE_PAGES still happens in places where it's known the IRQs are disabled as it's harmless for PREEMPT_RT and would be expensive to split the locking there.
No performance data is included because despite the overhead of the stats, it's within the noise for most workloads on !PREEMPT_RT. However, Jesper Dangaard Brouer ran a page allocation microbenchmark on a E5-1650 v4 @ 3.60GHz CPU on the first version of this series. Focusing on the array variant of the bulk page allocator reveals the following.
(CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz) ARRAY variant: time_bulk_page_alloc_free_array: step=bulk size
          Baseline       Patched
   1        56.383        54.225 (+3.83%)
   2        40.047        35.492 (+11.38%)
   3        37.339        32.643 (+12.58%)
   4        35.578        30.992 (+12.89%)
   8        33.592        29.606 (+11.87%)
  16        32.362        28.532 (+11.85%)
  32        31.476        27.728 (+11.91%)
  64        30.633        27.252 (+11.04%)
 128        30.596        27.090 (+11.46%)
While this is a positive outcome, the series is more likely to be interesting to the RT people in terms of getting parts of the PREEMPT_RT tree into mainline.
This patch (of 9):
The per-cpu page allocator lists and the per-cpu vmstat deltas are stored in the same struct per_cpu_pages even though vmstats have no direct impact on the per-cpu page lists. This is inconsistent because the vmstats for a node are stored on a dedicated structure. The bigger issue is that the per_cpu_pages structure is not cache-aligned and stat updates either cache conflict with adjacent per-cpu lists incurring a runtime cost or padding is required incurring a memory cost.
This patch splits the per-cpu pagelists and the vmstat deltas into separate structures. It's mostly a mechanical conversion but some variable renaming is done to clearly distinguish the per-cpu pages structure (pcp) from the vmstats (pzstats).
Superficially, this appears to increase the size of the per_cpu_pages structure but the movement of expire fills a structure hole so there is no impact overall.
[[email protected]: make it W=1 cleaner] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: make it W=1 even cleaner] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: check struct per_cpu_zonestat has a non-zero size] [[email protected]: Init zone->per_cpu_zonestats properly]
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
1c824a68 | 30-Apr-2021 | Johannes Weiner <[email protected]>
mm: page-writeback: simplify memcg handling in test_clear_page_writeback()
Page writeback doesn't hold a page reference, which allows truncate to free a page the second PageWriteback is cleared. This used to require special attention in test_clear_page_writeback(), where we had to be careful not to rely on the unstable page->memcg binding and look up all the necessary information before clearing the writeback flag.
Since commit 073861ed77b6 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)") test_clear_page_writeback() is called with an explicit reference on the page, and this dance is no longer needed.
Use unlock_page_memcg() and dec_lruvec_page_state() directly.
This removes the last user of the lock_page_memcg() return value, so change it to void. Touch up the comments in there as well. This also removes the last extern user of __unlock_page_memcg(), so make it static. Further, it removes the last user of dec_lruvec_state(), so delete it, along with a few other unused helpers.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hugh Dickins <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
629484ae | 26-Feb-2021 | Johannes Weiner <[email protected]>
mm: vmstat: add some comments on internal storage of byte items
Byte-accounted items are used for slab object accounting at the cgroup level, because the objects in a slab page can belong to different cgroups. At the global level these items always change in multiples of whole slab pages. The vmstat code exploits this and stores these items as pages internally, which allows for more compact per-cpu data.
This optimization isn't self-evident from the asserts and the division in the stat update functions. Provide the reader with some context.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1 |
|
c47d5032 | 15-Dec-2020 | Shakeel Butt <[email protected]>
mm: move lruvec stats update functions to vmstat.h
Patch series "memcg: add pagetable comsumption to memory.stat", v2.
Many workloads consumes significant amount of memory in pagetables. One specific use-case is the user space network driver which mmaps the application memory to provide zero copy transfer. This driver can consume a large amount memory in page tables. This patch series exposes the pagetable comsumption for each memory cgroup.
This patch (of 2):
This does not change any functionality and only moves the functions which update the lruvec stats from memcontrol.h to vmstat.h. The main reason for this patch is to be able to use these functions in the page table constructor function, which is defined in mm.h, where we cannot include memcontrol.h. Also this is a better place for this interface in general. The lruvec abstraction, while invented for memcg, isn't specific to memcg at all.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1 |
|
ed017373 | 16-Oct-2020 | Yu Zhao <[email protected]>
mm: use self-explanatory macros rather than "2"
Signed-off-by: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Alex Shi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v5.9, v5.9-rc8 |
|
be458311 | 01-Oct-2020 | Roman Gushchin <[email protected]>
mm: memcg/slab: fix slab statistics in !SMP configuration
Since commit ea426c2a7de8 ("mm: memcg: prepare for byte-sized vmstat items") the write side of slab counters accepts a value in bytes and converts it to pages. It happens in __mod_node_page_state().
However a non-SMP version of __mod_node_page_state() doesn't perform this conversion. It leads to incorrect (unrealistically high) slab counters values. Fix this by adding a similar conversion to the non-SMP version of __mod_node_page_state().
Signed-off-by: Roman Gushchin <[email protected]> Reported-and-tested-by: Bastian Bittorf <[email protected]> Fixes: ea426c2a7de8 ("mm: memcg: prepare for byte-sized vmstat items") Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
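The fix described above amounts to a small hunk in the !SMP variant; a hedged sketch of its shape (the SMP version already performed this conversion):

```
/* Sketch of the non-SMP __mod_node_page_state() in include/linux/vmstat.h
 * after the fix: byte-sized (slab) deltas are converted to pages before being
 * folded into the page-based node counters, matching the SMP version. */
static inline void __mod_node_page_state(struct pglist_data *pgdat,
					 enum node_stat_item item, int delta)
{
	if (vmstat_item_in_bytes(item)) {
		VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1));
		delta >>= PAGE_SHIFT;
	}
	node_page_state_add(delta, pgdat, item);
}
```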
|
|
Revision tags: v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1 |
|
ea426c2a | 07-Aug-2020 | Roman Gushchin <[email protected]>
mm: memcg: prepare for byte-sized vmstat items
To implement per-object slab memory accounting, we need to convert slab vmstat counters to bytes. Actually, out of the 4 levels of counters (global, per-node, per-memcg and per-lruvec), only the last two levels will require byte-sized counters. That is because global and per-node counters will be counting the number of slab pages, while per-memcg and per-lruvec counters will be counting the amount of memory taken by charged slab objects.
Converting all vmstat counters to bytes or even all slab counters to bytes would introduce an additional overhead. So instead let's store global and per-node counters in pages, and memcg and lruvec counters in bytes.
To make the API clean all access helpers (both on the read and write sides) are dealing with bytes.
To avoid back-and-forth conversions a new flavor of read-side helpers is introduced, which always returns values in pages: node_page_state_pages() and global_node_page_state_pages().
Actually the new helpers just read the raw values. The old helpers are simple wrappers, which will complain on an attempt to read a byte value, because at the moment no one actually needs bytes.
Thanks to Johannes Weiner for the idea of having the byte-sized API on top of the page-sized internal storage.
Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
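The "complain on an attempt to read a byte value" behaviour of the old read-side helpers can be pictured as below; this is a hedged sketch consistent with the description, not necessarily the literal code in mm/vmstat.c.

```
/* Sketch: the _pages helper reads the raw, page-based counter; the old
 * helper wraps it and warns if asked for a byte-sized item, since nothing
 * is expected to read global/per-node byte values yet. */
unsigned long node_page_state(struct pglist_data *pgdat,
			      enum node_stat_item item)
{
	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));

	return node_page_state_pages(pgdat, item);
}
```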
|