|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
5f6084f9 |
| 23-Jan-2025 |
Carlos Llamas <[email protected]> |
mm/oom_kill: fix trivial typo in comment
Update 'give' -> 'given' in the description of oom_reap_task_mm().
Link: https://lkml.kernel.org/r/[email protected] Signed-off-b
mm/oom_kill: fix trivial typo in comment
Update 'give' -> 'given' in the description of oom_reap_task_mm().
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Carlos Llamas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
1751f872 |
| 28-Jan-2025 |
Joel Granados <[email protected]> |
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysc
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command: Spatch: virtual patch
@ depends on !(file in "net") disable optional_qualifier @
identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@
+ const struct ctl_table table_name [] = { ... };
sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c
Reviewed-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/ Reviewed-by: Martin K. Petersen <[email protected]> # SCSI Reviewed-by: Darrick J. Wong <[email protected]> # xfs Acked-by: Jani Nikula <[email protected]> Acked-by: Corey Minyard <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Ashutosh Dixit <[email protected]> Acked-by: Anna Schumaker <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5 |
|
| #
ade81479 |
| 24-Dec-2024 |
Chen Ridong <[email protected]> |
memcg: fix soft lockup in the OOM process
A soft lockup issue was found in the product with about 56,000 tasks were in the OOM cgroup, it was traversing them when the soft lockup was triggered.
wat
memcg: fix soft lockup in the OOM process
A soft lockup issue was found in the product with about 56,000 tasks were in the OOM cgroup, it was traversing them when the soft lockup was triggered.
watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066] CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G Hardware name: Huawei Cloud OpenStack Nova, BIOS RIP: 0010:console_unlock+0x343/0x540 RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247 RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040 R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0 R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: vprintk_emit+0x193/0x280 printk+0x52/0x6e dump_task+0x114/0x130 mem_cgroup_scan_tasks+0x76/0x100 dump_header+0x1fe/0x210 oom_kill_process+0xd1/0x100 out_of_memory+0x125/0x570 mem_cgroup_out_of_memory+0xb5/0xd0 try_charge+0x720/0x770 mem_cgroup_try_charge+0x86/0x180 mem_cgroup_try_charge_delay+0x1c/0x40 do_anonymous_page+0xb5/0x390 handle_mm_fault+0xc4/0x1f0
This is because thousands of processes are in the OOM cgroup, it takes a long time to traverse all of them. As a result, this lead to soft lockup in the OOM process.
To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks' function per 1000 iterations. For global OOM, call 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
Link: https://lkml.kernel.org/r/[email protected] Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads") Signed-off-by: Chen Ridong <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Muchun Song <[email protected]> Cc: Michal Koutný <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
f2f48408 |
| 26-Sep-2024 |
Nanyong Sun <[email protected]> |
mm: move mm flags to mm_types.h
The types of mm flags are now far beyond the core dump related features. This patch moves mm flags from linux/sched/coredump.h to linux/mm_types.h. The linux/sched/c
mm: move mm flags to mm_types.h
The types of mm flags are now far beyond the core dump related features. This patch moves mm flags from linux/sched/coredump.h to linux/mm_types.h. The linux/sched/coredump.h has include the mm_types.h, so the C files related to coredump does not need to change head file inclusion. In addition, the inclusion of sched/coredump.h now can be deleted from the C files that irrelevant to core dump.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nanyong Sun <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
7998df0b |
| 28-Mar-2024 |
Joel Granados <[email protected]> |
memory: remove the now superfluous sentinel element from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentin
memory: remove the now superfluous sentinel element from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)
Remove sentinel from all files under mm/ that register a sysctl table.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Muchun Song <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6 |
|
| #
72ba14de |
| 23-Feb-2024 |
Carlos Galo <[email protected]> |
mm: update mark_victim tracepoints fields
The current implementation of the mark_victim tracepoint provides only the process ID (pid) of the victim process. This limitation poses challenges for use
mm: update mark_victim tracepoints fields
The current implementation of the mark_victim tracepoint provides only the process ID (pid) of the victim process. This limitation poses challenges for userspace tools requiring real-time OOM analysis and intervention. Although this information is available from the kernel logs, it’s not the appropriate format to provide OOM notifications. In Android, BPF programs are used with the mark_victim trace events to notify userspace of an OOM kill. For consistency, update the trace event to include the same information about the OOMed victim as the kernel logs.
- UID In Android each installed application has a unique UID. Including the `uid` assists in correlating OOM events with specific apps.
- Process Name (comm) Enables identification of the affected process.
- OOM Score Will allow userspace to get additional insight of the relative kill priority of the OOM victim. In Android, the oom_score_adj is used to categorize app state (foreground, background, etc.), which aids in analyzing user-perceptible impacts of OOM events [1].
- Total VM, RSS Stats, and pgtables Amount of memory used by the victim that will, potentially, be freed up by killing it.
[1] https://cs.android.com/android/platform/superproject/main/+/246dc8fc95b6d93afcba5c6d6c133307abb3ac2e:frameworks/base/services/core/java/com/android/server/am/ProcessList.java;l=188-283 Signed-off-by: Carlos Galo <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Masami Hiramatsu (Google)" <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
27873192 |
| 23-Nov-2023 |
Yong Wang <[email protected]> |
mm, oom:dump_tasks add rss detailed information printing
When the system is under oom, it prints out the RSS information of each process. However, we don't know the size of rss_anon, rss_file, and
mm, oom:dump_tasks add rss detailed information printing
When the system is under oom, it prints out the RSS information of each process. However, we don't know the size of rss_anon, rss_file, and rss_shmem.
To distinguish the memory occupied by anonymous or file mappings or shmem, could help us identify the root cause of the oom.
So this patch adds RSS details, which refers to the /proc/<pid>/status[1]. It can help us know more about process memory usage.
Example of oom including the new rss_* fields: [ 1630.902466] Tasks state (memory values in pages): [ 1630.902870] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name [ 1630.903619] [ 149] 0 149 486 288 0 288 0 36864 0 0 ash [ 1630.904210] [ 156] 0 156 153531 153345 153345 0 0 1269760 0 0 mm_test
[1] commit 8cee852ec53f ("mm, procfs: breakdown RSS for anon, shmem and file in /proc/pid/status").
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yong Wang <[email protected]> Reviewed-by: Yang Yang <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Xuexin Jiang <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
1f4f7f0f |
| 16-Oct-2023 |
Kairui Song <[email protected]> |
mm/oom_killer: simplify OOM killer info dump helper
There is only one caller wants to dump the kill victim info, so just let it call the standalone helper, no need to make the generic info dump help
mm/oom_killer: simplify OOM killer info dump helper
There is only one caller wants to dump the kill victim info, so just let it call the standalone helper, no need to make the generic info dump helper take an extra argument for that.
Result of bloat-o-meter: ./scripts/bloat-o-meter ./mm/oom_kill.old.o ./mm/oom_kill.o add/remove: 0/0 grow/shrink: 1/2 up/down: 131/-142 (-11) Function old new delta oom_kill_process 412 543 +131 out_of_memory 1422 1418 -4 dump_header 562 424 -138 Total: Before=21514, After=21503, chg -0.05%
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kairui Song <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5 |
|
| #
61f29738 |
| 04-Aug-2023 |
ZhangPeng <[email protected]> |
mm: remove redundant K() macro definition
Patch series "cleanup with helper macro K()".
Use helper macro K() to improve code readability. No functional modification involved. Remove redundant K()
mm: remove redundant K() macro definition
Patch series "cleanup with helper macro K()".
Use helper macro K() to improve code readability. No functional modification involved. Remove redundant K() macro definition.
This patch (of 7):
Since commit eb8589b4f8c1 ("mm: move mem_init_print_info() to mm_init.c"), the K() macro definition has been moved to mm/internal.h. Therefore, the definitions in mm/memcontrol.c, mm/backing-dev.c and mm/oom_kill.c are redundant. Drop redundant definitions.
[[email protected]: oom_kill.c: remove "#undef K", per Kefeng] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Nanyong Sun <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2 |
|
| #
4822acb1 |
| 08-May-2023 |
Haifeng Xu <[email protected]> |
mm, oom: do not check 0 mask in out_of_memory()
Since commit 60e2793d440a ("mm, oom: do not trigger out_of_memory from the #PF"), no user sets gfp_mask to 0. Remove the 0 mask check and update the
mm, oom: do not check 0 mask in out_of_memory()
Since commit 60e2793d440a ("mm, oom: do not trigger out_of_memory from the #PF"), no user sets gfp_mask to 0. Remove the 0 mask check and update the comments.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Haifeng Xu <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4 |
|
| #
7d4a8be0 |
| 10-Jan-2023 |
Alistair Popple <[email protected]> |
mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export
mmu_notifier_range_update_to_read_only() was originally introduced in commit c6d23413f81b ("mm/mmu_notifier: mmu_notifier
mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export
mmu_notifier_range_update_to_read_only() was originally introduced in commit c6d23413f81b ("mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper") as an optimisation for device drivers that know a range has only been mapped read-only. However there are no users of this feature so remove it. As it is the only user of the struct mmu_notifier_range.vma field remove that also.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Hubbard <[email protected]> Cc: Ralph Campbell <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3 |
|
| #
974f4367 |
| 23-Aug-2022 |
Michal Hocko <[email protected]> |
mm: reduce noise in show_mem for lowmem allocations
While discussing early DMA pool pre-allocation failure with Christoph [1] I have realized that the allocation failure warning is rather noisy for
mm: reduce noise in show_mem for lowmem allocations
While discussing early DMA pool pre-allocation failure with Christoph [1] I have realized that the allocation failure warning is rather noisy for constrained allocations like GFP_DMA{32}. Those zones are usually not populated on all nodes very often as their memory ranges are constrained.
This is an attempt to reduce the ballast that doesn't provide any relevant information for those allocation failures investigation. Please note that I have only compile tested it (in my default config setup) and I am throwing it mostly to see what people think about it.
[1] http://lkml.kernel.org/r/[email protected]
[[email protected]: update] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix build] [[email protected]: fix it for mapletree] [[email protected]: update it for Michal's update] [[email protected]: fix arch/powerpc/xmon/xmon.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix arch/sparc/kernel/setup_32.c] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1 |
|
| #
b3541d91 |
| 31-May-2022 |
Suren Baghdasaryan <[email protected]> |
mm: delete unused MMF_OOM_VICTIM flag
With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now unused and can be removed.
[[email protected]: remove comment about now-remov
mm: delete unused MMF_OOM_VICTIM flag
With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now unused and can be removed.
[[email protected]: remove comment about now-removed mm_is_oom_victim()] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Brauner (Microsoft) <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jann Horn <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Peter Xu <[email protected]> Cc: John Hubbard <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Liam Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
bf3980c8 |
| 31-May-2022 |
Suren Baghdasaryan <[email protected]> |
mm: drop oom code from exit_mmap
The primary reason to invoke the oom reaper from the exit_mmap path used to be a prevention of an excessive oom killing if the oom victim exit races with the oom rea
mm: drop oom code from exit_mmap
The primary reason to invoke the oom reaper from the exit_mmap path used to be a prevention of an excessive oom killing if the oom victim exit races with the oom reaper (see [1] for more details). The invocation has moved around since then because of the interaction with the munlock logic but the underlying reason has remained the same (see [2]).
Munlock code is no longer a problem since [3] and there shouldn't be any blocking operation before the memory is unmapped by exit_mmap so the oom reaper invocation can be dropped. The unmapping part can be done with the non-exclusive mmap_sem and the exclusive one is only required when page tables are freed.
Remove the oom_reaper from exit_mmap which will make the code easier to read. This is really unlikely to make any observable difference although some microbenchmarks could benefit from one less branch that needs to be evaluated even though it almost never is true.
[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") [2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3") [3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
[[email protected]: restore Suren's mmap_read_lock() optimization] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Brauner (Microsoft) <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
e1c2c775 |
| 06-Sep-2022 |
Liam R. Howlett <[email protected]> |
mm/oom_kill: use vma iterators instead of vma linked list
Use vma iterator in preparation of removing the linked list.
Link: https://lkml.kernel.org/r/20220906194824.2110408-62-Liam.Howlett@oracle.
mm/oom_kill: use vma iterators instead of vma linked list
Use vma iterator in preparation of removing the linked list.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
a19cad06 |
| 01-Jun-2022 |
Andrew Morton <[email protected]> |
mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery
arm allnoconfig:
mm/oom_kill.c:60:25: warning: 'vm_oom_kill_table' defined but not used [-Wunused-variable] 60 | static struct ctl_table vm_oom_k
mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery
arm allnoconfig:
mm/oom_kill.c:60:25: warning: 'vm_oom_kill_table' defined but not used [-Wunused-variable] 60 | static struct ctl_table vm_oom_kill_table[] = { | ^~~~~~~~~~~~~~~~~
Cc: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4 |
|
| #
e4a38402 |
| 21-Apr-2022 |
Nico Pache <[email protected]> |
oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping
oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death.
A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death:
CPU1 CPU2 -------------------------------------------------------------------- page_fault do_exit "signal" wake_oom_reaper oom_reaper oom_reap_task_mm (invalidates mm) exit_mm exit_mm_release futex_exit_release futex_cleanup exit_robust_list get_user (EFAULT- can't access memory)
If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely.
Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup.
Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
Based on a patch by Michal Hocko.
Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/[email protected] Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz <[email protected]> Signed-off-by: Nico Pache <[email protected]> Co-developed-by: Joel Savitz <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Rafael Aquini <[email protected]> Cc: Waiman Long <[email protected]> Cc: Herton R. Krzesinski <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ben Segall <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: David Rientjes <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joel Savitz <[email protected]> Cc: Darren Hart <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5 |
|
| #
43fe219a |
| 18-Feb-2022 |
sujiaxun <[email protected]> |
mm: move oom_kill sysctls to their own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain.
To help with this maintenance let's
mm: move oom_kill sysctls to their own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic.
So move the oom_kill sysctls to their own file, mm/oom_kill.c
[[email protected]: null-terminate the array] Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: sujiaxun <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Cc: Kees Cook <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
show more ...
|
| #
bd8b77d6 |
| 22-Mar-2022 |
Miaohe Lin <[email protected]> |
mm/oom_kill: remove unneeded is_memcg_oom check
oom_cpuset_eligible() is always called when !is_memcg_oom(). Remove this unnecessary check.
Link: https://lkml.kernel.org/r/20220224115933.20154-1-l
mm/oom_kill: remove unneeded is_memcg_oom check
oom_cpuset_eligible() is always called when !is_memcg_oom(). Remove this unnecessary check.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
a213e5cf |
| 15-Feb-2022 |
Hugh Dickins <[email protected]> |
mm/munlock: delete munlock_vma_pages_all(), allow oomreap
munlock_vma_pages_range() will still be required, when munlocking but not munmapping a set of pages; but when unmapping a pte, the mlock cou
mm/munlock: delete munlock_vma_pages_all(), allow oomreap
munlock_vma_pages_range() will still be required, when munlocking but not munmapping a set of pages; but when unmapping a pte, the mlock count will be maintained in much the same way as it will be maintained when mapping in the pte. Which removes the need for munlock_vma_pages_all() on mlocked vmas when munmapping or exiting: eliminating the catastrophic contention on i_mmap_rwsem, and the need for page lock on the pages.
There is still a need to update locked_vm accounting according to the munmapped vmas when munmapping: do that in detach_vmas_to_be_unmapped(). exit_mmap() does not need locked_vm updates, so delete unlock_range().
And wasn't I the one who forbade the OOM reaper to attack mlocked vmas, because of the uncertainty in blocking on all those page locks? No fear of that now, so permit the OOM reaper on mlocked vmas.
Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1 |
|
| #
f530243a |
| 14-Jan-2022 |
Jann Horn <[email protected]> |
mm, oom: OOM sysrq should always kill a process
The OOM kill sysrq (alt+sysrq+F) should allow the user to kill the process with the highest OOM badness with a single execution.
However, at the mome
mm, oom: OOM sysrq should always kill a process
The OOM kill sysrq (alt+sysrq+F) should allow the user to kill the process with the highest OOM badness with a single execution.
However, at the moment, the OOM kill can bail out if an OOM notifier (e.g. the i915 one) says that it reclaimed a tiny amount of memory from somewhere. That's probably not what the user wants, so skip the bailout if the OOM was triggered via sysrq.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jann Horn <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
ba535c1c |
| 14-Jan-2022 |
Suren Baghdasaryan <[email protected]> |
mm/oom_kill: allow process_mrelease to run under mmap_lock protection
With exit_mmap holding mmap_write_lock during free_pgtables call, process_mrelease does not need to elevate mm->mm_users in orde
mm/oom_kill: allow process_mrelease to run under mmap_lock protection
With exit_mmap holding mmap_write_lock during free_pgtables call, process_mrelease does not need to elevate mm->mm_users in order to prevent exit_mmap from destrying pagetables while __oom_reap_task_mm is walking the VMA tree. The change prevents process_mrelease from calling the last mmput, which can lead to waiting for IO completion in exit_aio.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Jan Engelhardt <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
b6bf9abb |
| 14-Jan-2022 |
Dan Schatzberg <[email protected]> |
mm/memcg: add oom_group_kill memory event
Our container agent wants to know when a container exits if it was OOM killed or not to report to the user. We use memory.oom.group = 1 to ensure that OOM
mm/memcg: add oom_group_kill memory event
Our container agent wants to know when a container exits if it was OOM killed or not to report to the user. We use memory.oom.group = 1 to ensure that OOM kills within the container's cgroup kill everything. Existing memory.events are insufficient for knowing if this triggered:
1) Our current approach reads memory.events oom_kill and reports the container was killed if the value is non-zero. This is erroneous in some cases where containers create their children cgroups with memory.oom.group=1 as such OOM kills will get counted against the container cgroup's oom_kill counter despite not actually OOM killing the entire container.
2) Reading memory.events.local will fail to identify OOM kills in leaf cgroups (that don't set memory.oom.group) within the container cgroup.
This patch adds a new oom_group_kill event when memory.oom.group triggers to allow userspace to cleanly identify when an entire cgroup is oom killed.
[[email protected]: changes from Johannes and Chris] Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dan Schatzberg <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Chris Down <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Zefan Li <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Alex Shi <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2 |
|
| #
98b24b16 |
| 19-Nov-2021 |
Eric W. Biederman <[email protected]> |
signal: Have the oom killer detect coredumps using signal->core_state
In preparation for removing the flag SIGNAL_GROUP_COREDUMP, change __task_will_free_mem to test signal->core_state instead of th
signal: Have the oom killer detect coredumps using signal->core_state
In preparation for removing the flag SIGNAL_GROUP_COREDUMP, change __task_will_free_mem to test signal->core_state instead of the flag SIGNAL_GROUP_COREDUMP.
Both fields are protected by siglock and both live in signal_struct so there are no real tradeoffs here, just a change to which field is being tested.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc1 |
|
| #
3723929e |
| 05-Nov-2021 |
Sultan Alsawaf <[email protected]> |
mm: mark the OOM reaper thread as freezable
The OOM reaper alters user address space which might theoretically alter the snapshot if reaping is allowed to happen after the freezer quiescent state.
mm: mark the OOM reaper thread as freezable
The OOM reaper alters user address space which might theoretically alter the snapshot if reaping is allowed to happen after the freezer quiescent state. To this end, the reaper kthread uses wait_event_freezable() while waiting for any work so that it cannot run while the system freezes.
However, the current implementation doesn't respect the freezer because all kernel threads are created with the PF_NOFREEZE flag, so they are automatically excluded from freezing operations. This means that the OOM reaper can race with system snapshotting if it has work to do while the system is being frozen.
Fix this by adding a set_freezable() call which will clear the PF_NOFREEZE flag and thus make the OOM reaper visible to the freezer.
Please note that the OOM reaper altering the snapshot this way is mostly a theoretical concern and has not been observed in practice.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: aac453635549 ("mm, oom: introduce oom reaper") Signed-off-by: Sultan Alsawaf <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|