|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
d2734f04 |
| 12-Mar-2025 |
Shuai Xue <[email protected]> |
mm: memory-failure: enhance comments for return value of memory_failure()
The comments for the return value of memory_failure are not complete, supplement the comments.
Link: https://lkml.kernel.or
mm: memory-failure: enhance comments for return value of memory_failure()
The comments for the return value of memory_failure are not complete, supplement the comments.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shuai Xue <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Yazen Ghannam <[email protected]> Reviewed-by: Jane Chu <[email protected]> Acked-by: Miaohe Lin <[email protected]> Tested-by: Tony Luck <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ruidong Tian <[email protected]> Cc: Thomas Gleinxer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
aaf99ac2 |
| 12-Mar-2025 |
Shuai Xue <[email protected]> |
mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
When an uncorrected memory error is consumed there is a race between the CMCI from the memory controller reporting an uncorrec
mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
When an uncorrected memory error is consumed there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting and SRAR signature machine check when the data is about to be consumed.
- Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]
Prior to Icelake memory controllers reported patrol scrub events that detected a previously unseen uncorrected error in memory by signaling a broadcast machine check with an SRAO (Software Recoverable Action Optional) signature in the machine check bank. This was overkill because it's not an urgent problem that no core is on the verge of consuming that bad data. It's also found that multi SRAO UCE may cause nested MCE interrupts and finally become an IERR.
Hence, Intel downgrades the machine check bank signature of patrol scrub from SRAO to UCNA (Uncorrected, No Action required), and signal changed to #CMCI. Just to add to the confusion, Linux does take an action (in uc_decode_notifier()) to try to offline the page despite the UC*NA* signature name.
- Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors, the memory controller uses it for reads too. But the memory controller is executing asynchronously from the core, and can't tell the difference between a "real" read and a speculative read. So it will do CMCI/UCNA if an error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a speculative read. 2) Core finds it is going to use address A soon after sending the read request 3) The CMCI from the memory controller is in a race with MCE from the core that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory controller is delivered before the core is committed to the instruction reading address A, so the interrupt is taken, and Linux offlines the page (marking it as poison).
- Why user process is killed for instr case
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"") tries to fix noise message "Memory error not recovered" and skips duplicate SIGBUSs due to the race. But it also introduced a bug that kill_accessing_process() return -EHWPOISON for instr case, as result, kill_me_maybe() send a SIGBUS to user process.
If the CMCI wins that race, the page is marked poisoned when uc_decode_notifier() calls memory_failure(). For dirty pages, memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag, converting the PTE to a hwpoison entry. As a result, kill_accessing_process():
- call walk_page_range() and return 1 regardless of whether try_to_unmap() succeeds or fails, - call kill_proc() to make sure a SIGBUS is sent - return -EHWPOISON to indicate that SIGBUS is already sent to the process and kill_me_maybe() doesn't have to send it again.
However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the PTE unchanged and not converted to a hwpoison entry. Conversely, for clean pages where PTE entries are not marked as hwpoison, kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send a SIGBUS.
Console log looks like this:
Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered Memory failure: 0x827ca68: already hardware poisoned mce: Memory error not recovered
To fix it, return 0 for "corrupted page was clean", preventing an unnecessary SIGBUS to user process.
[1] https://lore.kernel.org/lkml/[email protected]/T/#mba94f1305b3009dd340ce4114d3221fe810d1871 Link: https://lkml.kernel.org/r/[email protected] Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"") Signed-off-by: Shuai Xue <[email protected]> Tested-by: Tony Luck <[email protected]> Acked-by: Miaohe Lin <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ruidong Tian <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Yazen Ghannam <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5 |
|
| #
38607c62 |
| 28-Feb-2025 |
Alistair Popple <[email protected]> |
fs/dax: properly refcount fs dax pages
Currently fs dax pages are considered free when the refcount drops to one and their refcounts are not increased when mapped via PTEs or decreased when unmapped
fs/dax: properly refcount fs dax pages
Currently fs dax pages are considered free when the refcount drops to one and their refcounts are not increased when mapped via PTEs or decreased when unmapped. This requires special logic in mm paths to detect that these pages should not be properly refcounted, and to detect when the refcount drops to one instead of zero.
On the other hand get_user_pages(), etc. will properly refcount fs dax pages by taking a reference and dropping it when the page is unpinned.
Tracking this special behaviour requires extra PTE bits (eg. pte_devmap) and introduces rules that are potentially confusing and specific to FS DAX pages. To fix this, and to possibly allow removal of the special PTE bits in future, convert the fs dax page refcounts to be zero based and instead take a reference on the page each time it is mapped as is currently the case for normal pages.
This may also allow a future clean-up to remove the pgmap refcounting that is currently done in mm/gup.c.
Link: https://lkml.kernel.org/r/c7d886ad7468a20452ef6e0ddab6cfe220874e7c.1740713401.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Dan Williams <[email protected]> Tested-by: Alison Schofield <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Asahi Lina <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chunyan Zhang <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: linmiaohe <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Matthew Wilcow (Oracle) <[email protected]> Cc: Michael "Camp Drill Sergeant" Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Ted Ts'o <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4 |
|
| #
b81679b1 |
| 17-Feb-2025 |
Ma Wupeng <[email protected]> |
mm: memory-failure: update ttu flag inside unmap_poisoned_folio
Patch series "mm: memory_failure: unmap poisoned folio during migrate properly", v3.
Fix two bugs during folio migration if the folio
mm: memory-failure: update ttu flag inside unmap_poisoned_folio
Patch series "mm: memory_failure: unmap poisoned folio during migrate properly", v3.
Fix two bugs during folio migration if the folio is poisoned.
This patch (of 3):
Commit 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON") introduce TTU_HWPOISON to replace TTU_IGNORE_HWPOISON in order to stop send SIGBUS signal when accessing an error page after a memory error on a clean folio. However during page migration, anon folio must be set with TTU_HWPOISON during unmap_*(). For pagecache we need some policy just like the one in hwpoison_user_mappings to set this flag. So move this policy from hwpoison_user_mappings to unmap_poisoned_folio to handle this warning properly.
Warning will be produced during unamp poison folio with the following log:
------------[ cut here ]------------ WARNING: CPU: 1 PID: 365 at mm/rmap.c:1847 try_to_unmap_one+0x8fc/0xd3c Modules linked in: CPU: 1 UID: 0 PID: 365 Comm: bash Tainted: G W 6.13.0-rc1-00018-gacdb4bbda7ab #42 Tainted: [W]=WARN Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : try_to_unmap_one+0x8fc/0xd3c lr : try_to_unmap_one+0x3dc/0xd3c Call trace: try_to_unmap_one+0x8fc/0xd3c (P) try_to_unmap_one+0x3dc/0xd3c (L) rmap_walk_anon+0xdc/0x1f8 rmap_walk+0x3c/0x58 try_to_unmap+0x88/0x90 unmap_poisoned_folio+0x30/0xa8 do_migrate_range+0x4a0/0x568 offline_pages+0x5a4/0x670 memory_block_action+0x17c/0x374 memory_subsys_offline+0x3c/0x78 device_offline+0xa4/0xd0 state_store+0x8c/0xf0 dev_attr_store+0x18/0x2c sysfs_kf_write+0x44/0x54 kernfs_fop_write_iter+0x118/0x1a8 vfs_write+0x3a8/0x4bc ksys_write+0x6c/0xf8 __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x100 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x30/0xd0 el0t_64_sync_handler+0xc8/0xcc el0t_64_sync+0x198/0x19c ---[ end trace 0000000000000000 ]---
[[email protected]: unmap_poisoned_folio(): remove shadowed local `mapping', per Miaohe] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON") Signed-off-by: Ma Wupeng <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Miaohe Lin <[email protected]> Cc: Ma Wupeng <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
1751f872 |
| 28-Jan-2025 |
Joel Granados <[email protected]> |
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysc
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command: Spatch: virtual patch
@ depends on !(file in "net") disable optional_qualifier @
identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@
+ const struct ctl_table table_name [] = { ... };
sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c
Reviewed-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/ Reviewed-by: Martin K. Petersen <[email protected]> # SCSI Reviewed-by: Darrick J. Wong <[email protected]> # xfs Acked-by: Jani Nikula <[email protected]> Acked-by: Corey Minyard <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Ashutosh Dixit <[email protected]> Acked-by: Anna Schumaker <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
408a8dc6 |
| 29-Oct-2024 |
zhangguopeng <[email protected]> |
mm/memory-failure: replace sprintf() with sysfs_emit()
As Documentation/filesystems/sysfs.rst suggested, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be return
mm/memory-failure: replace sprintf() with sysfs_emit()
As Documentation/filesystems/sysfs.rst suggested, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: zhangguopeng <[email protected]> Acked-by: Miaohe Lin <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2 |
|
| #
68158bfa |
| 05-Oct-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
mm: mass constification of folio/page pointers
Now that page_pgoff() takes const pointers, we can constify the pointers to a lot of functions.
Link: https://lkml.kernel.org/r/20241005200121.3231142
mm: mass constification of folio/page pointers
Now that page_pgoff() takes const pointers, we can constify the pointers to a lot of functions.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
713da0b3 |
| 05-Oct-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
mm: renovate page_address_in_vma()
This function doesn't modify any of its arguments, so if we make a few other functions take const pointers, we can make page_address_in_vma() take const pointers t
mm: renovate page_address_in_vma()
This function doesn't modify any of its arguments, so if we make a few other functions take const pointers, we can make page_address_in_vma() take const pointers too. All of its callers have the containing folio already, so pass that in as an argument instead of recalculating it. Also add kernel-doc
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
f7470591 |
| 05-Oct-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
mm: convert page_to_pgoff() to page_pgoff()
Patch series "page->index removals in mm", v2.
As part of shrinking struct page, we need to stop using page->index. This patchset gets rid of most of th
mm: convert page_to_pgoff() to page_pgoff()
Patch series "page->index removals in mm", v2.
As part of shrinking struct page, we need to stop using page->index. This patchset gets rid of most of the remaining references to page->index in mm, as well as increasing the number of functions which take a const folio/page pointer. It shrinks the text segment of mm by a few hundred bytes in my test config, probably mostly from removing calls to compound_head() in page_to_pgoff().
This patch (of 7):
Change the function signature to pass in the folio as all three callers have it. This removes a reference to page->index, which we're trying to get rid of. And add kernel-doc.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
f1264e95 |
| 27-Aug-2024 |
Kefeng Wang <[email protected]> |
mm: migrate: add isolate_folio_to_list()
Add isolate_folio_to_list() helper to try to isolate HugeTLB, no-LRU movable and LRU folios to a list, which will be reused by do_migrate_range() from memory
mm: migrate: add isolate_folio_to_list()
Add isolate_folio_to_list() helper to try to isolate HugeTLB, no-LRU movable and LRU folios to a list, which will be reused by do_migrate_range() from memory hotplug soon, also drop the mf_isolate_folio() since we could directly use new helper in the soft_offline_in_use_page().
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Miaohe Lin <[email protected]> Tested-by: Miaohe Lin <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
16038c4f |
| 27-Aug-2024 |
Kefeng Wang <[email protected]> |
mm: memory-failure: add unmap_poisoned_folio()
Add unmap_poisoned_folio() helper which will be reused by do_migrate_range() from memory hotplug soon.
[[email protected]: whitespace tweak, p
mm: memory-failure: add unmap_poisoned_folio()
Add unmap_poisoned_folio() helper which will be reused by do_migrate_range() from memory hotplug soon.
[[email protected]: whitespace tweak, per Miaohe Lin] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Miaohe Lin <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3 |
|
| #
d75abd0d |
| 06-Aug-2024 |
Waiman Long <[email protected]> |
mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
The memory_failure_cpu structure is a per-cpu structure. Access to its content requires the use of get_cpu_var() to lock in the cu
mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
The memory_failure_cpu structure is a per-cpu structure. Access to its content requires the use of get_cpu_var() to lock in the current CPU and disable preemption. The use of a regular spinlock_t for locking purpose is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping lock in a preemption disabled context is illegal resulting in the following kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0 [12135.732252] preempt_count: 1, expected: 0 [12135.732255] RCU nest depth: 2, expected: 2 : [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021 [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred [12135.732433] Call Trace: [12135.732436] <TASK> [12135.732450] dump_stack_lvl+0x57/0x81 [12135.732461] __might_resched.cold+0xf4/0x12f [12135.732479] rt_spin_lock+0x4c/0x100 [12135.732491] memory_failure_queue+0x40/0xe0 [12135.732503] ghes_do_memory_failure+0x53/0x390 [12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0 [12135.732575] ghes_proc+0xf9/0x1a0 [12135.732591] ghes_notify_hed+0x6a/0x150 [12135.732602] notifier_call_chain+0x43/0xb0 [12135.732626] blocking_notifier_call_chain+0x43/0x60 [12135.732637] acpi_ev_notify_dispatch+0x47/0x70 [12135.732648] acpi_os_execute_deferred+0x13/0x20 [12135.732654] process_one_work+0x41f/0x500 [12135.732695] worker_thread+0x192/0x360 [12135.732715] kthread+0x111/0x140 [12135.732733] ret_from_fork+0x29/0x50 [12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep with this call.
[[email protected]: don't hold percpu ref across pr_err(), per Miaohe] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant") Signed-off-by: Waiman Long <[email protected]> Acked-by: Miaohe Lin <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Len Brown <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2, v6.11-rc1, v6.10 |
|
| #
8a78882d |
| 08-Jul-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: remove obsolete MF_MSG_DIFFERENT_COMPOUND
The page cannot become compound pages again just after a folio is split as an extra refcnt is held. So the MF_MSG_DIFFERENT_COMPOUND cas
mm/memory-failure: remove obsolete MF_MSG_DIFFERENT_COMPOUND
The page cannot become compound pages again just after a folio is split as an extra refcnt is held. So the MF_MSG_DIFFERENT_COMPOUND case is obsolete and can be removed to get rid of this false assumption and code burden. But add one WARN_ON() here to keep the situation clear.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc7 |
|
| #
e6c0c032 |
| 02-Jul-2024 |
Christophe Leroy <[email protected]> |
mm: provide mm_struct and address to huge_ptep_get()
On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is a PTE entry or a PMD entry. This cannot be known with the PMD entry i
mm: provide mm_struct and address to huge_ptep_get()
On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is a PTE entry or a PMD entry. This cannot be known with the PMD entry itself because there is no easy way to know it from the content of the entry.
So huge_ptep_get() will need to know either the size of the page or get the pmd.
In order to be consistent with huge_ptep_get_and_clear(), give mm and address to huge_ptep_get().
Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc6 |
|
| #
56374430 |
| 26-Jun-2024 |
Jiaqi Yan <[email protected]> |
mm/memory-failure: userspace controls soft-offlining pages
Correctable memory errors are very common on servers with large amount of memory, and are corrected by ECC. Soft offline is kernel's addit
mm/memory-failure: userspace controls soft-offlining pages
Correctable memory errors are very common on servers with large amount of memory, and are corrected by ECC. Soft offline is kernel's additional recovery handling for memory pages having (excessive) corrected memory errors. Impacted page is migrated to a healthy page if inuse; the original page is discarded for any future use.
The actual policy on whether (and when) to soft offline should be maintained by userspace, especially in case of an 1G HugeTLB page. Soft-offline dissolves the HugeTLB page, either in-use or free, into chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage. If userspace has not acknowledged such behavior, it may be surprised when later failed to mmap hugepages due to lack of hugepages. In case of a transparent hugepage, it will be split into 4K pages as well; userspace will stop enjoying the transparent performance.
In addition, discarding the entire 1G HugeTLB page only because of corrected memory errors sounds very costly and kernel better not doing under the hood. But today there are at least 2 such cases doing so: 1. when GHES driver sees both GHES_SEV_CORRECTED and CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER. 2. RAS Correctable Errors Collector counts correctable errors per PFN and when the counter for a PFN reaches threshold In both cases, userspace has no control of the soft offline performed by kernel's memory failure recovery.
This commit gives userspace the control of softofflining any page: kernel only soft offlines raw page / transparent hugepage / HugeTLB hugepage if userspace has agreed to. The interface to userspace is a new sysctl at /proc/sys/vm/enable_soft_offline. By default its value is set to 1 to preserve existing behavior in kernel. When set to 0, soft-offline (e.g. MADV_SOFT_OFFLINE) will fail with EOPNOTSUPP.
[[email protected]: v7] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiaqi Yan <[email protected]> Acked-by: Miaohe Lin <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Frank van der Linden <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Lance Yang <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
865319f7 |
| 26-Jun-2024 |
Jiaqi Yan <[email protected]> |
mm/memory-failure: refactor log format in soft offline code
Patch series "Userspace controls soft-offline pages", v6.
Correctable memory errors are very common on servers with large amount of memor
mm/memory-failure: refactor log format in soft offline code
Patch series "Userspace controls soft-offline pages", v6.
Correctable memory errors are very common on servers with large amount of memory, and are corrected by ECC, but with two pain points to users:
1. Correction usually happens on the fly and adds latency overhead 2. Not-fully-proved theory states excessive correctable memory errors can develop into uncorrectable memory error.
Soft offline is kernel's additional solution for memory pages having (excessive) corrected memory errors. Impacted page is migrated to healthy page if it is in use, then the original page is discarded for any future use.
The actual policy on whether (and when) to soft offline should be maintained by userspace, especially in case of an 1G HugeTLB page. Soft-offline dissolves the HugeTLB page, either in-use or free, into chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage. If userspace has not acknowledged such behavior, it may be surprised when later mmap hugepages MAP_FAILED due to lack of hugepages. In case of a transparent hugepage, it will be split into 4K pages as well; userspace will stop enjoying the transparent performance.
In addition, discarding the entire 1G HugeTLB page only because of corrected memory errors sounds very costly and kernel better not doing under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER. 2. RAS Correctable Errors Collector counts correctable errors per PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed by kernel's memory failure recovery.
This patch series give userspace the control of softofflining any page: kernel only soft offlines raw page / transparent hugepage / HugeTLB hugepage if userspace has agreed to. The interface to userspace is a new sysctl called enable_soft_offline under /proc/sys/vm. By default enable_soft_line is 1 to preserve existing behavior in kernel.
This patch (of 4):
Logs from soft_offline_page and soft_offline_in_use_page have different formats than majority of the memory failure code:
"Memory failure: 0x${pfn}: ${lower_case_message}"
Convert them to the following format:
"Soft offline: 0x${pfn}: ${lower_case_message}"
No functional change in this commit.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiaqi Yan <[email protected]> Acked-by: Miaohe Lin <[email protected]> Reviewed-by: Lance Yang <[email protected]> Cc: David Rientjes <[email protected]> Cc: Frank van der Linden <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc5 |
|
| #
5cea5666 |
| 19-Jun-2024 |
Jiaqi Yan <[email protected]> |
mm/memory-failure: refactor log format in unpoison_memory
Logs from memory_failure and other memory-failure.c code follow the format:
"Memory failure: 0x{pfn}: ${lower_case_message}"
Convert the
mm/memory-failure: refactor log format in unpoison_memory
Logs from memory_failure and other memory-failure.c code follow the format:
"Memory failure: 0x{pfn}: ${lower_case_message}"
Convert the logs in unpoison_memory to follow similar format:
"Unpoison: 0x${pfn}: ${lower_case_message}"
For example (from local test): [ 1331.938397] Unpoison: 0x144bc8: page was already unpoisoned
No functional change in this commit.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiaqi Yan <[email protected]> Acked-by: Miaohe Lin <[email protected]> Cc: Jane Chu <[email protected]> Cc: Lance Yang <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc4 |
|
| #
e5d89670 |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: correct comment in me_swapcache_dirty
Dirty swap cache page could live both in page table (not page cache) and swap cache when freshly swapped in. Correct comment.
Link: https:/
mm/memory-failure: correct comment in me_swapcache_dirty
Dirty swap cache page could live both in page table (not page cache) and swap cache when freshly swapped in. Correct comment.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: kernel test robot <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
d49f2366 |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: remove obsolete comment in kill_proc()
When user sets SIGBUS to SIG_IGN, it won't cause loop now. For action required mce error, SIGBUS cannot be blocked. Also when a hwpoisoned
mm/memory-failure: remove obsolete comment in kill_proc()
When user sets SIGBUS to SIG_IGN, it won't cause loop now. For action required mce error, SIGBUS cannot be blocked. Also when a hwpoisoned page is re-accessed, kill_accessing_process() will be called to kill the process.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: kernel test robot <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
b71340ef |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: fix comment of get_hwpoison_page()
When return value is 0, it could also means the page is free hugetlb page or free buddy page. Fix the corresponding comment.
Link: https://lkm
mm/memory-failure: fix comment of get_hwpoison_page()
When return value is 0, it could also means the page is free hugetlb page or free buddy page. Fix the corresponding comment.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: kernel test robot <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
28eab7d4 |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: remove obsolete comment in unpoison_memory()
Since commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head"), folio->_mapcount is not overloaded wi
mm/memory-failure: remove obsolete comment in unpoison_memory()
Since commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head"), folio->_mapcount is not overloaded with SLAB. Update corresponding comment.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: kernel test robot <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
96e13a4e |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: use helper macro task_pid_nr()
Use helper macro task_pid_nr() to get the pid of a task. No functional change intended.
Link: https://lkml.kernel.org/r/20240612071835.157004-9-li
mm/memory-failure: use helper macro task_pid_nr()
Use helper macro task_pid_nr() to get the pid of a task. No functional change intended.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: kernel test robot <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
5a8b01be |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: don't export hwpoison_filter() when !CONFIG_HWPOISON_INJECT
When CONFIG_HWPOISON_INJECT is not enabled, there is no user of the hwpoison_filter() outside memory-failure. So there
mm/memory-failure: don't export hwpoison_filter() when !CONFIG_HWPOISON_INJECT
When CONFIG_HWPOISON_INJECT is not enabled, there is no user of the hwpoison_filter() outside memory-failure. So there is no need to export it in that case.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Cc: Borislav Petkov (AMD) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
4d64ab2f |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: remove confusing initialization to count
It's meaningless and confusing to init local variable count to 1. Remove it. No functional change intended.
Link: https://lkml.kernel.o
mm/memory-failure: remove confusing initialization to count
It's meaningless and confusing to init local variable count to 1. Remove it. No functional change intended.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: kernel test robot <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
7f8de206 |
| 12-Jun-2024 |
Miaohe Lin <[email protected]> |
mm/memory-failure: remove unneeded empty string
Remove unneeded empty string in definition of macro pr_fmt. No functional change intended.
Link: https://lkml.kernel.org/r/20240612071835.157004-6-l
mm/memory-failure: remove unneeded empty string
Remove unneeded empty string in definition of macro pr_fmt. No functional change intended.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: kernel test robot <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|