History log of /linux-6.15/mm/memory-failure.c (Results 1 – 25 of 525)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# d2734f04 12-Mar-2025 Shuai Xue <[email protected]>

mm: memory-failure: enhance comments for return value of memory_failure()

The comments for the return value of memory_failure are not complete,
supplement the comments.

Link: https://lkml.kernel.or

mm: memory-failure: enhance comments for return value of memory_failure()

The comments for the return value of memory_failure are not complete,
supplement the comments.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shuai Xue <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Tested-by: Tony Luck <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ruidong Tian <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# aaf99ac2 12-Mar-2025 Shuai Xue <[email protected]>

mm/hwpoison: do not send SIGBUS to processes with recovered clean pages

When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrec

mm/hwpoison: do not send SIGBUS to processes with recovered clean pages

When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting and SRAR signature machine check when
the data is about to be consumed.

- Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]

Prior to Icelake memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank. This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data. It's also found that multi SRAO UCE may cause nested MCE
interrupts and finally become an IERR.

Hence, Intel downgrades the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and signal changed to
#CMCI. Just to add to the confusion, Linux does take an action (in
uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.

- Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1]

Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read. So it will do CMCI/UCNA if
an error is found in any read.

Thus:

1) Core is clever and thinks address A is needed soon, issues a speculative read.
2) Core finds it is going to use address A soon after sending the read request
3) The CMCI from the memory controller is in a race with MCE from the core
that will soon try to retire the load from address A.

Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).

- Why user process is killed for instr case

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix noise message "Memory error not recovered"
and skips duplicate SIGBUSs due to the race. But it also introduced a bug
that kill_accessing_process() return -EHWPOISON for instr case, as result,
kill_me_maybe() send a SIGBUS to user process.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. As a result,
kill_accessing_process():

- call walk_page_range() and return 1 regardless of whether
try_to_unmap() succeeds or fails,
- call kill_proc() to make sure a SIGBUS is sent
- return -EHWPOISON to indicate that SIGBUS is already sent to the
process and kill_me_maybe() doesn't have to send it again.

However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
PTE unchanged and not converted to a hwpoison entry. Conversely, for
clean pages where PTE entries are not marked as hwpoison,
kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send
a SIGBUS.

Console log looks like this:

Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
Memory failure: 0x827ca68: already hardware poisoned
mce: Memory error not recovered

To fix it, return 0 for "corrupted page was clean", preventing an
unnecessary SIGBUS to user process.

[1] https://lore.kernel.org/lkml/[email protected]/T/#mba94f1305b3009dd340ce4114d3221fe810d1871
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <[email protected]>
Tested-by: Tony Luck <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ruidong Tian <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Yazen Ghannam <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.14-rc6, v6.14-rc5
# 38607c62 28-Feb-2025 Alistair Popple <[email protected]>

fs/dax: properly refcount fs dax pages

Currently fs dax pages are considered free when the refcount drops to one
and their refcounts are not increased when mapped via PTEs or decreased
when unmapped

fs/dax: properly refcount fs dax pages

Currently fs dax pages are considered free when the refcount drops to one
and their refcounts are not increased when mapped via PTEs or decreased
when unmapped. This requires special logic in mm paths to detect that
these pages should not be properly refcounted, and to detect when the
refcount drops to one instead of zero.

On the other hand get_user_pages(), etc. will properly refcount fs dax
pages by taking a reference and dropping it when the page is unpinned.

Tracking this special behaviour requires extra PTE bits (eg. pte_devmap)
and introduces rules that are potentially confusing and specific to FS DAX
pages. To fix this, and to possibly allow removal of the special PTE bits
in future, convert the fs dax page refcounts to be zero based and instead
take a reference on the page each time it is mapped as is currently the
case for normal pages.

This may also allow a future clean-up to remove the pgmap refcounting that
is currently done in mm/gup.c.

Link: https://lkml.kernel.org/r/c7d886ad7468a20452ef6e0ddab6cfe220874e7c.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Tested-by: Alison Schofield <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Asahi Lina <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chunyan Zhang <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: linmiaohe <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Matthew Wilcow (Oracle) <[email protected]>
Cc: Michael "Camp Drill Sergeant" Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Ted Ts'o <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.14-rc4
# b81679b1 17-Feb-2025 Ma Wupeng <[email protected]>

mm: memory-failure: update ttu flag inside unmap_poisoned_folio

Patch series "mm: memory_failure: unmap poisoned folio during migrate
properly", v3.

Fix two bugs during folio migration if the folio

mm: memory-failure: update ttu flag inside unmap_poisoned_folio

Patch series "mm: memory_failure: unmap poisoned folio during migrate
properly", v3.

Fix two bugs during folio migration if the folio is poisoned.


This patch (of 3):

Commit 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to
TTU_HWPOISON") introduce TTU_HWPOISON to replace TTU_IGNORE_HWPOISON in
order to stop send SIGBUS signal when accessing an error page after a
memory error on a clean folio. However during page migration, anon folio
must be set with TTU_HWPOISON during unmap_*(). For pagecache we need
some policy just like the one in hwpoison_user_mappings to set this flag.
So move this policy from hwpoison_user_mappings to unmap_poisoned_folio to
handle this warning properly.

Warning will be produced during unamp poison folio with the following log:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 365 at mm/rmap.c:1847 try_to_unmap_one+0x8fc/0xd3c
Modules linked in:
CPU: 1 UID: 0 PID: 365 Comm: bash Tainted: G W 6.13.0-rc1-00018-gacdb4bbda7ab #42
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0x8fc/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0x8fc/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
---[ end trace 0000000000000000 ]---

[[email protected]: unmap_poisoned_folio(): remove shadowed local `mapping', per Miaohe]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON")
Signed-off-by: Ma Wupeng <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Ma Wupeng <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1
# 1751f872 28-Jan-2025 Joel Granados <[email protected]>

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysc

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
virtual patch

@
depends on !(file in "net")
disable optional_qualifier
@

identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@

+ const
struct ctl_table table_name [] = { ... };

sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c

Reviewed-by: Song Liu <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/
Reviewed-by: Martin K. Petersen <[email protected]> # SCSI
Reviewed-by: Darrick J. Wong <[email protected]> # xfs
Acked-by: Jani Nikula <[email protected]>
Acked-by: Corey Minyard <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Bill O'Donnell <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Ashutosh Dixit <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6
# 408a8dc6 29-Oct-2024 zhangguopeng <[email protected]>

mm/memory-failure: replace sprintf() with sysfs_emit()

As Documentation/filesystems/sysfs.rst suggested, show() should only use
sysfs_emit() or sysfs_emit_at() when formatting the value to be return

mm/memory-failure: replace sprintf() with sysfs_emit()

As Documentation/filesystems/sysfs.rst suggested, show() should only use
sysfs_emit() or sysfs_emit_at() when formatting the value to be returned
to user space.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: zhangguopeng <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2
# 68158bfa 05-Oct-2024 Matthew Wilcox (Oracle) <[email protected]>

mm: mass constification of folio/page pointers

Now that page_pgoff() takes const pointers, we can constify the pointers
to a lot of functions.

Link: https://lkml.kernel.org/r/20241005200121.3231142

mm: mass constification of folio/page pointers

Now that page_pgoff() takes const pointers, we can constify the pointers
to a lot of functions.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 713da0b3 05-Oct-2024 Matthew Wilcox (Oracle) <[email protected]>

mm: renovate page_address_in_vma()

This function doesn't modify any of its arguments, so if we make a few
other functions take const pointers, we can make page_address_in_vma()
take const pointers t

mm: renovate page_address_in_vma()

This function doesn't modify any of its arguments, so if we make a few
other functions take const pointers, we can make page_address_in_vma()
take const pointers too. All of its callers have the containing folio
already, so pass that in as an argument instead of recalculating it. Also
add kernel-doc

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# f7470591 05-Oct-2024 Matthew Wilcox (Oracle) <[email protected]>

mm: convert page_to_pgoff() to page_pgoff()

Patch series "page->index removals in mm", v2.

As part of shrinking struct page, we need to stop using page->index. This
patchset gets rid of most of th

mm: convert page_to_pgoff() to page_pgoff()

Patch series "page->index removals in mm", v2.

As part of shrinking struct page, we need to stop using page->index. This
patchset gets rid of most of the remaining references to page->index in
mm, as well as increasing the number of functions which take a const
folio/page pointer. It shrinks the text segment of mm by a few hundred
bytes in my test config, probably mostly from removing calls to
compound_head() in page_to_pgoff().


This patch (of 7):

Change the function signature to pass in the folio as all three callers
have it. This removes a reference to page->index, which we're trying to
get rid of. And add kernel-doc.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6
# f1264e95 27-Aug-2024 Kefeng Wang <[email protected]>

mm: migrate: add isolate_folio_to_list()

Add isolate_folio_to_list() helper to try to isolate HugeTLB, no-LRU
movable and LRU folios to a list, which will be reused by
do_migrate_range() from memory

mm: migrate: add isolate_folio_to_list()

Add isolate_folio_to_list() helper to try to isolate HugeTLB, no-LRU
movable and LRU folios to a list, which will be reused by
do_migrate_range() from memory hotplug soon, also drop the
mf_isolate_folio() since we could directly use new helper in the
soft_offline_in_use_page().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Tested-by: Miaohe Lin <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 16038c4f 27-Aug-2024 Kefeng Wang <[email protected]>

mm: memory-failure: add unmap_poisoned_folio()

Add unmap_poisoned_folio() helper which will be reused by
do_migrate_range() from memory hotplug soon.

[[email protected]: whitespace tweak, p

mm: memory-failure: add unmap_poisoned_folio()

Add unmap_poisoned_folio() helper which will be reused by
do_migrate_range() from memory hotplug soon.

[[email protected]: whitespace tweak, per Miaohe Lin]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3
# d75abd0d 06-Aug-2024 Waiman Long <[email protected]>

mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu

The memory_failure_cpu structure is a per-cpu structure. Access to its
content requires the use of get_cpu_var() to lock in the cu

mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu

The memory_failure_cpu structure is a per-cpu structure. Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption. The use of a regular spinlock_t for locking purpose
is fine for a non-RT kernel.

Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping
lock in a preemption disabled context is illegal resulting in the
following kind of warning.

[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
[12135.732252] preempt_count: 1, expected: 0
[12135.732255] RCU nest depth: 2, expected: 2
:
[12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
[12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
[12135.732433] Call Trace:
[12135.732436] <TASK>
[12135.732450] dump_stack_lvl+0x57/0x81
[12135.732461] __might_resched.cold+0xf4/0x12f
[12135.732479] rt_spin_lock+0x4c/0x100
[12135.732491] memory_failure_queue+0x40/0xe0
[12135.732503] ghes_do_memory_failure+0x53/0x390
[12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0
[12135.732575] ghes_proc+0xf9/0x1a0
[12135.732591] ghes_notify_hed+0x6a/0x150
[12135.732602] notifier_call_chain+0x43/0xb0
[12135.732626] blocking_notifier_call_chain+0x43/0x60
[12135.732637] acpi_ev_notify_dispatch+0x47/0x70
[12135.732648] acpi_os_execute_deferred+0x13/0x20
[12135.732654] process_one_work+0x41f/0x500
[12135.732695] worker_thread+0x192/0x360
[12135.732715] kthread+0x111/0x140
[12135.732733] ret_from_fork+0x29/0x50
[12135.732779] </TASK>

Fix it by using a raw_spinlock_t for locking instead.

Also move the pr_err() out of the lock critical section and after
put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
with this call.

[[email protected]: don't hold percpu ref across pr_err(), per Miaohe]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.11-rc2, v6.11-rc1, v6.10
# 8a78882d 08-Jul-2024 Miaohe Lin <[email protected]>

mm/memory-failure: remove obsolete MF_MSG_DIFFERENT_COMPOUND

The page cannot become compound pages again just after a folio is split as
an extra refcnt is held. So the MF_MSG_DIFFERENT_COMPOUND cas

mm/memory-failure: remove obsolete MF_MSG_DIFFERENT_COMPOUND

The page cannot become compound pages again just after a folio is split as
an extra refcnt is held. So the MF_MSG_DIFFERENT_COMPOUND case is
obsolete and can be removed to get rid of this false assumption and code
burden. But add one WARN_ON() here to keep the situation clear.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.10-rc7
# e6c0c032 02-Jul-2024 Christophe Leroy <[email protected]>

mm: provide mm_struct and address to huge_ptep_get()

On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is
a PTE entry or a PMD entry. This cannot be known with the PMD entry
i

mm: provide mm_struct and address to huge_ptep_get()

On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is
a PTE entry or a PMD entry. This cannot be known with the PMD entry
itself because there is no easy way to know it from the content of the
entry.

So huge_ptep_get() will need to know either the size of the page or get
the pmd.

In order to be consistent with huge_ptep_get_and_clear(), give mm and
address to huge_ptep_get().

Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.10-rc6
# 56374430 26-Jun-2024 Jiaqi Yan <[email protected]>

mm/memory-failure: userspace controls soft-offlining pages

Correctable memory errors are very common on servers with large amount of
memory, and are corrected by ECC. Soft offline is kernel's addit

mm/memory-failure: userspace controls soft-offlining pages

Correctable memory errors are very common on servers with large amount of
memory, and are corrected by ECC. Soft offline is kernel's additional
recovery handling for memory pages having (excessive) corrected memory
errors. Impacted page is migrated to a healthy page if inuse; the
original page is discarded for any future use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page.
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage. If
userspace has not acknowledged such behavior, it may be surprised when
later failed to mmap hugepages due to lack of hugepages. In case of a
transparent hugepage, it will be split into 4K pages as well; userspace
will stop enjoying the transparent performance.

In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not doing
under the hood. But today there are at least 2 such cases doing so:
1. when GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed
by kernel's memory failure recovery.

This commit gives userspace the control of softofflining any page: kernel
only soft offlines raw page / transparent hugepage / HugeTLB hugepage if
userspace has agreed to. The interface to userspace is a new sysctl at
/proc/sys/vm/enable_soft_offline. By default its value is set to 1 to
preserve existing behavior in kernel. When set to 0, soft-offline (e.g.
MADV_SOFT_OFFLINE) will fail with EOPNOTSUPP.

[[email protected]: v7]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiaqi Yan <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Frank van der Linden <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 865319f7 26-Jun-2024 Jiaqi Yan <[email protected]>

mm/memory-failure: refactor log format in soft offline code

Patch series "Userspace controls soft-offline pages", v6.

Correctable memory errors are very common on servers with large amount of
memor

mm/memory-failure: refactor log format in soft offline code

Patch series "Userspace controls soft-offline pages", v6.

Correctable memory errors are very common on servers with large amount of
memory, and are corrected by ECC, but with two pain points to users:

1. Correction usually happens on the fly and adds latency overhead
2. Not-fully-proved theory states excessive correctable memory
errors can develop into uncorrectable memory error.

Soft offline is kernel's additional solution for memory pages having
(excessive) corrected memory errors. Impacted page is migrated to healthy
page if it is in use, then the original page is discarded for any future
use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page.
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage. If
userspace has not acknowledged such behavior, it may be surprised when
later mmap hugepages MAP_FAILED due to lack of hugepages. In case of a
transparent hugepage, it will be split into 4K pages as well; userspace
will stop enjoying the transparent performance.

In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not doing
under the hood. But today there are at least 2 such cases:

1. GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and when the counter for a PFN reaches threshold

In both cases, userspace has no control of the soft offline performed by
kernel's memory failure recovery.

This patch series give userspace the control of softofflining any page:
kernel only soft offlines raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to. The interface to userspace is a new
sysctl called enable_soft_offline under /proc/sys/vm. By default
enable_soft_line is 1 to preserve existing behavior in kernel.


This patch (of 4):

Logs from soft_offline_page and soft_offline_in_use_page have different
formats than majority of the memory failure code:

"Memory failure: 0x${pfn}: ${lower_case_message}"

Convert them to the following format:

"Soft offline: 0x${pfn}: ${lower_case_message}"

No functional change in this commit.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiaqi Yan <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Reviewed-by: Lance Yang <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Frank van der Linden <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.10-rc5
# 5cea5666 19-Jun-2024 Jiaqi Yan <[email protected]>

mm/memory-failure: refactor log format in unpoison_memory

Logs from memory_failure and other memory-failure.c code follow the
format:

"Memory failure: 0x{pfn}: ${lower_case_message}"

Convert the

mm/memory-failure: refactor log format in unpoison_memory

Logs from memory_failure and other memory-failure.c code follow the
format:

"Memory failure: 0x{pfn}: ${lower_case_message}"

Convert the logs in unpoison_memory to follow similar format:

"Unpoison: 0x${pfn}: ${lower_case_message}"

For example (from local test):
[ 1331.938397] Unpoison: 0x144bc8: page was already unpoisoned

No functional change in this commit.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiaqi Yan <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.10-rc4
# e5d89670 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: correct comment in me_swapcache_dirty

Dirty swap cache page could live both in page table (not page cache) and
swap cache when freshly swapped in. Correct comment.

Link: https:/

mm/memory-failure: correct comment in me_swapcache_dirty

Dirty swap cache page could live both in page table (not page cache) and
swap cache when freshly swapped in. Correct comment.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# d49f2366 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: remove obsolete comment in kill_proc()

When user sets SIGBUS to SIG_IGN, it won't cause loop now. For action
required mce error, SIGBUS cannot be blocked. Also when a hwpoisoned

mm/memory-failure: remove obsolete comment in kill_proc()

When user sets SIGBUS to SIG_IGN, it won't cause loop now. For action
required mce error, SIGBUS cannot be blocked. Also when a hwpoisoned page
is re-accessed, kill_accessing_process() will be called to kill the
process.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# b71340ef 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: fix comment of get_hwpoison_page()

When return value is 0, it could also means the page is free hugetlb page
or free buddy page. Fix the corresponding comment.

Link: https://lkm

mm/memory-failure: fix comment of get_hwpoison_page()

When return value is 0, it could also means the page is free hugetlb page
or free buddy page. Fix the corresponding comment.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 28eab7d4 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: remove obsolete comment in unpoison_memory()

Since commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to
allow larger rcu_head"), folio->_mapcount is not overloaded wi

mm/memory-failure: remove obsolete comment in unpoison_memory()

Since commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to
allow larger rcu_head"), folio->_mapcount is not overloaded with SLAB.
Update corresponding comment.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 96e13a4e 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: use helper macro task_pid_nr()

Use helper macro task_pid_nr() to get the pid of a task. No functional
change intended.

Link: https://lkml.kernel.org/r/20240612071835.157004-9-li

mm/memory-failure: use helper macro task_pid_nr()

Use helper macro task_pid_nr() to get the pid of a task. No functional
change intended.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 5a8b01be 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: don't export hwpoison_filter() when !CONFIG_HWPOISON_INJECT

When CONFIG_HWPOISON_INJECT is not enabled, there is no user of the
hwpoison_filter() outside memory-failure. So there

mm/memory-failure: don't export hwpoison_filter() when !CONFIG_HWPOISON_INJECT

When CONFIG_HWPOISON_INJECT is not enabled, there is no user of the
hwpoison_filter() outside memory-failure. So there is no need to export
it in that case.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 4d64ab2f 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: remove confusing initialization to count

It's meaningless and confusing to init local variable count to 1. Remove
it. No functional change intended.

Link: https://lkml.kernel.o

mm/memory-failure: remove confusing initialization to count

It's meaningless and confusing to init local variable count to 1. Remove
it. No functional change intended.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 7f8de206 12-Jun-2024 Miaohe Lin <[email protected]>

mm/memory-failure: remove unneeded empty string

Remove unneeded empty string in definition of macro pr_fmt. No functional
change intended.

Link: https://lkml.kernel.org/r/20240612071835.157004-6-l

mm/memory-failure: remove unneeded empty string

Remove unneeded empty string in definition of macro pr_fmt. No functional
change intended.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


12345678910>>...21