|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13 |
|
| #
d6ff4c8f |
| 19-Jan-2025 |
Mateusz Guzik <[email protected]> |
fs: avoid mmap sem relocks when coredumping with many missing pages
Dumping processes with large allocated and mostly not-faulted areas is very slow.
Borrowing a test case from Tavian Barnes:
int
fs: avoid mmap sem relocks when coredumping with many missing pages
Dumping processes with large allocated and mostly not-faulted areas is very slow.
Borrowing a test case from Tavian Barnes:
int main(void) { char *mem = mmap(NULL, 1ULL << 40, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0); printf("%p %m\n", mem); if (mem != MAP_FAILED) { mem[0] = 1; } abort(); }
That's 1TB of almost completely not-populated area.
On my test box it takes 13-14 seconds to dump.
The profile shows: - 99.89% 0.00% a.out entry_SYSCALL_64_after_hwframe do_syscall_64 syscall_exit_to_user_mode arch_do_signal_or_restart - get_signal - 99.89% do_coredump - 99.88% elf_core_dump - dump_user_range - 98.12% get_dump_page - 64.19% __get_user_pages - 40.92% gup_vma_lookup - find_vma - mt_find 4.21% __rcu_read_lock 1.33% __rcu_read_unlock - 3.14% check_vma_flags 0.68% vma_is_secretmem 0.61% __cond_resched 0.60% vma_pgtable_walk_end 0.59% vma_pgtable_walk_begin 0.58% no_page_table - 15.13% down_read_killable 0.69% __cond_resched 13.84% up_read 0.58% __cond_resched
Almost 29% of the time is spent relocking the mmap semaphore between calls to get_dump_page() which find nothing.
Whacking that results in times of 10 seconds (down from 13-14).
While here make the thing killable.
The real problem is the page-sized iteration and the real fix would patch it up instead. It is left as an exercise for the mm-familiar reader.
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1 |
|
| #
4f4c549f |
| 22-Dec-2022 |
Catalin Marinas <[email protected]> |
arm64: mte: Avoid the racy walk of the vma list during core dump
The MTE coredump code in arch/arm64/kernel/elfcore.c iterates over the vma list without the mmap_lock held. This can race with anothe
arm64: mte: Avoid the racy walk of the vma list during core dump
The MTE coredump code in arch/arm64/kernel/elfcore.c iterates over the vma list without the mmap_lock held. This can race with another process or userfaultfd concurrently modifying the vma list. Change the for_each_mte_vma macro and its callers to instead use the vma snapshot taken by dump_vma_snapshot() and stored in the cprm object.
Fixes: 6dd8b1a0b6cb ("arm64: mte: Dump the MTE tags in the core file") Cc: <[email protected]> # 5.18.x Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Seth Jenkins <[email protected]> Suggested-by: Seth Jenkins <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
| #
19e183b5 |
| 22-Dec-2022 |
Catalin Marinas <[email protected]> |
elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
A subsequent fix for arm64 will use this parameter to parse the vma information from the snapshot created by dump_vma_snapshot() rat
elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
A subsequent fix for arm64 will use this parameter to parse the vma information from the snapshot created by dump_vma_snapshot() rather than traversing the vma list without the mmap_lock.
Fixes: 6dd8b1a0b6cb ("arm64: mte: Dump the MTE tags in the core file") Cc: <[email protected]> # 5.18.x Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Seth Jenkins <[email protected]> Suggested-by: Seth Jenkins <[email protected]> Cc: Will Deacon <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
| #
736eedc9 |
| 22-Dec-2022 |
Catalin Marinas <[email protected]> |
arm64: mte: Fix double-freeing of the temporary tag storage during coredump
Commit 16decce22efa ("arm64: mte: Fix the stack frame size warning in mte_dump_tag_range()") moved the temporary tag stora
arm64: mte: Fix double-freeing of the temporary tag storage during coredump
Commit 16decce22efa ("arm64: mte: Fix the stack frame size warning in mte_dump_tag_range()") moved the temporary tag storage array from the stack to slab but it also introduced an error in double freeing this object. Remove the in-loop freeing.
Fixes: 16decce22efa ("arm64: mte: Fix the stack frame size warning in mte_dump_tag_range()") Cc: <[email protected]> # 5.18.x Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Seth Jenkins <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4 |
|
| #
e059853d |
| 04-Nov-2022 |
Catalin Marinas <[email protected]> |
arm64: mte: Fix/clarify the PG_mte_tagged semantics
Currently the PG_mte_tagged page flag mostly means the page contains valid tags and it should be set after the tags have been cleared or restored.
arm64: mte: Fix/clarify the PG_mte_tagged semantics
Currently the PG_mte_tagged page flag mostly means the page contains valid tags and it should be set after the tags have been cleared or restored. However, in mte_sync_tags() it is set before setting the tags to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on write in another thread can cause the new page to have stale tags. Similarly, tag reading via ptrace() can read stale tags if the PG_mte_tagged flag is set before actually clearing/restoring the tags.
Fix the PG_mte_tagged semantics so that it is only set after the tags have been cleared or restored. This is safe for swap restoring into a MAP_SHARED or CoW page since the core code takes the page lock. Add two functions to test and set the PG_mte_tagged flag with acquire and release semantics. The downside is that concurrent mprotect(PROT_MTE) on a MAP_SHARED page may cause tag loss. This is already the case for KVM guests if a VMM changes the page protection while the guest triggers a user_mem_abort().
Signed-off-by: Catalin Marinas <[email protected]> [[email protected]: fix build with CONFIG_ARM64_MTE disabled] Signed-off-by: Peter Collingbourne <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Steven Price <[email protected]> Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Peter Collingbourne <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5 |
|
| #
ef770d18 |
| 06-Sep-2022 |
Liam R. Howlett <[email protected]> |
arm64: Change elfcore for_each_mte_vma() to use VMA iterator
Rework for_each_mte_vma() to use a VMA iterator instead of an explicit linked-list.
Link: https://lkml.kernel.org/r/20220906194824.21104
arm64: Change elfcore for_each_mte_vma() to use VMA iterator
Rework for_each_mte_vma() to use a VMA iterator instead of an explicit linked-list.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5 |
|
| #
c35fe2a6 |
| 25-Apr-2022 |
Catalin Marinas <[email protected]> |
elf: Fix the arm64 MTE ELF segment name and value
Unfortunately, the name/value choice for the MTE ELF segment type (PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by PT_AARCH64_UNWI
elf: Fix the arm64 MTE ELF segment name and value
Unfortunately, the name/value choice for the MTE ELF segment type (PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by PT_AARCH64_UNWIND, as defined in the AArch64 ELF ABI (https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst).
Update the ELF segment type value to LOPROC+2 and also change the define to PT_AARCH64_MEMTAG_MTE to match the AArch64 ELF ABI namespace. The AArch64 ELF ABI document is updating accordingly (segment type not previously mentioned in the document).
Signed-off-by: Catalin Marinas <[email protected]> Fixes: 761b9b366cec ("elf: Introduce the ARM MTE ELF segment type") Cc: Will Deacon <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Machado <[email protected]> Cc: Richard Earnshaw <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1 |
|
| #
16decce2 |
| 01-Apr-2022 |
Catalin Marinas <[email protected]> |
arm64: mte: Fix the stack frame size warning in mte_dump_tag_range()
With 64K page configurations, the tags array stored on the stack of the mte_dump_tag_range() function is 2048 bytes, triggering a
arm64: mte: Fix the stack frame size warning in mte_dump_tag_range()
With 64K page configurations, the tags array stored on the stack of the mte_dump_tag_range() function is 2048 bytes, triggering a compiler warning when CONFIG_FRAME_WARN is enabled. Switch to a kmalloc() allocation via mte_allocate_tag_storage().
Signed-off-by: Catalin Marinas <[email protected]> Fixes: 6dd8b1a0b6cb ("arm64: mte: Dump the MTE tags in the core file") Reported-by: kernel test robot <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
| #
a0ab7e5b |
| 04-Apr-2022 |
Will Deacon <[email protected]> |
Revert "arm64: Change elfcore for_each_mte_vma() to use VMA iterator"
This reverts commit 3a4f7ef4bed5bdc77a1ac8132f9f0650bbcb3eae.
Revert this temporary bodge. It only existed to ease integration
Revert "arm64: Change elfcore for_each_mte_vma() to use VMA iterator"
This reverts commit 3a4f7ef4bed5bdc77a1ac8132f9f0650bbcb3eae.
Revert this temporary bodge. It only existed to ease integration with the maple tree work for the 5.18 merge window and that doesn't appear to have landed in any case.
Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5 |
|
| #
3a4f7ef4 |
| 18-Feb-2022 |
Liam Howlett <[email protected]> |
arm64: Change elfcore for_each_mte_vma() to use VMA iterator
Rework for_each_mte_vma() to use a VMA iterator instead of an explicit linked-list. This will allow easy integration with the maple tree
arm64: Change elfcore for_each_mte_vma() to use VMA iterator
Rework for_each_mte_vma() to use a VMA iterator instead of an explicit linked-list. This will allow easy integration with the maple tree work which removes the VMA list altogether.
Signed-off-by: Liam R. Howlett <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] [will: Folded in fix from Catalin] Link: https://lore.kernel.org/r/[email protected] Signed-off--by: Catalin Marinas <[email protected]> Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc4, v5.17-rc3 |
|
| #
6dd8b1a0 |
| 31-Jan-2022 |
Catalin Marinas <[email protected]> |
arm64: mte: Dump the MTE tags in the core file
For each vma mapped with PROT_MTE (the VM_MTE flag set), generate a PT_ARM_MEMTAG_MTE segment in the core file and dump the corresponding tags. The in-
arm64: mte: Dump the MTE tags in the core file
For each vma mapped with PROT_MTE (the VM_MTE flag set), generate a PT_ARM_MEMTAG_MTE segment in the core file and dump the corresponding tags. The in-file size for such segments is 128 bytes per page.
For pages in a VM_MTE vma which are not present in the user page tables or don't have the PG_mte_tagged flag set (e.g. execute-only), just write zeros in the core file.
An example of program headers for two vmas, one 2-page, the other 4-page long:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align ... LOAD 0x030000 0x0000ffff80034000 0x0000000000000000 0x000000 0x002000 RW 0x1000 LOAD 0x030000 0x0000ffff80036000 0x0000000000000000 0x004000 0x004000 RW 0x1000 ... LOPROC+0x1 0x05b000 0x0000ffff80034000 0x0000000000000000 0x000100 0x002000 0 LOPROC+0x1 0x05b100 0x0000ffff80036000 0x0000000000000000 0x000200 0x004000 0
Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Luis Machado <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|