|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
aed877c2 |
| 28-Feb-2025 |
Alistair Popple <[email protected]> |
device/dax: properly refcount device dax pages when mapping
Device DAX pages are currently not reference counted when mapped, instead relying on the devmap PTE bit to ensure mapping code will not ge
device/dax: properly refcount device dax pages when mapping
Device DAX pages are currently not reference counted when mapped, instead relying on the devmap PTE bit to ensure mapping code will not get/put references. This requires special handling in various page table walkers, particularly GUP, to manage references on the underlying pgmap to ensure the pages remain valid.
However there is no reason these pages can't be refcounted properly at map time. Doning so eliminates the need for the devmap PTE bit, freeing up a precious PTE bit. It also simplifies GUP as it no longer needs to manage the special pgmap references and can instead just treat the pages normally as defined by vm_normal_page().
Link: https://lkml.kernel.org/r/968d3a8e9157e7492e85d065765c027e525f9fc9.1740713401.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <[email protected]> Tested-by: Alison Schofield <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Asahi Lina <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chunyan Zhang <[email protected]> Cc: Dan Wiliams <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: linmiaohe <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Matthew Wilcow (Oracle) <[email protected]> Cc: Michael "Camp Drill Sergeant" Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Ted Ts'o <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
995abaaa |
| 16-Dec-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
dax: remove access to page->index
This looks like a complete mess (why are we setting page->index at page fault time?), but I no longer care about DAX, and there's no reason to let DAX hold us back
dax: remove access to page->index
This looks like a complete mess (why are we setting page->index at page fault time?), but I no longer care about DAX, and there's no reason to let DAX hold us back from removing page->index.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jane Chu <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
7fcbd978 |
| 27-Sep-2024 |
Kun(llfl) <[email protected]> |
device-dax: correct pgoff align in dax_set_mapping()
pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise, vmf->address not aligned to fault_size will be aligned to the next ali
device-dax: correct pgoff align in dax_set_mapping()
pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise, vmf->address not aligned to fault_size will be aligned to the next alignment, that can result in memory failure getting the wrong address.
It's a subtle situation that only can be observed in page_mapped_in_vma() after the page is page fault handled by dev_dax_huge_fault. Generally, there is little chance to perform page_mapped_in_vma in dev-dax's page unless in specific error injection to the dax device to trigger an MCE - memory-failure. In that case, page_mapped_in_vma() will be triggered to determine which task is accessing the failure address and kill that task in the end.
We used self-developed dax device (which is 2M aligned mapping) , to perform error injection to random address. It turned out that error injected to non-2M-aligned address was causing endless MCE until panic. Because page_mapped_in_vma() kept resulting wrong address and the task accessing the failure address was never killed properly:
[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.049006] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.448042] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.792026] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.162502] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.461116] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.764730] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.042128] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.464293] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.818090] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3787.085297] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page: Recovered
It took us several weeks to pinpoint this problem, but we eventually used bpftrace to trace the page fault and mce address and successfully identified the issue.
Joao added:
; Likely we never reproduce in production because we always pin : device-dax regions in the region align they provide (Qemu does : similarly with prealloc in hugetlb/file backed memory). I think this : bug requires that we touch *unpinned* device-dax regions unaligned to : the device-dax selected alignment (page size i.e. 4K/2M/1G)
Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.1727421694.git.llfl@linux.alibaba.com Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff") Signed-off-by: Kun(llfl) <[email protected]> Tested-by: JianXiong Zhao <[email protected]> Reviewed-by: Joao Martins <[email protected]> Cc: Dan Williams <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
5b198b47 |
| 12-Aug-2024 |
Peter Xu <[email protected]> |
mm/dax: dump start address in fault handler
Patch series "mm/mprotect: Fix dax puds", v5.
Dax supports pud pages for a while, but mprotect on puds was missing since the start. This series tries to
mm/dax: dump start address in fault handler
Patch series "mm/mprotect: Fix dax puds", v5.
Dax supports pud pages for a while, but mprotect on puds was missing since the start. This series tries to fix that by providing pud handling in mprotect(). The goal is to add more types of pud mappings like hugetlb or pfnmaps. This series paves way for it by fixing known pud entries.
Considering nobody reported this until when I looked at those other types of pud mappings, I am thinking maybe it doesn't need to be a fix for stable and this may not need to be backported. I would guess whoever cares about mprotect() won't care 1G dax puds yet, vice versa. I hope fixing that in new kernels would be fine, but I'm open to suggestions.
There're a few small things changed to teach mprotect work on PUDs. E.g. it will need to start with dropping NUMA_HUGE_PTE_UPDATES which may stop making sense when there can be more than one type of huge pte. OTOH, we'll also need to push the mmu notifiers from pmd to pud layers, which might need some attention but so far I think it's safe. For such details, please refer to each patch's commit message.
The mprotect() pud process should be straightforward, as I kept it as simple as possible. There's no NUMA handled as dax simply doesn't support that. There's also no userfault involvements as file memory (even if work with userfault-wp async mode) will need to split a pud, so pud entry doesn't need to yet know userfault's existance (but hugetlb entries will; that's also for later).
This patch (of 7):
Currently the dax fault handler dumps the vma range when dynamic debugging enabled. That's mostly not useful. Dump the (aligned) address instead with the order info.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Edgecombe, Rick P" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
1d5198dd |
| 05-Jun-2024 |
Jeff Johnson <[email protected]> |
dax: add missing MODULE_DESCRIPTION() macros
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dax/hmem/dax_hmem.o WARNING: modpost: missing MODULE
dax: add missing MODULE_DESCRIPTION() macros
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dax/hmem/dax_hmem.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dax/device_dax.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dax/kmem.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dax/dax_pmem.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dax/dax_cxl.o
Add all missing invocations of the MODULE_DESCRIPTION() macro.
[iweiny: edit descriptions]
Signed-off-by: Jeff Johnson <[email protected]> Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Ira Weiny <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
529ce23a |
| 26-Mar-2024 |
Rick Edgecombe <[email protected]> |
mm: switch mm->get_unmapped_area() to a flag
The mm_struct contains a function pointer *get_unmapped_area(), which is set to either arch_get_unmapped_area() or arch_get_unmapped_area_topdown() durin
mm: switch mm->get_unmapped_area() to a flag
The mm_struct contains a function pointer *get_unmapped_area(), which is set to either arch_get_unmapped_area() or arch_get_unmapped_area_topdown() during the initialization of the mm.
Since the function pointer only ever points to two functions that are named the same across all arch's, a function pointer is not really required. In addition future changes will want to add versions of the functions that take additional arguments. So to save a pointers worth of bytes in mm_struct, and prevent adding additional function pointers to mm_struct in future changes, remove it and keep the information about which get_unmapped_area() to use in a flag.
Add the new flag to MMF_INIT_MASK so it doesn't get clobbered on fork by mmf_init_flags(). Most MM flags get clobbered on fork. In the pre-existing behavior mm->get_unmapped_area() would get copied to the new mm in dup_mm(), so not clobbering the flag preserves the existing behavior around inheriting the topdown-ness.
Introduce a helper, mm_get_unmapped_area(), to easily convert code that refers to the old function pointer to instead select and call either arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the flag. Then drop the mm->get_unmapped_area() function pointer. Leave the get_unmapped_area() pointer in struct file_operations alone. The main purpose of this change is to reorganize in preparation for future changes, but it also converts the calls of mm->get_unmapped_area() from indirect branches into a direct ones.
The stress-ng bigheap benchmark calls realloc a lot, which calls through get_unmapped_area() in the kernel. On x86, the change yielded a ~1% improvement there on a retpoline config.
In testing a few x86 configs, removing the pointer unfortunately didn't result in any actual size reductions in the compiled layout of mm_struct. But depending on compiler or arch alignment requirements, the change could shrink the size of mm_struct.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rick Edgecombe <[email protected]> Acked-by: Dave Hansen <[email protected]> Acked-by: Liam R. Howlett <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Dan Williams <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Deepak Gupta <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: H. Peter Anvin (Intel) <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mark Brown <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
210a03c9 |
| 28-Mar-2024 |
Christian Brauner <[email protected]> |
fs: claw back a few FMODE_* bits
There's a bunch of flags that are purely based on what the file operations support while also never being conditionally set or unset. IOW, they're not subject to cha
fs: claw back a few FMODE_* bits
There's a bunch of flags that are purely based on what the file operations support while also never being conditionally set or unset. IOW, they're not subject to change for individual files. Imho, such flags don't need to live in f_mode they might as well live in the fops structs itself. And the fops struct already has that lonely mmap_supported_flags member. We might as well turn that into a generic fop_flags member and move a few flags from FMODE_* space into FOP_* space. That gets us four FMODE_* bits back and the ability for new static flags that are about file ops to not have to live in FMODE_* space but in their own FOP_* space. It's not the most beautiful thing ever but it gets the job done. Yes, there'll be an additional pointer chase but hopefully that won't matter for these flags.
I suspect there's a few more we can move into there and that we can also redirect a bunch of new flag suggestions that follow this pattern into the fop_flags field instead of f_mode.
Link: https://lore.kernel.org/r/20240328-gewendet-spargel-aa60a030ef74@brauner Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7 |
|
| #
1d024e7a |
| 18-Aug-2023 |
Matthew Wilcox (Oracle) <[email protected]> |
mm: remove enum page_entry_size
Remove the unnecessary encoding of page order into an enum and pass the page order directly. That lets us get rid of pe_order().
The switch constructs have to be ch
mm: remove enum page_entry_size
Remove the unnecessary encoding of page order into an enum and pass the page order directly. That lets us get rid of pe_order().
The switch constructs have to be changed to if/else constructs to prevent GCC from warning on builds with 3-level page tables where PMD_ORDER and PUD_ORDER have the same value.
If you are looking at this commit because your driver stopped compiling, look at the previous commit as well and audit your driver to be sure it doesn't depend on mmap_lock being held in its ->huge_fault method.
[[email protected]: use "order %u" to match the (non dev_t) style] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3 |
|
| #
2d515352 |
| 17-May-2023 |
Arnd Bergmann <[email protected]> |
dax: fix missing-prototype warnings
dev_dax_probe declaration for this function was removed with the only caller outside of device.c. Mark it static to avoid a W=1 warning: drivers/dax/device.c:399:
dax: fix missing-prototype warnings
dev_dax_probe declaration for this function was removed with the only caller outside of device.c. Mark it static to avoid a W=1 warning: drivers/dax/device.c:399:5: error: no previous prototype for 'dev_dax_probe'
Similarly, run_dax() causes a warning, but this one is because the declaration needs to be included:
drivers/dax/super.c:337:6: error: no previous prototype for 'run_dax'
Fixes: 83762cb5c7c4 ("dax: Kill DEV_DAX_PMEM_COMPAT") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8 |
|
| #
e9ee9fe3 |
| 10-Feb-2023 |
Dan Williams <[email protected]> |
dax: Assign RAM regions to memory-hotplug by default
The default mode for device-dax instances is backwards for RAM-regions as evidenced by the fact that it tends to catch end users by surprise. "Wh
dax: Assign RAM regions to memory-hotplug by default
The default mode for device-dax instances is backwards for RAM-regions as evidenced by the fact that it tends to catch end users by surprise. "Where is my memory?". Recall that platforms are increasingly shipping with performance-differentiated memory pools beyond typical DRAM and NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic interleave, varied media types, and future fabric attached possibilities).
For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux 'Soft Reserved') attribute is expected to be applied to all memory-pools that are not the general purpose pool. This designation gives an Operating System a chance to defer usage of a memory pool until later in the boot process where its performance properties can be interrogated and administrator policy can be applied.
'Soft Reserved' memory can be anything from too limited and precious to be part of the general purpose pool (HBM), too slow to host hot kernel data structures (some PMEM media), or anything in between. However, in the absence of an explicit policy, the memory should at least be made usable by default. The current device-dax default hides all non-general-purpose memory behind a device interface.
The expectation is that the distribution of users that want the memory online by default vs device-dedicated-access by default follows the Pareto principle. A small number of enlightened users may want to do userspace memory management through a device, but general users just want the kernel to make the memory available with an option to get more advanced later.
Arrange for all device-dax instances not backed by PMEM to default to attaching to the dax_kmem driver. From there the baseline memory hotplug policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=) gates whether the memory comes online or stays offline. Where, if it stays offline, it can be reliably converted back to device-mode where it can be partitioned, or fronted by a userspace allocator.
So, if someone wants device-dax instances for their 'Soft Reserved' memory:
1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot with memhp_default_state=offline, or roll the dice and hope that the kernel has not pinned a page in that memory before step 2.
2/ Write a udev rule to convert the target dax device(s) from 'system-ram' mode to 'devdax' mode:
daxctl reconfigure-device $dax -m devdax -f
Cc: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dave Hansen <[email protected]> Reviewed-by: Gregory Price <[email protected]> Tested-by: Fan Ni <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc7, v6.2-rc6 |
|
| #
1c71222e |
| 26-Jan-2023 |
Suren Baghdasaryan <[email protected]> |
mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma loc
mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness.
[[email protected]: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Sebastian Reichel <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Punit Agrawal <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4 |
|
| #
46de8b97 |
| 09-Feb-2022 |
Matthew Wilcox (Oracle) <[email protected]> |
fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
This is a mechanical change.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Damien Le Moal <damien.lemoal@open
fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
This is a mechanical change.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Damien Le Moal <[email protected]> Acked-by: Damien Le Moal <[email protected]> Tested-by: Mike Marshall <[email protected]> # orangefs Tested-by: David Howells <[email protected]> # afs
show more ...
|
| #
5660a863 |
| 09-Feb-2022 |
Matthew Wilcox (Oracle) <[email protected]> |
fs: Remove noop_invalidatepage()
We used to have to use noop_invalidatepage() to prevent block_invalidatepage() from being called, but that behaviour is now gone.
Signed-off-by: Matthew Wilcox (Ora
fs: Remove noop_invalidatepage()
We used to have to use noop_invalidatepage() to prevent block_invalidatepage() from being called, but that behaviour is now gone.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Damien Le Moal <[email protected]> Acked-by: Damien Le Moal <[email protected]> Tested-by: Mike Marshall <[email protected]> # orangefs Tested-by: David Howells <[email protected]> # afs
show more ...
|
|
Revision tags: v5.17-rc3, v5.17-rc2, v5.17-rc1 |
|
| #
14606001 |
| 14-Jan-2022 |
Joao Martins <[email protected]> |
device-dax: compound devmap support
Use the newly added compound devmap facility which maps the assigned dax ranges as compound pages at a page size of @align.
dax devices are created with a fixed
device-dax: compound devmap support
Use the newly added compound devmap facility which maps the assigned dax ranges as compound pages at a page size of @align.
dax devices are created with a fixed @align (huge page size) which is enforced through as well at mmap() of the device. Faults, consequently happen too at the specified @align specified at the creation, and those don't change throughout dax device lifetime. MCEs unmap a whole dax huge page, as well as splits occurring at the configured page size.
Performance measured by gup_test improves considerably for unpin_user_pages() and altmap with NVDIMMs:
$ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w (pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms [altmap] (pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
$ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w (pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms [altmap with -m 127004] (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
.. as well as unpin_user_page_range_dirty_lock() being just as effective as THP/hugetlb[0] pages.
[0] https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
6ec228b6 |
| 14-Jan-2022 |
Joao Martins <[email protected]> |
device-dax: remove pfn from __dev_dax_{pte,pmd,pud}_fault()
After moving the page mapping to be set prior to pte insertion, the pfn in dev_dax_huge_fault() no longer is necessary. Remove it, as wel
device-dax: remove pfn from __dev_dax_{pte,pmd,pud}_fault()
After moving the page mapping to be set prior to pte insertion, the pfn in dev_dax_huge_fault() no longer is necessary. Remove it, as well as the @pfn argument passed to the internal fault handler helpers.
[[email protected]: fix CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=n build]
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
0e7325f0 |
| 14-Jan-2022 |
Joao Martins <[email protected]> |
device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
Normally, the @page mapping is set prior to inserting the page into a page table entry. Make device-dax adhere to the same ordering, rat
device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
Normally, the @page mapping is set prior to inserting the page into a page table entry. Make device-dax adhere to the same ordering, rather than setting mapping after the PTE is inserted.
The address_space never changes and it is always associated with the same inode and underlying pages. So, the page mapping is set once but cleared when the struct pages are removed/freed (i.e. after {devm_}memunmap_pages()).
Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Joao Martins <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
a0fb038e |
| 14-Jan-2022 |
Joao Martins <[email protected]> |
device-dax: factor out page mapping initialization
Move initialization of page->mapping into a separate helper.
This is in preparation to move the mapping set to be prior to inserting the page tabl
device-dax: factor out page mapping initialization
Move initialization of page->mapping into a separate helper.
This is in preparation to move the mapping set to be prior to inserting the page table entry and also for tidying up compound page handling into one helper.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
fc65c4eb |
| 14-Jan-2022 |
Joao Martins <[email protected]> |
device-dax: ensure dev_dax->pgmap is valid for dynamic devices
Right now, only static dax regions have a valid @pgmap pointer in its struct dev_dax. Dynamic dax case however, do not.
In preparatio
device-dax: ensure dev_dax->pgmap is valid for dynamic devices
Right now, only static dax regions have a valid @pgmap pointer in its struct dev_dax. Dynamic dax case however, do not.
In preparation for device-dax compound devmap support, make sure that dev_dax pgmap field is set after it has been allocated and initialized.
dynamic dax device have the @pgmap is allocated at probe() and it's managed by devm (contrast to static dax region which a pgmap is provided and dax core kfrees it). So in addition to ensure a valid @pgmap, clear the pgmap when the dynamic dax device is released to avoid the same pgmap ranges to be re-requested across multiple region device reconfigs.
Add a static_dev_dax() and use that helper in dev_dax_probe() to ensure the initialization differences between dynamic and static regions are more explicit. While at it, consolidate the ranges initialization when we allocate the @pgmap for the dynamic dax region case. Also take the opportunity to document the differences between static and dynamic da regions.
Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Dan Williams <[email protected]> Signed-off-by: Joao Martins <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
09b80137 |
| 14-Jan-2022 |
Joao Martins <[email protected]> |
device-dax: use struct_size()
Use the struct_size() helper for the size of a struct with variable array member at the end, rather than manually calculating it.
Link: https://lkml.kernel.org/r/20211
device-dax: use struct_size()
Use the struct_size() helper for the size of a struct with variable array member at the end, rather than manually calculating it.
Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Dan Williams <[email protected]> Signed-off-by: Joao Martins <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
b9b5777f |
| 14-Jan-2022 |
Joao Martins <[email protected]> |
device-dax: use ALIGN() for determining pgoff
Rather than calculating @pgoff manually, switch to ALIGN() instead.
Link: https://lkml.kernel.org/r/[email protected] Su
device-dax: use ALIGN() for determining pgoff
Rather than calculating @pgoff manually, switch to ALIGN() instead.
Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Dan Williams <[email protected]> Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2 |
|
| #
83762cb5 |
| 15-Nov-2021 |
Dan Williams <[email protected]> |
dax: Kill DEV_DAX_PMEM_COMPAT
The /sys/class/dax compatibility option has shipped in the kernel for 4 years now which should be sufficient time for tools to abandon the old ABI in favor of the /sys/
dax: Kill DEV_DAX_PMEM_COMPAT
The /sys/class/dax compatibility option has shipped in the kernel for 4 years now which should be sufficient time for tools to abandon the old ABI in favor of the /sys/bus/dax device-model. Delete it now and see if anyone screams.
Since this compatibility option shipped there has been more reports of users being surprised by the compat ABI than surprised by the "new", so the compat infrastructure has outlived its usefulness. Recall that /sys/bus/dax device-model is required for the dax kmem driver which allows PMEM to be used as "System RAM".
The following projects were known to have a dependency on /sys/class/dax and have dropped their dependency as of the listed version:
- ndctl (including libndctl, daxctl, and libdaxctl): v64+ - fio: v3.13+ - pmdk: v1.5.2+
As further evidence this option is no longer needed some distributions have already stopped enabling CONFIG_DEV_DAX_PMEM_COMPAT.
Cc: Ira Weiny <[email protected]> Cc: Dave Jiang <[email protected]> Reported-by: Vishal Verma <[email protected]> Acked-by: Dave Hansen <[email protected]> Reviewed-by: Jane Chu <[email protected]> Link: https://lore.kernel.org/r/163701116195.3784476.726128179293466337.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1 |
|
| #
b82a96c9 |
| 29-Jun-2021 |
Matthew Wilcox (Oracle) <[email protected]> |
fs: remove noop_set_page_dirty()
Use __set_page_dirty_no_writeback() instead. This will set the dirty bit on the page, which will be used to avoid calling set_page_dirty() in the future. It will h
fs: remove noop_set_page_dirty()
Use __set_page_dirty_no_writeback() instead. This will set the dirty bit on the page, which will be used to avoid calling set_page_dirty() in the future. It will have no effect on actually writing the page back, as the pages are not on any LRU lists.
[[email protected]: export __set_page_dirty_no_writeback() to modules]
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Al Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7 |
|
| #
c80b5320 |
| 05-Feb-2021 |
Uwe Kleine-König <[email protected]> |
device-dax: Drop an empty .remove callback
The dax core properly handles a dax driver not having a remove callback. So drop it without changing the effective behaviour.
Signed-off-by: Uwe Kleine-Kö
device-dax: Drop an empty .remove callback
The dax core properly handles a dax driver not having a remove callback. So drop it without changing the effective behaviour.
Signed-off-by: Uwe Kleine-König <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
show more ...
|
|
Revision tags: v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1 |
|
| #
dd3b614f |
| 15-Dec-2020 |
Dmitry Safonov <[email protected]> |
vm_ops: rename .split() callback to .may_split()
Rename the callback to reflect that it's not called *on* or *after* split, but rather some time before the splitting to check if it's possible.
Link
vm_ops: rename .split() callback to .may_split()
Rename the callback to reflect that it's not called *on* or *after* split, but rather some time before the splitting to check if it's possible.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry Safonov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Brian Geffon <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1 |
|
| #
33cf94d7 |
| 13-Oct-2020 |
Joao Martins <[email protected]> |
device-dax: make align a per-device property
Introduce @align to struct dev_dax.
When creating a new device, we still initialize to the default dax_region @align. Child devices belonging to a regi
device-dax: make align a per-device property
Introduce @align to struct dev_dax.
When creating a new device, we still initialize to the default dax_region @align. Child devices belonging to a region may wish to keep a different alignment property instead of a global region-defined one.
Signed-off-by: Joao Martins <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Brice Goglin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: David Airlie <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Hulk Robot <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Yan <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: "Jérôme Glisse" <[email protected]> Cc: Jia He <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Juergen Gross <[email protected]> Cc: kernel test robot <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/160106117957.30709.1142303024324655705.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|