|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
6f110a5e |
| 06-Apr-2025 |
Linus Torvalds <[email protected]> |
Disable SLUB_TINY for build testing
... and don't error out so hard on missing module descriptions.
Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") we used to warn about missing module descriptions, but only when building with extra warnings (ie 'W=1').
After that commit the warning became an unconditional hard error.
And it turns out not all modules have been converted despite the claims to the contrary. As reported by Damian Tometzki, the slub KUnit test didn't have a module description, and apparently nobody ever really noticed.
The reason nobody noticed seems to be that the slub KUnit tests get disabled by SLUB_TINY, which also ends up disabling a lot of other code, both in tests and in slub itself. And so anybody doing full build tests didn't actually see this failure.
So let's disable SLUB_TINY for build-only tests, since it clearly ends up limiting build coverage. Also turn the missing module descriptions error back into a warning, but let's keep it around for non-'W=1' builds.
Reported-by: Damian Tometzki <[email protected]> Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/ Cc: Masahiro Yamada <[email protected]> Cc: Jeff Johnson <[email protected]> Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
e63ee43e |
| 03-Mar-2025 |
David Hildenbrand <[email protected]> |
mm: CONFIG_NO_PAGE_MAPCOUNT to prepare for not maintain per-page mapcounts in large folios
We're close to the finishing line: let's introduce a new CONFIG_NO_PAGE_MAPCOUNT config option where we will incrementally remove any dependencies on per-page mapcounts in large folios. Once that's done, we'll stop maintaining the per-page mapcounts with this config option enabled.
CONFIG_NO_PAGE_MAPCOUNT will be EXPERIMENTAL for now, as we'll have to learn about some of the real world impact of some of the implications.
As writing "!CONFIG_NO_PAGE_MAPCOUNT" is really nasty, let's introduce a helper config option "CONFIG_PAGE_MAPCOUNT" that expresses the negation.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Andy Lutomirks^H^Hski <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lance Yang <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcow (Oracle) <[email protected]> Cc: Michal Koutn <[email protected]> Cc: Muchun Song <[email protected]> Cc: tejun heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
6af8cb80 |
| 03-Mar-2025 |
David Hildenbrand <[email protected]> |
mm/rmap: basic MM owner tracking for large folios (!hugetlb)
For small folios, we traditionally use the mapcount to decide whether it was "certainly mapped exclusively" by a single MM (mapcount == 1) or whether it "maybe mapped shared" by multiple MMs (mapcount > 1). For PMD-sized folios that were PMD-mapped, we were able to use a similar mechanism (single PMD mapping), but for PTE-mapped folios and in the future folios that span multiple PMDs, this does not work.
So we need a different mechanism to handle large folios. Let's add a new mechanism to detect whether a large folio is "certainly mapped exclusively", or whether it is "maybe mapped shared".
We'll use this information next to optimize CoW reuse for PTE-mapped anonymous THP, and to convert folio_likely_mapped_shared() to folio_maybe_mapped_shared(), independent of per-page mapcounts.
For each large folio, we'll have two slots, whereby a slot stores:
(1) an MM id: unique id assigned to each MM
(2) a per-MM mapcount
If a slot is unoccupied, it can be taken by the next MM that maps a folio page.
In addition, we'll remember the current state -- "mapped exclusively" vs. "maybe mapped shared" -- and use a bit spinlock to sync on updates and to reduce the total number of atomic accesses on updates. In the future, it might be possible to squeeze a proper spinlock into "struct folio". For now, keep it simple, as we require the whole thing with THP only, that is incompatible with RT.
As we have to squeeze this information into the "struct folio" of even folios of order-1 (2 pages), and we generally want to reduce the required metadata, we'll assign each MM a unique ID that can fit into an int. In total, we can squeeze everything into 4x int (2x long) on 64bit.
32bit support is a bit challenging, because we only have 2x long == 2x int in order-1 folios. But we can make it work for now, because we neither expect many MMs nor very large folios on 32bit.
We will reliably detect folios as "mapped exclusively" vs. "mapped shared" as long as only two MMs map pages of a folio at one point in time -- for example with fork() and short-lived child processes, or with apps that hand over state from one instance to another.
As soon as three MMs are involved at the same time, we might detect "maybe mapped shared" although the folio is "mapped exclusively".
Example 1:
(1) App1 faults in a (shmem/file-backed) folio page -> Tracked as MM0
(2) App2 faults in a folio page -> Tracked as MM1
(4) App1 unmaps all folio pages
-> We will detect "mapped exclusively".
Example 2:
(1) App1 faults in a (shmem/file-backed) folio page -> Tracked as MM0
(2) App2 faults in a folio page -> Tracked as MM1
(3) App3 faults in a folio page -> No slot available, tracked as "unknown"
(4) App1 and App2 unmap all folio pages
-> We will detect "maybe mapped shared".
Make use of __always_inline to keep possible performance degradation when (un)mapping large folios to a minimum.
Note: by squeezing the two flags into the "unsigned long" that stores the MM ids, we can use non-atomic __bit_spin_unlock() and non-atomic setting/clearing of the "maybe mapped shared" bit, effectively not adding any new atomics on the hot path when updating the large mapcount + new metadata, which further helps reduce the runtime overhead in micro-benchmarks.
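The two-slot scheme and the two examples in this message can be replayed with a small toy model. This is only a sketch under stated assumptions, not the kernel code: the names `LargeFolioOwners`, `map_page`, `unmap_page` and `state` are hypothetical, locking and the bit-packed layout in "struct folio" are left out, and the rule "exclusive only when one slot accounts for every mapping" is an assumption drawn from the description above.

```python
class LargeFolioOwners:
    """Toy model of two-slot MM owner tracking for one large folio."""

    NUM_SLOTS = 2

    def __init__(self):
        # Each slot is None (unoccupied) or [mm_id, per-MM mapcount].
        self.slots = [None] * self.NUM_SLOTS
        self.total_mapcount = 0

    def _slot_of(self, mm_id):
        for index, slot in enumerate(self.slots):
            if slot is not None and slot[0] == mm_id:
                return index
        return None

    def map_page(self, mm_id):
        self.total_mapcount += 1
        index = self._slot_of(mm_id)
        if index is None and None in self.slots:
            # An unoccupied slot is taken by the next MM that maps a page.
            index = self.slots.index(None)
            self.slots[index] = [mm_id, 0]
        if index is not None:
            self.slots[index][1] += 1
        # Otherwise no slot is available: the mapping is only counted in
        # total_mapcount, i.e. it is tracked as "unknown".

    def unmap_page(self, mm_id):
        if self.total_mapcount == 0:
            raise ValueError("folio is not mapped")
        self.total_mapcount -= 1
        index = self._slot_of(mm_id)
        if index is not None:
            self.slots[index][1] -= 1
            if self.slots[index][1] == 0:
                self.slots[index] = None  # slot becomes unoccupied again

    def state(self):
        # Exclusive only if a single tracked MM accounts for every mapping.
        if self.total_mapcount == 0 or any(
            slot is not None and slot[1] == self.total_mapcount
            for slot in self.slots
        ):
            return "mapped exclusively"
        return "maybe mapped shared"


# Example 1: two MMs, then the first one unmaps.
example1 = LargeFolioOwners()
example1.map_page("App1")
example1.map_page("App2")
example1.unmap_page("App1")
print(example1.state())  # mapped exclusively

# Example 2: a third MM finds no free slot and stays "unknown".
example2 = LargeFolioOwners()
for app in ("App1", "App2", "App3"):
    example2.map_page(app)
example2.unmap_page("App1")
example2.unmap_page("App2")
print(example2.state())  # maybe mapped shared
```

In the second run only App3 still maps the folio, yet the model answers "maybe mapped shared" because App3 never obtained a slot. This is the conservative false positive the message describes for three concurrent MMs.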
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Andy Lutomirks^H^Hski <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lance Yang <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcow (Oracle) <[email protected]> Cc: Michal Koutn <[email protected]> Cc: Muchun Song <[email protected]> Cc: tejun heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.14-rc5 |
|
| #
d65917c4 |
| 28-Feb-2025 |
Frank van der Linden <[email protected]> |
mm/sparse: allow for alternate vmemmap section init at boot
Add functions that are called just before the per-section memmap is initialized and just before the memmap page structures are initialized. They are called sparse_vmemmap_init_nid_early and sparse_vmemmap_init_nid_late, respectively.
This allows for mm subsystems to add calls to initialize memmap and page structures in a specific way, if using SPARSEMEM_VMEMMAP. Specifically, hugetlb can pre-HVO bootmem allocated pages that way, so that no time and resources are wasted on allocating vmemmap pages, only to free them later (and possibly unnecessarily running the system out of memory in the process).
Refactor some code and export a few convenience functions for external use.
In sparse_init_nid, skip any sections that are already initialized, e.g. they have been initialized by sparse_vmemmap_init_nid_early already.
The hugetlb code to use these functions will be added in a later commit.
Export section_map_size, as any alternate memmap init code will want to use it.
The internal config option to enable this is SPARSEMEM_VMEMMAP_PREINIT, which is selected if an architecture-specific option, ARCH_WANT_HUGETLB_VMEMMAP_PREINIT, is set. In the future, if other subsystems want to do preinit too, they can do it in a similar fashion.
The internal config option is there because a section flag is used, and the number of flags available is architecture-dependent (see mmzone.h). Architectures can decide if there is room for the flag when enabling options that select SPARSEMEM_VMEMMAP_PREINIT.
Fortunately, as of right now, all sparse vmemmap using architectures do have room.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Frank van der Linden <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Joao Martins <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin (Cruise) <[email protected]> Cc: Usama Arif <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
6df8bae8 |
| 29-Jan-2025 |
Yosry Ahmed <[email protected]> |
mm: zbud: remove zbud
The zbud compressed pages allocator is rarely used; most users use zsmalloc. zbud consumes much more memory (it only stores 1 or 2 compressed pages per physical page). The only advantage of zbud is a marginal performance improvement that by no means justifies the memory overhead.
Historically, zsmalloc had significantly worse latency than zbud and z3fold but offered better memory savings. This is no longer the case as shown by a simple recent analysis [1]. In a kernel build test on tmpfs in a limited cgroup, zbud took 2-3% less time than zsmalloc, but at the cost of using ~32% more memory (1.5G vs 1.13G). The tradeoff does not make sense for zbud in any practical scenario.
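As a quick sanity check of the memory figure quoted above, the relative overhead can be recomputed from the two absolute numbers. The helper name `relative_overhead_percent` is hypothetical and the inputs are simply the 1.5G and 1.13G values from the message, taken at face value.

```python
def relative_overhead_percent(candidate, baseline):
    """Return how much larger `candidate` is than `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


zbud_gib = 1.5       # memory used with zbud, as quoted in the message
zsmalloc_gib = 1.13  # memory used with zsmalloc, as quoted in the message

overhead = relative_overhead_percent(zbud_gib, zsmalloc_gib)
print(f"zbud uses {overhead:.1f}% more memory than zsmalloc")
```

The result lands between 32% and 33%, which agrees with the "~32% more memory" stated in the message.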
The only alleged advantage of zbud is not having the dependency on CONFIG_MMU, but CONFIG_SWAP already depends on CONFIG_MMU anyway, and zbud is only used by zswap.
Remove zbud after z3fold's removal, leaving zsmalloc as the one and only zpool allocator. Leave the removal of the zpool API (and its associated config options) to a followup cleanup after no more allocators show up.
Deprecating zbud for a few cycles before removing it was initially proposed [2], like z3fold was marked as deprecated for 2 cycles [3]. However, Johannes rightfully pointed out that the 2 cycles is too short for most downstream consumers, and z3fold was deprecated first only as a courtesy anyway.
[1] https://lore.kernel.org/lkml/CAJD7tkbRF6od-2x_L8-A1QL3=2Ww13sCj4S3i4bNndqF+3+_Vg@mail.gmail.com/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Nhat Pham <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Chengming Zhou <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: WANG Xuerui <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
58ba73e5 |
| 29-Jan-2025 |
Yosry Ahmed <[email protected]> |
mm: z3fold: remove z3fold
Patch series "mm: zswap: remove z3fold and zbud", v2.
After 2 cycles of deprecating z3fold, remove it as well as zbud (rationale in specific patches).
This patch (of 2):
Z3fold has been marked as deprecated for 2 cycles and no one complained, as expected. As there are no known users, remove the code now.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Johannes Weiner <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Nhat Pham <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Chengming Zhou <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: WANG Xuerui <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
c9f8f124 |
| 03-Feb-2025 |
Vlastimil Babka <[email protected]> |
slab: don't batch kvfree_rcu() with SLUB_TINY
kvfree_rcu() is batched for better performance except on TINY_RCU, which is a simple implementation for small UP systems. Similarly SLUB_TINY is an option intended for small systems, whether or not used together with TINY_RCU. In case SLUB_TINY is used with !TINY_RCU, it arguably makes sense to not do the batching and limit the memory footprint. It's also suboptimal to have RCU-specific #ifdefs in slab code.
With that, add CONFIG_KVFREE_RCU_BATCHED to determine whether the batching kvfree_rcu() implementation is used. It is not set by a user prompt, but enabled by default and disabled in case TINY_RCU or SLUB_TINY are enabled.
Use the new config for #ifdef's in slab code and extend their scope to cover all code used by the batched kvfree_rcu(). For example there's no need to perform kvfree_rcu_init() if the batching is disabled.
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
44d46b76 |
| 20-Dec-2024 |
Gregory Price <[email protected]> |
mm: add build-time option for hotplug memory default online type
Memory hotplug presently auto-onlines memory into a zone the kernel deems appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.
The memhp_default_state boot param enables runtime config, but it's not possible to do this at build-time.
Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param.
Selections:
CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE => mhp_default_online_type = "offline"
  Memory will not be onlined automatically.
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO => mhp_default_online_type = "online"
  Memory will be onlined automatically in a zone deemed appropriate by the kernel.
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL => mhp_default_online_type = "online_kernel"
  Memory will be onlined automatically. The zone may allow kernel data (e.g. ZONE_NORMAL).
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE => mhp_default_online_type = "online_movable"
  Memory will be onlined automatically. The zone will be ZONE_MOVABLE.
Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior.
Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.
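The mapping and the migration rule described above can be written down as a small lookup. This is an illustrative sketch only: the helper names `default_online_type` and `migrate_legacy_option` are hypothetical, and the parsing assumes the usual `SYMBOL=y` lines of a .config file; it is not part of the kernel build system.

```python
# Choice symbol -> value of mhp_default_online_type, as listed in the message.
MHP_CHOICES = {
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE": "offline",
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO": "online",
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL": "online_kernel",
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE": "online_movable",
}

LEGACY_SYMBOL = "CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE"


def enabled_symbols(dotconfig_text):
    """Return the set of symbols set to 'y' in .config-style text."""
    symbols = set()
    for line in dotconfig_text.splitlines():
        line = line.strip()
        if line.endswith("=y") and not line.startswith("#"):
            symbols.add(line[:-2])
    return symbols


def default_online_type(dotconfig_text):
    """Map the selected choice to its online type; None if no choice is set."""
    for symbol in enabled_symbols(dotconfig_text):
        if symbol in MHP_CHOICES:
            return MHP_CHOICES[symbol]
    return None


def migrate_legacy_option(dotconfig_text):
    """Pick the new choice symbol that matches the removed legacy option."""
    if LEGACY_SYMBOL in enabled_symbols(dotconfig_text):
        return "CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO"
    return "CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE"


old_config = "CONFIG_MEMORY_HOTPLUG=y\nCONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y\n"
new_symbol = migrate_legacy_option(old_config)
print(new_symbol)                                # CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO
print(default_online_type(new_symbol + "=y\n"))  # online
```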
[[email protected]: update KConfig comments] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gregory Price <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: WANG Xuerui <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.13-rc3, v6.13-rc2 |
|
| #
6375e95f |
| 04-Dec-2024 |
Qi Zheng <[email protected]> |
mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
Now in order to pursue high performance, applications mostly use some high-performance user-mode memory allocators, such as jemalloc or tcmalloc. These memory allocators use madvise(MADV_DONTNEED or MADV_FREE) to release physical memory, but neither MADV_DONTNEED nor MADV_FREE will release page table memory, which may cause huge page table memory usage.
The following is a memory usage snapshot of one process which actually happened on our server:
VIRT: 55t RES: 590g VmPTE: 110g
In this case, most of the page table entries are empty. For such a PTE page where all entries are empty, we can actually free it back to the system for others to use.
As a first step, this commit aims to synchronously free the empty PTE pages in madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases other than madvise(MADV_DONTNEED).
Once an empty PTE is detected, we first try to hold the pmd lock within the pte lock. If successful, we clear the pmd entry directly (fast path). Otherwise, we wait until the pte lock is released, then re-hold the pmd and pte locks and loop PTRS_PER_PTE times to check pte_none() to re-detect whether the PTE page is empty and free it (slow path).
For other cases such as madvise(MADV_FREE), consider scanning and freeing empty PTE pages asynchronously in the future.
The following code snippet can show the effect of optimization:
    mmap 50G
    while (1) {
        for (; i < 1024 * 25; i++) {
            touch 2M memory
            madvise MADV_DONTNEED 2M
        }
    }
As we can see, the memory usage of VmPTE is reduced:
            before        after
    VIRT    50.0 GB       50.0 GB
    RES     3.1 MB        3.1 MB
    VmPTE   102640 KB     240 KB
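The "before" VmPTE figure can be roughly reproduced with a back-of-the-envelope calculation. The sketch below assumes 4 KiB base pages and 8-byte page table entries (typical for x86-64, but not stated in the message), so one PTE page maps 2 MiB; the function name `pte_table_footprint_kib` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZE = 4 * KIB   # assumption: 4 KiB base pages
PTE_ENTRY_SIZE = 8    # assumption: 8-byte page table entries


def pte_table_footprint_kib(mapped_bytes):
    """KiB of PTE pages needed if every PTE page in the range is populated."""
    ptes_per_table = PAGE_SIZE // PTE_ENTRY_SIZE   # 512 entries per PTE page
    bytes_per_table = ptes_per_table * PAGE_SIZE   # one PTE page maps 2 MiB
    tables = -(-mapped_bytes // bytes_per_table)   # ceiling division
    return tables * PAGE_SIZE // KIB


footprint = pte_table_footprint_kib(50 * GIB)
print(footprint)           # 102400
print(102640 - footprint)  # 240
```

Under these assumptions, touching every 2M chunk of the 50G mapping leaves 25600 PTE pages behind (the same count as the 1024 * 25 loop iterations), or 102400 KB. That is close to the reported 102640 KB, and the remaining 240 KB equals the "after" value, which would be consistent with nearly all of the loop's PTE pages being reclaimed.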
[[email protected]: fix uninitialized symbol 'ptl'] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/92aba2b319a734913f18ba41e7d86a265f0b84e2.1733305182.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jann Horn <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: Zach O'Keefe <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4 |
|
| #
b7c5f9a1 |
| 15-Oct-2024 |
Huang Ying <[email protected]> |
resource: remove dependency on SPARSEMEM from GET_FREE_REGION
We want to use the functions (get_free_mem_region()) configured via GET_FREE_REGION in resource kunit tests. However, GET_FREE_REGION depends on SPARSEMEM now. As a result, resource kunit tests cannot be built on some architectures lacking SPARSEMEM, or cause a config warning as follows:
WARNING: unmet direct dependencies detected for GET_FREE_REGION
  Depends on [n]: SPARSEMEM [=n]
  Selected by [y]:
  - RESOURCE_KUNIT_TEST [=y] && RUNTIME_TESTING_MENU [=y] && KUNIT [=y]
When get_free_mem_region() was introduced the only consumers were those looking to pass the address range to memremap_pages(). That address range needed to be mindful of the maximum addressable platform physical address which at the time only SPARSEMEM defined via MAX_PHYSMEM_BITS.
Given that memremap_pages() also depended on SPARSEMEM via ZONE_DEVICE, it was easier to just depend on that definition than invent a general MAX_PHYSMEM_BITS concept outside of SPARSEMEM.
Turns out that decision was buggy and did not account for KASAN consumption of physical address space. That problem was resolved recently with commit ea72ce5da228 ("x86/kaslr: Expose and use the end of the physical memory address space"), and GET_FREE_REGION dropped its MAX_PHYSMEM_BITS dependency.
Then commit 99185c10d5d9 ("resource, kunit: add test case for region_intersects()"), went ahead and fixed up the only remaining dependency on SPARSEMEM which was usage of the PA_SECTION_SHIFT macro for setting the default alignment. A PAGE_SIZE fallback is fine in the SPARSEMEM=n case.
With those build dependencies gone GET_FREE_REGION no longer depends on SPARSEMEM. So, the patch removes dependency on SPARSEMEM from GET_FREE_REGION to fix the build issues.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 99185c10d5d9 ("resource, kunit: add test case for region_intersects()") Signed-off-by: "Huang, Ying" <[email protected]> Co-developed-by: Dan Williams <[email protected]> Signed-off-by: Dan Williams <[email protected]> Tested-by: Guenter Roeck <[email protected]> Acked-by: David Hildenbrand <[email protected]> Tested-by: Nathan Chancellor <[email protected]> # build Cc: Arnd Bergmann <[email protected]> Cc: Jonathan Cameron <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.12-rc3, v6.12-rc2 |
|
| #
bcc9d04e |
| 01-Oct-2024 |
Mark Brown <[email protected]> |
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
Since multiple architectures have support for shadow stacks and we need to select support for this feature in several places in the generic code, provide a generic config option that the architectures can select.
Suggested-by: David Hildenbrand <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Deepak Gupta <[email protected]> Reviewed-by: Rick Edgecombe <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Reviewed-by: Kees Cook <[email protected]> Tested-by: Kees Cook <[email protected]> Acked-by: Shuah Khan <[email protected]> Reviewed-by: Thiago Jung Bauermann <[email protected]> Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
|
|
Revision tags: v6.12-rc1 |
|
| #
a3344078 |
| 24-Sep-2024 |
Guenter Roeck <[email protected]> |
mm: make SPLIT_PTE_PTLOCKS depend on SMP
SPLIT_PTE_PTLOCKS depends on "NR_CPUS >= 4". Unfortunately, that evaluates to true if there is no NR_CPUS configuration option. This results in CONFIG_SPLIT_PTE_PTLOCKS=y for mac_defconfig. This in turn causes the m68k "q800" and "virt" machines to crash in qemu if debugging options are enabled.
Making CONFIG_SPLIT_PTE_PTLOCKS dependent on the existence of NR_CPUS does not work since a dependency on the existence of a numeric Kconfig entry always evaluates to false. Example:
    config HAVE_NO_NR_CPUS
        def_bool y
        depends on !NR_CPUS

After adding this to a Kconfig file, "make defconfig" includes:
    $ grep NR_CPUS .config
    CONFIG_NR_CPUS=64
    CONFIG_HAVE_NO_NR_CPUS=y
Defining NR_CPUS for m68k does not help either since many architectures define NR_CPUS only for SMP configurations.
Make SPLIT_PTE_PTLOCKS depend on SMP instead to solve the problem.
Link: https://lkml.kernel.org/r/[email protected] Fixes: 394290cba966 ("mm: turn USE_SPLIT_PTE_PTLOCKS / USE_SPLIT_PTE_PTLOCKS into Kconfig options") Signed-off-by: Guenter Roeck <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.11, v6.11-rc7 |
|
| #
7a2369b7 |
| 04-Sep-2024 |
Yosry Ahmed <[email protected]> |
mm: z3fold: deprecate CONFIG_Z3FOLD
The z3fold compressed pages allocator is rarely used; most users use zsmalloc. The only disadvantage of zsmalloc in comparison is the dependency on MMU, and zbud is a more common option for !MMU as it was the default zswap allocator for a long time.
Historically, zsmalloc had worse latency than zbud and z3fold but offered better memory savings. This is no longer the case as shown by a simple recent analysis [1]. That analysis showed that z3fold does not have any advantage over zsmalloc or zbud considering both performance and memory usage. In a kernel build test on tmpfs in a limited cgroup, z3fold took 3% more time and used 1.8% more memory. The latency of zswap_load() was 7% higher, and that of zswap_store() was 10% higher. Zsmalloc is better in all metrics.
Moreover, z3fold apparently has latent bugs, which were made noticeable by a recent soft lockup bug report with z3fold [2]. Switching to zsmalloc not only fixed the problem, but also reduced the swap usage from 6~8G to 1~2G. Other users have also reported being bitten by mistakenly enabling z3fold.
Other than hurting users, z3fold is repeatedly causing wasted engineering effort. Apart from investigating the above bug, it came up in multiple development discussions (e.g. [3]) as something we need to handle, when there aren't any legit users (at least not intentionally).
The natural course of action is to deprecate z3fold, and remove it in a few cycles if no objections are raised by active users. Next on the list should be zbud, as it offers marginal latency gains at the cost of huge memory waste when compared to zsmalloc. That one will need to wait until zsmalloc does not depend on MMU.
Rename the user-visible config option from CONFIG_Z3FOLD to CONFIG_Z3FOLD_DEPRECATED so that users with CONFIG_Z3FOLD=y get a new prompt with explanation during make oldconfig. Also, remove CONFIG_Z3FOLD=y from defconfigs.
[1] https://lore.kernel.org/lkml/CAJD7tkbRF6od-2x_L8-A1QL3=2Ww13sCj4S3i4bNndqF+3+_Vg@mail.gmail.com/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/CAJD7tkbnmeVugfunffSovJf9FAgy9rhBVt_tx=nxUveLUfqVsA@mail.gmail.com/
[[email protected]: deprecate ZSWAP_ZPOOL_DEFAULT_Z3FOLD as well] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Chris Down <[email protected]> Acked-by: Nhat Pham <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vitaly Wool <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.11-rc6 |
|
| #
6857be5f |
| 26-Aug-2024 |
Peter Xu <[email protected]> |
mm: introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud
Patch series "mm: Support huge pfnmaps", v2.
Overview
========
This series implements huge pfnmaps support for mm in general. Huge pfnmap allows e.g. VM_PFNMAP vmas to map in either PMD or PUD levels, similar to what we do with dax / thp / hugetlb so far to benefit from TLB hits. Now we extend that idea to PFN mappings, e.g. PCI MMIO bars where it can grow as large as 8GB or even bigger.
Currently, only x86_64 (1G+2M) and arm64 (2M) are supported. The last patch (from Alex Williamson) will be the first user of huge pfnmap, so as to enable vfio-pci driver to fault in huge pfn mappings.
Implementation
==============
In reality, it's relatively simple to add such support compared to many other types of mappings, because of PFNMAP's specialties when there's no vmemmap backing it, so that most of the kernel routines on huge mappings should simply already fail for them, like GUPs or old-school follow_page() (which was recently rewritten to be folio_walk* APIs by David).
One trick here is that we're still immature on PUDs in generic paths here and there, as DAX is so far the only user. This patchset will add the 2nd user of it. Hugetlb can be a 3rd user if the hugetlb unification work can go on smoothly, but that is to be discussed later.
The other trick is how to allow gup-fast working for such huge mappings even if there's no direct sign of knowing whether it's a normal page or MMIO mapping. This series chose to keep the pte_special solution, so that it reuses similar idea on setting a special bit to pfnmap PMDs/PUDs so that gup-fast will be able to identify them and fail properly.
Along the way, we'll also notice that the major pgtable pfn walker, aka, follow_pte(), will need to retire soon due to the fact that it only works with ptes. A new set of simple API is introduced (follow_pfnmap* API) to be able to do whatever follow_pte() can already do, plus that it can also process huge pfnmaps now. Half of this series is about that and converting all existing pfnmap walkers to use the new API properly. Hopefully the new API also looks better to avoid exposing e.g. pgtable lock details into the callers, so that it can be used in an even more straightforward way.
Here, three more options will be introduced and involved in huge pfnmap:
- ARCH_SUPPORTS_HUGE_PFNMAP
Arch developers will need to select this option in the arch's Kconfig when huge pfnmap is supported. After this patchset is applied, both x86_64 and arm64 will start to enable it by default.
- ARCH_SUPPORTS_PMD_PFNMAP / ARCH_SUPPORTS_PUD_PFNMAP
These options are for driver developers to identify whether the current arch / config supports huge pfnmaps, and to decide whether the huge pfnmap APIs can be used to inject them. One can refer to the last vfio-pci patch from Alex for how to use them properly in a device driver.
So after the whole set is applied, and if one enables some dynamic debug lines in the vfio-pci core files, we should observe things like:
  vfio-pci 0000:00:06.0: vfio_pci_mmap_huge_fault(,order = 9) BAR 0 page offset 0x0: 0x100
  vfio-pci 0000:00:06.0: vfio_pci_mmap_huge_fault(,order = 9) BAR 0 page offset 0x200: 0x100
  vfio-pci 0000:00:06.0: vfio_pci_mmap_huge_fault(,order = 9) BAR 0 page offset 0x400: 0x100
In this specific case, it says that vfio-pci faults in PMDs properly for a few BAR0 offsets.
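The "order = 9" in those debug lines can be checked with a little arithmetic. The sketch below assumes 4 KiB base pages (a common x86_64 configuration, not something the log itself states), and the `demo_*` names are hypothetical helpers written for this illustration, not kernel functions.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, i.e. a page shift of 12. */
#define DEMO_PAGE_SHIFT 12u

/* Number of base pages covered by one mapping of the given order. */
static inline unsigned long long demo_order_to_pages(unsigned int order)
{
    assert(order < 64u);
    return 1ULL << order;
}

/* Size in bytes of one mapping of the given order. */
static inline unsigned long long demo_order_to_bytes(unsigned int order)
{
    assert(order < 64u - DEMO_PAGE_SHIFT);
    return 1ULL << (order + DEMO_PAGE_SHIFT);
}
```

Under that assumption, order 9 covers 0x200 base pages, or 2 MiB, which is consistent with the page offsets in the log advancing in steps of 0x200. Order 18 would correspond to 1 GiB.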
Patch Layout
============
Patch 1: Introduce the new options mentioned above for huge PFNMAPs
Patch 2: A tiny cleanup
Patch 3-8: Preparation patches for huge pfnmap (including introducing the special bit for pmd/pud)
Patch 9-16: Introduce follow_pfnmap*() API, use it everywhere, and then drop follow_pte() API
Patch 17: Add huge pfnmap support for x86_64
Patch 18: Add huge pfnmap support for arm64
Patch 19: Add vfio-pci support for all kinds of huge pfnmaps (Alex)
TODO
====
More architectures / More page sizes
------------------------------------
Currently only x86_64 (2M+1G) and arm64 (2M) are supported. There seems to be a plan to support arm64 1G later on top of this series [2].
Any arch will need to first support THP / THP_1G, then provide a special bit in pmds/puds to support huge pfnmaps.
remap_pfn_range() support
-------------------------
Currently, remap_pfn_range() still only maps PTEs. With the new option, remap_pfn_range() can logically start to inject either PMDs or PUDs when the alignment requirements match on the VAs.
When the support is there, it should silently benefit all drivers that use remap_pfn_range() in their mmap() handlers, giving a better TLB hit rate and overall faster MMIO accesses, similar to what the processor gets with hugepages.
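The "alignment requirements" mentioned for remap_pfn_range() can be illustrated with a small check. This is only a sketch of the general idea, not the kernel's logic: it assumes a 2 MiB PMD size, and `demo_is_aligned()` and `demo_can_map_pmd()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PMD-level mappings cover 2 MiB. */
#define DEMO_PMD_SIZE (2ULL * 1024 * 1024)

/* True when x is a multiple of align; align must be a power of two. */
static inline bool demo_is_aligned(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x & (align - 1)) == 0;
}

/*
 * A PMD-sized entry can only be placed where the virtual address and the
 * physical address are both PMD-aligned and at least one PMD worth of the
 * range is still left to map.
 */
static inline bool demo_can_map_pmd(uint64_t vaddr, uint64_t paddr,
                                    uint64_t remaining)
{
    return demo_is_aligned(vaddr, DEMO_PMD_SIZE) &&
           demo_is_aligned(paddr, DEMO_PMD_SIZE) &&
           remaining >= DEMO_PMD_SIZE;
}
```

A range that fails the check at some address would simply keep being mapped with PTEs there, which is what remap_pfn_range() does everywhere today according to the text above.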
More driver support
-------------------
VFIO is so far the only consumer of huge pfnmaps after this series is applied. Besides the generic remap_pfn_range() optimization above, a device driver can also try to optimize its mmap() for a better VA alignment for either PMD/PUD sizes. This may, iiuc, normally require userspace changes, as the driver doesn't normally decide the VA at which to map a BAR. But I don't think I know all the drivers well enough to know the full picture.
Credits all go to Alex for helping test the GPU/NIC use cases above.
[0] https://lore.kernel.org/r/[email protected]
[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
This patch (of 19):
This patch introduces the option to add a special pte bit to pmds/puds. Archs can start to define pmd_special / pud_special when supported by selecting the new option. Per-arch support will be added later.
Before that, create fallbacks for these helpers so that they are always available.
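The "fallback helper" pattern described here can be sketched in plain C. Everything below is a userspace illustration under assumptions: `demo_pmd_t`, `DEMO_ARCH_SUPPORTS_PMD_PFNMAP`, the chosen bit position, and the `demo_*` helpers are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PMD entry; not the kernel's pmd_t. */
typedef struct { uint64_t val; } demo_pmd_t;

#ifdef DEMO_ARCH_SUPPORTS_PMD_PFNMAP
/* An opted-in architecture supplies a real bit; bit 58 is an arbitrary pick. */
#define DEMO_PMD_SPECIAL (1ULL << 58)
static inline bool demo_pmd_special(demo_pmd_t pmd)
{
    return (pmd.val & DEMO_PMD_SPECIAL) != 0;
}
static inline demo_pmd_t demo_pmd_mkspecial(demo_pmd_t pmd)
{
    pmd.val |= DEMO_PMD_SPECIAL;
    return pmd;
}
#else
/* Fallbacks: always available, never report "special", never change the entry. */
static inline bool demo_pmd_special(demo_pmd_t pmd)
{
    (void)pmd;
    return false;
}
static inline demo_pmd_t demo_pmd_mkspecial(demo_pmd_t pmd)
{
    return pmd;
}
#endif

/* Usage: generic code calls the helpers unconditionally, with no #ifdef. */
static inline void demo_fallback_usage(void)
{
    demo_pmd_t pmd = { 0x1000 };
    demo_pmd_t marked = demo_pmd_mkspecial(pmd);
    /* Marking an entry must never clear bits that were already set. */
    assert((marked.val & pmd.val) == pmd.val);
    (void)demo_pmd_special(marked);
}
```

The point of the design is that callers stay free of conditional compilation: on a configuration without the option, the helpers compile down to constants and the special-bit paths become dead code.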
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Gavin Shan <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Niklas Schnelle <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
5ad7a998 |
| 03-Sep-2024 |
Sergey Senozhatsky <[email protected]> |
mm: Kconfig: fixup zsmalloc configuration
zsmalloc is not exclusive to zswap. Commit b3fbd58fcbb1 ("mm: Kconfig: simplify zswap configuration") made CONFIG_ZSMALLOC only visible when CONFIG_ZSWAP is selected, which makes it impossible to menuconfig zsmalloc-specific features (stats, chain-size, etc.) on systems that use ZRAM but don't have ZSWAP enabled.
Make zsmalloc depend on both ZRAM and ZSWAP.
Link: https://lkml.kernel.org/r/[email protected] Fixes: b3fbd58fcbb1 ("mm: Kconfig: simplify zswap configuration") Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.11-rc5 |
|
| #
7a87225a |
| 21-Aug-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
x86: remove PG_uncached
Convert x86 to use PG_arch_2 instead of PG_uncached and remove PG_uncached.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
04cb7502 |
| 21-Aug-2024 |
Matthew Wilcox (Oracle) <[email protected]> |
zsmalloc: use all available 24 bits of page_type
Now that we have an extra 8 bits, we don't need to limit ourselves to supporting a 64KiB page size. I'm sure both Hexagon users are grateful, but it does reduce complexity a little. We can also remove reset_first_obj_offset() as calling __ClearPageZsmalloc() will now reset all 32 bits of page_type.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Kent Overstreet <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.11-rc4, v6.11-rc3 |
|
| #
b0c4e27c |
| 07-Aug-2024 |
Mike Rapoport (Microsoft) <[email protected]> |
mm: introduce numa_emulation
Move numa_emulation code from arch/x86 to mm/numa_emulation.c
This code will be later reused by arch_numa.
No functional changes.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (Microsoft) <[email protected]> Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64 Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU] Acked-by: Dan Williams <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David S. Miller <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiaxun Yang <[email protected]> Cc: John Paul Adrian Glaubitz <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Rob Herring (Arm) <[email protected]> Cc: Samuel Holland <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
87482708 |
| 07-Aug-2024 |
Mike Rapoport (Microsoft) <[email protected]> |
mm: introduce numa_memblks
Move code dealing with numa_memblks from arch/x86 to mm/ and add Kconfig options to let x86 select it in its Kconfig.
This code will be later reused by arch_numa.
No functional changes.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (Microsoft) <[email protected]> Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64 Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU] Acked-by: Dan Williams <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David S. Miller <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiaxun Yang <[email protected]> Cc: John Paul Adrian Glaubitz <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Rob Herring (Arm) <[email protected]> Cc: Samuel Holland <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.11-rc2, v6.11-rc1 |
|
| #
394290cb |
| 26-Jul-2024 |
David Hildenbrand <[email protected]> |
mm: turn USE_SPLIT_PTE_PTLOCKS / USE_SPLIT_PMD_PTLOCKS into Kconfig options
Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".
This series is a follow up to the fixes: "[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"
When working on the fixes, I wondered why 8xx is fine (-> never uses split PT locks) and how PT locking even works properly with PMD page table sharing (-> always requires split PMD PT locks).
Let's improve the split PT lock detection, make hugetlb properly depend on it and make 8xx bail out if it would ever get enabled by accident.
As an alternative to patch #3 we could extend the Kconfig SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the code that actually implements it feels a bit nicer for documentation purposes, and there is no need to actually disable it because it should always be disabled (!SMP).
Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks are still getting used where we would expect them.
[1] https://lkml.kernel.org/r/[email protected]
This patch (of 3):
Let's clean that up a bit and prepare for depending on CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.
More cleanups would be reasonable (like the arch-specific "depends on" for CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Qi Zheng <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Muchun Song <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Xu <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.10, v6.10-rc7 |
|
| #
8268614b |
| 02-Jul-2024 |
Christophe Leroy <[email protected]> |
mm: remove CONFIG_ARCH_HAS_HUGEPD
powerpc was the only user of CONFIG_ARCH_HAS_HUGEPD and doesn't use it anymore, so remove all related code.
Link: https://lkml.kernel.org/r/4b10c54c794780b955f3ad6c657d0199dd792146.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Oscar Salvador <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.10-rc6, v6.10-rc5 |
|
| #
854fa98d |
| 21-Jun-2024 |
Ilya Leoshkevich <[email protected]> |
kmsan: disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabled
KMSAN relies on memblock returning all available pages to it (see kmsan_memblock_free_pages()). It partitions these pages into 3 categories: pages available to the buddy allocator, shadow pages and origin pages. This partitioning is static.
If new pages appear after kmsan_init_runtime(), it is considered an error. DEFERRED_STRUCT_PAGE_INIT causes this, so mark it as incompatible with KMSAN.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
34f7c528 |
| 19-Jun-2024 |
Javier Martinez Canillas <[email protected]> |
mm/Kconfig: mention arm64 in DEFAULT_MMAP_MIN_ADDR symbol help text
Currently ppc64 and x86 are mentioned as architectures where a 65536 value is reasonable but arm64 isn't listed and it is also a 64-bit architecture.
The help text says that for "arm" the value should be no higher than 32768 but it's only talking about 32-bit ARM. Adding arm64 to the above list can make this more clear and avoid confusing users who may think that the 32k limit would also apply to 64-bit ARM.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Javier Martinez Canillas <[email protected]> Cc: Brian Masney <[email protected]> Cc: Javier Martinez Canillas <[email protected]> Cc: Maxime Ripard <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
43d746dc |
| 29-May-2024 |
David Hildenbrand <[email protected]> |
mm/zsmalloc: use a proper page type
Let's clean it up: use a proper page type and store our data (offset into a page) in the lower 16 bits as documented.
We won't be able to support 256 KiB base pages, which is acceptable. Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.
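The page-size limit follows from the width of the field that stores the offset. A minimal sketch of that arithmetic, with `demo_offset_fits()` as a hypothetical helper written for this illustration:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True when every byte offset inside a page of page_size bytes
 * (that is, 0 .. page_size - 1) can be stored in a field of `bits` bits.
 */
static inline bool demo_offset_fits(unsigned long long page_size,
                                    unsigned int bits)
{
    assert(page_size > 0 && bits > 0 && bits < 64);
    return (page_size - 1) <= ((1ULL << bits) - 1);
}
```

With 16 bits, the largest storable offset is 65535, so a 64 KiB page (largest offset 65535) fits, while a 256 KiB page (largest offset 262143) does not.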
Based on this, we should do a proper "struct zsdesc" conversion, as proposed in [1].
This removes the last _mapcount/page_type offender.
[1] https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Tested-by: Sergey Senozhatsky <[email protected]> [zram/zsmalloc workloads] Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
67f2df3b |
| 01-Jul-2024 |
Kees Cook <[email protected]> |
mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to support separated kmalloc buckets (in the following kmem_buckets_create() patches and future codetag-based separation). Since this will provide a mitigation for a very common case of exploits, it is recommended to enable this feature for general purpose distros. By default, the new Kconfig will be enabled if CONFIG_SLAB_FREELIST_HARDENED is enabled (and it is added to the hardening.config Kconfig fragment).
To be able to choose which buckets to allocate from, make the buckets available to the internal kmalloc interfaces by adding them as the second argument, rather than depending on the buckets being chosen from the fixed set of global buckets. Where the bucket is not available, pass NULL, which means "use the default system kmalloc bucket set" (the prior existing behavior), as implemented in kmalloc_slab().
To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the top-level macros and static inlines use the buckets argument (where they are stripped out and compiled out respectively). The actual extern functions can then be built without the argument, and the internals fall back to the global kmalloc buckets unconditionally.
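The argument-stripping approach can be sketched in userspace C as follows. This is an illustration of the general macro technique only: `DEMO_SLAB_BUCKETS`, `struct demo_buckets`, and every `demo_*` / `DEMO_*` name are hypothetical, and the function returns a bucket id rather than allocating anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bucket set. */
struct demo_buckets { int id; };

#ifdef DEMO_SLAB_BUCKETS
/* Option on: the bucket pointer is a real parameter and is passed through. */
#define DEMO_DECL_PARAMS(sz, b)  size_t sz, struct demo_buckets *b
#define DEMO_PASS_PARAMS(sz, b)  (sz), (b)
#define DEMO_BUCKET(b)           (b)
#else
/* Option off: the parameter disappears from declaration and call sites. */
#define DEMO_DECL_PARAMS(sz, b)  size_t sz
#define DEMO_PASS_PARAMS(sz, b)  (sz)
#define DEMO_BUCKET(b)           NULL
#endif

/* The default (global) bucket set, used when no bucket is given. */
static struct demo_buckets demo_default_buckets = { 0 };

/* Internal function: its signature depends on the option. */
static inline int demo_do_alloc(DEMO_DECL_PARAMS(size, b))
{
    struct demo_buckets *chosen = DEMO_BUCKET(b);

    assert(size > 0);
    if (!chosen)                 /* NULL means "use the default set" */
        chosen = &demo_default_buckets;
    return chosen->id;           /* report which set would be used */
}

/* Top-level macro: callers always write the bucket argument. */
#define demo_alloc(b, size) demo_do_alloc(DEMO_PASS_PARAMS(size, b))
```

Callers write `demo_alloc(NULL, 32)` in either configuration. Built without `DEMO_SLAB_BUCKETS`, the bucket argument is dropped by the preprocessor, so the internal function carries no extra parameter and always falls back to the default set.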
Co-developed-by: Vlastimil Babka <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|