Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |

# 9039b909 | 06-Mar-2025 | Luiz Capitulino <[email protected]>

mm: page_ext: add an iteration API for page extensions
Patch series "mm: page_ext: Introduce new iteration API", v3.
Introduction
============

[ Thanks to David Hildenbrand for identifying the root cause of this issue and providing guidance on how to fix it. The new API idea, bugs and misconceptions are all mine though ]
Currently, trying to reserve 1G pages with page_owner=on and sparsemem causes a crash. The reproducer is very simple:
1. Build the kernel with CONFIG_SPARSEMEM=y and page extension support (e.g. page_owner)
2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
3. Reserve one 1G page at run-time, this should crash (see patch 1 for backtrace)
[ A crash with page_table_check is also possible, but harder to trigger ]
Apparently, starting with commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios") we now pass the full allocation order to page extension clients and the page extension implementation assumes that all PFNs of an allocation range will be stored in the same memory section (which is not true for 1G pages).
To fix this, this series introduces a new iteration API for page extension objects. The API checks whether the next page extension object can be retrieved from the current section or whether it needs to be looked up in another section.
Please, find all details in patch 1.
I tested this series on arm64 and x86 by reserving 1G pages at run-time and doing kernel builds (always with page_owner=on and page_table_check=on).
This patch (of 3):
The page extension implementation assumes that all page extensions of a given page order are stored in the same memory section. The function page_ext_next() relies on this assumption by adding an offset to the current object to return the next adjacent page extension.
This behavior works as expected for flatmem but fails for sparsemem when using 1G pages. Commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios") exposes this issue, making a crash possible when using the page_owner or page_table_check page extensions.

The problem is that for 1G pages, the page extensions may span memory section boundaries and be stored in different memory sections. This issue was not visible before commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios") because alloc_contig_pages() never passed more than MAX_PAGE_ORDER to post_alloc_hook(). However, the series introducing that commit changed this behavior, allowing the full 1G page order to be passed.
Reproducer:
1. Build the kernel with CONFIG_SPARSEMEM=y and page extension support
2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
3. Reserve one 1G page at run-time, this should crash (backtrace below)
To address this issue, this commit introduces a new API for iterating through page extensions. The main iteration macro is for_each_page_ext() and it must be called with the RCU read lock taken. Here's a usage example:
""" struct page_ext_iter iter; struct page_ext *page_ext;
...
rcu_read_lock(); for_each_page_ext(page, 1 << order, page_ext, iter) { struct my_page_ext *obj = get_my_page_ext_obj(page_ext); ... } rcu_read_unlock(); """
The loop construct uses page_ext_iter_next(), which checks whether the iteration has crossed a section boundary. If it has, page_ext_iter_next() retrieves the next page_ext object from the other section.
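As a rough illustration of that advance step, here is a minimal sketch; it assumes a pfn-based iterator and reuses the existing page_ext_next() and lookup_page_ext() helpers, and is not the patch's actual implementation:

struct page_ext_iter {
	unsigned long pfn;
	struct page_ext *page_ext;
};

/* Sketch: advance to the next page's page_ext, re-resolving on section crossings. */
static struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
{
	iter->pfn++;

	if (iter->pfn % PAGES_PER_SECTION) {
		/* Same memory section: the next object is adjacent. */
		iter->page_ext = page_ext_next(iter->page_ext);
	} else {
		/* Crossed a section boundary: look the object up again. */
		iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
	}
	return iter->page_ext;
}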
Thanks to David Hildenbrand for helping identify the root cause and providing suggestions on how to fix and optimize the solution (final implementation and bugs are all mine though).
Lastly, here's the backtrace; without KASAN you can get random crashes:
[ 76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
[ 76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
[ 76.066714]
[ 76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
[ 76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
[ 76.088972] Call trace:
[ 76.091411]  show_stack+0x20/0x38 (C)
[ 76.095073]  dump_stack_lvl+0x80/0xf8
[ 76.098733]  print_address_description.constprop.0+0x88/0x398
[ 76.104476]  print_report+0xa8/0x278
[ 76.108041]  kasan_report+0xa8/0xf8
[ 76.111520]  __asan_report_store4_noabort+0x20/0x30
[ 76.116391]  __update_page_owner_handle+0x238/0x298
[ 76.121259]  __set_page_owner+0xdc/0x140
[ 76.125173]  post_alloc_hook+0x190/0x1d8
[ 76.129090]  alloc_contig_range_noprof+0x54c/0x890
[ 76.133874]  alloc_contig_pages_noprof+0x35c/0x4a8
[ 76.138656]  alloc_gigantic_folio.isra.0+0x2c0/0x368
[ 76.143616]  only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
[ 76.149353]  alloc_pool_huge_folio+0x11c/0x1f8
[ 76.153787]  set_max_huge_pages+0x364/0xca8
[ 76.157961]  __nr_hugepages_store_common+0xb0/0x1a0
[ 76.162829]  nr_hugepages_store+0x108/0x118
[ 76.167003]  kobj_attr_store+0x3c/0x70
[ 76.170745]  sysfs_kf_write+0xfc/0x188
[ 76.174492]  kernfs_fop_write_iter+0x274/0x3e0
[ 76.178927]  vfs_write+0x64c/0x8e0
[ 76.182323]  ksys_write+0xf8/0x1f0
[ 76.185716]  __arm64_sys_write+0x74/0xb0
[ 76.189630]  invoke_syscall.constprop.0+0xd8/0x1e0
[ 76.194412]  do_el0_svc+0x164/0x1e0
[ 76.197891]  el0_svc+0x40/0xe0
[ 76.200939]  el0t_64_sync_handler+0x144/0x168
[ 76.205287]  el0t_64_sync+0x1ac/0x1b0
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/a45893880b7e1601082d39d2c5c8b50bcc096305.1741301089.git.luizcap@redhat.com Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios") Signed-off-by: Luiz Capitulino <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Muchun Song <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |

# 6e65aa55 | 26-Mar-2024 | Matthew Wilcox (Oracle) <[email protected]>

mm: make page_ext_get() take a const argument
In order to constify other functions, we need page_ext_get() to take a const argument. This is no problem, as lookup_page_ext() already takes a const argument.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.9-rc1 |

# dcfe378c | 21-Mar-2024 | Suren Baghdasaryan <[email protected]>

lib: introduce support for page allocation tagging
Introduce helper functions to easily instrument page allocators by storing a pointer to the allocation tag associated with the code that allocated the page in a page_ext field.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Co-developed-by: Kent Overstreet <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Tested-by: Kees Cook <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alex Gaynor <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Benno Lossin <[email protected]> Cc: "Björn Roy Baron" <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Gary Guo <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Wedson Almeida Filho <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3 |

# 67311a36 | 17-Jul-2023 | Kemeng Shi <[email protected]>

mm/page_ext: move page_ext_operations definition under CONFIG_PAGE_EXTENSION
page_ext_operations should only be defined when CONFIG_PAGE_EXTENSION is enabled.
Besides, this may detect missing reliance on CONFIG_PAGE_EXTENSION from future Page Extension clients at compile time.
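A sketch of the shape of the change (the field list approximates the page_ext core of this era; it is not the literal patch):

#ifdef CONFIG_PAGE_EXTENSION

struct page_ext_operations {
	size_t offset;
	size_t size;
	bool (*need)(void);
	void (*init)(void);
	bool need_shared_flags;
};

/* ... the remaining page_ext declarations ... */

#endif /* CONFIG_PAGE_EXTENSION */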
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

# c0a5d93a | 18-Jul-2023 | Kemeng Shi <[email protected]>

mm/page_ext: add common function to get client data from page_ext
Patch series "add page_ext_data to get client data in page_ext".
Currently, clients get data from page_ext by adding an offset which is auto-generated in the page_ext core, which exposes the data layout design inside the page_ext core. This series adds a page_ext_data() function to hide this from clients.
Benefits include:
1. Future clients can call page_ext_data directly instead of defining a new function like get_page_owner to get the data.
2. There is no change to clients if the layout of page_ext data changes.
This patch (of 3):
Add a common page_ext_data() function to get client data. This hides the offset, which is auto-generated in the page_ext core, and with it the design of the page_ext data layout.
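The helper itself can be as small as this sketch, with ops->offset being the per-client offset the core computes at boot:

/* Sketch: the client's data lives ops->offset bytes into the entry. */
static inline void *page_ext_data(struct page_ext *page_ext,
				  struct page_ext_operations *ops)
{
	return (void *)(page_ext) + ops->offset;
}

A client such as page_owner can then write page_ext_data(page_ext, &page_owner_ops) instead of open-coding the offset arithmetic.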
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kemeng Shi <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4 |

# de57807e | 21-Mar-2023 | Mike Rapoport (IBM) <[email protected]>

init,mm: fold late call to page_ext_init() to page_alloc_init_late()
When deferred initialization of struct pages is enabled, page_ext_init() must be called after all the deferred initialization is done, but there is no point in keeping it a separate call in kernel_init_freeable() right after page_alloc_init_late().
Fold the call to page_ext_init() into page_alloc_init_late() and localize deferred_struct_pages variable.
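Conceptually, the fold amounts to the following sketch (the other contents of page_alloc_init_late() are elided):

void __init page_alloc_init_late(void)
{
	/* ... wait for deferred struct-page initialization to complete ... */

	page_ext_init();	/* previously a separate call in kernel_init_freeable() */
}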
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5 |

# 7ec7096b | 17-Jan-2023 | Pasha Tatashin <[email protected]>

mm/page_ext: init page_ext early if there are no deferred struct pages
page_ext must be initialized after all struct pages are initialized. Therefore, page_ext is initialized after page_alloc_init_late(), and can optionally be initialized earlier via the early_page_ext kernel parameter, which as a side effect also disables deferred struct pages.

Allow page_ext to be automatically initialized early when there are no deferred struct pages, in order to be able to use page_ext during kernel boot and, for example, track page allocations early.
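The decision reduces to a single predicate; a sketch (the function name is illustrative, with early_page_ext and deferred_struct_pages standing for the boot parameter and the deferred-init state):

/* Sketch: page_ext can be initialized early if struct pages are not deferred. */
static bool __init page_ext_early(void)
{
	return early_page_ext || !deferred_struct_pages;
}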
[[email protected]: fix build with CONFIG_PAGE_EXTENSION=n] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pasha Tatashin <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Charan Teja Kalla <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Li Zhe <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.2-rc4 |

# 6189eb82 | 13-Jan-2023 | Pasha Tatashin <[email protected]>

mm/page_ext: do not allocate space for page_ext->flags if not needed
There is an 8-byte page_ext->flags field allocated per page whenever CONFIG_PAGE_EXTENSION is enabled. However, not every user of page_ext uses flags. Therefore, check whether flags is needed by at least one user and, if so, allocate space for it.
For example when page_table_check is enabled, on a machine with 128G of memory before the fix:
[ 2.244288] allocated 536870912 bytes of page_ext

after the fix:

[ 2.160154] allocated 268435456 bytes of page_ext
Also, add a kernel-doc comment before page_ext_operations that describes the fields, and remove the check whether need() is set, as that is now a required field.
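A sketch of the sizing pass this implies (names approximate the page_ext core; this is not the literal patch). The shared flags word is counted only when some enabled user sets need_shared_flags:

static bool __init invoke_need_callbacks(void)
{
	bool need = false;
	int i;

	/* Reserve the shared flags word only if an enabled user wants it. */
	for (i = 0; i < ARRAY_SIZE(page_ext_ops); i++) {
		if (page_ext_ops[i]->need() && page_ext_ops[i]->need_shared_flags) {
			page_ext_size = sizeof(struct page_ext);
			break;
		}
	}

	/* Hand each enabled user its offset and grow the per-page entry size. */
	for (i = 0; i < ARRAY_SIZE(page_ext_ops); i++) {
		if (page_ext_ops[i]->need()) {
			page_ext_ops[i]->offset = page_ext_size;
			page_ext_size += page_ext_ops[i]->size;
			need = true;
		}
	}

	return need;
}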
[[email protected]: address comments from Mike Rapoport] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pasha Tatashin <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Charan Teja Kalla <[email protected]> Cc: Li Zhe <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3 |

# c4f20f14 | 25-Aug-2022 | Li Zhe <[email protected]>

page_ext: introduce boot parameter 'early_page_ext'
In commit 2f1ee0913ce5 ("Revert "mm: use early_pfn_to_nid in page_ext_init""), we call page_ext_init() after page_alloc_init_late() to avoid some panic problem. It seems that we cannot track early page allocations in the current kernel even if the page structures have been initialized early.
This patch introduces a new boot parameter 'early_page_ext' to resolve this problem. If we pass it to the kernel, page_ext_init() will be moved up and the feature 'deferred initialization of struct pages' will be disabled, so that the page allocator is initialized early, preventing the panic above. This helps us catch early page allocations, and is especially useful when we find that the amount of free memory right after boot differs between boots of the same kernel.
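For example, catching early page allocations with page_owner would then look like this on the kernel command line:

	early_page_ext page_owner=on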
[[email protected]: fix section issue by removing __meminitdata] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Li Zhe <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mark-PK Tsai <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.0-rc2 |

# b1d5488a | 18-Aug-2022 | Charan Teja Kalla <[email protected]>

mm: fix use-after free of page_ext after race with memory-offline
The below is one path where a race between page_ext access and offlining of the respective memory blocks will cause a use-after-free on the access of the page_ext structure.
process1                                process2
---------                               ---------
a) doing /proc/page_owner               doing memory offline
                                        through offline_pages.

b) PageBuddy check is failed
   thus proceed to get the
   page_owner information
   through page_ext access.
   page_ext = lookup_page_ext(page);

                                        migrate_pages();
                                        .................
                                        Since all pages are successfully
                                        migrated as part of the offline
                                        operation, send MEM_OFFLINE
                                        notification where for page_ext
                                        it calls:
                                        offline_page_ext() -->
                                          __free_page_ext() -->
                                            free_page_ext() -->
                                              vfree(ms->page_ext)
                                              mem_section->page_ext = NULL

c) Check for the PAGE_EXT flags in
   the page_ext->flags access; this
   results in the use-after-free
   (leading to the translation faults).
As mentioned above, there is really no synchronization between page_ext access and its freeing in the memory_offline.
The memory offline steps(roughly) on a memory block is as below:
1) Isolate all the pages
2) while(1) try free the pages to buddy.(->free_list[MIGRATE_ISOLATE])
3) delete the pages from this buddy list.
4) Then free page_ext. (Note: the struct page is still alive, as it is freed only during hot-remove of the memory, which frees the memmap; a step the user might not perform.)

This design leads to a state where the struct page is alive but the struct page_ext is freed, even though the latter is ideally part of the former, just representing the page_flags (check [3] for why this design is chosen).
The above-mentioned race is just one example __but the problem persists in the other paths too involving page_ext->flags access (e.g. page_is_idle())__.
Fix all the paths where offline races with page_ext access by maintaining synchronization with the RCU lock; this is achieved in 3 steps:
1) Invalidate all the page_ext's of the sections of a memory block by storing a flag in the LSB of mem_section->page_ext.
2) Wait until all the existing readers finish working with the ->page_ext's, with synchronize_rcu(). Any parallel process that starts after this call will not get the page_ext, through lookup_page_ext(), for the block on which the parallel offline operation is being performed.
3) Now safely free all sections ->page_ext's of the block on which offline operation is being performed.
Note: If synchronize_rcu() takes time then optimizations can be done in this path through call_rcu()[2].
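In code, the reader side this scheme implies is an RCU-protected get/put pair; a sketch close to the accessors the patch introduces:

/*
 * Sketch: take a stable page_ext reference; returns NULL if the section's
 * page_ext was invalidated/freed by a parallel memory offline.
 */
static struct page_ext *page_ext_get(struct page *page)
{
	struct page_ext *page_ext;

	rcu_read_lock();
	page_ext = lookup_page_ext(page);
	if (!page_ext) {
		rcu_read_unlock();
		return NULL;
	}
	return page_ext;
}

/* Sketch: release the reference taken by page_ext_get(). */
static void page_ext_put(struct page_ext *page_ext)
{
	if (unlikely(!page_ext))
		return;
	rcu_read_unlock();
}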
Thanks to David Hildenbrand for his views/suggestions on the initial discussion[1] and Pavan Kondeti for various inputs on this patch.
[1] https://lore.kernel.org/linux-mm/[email protected]/ [2] https://lore.kernel.org/all/[email protected]/T/#u [3] https://lore.kernel.org/all/[email protected]/
[[email protected]: rename label `loop' to `ext_put_continue' per David] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Charan Teja Kalla <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Fernand Sieber <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Pavan Kondeti <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1 |

# 1c676e0d | 08-Sep-2021 | SeongJae Park <[email protected]>

mm/idle_page_tracking: make PG_idle reusable
PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page Tracking and the reclaim logic, to work concurrently without interfering with each other. That is, when they need to clear the Accessed bit, they set PG_young to represent the previous state of the bit. And when they need to read the bit, if the bit is cleared, they further read PG_young to know whether the other user has cleared the bit in the meantime or not.
For yet another user of the PTE Accessed bit, we could add another page flag, or extend the mechanism to use the flags. For the DAMON usecase, however, we don't need to do that just yet. IDLE_PAGE_TRACKING and DAMON are mutually exclusive, so there's only ever going to be one user of the current set of flags.
In this commit, we split out the CONFIG options to allow for the use of PG_young and PG_idle outside of idle page tracking.
In the next commit, DAMON's reference implementation of the virtual memory address space monitoring primitives will use it.
[[email protected]: set PAGE_EXTENSION for non-64BIT] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: tweak Kconfig text] [[email protected]: hide PAGE_IDLE_FLAG from users] Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: SeongJae Park <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Fernand Sieber <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Amit Shah <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Fan Du <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Leonard Foerster <[email protected]> Cc: Marco Elver <[email protected]> Cc: Markus Boehme <[email protected]> Cc: Maximilian Heyne <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1 |

# 7fb7ab6d | 15-Dec-2020 | Zhenhua Huang <[email protected]>

mm: fix page_owner initializing issue for arm32
Page owner information for pages used by page owner itself is missing on arm32 targets. The reason is that dummy_handle and failure_handle are not initialized correctly. The buddy allocator is used to initialize these two handles; however, the buddy allocator is not ready when page owner calls it. This change fixes that by initializing page owner after buddy initialization.
The working flows before and after this change are:

original logic:
1. allocate memory for page_ext (using memblock).
2. invoke the init callback of page_ext_ops like page_owner (using the buddy allocator).
3. initialize buddy.

after this change:
1. allocate memory for page_ext (using memblock).
2. initialize buddy.
3. invoke the init callback of page_ext_ops like page_owner (using the buddy allocator).
With the change, failure_handle/dummy_handle get their correct values, and the page owner output, for example, has the entry for page owner itself:
Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns
PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0()
 init_page_owner+0x28/0x2f8
 invoke_init_callbacks_flatmem+0x24/0x34
 start_kernel+0x33c/0x5d8
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhenhua Huang <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4 |

# fdf3bf80 | 14-Oct-2019 | Vlastimil Babka <[email protected]>

mm, page_owner: rename flag indicating that page is allocated
Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page is tracked as being allocated. Kirill suggested naming it PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat loaded term for a page".
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Suggested-by: Kirill A. Shutemov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Walter Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

# 5556cfe8 | 14-Oct-2019 | Vlastimil Babka <[email protected]>

mm, page_owner: fix off-by-one error in __set_page_owner_handle()
Patch series "followups to debug_pagealloc improvements through page_owner", v3.
These are followups to [1] which made it to Linus meanwhile. Patches 1 and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It would be nice if all of this made it to 5.4 with [1] already there (or at least Patch 1).
This patch (of 3):
As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") has introduced an off-by-one error in __set_page_owner_handle() when looking up page_ext for subpages. As a result, the head page page_owner info is set twice, while for the last tail page, it's not set at all.
Fix this and also make the code more efficient by advancing the page_ext pointer we already have, instead of calling lookup_page_ext() for each subpage. Since the full size of struct page_ext is not known at compile time, we can't use a simple page_ext++ statement, so introduce a page_ext_next() inline function for that.
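A sketch of such a helper, with page_ext_size standing for the entry size computed at boot:

static inline struct page_ext *page_ext_next(struct page_ext *curr)
{
	void *next = curr;

	next += page_ext_size;	/* entry size is known only at runtime */
	return next;
}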
Link: http://lkml.kernel.org/r/[email protected] Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Kirill A. Shutemov <[email protected]> Reported-by: Miles Chen <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Walter Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v5.4-rc3, v5.4-rc2, v5.4-rc1 |

# 37389167 | 23-Sep-2019 | Vlastimil Babka <[email protected]>

mm, page_owner: keep owner info when freeing the page
For debugging purposes it might be useful to keep the owner info even after the page has been freed, and include it in e.g. dump_page() when detecting a bad page state. For that, change the PAGE_EXT_OWNER flag meaning to "page owner info has been set at least once" and add a new PAGE_EXT_OWNER_ACTIVE flag for tracking whether the page is currently supposed to be tracked as allocated or free. Adjust dump_page() accordingly, distinguishing free and allocated pages. In the page_owner debugfs file, keep printing only allocated pages so that existing scripts are not confused, and also because free pages are irrelevant for the memory statistics or leak detection that are the typical use cases of the file, anyway.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1 |

# 3972f6bb | 12-Jul-2019 | Vlastimil Babka <[email protected]>

mm, debug_pagealloc: use a page type instead of page_ext flag
When debug_pagealloc is enabled, we currently allocate the page_ext array to mark guard pages with the PAGE_EXT_DEBUG_GUARD flag. Now that we have the page_type field in struct page, we can use that instead, as guard pages are neither PageSlab nor mapped to userspace. This reduces memory overhead when debug_pagealloc is enabled and there are no other features requiring the page_ext array.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1 |

# 10ed6341 | 17-Aug-2018 | Kirill A. Shutemov <[email protected]>

mm/page_ext.c: constify lookup_page_ext() argument
lookup_page_ext() finds 'struct page_ext' for a given page. It requires only read access to the given struct page.
The current implementation takes 'struct page *' as an argument. This makes the compiler complain when a 'const struct page *' is passed.
Change the argument to 'const struct page *'.
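The change boils down to the prototype becoming:

struct page_ext *lookup_page_ext(const struct page *page);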
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

# b3a23696 | 17-Aug-2018 | Kirill A. Shutemov <[email protected]>

include/linux/page_ext.h: drop definition of unused PAGE_EXT_DEBUG_POISON
After commit bd33ef368135 ("mm: enable page poisoning early at boot"), PAGE_EXT_DEBUG_POISON is no longer used. Remove it.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8 |

# b2441318 | 01-Nov-2017 | Greg Kroah-Hartman <[email protected]>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases:

- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to apply to a file was done in a spreadsheet of side-by-side results of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files, created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:

- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5 lines of source.
- File already had some variant of a license header in it (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license identifiers to apply.
- when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                              # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                              # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:
SPDX license identifier                              # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became the concluded license(s).
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.
In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with:

- a full scancode scan run, collecting the matched texts, detected license ids and scores
- reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.
Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Philippe Ombredanne <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Revision tags: v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3, v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3, v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8, v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3, v4.9-rc2, v4.9-rc1 |

# 9300d8df | 07-Oct-2016 | Joonsoo Kim <[email protected]>

mm/page_owner: don't define fields on struct page_ext by hard-coding
There is a memory waste problem if we define fields on struct page_ext by hard-coding: the entry size of struct page_ext includes the size of those fields even if the corresponding feature is disabled at runtime. Now that an extra memory request at runtime is possible, page_owner doesn't need to define its own fields by hard-coding.

This patch removes the hard-coded defines and uses the extra memory for storing page_owner information in page_owner. Most of the code changes are just mechanical.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

# 980ac167 | 07-Oct-2016 | Joonsoo Kim <[email protected]>

mm/page_ext: support extra space allocation by page_ext user
Until now, if some page_ext user wants to use its own field on page_ext, it has to be defined in struct page_ext by hard-coding. This has the problem of wasting memory in the following situation.
struct page_ext {
	#ifdef CONFIG_A
		int a;
	#endif
	#ifdef CONFIG_B
		int b;
	#endif
};
Assume that the kernel is built with both CONFIG_A and CONFIG_B. Even if we enable feature A and don't enable feature B at runtime, each entry of struct page_ext takes two ints rather than one. This is an undesirable result, so this patch tries to fix it.

To solve the above problem, this patch implements support for extra space allocation at runtime. When the need() callback returns true, its extra memory requirement is summed into the entry size of page_ext. Also, an offset for each user's extra memory space is returned. With this offset, the user can use this extra space and there is no need to define the needed field on page_ext by hard-coding.
This patch only implements the infrastructure. The following patch will use it for page_owner, which is the only user having its own fields on page_ext.
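The client-visible side of this infrastructure is a pair of fields on the ops structure; a sketch:

/* Sketch: each page_ext user describes its extra space request. */
struct page_ext_operations {
	size_t offset;		/* filled in by the page_ext core at boot */
	size_t size;		/* extra bytes this user wants per entry */
	bool (*need)(void);	/* is the feature enabled at runtime? */
	void (*init)(void);
};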
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v4.8, v4.8-rc8, v4.8-rc7, v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2, v4.8-rc1 |

# f2ca0b55 | 26-Jul-2016 | Joonsoo Kim <[email protected]>

mm/page_owner: use stackdepot to store stacktrace
Currently, we store each page's allocation stacktrace in the corresponding page_ext structure and it requires a lot of memory. This causes the problem that a memory-tight system doesn't work well if page_owner is enabled. Moreover, even with this large memory consumption, we cannot get the full stacktrace because we allocate memory at boot time and just maintain 8 stacktrace slots to balance memory consumption. We could increase that, but it would make the system unusable or change system behaviour.
To solve the problem, this patch uses stackdepot to store stacktrace. It obviously provides memory saving but there is a drawback that stackdepot could fail.
stackdepot allocates memory at runtime, so it could fail if the system does not have enough memory. But most allocation stacks are generated very early, when there is plenty of memory, so failure would not happen easily. And one failure means that we miss just one page's allocation stacktrace, so it would not be a big problem. In this patch, when a memory allocation failure happens, we store a special stacktrace handle in the page that failed to save its stacktrace. With it, the user can estimate memory usage properly even if failure happens.
Memory saving looks as follows (4GB memory system with page_owner; before the patch -> after the patch):
static allocation: 92274688 bytes -> 25165824 bytes
dynamic allocation after boot + kernel build: 0 bytes -> 327680 bytes
total: 92274688 bytes -> 25493504 bytes
72% reduction in total.
Note that the implementation looks more complex than one would imagine because there is a recursion issue. stackdepot uses the page allocator, and page_owner is called at page allocation. Using stackdepot in page_owner could re-enter the page allocator and then page_owner. That is a recursion. To detect and avoid it, whenever we obtain a stacktrace, recursion is checked and page_owner is set to dummy information if it is found. Dummy information means that this page is allocated for the page_owner feature itself (such as for stackdepot), and it's understandable behavior for the user.
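A sketch of the save path, written against today's stackdepot interface rather than the 2016-era one, with the recursion check simplified to a per-task flag:

static noinline depot_stack_handle_t save_stack(gfp_t flags)
{
	unsigned long entries[PAGE_OWNER_STACK_DEPTH];
	depot_stack_handle_t handle;
	unsigned int nr_entries;

	/* Recursion guard: stackdepot may allocate, which re-enters page_owner. */
	if (current->in_page_owner)
		return dummy_handle;
	current->in_page_owner = 1;

	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
	handle = stack_depot_save(entries, nr_entries, flags);
	if (!handle)
		handle = failure_handle;	/* depot allocation failed */

	current->in_page_owner = 0;
	return handle;
}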
[[email protected]: mm-page_owner-use-stackdepot-to-store-stacktrace-v3] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3, v4.7-rc2, v4.7-rc1, v4.6, v4.6-rc7, v4.6-rc6, v4.6-rc5, v4.6-rc4, v4.6-rc3, v4.6-rc2, v4.6-rc1 |

# 7cd12b4a | 15-Mar-2016 | Vlastimil Babka <[email protected]>

mm, page_owner: track and print last migrate reason
During migration, page_owner info is now copied with the rest of the page, so the stacktrace leading to the free page allocation during migration is overwritten. For debugging purposes, it might however be useful to know that the page has been migrated since its initial allocation. This might happen many times during the page's lifetime for different reasons, and fully tracking this, especially with stacktraces, would incur extra memory costs. As a compromise, store and print the migrate_reason of the last migration that occurred to the page. This is enough to distinguish compaction, numa balancing etc.
Example page_owner entry after the patch:
Page allocated via order 0, mask 0x24200ca(GFP_HIGHUSER_MOVABLE)
PFN 628753 type Movable Block 1228 type Movable Flags 0x1fffff80040030(dirty|lru|swapbacked)
 [<ffffffff811682c4>] __alloc_pages_nodemask+0x134/0x230
 [<ffffffff811b6325>] alloc_pages_vma+0xb5/0x250
 [<ffffffff81177491>] shmem_alloc_page+0x61/0x90
 [<ffffffff8117a438>] shmem_getpage_gfp+0x678/0x960
 [<ffffffff8117c2b9>] shmem_fallocate+0x329/0x440
 [<ffffffff811de600>] vfs_fallocate+0x140/0x230
 [<ffffffff811df434>] SyS_fallocate+0x44/0x70
 [<ffffffff8158cc2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Page has been migrated, last migrate reason: compaction
Signed-off-by: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Sasha Levin <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v4.5, v4.5-rc7, v4.5-rc6, v4.5-rc5, v4.5-rc4, v4.5-rc3, v4.5-rc2, v4.5-rc1, v4.4, v4.4-rc8, v4.4-rc7, v4.4-rc6, v4.4-rc5, v4.4-rc4, v4.4-rc3, v4.4-rc2, v4.4-rc1, v4.3, v4.3-rc7, v4.3-rc6, v4.3-rc5, v4.3-rc4, v4.3-rc3, v4.3-rc2, v4.3-rc1 |

# 33c3fc71 | 09-Sep-2015 | Vladimir Davydov <[email protected]>

mm: introduce idle page tracking
Knowing the portion of memory that is not used by a certain application or memory cgroup (idle memory) can be useful for partitioning the system efficiently, e.g. by setting memory cgroup limits appropriately. Currently, the only means to estimate the amount of idle memory provided by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the access bit for all pages mapped to a particular process by writing 1 to clear_refs, wait for some time, and then count smaps:Referenced. However, this method has two serious shortcomings:
- it does not count unmapped file pages
- it affects the reclaimer logic
To overcome these drawbacks, this patch introduces two new page flags, Idle and Young, and a new sysfs file, /sys/kernel/mm/page_idle/bitmap. A page's Idle flag can only be set from userspace by setting bit in /sys/kernel/mm/page_idle/bitmap at the offset corresponding to the page, and it is cleared whenever the page is accessed either through page tables (it is cleared in page_referenced() in this case) or using the read(2) system call (mark_page_accessed()). Thus by setting the Idle flag for pages of a particular workload, which can be found e.g. by reading /proc/PID/pagemap, waiting for some time to let the workload access its working set, and then reading the bitmap file, one can estimate the amount of pages that are not used by the workload.
The Young page flag is used to avoid interference with the memory reclaimer. A page's Young flag is set whenever the Access bit of a page table entry pointing to the page is cleared by writing to the bitmap file. If page_referenced() is called on a Young page, it will add 1 to its return value, therefore concealing the fact that the Access bit was cleared.
Note, since there is no room for extra page flags on 32 bit, this feature uses extended page flags when compiled on 32 bit.
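From userspace, the bitmap protocol can be exercised as in the sketch below (each bit corresponds to one PFN; reads and writes must be done in 8-byte, i.e. 64-page, chunks):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* fd is an O_RDWR descriptor for /sys/kernel/mm/page_idle/bitmap. */
static int pfn_set_idle(int fd, uint64_t pfn)
{
	uint64_t word = 1ULL << (pfn % 64);

	/* Writing a set bit clears the PTE Accessed bit and sets the Idle flag. */
	return pwrite(fd, &word, sizeof(word), (pfn / 64) * 8) == (ssize_t)sizeof(word) ? 0 : -1;
}

static int pfn_is_idle(int fd, uint64_t pfn)
{
	uint64_t word;

	if (pread(fd, &word, sizeof(word), (pfn / 64) * 8) != (ssize_t)sizeof(word))
		return -1;
	return !!(word & (1ULL << (pfn % 64)));
}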
[[email protected]: fix build] [[email protected]: kpageidle requires an MMU] [[email protected]: decouple from page-flags rework] Signed-off-by: Vladimir Davydov <[email protected]> Reviewed-by: Andres Lagar-Cavilla <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Raghavendra K T <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: David Rientjes <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2, v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1 |

# 94f759d6 | 11-Feb-2015 | Sergei Rogachev <[email protected]>

mm/page_owner.c: remove unnecessary stack_trace field
Page owner uses the page_ext structure to keep meta-information for every page in the system. The structure also contains a field of type 'struct stack_trace'; page owner uses this field during invocation of the function save_stack_trace(). It is easy to notice that keeping a copy of this structure for every page in the system is very inefficient in terms of memory.
The patch removes this unnecessary field of page_ext and forces page owner to use a stack_trace structure allocated on the stack.
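A sketch of what the on-stack usage looks like (field names follow the page_ext-based page_owner of that era; illustrative, not the exact patch):

void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
{
	struct page_ext *page_ext = lookup_page_ext(page);
	/* The stack_trace structure now lives on the stack, not in page_ext. */
	struct stack_trace trace = {
		.nr_entries = 0,
		.max_entries = ARRAY_SIZE(page_ext->trace_entries),
		.entries = &page_ext->trace_entries[0],
		.skip = 3,
	};

	save_stack_trace(&trace);

	page_ext->order = order;
	page_ext->gfp_mask = gfp_mask;
	page_ext->nr_entries = trace.nr_entries;
	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
}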
[[email protected]: use struct initializers] Signed-off-by: Sergei Rogachev <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>