Revision tags: v6.15, v6.15-rc7 |

12ca42c2 | 17-May-2025 | Suren Baghdasaryan <[email protected]>

alloc_tag: allocate percpu counters for module tags dynamically
When a module gets unloaded, it checks whether any of its tags are still in use and, if so, the memory containing the module's allocation tags is kept alive until all tags are unused. However, the percpu counters referenced by the tags are freed by free_module(). This leads to a use-after-free if memory allocated by the module is accessed after the module has been unloaded.
To fix this, allocate percpu counters for module allocation tags dynamically and keep them alive for tags which are still in use after module unloading. This also removes the requirement for a larger PERCPU_MODULE_RESERVE when memory allocation profiling is enabled, because percpu memory for the counters no longer needs to be reserved.
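A minimal sketch of the idea, assuming a simplified tag layout (the struct fields and helper names below are illustrative, not the kernel's exact code): each tag gets its counters from alloc_percpu() rather than the module percpu reserve, so the counters can outlive free_module() while the tag is still referenced.

    #include <linux/percpu.h>

    /* Sketch only: simplified types; the real alloc_tag layout differs. */
    struct alloc_tag_counters {
            u64 bytes;
            u64 calls;
    };

    struct alloc_tag {
            struct alloc_tag_counters __percpu *counters;
    };

    static int alloc_tag_init_counters(struct alloc_tag *tag)
    {
            tag->counters = alloc_percpu(struct alloc_tag_counters);
            return tag->counters ? 0 : -ENOMEM;
    }

    /* Called only once the tag is truly unused; until then the counters
     * stay allocated even after the owning module has been unloaded. */
    static void alloc_tag_free_counters(struct alloc_tag *tag)
    {
            free_percpu(tag->counters);
            tag->counters = NULL;
    }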
Link: https://lkml.kernel.org/r/[email protected] Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: David Wang <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Tested-by: David Wang <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |

8f3ce3d9 | 07-Oct-2024 | Sebastian Andrzej Siewior <[email protected]>

mm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on certain builds.
Arnd reported a build failure due to the BUILD_BUG_ON() statement in alloc_kmem_cache_cpus(). The test is

    PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)

The factors that increase the right side of the inequality:
- PAGE_SIZE > 4KiB increases KMALLOC_SHIFT_HIGH.
- For the local_lock_t in kmem_cache_cpu:
  - PREEMPT_RT adds an actual lock.
  - LOCKDEP increases the size of the lock.
  - LOCK_STAT adds additional bytes plus padding to the lockdep structure.
The net difference with and without PREEMPT_RT is 88 bytes for the local_lock_t and 96 bytes for kmem_cache_cpu due to additional padding. This is enough to exceed the 80KiB limit with a 16KiB page size; the 8KiB page size is fine.
Increase PERCPU_DYNAMIC_SIZE_SHIFT to 13 on configs with PAGE_SIZE larger than 4KiB and LOCKDEP enabled.
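A hedged sketch of what such a conditional bump looks like in percpu.h terms; the exact config test and the 20 << shift relation for the early size are assumptions inferred from this entry and the 80KiB figure above, not quoted from the header.

    /* Larger page size plus lockdep inflates struct kmem_cache_cpu, so give
     * the early/dynamic per-cpu area one extra power of two. */
    #if defined(CONFIG_LOCKDEP) && (PAGE_SHIFT > 12)
    #define PERCPU_DYNAMIC_SIZE_SHIFT      13      /* doubles the reserve */
    #else
    #define PERCPU_DYNAMIC_SIZE_SHIFT      12
    #endif

    /* With the default shift of 12 this is 20 * 4KiB = 80KiB, the limit the
     * BUILD_BUG_ON above compares against. */
    #define PERCPU_DYNAMIC_EARLY_SIZE      (20 << PERCPU_DYNAMIC_SIZE_SHIFT)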
Link: https://lkml.kernel.org/r/[email protected] Fixes: d8fccd9ca5f9 ("arm64: Allow to enable PREEMPT_RT.") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reported-by: Arnd Bergmann <[email protected]> Closes: https://lore.kernel.org/[email protected] Acked-by: Arnd Bergmann <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3 |

47baed6a | 07-Aug-2024 | Jianhui Zhou <[email protected]>

percpu: remove pcpu_alloc_size()
pcpu_alloc_size() was added in 7ac5c53e0073 ("mm/percpu.c: introduce pcpu_alloc_size()") and was used to get the allocated memory size in bpf. However, pcpu_alloc_size() is no longer used since "bpf: Use c->unit_size to select target cache during free", because the actual allocated memory size may change at runtime due to the slab merging mechanism. Therefore, pcpu_alloc_size() can be removed.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jianhui Zhou <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: JonasZhou <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |

7f36688f | 28-May-2024 | Yury Norov <[email protected]>

cpumask: cleanup core headers inclusion
Many core headers include cpumask.h for nothing. Drop it.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yury Norov <[email protected]> Cc: Amit Daniel Kachhap <[email protected]> Cc: Anna-Maria Behnsen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5 |

2c321f3f | 15-Apr-2024 | Suren Baghdasaryan <[email protected]>

mm: change inlined allocation helpers to account at the call site
The main goal of the memory allocation profiling patchset is to provide accounting that is cheap enough to run in production. To achieve that, we inject counters using codetags at the allocation call sites to account every time an allocation is made. This injection allows us to perform accounting efficiently, because the injected counters are immediately available, as opposed to alternative methods such as using _RET_IP_, which would require counter lookup and appropriate locking that makes accounting much more expensive. This method requires all allocation functions to inject separate counters at their call sites so that their callers can be individually accounted. Counter injection is implemented by allocation hooks which should wrap all allocation functions.
Inlined functions which perform allocations but do not use allocation hooks are directly charged for the allocations they perform. In most cases these functions are just specialized allocation wrappers used from multiple places to allocate objects of a specific type. It would be more useful to do the accounting at their call sites instead. Instrument these helpers to do accounting at the call site. Simple inlined allocation wrappers are converted directly into macros. More complex allocators or allocators with documentation are converted into _noprof versions and allocation hooks are added. This allows memory allocation profiling mechanism to charge allocations to the callers of these functions.
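As a rough illustration of the conversion pattern described above (foo_alloc and struct foo are made-up names; the alloc_hooks()/_noprof split is the profiling convention this series relies on):

    /* Before: the inline wrapper itself gets charged for every allocation. */
    static inline struct foo *foo_alloc(gfp_t gfp)
    {
            return kmalloc(sizeof(struct foo), gfp);
    }

    /* After: a _noprof body plus a macro, so the allocation is charged to
     * whoever calls foo_alloc(), not to the wrapper. */
    static inline struct foo *foo_alloc_noprof(gfp_t gfp)
    {
            return kmalloc_noprof(sizeof(struct foo), gfp);
    }
    #define foo_alloc(...)  alloc_hooks(foo_alloc_noprof(__VA_ARGS__))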
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Jan Kara <[email protected]> [jbd2] Cc: Anna Schumaker <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Tissoires <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: David S. Miller <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jakub Sitnicki <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |

24e44cc2 | 21-Mar-2024 | Suren Baghdasaryan <[email protected]>

mm: percpu: enable per-cpu allocation tagging
Redefine __alloc_percpu, __alloc_percpu_gfp and __alloc_reserved_percpu to record allocations and deallocations done by these functions.
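A sketch of what that redefinition plausibly looks like, following the same alloc_hooks()/_noprof convention; this is inferred from the description above, not a verbatim copy of percpu.h.

    #define __alloc_percpu_gfp(_size, _align, _gfp)                         \
            alloc_hooks(__alloc_percpu_gfp_noprof(_size, _align, _gfp))
    #define __alloc_percpu(_size, _align)                                   \
            alloc_hooks(__alloc_percpu_noprof(_size, _align))
    #define __alloc_reserved_percpu(_size, _align)                          \
            alloc_hooks(__alloc_reserved_percpu_noprof(_size, _align))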
[[email protected]: undo _noprof additions in the documentation] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Tested-by: Kees Cook <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alex Gaynor <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Benno Lossin <[email protected]> Cc: "Björn Roy Baron" <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Gary Guo <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wedson Almeida Filho <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

ccdabb1d | 21-Mar-2024 | Suren Baghdasaryan <[email protected]>

mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation tags
As each allocation tag generates a per-cpu variable, more space is required to store them. Increase PERCPU_MODULE_RESERVE to provide enough area. A better long-term solution would be to allocate this memory dynamically.
[[email protected]: increase PERCPU_MODULE_RESERVE to accommodate allocation tags] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Tested-by: Kees Cook <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alex Gaynor <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Benno Lossin <[email protected]> Cc: "Björn Roy Baron" <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Gary Guo <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wedson Almeida Filho <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |

b460bc83 | 20-Oct-2023 | Hou Tao <[email protected]>

mm/percpu.c: introduce pcpu_alloc_size()
Introduce pcpu_alloc_size() to get the size of a dynamic per-cpu area. It will be used by the BPF memory allocator in the following patches. The BPF memory allocator maintains per-cpu area caches for multiple area sizes, and its free API only receives the to-be-freed per-cpu pointer, so it needs the size of the dynamic per-cpu area to select the corresponding cache when a BPF program frees a dynamic per-cpu pointer.
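A hedged sketch of the intended use on a free path; only pcpu_alloc_size() itself comes from this entry, and the size-class bookkeeping below is hypothetical.

    size_t pcpu_alloc_size(void __percpu *ptr);     /* returns the area size */

    static void sized_cache_free(void __percpu *ptr)
    {
            size_t size = pcpu_alloc_size(ptr);
            unsigned int idx = ilog2(size);         /* hypothetical size class */

            /* ... hand ptr back to the per-cpu cache for size class idx ... */
    }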
Acked-by: Dennis Zhou <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2 |

3c615294 | 14-Jul-2023 | GONG, Ruiqi <[email protected]>

Randomized slab caches for kmalloc()
When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it is to overwrite the memory area of a vulnerable object by triggering allocation in other subsystems or modules and thereby getting a reference to the targeted memory location. It is usable on various types of vulnerability, including use-after-free (UAF), heap out-of-bounds write, etc.
There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill.
To efficiently prevent heap spraying, we propose the following approach: to create multiple copies of generic slab caches that will never be merged, and random one of them will be used at allocation. The random selection is based on the address of code that calls `kmalloc()`, which means it is static at runtime (rather than dynamically determined at each time of allocation, which could be bypassed by repeatedly spraying in brute force). In other words, the randomness of cache selection will be with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although kmalloc() at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed.
Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from finding a usable kmalloc that happens to pick the same cache with the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static during each time the system starts and runs, but not across different system startups.
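A minimal sketch of the selection scheme described above (names and the copy count are illustrative; only the idea of hashing the call-site address with a per-boot seed comes from this entry):

    #include <linux/hash.h>
    #include <linux/log2.h>

    #define RANDOM_KMALLOC_COPIES   16

    static unsigned long random_kmalloc_seed __ro_after_init;  /* set once at boot */

    /* Same call site -> same cache copy for the whole uptime; different
     * call sites most likely land in different copies. */
    static inline unsigned int kmalloc_copy_index(unsigned long caller_ip)
    {
            return hash_64(caller_ip ^ random_kmalloc_seed,
                           ilog2(RANDOM_KMALLOC_COPIES));
    }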
The performance overhead has been tested on a 40-core x86 server by comparing the results of `perf bench all` between the kernels with and without this patch, based on the latest linux-next kernel; the difference is minor. A subset of the benchmarks is listed below:
                sched/     sched/   syscall/   mem/       mem/
                messaging  pipe     basic      memcpy     memset
                (sec)      (sec)    (sec)      (GB/sec)   (GB/sec)

control1        0.019      5.459    0.733      15.258789  51.398026
control2        0.019      5.439    0.730      16.009221  48.828125
control3        0.019      5.282    0.735      16.009221  48.828125
control_avg     0.019      5.393    0.733      15.759077  49.684759

experiment1     0.019      5.374    0.741      15.500992  46.502976
experiment2     0.019      5.440    0.746      16.276042  51.398026
experiment3     0.019      5.242    0.752      15.258789  51.398026
experiment_avg  0.019      5.352    0.746      15.678608  49.766343
The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies:
            control   4 copies  8 copies  16 copies

total       969.8M    968.2M    968.2M    968.2M
used        20.0M     21.9M     24.1M     26.7M
free        936.9M    933.6M    931.4M    928.6M
available   932.2M    928.8M    926.6M    923.9M
Co-developed-by: Xiu Jianfeng <[email protected]> Signed-off-by: Xiu Jianfeng <[email protected]> Signed-off-by: GONG, Ruiqi <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Acked-by: Dennis Zhou <[email protected]> # percpu Signed-off-by: Vlastimil Babka <[email protected]>
Revision tags: v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4 |

54da6a09 | 26-May-2023 | Peter Zijlstra <[email protected]>

locking: Introduce __cleanup() based infrastructure
Use __attribute__((__cleanup__(func))) to build:
- simple auto-release pointers using __free()
- 'classes' with constructor and destructor semantics for scope-based resource management.
- lock guards based on the above classes.
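A brief illustration of the resulting pattern, assuming the DEFINE_FREE()/__free() helpers this infrastructure provides; the struct and function names are made up.

    #include <linux/cleanup.h>
    #include <linux/slab.h>

    struct example_buf { char data[64]; };

    /* Run kfree() on the pointer when it goes out of scope (NULL is skipped). */
    DEFINE_FREE(example_buf, struct example_buf *, if (_T) kfree(_T))

    static int example_setup(void)
    {
            struct example_buf *buf __free(example_buf) =
                            kzalloc(sizeof(*buf), GFP_KERNEL);

            if (!buf)
                    return -ENOMEM;

            /* ... any early return from here on releases buf automatically;
             * no_free_ptr() would transfer ownership out instead. */
            return 0;
    }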
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/20230612093537.614161713%40infradead.org
Revision tags: v6.4-rc3 |

6ca0f81c | 17-May-2023 | Arnd Bergmann <[email protected]>

mm: percpu: unhide pcpu_embed_first_chunk prototype
Patch series "mm/init/kernel: missing-prototypes warnings".
These are patches addressing -Wmissing-prototypes warnings in common kernel code and memory management code files that usually get merged through the -mm tree.
This patch (of 12):
This function is called whenever CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK or CONFIG_HAVE_SETUP_PER_CPU_AREA is enabled, but it is only declared when the former is set:
mm/percpu.c:3055:12: error: no previous prototype for 'pcpu_embed_first_chunk' [-Werror=missing-prototypes]
There is no real point in hiding declarations, so just remove the #ifdef here.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Eric Paris <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Moore <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5 |

e8753e41 | 13-Nov-2022 | Baoquan He <[email protected]>

percpu: adjust the value of PERCPU_DYNAMIC_EARLY_SIZE
LKP reported a build failure as below on the following patch "mm/slub, percpu: correct the calculation of early percpu allocation size"
~~~~~~
In file included from <command-line>:
In function 'alloc_kmem_cache_cpus',
    inlined from 'kmem_cache_open' at mm/slub.c:4340:6:
>> include/linux/compiler_types.h:357:45: error: call to '__compiletime_assert_474' declared with attribute error: BUILD_BUG_ON failed: PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)
  357 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
~~~~~~
From the kernel config file provided by LKP, the building was made on arm64 with below Kconfig item enabled:
CONFIG_ZONE_DMA=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_SLUB_STATS=y
CONFIG_ARM64_PAGE_SHIFT=16
CONFIG_ARM64_64K_PAGES=y

Then we will have:
  NR_KMALLOC_TYPES: 4
  KMALLOC_SHIFT_HIGH: 17
  sizeof(struct kmem_cache_cpu): 184
The product of them is 12512, which is bigger than PERCPU_DYNAMIC_EARLY_SIZE, 12K. Hence, the BUILD_BUG_ON in alloc_kmem_cache_cpus() is triggered.
Earlier, in commit 099a19d91ca4 ("percpu: allow limited allocation before slab is online"), PERCPU_DYNAMIC_EARLY_SIZE was introduced and set to 12K which is equal to the then PERPCU_DYNAMIC_RESERVE. Later, in commit 1a4d76076cda ("percpu: implement asynchronous chunk population"), PERPCU_DYNAMIC_RESERVE was increased by 8K, while PERCPU_DYNAMIC_EARLY_SIZE was kept unchanged.
So increase PERCPU_DYNAMIC_EARLY_SIZE by 8K as well, to accommodate SLUB's requirement.
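Working the quoted numbers through as a sanity check (a plain C sketch, not kernel code; the 12K and 20K values come straight from the paragraphs above):

    #define OLD_PERCPU_DYNAMIC_EARLY_SIZE   (12 << 10)      /* 12288 bytes */
    #define NEW_PERCPU_DYNAMIC_EARLY_SIZE   (20 << 10)      /* 20480 bytes */

    /* NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)
     * = 4 * 17 * 184 = 12512 bytes for the LKP arm64/64K config. */
    _Static_assert(4 * 17 * 184 > OLD_PERCPU_DYNAMIC_EARLY_SIZE,
                   "old early size is too small for this config");
    _Static_assert(4 * 17 * 184 <= NEW_PERCPU_DYNAMIC_EARLY_SIZE,
                   "new early size covers it");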
Reported-by: kernel test robot <[email protected]> Signed-off-by: Baoquan He <[email protected]> Acked-by: Dennis Zhou <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
Revision tags: v6.1-rc4, v6.1-rc3 |

d667c949 | 24-Oct-2022 | Baoquan He <[email protected]>

mm/percpu: remove unused PERCPU_DYNAMIC_EARLY_SLOTS
Since commit 40064aeca35c ("percpu: replace area map allocator with bitmap"), there's no place to use PERCPU_DYNAMIC_EARLY_SLOTS. So clean it up.
Signed-off-by: Baoquan He <[email protected]> Signed-off-by: Dennis Zhou <[email protected]>
Revision tags: v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1 |

20c03576 | 20-Jan-2022 | Kefeng Wang <[email protected]>

mm: percpu: add generic pcpu_populate_pte() function
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to populate PTEs. This patch adds a generic pcpu_populate_pte() function, which is marked __weak and used on most architectures; x86 overrides it with its own implementation.
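A sketch of the __weak arrangement described above; the prototype is an assumption based on this entry, and the body is elided.

    /* Generic version, picked up by every architecture that does not
     * provide its own definition. */
    void __init __weak pcpu_populate_pte(unsigned long addr)
    {
            /* ... allocate/walk pgd, p4d, pud, pmd and pte for addr ... */
    }

    /* x86 simply defines a non-weak pcpu_populate_pte(unsigned long addr)
     * and the linker prefers it over the weak generic one. */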
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Albert Ou <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

23f91716 | 20-Jan-2022 | Kefeng Wang <[email protected]>

mm: percpu: add generic pcpu_fc_alloc/free funciton
With the previous patch, we can add generic pcpu first chunk allocate and free functions to clean up the duplicated definitions on each architecture.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Albert Ou <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

1ca3fb3a | 20-Jan-2022 | Kefeng Wang <[email protected]>

mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, so that pcpu first chunk allocation can call it to allocate memblock memory on the corresponding node. This is preparation for the next patch.
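Roughly, the plumbing looks like this (a sketch; the exact parameter list of the alloc callback is inferred from this entry, not quoted from the header):

    typedef int (pcpu_fc_cpu_to_node_fn_t)(int cpu);

    /* The first-chunk alloc callback now receives the cpu-to-node hook so it
     * can place each unit's memblock allocation on the right node. */
    typedef void * (*pcpu_fc_alloc_fn_t)(unsigned int cpu, size_t size,
                                         size_t align,
                                         pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);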
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Albert Ou <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1 |

a0ddee65 | 12-Nov-2021 | Andy Shevchenko <[email protected]>

printk: Remove printk.h inclusion in percpu.h
After commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI"), printk.h is no longer needed in percpu.h.
Moreover `make headerdep` complains (an excerpt)
In file included from linux/printk.h,
                 from linux/dynamic_debug.h:188
                 from linux/printk.h:559 <-- here
                 from linux/percpu.h:9
                 from linux/idr.h:17
include/net/9p/client.h:13: warning: recursive header inclusion
Yeah, it's not a root cause of this, but removing will help to reduce the noise.
Fixes: 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI") Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Dennis Zhou <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]

17197dd4 | 05-Nov-2021 | Kees Cook <[email protected]>

percpu: add __alloc_size attributes for better bounds checking
As already done in GrapheneOS, add the __alloc_size attribute for appropriate percpu allocator interfaces, to provide additional hinting for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other compiler optimizations.
Note that due to the implementation of the percpu API, this is unlikely to ever actually provide compile-time checking beyond very simple non-SMP builds. But, since they are technically allocators, mark them as such.
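A sketch of the annotation style; the prototypes below are illustrative rather than copied from the header. __alloc_size(1) tells the compiler that parameter 1 is the size of the returned object, which feeds CONFIG_FORTIFY_SOURCE and other bounds-checking diagnostics.

    extern void __percpu *__alloc_percpu(size_t size, size_t align)
                    __alloc_size(1);
    extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp)
                    __alloc_size(1);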
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Co-developed-by: Daniel Micay <[email protected]> Signed-off-by: Daniel Micay <[email protected]> Acked-by: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dwaipayan Ray <[email protected]> Cc: Joe Perches <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Lukas Bulwahn <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Alexandre Bounine <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jing Xiangfeng <[email protected]> Cc: John Hubbard <[email protected]> Cc: kernel test robot <[email protected]> Cc: Matt Porter <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Souptick Joarder <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2 |

163fa234 | 03-Jul-2019 | Kefeng Wang <[email protected]>

percpu: Make pcpu_setup_first_chunk() void function
pcpu_setup_first_chunk() will panic or BUG_ON if there is an error, and it never returns an error code, hence it can be defined to return void.
Reported-by: kbuild test robot <[email protected]> Signed-off-by: Kefeng Wang <[email protected]> Signed-off-by: Dennis Zhou <[email protected]> [Dennis: fixed kbuild warning for pcpu_page_first_chunk()]
Revision tags: v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7 |

b239f7da | 13-Feb-2019 | Dennis Zhou <[email protected]>

percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE
Previously, block size was flexible based on the constraint that the GCD(PCPU_BITMAP_BLOCK_SIZE, PAGE_SIZE) > 1. However, this carried the overhead that keeping a floating number of populated free pages required scanning over the free regions of a chunk.
Setting the block size to be fixed at PAGE_SIZE lets us know when an empty page becomes used as we will break a full contig_hint of a block. This means we no longer have to scan the whole chunk upon breaking a contig_hint which empty page management piggybacked off. A later patch takes advantage of this to optimize the allocation path by only scanning forward using the scan_hint introduced later too.
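In header terms the change amounts to something like the following (a sketch; PCPU_MIN_ALLOC_SHIFT is assumed to be the existing minimum allocation granularity):

    /* One metadata block now covers exactly one page. */
    #define PCPU_BITMAP_BLOCK_SIZE          PAGE_SIZE
    #define PCPU_BITMAP_BLOCK_BITS          (PCPU_BITMAP_BLOCK_SIZE >>      \
                                             PCPU_MIN_ALLOC_SHIFT)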
Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Peng Fan <[email protected]>
Revision tags: v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1 |

7e8a6304 | 22-Aug-2018 | Dennis Zhou (Facebook) <[email protected]>

/proc/meminfo: add percpu populated pages count
Currently, percpu memory only exposes allocation and utilization information via debugfs. This more or less is only really useful for understanding the fragmentation and allocation information at a per-chunk level with a few global counters. This is also gated behind a config. BPF and cgroup, for example, have seen an increase in use causing increased use of percpu memory. Let's make it easier for someone to identify how much memory is being used.
This patch adds the "Percpu" stat to meminfo to more easily look up how much percpu memory is in use. This number includes the cost for all allocated backing pages, not just insight at the per-unit, per-chunk level. Metadata is excluded. I think excluding metadata is fair because the backing memory scales with the number of cpus and can quickly outweigh the metadata. It also makes this calculation light.
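A sketch of how such a counter can be surfaced, assuming a pcpu_nr_populated page count and pcpu_nr_units as in the description above (illustrative, not a verbatim copy of the patch):

    /* Total backing pages: populated pages per unit times number of units. */
    unsigned long pcpu_nr_pages(void)
    {
            return pcpu_nr_populated * pcpu_nr_units;
    }

    /* fs/proc/meminfo.c can then report it alongside the other counters:
     *     show_val_kb(m, "Percpu:         ", pcpu_nr_pages());
     */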
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Roman Gushchin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Revision tags: v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8 |

b2441318 | 01-Nov-2017 | Greg Kroah-Hartman <[email protected]>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5 lines of source.
- File already had some variant of a license header in it (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license identifiers to apply.
- when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:
SPDX license identifier                              # files
----------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                          270
GPL-2.0+ WITH Linux-syscall-note                         169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)       21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)       17
LGPL-2.1+ WITH Linux-syscall-note                         15
GPL-1.0+ WITH Linux-syscall-note                          14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)       5
LGPL-2.0+ WITH Linux-syscall-note                          4
LGPL-2.1 WITH Linux-syscall-note                           3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)                1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became the concluded license(s).
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.
In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with:
- a full scancode scan run, collecting the matched texts, detected license ids and scores
- reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.
Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Philippe Ombredanne <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Revision tags: v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3 |

b185cd0d | 24-Jul-2017 | Dennis Zhou (Facebook) <[email protected]>

percpu: update free path to take advantage of contig hints
The bitmap allocator must keep metadata consistent. The easiest way is to scan after every allocation for each affected block and the entire chunk. This is rather expensive.
The free path can take advantage of current contig hints to prevent scanning within the start and end block. If a scan is needed, it can be done by scanning backwards from the start and forwards from the end to identify the entire free area this can be combined with. The blocks can then be updated by some basic checks rather than complete block scans.
A chunk scan happens when the freed area makes a page free, a block free, or spans across blocks. This is necessary as the contig hint at this point could span across blocks. The check uses the minimum of page size and the block size to allow for variable sized blocks. There is a tradeoff here with not updating after every free. It is possible a contig hint in one block can be merged with the contig hint in the next block. This means the contig hint can be off by up to a page. However, if the chunk's contig hint is contained in one block, the contig hint will be accurate.
Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Tejun Heo <[email protected]>

ca460b3c | 24-Jul-2017 | Dennis Zhou (Facebook) <[email protected]>

percpu: introduce bitmap metadata blocks
This patch introduces the bitmap metadata blocks and adds the skeleton of the code that will be used to maintain these blocks. Each chunk's bitmap is made up of full metadata blocks. These blocks maintain basic metadata to help prevent scanning unnecessarily to update hints. Full scanning methods are used for the skeleton and will be replaced in the coming patches. A number of helper functions are added as well to do conversion of pages to blocks and manage offsets. Comments will be updated as the final version of each function is added.
There exists a relationship between PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE, the region size, and unit_size. Every chunk's region (including offsets) is page aligned at the beginning to preserve alignment. The end is aligned to LCM(PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE) to ensure that the end can fit with the populated page map which is by page and every metadata block is fully accounted for. The unit_size is already page aligned, but must also be aligned with PCPU_BITMAP_BLOCK_SIZE to ensure full metadata blocks.
Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
Revision tags: v4.13-rc2, v4.13-rc1 |

40064aec | 12-Jul-2017 | Dennis Zhou (Facebook) <[email protected]>

percpu: replace area map allocator with bitmap
The percpu memory allocator is experiencing scalability issues when allocating and freeing large numbers of counters as in BPF. Additionally, there is a corner case where iteration is triggered over all chunks if the contig_hint is the right size, but wrong alignment.
This patch replaces the area map allocator with a basic bitmap allocator implementation. Each subsequent patch will introduce new features and replace full scanning functions with faster non-scanning options when possible.
Implementation: This patchset removes the area map allocator in favor of a bitmap allocator backed by metadata blocks. The primary goal is to provide consistency in performance and memory footprint with a focus on small allocations (< 64 bytes). The bitmap removes the heavy memmove from the freeing critical path and provides a consistent memory footprint. The metadata blocks provide a bound on the amount of scanning required by maintaining a set of hints.
In an effort to make freeing fast, the metadata is updated on the free path if the new free area makes a page free, a block free, or spans across blocks. This causes the chunk's contig hint to potentially be smaller than what it could allocate by up to the smaller of a page or a block. If the chunk's contig hint is contained within a block, a check occurs and the hint is kept accurate. Metadata is always kept accurate on allocation, so there will not be a situation where a chunk has a later contig hint than available.
Evaluation: I have primarily done testing against a simple workload of allocating 1 million objects (2^20) of varying size. Deallocation was done in order, alternating, and in reverse. These numbers were collected after rebasing on top of a80099a152. I present the worst-case numbers here:
Area Map Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
         4B |             310 |           4770
        16B |             557 |           1325
        64B |             436 |            273
       256B |             776 |            131
      1024B |            3280 |            122
Bitmap Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
         4B |             490 |             70
        16B |             515 |             75
        64B |             610 |             80
       256B |             950 |            100
      1024B |            3520 |            200
This data demonstrates the inability for the area map allocator to handle less than ideal situations. In the best case of reverse deallocation, the area map allocator was able to perform within range of the bitmap allocator. In the worst case situation, freeing took nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator dramatically improves the consistency of the free path. The small allocations performed nearly identical regardless of the freeing pattern.
While it does add to the allocation latency, the allocation scenario here is optimal for the area map allocator. The area map allocator runs into trouble when it is allocating in chunks where the latter half is full. It is difficult to replicate this, so I present a variant where the pages are second half filled. Freeing was done sequentially. Below are the numbers for this scenario:
Area Map Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
         4B |            4118 |           4892
        16B |            1651 |           1163
        64B |             598 |            285
       256B |             771 |            158
      1024B |            3034 |            160
Bitmap Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
         4B |             481 |             67
        16B |             506 |             69
        64B |             636 |             75
       256B |             892 |             90
      1024B |            3262 |            147
The data shows a parabolic curve of performance for the area map allocator. This is due to the memmove operation being the dominant cost with the lower object sizes as more objects are packed in a chunk and at higher object sizes, the traversal of the chunk slots is the dominating cost. The bitmap allocator suffers this problem as well. The above data shows the inability to scale for the allocation path with the area map allocator and that the bitmap allocator demonstrates consistent performance in general.
The second problem of additional scanning can result in the area map allocator completing in 52 minutes when trying to allocate 1 million 4-byte objects with 8-byte alignment. The same workload takes approximately 16 seconds to complete for the bitmap allocator.
V2: Fixed a bug in pcpu_alloc_first_chunk end_offset was setting the bitmap using bytes instead of bits.
Added a comment to pcpu_cnt_pop_pages to explain bitmap_weight.
Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Tejun Heo <[email protected]>