History log of /linux-6.15/include/linux/percpu.h (Results 1 – 25 of 133)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7
# 12ca42c2 17-May-2025 Suren Baghdasaryan <[email protected]>

alloc_tag: allocate percpu counters for module tags dynamically

When a module gets unloaded it checks whether any of its tags are still in
use and, if so, we keep the memory containing the module's allocation tags
alive until all tags are unused. However, percpu counters referenced by
the tags are freed by free_module(). This will lead to UAF if the memory
allocated by a module is accessed after the module was unloaded.

To fix this we allocate percpu counters for module allocation tags
dynamically and keep them alive for tags which are still in use after
module unloading. This also removes the requirement of a larger
PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because
percpu memory for counters does not need to be reserved anymore.
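
A rough sketch of the idea (the names below are illustrative, not the
actual lib/alloc_tag.c code): the counters become a separately refcounted
percpu allocation that live tags can pin past free_module():

    /* Illustrative sketch only; not the exact kernel implementation. */
    struct module_tag_counters {
            struct alloc_tag_counters __percpu *counters;
            refcount_t refcnt;      /* in-use tags pin the counters */
    };

    static int module_tag_counters_alloc(struct module_tag_counters *mtc,
                                         unsigned long nr_tags)
    {
            mtc->counters = __alloc_percpu(nr_tags * sizeof(*mtc->counters),
                                           __alignof__(*mtc->counters));
            if (!mtc->counters)
                    return -ENOMEM;
            refcount_set(&mtc->refcnt, 1);
            return 0;
    }

    /* Called from module teardown; the counters outlive the module
     * while any of its tags is still referenced. */
    static void module_tag_counters_put(struct module_tag_counters *mtc)
    {
            if (refcount_dec_and_test(&mtc->refcnt))
                    free_percpu(mtc->counters);
    }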

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: David Wang <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Tested-by: David Wang <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



Revision tags: v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3
# 8f3ce3d9 07-Oct-2024 Sebastian Andrzej Siewior <[email protected]>

mm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on certain builds.

Arnd reported a build failure due to the BUILD_BUG_ON() statement in
alloc_kmem_cache_cpus(). The test

PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)

fails the build once the right-hand side outgrows the left. The factors
that increase the right side of the inequality:
- PAGE_SIZE > 4KiB increases KMALLOC_SHIFT_HIGH
- For the local_lock_t in kmem_cache_cpu:
- PREEMPT_RT adds an actual lock.
- LOCKDEP increases the size of the lock.
- LOCK_STAT adds additional bytes plus padding to the lockdep
structure.

The net difference with and without PREEMPT_RT is 88 bytes for the
local_lock_t, 96 bytes for kmem_cache_cpu due to additional padding. This
is enough to exceed the 80KiB limit with 16KiB page size - the 8KiB page
size is fine.

Increase PERCPU_DYNAMIC_SIZE_SHIFT to 13 on configs with PAGE_SIZE larger
than 4KiB and LOCKDEP enabled.
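
The resulting definition takes roughly this shape (sketch; see the
patch for the exact guards):

    #if defined(CONFIG_LOCKDEP) && (PAGE_SHIFT > 12)
    #define PERCPU_DYNAMIC_SIZE_SHIFT       13
    #else
    #define PERCPU_DYNAMIC_SIZE_SHIFT       12
    #endif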

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d8fccd9ca5f9 ("arm64: Allow to enable PREEMPT_RT.")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Reported-by: Arnd Bergmann <[email protected]>
Closes: https://lore.kernel.org/[email protected]
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3
# 47baed6a 07-Aug-2024 Jianhui Zhou <[email protected]>

percpu: remove pcpu_alloc_size()

pcpu_alloc_size() was added in 7ac5c53e0073 "mm/percpu.c: introduce
pcpu_alloc_size()", which is used to get the allocated memory size in bpf.
However, pcpu_alloc_size() is no longer used in "bpf: Use c->unit_size to
select target cache during free" because its actual allocated memory size
may change at runtime due to its slab merging mechanism. Therefore,
pcpu_alloc_size() can be removed.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jianhui Zhou <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: JonasZhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



Revision tags: v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 7f36688f 28-May-2024 Yury Norov <[email protected]>

cpumask: cleanup core headers inclusion

Many core headers include cpumask.h for nothing. Drop it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Cc: Amit Daniel Kachhap <[email protected]>
Cc: Anna-Maria Behnsen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Yury Norov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5
# 2c321f3f 15-Apr-2024 Suren Baghdasaryan <[email protected]>

mm: change inlined allocation helpers to account at the call site

The main goal of the memory allocation profiling patchset is to provide
accounting that is cheap enough to run in production. To achieve that we inject
counters using codetags at the allocation call sites to account every time
allocation is made. This injection allows us to perform accounting
efficiently because injected counters are immediately available as opposed
to the alternative methods, such as using _RET_IP_, which would require
counter lookup and appropriate locking that makes accounting much more
expensive. This method requires all allocation functions to inject
separate counters at their call sites so that their callers can be
individually accounted. Counter injection is implemented by allocation
hooks which should wrap all allocation functions.

Inlined functions which perform allocations but do not use allocation
hooks are directly charged for the allocations they perform. In most
cases these functions are just specialized allocation wrappers used from
multiple places to allocate objects of a specific type. It would be more
useful to do the accounting at their call sites instead. Instrument these
helpers to do accounting at the call site. Simple inlined allocation
wrappers are converted directly into macros. More complex allocators or
allocators with documentation are converted into _noprof versions and
allocation hooks are added. This allows memory allocation profiling
mechanism to charge allocations to the callers of these functions.
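
The conversion pattern looks roughly like this (struct foo and
alloc_foo() are hypothetical; alloc_hooks() and kmalloc_noprof() are the
real hooks):

    struct foo { int x; };

    /* Before: the allocation is charged to the helper itself. */
    static inline struct foo *alloc_foo(gfp_t gfp)
    {
            return kmalloc(sizeof(struct foo), gfp);
    }

    /* After: a _noprof variant plus an alloc_hooks() wrapper macro, so
     * the codetag is injected at each alloc_foo() call site. */
    static inline struct foo *alloc_foo_noprof(gfp_t gfp)
    {
            return kmalloc_noprof(sizeof(struct foo), gfp);
    }
    #define alloc_foo(...)  alloc_hooks(alloc_foo_noprof(__VA_ARGS__))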

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Jan Kara <[email protected]> [jbd2]
Cc: Anna Schumaker <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Tissoires <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# 24e44cc2 21-Mar-2024 Suren Baghdasaryan <[email protected]>

mm: percpu: enable per-cpu allocation tagging

Redefine __alloc_percpu, __alloc_percpu_gfp and __alloc_reserved_percpu
to record allocations and deallocations done by these functions.
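
The redefinition follows the alloc_hooks() pattern, roughly (sketch of
the percpu.h macros; pcpu_alloc_noprof() is the underlying allocator):

    #define __alloc_percpu_gfp(_size, _align, _gfp)                        \
            alloc_hooks(pcpu_alloc_noprof(_size, _align, false, _gfp))
    #define __alloc_percpu(_size, _align)                                  \
            alloc_hooks(pcpu_alloc_noprof(_size, _align, false, GFP_KERNEL))
    #define __alloc_reserved_percpu(_size, _align)                         \
            alloc_hooks(pcpu_alloc_noprof(_size, _align, true, GFP_KERNEL))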

[[email protected]: undo _noprof additions in the documentation]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Tested-by: Kees Cook <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: "Björn Roy Baron" <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



# ccdabb1d 21-Mar-2024 Suren Baghdasaryan <[email protected]>

mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation tags

As each allocation tag generates a per-cpu variable, more space is
required to store them. Increase PERCPU_MODULE_RESERVE to provide enough
area. A better long-term solution would be to allocate this memory
dynamically.
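
The change is a conditional bump of the reserve, along these lines
(sketch; the exact figures are in the patch):

    #if defined(CONFIG_MODULES) && defined(CONFIG_MEM_ALLOC_PROFILING)
    #define PERCPU_MODULE_RESERVE   (8 << 13)
    #elif defined(CONFIG_MODULES)
    #define PERCPU_MODULE_RESERVE   (8 << 10)
    #else
    #define PERCPU_MODULE_RESERVE   0
    #endif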

[[email protected]: increase PERCPU_MODULE_RESERVE to accommodate allocation tags]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Tested-by: Kees Cook <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: "Björn Roy Baron" <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7
# b460bc83 20-Oct-2023 Hou Tao <[email protected]>

mm/percpu.c: introduce pcpu_alloc_size()

Introduce pcpu_alloc_size() to get the size of the dynamic per-cpu
area. It will be used by bpf memory allocator in the following patches.
The BPF memory allocator maintains per-cpu area caches for multiple area
sizes, and its free API only receives the to-be-freed per-cpu pointer, so
it needs the size of the dynamic per-cpu area to select the corresponding
cache when a bpf program frees the dynamic per-cpu pointer.
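
A hedged sketch of the intended free-path usage (size_to_cache() and
cache_free() are hypothetical stand-ins for the bpf memory allocator's
internals):

    void bpf_mem_free_sketch(void __percpu *ptr)
    {
            size_t size = pcpu_alloc_size(ptr);     /* recover the size */
            struct bpf_mem_cache *c = size_to_cache(size); /* hypothetical */

            cache_free(c, ptr);                     /* hypothetical */
    }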

Acked-by: Dennis Zhou <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>



Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2
# 3c615294 14-Jul-2023 GONG, Ruiqi <[email protected]>

Randomized slab caches for kmalloc()

When exploiting memory vulnerabilities, "heap spraying" is a common
technique targeting those related to dynamic memory allocation (i.e. the
"heap"), and it plays an important role in a successful exploitation.
Basically, it overwrites the memory area of a vulnerable object by
triggering allocations in other subsystems or modules and thereby
getting a reference to the targeted memory location. It's usable on
various types of vulnerability, including use-after-free (UAF), heap
out-of-bounds write, etc.

There are (at least) two reasons why the heap can be sprayed: 1) generic
slab caches are shared among different subsystems and modules, and
2) dedicated slab caches could be merged with the generic ones.
Currently these two factors cannot be prevented at a low cost: the first
one is a widely used memory allocation mechanism, and shutting down slab
merging completely via `slub_nomerge` would be overkill.

To efficiently prevent heap spraying, we propose the following approach:
to create multiple copies of generic slab caches that will never be
merged, and a random one of them will be used at allocation. The random
selection is based on the address of the code that calls `kmalloc()`,
which means it is static at runtime (rather than dynamically determined
at each allocation, which could be bypassed by repeatedly spraying in
brute force). In other words, the randomness of cache selection will
be with respect to the code address rather than time, i.e. allocations
in different code paths would most likely pick different caches,
although kmalloc() at each place would use the same cache copy whenever
it is executed. In this way, the vulnerable object and memory allocated
in other subsystems and modules will (most probably) be on different
slab caches, which prevents the object from being sprayed.

Meanwhile, the static random selection is further enhanced with a
per-boot random seed, which prevents the attacker from finding a usable
kmalloc that happens to pick the same cache with the vulnerable
subsystem/module by analyzing the open source code. In other words, with
the per-boot seed, the random selection is static during each time the
system starts and runs, but not across different system startups.
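
A minimal sketch of the call-site-keyed selection (kmalloc_cache_index()
and NR_KMALLOC_COPIES are illustrative; hash_64() and a per-boot seed
are the real building blocks):

    #include <linux/hash.h>
    #include <linux/log2.h>

    #define NR_KMALLOC_COPIES       16

    extern unsigned long random_kmalloc_seed;       /* set once per boot */

    static inline unsigned int kmalloc_cache_index(unsigned long caller_ip)
    {
            /* Same call site -> same cache for the whole uptime; the
             * per-boot seed changes the mapping across reboots. */
            return hash_64(caller_ip ^ random_kmalloc_seed,
                           ilog2(NR_KMALLOC_COPIES));
    }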

The performance overhead has been tested on a 40-core x86 server by
comparing the results of `perf bench all` between kernels with and
without this patch, based on the latest linux-next kernel; the
difference is minor. A subset of the benchmarks is listed below:

                    sched/     sched/   syscall/       mem/       mem/
                 messaging       pipe      basic     memcpy     memset
                     (sec)      (sec)      (sec)   (GB/sec)   (GB/sec)

    control1         0.019      5.459      0.733  15.258789  51.398026
    control2         0.019      5.439      0.730  16.009221  48.828125
    control3         0.019      5.282      0.735  16.009221  48.828125
    control_avg      0.019      5.393      0.733  15.759077  49.684759

    experiment1      0.019      5.374      0.741  15.500992  46.502976
    experiment2      0.019      5.440      0.746  16.276042  51.398026
    experiment3      0.019      5.242      0.752  15.258789  51.398026
    experiment_avg   0.019      5.352      0.746  15.678608  49.766343

The overhead of memory usage was measured by executing `free` after boot
on a QEMU VM with 1GB total memory, and as expected, it's positively
correlated with # of cache copies:

                control   4 copies   8 copies   16 copies

    total        969.8M     968.2M     968.2M      968.2M
    used          20.0M      21.9M      24.1M       26.7M
    free         936.9M     933.6M     931.4M      928.6M
    available    932.2M     928.8M     926.6M      923.9M

Co-developed-by: Xiu Jianfeng <[email protected]>
Signed-off-by: Xiu Jianfeng <[email protected]>
Signed-off-by: GONG, Ruiqi <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Acked-by: Dennis Zhou <[email protected]> # percpu
Signed-off-by: Vlastimil Babka <[email protected]>



Revision tags: v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4
# 54da6a09 26-May-2023 Peter Zijlstra <[email protected]>

locking: Introduce __cleanup() based infrastructure

Use __attribute__((__cleanup__(func))) to build:

- simple auto-release pointers using __free()

- 'classes' with constructor and destructor semantics for
scope-based resource management.

- lock guards based on the above classes.
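
A minimal usage sketch (demo() is illustrative; __free(), guard() and
the kfree cleanup class come from <linux/cleanup.h>, <linux/mutex.h>
and <linux/slab.h>):

    static int demo(struct mutex *lock)
    {
            void *buf __free(kfree) = kzalloc(64, GFP_KERNEL);

            if (!buf)
                    return -ENOMEM;

            guard(mutex)(lock);     /* mutex_unlock() runs at scope exit */
            /* ... use buf under the lock; both the lock and buf are
             * released on any return path, with no manual unwinding ... */
            return 0;
    }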

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/20230612093537.614161713%40infradead.org



Revision tags: v6.4-rc3
# 6ca0f81c 17-May-2023 Arnd Bergmann <[email protected]>

mm: percpu: unhide pcpu_embed_first_chunk prototype

Patch series "mm/init/kernel: missing-prototypes warnings".

These are patches addressing -Wmissing-prototypes warnings in common
kernel code and memory management code files that usually get merged
through the -mm tree.


This patch (of 12):

This function is called whenever CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK or
CONFIG_HAVE_SETUP_PER_CPU_AREA is enabled, but it is only declared when
the former is set:

mm/percpu.c:3055:12: error: no previous prototype for 'pcpu_embed_first_chunk' [-Werror=missing-prototypes]

There is no real point in hiding declarations, so just remove
the #ifdef here.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Moore <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Russell King <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>



Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5
# e8753e41 13-Nov-2022 Baoquan He <[email protected]>

percpu: adjust the value of PERCPU_DYNAMIC_EARLY_SIZE

LKP reported a build failure as below on the following patch "mm/slub,
percpu: correct the calculation of early percpu allocation size"

~~~~~~
In file included from <command-line>:
In function 'alloc_kmem_cache_cpus',
inlined from 'kmem_cache_open' at mm/slub.c:4340:6:
>> >> include/linux/compiler_types.h:357:45: error: call to '__compiletime_assert_474' declared with attribute error:
BUILD_BUG_ON failed: PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)
357 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
~~~~~~

From the kernel config file provided by LKP, the building was made on
arm64 with below Kconfig item enabled:

CONFIG_ZONE_DMA=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_SLUB_STATS=y
CONFIG_ARM64_PAGE_SHIFT=16
CONFIG_ARM64_64K_PAGES=y

Then we will have:
NR_KMALLOC_TYPES:4
KMALLOC_SHIFT_HIGH:17
sizeof(struct kmem_cache_cpu):184

The product of them is 12512, which is bigger than PERCPU_DYNAMIC_EARLY_SIZE,
12K. Hence, the BUILD_BUG_ON in alloc_kmem_cache_cpus() is triggered.

Earlier, in commit 099a19d91ca4 ("percpu: allow limited allocation
before slab is online"), PERCPU_DYNAMIC_EARLY_SIZE was introduced and
set to 12K which is equal to the then PERCPU_DYNAMIC_RESERVE.
Later, in commit 1a4d76076cda ("percpu: implement asynchronous chunk
population"), PERCPU_DYNAMIC_RESERVE was increased by 8K, while
PERCPU_DYNAMIC_EARLY_SIZE was kept unchanged.

So, here increase PERCPU_DYNAMIC_EARLY_SIZE by 8K too, to accommodate
slub's requirement.
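
The shape of the define change (sketch; see the patch for the exact
expression):

    -#define PERCPU_DYNAMIC_EARLY_SIZE       (12 << 10)
    +#define PERCPU_DYNAMIC_EARLY_SIZE       (20 << 10)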

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>



Revision tags: v6.1-rc4, v6.1-rc3
# d667c949 24-Oct-2022 Baoquan He <[email protected]>

mm/percpu: remove unused PERCPU_DYNAMIC_EARLY_SLOTS

Since commit 40064aeca35c ("percpu: replace area map allocator with
bitmap"), there's no place to use PERCPU_DYNAMIC_EARLY_SLOTS. So
clean it up.

Signed-off-by: Baoquan He <[email protected]>
Signed-off-by: Dennis Zhou <[email protected]>



Revision tags: v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1
# 20c03576 20-Jan-2022 Kefeng Wang <[email protected]>

mm: percpu: add generic pcpu_populate_pte() function

With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
populate PTEs. This patch adds a generic one, pcpu_populate_pte(),
which is marked __weak and used on most architectures, but is
overridden on x86, which has its own implementation.
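
The generic hook has this shape (simplified; the real version allocates
and wires up the page-table levels for the given address):

    void __init __weak pcpu_populate_pte(unsigned long addr)
    {
            /* populate pgd/p4d/pud/pmd/pte covering addr */
    }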

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>



# 23f91716 20-Jan-2022 Kefeng Wang <[email protected]>

mm: percpu: add generic pcpu_fc_alloc/free function

With the previous patch, we could add a generic pcpu first chunk
allocate and free function to clean up the duplicated definitions on
each architecture.
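
Simplified sketch of the generic allocator (the in-tree version also
handles offline nodes and diagnostic output):

    static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size,
                                       size_t align,
                                       pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn)
    {
            int node = cpu_to_nd_fn ? cpu_to_nd_fn(cpu) : NUMA_NO_NODE;

            return memblock_alloc_try_nid(size, align,
                                          __pa(MAX_DMA_ADDRESS),
                                          MEMBLOCK_ALLOC_ACCESSIBLE, node);
    }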

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>



# 1ca3fb3a 20-Jan-2022 Kefeng Wang <[email protected]>

mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef

Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t; pcpu
first chunk allocation will call it to allocate memblock memory on the
corresponding node. This is preparation for the next patch.
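
The typedef itself is a one-line hook mapping a CPU to its NUMA node
(shape per this patch):

    typedef int (pcpu_fc_cpu_to_node_fn_t)(int cpu);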

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>



Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1
# a0ddee65 12-Nov-2021 Andy Shevchenko <[email protected]>

printk: Remove printk.h inclusion in percpu.h

After the commit 42a0bb3f7138 ("printk/nmi: generic solution for safe
printk in NMI") the printk.h is not needed anymore in percpu.h.

Moreover `make headerdep` complains (an excerpt)

In file included from linux/printk.h,
from linux/dynamic_debug.h:188
from linux/printk.h:559 <-- here
from linux/percpu.h:9
from linux/idr.h:17
include/net/9p/client.h:13: warning: recursive header inclusion

Yeah, it's not a root cause of this, but removing will help to reduce
the noise.

Fixes: 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# 17197dd4 05-Nov-2021 Kees Cook <[email protected]>

percpu: add __alloc_size attributes for better bounds checking

As already done in GrapheneOS, add the __alloc_size attribute for
appropriate percpu allocator interfaces, to provide additional hinting
for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.

Note that due to the implementation of the percpu API, this is unlikely
to ever actually provide compile-time checking beyond very simple
non-SMP builds. But, since they are technically allocators, mark them
as such.
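
The annotations mark the size-bearing parameter, along these lines
(shape of the percpu.h declarations):

    extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align,
                                             gfp_t gfp) __alloc_size(1);
    extern void __percpu *__alloc_percpu(size_t size, size_t align)
                                             __alloc_size(1);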

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Daniel Micay <[email protected]>
Signed-off-by: Daniel Micay <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>



Revision tags: v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2
# 163fa234 03-Jul-2019 Kefeng Wang <[email protected]>

percpu: Make pcpu_setup_first_chunk() void function

pcpu_setup_first_chunk() will panic or BUG_ON if there is an error, and
it never returns an error code, hence it can be defined to
return void.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
Signed-off-by: Dennis Zhou <[email protected]>
[Dennis: fixed kbuild warning for pcpu_page_first_chunk()]



Revision tags: v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7
# b239f7da 13-Feb-2019 Dennis Zhou <[email protected]>

percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE

Previously, block size was flexible based on the constraint that the
GCD(PCPU_BITMAP_BLOCK_SIZE, PAGE_SIZE) > 1. However, this carried the
overhead that keeping a floating number of populated free pages required
scanning over the free regions of a chunk.

Setting the block size to be fixed at PAGE_SIZE lets us know when an
empty page becomes used as we will break a full contig_hint of a block.
This means we no longer have to scan the whole chunk upon breaking a
contig_hint which empty page management piggybacked off. A later patch
takes advantage of this to optimize the allocation path by only scanning
forward using the scan_hint introduced later too.
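
The resulting definitions (per this change):

    #define PCPU_BITMAP_BLOCK_SIZE  PAGE_SIZE
    #define PCPU_BITMAP_BLOCK_BITS  (PCPU_BITMAP_BLOCK_SIZE >>      \
                                     PCPU_MIN_ALLOC_SHIFT)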

Signed-off-by: Dennis Zhou <[email protected]>
Reviewed-by: Peng Fan <[email protected]>



Revision tags: v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1
# 7e8a6304 22-Aug-2018 Dennis Zhou (Facebook) <[email protected]>

/proc/meminfo: add percpu populated pages count

Currently, percpu memory only exposes allocation and utilization
information via debugfs. This more or less is only really useful for
understanding the fragmentation and allocation information at a per-chunk
level with a few global counters. This is also gated behind a config.
BPF and cgroup, for example, have seen an increase in use causing
increased use of percpu memory. Let's make it easier for someone to
identify how much memory is being used.

This patch adds the "Percpu" stat to meminfo to more easily look up how
much percpu memory is in use. This number includes the cost for all
allocated backing pages and not just insight at the per-unit, per-chunk
level. Metadata is excluded. I think excluding metadata is fair because
the backing memory scales with the number of cpus and can quickly
outweigh the metadata. It also makes this calculation light.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dennis Zhou <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>



Revision tags: v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8
# b2441318 01-Nov-2017 Greg Kroah-Hartman <[email protected]>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

   SPDX license identifier                             # files
   ----------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                         270
   GPL-2.0+ WITH Linux-syscall-note                        169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
   LGPL-2.1+ WITH Linux-syscall-note                        15
   GPL-1.0+ WITH Linux-syscall-note                         14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
   LGPL-2.0+ WITH Linux-syscall-note                         4
   LGPL-2.1 WITH Linux-syscall-note                          3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <[email protected]>
Reviewed-by: Philippe Ombredanne <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>



Revision tags: v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3
# b185cd0d 24-Jul-2017 Dennis Zhou (Facebook) <[email protected]>

percpu: update free path to take advantage of contig hints

The bitmap allocator must keep metadata consistent. The easiest way is
to scan after every allocation for each affected block and the entire
chunk. This is rather expensive.

The free path can take advantage of current contig hints to prevent
scanning within the start and end block. If a scan is needed, it can
be done by scanning backwards from the start and forwards from the end
to identify the entire free area this can be combined with. The blocks
can then be updated by some basic checks rather than complete block
scans.

A chunk scan happens when the freed area makes a page free, a block
free, or spans across blocks. This is necessary as the contig hint at
this point could span across blocks. The check uses the minimum of page
size and the block size to allow for variable sized blocks. There is a
tradeoff here with not updating after every free. It is possible a
contig hint in one block can be merged with the contig hint in the next
block. This means the contig hint can be off by up to a page. However,
if the chunk's contig hint is contained in one block, the contig hint
will be accurate.

Signed-off-by: Dennis Zhou <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>



# ca460b3c 24-Jul-2017 Dennis Zhou (Facebook) <[email protected]>

percpu: introduce bitmap metadata blocks

This patch introduces the bitmap metadata blocks and adds the skeleton
of the code that will be used to maintain these blocks. Each chunk's
bitmap is made up of full metadata blocks. These blocks maintain basic
metadata to help prevent scanning unnecessarily to update hints. Full
scanning methods are used for the skeleton and will be replaced in the
coming patches. A number of helper functions are added as well to do
conversion of pages to blocks and manage offsets. Comments will be
updated as the final version of each function is added.

There exists a relationship between PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE,
the region size, and unit_size. Every chunk's region (including offsets)
is page aligned at the beginning to preserve alignment. The end is
aligned to LCM(PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE) to ensure that the end
can fit with the populated page map which is by page and every metadata
block is fully accounted for. The unit_size is already page aligned, but
must also be aligned with PCPU_BITMAP_BLOCK_SIZE to ensure full metadata
blocks.
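
Each block's metadata is roughly the following (sketch of the struct as
introduced here, before later additions such as scan hints):

    struct pcpu_block_md {
            int     contig_hint;            /* contig hint for block */
            int     contig_hint_start;      /* block-relative start of hint */
            int     left_free;              /* free space at block start */
            int     right_free;             /* free space at block end */
            int     first_free;             /* first free bit in block */
    };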

Signed-off-by: Dennis Zhou <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>



Revision tags: v4.13-rc2, v4.13-rc1
# 40064aec 12-Jul-2017 Dennis Zhou (Facebook) <[email protected]>

percpu: replace area map allocator with bitmap

The percpu memory allocator is experiencing scalability issues when
allocating and freeing large numbers of counters as in BPF.
Additionally, there is a corner case where iteration is triggered over
all chunks if the contig_hint is the right size, but wrong alignment.

This patch replaces the area map allocator with a basic bitmap allocator
implementation. Each subsequent patch will introduce new features and
replace full scanning functions with faster non-scanning options when
possible.

Implementation:
This patchset removes the area map allocator in favor of a bitmap
allocator backed by metadata blocks. The primary goal is to provide
consistency in performance and memory footprint with a focus on small
allocations (< 64 bytes). The bitmap removes the heavy memmove from the
freeing critical path and provides a consistent memory footprint. The
metadata blocks provide a bound on the amount of scanning required by
maintaining a set of hints.

In an effort to make freeing fast, the metadata is updated on the free
path if the new free area makes a page free, a block free, or spans
across blocks. This causes the chunk's contig hint to potentially be
smaller than what it could allocate by up to the smaller of a page or a
block. If the chunk's contig hint is contained within a block, a check
occurs and the hint is kept accurate. Metadata is always kept accurate
on allocation, so there will not be a situation where a chunk has a
later contig hint than available.

Evaluation:
I have primarily done testing against a simple workload of allocation of
1 million objects (2^20) of varying size. Deallocation was done in
order, alternating, and in reverse. These numbers were collected after
rebasing on top of a80099a152. I present the worst-case numbers here:

Area Map Allocator:

Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 310 | 4770
16B | 557 | 1325
64B | 436 | 273
256B | 776 | 131
1024B | 3280 | 122

Bitmap Allocator:

Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 490 | 70
16B | 515 | 75
64B | 610 | 80
256B | 950 | 100
1024B | 3520 | 200

This data demonstrates the inability for the area map allocator to
handle less than ideal situations. In the best case of reverse
deallocation, the area map allocator was able to perform within range
of the bitmap allocator. In the worst case situation, freeing took
nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator
dramatically improves the consistency of the free path. The small
allocations performed nearly identical regardless of the freeing
pattern.

While it does add to the allocation latency, the allocation scenario
here is optimal for the area map allocator. The area map allocator runs
into trouble when it is allocating in chunks where the latter half is
full. It is difficult to replicate this, so I present a variant where
the pages are second half filled. Freeing was done sequentially. Below
are the numbers for this scenario:

Area Map Allocator:

Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 4118 | 4892
16B | 1651 | 1163
64B | 598 | 285
256B | 771 | 158
1024B | 3034 | 160

Bitmap Allocator:

Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 481 | 67
16B | 506 | 69
64B | 636 | 75
256B | 892 | 90
1024B | 3262 | 147

The data shows a parabolic curve of performance for the area map
allocator. This is due to the memmove operation being the dominant cost
with the lower object sizes as more objects are packed in a chunk and at
higher object sizes, the traversal of the chunk slots is the dominating
cost. The bitmap allocator suffers this problem as well. The above data
shows the inability to scale for the allocation path with the area map
allocator and that the bitmap allocator demonstrates consistent
performance in general.

The second problem of additional scanning can result in the area map
allocator completing in 52 minutes when trying to allocate 1 million
4-byte objects with 8-byte alignment. The same workload takes
approximately 16 seconds to complete for the bitmap allocator.

V2:
Fixed a bug in pcpu_alloc_first_chunk end_offset was setting the bitmap
using bytes instead of bits.

Added a comment to pcpu_cnt_pop_pages to explain bitmap_weight.

Signed-off-by: Dennis Zhou <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>


