History log of /linux-6.15/include/linux/slab.h (Results 1 – 25 of 289)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3
# 31041385 13-Feb-2025 Suren Baghdasaryan <[email protected]>

mm: make vma cache SLAB_TYPESAFE_BY_RCU

To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that
object reuse before RCU grace period is over will be detected by
lock_vma_under_rcu().

Current checks are sufficient as long as vma is detached before it is
freed. The only place this is not currently happening is in exit_mmap().
Add the missing vma_mark_detached() in exit_mmap().

Another issue which might trick lock_vma_under_rcu() during vma reuse is
vm_area_dup(), which copies the entire content of the vma into a new one,
overriding new vma's vm_refcnt and temporarily making it appear as
attached. This might trick a racing lock_vma_under_rcu() to operate on a
reused vma if it found the vma before it got reused. To prevent this
situation, we should ensure that vm_refcnt stays at detached state (0)
when it is copied and advances to attached state only after it is added
into the vma tree. Introduce vm_area_init_from() which preserves new
vma's vm_refcnt and use it in vm_area_dup(). Since all vmas are in
detached state with no current readers when they are freed,
lock_vma_under_rcu() will not be able to take vm_refcnt after the vma got
detached even if the vma is reused. vma_mark_attached() is modified to
include a release fence to ensure all stores to the vma happen before
vm_refcnt gets initialized.

Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate
vm_area_struct reuse and will minimize the number of call_rcu() calls.
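
A hedged sketch of the reader-side discipline SLAB_TYPESAFE_BY_RCU requires (this is the generic pattern, not the actual lock_vma_under_rcu() code; lookup_in_tree(), object_matches() and put_obj() are hypothetical helpers):

    static struct obj *lookup_stable(struct tree *t, unsigned long key)
    {
        struct obj *obj;

        rcu_read_lock();
        obj = lookup_in_tree(t, key);              /* may race with slab reuse */
        if (obj && !refcount_inc_not_zero_acquire(&obj->refcnt))
            obj = NULL;                            /* object was being freed */
        if (obj && !object_matches(obj, key)) {
            /* The memory was recycled for a new object within the grace
             * period; drop the reference taken on the wrong object. */
            put_obj(obj);
            obj = NULL;
        }
        rcu_read_unlock();
        return obj;
    }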

[[email protected]: remove atomic_set_release() usage in tools/]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Tested-by: Shivank Garg <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Cc: Christian Brauner <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Klara Modin <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Lokesh Gidra <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sourav Panda <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

# 7f8ceea0 13-Feb-2025 Suren Baghdasaryan <[email protected]>

refcount: provide ops for cases when object's memory can be reused

For speculative lookups where a successful inc_not_zero() pins the object,
but where we still need to double-check that the object acquired is indeed
the one we set out to acquire (identity check), this validation needs to
happen *after* the increment. Similarly, when a new object is initialized
and its memory might have been previously occupied by another object, all
stores to initialize the object should happen *before* the refcount
initialization.

Notably, SLAB_TYPESAFE_BY_RCU is one such example where this ordering is
required for reference counting.

Add refcount_{add|inc}_not_zero_acquire() to guarantee the proper ordering
between acquiring a reference count on an object and performing the
identity check for that object.

Add refcount_set_release() to guarantee proper ordering between stores
initializing object attributes and the store initializing the refcount.
refcount_set_release() should be done after all other object attributes
are initialized. Once refcount_set_release() is called, the object should
be considered visible to other tasks even if it was not yet added into an
object collection normally used to discover it. This is because other
tasks might have discovered the object previously occupying the same
memory, and after memory reuse they can succeed in taking a refcount on
the new object and start using it.

Object reuse example to consider:

consumer:
    obj = lookup(collection, key);
    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* identity check */
        put_ref(obj);
        return;
    }
    use(obj->value);

producer:
    remove(collection, obj->key);
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(obj);
    obj = malloc(); /* obj is reused */
    obj->key = new_key;
    obj->value = new_value;
    refcount_set_release(&obj->ref, 1);
    add(collection, new_key, obj);

refcount_{add|inc}_not_zero_acquire() is required to prevent the following
reordering when refcount_inc_not_zero() is used instead:

consumer:
    obj = lookup(collection, key);
    if (READ_ONCE(obj->key) != key) { /* reordered identity check */
        put_ref(obj);
        return;
    }

producer:
    remove(collection, obj->key);
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(obj);
    obj = malloc(); /* obj is reused */
    obj->key = new_key;
    obj->value = new_value;
    refcount_set_release(&obj->ref, 1);
    add(collection, new_key, obj);

consumer (continued):
    if (!refcount_inc_not_zero(&obj->ref))
        return;
    use(obj->value); /* USING WRONG OBJECT */

refcount_set_release() is required to prevent the following reordering
when refcount_set() is used instead:

consumer:
    obj = lookup(collection, key);

producer:
    remove(collection, obj->key);
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(obj);
    obj = malloc(); /* obj is reused */
    obj->key = new_key; /* new_key == old_key */
    refcount_set(&obj->ref, 1);

consumer (continued):
    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* pass since new_key == old_key */
        put_ref(obj);
        return;
    }
    use(obj->value); /* USING STALE obj->value */

producer (continued):
    obj->value = new_value; /* reordered store */
    add(collection, key, obj);

[[email protected]: fix title underlines in refcount-vs-atomic.rst]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Vlastimil Babka <[email protected]> [slab]
Tested-by: Shivank Garg <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Klara Modin <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Lokesh Gidra <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sourav Panda <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

# 939c5de3 01-Mar-2025 Ye Bin <[email protected]>

mm/slab: call kmalloc_noprof() unconditionally in kmalloc_array_noprof()

If 'n' or 'size' isn't a builtin constant, we used to call __kmalloc()
before commit 7bd230a26648 ("mm/slab: enable slab allocation tagging for
kmalloc and friends"), which inadvertently changed both paths to
kmalloc_noprof().

As Harry Yoo points out, we can just call kmalloc_noprof()
unconditionally. Even if the compiler knows n and size are constants,
that doesn't guarantee that bytes will also be seen as constant, and that
is the important test in kmalloc_noprof() anyway, so we can just defer to
it always.
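
A minimal sketch of the resulting helper, assuming the usual multiplication-overflow check; the in-tree definition may differ in attributes:

    static inline __alloc_size(1, 2) void *
    kmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
    {
        size_t bytes;

        /* Reject multiplication overflow instead of over-allocating. */
        if (unlikely(check_mul_overflow(n, size, &bytes)))
            return NULL;

        /* Defer the "is the size a compile-time constant" decision to
         * kmalloc_noprof() itself, for constant and non-constant n/size alike. */
        return kmalloc_noprof(bytes, flags);
    }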

[ [email protected]: change as Harry suggested and adjust commit log ]

Fixes: 7bd230a26648 ("mm/slab: enable slab allocation tagging for kmalloc and friends")
Signed-off-by: Ye Bin <[email protected]>
Reviewed-by: Harry Yoo <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

Revision tags: v6.14-rc2
# c9f8f124 03-Feb-2025 Vlastimil Babka <[email protected]>

slab: don't batch kvfree_rcu() with SLUB_TINY

kvfree_rcu() is batched for better performance except on TINY_RCU, which
is a simple implementation for small UP systems. Similarly SLUB_TINY is
an option intended for small systems, whether or not used together with
TINY_RCU. In case SLUB_TINY is used with !TINY_RCU, it arguably makes
sense to not do the batching and limit the memory footprint. It's also
suboptimal to have RCU-specific #ifdefs in slab code.

With that, add CONFIG_KVFREE_RCU_BATCHED to determine whether the batched
kvfree_rcu() implementation is used. It is not set by a user prompt, but
enabled by default and disabled in case TINY_RCU or SLUB_TINY is
enabled.

Use the new config for #ifdef's in slab code and extend their scope to
cover all code used by the batched kvfree_rcu(). For example there's no
need to perform kvfree_rcu_init() if the batching is disabled.
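
A minimal sketch of how the new config can gate the batched paths in slab.h; mapping kvfree_rcu_barrier() to rcu_barrier() in the unbatched case is an assumption made here for illustration:

    #ifdef CONFIG_KVFREE_RCU_BATCHED
    void kvfree_rcu_barrier(void);
    void kvfree_rcu_init(void);
    #else
    /* Without batching, waiting for pending kvfree_rcu() callbacks reduces
     * to an ordinary RCU barrier, and there is nothing to initialize. */
    static inline void kvfree_rcu_barrier(void)
    {
        rcu_barrier();
    }
    static inline void kvfree_rcu_init(void)
    {
    }
    #endif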

Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# b14ff274 03-Feb-2025 Vlastimil Babka <[email protected]>

slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB

Following the move of TREE_RCU implementation, let's move also the
TINY_RCU one for consistency and subsequent refactoring.

For simplicity, remove the separate inline __kvfree_call_rcu() as
TINY_RCU is not meant for high-performance hardware anyway.

Declare kvfree_call_rcu() in rcupdate.h to avoid header dependency
issues.

Also move the kvfree_rcu_barrier() declaration to slab.h.

Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# bbe658d6 12-Dec-2024 Uladzislau Rezki (Sony) <[email protected]>

mm/slab: Move kvfree_rcu() into SLAB

Move kvfree_rcu() functionality to the slab_common.c file.

The reason to have kvfree_rcu() functionality as part of SLAB is that
there is a clear trend and need for closer integration. One recent
example is creating a barrier function for SLAB caches.

Another reason is to prevent having several implementations of RCU
machinery for reclaiming objects after a grace period. As a future step,
it can be integrated more easily with SLAB internals.

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Acked-by: Hyeonggon Yoo <[email protected]>
Tested-by: Hyeonggon Yoo <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3
# b6da9401 09-Oct-2024 Vlastimil Babka <[email protected]>

mm, slab: add kerneldocs for common SLAB_ flags

We have many SLAB_ flags, but most are used only internally, by kunit
tests or debugging subsystems cooperating with slab, or are set
according to the slab_debug boot parameter.

Create kerneldocs for the commonly used flags that may be passed to
kmem_cache_create(). SLAB_TYPESAFE_BY_RCU already had a detailed
description, so turn it to a kerneldoc. Add some details for
SLAB_ACCOUNT, SLAB_RECLAIM_ACCOUNT and SLAB_HWCACHE_ALIGN. Reference
them from the __kmem_cache_create_args() kerneldoc.

Signed-off-by: Vlastimil Babka <[email protected]>

Revision tags: v6.12-rc2, v6.12-rc1, v6.11
# 1e562dea 10-Sep-2024 Lukas Wunner <[email protected]>

crypto: rsassa-pkcs1 - Migrate to sig_alg backend

A sig_alg backend has just been introduced with the intent of moving all
asymmetric sign/verify algorithms to it one by one.

Migrate the sign/verify operations from rsa-pkcs1pad.c to a separate
rsassa-pkcs1.c which uses the new backend.

Consequently there are now two templates which build on the "rsa"
akcipher_alg:

* The existing "pkcs1pad" template, which is instantiated as an
akcipher_instance and retains the encrypt/decrypt operations of
RSAES-PKCS1-v1_5 (RFC 8017 sec 7.2).

* The new "pkcs1" template, which is instantiated as a sig_instance
and contains the sign/verify operations of RSASSA-PKCS1-v1_5
(RFC 8017 sec 8.2).

In a separate step, rsa-pkcs1pad.c could optionally be renamed to
rsaes-pkcs1.c for clarity. Additional "oaep" and "pss" templates
could be added for RSAES-OAEP and RSASSA-PSS.

Note that it's currently allowed to allocate a "pkcs1pad(rsa)" transform
without specifying a hash algorithm. That makes sense if the transform
is only used for encrypt/decrypt and continues to be supported. But for
sign/verify, such transforms previously did not insert the Full Hash
Prefix into the padding. The resulting message encoding was noncompliant
with EMSA-PKCS1-v1_5 (RFC 8017 sec 9.2) and therefore nonsensical.

From here on in, it is no longer allowed to allocate a transform without
specifying a hash algorithm if the transform is used for sign/verify
operations. This simplifies the code because the insertion of the Full
Hash Prefix is no longer optional, so various "if (digest_info)" clauses
can be removed.

There has been a previous attempt to forbid transform allocation without
specifying a hash algorithm, namely by commit c0d20d22e0ad ("crypto:
rsa-pkcs1pad - Require hash to be present"). It had to be rolled back
with commit b3a8c8a5ebb5 ("crypto: rsa-pkcs1pad: Allow hash to be
optional [ver #2]"), presumably because it broke allocation of a
transform which was solely used for encrypt/decrypt, not sign/verify.
Avoid such breakage by allowing transform allocation for encrypt/decrypt
with and without specifying a hash algorithm (and simply ignoring the
hash algorithm in the former case).

So again, specifying a hash algorithm is now mandatory for sign/verify,
but optional and ignored for encrypt/decrypt.

The new sig_alg API uses kernel buffers instead of sglists, which
avoids the overhead of copying signature and digest from sglists back
into kernel buffers. rsassa-pkcs1.c is thus simplified quite a bit.

sig_alg is always synchronous, whereas the underlying "rsa" akcipher_alg
may be asynchronous. So await the result of the akcipher_alg, similar
to crypto_akcipher_sync_{en,de}crypt().

As part of the migration, rename "rsa_digest_info" to "hash_prefix" to
adhere to the spec language in RFC 9580. Otherwise keep the code
unmodified wherever possible to ease reviewing and bisecting. Leave
several simplification and hardening opportunities to separate commits.

rsassa-pkcs1.c uses modern __free() syntax for allocation of buffers
which need to be freed by kfree_sensitive(), hence a DEFINE_FREE()
clause for kfree_sensitive() is introduced herein as a byproduct.
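
As a rough illustration of that byproduct (the exact in-tree clause may guard against NULL differently; example_use() is hypothetical):

    #include <linux/cleanup.h>
    #include <linux/slab.h>

    /* Scope-based cleanup: a pointer annotated __free(kfree_sensitive) is
     * zeroed and freed automatically when it goes out of scope. */
    DEFINE_FREE(kfree_sensitive, void *, if (_T) kfree_sensitive(_T))

    static int example_use(unsigned int len)
    {
        void *buf __free(kfree_sensitive) = kmalloc(len, GFP_KERNEL);

        if (!buf)
            return -ENOMEM;
        /* ... fill and use buf; it is released via kfree_sensitive() on return ... */
        return 0;
    }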

Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

# 4b7ff9ab 13-Sep-2024 Vlastimil Babka <[email protected]>

mm, slab: restore kerneldoc for kmem_cache_create()

As kmem_cache_create() became a _Generic() wrapper macro, it currently
has no kerneldoc despite being the main API to use. Add it. Also adjust
kmem_cache_create_usercopy() kerneldoc to indicate it is now a legacy
wrapper.

Also expand the kerneldoc for struct kmem_cache_args, especially for the
freeptr_offset field, where important details were removed with the
removal of kmem_cache_create_rcu().

Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>

Revision tags: v6.11-rc7
# 781aee75 05-Sep-2024 Christian Brauner <[email protected]>

slab: make __kmem_cache_create() static inline

Make __kmem_cache_create() a static inline function.

Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 0c9050b0 05-Sep-2024 Christian Brauner <[email protected]>

slab: make kmem_cache_create_usercopy() static inline

Make kmem_cache_create_usercopy() a static inline function.

Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 3d453e60 05-Sep-2024 Christian Brauner <[email protected]>

slab: remove kmem_cache_create_rcu()

Now that we have ported all users of kmem_cache_create_rcu() to struct
kmem_cache_args the function is unused and can be removed.

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# b2e7456b 05-Sep-2024 Christian Brauner <[email protected]>

slab: create kmem_cache_create() compatibility layer

Use _Generic() to create a compatibility layer that type switches on the
third argument to either call __kmem_cache_create() or
__kmem_cache_create_args(). If NULL is passed for the struct
kmem_cache_args argument, default args are used, making porting easy for
callers that don't care about additional arguments.
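
A minimal sketch of such a _Generic() dispatch; the name of the NULL-handling helper and the exact argument forwarding are assumptions here:

    #define kmem_cache_create(__name, __object_size, __args, ...)          \
        _Generic((__args),                                                 \
            struct kmem_cache_args *: __kmem_cache_create_args,            \
            void *: __kmem_cache_default_args,                             \
            default: __kmem_cache_create)                                  \
                (__name, __object_size, __args, __VA_ARGS__)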

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 199cd13a 05-Sep-2024 Christian Brauner <[email protected]>

slab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args

Make KMEM_CACHE_USERCOPY() use struct kmem_cache_args.

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 052d67b4 05-Sep-2024 Christian Brauner <[email protected]>

slab: port KMEM_CACHE() to struct kmem_cache_args

Make KMEM_CACHE() use struct kmem_cache_args.

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 879fb3c2 05-Sep-2024 Christian Brauner <[email protected]>

slab: add struct kmem_cache_args

Currently we have multiple kmem_cache_create*() variants that take up to
seven separate parameters with one of the functions having to grow an
eighth parameter in the future to handle both usercopy and a custom
freelist pointer.

Add a struct kmem_cache_args structure and move less common parameters
into it. Core parameters such as name, object size, and flags continue
to be passed separately.

Add a new function __kmem_cache_create_args() that takes a struct
kmem_cache_args pointer and port do_kmem_cache_create_usercopy() over to
it.

In follow-up patches we will port the other kmem_cache_create*()
variants over to it as well.
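
An illustrative shape of the new structure and entry point; the field list is assembled from the description and is not exhaustive:

    struct kmem_cache_args {
        unsigned int align;            /* object alignment */
        unsigned int useroffset;       /* usercopy region offset */
        unsigned int usersize;         /* usercopy region size */
        unsigned int freeptr_offset;   /* custom free pointer placement */
        bool use_freeptr_offset;       /* freeptr_offset is valid */
        void (*ctor)(void *);          /* object constructor */
    };

    struct kmem_cache *__kmem_cache_create_args(const char *name,
                                                unsigned int object_size,
                                                struct kmem_cache_args *args,
                                                slab_flags_t flags);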

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 9028cdeb 05-Sep-2024 Shakeel Butt <[email protected]>

memcg: add charging of already allocated slab objects

At the moment, slab objects are charged to the memcg at allocation
time. However, there are cases where slab objects are allocated at a
time when the right target memcg to charge them to is not known. One
such case is the network sockets for incoming connections, which are
allocated in softirq context.

A couple hundred thousand connections are very normal on a large loaded
server, and almost all of the sockets underlying those connections get
allocated in softirq context and are thus not charged to any memcg.
However, later at accept() time we know the right target memcg to
charge. Let's add a new API to charge already allocated objects, so we
can have better accounting of the memory usage.
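
A hedged sketch of charging after the fact, assuming the new helper is kmem_cache_charge() as in mainline; the wrapper function is hypothetical:

    /* The object was allocated uncharged (e.g. in softirq context); charge
     * it to the current task's memcg once the right target is known. */
    static int charge_late(void *obj)
    {
        if (!kmem_cache_charge(obj, GFP_KERNEL))
            return -ENOMEM;
        return 0;
    }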

To measure the performance impact of this change, tcp_crr is used from
the neper [1] performance suite. Basically it is a network ping pong
test with new connection for each ping pong.

The server and the client are run inside a 3-level cgroup hierarchy
using the following commands:

Server:
$ tcp_crr -6

Client:
$ tcp_crr -6 -c -H ${server_ip}

If the client and server run on different machines with 50 GBPS NIC,
there is no visible impact of the change.

For the same-machine experiment, with v6.11-rc5 as the base:

              base (throughput)    with-patch
  tcp_crr     14545 (+- 80)        14463 (+- 56)

It seems like the performance impact is within the noise.

Link: https://github.com/google/neper [1]
Signed-off-by: Shakeel Butt <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Yosry Ahmed <[email protected]>
Acked-by: Paolo Abeni <[email protected]> # net
Signed-off-by: Vlastimil Babka <[email protected]>

Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4
# 489a744e 12-Aug-2024 Danilo Krummrich <[email protected]>

mm: krealloc: clarify valid usage of __GFP_ZERO

Properly document that if __GFP_ZERO logic is requested, callers must
ensure that, starting with the initial memory allocation, every subsequent
call to this API for the same memory allocation is flagged with
__GFP_ZERO. Otherwise, it is possible that __GFP_ZERO is not fully
honored by this API.
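
A hypothetical example of keeping the zeroing guarantee across reallocations:

    static u8 *grow_zeroed_buf(void)
    {
        u8 *tmp, *buf = kzalloc(32, GFP_KERNEL);   /* initial allocation is zeroed */

        if (!buf)
            return NULL;

        /* Every later reallocation of the same buffer must also pass
         * __GFP_ZERO, or the newly grown tail may contain stale data. */
        tmp = krealloc(buf, 64, GFP_KERNEL | __GFP_ZERO);
        if (!tmp) {
            kfree(buf);
            return NULL;
        }
        return tmp;
    }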

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Danilo Krummrich <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1
# 590b9d57 22-Jul-2024 Danilo Krummrich <[email protected]>

mm: kvmalloc: align kvrealloc() with krealloc()

Besides the obvious (and desired) difference between krealloc() and
kvrealloc(), there is some inconsistency in their function signatures and
behavior:

- krealloc() frees the memory when the requested size is zero, whereas
kvrealloc() simply returns a pointer to the existing allocation.

- krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
kvrealloc() does not accept a NULL pointer at all and, if passed,
would fault instead.

- krealloc() is self-contained, whereas kvrealloc() relies on the caller
to provide the size of the previous allocation.

Inconsistent behavior throughout allocation APIs is error prone, hence
make kvrealloc() behave like krealloc(), which seems superior in all
mentioned aspects.

Besides that, implementing kvrealloc() by making use of krealloc() and
vrealloc() provides opportunities to grow (and shrink) allocations more
efficiently. For instance, vrealloc() can be optimized to allocate and
map additional pages to grow the allocation or unmap and free unused pages
to shrink the allocation.
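
Sketched from the description, the signature change looks roughly like this (prototypes are assumptions; the in-tree declarations are authoritative):

    /* Before: the caller had to supply the previous allocation size. */
    void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags);

    /* After: aligned with krealloc() -- self-contained; a NULL p behaves like
     * kvmalloc(), and size == 0 frees the allocation. */
    void *kvrealloc(const void *p, size_t size, gfp_t flags);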

[[email protected]: document concurrency restrictions]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: disable KASAN when switching to vmalloc]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: properly document __GFP_ZERO behavior]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Danilo Krummrich <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Chandan Babu R <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

# d345bd2e 28-Aug-2024 Christian Brauner <[email protected]>

mm: add kmem_cache_create_rcu()

When a kmem cache is created with SLAB_TYPESAFE_BY_RCU the free pointer
must be located outside of the object because we don't know what part of
the memory can safely be overwritten as it may be needed to prevent
object recycling.

That has the consequence that SLAB_TYPESAFE_BY_RCU may end up adding a
new cacheline. This is the case for e.g., struct file. After having it
shrunk down by 40 bytes and having it fit in three cachelines we still
have SLAB_TYPESAFE_BY_RCU adding a fourth cacheline because it needs to
accommodate the free pointer.

Add a new kmem_cache_create_rcu() function that allows the caller to
specify an offset where the free pointer is supposed to be placed.
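
A hedged usage sketch in the spirit of the struct file case; struct foo and its cache are hypothetical:

    struct foo {
        spinlock_t lock;
        unsigned long state;
        /* Spare space reused for the slab free pointer so that
         * SLAB_TYPESAFE_BY_RCU does not grow the object. */
        freeptr_t free;
    };

    static struct kmem_cache *foo_cachep;

    static int __init foo_cache_init(void)
    {
        foo_cachep = kmem_cache_create_rcu("foo", sizeof(struct foo),
                                           offsetof(struct foo, free),
                                           SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT);
        return foo_cachep ? 0 : -ENOMEM;
    }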

Link: https://lore.kernel.org/r/[email protected]
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

Revision tags: v6.10, v6.10-rc7
# 3a3b7fec 01-Jul-2024 Johannes Weiner <[email protected]>

mm: remove CONFIG_MEMCG_KMEM

CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab
tracking is enabled. It has been default-enabled and equivalent to
CONFIG_MEMCG for almost a decade.

mm: remove CONFIG_MEMCG_KMEM

CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab
tracking is enabled. It has been default-enabled and equivalent to
CONFIG_MEMCG for almost a decade. We've only grown more kernel memory
accounting sites since, and there is no imaginable cgroup usecase going
forward that wants to track user pages but not the multitude of
user-drivable kernel allocations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

# b32801d1 01-Jul-2024 Kees Cook <[email protected]>

mm/slab: Introduce kmem_buckets_create() and family

Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.

This leads to a use-after-free exploitation weakness in the Linux
kernel since many heap memory spraying/grooming attacks depend on using
userspace-controllable dynamically sized allocations to collide with
fixed size allocations that end up in the same cache.

While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
against these kinds of "type confusion" attacks, including for fixed
same-size heap objects, we can create a complementary deterministic
defense for dynamically sized allocations that are directly user
controlled. Addressing these cases is limited in scope, so isolating these
kinds of interfaces will not become an unbounded game of whack-a-mole. For
example, many pass through memdup_user(), making isolation there very
effective.

In order to isolate user-controllable dynamically-sized
allocations from the common system kmalloc allocations, introduce
kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
kmem_buckets_alloc_track_caller() for where caller tracking is
needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
is needed. Note that these caches are specifically flagged with
SLAB_NO_MERGE, since merging would defeat the entire purpose of the
mitigation.
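
A hedged sketch of the intended usage; the bucket set name and parameter values are invented for illustration:

    static kmem_buckets *msg_buckets __ro_after_init;

    static int __init msg_buckets_init(void)
    {
        /* A dedicated, unmergeable set of kmalloc-style caches for
         * user-controllable, dynamically sized payloads (no usercopy
         * region, no constructor -- illustrative values only). */
        msg_buckets = kmem_buckets_create("msg_payload", SLAB_ACCOUNT,
                                          0, 0, NULL);
        return msg_buckets ? 0 : -ENOMEM;
    }

    static void *msg_payload_alloc(size_t len)
    {
        return kmem_buckets_alloc(msg_buckets, len, GFP_KERNEL);
    }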

This can also be used in the future to extend allocation profiling's use
of code tagging to implement per-caller allocation cache isolation[1]
even for dynamic allocations.

Memory allocation pinning[2] is still needed to plug the Use-After-Free
cross-allocator weakness (where attackers can arrange to free an
entire slab page and have it reallocated to a different cache),
but that is an existing and separate issue which is complementary
to this improvement. Development continues for that feature via the
SLAB_VIRTUAL[3] series (which could also provide guard pages -- another
complementary improvement).

Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
Link: https://lore.kernel.org/lkml/[email protected]/ [3]
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 2e8000b8 01-Jul-2024 Kees Cook <[email protected]>

mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument

Plumb kmem_buckets arguments through kvmalloc_node_noprof() so it is
possible to provide an API to perform kvmalloc-style allocations with
a particular set of buckets. Introduce kvmalloc_buckets_node() that takes a
kmem_buckets argument.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 67f2df3b 01-Jul-2024 Kees Cook <[email protected]>

mm/slab: Plumb kmem_buckets into __do_kmalloc_node()

Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
support separated kmalloc buckets (in the following kmem_buckets_create()
patches and future codetag-based separation). Since this will provide
a mitigation for a very common case of exploits, it is recommended to
enable this feature for general purpose distros. By default, the new
Kconfig will be enabled if CONFIG_SLAB_FREELIST_HARDENED is enabled (and
it is added to the hardening.config Kconfig fragment).

To be able to choose which buckets to allocate from, make the buckets
available to the internal kmalloc interfaces by adding them as the
second argument, rather than depending on the buckets being chosen from
the fixed set of global buckets. Where the bucket is not available,
pass NULL, which means "use the default system kmalloc bucket set"
(the prior existing behavior), as implemented in kmalloc_slab().

To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
top-level macros and static inlines use the buckets argument (where
they are stripped out and compiled out respectively). The actual extern
functions can then be built without the argument, and the internals
fall back to the global kmalloc buckets unconditionally.

Co-developed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>

# 72e0fe22 01-Jul-2024 Kees Cook <[email protected]>

mm/slab: Introduce kmem_buckets typedef

Encapsulate the concept of a single set of kmem_caches that are used
for the kmalloc size buckets. Redefine kmalloc_caches as an array
of these buckets (for the different global cache buckets).
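
A minimal sketch of what such a typedef amounts to; the array bound and type-count names follow the existing kmalloc cache layout:

    /* One kmem_buckets is a full set of kmalloc size-class caches. */
    typedef struct kmem_cache *kmem_buckets[KMALLOC_SHIFT_HIGH + 1];

    /* The global caches then become one bucket set per kmalloc "type"
     * (normal, DMA, reclaimable, cgroup-accounted, ...). */
    extern kmem_buckets kmalloc_caches[NR_KMALLOC_TYPES];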

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
