|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
| #
31041385 |
| 13-Feb-2025 |
Suren Baghdasaryan <[email protected]> |
mm: make vma cache SLAB_TYPESAFE_BY_RCU
To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that object reuse before RCU grace period is over will be detected by lock_vma_under_rcu().
Current checks are sufficient as long as vma is detached before it is freed. The only place this is not currently happening is in exit_mmap(). Add the missing vma_mark_detached() in exit_mmap().
Another issue which might trick lock_vma_under_rcu() during vma reuse is vm_area_dup(), which copies the entire content of the vma into a new one, overwriting the new vma's vm_refcnt and temporarily making it appear attached. This might trick a racing lock_vma_under_rcu() into operating on a reused vma if it found the vma before it got reused. To prevent this situation, we should ensure that vm_refcnt stays in the detached state (0) when it is copied and advances to the attached state only after the vma is added into the vma tree. Introduce vm_area_init_from(), which preserves the new vma's vm_refcnt, and use it in vm_area_dup(). Since all vmas are in the detached state with no current readers when they are freed, lock_vma_under_rcu() will not be able to take vm_refcnt after the vma got detached, even if the vma is reused. vma_mark_attached() is modified to include a release fence to ensure all stores to the vma happen before vm_refcnt gets initialized.
Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate vm_area_struct reuse and will minimize the number of call_rcu() calls.
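The copy-without-refcount idea can be illustrated with a small hedged sketch (generic names, not the actual vm_area_init_from() code):

    #include <linux/refcount.h>
    #include <linux/string.h>

    struct thing {
            refcount_t      refcnt;         /* 0 == detached, like vm_refcnt */
            unsigned long   key;
            unsigned long   payload[8];
    };

    static void thing_init_from(struct thing *dst, const struct thing *src)
    {
            dst->key = src->key;
            memcpy(dst->payload, src->payload, sizeof(dst->payload));
            /*
             * Deliberately do not copy src->refcnt: the new object must stay
             * detached (refcount 0) until it is inserted into its tree, so a
             * racing speculative lookup that still holds a pointer to this
             * memory cannot pin it prematurely.
             */
    }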
[[email protected]: remove atomic_set_release() usage in tools/] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Tested-by: Shivank Garg <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Cc: Christian Brauner <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Klara Modin <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sourav Panda <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
7f8ceea0 |
| 13-Feb-2025 |
Suren Baghdasaryan <[email protected]> |
refcount: provide ops for cases when object's memory can be reused
For speculative lookups where a successful inc_not_zero() pins the object, but we still need to double check that the object acquired is indeed the one we set out to acquire (identity check), this validation needs to happen *after* the increment. Similarly, when a new object is initialized and its memory might have been previously occupied by another object, all stores to initialize the object should happen *before* refcount initialization.
Notably, SLAB_TYPESAFE_BY_RCU is one such example where this ordering is required for reference counting.
Add refcount_{add|inc}_not_zero_acquire() to guarantee the proper ordering between acquiring a reference count on an object and performing the identity check for that object.
Add refcount_set_release() to guarantee proper ordering between stores initializing object attributes and the store initializing the refcount. refcount_set_release() should be done after all other object attributes are initialized. Once refcount_set_release() is called, the object should be considered visible to other tasks even if it was not yet added into an object collection normally used to discover it. This is because other tasks might have discovered the object previously occupying the same memory and after memory reuse they can succeed in taking refcount for the new object and start using it.
Object reuse example to consider:
consumer:
    obj = lookup(collection, key);
    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* identity check */
        put_ref(obj);
        return;
    }
    use(obj->value);

producer:
    remove(collection, obj->key);
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(obj);
    obj = malloc(); /* obj is reused */
    obj->key = new_key;
    obj->value = new_value;
    refcount_set_release(obj->ref, 1);
    add(collection, new_key, obj);
refcount_{add|inc}_not_zero_acquire() is required to prevent the following reordering when refcount_inc_not_zero() is used instead:
consumer:
    obj = lookup(collection, key);
    if (READ_ONCE(obj->key) != key) { /* reordered identity check */
        put_ref(obj);
        return;
    }
                producer:
                    remove(collection, obj->key);
                    if (!refcount_dec_and_test(&obj->ref))
                        return;
                    obj->key = KEY_INVALID;
                    free(obj);
                    obj = malloc(); /* obj is reused */
                    obj->key = new_key;
                    obj->value = new_value;
                    refcount_set_release(obj->ref, 1);
                    add(collection, new_key, obj);
    if (!refcount_inc_not_zero(&obj->ref))
        return;
    use(obj->value); /* USING WRONG OBJECT */
refcount_set_release() is required to prevent the following reordering when refcount_set() is used instead:
consumer:
    obj = lookup(collection, key);
                producer:
                    remove(collection, obj->key);
                    if (!refcount_dec_and_test(&obj->ref))
                        return;
                    obj->key = KEY_INVALID;
                    free(obj);
                    obj = malloc(); /* obj is reused */
                    obj->key = new_key; /* new_key == old_key */
                    refcount_set(obj->ref, 1);
    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* pass since new_key == old_key */
        put_ref(obj);
        return;
    }
    use(obj->value); /* USING STALE obj->value */
                    obj->value = new_value; /* reordered store */
                    add(collection, key, obj);
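For concreteness, a minimal C sketch of the same pattern using the two new helpers; table_add(), table_find() and item_free() are hypothetical placeholders, not existing kernel APIs:

    #include <linux/compiler.h>
    #include <linux/refcount.h>
    #include <linux/types.h>

    struct item {
            refcount_t      ref;
            unsigned long   key;
            unsigned long   value;
    };

    /* producer: publish all fields before the refcount store */
    static void item_publish(struct item *it, unsigned long key, unsigned long value)
    {
            it->key = key;
            it->value = value;
            refcount_set_release(&it->ref, 1);      /* release orders the stores above */
            table_add(key, it);                     /* hypothetical */
    }

    /* consumer: the identity check is ordered after the acquire increment */
    static bool item_peek(unsigned long key, unsigned long *value)
    {
            struct item *it = table_find(key);      /* hypothetical */

            if (!it || !refcount_inc_not_zero_acquire(&it->ref))
                    return false;
            if (READ_ONCE(it->key) != key) {        /* memory was reused */
                    if (refcount_dec_and_test(&it->ref))
                            item_free(it);          /* hypothetical */
                    return false;
            }
            *value = READ_ONCE(it->value);
            if (refcount_dec_and_test(&it->ref))
                    item_free(it);
            return true;
    }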
[[email protected]: fix title underlines in refcount-vs-atomic.rst] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Vlastimil Babka <[email protected]> [slab] Tested-by: Shivank Garg <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Christian Brauner <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Klara Modin <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sourav Panda <[email protected]> Cc: Wei Yang <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
939c5de3 |
| 01-Mar-2025 |
Ye Bin <[email protected]> |
mm/slab: call kmalloc_noprof() unconditionally in kmalloc_array_noprof()
If 'n' or 'size' isn't a builtin constant, we used to call __kmalloc() before commit 7bd230a26648 ("mm/slab: enable slab allocation tagging for kmalloc and friends"), which inadvertently changed both paths to kmalloc_noprof().
As Harry Yoo points out, we can just call kmalloc_noprof() unconditionally. Even if the compiler knows n and size are constants, that doesn't guarantee that bytes will also be seen as constant, and that is the important test in kmalloc_noprof() anyway, so we can just defer to it always.
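A sketch of the resulting helper, assuming the usual check_mul_overflow() guard is kept (simplified from the real header):

    static inline void *kmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
    {
            size_t bytes;

            if (unlikely(check_mul_overflow(n, size, &bytes)))
                    return NULL;

            /* kmalloc_noprof() itself checks __builtin_constant_p(bytes) */
            return kmalloc_noprof(bytes, flags);
    }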
[ [email protected]: change as Harry suggested and adjust commit log ]
Fixes: 7bd230a26648 ("mm/slab: enable slab allocation tagging for kmalloc and friends") Signed-off-by: Ye Bin <[email protected]> Reviewed-by: Harry Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Revision tags: v6.14-rc2 |
|
| #
c9f8f124 |
| 03-Feb-2025 |
Vlastimil Babka <[email protected]> |
slab: don't batch kvfree_rcu() with SLUB_TINY
kvfree_rcu() is batched for better performance except on TINY_RCU, which is a simple implementation for small UP systems. Similarly SLUB_TINY is an option intended for small systems, whether or not used together with TINY_RCU. In case SLUB_TINY is used with !TINY_RCU, it arguably makes sense to not do the batching and limit the memory footprint. It's also suboptimal to have RCU-specific #ifdefs in slab code.
With that, add CONFIG_KVFREE_RCU_BATCHED to determine whether batching kvfree_rcu() implementation is used. It is not set by a user prompt, but enabled by default and disabled in case TINY_RCU or SLUB_TINY are enabled.
Use the new config for #ifdef's in slab code and extend their scope to cover all code used by the batched kvfree_rcu(). For example there's no need to perform kvfree_rcu_init() if the batching is disabled.
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
b14ff274 |
| 03-Feb-2025 |
Vlastimil Babka <[email protected]> |
slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
Following the move of TREE_RCU implementation, let's move also the TINY_RCU one for consistency and subsequent refactoring.
For simplicity, remove the separate inline __kvfree_call_rcu() as TINY_RCU is not meant for high-performance hardware anyway.
Declare kvfree_call_rcu() in rcupdate.h to avoid header dependency issues.
Also move the kvfree_rcu_barrier() declaration to slab.h
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
bbe658d6 |
| 12-Dec-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
mm/slab: Move kvfree_rcu() into SLAB
Move kvfree_rcu() functionality to the slab_common.c file.
The reason to have kvfree_rcu() functionality as part of SLAB is that there is a clear trend of, and need for, closer integration. One recent example is the creation of a barrier function for SLAB caches.
Another reason is to avoid having several implementations of the RCU machinery for reclaiming objects after a grace period. As a future step, it can be integrated more easily with SLAB internals.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
b6da9401 |
| 09-Oct-2024 |
Vlastimil Babka <[email protected]> |
mm, slab: add kerneldocs for common SLAB_ flags
We have many SLAB_ flags, but most are used only internally, by kunit tests or debugging subsystems cooperating with slab, or are set according to the slab_debug boot parameter.
Create kerneldocs for the commonly used flags that may be passed to kmem_cache_create(). SLAB_TYPESAFE_BY_RCU already had a detailed description, so turn it into a kerneldoc. Add some details for SLAB_ACCOUNT, SLAB_RECLAIM_ACCOUNT and SLAB_HWCACHE_ALIGN. Reference them from the __kmem_cache_create_args() kerneldoc.
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
1e562dea |
| 10-Sep-2024 |
Lukas Wunner <[email protected]> |
crypto: rsassa-pkcs1 - Migrate to sig_alg backend
A sig_alg backend has just been introduced with the intent of moving all asymmetric sign/verify algorithms to it one by one.
Migrate the sign/verify operations from rsa-pkcs1pad.c to a separate rsassa-pkcs1.c which uses the new backend.
Consequently there are now two templates which build on the "rsa" akcipher_alg:
* The existing "pkcs1pad" template, which is instantiated as an akcipher_instance and retains the encrypt/decrypt operations of RSAES-PKCS1-v1_5 (RFC 8017 sec 7.2).
* The new "pkcs1" template, which is instantiated as a sig_instance and contains the sign/verify operations of RSASSA-PKCS1-v1_5 (RFC 8017 sec 8.2).
In a separate step, rsa-pkcs1pad.c could optionally be renamed to rsaes-pkcs1.c for clarity. Additional "oaep" and "pss" templates could be added for RSAES-OAEP and RSASSA-PSS.
Note that it's currently allowed to allocate a "pkcs1pad(rsa)" transform without specifying a hash algorithm. That makes sense if the transform is only used for encrypt/decrypt and continues to be supported. But for sign/verify, such transforms previously did not insert the Full Hash Prefix into the padding. The resulting message encoding was incompliant with EMSA-PKCS1-v1_5 (RFC 8017 sec 9.2) and therefore nonsensical.
From here on in, it is no longer allowed to allocate a transform without specifying a hash algorithm if the transform is used for sign/verify operations. This simplifies the code because the insertion of the Full Hash Prefix is no longer optional, so various "if (digest_info)" clauses can be removed.
There has been a previous attempt to forbid transform allocation without specifying a hash algorithm, namely by commit c0d20d22e0ad ("crypto: rsa-pkcs1pad - Require hash to be present"). It had to be rolled back with commit b3a8c8a5ebb5 ("crypto: rsa-pkcs1pad: Allow hash to be optional [ver #2]"), presumably because it broke allocation of a transform which was solely used for encrypt/decrypt, not sign/verify. Avoid such breakage by allowing transform allocation for encrypt/decrypt with and without specifying a hash algorithm (and simply ignoring the hash algorithm in the former case).
So again, specifying a hash algorithm is now mandatory for sign/verify, but optional and ignored for encrypt/decrypt.
The new sig_alg API uses kernel buffers instead of sglists, which avoids the overhead of copying signature and digest from sglists back into kernel buffers. rsassa-pkcs1.c is thus simplified quite a bit.
sig_alg is always synchronous, whereas the underlying "rsa" akcipher_alg may be asynchronous. So await the result of the akcipher_alg, similar to crypto_akcipher_sync_{en,de}crypt().
As part of the migration, rename "rsa_digest_info" to "hash_prefix" to adhere to the spec language in RFC 9580. Otherwise keep the code unmodified wherever possible to ease reviewing and bisecting. Leave several simplification and hardening opportunities to separate commits.
rsassa-pkcs1.c uses modern __free() syntax for allocation of buffers which need to be freed by kfree_sensitive(), hence a DEFINE_FREE() clause for kfree_sensitive() is introduced herein as a byproduct.
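Roughly what that byproduct looks like in use (a hedged sketch; the function, buffer name and error handling are illustrative, not code from the actual patch):

    #include <linux/cleanup.h>
    #include <linux/slab.h>

    DEFINE_FREE(kfree_sensitive, void *, if (_T) kfree_sensitive(_T))

    static int sign_step(unsigned int len)
    {
            /* kfree_sensitive()d automatically on every return path */
            void *in_buf __free(kfree_sensitive) = kmalloc(len, GFP_KERNEL);

            if (!in_buf)
                    return -ENOMEM;

            /* ... fill and use in_buf ... */
            return 0;
    }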
Signed-off-by: Lukas Wunner <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
|
| #
4b7ff9ab |
| 13-Sep-2024 |
Vlastimil Babka <[email protected]> |
mm, slab: restore kerneldoc for kmem_cache_create()
As kmem_cache_create() became a _Generic() wrapper macro, it currently has no kerneldoc despite being the main API to use. Add it. Also adjust kmem_cache_create_usercopy() kerneldoc to indicate it is now a legacy wrapper.
Also expand the kerneldoc for struct kmem_cache_args, especially for the freeptr_offset field, where important details were removed with the removal of kmem_cache_create_rcu().
Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Christian Brauner <[email protected]>
|
|
Revision tags: v6.11-rc7 |
|
| #
781aee75 |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
slab: make __kmem_cache_create() static inline
Make __kmem_cache_create() a static inline function.
Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
0c9050b0 |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
slab: make kmem_cache_create_usercopy() static inline
Make kmem_cache_create_usercopy() a static inline function.
Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
3d453e60 |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
slab: remove kmem_cache_create_rcu()
Now that we have ported all users of kmem_cache_create_rcu() to struct kmem_cache_args the function is unused and can be removed.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
b2e7456b |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
slab: create kmem_cache_create() compatibility layer
Use _Generic() to create a compatibility layer that type switches on the third argument to either call __kmem_cache_create() or __kmem_cache_create_args(). If NULL is passed for the struct kmem_cache_args argument, default args are used, making porting easy for callers that don't care about additional arguments.
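One plausible shape of such a wrapper (a sketch of the approach, not necessarily the exact macro that was merged):

    #define kmem_cache_create(__name, __object_size, __args, ...)          \
            _Generic((__args),                                              \
                    struct kmem_cache_args *: __kmem_cache_create_args,     \
                    void *: __kmem_cache_default_args,                      \
                    default: __kmem_cache_create                            \
            )(__name, __object_size, __args, __VA_ARGS__)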
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
199cd13a |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
slab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args
Make KMEM_CACHE_USERCOPY() use struct kmem_cache_args.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
052d67b4 |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
slab: port KMEM_CACHE() to struct kmem_cache_args
Make KMEM_CACHE() use struct kmem_cache_args.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
879fb3c2 |
| 05-Sep-2024 |
Christian Brauner <[email protected]> |
slab: add struct kmem_cache_args
Currently we have multiple kmem_cache_create*() variants that take up to seven separate parameters, with one of the functions having to grow an eighth parameter in the future to handle both usercopy and a custom freelist pointer.
Add a struct kmem_cache_args structure and move less common parameters into it. Core parameters such as name, object size, and flags continue to be passed separately.
Add a new function __kmem_cache_create_args() that takes a struct kmem_cache_args pointer and port do_kmem_cache_create_usercopy() over to it.
In follow-up patches we will port the other kmem_cache_create*() variants over to it as well.
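A hedged usage sketch of the new calling convention ("struct foo" and its fields are illustrative):

    struct foo {
            unsigned long id;
            char data[64];
    };

    static struct kmem_cache *foo_cachep;

    static int __init foo_cache_init(void)
    {
            struct kmem_cache_args args = {
                    .align          = __alignof__(struct foo),
                    .useroffset     = offsetof(struct foo, data),
                    .usersize       = sizeof_field(struct foo, data),
            };

            foo_cachep = __kmem_cache_create_args("foo", sizeof(struct foo),
                                                  &args, SLAB_HWCACHE_ALIGN);
            return foo_cachep ? 0 : -ENOMEM;
    }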
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
9028cdeb |
| 05-Sep-2024 |
Shakeel Butt <[email protected]> |
memcg: add charging of already allocated slab objects
At the moment, slab objects are charged to the memcg at allocation time. However, there are cases where slab objects are allocated at a time when the right target memcg to charge them to is not known. One such case is network sockets for incoming connections, which are allocated in softirq context.
A couple hundred thousand connections are very normal on a large loaded server, and almost all of the sockets underlying those connections get allocated in softirq context and are thus not charged to any memcg. However, later at accept() time we know the right target memcg to charge. Let's add a new API to charge already allocated objects, so we can have better accounting of the memory usage.
To measure the performance impact of this change, tcp_crr is used from the neper [1] performance suite. Basically it is a network ping pong test with new connection for each ping pong.
The server and the client are run inside 3 level of cgroup hierarchy using the following commands:
Server: $ tcp_crr -6
Client: $ tcp_crr -6 -c -H ${server_ip}
If the client and server run on different machines with 50 GBPS NIC, there is no visible impact of the change.
For the same machine experiment with v6.11-rc5 as base.
              base (throughput)    with-patch
    tcp_crr   14545 (+- 80)        14463 (+- 56)
It seems like the performance impact is within the noise.
Link: https://github.com/google/neper [1] Signed-off-by: Shakeel Butt <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Yosry Ahmed <[email protected]> Acked-by: Paolo Abeni <[email protected]> # net Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
489a744e |
| 12-Aug-2024 |
Danilo Krummrich <[email protected]> |
mm: krealloc: clarify valid usage of __GFP_ZERO
Properly document that if __GFP_ZERO logic is requested, callers must ensure that, starting with the initial memory allocation, every subsequent call to this API for the same memory allocation is flagged with __GFP_ZERO. Otherwise, it is possible that __GFP_ZERO is not fully honored by this API.
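In other words (a minimal illustration; the sizes are arbitrary):

    /* __GFP_ZERO is honored only if every call in the chain passes it */
    buf = krealloc(NULL, 64, GFP_KERNEL | __GFP_ZERO);      /* zeroed */
    buf = krealloc(buf, 128, GFP_KERNEL | __GFP_ZERO);      /* grown part zeroed too */

    /* dropping the flag once breaks the guarantee for later growth */
    buf = krealloc(buf, 256, GFP_KERNEL);
    buf = krealloc(buf, 512, GFP_KERNEL | __GFP_ZERO);      /* tail may contain garbage */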
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Danilo Krummrich <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
| #
590b9d57 |
| 22-Jul-2024 |
Danilo Krummrich <[email protected]> |
mm: kvmalloc: align kvrealloc() with krealloc()
Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior:
- krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation.
- krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead.
- krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation.
Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects.
Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides opportunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation.
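A small illustration of the aligned behavior (sizes arbitrary):

    /* NULL pointer now behaves like kvmalloc() */
    p = kvrealloc(NULL, PAGE_SIZE, GFP_KERNEL);

    /* growing no longer requires passing the old size */
    p = kvrealloc(p, 16 * PAGE_SIZE, GFP_KERNEL);    /* may move kmalloc -> vmalloc */

    /* and, like krealloc(), a zero size frees the allocation */
    p = kvrealloc(p, 0, GFP_KERNEL);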
[[email protected]: document concurrency restrictions] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Danilo Krummrich <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Chandan Babu R <[email protected]> Cc: Christian König <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Uladzislau Rezki <[email protected]> Cc: Wedson Almeida Filho <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
d345bd2e |
| 28-Aug-2024 |
Christian Brauner <[email protected]> |
mm: add kmem_cache_create_rcu()
When a kmem cache is created with SLAB_TYPESAFE_BY_RCU the free pointer must be located outside of the object because we don't know what part of the memory can safely be overwritten as it may be needed to prevent object recycling.
That has the consequence that SLAB_TYPESAFE_BY_RCU may end up adding a new cacheline. This is the case for e.g., struct file. After having it shrunk down by 40 bytes and having it fit in three cachelines we still have SLAB_TYPESAFE_BY_RCU adding a fourth cacheline because it needs to accommodate the free pointer.
Add a new kmem_cache_create_rcu() function that allows the caller to specify an offset where the free pointer is supposed to be placed.
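A hedged usage sketch ("struct foo" and its freeptr field are illustrative; the commit itself targets struct file):

    struct foo {
            unsigned long   id;
            /* ... */
            freeptr_t       foo_freeptr;    /* reused as the freelist pointer once freed */
    };

    static struct kmem_cache *foo_cachep;

    static int __init foo_cache_init(void)
    {
            foo_cachep = kmem_cache_create_rcu("foo", sizeof(struct foo),
                                               offsetof(struct foo, foo_freeptr),
                                               SLAB_ACCOUNT);
            return foo_cachep ? 0 : -ENOMEM;
    }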
Link: https://lore.kernel.org/r/[email protected] Acked-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
|
|
Revision tags: v6.10, v6.10-rc7 |
|
| #
3a3b7fec |
| 01-Jul-2024 |
Johannes Weiner <[email protected]> |
mm: remove CONFIG_MEMCG_KMEM
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab tracking is enabled. It has been default-enabled and equivalent to CONFIG_MEMCG for almost a decade. We've only grown more kernel memory accounting sites since, and there is no imaginable cgroup usecase going forward that wants to track user pages but not the multitude of user-drivable kernel allocations.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
b32801d1 |
| 01-Jul-2024 |
Kees Cook <[email protected]> |
mm/slab: Introduce kmem_buckets_create() and family
Dedicated caches are available for fixed size allocations via kmem_cache_alloc(), but for dynamically sized allocations there is only the global kmalloc API's set of buckets available. This means it isn't possible to separate specific sets of dynamically sized allocations into a separate collection of caches.
This leads to a use-after-free exploitation weakness in the Linux kernel since many heap memory spraying/grooming attacks depend on using userspace-controllable dynamically sized allocations to collide with fixed size allocations that end up in the same cache.
While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense against these kinds of "type confusion" attacks, including for fixed same-size heap objects, we can create a complementary deterministic defense for dynamically sized allocations that are directly user controlled. Addressing these cases is limited in scope, so isolating these kinds of interfaces will not become an unbounded game of whack-a-mole. For example, many pass through memdup_user(), making isolation there very effective.
In order to isolate user-controllable dynamically-sized allocations from the common system kmalloc allocations, introduce kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for where caller tracking is needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback is needed. Note that these caches are specifically flagged with SLAB_NO_MERGE, since merging would defeat the entire purpose of the mitigation.
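A hedged usage sketch of the new family (the bucket-set name, length variable and argument order are illustrative assumptions, not taken from the patch itself):

    static kmem_buckets *foo_buckets __ro_after_init;

    static int __init foo_buckets_init(void)
    {
            foo_buckets = kmem_buckets_create("foo_user", 0, 0, 0, NULL);
            return foo_buckets ? 0 : -ENOMEM;
    }

    /* allocations with a userspace-controlled length go to the isolated set */
    static void *foo_dup_len(size_t user_len, gfp_t gfp)
    {
            return kmem_buckets_alloc(foo_buckets, user_len, gfp);
    }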
This can also be used in the future to extend allocation profiling's use of code tagging to implement per-caller allocation cache isolation[1] even for dynamic allocations.
Memory allocation pinning[2] is still needed to plug the Use-After-Free cross-allocator weakness (where attackers can arrange to free an entire slab page and have it reallocated to a different cache), but that is an existing and separate issue which is complementary to this improvement. Development continues for that feature via the SLAB_VIRTUAL[3] series (which could also provide guard pages -- another complementary improvement).
Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1] Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2] Link: https://lore.kernel.org/lkml/[email protected]/ [3] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
2e8000b8 |
| 01-Jul-2024 |
Kees Cook <[email protected]> |
mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
Plumb kmem_buckets arguments through kvmalloc_node_noprof() so it is possible to provide an API to perform kvmalloc-style allocations with a particular set of buckets. Introduce kvmalloc_buckets_node() that takes a kmem_buckets argument.
Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
67f2df3b |
| 01-Jul-2024 |
Kees Cook <[email protected]> |
mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to support separated kmalloc buckets (in the following kmem_buckets_create() patches and future codetag-based separation). Since this will provide a mitigation for a very common case of exploits, it is recommended to enable this feature for general purpose distros. By default, the new Kconfig will be enabled if CONFIG_SLAB_FREELIST_HARDENED is enabled (and it is added to the hardening.config Kconfig fragment).
To be able to choose which buckets to allocate from, make the buckets available to the internal kmalloc interfaces by adding them as the second argument, rather than depending on the buckets being chosen from the fixed set of global buckets. Where the bucket is not available, pass NULL, which means "use the default system kmalloc bucket set" (the prior existing behavior), as implemented in kmalloc_slab().
To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the top-level macros and static inlines use the buckets argument (where they are stripped out and compiled out respectively). The actual extern functions can then be built without the argument, and the internals fall back to the global kmalloc buckets unconditionally.
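One plausible way the argument can be stripped when the option is off (a sketch of the approach, not necessarily the exact helper macros used by the patch):

    #ifdef CONFIG_SLAB_BUCKETS
    #define DECL_BUCKET_PARAMS(_size, _b)   size_t (_size), kmem_buckets *(_b)
    #define PASS_BUCKET_PARAMS(_size, _b)   (_size), (_b)
    #define PASS_BUCKET_PARAM(_b)           (_b)
    #else
    #define DECL_BUCKET_PARAMS(_size, _b)   size_t (_size)
    #define PASS_BUCKET_PARAMS(_size, _b)   (_size)
    #define PASS_BUCKET_PARAM(_b)           NULL
    #endif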
Co-developed-by: Vlastimil Babka <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
72e0fe22 |
| 01-Jul-2024 |
Kees Cook <[email protected]> |
mm/slab: Introduce kmem_buckets typedef
Encapsulate the concept of a single set of kmem_caches that are used for the kmalloc size buckets. Redefine kmalloc_caches as an array of these buckets (for the different global cache buckets).
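In essence, the typedef is an array of kmem_cache pointers indexed by kmalloc size bucket (a sketch; the exact bound is assumed to follow the existing kmalloc_caches definition):

    typedef struct kmem_cache *kmem_buckets[KMALLOC_SHIFT_HIGH + 1];

    /* the global kmalloc caches then become one bucket set per kmalloc type */
    extern kmem_buckets kmalloc_caches[NR_KMALLOC_TYPES];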
Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|