|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
ff8d523c |
| 13-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Track object usage to avoid premature freeing of objects
The freelist is freed at a constant rate independent of the actual usage requirements. That's bad in scenarios where usage comes in bursts. The end of a burst puts the objects on the free list and freeing proceeds even when the next burst which requires objects started again.
Keep track of the usage with an exponentially weighted moving average and take that into account in the worker function which frees objects from the free list.
This further reduces the kmem_cache allocation/free rate for a full kernel compile:
              kmem_cache_alloc()   kmem_cache_free()
Baseline:     225k                 173k
Usage:        170k                 117k
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/87bjznhme2.ffs@tglx
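The decay logic described above can be sketched as a fixed-point exponentially weighted moving average. This is an illustrative userspace model, not the kernel's actual code; AVG_SHIFT and the function names are assumptions:

```c
#include <assert.h>

/* Weight of the newest sample is 1/8; the shift value is illustrative. */
#define AVG_SHIFT 3

/* Move avg 1/8 of the way toward the new usage sample. */
static long ewma_update(long avg, long sample)
{
    return avg + ((sample - avg) >> AVG_SHIFT);
}

/* Feed n identical samples, as a sustained burst of usage would. */
static long ewma_feed(long avg, long sample, int n)
{
    while (n--)
        avg = ewma_update(avg, sample);
    return avg;
}
```

A worker using such an average would skip freeing objects while the average usage indicates another burst is likely to need them soon.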
|
| #
13f9ca72 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Refill per CPU pool more aggressively
Right now the per CPU pools are only refilled when they become empty. That's suboptimal especially when there are still non-freed objects in the to free list.
Check whether an allocation from the per CPU pool emptied a batch and try to allocate from the free pool if that still has objects available.
              kmem_cache_alloc()   kmem_cache_free()
Baseline:     295k                 245k
Refill:       225k                 173k
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
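The refill check can be modeled with plain counters. A minimal sketch, assuming a hypothetical batch size and struct/function names (not the kernel's identifiers):

```c
#include <assert.h>

#define ODEBUG_BATCH_SIZE 16   /* illustrative batch size */

struct pool { int cnt; };

/*
 * Called after taking one object from the per-CPU pool: when the
 * allocation crossed a batch boundary, eagerly pull a fresh batch from
 * the global pool instead of waiting for the per-CPU pool to run dry.
 */
static int maybe_refill(struct pool *pcpu, struct pool *global)
{
    if (pcpu->cnt % ODEBUG_BATCH_SIZE)
        return 0;                        /* still inside a batch */
    if (global->cnt < ODEBUG_BATCH_SIZE)
        return 0;                        /* global pool cannot donate one */
    global->cnt -= ODEBUG_BATCH_SIZE;
    pcpu->cnt   += ODEBUG_BATCH_SIZE;
    return 1;
}
```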
|
| #
a201a96b |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Double the per CPU slots
In situations where objects are rapidly allocated from the pool and handed back, the size of the per CPU pool turns out to be too small.
Double the size of the per CPU pool.
This reduces the kmem cache allocation and free operations during a kernel compile:
              alloc   free
Baseline:     380k    330k
Double size:  295k    245k
Especially the reduction of allocations is important because that happens in the hot path when objects are initialized.
The maximum increase in per CPU pool memory consumption is about 2.5K per online CPU, which is acceptable.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
2638345d |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Move pool statistics into global_pool struct
Keep it along with the pool as that's a hot cache line anyway and it makes the code more comprehensible.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
f57ebb92 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Implement batch processing
Adding and removing single objects in a loop is bad in terms of lock contention and cache line accesses.
To implement batching, record the last object in a batch in the object itself. This is trivially possible as hlists are strictly stacks. At a batch boundary, when the first object is added to the list, the object stores a pointer to itself in debug_obj::batch_last. When the next object is added to the list, the batch_last pointer is retrieved from the first object in the list and stored in the newly added one.
That means for batch processing the first object always has a pointer to the last object in a batch, which allows moving batches in a cache-line-efficient way and reduces the lock hold time.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
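The batch_last scheme can be sketched with a simplified singly linked stack standing in for the kernel's hlist. The batch_last field mirrors debug_obj::batch_last from the commit; everything else (BATCH_SIZE, function names, the explicit count argument) is an illustrative assumption:

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 4   /* illustrative; the kernel uses its own size */

struct obj {
    struct obj *next;
    struct obj *batch_last;   /* valid in the first object of a batch */
};

/* cnt is the number of objects on the stack before this push */
static void push(struct obj **head, struct obj *o, int cnt)
{
    if (cnt % BATCH_SIZE == 0)
        o->batch_last = o;                   /* this object starts a batch */
    else
        o->batch_last = (*head)->batch_last; /* inherit from current first */
    o->next = *head;
    *head = o;
}

/* Detach one whole batch in O(1) using the recorded last pointer. */
static struct obj *pop_batch(struct obj **head)
{
    struct obj *first = *head;

    *head = first->batch_last->next;
    first->batch_last->next = NULL;
    return first;
}
```

Because pop_batch() never walks the list, moving a batch touches only the first and last objects, which is the cache-line benefit the commit describes.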
|
| #
aebbfe07 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Prepare kmem_cache allocations for batching
Allocate a batch and then push it into the pool. Utilize the debug_obj::last_node pointer for keeping track of the batch boundary.
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
74fe1ad4 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Prepare for batching
Move the debug_obj::object pointer into a union and add a pointer to the last node in a batch. That allows batch processing to be implemented efficiently by utilizing the stack property of hlist:
When the first object of a batch is added to the list, then the batch pointer is set to the hlist node of the object itself. Any subsequent add retrieves the pointer to the last node from the first object in the list and uses that for storing the last node pointer in the newly added object.
Add the pointer to the data structure and ensure that all relevant pool sizes are strictly batch sized. The actual batching implementation follows in subsequent changes.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
14077b9e |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Use static key for boot pool selection
Get rid of the conditional in the hot path.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
9ce99c6d |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Rework free_object_work()
Convert it to batch processing with intermediate helper functions. This reduces the final changes for batch processing.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
a3b9e191 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Rework object freeing
__free_object() is incomprehensibly complex. The same can be achieved by:
1) Adding the object to the per CPU pool
2) If that pool is full, move a batch of objects into the global pool or if the global pool is full into the to free pool
This also prepares for batch processing.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
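The two steps above can be sketched with plain counters. The pool sizes, batch size, and names below are assumptions for illustration, not the kernel's actual values:

```c
#include <assert.h>

#define BATCH      16     /* illustrative batch size */
#define PCPU_MAX   64     /* illustrative per-CPU pool limit */
#define GLOBAL_MAX 1024   /* illustrative global pool limit */

struct pool { int cnt; int max; };

static void free_obj(struct pool *pcpu, struct pool *global, struct pool *tofree)
{
    /* 1) add the object to the per CPU pool */
    if (++pcpu->cnt < pcpu->max)
        return;

    /*
     * 2) per CPU pool full: move one batch to the global pool, or to
     *    the to-free pool when the global pool is full as well
     */
    pcpu->cnt -= BATCH;
    if (global->cnt + BATCH <= global->max)
        global->cnt += BATCH;
    else
        tofree->cnt += BATCH;
}
```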
|
| #
fb60c004 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Rework object allocation
The current allocation scheme tries to allocate from the per CPU pool first. If that fails it allocates one object from the global pool and then refills the per CPU pool from the global pool.
That is in the way of switching the pool management to batch mode as the global pool needs to be a strict stack of batches, which does not allow to allocate single objects.
Rework the code to refill the per CPU pool first and then allocate the object from the refilled batch. Also try to allocate from the to free pool first to avoid freeing and reallocating objects.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
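The reworked allocation order can be sketched the same way: refill the per-CPU pool first (preferring the to-free pool), then take one object from the refilled batch. Batch size and names are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH 16   /* illustrative batch size */

struct pool { int cnt; };

static bool alloc_obj(struct pool *pcpu, struct pool *global, struct pool *tofree)
{
    if (!pcpu->cnt) {
        /*
         * Refill the per CPU pool first, trying the to-free pool before
         * the global pool to avoid freeing and reallocating objects.
         */
        struct pool *src = tofree->cnt >= BATCH ? tofree : global;

        if (src->cnt < BATCH)
            return false;    /* caller falls back to kmem_cache_alloc() */
        src->cnt  -= BATCH;
        pcpu->cnt += BATCH;
    }
    /* hand out one object from the (possibly just refilled) batch */
    pcpu->cnt--;
    return true;
}
```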
|
| #
96a9a042 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Move min/max count into pool struct
Having the accounting in the data structure is better in terms of cache lines and allows more optimizations later on.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
18b8afcb |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Rename and tidy up per CPU pools
No point in having a separate data structure. Reuse struct obj_pool and tidy up the code.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
cb58d190 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Use separate list head for boot pool
There is no point to handle the statically allocated objects during early boot in the actual pool list. This phase does not require accounting, so all of the related complexity can be avoided.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
e18328ff |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Move pools into a datastructure
The contention on the global pool lock can be reduced by strict batch processing where batches of objects are moved from one list head to another instead of moving them object by object. This also reduces the cache footprint because it avoids the list walk and dirties at maximum three cache lines instead of potentially up to eighteen.
To prepare for that, move the hlist head and related counters into a struct.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
d8c6cd3a |
| 07-Oct-2024 |
Zhen Lei <[email protected]> |
debugobjects: Reduce parallel pool fill attempts
The contention on the global pool_lock can be massive when the global pool needs to be refilled and many CPUs try to handle this.
Address this by:
- Splitting the refill from the free list and the allocation. Refill from the free list has no constraints vs. the context on RT, so it can be tried outside of the RT-specific preemptible() guard.
- Letting only one CPU handle the free list.
- Letting only one CPU do allocations unless the pool level is below half of the minimum fill level.
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
---
 lib/debugobjects.c | 84 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 59 insertions(+), 25 deletions(-)
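The "only one CPU handles it" part can be sketched with a C11 atomic flag. This is a userspace model of the idea; the kernel uses its own atomics and the flag and function names here are illustrative:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag fill_in_progress = ATOMIC_FLAG_INIT;

static bool try_fill(void)
{
    /*
     * test_and_set returns the previous value: if the flag was already
     * set, another CPU owns the refill and this one backs off instead
     * of piling onto the pool lock.
     */
    if (atomic_flag_test_and_set(&fill_in_progress))
        return false;

    /* ... refill the global pool from the free list here ... */

    atomic_flag_clear(&fill_in_progress);
    return true;
}
```

Contending CPUs fail fast instead of serializing on the pool lock, which is the contention reduction the commit targets.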
|
| #
661cc28b |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Make debug_objects_enabled bool
Make it what it is.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
49a5cb82 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Provide and use free_object_list()
Move the loop to free a list of objects into a helper function so it can be reused later.
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
241463f4 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Remove pointless debug printk
It has zero value.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
49968cf1 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Reuse put_objects() on OOM
Reuse the helper function instead of having an open-coded copy.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
a2a70238 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Dont free objects directly on CPU hotplug
Freeing the per CPU pool of the unplugged CPU directly is suboptimal as the objects can be reused in the real pool if there is room. Aside from that, this gets the accounting wrong.
Use the regular free path, which allows reuse and has the accounting correct.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
3f397bf9 |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Remove pointless hlist initialization
It's BSS zero initialized.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
55fb412e |
| 07-Oct-2024 |
Thomas Gleixner <[email protected]> |
debugobjects: Dont destroy kmem cache in init()
debug_objects_mem_init() is invoked from mm_core_init() before work queues are available. If debug_objects_mem_init() destroys the kmem cache in the error path it causes an Oops in __queue_work():
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
RIP: 0010:__queue_work+0x35/0x6a0
 queue_work_on+0x66/0x70
 flush_all_cpus_locked+0xdf/0x1a0
 __kmem_cache_shutdown+0x2f/0x340
 kmem_cache_destroy+0x4e/0x150
 mm_core_init+0x9e/0x120
 start_kernel+0x298/0x800
 x86_64_start_reservations+0x18/0x30
 x86_64_start_kernel+0xc5/0xe0
 common_startup_64+0x12c/0x138
Further, the object cache pointer is used in various places to check for early boot operation. It is exposed before the replacements for the static boot time objects are allocated and the self test operates on it.
This can be avoided by:
1) Running the self test with the static boot objects
2) Exposing it only after the replacement objects have been added to the pool.
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| #
813fd078 |
| 07-Oct-2024 |
Zhen Lei <[email protected]> |
debugobjects: Collect newly allocated objects in a list to reduce lock contention
Collect the newly allocated debug objects in a list outside the lock, so that the lock hold time and the potential lock contention are reduced.
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
|
| #
a0ae9504 |
| 07-Oct-2024 |
Zhen Lei <[email protected]> |
debugobjects: Delete a piece of redundant code
The statically allocated objects are all located in obj_static_pool[], whose whole memory will be reclaimed later. Therefore, there is no need to split the remaining static nodes in list obj_pool into isolated ones; no one will use them anymore. Just writing INIT_HLIST_HEAD(&obj_pool) is enough. Since hlist_move_list() directly discards the old list, even this can be omitted.
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
|