|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
823beb31 |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Move get_{mon,ctrl}_domain_from_cpu() to live with their callers
Each of get_{mon,ctrl}_domain_from_cpu() only has one caller.
Once the filesystem code is moved to /fs/, there is no eq
x86/resctrl: Move get_{mon,ctrl}_domain_from_cpu() to live with their callers
Each of get_{mon,ctrl}_domain_from_cpu() only has one caller.
Once the filesystem code is moved to /fs/, there is no equivalent to core.c.
Move these functions to each live next to their caller. This allows them to be made static and the header file entries to be removed.
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
c32a7d77 |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Move mbm_cfg_mask to struct rdt_resource
The mbm_cfg_mask field lists the bits that user-space can set when configuring an event. This value is output via the last_cmd_status file.
Onc
x86/resctrl: Move mbm_cfg_mask to struct rdt_resource
The mbm_cfg_mask field lists the bits that user-space can set when configuring an event. This value is output via the last_cmd_status file.
Once the filesystem parts of resctrl are moved to live in /fs/, the struct rdt_hw_resource is inaccessible to the filesystem code. Because this value is output to user-space, it has to be accessible to the filesystem code.
Move it to struct rdt_resource.
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
37bae175 |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Move mba_mbps_default_event init to filesystem code
mba_mbps_default_event is initialised based on whether mbm_local or mbm_total is supported. In the case of both, it is initialised to
x86/resctrl: Move mba_mbps_default_event init to filesystem code
mba_mbps_default_event is initialised based on whether mbm_local or mbm_total is supported. In the case of both, it is initialised to mbm_local. mba_mbps_default_event is initialised in core.c's get_rdt_mon_resources(), while all the readers are in rdtgroup.c.
After this code is split into architecture-specific and filesystem code, get_rdt_mon_resources() remains part of the architecture code, which would mean mba_mbps_default_event has to be exposed by the filesystem code.
Move the initialisation to the filesystem's resctrl_mon_resource_init().
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
d81826f8 |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMEC
When BMEC is supported the resctrl event can be configured in a number of ways. This depends on architecture support. rdt_get_mon
x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMEC
When BMEC is supported the resctrl event can be configured in a number of ways. This depends on architecture support. rdt_get_mon_l3_config() modifies the struct mon_evt and calls resctrl_file_fflags_init() to create the files that allow the configuration.
Splitting this into separate architecture and filesystem parts would require the struct mon_evt and resctrl_file_fflags_init() to be exposed.
Instead, add resctrl_arch_is_evt_configurable(), and use this from resctrl_mon_resource_init() to initialise struct mon_evt and call resctrl_file_fflags_init().
resctrl_arch_is_evt_configurable() calls rdt_cpu_has() so it doesn't obviously benefit from being inlined. Putting it in core.c will allow rdt_cpu_has() to eventually become static.
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
d012b66a |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Move the is_mbm_*_enabled() helpers to asm/resctrl.h
The architecture specific parts of resctrl provide helpers like is_mbm_total_enabled() and is_mbm_local_enabled() to hide accesses t
x86/resctrl: Move the is_mbm_*_enabled() helpers to asm/resctrl.h
The architecture specific parts of resctrl provide helpers like is_mbm_total_enabled() and is_mbm_local_enabled() to hide accesses to the rdt_mon_features bitmap.
Exposing a group of helpers between the architecture and filesystem code is preferable to a single unsigned-long like rdt_mon_features. Helpers can be more readable and have a well defined behaviour, while allowing architectures to hide more complex behaviour.
Once the filesystem parts of resctrl are moved, these existing helpers can no longer live in internal.h. Move them to include/linux/resctrl.h Once these are exposed to the wider kernel, they should have a 'resctrl_arch_' prefix, to fit the rest of the arch<->fs interface.
Move and rename the helpers that touch rdt_mon_features directly. is_mbm_event() and is_mbm_enabled() are only called from rdtgroup.c, so can be moved into that file.
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
4b6bdbf2 |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Move monitor init work to a resctrl init call
rdt_get_mon_l3_config() is called from the arch's resctrl_arch_late_init(), and initialises both architecture specific fields, such as hw_r
x86/resctrl: Move monitor init work to a resctrl init call
rdt_get_mon_l3_config() is called from the arch's resctrl_arch_late_init(), and initialises both architecture specific fields, such as hw_res->mon_scale and resctrl filesystem fields by calling dom_data_init().
To separate the filesystem and architecture parts of resctrl, this function needs splitting up.
Add resctrl_mon_resource_init() to do the filesystem specific work, and call it from resctrl_init(). This runs later, but is still before the filesystem is mounted and the rmid_ptrs[] array can be used.
[ bp: Massage commit message. ]
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
01184272 |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Move monitor exit work to a resctrl exit call
rdt_put_mon_l3_config() is called via the architecture's resctrl_arch_exit() call, and appears to free the rmid_ptrs[] and closid_num_dirty
x86/resctrl: Move monitor exit work to a resctrl exit call
rdt_put_mon_l3_config() is called via the architecture's resctrl_arch_exit() call, and appears to free the rmid_ptrs[] and closid_num_dirty_rmid[] arrays. In reality this code is marked __exit, and is removed by the linker as resctrl can't be built as a module.
To separate the filesystem and architecture parts of resctrl, this free()ing work needs to be triggered by the filesystem, as these structures belong to the filesystem code.
Rename rdt_put_mon_l3_config() to resctrl_mon_resource_exit() and call it from resctrl_exit(). The kfree() is currently dependent on r->mon_capable.
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
8079565d |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Expose resctrl fs's init function to the rest of the kernel
rdtgroup_init() needs exposing to the rest of the kernel so that arch code can call it once it lives in core code. As this is
x86/resctrl: Expose resctrl fs's init function to the rest of the kernel
rdtgroup_init() needs exposing to the rest of the kernel so that arch code can call it once it lives in core code. As this is one of the few functions exposed, rename it to have "resctrl" in the name. The same goes for the exit call.
Rename x86's arch code init functions for RDT to have an arch prefix to make it clear these are part of the architecture code.
Co-developed-by: Dave Martin <[email protected]> Signed-off-by: Dave Martin <[email protected]> Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
3c021531 |
| 11-Mar-2025 |
James Morse <[email protected]> |
x86/resctrl: Add a helper to avoid reaching into the arch code resource list
Resctrl occasionally wants to know something about a specific resource, in these cases it reaches into the arch code's rd
x86/resctrl: Add a helper to avoid reaching into the arch code resource list
Resctrl occasionally wants to know something about a specific resource, in these cases it reaches into the arch code's rdt_resources_all[] array.
Once the filesystem parts of resctrl are moved to /fs/, this means it will need visibility of the architecture specific struct rdt_hw_resource definition, and the array of all resources. All architectures would also need a r_resctrl member in this struct.
Instead, abstract this via a helper to allow architectures to do different things here. Move the level enum to the resctrl header and add a helper to retrieve the struct rdt_resource by 'rid'.
resctrl_arch_get_resource() should not return NULL for any value in the enum, it may instead return a dummy resource that is !alloc_enabled && !mon_enabled.
Co-developed-by: Dave Martin <[email protected]> Signed-off-by: Dave Martin <[email protected]> Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Tested-by: Shaopeng Tan <[email protected]> Tested-by: Amit Singh Tomar <[email protected]> # arm64 Tested-by: Shanker Donthineni <[email protected]> # arm64 Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2 |
|
| #
2c272fad |
| 06-Dec-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Compute memory bandwidth for all supported events
Switching between local and total memory bandwidth events as the input to the mba_sc feedback loop would be cumbersome and take effect
x86/resctrl: Compute memory bandwidth for all supported events
Switching between local and total memory bandwidth events as the input to the mba_sc feedback loop would be cumbersome and take effect slowly in the current implementation as the bandwidth is only known after two consecutive readings of the same event.
Compute the bandwidth for all supported events. This doesn't add significant overhead and will make changing which event is used simple.
Suggested-by: Reinette Chatre <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
481d3637 |
| 06-Dec-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
update_mba_bw() hard codes use of the memory bandwidth local event which prevents more flexible options from being deployed.
Chan
x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
update_mba_bw() hard codes use of the memory bandwidth local event which prevents more flexible options from being deployed.
Change this function to use the event specified in the rdtgroup that is being processed.
Mount time checks for the "mba_MBps" option ensure that local memory bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
2937f9c3 |
| 06-Dec-2024 |
Babu Moger <[email protected]> |
x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize fflags for resctrl files.
Adding new files will invol
x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize fflags for resctrl files.
Adding new files will involve adding another function to initialize the fflags. This can be simplified by adding a new function resctrl_file_fflags_init() and passing the file name and flags to be initialized.
Consolidate fflags initialization into resctrl_file_fflags_init() and remove thread_throttle_mode_init() and mbm_config_rftype_init().
[ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at run time. ]
Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
9bce6e94 |
| 31-Oct-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Support Sub-NUMA cluster mode SNC6
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some Intel platforms.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by:
x86/resctrl: Support Sub-NUMA cluster mode SNC6
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some Intel platforms.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6 |
|
| #
13488150 |
| 28-Jun-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode
There isn't a simple hardware bit that indicates whether a CPU is running in Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the number of C
x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode
There isn't a simple hardware bit that indicates whether a CPU is running in Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the number of CPUs sharing the L3 cache with CPU0 to the number of CPUs in the same NUMA node as CPU0.
Add the missing definition of pr_fmt() to monitor.c. This wasn't noticed before as there are only "can't happen" console messages from this file.
[ bp: Massage commit message. ]
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
21b362cc |
| 02-Jul-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
Hardware has two RMID configuration options for SNC systems. The default mode divides RMID counters between SNC nodes. E.g. wit
x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
Hardware has two RMID configuration options for SNC systems. The default mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199 on node 1. This isn't compatible with Linux resctrl usage. On this example system a process using RMID 5 would only update monitor counters while running on SNC node 0.
The other mode is "RMID Sharing Mode". This is enabled by clearing bit 0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode the number of logical RMIDs is the number of physical RMIDs (from CPUID leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A process can use the same RMID across different SNC nodes.
See the "Intel Resource Director Technology Architecture Specification" for additional details.
When SNC is enabled, update the MSR when a monitor domain is marked online. Technically this is overkill. It only needs to be done once per L3 cache instance rather than per SNC domain. But there is no harm in doing it more than once, and this is not in a critical path.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
9fbb303e |
| 28-Jun-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Make __mon_event_count() handle sum domains
Legacy resctrl monitor files must provide the sum of event values across all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance.
x86/resctrl: Make __mon_event_count() handle sum domains
Legacy resctrl monitor files must provide the sum of event values across all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance.
There are now two cases: 1) A specific domain is provided in struct rmid_read This is either a non-SNC system, or the request is to read data from just one SNC node. 2) Domain pointer is NULL. In this case the cacheinfo field in struct rmid_read indicates that all SNC nodes that share that L3 cache instance should have the event read and return the sum of all values.
Update the CPU sanity check. The existing check that an event is read from a CPU in the requested domain still applies when reading a single domain. But when summing across domains a more relaxed check that the current CPU is in the scope of the L3 cache instance is appropriate since the MSRs to read events are scoped at L3 cache level.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
587edd70 |
| 28-Jun-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Initialize on-stack struct rmid_read instances
New semantics rely on some struct rmid_read members having NULL values to distinguish between the SNC and non-SNC scenarios. resctrl can
x86/resctrl: Initialize on-stack struct rmid_read instances
New semantics rely on some struct rmid_read members having NULL values to distinguish between the SNC and non-SNC scenarios. resctrl can thus no longer rely on this struct not being initialized properly.
Initialize all on-stack declarations of struct rmid_read:
rdtgroup_mondata_show() mbm_update() mkdir_mondata_subdir()
to ensure that garbage values from the stack are not passed down to other functions.
[ bp: Massage commit message. ]
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
e13db55b |
| 28-Jun-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Introduce snc_nodes_per_l3_cache
Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores and memory controllers on a socket into two or more groups. These are presented
x86/resctrl: Introduce snc_nodes_per_l3_cache
Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores and memory controllers on a socket into two or more groups. These are presented to the operating system as NUMA nodes.
This may enable some workloads to have slightly lower latency to memory as the memory controller(s) in an SNC node are electrically closer to the CPU cores on that SNC node. This cost may be offset by lower bandwidth since the memory accesses for each core can only be interleaved between the memory controllers on the same SNC node.
Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks to track L3 cache occupancy and memory bandwidth. There is an MSR that controls how the RMIDs are shared between SNC nodes.
The default mode divides them numerically. E.g. when there are two SNC nodes on a socket the lower number half of the RMIDs are given to the first node, the remainder to the second node. This would be difficult to use with the Linux resctrl interface as specific RMID values assigned to resctrl groups are not visible to users.
RMID sharing mode divides the physical RMIDs evenly between SNC nodes but uses a logical RMID in the IA32_PQR_ASSOC MSR. For example a system with 200 physical RMIDs (as enumerated by CPUID leaf 0xF) that has two SNC nodes per L3 cache instance would have 100 logical RMIDs available for Linux to use. A task running on SNC node 0 with RMID 5 would accumulate LLC occupancy and MBM bandwidth data in physical RMID 5. Another task using RMID 5, but running on SNC node 1 would accumulate data in physical RMID 105.
Even with this renumbering SNC mode requires several changes in resctrl behavior for correct operation.
Add a static global to arch/x86/kernel/cpu/resctrl/monitor.c to indicate how many SNC domains share an L3 cache instance. Initialize this to "1". Runtime detection of SNC mode will adjust this value.
Update all places to take appropriate action when SNC mode is enabled: 1) The number of logical RMIDs per L3 cache available for use is the number of physical RMIDs divided by the number of SNC nodes. 2) Likewise the "mon_scale" value must be divided by the number of SNC nodes. 3) Add a function to convert from logical RMID values (assigned to tasks and loaded into the IA32_PQR_ASSOC MSR on context switch) to physical RMID values to load into IA32_QM_EVTSEL MSR when reading counters on each SNC node.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
cae2bcb6 |
| 28-Jun-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
The same rdt_domain structure is used for both control and monitor functions. But this results in wasted memory as some of the fields a
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
The same rdt_domain structure is used for both control and monitor functions. But this results in wasted memory as some of the fields are only used by control functions, while most are only used for monitor functions.
Split into separate rdt_ctrl_domain and rdt_mon_domain structures with just the fields required for control and monitoring respectively.
Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain and rdt_hw_mon_domain.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
cd84f72b |
| 28-Jun-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Prepare for different scope for control/monitor operations
Resctrl assumes that control and monitor operations on a resource are performed at the same scope.
Prepare for systems that u
x86/resctrl: Prepare for different scope for control/monitor operations
Resctrl assumes that control and monitor operations on a resource are performed at the same scope.
Prepare for systems that use different scope (specifically Intel needs to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control and NODE scope for cache occupancy and memory bandwidth monitoring).
Create separate domain lists for control and monitor operations.
Note that errors during initialization of either control or monitor functions on a domain would previously result in that domain being excluded from both control and monitor operations. Now the domains are allocated independently it is no longer required to disable both control and monitor operations if either fail.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
c103d4d4 |
| 28-Jun-2024 |
Tony Luck <[email protected]> |
x86/resctrl: Prepare to split rdt_domain structure
The rdt_domain structure is used for both control and monitor features. It is about to be split into separate structures for these two usages becau
x86/resctrl: Prepare to split rdt_domain structure
The rdt_domain structure is used for both control and monitor features. It is about to be split into separate structures for these two usages because the scope for control and monitoring features for a resource will be different for future resources.
To allow for common code that scans a list of domains looking for a specific domain id, move all the common fields ("list", "id", "cpu_mask") into their own structure within the rdt_domain structure.
Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.10-rc5 |
|
| #
739c9765 |
| 18-Jun-2024 |
Dave Martin <[email protected]> |
x86/resctrl: Don't try to free nonexistent RMIDs
Commit
6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")
adds logic to map individual monitoring groups into a global index space
x86/resctrl: Don't try to free nonexistent RMIDs
Commit
6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")
adds logic to map individual monitoring groups into a global index space used for tracking allocated RMIDs.
Attempts to free the default RMID are ignored in free_rmid(), and this works fine on x86.
With arm64 MPAM, there is a latent bug here however: on platforms with no monitors exposed through resctrl, each control group still gets a different monitoring group ID as seen by the hardware, since the CLOSID always forms part of the monitoring group ID.
This means that when removing a control group, the code may try to free this group's default monitoring group RMID for real. If there are no monitors however, the RMID tracking table rmid_ptrs[] would be a waste of memory and is never allocated, leading to a splat when free_rmid() tries to dereference the table.
One option would be to treat RMID 0 as special for every CLOSID, but this would be ugly since bookkeeping still needs to be done for these monitoring group IDs when there are monitors present in the hardware.
Instead, add a gating check of resctrl_arch_mon_capable() in free_rmid(), and just do nothing if the hardware doesn't have monitors.
This fix mirrors the gating checks already present in mkdir_rdt_prepare_rmid_alloc() and elsewhere.
No functional change on x86.
[ bp: Massage commit message. ]
Fixes: 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index") Signed-off-by: Dave Martin <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
931be446 |
| 08-Apr-2024 |
Haifeng Xu <[email protected]> |
x86/resctrl: Add tracepoint for llc_occupancy tracking
In our production environment, after removing monitor groups, those unused RMIDs get stuck in the limbo list forever because their llc_occupanc
x86/resctrl: Add tracepoint for llc_occupancy tracking
In our production environment, after removing monitor groups, those unused RMIDs get stuck in the limbo list forever because their llc_occupancy is always larger than the threshold. But the unused RMIDs can be successfully freed by turning up the threshold.
In order to know how much the threshold should be, perf can be used to acquire the llc_occupancy of RMIDs in each rdt domain.
Instead of using perf tool to track llc_occupancy and filter the log manually, it is more convenient for users to use tracepoint to do this work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs when scanning the limbo list.
Suggested-by: Reinette Chatre <[email protected]> Suggested-by: James Morse <[email protected]> Signed-off-by: Haifeng Xu <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: James Morse <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5 |
|
| #
fb700810 |
| 13-Feb-2024 |
James Morse <[email protected]> |
x86/resctrl: Separate arch and fs resctrl locks
resctrl has one mutex that is taken by the architecture-specific code, and the filesystem parts. The two interact via cpuhp, where the architecture co
x86/resctrl: Separate arch and fs resctrl locks
resctrl has one mutex that is taken by the architecture-specific code, and the filesystem parts. The two interact via cpuhp, where the architecture code updates the domain list. Filesystem handlers that walk the domains list should not run concurrently with the cpuhp callback modifying the list.
Exposing a lock from the filesystem code means the interface is not cleanly defined, and creates the possibility of cross-architecture lock ordering headaches. The interaction only exists so that certain filesystem paths are serialised against CPU hotplug. The CPU hotplug code already has a mechanism to do this using cpus_read_lock().
MPAM's monitors have an overflow interrupt, so it needs to be possible to walk the domains list in irq context. RCU is ideal for this, but some paths need to be able to sleep to allocate memory.
Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp callback, cpus_read_lock() must always be taken first. rdtgroup_schemata_write() already does this.
Most of the filesystem code's domain list walkers are currently protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are rdt_bit_usage_show() and the mon_config helpers which take the lock directly.
Make the domain list protected by RCU. An architecture-specific lock prevents concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU, but to keep all the filesystem operations the same, this is changed to call cpus_read_lock(). The mon_config helpers send multiple IPIs, take the cpus_read_lock() in these cases.
The other filesystem list walkers need to be able to sleep. Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't be invoked when file system operations are occurring.
Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live() call isn't obvious.
Resctrl's domain online/offline calls now need to take the rdtgroup_mutex themselves.
[ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ]
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
show more ...
|
| #
978fcca9 |
| 13-Feb-2024 |
James Morse <[email protected]> |
x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU
When a CPU is taken offline resctrl may need to move the overflow or limbo handlers to run on a different CPU.
Once the off
x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU
When a CPU is taken offline resctrl may need to move the overflow or limbo handlers to run on a different CPU.
Once the offline callbacks have been split, cqm_setup_limbo_handler() will be called while the CPU that is going offline is still present in the CPU mask.
Pass the CPU to exclude to cqm_setup_limbo_handler() and mbm_setup_overflow_handler(). These functions can use a variant of cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need excluding.
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
show more ...
|