History log of /linux-6.15/arch/x86/kernel/cpu/resctrl/monitor.c (Results 1 – 25 of 84)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# 823beb31 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Move get_{mon,ctrl}_domain_from_cpu() to live with their callers

Each of get_{mon,ctrl}_domain_from_cpu() only has one caller.

Once the filesystem code is moved to /fs/, there is no equivalent to
core.c.

Move each of these functions to live next to its caller. This allows them to
be made static and their header file entries to be removed.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# c32a7d77 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Move mbm_cfg_mask to struct rdt_resource

The mbm_cfg_mask field lists the bits that user-space can set when configuring
an event. This value is output via the last_cmd_status file.

Once the filesystem parts of resctrl are moved to live in /fs/, the struct
rdt_hw_resource is inaccessible to the filesystem code. Because this value is
output to user-space, it has to be accessible to the filesystem code.

Move it to struct rdt_resource.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 37bae175 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Move mba_mbps_default_event init to filesystem code

mba_mbps_default_event is initialised based on whether mbm_local or mbm_total
is supported. In the case of both, it is initialised to mbm_local.
mba_mbps_default_event is initialised in core.c's get_rdt_mon_resources(),
while all the readers are in rdtgroup.c.

After this code is split into architecture-specific and filesystem code,
get_rdt_mon_resources() remains part of the architecture code, which would
mean mba_mbps_default_event has to be exposed by the filesystem code.

Move the initialisation to the filesystem's resctrl_mon_resource_init().
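
A sketch of what the initialisation in resctrl_mon_resource_init() might look
like, preferring the local event when both are supported (helper names follow
the resctrl_arch_ convention used elsewhere in this series; treat the exact
names as illustrative):

    /* Prefer the local event; fall back to total if that is all there is. */
    if (resctrl_arch_is_mbm_local_enabled())
        mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
    else if (resctrl_arch_is_mbm_total_enabled())
        mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;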

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# d81826f8 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMEC

When BMEC is supported, the resctrl event can be configured in a number of
ways, depending on architecture support. rdt_get_mon_l3_config() modifies
the struct mon_evt and calls resctrl_file_fflags_init() to create the files
that allow the configuration.

Splitting this into separate architecture and filesystem parts would require
the struct mon_evt and resctrl_file_fflags_init() to be exposed.

Instead, add resctrl_arch_is_evt_configurable(), and use this from
resctrl_mon_resource_init() to initialise struct mon_evt and call
resctrl_file_fflags_init().

resctrl_arch_is_evt_configurable() calls rdt_cpu_has() so it doesn't obviously
benefit from being inlined. Putting it in core.c will allow rdt_cpu_has() to
eventually become static.
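
A plausible shape for the new helper, assuming BMEC configurability covers the
two MBM events and hinges on the BMEC feature flag (a sketch, not verbatim
kernel code):

    bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
    {
        switch (evt) {
        case QOS_L3_MBM_TOTAL_EVENT_ID:
        case QOS_L3_MBM_LOCAL_EVENT_ID:
            return rdt_cpu_has(X86_FEATURE_BMEC);
        default:
            return false;
        }
    }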

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# d012b66a 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Move the is_mbm_*_enabled() helpers to asm/resctrl.h

The architecture specific parts of resctrl provide helpers like
is_mbm_total_enabled() and is_mbm_local_enabled() to hide accesses to the
rdt_mon_features bitmap.

Exposing a group of helpers between the architecture and filesystem code is
preferable to a single unsigned long like rdt_mon_features. Helpers can be more
readable and have well-defined behaviour, while allowing architectures to hide
more complex behaviour.

Once the filesystem parts of resctrl are moved, these existing helpers can no
longer live in internal.h. Move them to include/linux/resctrl.h. Once these are
exposed to the wider kernel, they should have a 'resctrl_arch_' prefix, to fit
the rest of the arch<->fs interface.

Move and rename the helpers that touch rdt_mon_features directly. is_mbm_event()
and is_mbm_enabled() are only called from rdtgroup.c, so can be moved into that
file.
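
A sketch of the renamed helpers, assuming rdt_mon_features is a bitmap indexed
by the QoS event IDs:

    static inline bool resctrl_arch_is_mbm_total_enabled(void)
    {
        return (rdt_mon_features & (1 << QOS_L3_MBM_TOTAL_EVENT_ID));
    }

    static inline bool resctrl_arch_is_mbm_local_enabled(void)
    {
        return (rdt_mon_features & (1 << QOS_L3_MBM_LOCAL_EVENT_ID));
    }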

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 4b6bdbf2 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Move monitor init work to a resctrl init call

rdt_get_mon_l3_config() is called from the arch's resctrl_arch_late_init(),
and initialises both architecture specific fields, such as hw_res->mon_scale
and resctrl filesystem fields by calling dom_data_init().

To separate the filesystem and architecture parts of resctrl, this function
needs splitting up.

Add resctrl_mon_resource_init() to do the filesystem specific work, and call
it from resctrl_init(). This runs later, but is still before the filesystem is
mounted and the rmid_ptrs[] array can be used.

[ bp: Massage commit message. ]

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 01184272 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Move monitor exit work to a resctrl exit call

rdt_put_mon_l3_config() is called via the architecture's resctrl_arch_exit()
call, and appears to free the rmid_ptrs[] and closid_num_dirty_rmid[] arrays.
In reality this code is marked __exit, and is removed by the linker as resctrl
can't be built as a module.

To separate the filesystem and architecture parts of resctrl, this free()ing
work needs to be triggered by the filesystem, as these structures belong to
the filesystem code.

Rename rdt_put_mon_l3_config() to resctrl_mon_resource_exit() and call it from
resctrl_exit(). The kfree() is currently dependent on r->mon_capable.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 8079565d 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Expose resctrl fs's init function to the rest of the kernel

rdtgroup_init() needs exposing to the rest of the kernel so that arch code can
call it once it lives in core code. As this is one of the few functions
exposed, rename it to have "resctrl" in the name. The same goes for the exit
call.

Rename x86's arch code init functions for RDT to have an arch prefix to make
it clear these are part of the architecture code.

Co-developed-by: Dave Martin <[email protected]>
Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 3c021531 11-Mar-2025 James Morse <[email protected]>

x86/resctrl: Add a helper to avoid reaching into the arch code resource list

Resctrl occasionally wants to know something about a specific resource, in
these cases it reaches into the arch code's rdt_resources_all[] array.

Once the filesystem parts of resctrl are moved to /fs/, this means it will
need visibility of the architecture specific struct rdt_hw_resource
definition, and the array of all resources. All architectures would also need
an r_resctrl member in this struct.

Instead, abstract this via a helper to allow architectures to do different
things here. Move the level enum to the resctrl header and add a helper to
retrieve the struct rdt_resource by 'rid'.

resctrl_arch_get_resource() should not return NULL for any value in the enum;
it may instead return a dummy resource that is !alloc_enabled && !mon_enabled.
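
On x86 the helper can be a thin lookup into rdt_resources_all[]; a sketch
using the array and member names mentioned above:

    struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
    {
        if (l >= RDT_NUM_RESOURCES)
            return NULL;

        /* In-range levels always yield a resource, possibly a dummy one. */
        return &rdt_resources_all[l].r_resctrl;
    }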

Co-developed-by: Dave Martin <[email protected]>
Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Amit Singh Tomar <[email protected]> # arm64
Tested-by: Shanker Donthineni <[email protected]> # arm64
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2
# 2c272fad 06-Dec-2024 Tony Luck <[email protected]>

x86/resctrl: Compute memory bandwidth for all supported events

Switching between local and total memory bandwidth events as the input
to the mba_sc feedback loop would be cumbersome and take effect slowly
in the current implementation as the bandwidth is only known after two
consecutive readings of the same event.

Compute the bandwidth for all supported events. This doesn't add
significant overhead and will make changing which event is used
simple.

Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 481d3637 06-Dec-2024 Tony Luck <[email protected]>

x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event

update_mba_bw() hard codes use of the memory bandwidth local event which
prevents more flexible options from being deployed.

Change this function to use the event specified in the rdtgroup that is
being processed.

Mount time checks for the "mba_MBps" option ensure that local memory
bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 2937f9c3 06-Dec-2024 Babu Moger <[email protected]>

x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags

thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().

[ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at
run time. ]
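
The consolidated function can reduce to a name lookup plus an assignment; a
sketch, assuming the existing rdtgroup_get_rftype_by_name() lookup:

    void resctrl_file_fflags_init(const char *config, unsigned long fflags)
    {
        struct rftype *rft;

        rft = rdtgroup_get_rftype_by_name(config);
        if (rft)
            rft->fflags = fflags;
    }

A caller then passes the file name and flags directly, e.g.
resctrl_file_fflags_init("thread_throttle_mode",
RFTYPE_CTRL_INFO | RFTYPE_RES_MB).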

Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6
# 9bce6e94 31-Oct-2024 Tony Luck <[email protected]>

x86/resctrl: Support Sub-NUMA cluster mode SNC6

Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some
Intel platforms.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6
# 13488150 28-Jun-2024 Tony Luck <[email protected]>

x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode

There isn't a simple hardware bit that indicates whether a CPU is running in
Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the number of CPUs
sharing the L3 cache with CPU0 to the number of CPUs in the same NUMA node as
CPU0.

Add the missing definition of pr_fmt() to monitor.c. This wasn't noticed
before as there are only "can't happen" console messages from this file.

[ bp: Massage commit message. ]
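
The inference boils down to a ratio of two CPU counts; schematically (the L3
mask helper is illustrative, and the real code must also allow for offline
CPUs skewing the counts):

    /* CPUs sharing CPU0's L3 cache divided by CPUs in CPU0's NUMA node */
    unsigned int l3_cpus = cpumask_weight(cpu_l3_shared_mask(0));
    unsigned int node_cpus = cpumask_weight(cpumask_of_node(cpu_to_node(0)));

    snc_nodes_per_l3_cache = l3_cpus / node_cpus;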

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 21b362cc 02-Jul-2024 Tony Luck <[email protected]>

x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems

Hardware has two RMID configuration options for SNC systems. The default
mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and
two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199
on node 1. This isn't compatible with Linux resctrl usage. On this
example system a process using RMID 5 would only update monitor counters
while running on SNC node 0.

The other mode is "RMID Sharing Mode". This is enabled by clearing bit
0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode
the number of logical RMIDs is the number of physical RMIDs (from CPUID
leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A
process can use the same RMID across different SNC nodes.

See the "Intel Resource Director Technology Architecture Specification"
for additional details.

When SNC is enabled, update the MSR when a monitor domain is marked
online. Technically this is overkill. It only needs to be done once
per L3 cache instance rather than per SNC domain. But there is no harm
in doing it more than once, and this is not in a critical path.
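
Schematically, the domain-online hook only needs to clear the mode bit when
SNC is active (MSR name per the register described above; treat the exact
helper shape as illustrative):

    static void arch_mon_domain_online(struct rdt_resource *r,
                                       struct rdt_mon_domain *d)
    {
        /* Bit 0 set: RMIDs divided between nodes. Clear: RMID sharing mode. */
        if (snc_nodes_per_l3_cache > 1)
            msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
    }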

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 9fbb303e 28-Jun-2024 Tony Luck <[email protected]>

x86/resctrl: Make __mon_event_count() handle sum domains

Legacy resctrl monitor files must provide the sum of event values across
all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance.

There are now two cases:
1) A specific domain is provided in struct rmid_read
This is either a non-SNC system, or the request is to read data
from just one SNC node.
2) Domain pointer is NULL. In this case the cacheinfo field in struct
rmid_read indicates that all SNC nodes that share that L3 cache
instance should have the event read and return the sum of all
values.

Update the CPU sanity check. The existing check that an event is read
from a CPU in the requested domain still applies when reading a single
domain. But when summing across domains a more relaxed check that the
current CPU is in the scope of the L3 cache instance is appropriate
since the MSRs to read events are scoped at L3 cache level.
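
A sketch of the two checks (field names illustrative; rr->d selects a single
domain, rr->ci names the L3 cache instance to sum over):

    if (rr->d) {
        /* Case 1: single domain. The executing CPU must be in it. */
        if (!cpumask_test_cpu(smp_processor_id(), &rr->d->hdr.cpu_mask))
            return -EINVAL;
    } else {
        /* Case 2: summing. Anywhere under this L3 instance will do. */
        if (!cpumask_test_cpu(smp_processor_id(), &rr->ci->shared_cpu_map))
            return -EINVAL;
    }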

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 587edd70 28-Jun-2024 Tony Luck <[email protected]>

x86/resctrl: Initialize on-stack struct rmid_read instances

New semantics rely on some struct rmid_read members having NULL values to
distinguish between the SNC and non-SNC scenarios. resctrl can thus no longer
get away with leaving this struct uninitialized.

Initialize all on-stack declarations of struct rmid_read:

rdtgroup_mondata_show()
mbm_update()
mkdir_mondata_subdir()

to ensure that garbage values from the stack are not passed down to other
functions.

[ bp: Massage commit message. ]
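
Zero-initialising at the point of declaration is enough to give the
distinguishing members well defined NULL values, e.g.:

    struct rmid_read rr = {0};

    rr.r = r;
    rr.d = d;      /* leave NULL to request a sum across SNC domains */
    rr.evtid = evtid;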

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# e13db55b 28-Jun-2024 Tony Luck <[email protected]>

x86/resctrl: Introduce snc_nodes_per_l3_cache

Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
and memory controllers on a socket into two or more groups. These are
presented to the operating system as NUMA nodes.

This may enable some workloads to have slightly lower latency to memory
as the memory controller(s) in an SNC node are electrically closer to the
CPU cores on that SNC node. This cost may be offset by lower bandwidth
since the memory accesses for each core can only be interleaved between
the memory controllers on the same SNC node.

Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
to track L3 cache occupancy and memory bandwidth. There is an MSR that
controls how the RMIDs are shared between SNC nodes.

The default mode divides them numerically. E.g. when there are two SNC
nodes on a socket the lower number half of the RMIDs are given to the
first node, the remainder to the second node. This would be difficult
to use with the Linux resctrl interface as specific RMID values assigned
to resctrl groups are not visible to users.

RMID sharing mode divides the physical RMIDs evenly between SNC nodes
but uses a logical RMID in the IA32_PQR_ASSOC MSR. For example a system
with 200 physical RMIDs (as enumerated by CPUID leaf 0xF) that has two
SNC nodes per L3 cache instance would have 100 logical RMIDs available
for Linux to use. A task running on SNC node 0 with RMID 5 would
accumulate LLC occupancy and MBM bandwidth data in physical RMID 5.
Another task using RMID 5, but running on SNC node 1 would accumulate
data in physical RMID 105.

Even with this renumbering SNC mode requires several changes in resctrl
behavior for correct operation.

Add a static global to arch/x86/kernel/cpu/resctrl/monitor.c to indicate
how many SNC domains share an L3 cache instance. Initialize this to
"1". Runtime detection of SNC mode will adjust this value.

Update all places to take appropriate action when SNC mode is enabled:
1) The number of logical RMIDs per L3 cache available for use is the
number of physical RMIDs divided by the number of SNC nodes.
2) Likewise the "mon_scale" value must be divided by the number of SNC
nodes.
3) Add a function to convert from logical RMID values (assigned to
tasks and loaded into the IA32_PQR_ASSOC MSR on context switch)
to physical RMID values to load into IA32_QM_EVTSEL MSR when
reading counters on each SNC node.
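
The conversion in step 3 can be a single expression; a sketch (names follow
the conventions in this series and are illustrative):

    static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
    {
        struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

        if (snc_nodes_per_l3_cache == 1)
            return lrmid;

        return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
    }

With the example numbers above (200 physical RMIDs, two SNC nodes, so num_rmid
is 100), logical RMID 5 on SNC node 1 maps to 5 + 1 * 100 = 105.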

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# cae2bcb6 28-Jun-2024 Tony Luck <[email protected]>

x86/resctrl: Split the rdt_domain and rdt_hw_domain structures

The same rdt_domain structure is used for both control and monitor
functions. But this results in wasted memory as some of the fields are
only used by control functions, while most are only used for monitor
functions.

Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
just the fields required for control and monitoring respectively.

Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain
and rdt_hw_mon_domain.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# cd84f72b 28-Jun-2024 Tony Luck <[email protected]>

x86/resctrl: Prepare for different scope for control/monitor operations

Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.

Prepare for systems that use different scope (specifically Intel needs
to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control
and NODE scope for cache occupancy and memory bandwidth monitoring).

Create separate domain lists for control and monitor operations.

Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now that the domains are
allocated independently, it is no longer required to disable both control
and monitor operations if either fails.
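
Schematically, struct rdt_resource then carries a scope and a domain list per
function instead of one shared pair (member names as used later in this
series; treat as illustrative):

    struct rdt_resource {
        /* ... existing members ... */
        enum resctrl_scope   ctrl_scope;    /* scope of control functions */
        enum resctrl_scope   mon_scope;     /* scope of monitor functions */
        struct list_head     ctrl_domains;  /* all control domains */
        struct list_head     mon_domains;   /* all monitor domains */
    };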

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# c103d4d4 28-Jun-2024 Tony Luck <[email protected]>

x86/resctrl: Prepare to split rdt_domain structure

The rdt_domain structure is used for both control and monitor features.
It is about to be split into separate structures for these two usages
because the scope for control and monitoring features for a resource
will be different for future resources.

To allow for common code that scans a list of domains looking for a
specific domain id, move all the common fields ("list", "id", "cpu_mask")
into their own structure within the rdt_domain structure.
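
The common fields might then be grouped along these lines (a sketch; the
embedded struct's name is illustrative):

    struct rdt_domain_hdr {
        struct list_head  list;      /* all domains of a resource */
        int               id;        /* id of this domain's cache/node */
        struct cpumask    cpu_mask;  /* online CPUs in the domain */
    };

    struct rdt_domain {
        struct rdt_domain_hdr  hdr;
        /* control- and monitor-specific members follow */
    };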

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.10-rc5
# 739c9765 18-Jun-2024 Dave Martin <[email protected]>

x86/resctrl: Don't try to free nonexistent RMIDs

Commit

6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")

adds logic to map individual monitoring groups into a global index space used
for tracking allocated RMIDs.

Attempts to free the default RMID are ignored in free_rmid(), and this works
fine on x86.

With arm64 MPAM, there is a latent bug here however: on platforms with no
monitors exposed through resctrl, each control group still gets a different
monitoring group ID as seen by the hardware, since the CLOSID always forms part
of the monitoring group ID.

This means that when removing a control group, the code may try to free this
group's default monitoring group RMID for real. If there are no monitors
however, the RMID tracking table rmid_ptrs[] would be a waste of memory and is
never allocated, leading to a splat when free_rmid() tries to dereference the
table.

One option would be to treat RMID 0 as special for every CLOSID, but this would
be ugly since bookkeeping still needs to be done for these monitoring group IDs
when there are monitors present in the hardware.

Instead, add a gating check of resctrl_arch_mon_capable() in free_rmid(), and
just do nothing if the hardware doesn't have monitors.

This fix mirrors the gating checks already present in
mkdir_rdt_prepare_rmid_alloc() and elsewhere.

No functional change on x86.

[ bp: Massage commit message. ]
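
The gate is a single early return at the top of free_rmid(); schematically:

    void free_rmid(u32 closid, u32 rmid)
    {
        /* rmid_ptrs[] is never allocated when there are no monitors. */
        if (!resctrl_arch_mon_capable())
            return;

        /* ... existing default-RMID check and limbo/free list handling ... */
    }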

Fixes: 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")
Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4
# 931be446 08-Apr-2024 Haifeng Xu <[email protected]>

x86/resctrl: Add tracepoint for llc_occupancy tracking

In our production environment, after removing monitor groups, those
unused RMIDs get stuck in the limbo list forever because their
llc_occupancy is always larger than the threshold. But the unused RMIDs
can be successfully freed by turning up the threshold.

In order to know how much the threshold should be, perf can be used to
acquire the llc_occupancy of RMIDs in each rdt domain.

Instead of using perf tool to track llc_occupancy and filter the log
manually, it is more convenient for users to use tracepoint to do this
work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
when scanning the limbo list.
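
Such a tracepoint might be defined along these lines (field names
illustrative):

    TRACE_EVENT(mon_llc_occupancy_limbo,
        TP_PROTO(u32 ctrl_hwid, u32 mon_hwid, int domid, u64 llc_occupancy_bytes),
        TP_ARGS(ctrl_hwid, mon_hwid, domid, llc_occupancy_bytes),
        TP_STRUCT__entry(
            __field(u32, ctrl_hwid)
            __field(u32, mon_hwid)
            __field(int, domid)
            __field(u64, llc_occupancy_bytes)
        ),
        TP_fast_assign(
            __entry->ctrl_hwid = ctrl_hwid;
            __entry->mon_hwid = mon_hwid;
            __entry->domid = domid;
            __entry->llc_occupancy_bytes = llc_occupancy_bytes;
        ),
        TP_printk("ctrl_hwid=%u mon_hwid=%u domid=%d llc_occupancy=%llu",
                  __entry->ctrl_hwid, __entry->mon_hwid, __entry->domid,
                  (unsigned long long)__entry->llc_occupancy_bytes)
    );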

Suggested-by: Reinette Chatre <[email protected]>
Suggested-by: James Morse <[email protected]>
Signed-off-by: Haifeng Xu <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: James Morse <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5
# fb700810 13-Feb-2024 James Morse <[email protected]>

x86/resctrl: Separate arch and fs resctrl locks

resctrl has one mutex that is taken by the architecture-specific code, and the
filesystem parts. The two interact via cpuhp, where the architecture code
updates the domain list. Filesystem handlers that walk the domains list should
not run concurrently with the cpuhp callback modifying the list.

Exposing a lock from the filesystem code means the interface is not cleanly
defined, and creates the possibility of cross-architecture lock ordering
headaches. The interaction only exists so that certain filesystem paths are
serialised against CPU hotplug. The CPU hotplug code already has a mechanism to
do this using cpus_read_lock().

MPAM's monitors have an overflow interrupt, so it needs to be possible to walk
the domains list in irq context. RCU is ideal for this, but some paths need to
be able to sleep to allocate memory.

Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp
callback, cpus_read_lock() must always be taken first.
rdtgroup_schemata_write() already does this.

Most of the filesystem code's domain list walkers are currently protected by
the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are
rdt_bit_usage_show() and the mon_config helpers which take the lock directly.

Make the domain list protected by RCU. An architecture-specific lock prevents
concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU,
but to keep all the filesystem operations the same, this is changed to call
cpus_read_lock(). The mon_config helpers send multiple IPIs; take the
cpus_read_lock() in these cases.

The other filesystem list walkers need to be able to sleep. Add
cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't
be invoked when file system operations are occurring.

Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live()
call isn't obvious.

Resctrl's domain online/offline calls now need to take the rdtgroup_mutex
themselves.

[ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ]
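
The resulting lock order for filesystem paths is then, schematically:

    cpus_read_lock();               /* always first: cpuhp callbacks hold it */
    mutex_lock(&rdtgroup_mutex);

    /* ... walk the RCU-protected domain lists, possibly sleeping ... */

    mutex_unlock(&rdtgroup_mutex);
    cpus_read_unlock();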

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>


# 978fcca9 13-Feb-2024 James Morse <[email protected]>

x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU

When a CPU is taken offline resctrl may need to move the overflow or limbo
handlers to run on a different CPU.

Once the offline callbacks have been split, cqm_setup_limbo_handler() will be
called while the CPU that is going offline is still present in the CPU mask.

Pass the CPU to exclude to cqm_setup_limbo_handler() and
mbm_setup_overflow_handler(). These functions can use a variant of
cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need
excluding.
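
A sketch of the resulting selection in cqm_setup_limbo_handler() (structure
and field names illustrative; RESCTRL_PICK_ANY_CPU stands in for the -1
described above):

    void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms,
                                 int exclude_cpu)
    {
        unsigned long delay = msecs_to_jiffies(delay_ms);
        int cpu;

        if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
            cpu = cpumask_any(&dom->cpu_mask);
        else
            cpu = cpumask_any_but(&dom->cpu_mask, exclude_cpu);

        dom->cqm_work_cpu = cpu;
        if (cpu < nr_cpu_ids)
            schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
    }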

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
