memory-tiers.h - OpenGrok history log for /linux-6.15/include/linux/memory-tiers.h

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7
# 823430c8	04-Jul-2024	Ho-Ren (Jack) Chuang <[email protected]>	memory tier: consolidate the initialization of memory tiers The current memory tier initialization process is distributed across two different functions, memory_tier_init() and memory_tier_late_init memory tier: consolidate the initialization of memory tiers The current memory tier initialization process is distributed across two different functions, memory_tier_init() and memory_tier_late_init(). This design is hard to maintain. Thus, this patch is proposed to reduce the possible code paths by consolidating different initialization patches into one. The earlier discussion with Jonathan and Ying is listed here: https://lore.kernel.org/lkml/[email protected]/ If we want to put these two initializations together, they must be placed together in the later function. Because only at that time, the HMAT information will be ready, adist between nodes can be calculated, and memory tiering can be established based on the adist. So we position the initialization at memory_tier_init() to the memory_tier_late_init() call. Moreover, it's natural to keep memory_tier initialization in drivers at device_initcall() level. If we simply move the set_node_memory_tier() from memory_tier_init() to late_initcall(), it will result in HMAT not registering the mt_adistance_algorithm callback function, because set_node_memory_tier() is not performed during the memory tiering initialization phase, leading to a lack of correct default_dram information. Therefore, we introduced a nodemask to pass the information of the default DRAM nodes. The reason for not choosing to reuse default_dram_type->nodes is that it is not clean enough. So in the end, we use a __initdata variable, which is a variable that is released once initialization is complete, including both CPU and memory nodes for HMAT to iterate through. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ho-Ren (Jack) Chuang <[email protected]> Suggested-by: Jonathan Cameron <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Gregory Price <[email protected]> Cc: Len Brown <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Ravi Jonnalagadda <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3
# a72a30af	05-Apr-2024	Ho-Ren (Jack) Chuang <[email protected]>	memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types Patch series "Improved Memory Tier Creation for CPUless NUMA Nodes", v11. When a memory device, memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types Patch series "Improved Memory Tier Creation for CPUless NUMA Nodes", v11. When a memory device, such as CXL1.1 type3 memory, is emulated as normal memory (E820_TYPE_RAM), the memory device is indistinguishable from normal DRAM in terms of memory tiering with the current implementation. The current memory tiering assigns all detected normal memory nodes to the same DRAM tier. This results in normal memory devices with different attributions being unable to be assigned to the correct memory tier, leading to the inability to migrate pages between different types of memory. https://lore.kernel.org/linux-mm/PH0PR08MB7955E9F08CCB64F23963B5C3A860A@PH0PR08MB7955.namprd08.prod.outlook.com/T/ This patchset automatically resolves the issues. It delays the initialization of memory tiers for CPUless NUMA nodes until they obtain HMAT information and after all devices are initialized at boot time, eliminating the need for user intervention. If no HMAT is specified, it falls back to using `default_dram_type`. Example usecase: We have CXL memory on the host, and we create VMs with a new system memory device backed by host CXL memory. We inject CXL memory performance attributes through QEMU, and the guest now sees memory nodes with performance attributes in HMAT. With this change, we enable the guest kernel to construct the correct memory tiering for the memory nodes. This patch (of 2): Since different memory devices require finding, allocating, and putting memory types, these common steps are abstracted in this patch, enhancing the scalability and conciseness of the code. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ho-Ren (Jack) Chuang <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Gregory Price <[email protected]> Cc: Hao Xiang <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Ravi Jonnalagadda <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7
# 6a954e94	21-Dec-2023	Dave Jiang <[email protected]>	base/node / acpi: Change 'node_hmem_attrs' to 'access_coordinates' Dan Williams suggested changing the struct 'node_hmem_attrs' to 'access_coordinates' [1]. The struct is a container of r/w-latency base/node / acpi: Change 'node_hmem_attrs' to 'access_coordinates' Dan Williams suggested changing the struct 'node_hmem_attrs' to 'access_coordinates' [1]. The struct is a container of r/w-latency and r/w-bandwidth numbers. Moving forward, this container will also be used by CXL to store the performance characteristics of each link hop in the PCIE/CXL topology. So, where node_hmem_attrs is just the access parameters of a memory-node, access_coordinates applies more broadly to hardware topology characteristics. The observation is that seemed like an exercise in having the application identify "where" it falls on a spectrum of bandwidth and latency needs. For the tuple of read/write-latency and read/write-bandwidth, "coordinates" is not a perfect fit. Sometimes it is just conveying values in isolation and not a "location" relative to other performance points, but in the end this data is used to identify the performance operation point of a given memory-node. [2] Link: http://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/linux-cxl/[email protected]/ Suggested-by: Dan Williams <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/170319615734.2212653.15319394025985499185.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <[email protected]> show more ...
Revision tags: v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4
# 6bc2cfdf	26-Sep-2023	Huang Ying <[email protected]>	dax, kmem: calculate abstract distance with general interface Previously, a fixed abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is used for slow memory type in kmem driver. This limits the usage dax, kmem: calculate abstract distance with general interface Previously, a fixed abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is used for slow memory type in kmem driver. This limits the usage of kmem driver, for example, it cannot be used for HBM (high bandwidth memory). So, we use the general abstract distance calculation mechanism in kmem drivers to get more accurate abstract distance on systems with proper support. The original MEMTIER_DEFAULT_DAX_ADISTANCE is used as fallback only. Now, multiple memory types may be managed by kmem. These memory types are put into the "kmem_memory_types" list and protected by kmem_memory_type_lock. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Tested-by: Bharata B Rao <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Wei Xu <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Yang Shi <[email protected]> Cc: Rafael J Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 3718c02d	26-Sep-2023	Huang Ying <[email protected]>	acpi, hmat: calculate abstract distance with HMAT A memory tiering abstract distance calculation algorithm based on ACPI HMAT is implemented. The basic idea is as follows. The performance attribut acpi, hmat: calculate abstract distance with HMAT A memory tiering abstract distance calculation algorithm based on ACPI HMAT is implemented. The basic idea is as follows. The performance attributes of system default DRAM nodes are recorded as the base line. Whose abstract distance is MEMTIER_ADISTANCE_DRAM. Then, the ratio of the abstract distance of a memory node (target) to MEMTIER_ADISTANCE_DRAM is scaled based on the ratio of the performance attributes of the node to that of the default DRAM nodes. The functions to record the read/write latency/bandwidth of the default DRAM nodes and calculate abstract distance according to read/write latency/bandwidth ratio will be used by CXL CDAT (Coherent Device Attribute Table) and other memory device drivers. So, they are put in memory-tiers.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Tested-by: Bharata B Rao <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Wei Xu <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Yang Shi <[email protected]> Cc: Rafael J Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 07a8bdd4	26-Sep-2023	Huang Ying <[email protected]>	memory tiering: add abstract distance calculation algorithms management Patch series "memory tiering: calculate abstract distance based on ACPI HMAT", v4. We have the explicit memory tiers framewor memory tiering: add abstract distance calculation algorithms management Patch series "memory tiering: calculate abstract distance based on ACPI HMAT", v4. We have the explicit memory tiers framework to manage systems with multiple types of memory, e.g., DRAM in DIMM slots and CXL memory devices. Where, same kind of memory devices will be grouped into memory types, then put into memory tiers. To describe the performance of a memory type, abstract distance is defined. Which is in direct proportion to the memory latency and inversely proportional to the memory bandwidth. To keep the code as simple as possible, fixed abstract distance is used in dax/kmem to describe slow memory such as Optane DCPMM. To support more memory types, in this series, we added the abstract distance calculation algorithm management mechanism, provided a algorithm implementation based on ACPI HMAT, and used the general abstract distance calculation interface in dax/kmem driver. So, dax/kmem can support HBM (high bandwidth memory) in addition to the original Optane DCPMM. This patch (of 4): The abstract distance may be calculated by various drivers, such as ACPI HMAT, CXL CDAT, etc. While it may be used by various code which hot-add memory node, such as dax/kmem etc. To decouple the algorithm users and the providers, the abstract distance calculation algorithms management mechanism is implemented in this patch. It provides interface for the providers to register the implementation, and interface for the users. Multiple algorithm implementations can cooperate via calculating abstract distance for different memory nodes. The preference of algorithm implementations can be specified via priority (notifier_block.priority). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Tested-by: Bharata B Rao <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Wei Xu <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Yang Shi <[email protected]> Cc: Rafael J Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5
# 51a23b1b	02-Aug-2023	Li Zhijian <[email protected]>	acpi,mm: fix typo sibiling -> sibling First found this typo as reviewing memory tier code. Fix it by sed like: $ sed -i 's/sibiling/sibling/g' $(git grep -l sibiling) so the acpi one will be correc acpi,mm: fix typo sibiling -> sibling First found this typo as reviewing memory tier code. Fix it by sed like: $ sed -i 's/sibiling/sibling/g' $(git grep -l sibiling) so the acpi one will be corrected as well. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Li Zhijian <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Len Brown <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1
# bded67f8	06-Jul-2023	Miaohe Lin <[email protected]>	memory tier: rename destroy_memory_type() to put_memory_type() It appears that destroy_memory_type() isn't a very good name because we usually will not free the memory_type here. So rename it to a memory tier: rename destroy_memory_type() to put_memory_type() It appears that destroy_memory_type() isn't a very good name because we usually will not free the memory_type here. So rename it to a more appropriate name i.e. put_memory_type(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Suggested-by: Huang, Ying <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Reviewed-by: Xiao Yang <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7
# 1eeaa4fd	23-Sep-2022	Liu Shixin <[email protected]>	memory: move hotplug memory notifier priority to same file for easy sorting The priority of hotplug memory callback is defined in a different file. And there are some callers using numbers directly memory: move hotplug memory notifier priority to same file for easy sorting The priority of hotplug memory callback is defined in a different file. And there are some callers using numbers directly. Collect them together into include/linux/memory.h for easy reading. This allows us to sort their priorities more intuitively without additional comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Shixin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Waiman Long <[email protected]> Cc: zefan li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2
# 467b171a	18-Aug-2022	Aneesh Kumar K.V <[email protected]>	mm/demotion: update node_is_toptier to work with memory tiers With memory tier support we can have memory only NUMA nodes in the top tier from which we want to avoid promotion tracking NUMA faults. mm/demotion: update node_is_toptier to work with memory tiers With memory tier support we can have memory only NUMA nodes in the top tier from which we want to avoid promotion tracking NUMA faults. Update node_is_toptier to work with memory tiers. All NUMA nodes are by default top tier nodes. With lower(slower) memory tiers added we consider all memory tiers above a memory tier having CPU NUMA nodes as a top memory tier [[email protected]: include missed header file, memory-tiers.h] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/memory.c needs linux/memory-tiers.h] [[email protected]: make toptier_distance inclusive upper bound of toptiers] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 32008027	18-Aug-2022	Jagdish Gediya <[email protected]>	mm/demotion: demote pages according to allocation fallback order Currently, a higher tier node can only be demoted to selected nodes on the next lower tier as defined by the demotion path. This str mm/demotion: demote pages according to allocation fallback order Currently, a higher tier node can only be demoted to selected nodes on the next lower tier as defined by the demotion path. This strict demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space). This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that currently. This patch adds support to get all the allowed demotion targets for a memory tier. demote_page_list() function is now modified to utilize this allowed node mask as the fallback allocation mask. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jagdish Gediya <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# b26ac6f3	18-Aug-2022	Aneesh Kumar K.V <[email protected]>	mm/demotion: drop memtier from memtype Now that we track node-specific memtier in pg_data_t, we can drop memtier from memtype. Link: https://lkml.kernel.org/r/20220818131042.113280-8-aneesh.kumar@l mm/demotion: drop memtier from memtype Now that we track node-specific memtier in pg_data_t, we can drop memtier from memtype. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 6c542ab7	18-Aug-2022	Aneesh Kumar K.V <[email protected]>	mm/demotion: build demotion targets based on explicit memory tiers This patch switch the demotion target building logic to use memory tiers instead of NUMA distance. All N_MEMORY NUMA nodes will be mm/demotion: build demotion targets based on explicit memory tiers This patch switch the demotion target building logic to use memory tiers instead of NUMA distance. All N_MEMORY NUMA nodes will be placed in the default memory tier and additional memory tiers will be added by drivers like dax kmem. This patch builds the demotion target for a NUMA node by looking at all memory tiers below the tier to which the NUMA node belongs. The closest node in the immediately following memory tier is used as a demotion target. Since we are now only building demotion target for N_MEMORY NUMA nodes the CPU hotplug calls are removed in this patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 7b88bda3	18-Aug-2022	Aneesh Kumar K.V <[email protected]>	mm/demotion/dax/kmem: set node's abstract distance to MEMTIER_DEFAULT_DAX_ADISTANCE By default, all nodes are assigned to the default memory tier which is the memory tier designated for nodes with D mm/demotion/dax/kmem: set node's abstract distance to MEMTIER_DEFAULT_DAX_ADISTANCE By default, all nodes are assigned to the default memory tier which is the memory tier designated for nodes with DRAM Set dax kmem device node's tier to slower memory tier by assigning abstract distance to MEMTIER_DEFAULT_DAX_ADISTANCE. Low-level drivers like papr_scm or ACPI NFIT can initialize memory device type to a more accurate value based on device tree details or HMAT. If the kernel doesn't find the memory type initialized, a default slower memory type is assigned by the kmem driver. [[email protected]: assign correct memory type for multiple dax devices with the same node affinity] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# c6123a19	18-Aug-2022	Aneesh Kumar K.V <[email protected]>	mm/demotion: add hotplug callbacks to handle new numa node onlined If the new NUMA node onlined doesn't have a abstract distance assigned, the kernel adds the NUMA node to default memory tier. [ane mm/demotion: add hotplug callbacks to handle new numa node onlined If the new NUMA node onlined doesn't have a abstract distance assigned, the kernel adds the NUMA node to default memory tier. [[email protected]: fix kernel error with memory hotplug] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 91952440	18-Aug-2022	Aneesh Kumar K.V <[email protected]>	mm/demotion: move memory demotion related code This moves memory demotion related code to mm/memory-tiers.c. No functional change in this patch. Link: https://lkml.kernel.org/r/20220818131042.1132 mm/demotion: move memory demotion related code This moves memory demotion related code to mm/memory-tiers.c. No functional change in this patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 992bf775	18-Aug-2022	Aneesh Kumar K.V <[email protected]>	mm/demotion: add support for explicit memory tiers Patch series "mm/demotion: Memory tiers and demotion", v15. The current kernel has the basic memory tiering support: Inactive pages on a higher ti mm/demotion: add support for explicit memory tiers Patch series "mm/demotion: Memory tiers and demotion", v15. The current kernel has the basic memory tiering support: Inactive pages on a higher tier NUMA node can be migrated (demoted) to a lower tier NUMA node to make room for new allocations on the higher tier NUMA node. Frequently accessed pages on a lower tier NUMA node can be migrated (promoted) to a higher tier NUMA node to improve the performance. In the current kernel, memory tiers are defined implicitly via a demotion path relationship between NUMA nodes, which is created during the kernel initialization and updated when a NUMA node is hot-added or hot-removed. The current implementation puts all nodes with CPU into the highest tier, and builds the tier hierarchy tier-by-tier by establishing the per-node demotion targets based on the distances between nodes. This current memory tier kernel implementation needs to be improved for several important use cases: * The current tier initialization code always initializes each memory-only NUMA node into a lower tier. But a memory-only NUMA node may have a high performance memory device (e.g. a DRAM-backed memory-only node on a virtual machine) and that should be put into a higher tier. * The current tier hierarchy always puts CPU nodes into the top tier. But on a system with HBM (e.g. GPU memory) devices, these memory-only HBM NUMA nodes should be in the top tier, and DRAM nodes with CPUs are better to be placed into the next lower tier. * Also because the current tier hierarchy always puts CPU nodes into the top tier, when a CPU is hot-added (or hot-removed) and triggers a memory node from CPU-less into a CPU node (or vice versa), the memory tier hierarchy gets changed, even though no memory node is added or removed. This can make the tier hierarchy unstable and make it difficult to support tier-based memory accounting. * A higher tier node can only be demoted to nodes with shortest distance on the next lower tier as defined by the demotion path, not any other node from any lower tier. This strict, demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space), and has resulted in the feature request for an interface to override the system-wide, per-node demotion order from the userspace. This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that. This patch series make the creation of memory tiers explicit under the control of device driver. Memory Tier Initialization ========================== Linux kernel presents memory devices as NUMA nodes and each memory device is of a specific type. The memory type of a device is represented by its abstract distance. A memory tier corresponds to a range of abstract distance. This allows for classifying memory devices with a specific performance range into a memory tier. By default, all memory nodes are assigned to the default tier with abstract distance 512. A device driver can move its memory nodes from the default tier. For example, PMEM can move its memory nodes below the default tier, whereas GPU can move its memory nodes above the default tier. The kernel initialization code makes the decision on which exact tier a memory node should be assigned to based on the requests from the device drivers as well as the memory device hardware information provided by the firmware. Hot-adding/removing CPUs doesn't affect memory tier hierarchy. This patch (of 10): In the current kernel, memory tiers are defined implicitly via a demotion path relationship between NUMA nodes, which is created during the kernel initialization and updated when a NUMA node is hot-added or hot-removed. The current implementation puts all nodes with CPU into the highest tier, and builds the tier hierarchy by establishing the per-node demotion targets based on the distances between nodes. This current memory tier kernel implementation needs to be improved for several important use cases, The current tier initialization code always initializes each memory-only NUMA node into a lower tier. But a memory-only NUMA node may have a high performance memory device (e.g. a DRAM-backed memory-only node on a virtual machine) that should be put into a higher tier. The current tier hierarchy always puts CPU nodes into the top tier. But on a system with HBM or GPU devices, the memory-only NUMA nodes mapping these devices should be in the top tier, and DRAM nodes with CPUs are better to be placed into the next lower tier. With current kernel higher tier node can only be demoted to nodes with shortest distance on the next lower tier as defined by the demotion path, not any other node from any lower tier. This strict, demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space), This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that. This patch series address the above by defining memory tiers explicitly. Linux kernel presents memory devices as NUMA nodes and each memory device is of a specific type. The memory type of a device is represented by its abstract distance. A memory tier corresponds to a range of abstract distance. This allows for classifying memory devices with a specific performance range into a memory tier. This patch configures the range/chunk size to be 128. The default DRAM abstract distance is 512. We can have 4 memory tiers below the default DRAM with abstract distance range 0 - 127, 127 - 255, 256- 383, 384 - 511. Faster memory devices can be placed in these faster(higher) memory tiers. Slower memory devices like persistent memory will have abstract distance higher than the default DRAM level. [[email protected]: fix comment, per Aneesh] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yang Shi <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...