| 699ea521 | 12-Feb-2025 |
Shiju Jose <[email protected]> |
EDAC: Add a memory repair control feature
Add a generic EDAC memory repair control driver to manage memory repairs in the system, such as CXL Post Package Repair (PPR) and other soft and hard PPR fe
EDAC: Add a memory repair control feature
Add a generic EDAC memory repair control driver to manage memory repairs in the system, such as CXL Post Package Repair (PPR) and other soft and hard PPR features.
For example, a CXL device with DRAM components that support PPR features may implement PPR maintenance operations. DRAM components may support two types of PPR:
- hard PPR, for a permanent row repair, and - soft PPR, for a temporary row repair.
Soft PPR is much faster than hard PPR, but the repair is lost with a power cycle.
When a CXL device detects an error in a memory, it may report the need for a repair maintenance operation by using an event record where the "maintenance needed" flag is set. The event records contain the device physical address (DPA) and other optional attributes of the memory to repair.
The kernel will report the corresponding CXL general media or DRAM trace event to userspace, and userspace tools (e.g. rasdaemon) will initiate a repair operation in response to the device request via the sysfs repair control.
Device with memory repair features registers with EDAC device driver, which retrieves a memory repair descriptor from EDAC memory repair driver and exposes the sysfs repair control attributes to userspace in
/sys/bus/edac/devices/<dev-name>/mem_repairX/.
The common memory repair control interface abstracts the control of arbitrary memory repair functionality into a standardized set of functions. The sysfs memory repair attribute nodes are only available if the client driver has implemented the corresponding attribute callback function and provided operations to the EDAC device driver during registration.
[ bp: Massage, fixup edac_dev_register() retvals, merge write_overflow fix to mem_repair_create_desc() ]
Signed-off-by: Shiju Jose <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| bcbd069b | 12-Feb-2025 |
Shiju Jose <[email protected]> |
EDAC: Add a Error Check Scrub control feature
Add an Error Check Scrub (ECS) control to manage a memory device's ECS feature.
The ECS is a feature defined in JEDEC DDR5 SDRAM Specification (JESD79-
EDAC: Add a Error Check Scrub control feature
Add an Error Check Scrub (ECS) control to manage a memory device's ECS feature.
The ECS is a feature defined in JEDEC DDR5 SDRAM Specification (JESD79-5) and allows the DRAM to internally read, correct single-bit errors, and write back corrected data bits to the DRAM array while providing transparency to error counts.
The DDR5 device contains a number of memory media Field Replaceable Units (FRU) per device. The DDR5 ECS feature and thus the ECS control driver supports configuring the ECS parameters per FRU.
Memory devices support the ECS feature register with the EDAC device driver, which retrieves the ECS descriptor from the EDAC ECS driver. This driver exposes sysfs ECS control attributes to userspace via
/sys/bus/edac/devices/<dev-name>/ecs_fruX/.
The common sysfs ECS control interface abstracts the control of an arbitrary ECS functionality to a common set of functions.
Support for the ECS feature is added separately because the control attributes of the DDR5 ECS feature differ from those of the scrub feature.
The sysfs ECS attribute nodes are only present if the client driver has implemented the corresponding attribute callback function and passed the necessary operations to the EDAC RAS feature driver during registration.
[ bp: Massage, fixup edac_dev_register() retvals. ]
Co-developed-by: Jonathan Cameron <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]> Signed-off-by: Shiju Jose <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Fan Ni <[email protected]> Tested-by: Fan Ni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| f90b7381 | 12-Feb-2025 |
Shiju Jose <[email protected]> |
EDAC: Add scrub control feature
Add a scrub control to manage memory scrubbers in the system.
Devices with a scrub feature register with the EDAC device driver which retrieves the scrub descriptor
EDAC: Add scrub control feature
Add a scrub control to manage memory scrubbers in the system.
Devices with a scrub feature register with the EDAC device driver which retrieves the scrub descriptor from the scrub driver and exposes the control attributes for a instance to userspace at
/sys/bus/edac/devices/<dev-name>/scrubX/.
The common sysfs scrub control interface abstracts the control of arbitrary scrubbing functionality into a common set of functions. The attribute nodes are only present if the client driver has implemented the corresponding attribute callback function and passed the operations to the device driver during registration.
[ bp: Massage commit message, docs and code, simplify text a bit. Integrate fixup for: https://lore.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> ]
Co-developed-by: Jonathan Cameron <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]> Signed-off-by: Shiju Jose <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Daniel Ferguson <[email protected]> Tested-by: Fan Ni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|