| 58029c39 | 27-Feb-2025 |
Yazen Ghannam <[email protected]> |
RAS/AMD/FMPM: Get masked address
Some operations require checking, or ignoring, specific bits in an address value. For example, this can be comparing address values to identify unique structures.
Currently, the full address value is compared when filtering for duplicates. This results in overcounting and the creation of extra records, which gives the impression that more unique events occurred than actually did.
Mask the address for physical rows on MI300.
[ bp: Simplify. ]
Fixes: 6f15e617cc99 ("RAS: Introduce a FRU memory poison manager")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected]
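The duplicate filtering described in this commit can be illustrated with a minimal sketch. It is not the FMPM code: the mask value, `EXAMPLE_ROW_MASK`, and the helper names are hypothetical, chosen only to show how comparing masked addresses collapses several errors in one physical row into a single record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL mask: assume bits [31:12] identify a physical row.
 * The real MI300 mask is not stated in the commit message.
 */
#define EXAMPLE_ROW_MASK 0x00000000FFFFF000ULL

/* Keep only the bits that identify the structure being compared. */
static uint64_t masked_addr(uint64_t addr)
{
    return addr & EXAMPLE_ROW_MASK;
}

/* Two addresses are duplicates when their masked values match. */
static bool same_row(uint64_t a, uint64_t b)
{
    return masked_addr(a) == masked_addr(b);
}

/* Count entries that are not duplicates of an earlier entry. */
static size_t count_unique_rows(const uint64_t *addrs, size_t n)
{
    size_t unique = 0;

    assert(addrs || n == 0);

    for (size_t i = 0; i < n; i++) {
        bool seen = false;

        for (size_t j = 0; j < i; j++) {
            if (same_row(addrs[i], addrs[j])) {
                seen = true;
                break;
            }
        }
        if (!seen)
            unique++;
    }
    return unique;
}
```

Comparing full address values instead of masked ones would count every distinct offset within a row as a separate event, which is the overcounting the commit describes.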
|
| ba437905 | 07-Jun-2024 |
Yazen Ghannam <[email protected]> |
RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation
The currently used normalized address format is not applicable to all MI300 systems. This leads to incorrect results during address translation.
Drop the fixed layout and construct the normalized address from system settings.
Fixes: 87a612375307 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| e0372d69 | 06-Jun-2024 |
John Allen <[email protected]> |
RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization
Unlike with previous Data Fabric versions, with Data Fabric 4.5 non-power-of-2 denormalization, there are bits of the system physical address that can't be fully reconstructed from the normalized address.
To determine the proper combination of missing system physical address bits, iterate through each possible combination of these bits, normalize the resulting system physical address, and compare to the original address that is being translated. If the addresses match, then the correct permutation of bits has been found.
Signed-off-by: John Allen <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
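The search described in this commit (try each combination of the unknown bits, normalize, compare) can be sketched as follows. This is a toy model, not the ATL code: `toy_normalize()`, the three-channel interleave, and the choice of bits [9:8] as the "missing" bits are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL: assume bits [9:8] of the address cannot be reconstructed. */
#define MISSING_SHIFT 8
#define MISSING_MASK  (0x3ULL << MISSING_SHIFT)
#define NUM_CHANNELS  3   /* toy non-power-of-2 interleave */

/*
 * Toy normalization: interleave 256-byte blocks across three channels.
 * Returns the normalized address and stores the channel in *chan.
 */
static uint64_t toy_normalize(uint64_t spa, unsigned int *chan)
{
    uint64_t block = spa >> 8;

    *chan = (unsigned int)(block % NUM_CHANNELS);
    return ((block / NUM_CHANNELS) << 8) | (spa & 0xFF);
}

/*
 * Try every combination of the missing bits. A candidate is correct when
 * normalizing it reproduces the original normalized address and channel.
 * Returns 0 and stores the result in *spa_out, or -1 if nothing matches.
 */
static int find_missing_bits(uint64_t partial_spa, uint64_t norm,
                             unsigned int chan, uint64_t *spa_out)
{
    assert(spa_out);

    for (uint64_t combo = 0; combo < 4; combo++) {
        uint64_t candidate = (partial_spa & ~MISSING_MASK) |
                             (combo << MISSING_SHIFT);
        unsigned int cand_chan;

        if (toy_normalize(candidate, &cand_chan) == norm &&
            cand_chan == chan) {
            *spa_out = candidate;
            return 0;
        }
    }
    return -1;
}
```

In this toy model each of the four candidates normalizes to a distinct (address, channel) pair, so at most one combination can match. Whether the real hardware mapping gives the same uniqueness guarantee is not stated in the commit message.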
|
| 6cce048c | 06-Jun-2024 |
John Allen <[email protected]> |
RAS/AMD/ATL: Expand helpers for adding and removing base and hole
The ret_addr field in struct addr_ctx contains the intermediate value of the returned address as it passes through multiple steps in the translation process. Currently, adding the DRAM base and legacy hole is only done once, so it operates directly on the intermediate value.
However, for DF 4.5 non-power-of-2 denormalization, adding and removing the DRAM base and legacy hole needs to be done for multiple temporary address values. During this process, the intermediate value should not be lost so the ret_addr value can't be reused.
Update the existing 'add' helper to operate on an arbitrary address and introduce a new 'remove' helper to do the inverse operations.
Signed-off-by: John Allen <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
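The helper shape described in this commit (take an arbitrary address, return the adjusted value, leave the intermediate result alone) might look roughly like the sketch below. The struct, its fields, and the function names are hypothetical stand-ins, and the hole handling assumes the legacy hole runs from `hole_base` up to 4 GiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GIB (1ULL << 32)

/* HYPOTHETICAL container for the settings the helpers need. */
struct example_map {
    uint64_t base;       /* DRAM base address of the map */
    uint64_t hole_base;  /* start of the legacy hole, assumed <= 4 GiB */
    bool     hole_en;    /* whether the legacy hole applies */
};

/* Add the DRAM base, then skip over the legacy hole if the address lands in it. */
static uint64_t add_base_and_hole(const struct example_map *m, uint64_t addr)
{
    assert(m && m->hole_base <= FOUR_GIB);

    addr += m->base;
    if (m->hole_en && addr >= m->hole_base)
        addr += FOUR_GIB - m->hole_base;
    return addr;
}

/* Inverse of add_base_and_hole(): undo the hole shift, then subtract the base. */
static uint64_t remove_base_and_hole(const struct example_map *m, uint64_t addr)
{
    assert(m && m->hole_base <= FOUR_GIB);

    if (m->hole_en && addr >= FOUR_GIB)
        addr -= FOUR_GIB - m->hole_base;
    return addr - m->base;
}
```

Because both helpers work on a value passed in by the caller, they can be applied to any number of temporary addresses without disturbing the intermediate result kept elsewhere, which is the property the commit message asks for.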
|
| dd61b55d | 22-Feb-2024 |
Yazen Ghannam <[email protected]> |
RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
The hash_pa8 and hashed_bit values in denorm_addr_df4_np2() are currently defined as u8 types. These variables represent single bits.
'hash_pa8' is set from bitwise AND operations using masks wider than 8 bits, so the calculated value does not fit in the variable and is always '0'. The 'hash_pa8' check later in the function then fails, which produces incorrect results for some cases.
Change these variables to bool type. This clarifies that they are single bit values. Also, this allows the compiler to ensure they hold the proper results. Remove an unnecessary shift operation.
[ bp: Remove the unnecessary brackets in the else-branch of the hash_pa8 assignment. ]
Fixes: 3f3174996be6 ("RAS: Introduce AMD Address Translation Library")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
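The truncation described in this commit can be reproduced in a few lines of standalone C. This is a demonstration of the language behavior only, not the kernel function: `BIT_ULL` is redefined locally and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/*
 * Storing "addr & BIT_ULL(8)" in an 8-bit type keeps only bits [7:0],
 * so the result is 0 even when bit 8 is set.
 */
static uint8_t bit8_as_u8(uint64_t addr)
{
    return (uint8_t)(addr & BIT_ULL(8));
}

/* Conversion to bool yields true for any non-zero value. */
static bool bit8_as_bool(uint64_t addr)
{
    return addr & BIT_ULL(8);
}

/* Small self-check showing the difference between the two types. */
static void demo_truncation(void)
{
    assert(bit8_as_u8(0x100) == 0);   /* bit 8 is set, but the value is lost */
    assert(bit8_as_bool(0x100));      /* bool preserves "bit is set" */
}
```

With a bool, no shift is needed to bring the bit into range, which is consistent with the commit also removing a shift operation.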
|
| 87a61237 | 31-Jan-2024 |
Yazen Ghannam <[email protected]> |
RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
Zen-based AMD systems report DRAM ECC errors through Unified Memory Controller (UMC) MCA banks. The value provided in MCA_ADDR is a "normalized" address which represents the UMC's view of its managed memory. The normalized address must be translated to a system physical address for software to take action.
MI300 systems, uniquely, do not provide a normalized address in MCA_ADDR for DRAM ECC errors. Rather, the "DRAM" address is reported. This value includes identifiers for the bank, row, column, pseudochannel and stack of the memory location.
The DRAM address must be converted to a normalized address in order to be further translated to a system physical address.
Add helper functions to do the DRAM to normalized translation for MI300 systems. The method is based on the fixed hardware layout of the on-chip memory.
[ bp: Massage commit message, decapitalize some, rename function. ]
Signed-off-by: Yazen Ghannam <[email protected]>
Co-developed-by: Muralidhara M K <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Muralidhara M K <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
| a7b57372 | 31-Jan-2024 |
Dan Carpenter <[email protected]> |
RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
Check against ARRAY_SIZE() which is the number of elements instead of sizeof() which is the number of bytes.
Fixes: 453f0ae79732 ("RAS/AMD/ATL: Add MI300 support")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
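The difference between the two bounds checks is easy to see in a standalone sketch. The table contents and `lookup_id()` are hypothetical, and `ARRAY_SIZE` is redefined locally in its simplest form rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified local definition: number of elements in an array. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* HYPOTHETICAL lookup table with 2-byte entries. */
static const uint16_t example_ids[] = { 0x10, 0x20, 0x30, 0x40 };

/*
 * sizeof(example_ids) is 8 (bytes) while ARRAY_SIZE(example_ids) is 4
 * (elements). Checking idx against sizeof() would accept indices 4..7,
 * which read past the end of the array.
 */
static int lookup_id(size_t idx, uint16_t *out)
{
    assert(out);

    if (idx >= ARRAY_SIZE(example_ids))
        return -1;

    *out = example_ids[idx];
    return 0;
}
```

The two checks only coincide for arrays of single-byte elements, so a `sizeof()` bound can pass testing on small indices and still overflow on larger ones.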
|
| 453f0ae7 | 28-Jan-2024 |
Muralidhara M K <[email protected]> |
RAS/AMD/ATL: Add MI300 support
AMD MI300 systems include on-die HBM3 memory and a unique topology. They fall under Data Fabric version 4.5 in overall design.
Generally, topology information (IDs, etc.) is gathered from Data Fabric registers. However, the unique topology for MI300 means that some topology information is fixed in hardware and follows arbitrary mappings. Furthermore, not all hardware instances are software-visible, so register accesses must be adjusted.
Recognize and add helper functions for the new MI300 interleave modes. Add lookup tables for fixed values where appropriate. Adjust how Die and Node IDs are found and used.
Also, fix some register bitmasks that were mislabeled.
Signed-off-by: Muralidhara M K <[email protected]>
Co-developed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|