amdgpu_ras.h - OpenGrok history log for /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu

Revision	Date	Author	Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1
# cc11dffc	25-Mar-2025	Stanley.Yang <[email protected]>	drm/amdgpu: Update ta ras block Update ta ras block to keep in sync with RAS TA. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5
# d4bd7a50	26-Feb-2025	Xiang Liu <[email protected]>	drm/amdgpu: Report generic instead of unknown boot time errors Change the DMESG reporting of unknown errors to "Boot Controller Generic Error" to align with the RAS SPEC and provide more clarity to customers. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1
# 16b85a09	22-Jan-2025	Hawking Zhang <[email protected]>	drm/amdgpu: Update usage for bad page threshold The driver's behavior varies based on the configuration of the amdgpu_bad_page_threshold setting. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6
# a8d133e6	31-Oct-2024	Tao Zhou <[email protected]>	drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes All legacy RAS bad pages are generated in NPS1 mode, but new bad pages can be generated in any NPS mode, so we can't use retired_page stored on eeprom directly in non-nps1 mode even for legacy data. We need to take different actions for different data; new data can be distinguished from old data by the UMC_CHANNEL_IDX_V2 flag. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 71a0e963	29-Oct-2024	Tao Zhou <[email protected]>	drm/amdgpu: save UMC global channel index to eeprom Save the global channel index returned by RAS TA to eeprom. The memory physical address can be obtained from the MCA address and the channel index. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.12-rc5
# e1ee2111	24-Oct-2024	Lijo Lazar <[email protected]>	drm/amdgpu: Prefer RAS recovery for scheduler hang Before scheduling a recovery due to scheduler/job hang, check if a RAS error is detected. If so, choose RAS recovery to handle the situation. A scheduler/job hang could be the side effect of a RAS error. In such cases, it is required to go through the RAS error recovery process. A RAS error recovery process in certain cases could also avoid a full device reset. An error state is maintained in RAS context to detect the block affected. The fatal error state uses an unused block id. Set the block id when an error is detected. If the interrupt handler detected a poison error, it's not required to look for a fatal error. Skip fatal error checking in such cases. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
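The recovery decision described in e1ee2111 can be sketched in C as follows. This is a minimal illustration, not the driver's code: `struct ras_ctx`, the block ids, and every helper name are hypothetical. It assumes the error state is a per-block bitmap kept in the RAS context, with one otherwise unused bit reserved to mark a fatal error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block ids; the driver's real enum is not reproduced here. */
enum ras_block_id {
	RAS_BLOCK_UMC = 0,
	RAS_BLOCK_SDMA = 1,
	RAS_BLOCK_GFX = 2,
	/* Otherwise unused id reserved to mark a fatal error. */
	RAS_BLOCK_FATAL = 31,
};

struct ras_ctx {
	uint32_t error_state; /* one bit per affected block */
};

/* Record that a RAS error was detected in the given block. */
static void ras_set_error_state(struct ras_ctx *ras, enum ras_block_id block)
{
	assert((unsigned)block < 32);
	ras->error_state |= UINT32_C(1) << block;
}

static bool ras_error_detected(const struct ras_ctx *ras)
{
	return ras->error_state != 0;
}

enum recovery_kind { RECOVERY_JOB_RESET, RECOVERY_RAS };

/* On a scheduler/job hang, prefer RAS recovery if any RAS error is recorded. */
static enum recovery_kind choose_recovery(const struct ras_ctx *ras)
{
	return ras_error_detected(ras) ? RECOVERY_RAS : RECOVERY_JOB_RESET;
}
```

A bitmap keeps the check on the hang path to a single load and compare, while still recording which block was affected.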
# 84a2947e	30-Oct-2024	Victor Skvortsov <[email protected]>	drm/amdgpu: Implement virt req_ras_err_count Enable RAS late init if VF RAS Telemetry is supported. When enabled, the VF can use this interface to query total RAS error counts from the host. The VF FB access may abruptly end due to a fatal error; therefore, the VF must cache and sanitize the input. The Host allows 15 Telemetry messages every 60 seconds, after which the host will ignore any further incoming telemetry messages. The VF will rate limit its msg calling to once every 5 seconds (12 times in 60 seconds). While the VF is rate limited, it will continue to report the last good cached data. v2: Flip generate report & update statistics order for VF Signed-off-by: Victor Skvortsov <[email protected]> Acked-by: Tao Zhou <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
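The VF-side rate limiting from 84a2947e can be sketched as below, assuming a monotonic clock in seconds and a caller-supplied host query callback. All names (`vf_telemetry_cache`, `vf_get_err_count`, `host_query_fn`, `fake_host_query`) are hypothetical; the sketch only shows the 5-second request spacing and the fallback to the last good cached counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 5 s spacing gives 12 requests per minute, under the host's 15-per-60 s budget. */
#define TELEMETRY_MIN_INTERVAL_S 5

struct vf_telemetry_cache {
	bool queried;          /* has any request been sent yet? */
	int64_t last_query_s;  /* time of the most recent request to the host */
	uint64_t ce_count;     /* last good (sanitized) counts */
	uint64_t ue_count;
};

/* Hypothetical host query; returns false if the reply fails sanity checks. */
typedef bool (*host_query_fn)(uint64_t *ce, uint64_t *ue);

static void vf_get_err_count(struct vf_telemetry_cache *c, int64_t now_s,
			     host_query_fn query, uint64_t *ce, uint64_t *ue)
{
	assert(query);
	if (!c->queried || now_s - c->last_query_s >= TELEMETRY_MIN_INTERVAL_S) {
		uint64_t new_ce, new_ue;

		c->queried = true;
		c->last_query_s = now_s;
		/* Only a reply that passed sanity checks replaces the cache. */
		if (query(&new_ce, &new_ue)) {
			c->ce_count = new_ce;
			c->ue_count = new_ue;
		}
	}
	/* Rate limited or bad reply: report the last good cached data. */
	*ce = c->ce_count;
	*ue = c->ue_count;
}

/* Simulated host for illustration only: each call returns larger counts. */
static int host_calls;

static bool fake_host_query(uint64_t *ce, uint64_t *ue)
{
	host_calls++;
	*ce = (uint64_t)host_calls * 10;
	*ue = (uint64_t)host_calls;
	return true;
}
```

Calling `vf_get_err_count()` once per second for a minute starting at t = 0 sends requests only at t = 0, 5, ..., 55, i.e. 12 requests; every other call returns the cached counts.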
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6
# b17f8732	30-Aug-2024	Lijo Lazar <[email protected]>	drm/amdgpu: Add helper to initialize badpage info Add a separate function to read badpage data during initialization. Reading bad pages will need hardware access and cannot be done during reset. Hence, in cases where the device needs a full reset during init itself, attempting to read will cause a deadlock. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2
# 671af066	02-Aug-2024	Yang Wang <[email protected]>	drm/amdgpu: remove RAS unused paramter 'err_addr' - amdgpu_ras_error_statistic_ue_count() - amdgpu_ras_error_statistic_ce_count() - amdgpu_ras_error_statistic_de_count() The parameter 'err_addr' is no longer used since the following patch. Fixes: a7e8467fbeee ("drm/amdgpu: Remove unused code") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 792be2e2	01-Aug-2024	Tao Zhou <[email protected]>	drm/amdgpu: create function to check RAS RMA status For the convenience of calling it globally. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# dfe9d047	01-Aug-2024	Hawking Zhang <[email protected]>	drm/amdgpu: Add more types for boot time error reporting Data abort exception and unknown errors are supported. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.11-rc1, v6.10
# a7e8467f	11-Jul-2024	YiPeng Chai <[email protected]>	drm/amdgpu: Remove unused code Remove unused code. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 56631dee	11-Jul-2024	YiPeng Chai <[email protected]>	drm/amdgpu: optimize logging deferred error info 1. Use pa_pfn as the radix-tree key index to log deferred error info. 2. Use local array to store a row of bad pages. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.10-rc7
# 59f488be	03-Jul-2024	Yang Wang <[email protected]>	drm/amdgpu: add ras event state device attribute support add amdgpu ras 'event_state' sysfs device attribute support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.10-rc6
# 12b435a4	28-Jun-2024	Yang Wang <[email protected]>	drm/amdgpu: add ras POSION_CONSUMPTION event id support add amdgpu ras POSION_CONSUMPTION event id support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 5b9de259	27-Jun-2024	Yang Wang <[email protected]>	drm/amdgpu: add ras POSION_CREATION event id support add amdgpu ras POSION_CREATION event id support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 75ac6a25	25-Jun-2024	Yang Wang <[email protected]>	drm/amdgpu: refine amdgpu ras event id core code v1: - use unified event id to manage ras events - add a new function amdgpu_ras_query_error_status_with_event() to accept event type as parameter. v2: add a warn log to show the location of function failure when calling amdgpu_ras_mark_event(). (Tao Zhou) v3: change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL. v4: rename amdgpu_ras_get_recovery_event() to amdgpu_ras_get_fatal_error_event(). Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 332210c1	04-Jul-2024	Yang Wang <[email protected]>	drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG remove redundant semicolons in RAS_EVENT_LOG to avoid code format check warning. Fixes: b712d7c20133 ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 5f08275c	24-Jun-2024	YiPeng Chai <[email protected]>	drm/amdgpu: refine poison creation interrupt handler To handle the case of a large number of RAS poison interrupts: 1. Use a variable to record poison creation requests to avoid the FIFO becoming full. 2. Prioritize handling poison creation requests instead of following the order in which the driver receives requests. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
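The idea behind 5f08275c can be illustrated with a small single-threaded C model. Everything here is hypothetical (`struct poison_ctx`, the FIFO size, the handler names); it only shows creation requests being tallied in a counter rather than queued, so a burst cannot overflow a FIFO, and pending creation requests being served before queued consumption requests regardless of arrival order. A real interrupt path would also need atomics or locking, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

#define CONSUMPTION_FIFO_SIZE 4 /* hypothetical, deliberately small */

enum poison_req { POISON_NONE, POISON_CREATION, POISON_CONSUMPTION };

struct poison_ctx {
	/* Creation requests are only counted, so a burst cannot fill a queue. */
	unsigned int creation_pending;
	/* Consumption requests keep per-request data (here a block id). */
	int fifo[CONSUMPTION_FIFO_SIZE];
	unsigned int head, count;
};

static void on_poison_creation(struct poison_ctx *p)
{
	p->creation_pending++;
}

static bool on_poison_consumption(struct poison_ctx *p, int block)
{
	if (p->count == CONSUMPTION_FIFO_SIZE)
		return false; /* FIFO full: the request would be lost */
	p->fifo[(p->head + p->count) % CONSUMPTION_FIFO_SIZE] = block;
	p->count++;
	return true;
}

/* Drain pending creation requests before any queued consumption request. */
static enum poison_req next_poison_request(struct poison_ctx *p, int *block)
{
	assert(block);
	if (p->creation_pending > 0) {
		p->creation_pending--;
		return POISON_CREATION;
	}
	if (p->count > 0) {
		*block = p->fifo[p->head];
		p->head = (p->head + 1) % CONSUMPTION_FIFO_SIZE;
		p->count--;
		return POISON_CONSUMPTION;
	}
	return POISON_NONE;
}
```

With a consumption request queued first and 100 creation interrupts arriving afterwards, the handler serves all 100 creation requests before the consumption request, and nothing is dropped even though the FIFO holds only four entries.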
# 78146c1d	24-Jun-2024	YiPeng Chai <[email protected]>	drm/amdgpu: add variable to record the deferred error number read by driver Add variable to record the deferred error number read by driver. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 7e437167	29-May-2024	Tao Zhou <[email protected]>	drm/amdgpu: create amdgpu_ras_in_recovery to simplify code Reduce redundant code so that callers don't need to pay attention to RAS details. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.10-rc1
# b95fa494	23-May-2024	Tao Zhou <[email protected]>	drm/amdgpu: add RAS is_rma flag Set the flag to true if the bad page count reaches the threshold. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# a474161e	30-May-2024	Hawking Zhang <[email protected]>	drm/amdgpu: Update programming for boot error reporting AMDGPU_RAS_GPU_ERR_BOOT_STATUS field is no longer valid. The polling sequence is also simplified according to the latest firmware change. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# 473af28d	28-May-2024	Hawking Zhang <[email protected]>	drm/amdgpu: Estimate RAS reservation when report capacity v2 Add an estimate of how much vram we need to reserve for RAS when calculating the total available vram. v2: apply the change to MP0 v13_0_2 and v13_0_14 Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
# cf85764e	21-May-2024	Hawking Zhang <[email protected]>	drm/amdgpu: correct hbm field in boot status The hbm field takes bits 13 and 14 in boot status. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
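Extracting a two-bit field at bits 13 and 14 of the boot status, as described in cf85764e, comes down to a mask and a shift. The macro and function names below are hypothetical and do not come from amdgpu_ras.h; in-kernel code would more typically express the same mask with `GENMASK(14, 13)` and read it with `FIELD_GET()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the field occupies bits 13 and 14 (mask 0x6000). */
#define BOOT_STATUS_HBM_SHIFT 13
#define BOOT_STATUS_HBM_MASK  (UINT32_C(0x3) << BOOT_STATUS_HBM_SHIFT)

/* Return the 2-bit hbm field (0..3) from a raw boot status value. */
static uint32_t boot_status_hbm_field(uint32_t boot_status)
{
	return (boot_status & BOOT_STATUS_HBM_MASK) >> BOOT_STATUS_HBM_SHIFT;
}
```

Bits outside 13 and 14 (for example bit 12 or bit 15) are masked off and do not affect the result.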