Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
e1ee2111 | 24-Oct-2024 | Lijo Lazar <[email protected]>
drm/amdgpu: Prefer RAS recovery for scheduler hang
Before scheduling a recovery due to a scheduler/job hang, check whether a RAS error has been detected. If so, choose RAS recovery to handle the situation. A scheduler/job hang could be a side effect of a RAS error, in which case it is necessary to go through the RAS error recovery process. In certain cases, the RAS error recovery process can also avoid a full device reset.
An error state is maintained in the RAS context to detect the affected block. The fatal error state uses an unused block ID. Set the block ID when an error is detected. If the interrupt handler detected a poison error, there is no need to look for a fatal error, so skip fatal error checking in such cases.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
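The decision described in this commit can be sketched in plain C. This is a minimal illustration, not the driver's code: the names `ras_ctx`, `ras_block`, `RAS_BLOCK_FATAL` and `choose_recovery` are hypothetical, and the bitmask is an assumed representation of the per-block error state. It shows the two ideas from the message: a fatal error recorded under an otherwise unused block ID, and a job-hang handler that prefers RAS recovery whenever any error state is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block IDs; the real driver's RAS block enum differs. */
enum ras_block { RAS_BLOCK_UMC, RAS_BLOCK_SDMA, RAS_BLOCK_GFX, RAS_BLOCK_COUNT };

/* Fatal errors are recorded under an otherwise unused block ID. */
#define RAS_BLOCK_FATAL RAS_BLOCK_COUNT

struct ras_ctx {
	uint32_t error_state; /* one bit per block ID, plus the fatal bit */
};

enum recovery_kind { RECOVERY_JOB_HANG, RECOVERY_RAS };

/* Record that an error was detected in the given block. */
static void ras_set_error_state(struct ras_ctx *ras, unsigned int block)
{
	assert(block <= RAS_BLOCK_FATAL);
	ras->error_state |= 1u << block;
}

static bool ras_in_error_state(const struct ras_ctx *ras)
{
	return ras->error_state != 0;
}

/* Called on a scheduler/job timeout: the hang may be a side effect of a
 * RAS error, so RAS recovery takes precedence when any error is recorded. */
static enum recovery_kind choose_recovery(const struct ras_ctx *ras)
{
	return ras_in_error_state(ras) ? RECOVERY_RAS : RECOVERY_JOB_HANG;
}
```

Keeping the fatal state in the same bitmask means the timeout path needs only one check to decide between the two recovery routes.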
a86e0c0e | 15-Nov-2024 | Lijo Lazar <[email protected]>
drm/amdgpu: Add init level for post reset reinit
When the device needs to be reset before initialization, not all IPs have to be initialized before the reset. In such cases, the driver needs to identify whether an IP/feature is being initialized for the first time or reinitialized after a reset.
Add a RESET_RECOVERY init level to identify the post-reset reinitialization phase. This provides only a device-level identification; IPs/features may also choose to track their state independently.
Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
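A device-level init level lets an IP distinguish first-time initialization from reinitialization after a reset. The sketch below is a hypothetical model: `gpu_dev`, `ip_hw_init` and the counter standing in for one-time setup work are invented for illustration, and only the idea of a RESET_RECOVERY level comes from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical init levels; RESET_RECOVERY marks post-reset reinit. */
enum init_level { INIT_LEVEL_DEFAULT, INIT_LEVEL_RESET_RECOVERY };

struct gpu_dev {
	enum init_level init_level;
	int one_time_setup_count; /* stands in for work done only on first init */
};

static void ip_hw_init(struct gpu_dev *dev)
{
	assert(dev != NULL);
	/* Skip first-time-only work when reinitializing after a reset. */
	if (dev->init_level != INIT_LEVEL_RESET_RECOVERY)
		dev->one_time_setup_count++;
}

/* Reinitialize after a reset, then restore the previous level. */
static void device_reset_and_reinit(struct gpu_dev *dev)
{
	enum init_level saved = dev->init_level;

	dev->init_level = INIT_LEVEL_RESET_RECOVERY;
	ip_hw_init(dev);
	dev->init_level = saved;
}
```

As the commit notes, this is only device-level information; an individual IP could keep its own state flag instead.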
Revision tags: v6.12-rc4 |
502d7630 | 17-Oct-2024 | Sunil Khatri <[email protected]>
drm/amdgpu: validate resume before function call
Before making a function call to resume, validate the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_resume where the same checks and calls are repeated.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
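The pattern is a helper that calls an optional callback only when the IP implements it. The following is a sketch in the spirit of amdgpu_ip_block_resume, not its actual body: `ip_block`, `ip_funcs`, `hw_ready` and `demo_failing_resume` are hypothetical names, and marking the block ready on success is an assumption about what such a helper would do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ip_block; /* forward declaration for the callback signature */

struct ip_funcs {
	const char *name;
	int (*resume)(struct ip_block *ip_block); /* optional, may be NULL */
};

struct ip_block {
	const struct ip_funcs *funcs;
	int hw_ready;
};

/* Example callback that fails, used to exercise the error path. */
static int demo_failing_resume(struct ip_block *ip_block)
{
	(void)ip_block;
	return -EIO;
}

/* Call resume only if the IP provides it; propagate errors unchanged. */
static int ip_block_resume(struct ip_block *ip_block)
{
	int r;

	assert(ip_block != NULL && ip_block->funcs != NULL);
	if (ip_block->funcs->resume) {
		r = ip_block->funcs->resume(ip_block);
		if (r)
			return r;
	}
	ip_block->hw_ready = 1;
	return 0;
}
```

The same shape applies to suspend: centralizing the NULL check in one helper removes the duplicated check-and-call sequences at each call site.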
e095026f | 17-Oct-2024 | Sunil Khatri <[email protected]>
drm/amdgpu: validate suspend before function call
Before making a function call to suspend, validate the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_suspend where the same checks and calls are repeated.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.12-rc3, v6.12-rc2 |
7feb4f3a | 30-Sep-2024 | Sunil Khatri <[email protected]>
drm/amdgpu: update the handle ptr in resume
Change the *handle parameter to an amdgpu_ip_block pointer for all resume function pointers.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
982d7f9b | 30-Sep-2024 | Sunil Khatri <[email protected]>
drm/amdgpu: update the handle ptr in suspend
Change the *handle parameter to an amdgpu_ip_block pointer for all suspend function pointers.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.12-rc1 |
3138ab2c | 26-Sep-2024 | Sunil Khatri <[email protected]>
drm/amdgpu: update the handle ptr in late_init
Change the handle pointer to an amdgpu_ip_block pointer in all late_init functions.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.11, v6.11-rc7 |
c4f00312 | 02-Sep-2024 | Lijo Lazar <[email protected]>
drm/amdgpu: Support reset-on-init on select SOCs
Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7 |
78347b65 | 01-Jul-2024 | YiPeng Chai <[email protected]>
drm/amdgpu: sysfs node disable query error count during gpu reset
Disable querying the error count through the sysfs node during GPU reset.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6 |
60c44843 | 22-Apr-2024 | Ma Jun <[email protected]>
drm/amdgpu: Fix uninitialized variable warnings
Return 0 to avoid returning the uninitialized variable r.
Signed-off-by: Ma Jun <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
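The warning class fixed here typically arises when a variable is assigned only inside a loop and then returned after it. The function below is a generic, hypothetical illustration of that pattern (`restore_blocks` is not from the driver): when the loop body never runs, `r` has no value, so the fall-through path returns the constant 0 instead.

```c
#include <assert.h>

/* Hypothetical: r is assigned only inside the loop, so returning r after
 * the loop would read an uninitialized value when count == 0. */
static int restore_blocks(const int *status, int count)
{
	int i, r;

	assert(count >= 0);
	for (i = 0; i < count; i++) {
		r = status[i];
		if (r)
			return r; /* propagate the first failure */
	}
	return 0; /* not "return r": r is unset when the loop never runs */
}
```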
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
a32c6f7f | 15-Dec-2023 | Stanley.Yang <[email protected]>
drm/amdgpu: Fix ecc irq enable/disable unpaired
The ecc_irq is disabled during the GPU mode2 reset suspend process, but it is not re-enabled during the GPU mode2 reset resume process.
Changed from V1: only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip; delete the amdgpu_ras_late_resume function.
Changed from V2: check whether UMC RAS is supported before putting ecc_irq.
Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
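The bug is an unbalanced enable/disable pair across the reset's suspend and resume halves. Below is a minimal model, loosely patterned on the driver's amdgpu_irq_get/amdgpu_irq_put reference counting; `irq_src`, `mode2_suspend` and `mode2_resume` are hypothetical. Guarding both halves with the same `umc_ras_supported` condition (the V2 change) is what keeps the count balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted interrupt source. */
struct irq_src {
	int enable_count;
};

static void irq_get(struct irq_src *src)
{
	src->enable_count++;
}

static void irq_put(struct irq_src *src)
{
	assert(src->enable_count > 0); /* a put without a matching get is a bug */
	src->enable_count--;
}

static bool irq_enabled(const struct irq_src *src)
{
	return src->enable_count > 0;
}

/* Suspend and resume stay paired because both use the same guard. */
static void mode2_suspend(struct irq_src *ecc_irq, bool umc_ras_supported)
{
	if (umc_ras_supported)
		irq_put(ecc_irq);
}

static void mode2_resume(struct irq_src *ecc_irq, bool umc_ras_supported)
{
	if (umc_ras_supported)
		irq_get(ecc_irq);
}
```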
Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2 |
4e8303cf | 11-Sep-2023 | Lijo Lazar <[email protected]>
drm/amdgpu: Use function for IP version check
Use an inline function for the version check. This gives more flexibility to handle any format changes.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
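Routing every version read through one accessor means the stored format can change without touching call sites. The sketch packs major/minor/revision into one integer, modeled on the driver's IP_VERSION() macro; treat the bit layout, the `hw_ip` enum and the `ip_version` accessor here as assumptions rather than the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Pack major/minor/revision so versions compare as plain integers
 * (assumed layout: major in bits 16+, minor in 8..15, revision in 0..7). */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

enum hw_ip { GC_HWIP, MP1_HWIP, HWIP_MAX };

struct gpu_dev {
	uint32_t ip_versions[HWIP_MAX];
};

/* Single accessor: if the storage format changes (e.g. extra fields to
 * mask off), only this function needs updating. */
static inline uint32_t ip_version(const struct gpu_dev *dev, enum hw_ip ip)
{
	assert(ip < HWIP_MAX);
	return dev->ip_versions[ip];
}
```

A call such as `ip_version(&dev, GC_HWIP) == IP_VERSION(9, 4, 3)` then reads the same regardless of how the array is stored.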
Revision tags: v6.6-rc1 |
ca7aa3bf | 06-Sep-2023 | Lijo Lazar <[email protected]>
drm/amdgpu: Use default reset method handler
When the reset method is not passed in the reset context, look up the handler for the default reset method. On Aldebaran, the default reset method for SOCs connected to the CPU over XGMI is MODE2.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Tested-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
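The fallback amounts to substituting a default method before indexing the handler table. In this hypothetical sketch, `reset_method`, `reset_handler`, `handlers` and `get_reset_handler` are illustrative names, and only a MODE2 handler is registered; the caller passes MODE2 as the default, as the commit describes for XGMI-connected Aldebaran parts.

```c
#include <assert.h>
#include <stddef.h>

enum reset_method {
	RESET_METHOD_NONE = -1, /* not specified in the reset context */
	RESET_METHOD_MODE1,
	RESET_METHOD_MODE2,
	RESET_METHOD_COUNT
};

struct reset_handler {
	enum reset_method method;
	const char *name;
};

static const struct reset_handler mode2_handler = { RESET_METHOD_MODE2, "mode2" };

/* Handlers indexed by method; NULL means unsupported. */
static const struct reset_handler *const handlers[RESET_METHOD_COUNT] = {
	[RESET_METHOD_MODE2] = &mode2_handler,
};

/* If no method was requested, fall back to the device's default. */
static const struct reset_handler *
get_reset_handler(enum reset_method requested, enum reset_method dflt)
{
	enum reset_method m = (requested == RESET_METHOD_NONE) ? dflt : requested;

	assert(m > RESET_METHOD_NONE && m < RESET_METHOD_COUNT);
	return handlers[m];
}
```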
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5 |
f8a499ae | 05-Aug-2023 | Lijo Lazar <[email protected]>
drm/amdgpu: Keep reset handlers shared
Instead of maintaining a list per device, keep the reset handlers common per ASIC family. A pointer to the list of handlers is maintained in reset control.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Tested-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
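Sharing handlers per ASIC family means one static table, with each device's reset control holding only a pointer to it. The following is a minimal sketch under that reading; `reset_control`, `family_handlers` and `handler_count` are hypothetical names, and the NULL-terminated array is an assumed representation of "the list of handlers".

```c
#include <assert.h>
#include <stddef.h>

struct reset_handler {
	const char *name;
	int (*do_reset)(void *ctx);
};

static int mode2_reset(void *ctx)
{
	(void)ctx;
	return 0;
}

/* One NULL-terminated table per ASIC family, shared by all its devices. */
static const struct reset_handler family_mode2 = { "mode2", mode2_reset };
static const struct reset_handler *const family_handlers[] = { &family_mode2, NULL };

/* Per-device reset control stores only a pointer to the shared table. */
struct reset_control {
	const struct reset_handler *const *handlers;
};

static void reset_control_init(struct reset_control *rc)
{
	assert(rc != NULL);
	rc->handlers = family_handlers;
}

static size_t handler_count(const struct reset_control *rc)
{
	size_t n = 0;

	while (rc->handlers[n])
		n++;
	return n;
}
```

Because the table is `const` and static, no per-device allocation or list maintenance is needed.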
Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1 |
0a83bb35 | 03-Aug-2022 | Lijo Lazar <[email protected]>
drm/amdgpu: Avoid another list of reset devices
A list of devices to be reset is already created in the amdgpu_device_gpu_recover function. Creating another list with the same nodes is incorrect and not supported by list_head. Instead, pass the device list as part of the reset context.
Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2) Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
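An intrusive list node embeds a single next/prev pair, so a node can belong to only one list at a time; linking it into a second list overwrites those pointers and corrupts the first list. The sketch below uses a minimal list in the style of the kernel's list_head (not the kernel header itself); `gpu`, `reset_context` and `count_devices` are hypothetical. The reset context simply points at the caller's existing list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list in the style of list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Each device embeds exactly one node, so it can sit on one list only. */
struct gpu {
	int id;
	struct list_head reset_list;
};

/* Carry a pointer to the existing list instead of building a second one. */
struct reset_context {
	struct list_head *reset_device_list;
};

static int count_devices(const struct reset_context *ctx)
{
	int n = 0;

	for (const struct list_head *p = ctx->reset_device_list->next;
	     p != ctx->reset_device_list; p = p->next)
		n++;
	return n;
}
```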
Revision tags: v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6 |
9e085647 | 25-Feb-2022 | Lijo Lazar <[email protected]>
drm/amdgpu: Refactor mode2 reset logic for v13.0.2
Use IP version and refactor reset logic to apply to a list of devices.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3 |
bc143d8b | 22-Nov-2021 | Evan Quan <[email protected]>
drm/amd/pm: do not expose implementation details to other blocks out of power
Those implementation details (whether swsmu is supported, whether some ppt_funcs are supported, access to internal statistics, ...) should be kept internal. Exposing implementation details is not good practice and is even error-prone.
Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3 |
a4967a1e | 20-Sep-2021 | Mukul Joshi <[email protected]>
drm/amdgpu: Enable RAS error injection after mode2 reset on Aldebaran
Add the missing call to re-enable RAS error injections on the Aldebaran mode2 reset code path.
Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5 |
3b42ca80 | 01-Jun-2021 | Zheng Yongjun <[email protected]>
drm/amdgpu: Remove unneeded semicolon
Remove unneeded semicolon.
Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5 |
928a0fe6 | 24-Mar-2021 | Lijo Lazar <[email protected]>
drm/amdgpu: Fix build warnings
Fix header guard and make internal functions static. Fixes the below warnings:
drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_reset.h:24:9: warning: '__AMDUGPU_RESET_H__' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
drivers/gpu/drm/amd/amdgpu/aldebaran.c:110:6: warning: no previous prototype for function 'aldebaran_async_reset' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/aldebaran_ppt.c:1435:5: warning: no previous prototype for function 'aldebaran_mode2_reset' [-Wmissing-prototypes]
Signed-off-by: Lijo Lazar <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Revision tags: v5.12-rc4 |
142600e8 | 16-Mar-2021 | Lijo Lazar <[email protected]>
drm/amdgpu: Add mode2 reset support for aldebaran
v1: Aldebaran uses reset control to support mode2 reset. The sequences to reset and restore hardware context are specific to a particular configuration.
v2: Clear bus mastering before reset. Fix coding style issues, drop unwanted variables and info log.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>