History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/aldebaran.c (Results 1 – 21 of 21)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5
# e1ee2111 24-Oct-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Prefer RAS recovery for scheduler hang

Before scheduling a recovery due to scheduler/job hang, check if a RAS
error is detected. If so, choose RAS recovery to handle the situation. A
sch

drm/amdgpu: Prefer RAS recovery for scheduler hang

Before scheduling a recovery due to scheduler/job hang, check if a RAS
error is detected. If so, choose RAS recovery to handle the situation. A
scheduler/job hang could be the side effect of a RAS error. In such
cases, it is required to go through the RAS error recovery process. A
RAS error recovery process in certains cases also could avoid a full
device device reset.

An error state is maintained in RAS context to detect the block
affected. Fatal Error state uses unused block id. Set the block id when
error is detected. If the interrupt handler detected a poison error,
it's not required to look for a fatal error. Skip fatal error checking
in such cases.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# a86e0c0e 15-Nov-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Add init level for post reset reinit

When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
ident

drm/amdgpu: Add init level for post reset reinit

When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
identify whether the IP/feature is initialized for the first time or
whether it's reinitialized after a reset.

Add RESET_RECOVERY init level to identify post reset reinitialization
phase. This only provides a device level identification, IP/features may
choose to track their state independently also.

Signed-off-by: Lijo Lazar <[email protected]>
Acked-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.12-rc4
# 502d7630 17-Oct-2024 Sunil Khatri <[email protected]>

drm/amdgpu: validate resume before function call

Before making a function call to resume, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_resume where
s

drm/amdgpu: validate resume before function call

Before making a function call to resume, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_resume where
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# e095026f 17-Oct-2024 Sunil Khatri <[email protected]>

drm/amdgpu: validate suspend before function call

Before making a function call to suspend, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_suspend wher

drm/amdgpu: validate suspend before function call

Before making a function call to suspend, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_suspend where
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.12-rc3, v6.12-rc2
# 7feb4f3a 30-Sep-2024 Sunil Khatri <[email protected]>

drm/amdgpu: update the handle ptr in resume

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of resume.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian

drm/amdgpu: update the handle ptr in resume

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of resume.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 982d7f9b 30-Sep-2024 Sunil Khatri <[email protected]>

drm/amdgpu: update the handle ptr in suspend

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of suspend.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christi

drm/amdgpu: update the handle ptr in suspend

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of suspend.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.12-rc1
# 3138ab2c 26-Sep-2024 Sunil Khatri <[email protected]>

drm/amdgpu: update the handle ptr in late_init

Update the ptr handle to amdgpu_ip_block ptr in all
the functions of late_init function ptr.

Signed-off-by: Sunil Khatri <[email protected]>
Review

drm/amdgpu: update the handle ptr in late_init

Update the ptr handle to amdgpu_ip_block ptr in all
the functions of late_init function ptr.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11, v6.11-rc7
# c4f00312 02-Sep-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Support reset-on-init on select SOCs

Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <Feifei.X

drm/amdgpu: Support reset-on-init on select SOCs

Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Rajneesh Bhardwaj <[email protected]>
Tested-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7
# 78347b65 01-Jul-2024 YiPeng Chai <[email protected]>

drm/amdgpu: sysfs node disable query error count during gpu reset

Sysfs node disable query error count during gpu reset.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Stanley.Yang <

drm/amdgpu: sysfs node disable query error count during gpu reset

Sysfs node disable query error count during gpu reset.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Stanley.Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6
# 60c44843 22-Apr-2024 Ma Jun <[email protected]>

drm/amdgpu: Fix uninitialized variable warnings

return 0 to avoid returning an uninitialized variable r

Signed-off-by: Ma Jun <[email protected]>
Acked-by: Christian König <[email protected]>

drm/amdgpu: Fix uninitialized variable warnings

return 0 to avoid returning an uninitialized variable r

Signed-off-by: Ma Jun <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6
# a32c6f7f 15-Dec-2023 Stanley.Yang <[email protected]>

drm/amdgpu: Fix ecc irq enable/disable unpaired

The ecc_irq is disabled while GPU mode2 reset suspending process,
but not be enabled during GPU mode2 reset resume process.

Changed from V1:
only do

drm/amdgpu: Fix ecc irq enable/disable unpaired

The ecc_irq is disabled while GPU mode2 reset suspending process,
but not be enabled during GPU mode2 reset resume process.

Changed from V1:
only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip
delete amdgpu_ras_late_resume function

Changed from V2:
check umc ras supported before put ecc_irq

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2
# 4e8303cf 11-Sep-2023 Lijo Lazar <[email protected]>

drm/amdgpu: Use function for IP version check

Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-

drm/amdgpu: Use function for IP version check

Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.6-rc1
# ca7aa3bf 06-Sep-2023 Lijo Lazar <[email protected]>

drm/amdgpu: Use default reset method handler

When reset method is not passed in reset context, look for the handler
for default reset method. On Aldebaran, default reset method for SOCs
connected to

drm/amdgpu: Use default reset method handler

When reset method is not passed in reset context, look for the handler
for default reset method. On Aldebaran, default reset method for SOCs
connected to CPU over XGMI is MODE2.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Tested-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5
# f8a499ae 05-Aug-2023 Lijo Lazar <[email protected]>

drm/amdgpu: Keep reset handlers shared

Instead of maintaining a list per device, keep the reset handlers common
per ASIC family. A pointer to the list of handlers is maintained in
reset control.

Si

drm/amdgpu: Keep reset handlers shared

Instead of maintaining a list per device, keep the reset handlers common
per ASIC family. A pointer to the list of handlers is maintained in
reset control.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Tested-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1
# 0a83bb35 03-Aug-2022 Lijo Lazar <[email protected]>

drm/amdgpu: Avoid another list of reset devices

A list of devices to be reset is already created in
amdgpu_device_gpu_recover function. Creating another list with the
same nodes is incorrect and not

drm/amdgpu: Avoid another list of reset devices

A list of devices to be reset is already created in
amdgpu_device_gpu_recover function. Creating another list with the
same nodes is incorrect and not supported in list_head. Instead, pass
the device list as part of reset context.

Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2)
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6
# 9e085647 25-Feb-2022 Lijo Lazar <[email protected]>

drm/amdgpu: Refactor mode2 reset logic for v13.0.2

Use IP version and refactor reset logic to apply to a list of devices.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <

drm/amdgpu: Refactor mode2 reset logic for v13.0.2

Use IP version and refactor reset logic to apply to a list of devices.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3
# bc143d8b 22-Nov-2021 Evan Quan <[email protected]>

drm/amd/pm: do not expose implementation details to other blocks out of power

Those implementation details(whether swsmu supported, some ppt_funcs supported,
accessing internal statistics ...)should

drm/amd/pm: do not expose implementation details to other blocks out of power

Those implementation details(whether swsmu supported, some ppt_funcs supported,
accessing internal statistics ...)should be kept internally. It's not a good
practice and even error prone to expose implementation details.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3
# a4967a1e 20-Sep-2021 Mukul Joshi <[email protected]>

drm/amdgpu: Enable RAS error injection after mode2 reset on Aldebaran

Add the missing call to re-enable RAS error injections on the Aldebaran
mode2 reset code path.

Signed-off-by: Mukul Joshi <muku

drm/amdgpu: Enable RAS error injection after mode2 reset on Aldebaran

Add the missing call to re-enable RAS error injections on the Aldebaran
mode2 reset code path.

Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5
# 3b42ca80 01-Jun-2021 Zheng Yongjun <[email protected]>

drm/amdgpu: Remove unneeded semicolon

Remove unneeded semicolon.

Signed-off-by: Zheng Yongjun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5
# 928a0fe6 24-Mar-2021 Lijo Lazar <[email protected]>

drm/amdgpu: Fix build warnings

Fix header guard and make internal functions static. Fixes the below warnings:

drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_reset.h:24:9: warning: '__AMDUGPU_RESET_H__

drm/amdgpu: Fix build warnings

Fix header guard and make internal functions static. Fixes the below warnings:

drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_reset.h:24:9: warning: '__AMDUGPU_RESET_H__' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
drivers/gpu/drm/amd/amdgpu/aldebaran.c:110:6: warning: no previous prototype for function 'aldebaran_async_reset' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/aldebaran_ppt.c:1435:5: warning: no previous prototype for function 'aldebaran_mode2_reset' [-Wmissing-prototypes]

Signed-off-by: Lijo Lazar <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v5.12-rc4
# 142600e8 16-Mar-2021 Lijo Lazar <[email protected]>

drm/amdgpu: Add mode2 reset support for aldebaran

v1: Aldebaran uses reset control to support mode2 reset. The sequences to
reset and restore hardware context are specific to a particular
configurat

drm/amdgpu: Add mode2 reset support for aldebaran

v1: Aldebaran uses reset control to support mode2 reset. The sequences to
reset and restore hardware context are specific to a particular
configuration.

v2: Clear bus mastering before reset.
Fix coding style issues, drop unwanted variables and info log.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...