History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c (Results 1 – 25 of 39)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11
# 100350c3 10-Sep-2024 Asad Kamal <[email protected]>

drm/amd/pm: Add mode2 support for SMU v13.0.12

Add mode2 reset support for smu version 13.0.12

Signed-off-by: Asad Kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off

drm/amd/pm: Add mode2 support for SMU v13.0.12

Add mode2 reset support for smu version 13.0.12

Signed-off-by: Asad Kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# a86e0c0e 15-Nov-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Add init level for post reset reinit

When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
ident

drm/amdgpu: Add init level for post reset reinit

When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
identify whether the IP/feature is initialized for the first time or
whether it's reinitialized after a reset.

Add RESET_RECOVERY init level to identify post reset reinitialization
phase. This only provides a device level identification, IP/features may
choose to track their state independently also.

Signed-off-by: Lijo Lazar <[email protected]>
Acked-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 591aec15 15-Oct-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Save VCN shared memory with init reset

VCN shared memory is in framebuffer and there are some flags initialized
during sw_init. Ideally, such programming should be during hw_init.

Make

drm/amdgpu: Save VCN shared memory with init reset

VCN shared memory is in framebuffer and there are some flags initialized
during sw_init. Ideally, such programming should be during hw_init.

Make sure the flags are saved during reset on initialization since that
reset will affect frame buffer region. For clarity, separate it out to
another function.

Fixes: 1e4acf4d93cd ("drm/amdgpu: Add reset on init handler for XGMI")
Signed-off-by: Lijo Lazar <[email protected]>
Reported-by: Hao Zhou <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# e095026f 17-Oct-2024 Sunil Khatri <[email protected]>

drm/amdgpu: validate suspend before function call

Before making a function call to suspend, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_suspend wher

drm/amdgpu: validate suspend before function call

Before making a function call to suspend, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_suspend where
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 982d7f9b 30-Sep-2024 Sunil Khatri <[email protected]>

drm/amdgpu: update the handle ptr in suspend

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of suspend.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christi

drm/amdgpu: update the handle ptr in suspend

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of suspend.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5
# 1e4acf4d 21-Aug-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Add reset on init handler for XGMI

In some cases, device needs to be reset before first use. Add handlers
for doing device reset during driver init sequence.

Signed-off-by: Lijo Lazar <

drm/amdgpu: Add reset on init handler for XGMI

In some cases, device needs to be reset before first use. Add handlers
for doing device reset during driver init sequence.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Acked-by: Rajneesh Bhardwaj <[email protected]>
Tested-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# 7bed1df8 06-Jun-2024 Eric Huang <[email protected]>

drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc

amdgpu_job_ring may return NULL, which causes kernel NULL
pointer error, using another way to print ring name instead
of ring->name.

Suggested-

drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc

amdgpu_job_ring may return NULL, which causes kernel NULL
pointer error, using another way to print ring name instead
of ring->name.

Suggested-by: Lijo Lazar <[email protected]>
Signed-off-by: Eric Huang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 2656e1ce 03-Jun-2024 Eric Huang <[email protected]>

drm/amdgpu: add reset sources in gpu reset context

reset source or reset cause is very useful info
for reset context, it will be used by events API.

Suggested-by: Lijo Lazar <[email protected]>
Si

drm/amdgpu: add reset sources in gpu reset context

reset source or reset cause is very useful info
for reset context, it will be used by events API.

Suggested-by: Lijo Lazar <[email protected]>
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7
# a6bcffa5 30-Apr-2024 Hawking Zhang <[email protected]>

drm/amdgpu: Add smu v13_0_14 ip block

Add smu v13_0_14 ip block support

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <alexande

drm/amdgpu: Add smu v13_0_14 ip block

Add smu v13_0_14 ip block support

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# 9022f01b 20-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: refactor code to split devcoredump code

Refractor devcoredump code into new files since its
functionality is expanded further and better to slit
and devcoredump to have its own file.

v2

drm/amdgpu: refactor code to split devcoredump code

Refractor devcoredump code into new files since its
functionality is expanded further and better to slit
and devcoredump to have its own file.

v2: Fix the build failure caught by arm compiler
of implicit function declaration with #ifdef

v3: squash in fix for implicit declaration error

Cc: Ivan Lipski <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Sunil Khatri <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 6fe4dab3 18-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: remove the adev check for NULL

adev is a global data structure and isn't expected
to be NULL and hence removing the redundant adev
check from the devcoredump code.

Cc: Dan Carpenter <da

drm/amdgpu: remove the adev check for NULL

adev is a global data structure and isn't expected
to be NULL and hence removing the redundant adev
check from the devcoredump code.

Cc: Dan Carpenter <[email protected]>
Signed-off-by: Sunil Khatri <[email protected]>
Suggested-by: Dan Carpenter <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8
# d72e2bda 08-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: add the hw_ip version of all IP's

Add all the IP's version information on a SOC to the
devcoredump.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Alex Deucher <alexand

drm/amdgpu: add the hw_ip version of all IP's

Add all the IP's version information on a SOC to the
devcoredump.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 3eb899c4 11-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: add ring buffer information in devcoredump

Add relevant ringbuffer information such as
rptr, wptr,rb mask, ring name, ring size and also
the rings content for each ring on a gpu reset.

drm/amdgpu: add ring buffer information in devcoredump

Add relevant ringbuffer information such as
rptr, wptr,rb mask, ring name, ring size and also
the rings content for each ring on a gpu reset.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 583681d4 06-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: add vm fault information to devcoredump

Add page fault information to the devcoredump.

Output of devcoredump:
**** AMDGPU Device Coredump ****
version: 1
kernel: 6.7.0-amd-staging-drm-n

drm/amdgpu: add vm fault information to devcoredump

Add page fault information to the devcoredump.

Output of devcoredump:
**** AMDGPU Device Coredump ****
version: 1
kernel: 6.7.0-amd-staging-drm-next
module: amdgpu
time: 29.725011811
process_name: soft_recovery_p PID: 1720

Ring timed out details
IP Type: 0 Ring Name: gfx_0.0.0

[gfxhub] Page fault observed
Faulty page starting at address: 0x0000000000000000
Protection fault status register: 0x301031

VRAM is lost due to GPU reset!

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8-rc7
# 5e592956 01-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: add ring timeout information in devcoredump

Add ring timeout related information in the amdgpu
devcoredump file for debugging purposes.

During the gpu recovery process the registered ca

drm/amdgpu: add ring timeout information in devcoredump

Add ring timeout related information in the amdgpu
devcoredump file for debugging purposes.

During the gpu recovery process the registered call
is triggered and add the debug information in data
file created by devcoredump framework under the
directory /sys/class/devcoredump/devcdx/

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1
# b8f67b9d 18-Jan-2024 Shashank Sharma <[email protected]>

drm/amdgpu: change vm->task_info handling

This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its ua

drm/amdgpu: change vm->task_info handling

This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its uasge is
reference counted.
- introducing two new helper funcs for task_info lifecycle management
- amdgpu_vm_get_task_info: reference counts up task_info before
returning this info
- amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

V2: Do not block all the prints when task_info not found (Felix)

V3: Fixed review comments from Felix
- Fix wrong indentation
- No debug message for -ENOMEM
- Add NULL check for task_info
- Do not duplicate the debug messages (ti vs no ti)
- Get first reference of task_info in vm_init(), put last
in vm_fini()

V4: Fixed review comments from Felix
- fix double reference increment in create_task_info
- change amdgpu_vm_get_task_info_pasid
- additional changes in amdgpu_gem.c while porting

Cc: Christian Koenig <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2
# de009982 15-Sep-2023 André Almeida <[email protected]>

drm/amdgpu: Create version number for coredumps

Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to prope

drm/amdgpu: Create version number for coredumps

Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.

Signed-off-by: André Almeida <[email protected]>
Reviewed-by: Shashank Sharma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 69619868 15-Sep-2023 André Almeida <[email protected]>

drm/amdgpu: Move coredump code to amdgpu_reset file

Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-

drm/amdgpu: Move coredump code to amdgpu_reset file

Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-by: André Almeida <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Reviewed-by: Shashank Sharma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 4e8303cf 11-Sep-2023 Lijo Lazar <[email protected]>

drm/amdgpu: Use function for IP version check

Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-

drm/amdgpu: Use function for IP version check

Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5
# f8a499ae 05-Aug-2023 Lijo Lazar <[email protected]>

drm/amdgpu: Keep reset handlers shared

Instead of maintaining a list per device, keep the reset handlers common
per ASIC family. A pointer to the list of handlers is maintained in
reset control.

Si

drm/amdgpu: Keep reset handlers shared

Instead of maintaining a list per device, keep the reset handlers common
per ASIC family. A pointer to the list of handlers is maintained in
reset control.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Tested-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.5-rc4, v6.5-rc3
# b8920e1e 23-Jul-2023 Srinivasan Shanmugam <[email protected]>

drm/amdgpu: Fix ENOSYS means 'invalid syscall nr' in amdgpu_device.c

ENOSYS should be used for nonexistent syscalls only, replace ENOSYS with
EOPNOTSUPP for reset handlers that are not implemented f

drm/amdgpu: Fix ENOSYS means 'invalid syscall nr' in amdgpu_device.c

ENOSYS should be used for nonexistent syscalls only, replace ENOSYS with
EOPNOTSUPP for reset handlers that are not implemented for respective ASIC.

WARNING: ENOSYS means 'invalid syscall nr' and nothing else
+ if (r == -ENOSYS)

WARNING: ENOSYS means 'invalid syscall nr' and nothing else
+ if (r == -ENOSYS)

And other following style fixes in amdgpu_device.c:

WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.
WARNING: Block comments should align the * on each line
WARNING: Missing a blank line after declarations
WARNING: braces {} are not necessary for single statement blocks

Cc: Lijo Lazar <[email protected]>
Cc: Kent Russell <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Guchun Chen <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7
# 5cf16755 28-Feb-2022 Lijo Lazar <[email protected]>

drm/amdgpu: Add mode2 reset logic for v13.0.6

Mode2 reset for v13.0.6 has similar workflow as v13.0.2

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]

drm/amdgpu: Add mode2 reset logic for v13.0.6

Mode2 reset for v13.0.6 has similar workflow as v13.0.2

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 230dd6bb 10-Feb-2023 Kenneth Feng <[email protected]>

drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10

implement mode2 reset on smu_v13_0_10

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Signed-off-

drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10

implement mode2 reset on smu_v13_0_10

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# a340847b 13-Oct-2022 Victor Zhao <[email protected]>

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# afbaa155 13-Oct-2022 Victor Zhao <[email protected]>

Revert "drm/amdgpu: add debugfs amdgpu_reset_level"

This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the

Revert "drm/amdgpu: add debugfs amdgpu_reset_level"

This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


12