History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h (Results 1 – 25 of 27)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12
# a86e0c0e 15-Nov-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Add init level for post reset reinit

When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
ident

drm/amdgpu: Add init level for post reset reinit

When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
identify whether the IP/feature is initialized for the first time or
whether it's reinitialized after a reset.

Add RESET_RECOVERY init level to identify post reset reinitialization
phase. This only provides a device level identification, IP/features may
choose to track their state independently also.

Signed-off-by: Lijo Lazar <[email protected]>
Acked-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# 1e4acf4d 21-Aug-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Add reset on init handler for XGMI

In some cases, device needs to be reset before first use. Add handlers
for doing device reset during driver init sequence.

Signed-off-by: Lijo Lazar <

drm/amdgpu: Add reset on init handler for XGMI

In some cases, device needs to be reset before first use. Add handlers
for doing device reset during driver init sequence.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Acked-by: Rajneesh Bhardwaj <[email protected]>
Tested-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2
# 19cff165 02-Aug-2024 Victor Skvortsov <[email protected]>

drm/amdgpu: abort KIQ waits when there is a pending reset

Stop waiting for the KIQ to return back when there is a reset pending.
It's quite likely that the KIQ will never response.

Signed-off-by: K

drm/amdgpu: abort KIQ waits when there is a pending reset

Stop waiting for the KIQ to return back when there is a reset pending.
It's quite likely that the KIQ will never response.

Signed-off-by: Koenig Christian <[email protected]>
Suggested-by: Lazar Lijo <[email protected]>
Tested-by: Victor Skvortsov <[email protected]>
Signed-off-by: Victor Skvortsov <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# 2656e1ce 03-Jun-2024 Eric Huang <[email protected]>

drm/amdgpu: add reset sources in gpu reset context

reset source or reset cause is very useful info
for reset context, it will be used by events API.

Suggested-by: Lijo Lazar <[email protected]>
Si

drm/amdgpu: add reset sources in gpu reset context

reset source or reset cause is very useful info
for reset context, it will be used by events API.

Suggested-by: Lijo Lazar <[email protected]>
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6
# 25c01191 22-Apr-2024 Yunxiang Li <[email protected]>

drm/amdgpu: Add reset_context flag for host FLR

There are other reset sources that pass NULL as the job pointer, such as
amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if
the FL

drm/amdgpu: Add reset_context flag for host FLR

There are other reset sources that pass NULL as the job pointer, such as
amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if
the FLR comes from the host does not work.

Add a flag in reset_context to explicitly mark host triggered reset, and
set this flag when we receive host reset notification.

Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Emily Deng <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc5
# ea137071 16-Apr-2024 Ahmad Rehman <[email protected]>

drm/amdgpu: Skip the coredump collection on reset during driver reload

In passthrough environment, the driver triggers the mode-1 reset on
reload. The reset causes the core dump collection which is

drm/amdgpu: Skip the coredump collection on reset during driver reload

In passthrough environment, the driver triggers the mode-1 reset on
reload. The reset causes the core dump collection which is delayed task
and prevents driver from unloading until it is completed. Since we do
not need to collect data on "reset on reload" case, we can skip core
dump collection.

v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well.

Signed-off-by: Ahmad Rehman <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# 9022f01b 20-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: refactor code to split devcoredump code

Refractor devcoredump code into new files since its
functionality is expanded further and better to slit
and devcoredump to have its own file.

v2

drm/amdgpu: refactor code to split devcoredump code

Refractor devcoredump code into new files since its
functionality is expanded further and better to slit
and devcoredump to have its own file.

v2: Fix the build failure caught by arm compiler
of implicit function declaration with #ifdef

v3: squash in fix for implicit declaration error

Cc: Ivan Lipski <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Sunil Khatri <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8, v6.8-rc7
# 5e592956 01-Mar-2024 Sunil Khatri <[email protected]>

drm/amdgpu: add ring timeout information in devcoredump

Add ring timeout related information in the amdgpu
devcoredump file for debugging purposes.

During the gpu recovery process the registered ca

drm/amdgpu: add ring timeout information in devcoredump

Add ring timeout related information in the amdgpu
devcoredump file for debugging purposes.

During the gpu recovery process the registered call
is triggered and add the debug information in data
file created by devcoredump framework under the
directory /sys/class/devcoredump/devcdx/

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1
# fb1c93c2 10-Jan-2024 Christian König <[email protected]>

drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"

Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callback

drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"

Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.

Using the IP callbacks like this can lead to memory leaks, double
free and imbalanced reference counters.

Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.

Both is a complete no-go for driver unload so completely revert the
workaround for now.

This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04.

Signed-off-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

show more ...


# 087a3e13 10-Jan-2024 Christian König <[email protected]>

drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"

Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callback

drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"

Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.

Using the IP callbacks like this can lead to memory leaks, double
free and imbalanced reference counters.

Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.

Both is a complete no-go for driver unload so completely revert the
workaround for now.

This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04.

Signed-off-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2
# de009982 15-Sep-2023 André Almeida <[email protected]>

drm/amdgpu: Create version number for coredumps

Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to prope

drm/amdgpu: Create version number for coredumps

Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.

Signed-off-by: André Almeida <[email protected]>
Reviewed-by: Shashank Sharma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 69619868 15-Sep-2023 André Almeida <[email protected]>

drm/amdgpu: Move coredump code to amdgpu_reset file

Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-

drm/amdgpu: Move coredump code to amdgpu_reset file

Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-by: André Almeida <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Reviewed-by: Shashank Sharma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5
# f8a499ae 05-Aug-2023 Lijo Lazar <[email protected]>

drm/amdgpu: Keep reset handlers shared

Instead of maintaining a list per device, keep the reset handlers common
per ASIC family. A pointer to the list of handlers is maintained in
reset control.

Si

drm/amdgpu: Keep reset handlers shared

Instead of maintaining a list per device, keep the reset handlers common
per ASIC family. A pointer to the list of handlers is maintained in
reset control.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Tested-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1
# a340847b 13-Oct-2022 Victor Zhao <[email protected]>

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# b98a1648 13-Oct-2022 Victor Zhao <[email protected]>

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.0
# f61a825a 28-Sep-2022 Vignesh Chander <[email protected]>

drm/amdgpu: Skip put_reset_domain if it doesn't exist

For xgmi sriov, the reset is handled by host driver and hive->reset_domain
is not initialized so need to check if it exists before doing a put.

drm/amdgpu: Skip put_reset_domain if it doesn't exist

For xgmi sriov, the reset is handled by host driver and hive->reset_domain
is not initialized so need to check if it exists before doing a put.

Signed-off-by: Vignesh Chander <[email protected]>
Reviewed-by: Shaoyun Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.0-rc7, v6.0-rc6, v6.0-rc5
# f5c7e779 07-Sep-2022 YiPeng Chai <[email protected]>

drm/amdgpu: Adjust removal control flow for smu v13_0_2

Adjust removal control flow for smu v13_0_2:
During amdgpu uninstallation, when removing the first
device, the kernel needs to first send a

drm/amdgpu: Adjust removal control flow for smu v13_0_2

Adjust removal control flow for smu v13_0_2:
During amdgpu uninstallation, when removing the first
device, the kernel needs to first send a mode1reset message
to all gpu devices. Otherwise, smu initialization will fail
the next time amdgpu is installed.

V2:
1. Update commit comments.
2. Remove the global variable amdgpu_device_remove_cnt
and add a variable to the structure amdgpu_hive_info.
3. Use hive to detect the first removed device instead of
a global variable.

V3:
1. Update commit comments.
2. Split a patch into multiple patches.
3. The current patch does:
a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into
the existing gpu recover path, which make all devices
in hive list only have HW reset but no resume (except
the base IP).
b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and
AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover
in amdgpu_pci_remove when removing the first device in
hive list.
c. When removing the first device, the IP blocks keyword
function call sequence is as follows:
.suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini.
^ |
|-<----------<---------<----|
The first three sequences are because of a call to
amdgpu_device_gpu_recover. The three sequences will be
executed in a loop until all devices in the hive list
are iterated.
The sequences starting from .hw_fini only apply to the
first device. Since .suspend has been called before,
except the resumed phase1 basic ip blocks, all other ip
blocks .hw_fini of current device will do nothing.
d. When removing other devices, the calling sequences is the
same as legacy:
.hw_fini -> .early_fini -> .sw_fini.
Since .suspend has been called when removing the first device,
except the resumed phase1 basic ip blocks, all of other ip
blocks .hw_fini of current device will do nothing.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19
# dac6b808 28-Jul-2022 Victor Zhao <[email protected]>

drm/amdgpu: let mode2 reset fallback to default when failure

- introduce AMDGPU_SKIP_MODE2_RESET flag
- let mode2 reset fallback to default reset method if failed

v2: move this part out from the as

drm/amdgpu: let mode2 reset fallback to default when failure

- introduce AMDGPU_SKIP_MODE2_RESET flag
- let mode2 reset fallback to default reset method if failed

v2: move this part out from the asic specific part

Signed-off-by: Victor Zhao <[email protected]>
Acked-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 0a83bb35 03-Aug-2022 Lijo Lazar <[email protected]>

drm/amdgpu: Avoid another list of reset devices

A list of devices to be reset is already created in
amdgpu_device_gpu_recover function. Creating another list with the
same nodes is incorrect and not

drm/amdgpu: Avoid another list of reset devices

A list of devices to be reset is already created in
amdgpu_device_gpu_recover function. Creating another list with the
same nodes is incorrect and not supported in list_head. Instead, pass
the device list as part of reset context.

Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2)
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18
# ab9a0b1f 17-May-2022 Andrey Grodzovsky <[email protected]>

drm/amdgpu: Cache result of last reset at reset domain level.

Will be read by executors of async reset like debugfs.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christ

drm/amdgpu: Cache result of last reset at reset domain level.

Will be read by executors of async reset like debugfs.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4
# f5666d48 10-Feb-2022 Andrey Grodzovsky <[email protected]>

drm/amdgpu: Fix compile error.

Seems I forgot to add this to the relevant commit
when submitting.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Reported-by: kernel test robot <lkp@in

drm/amdgpu: Fix compile error.

Seems I forgot to add this to the relevant commit
when submitting.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


Revision tags: v5.17-rc3, v5.17-rc2
# e923be99 25-Jan-2022 Andrey Grodzovsky <[email protected]>

drm/amdgpu: Rework amdgpu_device_lock_adev

This functions needs to be split into 2 parts where
one is called only once for locking single instance of
reset_domain's sem and reset flag and the other

drm/amdgpu: Rework amdgpu_device_lock_adev

This functions needs to be split into 2 parts where
one is called only once for locking single instance of
reset_domain's sem and reset flag and the other part
which handles MP1 states should still be called for
each device in XGMI hive.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://www.spinics.net/lists/amd-gfx/msg74118.html

show more ...


Revision tags: v5.17-rc1
# 89a7a870 19-Jan-2022 Andrey Grodzovsky <[email protected]>

drm/amdgpu: Move in_gpu_reset into reset_domain

We should have a single instance per entrire reset domain.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Suggested-by: Lijo Lazar <lij

drm/amdgpu: Move in_gpu_reset into reset_domain

We should have a single instance per entrire reset domain.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Suggested-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://www.spinics.net/lists/amd-gfx/msg74116.html

show more ...


# d0fb18b5 19-Jan-2022 Andrey Grodzovsky <[email protected]>

drm/amdgpu: Move reset sem into reset_domain

We want single instance of reset sem across all
reset clients because in case of XGMI we should stop
access cross device MMIO because any of them could b

drm/amdgpu: Move reset sem into reset_domain

We want single instance of reset sem across all
reset clients because in case of XGMI we should stop
access cross device MMIO because any of them could be
in a reset in the moment.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://www.spinics.net/lists/amd-gfx/msg74117.html

show more ...


# cfbb6b00 21-Jan-2022 Andrey Grodzovsky <[email protected]>

drm/amdgpu: Rework reset domain to be refcounted.

The reset domain contains register access semaphor
now and so needs to be present as long as each device
in a hive needs it and so it cannot be bind

drm/amdgpu: Rework reset domain to be refcounted.

The reset domain contains register access semaphor
now and so needs to be present as long as each device
in a hive needs it and so it cannot be binded to XGMI
hive life cycle.
Adress this by making reset domain refcounted and pointed
by each member of the hive and the hive itself.

v4:

Fix crash on boot witrh XGMI hive by adding type to reset_domain.
XGMI will only create a new reset_domain if prevoius was of single
device type meaning it's first boot. Otherwsie it will take a
refocunt to exsiting reset_domain from the amdgou device.

Add a wrapper around reset_domain->refcount get/put
and a wrapper around send to reset wq (Lijo)

Signed-off-by: Andrey Grodzovsky <[email protected]>
Acked-by: Christian König <[email protected]>
Link: https://www.spinics.net/lists/amd-gfx/msg74121.html

show more ...


12