|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
100350c3 |
| 10-Sep-2024 |
Asad Kamal <[email protected]> |
drm/amd/pm: Add mode2 support for SMU v13.0.12
Add mode2 reset support for smu version 13.0.12
Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off
drm/amd/pm: Add mode2 support for SMU v13.0.12
Add mode2 reset support for smu version 13.0.12
Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
a86e0c0e |
| 15-Nov-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add init level for post reset reinit
When device needs to be reset before initialization, it's not required for all IPs to be initialized before a reset. In such cases, it needs to ident
drm/amdgpu: Add init level for post reset reinit
When device needs to be reset before initialization, it's not required for all IPs to be initialized before a reset. In such cases, it needs to identify whether the IP/feature is initialized for the first time or whether it's reinitialized after a reset.
Add RESET_RECOVERY init level to identify post reset reinitialization phase. This only provides a device level identification, IP/features may choose to track their state independently also.
Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
591aec15 |
| 15-Oct-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Save VCN shared memory with init reset
VCN shared memory is in framebuffer and there are some flags initialized during sw_init. Ideally, such programming should be during hw_init.
Make
drm/amdgpu: Save VCN shared memory with init reset
VCN shared memory is in framebuffer and there are some flags initialized during sw_init. Ideally, such programming should be during hw_init.
Make sure the flags are saved during reset on initialization since that reset will affect frame buffer region. For clarity, separate it out to another function.
Fixes: 1e4acf4d93cd ("drm/amdgpu: Add reset on init handler for XGMI") Signed-off-by: Lijo Lazar <[email protected]> Reported-by: Hao Zhou <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
e095026f |
| 17-Oct-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: validate suspend before function call
Before making a function call to suspend, validate the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_suspend wher
drm/amdgpu: validate suspend before function call
Before making a function call to suspend, validate the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_suspend where same checks and calls are repeated.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
982d7f9b |
| 30-Sep-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: update the handle ptr in suspend
Update the *handle to amdgpu_ip_block ptr for all functions pointers of suspend.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christi
drm/amdgpu: update the handle ptr in suspend
Update the *handle to amdgpu_ip_block ptr for all functions pointers of suspend.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
1e4acf4d |
| 21-Aug-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add reset on init handler for XGMI
In some cases, device needs to be reset before first use. Add handlers for doing device reset during driver init sequence.
Signed-off-by: Lijo Lazar <
drm/amdgpu: Add reset on init handler for XGMI
In some cases, device needs to be reset before first use. Add handlers for doing device reset during driver init sequence.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
7bed1df8 |
| 06-Jun-2024 |
Eric Huang <[email protected]> |
drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc
amdgpu_job_ring may return NULL, which causes kernel NULL pointer error, using another way to print ring name instead of ring->name.
Suggested-
drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc
amdgpu_job_ring may return NULL, which causes kernel NULL pointer error, using another way to print ring name instead of ring->name.
Suggested-by: Lijo Lazar <[email protected]> Signed-off-by: Eric Huang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
2656e1ce |
| 03-Jun-2024 |
Eric Huang <[email protected]> |
drm/amdgpu: add reset sources in gpu reset context
reset source or reset cause is very useful info for reset context, it will be used by events API.
Suggested-by: Lijo Lazar <[email protected]> Si
drm/amdgpu: add reset sources in gpu reset context
reset source or reset cause is very useful info for reset context, it will be used by events API.
Suggested-by: Lijo Lazar <[email protected]> Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
a6bcffa5 |
| 30-Apr-2024 |
Hawking Zhang <[email protected]> |
drm/amdgpu: Add smu v13_0_14 ip block
Add smu v13_0_14 ip block support
Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <alexande
drm/amdgpu: Add smu v13_0_14 ip block
Add smu v13_0_14 ip block support
Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
9022f01b |
| 20-Mar-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: refactor code to split devcoredump code
Refractor devcoredump code into new files since its functionality is expanded further and better to slit and devcoredump to have its own file.
v2
drm/amdgpu: refactor code to split devcoredump code
Refractor devcoredump code into new files since its functionality is expanded further and better to slit and devcoredump to have its own file.
v2: Fix the build failure caught by arm compiler of implicit function declaration with #ifdef
v3: squash in fix for implicit declaration error
Cc: Ivan Lipski <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
6fe4dab3 |
| 18-Mar-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: remove the adev check for NULL
adev is a global data structure and isn't expected to be NULL and hence removing the redundant adev check from the devcoredump code.
Cc: Dan Carpenter <da
drm/amdgpu: remove the adev check for NULL
adev is a global data structure and isn't expected to be NULL and hence removing the redundant adev check from the devcoredump code.
Cc: Dan Carpenter <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Suggested-by: Dan Carpenter <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8 |
|
| #
d72e2bda |
| 08-Mar-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: add the hw_ip version of all IP's
Add all the IP's version information on a SOC to the devcoredump.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <alexand
drm/amdgpu: add the hw_ip version of all IP's
Add all the IP's version information on a SOC to the devcoredump.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
3eb899c4 |
| 11-Mar-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: add ring buffer information in devcoredump
Add relevant ringbuffer information such as rptr, wptr,rb mask, ring name, ring size and also the rings content for each ring on a gpu reset.
drm/amdgpu: add ring buffer information in devcoredump
Add relevant ringbuffer information such as rptr, wptr,rb mask, ring name, ring size and also the rings content for each ring on a gpu reset.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
583681d4 |
| 06-Mar-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: add vm fault information to devcoredump
Add page fault information to the devcoredump.
Output of devcoredump: **** AMDGPU Device Coredump **** version: 1 kernel: 6.7.0-amd-staging-drm-n
drm/amdgpu: add vm fault information to devcoredump
Add page fault information to the devcoredump.
Output of devcoredump: **** AMDGPU Device Coredump **** version: 1 kernel: 6.7.0-amd-staging-drm-next module: amdgpu time: 29.725011811 process_name: soft_recovery_p PID: 1720
Ring timed out details IP Type: 0 Ring Name: gfx_0.0.0
[gfxhub] Page fault observed Faulty page starting at address: 0x0000000000000000 Protection fault status register: 0x301031
VRAM is lost due to GPU reset!
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc7 |
|
| #
5e592956 |
| 01-Mar-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: add ring timeout information in devcoredump
Add ring timeout related information in the amdgpu devcoredump file for debugging purposes.
During the gpu recovery process the registered ca
drm/amdgpu: add ring timeout information in devcoredump
Add ring timeout related information in the amdgpu devcoredump file for debugging purposes.
During the gpu recovery process the registered call is triggered and add the debug information in data file created by devcoredump framework under the directory /sys/class/devcoredump/devcdx/
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1 |
|
| #
b8f67b9d |
| 18-Jan-2024 |
Shashank Sharma <[email protected]> |
drm/amdgpu: change vm->task_info handling
This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its ua
drm/amdgpu: change vm->task_info handling
This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted. - introducing two new helper funcs for task_info lifecycle management - amdgpu_vm_get_task_info: reference counts up task_info before returning this info - amdgpu_vm_put_task_info: reference counts down task_info - last put to task_info() frees task_info from the vm.
This patch also does logistical changes required for existing usage of vm->task_info.
V2: Do not block all the prints when task_info not found (Felix)
V3: Fixed review comments from Felix - Fix wrong indentation - No debug message for -ENOMEM - Add NULL check for task_info - Do not duplicate the debug messages (ti vs no ti) - Get first reference of task_info in vm_init(), put last in vm_fini()
V4: Fixed review comments from Felix - fix double reference increment in create_task_info - change amdgpu_vm_get_task_info_pasid - additional changes in amdgpu_gem.c while porting
Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Felix Kuehling <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2 |
|
| #
de009982 |
| 15-Sep-2023 |
André Almeida <[email protected]> |
drm/amdgpu: Create version number for coredumps
Even if there's nothing currently parsing amdgpu's coredump files, if we eventually have such tools they will be glad to find a version field to prope
drm/amdgpu: Create version number for coredumps
Even if there's nothing currently parsing amdgpu's coredump files, if we eventually have such tools they will be glad to find a version field to properly read the file.
Create a version number to be displayed on top of coredump file, to be incremented when the file format or content get changed.
Signed-off-by: André Almeida <[email protected]> Reviewed-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
69619868 |
| 15-Sep-2023 |
André Almeida <[email protected]> |
drm/amdgpu: Move coredump code to amdgpu_reset file
Giving that we use codedump just for device resets, move it's functions and structs to a more semantic file, the amdgpu_reset.{c, h}.
Signed-off-
drm/amdgpu: Move coredump code to amdgpu_reset file
Giving that we use codedump just for device resets, move it's functions and structs to a more semantic file, the amdgpu_reset.{c, h}.
Signed-off-by: André Almeida <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Reviewed-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
4e8303cf |
| 11-Sep-2023 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Use function for IP version check
Use an inline function for version check. Gives more flexibility to handle any format changes.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-
drm/amdgpu: Use function for IP version check
Use an inline function for version check. Gives more flexibility to handle any format changes.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5 |
|
| #
f8a499ae |
| 05-Aug-2023 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Keep reset handlers shared
Instead of maintaining a list per device, keep the reset handlers common per ASIC family. A pointer to the list of handlers is maintained in reset control.
Si
drm/amdgpu: Keep reset handlers shared
Instead of maintaining a list per device, keep the reset handlers common per ASIC family. A pointer to the list of handlers is maintained in reset control.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Tested-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc4, v6.5-rc3 |
|
| #
b8920e1e |
| 23-Jul-2023 |
Srinivasan Shanmugam <[email protected]> |
drm/amdgpu: Fix ENOSYS means 'invalid syscall nr' in amdgpu_device.c
ENOSYS should be used for nonexistent syscalls only, replace ENOSYS with EOPNOTSUPP for reset handlers that are not implemented f
drm/amdgpu: Fix ENOSYS means 'invalid syscall nr' in amdgpu_device.c
ENOSYS should be used for nonexistent syscalls only, replace ENOSYS with EOPNOTSUPP for reset handlers that are not implemented for respective ASIC.
WARNING: ENOSYS means 'invalid syscall nr' and nothing else + if (r == -ENOSYS)
WARNING: ENOSYS means 'invalid syscall nr' and nothing else + if (r == -ENOSYS)
And other following style fixes in amdgpu_device.c:
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. WARNING: Block comments should align the * on each line WARNING: Missing a blank line after declarations WARNING: braces {} are not necessary for single statement blocks
Cc: Lijo Lazar <[email protected]> Cc: Kent Russell <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7 |
|
| #
5cf16755 |
| 28-Feb-2022 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add mode2 reset logic for v13.0.6
Mode2 reset for v13.0.6 has similar workflow as v13.0.2
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]
drm/amdgpu: Add mode2 reset logic for v13.0.6
Mode2 reset for v13.0.6 has similar workflow as v13.0.2
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
230dd6bb |
| 10-Feb-2023 |
Kenneth Feng <[email protected]> |
drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10
implement mode2 reset on smu_v13_0_10
Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-
drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10
implement mode2 reset on smu_v13_0_10
Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
a340847b |
| 13-Oct-2022 |
Victor Zhao <[email protected]> |
Revert "drm/amdgpu: let mode2 reset fallback to default when failure"
This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.
This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts
Revert "drm/amdgpu: let mode2 reset fallback to default when failure"
This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.
This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it.
Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
afbaa155 |
| 13-Oct-2022 |
Victor Zhao <[email protected]> |
Revert "drm/amdgpu: add debugfs amdgpu_reset_level"
This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.
This commit breaks the reset logic for aldebaran, revert it for now. Will move the
Revert "drm/amdgpu: add debugfs amdgpu_reset_level"
This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.
This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler.
Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|