|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5 |
|
| #
d0ce1aaa |
| 01-May-2025 |
Alex Deucher <[email protected]> |
Revert "drm/amd: Stop evicting resources on APUs in suspend"
This reverts commit 3a9626c816db901def438dc2513622e281186d39.
This breaks S4 because we end up setting the s3/s0ix flags even when we are entering s4 since prepare is used by both flows. This causes both the S3/s0ix and s4 flags to be set, which breaks several checks in the driver that assume they are mutually exclusive.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit ce8f7d95899c2869b47ea6ce0b3e5bf304b2fff4) Cc: [email protected]
|
|
Revision tags: v6.15-rc4, v6.15-rc3, v6.15-rc2 |
|
| #
1657793d |
| 08-Apr-2025 |
Mario Limonciello <[email protected]> |
drm/amd: Forbid suspending into non-default suspend states
On systems that default to 'deep' some userspace software likes to try to suspend in 'deep' first. If there is a failure for any reason (such as -ENOMEM) the failure is ignored and then it will try to use 's2idle' as a fallback. This fails, but more importantly it leads to graphical problems.
Forbid this behavior and only allow suspending in the last state supported by the system.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4093 Acked-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 2aabd44aa8a3c08da3d43264c168370f6da5e81d)
|
|
Revision tags: v6.15-rc1, v6.14 |
|
| #
3666ed82 |
| 21-Mar-2025 |
Jay Cornwall <[email protected]> |
drm/amdgpu: Increase KIQ invalidate_tlbs timeout
KIQ invalidate_tlbs request has been seen to marginally exceed the configured 100 ms timeout on systems under load.
All other KIQ requests in the driver use a 10 second timeout. Use a similar timeout implementation on the invalidate_tlbs path.
v2: Poll once before msleep
v3: Fix return value
Signed-off-by: Jay Cornwall <[email protected]> Cc: Kent Russell <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13 |
|
| #
bd22e44a |
| 15-Jan-2025 |
Christian König <[email protected]> |
drm/amdgpu: rework how isolation is enforced v2
Limiting the number of available VMIDs to enforce isolation causes some issues with gang submit and applying certain HW workarounds which require multiple VMIDs to work correctly.
So instead start to track all submissions to the relevant engines in a per partition data structure and use the dma_fences of the submissions to enforce isolation similar to what a VMID limit does.
v2: use ~0l for jobs without isolation to distinguish them from kernel submissions, which use NULL for the owner. Add some warning when we are OOM.
Signed-off-by: Christian König <[email protected]> Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
9c696cc5 |
| 26-Feb-2025 |
André Almeida <[email protected]> |
drm/amdgpu: Create a debug option to disable ring reset
Prior to the addition of ring reset, the debug option `debug_disable_soft_recovery` could be used to force a full device reset. Now that we have ring reset, create a debug option to disable them in amdgpu, forcing the driver to go with the full device reset path again when both options are combined.
This option is useful for testing and debugging purposes when one wants to test the full reset from userspace.
Signed-off-by: André Almeida <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
76b1f8b3 |
| 11-Feb-2025 |
Candice Li <[email protected]> |
drm/amdgpu: Optimize the enablement of GECC
Enable GECC only when the default memory ECC mode or the module parameter amdgpu_ras_enable is activated.
v2: Add kernel message to remind users to explicitly set amdgpu_ras_enable=1 before driver loading to enable GECC, and to set amdgpu_ras_enable=0 to disable GECC when GECC is currently enabled, if needed.
Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
92d5d2a0 |
| 24-Jan-2025 |
Hawking Zhang <[email protected]> |
drm/amdgpu: Introduce funcs for populating CPER
Introduce utility functions designed to assist in populating CPER records.
v2: call cper_init/fini in device_ip_init/fini.
Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
8b0d068e |
| 30-Jan-2025 |
Alex Deucher <[email protected]> |
drm/amdkfd: add a new flag to manage where VRAM allocations go
On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM.
No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future.
Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
e986e896 |
| 05-Feb-2025 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add wrapper for freeing vbios memory
Use bios_release wrapper to release memory allocated for vbios image and reset the variables.
v2: Use the same wrapper for clean up in sw_fini (Alex Deucher)
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
11815bb0 |
| 12-Dec-2024 |
Christian König <[email protected]> |
drm/amdgpu: partially revert "reduce reset time"
This partially reverts commit 194eb174cbe4fe2b3376ac30acca2dc8c8beca00.
This commit introduced a new state variable into adev without even remotely worrying about CPU barriers.
Since we already have the amdgpu_in_reset() function exactly for this use case partially revert that.
Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.13-rc2, v6.13-rc1 |
|
| #
2965e635 |
| 28-Nov-2024 |
Mario Limonciello <[email protected]> |
drm/amd: Add Suspend/Hibernate notification callback support
As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend.
Another class of issues exists, though, where due to memory fragmentation there isn't a large enough contiguous space and swap isn't accessible.
Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary.
Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12 |
|
| #
a86e0c0e |
| 15-Nov-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add init level for post reset reinit
When device needs to be reset before initialization, it's not required for all IPs to be initialized before a reset. In such cases, it needs to identify whether the IP/feature is initialized for the first time or whether it's reinitialized after a reset.
Add RESET_RECOVERY init level to identify post reset reinitialization phase. This only provides a device level identification, IP/features may choose to track their state independently also.
Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12-rc7 |
|
| #
6c8d1f4b |
| 05-Nov-2024 |
[email protected] <[email protected]> |
drm/amdgpu: Add sysfs interface for gc reset mask
Add two sysfs interfaces for gfx and compute:
gfx_reset_mask
compute_reset_mask
These interfaces are read-only and show the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset.
v2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian)
    check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo)
v4: Fixing uninitialized variables (Tim)
Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4 |
|
| #
efe6a877 |
| 14-Oct-2024 |
Alex Deucher <[email protected]> |
drm/amdgpu: fix fairness in enforce isolation handling
Make sure KFD gets a turn when serializing access to the GC IP. Currently non-KFD jobs can starve KFD if they submit often enough. This patch prevents that by stalling non-KFD if its time period has elapsed.
v2: fix units
v3: check enablement properly
Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
d5e3d8a2 |
| 14-Oct-2024 |
Prike Liang <[email protected]> |
drm/amdgpu: clean up the suspend_complete
To check the status of S3 suspend completion, use the PM core pm_suspend_global_flags bit(1) to detect S3 abort events. Therefore, clean up the AMDGPU driver's private flag suspend_complete.
Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
502d7630 |
| 17-Oct-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: validate resume before function call
Before making a function call to resume, validate the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_resume where the same checks and calls are repeated.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
e095026f |
| 17-Oct-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: validate suspend before function call
Before making a function call to suspend, validate the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_suspend where the same checks and calls are repeated.
Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7 |
|
| #
2accf9d6 |
| 02-Sep-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Drop delayed reset work handler
Drop delayed reset work handler as it is no longer used.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
6e261ecb |
| 23-Sep-2024 |
Dr. David Alan Gilbert <[email protected]> |
drm/amdgpu: Remove unused amdgpu_atpx functions
amdgpu_atpx_dgpu_req_power_for_displays has been unused since commit bdb1ccb080da ("drm/amdgpu: remove ATPX_DGPU_REQ_POWER_FOR_DISPLAYS check when hotplug-in")
amdgpu_atpx_get_dhandle has been unused since commit f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
632aac62 |
| 23-Sep-2024 |
Dr. David Alan Gilbert <[email protected]> |
drm/amdgpu: Remove unused amdgpu_device_ip_is_idle
amdgpu_device_ip_is_idle is unused. It was renamed from 'amdgpu_is_idle' which was originally added in commit 5dbbb60ba61e ("drm/amdgpu: add IP helpers for wait_for_idle and is_idle")
but hasn't been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.11-rc6, v6.11-rc5 |
|
| #
1e4acf4d |
| 21-Aug-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add reset on init handler for XGMI
In some cases, device needs to be reset before first use. Add handlers for doing device reset during driver init sequence.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
37b99322 |
| 24-Sep-2024 |
Sunil Khatri <[email protected]> |
drm/amdgpu: add amdgpu_device reference in ip block
To handle the amdgpu_device reference for different GPUs, we add its reference in each ip block, which can be used to differentiate between different GPU devices.
Signed-off-by: Sunil Khatri <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
6e37ae8b |
| 26-Aug-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Separate reinitialization after reset
Move the reinitialization part after a reset to another function. No functional changes.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Acked-by: Alex Deucher <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
14f2fe34 |
| 19-Aug-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add init levels
Add init levels to define the level to which device needs to be initialized.
Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7 |
|
| #
dc443aa4 |
| 04-Jul-2024 |
Asad Kamal <[email protected]> |
drm/amd/amdgpu: Add helper to get ip block valid
Add helper function to check if ip block is enabled
Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|