|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
aedc92be |
| 24-Mar-2025 |
Xiang Liu <[email protected]> |
drm/amdgpu: Parse all deferred errors with UMC aca handle
We should only increase the deferred errors in UMC block.
Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <Hawking.
drm/amdgpu: Parse all deferred errors with UMC aca handle
We should only increase the deferred errors in UMC block.
Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
35750679 |
| 06-Feb-2025 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Calculate IP specific xgmi bandwidth
Use IP version specific xgmi speed/width for bandwidth calculation.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Jonathan Kim <jonath
drm/amdgpu: Calculate IP specific xgmi bandwidth
Use IP version specific xgmi speed/width for bandwidth calculation.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Jonathan Kim <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
f9234217 |
| 26-Feb-2025 |
Asad Kamal <[email protected]> |
drm/amd/amdgpu: Add support for xgmi_v6_4_1
Add support for xgmi_v6_4_1 and use it appropriate places
Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Rev
drm/amd/amdgpu: Add support for xgmi_v6_4_1
Add support for xgmi_v6_4_1 and use it appropriate places
Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
485993e2 |
| 06-Feb-2025 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add xgmi speed/width related info
Add APIs to initialize XGMI speed, width details and get to max bandwidth supported. It is assumed that a device only supports same generation of XGMI l
drm/amdgpu: Add xgmi speed/width related info
Add APIs to initialize XGMI speed, width details and get to max bandwidth supported. It is assumed that a device only supports same generation of XGMI links with uniform width.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
6f16d101 |
| 06-Feb-2025 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Move xgmi definitions to xgmi header
Move definitions related to xgmi to amdgpu_xgmi header
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.
drm/amdgpu: Move xgmi definitions to xgmi header
Move definitions related to xgmi to amdgpu_xgmi header
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
00f85667 |
| 26-Feb-2025 |
Xiang Liu <[email protected]> |
drm/amdgpu: Decode deferred error type in aca bank parser
In the case of poison inband log, the error type need to be specified by checking the deferred or poison bit of status register.
v2: check
drm/amdgpu: Decode deferred error type in aca bank parser
In the case of poison inband log, the error type need to be specified by checking the deferred or poison bit of status register.
v2: check both deferred and poison bit
Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
9424a5bf |
| 10-Feb-2025 |
Jonathan Kim <[email protected]> |
drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <[email protected]
drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1 |
|
| #
56316ee9 |
| 26-Jan-2025 |
Hawking Zhang <[email protected]> |
drm/amdgpu: Include ACA error type in aca bank
ACA error types managed by driver a direct 1:1 correspondence with those managed by firmware.
To address this, for each ACA bank, include both the ACA
drm/amdgpu: Include ACA error type in aca bank
ACA error types managed by driver a direct 1:1 correspondence with those managed by firmware.
To address this, for each ACA bank, include both the ACA error type and the ACA SMU type.
This addition is useful for creating CPER records.
Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12 |
|
| #
466a59ab |
| 11-Nov-2024 |
Asad Kamal <[email protected]> |
drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0
Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for SMU_v_13_0_6
v2: Get link status register value for each soc from separate functio
drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0
Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for SMU_v_13_0_6
v2: Get link status register value for each soc from separate function (Lijo)
Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
e46738a5 |
| 20-Sep-2024 |
Jonathan Kim <[email protected]> |
drm/amdkfd: sever xgmi io link if host driver has disable sharing
Host drivers can create partial hives per guest by disabling xgmi sharing between certain peers in the main hive. Typically, these p
drm/amdkfd: sever xgmi io link if host driver has disable sharing
Host drivers can create partial hives per guest by disabling xgmi sharing between certain peers in the main hive. Typically, these partial hives are fully connected per guest session. In the event that the host makes a mistake by adding a non-shared node to a guest session, have the KFD reflect sharing disabled by severing the IO link.
Signed-off-by: Jonathan Kim <[email protected]> Tested-by: James Yao <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
d37bc6a4 |
| 17-Oct-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Fix the logic for NPS request failure
On a hive, NPS request is placed by the first one for all devices in the hive. If the request fails, mark the mode as UNKNOWN so that subsequent dev
drm/amdgpu: Fix the logic for NPS request failure
On a hive, NPS request is placed by the first one for all devices in the hive. If the request fails, mark the mode as UNKNOWN so that subsequent devices on unload don't request it. Also, fix the mutex double lock issue in error condition, should have been mutex_unlock.
Fixes: ee52489d1210 ("drm/amdgpu: Place NPS mode request on unload") Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Rajneesh Bhardwaj <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
d25d26b8 |
| 07-Oct-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Wait for reset on init completion
When reset on initialization is requested, wait for the reset to finish. In cases where module is loaded after boot, this makes sure all initialization
drm/amdgpu: Wait for reset on init completion
When reset on initialization is requested, wait for the reset to finish. In cases where module is loaded after boot, this makes sure all initialization work is done after a successful return of modprobe.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Ramesh Errabolu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
ee52489d |
| 20-Sep-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Place NPS mode request on unload
If a user has requested NPS mode switch, place the request through PSP during unload of the driver. For devices which are part of a hive, all requests ar
drm/amdgpu: Place NPS mode request on unload
If a user has requested NPS mode switch, place the request through PSP during unload of the driver. For devices which are part of a hive, all requests are placed together. If one of them fails, revert back to the current NPS mode.
Signed-off-by: Lijo Lazar <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
bbc16008 |
| 19-Sep-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Add gmc interface to request NPS mode
Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode.
Signed
drm/amdgpu: Add gmc interface to request NPS mode
Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode.
Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
1845752b |
| 02-Oct-2024 |
Colin Ian King <[email protected]> |
drm/amdgpu: Fix spelling mistake "initializtion" -> "initialization"
There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <[email protected]> Signed-off-by:
drm/amdgpu: Fix spelling mistake "initializtion" -> "initialization"
There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
631af731 |
| 26-Aug-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Refactor XGMI reset on init handling
Use XGMI hive information to rely on resetting XGMI devices on initialization rather than using mgpu structure. mgpu structure may have other devices
drm/amdgpu: Refactor XGMI reset on init handling
Use XGMI hive information to rely on resetting XGMI devices on initialization rather than using mgpu structure. mgpu structure may have other devices as well.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5 |
|
| #
5839d27d |
| 19-Aug-2024 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Use init level for pending_reset flag
Drop pending_reset flag in gmc block. Instead use init level to determine which type of init is preferred - in this case MINIMAL.
Signed-off-by: Li
drm/amdgpu: Use init level for pending_reset flag
Drop pending_reset flag in gmc block. Instead use init level to determine which type of init is preferred - in this case MINIMAL.
Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
671af066 |
| 02-Aug-2024 |
Yang Wang <[email protected]> |
drm/amdgpu: remove RAS unused paramter 'err_addr'
- amdgpu_ras_error_statistic_ue_count() - amdgpu_ras_error_statistic_ce_count() - amdgpu_ras_error_statistic_de_count()
The parameter 'err_addr' is
drm/amdgpu: remove RAS unused paramter 'err_addr'
- amdgpu_ras_error_statistic_ue_count() - amdgpu_ras_error_statistic_ce_count() - amdgpu_ras_error_statistic_de_count()
The parameter 'err_addr' is no longer used since following patch.
Fixes: a7e8467fbeee ("drm/amdgpu: Remove unused code") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8 |
|
| #
546e6309 |
| 04-Mar-2024 |
Lijo Lazar <[email protected]> |
drm/amd/pm: Remove legacy interface for xgmi plpd
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client. Remov
drm/amd/pm: Remove legacy interface for xgmi plpd
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client. Remove that as well.
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
8f184f8e |
| 06-May-2024 |
Tim Huang <[email protected]> |
drm/amdgpu: fix uninitialized variable warning for amdgpu_xgmi
Clear warning that using uninitialized variable current_node.
Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Alex Deucher <
drm/amdgpu: fix uninitialized variable warning for amdgpu_xgmi
Clear warning that using uninitialized variable current_node.
Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
9ecef5b2 |
| 01-Apr-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: update check condition for XGMI ACA UE
Check more possible ext error codes.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Al
drm/amdgpu: update check condition for XGMI ACA UE
Check more possible ext error codes.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc7, v6.8-rc6 |
|
| #
e3d4de8d |
| 22-Feb-2024 |
Yang Wang <[email protected]> |
drm/amdgpu: retire unused aca_bank_report data structure
retire unused aca_bank_report data structure.
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <Hawking.Zhang@am
drm/amdgpu: retire unused aca_bank_report data structure
retire unused aca_bank_report data structure.
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
62d2aaa7 |
| 22-Feb-2024 |
Yang Wang <[email protected]> |
drm/amdgpu: refine aca error cache for xgmi v6.4.0
refine aca error cache for xgmi v6.4.0
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed
drm/amdgpu: refine aca error cache for xgmi v6.4.0
refine aca error cache for xgmi v6.4.0
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3 |
|
| #
abc3b5d2 |
| 31-Jan-2024 |
Yang Wang <[email protected]> |
drm/amdgpu: add new aca_smu_type support
Add new types to distinguish between ACA error type and smu mca type.
e.g.: the ACA_ERROR_TYPE_DEFERRED is not matched any smu mca valid bank channel, so ad
drm/amdgpu: add new aca_smu_type support
Add new types to distinguish between ACA error type and smu mca type.
e.g.: the ACA_ERROR_TYPE_DEFERRED is not matched any smu mca valid bank channel, so add new type 'aca_smu_type' to distinguish aca error type and smu mca type.
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
1714a1ff |
| 24-Nov-2023 |
Yang Wang <[email protected]> |
drm/amdgpu: replace MCA macro with ACA for XGMI
use new ACA macro to instead of MCA
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-b
drm/amdgpu: replace MCA macro with ACA for XGMI
use new ACA macro to instead of MCA
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|