History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c (Results 1 – 25 of 138)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14
# aedc92be 24-Mar-2025 Xiang Liu <[email protected]>

drm/amdgpu: Parse all deferred errors with UMC aca handle

We should only increase the deferred errors in UMC block.

Signed-off-by: Xiang Liu <[email protected]>
Reviewed-by: Hawking Zhang <Hawking.

drm/amdgpu: Parse all deferred errors with UMC aca handle

We should only increase the deferred errors in UMC block.

Signed-off-by: Xiang Liu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2
# 35750679 06-Feb-2025 Lijo Lazar <[email protected]>

drm/amdgpu: Calculate IP specific xgmi bandwidth

Use IP version specific xgmi speed/width for bandwidth calculation.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Jonathan Kim <jonath

drm/amdgpu: Calculate IP specific xgmi bandwidth

Use IP version specific xgmi speed/width for bandwidth calculation.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Jonathan Kim <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# f9234217 26-Feb-2025 Asad Kamal <[email protected]>

drm/amd/amdgpu: Add support for xgmi_v6_4_1

Add support for xgmi_v6_4_1 and use it appropriate places

Signed-off-by: Asad Kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Rev

drm/amd/amdgpu: Add support for xgmi_v6_4_1

Add support for xgmi_v6_4_1 and use it appropriate places

Signed-off-by: Asad Kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 485993e2 06-Feb-2025 Lijo Lazar <[email protected]>

drm/amdgpu: Add xgmi speed/width related info

Add APIs to initialize XGMI speed, width details and get to max
bandwidth supported. It is assumed that a device only supports same
generation of XGMI l

drm/amdgpu: Add xgmi speed/width related info

Add APIs to initialize XGMI speed, width details and get to max
bandwidth supported. It is assumed that a device only supports same
generation of XGMI links with uniform width.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 6f16d101 06-Feb-2025 Lijo Lazar <[email protected]>

drm/amdgpu: Move xgmi definitions to xgmi header

Move definitions related to xgmi to amdgpu_xgmi header

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.

drm/amdgpu: Move xgmi definitions to xgmi header

Move definitions related to xgmi to amdgpu_xgmi header

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 00f85667 26-Feb-2025 Xiang Liu <[email protected]>

drm/amdgpu: Decode deferred error type in aca bank parser

In the case of poison inband log, the error type need to be specified
by checking the deferred or poison bit of status register.

v2: check

drm/amdgpu: Decode deferred error type in aca bank parser

In the case of poison inband log, the error type need to be specified
by checking the deferred or poison bit of status register.

v2: check both deferred and poison bit

Signed-off-by: Xiang Liu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 9424a5bf 10-Feb-2025 Jonathan Kim <[email protected]>

drm/amdgpu: simplify xgmi peer info calls

Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.

Signed-off-by: Jonathan Kim <[email protected]

drm/amdgpu: simplify xgmi peer info calls

Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.

Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.14-rc1
# 56316ee9 26-Jan-2025 Hawking Zhang <[email protected]>

drm/amdgpu: Include ACA error type in aca bank

ACA error types managed by driver a direct 1:1
correspondence with those managed by firmware.

To address this, for each ACA bank, include
both the ACA

drm/amdgpu: Include ACA error type in aca bank

ACA error types managed by driver a direct 1:1
correspondence with those managed by firmware.

To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.

This addition is useful for creating CPER records.

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12
# 466a59ab 11-Nov-2024 Asad Kamal <[email protected]>

drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0

Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for
SMU_v_13_0_6

v2: Get link status register value for each soc from separate
functio

drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0

Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for
SMU_v_13_0_6

v2: Get link status register value for each soc from separate
function (Lijo)

Signed-off-by: Asad Kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1
# e46738a5 20-Sep-2024 Jonathan Kim <[email protected]>

drm/amdkfd: sever xgmi io link if host driver has disable sharing

Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these p

drm/amdkfd: sever xgmi io link if host driver has disable sharing

Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these partial hives are fully connected per guest session.
In the event that the host makes a mistake by adding a non-shared node
to a guest session, have the KFD reflect sharing disabled by severing
the IO link.

Signed-off-by: Jonathan Kim <[email protected]>
Tested-by: James Yao <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# d37bc6a4 17-Oct-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Fix the logic for NPS request failure

On a hive, NPS request is placed by the first one for all devices in the
hive. If the request fails, mark the mode as UNKNOWN so that subsequent
dev

drm/amdgpu: Fix the logic for NPS request failure

On a hive, NPS request is placed by the first one for all devices in the
hive. If the request fails, mark the mode as UNKNOWN so that subsequent
devices on unload don't request it. Also, fix the mutex double lock
issue in error condition, should have been mutex_unlock.

Fixes: ee52489d1210 ("drm/amdgpu: Place NPS mode request on unload")
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Rajneesh Bhardwaj <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# d25d26b8 07-Oct-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Wait for reset on init completion

When reset on initialization is requested, wait for the reset to finish.
In cases where module is loaded after boot, this makes sure all
initialization

drm/amdgpu: Wait for reset on init completion

When reset on initialization is requested, wait for the reset to finish.
In cases where module is loaded after boot, this makes sure all
initialization work is done after a successful return of modprobe.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Ramesh Errabolu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# ee52489d 20-Sep-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Place NPS mode request on unload

If a user has requested NPS mode switch, place the request through PSP
during unload of the driver. For devices which are part of a hive, all
requests ar

drm/amdgpu: Place NPS mode request on unload

If a user has requested NPS mode switch, place the request through PSP
during unload of the driver. For devices which are part of a hive, all
requests are placed together. If one of them fails, revert back to the
current NPS mode.

Signed-off-by: Lijo Lazar <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# bbc16008 19-Sep-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Add gmc interface to request NPS mode

Add a common interface in GMC to request NPS mode through PSP. Also add
a variable in hive and gmc control to track the last requested mode.

Signed

drm/amdgpu: Add gmc interface to request NPS mode

Add a common interface in GMC to request NPS mode through PSP. Also add
a variable in hive and gmc control to track the last requested mode.

Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 1845752b 02-Oct-2024 Colin Ian King <[email protected]>

drm/amdgpu: Fix spelling mistake "initializtion" -> "initialization"

There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by:

drm/amdgpu: Fix spelling mistake "initializtion" -> "initialization"

There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11, v6.11-rc7, v6.11-rc6
# 631af731 26-Aug-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Refactor XGMI reset on init handling

Use XGMI hive information to rely on resetting XGMI devices on
initialization rather than using mgpu structure. mgpu structure may have
other devices

drm/amdgpu: Refactor XGMI reset on init handling

Use XGMI hive information to rely on resetting XGMI devices on
initialization rather than using mgpu structure. mgpu structure may have
other devices as well.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Acked-by: Rajneesh Bhardwaj <[email protected]>
Tested-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc5
# 5839d27d 19-Aug-2024 Lijo Lazar <[email protected]>

drm/amdgpu: Use init level for pending_reset flag

Drop pending_reset flag in gmc block. Instead use init level to
determine which type of init is preferred - in this case MINIMAL.

Signed-off-by: Li

drm/amdgpu: Use init level for pending_reset flag

Drop pending_reset flag in gmc block. Instead use init level to
determine which type of init is preferred - in this case MINIMAL.

Signed-off-by: Lijo Lazar <[email protected]>
Acked-by: Rajneesh Bhardwaj <[email protected]>
Tested-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2
# 671af066 02-Aug-2024 Yang Wang <[email protected]>

drm/amdgpu: remove RAS unused paramter 'err_addr'

- amdgpu_ras_error_statistic_ue_count()
- amdgpu_ras_error_statistic_ce_count()
- amdgpu_ras_error_statistic_de_count()

The parameter 'err_addr' is

drm/amdgpu: remove RAS unused paramter 'err_addr'

- amdgpu_ras_error_statistic_ue_count()
- amdgpu_ras_error_statistic_ce_count()
- amdgpu_ras_error_statistic_de_count()

The parameter 'err_addr' is no longer used since following patch.

Fixes: a7e8467fbeee ("drm/amdgpu: Remove unused code")
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8
# 546e6309 04-Mar-2024 Lijo Lazar <[email protected]>

drm/amd/pm: Remove legacy interface for xgmi plpd

Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI
PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client.
Remov

drm/amd/pm: Remove legacy interface for xgmi plpd

Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI
PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client.
Remove that as well.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 8f184f8e 06-May-2024 Tim Huang <[email protected]>

drm/amdgpu: fix uninitialized variable warning for amdgpu_xgmi

Clear warning that using uninitialized variable current_node.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Alex Deucher <

drm/amdgpu: fix uninitialized variable warning for amdgpu_xgmi

Clear warning that using uninitialized variable current_node.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 9ecef5b2 01-Apr-2024 Tao Zhou <[email protected]>

drm/amdgpu: update check condition for XGMI ACA UE

Check more possible ext error codes.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Al

drm/amdgpu: update check condition for XGMI ACA UE

Check more possible ext error codes.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8-rc7, v6.8-rc6
# e3d4de8d 22-Feb-2024 Yang Wang <[email protected]>

drm/amdgpu: retire unused aca_bank_report data structure

retire unused aca_bank_report data structure.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <Hawking.Zhang@am

drm/amdgpu: retire unused aca_bank_report data structure

retire unused aca_bank_report data structure.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 62d2aaa7 22-Feb-2024 Yang Wang <[email protected]>

drm/amdgpu: refine aca error cache for xgmi v6.4.0

refine aca error cache for xgmi v6.4.0

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed

drm/amdgpu: refine aca error cache for xgmi v6.4.0

refine aca error cache for xgmi v6.4.0

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3
# abc3b5d2 31-Jan-2024 Yang Wang <[email protected]>

drm/amdgpu: add new aca_smu_type support

Add new types to distinguish between ACA error type and smu mca type.

e.g.:
the ACA_ERROR_TYPE_DEFERRED is not matched any smu mca valid bank
channel, so ad

drm/amdgpu: add new aca_smu_type support

Add new types to distinguish between ACA error type and smu mca type.

e.g.:
the ACA_ERROR_TYPE_DEFERRED is not matched any smu mca valid bank
channel, so add new type 'aca_smu_type' to distinguish aca error type
and smu mca type.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3
# 1714a1ff 24-Nov-2023 Yang Wang <[email protected]>

drm/amdgpu: replace MCA macro with ACA for XGMI

use new ACA macro to instead of MCA

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-b

drm/amdgpu: replace MCA macro with ACA for XGMI

use new ACA macro to instead of MCA

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


123456