History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c (Results 1 – 25 of 71)
Revision Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6
# f7a594e4 01-Jan-2025 Lijo Lazar <[email protected]>

drm/amdgpu: Use active umc info from discovery

There could be configs where some UMC instances are harvested. This
information is obtained through discovery data and populated in
umc.active_mask. Avoid reassigning this as AID mask, instead use the
mask directly while iterating through umc instances. This is to avoid
accesses to harvested UMC instances.

v2: fix warning (Alex)

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1
# ae756cd8 29-Nov-2024 Tao Zhou <[email protected]>

drm/amdgpu: correct the calculation of RAS bad page

After the introduction of NPS RAS, one bad page record on eeprom may be
related to 1 or 16 bad pages, so the bad page record and bad page are
two different concepts, define a new variable to store bad page number.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.12, v6.12-rc7, v6.12-rc6
# a8d133e6 31-Oct-2024 Tao Zhou <[email protected]>

drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes

All legacy RAS bad pages are generated in NPS1 mode, but new bad page
can be generated in any NPS mode, so we can't use retired_page stored
on eeprom directly in non-nps1 mode even for legacy data. We need to
take different actions for different data, new data can be identified
from old data by UMC_CHANNEL_IDX_V2 flag.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 71a0e963 29-Oct-2024 Tao Zhou <[email protected]>

drm/amdgpu: save UMC global channel index to eeprom

Save the global channel index returned by RAS TA to eeprom.
We can get memory physical address by MCA address and channel index.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.12-rc5
# b02ef407 24-Oct-2024 Tao Zhou <[email protected]>

drm/amdgpu: add function to find all memory pages in one physical row

And the function can be reused across amdgpu driver.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.12-rc4
# f44a3058 18-Oct-2024 Tao Zhou <[email protected]>

drm/amdgpu: add return value for convert_ras_err_addr

So upper layer can return failure directly if address conversion fails.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 76723fbc 18-Oct-2024 Tao Zhou <[email protected]>

drm/amdgpu: reduce memory usage for umc_lookup_bad_pages_in_a_row

The function handles one page at a time, so allocating umc.retire_unit
bad page records is enough.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 4e7812e2 17-Oct-2024 Tao Zhou <[email protected]>

drm/amdgpu: make convert_ras_err_addr visible outside UMC block

And change some UMC v12 specific functions to generic version, so the
code can be shared.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 84a2947e 30-Oct-2024 Victor Skvortsov <[email protected]>

drm/amdgpu: Implement virt req_ras_err_count

Enable RAS late init if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.

The host allows 15 telemetry messages every 60 seconds, after which
it will ignore any further incoming telemetry messages. The VF will
rate limit its message calls to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.

v2: Flip generate report & update statistics order for VF

Signed-off-by: Victor Skvortsov <[email protected]>
Acked-by: Tao Zhou <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2
# 792be2e2 01-Aug-2024 Tao Zhou <[email protected]>

drm/amdgpu: create function to check RAS RMA status

For the convenience of calling it globally.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.11-rc1, v6.10
# a7e8467f 11-Jul-2024 YiPeng Chai <[email protected]>

drm/amdgpu: Remove unused code

Remove unused code.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 56631dee 11-Jul-2024 YiPeng Chai <[email protected]>

drm/amdgpu: optimize logging deferred error info

1. Use pa_pfn as the radix-tree key index to log
deferred error info.
2. Use local array to store a row of bad pages.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.10-rc7, v6.10-rc6
# e278849c 24-Jun-2024 YiPeng Chai <[email protected]>

drm/amdgpu: refine poison consumption interrupt handler

1. The poison fifo is only used for poison consumption
requests.
2. Merge reset requests when poison fifo caches multiple
poison consumption messages

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1
# 5f7697bb 23-May-2024 Tao Zhou <[email protected]>

drm/amdgpu: trigger mode1 reset for RAS RMA status

Check RMA status in bad page retirement flow.

v2: fix coding bugs in v1.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.9, v6.9-rc7, v6.9-rc6
# 506c245f 23-Apr-2024 Bob Zhou <[email protected]>

drm/amdgpu: fix double free err_addr pointer warnings

In amdgpu_umc_bad_page_polling_timeout, amdgpu_umc_handle_bad_pages
is run many times, so err_addr can be freed twice in some special cases.
Set err_addr to NULL after freeing it to avoid the warnings.

Signed-off-by: Bob Zhou <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# bfa579b3 22-Apr-2024 YiPeng Chai <[email protected]>

drm/amdgpu: prepare to handle pasid poison consumption

Prepare to handle pasid poison consumption.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# e74313be 22-Mar-2024 YiPeng Chai <[email protected]>

drm/amdgpu: add condition check for amdgpu_umc_fill_error_record

Add condition check for amdgpu_umc_fill_error_record.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 2cf8e50e 22-Apr-2024 YiPeng Chai <[email protected]>

drm/amdgpu: Add delay work to retire bad pages

Add delay work to retire bad pages.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# f27defca 18-Mar-2024 YiPeng Chai <[email protected]>

drm/amdgpu: umc v12_0 logs ecc errors

1. umc v12_0 logs ecc errors.
2. Reserve newly detected ecc error pages.
3. Add tag for bad pages, so that they can
be retired later.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 95b4063d 19-Mar-2024 YiPeng Chai <[email protected]>

drm/amdgpu: add interface to update umc v12_0 ecc status

Add interface to update umc v12_0 ecc status.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 2fc46e0b 12-Mar-2024 Tao Zhou <[email protected]>

drm/amdgpu: make reset method configurable for RAS poison

Each RAS block has different requirement for gpu reset in poison
consumption handling.
Add support for mmhub RAS poison consumption handling.

v2: remove the mmhub poison support for kfd int v10.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2
# ed1e1e42 23-Jan-2024 YiPeng Chai <[email protected]>

drm/amdgpu: Support passing poison consumption ras block to SRIOV

Support passing poison consumption ras blocks
to SRIOV.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8-rc1
# 1757bb7d 18-Jan-2024 Tao Zhou <[email protected]>

drm/amdgpu: update check condition of query for ras page retire

Support page retirement handling in debug mode.

v2: revert smu_v13_0_6_get_ecc_info directly.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 6c23f3d1 15-Jan-2024 YiPeng Chai <[email protected]>

drm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning

Use asynchronous polling to handle umc_v12_0 poisoning.

v2:
1. Change function name.
2. Change the debugging information content.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.7
# 46e2231c 03-Jan-2024 Candice Li <[email protected]>

drm/amdgpu: Log deferred error separately

Separate deferred error from UE and CE and log it
individually.

Signed-off-by: Candice Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
