|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6 |
|
| #
f7a594e4 |
| 01-Jan-2025 |
Lijo Lazar <[email protected]> |
drm/amdgpu: Use active umc info from discovery
There could be configs where some UMC instances are harvested. This information is obtained through discovery data and populated in umc.active_mask. Avoid reassigning this as AID mask, instead use the mask directly while iterating through umc instances. This is to avoid accesses to harvested UMC instances.
v2: fix warning (Alex)
Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
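The idea of iterating by the active mask can be sketched as below. This is a minimal userspace illustration, not the driver's code: the function name `for_each_active_umc`, the visitor type, and the 32-bit mask width are assumptions made for the example. Only the notion that a cleared bit marks a harvested UMC instance comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical visitor type; the driver's real per-instance callbacks differ. */
typedef void (*umc_visit_fn)(unsigned int umc_inst, void *ctx);

/*
 * Call fn once for every UMC instance whose bit is set in active_mask and
 * return how many instances were visited. Instances with a cleared bit
 * (harvested) are skipped, so they are never accessed.
 */
unsigned int for_each_active_umc(uint32_t active_mask,
                                 umc_visit_fn fn, void *ctx)
{
    unsigned int visited = 0;

    for (unsigned int inst = 0; inst < 32; inst++) {
        if (!(active_mask & (UINT32_C(1) << inst)))
            continue; /* harvested instance: skip */
        if (fn != NULL)
            fn(inst, ctx);
        visited++;
    }
    return visited;
}

/* Example visitor: accumulate the sum of the visited instance indices. */
void sum_indices(unsigned int umc_inst, void *ctx)
{
    *(unsigned int *)ctx += umc_inst;
}
```

With a mask of `0x0B` (bits 0, 1 and 3 set), the visitor runs for instances 0, 1 and 3 only; instance 2 is treated as harvested and never touched.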
|
|
Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1 |
|
| #
ae756cd8 |
| 29-Nov-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: correct the calculation of RAS bad page
After the introduction of NPS RAS, one bad page record on the EEPROM may correspond to either 1 or 16 bad pages, so a bad page record and a bad page are two different concepts. Define a new variable to store the number of bad pages.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
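The distinction between the two counts can be illustrated with a short sketch. The struct layout and the `pages_covered` field are hypothetical simplifications for this example and do not reflect the driver's real EEPROM record format; only the "1 or 16 pages per record" figure comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record; not the driver's real EEPROM layout. */
struct bad_page_record {
    uint64_t address;
    unsigned int pages_covered; /* assumed to be 1 or 16, per the commit text */
};

struct bad_page_counts {
    size_t records; /* entries stored on the EEPROM */
    size_t pages;   /* pages those entries actually retire */
};

/* Track the record count and the page count as two separate numbers. */
struct bad_page_counts count_bad_pages(const struct bad_page_record *recs,
                                       size_t n)
{
    struct bad_page_counts c = { .records = n, .pages = 0 };

    for (size_t i = 0; i < n; i++)
        c.pages += recs[i].pages_covered;
    return c;
}
```

Three records covering 1, 16 and 16 pages give a record count of 3 but a page count of 33, which is why a single counter cannot serve both purposes.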
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
a8d133e6 |
| 31-Oct-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes
All legacy RAS bad pages were generated in NPS1 mode, but new bad pages can be generated in any NPS mode, so we can't use the retired_page stored on the EEPROM directly in non-NPS1 mode, even for legacy data. Different data needs different handling; new data can be distinguished from old data by the UMC_CHANNEL_IDX_V2 flag.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
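Telling the two kinds of record apart by a flag bit might look like the sketch below. The bit value `0x80`, the `flags` parameter, and the enum names are assumptions chosen for the example; the real definition of UMC_CHANNEL_IDX_V2 and the field that carries it should be taken from the driver source.

```c
#include <stdint.h>

/* Assumed bit value for illustration; the driver's real definition may differ. */
#define EXAMPLE_UMC_CHANNEL_IDX_V2 0x80u

enum record_kind {
    RECORD_LEGACY_NPS1, /* written before NPS RAS, always in NPS1 mode */
    RECORD_NEW_ANY_NPS  /* written with the V2 flag, in any NPS mode */
};

/* Classify a stored record by testing the V2 flag bit. */
enum record_kind classify_record(uint8_t flags)
{
    return (flags & EXAMPLE_UMC_CHANNEL_IDX_V2) ? RECORD_NEW_ANY_NPS
                                                : RECORD_LEGACY_NPS1;
}
```

The sketch only performs the classification; what the driver does with each kind of record afterwards is not described in the commit message and is left out here.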
|
| #
71a0e963 |
| 29-Oct-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: save UMC global channel index to eeprom
Save the global channel index returned by RAS TA to eeprom. We can get memory physical address by MCA address and channel index.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12-rc5 |
|
| #
b02ef407 |
| 24-Oct-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: add function to find all memory pages in one physical row
The function can be reused across the amdgpu driver.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12-rc4 |
|
| #
f44a3058 |
| 18-Oct-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: add return value for convert_ras_err_addr
So upper layer can return failure directly if address conversion fails.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
76723fbc |
| 18-Oct-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: reduce memory usage for umc_lookup_bad_pages_in_a_row
The function handles one page at a time, so allocating umc.retire_unit bad page records is enough.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
4e7812e2 |
| 17-Oct-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: make convert_ras_err_addr visible outside UMC block
And change some UMC v12 specific functions to generic version, so the code can be shared.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
84a2947e |
| 30-Oct-2024 |
Victor Skvortsov <[email protected]> |
drm/amdgpu: Implement virt req_ras_err_count
Enable RAS late init if VF RAS Telemetry is supported.
When enabled, the VF can use this interface to query total RAS error counts from the host.
The VF FB access may abruptly end due to a fatal error, therefore the VF must cache and sanitize the input.
The host allows 15 telemetry messages every 60 seconds, after which it ignores any further incoming telemetry messages. The VF rate-limits its message calls to once every 5 seconds (12 times in 60 seconds). While the VF is rate limited, it continues to report the last good cached data.
v2: Flip generate report & update statistics order for VF
Signed-off-by: Victor Skvortsov <[email protected]> Acked-by: Tao Zhou <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
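The rate-limit-with-cache behaviour described above can be sketched in userspace C. All names here (`ras_telemetry_cache`, `ras_err_count_get`, `fake_host_query`) are hypothetical and the caller passes the current time explicitly; only the 5-second interval and the "report the last good cached data" rule come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUERY_INTERVAL_MS 5000u /* from the commit text: one query per 5 s */

/* Hypothetical host query: returns true and fills *count on success. */
typedef bool (*host_query_fn)(uint64_t *count);

struct ras_telemetry_cache {
    uint64_t last_good_count; /* last value successfully read from the host */
    uint64_t last_query_ms;   /* time of the most recent query attempt */
    bool queried_once;        /* false until the first query has been sent */
};

/*
 * Return the RAS error count. The host is queried at most once per
 * QUERY_INTERVAL_MS; in between, or if a query fails, the last good
 * cached value is returned.
 */
uint64_t ras_err_count_get(struct ras_telemetry_cache *c, uint64_t now_ms,
                           host_query_fn query)
{
    uint64_t fresh;

    if (c->queried_once && now_ms - c->last_query_ms < QUERY_INTERVAL_MS)
        return c->last_good_count; /* rate limited: serve cached data */

    c->queried_once = true;
    c->last_query_ms = now_ms;
    if (query(&fresh))
        c->last_good_count = fresh; /* update only on a successful read */
    return c->last_good_count;
}

/* Stand-in host for demonstration: reports 10, 20, 30, ... per call. */
static uint64_t fake_host_value;

bool fake_host_query(uint64_t *count)
{
    fake_host_value += 10;
    *count = fake_host_value;
    return true;
}
```

Querying once every 5 seconds yields at most 12 messages per 60 seconds, which stays under the host's limit of 15 and leaves a small margin.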
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
792be2e2 |
| 01-Aug-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: create function to check RAS RMA status
For the convenience of calling it globally.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.11-rc1, v6.10 |
|
| #
a7e8467f |
| 11-Jul-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: Remove unused code
Remove unused code.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
56631dee |
| 11-Jul-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: optimize logging deferred error info
1. Use pa_pfn as the radix-tree key index to log deferred error info. 2. Use local array to store a row of bad pages.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.10-rc7, v6.10-rc6 |
|
| #
e278849c |
| 24-Jun-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: refine poison consumption interrupt handler
1. The poison fifo is only used for poison consumption requests. 2. Merge reset requests when poison fifo caches multiple poison consumption messages
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
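Merging the reset requests of several queued messages into a single request can be sketched as follows. The `poison_msg` struct, its `reset_flags` field, and the array standing in for the FIFO are hypothetical; the commit message only states that reset requests are merged when the FIFO holds multiple poison consumption messages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued message; the driver's real FIFO entry differs. */
struct poison_msg {
    uint32_t block;       /* RAS block that consumed the poison */
    uint32_t reset_flags; /* reset type(s) this message asks for */
};

/*
 * Walk n queued messages and return the union of their reset flags,
 * so the caller can issue one reset instead of one per message.
 * A result of 0 means no reset was requested.
 */
uint32_t merge_reset_requests(const struct poison_msg *msgs, size_t n)
{
    uint32_t merged = 0;

    for (size_t i = 0; i < n; i++)
        merged |= msgs[i].reset_flags;
    return merged;
}
```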
|
|
Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1 |
|
| #
5f7697bb |
| 23-May-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: trigger mode1 reset for RAS RMA status
Check RMA status in bad page retirement flow.
v2: fix coding bugs in v1.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.9, v6.9-rc7, v6.9-rc6 |
|
| #
506c245f |
| 23-Apr-2024 |
Bob Zhou <[email protected]> |
drm/amdgpu: fix double free err_addr pointer warnings
In amdgpu_umc_bad_page_polling_timeout, amdgpu_umc_handle_bad_pages may run many times, so err_addr can be freed twice in some special cases. Set err_addr to NULL to avoid the warnings.
Signed-off-by: Bob Zhou <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
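The general pattern behind this fix, clearing a pointer once its buffer has been freed so that a repeated release is harmless, can be shown with a small userspace sketch. The `err_data` struct and `err_data_release` function are hypothetical, and `free()` stands in for the kernel's `kfree()`, which likewise accepts NULL.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the driver's error data. */
struct err_data {
    unsigned long *err_addr;
};

/*
 * Release the address buffer and clear the pointer. A second call then
 * passes NULL to free(), which is a no-op, instead of freeing the same
 * buffer twice.
 */
void err_data_release(struct err_data *d)
{
    free(d->err_addr);
    d->err_addr = NULL;
}
```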
|
| #
bfa579b3 |
| 22-Apr-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: prepare to handle pasid poison consumption
Prepare to handle pasid poison consumption.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
e74313be |
| 22-Mar-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: add condition check for amdgpu_umc_fill_error_record
Add condition check for amdgpu_umc_fill_error_record.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
2cf8e50e |
| 22-Apr-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: Add delay work to retire bad pages
Add delay work to retire bad pages.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
f27defca |
| 18-Mar-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: umc v12_0 logs ecc errors
1. umc v12_0 logs ecc errors. 2. Reserve newly detected ecc error pages. 3. Add tag for bad pages, so that they can be retired later.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
95b4063d |
| 19-Mar-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: add interface to update umc v12_0 ecc status
Add interface to update umc v12_0 ecc status.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
2fc46e0b |
| 12-Mar-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: make reset method configurable for RAS poison
Each RAS block has different requirements for GPU reset in poison consumption handling. Add support for mmhub RAS poison consumption handling.
v2: remove the mmhub poison support for kfd int v10.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2 |
|
| #
ed1e1e42 |
| 23-Jan-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: Support passing poison consumption ras block to SRIOV
Support passing poison consumption ras blocks to SRIOV.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.8-rc1 |
|
| #
1757bb7d |
| 18-Jan-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: update check condition of query for ras page retire
Support page retirement handling in debug mode.
v2: revert smu_v13_0_6_get_ecc_info directly.
Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
6c23f3d1 |
| 15-Jan-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning
Use asynchronous polling to handle umc_v12_0 poisoning.
v2: 1. Change function name. 2. Change the debugging information content.
Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.7 |
|
| #
46e2231c |
| 03-Jan-2024 |
Candice Li <[email protected]> |
drm/amdgpu: Log deferred error separately
Separate deferred error from UE and CE and log it individually.
Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|