History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h (Results 1 – 22 of 22)
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14
# aedc92be 24-Mar-2025 Xiang Liu <[email protected]>

drm/amdgpu: Parse all deferred errors with UMC aca handle

We should only increase the deferred error count in the UMC block.

Signed-off-by: Xiang Liu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 338f7412 19-Mar-2025 Xiang Liu <[email protected]>

drm/amdgpu: Decode deferred error type in gfx aca bank parser

When an uncorrected error is injected with a background workload running,
the deferred errors among the uncorrected errors need to be identified
by checking the deferred and poison bits of the status register.

v2: refine checking for deferred error
v2: log possible DEs among CEs
v2: generate CPER records for DEs among UEs

Signed-off-by: Xiang Liu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5
# 00f85667 26-Feb-2025 Xiang Liu <[email protected]>

drm/amdgpu: Decode deferred error type in aca bank parser

For poison in-band logging, the error type needs to be determined
by checking the deferred or poison bit of the status register.

v2: check both deferred and poison bit

Signed-off-by: Xiang Liu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1
# ad97840f 26-Jan-2025 Hawking Zhang <[email protected]>

drm/amdgpu: Introduce funcs for generating cper record

Introduce new functions that are used to generate
cper ue or ce records.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Hawking Zhang <[email protected]>
Signed-off-by: Xiang Liu <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 56316ee9 26-Jan-2025 Hawking Zhang <[email protected]>

drm/amdgpu: Include ACA error type in aca bank

ACA error types managed by the driver do not have a direct 1:1
correspondence with those managed by firmware.

To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.

This addition is useful for creating CPER records.

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1
# abfcf956 29-Nov-2024 Yang Wang <[email protected]>

drm/amdgpu: move common ACA ipid defines into amdgpu_aca.h

move common ACA ipid defines into amdgpu_aca.h file.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5
# a4fcb5f7 18-Jun-2024 Yang Wang <[email protected]>

Revert "drm/amdgpu: change aca bank error lock type to spinlock"

This reverts commit f6bce954f432c556659a57be9e18fecdc575affb.

Revert this patch to change the lock type back to 'mutex' and avoid the
kernel call trace below (__schedule_bug, reached via usleep_range_state()
in psp_cmd_submit_buf()).

[ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[ 602.668939] Call Trace:
[ 602.668940] <TASK>
[ 602.668941] dump_stack_lvl+0x4c/0x70
[ 602.668945] dump_stack+0x14/0x20
[ 602.668946] __schedule_bug+0x5a/0x70
[ 602.668950] __schedule+0x940/0xb30
[ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.668955] ? hrtimer_reprogram+0x77/0xb0
[ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.668959] ? hrtimer_start_range_ns+0x126/0x370
[ 602.668961] schedule+0x39/0xe0
[ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140
[ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10
[ 602.668966] schedule_hrtimeout_range+0x17/0x20
[ 602.668967] usleep_range_state+0x69/0x90
[ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu]
[ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu]
[ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669339] ? stack_depot_save+0x12/0x20
[ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669343] ? set_track_prepare+0x52/0x70
[ 602.669346] ? kmemleak_alloc+0x4f/0x90
[ 602.669348] ? __kmalloc_node+0x34b/0x450
[ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[ 602.669743] ? kmemleak_free+0x36/0x60
[ 602.669745] ? kvfree+0x32/0x40
[ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669749] ? kfree+0x15d/0x2a0
[ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0
[ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[ 602.670187] ? srso_alias_return_thunk+0x5/0

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: YiPeng Chai <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.10-rc4, v6.10-rc3
# 9817f061 04-Jun-2024 Yang Wang <[email protected]>

drm/amdgpu: move aca/mca init functions into ras_init() stage

adjust the function position to better match aca/mca fini code in ras_fini().

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.10-rc2, v6.10-rc1
# 062a7ce6 17-May-2024 Yang Wang <[email protected]>

drm/amdgpu: fix ACA no query result after gpu reset

fix ACA queries returning no result after gpu reset.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# f6bce954 16-May-2024 Yang Wang <[email protected]>

drm/amdgpu: change aca bank error lock type to spinlock

change the lock type to 'spinlock' to avoid sleeping (scheduling) in
interrupt context.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4
# f2355862 12-Apr-2024 Yang Wang <[email protected]>

drm/amdgpu: add new aca smu callback func parse_error_code()

add new aca smu callback parse_error_code() to avoid ASIC-specific checks
in the amdgpu_aca.c file

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.9-rc3, v6.9-rc2
# 81d96e8b 28-Mar-2024 Yang Wang <[email protected]>

drm/amdgpu: refine function signature of amdgpu_aca_get_error_data()

refine function signature of amdgpu_aca_get_error_data();

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.9-rc1
# 31fd330b 18-Mar-2024 Yang Wang <[email protected]>

drm/amdgpu: add ras event id support for ACA

add ras event id support for ACA.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8, v6.8-rc7
# bd15bf74 03-Mar-2024 Yang Wang <[email protected]>

drm/amdgpu: avoid update aca bank multi times during ras isr

Because the UE valid MCA count is cleared only after reset, the aca
bank is updated only once during the ras isr, to avoid counting the
same errors repeatedly.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8-rc6
# e3d4de8d 22-Feb-2024 Yang Wang <[email protected]>

drm/amdgpu: retire unused aca_bank_report data structure

retire unused aca_bank_report data structure.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8-rc5, v6.8-rc4
# 949899cb 06-Feb-2024 Yang Wang <[email protected]>

drm/amdgpu: add new api to save error count into aca cache

add new api to save error count into aca cache.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8-rc3
# abc3b5d2 31-Jan-2024 Yang Wang <[email protected]>

drm/amdgpu: add new aca_smu_type support

Add new types to distinguish between ACA error type and smu mca type.

e.g.:
ACA_ERROR_TYPE_DEFERRED does not match any smu mca valid bank
channel, so add a new type 'aca_smu_type' to distinguish the aca error
type from the smu mca type.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8-rc2
# c0c48f0d 24-Jan-2024 Yang Wang <[email protected]>

drm/amdgpu: adjust aca init/fini sequence to match gpu reset

- move aca init/fini functions into ras init/fini to fit the gpu reset
sequence.
- add new function amdgpu_aca_reset()

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.8-rc1, v6.7
# 37973b69 02-Jan-2024 Yang Wang <[email protected]>

drm/amdgpu: add aca sysfs support

add aca sysfs node support

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3
# 04c4fcd2 24-Nov-2023 Yang Wang <[email protected]>

drm/amdgpu: add amdgpu ras aca query interface

v1:
add ACA error query interface

v2:
Add a new helper function to determine whether to use ACA or MCA.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


# 33dcda51 24-Nov-2023 Yang Wang <[email protected]>

drm/amdgpu: add ACA bank dump debugfs support

add ACA bank dump debugfs support

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


Revision tags: v6.7-rc2
# f5e4cc84 13-Nov-2023 Yang Wang <[email protected]>

drm/amdgpu: implement RAS ACA driver framework

v1:
implement new RAS ACA driver code framework.

v2:
- rename aca_bank_set to aca_banks.
- rename aca_source_xxx to aca_handle_xxx.

v3:
Optimize some function implementation details. (from Hawking's suggestion)

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
