History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h (Results 1 – 25 of 32)
Revision Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6
# 05d50ea3 04-Mar-2025 Tao Zhou <[email protected]>

drm/amdgpu: format old RAS eeprom data into V3 version

Clear old data and save it in V3 format.

v2: only format eeprom data for new ASICs.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v6.14-rc5
# a8f921a1 24-Feb-2025 ganglxie <[email protected]>

drm/amdgpu: Change page/record number calculation based on nps

Save only one record to conserve EEPROM space, and compute
bad_page_num = pa_rec_num + mca_rec_num*16

Signed-off-by: ganglxie <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
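
The formula in this message can be sketched as a small helper. This is only an illustration: `bad_page_count` and `PAGES_PER_MCA_RECORD` are hypothetical names, and the factor of 16 is taken directly from the commit text rather than from the driver source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the commit text: one MCA record stands for 16 bad pages. */
#define PAGES_PER_MCA_RECORD 16u

/*
 * Hypothetical helper: number of bad pages implied by the records
 * stored in EEPROM. A PA record counts as one page, an MCA record
 * as PAGES_PER_MCA_RECORD pages.
 */
static uint32_t bad_page_count(uint32_t pa_rec_num, uint32_t mca_rec_num)
{
	/* Guard against 32-bit overflow in the multiplication and sum. */
	assert(mca_rec_num <= (UINT32_MAX - pa_rec_num) / PAGES_PER_MCA_RECORD);

	return pa_rec_num + mca_rec_num * PAGES_PER_MCA_RECORD;
}
```

With 3 PA records and 2 MCA records, the helper reports 3 + 2*16 = 35 bad pages.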



Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1
# ae756cd8 29-Nov-2024 Tao Zhou <[email protected]>

drm/amdgpu: correct the calculation of RAS bad page

After the introduction of NPS RAS, one bad page record on eeprom may be
related to 1 or 16 bad pages, so the bad page record and bad page are
two different concepts, define a new variable to store bad page number.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# 1f06e7f3 28-Nov-2024 Tao Zhou <[email protected]>

drm/amdgpu: split ras_eeprom_init into init and check functions

The init function reads the RAS table header, and the check function
validates the header. Call them in different stages.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4
# 772df3df 18-Oct-2024 Tao Zhou <[email protected]>

drm/amdgpu: add flag to indicate the type of RAS eeprom record

One UMC MCA address can map to multiple physical addresses (PAs):

AMDGPU_RAS_EEPROM_REC_PA: one record stores one PA
AMDGPU_RAS_EEPROM_REC_MCA: one record stores one MCA address; the PA
is ignored

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1
# b95fa494 23-May-2024 Tao Zhou <[email protected]>

drm/amdgpu: add RAS is_rma flag

Set the flag to true if bad page number reaches threshold.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
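
A minimal sketch of the rule described here, assuming "reaches" means greater than or equal to the threshold and that the flag stays set once raised. The structure and function names are hypothetical and much simpler than the driver's real state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified state; not the driver's real structure. */
struct ras_state {
	uint32_t bad_page_num;       /* bad pages recorded so far */
	uint32_t bad_page_threshold; /* count at which the device is flagged */
	bool is_rma;                 /* set once the threshold is reached */
};

/* Raise the flag when the bad page count reaches the threshold. */
static void ras_update_rma(struct ras_state *s)
{
	assert(s != NULL);

	if (s->bad_page_num >= s->bad_page_threshold)
		s->is_rma = true;
}
```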



Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5
# 0bc3137b 01-Jun-2023 Stanley.Yang <[email protected]>

drm/amdgpu: Set EEPROM ras info

Set EEPROM ras info: rma status, health percent and bad
page threshold.

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# 7f599fed 31-May-2023 Stanley.Yang <[email protected]>

drm/amdgpu: Add support EEPROM table v2.1

Add RAS info to the EEPROM table so that an application can analyze the
device's ECC status from the table alone, without the GPU driver.

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# 65183fae 30-May-2023 Stanley.Yang <[email protected]>

drm/amdgpu: Add RAS table v2.1 macro definition

Add RAS EEPROM table version 2.1 macro definition.

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# 71c79a19 30-May-2023 Stanley.Yang <[email protected]>

drm/amdgpu: Rename ras table version

Rename RAS_TABLE_VER to RAS_TABLE_VER_V1,
move RAS_TABLE_VER_V1 from amdgpu_ras_eeprom.c to amdgpu_ras_eeprom.h.

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7
# 69691c82 03-Mar-2022 Stanley.Yang <[email protected]>

drm/amdgpu: message smu to update bad channel info

The driver should notify the SMU to update bad channel info when an
uncorrectable error is detected in the UMC block.

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1
# 6cd1f9b4 09-Sep-2021 Michel Dänzer <[email protected]>

drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count

This was unusual; normally, inline functions are declared static as
well, and defined in a header file if used by multiple compilation
units. The latter would be more involved in this case, so just drop
the inline declaration for now.

Fixes compile failure building for ppc64le on RHEL 8:

In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call
to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available
90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
1985 | max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: c84d46707ebb "drm/amdgpu: validate bad page threshold in ras(v3)"
Reviewed-by: Lyude Paul <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
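
For contrast, the conventional pattern the message alludes to (an inline function declared static and defined in the header) might look like the sketch below. The name `example_max_record_count` and its constant are made up for illustration. The commit did not take this route; it simply dropped `inline` from the declaration.

```c
#include <stdint.h>

/*
 * What a header could contain if the function stayed inline: the body
 * travels with the declaration, so every file that includes the header
 * has the definition available and the compiler can inline the call.
 * The value below is a placeholder, not the driver's real limit.
 */
#define EXAMPLE_MAX_RECORD_COUNT 1024u

static inline uint32_t example_max_record_count(void)
{
	return EXAMPLE_MAX_RECORD_COUNT;
}
```

A bare `inline uint32_t f(void);` prototype in a header, with the body in a single .c file, leaves other compilation units without a body to inline, which is the situation the ppc64le build error above reports.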



# 114518ff 09-Sep-2021 Michel Dänzer <[email protected]>

drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count

This was unusual; normally, inline functions are declared static as
well, and defined in a header file if used by multiple compilation
units. The latter would be more involved in this case, so just drop
the inline declaration for now.

Fixes compile failure building for ppc64le on RHEL 8:

In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call
to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available
90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
1985 | max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: c84d46707ebb "drm/amdgpu: validate bad page threshold in ras(v3)"
Reviewed-by: Lyude Paul <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7
# c65b0805 08-Apr-2021 Luben Tuikov <[email protected]>

drm/amdgpu: RAS EEPROM table is now in debugfs

Add "ras_eeprom_size" file in debugfs, which
reports the maximum size allocated to the RAS
table in EEPROM, as the number of bytes and the
number of records it could store. For instance,

$cat /sys/kernel/debug/dri/0/ras/ras_eeprom_size
262144 bytes or 10921 records
$_

Add "ras_eeprom_table" file in debugfs, which
dumps the RAS table stored in EEPROM, in a formatted
way. For instance,

$cat ras_eeprom_table
Signature  Version    FirstOffs  Size       Checksum
0x414D4452 0x00010000 0x00000014 0x000000EC 0x000000DA
Index Offset  ErrType Bank/CU TimeStamp          Offs/Addr      MemChl MCUMCID RetiredPage
    0 0x00014 ue      0x00    0x00000000607608DC 0x000000000000 0x00   0x00    0x000000000000
    1 0x0002C ue      0x00    0x00000000607608DC 0x000000001000 0x00   0x00    0x000000000001
    2 0x00044 ue      0x00    0x00000000607608DC 0x000000002000 0x00   0x00    0x000000000002
    3 0x0005C ue      0x00    0x00000000607608DC 0x000000003000 0x00   0x00    0x000000000003
    4 0x00074 ue      0x00    0x00000000607608DC 0x000000004000 0x00   0x00    0x000000000004
    5 0x0008C ue      0x00    0x00000000607608DC 0x000000005000 0x00   0x00    0x000000000005
    6 0x000A4 ue      0x00    0x00000000607608DC 0x000000006000 0x00   0x00    0x000000000006
    7 0x000BC ue      0x00    0x00000000607608DC 0x000000007000 0x00   0x00    0x000000000007
    8 0x000D4 ue      0x00    0x00000000607608DD 0x000000008000 0x00   0x00    0x000000000008
$_

Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: John Clements <[email protected]>
Cc: Hawking Zhang <[email protected]>
Cc: Xinhui Pan <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
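
The numbers in the two sample outputs are mutually consistent, which the sketch below makes explicit. The sizes are inferred from the dump itself (FirstOffs 0x14 gives a 20-byte header, and consecutive record offsets 0x14 and 0x2C give 24-byte records), not read from the driver source. All names are hypothetical.

```c
#include <stdint.h>

/* Inferred from the sample dump above; treat both as assumptions. */
#define TABLE_HEADER_SIZE 20u /* FirstOffs = 0x14 */
#define TABLE_RECORD_SIZE 24u /* 0x2C - 0x14 */

/* How many whole records fit in a table of the given byte size. */
static uint32_t max_record_count(uint32_t table_bytes)
{
	if (table_bytes < TABLE_HEADER_SIZE)
		return 0;
	return (table_bytes - TABLE_HEADER_SIZE) / TABLE_RECORD_SIZE;
}

/* Byte offset of the record with the given index. */
static uint32_t record_offset(uint32_t index)
{
	return TABLE_HEADER_SIZE + index * TABLE_RECORD_SIZE;
}

/* Total table size for a given number of stored records. */
static uint32_t table_size(uint32_t record_count)
{
	return TABLE_HEADER_SIZE + record_count * TABLE_RECORD_SIZE;
}
```

Under these assumptions, 262144 bytes hold (262144 - 20) / 24 = 10921 records, record 8 sits at offset 0xD4, and nine records give a table size of 0xEC, matching the output shown in the message.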



# 63d4c081 06-Apr-2021 Luben Tuikov <[email protected]>

drm/amdgpu: Optimize EEPROM RAS table I/O

Split functionality between read and write, which
simplifies the code and exposes areas of
optimization and more or less complexity, and take
advantage of that.

Read and write the table in one go; use a separate
stage to decode or encode the data, as opposed to
on the fly, which keeps the I2C bus busy. Use a
single read/write to read/write the table or at
most two if the number of records we're
reading/writing wraps around.

Check the check-sum of a table in EEPROM on init.

Update the checksum at the same time as when
updating the table header signature, when the
threshold was increased on boot.

Take advantage of arithmetic modulo 256, that is,
use a byte!, to greatly simplify checksum
arithmetic.

Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
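
The "use a byte" remark can be illustrated with a short sketch: accumulating into an 8-bit unsigned value gives arithmetic modulo 256 for free, with no explicit masking. The convention assumed here, that the stored checksum makes the total of all bytes equal zero modulo 256, is an assumption for illustration, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum all bytes; the uint8_t accumulator wraps modulo 256 on its own. */
static uint8_t byte_sum(const uint8_t *buf, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/*
 * Assumed convention: the stored checksum is the value that brings the
 * total (data bytes plus checksum) to zero modulo 256.
 */
static uint8_t make_checksum(const uint8_t *buf, size_t len)
{
	return (uint8_t)(0u - byte_sum(buf, len));
}

/* Returns nonzero when data plus checksum sum to zero modulo 256. */
static int checksum_ok(const uint8_t *buf, size_t len, uint8_t csum)
{
	return (uint8_t)(byte_sum(buf, len) + csum) == 0;
}
```

Recomputing the whole sum after any change, rather than patching it incrementally, keeps the logic to a single loop.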



# 017dad64 16-Jun-2021 Luben Tuikov <[email protected]>

drm/amdgpu: Get rid of test function

The code is now tested from userspace.
Remove the already macroed-out test function.

Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# 0686627b 16-Jun-2021 Luben Tuikov <[email protected]>

drm/amdgpu: Some renames

Qualify with "ras_". Use kernel's own--don't
redefine your own.

Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v5.12-rc6, v5.12-rc5
# e4e6a589 27-Mar-2021 Luben Tuikov <[email protected]>

drm/amdgpu: Use explicit cardinality for clarity

RAS_MAX_RECORD_NUM may mean the maximum record
number, as in the maximum house number on your
street, or it may mean the maximum number of
records, as in the count of records, which is also
a number. To make clear whether the
number is ordinal (index) or cardinal (count),
rename this macro to RAS_MAX_RECORD_COUNT.

This makes it easy to understand what it refers
to, especially when we compute quantities such as,
how many records do we have left in the table,
especially when there are so many other numbers,
quantities and numerical macros around.

Also rename the long,
amdgpu_ras_eeprom_get_record_max_length() to the
more succinct and clear,
amdgpu_ras_eeprom_max_record_count().

When computing the threshold, which also deals
with counts, i.e. "how many", use the cardinal
"max_eeprom_records_count" rather than the quantitative
"max_eeprom_records_len".

Simplify the logic here and there, as well.

Cc: Guchun Chen <[email protected]>
Cc: John Clements <[email protected]>
Cc: Hawking Zhang <[email protected]>
Cc: Alexander Deucher <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# 803c6ebd 27-Mar-2021 Luben Tuikov <[email protected]>

drm/amdgpu: Simplify RAS EEPROM checksum calculations

Rename update_table_header() to
write_table_header() as this function is actually
writing it to EEPROM.

Use kernel types; use u8 to carry around the
checksum, in order to take advantage of arithmetic
modulo 8-bits (256).

Tidy up to 80 columns.

When updating the checksum, just recalculate the
whole thing.

Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v5.12-rc4, v5.12-rc3
# 1fab841f 11-Mar-2021 Luben Tuikov <[email protected]>

drm/amdgpu: RAS xfer to read/write

Wrap amdgpu_ras_eeprom_xfer(..., bool write),
into amdgpu_ras_eeprom_read() and
amdgpu_ras_eeprom_write(), as that makes reading
and understanding the code clearer.

Cc: Jean Delvare <[email protected]>
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Lijo Lazar <[email protected]>
Cc: Stanley Yang <[email protected]>
Cc: Hawking Zhang <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
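
The wrapping pattern can be sketched with a toy in-memory store standing in for the EEPROM. Everything here (`toy_record`, `toy_xfer`, `toy_read`, `toy_write`, the capacity) is hypothetical; only the shape, a direction-flag transfer function hidden behind two named wrappers, reflects the commit description.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for a record and for the EEPROM contents. */
struct toy_record {
	uint64_t addr;
};

#define TOY_CAPACITY 8u
static struct toy_record toy_store[TOY_CAPACITY];

/* Single transfer routine; the trailing flag selects the direction. */
static int toy_xfer(struct toy_record *recs, uint32_t num, bool write)
{
	if (num > TOY_CAPACITY)
		return -1;
	if (write)
		memcpy(toy_store, recs, num * sizeof(*recs));
	else
		memcpy(recs, toy_store, num * sizeof(*recs));
	return 0;
}

/* Named wrappers: call sites state their intent without a bare bool. */
static int toy_read(struct toy_record *recs, uint32_t num)
{
	return toy_xfer(recs, num, false);
}

static int toy_write(struct toy_record *recs, uint32_t num)
{
	return toy_xfer(recs, num, true);
}
```

A call such as `toy_write(recs, n)` is easier to read at a glance than `toy_xfer(recs, n, true)`, which is the readability gain the message describes.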



Revision tags: v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7
# a4399657 05-Feb-2021 Luben Tuikov <[email protected]>

drm/amdgpu: Rename misspelled function

Instead of fixing the spelling in
amdgpu_ras_eeprom_process_recods(),
rename it to,
amdgpu_ras_eeprom_xfer(),
to look similar to other I2C and protocol
transfer (read/write) functions.

Also to keep the column span to within reason by
using a shorter name.

Change the "num" function parameter from "int" to
"const u32" since it is the number of items
(records) to xfer, i.e. their count, which cannot
be a negative number.

Also swap the order of parameters, keeping the
pointer to records and their number next to each
other, while the direction now becomes the last
parameter.

Cc: Jean Delvare <[email protected]>
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Lijo Lazar <[email protected]>
Cc: Stanley Yang <[email protected]>
Cc: Hawking Zhang <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# ccdfbfec 02-Feb-2021 Luben Tuikov <[email protected]>

drm/amdgpu: RAS and FRU now use 19-bit I2C address

Convert RAS and FRU code to use the 19-bit I2C
memory address and remove all "slave_addr", as
this is now absorbed into the 19-bit address.

Cc: Jean Delvare <[email protected]>
Cc: John Clements <[email protected]>
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Lijo Lazar <[email protected]>
Cc: Stanley Yang <[email protected]>
Cc: Hawking Zhang <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alexander Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# 11003c68 26-Feb-2021 Dennis Li <[email protected]>

drm/amdgpu: remove unnecessary reading for epprom header

If the number of bad page records exceeds the threshold, the driver has
already updated both the EEPROM header and control->tbl_hdr.header before
GPU reset, so the GPU recovery thread does not need to read the EEPROM
header directly.

v2: merge amdgpu_ras_check_err_threshold into amdgpu_ras_eeprom_check_err_threshold

Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



Revision tags: v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7
# e8fbaf03 23-Jul-2020 Guchun Chen <[email protected]>

drm/amdgpu: break GPU recovery once it's in bad state(v4)

When the GPU executes recovery and retrieves a bad GPU tag
from the external EEPROM device, the recovery is aborted
and an error message is printed for the user's awareness.

v2: Refine warning message in threshold reaching case, and
fix spelling typo.

v3: Fix explicit calling of bad gpu.

v4: Rename function names.

Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>



# b82e65a9 23-Jul-2020 Guchun Chen <[email protected]>

drm/amdgpu: break driver init process when it's bad GPU(v5)

When a bad GPU tag is retrieved from the EEPROM, GPU init should
fail, as the GPU needs to be retired for further checking.

v2: Fix spelling typo, correct the condition to detect
bad gpu tag and refine error message.

v3: Refine function argument name.

v4: Fix missing check of returning value of i2c
initialization error case.

v5: Use dev_err to print PCI information in dmesg instead
of DRM_ERROR.

Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>


