History log of /linux-6.15/drivers/gpu/drm/xe/xe_devcoredump.c (Results 1 – 25 of 52)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4
# 046eda65 20-Feb-2025 Shuicheng Lin <[email protected]>

drm/xe/devcoredump: Remove IS_ERR_OR_NULL check for kzalloc

kzalloc returns a valid pointer or NULL if the allocation fails.
It never returns an error pointer. It is better to check for NULL directl

drm/xe/devcoredump: Remove IS_ERR_OR_NULL check for kzalloc

kzalloc returns a valid pointer or NULL if the allocation fails.
It never returns an error pointer. It is better to check for NULL directly.

Signed-off-by: Shuicheng Lin <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Reviewed-by: Tejas Upadhyay <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

show more ...


# c504ad91 20-Feb-2025 Shuicheng Lin <[email protected]>

drm/xe/devcoredump: Fix print typo of offset

The log should print with "offset" instead of "size".
Correct the typo in the comment.

v2: split kzalloc change and add typo fix in commit message (Luca

drm/xe/devcoredump: Fix print typo of offset

The log should print with "offset" instead of "size".
Correct the typo in the comment.

v2: split kzalloc change and add typo fix in commit message (Lucas)

Signed-off-by: Shuicheng Lin <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Reviewed-by: Tejas Upadhyay <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

show more ...


Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1
# a9ab6591 23-Jan-2025 Lucas De Marchi <[email protected]>

drm/xe: Fix and re-enable xe_print_blob_ascii85()

Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to me

drm/xe: Fix and re-enable xe_print_blob_ascii85()

Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to mesa tools. However, in doing so it also broke fetching the
GuC log via debugfs since xe_print_blob_ascii85() simply bails out.

The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented and adding random newlines in the middle breaks it. If a
tool is able to parse it by looking at the data and checking for chars
that are out of the ascii85 space, it can still do so. A format change
that breaks the line-oriented output on devcoredump however needs better
coordination with existing tools.

v2: Add suffix description comment
v3: Reword explanation of xe_print_blob_ascii85() calling drm_puts()
in a loop

Reviewed-by: José Roberto de Souza <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Julia Filipchuk <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: [email protected]
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit 2c95bbf5002776117a69caed3b31c10bf7341bec)
Signed-off-by: Rodrigo Vivi <[email protected]>

show more ...


# 042c48b7 23-Jan-2025 Lucas De Marchi <[email protected]>

drm/xe/devcoredump: Move exec queue snapshot to Contexts section

Having the exec queue snapshot inside a "GuC CT" section was always
wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section

drm/xe/devcoredump: Move exec queue snapshot to Contexts section

Having the exec queue snapshot inside a "GuC CT" section was always
wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section
headings and add tile info") tried to fix that bug, but with that also
broke the mesa tool that parses the devcoredump, hence it was reverted
in commit a53da2fb25a3 ("drm/xe: Revert some changes that break a mesa
debug tool").

With the mesa tool also fixed, this can propagate as a fix on both
kernel and userspace side to avoid unnecessary headache for a debug
feature.

Cc: John Harrison <[email protected]>
Cc: Julia Filipchuk <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: [email protected]
Fixes: a53da2fb25a3 ("drm/xe: Revert some changes that break a mesa debug tool")
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit a37934ea75d331fafa7fe80b6180642ba5193422)
Signed-off-by: Rodrigo Vivi <[email protected]>

show more ...


# 2c95bbf5 23-Jan-2025 Lucas De Marchi <[email protected]>

drm/xe: Fix and re-enable xe_print_blob_ascii85()

Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to me

drm/xe: Fix and re-enable xe_print_blob_ascii85()

Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to mesa tools. However, in doing so it also broke fetching the
GuC log via debugfs since xe_print_blob_ascii85() simply bails out.

The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented and adding random newlines in the middle breaks it. If a
tool is able to parse it by looking at the data and checking for chars
that are out of the ascii85 space, it can still do so. A format change
that breaks the line-oriented output on devcoredump however needs better
coordination with existing tools.

v2: Add suffix description comment
v3: Reword explanation of xe_print_blob_ascii85() calling drm_puts()
in a loop

Reviewed-by: José Roberto de Souza <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Julia Filipchuk <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: [email protected]
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

show more ...


# a37934ea 23-Jan-2025 Lucas De Marchi <[email protected]>

drm/xe/devcoredump: Move exec queue snapshot to Contexts section

Having the exec queue snapshot inside a "GuC CT" section was always
wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section

drm/xe/devcoredump: Move exec queue snapshot to Contexts section

Having the exec queue snapshot inside a "GuC CT" section was always
wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section
headings and add tile info") tried to fix that bug, but with that also
broke the mesa tool that parses the devcoredump, hence it was reverted
in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool").

With the mesa tool also fixed, this can propagate as a fix on both
kernel and userspace side to avoid unnecessary headache for a debug
feature.

Cc: John Harrison <[email protected]>
Cc: Julia Filipchuk <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: [email protected]
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

show more ...


Revision tags: v6.13, v6.13-rc7
# 75fd04f2 06-Jan-2025 Nitin Gote <[email protected]>

drm/xe: Fix all typos in xe

Fix all typos in files of xe, reported by codespell tool.

Signed-off-by: Nitin Gote <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Reviewe

drm/xe: Fix all typos in xe

Fix all typos in files of xe, reported by codespell tool.

Signed-off-by: Nitin Gote <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Reviewed-by: Stuart Summers <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

show more ...


Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# a53da2fb 13-Dec-2024 John Harrison <[email protected]>

drm/xe: Revert some changes that break a mesa debug tool

There is a mesa debug tool for decoding devcoredump files. Recent
changes to improve the devcoredump output broke that tool. So revert
the ch

drm/xe: Revert some changes that break a mesa debug tool

There is a mesa debug tool for decoding devcoredump files. Recent
changes to improve the devcoredump output broke that tool. So revert
the changes until the tool can be extended to support the new fields.

Signed-off-by: John Harrison <[email protected]>
Fixes: c28fd6c358db ("drm/xe/devcoredump: Improve section headings and add tile info")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Cc: John Harrison <[email protected]>
Cc: Julia Filipchuk <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: [email protected]
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit 70fb86a85dc9fd66014d7eb2fe356f50702ceeb6)
Signed-off-by: Thomas Hellström <[email protected]>

show more ...


# 70fb86a8 13-Dec-2024 John Harrison <[email protected]>

drm/xe: Revert some changes that break a mesa debug tool

There is a mesa debug tool for decoding devcoredump files. Recent
changes to improve the devcoredump output broke that tool. So revert
the ch

drm/xe: Revert some changes that break a mesa debug tool

There is a mesa debug tool for decoding devcoredump files. Recent
changes to improve the devcoredump output broke that tool. So revert
the changes until the tool can be extended to support the new fields.

Signed-off-by: John Harrison <[email protected]>
Fixes: c28fd6c358db ("drm/xe/devcoredump: Improve section headings and add tile info")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Cc: John Harrison <[email protected]>
Cc: Julia Filipchuk <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: [email protected]
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

show more ...


Revision tags: v6.13-rc2, v6.13-rc1
# 5dce85fe 28-Nov-2024 John Harrison <[email protected]>

drm/xe: Move the coredump registration to the worker thread

Adding lockdep checking to the coredump code showed that there was an
existing violation. The dev_coredumpm_timeout() call is used to
regi

drm/xe: Move the coredump registration to the worker thread

Adding lockdep checking to the coredump code showed that there was an
existing violation. The dev_coredumpm_timeout() call is used to
register the dump with the base coredump subsystem. However, that
makes multiple memory allocations, only some of which use the GFP_
flags passed in. So that also needs to be deferred to the worker
function where it is safe to allocate with arbitrary flags.

In order to not add protoypes for the callback functions, moving the
_timeout call also means moving the worker thread function to later in
the file.

v2: Rebased after other changes to the worker function.

Fixes: e799485044cb ("drm/xe: Introduce the dev_coredump infrastructure.")
Cc: Thomas Hellström <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Francois Dugast <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: "Thomas Hellström" <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # v6.8+
Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 90f51a7f4ec1004fc4ddfbc6d1f1068d85ef4771)
Signed-off-by: Thomas Hellström <[email protected]>

show more ...


# 906c4b30 28-Nov-2024 John Harrison <[email protected]>

drm/xe: Add mutex locking to devcoredump

There are now multiple places that can trigger a coredump. Some of
which can happen in parallel. There is already a check against
capturing multiple dumps se

drm/xe: Add mutex locking to devcoredump

There are now multiple places that can trigger a coredump. Some of
which can happen in parallel. There is already a check against
capturing multiple dumps sequentially, but without locking it doesn't
guarantee to work against concurrent dumps. And if two dumps do happen
in parallel, they can end up doing Bad Things such as one call stack
freeing the data the other call stack is still processing. Which leads
to a crashed kernel.

Further, it is possible for the DRM timeout to expire and trigger a
free of the capture while a user is still reading that capture out
through sysfs. Again leading to dodgy pointer problems.

So, add a mutext lock around the capture, read and free functions to
prevent inteference.

v2: Swap tiny scope spin_lock for larger scope mutex and fix
kernel-doc comment (review feedback from Matthew Brost)
v3: Move mutex locks to exclude worker thread and add reclaim
annotation (review feedback from Matthew Brost)
v4: Fix typo.

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# 90f51a7f 28-Nov-2024 John Harrison <[email protected]>

drm/xe: Move the coredump registration to the worker thread

Adding lockdep checking to the coredump code showed that there was an
existing violation. The dev_coredumpm_timeout() call is used to
regi

drm/xe: Move the coredump registration to the worker thread

Adding lockdep checking to the coredump code showed that there was an
existing violation. The dev_coredumpm_timeout() call is used to
register the dump with the base coredump subsystem. However, that
makes multiple memory allocations, only some of which use the GFP_
flags passed in. So that also needs to be deferred to the worker
function where it is safe to allocate with arbitrary flags.

In order to not add protoypes for the callback functions, moving the
_timeout call also means moving the worker thread function to later in
the file.

v2: Rebased after other changes to the worker function.

Fixes: e799485044cb ("drm/xe: Introduce the dev_coredump infrastructure.")
Cc: Thomas Hellström <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Francois Dugast <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: "Thomas Hellström" <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # v6.8+
Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# 429915ac 28-Nov-2024 John Harrison <[email protected]>

drm/xe: Add a reason string to the devcoredump

There are debug level prints giving more information about the cause
of the hang immediately before core dumps are created. However, not
everyone has d

drm/xe: Add a reason string to the devcoredump

There are debug level prints giving more information about the cause
of the hang immediately before core dumps are created. However, not
everyone has debug level prints enabled or saves the dmesg log at all.
So include that information in the dump file itself. Also, at least
one of those prints included the pid as well as the process name. So
include that in the capture too.

v2: Fix kvfree vs kfree and missing kernel-doc (review feedback from
Matthew Brost)
v3: Use GFP_ATOMIC instead of GFP_KERNEL.

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# aef0b4a0 26-Nov-2024 Matthew Brost <[email protected]>

drm/xe: Take PM ref in delayed snapshot capture worker

The delayed snapshot capture worker can access the GPU or VRAM both of
which require a PM reference. Take a reference in this worker.

Cc: Rodr

drm/xe: Take PM ref in delayed snapshot capture worker

The delayed snapshot capture worker can access the GPU or VRAM both of
which require a PM reference. Take a reference in this worker.

Cc: Rodrigo Vivi <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Fixes: 4f04d07c0a94 ("drm/xe: Faster devcoredump")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 1c6878af115a4586a40d6c14d530fa9f93e0bd83)
Signed-off-by: Thomas Hellström <[email protected]>

show more ...


# 1c6878af 26-Nov-2024 Matthew Brost <[email protected]>

drm/xe: Take PM ref in delayed snapshot capture worker

The delayed snapshot capture worker can access the GPU or VRAM both of
which require a PM reference. Take a reference in this worker.

Cc: Rodr

drm/xe: Take PM ref in delayed snapshot capture worker

The delayed snapshot capture worker can access the GPU or VRAM both of
which require a PM reference. Take a reference in this worker.

Cc: Rodrigo Vivi <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Fixes: 4f04d07c0a94 ("drm/xe: Faster devcoredump")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


Revision tags: v6.12
# a54b0de7 14-Nov-2024 Matthew Brost <[email protected]>

drm/xe: Change xe_engine_snapshot_capture_for_job to be for queue

During capture time, the target job may be unavailable (e.g., if it's in
LR mode). However, the associated exec queue will be availa

drm/xe: Change xe_engine_snapshot_capture_for_job to be for queue

During capture time, the target job may be unavailable (e.g., if it's in
LR mode). However, the associated exec queue will be available
regardless, change xe_engine_snapshot_capture_for_job to take a queue
argument ann rename to xe_engine_snapshot_capture_for_queue.

v2:
- Reword commit message (Jonathan)
- Remove redundant queueu check (Zhanjun)
- Remove devcoredump job member (Zhanjun)

Cc: Zhanjun Dong <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# f62e6edf 14-Nov-2024 Matthew Brost <[email protected]>

drm/xe: Add exec queue param to devcoredump

During capture time, the target job may be unavailable (e.g., if it's in
LR mode). However, the associated exec queue will be available
regardless, so add

drm/xe: Add exec queue param to devcoredump

During capture time, the target job may be unavailable (e.g., if it's in
LR mode). However, the associated exec queue will be available
regardless, so add an exec queue param for such cases.

v2:
- Reword commit message (Jonathan)

Cc: Zhanjun Dong <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


Revision tags: v6.12-rc7, v6.12-rc6
# aa06cb83 02-Nov-2024 Lucas De Marchi <[email protected]>

drm/xe: Improve devcoredump documentation

Let the introduction be useful for both userspace and kernel. Also
improve the formatting to wire up later to the documentation build.

Reviewed-by: Raag Ja

drm/xe: Improve devcoredump documentation

Let the introduction be useful for both userspace and kernel. Also
improve the formatting to wire up later to the documentation build.

Reviewed-by: Raag Jadav <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

show more ...


Revision tags: v6.12-rc5
# db38fdb7 24-Oct-2024 John Harrison <[email protected]>

drm/xe/guc: Separate full CTB content from guc_info debugfs

The guc_info debugfs file is meant to be a quick view of the current
software state of the GuC interface. Including the full CTB contents

drm/xe/guc: Separate full CTB content from guc_info debugfs

The guc_info debugfs file is meant to be a quick view of the current
software state of the GuC interface. Including the full CTB contents
makes the file as a whole much less human readable and is not
partiular useful in the general case. So don't pollute the info dump
with the full buffers. Instead, move those into a separate debugfs
entry that can be read when that information is actually required.

Also, improve the human readability by adding a few extra blank lines
to delimt the sections.

v2: Hide the internal capture/print params from external callers that
don't need to know (review feedback from Matthew Brost).

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


Revision tags: v6.12-rc4
# 9ffd6ec2 14-Oct-2024 Himal Prasad Ghimiray <[email protected]>

drm/xe/devcoredump: Update handling of xe_force_wake_get return

xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will

drm/xe/devcoredump: Update handling of xe_force_wake_get return

xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Use helper xe_force_wake_ref_has_domain to
verify all domains are initialized or not. Update the return handling of
xe_force_wake_get() to reflect this behavior, and ensure that the return
value is passed as input to xe_force_wake_put().

v3
- return xe_wakeref_t instead of int in xe_force_wake_get()

v5
- return unsigned int for xe_force_wake_get()

v6
- use helper xe_force_wake_ref_has_domain()

v7
- Fix commit message

Cc: Rodrigo Vivi <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Reviewed-by: Badal Nilawar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

show more ...


Revision tags: v6.12-rc3, v6.12-rc2
# 0f1fdf55 04-Oct-2024 Zhanjun Dong <[email protected]>

drm/xe/guc: Save manual engine capture into capture list

Save manual engine capture into capture list.
This removes duplicate register definitions across manual-capture vs
guc-err-capture.

Signed-o

drm/xe/guc: Save manual engine capture into capture list

Save manual engine capture into capture list.
This removes duplicate register definitions across manual-capture vs
guc-err-capture.

Signed-off-by: Zhanjun Dong <[email protected]>
Reviewed-by: Alan Previn <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# ecb63364 04-Oct-2024 Zhanjun Dong <[email protected]>

drm/xe/guc: Plumb GuC-capture into dev coredump

When we decide to kill a job, (from guc_exec_queue_timedout_job), we could
end up with 4 possible scenarios at this starting point of this decision:
1

drm/xe/guc: Plumb GuC-capture into dev coredump

When we decide to kill a job, (from guc_exec_queue_timedout_job), we could
end up with 4 possible scenarios at this starting point of this decision:
1. the guc-captured register-dump is already there.
2. the driver is wedged.mode > 1, so GuC-engine-reset / GuC-err-capture
will not happen.
3. the user has started the driver in execlist-submission mode.
4. the guc-captured register-dump is not ready yet so we force GuC to kill
that context now, but:
A. we don't know yet if GuC will be successful on the engine-reset
and get the guc-err-capture, else kmd will do a manual reset later
OR B. guc will be successful and we will get a guc-err-capture
shortly.

So to accomdate the scenarios of 2 and 4A, we will need to do a manual KMD
capture first(which is not be reliable in guc-submission mode) and decide
later if we need to use that for the cases of 2 or 4A. So this flow is
part of the implementation for this patch.

Provide xe_guc_capture_get_reg_desc_list to get the register dscriptor
list.
Add manual capture by read from hw engine if GuC capture is not ready.
If it becomes ready at later time, GuC sourced data will be used.

Although there may only be a small delay between (1) the check for whether
guc-err-capture is available at the start of guc_exec_queue_timedout_job
and (2) the decision on using a valid guc-err-capture or manual-capture,
lets not take any chances and lock the matching node down so it doesn't
get re-claimed if GuC-Err-Capture subsystem is running out of pre-cached
nodes.

Signed-off-by: Zhanjun Dong <[email protected]>
Reviewed-by: Alan Previn <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# 8b7dfb98 03-Oct-2024 John Harrison <[email protected]>

drm/xe/guc: Add GuC log to devcoredump captures

Include the GuC log in devcoredump captures because they can be useful
with debugging certain types of bug.

v2: Fix kerneldoc
v3: Drop module paramet

drm/xe/guc: Add GuC log to devcoredump captures

Include the GuC log in devcoredump captures because they can be useful
with debugging certain types of bug.

v2: Fix kerneldoc
v3: Drop module parameter as now using more compact ascii85 encoding
rather than hexdump (although still not compressed) (review feedback
from Matthew B). Rebase onto recent refactoring of devcoredump code.
v4: Don't move the submission snapshot inside the GuC internals
structure 'cos it really doesn't belong there.

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# ec1455ce 03-Oct-2024 John Harrison <[email protected]>

drm/xe/devcoredump: Add ASCII85 dump helper function

There is a need to include the GuC log and other large binary objects
in core dumps and via dmesg. So add a helper for dumping to a printer
funct

drm/xe/devcoredump: Add ASCII85 dump helper function

There is a need to include the GuC log and other large binary objects
in core dumps and via dmesg. So add a helper for dumping to a printer
function via conversion to ASCII85 encoding.

Another issue with dumping such a large buffer is that it can be slow,
especially if dumping to dmesg over a serial port. So add a yield to
prevent the 'task has been stuck for 120s' kernel hang check feature
from firing.

v2: Add a prefix to the output string. Fix memory allocation bug.
v3: Correct a string size calculation and clean up a define (review
feedback from Julia F).

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


# c28fd6c3 03-Oct-2024 John Harrison <[email protected]>

drm/xe/devcoredump: Improve section headings and add tile info

The xe_guc_exec_queue_snapshot is not really a GuC internal thing and
is definitely not a GuC CT thing. So give it its own section head

drm/xe/devcoredump: Improve section headings and add tile info

The xe_guc_exec_queue_snapshot is not really a GuC internal thing and
is definitely not a GuC CT thing. So give it its own section heading.
The snapshot itself is really a capture of the submission backend's
internal state. Although all it currently prints out is the submission
contexts. So label it as 'Contexts'. If more general state is added
later then it could be change to 'Submission backend' or some such.

Further, everything from the GuC CT section onwards is GT specific but
there was no indication of which GT it was related to (and that is
impossible to work out from the other fields that are given). So add a
GT section heading. Also include the tile id of the GT, because again
significant information.

Lastly, drop a couple of unnecessary line feeds within sections.

v2: Add GT section heading, add tile id to device section.

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

show more ...


123