History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c (Results 1 – 25 of 156)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# f81cd793 10-Mar-2025 Shaoyun Liu <[email protected]>

drm/amd/amdgpu: Fix MES init sequence

When MES is been used , the set_hw_resource_1 API is required to
initialize MES internal context correctly

Signed-off-by: Shaoyun Liu <[email protected]>
Ack

drm/amd/amdgpu: Fix MES init sequence

When MES is been used , the set_hw_resource_1 API is required to
initialize MES internal context correctly

Signed-off-by: Shaoyun Liu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Srinivasan Shanmugam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.14-rc6, v6.14-rc5
# a91d91b6 26-Feb-2025 Tony Yi <[email protected]>

drm/amdgpu: Add support for CPERs on virtualization

Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host ev

drm/amdgpu: Add support for CPERs on virtualization

Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.

For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.

Signed-off-by: Tony Yi <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.14-rc4, v6.14-rc3
# dc0297f3 14-Feb-2025 Srinivasan Shanmugam <[email protected]>

drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtuali

drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[ 253.013423] =============================
[ 253.013434] [ BUG: Invalid wait context ]
[ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[ 253.013464] -----------------------------
[ 253.013475] kworker/0:1/10 is trying to lock:
[ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.013815] other info that might help us debug this:
[ 253.013827] context-{4:4}
[ 253.013835] 3 locks held by kworker/0:1/10:
[ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[ 253.014154] stack backtrace:
[ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14
[ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[ 253.014224] Workqueue: events work_for_cpu_fn
[ 253.014241] Call Trace:
[ 253.014250] <TASK>
[ 253.014260] dump_stack_lvl+0x9b/0xf0
[ 253.014275] dump_stack+0x10/0x20
[ 253.014287] __lock_acquire+0xa47/0x2810
[ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.014321] lock_acquire+0xd1/0x300
[ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.014562] ? __lock_acquire+0xa6b/0x2810
[ 253.014578] __mutex_lock+0x85/0xe20
[ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.014782] ? sched_clock_noinstr+0x9/0x10
[ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.014808] ? local_clock_noinstr+0xe/0xc0
[ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.015029] mutex_lock_nested+0x1b/0x30
[ 253.015044] ? mutex_lock_nested+0x1b/0x30
[ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu]
[ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[ 253.017493] local_pci_probe+0x4b/0xb0
[ 253.017746] work_for_cpu_fn+0x1a/0x30
[ 253.017995] process_one_work+0x21e/0x680
[ 253.018248] worker_thread+0x190/0x330
[ 253.018500] ? __pfx_worker_thread+0x10/0x10
[ 253.018746] kthread+0xe7/0x120
[ 253.018988] ? __pfx_kthread+0x10/0x10
[ 253.019231] ret_from_fork+0x3c/0x60
[ 253.019468] ? __pfx_kthread+0x10/0x10
[ 253.019701] ret_from_fork_asm+0x1a/0x30
[ 253.019939] </TASK>

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao <[email protected]>
Cc: Jingwen Chen <[email protected]>
Cc: Victor Skvortsov <[email protected]>
Cc: Zhigang Luo <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.14-rc2, v6.14-rc1
# 04893397 21-Jan-2025 Victor Skvortsov <[email protected]>

drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks

VFs are not able to query error counts for all RAS blocks. Rather than
returning error for queries on these blocks, skip sysfs

drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks

VFs are not able to query error counts for all RAS blocks. Rather than
returning error for queries on these blocks, skip sysfs the creation
all together.

Signed-off-by: Victor Skvortsov <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4
# a21ab06b 17-Dec-2024 Mirsad Todorovac <[email protected]>

drm/admgpu: replace kmalloc() and memcpy() with kmemdup()

The static analyser tool gave the following advice:

./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup

drm/admgpu: replace kmalloc() and memcpy() with kmemdup()

The static analyser tool gave the following advice:

./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup

→ 1266 tmp = kmalloc(used_size, GFP_KERNEL);
1267 if (!tmp)
1268 return -ENOMEM;
1269
→ 1270 memcpy(tmp, &host_telemetry->body.error_count, used_size);

Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics.
Original code works without fault, so this is not a bug fix but proposed improvement.

Link: https://lwn.net/Articles/198928/
Fixes: 84a2947ecc85 ("drm/amdgpu: Implement virt req_ras_err_count")
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Xinhui Pan <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Simona Vetter <[email protected]>
Cc: Zhigang Luo <[email protected]>
Cc: Victor Skvortsov <[email protected]>
Cc: Hawking Zhang <[email protected]>
Cc: Lijo Lazar <[email protected]>
Cc: Yunxiang Li <[email protected]>
Cc: Jack Xiao <[email protected]>
Cc: Vignesh Chander <[email protected]>
Cc: Danijel Slivka <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Mirsad Todorovac <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6
# 84a2947e 30-Oct-2024 Victor Skvortsov <[email protected]>

drm/amdgpu: Implement virt req_ras_err_count

Enable RAS late init if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The

drm/amdgpu: Implement virt req_ras_err_count

Enable RAS late init if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.

The Host allows 15 Telemetry messages every 60 seconds, afterwhich
the host will ignore any more in-coming telemetry messages. The VF will
rate limit its msg calling to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.

v2: Flip generate report & update statistics order for VF

Signed-off-by: Victor Skvortsov <[email protected]>
Acked-by: Tao Zhou <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 907fec2d 30-Oct-2024 Victor Skvortsov <[email protected]>

drm/amdgpu: VF Query RAS Caps from Host if supported

If VF RAS Capability support is enabled, guest is able to
retrieve the real RAS support from the host.

Signed-off-by: Victor Skvortsov <victor.s

drm/amdgpu: VF Query RAS Caps from Host if supported

If VF RAS Capability support is enabled, guest is able to
retrieve the real RAS support from the host.

Signed-off-by: Victor Skvortsov <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3
# 2029b3d7 07-Aug-2024 Jack Xiao <[email protected]>

drm/amdgpu/mes: add multiple mes ring instances support

Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex D

drm/amdgpu/mes: add multiple mes ring instances support

Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit c7d4355648ffa02a1551495b05c71ea6c884d29c)

show more ...


# f83cec3b 08-Aug-2024 Victor Skvortsov <[email protected]>

drm/amdgpu: Disable dpm_enabled flag while VF is in reset

VFs do not perform HW fini/suspend in FLR, so the dpm_enabled
is incorrectly kept enabled. Add interface to disable it in
virt_pre_reset cal

drm/amdgpu: Disable dpm_enabled flag while VF is in reset

VFs do not perform HW fini/suspend in FLR, so the dpm_enabled
is incorrectly kept enabled. Add interface to disable it in
virt_pre_reset call.

v2: Made implementation generic for all asics
v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF

Signed-off-by: Victor Skvortsov <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# c7d43556 07-Aug-2024 Jack Xiao <[email protected]>

drm/amdgpu/mes: add multiple mes ring instances support

Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex D

drm/amdgpu/mes: add multiple mes ring instances support

Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6
# 33f23fc3 27-Jun-2024 Yifan Zha <[email protected]>

drm/amdgpu: Set no_hw_access when VF request full GPU fails

[Why]
If VF request full GPU access and the request failed,
the VF driver can get stuck accessing registers for an extended period during

drm/amdgpu: Set no_hw_access when VF request full GPU fails

[Why]
If VF request full GPU access and the request failed,
the VF driver can get stuck accessing registers for an extended period during
the unload of KMS.

[How]
Set no_hw_access flag when VF request for full GPU access fails
This prevents further hardware access attempts, avoiding the prolonged
stuck state.

Signed-off-by: Yifan Zha <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# cbda2758 24-Jun-2024 Vignesh Chander <[email protected]>

drm/amdgpu: process RAS fatal error MB notification

For RAS error scenario, VF guest driver will check mailbox
and set fed flag to avoid unnecessary HW accesses.
additionally, poll for reset complet

drm/amdgpu: process RAS fatal error MB notification

For RAS error scenario, VF guest driver will check mailbox
and set fed flag to avoid unnecessary HW accesses.
additionally, poll for reset completion message first
to avoid accidentally spamming multiple reset requests to host.

v2: add another mailbox check for handling case where kfd detects
timeout first

v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1
# 5c0a1cdd 24-May-2024 Yunxiang Li <[email protected]>

drm/amdgpu: fix sriov host flr handler

We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.

In the current st

drm/amdgpu: fix sriov host flr handler

We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.

In the current state since we take tens of seconds to stop everything,
it is very likely that host would give up waiting and reset the GPU
before we send ready, so it would be the same as before. But this gets
rid of the hack with reset_domain locking and also let us tell how slow
ready to reset actually is from the host. The ready to reset speed can
be improved later.

Signed-off-by: Yunxiang Li <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Emily Deng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# b3948ad1 24-May-2024 Yunxiang Li <[email protected]>

drm/amdgpu: add skip_hw_access checks for sriov

Accessing registers via host is missing the check for skip_hw_access and
the lockdep check that comes with it.

Signed-off-by: Yunxiang Li <Yunxiang.L

drm/amdgpu: add skip_hw_access checks for sriov

Accessing registers via host is missing the check for skip_hw_access and
the lockdep check that comes with it.

Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# c8ad1bbb 31-May-2024 Lin.Cao <[email protected]>

drm/amdgpu: fix failure mapping legacy queue when FLR

Flag "mes.ring.shced.ready" will be set as true after mes hw init and set
as false when mes hw fini to avoid duplicate initialization. But hw fi

drm/amdgpu: fix failure mapping legacy queue when FLR

Flag "mes.ring.shced.ready" will be set as true after mes hw init and set
as false when mes hw fini to avoid duplicate initialization. But hw fini
will not be called when function level reset, which will cause mes hw
init be skipped during FLR, which will leads to mapping legacy queue
fail. Set this flag as false when post reset will fix this issue.

Signed-off-by: Lin.Cao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# e864180e 27-May-2024 Victor Skvortsov <[email protected]>

drm/amdgpu: Add lock around VF RLCG interface

flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.

Both of these threads access registers through the VF
RLCG interfa

drm/amdgpu: Add lock around VF RLCG interface

flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.

Both of these threads access registers through the VF
RLCG interface during VF Full Access. Add a lock around this interface
to prevent race conditions between these threads.

Signed-off-by: Victor Skvortsov <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 5434bc03 19-May-2024 Victor Skvortsov <[email protected]>

drm/amdgpu: Queue KFD reset workitem in VF FED

The guest recovery sequence is buggy in Fatal Error when both
FLR & KFD reset workitems are queued at the same time. In addition,
FLR guest recovery se

drm/amdgpu: Queue KFD reset workitem in VF FED

The guest recovery sequence is buggy in Fatal Error when both
FLR & KFD reset workitems are queued at the same time. In addition,
FLR guest recovery sequence is out of order when PF/VF communication
breaks due to a GPU fatal error

As a temporary work around, perform a KFD style reset (Initiate reset
request from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9, v6.9-rc7, v6.9-rc6
# f4322b9f 22-Apr-2024 Yunxiang Li <[email protected]>

drm/amdgpu: Fix two reset triggered in a row

Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule a

drm/amdgpu: Fix two reset triggered in a row

Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule after we call amdgpu_device_stop_pending_resets.

Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.

Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.

Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc5
# 3bcc0ee1 16-Apr-2024 Zhigang Luo <[email protected]>

drm/amdgpu: avoid reading vf2pf info size from FB

VF can't access FB when host is doing mode1 reset. Using sizeof to get
vf2pf info size, instead of reading it from vf2pf header stored in FB.

Signe

drm/amdgpu: avoid reading vf2pf info size from FB

VF can't access FB when host is doing mode1 reset. Using sizeof to get
vf2pf info size, instead of reading it from vf2pf header stored in FB.

Signed-off-by: Zhigang Luo <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 0fa4c25d 26-Apr-2024 Tim Huang <[email protected]>

drm/amdgpu: fix uninitialized scalar variable warning

Clear warning that field bp is uninitialized when
calling amdgpu_virt_ras_add_bps.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Ya

drm/amdgpu: fix uninitialized scalar variable warning

Clear warning that field bp is uninitialized when
calling amdgpu_virt_ras_add_bps.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# d1999b40 20-Mar-2024 Zhigang Luo <[email protected]>

amd/amdgpu: improve VF recover time

1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5.
2. set fatel error detected flag.

Signed-off-by: Zhigang Luo <[email protected]>
Reviewed-by: Lijo

amd/amdgpu: improve VF recover time

1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5.
2. set fatel error detected flag.

Signed-off-by: Zhigang Luo <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# f6ac0842 26-Mar-2024 chongli2 <[email protected]>

drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov

support MES command SET_HW_RESOURCE1 in sriov

Signed-off-by: chongli2 <[email protected]>
Reviewed-by: Jingwen Chen <[email protected]

drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov

support MES command SET_HW_RESOURCE1 in sriov

Signed-off-by: chongli2 <[email protected]>
Reviewed-by: Jingwen Chen <[email protected]>
Acked-by: Jingwen Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 3e2dacca 27-Mar-2024 Danijel Slivka <[email protected]>

drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards

Apply this rule to all newer asics in sriov case.
For asic with VF MMIO access protection avoid using CPU for VM table update

drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards

Apply this rule to all newer asics in sriov case.
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.
Moved the check to amdgpu_device_init() to ensure it is done after
amdgpu_device_ip_early_init() where the IP versions are discovered.

Signed-off-by: Danijel Slivka <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8, v6.8-rc7
# ab66c832 29-Feb-2024 Zhigang Luo <[email protected]>

drm/amdgpu: trigger flr_work if reading pf2vf data failed

if reading pf2vf data failed 30 times continuously, it means something is
wrong. Need to trigger flr_work to recover the issue.

also use de

drm/amdgpu: trigger flr_work if reading pf2vf data failed

if reading pf2vf data failed 30 times continuously, it means something is
wrong. Need to trigger flr_work to recover the issue.

also use dev_err to print the error message to get which device has
issue and add warning message if waiting IDH_FLR_NOTIFICATION_CMPL
timeout.

Signed-off-by: Zhigang Luo <[email protected]>
Acked-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.8-rc6, v6.8-rc5
# 8093383a 12-Feb-2024 Victor Lu <[email protected]>

drm/amdgpu: Improve error checking in amdgpu_virt_rlcg_reg_rw (v2)

The current error detection only looks for a timeout.
This should be changed to also check scratch_reg1 for any errors
returned fro

drm/amdgpu: Improve error checking in amdgpu_virt_rlcg_reg_rw (v2)

The current error detection only looks for a timeout.
This should be changed to also check scratch_reg1 for any errors
returned from RLCG.

v2: remove new error value

Signed-off-by: Victor Lu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


1234567