|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
f81cd793 |
| 10-Mar-2025 |
Shaoyun Liu <[email protected]> |
drm/amd/amdgpu: Fix MES init sequence
When MES is been used , the set_hw_resource_1 API is required to initialize MES internal context correctly
Signed-off-by: Shaoyun Liu <[email protected]> Ack
drm/amd/amdgpu: Fix MES init sequence
When MES is been used , the set_hw_resource_1 API is required to initialize MES internal context correctly
Signed-off-by: Shaoyun Liu <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5 |
|
| #
a91d91b6 |
| 26-Feb-2025 |
Tony Yi <[email protected]> |
drm/amdgpu: Add support for CPERs on virtualization
Add support for CPERs on VFs.
VFs do not receive PMFW messages directly; as such, they need to query them from the host. To avoid hitting host ev
drm/amdgpu: Add support for CPERs on virtualization
Add support for CPERs on VFs.
VFs do not receive PMFW messages directly; as such, they need to query them from the host. To avoid hitting host event guard, CPER queries need to be rate limited. CPER queries share the same RAS telemetry buffer as error count query, so a mutex protecting the shared buffer was added as well.
For readability, the amdgpu_detect_virtualization was refactored into multiple individual functions.
Signed-off-by: Tony Yi <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3 |
|
| #
dc0297f3 |
| 14-Feb-2025 |
Srinivasan Shanmugam <[email protected]> |
drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV
RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtuali
drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV
RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.
The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb.
The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock).
Fixes the below:
[ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK>
v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).
Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <[email protected]> Cc: Jingwen Chen <[email protected]> Cc: Victor Skvortsov <[email protected]> Cc: Zhigang Luo <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc2, v6.14-rc1 |
|
| #
04893397 |
| 21-Jan-2025 |
Victor Skvortsov <[email protected]> |
drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks
VFs are not able to query error counts for all RAS blocks. Rather than returning error for queries on these blocks, skip sysfs
drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks
VFs are not able to query error counts for all RAS blocks. Rather than returning error for queries on these blocks, skip sysfs the creation all together.
Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
a21ab06b |
| 17-Dec-2024 |
Mirsad Todorovac <[email protected]> |
drm/admgpu: replace kmalloc() and memcpy() with kmemdup()
The static analyser tool gave the following advice:
./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup
drm/admgpu: replace kmalloc() and memcpy() with kmemdup()
The static analyser tool gave the following advice:
./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup
→ 1266 tmp = kmalloc(used_size, GFP_KERNEL); 1267 if (!tmp) 1268 return -ENOMEM; 1269 → 1270 memcpy(tmp, &host_telemetry->body.error_count, used_size);
Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics. Original code works without fault, so this is not a bug fix but proposed improvement.
Link: https://lwn.net/Articles/198928/ Fixes: 84a2947ecc85 ("drm/amdgpu: Implement virt req_ras_err_count") Cc: Alex Deucher <[email protected]> Cc: "Christian König" <[email protected]> Cc: Xinhui Pan <[email protected]> Cc: David Airlie <[email protected]> Cc: Simona Vetter <[email protected]> Cc: Zhigang Luo <[email protected]> Cc: Victor Skvortsov <[email protected]> Cc: Hawking Zhang <[email protected]> Cc: Lijo Lazar <[email protected]> Cc: Yunxiang Li <[email protected]> Cc: Jack Xiao <[email protected]> Cc: Vignesh Chander <[email protected]> Cc: Danijel Slivka <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Mirsad Todorovac <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
84a2947e |
| 30-Oct-2024 |
Victor Skvortsov <[email protected]> |
drm/amdgpu: Implement virt req_ras_err_count
Enable RAS late init if VF RAS Telemetry is supported.
When enabled, the VF can use this interface to query total RAS error counts from the host.
The
drm/amdgpu: Implement virt req_ras_err_count
Enable RAS late init if VF RAS Telemetry is supported.
When enabled, the VF can use this interface to query total RAS error counts from the host.
The VF FB access may abruptly end due to a fatal error, therefore the VF must cache and sanitize the input.
The Host allows 15 Telemetry messages every 60 seconds, afterwhich the host will ignore any more in-coming telemetry messages. The VF will rate limit its msg calling to once every 5 seconds (12 times in 60 seconds). While the VF is rate limited, it will continue to report the last good cached data.
v2: Flip generate report & update statistics order for VF
Signed-off-by: Victor Skvortsov <[email protected]> Acked-by: Tao Zhou <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
907fec2d |
| 30-Oct-2024 |
Victor Skvortsov <[email protected]> |
drm/amdgpu: VF Query RAS Caps from Host if supported
If VF RAS Capability support is enabled, guest is able to retrieve the real RAS support from the host.
Signed-off-by: Victor Skvortsov <victor.s
drm/amdgpu: VF Query RAS Caps from Host if supported
If VF RAS Capability support is enabled, guest is able to retrieve the real RAS support from the host.
Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3 |
|
| #
2029b3d7 |
| 07-Aug-2024 |
Jack Xiao <[email protected]> |
drm/amdgpu/mes: add multiple mes ring instances support
Add multiple mes ring instances in mes structure to support multiple mes pipes.
Signed-off-by: Jack Xiao <[email protected]> Acked-by: Alex D
drm/amdgpu/mes: add multiple mes ring instances support
Add multiple mes ring instances in mes structure to support multiple mes pipes.
Signed-off-by: Jack Xiao <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit c7d4355648ffa02a1551495b05c71ea6c884d29c)
show more ...
|
| #
f83cec3b |
| 08-Aug-2024 |
Victor Skvortsov <[email protected]> |
drm/amdgpu: Disable dpm_enabled flag while VF is in reset
VFs do not perform HW fini/suspend in FLR, so the dpm_enabled is incorrectly kept enabled. Add interface to disable it in virt_pre_reset cal
drm/amdgpu: Disable dpm_enabled flag while VF is in reset
VFs do not perform HW fini/suspend in FLR, so the dpm_enabled is incorrectly kept enabled. Add interface to disable it in virt_pre_reset call.
v2: Made implementation generic for all asics v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF
Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
c7d43556 |
| 07-Aug-2024 |
Jack Xiao <[email protected]> |
drm/amdgpu/mes: add multiple mes ring instances support
Add multiple mes ring instances in mes structure to support multiple mes pipes.
Signed-off-by: Jack Xiao <[email protected]> Acked-by: Alex D
drm/amdgpu/mes: add multiple mes ring instances support
Add multiple mes ring instances in mes structure to support multiple mes pipes.
Signed-off-by: Jack Xiao <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6 |
|
| #
33f23fc3 |
| 27-Jun-2024 |
Yifan Zha <[email protected]> |
drm/amdgpu: Set no_hw_access when VF request full GPU fails
[Why] If VF request full GPU access and the request failed, the VF driver can get stuck accessing registers for an extended period during
drm/amdgpu: Set no_hw_access when VF request full GPU fails
[Why] If VF request full GPU access and the request failed, the VF driver can get stuck accessing registers for an extended period during the unload of KMS.
[How] Set no_hw_access flag when VF request for full GPU access fails This prevents further hardware access attempts, avoiding the prolonged stuck state.
Signed-off-by: Yifan Zha <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
cbda2758 |
| 24-Jun-2024 |
Vignesh Chander <[email protected]> |
drm/amdgpu: process RAS fatal error MB notification
For RAS error scenario, VF guest driver will check mailbox and set fed flag to avoid unnecessary HW accesses. additionally, poll for reset complet
drm/amdgpu: process RAS fatal error MB notification
For RAS error scenario, VF guest driver will check mailbox and set fed flag to avoid unnecessary HW accesses. additionally, poll for reset completion message first to avoid accidentally spamming multiple reset requests to host.
v2: add another mailbox check for handling case where kfd detects timeout first
v3: set host_flr bit and use wait_for_reset
Signed-off-by: Vignesh Chander <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1 |
|
| #
5c0a1cdd |
| 24-May-2024 |
Yunxiang Li <[email protected]> |
drm/amdgpu: fix sriov host flr handler
We send back the ready to reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen.
In the current st
drm/amdgpu: fix sriov host flr handler
We send back the ready to reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen.
In the current state since we take tens of seconds to stop everything, it is very likely that host would give up waiting and reset the GPU before we send ready, so it would be the same as before. But this gets rid of the hack with reset_domain locking and also let us tell how slow ready to reset actually is from the host. The ready to reset speed can be improved later.
Signed-off-by: Yunxiang Li <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
b3948ad1 |
| 24-May-2024 |
Yunxiang Li <[email protected]> |
drm/amdgpu: add skip_hw_access checks for sriov
Accessing registers via host is missing the check for skip_hw_access and the lockdep check that comes with it.
Signed-off-by: Yunxiang Li <Yunxiang.L
drm/amdgpu: add skip_hw_access checks for sriov
Accessing registers via host is missing the check for skip_hw_access and the lockdep check that comes with it.
Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
c8ad1bbb |
| 31-May-2024 |
Lin.Cao <[email protected]> |
drm/amdgpu: fix failure mapping legacy queue when FLR
Flag "mes.ring.shced.ready" will be set as true after mes hw init and set as false when mes hw fini to avoid duplicate initialization. But hw fi
drm/amdgpu: fix failure mapping legacy queue when FLR
Flag "mes.ring.shced.ready" will be set as true after mes hw init and set as false when mes hw fini to avoid duplicate initialization. But hw fini will not be called when function level reset, which will cause mes hw init be skipped during FLR, which will leads to mapping legacy queue fail. Set this flag as false when post reset will fix this issue.
Signed-off-by: Lin.Cao <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
e864180e |
| 27-May-2024 |
Victor Skvortsov <[email protected]> |
drm/amdgpu: Add lock around VF RLCG interface
flush_gpu_tlb may be called from another thread while device_gpu_recover is running.
Both of these threads access registers through the VF RLCG interfa
drm/amdgpu: Add lock around VF RLCG interface
flush_gpu_tlb may be called from another thread while device_gpu_recover is running.
Both of these threads access registers through the VF RLCG interface during VF Full Access. Add a lock around this interface to prevent race conditions between these threads.
Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
5434bc03 |
| 19-May-2024 |
Victor Skvortsov <[email protected]> |
drm/amdgpu: Queue KFD reset workitem in VF FED
The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery se
drm/amdgpu: Queue KFD reset workitem in VF FED
The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery sequence is out of order when PF/VF communication breaks due to a GPU fatal error
As a temporary work around, perform a KFD style reset (Initiate reset request from the guest) inside the pf2vf thread on FED.
Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.9, v6.9-rc7, v6.9-rc6 |
|
| #
f4322b9f |
| 22-Apr-2024 |
Yunxiang Li <[email protected]> |
drm/amdgpu: Fix two reset triggered in a row
Some times a hang GPU causes multiple reset sources to schedule resets. The second source will be able to trigger an unnecessary reset if they schedule a
drm/amdgpu: Fix two reset triggered in a row
Some times a hang GPU causes multiple reset sources to schedule resets. The second source will be able to trigger an unnecessary reset if they schedule after we call amdgpu_device_stop_pending_resets.
Move amdgpu_device_stop_pending_resets to after the reset is done. Since at this point the GPU is supposedly in a good state, any reset scheduled after this point would be a legitimate reset.
Remove unnecessary and incorrect checks for amdgpu_in_reset that was kinda serving this purpose.
Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc5 |
|
| #
3bcc0ee1 |
| 16-Apr-2024 |
Zhigang Luo <[email protected]> |
drm/amdgpu: avoid reading vf2pf info size from FB
VF can't access FB when host is doing mode1 reset. Using sizeof to get vf2pf info size, instead of reading it from vf2pf header stored in FB.
Signe
drm/amdgpu: avoid reading vf2pf info size from FB
VF can't access FB when host is doing mode1 reset. Using sizeof to get vf2pf info size, instead of reading it from vf2pf header stored in FB.
Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
0fa4c25d |
| 26-Apr-2024 |
Tim Huang <[email protected]> |
drm/amdgpu: fix uninitialized scalar variable warning
Clear warning that field bp is uninitialized when calling amdgpu_virt_ras_add_bps.
Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Ya
drm/amdgpu: fix uninitialized scalar variable warning
Clear warning that field bp is uninitialized when calling amdgpu_virt_ras_add_bps.
Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
d1999b40 |
| 20-Mar-2024 |
Zhigang Luo <[email protected]> |
amd/amdgpu: improve VF recover time
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatel error detected flag.
Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Lijo
amd/amdgpu: improve VF recover time
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatel error detected flag.
Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
f6ac0842 |
| 26-Mar-2024 |
chongli2 <[email protected]> |
drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov
support MES command SET_HW_RESOURCE1 in sriov
Signed-off-by: chongli2 <[email protected]> Reviewed-by: Jingwen Chen <[email protected]
drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov
support MES command SET_HW_RESOURCE1 in sriov
Signed-off-by: chongli2 <[email protected]> Reviewed-by: Jingwen Chen <[email protected]> Acked-by: Jingwen Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
| #
3e2dacca |
| 27-Mar-2024 |
Danijel Slivka <[email protected]> |
drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards
Apply this rule to all newer asics in sriov case. For asic with VF MMIO access protection avoid using CPU for VM table update
drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards
Apply this rule to all newer asics in sriov case. For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. Moved the check to amdgpu_device_init() to ensure it is done after amdgpu_device_ip_early_init() where the IP versions are discovered.
Signed-off-by: Danijel Slivka <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8, v6.8-rc7 |
|
| #
ab66c832 |
| 29-Feb-2024 |
Zhigang Luo <[email protected]> |
drm/amdgpu: trigger flr_work if reading pf2vf data failed
if reading pf2vf data failed 30 times continuously, it means something is wrong. Need to trigger flr_work to recover the issue.
also use de
drm/amdgpu: trigger flr_work if reading pf2vf data failed
if reading pf2vf data failed 30 times continuously, it means something is wrong. Need to trigger flr_work to recover the issue.
also use dev_err to print the error message to get which device has issue and add warning message if waiting IDH_FLR_NOTIFICATION_CMPL timeout.
Signed-off-by: Zhigang Luo <[email protected]> Acked-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc6, v6.8-rc5 |
|
| #
8093383a |
| 12-Feb-2024 |
Victor Lu <[email protected]> |
drm/amdgpu: Improve error checking in amdgpu_virt_rlcg_reg_rw (v2)
The current error detection only looks for a timeout. This should be changed to also check scratch_reg1 for any errors returned fro
drm/amdgpu: Improve error checking in amdgpu_virt_rlcg_reg_rw (v2)
The current error detection only looks for a timeout. This should be changed to also check scratch_reg1 for any errors returned from RLCG.
v2: remove new error value
Signed-off-by: Victor Lu <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
show more ...
|