|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
| #
9424a5bf |
| 10-Feb-2025 |
Jonathan Kim <[email protected]> |
drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.14-rc2, v6.14-rc1 |
|
| #
cb0de06d |
| 29-Jan-2025 |
Christian König <[email protected]> |
drm/amdgpu: remove all KFD fences from the BO on release
Remove all KFD fences from the private dma_resv object.
This prevents the KFD from being evicted unnecessarily when an exported BO is released.
Signed-off-by: Christian König <[email protected]>
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-and-tested-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
10e08943 |
| 12-Feb-2025 |
Xiaogang Chen <[email protected]> |
drm/amdkfd: Fix pasid value leak
The current kfd driver does not allocate pasid values; instead it uses the pasid value of each vm from the graphics driver. It should therefore not prevent the graphics driver from releasing pasid values, since the values are allocated by the graphics driver and no longer by the kfd driver. With this patch, kfd does not stop the graphics driver from releasing pasid values.
Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Signed-off-by: Xiaogang Chen <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.13 |
|
| #
8544374c |
| 13-Jan-2025 |
Xiaogang Chen <[email protected]> |
drm/amdkfd: Have kfd driver use same PASID values from graphic driver
The current kfd driver has its own PASID value for each kfd process and uses it to locate the vm in the interrupt handler and to map between a kfd process and its vm. That design does not work when a physical gpu device has multiple spatial partitions, e.g. an adev in CPX mode. This patch makes the kfd driver use the same pasid values that the graphics driver generates, one pasid per vm.
These pasid values are passed to fw/hardware. The interrupt handler does not need to change even though more pasid values are in use. Also, pasid values in the log are replaced by the user process pid; pasid values are not exposed to the user. Users see their process pids, which have meaning in user space.
Signed-off-by: Xiaogang Chen <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
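The per-vm PASID lookup described in this commit can be pictured with a toy model. This is only a sketch under stated assumptions: `toy_vm` and `toy_vm_from_pasid` are hypothetical names, the linear search is a stand-in for whatever lookup structure the driver really uses, and nothing here is taken from the amdgpu source.

```c
#include <stddef.h>

/* Hypothetical stand-in for a GPU VM; not a driver structure. */
struct toy_vm {
    unsigned int pasid; /* allocated by the graphics side, one per vm */
    int owner_pid;      /* user-visible process id, used for log output */
};

/* Find the vm that owns a PASID reported by an interrupt; NULL if none. */
struct toy_vm *toy_vm_from_pasid(struct toy_vm *vms, size_t n,
                                 unsigned int pasid)
{
    for (size_t i = 0; i < n; i++)
        if (vms[i].pasid == pasid)
            return &vms[i];
    return NULL;
}
```

In this model one user process can own several vms (for example one per partition), each with its own PASID, while log messages would print `owner_pid` rather than the PASID.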
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12 |
|
| #
1b001432 |
| 13-Nov-2024 |
Philip Yang <[email protected]> |
drm/amdgpu: Optimize gfx v9 GPU page fault handling
After a GPU page fault, many page fault interrupts are generated within a short period, even with the CAM filter enabled, because the fault addresses differ. Each page fault is copied to the KFD ih fifo so that the KFD interrupt worker can send an event to user space; this could overflow the KFD ih fifo while other processes generate events at the same time.
The KFD process is aborted after a GPU page fault, so only one GPU page fault interrupt needs to be sent to the KFD ih fifo to deliver the memory exception event to user space.
Increase the KFD ih fifo size to 2 times the IH primary ring size to handle bursts of events.
This patch handles the gfx v9 path and covers the retry on/off and CAM filter on/off cases.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
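The idea of forwarding only the first fault of an aborted process can be illustrated with a small toy model. This is a sketch, not the driver's implementation: `toy_fifo`, `toy_process`, `toy_forward_fault` and the ring size of 8 are all assumptions made for the example, and the per-process flag is just one plausible way to drop the repeats.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_IH_RING_ENTRIES  8                         /* arbitrary size for the sketch */
#define TOY_KFD_FIFO_ENTRIES (2 * TOY_IH_RING_ENTRIES) /* "2 times" the primary ring */

struct toy_fifo {
    unsigned int entries[TOY_KFD_FIFO_ENTRIES];
    size_t count;
};

struct toy_process {
    unsigned int id;
    bool fault_reported; /* set once the first fault has been queued */
};

/* Returns true if the fault was queued, false if it was dropped. */
bool toy_forward_fault(struct toy_fifo *fifo, struct toy_process *p)
{
    if (p->fault_reported)
        return false; /* repeat fault: one event is enough for an aborted process */
    if (fifo->count == TOY_KFD_FIFO_ENTRIES)
        return false; /* FIFO full: this is the overflow case */
    fifo->entries[fifo->count++] = p->id;
    p->fault_reported = true;
    return true;
}
```

With this filter a burst of faults from one process occupies a single FIFO slot, which leaves room for events from other processes.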
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4 |
|
| #
8fe7cf58 |
| 14-Oct-2024 |
Alex Deucher <[email protected]> |
drm/amdkfd: add an interface to query whether is KFD is active
Add an interface to query whether KFD has any active queues.
v2: fix build issues
Acked-by: Srinivasan Shanmugam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
3eebfd5e |
| 12-Sep-2024 |
Feifei Xu <[email protected]> |
drm/amdkfd:Add kfd function to config sq perfmon
Expose the interface for kfd to config sq perfmon.
Signed-off-by: Feifei Xu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
b05d6476 |
| 19-Aug-2024 |
Hawking Zhang <[email protected]> |
drm/amdgpu: Retire query_utcl2_poison_status callback
The driver switches to the interrupt source id to identify utcl2 poison events. The polling interface is not needed.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
234eebe1 |
| 29-Jul-2024 |
Amber Lin <[email protected]> |
drm/amdkfd: APIs to stop/start KFD scheduling
Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from runlist. If users send ioctls to KFD to create queues, they'll be added but those queues won't be mapped to runlist (so not scheduled) until amdgpu_amdkfd_start_sched is called.
v2: fix build (Alex)
Signed-off-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
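The stop/start behaviour described above can be modelled in a few lines. This is a minimal sketch, assuming a fixed-size queue table; `toy_sched`, `toy_stop_sched`, `toy_start_sched` and `toy_create_queue` are hypothetical names and do not mirror the signatures of amdgpu_amdkfd_stop_sched() or amdgpu_amdkfd_start_sched().

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_QUEUES 16

struct toy_queue {
    bool mapped; /* true while the queue is on the runlist */
};

struct toy_sched {
    bool stopped;
    struct toy_queue queues[TOY_MAX_QUEUES];
    size_t nqueues;
};

/* Stop scheduling: unmap every queue from the runlist. */
void toy_stop_sched(struct toy_sched *s)
{
    s->stopped = true;
    for (size_t i = 0; i < s->nqueues; i++)
        s->queues[i].mapped = false;
}

/* Resume scheduling: map all queues, including ones created while stopped. */
void toy_start_sched(struct toy_sched *s)
{
    s->stopped = false;
    for (size_t i = 0; i < s->nqueues; i++)
        s->queues[i].mapped = true;
}

/* Create a queue; returns its index, or -1 if the table is full. */
int toy_create_queue(struct toy_sched *s)
{
    if (s->nqueues == TOY_MAX_QUEUES)
        return -1;
    /* A queue created while scheduling is stopped is added but not mapped. */
    s->queues[s->nqueues].mapped = !s->stopped;
    return (int)s->nqueues++;
}
```

The point of the model is that queue creation still succeeds while scheduling is stopped; only the mapping to the runlist is deferred until the start call.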
|
|
Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5 |
|
| #
fb910658 |
| 20-Jun-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: Refactor queue wptr_bo GART mapping
Add helper function kfd_queue_acquire_buffers to get the queue wptr_bo reference from the queue write_ptr if it is mapped to the KFD node with the expected size.
Add wptr_bo to structure queue_properties because structure queue is allocated after the queue buffers are validated; the wptr_bo parameter can then be removed from pqm_create_queue.
Rename structure queue wptr_bo_gart to hold the wptr_bo reference for GART mapping and unmapping. Move the MES wptr_bo_gart mapping to init_user_queue, the same location as the queue ctx_bo GART mapping.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
c86ad391 |
| 14-Jul-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
Pass a pointer reference to amdgpu_bo_unref so that it clears the correct pointer. Otherwise amdgpu_bo_unref clears only the local variable and the original pointer is not set to NULL, which could cause a use-after-free bug.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
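The bug class fixed here is a general C pitfall: setting a by-value pointer parameter to NULL only changes the callee's copy. The sketch below uses hypothetical names (`toy_bo`, `toy_unref_by_value`, `toy_unref`) and a plain integer refcount; it illustrates the pattern and is not the amdgpu code.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object; not the driver's BO type. */
struct toy_bo {
    int refcount;
};

/* Buggy shape: only the callee's local copy of the pointer is cleared. */
void toy_unref_by_value(struct toy_bo *bo)
{
    if (bo && --bo->refcount == 0)
        free(bo);
    bo = NULL; /* no effect on the caller's pointer */
}

/* Fixed shape: the caller's pointer itself is set to NULL. */
void toy_unref(struct toy_bo **bo)
{
    if (!bo || !*bo)
        return;
    if (--(*bo)->refcount == 0)
        free(*bo);
    *bo = NULL;
}
```

After `toy_unref(&ptr)` the caller can no longer dereference a stale pointer by accident, and a second call is a harmless no-op. After `toy_unref_by_value(ptr)` the caller still holds the old address even when the object has been freed.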
|
| #
f9e292cb |
| 20-Jun-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: kfd_bo_mapped_dev support partition
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.10-rc4, v6.10-rc3 |
|
| #
dbe2c4c8 |
| 03-Jun-2024 |
Eric Huang <[email protected]> |
drm/amdkfd: add reset cause in gpu pre-reset smi event
The reset cause was requested by a customer as additional info for the gpu reset smi event.
v2: integrate reset sources suggested by Lijo Lazar
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6 |
|
| #
bfa579b3 |
| 22-Apr-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: prepare to handle pasid poison consumption
Prepare to handle pasid poison consumption.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
2fc46e0b |
| 12-Mar-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: make reset method configurable for RAS poison
Each RAS block has a different requirement for gpu reset in poison consumption handling. Add support for mmhub RAS poison consumption handling.
v2: remove the mmhub poison support for kfd int v10.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
d8070c42 |
| 11-Mar-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: support utcl2 RAS poison query for mmhub
Support the query for both gfxhub and mmhub, also replace xcc_id with hub_inst.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.8, v6.8-rc7, v6.8-rc6 |
|
| #
71a8d61e |
| 19-Feb-2024 |
Tao Zhou <[email protected]> |
drm/amdgpu: retire gfx ras query_utcl2_poison_status
Replace it with related interface in gfxhub functions.
v2: replace node id with xcc id. get node id for query_utcl2_poison_status
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
1761d9a6 |
| 28-Feb-2024 |
Eric Huang <[email protected]> |
amd/amdkfd: remove unused parameter
The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev), and adev is also not used in the function amdgpu_amdkfd_map_gtt_bo_to_gart().
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
e1f6746f |
| 22-Feb-2024 |
Lijo Lazar <[email protected]> |
drm/amdkfd: Skip packet submission on fatal error
If fatal error is detected, packet submission won't go through. Return error in such cases. Also, avoid waiting for fence when fatal error is detected.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
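The early-return pattern this commit describes might look roughly like the following toy code. Everything here is an assumption for illustration: `toy_dev`, `toy_submit_packet` and `toy_wait_fence` are hypothetical, and `-EIO` is only a placeholder for whatever error code the driver actually returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; not a driver structure. */
struct toy_dev {
    bool fatal_error;       /* set when an unrecoverable error was detected */
    unsigned int submitted; /* number of packets accepted so far */
};

/* Returns 0 on success, or a negative errno if submission is skipped. */
int toy_submit_packet(struct toy_dev *dev)
{
    if (dev->fatal_error)
        return -EIO; /* assumption: the real error code may differ */
    dev->submitted++;
    return 0;
}

/* Returns 0 once the fence has signalled, or a negative errno. */
int toy_wait_fence(struct toy_dev *dev)
{
    if (dev->fatal_error)
        return -EIO; /* do not block on a fence that may never signal */
    return 0;        /* the toy fence signals immediately */
}
```

Checking the flag in both places keeps a caller from submitting work that cannot complete and from blocking on its completion.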
|
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1 |
|
| #
9c29282e |
| 11-Jan-2024 |
Lang Yu <[email protected]> |
drm/amdkfd: reserve the BO before validating it
Fix a warning.
v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
v3: Lock the BO before accessing ttm->sg to avoid race conditions. (Felix)
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call Trace:
[ 41.708992] <TASK>
[ 41.708996] ? show_regs+0x6c/0x80
[ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709008] ? __warn+0x93/0x190
[ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709024] ? report_bug+0x1f9/0x210
[ 41.709035] ? handle_bug+0x46/0x80
[ 41.709041] ? exc_invalid_op+0x1d/0x80
[ 41.709048] ? asm_exc_invalid_op+0x1f/0x30
[ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709337] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu]
[ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[ 41.709945] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709949] ? tomoyo_file_ioctl+0x20/0x30
[ 41.709959] __x64_sys_ioctl+0x9c/0xd0
[ 41.709967] do_syscall_64+0x3f/0x90
[ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
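The "reserve, then validate, then unreserve" ordering that this fix enforces can be shown with a toy lock. This is a sketch under assumptions: `toy_buf` and the `toy_bo_*` helpers are hypothetical, a boolean flag stands in for the reservation lock, and the `-EPERM` return stands in for the warning seen in the trace above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer object; a flag stands in for the reservation lock. */
struct toy_buf {
    bool reserved;
    bool valid;
};

int toy_bo_reserve(struct toy_buf *bo)
{
    if (bo->reserved)
        return -EBUSY;
    bo->reserved = true;
    return 0;
}

void toy_bo_unreserve(struct toy_buf *bo)
{
    bo->reserved = false;
}

/* Validation requires the caller to hold the reservation. */
int toy_bo_validate(struct toy_buf *bo)
{
    if (!bo->reserved)
        return -EPERM; /* toy analogue of the warning in the trace */
    bo->valid = true;
    return 0;
}

/* Correct ordering: take the reservation, validate, then release it. */
int toy_validate_locked(struct toy_buf *bo)
{
    int ret = toy_bo_reserve(bo);

    if (ret)
        return ret;
    ret = toy_bo_validate(bo);
    toy_bo_unreserve(bo);
    return ret;
}
```

Calling `toy_bo_validate()` directly on an unreserved buffer fails, while `toy_validate_locked()` succeeds and leaves the buffer unreserved again.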
|
| #
db2aad03 |
| 25-Jan-2024 |
Le Ma <[email protected]> |
drm/amdgpu: move the drm client creation behind drm device registration
This patch eliminates the interrupt warning below:
"[drm] Fence fallback timer expired on ring sdma0.0".
An early vm pt clearing job is sent to SDMA before interrupts are enabled. Relocating the drm client creation to after drm_dev_register looks like a more proper flow.
v2: wrap the drm client creation
Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
0c93bd49 |
| 11-Jan-2024 |
Lang Yu <[email protected]> |
drm/amdkfd: reserve the BO before validating it
Fix a warning.
v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
v3: Lock the BO before accessing ttm->sg to avoid race conditions. (Felix)
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call Trace:
[ 41.708992] <TASK>
[ 41.708996] ? show_regs+0x6c/0x80
[ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709008] ? __warn+0x93/0x190
[ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709024] ? report_bug+0x1f9/0x210
[ 41.709035] ? handle_bug+0x46/0x80
[ 41.709041] ? exc_invalid_op+0x1d/0x80
[ 41.709048] ? asm_exc_invalid_op+0x1f/0x30
[ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709337] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu]
[ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[ 41.709945] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709949] ? tomoyo_file_ioctl+0x20/0x30
[ 41.709959] __x64_sys_ioctl+0x9c/0xd0
[ 41.709967] do_syscall_64+0x3f/0x90
[ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
c0125b84 |
| 25-Jan-2024 |
Le Ma <[email protected]> |
drm/amdgpu: move the drm client creation behind drm device registration
This patch eliminates the interrupt warning below:
"[drm] Fence fallback timer expired on ring sdma0.0".
An early vm pt clearing job is sent to SDMA before interrupts are enabled. Relocating the drm client creation to after drm_dev_register looks like a more proper flow.
v2: wrap the drm client creation
Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
| #
ed1e1e42 |
| 23-Jan-2024 |
YiPeng Chai <[email protected]> |
drm/amdgpu: Support passing poison consumption ras block to SRIOV
Support passing poison consumption ras blocks to SRIOV.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.7 |
|
| #
50661eb1 |
| 03-Jan-2024 |
Felix Kuehling <[email protected]> |
drm/amdgpu: Auto-validate DMABuf imports in compute VMs
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them.
This patch automatically validates and fences dynamic DMABuf imports when they are added to a compute VM. Revalidation after evictions is handled in the VM code.
v2:
* Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate
* Eliminated evicted_user state, use evicted state for VM BOs and user BOs
* Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs
* Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock
* Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD
v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the reservation with its fences is shared with the export, as long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker.
v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|