amdgpu_amdkfd_gpuvm.c - OpenGrok history log for /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13
# 16590745	15-Jan-2025	Christian König <[email protected]>	drm/amdgpu: use GFP_NOWAIT for memory allocations In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish. Finally clean drm/amdgpu: use GFP_NOWAIT for memory allocations In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish. Finally clean that up and mark most memory allocations in the critical path with GFP_NOWAIT. The only exception left is the dma_fence_array() used when no VMID is available, but that will be cleaned up later on. Signed-off-by: Christian König <[email protected]> Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# ca17c8e1	03-Mar-2025	James Zhu <[email protected]>	drm/amdkfd: remove unnecessary cpu domain validation before move to GTT domain. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex drm/amdkfd: remove unnecessary cpu domain validation before move to GTT domain. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# cb0de06d	29-Jan-2025	Christian König <[email protected]>	drm/amdgpu: remove all KFD fences from the BO on release Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. S drm/amdgpu: remove all KFD fences from the BO on release Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by: Christian König <[email protected]> Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-and-tested-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 10e08943	12-Feb-2025	Xiaogang Chen <[email protected]>	drm/amdkfd: Fix pasid value leak Curret kfd does not allocate pasid values, instead uses pasid value for each vm from graphic driver. So should not prevent graphic driver from releasing pasid values drm/amdkfd: Fix pasid value leak Curret kfd does not allocate pasid values, instead uses pasid value for each vm from graphic driver. So should not prevent graphic driver from releasing pasid values since the values are allocated by graphic driver, not kfd driver anymore. This patch does not stop graphic driver release pasid values. Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 8b0d068e	30-Jan-2025	Alex Deucher <[email protected]>	drm/amdkfd: add a new flag to manage where VRAM allocations go On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if drm/amdkfd: add a new flag to manage where VRAM allocations go On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM. No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future. Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 8544374c	13-Jan-2025	Xiaogang Chen <[email protected]>	drm/amdkfd: Have kfd driver use same PASID values from graphic driver Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kf drm/amdkfd: Have kfd driver use same PASID values from graphic driver Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid values that graphic driver generated which is per vm per pasid. These pasid values are passed to fw/hardware. We do not need change interrupt handler though more pasid values are used. Also, pasid values at log are replaced by user process pid; pasid values are not exposed to user. Users see their process pids that have meaning in user space. Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# 357ef5b3	10-Dec-2024	Andrew Martin <[email protected]>	drm/amdgpu: Failed to check various return code Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by: Andrew Martin <[email protected]> Reviewed-by: Harish drm/amdgpu: Failed to check various return code Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by: Andrew Martin <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4
# 10112bf8	17-Oct-2024	Xiaogang Chen <[email protected]>	drm/amdkfd: Not restore userptr buffer if kfd process has been removed When kfd process has been terminated not restore userptr buffer after mmu notifier invalidates a range. Signed-off-by: Xiaogan drm/amdkfd: Not restore userptr buffer if kfd process has been removed When kfd process has been terminated not restore userptr buffer after mmu notifier invalidates a range. Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1
# d7d7b947	27-Sep-2024	Lang Yu <[email protected]>	drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Feli drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <[email protected]> Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 5fa436289483ae56427b0896c31f72361223c758) Cc: [email protected] show more ...
# 5fa43628	27-Sep-2024	Lang Yu <[email protected]>	drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Feli drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <[email protected]> Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# f2be7b39	05-Jun-2024	Christian König <[email protected]>	drm/amdgpu: remove amdgpu_pin_restricted() We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by: Christian König <christian.k drm/amdgpu: remove amdgpu_pin_restricted() We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by: Christian König <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 6c6ca71b	04-Jun-2024	Al Viro <[email protected]>	drm/amdgpu: fix a race in kfd_mem_export_dmabuf() Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it drm/amdgpu: fix a race in kfd_mem_export_dmabuf() Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy; another thread might have modified the descriptor table while we'd been going through that song and dance. Switch kfd_mem_export_dmabuf() to using drm_gem_prime_handle_to_dmabuf() and leave the descriptor table alone... Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Al Viro <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 834368ea	20-Jun-2024	Philip Yang <[email protected]>	drm/amdkfd: Ensure user queue buffers residency Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap BO from the GPU if the bo_va queue_refcount is not zero. Create queue to incre drm/amdkfd: Ensure user queue buffers residency Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap BO from the GPU if the bo_va queue_refcount is not zero. Create queue to increase the bo_va queue_refcount, destroy queue to decrease the bo_va queue_refcount, to ensure the queue buffers mapped on the GPU when queue is active. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# fb910658	20-Jun-2024	Philip Yang <[email protected]>	drm/amdkfd: Refactor queue wptr_bo GART mapping Add helper function kfd_queue_acquire_buffers to get queue wptr_bo reference from queue write_ptr if it is mapped to the KFD node with expected size. drm/amdkfd: Refactor queue wptr_bo GART mapping Add helper function kfd_queue_acquire_buffers to get queue wptr_bo reference from queue write_ptr if it is mapped to the KFD node with expected size. Add wptr_bo to structure queue_properties because structure queue is allocated after queue buffers are validated, then we can remove wptr_bo parameter from pqm_create_queue. Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue, the same location with queue ctx_bo GART mapping. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# f9e292cb	20-Jun-2024	Philip Yang <[email protected]>	drm/amdkfd: kfd_bo_mapped_dev support partition Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint drm/amdkfd: kfd_bo_mapped_dev support partition Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.10-rc2
# 473af28d	28-May-2024	Hawking Zhang <[email protected]>	drm/amdgpu: Estimate RAS reservation when report capacity v2 Add estimate of how much vram we need to reserve for RAS when caculating the total available vram. v2: apply the change to MP0 v13_0_2 a drm/amdgpu: Estimate RAS reservation when report capacity v2 Add estimate of how much vram we need to reserve for RAS when caculating the total available vram. v2: apply the change to MP0 v13_0_2 and v13_0_14 Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.10-rc1
# 7978c4d4	22-May-2024	Alex Deucher <[email protected]>	drm/amdkfd: simplify APU VRAM handling With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_I drm/amdkfd: simplify APU VRAM handling With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# f326d7cc	14-May-2024	Xiaogang Chen <[email protected]>	drm/kfd: Correct pinned buffer handling at kfd restore and validate process This reverts commit 8a774fe912ff ("drm/amdgpu: avoid restore process run into dead loop") since buffer got pinned is not r drm/kfd: Correct pinned buffer handling at kfd restore and validate process This reverts commit 8a774fe912ff ("drm/amdgpu: avoid restore process run into dead loop") since buffer got pinned is not related whether it needs mapping And skip buffer validation at kfd driver if the buffer has been pinned. Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.9, v6.9-rc7
# 9095e554	30-Apr-2024	Philip Yang <[email protected]>	drm/amdkfd: Remove arbitrary timeout for hmm_range_fault On system with khugepaged enabled and user cases with THP buffer, the hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary drm/amdkfd: Remove arbitrary timeout for hmm_range_fault On system with khugepaged enabled and user cases with THP buffer, the hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary timeout value is not accurate, cause memory allocation failure. Remove the arbitrary timeout value, return EAGAIN to application if hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call ioctl again. Change EAGAIN to debug message as this is not error. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.9-rc6
# 89773b85	26-Apr-2024	Lang Yu <[email protected]>	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads m drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.9-rc5, v6.9-rc4
# 2d6f49ee	11-Apr-2024	Lang Yu <[email protected]>	drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.9-rc3
# d53ce023	05-Apr-2024	Philip Yang <[email protected]>	drm/amdkfd: Evict BO itself for contiguous allocation If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM drm/amdkfd: Evict BO itself for contiguous allocation If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 155ce502	05-Apr-2024	Philip Yang <[email protected]>	drm/amdgpu: Support contiguous VRAM allocation RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory fl drm/amdgpu: Support contiguous VRAM allocation RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 1f327dfc	22-May-2024	Alex Deucher <[email protected]>	drm/amdkfd: simplify APU VRAM handling With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_I drm/amdkfd: simplify APU VRAM handling With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# eb853413	26-Apr-2024	Lang Yu <[email protected]>	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads m drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
12 3 4 5 6 7 8 9 10 >>...12