|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13 |
|
| #
16590745 |
| 15-Jan-2025 |
Christian König <[email protected]> |
drm/amdgpu: use GFP_NOWAIT for memory allocations
In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish.
Finally clean that up and mark most memory allocations in the critical path with GFP_NOWAIT. The only exception left is the dma_fence_array() used when no VMID is available, but that will be cleaned up later on.
Signed-off-by: Christian König <[email protected]> Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
ca17c8e1 |
| 03-Mar-2025 |
James Zhu <[email protected]> |
drm/amdkfd: remove unnecessary cpu domain validation
before move to GTT domain.
Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
cb0de06d |
| 29-Jan-2025 |
Christian König <[email protected]> |
drm/amdgpu: remove all KFD fences from the BO on release
Remove all KFD BOs from the private dma_resv object.
This prevents the KFD from being evicted unnecessarily when an exported BO is released.
Signed-off-by: Christian König <[email protected]> Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-and-tested-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
10e08943 |
| 12-Feb-2025 |
Xiaogang Chen <[email protected]> |
drm/amdkfd: Fix pasid value leak
Current kfd does not allocate pasid values; instead it uses the pasid value for each vm from the graphic driver. So kfd should not prevent the graphic driver from releasing pasid values, since the values are allocated by the graphic driver, not by the kfd driver anymore. With this patch kfd does not stop the graphic driver from releasing pasid values.
Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
8b0d068e |
| 30-Jan-2025 |
Alex Deucher <[email protected]> |
drm/amdkfd: add a new flag to manage where VRAM allocations go
On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM.
No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future.
Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
8544374c |
| 13-Jan-2025 |
Xiaogang Chen <[email protected]> |
drm/amdkfd: Have kfd driver use same PASID values from graphic driver
Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid values that graphic driver generated which is per vm per pasid.
These pasid values are passed to fw/hardware. We do not need change interrupt handler though more pasid values are used. Also, pasid values at log are replaced by user process pid; pasid values are not exposed to user. Users see their process pids that have meaning in user space.
Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
357ef5b3 |
| 10-Dec-2024 |
Andrew Martin <[email protected]> |
drm/amdgpu: Failed to check various return code
Clean up code to quiet the compiler on us failing to check the return code.
Signed-off-by: Andrew Martin <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4 |
|
| #
10112bf8 |
| 17-Oct-2024 |
Xiaogang Chen <[email protected]> |
drm/amdkfd: Not restore userptr buffer if kfd process has been removed
When kfd process has been terminated not restore userptr buffer after mmu notifier invalidates a range.
Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
d7d7b947 |
| 27-Sep-2024 |
Lang Yu <[email protected]> |
drm/amdkfd: Fix an eviction fence leak
Only creating a new reference for each process instead of each VM.
Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <[email protected]> Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 5fa436289483ae56427b0896c31f72361223c758) Cc: [email protected]
|
| #
5fa43628 |
| 27-Sep-2024 |
Lang Yu <[email protected]> |
drm/amdkfd: Fix an eviction fence leak
Only creating a new reference for each process instead of each VM.
Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <[email protected]> Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
| #
f2be7b39 |
| 05-Jun-2024 |
Christian König <[email protected]> |
drm/amdgpu: remove amdgpu_pin_restricted()
We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it.
Signed-off-by: Christian König <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
6c6ca71b |
| 04-Jun-2024 |
Al Viro <[email protected]> |
drm/amdgpu: fix a race in kfd_mem_export_dmabuf()
Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy; another thread might have modified the descriptor table while we'd been going through that song and dance.
Switch kfd_mem_export_dmabuf() to using drm_gem_prime_handle_to_dmabuf() and leave the descriptor table alone...
Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Al Viro <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
834368ea |
| 20-Jun-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: Ensure user queue buffers residency
Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap BO from the GPU if the bo_va queue_refcount is not zero.
Create queue to increase the bo_va queue_refcount, destroy queue to decrease the bo_va queue_refcount, to ensure the queue buffers mapped on the GPU when queue is active.
Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
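The refcount rule described in this commit message can be pictured with a small toy model. This is only a sketch in Python, not the kernel code: the class `BoVa` and the functions `create_queue`, `destroy_queue`, and `unmap_from_gpu` are hypothetical stand-ins, and the real `queue_refcount` is an atomic field on `struct bo_va`.

```python
import errno


class BoVa:
    """Toy stand-in for a bo_va mapping; it only tracks the queue refcount."""

    def __init__(self):
        self.queue_refcount = 0
        self.mapped = True


def create_queue(bo_va):
    # Creating a queue takes a reference on the buffer mapping.
    bo_va.queue_refcount += 1


def destroy_queue(bo_va):
    # Destroying a queue drops that reference again.
    bo_va.queue_refcount -= 1


def unmap_from_gpu(bo_va):
    # Refuse to unmap while any queue still references the mapping.
    if bo_va.queue_refcount != 0:
        return -errno.EBUSY
    bo_va.mapped = False
    return 0
```

In this model an unmap attempt made between `create_queue` and `destroy_queue` returns `-errno.EBUSY` and leaves the mapping in place, which is the residency guarantee the message describes.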
|
| #
fb910658 |
| 20-Jun-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: Refactor queue wptr_bo GART mapping
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo reference from queue write_ptr if it is mapped to the KFD node with expected size.
Add wptr_bo to structure queue_properties because structure queue is allocated after queue buffers are validated, then we can remove wptr_bo parameter from pqm_create_queue.
Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART mapping and unmapping. Move MES wptr_bo_gart mapping to init_user_queue, the same location as the queue ctx_bo GART mapping.
Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
f9e292cb |
| 20-Jun-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: kfd_bo_mapped_dev support partition
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change.
Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.10-rc2 |
|
| #
473af28d |
| 28-May-2024 |
Hawking Zhang <[email protected]> |
drm/amdgpu: Estimate RAS reservation when report capacity v2
Add an estimate of how much vram we need to reserve for RAS when calculating the total available vram.
v2: apply the change to MP0 v13_0_2 and v13_0_14
Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.10-rc1 |
|
| #
7978c4d4 |
| 22-May-2024 |
Alex Deucher <[email protected]> |
drm/amdkfd: simplify APU VRAM handling
With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code.
v2: clean up a few more places (Lang)
Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
f326d7cc |
| 14-May-2024 |
Xiaogang Chen <[email protected]> |
drm/kfd: Correct pinned buffer handling at kfd restore and validate process
This reverts commit 8a774fe912ff ("drm/amdgpu: avoid restore process run into dead loop"), since whether a buffer is pinned is unrelated to whether it needs mapping. Also skip buffer validation in the kfd driver if the buffer has been pinned.
Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.9, v6.9-rc7 |
|
| #
9095e554 |
| 30-Apr-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: Remove arbitrary timeout for hmm_range_fault
On systems with khugepaged enabled and use cases with THP buffers, hmm_range_fault may take > 15 seconds to return -EBUSY. The arbitrary timeout value is not accurate and causes memory allocation failures.
Remove the arbitrary timeout value, return EAGAIN to application if hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call ioctl again.
Change EAGAIN to debug message as this is not error.
Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
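The retry behaviour this message expects from userspace can be sketched as a plain loop. The Python below is an illustration under assumptions, not the libdrm or Thunk code: `call_with_retry`, `ioctl_fn`, and the optional `max_attempts` cap are hypothetical names introduced here.

```python
import errno


def call_with_retry(ioctl_fn, max_attempts=None):
    """Call ioctl_fn until it stops failing with EAGAIN.

    ioctl_fn is any zero-argument callable that raises OSError on failure.
    max_attempts is an optional safety cap; None means retry indefinitely.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return ioctl_fn()
        except OSError as exc:
            # Only EAGAIN means "try again"; other errors propagate.
            if exc.errno != errno.EAGAIN:
                raise
            if max_attempts is not None and attempts >= max_attempts:
                raise
```

Because EAGAIN is treated as a request to retry rather than a failure, logging it at debug level instead of error level, as the last line of the message says, is consistent with this flow.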
|
|
Revision tags: v6.9-rc6 |
|
| #
89773b85 |
| 26-Apr-2024 |
Lang Yu <[email protected]> |
drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements.
We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log:
"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated"
Though we can change BIOS settings to enlarge the carveout size, that is inflexible and may bring complaints. On the other hand, the memory resource can't be effectively used between host and device.
The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource.
v2: Report local_mem_size_private as 0. (Felix)
Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
|
Revision tags: v6.9-rc5, v6.9-rc4 |
|
| #
2d6f49ee |
| 11-Apr-2024 |
Lang Yu <[email protected]> |
drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice.
[ 57.910418] Call Trace:
[ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
[ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
[ 57.793923] ? idr_get_next_ul+0xbe/0x100
[ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
[ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
[ 57.794141] ? process_scheduled_works+0x29c/0x580
[ 57.794147] process_scheduled_works+0x303/0x580
[ 57.794157] ? __pfx_worker_thread+0x10/0x10
[ 57.794160] worker_thread+0x1a2/0x370
[ 57.794165] ? __pfx_worker_thread+0x10/0x10
[ 57.794167] kthread+0x11b/0x150
[ 57.794172] ? __pfx_kthread+0x10/0x10
[ 57.794177] ret_from_fork+0x3d/0x60
[ 57.794181] ? __pfx_kthread+0x10/0x10
[ 57.794184] ret_from_fork_asm+0x1b/0x30
Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
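The problem named in this message, two attachments sharing one VM so that the root PD would be locked twice, can be pictured with a toy de-duplication pass. This Python sketch is an assumption-laden illustration, not the kernel fix: `reserve_vms` and the string VM ids are hypothetical, and the real code works with kernel locking primitives rather than a set.

```python
def reserve_vms(attachment_vms):
    """Split per-attachment VM ids into those to lock and duplicates to skip.

    attachment_vms lists the VM of each attachment, in order. A VM that
    appears more than once is locked only on its first appearance.
    """
    locked = []
    duplicates = []
    seen = set()
    for vm in attachment_vms:
        if vm in seen:
            # Same VM as an earlier attachment: do not lock its root PD again.
            duplicates.append(vm)
            continue
        seen.add(vm)
        locked.append(vm)
    return locked, duplicates
```

For two attachments on the same VM plus one on another, the model locks each VM once and records the repeat separately instead of attempting a second lock.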
|
|
Revision tags: v6.9-rc3 |
|
| #
d53ce023 |
| 05-Apr-2024 |
Philip Yang <[email protected]> |
drm/amdkfd: Evict BO itself for contiguous allocation
If the BO pages pinned for RDMA are not contiguous in VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM.
v6: user context should use interruptible call (Felix)
Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
155ce502 |
| 05-Apr-2024 |
Philip Yang <[email protected]> |
drm/amdgpu: Support contiguous VRAM allocation
RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support.
Add a new KFD alloc memory flag and store it as the bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When this bo is pinned for export for RDMA peerdirect access, this sets the TTM_PL_FLAG_CONTIGUOUS flag and asks the VRAM buddy allocator for contiguous VRAM.
Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
1f327dfc |
| 22-May-2024 |
Alex Deucher <[email protected]> |
drm/amdkfd: simplify APU VRAM handling
With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code.
v2: clean up a few more places (Lang)
Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|
| #
eb853413 |
| 26-Apr-2024 |
Lang Yu <[email protected]> |
drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements.
We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log:
"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated"
Though we can change BIOS settings to enlarge the carveout size, that is inflexible and may bring complaints. On the other hand, the memory resource can't be effectively used between host and device.
The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource.
v2: Report local_mem_size_private as 0. (Felix)
Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
|