History log of /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c (Results 1 – 25 of 298)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13
# 16590745 15-Jan-2025 Christian König <[email protected]>

drm/amdgpu: use GFP_NOWAIT for memory allocations

In the critical submission path memory allocations can't wait for
reclaim since that can potentially wait for submissions to finish.

Finally clean

drm/amdgpu: use GFP_NOWAIT for memory allocations

In the critical submission path memory allocations can't wait for
reclaim since that can potentially wait for submissions to finish.

Finally clean that up and mark most memory allocations in the critical
path with GFP_NOWAIT. The only exception left is the dma_fence_array()
used when no VMID is available, but that will be cleaned up later on.

Signed-off-by: Christian König <[email protected]>
Acked-by: Srinivasan Shanmugam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# ca17c8e1 03-Mar-2025 James Zhu <[email protected]>

drm/amdkfd: remove unnecessary cpu domain validation

before move to GTT domain.

Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex

drm/amdkfd: remove unnecessary cpu domain validation

before move to GTT domain.

Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# cb0de06d 29-Jan-2025 Christian König <[email protected]>

drm/amdgpu: remove all KFD fences from the BO on release

Remove all KFD BOs from the private dma_resv object.

This prevents the KFD from being evict unecessarily when an exported BO
is released.

S

drm/amdgpu: remove all KFD fences from the BO on release

Remove all KFD BOs from the private dma_resv object.

This prevents the KFD from being evict unecessarily when an exported BO
is released.

Signed-off-by: Christian König <[email protected]>
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-and-tested-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 10e08943 12-Feb-2025 Xiaogang Chen <[email protected]>

drm/amdkfd: Fix pasid value leak

Curret kfd does not allocate pasid values, instead uses pasid value for each
vm from graphic driver. So should not prevent graphic driver from releasing
pasid values

drm/amdkfd: Fix pasid value leak

Curret kfd does not allocate pasid values, instead uses pasid value for each
vm from graphic driver. So should not prevent graphic driver from releasing
pasid values since the values are allocated by graphic driver, not kfd driver
anymore. This patch does not stop graphic driver release pasid values.

Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Signed-off-by: Xiaogang Chen <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 8b0d068e 30-Jan-2025 Alex Deucher <[email protected]>

drm/amdkfd: add a new flag to manage where VRAM allocations go

On big and small APUs we send KFD VRAM allocations to GTT
since the carve out is either non-existent or relatively
small. However, if

drm/amdkfd: add a new flag to manage where VRAM allocations go

On big and small APUs we send KFD VRAM allocations to GTT
since the carve out is either non-existent or relatively
small. However, if someone sets the carve out size to be
relatively large, we may end up using GTT rather than VRAM.

No change of logic with this patch, but it allows the
driver to determine which logic to use based on the
carve out size in the future.

Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 8544374c 13-Jan-2025 Xiaogang Chen <[email protected]>

drm/amdkfd: Have kfd driver use same PASID values from graphic driver

Current kfd driver has its own PASID value for a kfd process and uses it to
locate vm at interrupt handler or mapping between kf

drm/amdkfd: Have kfd driver use same PASID values from graphic driver

Current kfd driver has its own PASID value for a kfd process and uses it to
locate vm at interrupt handler or mapping between kfd process and vm. That
design is not working when a physical gpu device has multiple spatial
partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid
values that graphic driver generated which is per vm per pasid.

These pasid values are passed to fw/hardware. We do not need change interrupt
handler though more pasid values are used. Also, pasid values at log are
replaced by user process pid; pasid values are not exposed to user. Users see
their process pids that have meaning in user space.

Signed-off-by: Xiaogang Chen <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# 357ef5b3 10-Dec-2024 Andrew Martin <[email protected]>

drm/amdgpu: Failed to check various return code

Clean up code to quiet the compiler on us failing to check the return
code.

Signed-off-by: Andrew Martin <[email protected]>
Reviewed-by: Harish

drm/amdgpu: Failed to check various return code

Clean up code to quiet the compiler on us failing to check the return
code.

Signed-off-by: Andrew Martin <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4
# 10112bf8 17-Oct-2024 Xiaogang Chen <[email protected]>

drm/amdkfd: Not restore userptr buffer if kfd process has been removed

When kfd process has been terminated not restore userptr buffer after mmu
notifier invalidates a range.

Signed-off-by: Xiaogan

drm/amdkfd: Not restore userptr buffer if kfd process has been removed

When kfd process has been terminated not restore userptr buffer after mmu
notifier invalidates a range.

Signed-off-by: Xiaogang Chen <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.12-rc3, v6.12-rc2, v6.12-rc1
# d7d7b947 27-Sep-2024 Lang Yu <[email protected]>

drm/amdkfd: Fix an eviction fence leak

Only creating a new reference for each process instead of each VM.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Feli

drm/amdkfd: Fix an eviction fence leak

Only creating a new reference for each process instead of each VM.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 5fa436289483ae56427b0896c31f72361223c758)
Cc: [email protected]

show more ...


# 5fa43628 27-Sep-2024 Lang Yu <[email protected]>

drm/amdkfd: Fix an eviction fence leak

Only creating a new reference for each process instead of each VM.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Feli

drm/amdkfd: Fix an eviction fence leak

Only creating a new reference for each process instead of each VM.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# f2be7b39 05-Jun-2024 Christian König <[email protected]>

drm/amdgpu: remove amdgpu_pin_restricted()

We haven't used the functionality to pin BOs in a certain range at all
while the driver existed. Just nuke it.

Signed-off-by: Christian König <christian.k

drm/amdgpu: remove amdgpu_pin_restricted()

We haven't used the functionality to pin BOs in a certain range at all
while the driver existed. Just nuke it.

Signed-off-by: Christian König <[email protected]>
Acked-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 6c6ca71b 04-Jun-2024 Al Viro <[email protected]>

drm/amdgpu: fix a race in kfd_mem_export_dmabuf()

Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into
descriptor table, only to have it looked up by file descriptor and
remove it

drm/amdgpu: fix a race in kfd_mem_export_dmabuf()

Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into
descriptor table, only to have it looked up by file descriptor and
remove it from descriptor table is not just too convoluted - it's
racy; another thread might have modified the descriptor table while
we'd been going through that song and dance.

Switch kfd_mem_export_dmabuf() to using drm_gem_prime_handle_to_dmabuf()
and leave the descriptor table alone...

Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 834368ea 20-Jun-2024 Philip Yang <[email protected]>

drm/amdkfd: Ensure user queue buffers residency

Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap
BO from the GPU if the bo_va queue_refcount is not zero.

Create queue to incre

drm/amdkfd: Ensure user queue buffers residency

Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap
BO from the GPU if the bo_va queue_refcount is not zero.

Create queue to increase the bo_va queue_refcount, destroy queue to
decrease the bo_va queue_refcount, to ensure the queue buffers mapped on
the GPU when queue is active.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# fb910658 20-Jun-2024 Philip Yang <[email protected]>

drm/amdkfd: Refactor queue wptr_bo GART mapping

Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.

drm/amdkfd: Refactor queue wptr_bo GART mapping

Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.

Add wptr_bo to structure queue_properties because structure queue is
allocated after queue buffers are validated, then we can remove wptr_bo
parameter from pqm_create_queue.

Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART
mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue,
the same location with queue ctx_bo GART mapping.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# f9e292cb 20-Jun-2024 Philip Yang <[email protected]>

drm/amdkfd: kfd_bo_mapped_dev support partition

Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter
instead of adev, to support spatial partition. This is only used by CRIU
checkpoint

drm/amdkfd: kfd_bo_mapped_dev support partition

Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter
instead of adev, to support spatial partition. This is only used by CRIU
checkpoint restore now. No functional change.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.10-rc2
# 473af28d 28-May-2024 Hawking Zhang <[email protected]>

drm/amdgpu: Estimate RAS reservation when report capacity v2

Add estimate of how much vram we need to reserve for RAS
when caculating the total available vram.

v2: apply the change to MP0 v13_0_2 a

drm/amdgpu: Estimate RAS reservation when report capacity v2

Add estimate of how much vram we need to reserve for RAS
when caculating the total available vram.

v2: apply the change to MP0 v13_0_2 and v13_0_14

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.10-rc1
# 7978c4d4 22-May-2024 Alex Deucher <[email protected]>

drm/amdkfd: simplify APU VRAM handling

With commit 89773b85599a
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified. Since AMD_I

drm/amdkfd: simplify APU VRAM handling

With commit 89773b85599a
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.

v2: clean up a few more places (Lang)

Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lang Yu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# f326d7cc 14-May-2024 Xiaogang Chen <[email protected]>

drm/kfd: Correct pinned buffer handling at kfd restore and validate process

This reverts commit 8a774fe912ff ("drm/amdgpu: avoid restore process run into dead loop")
since buffer got pinned is not r

drm/kfd: Correct pinned buffer handling at kfd restore and validate process

This reverts commit 8a774fe912ff ("drm/amdgpu: avoid restore process run into dead loop")
since buffer got pinned is not related whether it needs mapping
And skip buffer validation at kfd driver if the buffer has been pinned.

Signed-off-by: Xiaogang Chen <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9, v6.9-rc7
# 9095e554 30-Apr-2024 Philip Yang <[email protected]>

drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

On system with khugepaged enabled and user cases with THP buffer, the
hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary

drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

On system with khugepaged enabled and user cases with THP buffer, the
hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary
timeout value is not accurate, cause memory allocation failure.

Remove the arbitrary timeout value, return EAGAIN to application if
hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call
ioctl again.

Change EAGAIN to debug message as this is not error.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc6
# 89773b85 26-Apr-2024 Lang Yu <[email protected]>

drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs

Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
m

drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs

Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
memory allocation requirements.

We can't even run a Basic MNIST Example with a default 512MB carveout.
https://github.com/pytorch/examples/tree/main/mnist. Error Log:

"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
and 22.17 MiB is reserved by PyTorch but unallocated"

Though we can change BIOS settings to enlarge carveout size,
which is inflexible and may bring complaint. On the other hand,
the memory resource can't be effectively used between host and device.

The solution is MI300A approach, i.e., let VRAM allocations go to GTT.
Then device and host can flexibly and effectively share memory resource.

v2: Report local_mem_size_private as 0. (Felix)

Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc5, v6.9-rc4
# 2d6f49ee 11-Apr-2024 Lang Yu <[email protected]>

drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms

Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
Two attachments use the same VM, root PD would be locked twice.

drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms

Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
Two attachments use the same VM, root PD would be locked twice.

[ 57.910418] Call Trace:
[ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
[ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
[ 57.793923] ? idr_get_next_ul+0xbe/0x100
[ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
[ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
[ 57.794141] ? process_scheduled_works+0x29c/0x580
[ 57.794147] process_scheduled_works+0x303/0x580
[ 57.794157] ? __pfx_worker_thread+0x10/0x10
[ 57.794160] worker_thread+0x1a2/0x370
[ 57.794165] ? __pfx_worker_thread+0x10/0x10
[ 57.794167] kthread+0x11b/0x150
[ 57.794172] ? __pfx_kthread+0x10/0x10
[ 57.794177] ret_from_fork+0x3d/0x60
[ 57.794181] ? __pfx_kthread+0x10/0x10
[ 57.794184] ret_from_fork_asm+0x1b/0x30

Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


Revision tags: v6.9-rc3
# d53ce023 05-Apr-2024 Philip Yang <[email protected]>

drm/amdkfd: Evict BO itself for contiguous allocation

If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to
system memory first to free the VRAM space, then allocate contiguous
VRAM

drm/amdkfd: Evict BO itself for contiguous allocation

If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to
system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.

v6: user context should use interruptible call (Felix)

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 155ce502 05-Apr-2024 Philip Yang <[email protected]>

drm/amdgpu: Support contiguous VRAM allocation

RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.

Add a new KFD alloc memory fl

drm/amdgpu: Support contiguous VRAM allocation

RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.

Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA
peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask
VRAM buddy allocator to get contiguous VRAM.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# 1f327dfc 22-May-2024 Alex Deucher <[email protected]>

drm/amdkfd: simplify APU VRAM handling

With commit 89773b85599a
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified. Since AMD_I

drm/amdkfd: simplify APU VRAM handling

With commit 89773b85599a
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.

v2: clean up a few more places (Lang)

Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lang Yu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


# eb853413 26-Apr-2024 Lang Yu <[email protected]>

drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs

Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
m

drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs

Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
memory allocation requirements.

We can't even run a Basic MNIST Example with a default 512MB carveout.
https://github.com/pytorch/examples/tree/main/mnist. Error Log:

"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
and 22.17 MiB is reserved by PyTorch but unallocated"

Though we can change BIOS settings to enlarge carveout size,
which is inflexible and may bring complaint. On the other hand,
the memory resource can't be effectively used between host and device.

The solution is MI300A approach, i.e., let VRAM allocations go to GTT.
Then device and host can flexibly and effectively share memory resource.

v2: Report local_mem_size_private as 0. (Felix)

Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

show more ...


12345678910>>...12