amdgpu_amdkfd.c - OpenGrok history log for /linux-6.15/drivers/gpu/drm/amd/amdgpu/amdgpu

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3
# 9424a5bf	10-Feb-2025	Jonathan Kim <[email protected]>	drm/amdgpu: simplify xgmi peer info calls Deprecate KFD XGMI peer info calls in favour of calling directly from simplified XGMI peer info functions. Signed-off-by: Jonathan Kim <[email protected] drm/amdgpu: simplify xgmi peer info calls Deprecate KFD XGMI peer info calls in favour of calling directly from simplified XGMI peer info functions. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.14-rc2, v6.14-rc1
# 8b0d068e	30-Jan-2025	Alex Deucher <[email protected]>	drm/amdkfd: add a new flag to manage where VRAM allocations go On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if drm/amdkfd: add a new flag to manage where VRAM allocations go On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM. No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future. Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.13, v6.13-rc7
# 90505894	09-Jan-2025	Kenneth Feng <[email protected]>	drm/amdgpu: disable gfxoff with the compute workload on gfx12 Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenne drm/amdgpu: disable gfxoff with the compute workload on gfx12 Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 2affe2bbc997b3920045c2c434e480c81a5f9707) Cc: [email protected] # 6.12.x show more ...
# 2affe2bb	09-Jan-2025	Kenneth Feng <[email protected]>	drm/amdgpu: disable gfxoff with the compute workload on gfx12 Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenne drm/amdgpu: disable gfxoff with the compute workload on gfx12 Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# 11815bb0	12-Dec-2024	Christian König <[email protected]>	drm/amdgpu: partially revert "reduce reset time" This partially reverts commit 194eb174cbe4fe2b3376ac30acca2dc8c8beca00. This commit introduced a new state variable into adev without even remotely drm/amdgpu: partially revert "reduce reset time" This partially reverts commit 194eb174cbe4fe2b3376ac30acca2dc8c8beca00. This commit introduced a new state variable into adev without even remotely worrying about CPU barriers. Since we already have the amdgpu_in_reset() function exactly for this use case partially revert that. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# 357ef5b3	10-Dec-2024	Andrew Martin <[email protected]>	drm/amdgpu: Failed to check various return code Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by: Andrew Martin <[email protected]> Reviewed-by: Harish drm/amdgpu: Failed to check various return code Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by: Andrew Martin <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1
# 80d80511	29-Sep-2024	Boyuan Zhang <[email protected]>	drm/amdgpu: pass ip_block in set_powergating_state Pass ip_block instead of adev in set_powergating_state callback function. Modify set_powergating_state ip functions for all correspoding ip blocks. drm/amdgpu: pass ip_block in set_powergating_state Pass ip_block instead of adev in set_powergating_state callback function. Modify set_powergating_state ip functions for all correspoding ip blocks. v2: fix a ip block index error. v3: remove type casting Signed-off-by: Boyuan Zhang <[email protected]> Suggested-by: Christian König <[email protected]> Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# fa317985	05-Nov-2024	Lijo Lazar <[email protected]>	drm/amdgpu: Fix map/unmap queue logic In current logic, it calls ring_alloc followed by a ring_test. ring_test in turn will call another ring_alloc. This is illegal usage as a ring_alloc is expected drm/amdgpu: Fix map/unmap queue logic In current logic, it calls ring_alloc followed by a ring_test. ring_test in turn will call another ring_alloc. This is illegal usage as a ring_alloc is expected to be closed properly with a ring_commit. Change to commit the map/unmap queue packet first followed by a ring_test. Add a comment about the usage of ring_test. Also, reorder the current pre-condition checks of job hang or kiq ring scheduler not ready. Without them being met, it is not useful to attempt ring or memory allocations. Fixes tag refers to the original patch which introduced this issue which then got carried over into newer code. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Fixes: 6c10b5cc4eaa ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c") show more ...
# 8fe7cf58	14-Oct-2024	Alex Deucher <[email protected]>	drm/amdkfd: add an interface to query whether is KFD is active Add an interface to query whether KFD has any active queues. v2: fix build issues Acked-by: Srinivasan Shanmugam <srinivasan.shanmuga drm/amdkfd: add an interface to query whether is KFD is active Add an interface to query whether KFD has any active queues. v2: fix build issues Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.11
# 3eebfd5e	12-Sep-2024	Feifei Xu <[email protected]>	drm/amdkfd:Add kfd function to config sq perfmon Expose the interface for kfd to config sq perfmon. Signed-off-by: Feifei Xu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> R drm/amdkfd:Add kfd function to config sq perfmon Expose the interface for kfd to config sq perfmon. Signed-off-by: Feifei Xu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5
# b05d6476	19-Aug-2024	Hawking Zhang <[email protected]>	drm/amdgpu: Retire query_utcl2_poison_status callback Driver switches to interrupt source id to identify utcl2 poison event. polling interface is not needed. Signed-off-by: Hawking Zhang <Hawking.Z drm/amdgpu: Retire query_utcl2_poison_status callback Driver switches to interrupt source id to identify utcl2 poison event. polling interface is not needed. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2
# 234eebe1	29-Jul-2024	Amber Lin <[email protected]>	drm/amdkfd: APIs to stop/start KFD scheduling Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgp drm/amdkfd: APIs to stop/start KFD scheduling Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from runlist. If users send ioctls to KFD to create queues, they'll be added but those queues won't be mapped to runlist (so not scheduled) until amdgpu_amdkfd_start_sched is called. v2: fix build (Alex) Signed-off-by: Amber Lin <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.11-rc1, v6.10
# c86ad391	14-Jul-2024	Philip Yang <[email protected]>	drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherwise amdgpu_bo_unref clear the local variable, the original poi drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherwise amdgpu_bo_unref clear the local variable, the original pointer not set to NULL, this could cause use-after-free bug. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# dbe2c4c8	03-Jun-2024	Eric Huang <[email protected]>	drm/amdkfd: add reset cause in gpu pre-reset smi event reset cause is requested by customer as additional info for gpu reset smi event. v2: integerate reset sources suggested by Lijo Lazar Signed- drm/amdkfd: add reset cause in gpu pre-reset smi event reset cause is requested by customer as additional info for gpu reset smi event. v2: integerate reset sources suggested by Lijo Lazar Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6
# 89773b85	26-Apr-2024	Lang Yu <[email protected]>	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads m drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# eb853413	26-Apr-2024	Lang Yu <[email protected]>	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads m drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# bfa579b3	22-Apr-2024	YiPeng Chai <[email protected]>	drm/amdgpu: prepare to handle pasid poison consumption Prepare to handle pasid poison consumption. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed- drm/amdgpu: prepare to handle pasid poison consumption Prepare to handle pasid poison consumption. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# 2fc46e0b	12-Mar-2024	Tao Zhou <[email protected]>	drm/amdgpu: make reset method configurable for RAS poison Each RAS block has different requirement for gpu reset in poison consumption handling. Add support for mmhub RAS poison consumption handling drm/amdgpu: make reset method configurable for RAS poison Each RAS block has different requirement for gpu reset in poison consumption handling. Add support for mmhub RAS poison consumption handling. v2: remove the mmhub poison support for kfd int v10. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# d8070c42	11-Mar-2024	Tao Zhou <[email protected]>	drm/amdgpu: support utcl2 RAS poison query for mmhub Support the query for both gfxhub and mmhub, also replace xcc_id with hub_inst. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking drm/amdgpu: support utcl2 RAS poison query for mmhub Support the query for both gfxhub and mmhub, also replace xcc_id with hub_inst. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.8, v6.8-rc7, v6.8-rc6
# 71a8d61e	19-Feb-2024	Tao Zhou <[email protected]>	drm/amdgpu: retire gfx ras query_utcl2_poison_status Replace it with related interface in gfxhub functions. v2: replace node id with xcc id. get node id for query_utcl2_poison_status Signed-of drm/amdgpu: retire gfx ras query_utcl2_poison_status Replace it with related interface in gfxhub functions. v2: replace node id with xcc id. get node id for query_utcl2_poison_status Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# f679fd60	04-Mar-2024	Ahmad Rehman <[email protected]>	drm/amdgpu: Init zone device and drm client after mode-1 reset on reload In passthrough environment, when amdgpu is reloaded after unload, mode-1 is triggered after initializing the necessary IPs, T drm/amdgpu: Init zone device and drm client after mode-1 reset on reload In passthrough environment, when amdgpu is reloaded after unload, mode-1 is triggered after initializing the necessary IPs, That init does not include KFD, and KFD init waits until the reset is completed. KFD init is called in the reset handler, but in this case, the zone device and drm client is not initialized, causing app to create kernel panic. v2: Removing the init KFD condition from amdgpu_amdkfd_drm_client_create. As the previous version has the potential of creating DRM client twice. v3: v2 patch results in SDMA engine hung as DRM open causes VM clear to SDMA before SDMA init. Adding the condition to in drm client creation, on top of v1, to guard against drm client creation call multiple times. Signed-off-by: Ahmad Rehman <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# e1f6746f	22-Feb-2024	Lijo Lazar <[email protected]>	drm/amdkfd: Skip packet submission on fatal error If fatal error is detected, packet submission won't go through. Return error in such cases. Also, avoid waiting for fence when fatal error is detect drm/amdkfd: Skip packet submission on fatal error If fatal error is detected, packet submission won't go through. Return error in such cases. Also, avoid waiting for fence when fatal error is detected. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2
# db2aad03	25-Jan-2024	Le Ma <[email protected]>	drm/amdgpu: move the drm client creation behind drm device registration This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt c drm/amdgpu: move the drm client creation behind drm device registration This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt clearing job is sent to SDMA ahead of interrupt enabled. And re-locating the drm client creation following after drm_dev_register looks like a more proper flow. v2: wrap the drm client creation Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Le Ma <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# c0125b84	25-Jan-2024	Le Ma <[email protected]>	drm/amdgpu: move the drm client creation behind drm device registration This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt c drm/amdgpu: move the drm client creation behind drm device registration This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt clearing job is sent to SDMA ahead of interrupt enabled. And re-locating the drm client creation following after drm_dev_register looks like a more proper flow. v2: wrap the drm client creation Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Le Ma <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
# ed1e1e42	23-Jan-2024	YiPeng Chai <[email protected]>	drm/amdgpu: Support passing poison consumption ras block to SRIOV Support passing poison consumption ras blocks to SRIOV. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang drm/amdgpu: Support passing poison consumption ras block to SRIOV Support passing poison consumption ras blocks to SRIOV. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> show more ...
12 3 4 5 6 7 8 9