|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
77613a2e |
| 06-Mar-2025 |
Matthew Brost <[email protected]> |
drm/xe/uapi: Add DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag, which indicates whether the device supports CPU address mirrorin
drm/xe/uapi: Add DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag, which indicates whether the device supports CPU address mirroring. The intent is for UMDs to use this query to determine if a VM can be set up with CPU address mirroring. This flag is implemented by checking if the device supports GPU faults.
v7: - Only report enabled if CONFIG_DRM_GPUSVM is selected (CI)
Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Himal Prasad Ghimiray <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Reviewed-by: Tejas Upadhyay <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
| #
b43e864a |
| 06-Mar-2025 |
Matthew Brost <[email protected]> |
drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to create unpopulated virtual memory areas (VMAs) without memory backing or GPU p
drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to create unpopulated virtual memory areas (VMAs) without memory backing or GPU page tables. These VMAs are referred to as CPU address mirror VMAs. The idea is that upon a page fault or prefetch, the memory backing and GPU page tables will be populated.
CPU address mirror VMAs only update GPUVM state; they do not have an internal page table (PT) state, nor do they have GPU mappings.
It is expected that CPU address mirror VMAs will be mixed with buffer object (BO) VMAs within a single VM. In other words, system allocations and runtime allocations can be mixed within a single user-mode driver (UMD) program.
Expected usage:
- Bind the entire virtual address (VA) space upon program load using the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - If a buffer object (BO) requires GPU mapping (runtime allocation), allocate a CPU address using mmap(PROT_NONE), bind the BO to the mmapped address using existing bind IOCTLs. If a CPU map of the BO is needed, mmap it again to the same CPU address using mmap(MAP_FIXED) - If a BO no longer requires GPU mapping, munmap it from the CPU address space and them bind the mapping address with the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - Any malloc'd or mmapped CPU address accessed by the GPU will be faulted in via the SVM implementation (system allocation). - Upon freeing any mmapped or malloc'd data, the SVM implementation will remove GPU mappings.
Only supporting 1 to 1 mapping between user address space and GPU address space at the moment as that is the expected use case. uAPI defines interface for non 1 to 1 but enforces 1 to 1, this restriction can be lifted if use cases arrise for non 1 to 1 mappings.
This patch essentially short-circuits the code in the existing VM bind paths to avoid populating page tables when the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set.
v3: - Call vm_bind_ioctl_ops_fini on -ENODATA - Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs - s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas) - Rework commit message for expected usage (Thomas) - Describe state of code after patch in commit message (Thomas) v4: - Fix alignment (Checkpatch)
Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Reviewed-by: Himal Prasad Ghimiray <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.14-rc5 |
|
| #
5488bec9 |
| 28-Feb-2025 |
Tejas Upadhyay <[email protected]> |
drm/xe/uapi: Use hint for guc to set GT frequency
Allow user to provide a low latency hint. When set, KMD sends a hint to GuC which results in special handling for that process. SLPC will ramp the G
drm/xe/uapi: Use hint for guc to set GT frequency
Allow user to provide a low latency hint. When set, KMD sends a hint to GuC which results in special handling for that process. SLPC will ramp the GT frequency aggressively every time it switches to this process.
We need to enable the use of SLPC Compute strategy during init, but it will apply only to processes that set this bit during process creation.
Improvement with this approach as below:
Before,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz Kernel launch latency : 283.16 us
After,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz
Kernel launch latency : 63.38 us
Compute PR: https://github.com/intel/compute-runtime/pull/794 Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214 IGT PR: https://patchwork.freedesktop.org/patch/639989/
V10(Lucas): - Remove doc from drm-uapi.rst v9(Vinay): - remove extra line, align commit message v8(Vinay): - Add separate example for using low latency hint v7(Jose): - Update UMD PR - applicable to all gpus V6: - init flags, remove redundant flags check (MAuld) V5: - Move uapi doc to documentation and GuC ABI specific change (Rodrigo) - Modify logic to restrict exec queue flags (MAuld) V4: - To make it clear, dont use exec queue word (Vinay) - Correct typo in description of flag (Jose/Vinay) - rename set_strategy api and replace ctx with exec queue(Vinay) - Start with 0th bit to indentify user flags (Jose) V3: - Conver user flag to kernel internal flag and use (Oak) - Support query config for use to check kernel support (Jose) - Dont need to take runtime pm (Vinay) V2: - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon) - Add motivation to description (Lucas)
Acked-by: Lucas De Marchi <[email protected]> Reviewed-by: Vinay Belgaumkar <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Tejas Upadhyay <[email protected]>
show more ...
|
| #
cd5bbb25 |
| 26-Feb-2025 |
Harish Chegondi <[email protected]> |
drm/xe/uapi: Add a device query to get EU stall sampling information
User space can get the EU stall data record size, EU stall capabilities, EU stall sampling rates, and per XeCore buffer size with
drm/xe/uapi: Add a device query to get EU stall sampling information
User space can get the EU stall data record size, EU stall capabilities, EU stall sampling rates, and per XeCore buffer size with query IOCTL DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL. A struct drm_xe_query_eu_stall will be returned to the user space along with an array of supported sampling rates sorted in the fastest sampling rate first order. sampling_rates in struct drm_xe_query_eu_stall will point to the array of sampling rates.
Any capabilities in EU stall sampling as of this patch are considered as base capabilities. New capability bits will be added for any new functionality added later.
v12: Rename has_eu_stall_sampling_support() to xe_eu_stall_supported_on_platform() and move it to header file. v11: Check if EU stall sampling is supported on the platform. v10: Change comments and variable names as per feedback v9: Move reserved fields above num_sampling_rates in struct drm_xe_query_eu_stall. v7: Change sampling_rates from a pointer to flexible array. v6: Include EU stall sampling rates information and per XeCore buffer size in the query information.
Reviewed-by: Ashutosh Dixit <[email protected]> Signed-off-by: Harish Chegondi <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/67ba42796a5a99d648239c315694cd222812a49b.1740533885.git.harish.chegondi@intel.com
show more ...
|
| #
1537ec85 |
| 26-Feb-2025 |
Harish Chegondi <[email protected]> |
drm/xe/uapi: Introduce API for EU stall sampling
A new hardware feature first introduced in PVC gives capability to periodically sample EU stall state and record counts for different stall reasons,
drm/xe/uapi: Introduce API for EU stall sampling
A new hardware feature first introduced in PVC gives capability to periodically sample EU stall state and record counts for different stall reasons, on a per IP basis, aggregate across all EUs in a subslice and record the samples in a buffer in each subslice. Eventually, the aggregated data is written out to a buffer in the memory. This feature is also supported in XE2 and later architecture GPUs.
Use an existing IOCTL - DRM_IOCTL_XE_OBSERVATION as the interface into the driver from the user space to do initial setup and obtain a file descriptor for the EU stall data stream. Input parameter to the IOCTL is a struct drm_xe_observation_param in which observation_type should be set to DRM_XE_OBSERVATION_TYPE_EU_STALL, observation_op should be DRM_XE_OBSERVATION_OP_STREAM_OPEN and param should point to a chain of drm_xe_ext_set_property structures in which each structure has a pair of property and value. The EU stall sampling input properties are defined in drm_xe_eu_stall_property_id enum.
With the file descriptor obtained from DRM_IOCTL_XE_OBSERVATION, user space can enable and disable EU stall sampling with the IOCTLs: DRM_XE_OBSERVATION_IOCTL_ENABLE and DRM_XE_OBSERVATION_IOCTL_DISABLE. User space can also call poll() to check for availability of data in the buffer. The data can be read with read(). Finally, the file descriptor can be closed with close().
v11: Changed a couple of variables in struct eu_stall_open_properties from unsigned int to int. v10: Use extension number while parsing chain of extensions. Remove function description for static functions. Move code around as per review feedback. v9: Changed some u32 to unsigned int. Moved some code around as per review feedback from v8. v8: Used div_u64 instead of / to fix 32-bit build issue. Changed copyright year in xe_eu_stall.c/h to 2025. v7: Renamed input property DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT to DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS to be consistent with OA. Renamed the corresponding internal variables. Fixed some commit messages based on review feedback. v6: Change the input sampling rate to GPU cycles instead of GPU cycles multiplier.
Reviewed-by: Ashutosh Dixit <[email protected]> Signed-off-by: Harish Chegondi <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/bb707a27975c33e4a912b9839b023acb7a1f9c90.1740533885.git.harish.chegondi@intel.com
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
41a97c4a |
| 29-Jan-2025 |
Daniele Ceraolo Spurio <[email protected]> |
drm/xe/pxp/uapi: Add API to mark a BO as using PXP
The driver needs to know if a BO is encrypted with PXP to enable the display decryption at flip time. Furthermore, we want to keep track of the sta
drm/xe/pxp/uapi: Add API to mark a BO as using PXP
The driver needs to know if a BO is encrypted with PXP to enable the display decryption at flip time. Furthermore, we want to keep track of the status of the encryption and reject any operation that involves a BO that is encrypted using an old key. There are two points in time where such checks can kick in:
1 - at VM bind time, all operations except for unmapping will be rejected if the key used to encrypt the BO is no longer valid. This check is opt-in via a new VM_BIND flag, to avoid a scenario where a malicious app purposely shares an invalid BO with a non-PXP aware app (such as a compositor). If the VM_BIND was failed, the compositor would be unable to display anything at all. Allowing the bind to go through means that output still works, it just displays garbage data within the bounds of the illegal BO.
2 - at job submission time, if the queue is marked as using PXP, all objects bound to the VM will be checked and the submission will be rejected if any of them was encrypted with a key that is no longer valid.
Note that there is no risk of leaking the encrypted data if a user does not opt-in to those checks; the only consequence is that the user will not realize that the encryption key is changed and that the data is no longer valid.
v2: Better commnnts and descriptions (John), rebase
v3: Properly return the result of key_assign up the stack, do not use xe_bo in display headers (Jani)
v4: improve key_instance variable documentation (John)
Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Thomas Hellström <[email protected]> Cc: John Harrison <[email protected]> Cc: Jani Nikula <[email protected]> Reviewed-by: John Harrison <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
| #
bd98ac2e |
| 29-Jan-2025 |
Daniele Ceraolo Spurio <[email protected]> |
drm/xe/pxp/uapi: Add a query for PXP status
PXP prerequisites (SW proxy and HuC auth via GSC) are completed asynchronously from driver load, which means that userspace can start submitting before we
drm/xe/pxp/uapi: Add a query for PXP status
PXP prerequisites (SW proxy and HuC auth via GSC) are completed asynchronously from driver load, which means that userspace can start submitting before we're ready to start a PXP session. Therefore, we need a query that userspace can use to check not only if PXP is supported but also to wait until the prerequisites are done.
v2: Improve doc, do not report TYPE_NONE as supported (José) v3: Better comments, remove unneeded copy_from_user (John)
Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Cc: José Roberto de Souza <[email protected]> Cc: John Harrison <[email protected]> Reviewed-by: John Harrison <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
| #
72d47960 |
| 29-Jan-2025 |
Daniele Ceraolo Spurio <[email protected]> |
drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues
Userspace is required to mark a queue as using PXP to guarantee that the PXP instructions will work. In addition to managing the P
drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues
Userspace is required to mark a queue as using PXP to guarantee that the PXP instructions will work. In addition to managing the PXP sessions, when a PXP queue is created the driver will set the relevant bits in its context control register.
On submission of a valid PXP queue, the driver will validate all encrypted objects mapped to the VM to ensured they were encrypted with the current key.
v2: Remove pxp_types include outside of PXP code (Jani), better comments and code cleanup (John)
v3: split the internal PXP management to a separate patch for ease of review. re-order ioctl checks to always return -EINVAL if parameters are invalid, rebase on msix changes.
Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Cc: John Harrison <[email protected]> Reviewed-by: John Harrison <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.13 |
|
| #
a46ea12e |
| 17-Jan-2025 |
Rodrigo Vivi <[email protected]> |
drm/xe/uapi: Fix documentation indentation
Fix these issues:
Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:817: WARNING: +Bullet list ends without a blank line; unexpected unindent. D
drm/xe/uapi: Fix documentation indentation
Fix these issues:
Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:817: WARNING: +Bullet list ends without a blank line; unexpected unindent. Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:835: WARNING: +Definition list ends without a blank line; unexpected unindent.
Fixes: 75d37750a753 ("drm/xe/mmap: Add mmap support for PCI memory barrier") Reported-by: Stephen Rothwell <[email protected]> Closes: https://lore.kernel.org/intel-xe/[email protected]/ Cc: Tejas Upadhyay <[email protected]> Tested-by: Bagas Sanjaya <[email protected]> Reviewed-by: Tejas Upadhyay <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]>
show more ...
|
| #
75d37750 |
| 13-Jan-2025 |
Tejas Upadhyay <[email protected]> |
drm/xe/mmap: Add mmap support for PCI memory barrier
In order to avoid having userspace to use MI_MEM_FENCE, we are adding a mechanism for userspace to generate a PCI memory barrier with low overhea
drm/xe/mmap: Add mmap support for PCI memory barrier
In order to avoid having userspace to use MI_MEM_FENCE, we are adding a mechanism for userspace to generate a PCI memory barrier with low overhead (avoiding IOCTL call as well as writing to VRAM will adds some overhead).
This is implemented by memory-mapping a page as uncached that is backed by MMIO on the dGPU and thus allowing userspace to do memory write to the page without invoking an IOCTL. We are selecting the MMIO so that it is not accessible from the PCI bus so that the MMIO writes themselves are ignored, but the PCI memory barrier will still take action as the MMIO filtering will happen after the memory barrier effect.
When we detect special defined offset in mmap(), We are mapping 4K page which contains the last of page of doorbell MMIO range to userspace for same purpose.
For user to query special offset we are adding special flag in mmap_offset ioctl which needs to be passed as follows, struct drm_xe_gem_mmap_offset mmo = { .handle = 0, /* this must be 0 */ .flags = DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER, }; igt_ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo); map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo);
IGT : https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/commit/b2dbc6f22815128c0dd5c737504f42e1f1a6ad62 UMD : https://github.com/intel/compute-runtime/pull/772
V7: - Dgpu filter added V6(MAuld) - Move physical mmap to fault handler - Modify kernel-doc and attach UMD PR when ready V5(MAuld) - Return invalid early in case of non 4K PAGE_SIZE - Format kernel-doc and add note for 4K PAGE_SIZE HW limit V4(MAuld) - Add kernel-doc for uapi change - Restrict page size to 4K V3(MAuld) - Remove offset defination from UAPI to be able to change later - Edit commit message for special flag addition V2(MAuld) - Add fault handler with dummy page to handle unplug device - Add Build check for special offset to be below normal start page - Test d3hot, mapping seems to be valid in d3hot as well - Add more info to commit message
Cc: Matthew Auld <[email protected]> Acked-by: Michal Mrozek <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Signed-off-by: Tejas Upadhyay <[email protected]> Signed-off-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
5637797a |
| 12-Dec-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/oa/uapi: Expose an unblock after N reports OA property
Expose an "unblock after N reports" OA property, to allow userspace threads to be woken up less frequently.
Co-developed-by: Umesh Nerl
drm/xe/oa/uapi: Expose an unblock after N reports OA property
Expose an "unblock after N reports" OA property, to allow userspace threads to be woken up less frequently.
Co-developed-by: Umesh Nerlige Ramappa <[email protected]> Signed-off-by: Umesh Nerlige Ramappa <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Reviewed-by: Umesh Nerlige Ramappa <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.13-rc2 |
|
| #
720f63a8 |
| 05-Dec-2024 |
Sai Teja Pottumuttu <[email protected]> |
drm/xe/oa/uapi: Make OA buffer size configurable
Add a new property called DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE to allow OA buffer size to be configurable from userspace.
With this OA buffer size can
drm/xe/oa/uapi: Make OA buffer size configurable
Add a new property called DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE to allow OA buffer size to be configurable from userspace.
With this OA buffer size can be configured to any power of 2 size between 128KB and 128MB and it would default to 16MB in case the size is not supplied.
v2: - Rebase v3: - Add oa buffer size to capabilities [Ashutosh] - Address several nitpicks [Ashutosh] - Fix commit message/subject [Ashutosh]
BSpec: 61100, 61228 Signed-off-by: Sai Teja Pottumuttu <[email protected]> Reviewed-by: Ashutosh Dixit <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
c8507a25 |
| 22-Oct-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/oa/uapi: Define and parse OA sync properties
Now that we have laid the groundwork, introduce OA sync properties in the uapi and parse the input xe_sync array as is done elsewhere in the drive
drm/xe/oa/uapi: Define and parse OA sync properties
Now that we have laid the groundwork, introduce OA sync properties in the uapi and parse the input xe_sync array as is done elsewhere in the driver. Also add DRM_XE_OA_CAPS_SYNCS bit in OA capabilities for userspace.
v2: Fix and document DRM_XE_SYNC_TYPE_USER_FENCE for OA (Matt B) Add DRM_XE_OA_CAPS_SYNCS bit to OA capabilities (Jose)
Acked-by: José Roberto de Souza <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.12-rc4, v6.12-rc3 |
|
| #
9ab440a9 |
| 07-Oct-2024 |
Shekhar Chauhan <[email protected]> |
drm/xe/ptl: L3bank mask is not available on the media GT
On PTL platforms with media version 30.00, the fuse registers for reporting L3 bank availability to the GT just read out as ~0 and do not pro
drm/xe/ptl: L3bank mask is not available on the media GT
On PTL platforms with media version 30.00, the fuse registers for reporting L3 bank availability to the GT just read out as ~0 and do not provide proper values. Xe does not use the L3 bank mask for anything internally; it only passes the mask through to userspace via the GT topology query.
Since we don't have any way to get the real L3 bank mask, we don't want to pass garbage to userspace. Passing a zeroed mask or a copy of the primary GT's L3 bank mask would also be inaccurate and likely to cause confusion for userspace. The best approach is to simply not include L3 in the list of masks returned by the topology query in cases where we aren't able to provide a meaningful value. This won't change the behavior for any existing platforms (where we can always obtain L3 masks successfully for all GTs), it will only prevent us from mis-reporting bad information on upcoming platform(s).
There's a good chance this will become a formal workaround in the future, but for now we don't have a lineage number so "no_media_l3" is used in place of a lineage as the OOB workaround descriptor.
v2: - Re-calculate query size to properly match data returned. (Gustavo) - Update kerneldoc to clarify that the L3bank mask may not be included in the query results if the hardware doesn't make it available. (Gustavo)
Cc: Matt Atwood <[email protected]> Cc: Gustavo Sousa <[email protected]> Signed-off-by: Shekhar Chauhan <[email protected]> Co-developed-by: Matt Roper <[email protected]> Signed-off-by: Matt Roper <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Reviewed-by: Gustavo Sousa <[email protected]> Acked-by: Francois Dugast <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
ad614a70 |
| 29-Jul-2024 |
Geert Uytterhoeven <[email protected]> |
drm/xe/oa/uapi: Make bit masks unsigned
When building with gcc-5:
In function ‘decode_oa_format.isra.26’, inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6: ././
drm/xe/oa/uapi: Make bit masks unsigned
When building with gcc-5:
In function ‘decode_oa_format.isra.26’, inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6: ././include/linux/compiler_types.h:510:38: error: call to ‘__compiletime_assert_1336’ declared with attribute error: FIELD_GET: mask is not constant [...] ./include/linux/bitfield.h:155:3: note: in expansion of macro ‘__BF_FIELD_CHECK’ __BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \ ^ drivers/gpu/drm/xe/xe_oa.c:1573:18: note: in expansion of macro ‘FIELD_GET’ u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt); ^
Fixes: b6fd51c62119 ("drm/xe/oa/uapi: Define and parse OA stream properties") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit f2881dfdaaa9ec873dbd383ef5512fc31e576cbb) Signed-off-by: Rodrigo Vivi <[email protected]>
show more ...
|
| #
f2881dfd |
| 29-Jul-2024 |
Geert Uytterhoeven <[email protected]> |
drm/xe/oa/uapi: Make bit masks unsigned
When building with gcc-5:
In function ‘decode_oa_format.isra.26’, inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6: ././
drm/xe/oa/uapi: Make bit masks unsigned
When building with gcc-5:
In function ‘decode_oa_format.isra.26’, inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6: ././include/linux/compiler_types.h:510:38: error: call to ‘__compiletime_assert_1336’ declared with attribute error: FIELD_GET: mask is not constant [...] ./include/linux/bitfield.h:155:3: note: in expansion of macro ‘__BF_FIELD_CHECK’ __BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \ ^ drivers/gpu/drm/xe/xe_oa.c:1573:18: note: in expansion of macro ‘FIELD_GET’ u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt); ^
Fixes: b6fd51c62119 ("drm/xe/oa/uapi: Define and parse OA stream properties") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc1, v6.10 |
|
| #
7108b4a5 |
| 10-Jul-2024 |
Lucas De Marchi <[email protected]> |
drm/xe/uapi: Expose SIMD16 EU mask in topology query
PVC, Xe2 and later platforms have 16-wide EUs. We were implicitly reporting for PVC the number of 16-wide EUs without giving userspace any hint t
drm/xe/uapi: Expose SIMD16 EU mask in topology query
PVC, Xe2 and later platforms have 16-wide EUs. We were implicitly reporting for PVC the number of 16-wide EUs without giving userspace any hint that they were different than for other platforms. Xe2 and later also have 16-wide, but in those cases the reported number would correspond to the 8-wide count.
To avoid confusion and make sure the right number is used by userspace depending on the platform, add a new item to the topology query and drop the one that is not available. The new mask reported for both PVC and Xe2 should now match the numbers reported via hwconfig.
v2: Use a different topo item with EU type in its name to report the new mask instead of adding the type itself as the item (Matt Roper)
Reviewed-by: Matt Roper <[email protected]> Acked-by: José Roberto de Souza <[email protected]> Acked-by: Mateusz Jablonski <[email protected]> Acked-by: Wenbin Lu <[email protected]> Acked-by: Effie Yu <[email protected]> Acked-by: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc7 |
|
| #
5207c393 |
| 05-Jul-2024 |
Thomas Hellström <[email protected]> |
drm/xe: Use write-back caching mode for system memory on DGFX
The caching mode for buffer objects with VRAM as a possible placement was forced to write-combined, regardless of placement.
However, w
drm/xe: Use write-back caching mode for system memory on DGFX
The caching mode for buffer objects with VRAM as a possible placement was forced to write-combined, regardless of placement.
However, write-combined system memory is expensive to allocate and even though it is pooled, the pool is expensive to shrink, since it involves global CPU TLB flushes.
Moreover write-combined system memory from TTM is only reliably available on x86 and DGFX doesn't have an x86 restriction.
So regardless of the cpu caching mode selected for a bo, internally use write-back caching mode for system memory on DGFX.
Coherency is maintained, but user-space clients may perceive a difference in cpu access speeds.
v2: - Update RB- and Ack tags. - Rephrase wording in xe_drm.h (Matt Roper) v3: - Really rephrase wording.
Signed-off-by: Thomas Hellström <[email protected]> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Cc: Pallavi Mishra <[email protected]> Cc: Matthew Auld <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Effie Yu <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Jose Souza <[email protected]> Cc: Michal Mrozek <[email protected]> Cc: <[email protected]> # v6.8+ Acked-by: Matthew Auld <[email protected]> Acked-by: José Roberto de Souza <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Acked-by: Michal Mrozek <[email protected]> Acked-by: Effie Yu <[email protected]> #On chat Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 01e0cfc994be484ddcb9e121e353e51d8bb837c0) Signed-off-by: Lucas De Marchi <[email protected]>
show more ...
|
| #
63347fe0 |
| 03-Jul-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/uapi: Rename xe perf layer as xe observation layer
In Xe, the perf layer allows capture of HW counter streams. These HW counters are generally performance related but don't have to be necessa
drm/xe/uapi: Rename xe perf layer as xe observation layer
In Xe, the perf layer allows capture of HW counter streams. These HW counters are generally performance related but don't have to be necessarily so. Also, the name "perf" is a carryover from i915 and is not preferred.
Here we propose the name "observation" for this common layer which allows capture of different types of these counter streams.
v2: Rename observability layer to observation layer (Lucas/Rodrigo) v3: Rename sysctl file to "observation_paranoid" (Jose)
Fixes: 52c2e956dceb ("drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types") Fixes: fe8929bdf835 ("drm/xe/perf/uapi: Add perf_stream_paranoid sysctl") Acked-by: Lucas De Marchi <[email protected]> Acked-by: Rodrigo Vivi <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Reviewed-by: Umesh Nerlige Ramappa <[email protected]> Acked-by: José Roberto de Souza <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 8169b2097d88d99d7e4a72e20e4b549efe9eb8d7) Signed-off-by: Rodrigo Vivi <[email protected]>
show more ...
|
| #
01e0cfc9 |
| 05-Jul-2024 |
Thomas Hellström <[email protected]> |
drm/xe: Use write-back caching mode for system memory on DGFX
The caching mode for buffer objects with VRAM as a possible placement was forced to write-combined, regardless of placement.
However, w
drm/xe: Use write-back caching mode for system memory on DGFX
The caching mode for buffer objects with VRAM as a possible placement was forced to write-combined, regardless of placement.
However, write-combined system memory is expensive to allocate and even though it is pooled, the pool is expensive to shrink, since it involves global CPU TLB flushes.
Moreover write-combined system memory from TTM is only reliably available on x86 and DGFX doesn't have an x86 restriction.
So regardless of the cpu caching mode selected for a bo, internally use write-back caching mode for system memory on DGFX.
Coherency is maintained, but user-space clients may perceive a difference in cpu access speeds.
v2: - Update RB- and Ack tags. - Rephrase wording in xe_drm.h (Matt Roper) v3: - Really rephrase wording.
Signed-off-by: Thomas Hellström <[email protected]> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Cc: Pallavi Mishra <[email protected]> Cc: Matthew Auld <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Effie Yu <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Jose Souza <[email protected]> Cc: Michal Mrozek <[email protected]> Cc: <[email protected]> # v6.8+ Acked-by: Matthew Auld <[email protected]> Acked-by: José Roberto de Souza <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Acked-by: Michal Mrozek <[email protected]> Acked-by: Effie Yu <[email protected]> #On chat Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
| #
8169b209 |
| 03-Jul-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/uapi: Rename xe perf layer as xe observation layer
In Xe, the perf layer allows capture of HW counter streams. These HW counters are generally performance related but don't have to be necessa
drm/xe/uapi: Rename xe perf layer as xe observation layer
In Xe, the perf layer allows capture of HW counter streams. These HW counters are generally performance related but don't have to be necessarily so. Also, the name "perf" is a carryover from i915 and is not preferred.
Here we propose the name "observation" for this common layer which allows capture of different types of these counter streams.
v2: Rename observability layer to observation layer (Lucas/Rodrigo) v3: Rename sysctl file to "observation_paranoid" (Jose)
Fixes: 52c2e956dceb ("drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types") Fixes: fe8929bdf835 ("drm/xe/perf/uapi: Add perf_stream_paranoid sysctl") Acked-by: Lucas De Marchi <[email protected]> Acked-by: Rodrigo Vivi <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Reviewed-by: Umesh Nerlige Ramappa <[email protected]> Acked-by: José Roberto de Souza <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
|
Revision tags: v6.10-rc6 |
|
| #
406d058d |
| 26-Jun-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/oa/uapi: Allow preemption to be disabled on the stream exec queue
Mesa VK_KHR_performance_query use case requires preemption and timeslicing to be disabled for the stream exec queue. Implemen
drm/xe/oa/uapi: Allow preemption to be disabled on the stream exec queue
Mesa VK_KHR_performance_query use case requires preemption and timeslicing to be disabled for the stream exec queue. Implement this functionality here.
v2: Minor change to debug print to print both ret values (Umesh)
Acked-by: José Roberto de Souza <[email protected]> Reviewed-by: Umesh Nerlige Ramappa <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc5 |
|
| #
7e5161da |
| 23-Jun-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/oa: Fix kernel doc in xe_drm.h
Fix kernel doc in xe_drm.h. Also eliminate private/non-abi enum definitions.
v2: Remove __DRM_XE_PERF_TYPE_MAX since it is unused (Michal) v3: Also remove DRM_
drm/xe/oa: Fix kernel doc in xe_drm.h
Fix kernel doc in xe_drm.h. Also eliminate private/non-abi enum definitions.
v2: Remove __DRM_XE_PERF_TYPE_MAX since it is unused (Michal) v3: Also remove DRM_XE_OA_PROPERTY_MAX since it can also be eliminated (Michal)
Suggested-by: Michal Wajdeczko <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Reviewed-by: Michal Wajdeczko <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]>
show more ...
|
| #
dd6b4718 |
| 18-Jun-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/oa/uapi: Query OA unit properties
Implement query for properties of OA units present on a device.
v2: Clean up reserved/pad fields (Umesh) Follow the same scheme as other query structs v
drm/xe/oa/uapi: Query OA unit properties
Implement query for properties of OA units present on a device.
v2: Clean up reserved/pad fields (Umesh) Follow the same scheme as other query structs v3: Skip reporting reserved engines attached to OA units v4: Expose oa_buf_size via DRM_XE_PERF_IOCTL_INFO (Umesh) v5: Don't expose capabilities as OR of properties (Umesh) v6: Add extensions to query output structs: drm_xe_oa_unit, drm_xe_query_oa_units and drm_xe_oa_stream_info v7: Change oa_units[] array to __u64 type
Acked-by: Rodrigo Vivi <[email protected]> Reviewed-by: Umesh Nerlige Ramappa <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|
| #
efb315d0 |
| 18-Jun-2024 |
Ashutosh Dixit <[email protected]> |
drm/xe/oa/uapi: Read file_operation
Implement the OA stream read file_operation. Both blocking and non-blocking reads are supported. As part of read system call, the read copies OA perf data from th
drm/xe/oa/uapi: Read file_operation
Implement the OA stream read file_operation. Both blocking and non-blocking reads are supported. As part of read system call, the read copies OA perf data from the OA buffer to the user buffer, after appending packet headers for status and data packets.
v2: Drop OA report headers, implement DRM_XE_PERF_IOCTL_STATUS (Umesh) v3: Introduce 'struct drm_xe_oa_stream_status' v4: Define oa_status register bitfields (Umesh) v5: Add extensions to 'struct drm_xe_oa_stream_status' v6: Minor cleanup, eliminate report32 variable v7: Use -EIO to signal to userspace to read OASTATUS using DRM_XE_PERF_IOCTL_STATUS, change previous sites returning -EIO to return -EINVAL Make drm_xe_oa_stream_status bits contiguous (Jose, Umesh) rmw oa_status bits (Umesh)
Acked-by: Rodrigo Vivi <[email protected]> Acked-by: José Roberto de Souza <[email protected]> Reviewed-by: Umesh Nerlige Ramappa <[email protected]> Signed-off-by: Ashutosh Dixit <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
show more ...
|