Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10

# 7296f230 | 08-Jul-2024 | Michael Kelley <[email protected]>

swiotlb: reduce swiotlb pool lookups

With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple places, the pool is found and used in one function, and then must be found again in the next function that is called because only the tlb_addr is passed as an argument. These are the six call sites:
dma_direct_map_page:
  1. swiotlb_map -> swiotlb_tbl_map_single -> swiotlb_bounce

dma_direct_unmap_page:
  2. dma_direct_sync_single_for_cpu -> is_swiotlb_buffer
  3. dma_direct_sync_single_for_cpu -> swiotlb_sync_single_for_cpu -> swiotlb_bounce
  4. is_swiotlb_buffer
  5. swiotlb_tbl_unmap_single -> swiotlb_del_transient
  6. swiotlb_tbl_unmap_single -> swiotlb_release_slots
Reduce the number of calls by finding the pool at a higher level, and passing it as an argument instead of searching again. A key change is for is_swiotlb_buffer() to return a pool pointer instead of a boolean, and then pass this pool pointer to subsequent swiotlb functions.
There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer is a swiotlb buffer before calling a swiotlb function. To reduce code duplication in getting the pool pointer and passing it as an argument, introduce inline wrappers for this pattern. The generated code is essentially unchanged.
Since is_swiotlb_buffer() no longer returns a boolean, rename some functions to reflect the change:
  * swiotlb_find_pool() becomes __swiotlb_find_pool()
  * is_swiotlb_buffer() becomes swiotlb_find_pool()
  * is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()
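
As a rough sketch of the resulting pattern (illustrative only, not the exact patch; the wrapper body below is an assumption based on the description above):

    /* Former is_swiotlb_buffer(): returns the pool, or NULL if paddr is
     * not a swiotlb bounce buffer for this device. */
    static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
                                                        phys_addr_t paddr)
    {
            return __swiotlb_find_pool(dev, paddr); /* RCU-protected walk */
    }

    /* Wrapper pattern: one lookup, then the pool is handed down instead
     * of being re-derived from tlb_addr in the callee. */
    static inline void swiotlb_sync_single_for_cpu(struct device *dev,
                    phys_addr_t paddr, size_t size, enum dma_data_direction dir)
    {
            struct io_tlb_pool *pool = swiotlb_find_pool(dev, paddr);

            if (pool)
                    __swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
    }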
With these changes, a round-trip map/unmap pair requires only 2 pool lookups (listed using the new names and wrappers):
dma_direct_unmap_page:
  1. dma_direct_sync_single_for_cpu -> swiotlb_find_pool
  2. swiotlb_tbl_unmap_single -> swiotlb_find_pool
These changes come from noticing the inefficiencies in a code review, not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC, __swiotlb_find_pool() is not trivial, and it uses an RCU read lock, so avoiding the redundant calls helps performance in a hot path. When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction is minimal and the perf benefits are likely negligible, but no harm is done.
No functional change is intended.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Petr Tesarik <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6

# a409d960 | 28-Oct-2023 | Jia He <[email protected]>

dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM

There is an unusual case in which the dma_range_map covers right up to the top of system RAM but leaves a hole somewhere lower down. The phys_to_dma() check then rejects DMA mappings for the NVMe device, which causes hangs at boot.
E.g. On an Armv8 Ampere server, the dsdt ACPI table is:

    Method (_DMA, 0, Serialized)  // _DMA: Direct Memory Access
    {
        Name (RBUF, ResourceTemplate ()
        {
            QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
                Cacheable, ReadWrite,
                0x0000000000000000, // Granularity
                0x0000000000000000, // Range Minimum
                0x00000000FFFFFFFF, // Range Maximum
                0x0000000000000000, // Translation Offset
                0x0000000100000000, // Length
                ,, , AddressRangeMemory, TypeStatic)
            QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
                Cacheable, ReadWrite,
                0x0000000000000000, // Granularity
                0x0000006010200000, // Range Minimum
                0x000000602FFFFFFF, // Range Maximum
                0x0000000000000000, // Translation Offset
                0x000000001FE00000, // Length
                ,, , AddressRangeMemory, TypeStatic)
            QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
                Cacheable, ReadWrite,
                0x0000000000000000, // Granularity
                0x00000060F0000000, // Range Minimum
                0x00000060FFFFFFFF, // Range Maximum
                0x0000000000000000, // Translation Offset
                0x0000000010000000, // Length
                ,, , AddressRangeMemory, TypeStatic)
            QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
                Cacheable, ReadWrite,
                0x0000000000000000, // Granularity
                0x0000007000000000, // Range Minimum
                0x000003FFFFFFFFFF, // Range Maximum
                0x0000000000000000, // Translation Offset
                0x0000039000000000, // Length
                ,, , AddressRangeMemory, TypeStatic)
        })
But the System RAM ranges are:

    cat /proc/iomem |grep -i ram
    90000000-91ffffff : System RAM
    92900000-fffbffff : System RAM
    880000000-fffffffff : System RAM
    8800000000-bff5990fff : System RAM
    bff59d0000-bff5a4ffff : System RAM
    bff8000000-bfffffffff : System RAM

So some RAM ranges are out of dma_range_map.
Fix it by checking whether each of the system RAM resources can be properly encompassed within the dma_range_map.
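
A minimal sketch of the idea (the helper names check_ram_in_range_map() and all_ram_in_dma_range_map() are illustrative, not necessarily the exact ones in the patch):

    /* Returns nonzero if any part of this RAM resource falls outside
     * every entry of dev->dma_range_map. */
    static int check_ram_in_range_map(unsigned long start_pfn,
                                      unsigned long nr_pages, void *data)
    {
            phys_addr_t start = PFN_PHYS(start_pfn);
            phys_addr_t end = PFN_PHYS(start_pfn + nr_pages);
            struct device *dev = data;
            const struct bus_dma_region *m;

            while (start < end) {
                    for (m = dev->dma_range_map; m->size; m++)
                            if (start >= m->cpu_start &&
                                start < m->cpu_start + m->size)
                                    break;
                    if (!m->size)
                            return 1;       /* hole: RAM not covered */
                    start = m->cpu_start + m->size; /* skip past this entry */
            }
            return 0;
    }

    static bool all_ram_in_dma_range_map(struct device *dev)
    {
            /* walk_system_ram_range() stops at the first nonzero return */
            return !walk_system_ram_range(0, PFN_DOWN(ULONG_MAX), dev,
                                          check_ram_in_range_map);
    }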
Signed-off-by: Jia He <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

Revision tags: v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7

# 370645f4 | 12-Jun-2023 | Catalin Marinas <[email protected]>

dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned

For direct DMA, if the size is small enough to have originated from a kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against dma_get_cache_alignment() and bounce if necessary. For larger sizes, it is the responsibility of the DMA API caller to ensure proper alignment.
At this point, the kmalloc() caches are properly aligned but this will change in a subsequent patch.
Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC.
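
The decision logic, as a sketch (the helper name kmalloc_buf_needs_bounce() is hypothetical; the cutoffs follow the description above):

    static bool kmalloc_buf_needs_bounce(struct device *dev, size_t size,
                                         enum dma_data_direction dir)
    {
            /* DMA_TO_DEVICE cannot corrupt neighbouring data, and coherent
             * devices need no cache maintenance at all. */
            if (dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
                    return false;

            /* Sizes of at least 2 * ARCH_DMA_MINALIGN come from kmalloc()
             * caches that are sufficiently aligned; larger buffers are the
             * DMA API caller's responsibility. */
            if (size >= 2 * ARCH_DMA_MINALIGN)
                    return false;

            /* Bounce unless the size is a cache-line multiple. */
            return !IS_ALIGNED(size, dma_get_cache_alignment());
    }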
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Tested-by: Isaac J. Manjarres <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jerry Snitselaar <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6

# f02ad36d | 08-Jul-2022 | Logan Gunthorpe <[email protected]>

dma-direct: support PCI P2PDMA pages in dma-direct map_sg

Add PCI P2PDMA support for dma_direct_map_sg() so that it can map PCI P2PDMA pages directly without a hack in the callers. This allows for heterogeneous SGLs that contain both P2PDMA and regular pages.
A P2PDMA page may have three possible outcomes when being mapped:

  1) If the data path between the two devices doesn't go through the root
     port, then it should be mapped with a PCI bus address.
  2) If the data path goes through the host bridge, it should be mapped
     normally, as though it were a CPU physical address.
  3) It is not possible for the two devices to communicate and thus the
     mapping operation should fail (and it will return -EREMOTEIO).
SGL segments that contain PCI bus addresses are marked with sg_dma_mark_pci_p2pdma() and are ignored when unmapped.
P2PDMA mappings are also failed if swiotlb needs to be used on the mapping.
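
An abridged sketch of the per-segment handling inside dma_direct_map_sg() (error-path details elided; the numbered comments refer to the three outcomes listed earlier):

    for_each_sg(sgl, sg, nents, i) {
            if (is_pci_p2pdma_page(sg_page(sg))) {
                    switch (pci_p2pdma_map_segment(&p2pdma_state, dev, sg)) {
                    case PCI_P2PDMA_MAP_BUS_ADDR:
                            continue;       /* 1) bus address set; skipped on unmap */
                    case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
                            break;          /* 2) map like host memory below */
                    default:
                            ret = -EREMOTEIO;       /* 3) no usable P2P path */
                            goto out_unmap;
                    }
            }

            sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
                            sg->offset, sg->length, dir, attrs);
            if (sg->dma_address == DMA_MAPPING_ERROR) {
                    ret = -EIO;
                    goto out_unmap;
            }
            sg_dma_len(sg) = sg->length;
    }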
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

Revision tags: v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5

# 07410559 | 14-Feb-2022 | Christoph Hellwig <[email protected]>

dma-direct: use is_swiotlb_active in dma_direct_map_page

Use the more specific is_swiotlb_active check instead of checking the global swiotlb_force variable.
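
Sketch of the changed check in dma_direct_map_page() (abridged and simplified; the warning text is illustrative):

    if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
            /* previously keyed off the global swiotlb_force variable */
            if (is_swiotlb_active(dev))
                    return swiotlb_map(dev, phys, size, dir, attrs);

            dev_WARN_ONCE(dev, 1, "DMA addr out of reach, no swiotlb\n");
            return DMA_MAPPING_ERROR;
    }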
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Boris Ostrovsky <[email protected]>

# 9e02977b | 13-Apr-2022 | Chao Gao <[email protected]>

dma-direct: avoid redundant memory sync for swiotlb

When we looked into FIO performance with swiotlb enabled in a VM, we found that swiotlb_bounce() is always called one more time than expected for each DMA read request.
It turns out that the bounce buffer is copied to the original DMA buffer twice after the completion of a DMA request (once in dma_direct_sync_single_for_cpu(), and once more in swiotlb_tbl_unmap_single()). But the content of the bounce buffer does not change between the two copies, so one of them is redundant.
Pass the DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip the memory copy in it.
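
A sketch of the unmap path after the fix (abridged from dma_direct_unmap_page()):

    static inline void dma_direct_unmap_page(struct device *dev,
                    dma_addr_t addr, size_t size, enum dma_data_direction dir,
                    unsigned long attrs)
    {
            phys_addr_t phys = dma_to_phys(dev, addr);

            if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
                    dma_direct_sync_single_for_cpu(dev, addr, size, dir);

            if (unlikely(is_swiotlb_buffer(dev, phys)))
                    /* the sync above already copied the data back once */
                    swiotlb_tbl_unmap_single(dev, phys, size, dir,
                                             attrs | DMA_ATTR_SKIP_CPU_SYNC);
    }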
This fix increases FIO 64KB sequential read throughput in a guest with swiotlb=force by 5.6%.
Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")
Reported-by: Wang Zhaoyang1 <[email protected]>
Reported-by: Gao Liang <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

Revision tags: v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13

# 903cd0f3 | 24-Jun-2021 | Claire Chang <[email protected]>

swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

Propagate swiotlb_force into io_tlb_default_mem->force_bounce and use it to determine whether to bounce the data or not. This will be useful later to allow for different pools.
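
Sketch of the resulting check (assuming the per-device io_tlb_mem pointer introduced in this same series; the NULL check reflects Will's fix noted below):

    static inline bool is_swiotlb_force_bounce(struct device *dev)
    {
            struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

            return mem && mem->force_bounce;
    }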
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
[v2: Includes Will's fix]

Revision tags: v5.13-rc7

# 7fd856aa | 19-Jun-2021 | Claire Chang <[email protected]>

swiotlb: Update is_swiotlb_buffer to add a struct device argument

Update is_swiotlb_buffer to add a struct device argument. This will be useful later to allow for different pools.
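
Roughly, the updated helper (a sketch; the body still consults the single default pool at this point):

    static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
    {
            struct io_tlb_mem *mem = io_tlb_default_mem;

            /* dev is unused for now; it enables per-device pools later */
            return mem && paddr >= mem->start && paddr < mem->end;
    }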
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

Revision tags: v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2

# 80808d27 | 01-Mar-2021 | Christoph Hellwig <[email protected]>

swiotlb: split swiotlb_tbl_sync_single

Split swiotlb_tbl_sync_single into two separate functions for the to-device and to-cpu synchronization.
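
The resulting entry points, sketched with bodies elided (each bounces only in its own direction):

    /* to-device: flush CPU-side writes into the bounce buffer before DMA */
    void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
                                        size_t size, enum dma_data_direction dir);

    /* to-cpu: copy DMA results out of the bounce buffer after DMA */
    void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
                                     size_t size, enum dma_data_direction dir);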
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

# 2973073a | 01-Mar-2021 | Christoph Hellwig <[email protected]>

swiotlb: remove the alloc_size parameter to swiotlb_tbl_unmap_single

Now that swiotlb remembers the allocation size, there is no need to pass it back to swiotlb_tbl_unmap_single.
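
Call sites shrink accordingly (a sketch; the argument names are illustrative):

    /* before: the caller had to pass the original allocation size back */
    swiotlb_tbl_unmap_single(dev, tlb_addr, mapping_size, alloc_size, dir, attrs);

    /* after: swiotlb recalls the allocation size from its own bookkeeping */
    swiotlb_tbl_unmap_single(dev, tlb_addr, mapping_size, dir, attrs);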
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

Revision tags: v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7

# 19c65c3d | 22-Sep-2020 | Christoph Hellwig <[email protected]>

dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma

Most of the dma_direct symbols should only be used by direct.c and mapping.c, so move them to kernel/dma. In fact more of dma-direct.h should eventually move, but that will require more coordination with other subsystems.
Signed-off-by: Christoph Hellwig <[email protected]>