History log of /linux-6.15/kernel/dma/direct.h (Results 1 – 11 of 11)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10
# 7296f230 08-Jul-2024 Michael Kelley <[email protected]>

swiotlb: reduce swiotlb pool lookups

With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
places, the pool is found and used in one function, and then must
be found again in the next function that is called because only the
tlb_addr is passed as an argument. These are the six call sites:

dma_direct_map_page:
1. swiotlb_map -> swiotlb_tbl_map_single -> swiotlb_bounce

dma_direct_unmap_page:
2. dma_direct_sync_single_for_cpu -> is_swiotlb_buffer
3. dma_direct_sync_single_for_cpu -> swiotlb_sync_single_for_cpu ->
swiotlb_bounce
4. is_swiotlb_buffer
5. swiotlb_tbl_unmap_single -> swiotlb_del_transient
6. swiotlb_tbl_unmap_single -> swiotlb_release_slots

Reduce the number of calls by finding the pool at a higher level, and
passing it as an argument instead of searching again. A key change is
for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
and then pass this pool pointer to subsequent swiotlb functions.

There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer
is a swiotlb buffer before calling a swiotlb function. To reduce code
duplication in getting the pool pointer and passing it as an argument,
introduce inline wrappers for this pattern. The generated code is
essentially unchanged.

Since is_swiotlb_buffer() no longer returns a boolean, rename some
functions to reflect the change:

* swiotlb_find_pool() becomes __swiotlb_find_pool()
* is_swiotlb_buffer() becomes swiotlb_find_pool()
* is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()

With these changes, a round-trip map/unmap pair requires only 2 pool
lookups (listed using the new names and wrappers):

dma_direct_unmap_page:
1. dma_direct_sync_single_for_cpu -> swiotlb_find_pool
2. swiotlb_tbl_unmap_single -> swiotlb_find_pool

These changes come from noticing the inefficiencies in a code review,
not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
__swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
so avoiding the redundant calls helps performance in a hot path.
When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
is minimal and the perf benefits are likely negligible, but no
harm is done.

No functional change is intended.

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Petr Tesarik <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
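
A minimal sketch of the wrapper pattern described above; the
__swiotlb_tbl_unmap_single() worker name and its struct io_tlb_pool
parameter are assumptions for illustration, not quotes from the patch:

static inline void swiotlb_tbl_unmap_single(struct device *dev,
                phys_addr_t addr, size_t size, enum dma_data_direction dir,
                unsigned long attrs)
{
        /* single lookup: swiotlb_find_pool() is the old is_swiotlb_buffer(),
         * now returning the pool pointer (or NULL) instead of a boolean */
        struct io_tlb_pool *pool = swiotlb_find_pool(dev, addr);

        if (unlikely(pool))
                /* the pool is passed down, so the callee does not have to
                 * call __swiotlb_find_pool() on tlb_addr again */
                __swiotlb_tbl_unmap_single(dev, addr, size, dir, attrs, pool);
}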



Revision tags: v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6
# a409d960 28-Oct-2023 Jia He <[email protected]>

dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM

There is an unusual case in which the range map covers right up to the top
of system RAM but leaves a hole somewhere lower down. This defeats the
NVMe device's DMA mapping in the phys_to_dma() checking path and causes
hangs at boot.

E.g., on an Armv8 Ampere server, the DSDT ACPI table is:
Method (_DMA, 0, Serialized) // _DMA: Direct Memory Access
{
    Name (RBUF, ResourceTemplate ()
    {
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000000000000000, // Range Minimum
            0x00000000FFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000000100000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000006010200000, // Range Minimum
            0x000000602FFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x000000001FE00000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x00000060F0000000, // Range Minimum
            0x00000060FFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000000010000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000007000000000, // Range Minimum
            0x000003FFFFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000039000000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
    })

But the System RAM ranges are:
cat /proc/iomem |grep -i ram
90000000-91ffffff : System RAM
92900000-fffbffff : System RAM
880000000-fffffffff : System RAM
8800000000-bff5990fff : System RAM
bff59d0000-bff5a4ffff : System RAM
bff8000000-bfffffffff : System RAM
So some RAM ranges fall outside the dma_range_map.

Fix it by checking whether each of the system RAM resources can be
properly encompassed within the dma_range_map.

Signed-off-by: Jia He <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
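
A hedged sketch of that check; the helper names are illustrative and the
iteration is simplified, but it relies only on the existing
walk_system_ram_range() API and the cpu_start/size fields of
struct bus_dma_region:

static int check_ram_in_range_map(unsigned long start_pfn,
                                  unsigned long nr_pages, void *data)
{
        phys_addr_t end = PFN_PHYS(start_pfn + nr_pages);
        phys_addr_t start = PFN_PHYS(start_pfn);
        struct device *dev = data;
        const struct bus_dma_region *m;

        while (start < end) {
                bool covered = false;

                for (m = dev->dma_range_map; m->size; m++) {
                        if (start >= m->cpu_start &&
                            start < m->cpu_start + m->size) {
                                /* skip past the part this entry covers */
                                start = m->cpu_start + m->size;
                                covered = true;
                                break;
                        }
                }
                if (!covered)
                        return 1;       /* a RAM chunk escapes the map: stop */
        }
        return 0;
}

static bool all_ram_in_dma_range_map(struct device *dev)
{
        if (!dev->dma_range_map)
                return true;    /* no map means a plain identity mapping */
        return walk_system_ram_range(0, PFN_DOWN(ULONG_MAX) + 1, dev,
                                     check_ram_in_range_map) == 0;
}

The addressing-limited decision can then treat the device as limited when
some System RAM is not covered, instead of relying on the mask comparison
alone.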



Revision tags: v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7
# 370645f4 12-Jun-2023 Catalin Marinas <[email protected]>

dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned

For direct DMA, if the size is small enough to have originated from a
kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against
dma_get_cache_alignment() and bounce if necessary. For larger sizes, it
is the responsibility of the DMA API caller to ensure proper alignment.

At this point, the kmalloc() caches are properly aligned but this will
change in a subsequent patch.

Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Tested-by: Isaac J. Manjarres <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jerry Snitselaar <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
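
As an illustration of the decision being added, a hedged sketch follows;
dma_buf_needs_bounce() is a hypothetical name and the in-tree helpers
split the check differently, but the logic is the one described above
(assuming ARCH_DMA_MINALIGN is defined, as on arm64):

static bool dma_buf_needs_bounce(struct device *dev, void *ptr, size_t size,
                                 enum dma_data_direction dir)
{
        /* DMA_TO_DEVICE only cleans the cache, so sharing a cache line with
         * a neighbouring kmalloc() object is harmless; coherent devices
         * need no cache maintenance at all */
        if (dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
                return false;

        /* larger sizes cannot come from a cache below ARCH_DMA_MINALIGN,
         * and their alignment is the caller's responsibility */
        if (size >= ARCH_DMA_MINALIGN)
                return false;

        /* otherwise bounce unless both start and size are aligned to the
         * cache line size reported by dma_get_cache_alignment() */
        return !IS_ALIGNED((uintptr_t)ptr | size, dma_get_cache_alignment());
}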



Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6
# f02ad36d 08-Jul-2022 Logan Gunthorpe <[email protected]>

dma-direct: support PCI P2PDMA pages in dma-direct map_sg

Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

A P2PDMA page may have three possible outcomes when being mapped:
1) If the data path between the two devices doesn't go through the
root port, then it should be mapped with a PCI bus address
2) If the data path goes through the host bridge, it should be mapped
normally, as though it were a CPU physical address
3) It is not possible for the two devices to communicate and thus
the mapping operation should fail (and it will return -EREMOTEIO).

SGL segments that contain PCI bus addresses are marked with
sg_dma_mark_pci_p2pdma() and are ignored when unmapped.

P2PDMA mappings are also failed if swiotlb needs to be used on the
mapping.

Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
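
A sketch of that three-way decision per SGL segment; p2pdma_map_type(),
p2p_bus_addr() and the P2P_MAP_* constants are hypothetical placeholders
for the P2PDMA distance lookup, and error handling is omitted:

static int p2pdma_sg_sketch(struct device *dev, struct scatterlist *sgl,
                            int nents, enum dma_data_direction dir,
                            unsigned long attrs)
{
        struct scatterlist *sg;
        int i;

        for_each_sg(sgl, sg, nents, i) {
                switch (p2pdma_map_type(dev, sg_page(sg))) {
                case P2P_MAP_BUS_ADDR:
                        /* traffic stays below the root port: use the PCI bus
                         * address and mark the segment so unmap skips it */
                        sg->dma_address = p2p_bus_addr(sg_page(sg)) + sg->offset;
                        sg_dma_len(sg) = sg->length;
                        sg_dma_mark_pci_p2pdma(sg);
                        break;
                case P2P_MAP_THRU_HOST_BRIDGE:
                        /* path crosses the host bridge: map like normal RAM
                         * (and fail if that would require swiotlb bouncing) */
                        sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
                                        sg->offset, sg->length, dir, attrs);
                        sg_dma_len(sg) = sg->length;
                        break;
                default:
                        /* the two devices cannot reach each other */
                        return -EREMOTEIO;
                }
        }
        return nents;
}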



Revision tags: v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5
# 07410559 14-Feb-2022 Christoph Hellwig <[email protected]>

dma-direct: use is_swiotlb_active in dma_direct_map_page

Use the more specific is_swiotlb_active check instead of checking the
global swiotlb_force variable.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Boris Ostrovsky <[email protected]>



# 9e02977b 13-Apr-2022 Chao Gao <[email protected]>

dma-direct: avoid redundant memory sync for swiotlb

While looking into FIO performance with swiotlb enabled in a VM, we found
that swiotlb_bounce() is always called one more time than expected for each
DMA read request.

It turns out that the bounce buffer is copied to the original DMA buffer
twice after the completion of a DMA request (once in
dma_direct_sync_single_for_cpu() and once in swiotlb_tbl_unmap_single()).
But the contents of the bounce buffer do not change between the two copies,
so one of them is redundant.

Pass DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to
skip the memory copy in it.

This fix increases FIO 64KB sequential read throughput in a guest with
swiotlb=force by 5.6%.

Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")
Reported-by: Wang Zhaoyang1 <[email protected]>
Reported-by: Gao Liang <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
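
With this flag set, the unmap path in this header looks roughly like the
sketch below (helper spellings reflect the code of that era; the unlikely()
hint is an assumption, not a quote of the hunk):

static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
                size_t size, enum dma_data_direction dir, unsigned long attrs)
{
        phys_addr_t phys = dma_to_phys(dev, addr);

        if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
                /* the one remaining bounce-buffer copy back to the CPU */
                dma_direct_sync_single_for_cpu(dev, addr, size, dir);

        if (unlikely(is_swiotlb_buffer(dev, phys)))
                /* tell swiotlb not to copy the unchanged data a second time */
                swiotlb_tbl_unmap_single(dev, phys, size, dir,
                                         attrs | DMA_ATTR_SKIP_CPU_SYNC);
}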



Revision tags: v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13
# 903cd0f3 24-Jun-2021 Claire Chang <[email protected]>

swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

Propagate swiotlb_force into io_tlb_default_mem->force_bounce and use it
to determine whether to bounce the data. This will be useful later to
allow for different pools.

Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

[v2: Includes Will's fix]
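
A sketch of the resulting helper, assuming the per-device dma_io_tlb_mem
pointer introduced in the same series; field names are close to, but not
guaranteed identical to, the in-tree version:

static inline bool is_swiotlb_force_bounce(struct device *dev)
{
        struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

        /* force_bounce is filled in from swiotlb_force at init time */
        return mem && mem->force_bounce;
}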



Revision tags: v5.13-rc7
# 7fd856aa 19-Jun-2021 Claire Chang <[email protected]>

swiotlb: Update is_swiotlb_buffer to add a struct device argument

Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
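
At this stage the new argument is accepted but not yet consulted; a sketch
of the global-pool form of that era (the mem->start/mem->end spellings are
from memory, not quoted from the patch):

static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
        struct io_tlb_mem *mem = io_tlb_default_mem;    /* dev unused for now */

        return mem && paddr >= mem->start && paddr < mem->end;
}

Later patches in the series use the device pointer to pick a per-device
(restricted) pool instead of the single global one.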



Revision tags: v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2
# 80808d27 01-Mar-2021 Christoph Hellwig <[email protected]>

swiotlb: split swiotlb_tbl_sync_single

Split swiotlb_tbl_sync_single into two separate functions for the to-device
and to-CPU synchronization.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
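
A sketch of the two functions after the split, assuming a
swiotlb_bounce(dev, tlb_addr, size, dir) helper that copies between the
original buffer and the bounce slot (its real signature has varied across
releases):

void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
                                    size_t size, enum dma_data_direction dir)
{
        if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
                swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
}

void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
                                 size_t size, enum dma_data_direction dir)
{
        if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
                swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE);
}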



# 2973073a 01-Mar-2021 Christoph Hellwig <[email protected]>

swiotlb: remove the alloc_size parameter to swiotlb_tbl_unmap_single

Now that swiotlb remembers the allocation size, there is no need to pass
it back to swiotlb_tbl_unmap_single.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
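
The caller-visible effect is one fewer size argument; the prototype
afterwards is roughly:

/* the slot bookkeeping added earlier records the original alloc_size,
 * so callers pass only the size of the mapping being torn down */
void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
                              size_t mapping_size, enum dma_data_direction dir,
                              unsigned long attrs);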



Revision tags: v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7
# 19c65c3d 22-Sep-2020 Christoph Hellwig <[email protected]>

dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma

Most of the dma_direct symbols should only be used by direct.c and
mapping.c, so move them to kernel/dma. In fact, more of dma-direct.h
should eventually move, but that will require more coordination with
other subsystems.

Signed-off-by: Christoph Hellwig <[email protected]>
