linux/arch
Doug Anderson 14d3ae2efe ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating the big chunks was that it would
make TLB usage efficient.

As Marek Szyprowski indicated:
    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache. This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation
Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory.  Decoding video is also pretty heavy and the TLB misses aren't a
huge deal.  Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.

Allocating big chunks of memory is quite expensive, especially if we're
doing it repeadly and memory is full.  In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory.  This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out.  In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another.  Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the big
chunks we're causing extra work for the rest of the system.  We also may
start making other memory allocations fail.  While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11 15:33:38 +00:00
..
alpha dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
arc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
arm ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc 2016-02-11 15:33:38 +00:00
arm64 ARM: SoC support for Tegra platforms for v4.5 2016-01-22 17:30:52 -08:00
avr32 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
blackfin dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
c6x dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
cris Merge branch 'akpm' (patches from Andrew) 2016-01-21 12:32:08 -08:00
frv dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
h8300 Merge branch 'akpm' (patches from Andrew) 2016-01-21 12:32:08 -08:00
hexagon dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
ia64 [IA64] Enable copy_file_range syscall for ia64 2016-01-22 14:20:01 -08:00
m32r m32r: fix m32104ut_defconfig build fail 2016-01-14 16:00:49 -08:00
m68k dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
metag dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
microblaze dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
mips Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-01-24 12:50:56 -08:00
mn10300 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
nios2 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
openrisc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
parisc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
powerpc wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
s390 wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
score
sh dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
sparc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
tile dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
um um: kill pfn_t 2016-01-15 17:56:32 -08:00
unicore32 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
x86 pmem: add wb_cache_pmem() to the PMEM API 2016-01-22 17:02:18 -08:00
xtensa dma-mapping: remove <asm-generic/dma-coherent.h> 2016-01-20 17:09:18 -08:00
.gitignore
Kconfig dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00