linux/arch/arm
Doug Anderson 14d3ae2efe ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating the big chunks was that it would
make TLB usage efficient.

As Marek Szyprowski indicated:

    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache. This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation

Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory.  Decoding video is also pretty heavy and the TLB misses aren't a
huge deal.  Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.

Allocating big chunks of memory is quite expensive, especially if we're
doing it repeatedly and memory is full.  In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory.  This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out.  In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another.  Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.
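
To put rough numbers on the trade-off, here is a small userspace sketch (not
kernel code) that counts how many separate chunks back one buffer with and
without the single-pages hint.  It assumes 4 KiB pages and assumes the
allocator tries orders 9, 8, 4 and 0 (2 MiB, 1 MiB, 64 KiB, 4 KiB), largest
first.  The names count_chunks() and order_array are hypothetical, and the
model pretends every allocation succeeds on the first try, so it ignores the
fallback path a stressed system would actually take.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed list of page orders to try, largest first (illustrative only). */
static const unsigned int order_array[] = { 9, 8, 4, 0 };
#define NUM_ORDERS (sizeof(order_array) / sizeof(order_array[0]))

/*
 * Count the chunks needed to cover `pages` pages, assuming each
 * allocation succeeds at the largest order that still fits.
 * With single_pages set, skip straight to order 0, which is what
 * the DMA_ATTR_ALLOC_SINGLE_PAGES hint is meant to permit.
 */
static size_t count_chunks(size_t pages, bool single_pages)
{
	size_t idx = single_pages ? NUM_ORDERS - 1 : 0;
	size_t chunks = 0;

	while (pages > 0) {
		/* Step down while the current order exceeds what is left. */
		while (((size_t)1 << order_array[idx]) > pages)
			idx++;
		pages -= (size_t)1 << order_array[idx];
		chunks++;
	}
	return chunks;
}
```

Under these assumptions a 4 MB buffer is 1024 pages, so count_chunks(1024,
false) gives 2 order-9 chunks and count_chunks(1024, true) gives 1024 single
pages.  For the 16x4MB case that is 32 high-order allocations against 16384
order-0 ones: more calls, but none of them asks the page allocator for
contiguous memory.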

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the big
chunks we're causing extra work for the rest of the system.  We also may
start making other memory allocations fail.  While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11 15:33:38 +00:00
boot ARM: SoC fixes for v4.5 merge window 2016-01-22 17:26:00 -08:00
common ARM: SoC cleanups for v4.5 2016-01-20 17:55:20 -08:00
configs ARM: SoC defconfig updates for v4.5 2016-01-20 18:29:13 -08:00
crypto
firmware
include ARM: 8504/1: __arch_xprod_64(): small optimization 2016-02-11 15:33:37 +00:00
kernel ARM: 8501/1: mm: flip priority of CONFIG_DEBUG_RODATA 2016-02-08 15:56:45 +00:00
kvm kvm: rename pfn_t to kvm_pfn_t 2016-01-15 17:56:32 -08:00
lib arm, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
mach-alpine
mach-asm9260
mach-at91 ARM: SoC cleanups for v4.5 2016-01-20 17:55:20 -08:00
mach-axxia
mach-bcm Merge tag 'bcm2835-soc-next-2015-12-28' of http://github.com/anholt/linux into next/soc 2015-12-31 17:37:12 +01:00
mach-berlin
mach-clps711x gpio: generic: factor into gpio_chip struct 2016-01-05 11:21:00 +01:00
mach-cns3xxx
mach-davinci ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
mach-digicolor
mach-dove
mach-ebsa110
mach-efm32
mach-ep93xx
mach-exynos ARM: SoC platform updates for v4.5 2016-01-20 18:10:05 -08:00
mach-footbridge ARM: debug-ll: rework footbridge handling 2015-12-15 23:43:29 +01:00
mach-gemini
mach-highbank
mach-hisi
mach-imx ARM: SoC platform updates for v4.5 2016-01-20 18:10:05 -08:00
mach-integrator Merge branch 'treewide/cleanup' into next/multiplatform 2015-12-18 17:07:52 +01:00
mach-iop13xx
mach-iop32x
mach-iop33x
mach-ixp4xx MTD updates for v4.5: 2016-01-13 11:25:54 -08:00
mach-keystone ARM: make virt_to_idmap() return unsigned long 2016-02-08 15:47:28 +00:00
mach-ks8695
mach-lpc18xx
mach-lpc32xx
mach-mediatek ARM: DT updates for v4.5 2016-01-20 18:16:29 -08:00
mach-meson
mach-mmp
mach-moxart
mach-mv78xx0
mach-mvebu mvebu cleanup for 4.5 (part 1) 2015-12-15 17:36:20 +01:00
mach-mxs
mach-netx
mach-nomadik
mach-nspire
mach-omap1 ARM: SoC platform updates for v4.5 2016-01-20 18:10:05 -08:00
mach-omap2 More power management and ACPI updates for v4.5-rc1 2016-01-20 19:06:49 -08:00
mach-orion5x ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
mach-picoxcell
mach-prima2
mach-pxa ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
mach-qcom
mach-realview ARM: realview: fix device tree build 2016-01-21 13:16:17 +01:00
mach-rockchip Merge branch 'treewide/cleanup' into next/soc 2015-12-22 13:10:00 -08:00
mach-rpc
mach-s3c24xx ARM: s3c: simplify s3c_irqwake_{e,}intallow definition 2015-12-31 17:26:18 +01:00
mach-s3c64xx ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
mach-s5pv210
mach-sa1100 ARM: sa1100/simpad: Be sure to clamp return value 2015-12-22 14:57:50 -08:00
mach-shmobile
mach-socfpga
mach-spear
mach-sti ARM: SoC cleanups for v4.5 2016-01-20 17:55:20 -08:00
mach-stm32
mach-sunxi
mach-tango ARM: tango: Fix UP build issues 2016-01-07 06:33:41 +01:00
mach-tegra ARM: tegra: Core SoC changes for v4.5-rc1 2016-01-12 10:14:52 -08:00
mach-u300
mach-uniphier Merge branch 'treewide/cleanup' into next/soc 2015-12-22 13:10:00 -08:00
mach-ux500 ARM: SoC cleanups for v4.5 2016-01-20 17:55:20 -08:00
mach-versatile ARM: versatile: convert to multi-platform 2015-12-15 23:54:48 +01:00
mach-vexpress
mach-vt8500
mach-w90x900
mach-zx
mach-zynq Merge branch 'treewide/cleanup' into next/soc 2015-12-22 13:10:00 -08:00
mm ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc 2016-02-11 15:33:38 +00:00
net ARM: net: bpf: fix zero right shift 2016-01-06 01:32:09 -05:00
nwfpe
oprofile
plat-iop
plat-omap
plat-orion ARM: orion: implement ARM delay timer 2016-01-26 23:45:05 +00:00
plat-pxa
plat-samsung ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
plat-versatile
probes
tools
vdso
vfp
xen xen/arm: set the system time in Xen via the XENPF_settime64 hypercall 2015-12-21 14:40:58 +00:00
Kconfig Merge branch 'akpm' (patches from Andrew) 2016-01-21 12:32:08 -08:00
Kconfig-nommu
Kconfig.debug ARM: SoC fixes for v4.5 merge window 2016-01-22 17:26:00 -08:00
Makefile ARM: SoC defconfig updates for v4.5 2016-01-20 18:29:13 -08:00