FPSIMD register bank context switching and crypto algorithms
optimisations for arm64 from Ard Biesheuvel.
* tag 'for-3.16' of git://git.linaro.org/people/ard.biesheuvel/linux-arm:
arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions
arm64: pull in <asm/simd.h> from asm-generic
arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
arm64/crypto: AES using ARMv8 Crypto Extensions
arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
arm64: add support for kernel mode NEON in interrupt context
arm64: defer reloading a task's FPSIMD state to userland resume
arm64: add abstractions for FPSIMD state manipulation
asm-generic: allow generic unaligned access if the arch supports it
Conflicts:
arch/arm64/include/asm/thread_info.h
This patch removes clk_register_clkdev() for the clocks that do not
have any users from boards and drivers.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
This patch adds missing clocks for mpll_gate, spll_gate, uart3_gate,
ssi2_gate and brom_gate. As an additional this fixes incorrect bit
position for dma_gate clock.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Since the beginning of the introduction of hardware I/O coherency
support for Armada 370 and Armada XP, the special DMA operations
should have applied to all DMA capable devices. Unfortunately, while
the original code properly took into account platform devices, it
didn't take into account PCI devices, which can also be DMA masters.
This commit fixes that by registering a bus notifier on pci_bus_type,
to register our custom DMA operations, like is already done for
platform devices. While doing this, we also rename
mvebu_hwcc_platform_notifier() to mvebu_hwcc_notifier() and
mvebu_hwcc_platform_nb to mvebu_hwcc_nb because they are no longer
specific to platform devices.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Link: https://lkml.kernel.org/r/1399997070-11434-1-git-send-email-thomas.petazzoni@free-electrons.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
SoC specific late_init call is now registered during PRM init, and will
be called automatically by PRM core. This helps to get rid of some
redundant initcalls and cpu_is_X checks from the PRM code.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
prm_features flag will contain SoC specific feature enabler flags. Initially
IO wakeup is added under this. Helps to get rid of runtime cpu_is_X checks.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
This helps to make the PRM registration modular, and also gets rid of a
cpu type check done later.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Done in preparation to make PRM its own driver, as the cpu_is_XXX calls are
not available outside mach-omap2 folder.
The init functions are called only from cpu specific init chain, and thus
don't need to double check against cpu type.
The exit calls check against the data provided during init-time registration
and thus don't need cpu check either.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
[paul@pwsan.com: updated to apply]
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Some of the includes are totally unnecessary, remove some others in
preparation to make the PRCM its own driver.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
[paul@pwsan.com: updated to apply; fixed build error on OMAP2xxx-only configs]
Signed-off-by: Paul Walmsley <paul@pwsan.com>
OMAP44XX_CM*_REGADDR macros should be avoided, instead use the cm_base*
iomaps.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Done in preparation to move the CM driver to its own driver folder.
These drivers will not have access to functionality under mach-omap2 anymore.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Done in preparation to move cm/prm to drivers. These will still use
omap_test_timeout, but will not have access to common.h header under
mach-omap2 anymore.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Maps all internal BPF instructions into x86_64 instructions.
This patch replaces original BPF x64 JIT with internal BPF x64 JIT.
sysctl net.core.bpf_jit_enable is reused as on/off switch.
Performance:
1. old BPF JIT and internal BPF JIT generate equivalent x86_64 code.
No performance difference is observed for filters that were JIT-able before
Example assembler code for BPF filter "tcpdump port 22"
original BPF -> old JIT: original BPF -> internal BPF -> new JIT:
0: push %rbp 0: push %rbp
1: mov %rsp,%rbp 1: mov %rsp,%rbp
4: sub $0x60,%rsp 4: sub $0x228,%rsp
8: mov %rbx,-0x8(%rbp) b: mov %rbx,-0x228(%rbp) // prologue
12: mov %r13,-0x220(%rbp)
19: mov %r14,-0x218(%rbp)
20: mov %r15,-0x210(%rbp)
27: xor %eax,%eax // clear A
c: xor %ebx,%ebx 29: xor %r13,%r13 // clear X
e: mov 0x68(%rdi),%r9d 2c: mov 0x68(%rdi),%r9d
12: sub 0x6c(%rdi),%r9d 30: sub 0x6c(%rdi),%r9d
16: mov 0xd8(%rdi),%r8 34: mov 0xd8(%rdi),%r10
3b: mov %rdi,%rbx
1d: mov $0xc,%esi 3e: mov $0xc,%esi
22: callq 0xffffffffe1021e15 43: callq 0xffffffffe102bd75
27: cmp $0x86dd,%eax 48: cmp $0x86dd,%rax
2c: jne 0x0000000000000069 4f: jne 0x000000000000009a
2e: mov $0x14,%esi 51: mov $0x14,%esi
33: callq 0xffffffffe1021e31 56: callq 0xffffffffe102bd91
38: cmp $0x84,%eax 5b: cmp $0x84,%rax
3d: je 0x0000000000000049 62: je 0x0000000000000074
3f: cmp $0x6,%eax 64: cmp $0x6,%rax
42: je 0x0000000000000049 68: je 0x0000000000000074
44: cmp $0x11,%eax 6a: cmp $0x11,%rax
47: jne 0x00000000000000c6 6e: jne 0x0000000000000117
49: mov $0x36,%esi 74: mov $0x36,%esi
4e: callq 0xffffffffe1021e15 79: callq 0xffffffffe102bd75
53: cmp $0x16,%eax 7e: cmp $0x16,%rax
56: je 0x00000000000000bf 82: je 0x0000000000000110
58: mov $0x38,%esi 88: mov $0x38,%esi
5d: callq 0xffffffffe1021e15 8d: callq 0xffffffffe102bd75
62: cmp $0x16,%eax 92: cmp $0x16,%rax
65: je 0x00000000000000bf 96: je 0x0000000000000110
67: jmp 0x00000000000000c6 98: jmp 0x0000000000000117
69: cmp $0x800,%eax 9a: cmp $0x800,%rax
6e: jne 0x00000000000000c6 a1: jne 0x0000000000000117
70: mov $0x17,%esi a3: mov $0x17,%esi
75: callq 0xffffffffe1021e31 a8: callq 0xffffffffe102bd91
7a: cmp $0x84,%eax ad: cmp $0x84,%rax
7f: je 0x000000000000008b b4: je 0x00000000000000c2
81: cmp $0x6,%eax b6: cmp $0x6,%rax
84: je 0x000000000000008b ba: je 0x00000000000000c2
86: cmp $0x11,%eax bc: cmp $0x11,%rax
89: jne 0x00000000000000c6 c0: jne 0x0000000000000117
8b: mov $0x14,%esi c2: mov $0x14,%esi
90: callq 0xffffffffe1021e15 c7: callq 0xffffffffe102bd75
95: test $0x1fff,%ax cc: test $0x1fff,%rax
99: jne 0x00000000000000c6 d3: jne 0x0000000000000117
d5: mov %rax,%r14
9b: mov $0xe,%esi d8: mov $0xe,%esi
a0: callq 0xffffffffe1021e44 dd: callq 0xffffffffe102bd91 // MSH
e2: and $0xf,%eax
e5: shl $0x2,%eax
e8: mov %rax,%r13
eb: mov %r14,%rax
ee: mov %r13,%rsi
a5: lea 0xe(%rbx),%esi f1: add $0xe,%esi
a8: callq 0xffffffffe1021e0d f4: callq 0xffffffffe102bd6d
ad: cmp $0x16,%eax f9: cmp $0x16,%rax
b0: je 0x00000000000000bf fd: je 0x0000000000000110
ff: mov %r13,%rsi
b2: lea 0x10(%rbx),%esi 102: add $0x10,%esi
b5: callq 0xffffffffe1021e0d 105: callq 0xffffffffe102bd6d
ba: cmp $0x16,%eax 10a: cmp $0x16,%rax
bd: jne 0x00000000000000c6 10e: jne 0x0000000000000117
bf: mov $0xffff,%eax 110: mov $0xffff,%eax
c4: jmp 0x00000000000000c8 115: jmp 0x000000000000011c
c6: xor %eax,%eax 117: mov $0x0,%eax
c8: mov -0x8(%rbp),%rbx 11c: mov -0x228(%rbp),%rbx // epilogue
cc: leaveq 123: mov -0x220(%rbp),%r13
cd: retq 12a: mov -0x218(%rbp),%r14
131: mov -0x210(%rbp),%r15
138: leaveq
139: retq
On fully cached SKBs both JITed functions take 12 nsec to execute.
BPF interpreter executes the program in 30 nsec.
The difference in generated assembler is due to the following:
Old BPF imlements LDX_MSH instruction via sk_load_byte_msh() helper function
inside bpf_jit.S.
New JIT removes the helper and does it explicitly, so ldx_msh cost
is the same for both JITs, but generated code looks longer.
New JIT has 4 registers to save, so prologue/epilogue are larger,
but the cost is within noise on x64.
Old JIT checks whether first insn clears A and if not emits 'xor %eax,%eax'.
New JIT clears %rax unconditionally.
2. old BPF JIT doesn't support ANC_NLATTR, ANC_PAY_OFFSET, ANC_RANDOM
extensions. New JIT supports all BPF extensions.
Performance of such filters improves 2-4 times depending on a filter.
The longer the filter the higher performance gain.
Synthetic benchmarks with many ancillary loads see 20x speedup
which seems to be the maximum gain from JIT
Notes:
. net.core.bpf_jit_enable=2 + tools/net/bpf_jit_disasm is still functional
and can be used to see generated assembler
. there are two jit_compile() functions and code flow for classic filters is:
sk_attach_filter() - load classic BPF
bpf_jit_compile() - try to JIT from classic BPF
sk_convert_filter() - convert classic to internal
bpf_int_jit_compile() - JIT from internal BPF
seccomp and tracing filters will just call bpf_int_jit_compile()
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split bpf_jit_compile() into two functions to improve readability
of for(pass++) loop. The change follows similar style of JIT compilers
for arm, powerpc, s390
The body of new do_jit() was not reformatted to reduce noise
in this patch, since the following patch replaces most of it.
Tested with BPF testsuite.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add device-tree bindings for the ARM CCI-400 on Exynos5420. There
are two slave interfaces: one for the A15 cluster and one for the
A7 cluster.
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Add generic cluster power control functions for exynos based SoCS
for cluster power up/down and to know the cluster status.
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Use generic exynos cpu power control functions to power up/down
and to know the status of the cpu in platsmp and hotplug code.
Signed-off-by: Leela Krishna Amudala <leela.krishna@linaro.org>
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Add generic cpu power control functions for exynos based
SoCS for cpu power up/down and to know the cpu status.
Signed-off-by: Leela Krishna Amudala <leela.krishna@linaro.org>
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The attached change significantly improves the performance of the LWS-CAS code
in syscall.S.
This allows a number of packages to build (e.g., zeromq3, gtest and libxs)
that previously failed because slow LWS-CAS performance under contention. In
particular, interrupts taken while the lock was taken degraded performance
significantly.
The change does the following:
1) Disables interrupts around the CAS operation, and
2) Changes the loads and stores to use the ordered completer, "o", on
PA 2.0. "o" and "ma" with a zero offset are equivalent. The latter is
accepted on both PA 1.X and 2.0.
The use of ordered loads and stores probably makes no difference on all
existing hardware, but it seemed pedantically correct. In particular, the CAS
operation must complete before LDCW lock is released. As written before, a
processor could reorder the operations.
I don't believe the period interrupts are disabled is long enough to
significantly increase interrupt latency. For example, the TLB insert code is
longer. Worst case is a memory fault in the CAS operation.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>
Ratelimit printing of userspace segfaults and make it runtime
configurable via the /proc/sys/debug/exception-trace variable. This
should resolve syslog from growing way too fast and thus prevents
possible system service attacks.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 3.13+
This patch adds the device tree to support Toradex Colibri T30, a
computer on module which can be used on different carrier boards.
The module consists of a Tegra 30 SoC, two PMIC, DDR3L RAM, eMMC,
a LM95245 temperature sensor and an AX88772B USB Ethernet
Controller. Furthermore, there is a STMPE811 and SGTL5000 audio
codec which are not yet supported. Anything that is not self
contained on the module is disabled by default.
The device tree for the Evaluation Board includes the modules
device tree and enables the supported pheripherials of the carrier
board (the Evaluation Board supports almost all of them).
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
The last reason for static memory mapping is the HBI (board
identification number) check early in the machine code.
Moving the check to the sysreg driver makes it possible to
completely remove the early mapping and init functions.
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
As all cores must be properly described in the Device Tree,
there is no point in getting their numbers from SCU on
A5/A9 platforms. This significantly simplifies the code,
removing the need for flat-tree scanning and early static
mapping.
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
arm_dt_init_cpu_maps parses the device tree, validates and sets the
cpu_possible_mask appropriately. It is unnecessary to do another DT
parse to get the number of cpus, use num_possible_cpus instead.
This patch also removes setting cpu_present_mask as platforms should
only re-initialize it in smp_prepare_cpus() if present != possible.
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
This patch adds a trival sched clock source using free
running, 24MHz clocked counter present in the ARM Ltd.
reference platforms (Versatile, RealView, Versatile
Express) System Registers block.
This code replaces the call in the VE machine code.
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
This patch - finally, after over 6 months! :-( - addresses
Samuel's request to split the vexpress-sysreg driver into
smaller portions and define the device in a form of MFD
cells:
* LEDs code has been completely removed and replaced with
"gpio-leds" nodes in the tree (referencing dedicated
GPIO subnodes in sysreg - bindings documentation updated);
this also better fits the reality as some variants of the
motherboard don't have all the LEDs populated
* syscfg bridge code has been extracted into a separate
driver (placed in drivers/misc for no better place)
* all the ID & MISC registers are defined as sysconf
making them available for other drivers should they need
to use them (and also to the user via /sys/kernel/debug/regmap
which can be helpful in platform debugging)
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Components of the Versatile Express platform (configuration
microcontrollers on motherboard and daughterboards in particular)
talk to each other over a custom configuration bus. They
provide miscellaneous functions (from clock generator control
to energy sensors) which are represented as platform devices
(and Device Tree nodes). The transactions on the bus can
be generated by different "bridges" in the system, some
of which are universal for the whole platform (for the price
of high transfer latencies), others restricted to a subsystem
(but much faster).
Until now drivers for such functions were using custom "func"
API, which is being replaced in this patch by regmap calls.
This required:
* a rework (and move to drivers/bus directory, as suggested
by Samuel and Arnd) of the config bus core, which is much
simpler now and uses device model infrastructure (class)
to keep track of the bridges; non-DT case (soon to be
retired anyway) is simply covered by a special device
registration function
* the new config-bus driver also takes over device population,
so there is no need for special matching table for
of_platform_populate nor "simple-bus" hack in the arm64
model dtsi file (relevant bindings documentation has
been updated); this allows all the vexpress devices
fit into normal device model, making it possible
to remove plenty of early inits and other hacks in
the near future
* adaptation of the syscfg bridge implementation in the
sysreg driver, again making it much simpler; there is
a special case of the "energy" function spanning two
registers, where they should be both defined in the tree
now, but backward compatibility is maintained in the code
* modification of the relevant drivers:
* hwmon - just a straight-forward API change
* power/reset driver - API change
* regulator - API change plus error handling
simplification
* osc clock driver - this one required larger rework
in order to turn in into a standard platform driver
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Mark Brown <broonie@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Mike Turquette <mturquette@linaro.org>
Select the MFD_SUN6I_PRCM option when sun6i arch is enabled in order to get
the PRCM (Power/Reset/Clock Management) related drivers compiled.
Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
_PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the
pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to translate
the MFN. If mfn_pte() is used instead, the p2m look up is avoided and
the use of _PAGE_IOMAP is no longer needed.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
PCI devices may have BARs located above the end of RAM so mark such
frames as identity frames in the p2m (instead of the default of
missing).
PFNs outside the p2m (above MAX_P2M_PFN) are also considered to be
identity frames for the same reason.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In xen_add_extra_mem(), if the WARN() checks for bad MFNs trigger it is
likely that they will trigger at lot, spamming the log.
Use WARN_ONCE() instead.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Large (multi-GB) identity ranges currently require a unique middle page
(filled with p2m_identity entries) per 1 GB region.
Similar to the common p2m_mid_missing middle page for large missing
regions, introduce a p2m_mid_identity page (filled with p2m_identity
entries) which can be used instead.
set_phys_range_identity() thus only needs to allocate new middle pages
at the beginning and end of the range.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>