Commit Graph

591 Commits

Author SHA1 Message Date
Will Deacon
c209f79940 arm64: rwsem: use asm-generic rwsem implementation
asm-generic offers an atomic-add based rwsem implementation, which
can avoid the need for heavier, spinlock-based synchronisation on the
fast path.

This patch makes use of the optimised implementation for arm64 CPUs.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-14 18:02:09 +00:00
Ard Biesheuvel
3be1a5c4f7 arm64: enable generic CPU feature modalias matching for this architecture
This enables support for the generic CPU feature modalias implementation that
wires up optional CPU features to udev based module autoprobing.

A file <asm/cpufeature.h> is provided that maps CPU feature numbers to
elf_hwcap bits, which is the standard way on arm64 to advertise optional CPU
features both internally and to user space.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[catalin.marinas@arm.com: removed unnecessary "!!"]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-14 18:01:36 +00:00
Jingoo Han
7184659bed arm64: smp: make local symbol static
Make smp_spin_table_cpu_postboot() static, because this function
is used only in this file.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-13 11:22:41 +00:00
Jingoo Han
242c04bc4b arm64: debug: make local symbols static
Make local symbols static, because these are used only in this
file.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-13 11:22:39 +00:00
Jean Pihet
5f888a1d33 ARM64: perf: support dwarf unwinding in compat mode
Add support for unwinding using the dwarf information in compat
mode. Using the correct user stack pointer allows perf to record
the frames correctly in the native and compat modes.

Note that although the dwarf frame unwinding works ok using
libunwind in native mode (on ARMv7 & ARMv8), some changes are
required to the libunwind code for the compat mode. Those changes
are posted separately on the libunwind mailing list.

Tested on ARMv8 platform with v8 and compat v7 binaries, the latter
are statically built.

Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-13 11:22:38 +00:00
Jean Pihet
23c7d70d55 ARM64: perf: add support for frame pointer unwinding in compat mode
When profiling a 32-bit application, user space callchain unwinding
using the frame pointer is performed in compat mode. The code is taken
over from the AARCH32 code and adapted to work on AARCH64.

Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-13 11:22:38 +00:00
Jean Pihet
2ee0d7fd36 ARM64: perf: add support for perf registers API
This patch implements the functions required for the perf registers API,
allowing the perf tool to interface kernel register dumps with libunwind
in order to provide userspace backtracing.
Compat mode is also supported.

Only the general purpose user space registers are exported, i.e.:
 PERF_REG_ARM_X0,
 ...
 PERF_REG_ARM_X28,
 PERF_REG_ARM_FP,
 PERF_REG_ARM_LR,
 PERF_REG_ARM_SP,
 PERF_REG_ARM_PC
and not the PERF_REG_ARM_V* registers.

Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-13 11:22:37 +00:00
Radha Mohan Chintakuntla
87366d8cf7 arm64: Add boot time configuration of Intermediate Physical Address size
ARMv8 supports a range of physical address bit sizes. The PARange bits
from ID_AA64MMFR0_EL1 register are read during boot-time and the
intermediate physical address size bits are written in the translation
control registers (TCR_EL1 and VTCR_EL2).

There is no change in the VA bits and levels of translation.

Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Reviewed-by: Will Deacon <Will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-13 11:22:36 +00:00
Catalin Marinas
71fdb6bf61 arm64: Do not synchronise I and D caches for special ptes
Special pte mappings are not intended to be executable and do not even
have an associated struct page. This patch ensures that we do not call
__sync_icache_dcache() on such ptes.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Steve Capper <Steve.Capper@arm.com>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Cc: <stable@vger.kernel.org>
2014-03-13 11:22:29 +00:00
Catalin Marinas
de2db74329 arm64: Make DMA coherent and strongly ordered mappings not executable
pgprot_{dmacoherent,writecombine,noncached} don't need to generate
executable mappings with side-effects like __sync_icache_dcache() being
called when the mapping is in user space.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Cc: <stable@vger.kernel.org>
2014-03-13 11:22:21 +00:00
Will Deacon
d152d22a18 arm64: barriers: add dmb barrier
Commit 8adbf57fc4 ("irqchip: gic: use dmb ishst instead of dsb when
raising a softirq") added an explicit dmb(...) call to the GIC driver.

This patch adds a simple dmb() macro to arm64, which expands to a DMB SY
instruction.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-10 11:57:40 +00:00
Mark Brown
f6e763b93a arm64: topology: Implement basic CPU topology support
Add basic CPU topology support to arm64, based on the existing pre-v8
code and some work done by Mark Hambleton.  This patch does not
implement any topology discovery support since that should be based on
information from firmware, it merely implements the scaffolding for
integration of topology support in the architecture.

No locking of the topology data is done since it is only modified during
CPU bringup with external serialisation from the SMP code.

The goal is to separate the architecture hookup for providing topology
information from the DT parsing in order to ease review and avoid
blocking the architecture code (which will be built on by other work)
with the DT code review by providing something simple and basic.

Following patches will implement support for interpreting topology
information from MPIDR and for parsing the DT topology bindings for ARM,
similar patches will be needed for ACPI.

Signed-off-by: Mark Brown <broonie@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: removed CONFIG_CPU_TOPOLOGY, always on if SMP]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-04 10:30:07 +00:00
Ard Biesheuvel
4cf761cdcc arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries
This adds support for advertising the presence of ARMv8 Crypto
Extensions in the Aarch32 execution state to 32-bit ELF binaries
running in 32-bit compat mode under the arm64 kernel.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-04 08:06:32 +00:00
Ard Biesheuvel
28964d32d4 arm64: add AT_HWCAP2 support for 32-bit compat
Add support for the ELF auxv entry AT_HWCAP2 when running 32-bit
ELF binaries in compat mode.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-04 08:05:59 +00:00
Mark Rutland
bff705950e arm64: remove unnecessary cache flush at boot
Currently we flush the entire dcache at boot within __cpu_setup, but
this is unnecessary as the booting protocol demands that the dcache is
invalid and off upon entering the kernel. The presence of the cache
flush only serves to hide bugs in bootloaders, and is not safe in the
presence of SMP.

In an SMP boot scenario the CPUs enter coherency outside of the kernel,
and the primary CPU enables its caches before bringing up secondary
CPUs. Therefore if any secondary CPU has an entry in its cache (in
violation of the boot protocol), the primary CPU might snoop it even if
the secondary CPU's cache is disabled. The boot-time cache flush only
serves to hide a firmware bug, and slows down a cpu boot unnecessarily.

This patch removes the unnecessary boot-time cache flush.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
[catalin.marinas@arm.com: make __flush_dcache_all local only]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-03-04 01:07:22 +00:00
Rob Herring
addea9ef05 cpufreq: enable ARM drivers on arm64
Enable cpufreq and power kconfig menus on arm64 along with arm cpufreq
drivers. The power menu is needed for OPP support. At least on Calxeda
systems, the same cpufreq driver is used for arm and arm64 based
systems.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-28 15:03:17 +00:00
Vladimir Murzin
64b4f60f49 arm64: remove return value form psci_init()
psci_init() is written to return err code if something goes wrong. However,
the single user, setup_arch(), doesn't care about it. Moreover, every error
path is supplied with a clear message which is enough for pleasant debugging.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-28 14:14:53 +00:00
Vladimir Murzin
288ac26cc2 arm64: remove redundant "psci:" prefixes
Since 652af89979 "arm64: factor out spin-table
boot method" psci prefix's been introduced. We have a common pr_fmt, so clean
them up.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-28 14:14:18 +00:00
Catalin Marinas
7363590d2c arm64: Implement coherent DMA API based on swiotlb
This patch adds support for DMA API cache maintenance on SoCs without
hardware device cache coherency.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-27 17:16:59 +00:00
Catalin Marinas
3690951fc6 arm64: Use swiotlb late initialisation
Since arm64 does not support ISA, there is no need for early swiotlb
initialisation. This patch switches the DMA mapping code to
swiotlb_tlb_late_init_with_default_size(). A side effect of this is that
GFP_DMA is used for the swiotlb buffer and devices with a 32-bit
coherent mask are correctly supported.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-27 14:11:53 +00:00
Catalin Marinas
19e7640d1f arm64: Replace ZONE_DMA32 with ZONE_DMA
On arm64 we do not have two DMA zones, so it does not make sense to
implement ZONE_DMA32. This patch changes ZONE_DMA32 with ZONE_DMA, the
latter covering 32-bit dma address space to honour GFP_DMA allocations.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-27 12:09:22 +00:00
Nathan Lynch
16fb1a9bec arm64: vdso: clean up vdso_pagelist initialization
Remove some unnecessary bits that were apparently carried over from
another architecture's implementation:

- No need to get_page() the vdso text/data - these are part of the
  kernel image.
- No need for ClearPageReserved on the vdso text.
- No need to vmap the first text page to check the ELF header - this
  can be done through &vdso_start.

Also some minor cleanup:
- Use kcalloc for vdso_pagelist array allocation.
- Don't print on allocation failure, slab/slub will do that for us.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:30 +00:00
Ritesh Harjani
bb10eb7b4d arm64: Change misleading function names in dma-mapping
arm64_swiotlb_alloc/free_coherent name can be misleading
somtimes with CMA support being enabled after this
patch (c2104debc235b745265b64d610237a6833fd53)

Change this name to be more generic:
__dma_alloc/free_coherent

Signed-off-by: Ritesh Harjani <ritesh.harjani@gmail.com>
[catalin.marinas@arm.com: renamed arm64_swiotlb_dma_ops to coherent_swiotlb_dma_ops]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:28 +00:00
Geoff Levand
09024aa61e arm64: Fix the soft_restart routine
Change the soft_restart() routine to call cpu_reset() at its identity mapped
physical address.

The cpu_reset() routine must be called at its identity mapped physical address
so that when the MMU is turned off the instruction pointer will be at the correct
location in physical memory.

Signed-off-by: Geoff Levand <geoff@infradead.org> for Huawei, Linaro
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:28 +00:00
Catalin Marinas
ea8c2e1124 arm64: Extend the idmap to the whole kernel image
This patch changes the idmap page table creation during boot to cover
the whole kernel image, allowing functions like cpu_reset() to be safely
called with the physical address.

This patch also simplifies the create_block_map asm macro to no longer
take an idmap argument and always use the phys/virt/end parameters. For
the idmap case, phys == virt.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:28 +00:00
Catalin Marinas
020c1427f3 arm64: Convert asm/tlb.h to generic mmu_gather
Over the past couple of years, the generic mmu_gather gained range
tracking - 597e1c3580 (mm/mmu_gather: enable tlb flush range in generic
mmu_gather), 2b047252d0 (Fix TLB gather virtual address range
invalidation corner cases) - and tlb_fast_mode() has been removed -
29eb77825c (arch, mm: Remove tlb_fast_mode()).

The new mmu_gather structure is now suitable for arm64 and this patch
converts the arch asm/tlb.h to the generic code. One functional
difference is the shift_arg_pages() case where previously the code was
flushing the full mm (no tlb_start_vma call) but now it flushes the
range given to tlb_gather_mmu() (possibly slightly more efficient
previously).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2014-02-26 11:16:27 +00:00
Catalin Marinas
22bd1c91fe arm64: Extend the PCI I/O space to 16MB
The patch moves the PCI I/O space (currently at 64K) before the
earlyprintk mapping and extends it to 16MB.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:27 +00:00
Vijaya Kumar K
d8ed442a00 arm64: enable processor debug state for secondary cpus
processor debug state PSTATE.D is unmasked in smp call
clear_os_lock for secondary cpus. So debug state is still
masked in normal kernel context.  With this patch, unmask
debug state on secondary boot for the cpus in normal kernel
context. Now kgdb tests passed with multicore.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:27 +00:00
Vijaya Kumar K
9529247db9 arm64: KGDB: Add KGDB config
Add HAVE_ARCH_KGDB for arm64 Kconfig

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:27 +00:00
Vijaya Kumar K
58dcc204f1 misc: debug: remove compilation warnings
typecast instruction_pointer macro to unsigned long to
resolve following compiler warnings like
warning: format '%lx' expects argument of type 'long unsigned int',
but argument 2 has type 'u64' [-Wformat]

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:26 +00:00
Vijaya Kumar K
44679a4f14 arm64: KGDB: Add step debugging support
Add KGDB software step debugging support for EL1 debug
in AArch64 mode.

KGDB registers step debug handler with debug monitor.
On receiving 'step' command from GDB tool, target enables
software step debugging and step address is updated in ELR.

Software Step debugging is disabled when 'continue' command
is received

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:25 +00:00
Vijaya Kumar K
bcf5763b0d arm64: KGDB: Add Basic KGDB support
Add KGDB debug support for kernel debugging.
With this patch, basic KGDB debugging is possible.GDB register
layout is updated and GDB tool can establish connection with
target and can set/clear breakpoints.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:25 +00:00
Vijaya Kumar K
c7db4ff5d2 arm64: Add macros to manage processor debug state
Add macros to enable and disable to manage PSTATE.D
for debugging. The macros local_dbg_save and local_dbg_restore
are moved to irqflags.h file

KGDB boot tests fail because of PSTATE.D is masked.
unmask it for debugging support

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-26 11:16:25 +00:00
Linus Torvalds
c1b8ae03c3 A small error handling problem and a compile breakage for ARM64.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS/eyuAAoJEBvWZb6bTYby9ksP/0QTkE2cYXPAY8xDZM8BNExe
 vo8y1noulQ2cFkWONKRHgoWPILL0luRzsOxdNa4ge5UA0QPRJsTE8i1SBqRXlWlZ
 Lj303SE9naL97K2tMKONA2DaXGKMeyNdIbpZYAgoUTdxqdOabpVPqWBFKpQGENiq
 Febfa5O2UD6XwDxFdDj1+5UO5w+Vlwc3ObZerbJzX2r5PabArtBnfyyC92ghOIp/
 BhS0BzCCRY0vzNE/i4dYlrDrggowlmF/OOdHyAIx8ae7fEt/cHu3viNqytLb2eFe
 P7CQ6wVaPZffX0oR2sPQx47P6rDS68TO6DGwzTv3qkFGK2pC2NKPAB3e2KHr47pu
 f17xJq3s04OuvgWkb58CNuMLp2tPQvTkiYsVTprQME4YY/KSw5+CW+CklBhwahph
 lhAEL3/0jf8AbhxZZTa6rIxVD0purgbsqzvNvdpWEDLdk0ElNrWi1c+NZga6HP1z
 DUuKfHgUME5Rs2/uyCzFYCMPdjc2/7BpeRgx5jQ5NARJbegH2JbAGGR9LRkagQIV
 yaPqdPXP1FgF5V+otWNAzCwFiULFWhdEVAAgYu0Je7/R2cXK8wuX1yqTrsjkKLb6
 gVWBqzyLG2av8NNSpeBarpJ5GsurJ/rq/1sJhf/Mb3NVz64hSFwiZ1LzM950jqFG
 1eh1N0AUU05VvnW+38pO
 =+70Q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "A small error handling problem and a compile breakage for ARM64"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  arm64: KVM: Add VGIC device control for arm64
  KVM: return an error code in kvm_vm_ioctl_register_coalesced_mmio()
2014-02-14 11:10:49 -08:00
Christoffer Dall
2a2f3e269c arm64: KVM: Add VGIC device control for arm64
This fixes the build breakage introduced by
c07a0191ef and adds support for the device
control API and save/restore of the VGIC state for ARMv8.

The defines were simply missing from the arm64 header files and
uaccess.h must be implicitly imported from somewhere else on arm.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-14 11:09:49 +01:00
Mark Rutland
55834a773f arm64: defconfig: Expand default enabled features
FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
Versatile Express. As these attach to a Motherboard Express V2M-P1 it
would be useful to have support for some V2M-P1 peripherals enabled by
default.

Additionally a couple of of features have been introduced since the last
defconfig update (CMA, jump labels) that would be good to have enabled
by default to ensure they are build and boot tested.

This patch updates the arm64 defconfig to enable support for these
devices and features. The arm64 Kconfig is modified to select
HAVE_PATA_PLATFORM, which is required to enable support for the
CompactFlash controller on the V2M-P1.

A few options which don't need to appear in defconfig are trimmed:

* BLK_DEV - selected by default
* EXPERIMENTAL - otherwise gone from the kernel
* MII - selected by drivers which require it
* USB_SUPPORT - selected by default

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07 17:17:28 +00:00
Will Deacon
95c4189689 arm64: asm: remove redundant "cc" clobbers
cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07 16:46:07 +00:00
Will Deacon
8e86f0b409 arm64: atomics: fix use of acquire + release for full barrier semantics
Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

	// A, B, C are independent memory locations

	<Access [A]>

	// atomic_op (B)
1:	ldaxr	x0, [B]		// Exclusive load with acquire
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b

	<Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

	<Access [A]>

	// atomic_op (B)
	dmb	ish		// Full barrier
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stxr	w1, x0, [B]	// Exclusive store
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

	<Access [A]>

	// atomic_op (B)
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07 16:45:43 +00:00
Will Deacon
4a7ac12eed arm64: barriers: allow dsb macro to take option parameter
The dsb instruction takes an option specifying both the target access
types and shareability domain.

This patch allows such an option to be passed to the dsb macro,
resulting in potentially more efficient code. Currently the option is
ignored until all callers are updated (unlike ARM, the option is
mandated by the assembler).

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-06 11:39:11 +00:00
Catalin Marinas
6290b53de0 arm64: compat: Wire up new AArch32 syscalls
This patch enables sys_compat, sys_finit_module, sys_sched_setattr and
sys_sched_getattr for compat (AArch32) applications.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 12:03:52 +00:00
Nathan Lynch
d4022a3352 arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
Update wall-to-monotonic fields in the VDSO data page
unconditionally.  These are used to service CLOCK_MONOTONIC_COARSE,
which is not guarded by use_syscall.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 11:55:49 +00:00
Nathan Lynch
069b918623 arm64: vdso: fix coarse clock handling
When __kernel_clock_gettime is called with a CLOCK_MONOTONIC_COARSE or
CLOCK_REALTIME_COARSE clock id, it returns incorrectly to whatever the
caller has placed in x2 ("ret x2" to return from the fast path).  Fix
this by saving x30/LR to x2 only in code that will call
__do_get_tspec, restoring x30 afterward, and using a plain "ret" to
return from the routine.

Also: while the resulting tv_nsec value for CLOCK_REALTIME and
CLOCK_MONOTONIC must be computed using intermediate values that are
left-shifted by cs_shift (x12, set by __do_get_tspec), the results for
coarse clocks should be calculated using unshifted values
(xtime_coarse_nsec is in units of actual nanoseconds).  The current
code shifts intermediate values by x12 unconditionally, but x12 is
uninitialized when servicing a coarse clock.  Fix this by setting x12
to 0 once we know we are dealing with a coarse clock id.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 11:55:30 +00:00
Mark Rutland
883d50a0ed arm64: simplify pgd_alloc
Currently pgd_alloc has a redundant NULL check in its return path that
can be removed with no ill effects. With that removed it's also possible
to return early and eliminate the new_pgd temporary variable.

This patch applies said modifications, making the logic of pgd_alloc
correspond 1-1 with that of pgd_free.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 10:45:07 +00:00
Mark Rutland
bfb67a5606 arm64: fix typo: s/SERRROR/SERROR/
Somehow SERROR has acquired an additional 'R' in a couple of headers.
This patch removes them before they spread further. As neither instance
is in use yet, no other sites need to be fixed up.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 10:42:32 +00:00
Catalin Marinas
a55f9929a9 arm64: Invalidate the TLB when replacing pmd entries during boot
With the 64K page size configuration, __create_page_tables in head.S
maps enough memory to get started but using 64K pages rather than 512M
sections with a single pgd/pud/pmd entry pointing to a pte table.
create_mapping() may override the pgd/pud/pmd table entry with a block
(section) one if the RAM size is more than 512MB and aligned correctly.
For the end of this block to be accessible, the old TLB entry must be
invalidated.

Cc: <stable@vger.kernel.org>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 10:30:51 +00:00
Laura Abbott
ccc9e244eb arm64: Align CMA sizes to PAGE_SIZE
dma_alloc_from_contiguous takes number of pages for a size.
Align up the dma size passed in to page size to avoid truncation
and allocation failures on sizes less than PAGE_SIZE.

Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 10:28:27 +00:00
Vinayak Kale
5044bad43e arm64: add DSB after icache flush in __flush_icache_all()
Add DSB after icache flush to complete the cache maintenance operation.
The function __flush_icache_all() is used only for user space mappings
and an ISB is not required because of an exception return before executing
user instructions. An exception return would behave like an ISB.

Signed-off-by: Vinayak Kale <vkale@apm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 10:26:35 +00:00
Will Deacon
4050740348 arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k
Whilst the text segment for our VDSO is marked as PT_LOAD in the ELF
headers, it is mapped by the kernel and not actually subject to
demand-paging. ld doesn't realise this, and emits a p_align field of 64k
(the maximum supported page size), which conflicts with the load address
picked by the kernel on 4k systems, which will be 4k aligned. This
causes GDB to fail with "Failed to read a valid object file image from
memory" when attempting to load the VDSO.

This patch passes the -n option to ld, which prevents it from aligning
PT_LOAD segments to the maximum page size.

Cc: <stable@vger.kernel.org>
Reported-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-04 17:52:47 +00:00
Linus Torvalds
deb2a1d29b - Build fix with DMA_CMA enabled
- Introduction of PTE_WRITE to distinguish between writable but clean
   and truly read-only pages
 - FIQs enabling/disabling clean-up (they aren't used on arm64)
 - CPU resume fix for the per-cpu offset restoring
 - Code comment typos
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6+SLAAoJEGvWsS0AyF7xk60P/RBEAXqBdlNkTPQ9eIBs8P9b
 txv7T6k6MnQt6vK+3Kepgnoadeaw4WRCqKPduA90v7tnrw8fVLmuAM29o0uITkDH
 4Cz69/8A9B2Y+ouiODS6MrEvlz99BfbSZdUZd89eNSO7tTzv1YckpepNQCL3Wf+S
 1nXa4dZrciCR2vMmkF01jvc3yTjCJ0W7IduUhFQevpMfEzq2bLhvS5eGQd618pBc
 Ih44IYO6RfBeYB3CxWiSJeHgSXzlSijUeoPQbiDEhFsKXO7SaZN5oQ5Ra8v+YAMr
 fhcblX9rxH2Z1AfYizPssQ0VEnm3cN4nKurr7CA5R+LeHuR38lD9WWABhkYgQMRB
 V43/3wvozRyWhLBkS9V0NjObfpPhDuLR6pdbWRQnCO97mH/0HVd2ql49gvsOcdgb
 qa2XTF+iOJOQ3325SGdruTkbzkccFoKniiiOsndhx48VGLhHD9U0sCYLfb5X6NJC
 ZFI3qQx+3+q32jkDoShqQZ/4fnb8TqzH/KqNtZdZPJ7dlkw/2B8wcnAIoF+BfgbH
 2TRfbndSrNGLJSc4bgFQ3e7SCHSdm/PUR/l4c+xN9Ia4tF9614ndfPAaSKaMaKca
 O6IwX8+IBo94JDhtPsypplYllBlBeXzMjZFO6b60sBF9UaxgdS1ZQ3NP9pumGN80
 2czoY2kwV68DScgxBFLx
 =tWas
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pyll ARM64 patches from Catalin Marinas:
 - Build fix with DMA_CMA enabled
 - Introduction of PTE_WRITE to distinguish between writable but clean
   and truly read-only pages
 - FIQs enabling/disabling clean-up (they aren't used on arm64)
 - CPU resume fix for the per-cpu offset restoring
 - Code comment typos

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: Introduce PTE_WRITE
  arm64: mm: Remove PTE_BIT_FUNC macro
  arm64: FIQs are unused
  arm64: mm: fix the function name in comment of cpu_do_switch_mm
  arm64: fix build error if DMA_CMA is enabled
  arm64: kernel: fix per-cpu offset restore on resume
  arm64: mm: fix the function name in comment of __flush_dcache_area
  arm64: mm: use ubfm for dcache_line_size
2014-01-31 14:25:52 -08:00
Steve Capper
c2c93e5b7f arm64: mm: Introduce PTE_WRITE
We have the following means for encoding writable or dirty ptes:

                                PTE_DIRTY       PTE_RDONLY
!pte_dirty && !pte_write        0               1
!pte_dirty && pte_write         0               1
pte_dirty && !pte_write         1               1
pte_dirty && pte_write          1               0

So we can't distinguish between writable clean ptes and read only
ptes. This can cause problems with ptes being incorrectly flagged as
read only when they are writable but not dirty.

This patch introduces a new software bit PTE_WRITE which allows us to
correctly identify writable ptes. PTE_RDONLY is now only clear for
valid ptes where a page is both writable and dirty.

Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-31 11:30:49 +00:00