linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-13 06:32:50 +00:00

Author	SHA1	Message	Date
Linus Torvalds	a9a16a6d13	IOMMU Updates for Linux v4.10 These changes include: * Support for the ACPI IORT table on ARM systems and patches to make the ARM-SMMU driver make use of it * Conversion of the Exynos IOMMU driver to device dependency links and implementation of runtime pm support based on that conversion * Update the Mediatek IOMMU driver to use the new struct device->iommu_fwspec member * Implementation of dma_map/unmap_resource in the generic ARM dma-iommu layer * A number of smaller fixes and improvements all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJYUskLAAoJECvwRC2XARrjGnQQALfaIJffrh+5vjs08h3J86TF lUEvxu1/5PGCwXEWbIH+NQh4pMgBuOL0exa6sR55CqtHPkUKMndv+sVXWUkCrl9Z Q4jro+IuPxbHHdQvULjdqn1gtvPvBXN/OFCbQsfj6G+GblAZCIn0OtkKm2MqdGa4 xHPTAhW+QAt7X8O7PoMsvCNa0loZJx05ToIPbhN23U0H12TwU4aHaMg13+KSGgz/ wEXGDYDU8EnGubcM/JEpy84qywRU1D+TDwzaIFT2sy7Ewhmgz6rSf+SOa+vn4u5z G9xhc6b8Vq+2ZKr1j28pPdrCYMRpQxBWru7t2rcdV0Xa+J1JEPKJfyEPxeldDeXJ 4bqld/D/0nq7MFvGL73qff+oiU2qOJ1VnPrQO8SbHyueeFd4axMYlPsU7GeDWqoz LX1gKuHE1t2GqsNgn/dLouyJGDKjyYzZbio0HuPHGjegkx+dkTXEM0yzLtZJG8hw cZSlYijrRJeEROpM8/5BGid3BOAIx8qlOXO/QzI41e0KnQikkgQBgr1L2+Ki8ZaB mNs/S/BqDr6LSUP5ArQHlZ0wu/pk1ehwYkmzp9j7Ui1jGdSWwbqw7KkLDXd0pL4g NMhsprJLYlnY2GUdrbcDOEWHS9Ex0vRsPKNWoqCggzg/8h8cQmuiDZO346vhlvq/ ORMwDO7LNZuPHqDM0WA+ =EYLL -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "These changes include: - support for the ACPI IORT table on ARM systems and patches to make the ARM-SMMU driver make use of it - conversion of the Exynos IOMMU driver to device dependency links and implementation of runtime pm support based on that conversion - update the Mediatek IOMMU driver to use the new struct device->iommu_fwspec member - implementation of dma_map/unmap_resource in the generic ARM dma-iommu layer - a number of smaller fixes and improvements all over the place" * tag 'iommu-updates-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (44 commits) ACPI/IORT: Make dma masks set-up IORT specific iommu/amd: Missing error code in amd_iommu_init_device() iommu/s390: Drop duplicate header pci.h ACPI/IORT: Introduce iort_iommu_configure ACPI/IORT: Add single mapping function ACPI/IORT: Replace rid map type with type mask iommu/arm-smmu: Add IORT configuration iommu/arm-smmu: Split probe functions into DT/generic portions iommu/arm-smmu-v3: Add IORT configuration iommu/arm-smmu-v3: Split probe functions into DT/generic portions ACPI/IORT: Add support for ARM SMMU platform devices creation ACPI/IORT: Add node match function ACPI: Implement acpi_dma_configure iommu/arm-smmu-v3: Convert struct device of_node to fwnode usage iommu/arm-smmu: Convert struct device of_node to fwnode usage iommu: Make of_iommu_set/get_ops() DT agnostic ACPI/IORT: Add support for IOMMU fwnode registration ACPI/IORT: Introduce linker section for IORT entries probing ACPI: Add FWNODE_ACPI_STATIC fwnode type iommu/arm-smmu: Set SMTNMB_TLBEN in ACR to enable caching of bypass entries ...	2016-12-15 12:24:14 -08:00
Boris Ostrovsky	aec03f89e9	ACPI/NUMA: Do not map pxm to node when NUMA is turned off acpi_map_pxm_to_node() unconditially maps nodes even when NUMA is turned off. So acpi_get_node() might return a node > 0, which is fatal when NUMA is disabled as the rest of the kernel assumes that only node 0 exists. Expose numa_off to the acpi code and return NUMA_NO_NODE when it's set. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: linux-ia64@vger.kernel.org Cc: catalin.marinas@arm.com Cc: rjw@rjwysocki.net Cc: will.deacon@arm.com Cc: linux-acpi@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1481602709-18260-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:32:32 +01:00
Catalin Marinas	ee6a7fce8e	arm64: Remove I-cache invalidation from flush_cache_range() The flush_cache_range() function (similarly for flush_cache_page()) is called when the kernel is changing an existing VA->PA mapping range to either a new PA or to different attributes. Since ARMv8 has PIPT-like D-caches, this function does not need to perform any D-cache maintenance. The I-cache maintenance is already handled via set_pte_at() and flush_cache_range() cannot anyway guarantee that there are no cache lines left after invalidation due to the speculative loads. This patch makes flush_cache_range() a no-op. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-23 18:05:52 +00:00
Catalin Marinas	786889636a	arm64: Handle faults caused by inadvertent user access with PAN enabled When TTBR0_EL1 is set to the reserved page, an erroneous kernel access to user space would generate a translation fault. This patch adds the checks for the software-set PSR_PAN_BIT to emulate a permission fault and report it accordingly. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-21 18:48:54 +00:00
Catalin Marinas	39bc88e5e3	arm64: Disable TTBR0_EL1 during normal kernel execution When the TTBR0 PAN feature is enabled, the kernel entry points need to disable access to TTBR0_EL1. The PAN status of the interrupted context is stored as part of the saved pstate, reusing the PSR_PAN_BIT (22). Restoring access to TTBR0_EL1 is done on exception return if returning to user or returning to a context where PAN was disabled. Context switching via switch_mm() must defer the update of TTBR0_EL1 until a return to user or an explicit uaccess_enable() call. Special care needs to be taken for two cases where TTBR0_EL1 is set outside the normal kernel context switch operation: EFI run-time services (via efi_set_pgd) and CPU suspend (via cpu_(un)install_idmap). Code has been added to avoid deferred TTBR0_EL1 switching as in switch_mm() and restore the reserved TTBR0_EL1 when uninstalling the special TTBR0_EL1. User cache maintenance (user_cache_maint_handler and __flush_cache_user_range) needs the TTBR0_EL1 re-instated since the operations are performed by user virtual address. This patch also removes a stale comment on the switch_mm() function. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-21 18:48:54 +00:00
Catalin Marinas	f33bcf03e6	arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro This patch takes the errata workaround code out of cpu_do_switch_mm into a dedicated post_ttbr0_update_workaround macro which will be reused in a subsequent patch. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-21 17:33:47 +00:00
Catalin Marinas	a8ada146f5	arm64: Update the synchronous external abort fault description This patch updates the description of the synchronous external aborts on translation table walks. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-21 17:33:47 +00:00
Robin Murphy	60c4e804ff	arm64: Wire up iommu_dma_{map, unmap}_resource() With no coherency to worry about, just plug'em straight in. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2016-11-14 16:58:36 +01:00
Mark Rutland	623b476fc8	arm64: move sp_el0 and tpidr_el1 into cpu_suspend_ctx When returning from idle, we rely on the fact that thread_info lives at the end of the kernel stack, and restore this by masking the saved stack pointer. Subsequent patches will sever the relationship between the stack and thread_info, and to cater for this we must save/restore sp_el0 explicitly, storing it in cpu_suspend_ctx. As cpu_suspend_ctx must be doubleword aligned, this leaves us with an extra slot in cpu_suspend_ctx. We can use this to save/restore tpidr_el1 in the same way, which simplifies the code, avoiding pointer chasing on the restore path (as we no longer need to load thread_info::cpu followed by the relevant slot in __per_cpu_offset based on this). This patch stashes both registers in cpu_suspend_ctx. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-11 18:25:44 +00:00
Huang Shijie	0c2f0afe35	arm64: hugetlb: fix the wrong address for several functions The libhugetlbfs meets several failures since the following functions do not use the correct address: huge_ptep_get_and_clear() huge_ptep_set_access_flags() huge_ptep_set_wrprotect() huge_ptep_clear_flush() This patch fixes the wrong address for them. Signed-off-by: Huang Shijie <shijie.huang@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-09 16:55:13 +00:00
Huang Shijie	20156ce236	arm64: hugetlb: remove the wrong pmd check in find_num_contig() The find_num_contig() will return 1 when the pmd is not present. It will cause a kernel dead loop in the following scenaro: 1.) pmd entry is not present. 2.) the page fault occurs: ... hugetlb_fault() --> hugetlb_no_page() --> set_huge_pte_at() 3.) set_huge_pte_at() will only set the first PMD entry, since the find_num_contig just return 1 in this case. So the PMD entries are all empty except the first one. 4.) when kernel accesses the address mapped by the second PMD entry, a new page fault occurs: ... hugetlb_fault() --> huge_ptep_set_access_flags() The second PMD entry is still empty now. 5.) When the kernel returns, the access will cause a page fault again. The kernel will run like the "4)" above. We will see a dead loop since here. The dead loop is caught in the 32M hugetlb page (2M PMD + Contiguous bit). This patch removes wrong pmd check, and fixes this dead loop. This patch also removes the redundant checks for PGD/PUD in the find_num_contig(). Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Huang Shijie <shijie.huang@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-09 16:54:55 +00:00
Catalin Marinas	6ed0038d5e	arm64: Fix typo in add_default_hugepagesz() for 64K pages The default hugepage size when 64K pages are enabled is set to 2MB using the contiguous PTE bit. The add_default_hugepagesz(), however, uses CONT_PMD_SHIFT instead of CONT_PTE_SHIFT. There is no functional change since the values are the same. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-09 16:52:21 +00:00
Pratyush Anand	9842ceae9f	arm64: Add uprobe support This patch adds support for uprobe on ARM64 architecture. Unit tests for following have been done so far and they have been found working 1. Step-able instructions, like sub, ldr, add etc. 2. Simulation-able like ret, cbnz, cbz etc. 3. uretprobe 4. Reject-able instructions like sev, wfe etc. 5. trapped and abort xol path 6. probe at unaligned user address. 7. longjump test cases Currently it does not support aarch32 instruction probing. Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:21 +00:00
Laura Abbott	1404d6f13e	arm64: dump: Add checking for writable and exectuable pages Page mappings with full RWX permissions are a security risk. x86 has an option to walk the page tables and dump any bad pages. (See `e1a58320a3` ("x86/mm: Warn on W^X mappings")). Add a similar implementation for arm64. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [catalin.marinas@arm.com: folded fix for KASan out of bounds from Mark Rutland] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:04 +00:00
Laura Abbott	ae5d1cf358	arm64: dump: Make the page table dumping seq_file optional The page table dumping code always assumes it will be dumping to a seq_file to userspace. Future code will be taking advantage of the page table dumping code but will not need the seq_file. Make the seq_file optional for these cases. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:04 +00:00
Laura Abbott	4ddb9bf833	arm64: dump: Make ptdump debugfs a separate option ptdump_register currently initializes a set of page table information and registers debugfs. There are uses for the ptdump option without wanting the debugfs options. Split this out to make it a separate option. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:04 +00:00
Ard Biesheuvel	0bfc445dec	arm64: mm: set the contiguous bit for kernel mappings where appropriate Now that we no longer allow live kernel PMDs to be split, it is safe to start using the contiguous bit for kernel mappings. So set the contiguous bit in the kernel page mappings for regions whose size and alignment are suitable for this. This enables the following contiguous range sizes for the virtual mapping of the kernel image, and for the linear mapping: granule size \| cont PTE \| cont PMD \| -------------+------------+------------+ 4 KB \| 64 KB \| 32 MB \| 16 KB \| 2 MB \| 1 GB* \| 64 KB \| 2 MB \| 16 GB* \| * Only when built for 3 or more levels of translation. This is due to the fact that a 2 level configuration only consists of PGDs and PTEs, and the added complexity of dealing with folded PMDs is not justified considering that 16 GB contiguous ranges are likely to be ignored by the hardware (and 16k/2 levels is a niche configuration) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:04 +00:00
Ard Biesheuvel	f14c66ce81	arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only' In preparation of adding support for contiguous PTE and PMD mappings, let's replace 'block_mappings_allowed' with 'page_mappings_only', which will be a more accurate description of the nature of the setting once we add such contiguous mappings into the mix. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:04 +00:00
Ard Biesheuvel	e98216b521	arm64: mm: BUG on unsupported manipulations of live kernel mappings Now that we take care not manipulate the live kernel page tables in a way that may lead to TLB conflicts, the case where a table mapping is replaced by a block mapping can no longer occur. So remove the handling of this at the PUD and PMD levels, and instead, BUG() on any occurrence of live kernel page table manipulations that modify anything other than the permission bits. Since mark_rodata_ro() is the only caller where the kernel mappings that are being manipulated are actually live, drop the various conditional flush_tlb_all() invocations, and add a single call to mark_rodata_ro() instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:03 +00:00
Robin Murphy	b7b941afe5	arm64: Remove pointless WARN_ON in DMA teardown We expect arch_teardown_dma_ops() to be called very late in a device's life, after it has been removed from its bus, and thus after the IOMMU bus notifier has run. As such, even if this funny little check did make sense, it's unlikely to achieve what it thinks it's trying to do anyway. It's a residual trace of an earlier implementation which didn't belong here from the start; belatedly snuff it out. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-11-07 18:15:03 +00:00
Hanjun Guo	3f7a09f44e	arm64/numa: fix incorrect log for memory-less node When booting on NUMA system with memory-less node (no memory dimm on this memory controller), the print for setup_node_data() is incorrect: NUMA: Initmem setup node 2 [mem 0x00000000-0xffffffffffffffff] It can be fixed by printing [mem 0x00000000-0x00000000] when end_pfn is 0, but print <memory-less node> will be more useful. Fixes: `1a2db30034` ("arm64, numa: Add NUMA support for arm64 platforms.") Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-26 18:21:51 +01:00
Yisheng Xie	26984c3bc2	arm64/numa: fix pcpu_cpu_distance() to get correct CPU proximity The pcpu_build_alloc_info() function group CPUs according to their proximity, by call callback function @cpu_distance_fn from different ARCHs. For arm64 the callback of @cpu_distance_fn is pcpu_cpu_distance(from, to) -> node_distance(from, to) The @from and @to for function node_distance() should be nid. However, pcpu_cpu_distance() in arch/arm64/mm/numa.c just past the cpu id for @from and @to, and didn't convert to numa node id. For this incorrect cpu proximity get from ARCH, it may cause each CPU in one group and make group_cnt out of bound: setup_per_cpu_areas() pcpu_embed_first_chunk() pcpu_build_alloc_info() in pcpu_build_alloc_info, since cpu_distance_fn will return REMOTE_DISTANCE if we pass cpu ids (0,1,2...), so cpu_distance_fn(cpu, tcpu) > LOCAL_DISTANCE will wrongly be ture. This may results in triggering the BUG_ON(unit != nr_units) later: [ 0.000000] kernel BUG at mm/percpu.c:1916! [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc1-00003-g14155ca-dirty #26 [ 0.000000] Hardware name: Hisilicon Hi1616 Evaluation Board (DT) [ 0.000000] task: ffff000008d6e900 task.stack: ffff000008d60000 [ 0.000000] PC is at pcpu_embed_first_chunk+0x420/0x704 [ 0.000000] LR is at pcpu_embed_first_chunk+0x3bc/0x704 [ 0.000000] pc : [<ffff000008c754f4>] lr : [<ffff000008c75490>] pstate: 800000c5 [ 0.000000] sp : ffff000008d63eb0 [ 0.000000] x29: ffff000008d63eb0 [ 0.000000] x28: 0000000000000000 [ 0.000000] x27: 0000000000000040 [ 0.000000] x26: ffff8413fbfcef00 [ 0.000000] x25: 0000000000000042 [ 0.000000] x24: 0000000000000042 [ 0.000000] x23: 0000000000001000 [ 0.000000] x22: 0000000000000046 [ 0.000000] x21: 0000000000000001 [ 0.000000] x20: ffff000008cb3bc8 [ 0.000000] x19: ffff8413fbfcf570 [ 0.000000] x18: 0000000000000000 [ 0.000000] x17: ffff000008e49ae0 [ 0.000000] x16: 0000000000000003 [ 0.000000] x15: 000000000000001e [ 0.000000] x14: 0000000000000004 [ 0.000000] x13: 0000000000000000 [ 0.000000] x12: 000000000000006f [ 0.000000] x11: 00000413fbffff00 [ 0.000000] x10: 0000000000000004 [ 0.000000] x9 : 0000000000000000 [ 0.000000] x8 : 0000000000000001 [ 0.000000] x7 : ffff8413fbfcf63c [ 0.000000] x6 : ffff000008d65d28 [ 0.000000] x5 : ffff000008d65e50 [ 0.000000] x4 : 0000000000000000 [ 0.000000] x3 : ffff000008cb3cc8 [ 0.000000] x2 : 0000000000000040 [ 0.000000] x1 : 0000000000000040 [ 0.000000] x0 : 0000000000000000 [...] [ 0.000000] Call trace: [ 0.000000] Exception stack(0xffff000008d63ce0 to 0xffff000008d63e10) [ 0.000000] 3ce0: ffff8413fbfcf570 0001000000000000 ffff000008d63eb0 ffff000008c754f4 [ 0.000000] 3d00: ffff000008d63d50 ffff0000081af210 00000413fbfff010 0000000000001000 [ 0.000000] 3d20: ffff000008d63d50 ffff0000081af220 00000413fbfff010 0000000000001000 [ 0.000000] 3d40: 00000413fbfcef00 0000000000000004 ffff000008d63db0 ffff0000081af390 [ 0.000000] 3d60: 00000413fbfcef00 0000000000001000 0000000000000000 0000000000001000 [ 0.000000] 3d80: 0000000000000000 0000000000000040 0000000000000040 ffff000008cb3cc8 [ 0.000000] 3da0: 0000000000000000 ffff000008d65e50 ffff000008d65d28 ffff8413fbfcf63c [ 0.000000] 3dc0: 0000000000000001 0000000000000000 0000000000000004 00000413fbffff00 [ 0.000000] 3de0: 000000000000006f 0000000000000000 0000000000000004 000000000000001e [ 0.000000] 3e00: 0000000000000003 ffff000008e49ae0 [ 0.000000] [<ffff000008c754f4>] pcpu_embed_first_chunk+0x420/0x704 [ 0.000000] [<ffff000008c6658c>] setup_per_cpu_areas+0x38/0xc8 [ 0.000000] [<ffff000008c608d8>] start_kernel+0x10c/0x390 [ 0.000000] [<ffff000008c601d8>] __primary_switched+0x5c/0x64 [ 0.000000] Code: b8018660 17ffffd7 6b16037f 54000080 (d4210000) [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! Fix by getting cpu's node id with early_cpu_to_node() then pass it to node_distance() as the original intention. Fixes: `7af3a0a992` ("arm64/numa: support HAVE_SETUP_PER_CPU_AREA") Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-26 18:21:51 +01:00
Mark Rutland	f7881bd644	arm64: remove pr_cont abuse from mem_init All the lines printed by mem_init are independent, with each ending with a newline. While they logically form a large block, none are actually continuations of previous lines. The kernel-side printk code and the userspace demsg tool differ in their handling of KERN_CONT following a newline, and while this isn't always a problem kernel-side, it does cause difficulty for userspace. Using pr_cont causes the userspace tool to not print line prefix (e.g. timestamps) even when following a newline, mis-aligning the output and making it harder to read, e.g. [ 0.000000] Virtual kernel memory layout: [ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000 ( 128 MB) vmalloc : 0xffff000008000000 - 0xffff7dffbfff0000 (129022 GB) .text : 0xffff000008080000 - 0xffff0000088b0000 ( 8384 KB) .rodata : 0xffff0000088b0000 - 0xffff000008c50000 ( 3712 KB) .init : 0xffff000008c50000 - 0xffff000008d50000 ( 1024 KB) .data : 0xffff000008d50000 - 0xffff000008e25200 ( 853 KB) .bss : 0xffff000008e25200 - 0xffff000008e6bec0 ( 284 KB) fixed : 0xffff7dfffe7fd000 - 0xffff7dfffec00000 ( 4108 KB) PCI I/O : 0xffff7dfffee00000 - 0xffff7dffffe00000 ( 16 MB) vmemmap : 0xffff7e0000000000 - 0xffff800000000000 ( 2048 GB maximum) 0xffff7e0000000000 - 0xffff7e0026000000 ( 608 MB actual) memory : 0xffff800000000000 - 0xffff800980000000 ( 38912 MB) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1 Fix this by using pr_notice consistently for all lines, which both the kernel and userspace are happy with. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 15:27:56 +01:00
James Morse	7209c86860	arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call Commit `338d4f49d6` ("arm64: kernel: Add support for Privileged Access Never") enabled PAN by enabling the 'SPAN' feature-bit in SCTLR_EL1. This means the PSTATE.PAN bit won't be set until the next return to the kernel from userspace. On a preemptible kernel we may schedule work that accesses userspace on a CPU before it has done this. Now that cpufeature enable() calls are scheduled via stop_machine(), we can set PSTATE.PAN from the cpu_enable_pan() call. Add WARN_ON_ONCE(in_interrupt()) to check the PSTATE value we updated is not immediately discarded. Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> [will: fixed typo in comment] Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 09:50:53 +01:00
James Morse	2a6dcb2b5f	arm64: cpufeature: Schedule enable() calls instead of calling them via IPI The enable() call for a cpufeature/errata is called using on_each_cpu(). This issues a cross-call IPI to get the work done. Implicitly, this stashes the running PSTATE in SPSR when the CPU receives the IPI, and restores it when we return. This means an enable() call can never modify PSTATE. To allow PAN to do this, change the on_each_cpu() call to use stop_machine(). This schedules the work on each CPU which allows us to modify PSTATE. This involves changing the protype of all the enable() functions. enable_cpu_capabilities() is called during boot and enables the feature on all online CPUs. This path now uses stop_machine(). CPU features for hotplug'd CPUs are enabled by verify_local_cpu_features() which only acts on the local CPU, and can already modify the running PSTATE as it is called from secondary_start_kernel(). Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 09:50:53 +01:00
Linus Torvalds	56e520c7a0	IOMMU Updates for Linux v4.9 Including: * Support for interrupt virtualization in the AMD IOMMU driver. These patches were shared with the KVM tree and are already merged through that tree. * Generic DT-binding support for the ARM-SMMU driver. With this the driver now makes use of the generic DMA-API code. This also required some changes outside of the IOMMU code, but these are acked by the respective maintainers. * More cleanups and fixes all over the place. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJX/PmUAAoJECvwRC2XARrjyloP/1hymxXC2yXZ4EIBTHSO5X+c jSJaGTIbQAdQDpllscSNJ0Au43L3vGtJcHo4JqwEERNlwLsU82LH7QJhq+q1La/b 5cPaY5gI3E++qxQt8umuZJAIUQthFYrfGoS5lJc5t5r/d8iVsLWbW4VkR19/1o7A 4/Uz7ETmi9VVy8Hkvumx+PQ0VHJet381KB7ud9LU5Spim0En2AAGwZXLMkmxXd2W uDQ+O1rlDVc2/ka3+GmfZEml5EASWRqS/MTNoU/ZbQGYWKCWygXbuiqt6gLudWjx dCR1Knh68b0gN6k/QAj8XY/1gkfmZ3YkfS0AHIMLYTFRT51BuxOrkXrBdkYnWEBv UirmaiV87SlR1j83yb3ZmjpBPvd2sGWYFDqY1P0riLutjGUS6zycWWs13olvbfbz SFrH7PT7JPQGYprI1oVn4ihszjN1NZ4+Gj7QBhyFW6FtvqTzmaFVsMOlDIeg1FwR k8cOzov4NG33Bp4IpsHK8e0/qV6K3oJOiOQgCyQp9kPKK+UWv9v9+HaEA7npJuRV c+lTE6j3G4LjEoVybkqm8TiPKxTMVNjUjgA3kwB2yNkCQT7hTCNYIAFrtfCYjYdo B1dnFE7feVqtoimnu2qvkVs59hWlF7Hc3RRHoBxMmO8DLwl9n2OcmoQIeCTsviss i9aNwC9bzBs+Hd3X/psB =1hFE -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - support for interrupt virtualization in the AMD IOMMU driver. These patches were shared with the KVM tree and are already merged through that tree. - generic DT-binding support for the ARM-SMMU driver. With this the driver now makes use of the generic DMA-API code. This also required some changes outside of the IOMMU code, but these are acked by the respective maintainers. - more cleanups and fixes all over the place. * tag 'iommu-updates-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (40 commits) iommu/amd: No need to wait iommu completion if no dte irq entry change iommu/amd: Free domain id when free a domain of struct dma_ops_domain iommu/amd: Use standard bitmap operation to set bitmap iommu/amd: Clean up the cmpxchg64 invocation iommu/io-pgtable-arm: Check for v7s-incapable systems iommu/dma: Avoid PCI host bridge windows iommu/dma: Add support for mapping MSIs iommu/arm-smmu: Set domain geometry iommu/arm-smmu: Wire up generic configuration support Docs: dt: document ARM SMMU generic binding usage iommu/arm-smmu: Convert to iommu_fwspec iommu/arm-smmu: Intelligent SMR allocation iommu/arm-smmu: Add a stream map entry iterator iommu/arm-smmu: Streamline SMMU data lookups iommu/arm-smmu: Refactor mmu-masters handling iommu/arm-smmu: Keep track of S2CR state iommu/arm-smmu: Consolidate stream map entry state iommu/arm-smmu: Handle stream IDs more dynamically iommu/arm-smmu: Set PRIVCFG in stage 1 STEs iommu/arm-smmu: Support non-PCI devices with SMMUv3 ...	2016-10-11 12:52:41 -07:00
Linus Torvalds	7af8a0f808	arm64 updates for 4.9: - Support for execute-only page permissions - Support for hibernate and DEBUG_PAGEALLOC - Support for heterogeneous systems with mismatches cache line sizes - Errata workarounds (A53 843419 update and QorIQ A-008585 timer bug) - arm64 PMU perf updates, including cpumasks for heterogeneous systems - Set UTS_MACHINE for building rpm packages - Yet another head.S tidy-up - Some cleanups and refactoring, particularly in the NUMA code - Lots of random, non-critical fixes across the board -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJX7k31AAoJELescNyEwWM0XX0H/iOaWCfKlWOhvBsStGUCsLrK XryTzQT2KjdnLKf3jwP+1ateCuBR5ROurYxoDCX5/7mD63c5KiI338Vbv61a1lE1 AAwjt1stmQVUg/j+kqnuQwB/0DYg+2C8se3D3q5Iyn7zc19cDZJEGcBHNrvLMufc XgHrgHgl/rzBDDlHJXleknDFge/MfhU5/Q1vJMRRb4JYrpAtmIokzCO75CYMRcCT ND2QbmppKtsyuFPGUTVbAFzJlP6dGKb3eruYta7/ct5d0pJQxav3u98D2yWGfjdM YaYq1EmX5Pol7rWumqLtk0+mA9yCFcKLLc+PrJu20Vx0UkvOq8G8Xt70sHNvZU8= =gdPM -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "It's a bit all over the place this time with no "killer feature" to speak of. Support for mismatched cache line sizes should help people seeing whacky JIT failures on some SoCs, and the big.LITTLE perf updates have been a long time coming, but a lot of the changes here are cleanups. We stray outside arch/arm64 in a few areas: the arch/arm/ arch_timer workaround is acked by Russell, the DT/OF bits are acked by Rob, the arch_timer clocksource changes acked by Marc, CPU hotplug by tglx and jump_label by Peter (all CC'd). Summary: - Support for execute-only page permissions - Support for hibernate and DEBUG_PAGEALLOC - Support for heterogeneous systems with mismatches cache line sizes - Errata workarounds (A53 843419 update and QorIQ A-008585 timer bug) - arm64 PMU perf updates, including cpumasks for heterogeneous systems - Set UTS_MACHINE for building rpm packages - Yet another head.S tidy-up - Some cleanups and refactoring, particularly in the NUMA code - Lots of random, non-critical fixes across the board" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (100 commits) arm64: tlbflush.h: add __tlbi() macro arm64: Kconfig: remove SMP dependence for NUMA arm64: Kconfig: select OF/ACPI_NUMA under NUMA config arm64: fix dump_backtrace/unwind_frame with NULL tsk arm/arm64: arch_timer: Use archdata to indicate vdso suitability arm64: arch_timer: Work around QorIQ Erratum A-008585 arm64: arch_timer: Add device tree binding for A-008585 erratum arm64: Correctly bounds check virt_addr_valid arm64: migrate exception table users off module.h and onto extable.h arm64: pmu: Hoist pmu platform device name arm64: pmu: Probe default hw/cache counters arm64: pmu: add fallback probe table MAINTAINERS: Update ARM PMU PROFILING AND DEBUGGING entry arm64: Improve kprobes test for atomic sequence arm64/kvm: use alternative auto-nop arm64: use alternative auto-nop arm64: alternative: add auto-nop infrastructure arm64: lse: convert lse alternatives NOP padding to use __nops arm64: barriers: introduce nops and __nops macros for NOP sequences arm64: sysreg: replace open-coded mrs_s/msr_s with {read,write}_sysreg_s ...	2016-10-03 08:58:35 -07:00
Joerg Roedel	6e0a16673c	Merge branch 'for-joerg/arm-smmu/updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu	2016-09-20 13:24:14 +02:00
Paul Gortmaker	0edfa83916	arm64: migrate exception table users off module.h and onto extable.h These files were only including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and avoid all the extra header content in module.h that we don't really need to compile these files. Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-20 09:36:21 +01:00
Robin Murphy	fade1ec055	iommu/dma: Avoid PCI host bridge windows With our DMA ops enabled for PCI devices, we should avoid allocating IOVAs which a host bridge might misinterpret as peer-to-peer DMA and lead to faults, corruption or other badness. To be safe, punch out holes for all of the relevant host bridge's windows when initialising a DMA domain for a PCI device. CC: Marek Szyprowski <m.szyprowski@samsung.com> CC: Inki Dae <inki.dae@samsung.com> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-16 09:34:22 +01:00
Mark Rutland	6ba3b554f5	arm64: use alternative auto-nop Make use of the new alternative_if and alternative_else_nop_endif and get rid of our homebew NOP sleds, making the code simpler to read. Note that for cpu_do_switch_mm the ret has been moved out of the alternative sequence, and in the default case there will be three additional NOPs executed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-12 10:46:07 +01:00
Zhen Lei	7ba5f605f3	arm64/numa: remove the limitation that cpu0 must bind to node0 1. Remove the old binding code. 2. Read the nid of cpu0 from dts. 3. Fallback the nid of cpu0 to 0 when numa=off is set in bootargs. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-09 14:59:09 +01:00
Zhen Lei	df7ffa34cc	arm64/numa: remove some useless code When the deleted code is executed, only the bit of cpu0 was set on cpu_possible_mask. So that, only set_cpu_numa_node(0, NUMA_NO_NODE); will be executed. And map_cpu_to_node(0, 0) will soon be called. So these code can be safely removed. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-09 14:59:09 +01:00
Zhen Lei	7af3a0a992	arm64/numa: support HAVE_SETUP_PER_CPU_AREA To make each percpu area allocated from its local numa node. Without this patch, all percpu areas will be allocated from the node which cpu0 belongs to. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-09 14:59:09 +01:00
Kefeng Wang	f11c7bacd5	arm64: numa: Use pr_fmt() Use pr_fmt to prefix kernel output, and remove duplicated msg of NUMA turned off. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-09 14:59:09 +01:00
Zhen Lei	794224ea56	arm64/numa: avoid inconsistent information to be printed numa_init may return error because of numa configuration error. So "No NUMA configuration found" is inaccurate. In fact, specific configuration error information should be immediately printed by the testing branch. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-09 14:59:08 +01:00
Kefeng Wang	dae8c235d9	arm64: mm: drop fixup_init() and mm.h There is only fixup_init() in mm.h , and it is only called in free_initmem(), so move the codes from fixup_init() into free_initmem(), then drop fixup_init() and mm.h. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-09-06 19:09:38 +01:00
James Morse	744c6c37cc	arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1 Changes to make the resume from cpu_suspend() code behave more like secondary boot caused debug exceptions to be unmasked early by __cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(), potentially taking break or watch points based on uninitialised registers. Mask debug exceptions in cpu_do_resume(), which is specific to resume from cpu_suspend(). Debug exceptions will be restored to their original state by local_dbg_restore() in cpu_suspend(), which runs after hw_breakpoint_restore() has re-initialised the other registers. Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: `cabe1c81ea` ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va") Cc: <stable@vger.kernel.org> # 4.7+ Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-09-02 17:19:55 +01:00
James Morse	5ebe3a44cc	arm64: hibernate: Support DEBUG_PAGEALLOC DEBUG_PAGEALLOC removes the valid bit of page table entries to prevent any access to unallocated memory. Hibernate uses this as a hint that those pages don't need to be saved/restored. This patch adds the kernel_page_present() function it uses. hibernate.c copies the resume kernel's linear map for use during restore. Add _copy_pte() to fill-in the holes made by DEBUG_PAGEALLOC in the resume kernel, so we can restore data the original kernel had at these addresses. Finally, DEBUG_PAGEALLOC means the linear-map alias of KERNEL_START to KERNEL_END may have holes in it, so we can't lazily clean this whole area to the PoC. Only clean the new mmuoff region, and the kernel/kvm idmaps. This reverts commit `da24eb1f3f`. Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-08-25 18:00:30 +01:00
James Morse	b611303811	arm64: vmlinux.ld: Add mmuoff data sections and move mmuoff text into idmap Resume from hibernate needs to clean any text executed by the kernel with the MMU off to the PoC. Collect these functions together into the .idmap.text section as all this code is tightly coupled and also needs the same cleaning after resume. Data is more complicated, secondary_holding_pen_release is written with the MMU on, clean and invalidated, then read with the MMU off. In contrast __boot_cpu_mode is written with the MMU off, the corresponding cache line is invalidated, so when we read it with the MMU on we don't get stale data. These cache maintenance operations conflict with each other if the values are within a Cache Writeback Granule (CWG) of each other. Collect the data into two sections .mmuoff.data.read and .mmuoff.data.write, the linker script ensures mmuoff.data.write section is aligned to the architectural maximum CWG of 2KB. Signed-off-by: James Morse <james.morse@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-08-25 18:00:30 +01:00
Catalin Marinas	cab15ce604	arm64: Introduce execute-only page access permissions The ARMv8 architecture allows execute-only user permissions by clearing the PTE_UXN and PTE_USER bits. However, the kernel running on a CPU implementation without User Access Override (ARMv8.2 onwards) can still access such page, so execute-only page permission does not protect against read(2)/write(2) etc. accesses. Systems requiring such protection must enable features like SECCOMP. This patch changes the arm64 __P100 and __S100 protection_map[] macros to the new __PAGE_EXECONLY attributes. A side effect is that pte_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't set. To work around this, the check is done on the PTE_NG bit via the pte_ng() macro. VM_READ is also checked now for page faults. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-08-25 18:00:29 +01:00
Jisheng Zhang	5a9e3e156e	arm64: apply __ro_after_init to some objects These objects are set during initialization, thereafter are read only. Previously I only want to mark vdso_pages, vdso_spec, vectors_page and cpu_ops as __read_mostly from performance point of view. Then inspired by Kees's patch[1] to apply more __ro_after_init for arm, I think it's better to mark them as __ro_after_init. What's more, I find some more objects are also read only after init. So apply __ro_after_init to all of them. This patch also removes global vdso_pagelist and tries to clean up vdso_spec[] assignment code. [1] http://www.spinics.net/lists/arm-kernel/msg523188.html Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-08-22 12:32:29 +01:00
Kwangwoo Lee	d34fdb7081	arm64: mm: convert __dma_* routines to use start, size __dma_* routines have been converted to use start and size instread of start and end addresses. The patch was origianlly for adding __clean_dcache_area_poc() which will be used in pmem driver to clean dcache to the PoC(Point of Coherency) in arch_wb_cache_pmem(). The functionality of __clean_dcache_area_poc() was equivalent to __dma_clean_range(). The difference was __dma_clean_range() uses the end address, but __clean_dcache_area_poc() uses the size to clean. Thus, __clean_dcache_area_poc() has been revised with a fallthrough function of __dma_clean_range() after the change that __dma_* routines use start and size instead of using start and end. As a consequence of using start and size, the name of __dma_* routines has also been altered following the terminology below: area: takes a start and size range: takes a start and end Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-08-22 10:00:48 +01:00
Catalin Marinas	a93a4d6232	arm64: Fix shift warning in arch/arm64/mm/dump.c When building with 48-bit VAs and 16K page configuration, it's possible to get the following warning when building the arm64 page table dumping code: arch/arm64/mm/dump.c: In function ‘walk_pud’: arch/arm64/mm/dump.c:274:102: warning: right shift count >= width of type [-Wshift-count-overflow] This is because pud_offset(pgd, 0) performs a shift to the right by 36 while the value 0 has the type 'int' by default, therefore 32-bit. This patch modifies all the p*_offset() uses in arch/arm64/mm/dump.c to use 0UL for the address argument. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-08-18 12:38:11 +01:00
Catalin Marinas	bfe6c8a89e	arm64: Fix NUMA build error when !CONFIG_ACPI Since asm/acpi.h is only included by linux/acpi.h when CONFIG_ACPI is enabled, disabling the latter leads to the following build error on arm64: arch/arm64/mm/numa.c: In function ‘arm64_numa_init’: arch/arm64/mm/numa.c:395:24: error: ‘arm64_acpi_numa_init’ undeclared (first use in this function) if (!acpi_disabled && !numa_init(arm64_acpi_numa_init)) This patch include the asm/acpi.h explicitly in arch/arm64/mm/numa.c for the arm64_acpi_numa_init() definition. Fixes: `d8b47fca8c` ("arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT") Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-08-17 17:16:58 +01:00
Laura Abbott	9adeb8e72d	arm64: Handle el1 synchronous instruction aborts cleanly Executing from a non-executable area gives an ugly message: lkdtm: Performing direct entry EXEC_RODATA lkdtm: attempting ok execution at ffff0000084c0e08 lkdtm: attempting bad execution at ffff000008880700 Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL) CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13 Hardware name: linux,dummy-virt (DT) task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000 PC is at lkdtm_rodata_do_nothing+0x0/0x8 LR is at execute_location+0x74/0x88 The 'IABT (current EL)' indicates the error but it's a bit cryptic without knowledge of the ARM ARM. There is also no indication of the specific address which triggered the fault. The increase in kernel page permissions makes hitting this case more likely as well. Handling the case in the vectors gives a much more familiar looking error message: lkdtm: Performing direct entry EXEC_RODATA lkdtm: attempting ok execution at ffff0000084c0840 lkdtm: attempting bad execution at ffff000008880680 Unable to handle kernel paging request at virtual address ffff000008880680 pgd = ffff8000089b2000 [ffff000008880680] pgd=00000000489b4003, pud=0000000048904003, *pmd=0000000000000000 Internal error: Oops: 8400000e [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ #24 Hardware name: linux,dummy-virt (DT) task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000 PC is at lkdtm_rodata_do_nothing+0x0/0x8 LR is at execute_location+0x74/0x88 Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-08-12 17:58:48 +01:00
Linus Torvalds	194d6ad32e	arm64 fixes: - Fix HugeTLB leak due to CoW and PTE_RDONLY mismatch - Avoid accessing unmapped FDT fields when checking validity - Correctly account for vDSO AUX entry in ARCH_DLINFO - Fix kallsyms with absolute expressions in linker script - Kill unnecessary symbol-based relocs in vmlinux -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJXpFZ5AAoJELescNyEwWM0PI4IALsTuHRzClOSMDLiqMUj8t+5 WNAcqybxAjCOVxAHckhweju++TeJBxcRH1nvBoNwiHIdHTv4fq1TZ3PeEq9kWMg5 JbKjYjvd9dW8k6LXMya8iXCYtG3kzbNejkNpOTVebC86yvas1IiEjNb/ztPdhJeM HBSOkhfk8RcskfNxhuscZzGXbbdH9/R+XSTNRHN/RwCZH8PlInmduD9BbMvDhZyP NLFonD2IgQ4as1kYG/HdIcw0BamHiURjd043+gyoqMvm7JjPksRzlQnr91SMkX17 LykXjHYPi2Me3aTrZ1NtkUNd5FHLHZ6/b9Wg6nA19d5KWkd3ER9uSJqGxkkbnt0= =dtGK -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - fix HugeTLB leak due to CoW and PTE_RDONLY mismatch - avoid accessing unmapped FDT fields when checking validity - correctly account for vDSO AUX entry in ARCH_DLINFO - fix kallsyms with absolute expressions in linker script - kill unnecessary symbol-based relocs in vmlinux * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix copy-on-write referencing in HugeTLB arm64: mm: avoid fdt_check_header() before the FDT is fully mapped arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinux arm64: vmlinux.lds: make __rela_offset and __dynsym_offset ABSOLUTE	2016-08-06 08:58:59 -04:00
Krzysztof Kozlowski	00085f1efa	dma-mapping: use unsigned long for dma_attrs The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-04 08:50:07 -04:00
Ard Biesheuvel	04a8481061	arm64: mm: avoid fdt_check_header() before the FDT is fully mapped As reported by Zijun, the fdt_check_header() call in __fixmap_remap_fdt() is not safe since it is not guaranteed that the FDT header is mapped completely. Due to the minimum alignment of 8 bytes, the only fields we can assume to be mapped are 'magic' and 'totalsize'. Since the OF layer is in charge of validating the FDT image, and we are only interested in making reasonably sure that the size field contains a meaningful value, replace the fdt_check_header() call with an explicit comparison of the magic field's value against the expected value. Cc: <stable@vger.kernel.org> Reported-by: Zijun Hu <zijun_hu@htc.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-08-01 14:17:01 +01:00
Dennis Chen	cb0a650213	arm64:acpi: fix the acpi alignment exception when 'mem=' specified When booting an ACPI enabled kernel with 'mem=x', there is the possibility that ACPI data regions from the firmware will lie above the memory limit. Ordinarily these will be removed by memblock_enforce_memory_limit(.). Unfortunately, this means that these regions will then be mapped by acpi_os_ioremap(.) as device memory (instead of normal) thus unaligned accessess will then provoke alignment faults. In this patch we adopt memblock_mem_limit_remove_map instead, and this preserves these ACPI data regions (marked NOMAP) thus ensuring that these regions are not mapped as device memory. For example, below is an alignment exception observed on ARM platform when booting the kernel with 'acpi=on mem=8G': ... Unable to handle kernel paging request at virtual address ffff0000080521e7 pgd = ffff000008aa0000 [ffff0000080521e7] pgd=000000801fffe003, pud=000000801fffd003, pmd=000000801fffc003, pte=00e80083ff1c1707 Internal error: Oops: 96000021 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc3-next-20160616+ #172 Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1001A 02/09/2016 task: ffff800001ef0000 ti: ffff800001ef8000 task.ti: ffff800001ef8000 PC is at acpi_ns_lookup+0x520/0x734 LR is at acpi_ns_lookup+0x4a4/0x734 pc : [<ffff0000083b8b10>] lr : [<ffff0000083b8a94>] pstate: 60000045 sp : ffff800001efb8b0 x29: ffff800001efb8c0 x28: 000000000000001b x27: 0000000000000001 x26: 0000000000000000 x25: ffff800001efb9e8 x24: ffff000008a10000 x23: 0000000000000001 x22: 0000000000000001 x21: ffff000008724000 x20: 000000000000001b x19: ffff0000080521e7 x18: 000000000000000d x17: 00000000000038ff x16: 0000000000000002 x15: 0000000000000007 x14: 0000000000007fff x13: ffffff0000000000 x12: 0000000000000018 x11: 000000001fffd200 x10: 00000000ffffff76 x9 : 000000000000005f x8 : ffff000008725fa8 x7 : ffff000008a8df70 x6 : ffff000008a8df70 x5 : ffff000008a8d000 x4 : 0000000000000010 x3 : 0000000000000010 x2 : 000000000000000c x1 : 0000000000000006 x0 : 0000000000000000 ... acpi_ns_lookup+0x520/0x734 acpi_ds_load1_begin_op+0x174/0x4fc acpi_ps_build_named_op+0xf8/0x220 acpi_ps_create_op+0x208/0x33c acpi_ps_parse_loop+0x204/0x838 acpi_ps_parse_aml+0x1bc/0x42c acpi_ns_one_complete_parse+0x1e8/0x22c acpi_ns_parse_table+0x8c/0x128 acpi_ns_load_table+0xc0/0x1e8 acpi_tb_load_namespace+0xf8/0x2e8 acpi_load_tables+0x7c/0x110 acpi_init+0x90/0x2c0 do_one_initcall+0x38/0x12c kernel_init_freeable+0x148/0x1ec kernel_init+0x10/0xec ret_from_fork+0x10/0x40 Code: b9009fbc 2a00037b 36380057 3219037b (b9400260) ---[ end trace 03381e5eb0a24de4 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b With 'efi=debug', we can see those ACPI regions loaded by firmware on that board as: efi: 0x0083ff185000-0x0083ff1b4fff [Reserved \| \| \| \| \| \| \| \| \|WB\|WT\|WC\|UC]* efi: 0x0083ff1b5000-0x0083ff1c2fff [ACPI Reclaim Memory\| \| \| \| \| \| \| \| \|WB\|WT\|WC\|UC]* efi: 0x0083ff223000-0x0083ff224fff [ACPI Memory NVS \| \| \| \| \| \| \| \| \|WB\|WT\|WC\|UC]* Link: http://lkml.kernel.org/r/1468475036-5852-3-git-send-email-dennis.chen@arm.com Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Dennis Chen <dennis.chen@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Kaly Xin <kaly.xin@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Linus Torvalds	e831101a73	arm64 updates for 4.8: - Kexec support for arm64 - Kprobes support - Expose MIDR_EL1 and REVIDR_EL1 CPU identification registers to sysfs - Trapping of user space cache maintenance operations and emulation in the kernel (CPU errata workaround) - Clean-up of the early page tables creation (kernel linear mapping, EFI run-time maps) to avoid splitting larger blocks (e.g. pmds) into smaller ones (e.g. ptes) - VDSO support for CLOCK_MONOTONIC_RAW in clock_gettime() - ARCH_HAS_KCOV enabled for arm64 - Optimise IP checksum helpers - SWIOTLB optimisation to only allocate/initialise the buffer if the available RAM is beyond the 32-bit mask - Properly handle the "nosmp" command line argument - Fix for the initialisation of the CPU debug state during early boot - vdso-offsets.h build dependency workaround - Build fix when RANDOMIZE_BASE is enabled with MODULES off -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXmF/UAAoJEGvWsS0AyF7x+jwP/2fErtX6FTXmdG0c3HBkTpuy gEuzN2ByWbP6Io+unLC6NvbQQb1q6c73PTqjsoeMHUx2o8YK3jgWEBcC+7AuepoZ YGl3r08e75a/fGrgNwEQQC1lNlgjpog4kzVDh5ji6oRXNq+OkjJGUtRPe3gBoqxv NAjviciID/MegQaq4SaMd26AmnjuUGKogo5vlIaXK0SemX9it+ytW7eLAXuVY+gW EvO3Nxk0Y5oZKJF8qRw6oLSmw1bwn2dD26OgfXfCiI30QBookRyWIoXRedUOZmJq D0+Tipd7muO4PbjlxS8aY/wd/alfnM5+TJ6HpGDo+Y1BDauXfiXMf3ktDFE5QvJB KgtICmC0stWwbDT35dHvz8sETsrCMA2Q/IMrnyxG+nj9BxVQU7rbNrxfCXesJy7Q 4EsQbcTyJwu+ECildBezfoei99XbFZyWk2vKSkTCFKzgwXpftGFaffgZ3DIzBAHH IjecDqIFENC8ymrjyAgrGjeFG+2WB/DBgoSS3Baiz6xwQqC4wFMnI3jPECtJjb/U 6e13f+onXu5lF1YFKAiRjGmqa/G1ZMr+uKZFsembuGqsZdAPkzzUHyAE9g4JVO8p t3gc3/M3T7oLSHuw4xi1/Ow5VGb2UvbslFrp7OpuFZ7CJAvhKlHL5rPe385utsFE 7++5WHXHAegeJCDNAKY2 =iJOY -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Kexec support for arm64 - Kprobes support - Expose MIDR_EL1 and REVIDR_EL1 CPU identification registers to sysfs - Trapping of user space cache maintenance operations and emulation in the kernel (CPU errata workaround) - Clean-up of the early page tables creation (kernel linear mapping, EFI run-time maps) to avoid splitting larger blocks (e.g. pmds) into smaller ones (e.g. ptes) - VDSO support for CLOCK_MONOTONIC_RAW in clock_gettime() - ARCH_HAS_KCOV enabled for arm64 - Optimise IP checksum helpers - SWIOTLB optimisation to only allocate/initialise the buffer if the available RAM is beyond the 32-bit mask - Properly handle the "nosmp" command line argument - Fix for the initialisation of the CPU debug state during early boot - vdso-offsets.h build dependency workaround - Build fix when RANDOMIZE_BASE is enabled with MODULES off * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits) arm64: arm: Fix-up the removal of the arm64 regs_query_register_name() prototype arm64: Only select ARM64_MODULE_PLTS if MODULES=y arm64: mm: run pgtable_page_ctor() on non-swapper translation table pages arm64: mm: make create_mapping_late() non-allocating arm64: Honor nosmp kernel command line option arm64: Fix incorrect per-cpu usage for boot CPU arm64: kprobes: Add KASAN instrumentation around stack accesses arm64: kprobes: Cleanup jprobe_return arm64: kprobes: Fix overflow when saving stack arm64: kprobes: WARN if attempting to step with PSTATE.D=1 arm64: debug: remove unused local_dbg_{enable, disable} macros arm64: debug: remove redundant spsr manipulation arm64: debug: unmask PSTATE.D earlier arm64: localise Image objcopy flags arm64: ptrace: remove extra define for CPSR's E bit kprobes: Add arm64 case in kprobe example module arm64: Add kernel return probes support (kretprobes) arm64: Add trampoline code for kretprobes arm64: kprobes instruction simulation support arm64: Treat all entry code as non-kprobe-able ...	2016-07-27 11:16:05 -07:00
Linus Torvalds	0e06f5c0de	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...	2016-07-26 19:55:54 -07:00
Kirill A. Shutemov	dcddffd41d	mm: do not pass mm_struct into handle_mm_fault We always have vma->vm_mm around. Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-26 16:19:19 -07:00
Ard Biesheuvel	1378dc3d4b	arm64: mm: run pgtable_page_ctor() on non-swapper translation table pages The kernel page table creation routines are accessible to other subsystems (e.g., EFI) via the create_pgd_mapping() entry point, which allows mappings to be created that are not covered by init_mm. Since generic code such as apply_to_page_range() may expect translation table pages that are not associated with init_mm to be covered by fully constructed struct pages, add a call to pgtable_page_ctor() in the alloc function used by create_pgd_mapping. Since it is no longer used by create_mapping_late(), also update the name of this function to better reflect its purpose. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Laura Abbott <labbott@redhat.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-25 17:43:36 +01:00
Ard Biesheuvel	258a1605c7	arm64: mm: make create_mapping_late() non-allocating The only purpose served by create_mapping_late() is to remap the already mapped .text and .rodata kernel segments with read-only permissions. Since we no longer allow block mappings to be split or merged, create_mapping_late() should not pass an allocation function pointer into __create_pgd_mapping(). So pass NULL instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Laura Abbott <labbott@redhat.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-25 17:42:53 +01:00
Rafael J. Wysocki	d85f4eb699	Merge branch 'acpi-numa' * acpi-numa: ACPI / NUMA: Enable ACPI based NUMA on ARM64 arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT ACPI / processor: Add acpi_map_madt_entry() ACPI / NUMA: Improve SRAT error detection and add messages ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c ACPI / NUMA: remove unneeded acpi_numa=1 ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c x86 / ACPI / NUMA: cleanup acpi_numa_processor_affinity_init() arm64, NUMA: Cleanup NUMA disabled messages arm64, NUMA: rework numa_add_memblk() ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only ACPI / NUMA: remove duplicate NULL check ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug() ACPI / NUMA: Use pr_fmt() instead of printk	2016-07-25 13:40:39 +02:00
Catalin Marinas	a95b0644b3	Merge branch 'for-next/kprobes' into for-next/core * kprobes: arm64: kprobes: Add KASAN instrumentation around stack accesses arm64: kprobes: Cleanup jprobe_return arm64: kprobes: Fix overflow when saving stack arm64: kprobes: WARN if attempting to step with PSTATE.D=1 kprobes: Add arm64 case in kprobe example module arm64: Add kernel return probes support (kretprobes) arm64: Add trampoline code for kretprobes arm64: kprobes instruction simulation support arm64: Treat all entry code as non-kprobe-able arm64: Blacklist non-kprobe-able symbol arm64: Kprobes with single stepping support arm64: add conditional instruction simulation support arm64: Add more test functions to insn.c arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature	2016-07-21 18:20:41 +01:00
Will Deacon	2ce39ad151	arm64: debug: unmask PSTATE.D earlier Clearing PSTATE.D is one of the requirements for generating a debug exception. The arm64 booting protocol requires that PSTATE.D is set, since many of the debug registers (for example, the hw_breakpoint registers) are UNKNOWN out of reset and could potentially generate spurious, fatal debug exceptions in early boot code if PSTATE.D was clear. Once the debug registers have been safely initialised, PSTATE.D is cleared, however this is currently broken for two reasons: (1) The boot CPU clears PSTATE.D in a postcore_initcall and secondary CPUs clear PSTATE.D in secondary_start_kernel. Since the initcall runs after SMP (and the scheduler) have been initialised, there is no guarantee that it is actually running on the boot CPU. In this case, the boot CPU is left with PSTATE.D set and is not capable of generating debug exceptions. (2) In a preemptible kernel, we may explicitly schedule on the IRQ return path to EL1. If an IRQ occurs with PSTATE.D set in the idle thread, then we may schedule the kthread_init thread, run the postcore_initcall to clear PSTATE.D and then context switch back to the idle thread before returning from the IRQ. The exception return path will then restore PSTATE.D from the stack, and set it again. This patch fixes the problem by moving the clearing of PSTATE.D earlier to proc.S. This has the desirable effect of clearing it in one place for all CPUs, long before we have to worry about the scheduler or any exception handling. We ensure that the previous reset of MDSCR_EL1 has completed before unmasking the exception, so that any spurious exceptions resulting from UNKNOWN debug registers are not generated. Without this patch applied, the kprobes selftests have been seen to fail under KVM, where we end up attempting to step the OOL instruction buffer with PSTATE.D set and therefore fail to complete the step. Cc: <stable@vger.kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-19 16:56:46 +01:00
Sandeepa Prabhu	2dd0e8d2d2	arm64: Kprobes with single stepping support Add support for basic kernel probes(kprobes) and jump probes (jprobes) for ARM64. Kprobes utilizes software breakpoint and single step debug exceptions supported on ARM v8. A software breakpoint is placed at the probe address to trap the kernel execution into the kprobe handler. ARM v8 supports enabling single stepping before the break exception return (ERET), with next PC in exception return address (ELR_EL1). The kprobe handler prepares an executable memory slot for out-of-line execution with a copy of the original instruction being probed, and enables single stepping. The PC is set to the out-of-line slot address before the ERET. With this scheme, the instruction is executed with the exact same register context except for the PC (and DAIF) registers. Debug mask (PSTATE.D) is enabled only when single stepping a recursive kprobe, e.g.: during kprobes reenter so that probed instruction can be single stepped within the kprobe handler -exception- context. The recursion depth of kprobe is always 2, i.e. upon probe re-entry, any further re-entry is prevented by not calling handlers and the case counted as a missed kprobe). Single stepping from the x-o-l slot has a drawback for PC-relative accesses like branching and symbolic literals access as the offset from the new PC (slot address) may not be ensured to fit in the immediate value of the opcode. Such instructions need simulation, so reject probing them. Instructions generating exceptions or cpu mode change are rejected for probing. Exclusive load/store instructions are rejected too. Additionally, the code is checked to see if it is inside an exclusive load/store sequence (code from Pratyush). System instructions are mostly enabled for stepping, except MSR/MRS accesses to "DAIF" flags in PSTATE, which are not safe for probing. This also changes arch/arm64/include/asm/ptrace.h to use include/asm-generic/ptrace.h. Thanks to Steve Capper and Pratyush Anand for several suggested Changes. Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com> Signed-off-by: David A. Long <dave.long@linaro.org> Signed-off-by: Pratyush Anand <panand@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-19 15:03:20 +01:00
Lorenzo Pieralisi	16c11325cc	arm64: mm: change IOMMU notifier action to attach DMA ops Current bus notifier in ARM64 (__iommu_attach_notifier) attempts to attach dma_ops to a device on BUS_NOTIFY_ADD_DEVICE action notification. This will cause issues on ACPI based systems, where PCI devices can be added before the IOMMUs the devices are attached to had a chance to be probed, causing failures on attempts to attach dma_ops in that the domain for the respective IOMMU may not be set-up yet by the time the bus notifier is run. Devices dma_ops do not require to be set-up till the matching device drivers are probed. This means that instead of running the notifier attaching dma_ops to devices (__iommu_attach_notifier) on BUS_NOTIFY_ADD_DEVICE action, it can be run just before the device driver is bound to the device in question (on action BUS_NOTIFY_BIND_DRIVER) so that it is certain that its IOMMU group and domain are set-up accordingly at the time the notifier is triggered. This patch changes the notifier action upon which dma_ops are attached to devices and defer it to driver binding time, so that IOMMU devices have a chance to be probed and to register their bus notifiers before the dma_ops attach sequence for a device is actually carried out. As a result we also no longer need worry about racing with iommu_bus_notifier(), or about retrying the queue in case devices were added too early on DT-based systems, so clean up the notifier itself plus the additional workaround from `722ec35f7f` ("arm64: dma-mapping: fix handling of devices registered before arch_initcall") Acked-by: Will Deacon <will.deacon@arm.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> [rm: get rid of other now-redundant bits] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-08 18:06:04 +01:00
James Morse	e19a6ee246	arm64: kernel: Save and restore UAO and addr_limit on exception entry If we take an exception while at EL1, the exception handler inherits the original context's addr_limit and PSTATE.UAO values. To be consistent always reset addr_limit and PSTATE.UAO on (re-)entry to EL1. This prevents accidental re-use of the original context's addr_limit. Based on a similar patch for arm from Russell King. Cc: <stable@vger.kernel.org> # 4.6- Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-07-07 15:55:37 +01:00
Ard Biesheuvel	40f87d3114	arm64: mm: fold init_pgd() into __create_pgd_mapping() The routine __create_pgd_mapping() does nothing except calling init_pgd(), which has no other callers. So fold the latter into the former. Also, drop a comment that has gone stale. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-01 11:56:27 +01:00
Catalin Marinas	4133af6c04	arm64: mm: Remove split_pd() functions Since the efi_create_mapping() no longer generates block mappings and being the last user of the split_pd code, remove these functions and the corresponding TLBI. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [ardb: replace 'overlapping regions' with 'block mappings' in commit log] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-01 11:56:27 +01:00
Ard Biesheuvel	53e1b32910	arm64: mm: add param to force create_pgd_mapping() to use page mappings Add a bool parameter 'allow_block_mappings' to create_pgd_mapping() and the various helper functions that it descends into, to give the caller control over whether block entries may be used to create the mapping. The UEFI runtime mapping routines will use this to avoid creating block entries that would need to split up into page entries when applying the permissions listed in the Memory Attributes firmware table. This also replaces the block_mappings_allowed() helper function that was added for DEBUG_PAGEALLOC functionality, but the resulting code is functionally equivalent (given that debug_page_alloc does not operate on EFI page table entries anyway) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-01 11:56:26 +01:00
Andre Przywara	290622efc7	arm64: fix "dc cvau" cache operation on errata-affected core The ARM errata 819472, 826319, 827319 and 824069 for affected Cortex-A53 cores demand to promote "dc cvau" instructions to "dc civac" as well. Attribute the usage of the instruction in __flush_cache_user_range to also be covered by our alternative patching efforts. For that we introduce an assembly macro which both deals with alternatives while still tagging the instructions as USER. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-07-01 11:26:20 +01:00
Kefeng Wang	6c5269f33e	arm64: mm: remove unnecessary BUG_ON The memblock_alloc() and memblock_alloc_base() will panic on their own if no free memory, remove pointless BUG_ON. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-06-30 17:55:04 +01:00
Ard Biesheuvel	9fdc14c55c	arm64: mm: fix location of _etext As Kees Cook notes in the ARM counterpart of this patch [0]: The _etext position is defined to be the end of the kernel text code, and should not include any part of the data segments. This interferes with things that might check memory ranges and expect executable code up to _etext. In particular, Kees is referring to the HARDENED_USERCOPY patch set [1], which rejects attempts to call copy_to_user() on kernel ranges containing executable code, but does allow access to the .rodata segment. Regardless of whether one may or may not agree with the distinction, it makes sense for _etext to have the same meaning across architectures. So let's put _etext where it belongs, between .text and .rodata, and fix up existing references to use __init_begin instead, which unlike _end_rodata includes the exception and notes sections as well. The _etext references in kaslr.c are left untouched, since its references to [_stext, _etext) are meant to capture potential jump instruction targets, and so disregarding .rodata is actually an improvement here. [0] http://article.gmane.org/gmane.linux.kernel/2245084 [1] http://thread.gmane.org/gmane.linux.kernel.hardened.devel/2502 Reported-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-06-27 18:21:27 +01:00
Mark Rutland	ea2cbee3bc	arm64: mm: simplify memblock numa node extraction We currently open-code extracting the NUMA node of a memblock region, which requires an ifdef to cater for !CONFIG_NUMA builds where the memblock_region::nid field does not exist. The generic memblock_get_region_node helper is intended to cater for this. For CONFIG_HAVE_MEMBLOCK_NODE_MAP, builds this returns reg->nid, and for for !CONFIG_HAVE_MEMBLOCK_NODE_MAP builds this is a static inline that returns 0. Note that for arm64, CONFIG_HAVE_MEMBLOCK_NODE_MAP is selected iff CONFIG_NUMA is. This patch makes use of memblock_get_region_node to simplify the arm64 code. At the same time, we can move the nid variable definition into the loop, as this is the only place it is used. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-06-27 18:05:39 +01:00
Shaokun Zhang	20c27a4270	arm64: mm: remove page_mapping check in __sync_icache_dcache __sync_icache_dcache unconditionally skips the cache maintenance for anonymous pages, under the assumption that flushing is only required in the presence of D-side aliases [see `7249b79f6b` ("arm64: Do not flush the D-cache for anonymous pages")]. Unfortunately, this breaks migration of anonymous pages holding self-modifying code, where userspace cannot be reasonably expected to reissue maintenance instructions in response to a migration. This patch fixes the problem by removing the broken page_mapping(page) check from the cache syncing code, otherwise we may end up fetching and executing stale instructions from the PoU. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-06-21 20:10:18 +01:00
Jean-Philippe Brucker	f7e0efc9b5	arm64: update ASID limit During a rollover, we mark the active ASID on each CPU as reserved, before allocating a new ID for the task that caused the rollover. This means that with N CPUs, we can only guarantee the new task to obtain a valid ASID if we have at least N+1 ASIDs. Update this limit in the initcall check. Note that this restriction was introduced by commit `8e648066` on the arch/arm side, which disallow re-using the previously active ASID on the local CPU, as it would introduce a TLB race. In addition, we only dispose of NUM_USER_ASIDS-1, since ASID 0 is reserved. Add this restriction as well. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-06-21 20:10:18 +01:00
Mark Rutland	541ec870ef	arm64: kill ESR_LNX_EXEC Currently we treat ESR_EL1 bit 24 as software-defined for distinguishing instruction aborts from data aborts, but this bit is architecturally RES0 for instruction aborts, and could be allocated for an arbitrary purpose in future. Additionally, we hard-code the value in entry.S without the mnemonic, making the code difficult to understand. Instead, remove ESR_LNX_EXEC, and distinguish aborts based on the esr, which we already pass to the sole use of ESR_LNX_EXEC. A new helper, is_el0_instruction_abort() is added to make the logic clear. Any instruction aborts taken from EL1 will already have been handled by bad_mode, so we need not handle that case in the helper. For consistency, the existing permission_fault helper is renamed to is_permission_fault, and the return type is changed to bool. There should be no functional changes as the return value was a boolean expression, and the result is only used in another boolean expression. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Dave P Martin <dave.martin@arm.com> Cc: Huang Shijie <shijie.huang@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-06-21 17:07:48 +01:00
Mark Rutland	275f344bec	arm64: add macro to extract ESR_ELx.EC Several places open-code extraction of the EC field from an ESR_ELx value, in subtly different ways. This is unfortunate duplication and variation, and the precise logic used to extract the field is a distraction. This patch adds a new macro, ESR_ELx_EC(), to extract the EC field from an ESR_ELx value in a consistent fashion. Existing open-coded extractions in core arm64 code are moved over to the new helper. KVM code is left as-is for the moment. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Huang Shijie <shijie.huang@arm.com> Cc: Dave P Martin <dave.martin@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-06-21 17:07:09 +01:00
Jisheng Zhang	b67a8b29df	arm64: mm: only initialize swiotlb when necessary we only initialize swiotlb when swiotlb_force is true or not all system memory is DMA-able, this trivial optimization saves us 64MB when swiotlb is not necessary. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-06-21 16:54:53 +01:00
Mark Rutland	4674fdb9f1	arm64: mm: dump: make page table dumping reusable For debugging purposes, it would be nice if we could export page tables other than the swapper_pg_dir to userspace. To enable this, this patch refactors the arm64 page table dumping code such that multiple tables may be registered with the framework, and exported under debugfs. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-06-21 15:09:11 +01:00
Mark Rutland	bbb1681ee3	arm64: mm: mark fault_info table const Unlike the debug_fault_info table, we never intentionally alter the fault_info table at runtime, and all derived pointers are treated as const currently. Make the table const so that it can be placed in .rodata and protected from unintentional writes, as we do for the syscall tables. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-06-14 15:02:34 +01:00
Will Deacon	0106d456c4	arm64: mm: always take dirty state from new pte in ptep_set_access_flags Commit `66dbd6e61a` ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") ensured that pte flags are updated atomically in the face of potential concurrent, hardware-assisted updates. However, Alex reports that: \| This patch breaks swapping for me. \| In the broken case, you'll see either systemd cpu time spike (because \| it's stuck in a page fault loop) or the system hang (because the \| application owning the screen is stuck in a page fault loop). It turns out that this is because the 'dirty' argument to ptep_set_access_flags is always 0 for read faults, and so we can't use it to set PTE_RDONLY. The failing sequence is: 1. We put down a PTE_WRITE \| PTE_DIRTY \| PTE_AF pte 2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF 3. A read faults due to the missing access flag 4. ptep_set_access_flags is called with dirty = 0, due to the read fault 5. pte is then made PTE_WRITE \| PTE_DIRTY \| PTE_AF \| PTE_RDONLY (!) 6. A write faults, but pte_write is true so we get stuck The solution is to check the new page table entry (as would be done by the generic, non-atomic definition of ptep_set_access_flags that just calls set_pte_at) to establish the dirty state. Cc: <stable@vger.kernel.org> # 4.3+ Fixes: `66dbd6e61a` ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Alexander Graf <agraf@suse.de> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-06-08 10:23:44 +01:00
Mark Rutland	48dd73c55d	arm64: mm: dump: log span level The page table dump code logs spans of entries at the same level (pgd/pud/pmd/pte) which have the same attributes. While we log the (decoded) attributes, we don't log the level, which leaves the output ambiguous and/or confusing in some cases. For example: 0xffff800800000000-0xffff800980000000 6G RW NX SHD AF BLK UXN MEM/NORMAL If using 4K pages, this may describe a span of 6 1G block entries at the PGD/PUD level, or 3072 2M block entries at the PMD level. This patch adds the page table level to each output line, removing this ambiguity. For the example above, this will produce: 0xffffffc800000000-0xffffffc980000000 6G PUD RW NX SHD AF BLK UXN MEM/NORMAL When 3 level tables are in use, and we use the asm-generic/nopud.h definitions, the dump code treats each entry in the PGD as a 1 element table at the PUD level, and logs spans as being PUDs, which can be confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD" when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when CONFIG_PGTABLE_LEVELS <= 2. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Huang Shijie <shijie.huang@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-06-03 10:16:22 +01:00
Will Deacon	ab2e1b8923	Revert "arm64: hugetlb: partial revert of 66b3923a1a0f" This reverts commit `ff7925848b`. Now that the contiguous-hint hugetlb regression has been debugged and fixed upstream by `66ee95d16a` ("mm: exclude HugeTLB pages from THP page_mapped() logic"), we can revert the previous partial revert of this feature. Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-05-31 11:00:09 +01:00
Hanjun Guo	d8b47fca8c	arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT Introduce a new file to hold ACPI based NUMA information parsing from SRAT and SLIT. SRAT includes the CPU ACPI ID to Proximity Domain mappings and memory ranges to Proximity Domain mapping. SLIT has the information of inter node distances(relative number for access latency). Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> [rrichter@cavium.com Reworked for numa v10 series ] Signed-off-by: Robert Richter <rrichter@cavium.com> [david.daney@cavium.com reorderd and combinded with other patches in Hanjun Guo's original set, removed get_mpidr_in_madt() and use acpi_map_madt_entry() instead.] Signed-off-by: David Daney <david.daney@cavium.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Dennis Chen <dennis.chen@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-30 14:27:09 +02:00
David Daney	34c3337052	arm64, NUMA: Cleanup NUMA disabled messages As noted by Dennis Chen, we don't want to print "No NUMA configuration found" if NUMA was forced off from the command line. Change the type of numa_off to bool, and clean up printing code. Print "NUMA disabled" if forced off on command line and "No NUMA configuration found" if there was no firmware NUMA information. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Dennis Chen <dennis.chen@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-30 14:27:08 +02:00
Hanjun Guo	8ccbbdaa2b	arm64, NUMA: rework numa_add_memblk() Rework numa_add_memblk() to update the parameter "u64 size" to "u64 end", this will make it consistent with x86 and simplifies the arm64 ACPI NUMA code to be added later. Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-30 14:27:07 +02:00
Linus Torvalds	a05a70db34	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - fsnotify fix - poll() timeout fix - a few scripts/ tweaks - debugobjects updates - the (small) ocfs2 queue - Minor fixes to kernel/padata.c - Maybe half of the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) mm, page_alloc: restore the original nodemask if the fast path allocation failed mm, page_alloc: uninline the bad page part of check_new_page() mm, page_alloc: don't duplicate code in free_pcp_prepare mm, page_alloc: defer debugging checks of pages allocated from the PCP mm, page_alloc: defer debugging checks of freed pages until a PCP drain cpuset: use static key better and convert to new API mm, page_alloc: inline pageblock lookup in page free fast paths mm, page_alloc: remove unnecessary variable from free_pcppages_bulk mm, page_alloc: pull out side effects from free_pages_check mm, page_alloc: un-inline the bad part of free_pages_check mm, page_alloc: check multiple page fields with a single branch mm, page_alloc: remove field from alloc_context mm, page_alloc: avoid looking up the first zone in a zonelist twice mm, page_alloc: shortcut watermark checks for order-0 pages mm, page_alloc: reduce cost of fair zone allocation policy retry mm, page_alloc: shorten the page allocator fast path mm, page_alloc: check once if a zone has isolated pageblocks mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath mm, page_alloc: simplify last cpupid reset mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask() ...	2016-05-19 20:00:06 -07:00
Vaishali Thakkar	d77e20cea7	arm64: mm: use hugetlb_bad_size() Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported hugepage size is found. Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-19 19:12:14 -07:00
Linus Torvalds	e0fb1b3639	IOMMU Updates for Linux v4.7 The updates include: * Rate limiting for the VT-d fault handler * Remove statistics code from the AMD IOMMU driver. It is unused and should be replaced by something more generic if needed * Per-domain pagesize-bitmaps in IOMMU core code to support systems with different types of IOMMUs * Support for ACPI devices in the AMD IOMMU driver * 4GB mode support for Mediatek IOMMU driver * ARM-SMMU updates from Will Deacon: - Support for 64k pages with SMMUv1 implementations (e.g MMU-401) - Remove open-coded 64-bit MMIO accessors - Initial support for 16-bit VMIDs, as supported by some ThunderX SMMU implementations - A couple of errata workarounds for silicon in the field * Various fixes here and there -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJXPeM1AAoJECvwRC2XARrjA2QP/2Cz+pVkpQCuvhAse57eN4rB wWXKTjqSFZ4PcA3Vu5yvX6XMv15g46xXFJAhf2spE5//8+xgFfYBgkBRpnqu1brw SL6f8A912MnfMRgWqcdKkJNeHbiN0kOvcIQv1J8GNfciqMiyYFhiLP6fFiRmWR/F XDBjUeFZ5+Uwf1BAGqw0cVPexeakEbsLHUGqxFsh5g2T4i43aHzO2HJT3IdwWHDt F2ivs8gNFGBeJEyzhW8TD0rOEEyHAnM3N18qPEU9+dD0UmjnTQPymEZSbsGW5d4j Cn40QYlA+Zmbwgx6LaDVChzQyRJu6O3uvFThyRviiYKCri/Nc9cUT4vHsFGU4MXb 1d3bqrgzaw7vw31BN7S1Py3MV+WpVnEYjFm2O+hW28OjtSpm6ZvbI8wc0rF4UT/I KgL0gSeA8tp25uVISM+ktpIrObYsAcoCz8nvurpDv2AGkKRzhyoSze0Jg43rusD8 BH7iFWu1LRPlulTGlrHMtNmbZeEApUPbObcQAOcrBOj9vjuFaZ8qduZmB+hwS2iV p9atn+54LmGO0LuzqsGrhApIeXTeTZSrGyjlbUADWBJlTw8Xyk/CR39Wf3m/Xmpr DiJ/5oa8SKQtNbwvbScn1+sInNWP/pH/JgnRO3Yvqth8HWF/DlpzNj5XxAB8czwr qjk9WjpEXun50ocPFQeS =jpPD -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "The updates include: - rate limiting for the VT-d fault handler - remove statistics code from the AMD IOMMU driver. It is unused and should be replaced by something more generic if needed - per-domain pagesize-bitmaps in IOMMU core code to support systems with different types of IOMMUs - support for ACPI devices in the AMD IOMMU driver - 4GB mode support for Mediatek IOMMU driver - ARM-SMMU updates from Will Deacon: - support for 64k pages with SMMUv1 implementations (e.g MMU-401) - remove open-coded 64-bit MMIO accessors - initial support for 16-bit VMIDs, as supported by some ThunderX SMMU implementations - a couple of errata workarounds for silicon in the field - various fixes here and there" * tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (44 commits) iommu/arm-smmu: Use per-domain page sizes. iommu/amd: Remove statistics code iommu/dma: Finish optimising higher-order allocations iommu: Allow selecting page sizes per domain iommu: of: enforce const-ness of struct iommu_ops iommu: remove unused priv field from struct iommu_ops iommu/dma: Implement scatterlist segment merging iommu/arm-smmu: Clear cache lock bit of ACR iommu/arm-smmu: Support SMMUv1 64KB supplement iommu/arm-smmu: Decouple context format from kernel config iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses io-64-nonatomic: Add relaxed accessor variants iommu/arm-smmu: Work around MMU-500 prefetch errata iommu/arm-smmu: Convert ThunderX workaround to new method iommu/arm-smmu: Differentiate specific implementations iommu/arm-smmu: Workaround for ThunderX erratum #27704 iommu/arm-smmu: Add support for 16 bit VMID iommu/amd: Move get_device_id() and friends to beginning of file iommu/amd: Don't use IS_ERR_VALUE to check integer values iommu/amd: Signedness bug in acpihid_device_group() ...	2016-05-19 17:07:04 -07:00
Robin Murphy	3b6b7e19e3	iommu/dma: Finish optimising higher-order allocations Now that we know exactly which page sizes our caller wants to use in the given domain, we can restrict higher-order allocation attempts to just those sizes, if any, and avoid wasting any time or effort on other sizes which offer no benefit. In the same vein, this also lets us accommodate a minimum order greater than 0 for special cases. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Yong Wu <yong.wu@mediatek.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2016-05-09 15:33:29 +02:00
Robin Murphy	53c92d7933	iommu: of: enforce const-ness of struct iommu_ops As a set of driver-provided callbacks and static data, there is no compelling reason for struct iommu_ops to be mutable in core code, so enforce const-ness throughout. Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2016-05-09 15:33:29 +02:00
Yang Shi	394bf2f248	arm64: mm: remove unnecessary EXPORT_SYMBOL_GPL arch_pick_mmap_layout is only called by fs/exec.c which is always built into kernel, it looks the EXPORT_SYMBOL_GPL is pointless and no architectures export it other than ARM64. Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-05-05 09:49:38 +01:00
James Morse	cabe1c81ea	arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va By enabling the MMU early in cpu_resume(), the sleep_save_sp and stack can be accessed by VA, which avoids the need to convert-addresses and clean to PoC on the suspend path. MMU setup is shared with the boot path, meaning the swapper_pg_dir is restored directly: ttbr1_el1 is no longer saved/restored. struct sleep_save_sp is removed, replacing it with a single array of pointers. cpu_do_{suspend,resume} could be further reduced to not restore: cpacr_el1, mdscr_el1, tcr_el1, vbar_el1 and sctlr_el1, all of which are set by __cpu_setup(). However these values all contain res0 bits that may be used to enable future features. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-28 12:05:46 +01:00
Geoff Levand	7b7293ae3d	arm64: Fold proc-macros.S into assembler.h To allow the assembler macros defined in arch/arm64/mm/proc-macros.S to be used outside the mm code move the contents of proc-macros.S to asm/assembler.h. Also, delete proc-macros.S, and fix up all references to proc-macros.S. Signed-off-by: Geoff Levand <geoff@infradead.org> Acked-by: Pavel Machek <pavel@ucw.cz> [rebased, included dcache_by_line_op] Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-28 12:05:45 +01:00
Ard Biesheuvel	d8fc68a04d	arm64: ptdump: add region marker for kasan shadow region Annotate the KASAN shadow region with boundary markers, so that its mappings stand out in the page table dumper output. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-25 12:05:21 +01:00
Ard Biesheuvel	c8f8cca483	arm64: ptdump: use static initializers for vmemmap region boundaries There is no need to initialize the vmemmap region boundaries dynamically, since they are compile time constants. So just add these constants to the global struct initializer, and drop the dynamic assignment and related code. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-25 12:04:39 +01:00
Robin Murphy	921b1f52c9	arm64/dma-mapping: Remove default domain workaround With the IOMMU core now taking care of default domains for groups regardless of bus type, we can gleefully rip out this stop-gap, as slight recompense for having to expand the other one. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-21 17:34:34 +01:00
Robin Murphy	226d89cbb2	arm64/dma-mapping: Extend DMA ops workaround to PCI devices PCI devices now suffer the same hiccup as platform devices, in that they get their DMA ops configured before they have been added to their bus, and thus before we know whether they have successfully registered with an IOMMU or not. Until the necessary driver core changes to reorder calls during device creation have been worked out, extend our delayed notifier trick onto the PCI bus so as to avoid broken DMA ops once IOMMUs get plugged into the PCI code. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-21 17:34:34 +01:00
Kefeng Wang	9974723e31	arm64: mm: Show bss segment in kernel memory layout Show the bss segment information as with text and data in Virtual memory kernel layout. Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-19 17:03:31 +01:00
Kefeng Wang	d32351c824	arm64: mm: make pr_cont() per line in Virtual kernel memory layout Each line with single pr_cont() in Virtual kernel memory layout, or the dump of the kernel memory layout in dmesg is not aligned when PRINTK_TIME enabled, due to the missing time stamps. Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-19 17:01:34 +01:00
Huang Shijie	3a72db703c	arm64: mm: remove the redundant code We already re-enable interrupts where necessary in the entry code, so there is no need to do it again in do_page fault. This patch removes the redundant code. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Huang Shijie <shijie.huang@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-19 09:52:51 +01:00
Catalin Marinas	66dbd6e61a	arm64: Implement ptep_set_access_flags() for hardware AF/DBM When hardware updates of the access and dirty states are enabled, the default ptep_set_access_flags() implementation based on calling set_pte_at() directly is potentially racy. This triggers the "racy dirty state clearing" warning in set_pte_at() because an existing writable PTE is overridden with a clean entry. There are two main scenarios for this situation: 1. The CPU getting an access fault does not support hardware updates of the access/dirty flags. However, a different agent in the system (e.g. SMMU) can do this, therefore overriding a writable entry with a clean one could potentially lose the automatically updated dirty status 2. A more complex situation is possible when all CPUs support hardware AF/DBM: a) Initial state: shareable + writable vma and pte_none(pte) b) Read fault taken by two threads of the same process on different CPUs c) CPU0 takes the mmap_sem and proceeds to handling the fault. It eventually reaches do_set_pte() which sets a writable + clean pte. CPU0 releases the mmap_sem d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The pte entry it reads is present, writable and clean and it continues to pte_mkyoung() e) CPU1 calls ptep_set_access_flags() If between (d) and (e) the hardware (another CPU) updates the dirty state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit marking the entry clean again. This patch implements an arm64-specific ptep_set_access_flags() function to perform an atomic update of the PTE flags. Fixes: `2f4b829c62` ("arm64: Add support for hardware updates of the access and dirty pte bits") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Julien Grall <julien.grall@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> # 4.3+ [will: reworded comment] Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-15 18:06:09 +01:00
Ganapatrao Kulkarni	1a2db30034	arm64, numa: Add NUMA support for arm64 platforms. Attempt to get the memory and CPU NUMA node via of_numa. If that fails, default the dummy NUMA node and map all memory and CPUs to node 0. Tested-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-15 18:06:09 +01:00
David Daney	3194ac6e66	arm64: Move unflatten_device_tree() call earlier. In order to extract NUMA information from the device tree, we need to have the tree in its unflattened form. Move the call to bootmem_init() in the tail of paging_init() into setup_arch, and adjust header files so that its declaration is visible. Move the unflatten_device_tree() call between the calls to paging_init() and bootmem_init(). Follow on patches add NUMA handling to bootmem_init(). Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-15 18:06:08 +01:00
Suzuki K Poulose	17eebd1a43	arm64: Add cpu_panic_kernel helper During the activation of a secondary CPU, we could report serious configuration issues and hence request to crash the kernel. We do this for CPU ASID bit check now. We will need it also for handling mismatched exception levels for the CPUs with VHE. Hence, add a helper to do the same for reusability. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-15 18:06:06 +01:00
James Morse	6afedcd23c	arm64: mm: Add trace_irqflags annotations to do_debug_exception() With CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCKDEP and CONFIG_TRACE_IRQFLAGS enabled, lockdep will compare current->hardirqs_enabled with the flags from local_irq_save(). When a debug exception occurs, interrupts are disabled in entry.S, but lockdep isn't told, resulting in: DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) ------------[ cut here ]------------ WARNING: at ../kernel/locking/lockdep.c:3523 Modules linked in: CPU: 3 PID: 1752 Comm: perf Not tainted 4.5.0-rc4+ #2204 Hardware name: ARM Juno development board (r1) (DT) task: ffffffc974868000 ti: ffffffc975f40000 task.ti: ffffffc975f40000 PC is at check_flags.part.35+0x17c/0x184 LR is at check_flags.part.35+0x17c/0x184 pc : [<ffffff80080fc93c>] lr : [<ffffff80080fc93c>] pstate: 600003c5 [...] ---[ end trace 74631f9305ef5020 ]--- Call trace: [<ffffff80080fc93c>] check_flags.part.35+0x17c/0x184 [<ffffff80080ffe30>] lock_acquire+0xa8/0xc4 [<ffffff8008093038>] breakpoint_handler+0x118/0x288 [<ffffff8008082434>] do_debug_exception+0x3c/0xa8 [<ffffff80080854b4>] el1_dbg+0x18/0x6c [<ffffff80081e82f4>] do_filp_open+0x64/0xdc [<ffffff80081d6e60>] do_sys_open+0x140/0x204 [<ffffff80081d6f58>] SyS_openat+0x10/0x18 [<ffffff8008085d30>] el0_svc_naked+0x24/0x28 possible reason: unannotated irqs-off. irq event stamp: 65857 hardirqs last enabled at (65857): [<ffffff80081fb1c0>] lookup_mnt+0xf4/0x1b4 hardirqs last disabled at (65856): [<ffffff80081fb188>] lookup_mnt+0xbc/0x1b4 softirqs last enabled at (65790): [<ffffff80080bdca4>] __do_softirq+0x1f8/0x290 softirqs last disabled at (65757): [<ffffff80080be038>] irq_exit+0x9c/0xd0 This patch adds the annotations to do_debug_exception(), while trying not to call trace_hardirqs_off() if el1_dbg() interrupted a task that already had irqs disabled. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-14 18:40:33 +01:00
Ard Biesheuvel	7eb90f2ff7	arm64: cover the .head.text section in the .text segment mapping Keeping .head.text out of the .text mapping buys us very little: its actual payload is only 4 KB, most of which is padding, but the page alignment may add up to 2 MB (in case of CONFIG_DEBUG_ALIGN_RODATA=y) of additional padding to the uncompressed kernel Image. Also, on 4 KB granule kernels, the 4 KB misalignment of .text forces us to map the adjacent 56 KB of code without the PTE_CONT attribute, and since this region contains things like the vector table and the GIC interrupt handling entry point, this region is likely to benefit from the reduced TLB pressure that results from PTE_CONT mappings. So remove the alignment between the .head.text and .text sections, and use the [_text, _etext) rather than the [_stext, _etext) interval for mapping the .text segment. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-14 18:11:43 +01:00
Ard Biesheuvel	2c09ec06bc	arm64: use 'segment' rather than 'chunk' to describe mapped kernel regions Replace the poorly defined term chunk with segment, which is a term that is already used by the ELF spec to describe contiguous mappings with the same permission attributes of statically allocated ranges of an executable. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-14 18:11:11 +01:00
Ard Biesheuvel	3e1907d5bf	arm64: mm: move vmemmap region right below the linear region This moves the vmemmap region right below PAGE_OFFSET, aka the start of the linear region, and redefines its size to be a power of two. Due to the placement of PAGE_OFFSET in the middle of the address space, whose size is a power of two as well, this guarantees that virt to page conversions and vice versa can be implemented efficiently, by masking and shifting rather than ordinary arithmetic. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-14 16:31:49 +01:00
Ard Biesheuvel	d386825c95	arm64: mm: free __init memory via the linear mapping The implementation of free_initmem_default() expects __init_begin and __init_end to be covered by the linear mapping, which is no longer the case. So open code it instead, using addresses that are explicitly translated from kernel virtual to linear virtual. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-14 16:31:49 +01:00
Ard Biesheuvel	177e15f0c1	arm64: add the initrd region to the linear mapping explicitly Instead of going out of our way to relocate the initrd if it turns out to occupy memory that is not covered by the linear mapping, just add the initrd to the linear mapping. This puts the burden on the bootloader to pass initrd= and mem= options that are mutually consistent. Note that, since the placement of the linear region in the PA space is also dependent on the placement of the kernel Image, which may reside anywhere in memory, we may still end up with a situation where the initrd and the kernel Image are simply too far apart to be covered by the linear region. Since we now leave it up to the bootloader to pass the initrd in memory that is guaranteed to be accessible by the kernel, add a mention of this to the arm64 boot protocol specification as well. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-14 16:20:45 +01:00
Ard Biesheuvel	2958987f5d	arm64/mm: ensure memstart_addr remains sufficiently aligned After choosing memstart_addr to be the highest multiple of ARM64_MEMSTART_ALIGN less than or equal to the first usable physical memory address, we clip the memblocks to the maximum size of the linear region. Since the kernel may be high up in memory, we take care not to clip the kernel itself, which means we have to clip some memory from the bottom if this occurs, to ensure that the distance between the first and the last usable physical memory address can be covered by the linear region. However, we fail to update memstart_addr if this clipping from the bottom occurs, which means that we may still end up with virtual addresses that wrap into the userland range. So increment memstart_addr as appropriate to prevent this from happening. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-04-14 16:15:08 +01:00
Linus Torvalds	9d854607f9	2nd set of arm64 updates for 4.6: - KASLR bug fixes: use callee-saved register, boot-time I-cache maintenance - inv_entry asm macro fix (EL0 check typo) - pr_notice("Virtual kernel memory layout...") splitting - Clean-ups: use p?d_set_huge consistently, allow preemption around copy_to_user_page, remove unused __local_flush_icache_all() -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW9CX3AAoJEGvWsS0AyF7xkdMP/21LWKBKq7IW3ZF17lS90Phw LDiKCcjHJW4Kp/c+OiI31lCBkyzXTkXFlsqam/ju3tb6SYciuFgxmuNWS+6vTvJO 6U2oh8cA+wEYhECjQb0sSR6DkFOi4EKJ6dJ1prVA+lfpJzd6nE5TMMSgaEy1uRmW e1mntJiTG+/IH9RvEj9d+iBi7hpiELDpvjfKO5SDUmbl1ht2xC+s9UqYKrEZNmgW X85jTxC5G1ebCB4K6YtUlPRZo4r+5nzfnXb5ceQhnxpOlGvZfXtEBsJVAH6NJUF3 QOlwtqxsiR1KzgEJ4vfZuvhqT5Au7arezKD57FmJgmqmhZGemLrWOYWaRM3yy+dx I8g6MhsghlYx6pnfUVOImYT7hHslUXrrESP77qnc8LwVe1NpNga5V+NkorjkcnQ/ ydiggyP909Asq5msoeX00cvMJLCq+IYPvHv2VBZ32IhYh2UvAMtzLo90qhr2ouGO 0QM0ZWzpnmJfQuZ74pJs3Of0/5Rr9C003MzN+q4btsJHdtsoKXaC32NswW93hc7l 7Z3ZIJQ/7ftZjLkZaPRnSuScR0h5LmZWG2XPnJTM4wRumeSTO95NiLO6Cwxj+MSj 8b/s97B0on/xVSCh3gZ3MpV4cGDPfZBI5ZLK1roYPfDVCv2jexOsWcJLNpsrmFeO S7xqe1Wwbh2jaB4XcpjW =iTFy -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull second set of arm64 updates from Catalin Marinas: - KASLR bug fixes: use callee-saved register, boot-time I-cache maintenance - inv_entry asm macro fix (EL0 check typo) - pr_notice("Virtual kernel memory layout...") splitting - Clean-ups: use p?d_set_huge consistently, allow preemption around copy_to_user_page, remove unused __local_flush_icache_all() * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mm: allow preemption in copy_to_user_page arm64: consistently use p?d_set_huge arm64: kaslr: use callee saved register to preserve SCTLR across C call arm64: Split pr_notice("Virtual kernel memory layout...") into multiple pr_cont() arm64: drop unused __local_flush_icache_all() arm64: fix KASLR boot-time I-cache maintenance arm64/kernel: fix incorrect EL0 check in inv_entry macro	2016-03-24 19:13:59 -07:00
Mark Rutland	691b1e2ebf	arm64: mm: allow preemption in copy_to_user_page Currently we disable preemption in copy_to_user_page; a behaviour that we inherited from the 32-bit arm code. This was necessary for older cores without broadcast data cache maintenance, and ensured that cache lines were dirtied and cleaned by the same CPU. On these systems dirty cache line migration was not possible, so this was sufficient to guarantee coherency. On contemporary systems, cache coherence protocols permit (dirty) cache lines to migrate between CPUs as a result of speculation, prefetching, and other behaviours. To account for this, in ARMv8 data cache maintenance operations are broadcast and affect all data caches in the domain associated with the VA (i.e. ISH for kernel and user mappings). In __switch_to we ensure that tasks can be safely migrated in the middle of a maintenance sequence, using a dsb(ish) to ensure prior explicit memory accesses are observed and cache maintenance operations are completed before a task can be run on another CPU. Given the above, it is not necessary to disable preemption in copy_to_user_page. This patch removes the preempt_{disable,enable} calls, permitting preemption. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-03-24 16:32:54 +00:00
Mark Rutland	c661cb1c53	arm64: consistently use p?d_set_huge Commit `324420bf91` ("arm64: add support for ioremap() block mappings") added new p?d_set_huge functions which do the hard work to generate and set a correct block entry. These differ from open-coded huge page creation in the early page table code by explicitly setting the P?D_TYPE_SECT bits (which are implicitly retained by mk_sect_prot() for any valid prot), but are otherwise identical (and cannot fail on arm64). For simplicity and consistency, make use of these in the initial page table creation code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-03-24 16:32:29 +00:00
Catalin Marinas	f09f1bacfe	arm64: Split pr_notice("Virtual kernel memory layout...") into multiple pr_cont() The printk() implementation has a limit of LOG_LINE_MAX (== 1024 - 32) buffer per call which the arm64 mem_init() breaches when printing the virtual memory layout with CONFIG_KASAN enabled. The result is that the last line is no longer printed. This patch splits the call into a pr_notice() + additional pr_cont() calls. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>	2016-03-21 12:19:12 +00:00
Linus Torvalds	814a2bf957	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...	2016-03-18 19:26:54 -07:00
Linus Torvalds	588ab3f9af	arm64 updates for 4.6: - Initial page table creation reworked to avoid breaking large block mappings (huge pages) into smaller ones. The ARM architecture requires break-before-make in such cases to avoid TLB conflicts but that's not always possible on live page tables - Kernel virtual memory layout: the kernel image is no longer linked to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of the vmalloc space, allowing the kernel to be loaded (nearly) anywhere in physical RAM - Kernel ASLR: position independent kernel Image and modules being randomly mapped in the vmalloc space with the randomness is provided by UEFI (efi_get_random_bytes() patches merged via the arm64 tree, acked by Matt Fleming) - Implement relative exception tables for arm64, required by KASLR (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but actual x86 conversion to deferred to 4.7 because of the merge dependencies) - Support for the User Access Override feature of ARMv8.2: this allows uaccess functions (get_user etc.) to be implemented using LDTR/STTR instructions. Such instructions, when run by the kernel, perform unprivileged accesses adding an extra level of protection. The set_fs() macro is used to "upgrade" such instruction to privileged accesses via the UAO bit - Half-precision floating point support (part of ARMv8.2) - Optimisations for CPUs with or without a hardware prefetcher (using run-time code patching) - copy_page performance improvement to deal with 128 bytes at a time - Sanity checks on the CPU capabilities (via CPUID) to prevent incompatible secondary CPUs from being brought up (e.g. weird big.LITTLE configurations) - valid_user_regs() reworked for better sanity check of the sigcontext information (restored pstate information) - ACPI parking protocol implementation - CONFIG_DEBUG_RODATA enabled by default - VDSO code marked as read-only - DEBUG_PAGEALLOC support - ARCH_HAS_UBSAN_SANITIZE_ALL enabled - Erratum workaround Cavium ThunderX SoC - set_pte_at() fix for PROT_NONE mappings - Code clean-ups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+ RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv 50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3 DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7 3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM LepccTgykiUBqW5TRzPz =/oS+ -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Here are the main arm64 updates for 4.6. There are some relatively intrusive changes to support KASLR, the reworking of the kernel virtual memory layout and initial page table creation. Summary: - Initial page table creation reworked to avoid breaking large block mappings (huge pages) into smaller ones. The ARM architecture requires break-before-make in such cases to avoid TLB conflicts but that's not always possible on live page tables - Kernel virtual memory layout: the kernel image is no longer linked to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of the vmalloc space, allowing the kernel to be loaded (nearly) anywhere in physical RAM - Kernel ASLR: position independent kernel Image and modules being randomly mapped in the vmalloc space with the randomness is provided by UEFI (efi_get_random_bytes() patches merged via the arm64 tree, acked by Matt Fleming) - Implement relative exception tables for arm64, required by KASLR (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but actual x86 conversion to deferred to 4.7 because of the merge dependencies) - Support for the User Access Override feature of ARMv8.2: this allows uaccess functions (get_user etc.) to be implemented using LDTR/STTR instructions. Such instructions, when run by the kernel, perform unprivileged accesses adding an extra level of protection. The set_fs() macro is used to "upgrade" such instruction to privileged accesses via the UAO bit - Half-precision floating point support (part of ARMv8.2) - Optimisations for CPUs with or without a hardware prefetcher (using run-time code patching) - copy_page performance improvement to deal with 128 bytes at a time - Sanity checks on the CPU capabilities (via CPUID) to prevent incompatible secondary CPUs from being brought up (e.g. weird big.LITTLE configurations) - valid_user_regs() reworked for better sanity check of the sigcontext information (restored pstate information) - ACPI parking protocol implementation - CONFIG_DEBUG_RODATA enabled by default - VDSO code marked as read-only - DEBUG_PAGEALLOC support - ARCH_HAS_UBSAN_SANITIZE_ALL enabled - Erratum workaround Cavium ThunderX SoC - set_pte_at() fix for PROT_NONE mappings - Code clean-ups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits) arm64: kasan: Fix zero shadow mapping overriding kernel image shadow arm64: kasan: Use actual memory node when populating the kernel image shadow arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission arm64: Fix misspellings in comments. arm64: efi: add missing frame pointer assignment arm64: make mrs_s prefixing implicit in read_cpuid arm64: enable CONFIG_DEBUG_RODATA by default arm64: Rework valid_user_regs arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly arm64: KVM: Move kvm_call_hyp back to its original localtion arm64: mm: treat memstart_addr as a signed quantity arm64: mm: list kernel sections in order arm64: lse: deal with clobbered IP registers after branch via PLT arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR arm64: kconfig: add submenu for 8.2 architectural features arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot arm64: Add support for Half precision floating point arm64: Remove fixmap include fragility arm64: Add workaround for Cavium erratum 27456 arm64: mm: Mark .rodata as RO ...	2016-03-17 20:03:47 -07:00
Jan Kara	0e8fb9312f	mm: remove VM_FAULT_MINOR The define has a comment from Nick Piggin from 2007: /* For backwards compat. Remove me quickly. */ I guess 9 years should not be too hurried sense of 'quickly' even for kernel measures. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Kirill A. Shutemov	3ed3a4f0dd	mm: cleanup pte_alloc interfaces There are few things about pte_alloc() helpers worth cleaning up: - 'vma' argument is unused, let's drop it; - most __pte_alloc() callers do speculative check for pmd_none(), before taking ptl: let's introduce pte_alloc() macro which does the check. The only direct user of __pte_alloc left is userfaultfd, which has different expectation about atomicity wrt pmd. - pte_alloc_map() and pte_alloc_map_lock() are redefined using pte_alloc(). [sudeep.holla@arm.com: fix build for arm64 hugetlbpage] [sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Catalin Marinas	2776e0e8ef	arm64: kasan: Fix zero shadow mapping overriding kernel image shadow With the 16KB and 64KB page size configurations, SWAPPER_BLOCK_SIZE is PAGE_SIZE and ARM64_SWAPPER_USES_SECTION_MAPS is 0. Since kimg_shadow_end is not page aligned (_end shifted by KASAN_SHADOW_SCALE_SHIFT), the edges of previously mapped kernel image shadow via vmemmap_populate() may be overridden by subsequent calls to kasan_populate_zero_shadow(), leading to kernel panics like below: ------------------------------------------------------------------------------ Unable to handle kernel paging request at virtual address fffffc100135068c pgd = fffffc8009ac0000 [fffffc100135068c] pgd=00000009ffee0003, pud=00000009ffee0003, pmd=00000009ffee0003, pte=00e0000081a00793 Internal error: Oops: 9600004f [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-rc4+ #1984 Hardware name: Juno (DT) task: fffffe09001a0000 ti: fffffe0900200000 task.ti: fffffe0900200000 PC is at __memset+0x4c/0x200 LR is at kasan_unpoison_shadow+0x34/0x50 pc : [<fffffc800846f1cc>] lr : [<fffffc800821ff54>] pstate: 00000245 sp : fffffe0900203db0 x29: fffffe0900203db0 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: fffffc80099b69d0 x24: 0000000000000001 x23: 0000000000000000 x22: 0000000000002000 x21: dffffc8000000000 x20: 1fffff9001350a8c x19: 0000000000002000 x18: 0000000000000008 x17: 0000000000000147 x16: ffffffffffffffff x15: 79746972100e041d x14: ffffff0000000000 x13: ffff000000000000 x12: 0000000000000000 x11: 0101010101010101 x10: 1fffffc11c000000 x9 : 0000000000000000 x8 : fffffc100135068c x7 : 0000000000000000 x6 : 000000000000003f x5 : 0000000000000040 x4 : 0000000000000004 x3 : fffffc100134f651 x2 : 0000000000000400 x1 : 0000000000000000 x0 : fffffc100135068c Process swapper/0 (pid: 1, stack limit = 0xfffffe0900200020) Call trace: [<fffffc800846f1cc>] __memset+0x4c/0x200 [<fffffc8008220044>] __asan_register_globals+0x5c/0xb0 [<fffffc8008a09d34>] _GLOBAL__sub_I_65535_1_sunrpc_cache_lookup+0x1c/0x28 [<fffffc8008f20d28>] kernel_init_freeable+0x104/0x274 [<fffffc80089e1948>] kernel_init+0x10/0xf8 [<fffffc8008093a00>] ret_from_fork+0x10/0x50 ------------------------------------------------------------------------------ This patch aligns kimg_shadow_start and kimg_shadow_end to SWAPPER_BLOCK_SIZE in all configurations. Fixes: `f9040773b7` ("arm64: move kernel image to base of vmalloc area") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2016-03-11 11:03:35 +00:00
Catalin Marinas	2f76969f2e	arm64: kasan: Use actual memory node when populating the kernel image shadow With the 16KB or 64KB page configurations, the generic vmemmap_populate() implementation warns on potential offnode page_structs via vmemmap_verify() because the arm64 kasan_init() passes NUMA_NO_NODE instead of the actual node for the kernel image memory. Fixes: `f9040773b7` ("arm64: move kernel image to base of vmalloc area") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: James Morse <james.morse@arm.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com>	2016-03-11 11:03:34 +00:00
Will Deacon	ff7925848b	arm64: hugetlb: partial revert of `66b3923a1a` Commit `66b3923a1a` ("arm64: hugetlb: add support for PTE contiguous bit") introduced support for huge pages using the contiguous bit in the PTE as opposed to block mappings, which may be slightly unwieldy (512M) in 64k page configurations. Unfortunately, this support has resulted in some late regressions when running the libhugetlbfs test suite with 64k pages and CONFIG_DEBUG_VM as a result of a BUG: \| readback (2M: 64): ------------[ cut here ]------------ \| kernel BUG at fs/hugetlbfs/inode.c:446! \| Internal error: Oops - BUG: 0 [#1] SMP \| Modules linked in: \| CPU: 7 PID: 1448 Comm: readback Not tainted 4.5.0-rc7 #148 \| Hardware name: linux,dummy-virt (DT) \| task: fffffe0040964b00 ti: fffffe00c2668000 task.ti: fffffe00c2668000 \| PC is at remove_inode_hugepages+0x44c/0x480 \| LR is at remove_inode_hugepages+0x264/0x480 Rather than revert the entire patch, simply avoid advertising the contiguous huge page sizes for now while people are actively working on a fix. This patch can then be reverted once things have been sorted out. Cc: David Woods <dwoods@ezchip.com> Reported-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-03-09 15:29:29 +00:00
Linus Torvalds	ed385c7a17	arm64 fix: - Ensure struct page array fits within vmemmap area -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW1NaCAAoJELescNyEwWM0Sl8H/3jJiGQixMLmjdfUsZVrtdk5 0Smn4HKYxv3AV2WQ6+miOquOhMz3xrGIHaOf1Zf4GSo3n02drq3pmHqrM2muBpp5 tmw9Q36dApXKIztpBQDqk7yCEpY7rELtIjvaOjta3OOLFbBnTsdGdkp+EWEn6m1g NJ6Cnw96KMHnivbwLpVzbeRQni9E+oJIhpv4p/wy5gSTqMCdJIBsfK3/uv3rszLZ O70F6+ZL9a2wUc4SnSUESpEuFmwoZHWROlZreZlHXQzmuyqpYIJK/JxBMaaz2yC/ 2L2k3kEmgfRwxjh5Jcp5yzKxCJH0ZUYYGoDKoDaIb8iP3SrlTlfp8jBorANIxcY= =dfNx -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "Arm64 fix for -rc7. Without it, our struct page array can overflow the vmemmap region on systems with a large PHYS_OFFSET. Nothing else on the radar at the moment, so hopefully that's it for 4.5 from us. Summary: Ensure struct page array fits within vmemmap area" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: vmemmap: use virtual projection of linear region	2016-03-04 17:43:40 -08:00
Mark Rutland	1cc6ed90dd	arm64: make mrs_s prefixing implicit in read_cpuid Commit `0f54b14e76` ("arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro") changed read_cpuid to require a SYS_ prefix on register names, to allow manual assembly of registers unknown by the toolchain, using tables in sysreg.h. This interacts poorly with commit `42b5573403` ("efi/arm64: Check for h/w support before booting a >4 KB granular kernel"), which is curretly queued via the tip tree, and uses read_cpuid without a SYS_ prefix. Due to this, a build of next-20160304 fails if EFI and 64K pages are selected. To avoid this issue when trees are merged, move the required SYS_ prefixing into read_cpuid, and revert all of the updated callsites to pass plain register names. This effectively reverts the bulk of commit `0f54b14e76`. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-03-04 14:12:46 +00:00
Ard Biesheuvel	6d2aa549de	arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly Commit `8439e62a15` ("arm64: mm: use bit ops rather than arithmetic in pa/va translations") changed the boundary check against PAGE_OFFSET from an arithmetic comparison to a bit test. This means we now silently assume that PAGE_OFFSET is a power of 2 that divides the kernel virtual address space into two equal halves. So make that assumption explicit. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-03-02 09:45:52 +00:00
Ard Biesheuvel	020d044f66	arm64: mm: treat memstart_addr as a signed quantity Commit `c031a4213c` ("arm64: kaslr: randomize the linear region") implements randomization of the linear region, by subtracting a random multiple of PUD_SIZE from memstart_addr. This causes the virtual mapping of system RAM to move upwards in the linear region, and at the same time causes memstart_addr to assume a value which may be negative if the offset of system RAM in the physical space is smaller than its offset relative to PAGE_OFFSET in the virtual space. Since memstart_addr is effectively an offset now, redefine its type as s64 so that expressions involving shifting or division preserve its sign. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-29 18:31:03 +00:00
Ard Biesheuvel	a6e1f7273b	arm64: mm: list kernel sections in order In the boot log, instead of listing .init first, list .text, .rodata, .init and .data in the same order they appear in memory Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-29 17:15:44 +00:00
Daniel Cashman	5ef11c35ce	mm: ASLR: use get_random_long() Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-02-27 10:28:52 -08:00
Kefeng Wang	cc30e6b95c	arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR Use VA_START macro in asm/memory.h instead of private LOWEST_ADDR definition in dump.c. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-26 18:31:41 +00:00
Ard Biesheuvel	dfd55ad85e	arm64: vmemmap: use virtual projection of linear region Commit `dd006da216` ("arm64: mm: increase VA range of identity map") made some changes to the memory mapping code to allow physical memory to reside at an offset that exceeds the size of the virtual mapping. However, since the size of the vmemmap area is proportional to the size of the VA area, but it is populated relative to the physical space, we may end up with the struct page array being mapped outside of the vmemmap region. For instance, on my Seattle A0 box, I can see the following output in the dmesg log. vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000 ( 8 GB maximum) 0xffffffbfc0000000 - 0xffffffbfd0000000 ( 256 MB actual) We can fix this by deciding that the vmemmap region is not a projection of the physical space, but of the virtual space above PAGE_OFFSET, i.e., the linear region. This way, we are guaranteed that the vmemmap region is of sufficient size, and we can even reduce the size by half. Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-02-26 17:59:04 +00:00
Andrew Pinski	104a0c02e8	arm64: Add workaround for Cavium erratum 27456 On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI instructions may cause the icache to become corrupted if it contains data for a non-current ASID. This patch implements the workaround (which invalidates the local icache when switching the mm) by using code patching. Signed-off-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-26 15:14:27 +00:00
Jeremy Linton	2f39b5f91e	arm64: mm: Mark .rodata as RO Currently the .rodata section is actually still executable when DEBUG_RODATA is enabled. This changes that so the .rodata is actually read only, no execute. It also adds the .rodata section to the mem_init banner. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: added vm_struct vmlinux_rodata in map_kernel()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-26 15:08:04 +00:00
Miles Chen	b7dc8d16e7	arm64/mm: remove unnecessary boundary check Remove the unnecessary boundary check since there is a huge gap between user and kernel address that they would never overlap. (arm64 does not have enough levels of page tables to cover 64-bit virtual address) See Documentation/arm64/memory.txt Signed-off-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-26 13:39:53 +00:00
Suzuki K Poulose	28c5dcb22f	arm64: Rename cpuid_feature field extract routines Now that we have a clear understanding of the sign of a feature, rename the routines to reflect the sign, so that it is not misused. The cpuid_feature_extract_field() now accepts a 'sign' parameter. Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-25 10:33:08 +00:00
Suzuki K Poulose	13f417f3b8	arm64: Ensure the secondary CPUs have safe ASIDBits size Adds a hook for checking whether a secondary CPU has the features used already by the kernel during early boot, based on the boot CPU and plugs in the check for ASID size. The ID_AA64MMFR0_EL1:ASIDBits determines the size of the mm context id and is used in the early boot to make decisions. The value is picked up from the Boot CPU and cannot be delayed until other CPUs are up. If a secondary CPU has a smaller size than that of the Boot CPU, things will break horribly and the usual SANITY check is not good enough to prevent the system from crashing. So, crash the system with enough information. Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-25 10:33:06 +00:00
Suzuki K Poulose	038dc9c66a	arm64: Add helper for extracting ASIDBits Add a helper to extract ASIDBits on the current cpu Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-25 10:33:06 +00:00
Ard Biesheuvel	c031a4213c	arm64: kaslr: randomize the linear region When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), and entropy has been provided by the bootloader, randomize the placement of RAM inside the linear region if sufficient space is available. For instance, on a 4KB granule/3 levels kernel, the linear region is 256 GB in size, and we can choose any 1 GB aligned offset that is far enough from the top of the address space to fit the distance between the start of the lowest memblock and the top of the highest memblock. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-24 14:57:27 +00:00
Ard Biesheuvel	f80fb3a3d5	arm64: add support for kernel ASLR This adds support for KASLR is implemented, based on entropy provided by the bootloader in the /chosen/kaslr-seed DT property. Depending on the size of the address space (VA_BITS) and the page size, the entropy in the virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all 4 levels), with the sidenote that displacements that result in the kernel image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB granule kernels, respectively) are not allowed, and will be rounded up to an acceptable value. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is randomized independently from the core kernel. This makes it less likely that the location of core kernel data structures can be determined by an adversary, but causes all function calls from modules into the core kernel to be resolved via entries in the module PLTs. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is randomized by choosing a page aligned 128 MB region inside the interval [_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of entropy (depending on page size), independently of the kernel randomization, but still guarantees that modules are within the range of relative branch and jump instructions (with the caveat that, since the module region is shared with other uses of the vmalloc area, modules may need to be loaded further away if the module region is exhausted) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-24 14:57:27 +00:00
Ard Biesheuvel	6c94f27ac8	arm64: switch to relative exception tables Instead of using absolute addresses for both the exception location and the fixup, use offsets relative to the exception table entry values. Not only does this cut the size of the exception table in half, it is also a prerequisite for KASLR, since absolute exception table entries are subject to dynamic relocation, which is incompatible with the sorting of the exception table that occurs at build time. This patch also introduces the _ASM_EXTABLE preprocessor macro (which exists on x86 as well) and its _asm_extable assembly counterpart, as shorthands to emit exception table entries. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-24 14:57:26 +00:00
Catalin Marinas	70c8abc287	arm64: User die() instead of panic() in do_page_fault() The former gives better error reporting on unhandled permission faults (introduced by the UAO patches). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-19 14:28:58 +00:00
EunTaik Lee	52d7523d84	arm64: mm: allow the kernel to handle alignment faults on user accesses Although we don't expect to take alignment faults on access to normal memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into system calls, leading to things like get_user accessing device memory. Rather than OOPS the kernel, allow any exception fixups to run and return something like -EFAULT back to userspace. This makes the behaviour more consistent with userspace, even though applications with access to device mappings can easily cause other issues if they try hard enough. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com> [will: dropped __kprobes annotation and rewrote commit mesage] Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-02-19 12:20:37 +00:00
Ard Biesheuvel	a7f8de168a	arm64: allow kernel Image to be loaded anywhere in physical memory This relaxes the kernel Image placement requirements, so that it may be placed at any 2 MB aligned offset in physical memory. This is accomplished by ignoring PHYS_OFFSET when installing memblocks, and accounting for the apparent virtual offset of the kernel Image. As a result, virtual address references below PAGE_OFFSET are correctly mapped onto physical references into the kernel Image regardless of where it sits in memory. Special care needs to be taken for dealing with memory limits passed via mem=, since the generic implementation clips memory top down, which may clip the kernel image itself if it is loaded high up in memory. To deal with this case, we simply add back the memory covering the kernel image, which may result in more memory to be retained than was passed as a mem= parameter. Since mem= should not be considered a production feature, a panic notifier handler is installed that dumps the memory limit at panic time if one was set. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 18:16:53 +00:00
Ard Biesheuvel	a89dea5853	arm64: defer __va translation of initrd_start and initrd_end Before deferring the assignment of memstart_addr in a subsequent patch, to the moment where all memory has been discovered and possibly clipped based on the size of the linear region and the presence of a mem= command line parameter, we need to ensure that memstart_addr is not used to perform __va translations before it is assigned. One such use is in the generic early DT discovery of the initrd location, which is recorded as a virtual address in the globals initrd_start and initrd_end. So wire up the generic support to declare the initrd addresses, and implement it without __va() translations, and perform the translation after memstart_addr has been assigned. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 18:16:49 +00:00
Ard Biesheuvel	f9040773b7	arm64: move kernel image to base of vmalloc area This moves the module area to right before the vmalloc area, and moves the kernel image to the base of the vmalloc area. This is an intermediate step towards implementing KASLR, which allows the kernel image to be located anywhere in the vmalloc area. Since other subsystems such as hibernate may still need to refer to the kernel text or data segments via their linears addresses, both are mapped in the linear region as well. The linear alias of the text region is mapped read-only/non-executable to prevent inadvertent modification or execution. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 18:16:44 +00:00
Ard Biesheuvel	157962f5a8	arm64: decouple early fixmap init from linear mapping Since the early fixmap page tables are populated using pages that are part of the static footprint of the kernel, they are covered by the initial kernel mapping, and we can refer to them without using __va/__pa translations, which are tied to the linear mapping. Since the fixmap page tables are disjoint from the kernel mapping up to the top level pgd entry, we can refer to bm_pte[] directly, and there is no need to walk the page tables and perform __pa()/__va() translations at each step. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 18:16:34 +00:00
Ard Biesheuvel	324420bf91	arm64: add support for ioremap() block mappings This wires up the existing generic huge-vmap feature, which allows ioremap() to use PMD or PUD sized block mappings. It also adds support to the unmap path for dealing with block mappings, which will allow us to unmap the __init region using unmap_kernel_range() in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 18:16:16 +00:00
Catalin Marinas	e950631e84	arm64: Remove the get_thread_info() function This function was introduced by previous commits implementing UAO. However, it can be replaced with task_thread_info() in uao_thread_switch() or get_fs() in do_page_fault() (the latter being called only on the current context, so no need for using the saved pt_regs). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 17:27:05 +00:00
James Morse	7054419600	arm64: kernel: Don't toggle PAN on systems with UAO If a CPU supports both Privileged Access Never (PAN) and User Access Override (UAO), we don't need to disable/re-enable PAN round all copy_to_user() like calls. UAO alternatives cause these calls to use the 'unprivileged' load/store instructions, which are overridden to be the privileged kind when fs==KERNEL_DS. This patch changes the copy_to_user() calls to have their PAN toggling depend on a new composite 'feature' ARM64_ALT_PAN_NOT_UAO. If both features are detected, PAN will be enabled, but the copy_to_user() alternatives will not be applied. This means PAN will be enabled all the time for these functions. If only PAN is detected, the toggling will be enabled as normal. This will save the time taken to disable/re-enable PAN, and allow us to catch copy_to_user() accesses that occur with fs==KERNEL_DS. Futex and swp-emulation code continue to hang their PAN toggling code on ARM64_HAS_PAN. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 17:27:05 +00:00
James Morse	57f4959bad	arm64: kernel: Add support for User Access Override 'User Access Override' is a new ARMv8.2 feature which allows the unprivileged load and store instructions to be overridden to behave in the normal way. This patch converts {get,put}_user() and friends to use ldtr/sttr instructions - so that they can only access EL0 memory, then enables UAO when fs==KERNEL_DS so that these functions can access kernel memory. This allows user space's read/write permissions to be checked against the page tables, instead of testing addr<USER_DS, then using the kernel's read/write permissions. Signed-off-by: James Morse <james.morse@arm.com> [catalin.marinas@arm.com: move uao_thread_switch() above dsb()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 17:27:04 +00:00
James Morse	0f54b14e76	arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro Older assemblers may not have support for newer feature registers. To get round this, sysreg.h provides a 'mrs_s' macro that takes a register encoding and generates the raw instruction. Change read_cpuid() to use mrs_s in all cases so that new registers don't have to be a special case. Including sysreg.h means we need to move the include and definition of read_cpuid() after the #ifndef __ASSEMBLY__ to avoid syntax errors in vmlinux.lds. Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-18 11:59:54 +00:00
Marek Szyprowski	722ec35f7f	arm64: dma-mapping: fix handling of devices registered before arch_initcall This patch ensures that devices, which got registered before arch_initcall will be handled correctly by IOMMU-based DMA-mapping code. Cc: <stable@vger.kernel.org> Fixes: `13b8629f65` ("arm64: Add IOMMU dma_ops") Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-02-17 11:48:01 +00:00
Laura Abbott	d7e9d59494	arm64: ptdump: Indicate whether memory should be faulting With CONFIG_DEBUG_PAGEALLOC, pages do not have the valid bit set when free in the buddy allocator. Add an indiciation to the page table dumping code that the valid bit is not set, 'F' for fault, to make this easier to understand. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:40:44 +00:00
Laura Abbott	83863f25e4	arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap pages for debugging purposes. This requires memory be mapped with PAGE_SIZE mappings since breaking down larger mappings at runtime will lead to TLB conflicts. Check if debug_pagealloc is enabled at runtime and if so, map everyting with PAGE_SIZE pages. Implement the functions to actually map/unmap the pages at runtime. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@fedoraproject.org> [catalin.marinas@arm.com: static annotation block_mappings_allowed() and #ifdef] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:40:30 +00:00
Laura Abbott	132233a759	arm64: Drop alloc function from create_mapping create_mapping is only used in fixmap_remap_fdt. All the create_mapping calls need to happen on existing translation table pages without additional allocations. Rather than have an alloc function be called and fail, just set it to NULL and catch its use. Also change the name to create_mapping_noalloc to better capture what exactly is going on. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:12:34 +00:00
Mark Rutland	068a17a580	arm64: mm: create new fine-grained mappings at boot At boot we may change the granularity of the tables mapping the kernel (by splitting or making sections). This may happen when we create the linear mapping (in __map_memblock), or at any point we try to apply fine-grained permissions to the kernel (e.g. fixup_executable, mark_rodata_ro, fixup_init). Changing the active page tables in this manner may result in multiple entries for the same address being allocated into TLBs, risking problems such as TLB conflict aborts or issues derived from the amalgamation of TLB entries. Generally, a break-before-make (BBM) approach is necessary to avoid conflicts, but we cannot do this for the kernel tables as it risks unmapping text or data being used to do so. Instead, we can create a new set of tables from scratch in the safety of the existing mappings, and subsequently migrate over to these using the new cpu_replace_ttbr1 helper, which avoids the two sets of tables being active simultaneously. To avoid issues when we later modify permissions of the page tables (e.g. in fixup_init), we must create the page tables at a granularity such that later modification does not result in splitting of tables. This patch applies this strategy, creating a new set of fine-grained page tables from scratch, and safely migrating to them. The existing fixmap and kasan shadow page tables are reused in the new fine-grained tables. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:46 +00:00
Mark Rutland	11509a306b	arm64: mm: allow passing a pgdir to alloc_init_* To allow us to initialise pgdirs which are fixmapped, allow explicitly passing a pgdir rather than an mm. A new __create_pgd_mapping function is added for this, with existing __create_mapping callers migrated to this. The mm argument was previously only used at the top level. Now that it is redundant at all levels, it is removed. To indicate its new found similarity to alloc_init_{pud,pmd,pte}, __create_mapping is renamed to init_pgd. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:46 +00:00
Mark Rutland	cdef5f6e9e	arm64: mm: allocate pagetables anywhere Now that create_mapping uses fixmap slots to modify pte, pmd, and pud entries, we can access page tables anywhere in physical memory, regardless of the extent of the linear mapping. Given that, we no longer need to limit memblock allocations during page table creation, and can leave the limit as its default MEMBLOCK_ALLOC_ANYWHERE. We never add memory which will fall outside of the linear map range given phys_offset and MAX_MEMBLOCK_ADDR are configured appropriately, so any tables we create will fall in the linear map of the final tables. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:46 +00:00
Mark Rutland	f471044545	arm64: mm: use fixmap when creating page tables As a preparatory step to allow us to allocate early page tables from unmapped memory using memblock_alloc, modify the __create_mapping callees to map and unmap the tables they modify using fixmap entries. All but the top-level pgd initialisation is performed via the fixmap. Subsequent patches will inject the pgd physical address, and migrate to using the FIX_PGD slot. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:46 +00:00
Mark Rutland	316b39db06	arm64: mm: avoid redundant __pa(__va(x)) When we "upgrade" to a section mapping, we free any table we made redundant by giving it back to memblock. To get the PA, we acquire the physical address and convert this to a VA, then subsequently convert this back to a PA. This works currently, but will not work if the tables are not accessed via linear map VAs (e.g. is we use fixmap slots). This patch uses {pmd,pud}_page_paddr to acquire the PA. This avoids the __pa(__va()) round trip, saving some work and avoiding reliance on the linear mapping. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:46 +00:00
Mark Rutland	c1a88e9124	arm64: kasan: avoid TLB conflicts The page table modification performed during the KASAN init risks the allocation of conflicting TLB entries, as it swaps a set of valid global entries for another without suitable TLB maintenance. The presence of conflicting TLB entries can result in the delivery of synchronous TLB conflict aborts, or may result in the use of erroneous data being returned in response to a TLB lookup. This can affect explicit data accesses from software as well as translations performed asynchronously (e.g. as part of page table walks or speculative I-cache fetches), and can therefore result in a wide variety of problems. To avoid this, use cpu_replace_ttbr1 to swap the page tables. This ensures that when the new tables are installed there are no stale entries from the old tables which may conflict. As all updates are made to the tables while they are not active, the updates themselves are safe. At the same time, add the missing barrier to ensure that the tmp_pg_dir entries updated via memcpy are visible to the page table walkers at the point the tmp_pg_dir is installed. All other page table updates made as part of KASAN initialisation have the requisite barriers due to the use of the standard page table accessors. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:45 +00:00
Mark Rutland	50e1881ddd	arm64: mm: add code to safely replace TTBR1_EL1 If page tables are modified without suitable TLB maintenance, the ARM architecture permits multiple TLB entries to be allocated for the same VA. When this occurs, it is permitted that TLB conflict aborts are raised in response to synchronous data/instruction accesses, and/or and amalgamation of the TLB entries may be used as a result of a TLB lookup. The presence of conflicting TLB entries may result in a variety of behaviours detrimental to the system (e.g. erroneous physical addresses may be used by I-cache fetches and/or page table walks). Some of these cases may result in unexpected changes of hardware state, and/or result in the (asynchronous) delivery of SError. To avoid these issues, we must avoid situations where conflicting entries may be allocated into TLBs. For user and module mappings we can follow a strict break-before-make approach, but this cannot work for modifications to the swapper page tables that cover the kernel text and data. Instead, this patch adds code which is intended to be executed from the idmap, which can safely unmap the swapper page tables as it only requires the idmap to be active. This enables us to uninstall the active TTBR1_EL1 entry, invalidate TLBs, then install a new TTBR1_EL1 entry without potentially unmapping code or data required for the sequence. This avoids the risk of conflict, but requires that updates are staged in a copy of the swapper page tables prior to being installed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:45 +00:00
Mark Rutland	86ccce896c	arm64: unmap idmap earlier During boot we leave the idmap in place until paging_init, as we previously had to wait for the zero page to become allocated and accessible. Now that we have a statically-allocated zero page, we can uninstall the idmap much earlier in the boot process, making it far easier to spot accidental use of physical addresses. This also brings the cold boot path in line with the secondary boot path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:45 +00:00
Mark Rutland	9e8e865bbe	arm64: unify idmap removal We currently open-code the removal of the idmap and restoration of the current task's MMU state in a few places. Before introducing yet more copies of this sequence, unify these to call a new helper, cpu_uninstall_idmap. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:44 +00:00
Mark Rutland	5227cfa71f	arm64: mm: place empty_zero_page in bss Currently the zero page is set up in paging_init, and thus we cannot use the zero page earlier. We use the zero page as a reserved TTBR value from which no TLB entries may be allocated (e.g. when uninstalling the idmap). To enable such usage earlier (as may be required for invasive changes to the kernel page tables), and to minimise the time that the idmap is active, we need to be able to use the zero page before paging_init. This patch follows the example set by x86, by allocating the zero page at compile time, in .bss. This means that the zero page itself is available immediately upon entry to start_kernel (as we zero .bss before this), and also means that the zero page takes up no space in the raw Image binary. The associated struct page is allocated in bootmem_init, and remains unavailable until this time. Outside of arch code, the only users of empty_zero_page assume that the empty_zero_page symbol refers to the zeroed memory itself, and that ZERO_PAGE(x) must be used to acquire the associated struct page, following the example of x86. This patch also brings arm64 inline with these assumptions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:44 +00:00
Mark Rutland	21ab99c289	arm64: mm: specialise pagetable allocators We pass a size parameter to early_alloc and late_alloc, but these are only ever used to allocate single pages. In late_alloc we always allocate a single page. Both allocators provide us with zeroed pages (such that all entries are invalid), but we have no barriers between allocating a page and adding that page to existing (live) tables. A concurrent page table walk may see stale data, leading to a number of issues. This patch specialises the two allocators for page tables. The size parameter is removed and the necessary dsb(ishst) is folded into each. To make it clear that the functions are intended for use for page table allocation, they are renamed to {early,late}_pgtable_alloc, with the related function pointed renamed to pgtable_alloc. As the dsb(ishst) is now in the allocator, the existing barrier for the zero page is redundant and thus is removed. The previously missing include of barrier.h is added. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2016-02-16 15:10:44 +00:00
Ard Biesheuvel	95f5c80050	arm64: allow vmalloc regions to be set with set_memory_* The range of set_memory_* is currently restricted to the module address range because of difficulties in breaking down larger block sizes. vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the function ranges and add a comment explaining why the range is restricted the way it is. Suggested-by: Laura Abbott <labbott@fedoraproject.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-02-02 15:42:15 +00:00
Mika Penttilä	57adec866c	arm64: mm: avoid calling apply_to_page_range on empty range Calling apply_to_page_range with an empty range results in a BUG_ON from the core code. This can be triggered by trying to load the st_drv module with CONFIG_DEBUG_SET_MODULE_RONX enabled: kernel BUG at mm/memory.c:1874! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 1764 Comm: insmod Not tainted 4.5.0-rc1+ #2 Hardware name: ARM Juno development board (r0) (DT) task: ffffffc9763b8000 ti: ffffffc975af8000 task.ti: ffffffc975af8000 PC is at apply_to_page_range+0x2cc/0x2d0 LR is at change_memory_common+0x80/0x108 This patch fixes the issue by making change_memory_common (called by the set_memory_* functions) a NOP when numpages == 0, therefore avoiding the erroneous call to apply_to_page_range and bringing us into line with x86 and s390. Cc: <stable@vger.kernel.org> Reviewed-by: Laura Abbott <labbott@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Mika Penttilä <mika.penttila@nextfour.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-01-26 15:56:44 +00:00
Masanari Iida	b3122023df	arm64: Fix an enum typo in mm/dump.c This patch fixes a typo in mm/dump.c: "MODUELS_END_NR" should be "MODULES_END_NR". Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-01-25 11:53:03 +00:00
Lorenzo Pieralisi	f436b2ac90	arm64: kernel: fix architected PMU registers unconditional access The Performance Monitors extension is an optional feature of the AArch64 architecture, therefore, in order to access Performance Monitors registers safely, the kernel should detect the architected PMU unit presence through the ID_AA64DFR0_EL1 register PMUVer field before accessing them. This patch implements a guard by reading the ID_AA64DFR0_EL1 register PMUVer field to detect the architected PMU presence and prevent accessing PMU system registers if the Performance Monitors extension is not implemented in the core. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Fixes: `60792ad349` ("arm64: kernel: enforce pmuserenr_el0 initialization and restore") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-01-25 11:09:06 +00:00
Ard Biesheuvel	7b1af97957	arm64: kasan: ensure that the KASAN zero page is mapped read-only When switching from the early KASAN shadow region, which maps the entire shadow space read-write, to the permanent KASAN shadow region, which uses a zero page to shadow regions that are not subject to instrumentation, the lowest level table kasan_zero_pte[] may be reused unmodified, which means that the mappings of the zero page that it contains will still be read-write. So update it explicitly to map the zero page read only when we activate the permanent mapping. Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-01-25 11:09:05 +00:00
Kirill A. Shutemov	b7ed934a7c	arm64, thp: remove infrastructure for handling splitting PMDs With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. pmdp_splitting_flush() is not needed too: on splitting PMD we will do pmdp_clear_flush() + set_pte_at(). pmdp_clear_flush() will do IPI as needed for fast_gup. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Daniel Cashman	8f0d3aa9de	arm64: mm: support ARCH_MMAP_RND_BITS arm64: arch_mmap_rnd() uses STACK_RND_MASK to generate the random offset for the mmap base address. This value represents a compromise between increased ASLR effectiveness and avoiding address-space fragmentation. Replace it with a Kconfig option, which is sensibly bounded, so that platform developers may choose where to place this compromise. Keep default values as new minimums. Signed-off-by: Daniel Cashman <dcashman@google.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Don Zickus <dzickus@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Rientjes <rientjes@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Nick Kralevich <nnk@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Borislav Petkov <bp@suse.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-14 16:00:49 -08:00
Linus Torvalds	541d284be0	arm[64] perf updates for 4.5: - Support for the CPU PMU in Cortex-A72 - Add sysfs entries to describe the architected events and their mappings for PMUv{1-3} -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJWj+uEAAoJELescNyEwWM0PzgIALXISGukbDOLBXFYRc+6g3BT zb9W2rFtN0j7+WmspGbdocDqnS1gPrqXftAHyk2XPRmfh5rr9aP5qWefJ9fDptTB GCTpW4iG5chHi+er13ovz20Cphz55k3VRA4suBlHHyNLjAwLvnpW28SSAssPJDbB 8UHOqHhNRmnI3D4amJhEfldvk+0h54I5W6odXthxOQZREwA87jQlbRr3PFlBUbIX NN+X6/j1N5Jja6DtaCzfDpybeLR3XQM+Fj+xokyUw5duwfrXgwoMO6N8lDTH3zwe MoWViwCVBMPA0RzJdAD1sbpdIR/e6xT3/VHfkRyR/zS9UalSTv+VAlAanGb6KzY= =1wJ0 -----END PGP SIGNATURE----- Merge tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm[64] perf updates from Will Deacon: "In the past, I have funnelled perf updates through the respective architecture trees, but now that the arm/arm64 perf driver has been largely consolidated under drivers/perf/, it makes more sense to send a separate pull, particularly as I'm listed as maintainer for all the files involved. I offered the branch to arm-soc, but Arnd suggested that I just send it to you directly. So, here is the arm/arm64 perf queue for 4.5. The main features are described below, but the most useful change is from Drew, which advertises our architected event mapping in sysfs so that the perf tool is a lot more user friendly and no longer requires the use of magic hex constants for profiling common events. - Support for the CPU PMU in Cortex-A72 - Add sysfs entries to describe the architected events and their mappings for PMUv{1-3}" * tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: perf: add support for Cortex-A72 arm64: perf: add format entry to describe event -> config mapping ARM: perf: add format entry to describe event -> config mapping arm64: kernel: enforce pmuserenr_el0 initialization and restore arm64: perf: Correct Cortex-A53/A57 compatible values arm64: perf: Add event descriptions arm64: perf: Convert event enums to #defines arm: perf: Add event descriptions arm: perf: Convert event enums to #defines drivers/perf: kill armpmu_register	2016-01-12 12:29:25 -08:00
Will Deacon	39b5be9b42	arm64: mm: move pgd_cache initialisation to pgtable_cache_init Initialising the suppport for EFI runtime services requires us to allocate a pgd off the back of an early_initcall. On systems where the PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the pgd_cache isn't initialised at this stage, and we panic with a NULL dereference during boot: Unable to handle kernel NULL pointer dereference at virtual address 00000000 __create_mapping.isra.5+0x84/0x350 create_pgd_mapping+0x20/0x28 efi_create_mapping+0x5c/0x6c arm_enable_runtime_services+0x154/0x1e4 do_one_initcall+0x8c/0x190 kernel_init_freeable+0x84/0x1ec kernel_init+0x10/0xe0 ret_from_fork+0x10/0x50 This patch fixes the problem by initialising the pgd_cache earlier, in the pgtable_cache_init callback, which sounds suspiciously like what it was intended for. Reported-by: Dennis Chen <dennis.chen@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-01-05 15:43:10 +00:00
David Woods	66b3923a1a	arm64: hugetlb: add support for PTE contiguous bit The arm64 MMU supports a Contiguous bit which is a hint that the TTE is one of a set of contiguous entries which can be cached in a single TLB entry. Supporting this bit adds new intermediate huge page sizes. The set of huge page sizes available depends on the base page size. Without using contiguous pages the huge page sizes are as follows. 4KB: 2MB 1GB 64KB: 512MB With a 4KB granule, the contiguous bit groups together sets of 16 pages and with a 64KB granule it groups sets of 32 pages. This enables two new huge page sizes in each case, so that the full set of available sizes is as follows. 4KB: 64KB 2MB 32MB 1GB 64KB: 2MB 512MB 16GB If a 16KB granule is used then the contiguous bit groups 128 pages at the PTE level and 32 pages at the PMD level. If the base page size is set to 64KB then 2MB pages are enabled by default. It is possible in the future to make 2MB the default huge page size for both 4KB and 64KB granules. Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Reviewed-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: David Woods <dwoods@ezchip.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-21 17:26:00 +00:00
Lorenzo Pieralisi	60792ad349	arm64: kernel: enforce pmuserenr_el0 initialization and restore The pmuserenr_el0 register value is architecturally UNKNOWN on reset. Current kernel code resets that register value iff the core pmu device is correctly probed in the kernel. On platforms with missing DT pmu nodes (or disabled perf events in the kernel), the pmu is not probed, therefore the pmuserenr_el0 register is not reset in the kernel, which means that its value retains the reset value that is architecturally UNKNOWN (system may run with eg pmuserenr_el0 == 0x1, which means that PMU counters access is available at EL0, which must be disallowed). This patch adds code that resets pmuserenr_el0 on cold boot and restores it on core resume from shutdown, so that the pmuserenr_el0 setup is always enforced in the kernel. Cc: <stable@vger.kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-21 14:43:04 +00:00
Ashok Kumar	0a28714c53	arm64: Use PoU cache instr for I/D coherency In systems with three levels of cache(PoU at L1 and PoC at L3), PoC cache flush instructions flushes L2 and L3 caches which could affect performance. For cache flushes for I and D coherency, PoU should suffice. So changing all I and D coherency related cache flushes to PoU. Introduced a new __clean_dcache_area_pou API for dcache flush till PoU and provided a common macro for __flush_dcache_area and __clean_dcache_area_pou. Also, now in __sync_icache_dcache, icache invalidation for non-aliasing VIPT icache is done only for that particular page instead of the earlier __flush_icache_all. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ashok Kumar <ashoks@broadcom.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-17 11:07:13 +00:00
Ashok Kumar	e6b1185f77	arm64: Defer dcache flush in __cpu_copy_user_page Defer dcache flushing to __sync_icache_dcache by calling flush_dcache_page which clears PG_dcache_clean flag. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ashok Kumar <ashoks@broadcom.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-17 11:07:13 +00:00
Will Deacon	129b985cc3	Merge branch 'aarch64/efi' into aarch64/for-next/core Merge in EFI memblock changes from Ard, which form the preparatory work for UEFI support on 32-bit ARM.	2015-12-15 10:59:03 +00:00
Will Deacon	32d6397805	arm64: mm: ensure that the zero page is visible to the page table walker In paging_init, we allocate the zero page, memset it to zero and then point TTBR0 to it in order to avoid speculative fetches through the identity mapping. In order to guarantee that the freshly zeroed page is indeed visible to the page table walker, we need to execute a dsb instruction prior to writing the TTBR. Cc: <stable@vger.kernel.org> # v3.14+, for older kernels need to drop the 'ishst' Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-11 17:33:22 +00:00
Mark Rutland	f00083cae3	arm64: mm: place __cpu_setup in .text We drop __cpu_setup in .text.init, which ends up being part of .text. The .text.init section was a legacy section name which has been unused elsewhere for a long time. The ".text.init" name is misleading if read as a synonym for ".init.text". Any CPU may execute __cpu_setup before turning the MMU on, so it should simply live in .text. Remove the pointless section assignment. This will leave __cpu_setup in the .text section. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-11 17:33:21 +00:00
Mark Rutland	9aa4ec1571	arm64: mm: fold alternatives into .init Currently we treat the alternatives separately from other data that's only used during initialisation, using separate .altinstructions and .altinstr_replacement linker sections. These are freed for general allocation separately from .init. This is problematic as: We do not remove execute permissions, as we do for .init, leaving the memory executable. * We pad between them, making the kernel Image bianry up to PAGE_SIZE bytes larger than necessary. This patch moves the two sections into the contiguous region used for .init*. This saves some memory, ensures that we remove execute permissions, and allows us to remove some code made redundant by this reorganisation. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-10 17:36:08 +00:00
Mark Rutland	e2c30ee320	arm64: mm: remove pointless PAGE_MASKing As pgd_offset{,_k} shift the input address by PGDIR_SHIFT, the sub-page bits will always be shifted out. There is no need to apply PAGE_MASK before this. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-10 17:36:08 +00:00
Ard Biesheuvel	68709f4538	arm64: only consider memblocks with NOMAP cleared for linear mapping Take the new memblock attribute MEMBLOCK_NOMAP into account when deciding whether a certain region is or should be covered by the kernel direct mapping. Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-09 16:57:23 +00:00
Jisheng Zhang	a7c61a3452	arm64: add __init/__initdata section marker to some functions/variables These functions/variables are not needed after booting, so mark them as __init or __initdata. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-02 12:17:11 +00:00
Mark Rutland	9c4e08a302	arm64: mm: allow sections for unaligned bases Callees of __create_mapping may decide to create section mappings if sufficient low bits of the physical and virtual addresses they were passed are zero. While __create_mapping rounds the virtual base address down, it does not similarly round the physical base address down, and hence non-zero bits in the physical address can prevent use of a section mapping, even where a whole next-level table would be used instead. Round down the physical base address in __create_mapping to enable all callees to always create section mappings when such a mapping is possible. Cc: Laura Abbott <labbott@fedoraproject.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-01 09:47:03 +00:00
Mark Rutland	cc5d2b3b95	arm64: mm: detect bad __create_mapping uses If a caller of __create_mapping provides a PA and VA which have different sub-page offsets, it is not clear which offset they expect to apply to the mapping, and is indicative of a bad caller. In some cases, the region we wish to map may validly have a sub-page offset in the physical and virtual addresses. For example, EFI runtime regions have 4K granularity, yet may be mapped by a 64K page kernel. So long as the physical and virtual offsets are the same, the region will be mapped at the expected VAs. Disallow calls with differing sub-page offsets, and WARN when they are encountered, so that we can detect and fix such cases. Cc: Laura Abbott <labbott@fedoraproject.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-12-01 09:47:03 +00:00
Catalin Marinas	667c27597c	Revert "arm64: Mark kernel page ranges contiguous" This reverts commit `348a65cdcb`. Incorrect page table manipulation that does not respect the ARM ARM recommended break-before-make sequence may lead to TLB conflicts. The contiguous PTE patch makes the system even more susceptible to such errors by changing the mapping from a single page to a contiguous range of pages. An additional TLB invalidation would reduce the risk window, however, the correct fix is to switch to a temporary swapper_pg_dir. Once the correct workaround is done, the reverted commit will be re-applied. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Jeremy Linton <jeremy.linton@arm.com>	2015-11-26 15:42:41 +00:00
Will Deacon	0ebea80880	arm64: mm: keep reserved ASIDs in sync with mm after multiple rollovers Under some unusual context-switching patterns, it is possible to end up with multiple threads from the same mm running concurrently with different ASIDs: 1. CPU x schedules task t with mm p containing ASID a and generation g This task doesn't block and the CPU doesn't context switch. So: * per_cpu(active_asid, x) = {g,a} * p->context.id = {g,a} 2. Some other CPU generates an ASID rollover. The global generation is now (g + 1). CPU x is still running t, with no context switch and so per_cpu(reserved_asid, x) = {g,a} 3. CPU y schedules task t', which shares mm p with t. The generation mismatches, so we take the slowpath and hit the reserved ASID from CPU x. p is then updated so that p->context.id = {g + 1,a} 4. CPU y schedules some other task u, which has an mm != p. 5. Some other CPU generates another CPU rollover. The global generation is now (g + 2). CPU x is still running t, with no context switch and so per_cpu(reserved_asid, x) = {g,a}. 6. CPU y once again schedules task t', but now fails to hit the reserved ASID from CPU x because of the generation mismatch. This results in a new ASID being allocated, despite the fact that t is still running on CPU x with the same mm. Consequently, TLBIs (e.g. as a result of CoW) will not be synchronised between the two threads. This patch fixes the problem by updating all of the matching reserved ASIDs when we hit on the slowpath (i.e. in step 3 above). This keeps the reserved ASIDs in-sync with the mm and avoids the problem. Reported-by: Tony Thompson <anthony.thompson@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-26 15:27:10 +00:00
Mark Rutland	c03784ee8a	arm64: mm: fix fault_info table xFSC decoding We are missing descriptions for some valid xFSC values in the fault info table (e.g. "TLB conflict abort"), and have erroneous descriptions for reserved values (e.g. "asynchronous external abort", "debug event"). This patch adds the missing xFSC values, and removes erroneous decoding of values reserved by the architecture, as described in ARM DDI 0487A.h. At the same time, fixed the unbalanced brackets for the synchronous parity error strings in the table. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-25 15:49:16 +00:00
Suzuki K. Poulose	7142392dca	arm64: early_alloc: Fix check for allocation failure In early_alloc we check if the memblock_alloc failed by checking the virtual address of the result, which will never fail. This patch fixes it to check the actual result for failure. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-25 12:14:25 +00:00
Laura Abbott	0b2aa5b80b	arm64: Fix R/O permissions in mark_rodata_ro The permissions in mark_rodata_ro trigger a build error with STRICT_MM_TYPECHECKS. Fix this by introducing PAGE_KERNEL_ROX for the same reasons as PAGE_KERNEL_RO. From Ard: "PAGE_KERNEL_EXEC has PTE_WRITE set as well, making the range writeable under the ARMv8.1 DBM feature, that manages the dirty bit in hardware (writing to a page with the PTE_RDONLY and PTE_WRITE bits both set will clear the PTE_RDONLY bit in that case)" Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-18 12:11:36 +00:00
Arnd Bergmann	1dccb598df	arm64: simplify dma_get_ops Including linux/acpi.h from asm/dma-mapping.h causes tons of compile-time warnings, e.g. drivers/isdn/mISDN/dsp_ecdis.h:43:0: warning: "FALSE" redefined drivers/isdn/mISDN/dsp_ecdis.h:44:0: warning: "TRUE" redefined drivers/net/fddi/skfp/h/targetos.h:62:0: warning: "TRUE" redefined drivers/net/fddi/skfp/h/targetos.h:63:0: warning: "FALSE" redefined However, it looks like the dependency should not even there as I do not see why __generic_dma_ops() cares about whether we have an ACPI based system or not. The current behavior is to fall back to the global dma_ops when a device has not set its own dma_ops, but only for DT based systems. This seems dangerous, as a random device might have different requirements regarding IOMMU or coherency, so we should really never have that fallback and just forbid DMA when we have not initialized DMA for a device. This removes the global dma_ops variable and the special-casing for ACPI, and just returns the dma ops that got set for the device, or the dummy_dma_ops if none were present. The original code has apparently been copied from arm32 where we rely on it for ISA devices things like the floppy controller, but we should have no such devices on ARM64. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [catalin.marinas@arm.com: removed acpi_disabled check in arch_setup_dma_ops()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-17 12:05:18 +00:00
Ard Biesheuvel	4fee9f364b	arm64: mm: use correct mapping granularity under DEBUG_RODATA When booting a 64k pages kernel that is built with CONFIG_DEBUG_RODATA and resides at an offset that is not a multiple of 512 MB, the rounding that occurs in __map_memblock() and fixup_executable() results in incorrect regions being mapped. The following snippet from /sys/kernel/debug/kernel_page_tables shows how, when the kernel is loaded 2 MB above the base of DRAM at 0x40000000, the first 2 MB of memory (which may be inaccessible from non-secure EL1 or just reserved by the firmware) is inadvertently mapped into the end of the module region. ---[ Modules start ]--- 0xfffffdffffe00000-0xfffffe0000000000 2M RW NX ... UXN MEM/NORMAL ---[ Modules end ]--- ---[ Kernel Mapping ]--- 0xfffffe0000000000-0xfffffe0000090000 576K RW NX ... UXN MEM/NORMAL 0xfffffe0000090000-0xfffffe0000200000 1472K ro x ... UXN MEM/NORMAL 0xfffffe0000200000-0xfffffe0000800000 6M ro x ... UXN MEM/NORMAL 0xfffffe0000800000-0xfffffe0000810000 64K ro x ... UXN MEM/NORMAL 0xfffffe0000810000-0xfffffe0000a00000 1984K RW NX ... UXN MEM/NORMAL 0xfffffe0000a00000-0xfffffe00ffe00000 4084M RW NX ... UXN MEM/NORMAL The same issue is likely to occur on 16k pages kernels whose load address is not a multiple of 32 MB (i.e., SECTION_SIZE). So round to SWAPPER_BLOCK_SIZE instead of SECTION_SIZE. Fixes: `da141706ae` ("arm64: add better page protections to arm64") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Laura Abbott <labbott@redhat.com> Cc: <stable@vger.kernel.org> # 4.0+ Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-17 12:05:18 +00:00
Robin Murphy	bd1c6ff74c	arm64/dma-mapping: Fix sizes in __iommu_{alloc,free}_attrs The iommu-dma layer does its own size-alignment for coherent DMA allocations based on IOMMU page sizes, but we still need to consider CPU page sizes for the cases where a non-cacheable CPU mapping is created. Whilst everything on the alloc/map path seems to implicitly align things enough to make it work, some functions used by the corresponding unmap/free path do not, which leads to problems freeing odd-sized allocations. Either way it's something we really should be handling explicitly, so do that to make both paths suitably robust. Reported-by: Yong Wu <yong.wu@mediatek.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-16 10:05:35 +00:00
Linus Torvalds	a18e2fa5e6	arm64 fixes and clean-ups: - __cmpxchg_double() return type fix to avoid truncation of a long to int and subsequent logical "not" in cmpxchg_double() misinterpreting the operation success/failure - BPF fixes for mod and div by zero - Fix compilation with STRICT_MM_TYPECHECKS enabled - VDSO build fix without libgcov - Some static and __maybe_unused annotations - Kconfig clean-up (FRAME_POINTER) - defconfig update for CRYPTO_CRC32_ARM64 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWRNNbAAoJEGvWsS0AyF7xp+wQAIc0A+uSReEJ0Be3kSWZIy0O 9wGCtfp2e3X78ibgVoP/+KvA1JUrMJNwNH54CgGgG6H4rwjRthCvIV/HbKfYufM8 vfuTL2MV1ywkNO0uTzspsICqgKPcpG27SwAlgOcxNXpO0Kui2OlKSxS4kTA8+6Z5 Lm64qDmFG7Z6wcBHhr8JSngC+xvXOvlcUW8odnjXjyCimwnpCFXXnRWDU3RnXJZa 3Khgp8OiRtnCSLfj7YBQA9wfNNgPgKdJ5wevz2g7hiIbYx0IOHmDpzbb3sUNMMKV XLKeeJgqZL4EXZBCzapHRHCE/q0kiiBhzYSHw6aOBwjD9v683aytT/ax2/AgjzvW nB3ZPdrbRMjcmNRBT2bheoU8diilhtfxSxf+4T+pVUnVMXDNl/xY9hekGA0hFO1z nH5P5vkFKsX3U02Ox/G50Od2rM6p7uGRGFYuomSIoJYBItuxGOAuYWlY2+ujcxY5 YvAQ+3FYCkjLipVutlqLxKoZSY8Ex+0LOjPYYsI/+rsE70IVjGuLj0bTm8B/aTcy dOctNqvOGwo8O5n2jsKM3XkjfUCPRdzu1C7rQz2BqfE9cPAZxg2fQpPv4SGtPuFe lEvokuYRJ3qYnMt5MG/9Mkqmczfbch88A41wgS9/ySQ57eo3wISLkOiKqzKdJjOa 0qldWaEvST2iVUQmiMl7 =ApkD -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes and clean-ups from Catalin Marinas: "Here's a second pull request for this merging window with some fixes/clean-ups: - __cmpxchg_double() return type fix to avoid truncation of a long to int and subsequent logical "not" in cmpxchg_double() misinterpreting the operation success/failure - BPF fixes for mod and div by zero - Fix compilation with STRICT_MM_TYPECHECKS enabled - VDSO build fix without libgcov - Some static and __maybe_unused annotations - Kconfig clean-up (FRAME_POINTER) - defconfig update for CRYPTO_CRC32_ARM64" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: suspend: make hw_breakpoint_restore static arm64: mmu: make split_pud and fixup_executable static arm64: smp: make of_parse_and_init_cpus static arm64: use linux/types.h in kvm.h arm64: build vdso without libgcov arm64: mark cpus_have_hwcap as __maybe_unused arm64: remove redundant FRAME_POINTER kconfig option and force to select it arm64: fix R/O permissions of FDT mapping arm64: fix STRICT_MM_TYPECHECKS issue in PTE_CONT manipulation arm64: bpf: fix mod-by-zero case arm64: bpf: fix div-by-zero case arm64: Enable CRYPTO_CRC32_ARM64 in defconfig arm64: cmpxchg_dbl: fix return value type	2015-11-12 15:33:11 -08:00
Jisheng Zhang	9a17a21334	arm64: mmu: make split_pud and fixup_executable static split_pud and fixup_executable are only called from within mmu.c, so they can be declared static. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-12 15:18:14 +00:00
Ard Biesheuvel	fb226c3d7c	arm64: fix R/O permissions of FDT mapping The mapping permissions of the FDT are set to 'PAGE_KERNEL \| PTE_RDONLY' in an attempt to map the FDT as read-only. However, not only does this break at build time under STRICT_MM_TYPECHECKS (since the two terms are of different types in that case), it also results in both the PTE_WRITE and PTE_RDONLY attributes to be set, which means the region is still writable under ARMv8.1 DBM (and an attempted write will simply clear the PT_RDONLY bit). So instead, define PAGE_KERNEL_RO (which already has an established meaning across architectures) and use that instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-09 14:26:36 +00:00
Ard Biesheuvel	b219545e96	arm64: fix STRICT_MM_TYPECHECKS issue in PTE_CONT manipulation The new page table code that manipulates the PTE_CONT flags does so in a way that is inconsistent with STRICT_MM_TYPECHECKS. Fix it by using the correct combination of __pgprot() and pgprot_val(). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-11-09 14:23:16 +00:00
Andrew Morton	ce5c2d2c25	arm64: fixup for mm renames __GFP_WAIT was renamed for __GFP_RECLAIM and the gfpflags_allow_blocking() helper was added. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-07 16:45:51 -08:00
Mel Gorman	d0164adc89	mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Linus Torvalds	39cf7c3981	IOMMU Updates for Linux v4.4 This time including: * A new IOMMU driver for s390 pci devices * Common dma-ops support based on iommu-api for ARM64. The plan is to use this as a basis for ARM32 and hopefully other architectures as well in the future. * MSI support for ARM-SMMUv3 * Cleanups and dead code removal in the AMD IOMMU driver * Better RMRR handling for the Intel VT-d driver * Various other cleanups and small fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81 qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo 1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji 12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD SecikoTtH1Q4RVtqOcAQ =jLlB -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "This time including: - A new IOMMU driver for s390 pci devices - Common dma-ops support based on iommu-api for ARM64. The plan is to use this as a basis for ARM32 and hopefully other architectures as well in the future. - MSI support for ARM-SMMUv3 - Cleanups and dead code removal in the AMD IOMMU driver - Better RMRR handling for the Intel VT-d driver - Various other cleanups and small fixes" * tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits) iommu/vt-d: Fix return value check of parse_ioapics_under_ir() iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope() iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir iommu: Move default domain allocation to iommu_group_get_for_dev() iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev iommu/arm-smmu: Switch to device_group call-back iommu/fsl: Convert to device_group call-back iommu: Add device_group call-back to x86 iommu drivers iommu: Add generic_device_group() function iommu: Export and rename iommu_group_get_for_pci_dev() iommu: Revive device_group iommu-ops call-back iommu/amd: Remove find_last_devid_on_pci() iommu/amd: Remove first/last_device handling iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL iommu/amd: Cleanup buffer allocation iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu iommu/amd: Align DTE flag definitions iommu/amd: Remove old alias handling code iommu/amd: Set alias DTE in do_attach/do_detach iommu/amd: WARN when __[attach\|detach]_device are called with irqs enabled ...	2015-11-05 16:12:10 -08:00
Linus Torvalds	2dc10ad81f	arm64 updates for 4.4: - "genirq: Introduce generic irq migration for cpu hotunplugged" patch merged from tip/irq/for-arm to allow the arm64-specific part to be upstreamed via the arm64 tree - CPU feature detection reworked to cope with heterogeneous systems where CPUs may not have exactly the same features. The features reported by the kernel via internal data structures or ELF_HWCAP are delayed until all the CPUs are up (and before user space starts) - Support for 16KB pages, with the additional bonus of a 36-bit VA space, though the latter only depending on EXPERT - Implement native {relaxed, acquire, release} atomics for arm64 - New ASID allocation algorithm which avoids IPI on roll-over, together with TLB invalidation optimisations (using local vs global where feasible) - KASan support for arm64 - EFI_STUB clean-up and isolation for the kernel proper (required by KASan) - copy_{to,from,in}_user optimisations (sharing the memcpy template) - perf: moving arm64 to the arm32/64 shared PMU framework - L1_CACHE_BYTES increased to 128 to accommodate Cavium hardware - Support for the contiguous PTE hint on kernel mapping (16 consecutive entries may be able to use a single TLB entry) - Generic CONFIG_HZ now used on arm64 - defconfig updates -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWOkmIAAoJEGvWsS0AyF7x4GgQAINU3NePjFFvWZNCkqobeH9+ jFKwtXamIudhTSdnXNXyYWmtRL9Krg3qI4zDQf68dvDFAZAze2kVuOi1yPpCbpFZ /j/afNyQc7+PoyqRAzmT+EMPZlcuOA84Prrl1r3QWZ58QaFeVk/6ZxrHunTHxN0x mR9PIXfWx73MTo+UnG8FChkmEY6LmV4XpemgTaMR9FqFhdT51OZSxDDAYXOTm4JW a5HdN9OWjjJ2rhLlFEaC7tszG9B5doHdy2tr5ge/YERVJzIPDogHkMe8ZhfAJc+x SQU5tKN6Pg4MOi+dLhxlk0/mKCvHLiEQ5KVREJnt8GxupAR54Bat+DQ+rP9cSnpq dRQTcARIOyy9LGgy+ROAsSo+NiyM5WuJ0/WJUYKmgWTJOfczRYoZv6TMKlwNOUYb tGLCZHhKPM3yBHJlWbQykl3xmSuudxCMmjlZzg7B+MVfTP6uo0CRSPmYl+v67q+J bBw/Z2RYXWYGnvlc6OfbMeImI6prXeE36+5ytyJFga0m+IqcTzRGzjcLxKEvdbiU pr8n9i+hV9iSsT/UwukXZ8ay6zH7PrTLzILWQlieutfXlvha7MYeGxnkbLmdYcfe GCj374io5cdImHcVKmfhnOMlFOLuOHphl9cmsd/O2LmCIqBj9BIeNH2Om8mHVK2F YHczMdpESlJApE7kUc1e =3six -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - "genirq: Introduce generic irq migration for cpu hotunplugged" patch merged from tip/irq/for-arm to allow the arm64-specific part to be upstreamed via the arm64 tree - CPU feature detection reworked to cope with heterogeneous systems where CPUs may not have exactly the same features. The features reported by the kernel via internal data structures or ELF_HWCAP are delayed until all the CPUs are up (and before user space starts) - Support for 16KB pages, with the additional bonus of a 36-bit VA space, though the latter only depending on EXPERT - Implement native {relaxed, acquire, release} atomics for arm64 - New ASID allocation algorithm which avoids IPI on roll-over, together with TLB invalidation optimisations (using local vs global where feasible) - KASan support for arm64 - EFI_STUB clean-up and isolation for the kernel proper (required by KASan) - copy_{to,from,in}_user optimisations (sharing the memcpy template) - perf: moving arm64 to the arm32/64 shared PMU framework - L1_CACHE_BYTES increased to 128 to accommodate Cavium hardware - Support for the contiguous PTE hint on kernel mapping (16 consecutive entries may be able to use a single TLB entry) - Generic CONFIG_HZ now used on arm64 - defconfig updates * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (91 commits) arm64/efi: fix libstub build under CONFIG_MODVERSIONS ARM64: Enable multi-core scheduler support by default arm64/efi: move arm64 specific stub C code to libstub arm64: page-align sections for DEBUG_RODATA arm64: Fix build with CONFIG_ZONE_DMA=n arm64: Fix compat register mappings arm64: Increase the max granular size arm64: remove bogus TASK_SIZE_64 check arm64: make Timer Interrupt Frequency selectable arm64/mm: use PAGE_ALIGNED instead of IS_ALIGNED arm64: cachetype: fix definitions of ICACHEF_* flags arm64: cpufeature: declare enable_cpu_capabilities as static genirq: Make the cpuhotplug migration code less noisy arm64: Constify hwcap name string arrays arm64/kvm: Make use of the system wide safe values arm64/debug: Make use of the system wide safe value arm64: Move FP/ASIMD hwcap handling to common code arm64/HWCAP: Use system wide safe values arm64/capabilities: Make use of system wide safe value arm64: Delay cpu feature capability checks ...	2015-11-04 14:47:13 -08:00
Robin Murphy	86a5906e4d	arm64: Fix build with CONFIG_ZONE_DMA=n Trying to build with CONFIG_ZONE_DMA=n leaves visible references to the now-undefined ZONE_DMA, resulting in a syntax error. Hide the references behind an #ifdef instead of using IS_ENABLED. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-29 16:58:00 +00:00
Alexander Kuleshov	f23bef34d3	arm64/mm: use PAGE_ALIGNED instead of IS_ALIGNED The <linux/mm.h> already provides the PAGE_ALIGNED macro. Let's use this macro instead of IS_ALIGNED and passing PAGE_SIZE directly. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Acked-by: Laura Abbott <laura@labbott.name> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-28 18:36:32 +00:00
Suzuki K. Poulose	dbb4e152b8	arm64: Delay cpu feature capability checks At the moment we run through the arm64_features capability list for each CPU and set the capability if one of the CPU supports it. This could be problematic in a heterogeneous system with differing capabilities. Delay the CPU feature checks until all the enabled CPUs are up(i.e, smp_cpus_done(), so that we can make better decisions based on the overall system capability. Once we decide and advertise the capabilities the alternatives can be applied. From this state, we cannot roll back a feature to disabled based on the values from a new hotplugged CPU, due to the runtime patching and other reasons. So, for all new CPUs, we need to make sure that they have the established system capabilities. Failing which, we bring the CPU down, preventing it from turning online. Once the capabilities are decided, any new CPU booting up goes through verification to ensure that it has all the enabled capabilities and also invokes the respective enable() method on the CPU. The CPU errata checks are not delayed and is still executed per-CPU to detect the respective capabilities. If we ever come across a non-errata capability that needs to be checked on each-CPU, we could introduce them via a new capability table(or introduce a flag), which can be processed per CPU. The next patch will make the feature checks use the system wide safe value of a feature register. NOTE: The enable() methods associated with the capability is scheduled on all the CPUs (which is the only use case at the moment). If we need a different type of 'enable()' which only needs to be run once on any CPU, we should be able to handle that when needed. Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: Dave Martin <Dave.Martin@arm.com> [catalin.marinas@arm.com: static variable and coding style fixes] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-21 15:35:58 +01:00
Suzuki K. Poulose	44eaacf1b8	arm64: Add 16K page size support This patch turns on the 16K page support in the kernel. We support 48bit VA (4 level page tables) and 47bit VA (3 level page tables). With 16K we can map 128 entries using contiguous bit hint at level 3 to map 2M using single TLB entry. TODO: 16K supports 32 contiguous entries at level 2 to get us 1G(which is not yet supported by the infrastructure). That should be a separate patch altogether. Cc: Will Deacon <will.deacon@arm.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-19 17:55:12 +01:00
Suzuki K. Poulose	b433dce056	arm64: Handle section maps for swapper/idmap We use section maps with 4K page size to create the swapper/idmaps. So far we have used !64K or 4K checks to handle the case where we use the section maps. This patch adds a new symbol, ARM64_SWAPPER_USES_SECTION_MAPS, to handle cases where we use section maps, instead of using the page size symbols. Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-19 17:52:36 +01:00
Robin Murphy	876945dbf6	arm64: Hook up IOMMU dma_ops With iommu_dma_ops in place, hook them up to the configuration code, so IOMMU-fronted devices will get them automatically. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-10-15 16:41:47 +02:00
Robin Murphy	13b8629f65	arm64: Add IOMMU dma_ops Taking some inspiration from the arch/arm code, implement the arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer. Since there is still work to do elsewhere to make DMA configuration happen in a more appropriate order and properly support platform devices in the IOMMU core, the device setup code unfortunately starts out carrying some workarounds to ensure it works correctly in the current state of things. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-10-15 16:41:37 +02:00
Ingo Molnar	c7d77a7980	Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:05:18 +02:00
Will Deacon	83040123fd	arm64: kasan: fix issues reported by sparse Sparse reports some new issues introduced by the kasan patches: arch/arm64/mm/kasan_init.c:91:13: warning: no previous prototype for 'kasan_early_init' [-Wmissing-prototypes] void __init kasan_early_init(void) ^ arch/arm64/mm/kasan_init.c:91:13: warning: symbol 'kasan_early_init' was not declared. Should it be static? [sparse] This patch resolves the problem by adding a prototype for kasan_early_init and marking the function as asmlinkage, since it's only called from head.S. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-13 14:54:42 +01:00
Linus Walleij	ee7f881b59	ARM64: kasan: print memory assignment This prints out the virtual memory assigned to KASan in the boot crawl along with other memory assignments, if and only if KASan is activated. Example dmesg from the Juno Development board: Memory: 1691156K/2080768K available (5465K kernel code, 444K rwdata, 2160K rodata, 340K init, 217K bss, 373228K reserved, 16384K cma-reserved) Virtual kernel memory layout: kasan : 0xffffff8000000000 - 0xffffff9000000000 ( 64 GB) vmalloc : 0xffffff9000000000 - 0xffffffbdbfff0000 ( 182 GB) vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000 ( 8 GB maximum) 0xffffffbdc2000000 - 0xffffffbdc3fc0000 ( 31 MB actual) fixed : 0xffffffbffabfd000 - 0xffffffbffac00000 ( 12 KB) PCI I/O : 0xffffffbffae00000 - 0xffffffbffbe00000 ( 16 MB) modules : 0xffffffbffc000000 - 0xffffffc000000000 ( 64 MB) memory : 0xffffffc000000000 - 0xffffffc07f000000 ( 2032 MB) .init : 0xffffffc0007f5000 - 0xffffffc00084a000 ( 340 KB) .text : 0xffffffc000080000 - 0xffffffc0007f45b4 ( 7634 KB) .data : 0xffffffc000850000 - 0xffffffc0008bf200 ( 445 KB) Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-12 17:46:43 +01:00
Andrey Ryabinin	39d114ddc6	arm64: add KASAN support This patch adds arch specific code for kernel address sanitizer (see Documentation/kasan.txt). 1/8 of kernel addresses reserved for shadow memory. There was no big enough hole for this, so virtual addresses for shadow were stolen from vmalloc area. At early boot stage the whole shadow region populated with just one physical page (kasan_zero_page). Later, this page reused as readonly zero shadow for some memory that KASan currently don't track (vmalloc). After mapping the physical memory, pages for shadow memory are allocated and mapped. Functions like memset/memmove/memcpy do a lot of memory accesses. If bad pointer passed to one of these function it is important to catch this. Compiler's instrumentation cannot do this since these functions are written in assembly. KASan replaces memory functions with manually instrumented variants. Original functions declared as weak symbols so strong definitions in mm/kasan/kasan.c could replace them. Original functions have aliases with '__' prefix in name, so we could call non-instrumented variant if needed. Some files built without kasan instrumentation (e.g. mm/slub.c). Original mem* function replaced (via #define) with prefixed variants to disable memory access checks for such files. Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-12 17:46:36 +01:00
Andrey Ryabinin	fd2203dd35	arm64: move PGD_SIZE definition to pgalloc.h This will be used by KASAN latter. Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-12 17:46:30 +01:00
Ard Biesheuvel	207918461e	arm64: use ENDPIPROC() to annotate position independent assembler routines For more control over which functions are called with the MMU off or with the UEFI 1:1 mapping active, annotate some assembler routines as position independent. This is done by introducing ENDPIPROC(), which replaces the ENDPROC() declaration of those routines. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-12 16:19:45 +01:00
Jeremy Linton	348a65cdcb	arm64: Mark kernel page ranges contiguous With 64k pages, the next larger segment size is 512M. The linux kernel also uses different protection flags to cover its code and data. Because of this requirement, the vast majority of the kernel code and data structures end up being mapped with 64k pages instead of the larger pages common with a 4k page kernel. Recent ARM processors support a contiguous bit in the page tables which allows the a TLB to cover a range larger than a single PTE if that range is mapped into physically contiguous ram. So, for the kernel its a good idea to set this flag. Some basic micro benchmarks show it can significantly reduce the number of L1 dTLB refills. Add boot option to enable/disable CONT marking, as well as fix a bug found by Steve Capper. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> [catalin.marinas@arm.com: remove CONFIG_ARM64_CONT_PTE altogether] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-08 18:44:14 +01:00
Jeremy Linton	202e41a1c2	arm64: Make the kernel page dump utility aware of the CONT bit The kernel page dump utility needs to be aware of the CONT bit before it will break up pages ranges for display. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-08 18:39:57 +01:00
Will Deacon	38d9628750	arm64: mm: kill mm_cpumask usage mm_cpumask isn't actually used for anything on arm64, so remove all the code trying to keep it up-to-date. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-07 11:56:29 +01:00
Will Deacon	c2775b2ee5	arm64: switch_mm: simplify mm and CPU checks switch_mm performs some checks to try and avoid entering the ASID allocator: (1) If we're switching to the init_mm (no user mappings), then simply set a reserved TTBR0 value with no page table (the zero page) (2) If prev == next and the mm_cpumask indicates that we've run on this CPU before, then we can skip the allocator. However, there is plenty of redundancy here. With the new ASID allocator, if prev == next, then we know that our ASID is valid and do not need to worry about re-allocation. Consequently, we can drop the mm_cpumask check in (2) and move the prev == next check before the init_mm check, since if prev == next == init_mm then there's nothing to do. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-07 11:56:25 +01:00
Will Deacon	5aec715d7d	arm64: mm: rewrite ASID allocator and MM context-switching code Our current switch_mm implementation suffers from a number of problems: (1) The ASID allocator relies on IPIs to synchronise the CPUs on a rollover event (2) Because of (1), we cannot allocate ASIDs with interrupts disabled and therefore make use of a TIF_SWITCH_MM flag to postpone the actual switch to finish_arch_post_lock_switch (3) We run context switch with a reserved (invalid) TTBR0 value, even though the ASID and pgd are updated atomically (4) We take a global spinlock (cpu_asid_lock) during context-switch (5) We use h/w broadcast TLB operations when they are not required (e.g. in flush_context) This patch addresses these problems by rewriting the ASID algorithm to match the bitmap-based arch/arm/ implementation more closely. This in turn allows us to remove much of the complications surrounding switch_mm, including the ugly thread flag. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-07 11:55:41 +01:00
Will Deacon	8e63d38876	arm64: flush: use local TLB and I-cache invalidation There are a number of places where a single CPU is running with a private page-table and we need to perform maintenance on the TLB and I-cache in order to ensure correctness, but do not require the operation to be broadcast to other CPUs. This patch adds local variants of tlb_flush_all and __flush_icache_all to support these use-cases and updates the callers respectively. __local_flush_icache_all also implies an isb, since it is intended to be used synchronously. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Daney <david.daney@cavium.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-07 11:45:27 +01:00
Will Deacon	fa7aae8a42	arm64: proc: de-scope TLBI operation during cold boot When cold-booting a CPU, we must invalidate any junk entries from the local TLB prior to enabling the MMU. This doesn't require broadcasting within the inner-shareable domain, so de-scope the operation to apply only to the local CPU. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-10-07 11:44:25 +01:00
Mark Salyzyn	569ba74a7b	arm64: readahead: fault retry breaks mmap file read random detection This is the arm64 portion of commit `45cac65b0f` ("readahead: fault retry breaks mmap file read random detection"), which was absent from the initial port and has since gone unnoticed. The original commit says: > .fault now can retry. The retry can break state machine of .fault. In > filemap_fault, if page is miss, ra->mmap_miss is increased. In the second > try, since the page is in page cache now, ra->mmap_miss is decreased. And > these are done in one fault, so we can't detect random mmap file access. > > Add a new flag to indicate .fault is tried once. In the second try, skip > ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. With this change, Mark reports that: > Random read improves by 250%, sequential read improves by 40%, and > random write by 400% to an eMMC device with dm crypto wrapped around it. Cc: Shaohua Li <shli@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Riley Andrews <riandrews@android.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-10-05 16:30:50 +01:00
Jisheng Zhang	ba9cc453c4	arm64: dma-mapping: check whether cma area is initialized or not If CMA is turned on and CMA size is set to zero, kernel should behave as if CMA was not enabled at compile time. Every dma allocation should check existence of cma area before requesting memory. Arm has done this by commit `e464ef16c4` ("arm: dma-mapping: add checking cma area initialized"), also do this for arm64. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-09-14 12:28:30 +01:00
Will Deacon	d8d23fa0f2	arm64: mdscr_el1: avoid exposing DCC to userspace We don't want to expose the DCC to userspace, particularly as there is a kernel console driver for it. This patch resets mdscr_el1 to disable userspace access to the DCC registers on the cold boot path. Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-08-20 16:17:58 +01:00
Jonathan (Zhixiong) Zhang	8d446c8647	arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT UEFI spec 2.5 section 2.3.6.1 defines that EFI_MEMORY_[UC\|WC\|WT\|WB] are possible EFI memory types for AArch64. Each of those EFI memory types is mapped to a corresponding AArch64 memory type. So we need to define PROT_DEVICE_nGnRnE and PROT_NORMWL_WT additionaly. MT_NORMAL_WT is defined, and its encoding is added to MAIR_EL1 when initializing the CPU. Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438936621-5215-6-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-08 10:37:40 +02:00
Will Deacon	8ec4198743	arm64: mm: ensure patched kernel text is fetched from PoU The arm64 booting document requires that the bootloader has cleaned the kernel image to the PoC. However, when a CPU re-enters the kernel due to either a CPU hotplug "on" event or resuming from a low-power state (e.g. cpuidle), the kernel text may in-fact be dirty at the PoU due to things like alternative patching or even module loading. Thanks to I-cache speculation with the MMU off, stale instructions could be fetched prior to enabling the MMU, potentially leading to crashes when executing regions of code that have been modified at runtime. This patch addresses the issue by ensuring that the local I-cache is invalidated immediately after a CPU has enabled its MMU but before jumping out of the identity mapping. Any stale instructions fetched from the PoC will then be discarded and refetched correctly from the PoU. Patching kernel text executed prior to the MMU being enabled is prohibited, so the early entry code will always be clean. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-08-05 10:05:20 +01:00
Robin Murphy	97942c2862	arm64: dma-mapping: Simplify pgprot handling Since __get_dma_pgprot() does The Right Thing(TM) in the non-coherent case, and the non-cacheable alias for DMA buffers is private to the kernel anyway, we can simplify things slightly and make the code more readable by just using PAGE_KERNEL as the base pgprot. Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-08-03 13:17:38 +01:00
Mark Rutland	c53e0baa6f	arm64: mm: mark create_mapping as __init Currently create_mapping is marked with __ref, apparently because it refers to early_alloc. However, create_mapping has no logic to prevent erroneous use of early_alloc after it has been freed, and is only ever called by __init functions anyway. Thus the __ref marker is misleading and unnecessary. Instead, this patch marks create_mapping as __init, resulting in warnings if it is used from a a non __init functions, and allowing its memory to be reclaimed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-28 11:36:09 +01:00
Wang Long	662ba3dbce	arm64: mm: add __init section marker to free_initrd_mem It is not needed after booting, this patch moves the free_initrd_mem() function to the __init section. This patch also make keep_initrd __initdata, to reduce kernel size. Signed-off-by: Wang Long <long.wanglong@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 18:29:18 +01:00
Dave P Martin	9fb7410f95	arm64/BUG: Use BRK instruction for generic BUG traps Currently, the minimal default BUG() implementation from asm- generic is used for arm64. This patch uses the BRK software breakpoint instruction to generate a trap instead, similarly to most other arches, with the generic BUG code generating the dmesg boilerplate. This allows bug metadata to be moved to a separate table and reduces the amount of inline code at BUG and WARN sites. This also avoids clobbering any registers before they can be dumped. To mitigate the size of the bug table further, this patch makes use of the existing infrastructure for encoding addresses within the bug table as 32-bit offsets instead of absolute pointers. (Note that this limits the kernel size to 2GB.) Traps are registered at arch_initcall time for aarch64, but BUG has minimal real dependencies and it is desirable to be able to generate bug splats as early as possible. This patch redirects all debug exceptions caused by BRK directly to bug_handler() until the full debug exception support has been initialised. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:42 +01:00
James Morse	338d4f49d6	arm64: kernel: Add support for Privileged Access Never 'Privileged Access Never' is a new arm8.1 feature which prevents privileged code from accessing any virtual address where read or write access is also permitted at EL0. This patch enables the PAN feature on all CPUs, and modifies {get,put}_user helpers temporarily to permit access. This will catch kernel bugs where user memory is accessed directly. 'Unprivileged loads and stores' using ldtrb et al are unaffected by PAN. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: James Morse <james.morse@arm.com> [will: use ALTERNATIVE in asm and tidy up pan_enable check] Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:41 +01:00
Daniel Thompson	271d35eb77	arm64: mm: Adopt new alternative assembler macros Convert the dynamic patching for ARM64_WORKAROUND_CLEAN_CACHE over to the newly added alternative assembler macros. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:40 +01:00
Jisheng Zhang	0a570e7ade	arm64: hugetlb: remove paragraph about writing to FSF Remove paragraph about writing to the Free Software Foundation's mailing address from GPL notice. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:40 +01:00
Robin Murphy	1d1ddf67dc	arm64: dma-mapping: implement dma_get_sgtable() The default dma_common_get_sgtable() implementation relies on the CPU address of the buffer being a regular lowmem address. This is not always the case on arm64, since allocations from the various DMA pools may have remapped vmalloc addresses, rendering the use of virt_to_page() invalid. Fix this by providing our own implementation based on the fact that we can safely derive a physical address from the DMA address in both cases. CC: Jon Medhurst <tixy@linaro.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> [will: made static] Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:40 +01:00
Will Deacon	4b3dc9679c	arm64: force CONFIG_SMP=y and remove redundant #ifdefs Nobody seems to be producing !SMP systems anymore, so this is just becoming a source of kernel bugs, particularly if people want to use coherent DMA with non-shared pages. This patch forces CONFIG_SMP=y for arm64, removing a modest amount of code in the process. Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:40 +01:00
Catalin Marinas	2f4b829c62	arm64: Add support for hardware updates of the access and dirty pte bits The ARMv8.1 architecture extensions introduce support for hardware updates of the access and dirty information in page table entries. With TCR_EL1.HA enabled, when the CPU accesses an address with the PTE_AF bit cleared in the page table, instead of raising an access flag fault the CPU sets the actual page table entry bit. To ensure that kernel modifications to the page tables do not inadvertently revert a change introduced by hardware updates, the exclusive monitor (ldxr/stxr) is adopted in the pte accessors. When TCR_EL1.HD is enabled, a write access to a memory location with the DBM (Dirty Bit Management) bit set in the corresponding pte automatically clears the read-only bit (AP[2]). Such DBM bit maps onto the Linux PTE_WRITE bit and to check whether a writable (DBM set) page is dirty, the kernel tests the PTE_RDONLY bit. In order to allow read-only and dirty pages, the kernel needs to preserve the software dirty bit. The hardware dirty status is transferred to the software dirty bit in ptep_set_wrprotect() (using load/store exclusive loop) and pte_modify(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:39 +01:00
Mark Salter	b08d4640a3	arm64: remove dead code Commit `68234df4ea` ("arm64: kill flush_cache_all()") removed soft_reset() from the kernel. This was the only caller of setup_mm_for_reboot(), so remove that also. Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:39 +01:00
Robin Murphy	aaf6f2f098	arm64: consolidate __swiotlb_mmap Since commit `9d3bfbb4df` ("arm64: Combine coherent and non-coherent swiotlb dma_ops"), __dma_common_mmap is no longer shared between two callers, so roll it into the remaining one. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2015-07-27 11:08:39 +01:00
Ard Biesheuvel	4b59246d9a	arm64: remove another unnecessary libfdt include path Patch `63a4aea556` ("of: clean-up unnecessary libfdt include paths") removed all explicit libfdt include paths, since those are no longer necessary after the latest dtc upgrade. However, this one snuck in during the same merge window. Remove it. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-07-06 17:15:14 +01:00
Linus Torvalds	6361c845ce	Various arm64 fixes: - suspicious RCU usage warning - BPF (out of bounds array read and endianness conversion) - perf (of_node usage after of_node_put, cpu_pmu->plat_device assignment) - huge pmd/pud check for value 0 - rate-limiting should only take unhandled signals into account Clean-up: - incorrect use of pgprot_t type - unused header include - __init annotation to arm_cpuidle_init - pr_debug instead of pr_error for disabled GICC entries in ACPI/MADT -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVlrS4AAoJEGvWsS0AyF7x/TUQAK1o3/jlyAOY2eFRZPR9CJrU xBB3bFs1TcYGyl5TvSssyAHS9RgpsagS+ZHbSaWTR9zjrtkJ5/kAFS0JY3Y3/oOy 4cPACISX8ps+YcekCIF0PLsznmvKstpXasvfEVr+QlspzF0utxZa/koP2ofzK+J4 IuTIMvXjym4x8GnV4FzkyFSOtJLEmtFpeW8GbzbrhjrIiCYP7Hx5zsn2PNfpdsQj 2UqE0JdMsCS42fmjzmX+MTTiQBIzfDXR9BSxUtbarR/wLEZbcN8cfQGIm/mrj69Y 1S6TuQVCskUC+8Ue2jKPJEb18igUZcbNulBPEEyMt1dNQxHrvicdgK/bthwmGlxb sp5W82BT1ws3E4IHFB1tY89mp4j7GpvWf19su6xLmcbgYAh+/bQZHB0OR6iBnPHI Zyp3BotBAk7zvf7hpdV2l9a3RkNR0tjI6eD7L1I7trxvPqq46DedxqpnBwXzElI8 xfTm8G4EaOBK65LnvPXnWDRawKhOcfEEAFEzJYZoO5g0gw4fA4+dRMsXBF/ENPde FTv9afGMDy1uK7s449x+F4XwcZiiCaXBjtxAbXKNbfqPmMgv3v05/oE66LyTyYYM lcEdQLMJ62zrKASPfKS3Iw5BChLoFFDKnnZRryGuYt5h5fGyMBlGYoe3y9K8zTV6 iGYFzZ6cQfv/7HvYIZqB =R+XU -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes (and cleanups) from Catalin Marinas: "Various arm64 fixes: - suspicious RCU usage warning - BPF (out of bounds array read and endianness conversion) - perf (of_node usage after of_node_put, cpu_pmu->plat_device assignment) - huge pmd/pud check for value 0 - rate-limiting should only take unhandled signals into account Clean-up: - incorrect use of pgprot_t type - unused header include - __init annotation to arm_cpuidle_init - pr_debug instead of pr_error for disabled GICC entries in ACPI/MADT" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix show_unhandled_signal_ratelimited usage ARM64 / SMP: Switch pr_err() to pr_debug() for disabled GICC entry arm64: cpuidle: add __init section marker to arm_cpuidle_init arm64: Don't report clear pmds and puds as huge arm64: perf: fix unassigned cpu_pmu->plat_device when probing PMU PPIs arm64: perf: Don't use of_node after putting it arm64: fix incorrect use of pgprot_t variable arm64/hw_breakpoint.c: remove unnecessary header arm64: bpf: fix endianness conversion bugs arm64: bpf: fix out-of-bounds read in bpf2a64_offset() ARM64: smp: Fix suspicious RCU usage with ipi tracepoints	2015-07-03 12:28:30 -07:00
Suzuki K. Poulose	f871d26807	arm64: Fix show_unhandled_signal_ratelimited usage Commit `86dca36e6b` introduced ratelimited usage for 'unhandled_signal' messages. The commit checks the ratelimit irrespective of whether the signal is handled or not, which is wrong and leads to false reports like the below in dmesg : __do_user_fault: 127 callbacks suppressed Do the ratelimit check only if the signal is unhandled. Fixes: `86dca36e6b` ("arm64: use private ratelimit state along with show_unhandled_signals") Cc: Vladimir Murzin <Vladimir.Murzin@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-07-03 17:03:06 +01:00
Christoffer Dall	fd28f5d439	arm64: Don't report clear pmds and puds as huge The current pmd_huge() and pud_huge() functions simply check if the table bit is not set and reports the entries as huge in that case. This is counter-intuitive as a clear pmd/pud cannot also be a huge pmd/pud, and it is inconsistent with at least arm and x86. To prevent others from making the same mistake as me in looking at code that calls these functions and to fix an issue with KVM on arm64 that causes memory corruption due to incorrect page reference counting resulting from this mistake, let's change the behavior. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Steve Capper <steve.capper@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Fixes: `084bd29810` ("ARM64: mm: HugeTLB support.") Cc: <stable@vger.kernel.org> # 3.11+ Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-07-01 14:29:28 +01:00
Ard Biesheuvel	1e43ba9cd8	arm64: fix incorrect use of pgprot_t variable This fixes a build failure under STRICT_MM_TYPECHECKS, by adding a missing pgprot_val() around a pgport_t reference. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-30 18:03:37 +01:00
Zhang Zhen	e81f2d2237	mm/hugetlb: reduce arch dependent code about huge_pmd_unshare Currently we have many duplicates in definitions of huge_pmd_unshare. In all architectures this function just returns 0 when CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N. This patch puts the default implementation in mm/hugetlb.c and lets these architectures use the common code. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Rientjes <rientjes@google.com> Cc: James Yang <James.Yang@freescale.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:41 -07:00
Linus Torvalds	e3d8238d7f	arm64 updates for 4.2, mostly refactoring/clean-up: - CPU ops and PSCI (Power State Coordination Interface) refactoring following the merging of the arm64 ACPI support, together with handling of Trusted (secure) OS instances - Using fixmap for permanent FDT mapping, removing the initial dtb placement requirements (within 512MB from the start of the kernel image). This required moving the FDT self reservation out of the memreserve processing - Idmap (1:1 mapping used for MMU on/off) handling clean-up - Removing flush_cache_all() - not safe on ARM unless the MMU is off. Last stages of CPU power down/up are handled by firmware already - "Alternatives" (run-time code patching) refactoring and support for immediate branch patching, GICv3 CPU interface access - User faults handling clean-up And some fixes: - Fix for VDSO building with broken ELF toolchains - Fixing another case of init_mm.pgd usage for user mappings (during ASID roll-over broadcasting) - Fix for FPSIMD reloading after CPU hotplug - Fix for missing syscall trace exit - Workaround for .inst asm bug - Compat fix for switching the user tls tpidr_el0 register -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVitZgAAoJEGvWsS0AyF7x+ToP/0Yci5bNsYVwVay8N1rK6WHh SGzDMzyxcSBjQpz2IhhTJ28eTAEH+a+HWQms+0tBehjqxqkvjuzBN0okDkc/z8NB 7Z0BV2aLkQcMwTbjgIh5jm25ZpGmvmvbWPD5oBwgmgQ4v4i1OLRKgx7+YQ+z9rWZ zC70d0UwyWjs2oxmjd2ZrAkps7v/qozEFhcRHxLzCn8Mbw+3FcTQsqMbfnoWGnH0 YuGfHQQqBY4/HC7uAslMCy7tXeJXqb+NsgrnAovjfEbHGDjsg0KNl06K++LHwE37 A5Noa3M0AQEPYqx/sg0Ec8RNUUEMB4RA2DCaibp7XlVGncXOwFfiyk/M5uVrYXIO ku5QF0ytUfZKzrMq3yQKbEDuCPOFTqjjdVpkeXKFdW66zYTohKVc3vUBV5xHZ5uO 7Kr8H0ZnhAv3OxPyKdEwAuQ5sJdWwQSvZyGClxMUO4eC/UzD0Mwwf1Y8WYtiAXx+ NSTeBKw/m33W3/KhNuNH1+qGEOKhuXuKX7AcYA84Rab8ytxYWcurHCG2bmhzpEse 2DZtNMybrP/HMQPyJlYgGac8B3QbsAIAkkU1f+dJTAv9otuBDhscaDQyZ9Y6WVht /k8zJiZeMEuGAmwgTkzLmWs/8pQq42nW4J4eQdXPZAwp4ghCIypPWfaZASAwee6/ p+es3v8P4k9wkv2TFZMh =YeGl -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Mostly refactoring/clean-up: - CPU ops and PSCI (Power State Coordination Interface) refactoring following the merging of the arm64 ACPI support, together with handling of Trusted (secure) OS instances - Using fixmap for permanent FDT mapping, removing the initial dtb placement requirements (within 512MB from the start of the kernel image). This required moving the FDT self reservation out of the memreserve processing - Idmap (1:1 mapping used for MMU on/off) handling clean-up - Removing flush_cache_all() - not safe on ARM unless the MMU is off. Last stages of CPU power down/up are handled by firmware already - "Alternatives" (run-time code patching) refactoring and support for immediate branch patching, GICv3 CPU interface access - User faults handling clean-up And some fixes: - Fix for VDSO building with broken ELF toolchains - Fix another case of init_mm.pgd usage for user mappings (during ASID roll-over broadcasting) - Fix for FPSIMD reloading after CPU hotplug - Fix for missing syscall trace exit - Workaround for .inst asm bug - Compat fix for switching the user tls tpidr_el0 register" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits) arm64: use private ratelimit state along with show_unhandled_signals arm64: show unhandled SP/PC alignment faults arm64: vdso: work-around broken ELF toolchains in Makefile arm64: kernel: rename __cpu_suspend to keep it aligned with arm arm64: compat: print compat_sp instead of sp arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP arm64: entry: fix context tracking for el0_sp_pc arm64: defconfig: enable memtest arm64: mm: remove reference to tlb.S from comment block arm64: Do not attempt to use init_mm in reset_context() arm64: KVM: Switch vgic save/restore to alternative_insn arm64: alternative: Introduce feature for GICv3 CPU interface arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning arm64: fix bug for reloading FPSIMD state after CPU hotplug. arm64: kernel thread don't need to save fpsimd context. arm64: fix missing syscall trace exit arm64: alternative: Work around .inst assembler bugs arm64: alternative: Merge alternative-asm.h into alternative.h arm64: alternative: Allow immediate branch as alternative instruction arm64: Rework alternate sequence for ARM erratum 845719 ...	2015-06-24 10:02:15 -07:00
Linus Torvalds	43c9fad942	Power management and ACPI material for v4.2-rc1 - ACPICA update to upstream revision 20150515 including basic support for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO, XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM, FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI, _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore, Lv Zheng). - ACPI device power management core code update to follow ACPI 6 which reflects the ACPI device power management implementation in Windows (Rafael J Wysocki). - Rework of the backlight interface selection logic to reduce the number of kernel command line options and improve the handling of DMI quirks that may be involved in that and to make the code generally more straightforward (Hans de Goede). - Fixes for the ACPI Embedded Controller (EC) driver related to the handling of EC transactions (Lv Zheng). - Fix for a regression related to the ACPI resources management and resulting from a recent change of ACPI initialization code ordering (Rafael J Wysocki). - Fix for a system initialization regression related to ACPI introduced during the 3.14 cycle and caused by running the code that switches the platform over to the ACPI mode too early in the initialization sequence (Rafael J Wysocki). - Support for the ACPI _CCA device configuration object related to DMA cache coherence (Suravee Suthikulpanit). - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov). - ACPI battery driver cleanups (Luis Henriques, Mathias Krause). - ACPI processor driver cleanups (Hanjun Guo). - Cleanups and documentation update related to the ACPI device properties interface based on _DSD (Rafael J Wysocki). - ACPI device power management fixes (Rafael J Wysocki). - Assorted cleanups related to ACPI (Dominik Brodowski. Fabian Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki). - Fix for a long-standing issue causing General Protection Faults to be generated occasionally on return to user space after resume from ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar). - Fix to make the suspend core code return -EBUSY consistently in all cases when system suspend is aborted due to wakeup detection (Ruchi Kandoi). - Support for automated device wakeup IRQ handling allowing drivers to make their PM support more starightforward (Tony Lindgren). - New tracepoints for suspend-to-idle tracing and rework of the prepare/complete callbacks tracing in the PM core (Todd E Brandt, Rafael J Wysocki). - Wakeup sources framework enhancements (Jin Qian). - New macro for noirq system PM callbacks (Grygorii Strashko). - Assorted cleanups related to system suspend (Rafael J Wysocki). - cpuidle core cleanups to make the code more efficient (Rafael J Wysocki). - powernv/pseries cpuidle driver update (Shilpasri G Bhat). - cpufreq core fixes related to CPU online/offline that should reduce the overhead of these operations quite a bit, unless the CPU in question is physically going away (Viresh Kumar, Saravana Kannan). - Serialization of cpufreq governor callbacks to avoid race conditions in some cases (Viresh Kumar). - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit Bhargava, Joe Konno). - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep Holla, Felipe Balbi, Tang Yuantian). - Assorted cleanups in cpufreq drivers and core (Shailendra Verma, Fabian Frederick, Wang Long). - New Device Tree bindings for representing Operating Performance Points (Viresh Kumar). - Updates for the common clock operations support code in the PM core (Rajendra Nayak, Geert Uytterhoeven). - PM domains core code update (Geert Uytterhoeven). - Intel Knights Landing support for the RAPL (Running Average Power Limit) power capping driver (Dasaratharaman Chandramouli). - Fixes related to the floor frequency setting on Atom SoCs in the RAPL power capping driver (Ajay Thomas). - Runtime PM framework documentation update (Ben Dooks). - cpupower tool fix (Herton R Krzesinski). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJViJdWAAoJEILEb/54YlRx/9gP/3gHoFevNRycvn0VpKqdufCI Mxy2LBBLlfyW2uD3+NvqvA2WWSo0Cs/LgXa04eAVxPdU7k48s8w+54U23wSouzjW gfwAmuHxzDR8v0h8X3h6BxNzmkIQHtmDcQlA/cZdHejY/UUw01yxRGNUUZDNbxlm WXn2nmlBLmGqXTYq0fpBV+3jicUghJqHHsBCqa3VR2yQioHMJG01F4UZMqYTZunN OIvDUghxByKz6alzdCqlLl1Y0exV6vwWUAzBsl1qHqmHu/bWFSZn3ujNNVrjqHhw Kl7/8dC2pQkv3Zo3gEVvfQ0onotwWZxGHzPQRdvmxvRnBunQVCi/wynx90yABX/r PPb/iBNV0mZskbF0zb0GZT3ZZWGA8Z0p3o5JQv2jV4m62qTzx8w50Y5kbn9N1WT+ 5bre7AVbVAlGonWszcS9iE+6TOboRz9OD1CCwPFXHItFutlBkau+1hHfFoLM0o9n LhpGuyszT/EUa1BHkLzuCckFqO2DpbF3N2CKmuTekw0CdgdsvRL2pRByuerk3j7R WQhlcvBq5YH6j43AuoEZKp8r1iN8oG/iqlrMYQaYWrW9hJaoQOoU8dGJxp/e7gKN r/qeYjETI+tIsjCbtH5WQzzxDI3gPISAYAtfqs7G34EEo+Lwp6kyRUAF4kDot2V3 ZIyuKMmTu4cdwDETr/O+ =7jTj -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "The rework of backlight interface selection API from Hans de Goede stands out from the number of commits and the number of affected places perspective. The cpufreq core fixes from Viresh Kumar are quite significant too as far as the number of commits goes and because they should reduce CPU online/offline overhead quite a bit in the majority of cases. From the new featues point of view, the ACPICA update (to upstream revision 20150515) adding support for new ACPI 6 material to ACPICA is the one that matters the most as some new significant features will be based on it going forward. Also included is an update of the ACPI device power management core to follow ACPI 6 (which in turn reflects the Windows' device PM implementation), a PM core extension to support wakeup interrupts in a more generic way and support for the ACPI _CCA device configuration object. The rest is mostly fixes and cleanups all over and some documentation updates, including new DT bindings for Operating Performance Points. There is one fix for a regression introduced in the 4.1 cycle, but it adds quite a number of lines of code, it wasn't really ready before Thursday and you were on vacation, so I refrained from pushing it on the last minute for 4.1. Specifics: - ACPICA update to upstream revision 20150515 including basic support for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO, XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM, FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI, _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore, Lv Zheng). - ACPI device power management core code update to follow ACPI 6 which reflects the ACPI device power management implementation in Windows (Rafael J Wysocki). - rework of the backlight interface selection logic to reduce the number of kernel command line options and improve the handling of DMI quirks that may be involved in that and to make the code generally more straightforward (Hans de Goede). - fixes for the ACPI Embedded Controller (EC) driver related to the handling of EC transactions (Lv Zheng). - fix for a regression related to the ACPI resources management and resulting from a recent change of ACPI initialization code ordering (Rafael J Wysocki). - fix for a system initialization regression related to ACPI introduced during the 3.14 cycle and caused by running the code that switches the platform over to the ACPI mode too early in the initialization sequence (Rafael J Wysocki). - support for the ACPI _CCA device configuration object related to DMA cache coherence (Suravee Suthikulpanit). - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov). - ACPI battery driver cleanups (Luis Henriques, Mathias Krause). - ACPI processor driver cleanups (Hanjun Guo). - cleanups and documentation update related to the ACPI device properties interface based on _DSD (Rafael J Wysocki). - ACPI device power management fixes (Rafael J Wysocki). - assorted cleanups related to ACPI (Dominik Brodowski, Fabian Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki). - fix for a long-standing issue causing General Protection Faults to be generated occasionally on return to user space after resume from ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar). - fix to make the suspend core code return -EBUSY consistently in all cases when system suspend is aborted due to wakeup detection (Ruchi Kandoi). - support for automated device wakeup IRQ handling allowing drivers to make their PM support more starightforward (Tony Lindgren). - new tracepoints for suspend-to-idle tracing and rework of the prepare/complete callbacks tracing in the PM core (Todd E Brandt, Rafael J Wysocki). - wakeup sources framework enhancements (Jin Qian). - new macro for noirq system PM callbacks (Grygorii Strashko). - assorted cleanups related to system suspend (Rafael J Wysocki). - cpuidle core cleanups to make the code more efficient (Rafael J Wysocki). - powernv/pseries cpuidle driver update (Shilpasri G Bhat). - cpufreq core fixes related to CPU online/offline that should reduce the overhead of these operations quite a bit, unless the CPU in question is physically going away (Viresh Kumar, Saravana Kannan). - serialization of cpufreq governor callbacks to avoid race conditions in some cases (Viresh Kumar). - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit Bhargava, Joe Konno). - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep Holla, Felipe Balbi, Tang Yuantian). - assorted cleanups in cpufreq drivers and core (Shailendra Verma, Fabian Frederick, Wang Long). - new Device Tree bindings for representing Operating Performance Points (Viresh Kumar). - updates for the common clock operations support code in the PM core (Rajendra Nayak, Geert Uytterhoeven). - PM domains core code update (Geert Uytterhoeven). - Intel Knights Landing support for the RAPL (Running Average Power Limit) power capping driver (Dasaratharaman Chandramouli). - fixes related to the floor frequency setting on Atom SoCs in the RAPL power capping driver (Ajay Thomas). - runtime PM framework documentation update (Ben Dooks). - cpupower tool fix (Herton R Krzesinski)" * tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits) cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state x86: Load __USER_DS into DS/ES after resume PM / OPP: Add binding for 'opp-suspend' PM / OPP: Allow multiple OPP tables to be passed via DT PM / OPP: Add new bindings to address shortcomings of existing bindings ACPI: Constify ACPI device IDs in documentation ACPI / enumeration: Document the rules regarding the PRP0001 device ID ACPI / video: Make acpi_video_unregister_backlight() private acpi-video-detect: Remove old API toshiba-acpi: Port to new backlight interface selection API thinkpad-acpi: Port to new backlight interface selection API sony-laptop: Port to new backlight interface selection API samsung-laptop: Port to new backlight interface selection API msi-wmi: Port to new backlight interface selection API msi-laptop: Port to new backlight interface selection API intel-oaktrail: Port to new backlight interface selection API ideapad-laptop: Port to new backlight interface selection API fujitsu-laptop: Port to new backlight interface selection API eeepc-laptop: Port to new backlight interface selection API dell-wmi: Port to new backlight interface selection API ...	2015-06-23 14:18:07 -07:00
Vladimir Murzin	86dca36e6b	arm64: use private ratelimit state along with show_unhandled_signals printk_ratelimit() shares the ratelimiting state with other callers what may lead to scenarios where at the time we want to print out debug information we already limited, so nothing appears in the dmesg - this makes exception-trace quite poor helper in debugging. Additionally, we have imbalance with some messages limited with global ratelimit state and other messages limited with their private state defined via pr_*_ratelimited(). To address this inconsistency show_unhandled_signals_ratelimited() macro is introduced and caller sites are converted to use it. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-19 16:26:15 +01:00
Vladimir Murzin	9e793ab84e	arm64: show unhandled SP/PC alignment faults Report unhandled SP/PC alignment faults if the show_unhandled_signals variable is set (via /proc/sys/debug/exception-trace). Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-19 16:20:10 +01:00
Dave P Martin	b9bcc91993	arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP The memmap freeing code in free_unused_memmap() computes the end of each memblock by adding the memblock size onto the base. However, if SPARSEMEM is enabled then the value (start) used for the base may already have been rounded downwards to work out which memmap entries to free after the previous memblock. This may cause memmap entries that are in use to get freed. In general, you're not likely to hit this problem unless there are at least 2 memblocks and one of them is not aligned to a sparsemem section boundary. Note that carve-outs can increase the number of memblocks by splitting the regions listed in the device tree. This problem doesn't occur with SPARSEMEM_VMEMMAP, because the vmemmap code deals with freeing the unused regions of the memmap instead of requiring the arch code to do it. This patch gets the memblock base out of the memblock directly when computing the block end address to ensure the correct value is used. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-17 14:29:34 +01:00
Suthikulpanit, Suravee	b6197b93fa	arm64 : Introduce support for ACPI _CCA object section 6.2.17 _CCA states that ARM platforms require ACPI _CCA object to be specified for DMA-cabpable devices. Therefore, this patch specifies ACPI_CCA_REQUIRED in arm64 Kconfig. In addition, to handle the case when _CCA is missing, arm64 would assign dummy_dma_ops to disable DMA capability of the device. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-06-15 14:40:49 +02:00
Catalin Marinas	565630d503	arm64: Do not attempt to use init_mm in reset_context() After secondary CPU boot or hotplug, the active_mm of the idle thread is &init_mm. The init_mm.pgd (swapper_pg_dir) is only meant for TTBR1_EL1 and must not be set in TTBR0_EL1. Since when active_mm == &init_mm the TTBR0_EL1 is already set to the reserved value, there is no need to perform any context reset. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org>	2015-06-12 15:36:18 +01:00
Marc Zyngier	8d883b23ae	arm64: alternative: Merge alternative-asm.h into alternative.h asm/alternative-asm.h and asm/alternative.h are extremely similar, and really deserve to live in the same file (as this makes further modufications a bit easier). Fold the content of alternative-asm.h into alternative.h, and update the few users. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-05 10:38:53 +01:00

... 3 4 5 6 7 ...

636 Commits