Commit Graph

122764 Commits

Author SHA1 Message Date
Bjorn Helgaas
3efc702378 Merge branch 'pci/resource' into next
* pci/resource:
  unicore32/PCI: Remove pci=firmware command line parameter handling
  ARM/PCI: Remove arch-specific pcibios_enable_device()
  ARM64/PCI: Remove arch-specific pcibios_enable_device()
  MIPS/PCI: Claim bus resources on PCI_PROBE_ONLY set-ups
  ARM/PCI: Claim bus resources on PCI_PROBE_ONLY set-ups
  PCI: generic: Claim bus resources on PCI_PROBE_ONLY set-ups
  PCI: Add generic pci_bus_claim_resources()
  alx: Use pci_(request|release)_mem_regions
  ethernet/intel: Use pci_(request|release)_mem_regions
  GenWQE: Use pci_(request|release)_mem_regions
  lpfc: Use pci_(request|release)_mem_regions
  NVMe: Use pci_(request|release)_mem_regions
  PCI: Add helpers to request/release memory and I/O regions
  PCI: Extending pci=resource_alignment to specify device/vendor IDs
  sparc/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  powerpc/pci: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  microblaze/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  PCI: Unify pci_resource_to_user() declarations
  microblaze/PCI: Remove useless __pci_mmap_set_pgprot()
  powerpc/pci: Remove __pci_mmap_set_pgprot()
  PCI: Ignore write combining when mapping I/O port space
2016-08-01 12:23:44 -05:00
Bjorn Helgaas
a00c74c166 Merge branches 'pci/aspm', 'pci/dpc', 'pci/hotplug', 'pci/misc', 'pci/msi', 'pci/pm' and 'pci/virtualization' into next
* pci/aspm:
  PCI/ASPM: Remove redundant check of pcie_set_clkpm

* pci/dpc:
  PCI: Remove DPC tristate module option
  PCI: Bind DPC to Root Ports as well as Downstream Ports
  PCI: Fix whitespace in struct dpc_dev
  PCI: Convert Downstream Port Containment driver to use devm_* functions

* pci/hotplug:
  PCI: Allow additional bus numbers for hotplug bridges

* pci/misc:
  PCI: Include <asm/dma.h> for isa_dma_bridge_buggy
  PCI: Make bus_attr_resource_alignment static
  MAINTAINERS: Add file patterns for PCI device tree bindings
  PCI: Fix comment typo

* pci/msi:
  PCI/MSI: irqchip: Fix PCI_MSI dependencies

* pci/pm:
  PCI: pciehp: Ignore interrupts during D3cold
  PCI: Document connection between pci_power_t and hardware PM capability
  PCI: Add runtime PM support for PCIe ports
  ACPI / hotplug / PCI: Runtime resume bridge before rescan
  PCI: Power on bridges before scanning new devices
  PCI: Put PCIe ports into D3 during suspend
  PCI: Don't clear d3cold_allowed for PCIe ports
  PCI / PM: Enforce type casting for pci_power_t

* pci/virtualization:
  PCI: Add ACS quirk for Solarflare SFC9220
  PCI: Add DMA alias quirk for Adaptec 3805
  PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset
  PCI: Add function 1 DMA alias quirk for Marvell 88SE9182
2016-08-01 12:23:31 -05:00
Bjorn Helgaas
ab2b750cad unicore32/PCI: Remove pci=firmware command line parameter handling
Remove support for the "pci=firmware" command line parameter, which was
way to keep the kernel from changing any PCI BAR assignments.  This was
copied from ARM, but is not actually needed on unicore32.

The corresponding ARM support was removed by 903589ca71 ("ARM: 8554/1:
kernel: pci: remove pci=firmware command line parameter handling").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-23 17:15:33 -05:00
Lorenzo Pieralisi
313cb90285 ARM/PCI: Remove arch-specific pcibios_enable_device()
On systems with PCI_PROBE_ONLY set, we rely on BAR assignments from
firmware.  Previously we did not insert those resources into the resource
tree, so we had to skip pci_enable_resources() because it fails if
resources are not in the resource tree.

Now that we *do* insert resources even when PCI_PROBE_ONLY is set, we no
longer need the ARM-specific pcibios_enable_device().  Remove it so we
use the generic version.

[bhelgaas: changelog]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Russell King <linux@arm.linux.org.uk>
2016-06-23 17:15:32 -05:00
Lorenzo Pieralisi
f615bca4cc ARM64/PCI: Remove arch-specific pcibios_enable_device()
On systems with PCI_PROBE_ONLY set, we rely on BAR assignments from
firmware.  Previously we did not insert those resources into the resource
tree, so we had to skip pci_enable_resources() because it fails if
resources are not in the resource tree.

Now that we *do* insert resources even when PCI_PROBE_ONLY is set, we no
longer need the ARM64-specific pcibios_enable_device().  Remove it so we
use the generic version.

[bhelgaas: changelog]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Catalin Marinas <catalin.marinas@arm.com>
2016-06-23 17:15:30 -05:00
Bjorn Helgaas
046136170a MIPS/PCI: Claim bus resources on PCI_PROBE_ONLY set-ups
We claim PCI BAR and bridge window resources in pci_bus_assign_resources(),
but when PCI_PROBE_ONLY is set, we treat those resources as immutable and
don't call pci_bus_assign_resources(), so the resources aren't put in the
resource tree.

When the resources aren't in the tree, they don't show up in /proc/iomem,
we can't detect conflicts, and we need special cases elsewhere for
PCI_PROBE_ONLY or resources without a parent pointer.

Claim all PCI BAR and window resources in the PCI_PROBE_ONLY case.

If a PCI_PROBE_ONLY platform assigns conflicting resources, Linux can't fix
the conflicts.  Previously we didn't notice the conflicts, but now we will,
which may expose new failures.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-23 16:32:20 -05:00
Lorenzo Pieralisi
b30742aa30 ARM/PCI: Claim bus resources on PCI_PROBE_ONLY set-ups
We claim PCI BAR and bridge window resources in pci_bus_assign_resources(),
but when PCI_PROBE_ONLY is set, we treat those resources as immutable and
don't call pci_bus_assign_resources(), so the resources aren't put in the
resource tree.

When the resources aren't in the tree, they don't show up in /proc/iomem,
we can't detect conflicts, and we need special cases elsewhere for
PCI_PROBE_ONLY or resources without a parent pointer.

Claim all PCI BAR and window resources in the PCI_PROBE_ONLY case.

If a PCI_PROBE_ONLY platform assigns conflicting resources, Linux can't fix
the conflicts.  Previously we didn't notice the conflicts, but now we will,
which may expose new failures.

[bhelgaas: changelog, add resource comment, remove size/assign comments]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Russell King <linux@armlinux.org.uk>
2016-06-23 16:29:44 -05:00
Bjorn Helgaas
3b146b24a4 sparc/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files.  On sparc, these are PCI bus addresses, i.e., raw BAR values.

Previously pci_resource_to_user() computed the user address by
subtracting either pbm->io_space.start or pbm->mem_space.start from the
resource start.

We've already told the PCI core about those offsets here:

  pci_scan_one_pbm()
    pci_add_resource_offset(&resources, &pbm->io_space, pbm->io_space.start);
    pci_add_resource_offset(&resources, &pbm->mem_space, pbm->mem_space.start);
    pci_add_resource_offset(&resources, &pbm->mem64_space, pbm->mem_space.start);

so pcibios_resource_to_bus() knows how to do that translation.

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2016-06-17 14:45:03 -05:00
Bjorn Helgaas
3830135857 powerpc/pci: Implement pci_resource_to_user() with pcibios_resource_to_bus()
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files.  For I/O port resources on powerpc, these are PCI bus addresses,
i.e., raw BAR values.

Previously pci_resource_to_user() computed the user address by subtracting
"hose->io_base_virt - _IO_BASE" from the resource start:

  pci_resource_to_user()
    if (IO)
      offset = (unsigned long)hose->io_base_virt - _IO_BASE;
    *start = rsrc->start - offset;

We've already told the PCI core about that "hose->io_base_virt - _IO_BASE"
offset:

  pcibios_setup_phb_resources()
    res = &hose->io_resource;
    offset = pcibios_io_space_offset();
    /* i.e., "offset = hose->io_base_virt - _IO_BASE" */
    pci_add_resource_offset(resources, res, offset);

so pcibios_resource_to_bus() knows how to do that translation.

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2016-06-17 14:44:44 -05:00
Bjorn Helgaas
0ad8f06d58 microblaze/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files.  For I/O port resources on microblaze, these are PCI bus addresses,
i.e., raw BAR values.

Previously pci_resource_to_user() computed the user address by subtracting
"hose->io_base_virt - _IO_BASE" from the resource start:

  pci_resource_to_user()
    if (IO)
      offset = (unsigned long)hose->io_base_virt - _IO_BASE;
    *start = rsrc->start - offset;

We've already told the PCI core about that "hose->io_base_virt - _IO_BASE"
offset:

  pcibios_setup_phb_resources()
    res = &hose->io_resource;
    pci_add_resource_offset(resources, res, hose->io_base_virt - _IO_BASE);

so pcibios_resource_to_bus() knows how to do that translation.

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2016-06-17 14:43:34 -05:00
Bjorn Helgaas
8221a01352 PCI: Unify pci_resource_to_user() declarations
Replace the pci_resource_to_user() declarations in each arch that defines
HAVE_ARCH_PCI_RESOURCE_TO_USER with a single one in linux/pci.h.

Change the MIPS static inline implementation to a non-inline version so the
static inline doesn't conflict with the new non-static linux/pci.h
declaration.

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-17 14:43:34 -05:00
Bjorn Helgaas
c444a2be22 microblaze/PCI: Remove useless __pci_mmap_set_pgprot()
The microblaze __pci_mmap_set_pgprot() was apparently copied from powerpc,
where it computes either an uncacheable pgprot_t or a write-combining one.
But on microblaze, we always use the regular uncacheable pgprot_t.

Remove the useless code in __pci_mmap_set_pgprot() and inline it at the
only call site.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2016-06-17 14:43:33 -05:00
Yinghai Lu
1e70cdd6e6 powerpc/pci: Remove __pci_mmap_set_pgprot()
The powerpc-specific __pci_mmap_set_pgprot() does two things:

  1) Disables write combining for I/O port space mappings

     This only affects procfs mappings.  The pci_mmap_resource() sysfs path
     only requests write combining for resources with IORESOURCE_PREFETCH
     set, which doesn't include I/O resources.

     The only way to request write combining for I/O port space mappings
     was via the PCIIOC_WRITE_COMBINE ioctl and the proc_bus_pci_mmap()
     path, and we recently changed that path to ignore write combining for
     I/O, so this code in powerpc is no longer needed.

  2) Automatically enables write combining for mappings of prefetchable
     resources, even if not requested by the user

     Both procfs (via PCIIOC_MMAP_IS_MEM and PCIIOC_WRITE_COMBINE ioctls)
     and sysfs (via "resourceN_wc" files, which are created for resources
     with IORESOURCE_PREFETCH) provide ways for the user to map PCI memory
     space with write combining.

     Users that desire write combining should use one of those ways instead
     of relying on powerpc-specific behavior.

Remove the powerpc-specific __pci_mmap_set_pgprot().

The user-visible effect of this change is that powerpc users mapping
prefetchable PCI memory space via procfs without PCIIOC_WRITE_COMBINE or
via sysfs "resourceN" (not "resourceN_wc") will get regular uncacheable
mappings instead of the write combining mappings they used to get.

The new behavior matches the behavior on all other arches that support
write combining mapping.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-17 14:43:33 -05:00
Arnd Bergmann
3ee803641e PCI/MSI: irqchip: Fix PCI_MSI dependencies
The PCI_MSI symbol is used inconsistently throughout the tree, with some
drivers using 'select' and others using 'depends on', or using conditional
selects.  This keeps causing problems; the latest one is a result of
ARCH_ALPINE using a 'select' statement to enable its platform-specific MSI
driver without enabling MSI:

  warning: (ARCH_ALPINE) selects ALPINE_MSI which has unmet direct dependencies (PCI && PCI_MSI)
  drivers/irqchip/irq-alpine-msi.c:104:15: error: variable 'alpine_msix_domain_info' has initializer but incomplete type
   static struct msi_domain_info alpine_msix_domain_info = {
		 ^~~~~~~~~~~~~~~
  drivers/irqchip/irq-alpine-msi.c:105:2: error: unknown field 'flags' specified in initializer
    .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
    ^
  drivers/irqchip/irq-alpine-msi.c:105:11: error: 'MSI_FLAG_USE_DEF_DOM_OPS' undeclared here (not in a function)
    .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
	     ^~~~~~~~~~~~~~~~~~~~~~~~

There is little reason to enable PCI support for a platform that uses MSI
but then leave MSI disabled at compile time.

Select PCI_MSI from irqchips that implement MSI, and make PCI host bridges
that use MSI on ARM depend on PCI_MSI_IRQ_DOMAIN.

For all three architectures that support PCI_MSI_IRQ_DOMAIN (ARM, ARM64,
X86), enable it by default whenever MSI is enabled.

[bhelgaas: changelog, omit crypto config change]
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-15 15:47:33 -05:00
Andrea Gelmini
386ed2ab85 PCI: Fix comment typo
Fix typo.

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10 19:05:09 -05:00
Tomasz Nowicki
0cb0786bac ARM64: PCI: Support ACPI-based PCI host controller
Implement pci_acpi_scan_root() and other arch-specific calls so ARM64 can
use ACPI to setup and enumerate PCI buses.

Use memory-mapped configuration space information from either the ACPI
_CBA method or the MCFG table and the ECAM library and generic ECAM config
accessor ops.

Implement acpi_pci_bus_find_domain_nr() to retrieve the domain number from
the acpi_pci_root structure.

Implement pcibios_add_bus() and pcibios_remove_bus() to call
acpi_pci_add_bus() and acpi_pci_remove_bus() for ACPI slot management and
other configuration.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10 18:37:25 -05:00
Tomasz Nowicki
f058f4fbd6 ARM64: PCI: Implement AML accessors for PCI_Config region
On ACPI systems, the PCI_Config OperationRegion allows AML to access PCI
configuration space.  The ACPI CA AML interpreter uses performs config
space accesses with acpi_os_read_pci_configuration() and
acpi_os_write_pci_configuration(), which are OS-dependent functions
supplied by acpi/osl.c.

Implement the arch-specific raw_pci_read() and raw_pci_write() interfaces
used by acpi/osl.c for PCI_Config accesses.

N.B. PCI_Config accesses are not supported before PCI bus enumeration.

[bhelgaas: changelog]
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10 18:36:19 -05:00
Tomasz Nowicki
d8ed75d593 ARM64: PCI: ACPI support for legacy IRQs parsing and consolidation with DT code
To enable PCI legacy IRQs on platforms booting with ACPI, arch code should
include ACPI-specific callbacks that parse and set-up the device IRQ
number, equivalent to the DT boot path. Owing to the current ACPI core scan
handlers implementation, ACPI PCI legacy IRQs bindings cannot be parsed at
device add time, since that would trigger ACPI scan handlers ordering
issues depending on how the ACPI tables are defined.

To solve this problem and consolidate FW PCI legacy IRQs parsing in one
single pcibios callback (pending final removal), this patch moves DT PCI
IRQ parsing to the pcibios_alloc_irq() callback (called by PCI core code at
driver probe time) and adds ACPI PCI legacy IRQs parsing to the same
callback too, so that FW PCI legacy IRQs parsing is confined in one single
arch callback that can be easily removed when code parsing PCI legacy IRQs
is consolidated and moved to core PCI code.

Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10 18:29:46 -05:00
Tomasz Nowicki
2ab51ddeca ARM64: PCI: Add acpi_pci_bus_find_domain_nr()
Extend pci_bus_find_domain_nr() so it can find the domain from either:

  - ACPI, via the new acpi_pci_bus_find_domain_nr() interface, or
  - DT, via of_pci_bus_find_domain_nr()

Note that this is only used for CONFIG_PCI_DOMAINS_GENERIC=y, so it does
not affect x86 or ia64.

[bhelgaas: changelog]
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10 18:28:39 -05:00
Helge Deller
58f1c654d1 parisc: Move die_if_kernel() prototype into traps.h header
Signed-off-by: Helge Deller <deller@gmx.de>
2016-06-05 08:49:01 +02:00
Helge Deller
8b78f26088 parisc: Fix pagefault crash in unaligned __get_user() call
One of the debian buildd servers had this crash in the syslog without
any other information:

 Unaligned handler failed, ret = -2
 clock_adjtime (pid 22578): Unaligned data reference (code 28)
 CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G  E  4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
 task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
 PSW: 00001000000001001111100000001111 Tainted: G            E
 r00-03  000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
 r04-07  00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
 r08-11  0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
 r12-15  000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
 r16-19  0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
 r20-23  0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
 r24-27  0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
 r28-31  0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
 sr00-03  0000000001200000 0000000001200000 0000000000000000 0000000001200000
 sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
  IIR: 0ca0d089    ISR: 0000000001200000  IOR: 00000000fa6f7fff
  CPU:        1   CR30: 00000001bde7c000 CR31: ffffffffffffffff
  ORIG_R28: 00000002369fe628
  IAOQ[0]: compat_get_timex+0x2dc/0x3c0
  IAOQ[1]: compat_get_timex+0x2e0/0x3c0
  RP(r2): compat_get_timex+0x40/0x3c0
 Backtrace:
  [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0
  [<0000000040205024>] syscall_exit+0x0/0x14

This means the userspace program clock_adjtime called the clock_adjtime()
syscall and then crashed inside the compat_get_timex() function.
Syscalls should never crash programs, but instead return EFAULT.

The IIR register contains the executed instruction, which disassebles
into "ldw 0(sr3,r5),r9".
This load-word instruction is part of __get_user() which tried to read the word
at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in.  The
unaligned handler is able to emulate all ldw instructions, but it fails if it
fails to read the source e.g. because of page fault.

The following program reproduces the problem:

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>

int main(void) {
        /* allocate 8k */
        char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        /* free second half (upper 4k) and make it invalid. */
        munmap(ptr+4096, 4096);
        /* syscall where first int is unaligned and clobbers into invalid memory region */
        /* syscall should return EFAULT */
        return syscall(__NR_clock_adjtime, 0, ptr+4095);
}

To fix this issue we simply need to check if the faulting instruction address
is in the exception fixup table when the unaligned handler failed. If it
is, call the fixup routine instead of crashing.

While looking at the unaligned handler I found another issue as well: The
target register should not be modified if the handler was unsuccessful.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
2016-06-05 08:48:24 +02:00
Helge Deller
0032c08833 parisc: Fix printk time during boot
Avoid showing invalid printk time stamps during boot.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Aaro Koskinen <aaro.koskinen@iki.fi>
2016-06-05 08:45:09 +02:00
Mikulas Patocka
be24a89700 parisc: Fix backtrace on PA-RISC
This patch fixes backtrace on PA-RISC

There were several problems:

1) The code that decodes instructions handles instructions that subtract
from the stack pointer incorrectly. If the instruction subtracts the
number X from the stack pointer the code increases the frame size by
(0x100000000-X).  This results in invalid accesses to memory and
recursive page faults.

2) Because gcc reorders blocks, handling instructions that subtract from
the frame pointer is incorrect. For example, this function
	int f(int a)
	{
		if (__builtin_expect(a, 1))
			return a;
		g();
		return a;
	}
is compiled in such a way, that the code that decreases the stack
pointer for the first "return a" is placed before the code for "g" call.
If we recognize this decrement, we mistakenly believe that the frame
size for the "g" call is zero.

To fix problems 1) and 2), the patch doesn't recognize instructions that
decrease the stack pointer at all. To further safeguard the unwind code
against nonsense values, we don't allow frame size larger than
Total_frame_size.

3) The backtrace is not locked. If stack dump races with module unload,
invalid table can be accessed.

This patch adds a spinlock when processing module tables.

Note, that for correct backtrace, you need recent binutils.
Binutils 2.18 from Debian 5 produce garbage unwind tables.
Binutils 2.21 work better (it sometimes forgets function frames, but at
least it doesn't generate garbage).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2016-06-04 22:05:07 +02:00
Linus Torvalds
8c52b6dcdd Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 - a few simple fixes for fallout from the recent gic-v3 changes
 - a workaround for a Cavium thunderX erratum
 - a bugfix for the pic32 irqchip to make external interrupts work proper
 - a missing return value in the generic IPI management code

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-pic32-evic: Fix bug with external interrupts.
  irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
  irqchip/gic-v3: Fix quiescence check in gic_enable_redist
  irqchip/gic-v3: Fix copy+paste mistakes in defines
  irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
  genirq: Fix missing return value in irq_destroy_ipi()
2016-06-03 16:12:35 -07:00
Linus Torvalds
e603330c86 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "Just one fix to the ptrace code, spotted by Simon Marchi, where if a
  thread migrates to a different CPU and the VFP registers are changed
  through ptrace, the application doesn't see the updated VFP registers"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: fix PTRACE_SETVFPREGS on SMP systems
2016-06-03 14:39:29 -07:00
Linus Torvalds
d29e472301 arm64 fixes:
- Revert a previous revert and get hugetlb going with contiguous hints
 - Wire up missing compat syscalls
 - Enable CONFIG_SET_MODULE_RONX by default
 - Add missing line to our compat /proc/cpuinfo output
 - Clarify levels in our page table dumps
 - Fix booting with RANDOMIZE_TEXT_OFFSET enabled
 - Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
 - Remove some dead code and update a comment
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJXUZzdAAoJELescNyEwWM0Fm4H/215TJBXcohdIpTYX/Nr3+Bh
 5jOqMgEWKe6ipmXCnzWpZ1F1GHoxLxndSWhJbi1Gwpdc+EYdXWxT5Z/IsQMKZXOb
 n81RzJbit6hXlebZ99YbEjfokfpMATno/9pD8sr1F2Xg/LsrD00O/0aPk9Fp38vX
 VRK2TlaVdBQbF80rMI6fUEEVksbJBqS+jBlqwkRri9JPusP+F5F4Qg8f4VqXDSef
 ZDEBzaI1BjrIQhDGSf0baXQ9EvuyotT2VR1NKAbYzdK8JoLTYiacIDviU2TCsuYZ
 ujB3MTOUytxAJbFhC2KBiDJfukSNY8vg+fLFg7rM3qUj6TMw6ESe1goYVScljHY=
 =cAlD
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The main thing here is reviving hugetlb support using contiguous ptes,
  which we ended up reverting at the last minute in 4.5 pending a fix
  which went into the core mm/ code during the recent merge window.

   - Revert a previous revert and get hugetlb going with contiguous hints
   - Wire up missing compat syscalls
   - Enable CONFIG_SET_MODULE_RONX by default
   - Add missing line to our compat /proc/cpuinfo output
   - Clarify levels in our page table dumps
   - Fix booting with RANDOMIZE_TEXT_OFFSET enabled
   - Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
   - Remove some dead code and update a comment"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
  arm64: move {PAGE,CONT}_SHIFT into Kconfig
  arm64: mm: dump: log span level
  arm64: update stale PAGE_OFFSET comment
  drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error
  drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu
  drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg
  arm64: report CPU number in bad_mode
  arm64: unistd32.h: wire up missing syscalls for compat tasks
  arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
  arm64: enable CONFIG_SET_MODULE_RONX by default
  arm64: Remove orphaned __addr_ok() definition
  Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"
2016-06-03 14:29:47 -07:00
Linus Torvalds
5306d766f1 powerpc fixes for 4.7
- Handle RTAS delay requests in configure_bridge from Russell Currey
  - Refactor the configure_bridge RTAS tokens from Russell Currey
  - Fix definition of SIAR and SDAR registers from Thomas Huth
  - Use privileged SPR number for MMCR2 from Thomas Huth
  - Update LPCR only if it is powernv from Aneesh Kumar K.V
  - Fix the reference bit update when handling hash fault from Aneesh Kumar K.V
  - Add missing tlb flush from Aneesh Kumar K.V
  - Add POWER8NVL support to ibm,client-architecture-support call from Thomas Huth
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXUWmzAAoJEFHr6jzI4aWAuEAQAKAWjT4JCIQMzc2Qk0LbBFhD
 Mw3xIt618g0PLRLN431rqk9xE1phh9ecSNt40kQV0UYypCwQyYaOzAAHq7QibRFh
 wFDLyrO10aiqTVBI9Dl7l6ZDespGP4ufOyj4GF3vIndTQ8CLT6+Jj/cfaT8lNIna
 vOp0Q8RKoSuqJu7wOZwxGHc9vhdpDftBOi2Yl2KHWrWUMTVtyDzgRwmFiCk/9SvX
 +xYVGfbCRkU1b05X9RgbsbMdiAzC7odzAC8e4VNHqB+YHgBFenlOKcx6uuz5Oubv
 /0dGJq1OEKxKgkEWZsXMhfcezI1i6H1qa5gwqO2pJn/mQiWUx/IxxC5j1tGMcgXJ
 tU4cJtMccUmB22bOTqJsGhQJyHh09UDql2PIAHdvTZplYBj34R+2fr6LMRC5TUyY
 yXsk/26CAh1ihPK8juytK8D2f7CfFJVe96HXMz8yGWNy2MeaJHh3Nz7E2/hA6L6B
 GzT08EBGmGU/1/L4xJi1uuRLQXMgvWkuo0C65m9jqFah7y82gHetypTBSbLOBcUX
 +pJRcWouZlksbEHydMlbvNrqmJz1zPx9JJFJJrfDzkYUB/O61NxOpaQPZHfhMt01
 Bd8n0wt7OPW9bZL9WrW5W11OhOxNZfuxcr5GmVnzsaALPazFThQm1gZH2NOp6Wr/
 9aWy/lZAuDOrtOnmfcZm
 =OkUs
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - Handle RTAS delay requests in configure_bridge from Russell Currey
 - Refactor the configure_bridge RTAS tokens from Russell Currey
 - Fix definition of SIAR and SDAR registers from Thomas Huth
 - Use privileged SPR number for MMCR2 from Thomas Huth
 - Update LPCR only if it is powernv from Aneesh Kumar K.V
 - Fix the reference bit update when handling hash fault from Aneesh
   Kumar K.V
 - Add missing tlb flush from Aneesh Kumar K.V
 - Add POWER8NVL support to ibm,client-architecture-support call from
   Thomas Huth

* tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
  powerpc/mm/radix: Add missing tlb flush
  powerpc/mm/hash: Fix the reference bit update when handling hash fault
  powerpc/mm/radix: Update LPCR only if it is powernv
  powerpc: Use privileged SPR number for MMCR2
  powerpc: Fix definition of SIAR and SDAR registers
  powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokens
  powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
2016-06-03 14:20:22 -07:00
Mark Rutland
aed7eb8367 arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the
following issue on the boot:

kernel BUG at arch/arm64/mm/mmu.c:480!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310
Hardware name: ARM Juno development board (r2) (DT)
task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000
PC is at map_kernel_segment+0x44/0xb0
LR is at paging_init+0x84/0x5b0
pc : [<ffff000008c450b4>] lr : [<ffff000008c451a4>] pstate: 600002c5

Call trace:
[<ffff000008c450b4>] map_kernel_segment+0x44/0xb0
[<ffff000008c451a4>] paging_init+0x84/0x5b0
[<ffff000008c42728>] setup_arch+0x198/0x534
[<ffff000008c40848>] start_kernel+0x70/0x388
[<ffff000008c401bc>] __primary_switched+0x30/0x74

Commit 7eb90f2ff7 ("arm64: cover the .head.text section in the .text
segment mapping") removed the alignment between the .head.text and .text
sections, and used the _text rather than the _stext interval for mapping
the .text segment.

Prior to this commit _stext was always section aligned and didn't cause
any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that
alignment has been removed and _text is used to map the .text segment,
we need ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET
is enabled.

This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset
is always aligned to the kernel page size. To ensure this, we rely on
the PAGE_SHIFT being available via Kconfig.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 7eb90f2ff7 ("arm64: cover the .head.text section in the .text segment mapping")
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03 10:57:18 +01:00
Mark Rutland
030c4d2444 arm64: move {PAGE,CONT}_SHIFT into Kconfig
In some cases (e.g. the awk for CONFIG_RANDOMIZE_TEXT_OFFSET) we would
like to make use of PAGE_SHIFT outside of code that can include the
usual header files.

Add a new CONFIG_ARM64_PAGE_SHIFT for this, likewise with
ARM64_CONT_SHIFT for consistency.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03 10:57:18 +01:00
Mark Rutland
48dd73c55d arm64: mm: dump: log span level
The page table dump code logs spans of entries at the same level
(pgd/pud/pmd/pte) which have the same attributes. While we log the
(decoded) attributes, we don't log the level, which leaves the output
ambiguous and/or confusing in some cases.

For example:

0xffff800800000000-0xffff800980000000           6G       RW NX SHD AF        BLK UXN MEM/NORMAL

If using 4K pages, this may describe a span of 6 1G block entries at the
PGD/PUD level, or 3072 2M block entries at the PMD level.

This patch adds the page table level to each output line, removing this
ambiguity. For the example above, this will produce:

0xffffffc800000000-0xffffffc980000000           6G PUD       RW NX SHD AF        BLK UXN MEM/NORMAL

When 3 level tables are in use, and we use the asm-generic/nopud.h
definitions, the dump code treats each entry in the PGD as a 1 element
table at the PUD level, and logs spans as being PUDs, which can be
confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD"
when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when
CONFIG_PGTABLE_LEVELS <= 2.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Huang Shijie <shijie.huang@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03 10:16:22 +01:00
Mark Rutland
a13e3a5b54 arm64: update stale PAGE_OFFSET comment
Commit ab893fb9f1 ("arm64: introduce KIMAGE_VADDR as the virtual
base of the kernel region") logically split KIMAGE_VADDR from
PAGE_OFFSET, and since commit f9040773b7 ("arm64: move kernel
image to base of vmalloc area") the two have been distinct values.

Unfortunately, neither commit updated the comment above these
definitions, which now erroneously states that PAGE_OFFSET is the start
of the kernel image rather than the start of the linear mapping.

This patch fixes said comment, and introduces an explanation of
KIMAGE_VADDR.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03 10:16:21 +01:00
Mark Rutland
8051f4d16e arm64: report CPU number in bad_mode
If we take an exception we don't expect (e.g. SError), we report this in
the bad_mode handler with pr_crit. Depending on the configured log
level, we may or may not log additional information in functions called
subsequently. Notably, the messages in dump_stack (including the CPU
number) are printed with KERN_DEFAULT and may not appear.

Some exceptions have an IMPLEMENTATION DEFINED ESR_ELx.ISS encoding, and
knowing the CPU number is crucial to correctly decode them. To ensure
that this is always possible, we should log the CPU number along with
the ESR_ELx value, so we are not reliant on subsequent logs or
additional printk configuration options.

This patch logs the CPU number in bad_mode such that it is possible for
a developer to decode these exceptions, provided access to sufficient
documentation.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Al Grant <Al.Grant@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03 10:16:20 +01:00
Linus Torvalds
4340fa5529 ARM:
- two fixes for 4.6 vgic [Christoffer]
    (cc stable)
 
  - six fixes for 4.7 vgic [Marc]
 
 x86:
  - six fixes from syzkaller reports [Paolo]
    (two of them cc stable)
 
  - allow OS X to boot [Dmitry]
 
  - don't trust compilers [Nadav]
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJXUI6+AAoJEED/6hsPKofof70H/iL+tri1Mu9q+8+EtANPp9i1
 XWjjAQCGszq27H/A1Io+igszW/ghhp0OoVYSB8wSUBubTSjjyba90dWe41UpBywt
 r2W41xbGUBRdC6aK5rZ0+Zw2MIv5+ubSsbDKqfY8IsWQzWln+NHo+q5UU8+NfdZ7
 Nu7+lRhXJfJWy31k/FeDp3FeZXPZk9b9iZBafmnNcg/i1JMjrrVK8nSBXOSUvZ22
 Dy/b3ln+8emv4lvOX0jfostQH0HkfulpBmJouZ5TadoegBtQEDEKpkglxSyUHAcT
 TmMhRwRyx4K6i5Bp3zAJ0LtjEks9/ayHVmerMJ+LH1MBaJqNmvxUzSPXvKGA8JE=
 =PCWN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:
   - two fixes for 4.6 vgic [Christoffer] (cc stable)

   - six fixes for 4.7 vgic [Marc]

  x86:
   - six fixes from syzkaller reports [Paolo] (two of them cc stable)

   - allow OS X to boot [Dmitry]

   - don't trust compilers [Nadav]"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
  KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
  KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
  KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
  KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
  kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
  KVM: Handle MSR_IA32_PERF_CTL
  KVM: x86: avoid write-tearing of TDP
  KVM: arm/arm64: vgic-new: Removel harmful BUG_ON
  arm64: KVM: vgic-v3: Relax synchronization when SRE==1
  arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1
  arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value
  KVM: arm/arm64: vgic-v3: Always resample level interrupts
  KVM: arm/arm64: vgic-v2: Always resample level interrupts
  KVM: arm/arm64: vgic-v3: Clear all dirty LRs
  KVM: arm/arm64: vgic-v2: Clear all dirty LRs
2016-06-02 15:08:06 -07:00
Ganapatrao Kulkarni
fbf8f40e16 irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
The erratum fixes the hang of ITS SYNC command by avoiding inter node
io and collections/cpu mapping on thunderx dual-socket platform.

This fix is only applicable for Cavium's ThunderX dual-socket platform.

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-02 18:01:07 +01:00
Paolo Bonzini
d14bdb553f KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
time, and the next KVM_RUN oopses:

   general protection fault: 0000 [#1] SMP
   CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
   Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
   [...]
   Call Trace:
    [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
    [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
    [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
    [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
    [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
   RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
    RSP <ffff88005836bd50>

Testcase (beautified/reduced from syzkaller output):

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[8];

    int main()
    {
        struct kvm_debugregs dr = { 0 };

        r[2] = open("/dev/kvm", O_RDONLY);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);

        memcpy(&dr,
               "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
               "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
               "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
               "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
               48);
        r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
        r[6] = ioctl(r[4], KVM_RUN, 0);
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02 17:38:50 +02:00
Paolo Bonzini
78e546c824 KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return
EINVAL.  It causes a WARN from exception_type:

    WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]()
    CPU: 3 PID: 16732 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
    Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
     0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e
     0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2
     ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001
    Call Trace:
     [<ffffffff813b542e>] dump_stack+0x63/0x85
     [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
     [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
     [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm]
     [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm]
     [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
     [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
     [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
     [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
    ---[ end trace b1a0391266848f50 ]---

Testcase (beautified/reduced from syzkaller output):

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    long r[31];

    int main()
    {
        memset(r, -1, sizeof(r));
        r[2] = open("/dev/kvm", O_RDONLY);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0);

        struct kvm_vcpu_events ve = {
                .exception.injected = 1,
                .exception.nr = 0xd4
        };
        r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve);
        r[30] = ioctl(r[7], KVM_RUN, 0);
        return 0;
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02 17:38:50 +02:00
Paolo Bonzini
83676e9238 KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
This causes an ugly dmesg splat.  Beautified syzkaller testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <linux/kvm.h>

    long r[8];

    int main()
    {
        struct kvm_cpuid2 c = { 0 };
        r[2] = open("/dev/kvm", O_RDWR);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_CREATE_VCPU, 0x8);
        r[7] = ioctl(r[4], KVM_SET_CPUID, &c);
        return 0;
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02 17:38:50 +02:00
Paolo Bonzini
b21629da12 kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
Found by syzkaller:

    WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]()
    CPU: 3 PID: 15175 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
    Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
     0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e
     0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2
     00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000
    Call Trace:
     [<ffffffff813b542e>] dump_stack+0x63/0x85
     [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
     [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
     [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm]
     [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm]
     [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel]
     [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm]
     [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm]
     [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
     [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
     [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71

Testcase:

    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <string.h>
    #include <linux/kvm.h>

    long r[8];

    int main()
    {
        memset(r, -1, sizeof(r));
	r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
        r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
        r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
        return 0;
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02 17:38:50 +02:00
Dmitry Bilunov
0c2df2a1af KVM: Handle MSR_IA32_PERF_CTL
Intel CPUs having Turbo Boost feature implement an MSR to provide a
control interface via rdmsr/wrmsr instructions. One could detect the
presence of this feature by issuing one of these instructions and
handling the #GP exception which is generated in case the referenced MSR
is not implemented by the CPU.

KVM's vCPU model behaves exactly as a real CPU in this case by injecting
a fault when MSR_IA32_PERF_CTL is called (which KVM does not support).
However, some operating systems use this register during an early boot
stage in which their kernel is not capable of handling #GP correctly,
causing #DP and finally a triple fault effectively resetting the vCPU.

This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the
crashes.

Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02 17:38:50 +02:00
Nadav Amit
b19ee2ff3b KVM: x86: avoid write-tearing of TDP
In theory, nothing prevents the compiler from write-tearing PTEs, or
split PTE writes. These partially-modified PTEs can be fetched by other
cores and cause mayhem. I have not really encountered such case in
real-life, but it does seem possible.

For example, the compiler may try to do something creative for
kvm_set_pte_rmapp() and perform multiple writes to the PTE.

Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02 17:38:50 +02:00
Russell King
e2dfb4b880 ARM: fix PTRACE_SETVFPREGS on SMP systems
PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
reloaded, because it undoes one of the effects of vfp_flush_hwstate().

Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
an invalid CPU number, but vfp_set() overwrites this with the original
CPU number, thereby rendering the hardware state as apparently "valid",
even though the software state is more recent.

Fix this by reverting the previous change.

Cc: <stable@vger.kernel.org>
Fixes: 8130b9d7b9 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Simon Marchi <simon.marchi@ericsson.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2016-06-02 14:18:56 +01:00
Will Deacon
10fdf8513f arm64: unistd32.h: wire up missing syscalls for compat tasks
We're missing entries for mlock2, copy_file_range, preadv2 and pwritev2
in our compat syscall table, so hook them up. Only the last two need
compat wrappers.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-01 18:48:20 +01:00
Linus Torvalds
58c1f9950f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
 "sparc64 mmu context allocation and trap return bug fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix return from trap window fill crashes.
  sparc: Harden signal return frame checks.
  sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
2016-05-31 22:20:56 -07:00
Thomas Huth
7cc851039d powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
If we do not provide the PVR for POWER8NVL, a guest on this system
currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU
does not provide a generic PowerISA 2.07 mode yet. So some new
instructions from POWER8 (like "mtvsrd") get disabled for the guest,
resulting in crashes when using code compiled explicitly for
POWER8 (e.g. with the "-mcpu=power8" option of GCC).

Fixes: ddee09c099 ("powerpc: Add PVR for POWER8NVL processor")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01 13:47:34 +10:00
Aneesh Kumar K.V
157d4d0620 powerpc/mm/radix: Add missing tlb flush
This should not have any impact on hash, because hash does tlb
invalidate with every pte update and we don't implement
flush_tlb_* functions for hash. With radix we should make an explicit
call to flush tlb outside pte update.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01 13:47:34 +10:00
Aneesh Kumar K.V
dc47c0c1f8 powerpc/mm/hash: Fix the reference bit update when handling hash fault
When we converted the asm routines to C functions, we missed updating
HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower
bits from pte via.

andi.	r3,r30,0x1fe		/* Get basic set of flags */

We also update the code such that we won't update the Change bit ('C'
bit) always. This was added by commit c5cf0e30bf ("powerpc: Fix
buglet with MMU hash management").

With hash64, we need to make sure that hardware doesn't do a pte update
directly. This is because we do end up with entries in TLB with no hash
page table entry. This happens because when we find a hash bucket full,
we "evict" a more/less random entry from it. When we do that we don't
invalidate the TLB (hpte_remove) because we assume the old translation
is still technically "valid". For more info look at commit
0608d692463("powerpc/mm: Always invalidate tlb on hpte invalidate and
update").

Thus it's critical that valid hash PTEs always have reference bit set
and writeable ones have change bit set. We do this by hashing a
non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and
thus R) when hashing anything else in. Any attempt by Linux at clearing
those bits also removes the corresponding hash entry.

Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always.
We don't really need to do that because we never map a RW pte entry
without setting 'C' bit. On READ fault on a RW pte entry, we still map
it READ only, hence a store update in the page will still cause a hash
pte fault.

This patch reverts the part of commit c5cf0e30bf ("[PATCH] powerpc:
Fix buglet with MMU hash management") and retain the updatepp part.

- If we hit the updatepp path on native, the old code without that
  commit, would fail to set C bcause native_hpte_updatepp()
  was implemented to filter the same bits as H_PROTECT and not let C
  through thus we would "upgrade" a RO HPTE to RW without setting C
  thus causing the bug. So the real fix in that commit was the change
  to native_hpte_updatepp

Fixes: 89ff725051 ("powerpc/mm: Convert __hash_page_64K to C")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01 13:47:34 +10:00
Aneesh Kumar K.V
d6c886006c powerpc/mm/radix: Update LPCR only if it is powernv
LPCR cannot be updated when running in guest mode.

Fixes: 2bfd65e45e ("powerpc/mm/radix: Add radix callbacks for early init routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01 13:47:34 +10:00
Catalin Marinas
e47b020a32 arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with
the 32-bit ARM one by providing an additional line:

model name      : ARMv8 Processor rev X (v8l)

Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-31 17:50:30 +01:00
Linus Torvalds
367d3fd505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Three bugs fixes and an update for the default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix info leak in do_sigsegv
  s390/config: update default configuration
  s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
  s390/bpf: reduce maximum program size to 64 KB
2016-05-31 09:43:24 -07:00
Marc Zyngier
c585132840 arm64: KVM: vgic-v3: Relax synchronization when SRE==1
The GICv3 backend of the vgic is quite barrier heavy, in order
to ensure synchronization of the system registers and the
memory mapped view for a potential GICv2 guest.

But when the guest is using a GICv3 model, there is absolutely
no need to execute all these heavy barriers, and it is actually
beneficial to avoid them altogether.

This patch makes the synchonization conditional, and ensures
that we do not change the EL1 SRE settings if we do not need to.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-31 16:12:17 +02:00