Commit Graph

2538 Commits

Author SHA1 Message Date
Linus Torvalds
0cf744bc7a Merge branch 'akpm' (fixes from Andrew Morton)
Merge patch-bomb from Andrew Morton:
 - part of OCFS2 (review is laggy again)
 - procfs
 - slab
 - all of MM
 - zram, zbud
 - various other random things: arch, filesystems.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
  nosave: consolidate __nosave_{begin,end} in <asm/sections.h>
  include/linux/screen_info.h: remove unused ORIG_* macros
  kernel/sys.c: compat sysinfo syscall: fix undefined behavior
  kernel/sys.c: whitespace fixes
  acct: eliminate compile warning
  kernel/async.c: switch to pr_foo()
  include/linux/blkdev.h: use NULL instead of zero
  include/linux/kernel.h: deduplicate code implementing clamp* macros
  include/linux/kernel.h: rewrite min3, max3 and clamp using min and max
  alpha: use Kbuild logic to include <asm-generic/sections.h>
  frv: remove deprecated IRQF_DISABLED
  frv: remove unused cpuinfo_frv and friends to fix future build error
  zbud: avoid accessing last unused freelist
  zsmalloc: simplify init_zspage free obj linking
  mm/zsmalloc.c: correct comment for fullness group computation
  zram: use notify_free to account all free notifications
  zram: report maximum used memory
  zram: zram memory size limitation
  zsmalloc: change return value unit of zs_get_total_size_bytes
  zsmalloc: move pages_allocated to zs_pool
  ...
2014-10-09 22:26:14 -04:00
Mel Gorman
6a33979d5b mm: remove misleading ARCH_USES_NUMA_PROT_NONE
ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
_PAGE_NUMA using _PROT_NONE.  This saved using an additional PTE bit and
relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
fault scanner.  This was found to be conceptually confusing with a lot of
implicit assumptions and it was asked that an alternative be found.

Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
bits and shrunk the maximum possible swap size but it did not go far
enough.  There are no architectures that reuse _PROT_NONE as _PROT_NUMA
but the relics still exist.

This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
duplication in powerpc vs the generic implementation by defining the types
the core NUMA helpers expected to exist from x86 with their ppc64
equivalent.  This necessitated that a PTE bit mask be created that
identified the bits that distinguish present from NUMA pte entries but it
is expected this will only differ between arches based on _PAGE_PROTNONE.
The naming for the generic helpers was taken from x86 originally but ppc64
has types that are equivalent for the purposes of the helper so they are
mapped instead of duplicating code.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:52 -04:00
Linus Torvalds
80213c03c4 PCI changes for the v3.18 merge window:
Enumeration
     - Check Vendor ID only for Config Request Retry Status (Rajat Jain)
     - Enable Config Request Retry Status when supported (Rajat Jain)
     - Add generic domain handling (Catalin Marinas)
     - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado)
 
   Resource management
     - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu)
     - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr)
 
   PCI device hotplug
     - Prevent NULL dereference during pciehp probe (Andreas Noever)
     - Move _HPP & _HPX handling into core (Bjorn Helgaas)
     - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas)
     - Apply _HPP/_HPX to display devices (Bjorn Helgaas)
     - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas)
     - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas)
     - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas)
     - Fix wait time in pciehp timeout message (Yinghai Lu)
     - Add more pciehp Slot Control debug output (Yinghai Lu)
     - Stop disabling pciehp notifications during init (Yinghai Lu)
 
   MSI
     - Remove arch_msi_check_device() (Alexander Gordeev)
     - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev)
     - Move D0 check into pci_msi_check_device() (Alexander Gordeev)
     - Remove unused kobject from struct msi_desc (Yijing Wang)
     - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang)
     - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang)
     - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang)
     - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang)
     - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang)
 
   Power management
     - Drop unused runtime PM support code for PCIe ports (Rafael J.  Wysocki)
     - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki)
 
   AER
     - Add additional AER error strings (Gong Chen)
     - Make <linux/aer.h> standalone includable (Thierry Reding)
 
   Virtualization
     - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson)
     - Add ACS quirk for Intel 10G NICs (Alex Williamson)
     - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp)
     - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson)
     - Add device flag helpers (Ethan Zhao)
     - Assume all Mellanox devices have broken INTx masking (Gavin Shan)
 
   Generic host bridge driver
     - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau)
     - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau)
     - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau)
     - Fix the conversion of IO ranges into IO resources (Liviu Dudau)
     - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau)
     - Add support for parsing PCI host bridge resources from DT (Liviu Dudau)
     - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau)
     - Add arm64 architectural support for PCI (Liviu Dudau)
 
   APM X-Gene
     - Add APM X-Gene PCIe driver (Tanmay Inamdar)
     - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar)
 
   Freescale i.MX6
     - Probe in module_init(), not fs_initcall() (Lucas Stach)
     - Delay enabling reference clock for SS until it stabilizes (Tim Harvey)
 
   Marvell MVEBU
     - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni)
 
   NVIDIA Tegra
     - Make sure the PCIe PLL is really reset (Eric Yuen)
     - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang)
     - Fix extended configuration space mapping (Peter Daifuku)
     - Implement resource hierarchy (Thierry Reding)
     - Clear CLKREQ# enable on port disable (Thierry Reding)
     - Add Tegra124 support (Thierry Reding)
 
   ST Microelectronics SPEAr13xx
     - Pass config resource through reg property (Pratyush Anand)
 
   Synopsys DesignWare
     - Use NULL instead of false (Fabio Estevam)
     - Parse bus-range property from devicetree (Lucas Stach)
     - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach)
     - Remove pci_assign_unassigned_resources() (Lucas Stach)
     - Check private_data validity in single place (Lucas Stach)
     - Setup and clear exactly one MSI at a time (Lucas Stach)
     - Remove open-coded bitmap operations (Lucas Stach)
     - Fix configuration base address when using 'reg' (Minghuan Lian)
     - Fix IO resource end address calculation (Minghuan Lian)
     - Rename get_msi_data() to get_msi_addr() (Minghuan Lian)
     - Add get_msi_data() to pcie_host_ops (Minghuan Lian)
     - Add support for v3.65 hardware (Murali Karicheri)
     - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand)
 
   TI Keystone
     - Add TI Keystone PCIe driver (Murali Karicheri)
     - Limit MRSS for all downstream devices (Murali Karicheri)
     - Assume controller is already in RC mode (Murali Karicheri)
     - Set device ID based on SoC to support multiple ports (Murali Karicheri)
 
   Xilinx AXI
     - Add Xilinx AXI PCIe driver (Srikanth Thokala)
     - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter)
 
   Miscellaneous
     - Clean up whitespace (Quentin Lambert)
     - Remove assignments from "if" conditions (Quentin Lambert)
     - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri)
     - x86: Mark DMI tables as initialization data (Mathias Krause)
     - x86: Move __init annotation to the correct place (Mathias Krause)
     - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause)
     - x86: Constify pci_mmcfg_probes[] array (Mathias Krause)
     - x86: Mark PCI BIOS initialization code as such (Mathias Krause)
     - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya)
     - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUNWmJAAoJEFmIoMA60/r8GncP/3uHRoBrnaF6pv+S1l1p3Fs/
 l1kKH91/IuAAU7VJX8pkNybFqx02topWmiVVXAzqvD01PcRLGCLjPbWl5h+y5/Ja
 CHZH33AwHAmm0kt4BrOSOeHTLJhAigly2zV3P4F8jRIgyaeMoGZ6Ko4tkQUpm21k
 +ohrOd4cxYkmzzCjKwsZZhKnyRNpae8FmTk3VQBPuN8DbhvFPrqo5/+GeAdSZTdS
 HZHpfl2HL4095aY7uBVsZqNkjQyl6SnWwjkjLnuI8q3qA3BLgDZE/Jr8F/MNuW1V
 y01JIjerFWMDFyBIkpg7moYnODy6oP3KvczwYdKGmqsJja+0MQvYhLTwD+R/yTQS
 SewJA0mL3T3EJEfnFYkCiaIX27xIwk/FxHfaKPN91xgx/QM7xCVZNrU2/dXjhoX1
 GqLKxOEaFHhWWTyT5Dj27I0ZcElzFZ3tIwvrHfs8y22oAuAlsAypaUgvUwRfL4CO
 hOj4ITZa0t041sYWqxCoGAA9Fdp8HMzNKKS5F4mhADz4Ad9v6uPCNv/s/RoxVsbm
 jhZOtPYJ0/iCA+kNVX563S8Z3VpfPI+7bBjcj2WKdzW+IlICvOKT+kvwL2Tv/rE7
 w0hrNsbkgGsYbPldMx7LwCavsUtYFuNj0zoU6vkhP2jk6O2Tn5VXDmjrXH0v3iHI
 v03vlUtre0bQ26fzDyLQ
 =4Zv1
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "The interesting things here are:

   - Turn on Config Request Retry Status Software Visibility.  This
     caused hangs last time, but we included a fix this time.
   - Rework PCI device configuration to use _HPP/_HPX more aggressively
   - Allow PCI devices to be put into D3cold during system suspend
   - Add arm64 PCI support
   - Add APM X-Gene host bridge driver
   - Add TI Keystone host bridge driver
   - Add Xilinx AXI host bridge driver

  More detailed summary:

  Enumeration
    - Check Vendor ID only for Config Request Retry Status (Rajat Jain)
    - Enable Config Request Retry Status when supported (Rajat Jain)
    - Add generic domain handling (Catalin Marinas)
    - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado)

  Resource management
    - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu)
    - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr)

  PCI device hotplug
    - Prevent NULL dereference during pciehp probe (Andreas Noever)
    - Move _HPP & _HPX handling into core (Bjorn Helgaas)
    - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas)
    - Apply _HPP/_HPX to display devices (Bjorn Helgaas)
    - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas)
    - Fix wait time in pciehp timeout message (Yinghai Lu)
    - Add more pciehp Slot Control debug output (Yinghai Lu)
    - Stop disabling pciehp notifications during init (Yinghai Lu)

  MSI
    - Remove arch_msi_check_device() (Alexander Gordeev)
    - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev)
    - Move D0 check into pci_msi_check_device() (Alexander Gordeev)
    - Remove unused kobject from struct msi_desc (Yijing Wang)
    - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang)
    - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang)
    - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang)
    - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang)
    - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang)

  Power management
    - Drop unused runtime PM support code for PCIe ports (Rafael J.  Wysocki)
    - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki)

  AER
    - Add additional AER error strings (Gong Chen)
    - Make <linux/aer.h> standalone includable (Thierry Reding)

  Virtualization
    - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson)
    - Add ACS quirk for Intel 10G NICs (Alex Williamson)
    - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp)
    - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson)
    - Add device flag helpers (Ethan Zhao)
    - Assume all Mellanox devices have broken INTx masking (Gavin Shan)

  Generic host bridge driver
    - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau)
    - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau)
    - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau)
    - Fix the conversion of IO ranges into IO resources (Liviu Dudau)
    - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau)
    - Add support for parsing PCI host bridge resources from DT (Liviu Dudau)
    - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau)
    - Add arm64 architectural support for PCI (Liviu Dudau)

  APM X-Gene
    - Add APM X-Gene PCIe driver (Tanmay Inamdar)
    - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar)

  Freescale i.MX6
    - Probe in module_init(), not fs_initcall() (Lucas Stach)
    - Delay enabling reference clock for SS until it stabilizes (Tim Harvey)

  Marvell MVEBU
    - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni)

  NVIDIA Tegra
    - Make sure the PCIe PLL is really reset (Eric Yuen)
    - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang)
    - Fix extended configuration space mapping (Peter Daifuku)
    - Implement resource hierarchy (Thierry Reding)
    - Clear CLKREQ# enable on port disable (Thierry Reding)
    - Add Tegra124 support (Thierry Reding)

  ST Microelectronics SPEAr13xx
    - Pass config resource through reg property (Pratyush Anand)

  Synopsys DesignWare
    - Use NULL instead of false (Fabio Estevam)
    - Parse bus-range property from devicetree (Lucas Stach)
    - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach)
    - Remove pci_assign_unassigned_resources() (Lucas Stach)
    - Check private_data validity in single place (Lucas Stach)
    - Setup and clear exactly one MSI at a time (Lucas Stach)
    - Remove open-coded bitmap operations (Lucas Stach)
    - Fix configuration base address when using 'reg' (Minghuan Lian)
    - Fix IO resource end address calculation (Minghuan Lian)
    - Rename get_msi_data() to get_msi_addr() (Minghuan Lian)
    - Add get_msi_data() to pcie_host_ops (Minghuan Lian)
    - Add support for v3.65 hardware (Murali Karicheri)
    - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand)

  TI Keystone
    - Add TI Keystone PCIe driver (Murali Karicheri)
    - Limit MRSS for all downstream devices (Murali Karicheri)
    - Assume controller is already in RC mode (Murali Karicheri)
    - Set device ID based on SoC to support multiple ports (Murali Karicheri)

  Xilinx AXI
    - Add Xilinx AXI PCIe driver (Srikanth Thokala)
    - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter)

  Miscellaneous
    - Clean up whitespace (Quentin Lambert)
    - Remove assignments from "if" conditions (Quentin Lambert)
    - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri)
    - x86: Mark DMI tables as initialization data (Mathias Krause)
    - x86: Move __init annotation to the correct place (Mathias Krause)
    - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause)
    - x86: Constify pci_mmcfg_probes[] array (Mathias Krause)
    - x86: Mark PCI BIOS initialization code as such (Mathias Krause)
    - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya)
    - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser)"

* tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (109 commits)
  arm64: dts: Add APM X-Gene PCIe device tree nodes
  PCI: Add ACS quirk for AMD A88X southbridge devices
  PCI: xgene: Add APM X-Gene PCIe driver
  PCI: designware: Remove open-coded bitmap operations
  PCI/MSI: Remove unnecessary temporary variable
  PCI/MSI: Use __write_msi_msg() instead of write_msi_msg()
  MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg()
  PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg()
  PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints
  PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib
  PCI/MSI: Remove unused kobject from struct msi_desc
  PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported()
  PCI/MSI: Move D0 check into pci_msi_check_device()
  PCI/MSI: Remove arch_msi_check_device()
  irqchip: armada-370-xp: Remove arch_msi_check_device()
  PCI/MSI/PPC: Remove arch_msi_check_device()
  arm64: Add architectural support for PCI
  PCI: Add pci_remap_iospace() to map bus I/O resources
  of/pci: Add support for parsing PCI host bridge resources from DT
  of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr()
  ...

Conflicts:
	arch/arm64/boot/dts/apm-storm.dtsi
2014-10-09 15:03:49 -04:00
Linus Torvalds
afa3536be8 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Main changes:

  - Fix the deadlock reported by Dave Jones et al
  - Clean up and fix nohz_full interaction with arch abilities
  - nohz init code consolidation/cleanup"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: nohz full depends on irq work self IPI support
  nohz: Consolidate nohz full init code
  arm64: Tell irq work about self IPI support
  arm: Tell irq work about self IPI support
  x86: Tell irq work about self IPI support
  irq_work: Force raised irq work to run on irq work interrupt
  irq_work: Introduce arch_irq_work_has_interrupt()
  nohz: Move nohz full init call to tick init
2014-10-09 06:30:57 -04:00
Linus Torvalds
e4e65676f2 Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
 
 - s390 moves closer towards host large page support
 
 - PowerPC has improved support for debugging (both inside the guest and
   via gdbstub) and support for e6500 processors
 
 - ARM/ARM64 support read-only memory (which is necessary to put firmware
   in emulated NOR flash)
 
 - x86 has the usual emulator fixes and nested virtualization improvements
   (including improved Windows support on Intel and Jailhouse hypervisor
   support on AMD), adaptive PLE which helps overcommitting of huge guests.
   Also included are some patches that make KVM more friendly to memory
   hot-unplug, and fixes for rare caching bugs.
 
 Two patches have trivial mm/ parts that were acked by Rik and Andrew.
 
 Note: I will soon switch to a subkey for signing purposes.  To verify
 future signed pull requests from me, please update my key with
 "gpg --recv-keys 9B4D86F2".  You should see 3 new subkeys---the
 one for signing will be a 2048-bit RSA key, 4E6B09D7.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL
 WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq
 lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY
 x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK
 7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV
 sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU
 YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91
 YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam
 ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd
 CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy
 z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG
 BgEfS2x6jRc5zB3hjwDr
 =iEVi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Fixes and features for 3.18.

  Apart from the usual cleanups, here is the summary of new features:

   - s390 moves closer towards host large page support

   - PowerPC has improved support for debugging (both inside the guest
     and via gdbstub) and support for e6500 processors

   - ARM/ARM64 support read-only memory (which is necessary to put
     firmware in emulated NOR flash)

   - x86 has the usual emulator fixes and nested virtualization
     improvements (including improved Windows support on Intel and
     Jailhouse hypervisor support on AMD), adaptive PLE which helps
     overcommitting of huge guests.  Also included are some patches that
     make KVM more friendly to memory hot-unplug, and fixes for rare
     caching bugs.

  Two patches have trivial mm/ parts that were acked by Rik and Andrew.

  Note: I will soon switch to a subkey for signing purposes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
  kvm: do not handle APIC access page if in-kernel irqchip is not in use
  KVM: s390: count vcpu wakeups in stat.halt_wakeup
  KVM: s390/facilities: allow TOD-CLOCK steering facility bit
  KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
  arm/arm64: KVM: Report correct FSC for unsupported fault types
  arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
  kvm: Fix kvm_get_page_retry_io __gup retval check
  arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
  kvm: x86: Unpin and remove kvm_arch->apic_access_page
  kvm: vmx: Implement set_apic_access_page_addr
  kvm: x86: Add request bit to reload APIC access page address
  kvm: Add arch specific mmu notifier for page invalidation
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
  kvm: Fix page ageing bugs
  kvm/x86/mmu: Pass gfn and level to rmapp callback.
  x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
  kvm: x86: use macros to compute bank MSRs
  KVM: x86: Remove debug assertion of non-PAE reserved bits
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  kvm: Faults which trigger IO release the mmap_sem
  ...
2014-10-08 05:27:39 -04:00
Alexander Gordeev
6b2fd7efeb PCI/MSI/PPC: Remove arch_msi_check_device()
Move MSI checks from arch_msi_check_device() to arch_setup_msi_irqs().
This makes the code more compact and allows removing
arch_msi_check_device() from generic MSI code.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-01 12:21:14 -06:00
Paolo Bonzini
00c027db0c Patch queue for ppc - 2014-09-24
New awesome things in this release:
 
   - E500: e6500 core support
   - E500: guest and remote debug support
   - Book3S: remote sw breakpoint support
   - Book3S: HV: Minor bugfixes
 
 Alexander Graf (1):
       KVM: PPC: Pass enum to kvmppc_get_last_inst
 
 Bharat Bhushan (8):
       KVM: PPC: BOOKE: allow debug interrupt at "debug level"
       KVM: PPC: BOOKE : Emulate rfdi instruction
       KVM: PPC: BOOKE: Allow guest to change MSR_DE
       KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG
       KVM: PPC: BOOKE: Guest and hardware visible debug registers are same
       KVM: PPC: BOOKE: Add one reg interface for DBSR
       KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
       KVM: PPC: BOOKE: Emulate debug registers and exception
 
 Madhavan Srinivasan (2):
       powerpc/kvm: support to handle sw breakpoint
       powerpc/kvm: common sw breakpoint instr across ppc
 
 Michael Neuling (1):
       KVM: PPC: Book3S HV: Add register name when loading toc
 
 Mihai Caraman (10):
       powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
       powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers
       KVM: PPC: Book3E: Increase FPU laziness
       KVM: PPC: Book3e: Add AltiVec support
       KVM: PPC: Make ONE_REG powerpc generic
       KVM: PPC: Move ONE_REG AltiVec support to powerpc
       KVM: PPC: Remove the tasklet used by the hrtimer
       KVM: PPC: Remove shared defines for SPE and AltiVec interrupts
       KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core
       KVM: PPC: Book3E: Enable e6500 core
 
 Paul Mackerras (2):
       KVM: PPC: Book3S HV: Increase timeout for grabbing secondary threads
       KVM: PPC: Book3S HV: Only accept host PVR value for guest PVR
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJUIyyEAAoJECszeR4D/txgiV8P/AnSRcjxrlW+ITsimZezDaj5
 MfFv2ZyQKlVjp4cfzfCTW5otQT/K2rSfJzB/V6l1xGcM/UEO+snmPddokvFLMsp9
 dLvPjZI6ivZu/rjRZ8eqnTQIAwid0K5Yss870Y8YWfRBByKVDs7rRx75gj6q8kek
 jG3wLQQxDYEapkGXiaIcX2Mbf6GAZKNhGf6M5Khn/v3RE0+mNg9J+nffBZXOxEYo
 WDe20KNSuDqDEnWIc82uibTbH1Wnxmetc5jf21DWaquLs9VGbON1X9Myl+aBNQuP
 wDt6D04rgtBZbwyHKsSO/0poK0eIms+5jiW8c+XPO2QOLXQwwNKBNmRKePyk1bt5
 gRxd+u9OGzRGHKwIS1vqHLKCdr5HiTN0uE+nZ+oDWjXVJQRMc8HCx0tWxzZg46yd
 kIIRuDrIQQUH3j2L/PnY3Nx3yKNhg97Ysek0ToIsxlkqczrAUewnXuOj9Ijf+/Cz
 Y3cVsQEhepcO3xyz5uyWJQwmFZkwJVOclzGaNgXKeKl5fkpXwLPxc6vmI2K+hnU9
 TRFoQgbknPxQe2qv9cXeMBFhZwNRKpcYW7w3G81ko/7foVmwP3CjnNulXMKiNuVH
 i8pVd8zxiJuTWVQSksGWuWCxueLmc86L4khSF5YBzg9pid7ajmxcfEDWCQGdN+Fe
 Oh4HUW0860IJYOQRIKJv
 =CR/Z
 -----END PGP SIGNATURE-----

Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-next

Patch queue for ppc - 2014-09-24

New awesome things in this release:

  - E500: e6500 core support
  - E500: guest and remote debug support
  - Book3S: remote sw breakpoint support
  - Book3S: HV: Minor bugfixes

Alexander Graf (1):
      KVM: PPC: Pass enum to kvmppc_get_last_inst

Bharat Bhushan (8):
      KVM: PPC: BOOKE: allow debug interrupt at "debug level"
      KVM: PPC: BOOKE : Emulate rfdi instruction
      KVM: PPC: BOOKE: Allow guest to change MSR_DE
      KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG
      KVM: PPC: BOOKE: Guest and hardware visible debug registers are same
      KVM: PPC: BOOKE: Add one reg interface for DBSR
      KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
      KVM: PPC: BOOKE: Emulate debug registers and exception

Madhavan Srinivasan (2):
      powerpc/kvm: support to handle sw breakpoint
      powerpc/kvm: common sw breakpoint instr across ppc

Michael Neuling (1):
      KVM: PPC: Book3S HV: Add register name when loading toc

Mihai Caraman (10):
      powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
      powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers
      KVM: PPC: Book3E: Increase FPU laziness
      KVM: PPC: Book3e: Add AltiVec support
      KVM: PPC: Make ONE_REG powerpc generic
      KVM: PPC: Move ONE_REG AltiVec support to powerpc
      KVM: PPC: Remove the tasklet used by the hrtimer
      KVM: PPC: Remove shared defines for SPE and AltiVec interrupts
      KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core
      KVM: PPC: Book3E: Enable e6500 core

Paul Mackerras (2):
      KVM: PPC: Book3S HV: Increase timeout for grabbing secondary threads
      KVM: PPC: Book3S HV: Only accept host PVR value for guest PVR
2014-09-24 23:19:45 +02:00
Tang Chen
fe71557afb kvm: Add arch specific mmu notifier for page invalidation
This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:59 +02:00
Andres Lagar-Cavilla
5712846808 kvm: Fix page ageing bugs
1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.

Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:58 +02:00
Madhavan Srinivasan
033aaa14af powerpc/kvm: common sw breakpoint instr across ppc
This patch extends the use of illegal instruction as software
breakpoint instruction across the ppc platform. Patch extends
booke program interrupt code to support software breakpoint.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[agraf: Fix bookehv]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:36 +02:00
Madhavan Srinivasan
a59c1d9e60 powerpc/kvm: support to handle sw breakpoint
This patch adds kernel side support for software breakpoint.
Design is that, by using an illegal instruction, we trap to hypervisor
via Emulation Assistance interrupt, where we check for the illegal instruction
and accordingly we return to Host or Guest. Patch also adds support for
software breakpoint in PR KVM.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:35 +02:00
Mihai Caraman
188e267ce2 KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core
ePAPR represents hardware threads as cpu node properties in device tree.
So with existing QEMU, hardware threads are simply exposed as vcpus with
one hardware thread.

The e6500 core shares TLBs between hardware threads. Without tlb write
conditional instruction, the Linux kernel uses per core mechanisms to
protect against duplicate TLB entries.

The guest is unable to detect real siblings threads, so it can't use the
TLB protection mechanism. An alternative solution is to use the hypervisor
to allocate different lpids to guest's vcpus that runs simultaneous on real
siblings threads. On systems with two threads per core this patch halves
the size of the lpid pool that the allocator sees and use two lpids per VM.
Use even numbers to speedup vcpu lpid computation with consecutive lpids
per VM: vm1 will use lpids 2 and 3, vm2 lpids 4 and 5, and so on.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: fix spelling]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:35 +02:00
Mihai Caraman
e9a94832f3 KVM: PPC: Remove shared defines for SPE and AltiVec interrupts
We currently decide at compile-time which of the SPE or AltiVec units to
support exclusively. Guard kernel defines with CONFIG_SPE_POSSIBLE and
CONFIG_PPC_E500MC and remove shared defines.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:34 +02:00
Mihai Caraman
d02d4d156e KVM: PPC: Remove the tasklet used by the hrtimer
Powerpc timer implementation is a copycat version of s390. Now that they removed
the tasklet with commit ea74c0ea1b follow this
optimization.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:34 +02:00
Bharat Bhushan
2f699a59f3 KVM: PPC: BOOKE: Emulate debug registers and exception
This patch emulates debug registers and debug exception
to support guest using debug resource. This enables running
gdb/kgdb etc in guest.

On BOOKE architecture we cannot share debug resources between QEMU and
guest because:
    When QEMU is using debug resources then debug exception must
    be always enabled. To achieve this we set MSR_DE and also set
    MSRP_DEP so guest cannot change MSR_DE.

    When emulating debug resource for guest we want guest
    to control MSR_DE (enable/disable debug interrupt on need).

    So above mentioned two configuration cannot be supported
    at the same time. So the result is that we cannot share
    debug resources between QEMU and Guest on BOOKE architecture.

In the current design QEMU gets priority over guest, this means that if
QEMU is using debug resources then guest cannot use them and if guest is
using debug resource then QEMU can overwrite them.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:33 +02:00
Bharat Bhushan
348ba71081 KVM: PPC: BOOKE: Guest and hardware visible debug registers are same
Guest visible debug register and hardware visible debug registers are
same, so ther is no need to have arch->shadow_dbg_reg, instead use
arch->dbg_reg.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:30 +02:00
Bharat Bhushan
c8ca97ca9b KVM: PPC: BOOKE : Emulate rfdi instruction
This patch adds "rfdi" instruction emulation which is required for
guest debug hander on BOOKE-HV

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:29 +02:00
Peter Zijlstra
c5c38ef3d7 irq_work: Introduce arch_irq_work_has_interrupt()
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.

Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13 18:38:07 +02:00
Pranith Kumar
7d59deb50a powerpc: Wire up sys_seccomp(), sys_getrandom() and sys_memfd_create()
This patch wires up three new syscalls for powerpc. The three
new syscalls are seccomp, getrandom and memfd_create.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
2014-09-09 19:02:47 +10:00
Anton Blanchard
85101af13b powerpc/perf: Fix ABIv2 kernel backtraces
ABIv2 kernels are failing to backtrace through the kernel. An example:

39.30%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
            |
            --- find_get_entry
               __GI___libc_read

The problem is in valid_next_sp() where we check that the new stack
pointer is at least STACK_FRAME_OVERHEAD below the previous one.

ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes
and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes
with no paramter save area.

STACK_FRAME_OVERHEAD is in theory the minimum stack frame size,
but we over 240 uses of it, some of which assume that it includes
space for the parameter area.

We need to work through all our stack defines and rationalise them
but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using
in valid_next_sp(). This fixes the issue:

30.64%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
            |
            --- find_get_entry
               pagecache_get_page
               generic_file_read_iter
               new_sync_read
               vfs_read
               sys_read
               syscall_exit
               __GI___libc_read

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
2014-09-09 19:02:45 +10:00
Radim Krčmář
13a34e067e KVM: remove garbage arg to *hardware_{en,dis}able
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:55 +02:00
Radim Krčmář
0865e636ae KVM: static inline empty kvm_arch functions
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)

This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
  2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
  e790d9ef6 KVM: add kvm_arch_sched_in

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:55 +02:00
Paolo Bonzini
656473003b KVM: forward declare structs in kvm_types.h
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).

Move them from individual files to a generic place in linux/kvm_types.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:53 +02:00
Aneesh Kumar K.V
85c1fafd72 powerpc/mm: Use read barrier when creating real_pte
On ppc64 we support 4K hash pte with 64K page size. That requires
us to track the hash pte slot information on a per 4k basis. We do that
by storing the slot details in the second half of pte page. The pte bit
_PAGE_COMBO is used to indicate whether the second half need to be
looked while building real_pte. We need to use read memory barrier while
doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO
check. On the store side we already do a lwsync in __hash_page_4K

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:41 +10:00
Aneesh Kumar K.V
fc04795575 powerpc/thp: Handle combo pages in invalidate
If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault for
these pages.

We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.

Use _PAGE_COMBO to determine the page size with which we should
invalidate the hash table entries on unmap.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:39 +10:00
Aneesh Kumar K.V
fa1f8ae80f powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
The segment identifier and segment size will remain the same in
the loop, So we can compute it outside. We also change the
hugepage_invalidate interface so that we can use it the later patch

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:38 +10:00
Nishanth Aravamudan
56758e3c3c powerpc: remove duplicate definition of TEXASR_FS
It appears that commits 7f06f21d40 ("powerpc/tm: Add checking to
treclaim/trechkpt") and e4e3812150 ("KVM: PPC: Book3S HV: Add
transactional memory support") both added definitions of TEXASR_FS.
Remove one of them. At the same time, fix the alignment of the remaining
definition (should be tab-separated like the rest of the #defines).

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:47 +10:00
Vasant Hegde
b09c2ec408 powerpc/powernv: Interface to register/unregister opal dump region
PowerNV platform is capable of capturing host memory region when system
crashes (because of host/firmware). We have new OPAL API to register/
unregister memory region to be captured when system crashes.

This patch adds support for new API. Also during boot time we register
kernel log buffer and unregister before doing kexec.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:45 +10:00
Michael Ellerman
3609e09fd8 powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS
We have been a bit slack about updating the CPU_FTRS_POSSIBLE and
CPU_FTRS_ALWAYS masks. When we added POWER8, and also POWER8E we forgot
to update the ALWAYS mask. And when we added POWER8_DD1 we forgot to
update both the POSSIBLE and ALWAYS masks.

Luckily this hasn't caused any actual bugs AFAICS. Failing to update the
ALWAYS mask just forgoes a potential optimisation opportunity. Failing
to update the POSSIBLE mask for POWER8_DD1 is also OK because it only
removes a bit rather than adding any.

Regardless they should all be in both masks so as to avoid any future
bugs when the set of ALWAYS/POSSIBLE bits changes, or the masks
themselves change.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Neuling <mikey@neuling.org>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:43 +10:00
Michael Ellerman
51d7d5205d powerpc: Add smp_mb() to arch_spin_is_locked()
The kernel defines the function spin_is_locked(), which can be used to
check if a spinlock is currently locked.

Using spin_is_locked() on a lock you don't hold is obviously racy. That
is, even though you may observe that the lock is unlocked, it may become
locked at any time.

There is (at least) one exception to that, which is if two locks are
used as a pair, and the holder of each checks the status of the other
before doing any update.

Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
value:

The first CPU does:

	spin_lock(*A)

	if spin_is_locked(*B)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*A)

And the second CPU does:

	spin_lock(*B)

	if spin_is_locked(*A)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*B)

Although this is a strange locking construct, it should work.

It seems to be understood, but not documented, that spin_is_locked() is
not a memory barrier, so in the examples above and below the caller
inserts its own memory barrier before acting on the result of
spin_is_locked().

For now we assume spin_is_locked() is implemented as below, and we break
it out in our examples:

	bool spin_is_locked(*LOCK) {
		LOAD l = *LOCK
		return l.locked
	}

Our intuition is that there should be no problem even if the two code
sequences run simultaneously such as:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)		spin_lock(*B)
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)

If one CPU gets the lock before the other then it will do the update and
the other CPU will back off:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)
	LOAD b = *B
				spin_lock(*B)
	if b.locked # false	LOAD a = *A
	else			if a.locked # true
	smp_mb()		# nothing
	LOAD r1 = *COUNTER	spin_unlock(*B)
	r1++
	STORE *COUNTER = r1
	spin_unlock(*A)

However in reality spin_lock() itself is not indivisible. On powerpc we
implement it as a load-and-reserve and store-conditional.

Ignoring the retry logic for the lost reservation case, it boils down to:
	spin_lock(*LOCK) {
		LOAD l = *LOCK
		l.locked = true
		STORE *LOCK = l
		ACQUIRE_BARRIER
	}

The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
defined in memory-barriers.txt:

     This acts as a one-way permeable barrier.  It guarantees that all
     memory operations after the ACQUIRE operation will appear to happen
     after the ACQUIRE operation with respect to the other components of
     the system.

On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
also know as "lightweight sync", or "sync 1".

As described in Power ISA v2.07 section B.2.1.1, in this scenario the
lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
act as the barrier, preventing any loads or stores in the locked region
from occurring prior to the load of *LOCK.

Whether this behaviour is in accordance with the definition of ACQUIRE
semantics in memory-barriers.txt is open to discussion, we may switch to
a different barrier in future.

What this means in practice is that the following can occur:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A 		LOAD b = *B
	a.locked = true		b.locked = true
	LOAD b = *B		LOAD a = *A
	STORE *A = a		STORE *B = b
	if b.locked # false	if a.locked # false
	else			else
	smp_mb()		smp_mb()
	LOAD r1 = *COUNTER	LOAD r2 = *COUNTER
	r1++			r2++
	STORE *COUNTER = r1
				STORE *COUNTER = r2	# Lost update
	spin_unlock(*A)		spin_unlock(*B)

That is, the load of *B can occur prior to the store that makes *A
visibly locked. And similarly for CPU 1. The result is both CPUs hold
their lock and believe the other lock is unlocked.

The easiest fix for this is to add a full memory barrier to the start of
spin_is_locked(), so adding to our previous definition would give us:

	bool spin_is_locked(*LOCK) {
		smp_mb()
		LOAD l = *LOCK
		return l.locked
	}

The new barrier orders the store to the lock we are locking vs the load
of the other lock:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A 		LOAD b = *B
	a.locked = true		b.locked = true
	STORE *A = a		STORE *B = b
	smp_mb()		smp_mb()
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)

Although the above example is theoretical, there is code similar to this
example in sem_lock() in ipc/sem.c. This commit in addition to the next
commit appears to be a fix for crashes we are seeing in that code where
we believe this race happens in practice.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:26 +10:00
Linus Torvalds
8065be8d03 Merge branch 'akpm' (second patchbomb from Andrew Morton)
Merge more incoming from Andrew Morton:
 "Two new syscalls:

     memfd_create in "shm: add memfd_create() syscall"
     kexec_file_load in "kexec: implementation of new syscall kexec_file_load"

  And:

   - Most (all?) of the rest of MM

   - Lots of the usual misc bits

   - fs/autofs4

   - drivers/rtc

   - fs/nilfs

   - procfs

   - fork.c, exec.c

   - more in lib/

   - rapidio

   - Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs,
     fs/cramfs, fs/romfs, fs/qnx6.

   - initrd/initramfs work

   - "file sealing" and the memfd_create() syscall, in tmpfs

   - add pci_zalloc_consistent, use it in lots of places

   - MAINTAINERS maintenance

   - kexec feature work"

* emailed patches from Andrew Morton <akpm@linux-foundation.org: (193 commits)
  MAINTAINERS: update nomadik patterns
  MAINTAINERS: update usb/gadget patterns
  MAINTAINERS: update DMA BUFFER SHARING patterns
  kexec: verify the signature of signed PE bzImage
  kexec: support kexec/kdump on EFI systems
  kexec: support for kexec on panic using new system call
  kexec-bzImage64: support for loading bzImage using 64bit entry
  kexec: load and relocate purgatory at kernel load time
  purgatory: core purgatory functionality
  purgatory/sha256: provide implementation of sha256 in purgaotory context
  kexec: implementation of new syscall kexec_file_load
  kexec: new syscall kexec_file_load() declaration
  kexec: make kexec_segment user buffer pointer a union
  resource: provide new functions to walk through resources
  kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc()
  kexec: move segment verification code in a separate function
  kexec: rename unusebale_pages to unusable_pages
  kernel: build bin2c based on config option CONFIG_BUILD_BIN2C
  bin2c: move bin2c in scripts/basic
  shm: wait for pins to be released when sealing
  ...
2014-08-08 15:57:47 -07:00
Andy Lutomirski
a6c19dfe39 arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).

This default is only useful for ia64.  arm64, ppc, s390, sh, tile, 64-bit
UML, and x86_32 have their own code just to disable it.  arm, 32-bit UML,
and x86_64 have gate areas, but they have their own implementations.

This gets rid of the default and moves the code into ia64.

This should save some code on architectures without a gate area: it's now
possible to inline the gate_area functions in the default case.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
Acked-by: Richard Weinberger <richard@nod.at> [for um]
Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:27 -07:00
Laura Abbott
308c09f17d lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead.  At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.

[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>			[x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	[powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:26 -07:00
Linus Torvalds
66bb0aa077 Here are the PPC and ARM changes for KVM, which I separated because
they had small conflicts (respectively within KVM documentation,
 and with 3.16-rc changes).  Since they were all within the subsystem,
 I took care of them.
 
 Stephen Rothwell reported some snags in PPC builds, but they are all
 fixed now; the latest linux-next report was clean.
 
 New features for ARM include:
 - KVM VGIC v2 emulation on GICv3 hardware
 - Big-Endian support for arm/arm64 (guest and host)
 - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
 
 And for PPC:
 - Book3S: Good number of LE host fixes, enable HV on LE
 - Book3S HV: Add in-guest debug support
 
 This release drops support for KVM on the PPC440.  As a result, the
 PPC merge removes more lines than it adds. :)
 
 I also included an x86 change, since Davidlohr tied it to an independent
 bug report and the reporter quickly provided a Tested-by; there was no
 reason to wait for -rc2.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJT4iIJAAoJEBvWZb6bTYbyZqoP/3Wxy8NWPFJ8HGt81NHlGnDS
 a9UbL7EibcOEG+aaKqmtBglTD5YDiGBDNCxxiSJaDHt+grLN4fsWIliJob1nJFoO
 90f89EWN2XjeCrJXA5nUoeg5tpc5OoYKsiP6pTgzIwkP8vvs/H1+zpcTS/UmYsr/
 qipVMMsM+zZeHWZcSbqjW88z7YqIn1sr5282wJ85cbyv4KGizb/G4dyPuDqLb6np
 hkAD8Ah6VV2suQ2FSy7G2fg20R0vglUi60hkEHLoCBPVqJCl7SmC8MvxNbjBnP8S
 J36R0R0u1wHYKzAGooLJGVOZ/o/gSiVqKX+++L2EvJBN+kuA6u/7fxLyBT+LwDAE
 IF/Aln5rpg1fe+eywvhz86WljTVEQ8bO1zVsIQUPY+/ZOPedZHMwyvXft8ogbjSp
 2m9OJ/3e8Aggh0OeHpCDoeow+QDUXvX0YdCw+2Yh0p+7VMXqkyp0QEiBu38jrusC
 rB3VNifJbDSWLKdG9LfCAPHnxZD2XYEwv2WFBo6KQOGMGHfx0GXpCOL/jQihrhA6
 HtEG5Bs3lvnHQemdpUZ58xojiABbMaUPdcnPXQQEp23WhZzrfLMLzqVG0VYnhSsC
 9pi7MJj8c31rqx5WU2oRM28i/BvNxN0NCtkDpineO5s3f89Ws1xnwxqlm38AKP0J
 irJQTYFEqec+GM9JK1rG
 =hyQP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second round of KVM changes from Paolo Bonzini:
 "Here are the PPC and ARM changes for KVM, which I separated because
  they had small conflicts (respectively within KVM documentation, and
  with 3.16-rc changes).  Since they were all within the subsystem, I
  took care of them.

  Stephen Rothwell reported some snags in PPC builds, but they are all
  fixed now; the latest linux-next report was clean.

  New features for ARM include:
   - KVM VGIC v2 emulation on GICv3 hardware
   - Big-Endian support for arm/arm64 (guest and host)
   - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)

  And for PPC:
   - Book3S: Good number of LE host fixes, enable HV on LE
   - Book3S HV: Add in-guest debug support

  This release drops support for KVM on the PPC440.  As a result, the
  PPC merge removes more lines than it adds.  :)

  I also included an x86 change, since Davidlohr tied it to an
  independent bug report and the reporter quickly provided a Tested-by;
  there was no reason to wait for -rc2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
  KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
  KVM: PPC: Enable IRQFD support for the XICS interrupt controller
  KVM: Give IRQFD its own separate enabling Kconfig option
  KVM: Move irq notifier implementation into eventfd.c
  KVM: Move all accesses to kvm::irq_routing into irqchip.c
  KVM: irqchip: Provide and use accessors for irq routing table
  KVM: Don't keep reference to irq routing table in irqfd struct
  KVM: PPC: drop duplicate tracepoint
  arm64: KVM: fix 64bit CP15 VM access for 32bit guests
  KVM: arm64: GICv3: mandate page-aligned GICV region
  arm64: KVM: GICv3: move system register access to msr_s/mrs_s
  KVM: PPC: PR: Handle FSCR feature deselects
  KVM: PPC: HV: Remove generic instruction emulation
  KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
  KVM: PPC: Remove DCR handling
  KVM: PPC: Expose helper functions for data/inst faults
  KVM: PPC: Separate loadstore emulation from priv emulation
  KVM: PPC: Handle magic page in kvmppc_ld/st
  ...
2014-08-07 11:35:30 -07:00
Linus Torvalds
f536b3cae8 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "This is the powerpc new goodies for 3.17.  The short story:

  The biggest bit is Michael removing all of pre-POWER4 processor
  support from the 64-bit kernel.  POWER3 and rs64.  This gets rid of a
  ton of old cruft that has been bitrotting in a long while.  It was
  broken for quite a few versions already and nobody noticed.  Nobody
  uses those machines anymore.  While at it, he cleaned up a bunch of
  old dusty cabinets, getting rid of a skeletton or two.

  Then, we have some base VFIO support for KVM, which allows assigning
  of PCI devices to KVM guests, support for large 64-bit BARs on
  "powernv" platforms, support for HMI (Hardware Management Interrupts)
  on those same platforms, some sparse-vmemmap improvements (for memory
  hotplug),

  There is the usual batch of Freescale embedded updates (summary in the
  merge commit) and fixes here or there, I think that's it for the
  highlights"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits)
  powerpc/eeh: Export eeh_iommu_group_to_pe()
  powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API
  powerpc: Reduce scariness of interrupt frames in stack traces
  powerpc: start loop at section start of start in vmemmap_populated()
  powerpc: implement vmemmap_free()
  powerpc: implement vmemmap_remove_mapping() for BOOK3S
  powerpc: implement vmemmap_list_free()
  powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
  powerpc/book3s: Fix endianess issue for HMI handling on napping cpus.
  powerpc/book3s: handle HMIs for cpus in nap mode.
  powerpc/powernv: Invoke opal call to handle hmi.
  powerpc/book3s: Add basic infrastructure to handle HMI in Linux.
  powerpc/iommu: Fix comments with it_page_shift
  powerpc/powernv: Handle compound PE in config accessors
  powerpc/powernv: Handle compound PE for EEH
  powerpc/powernv: Handle compound PE
  powerpc/powernv: Split ioda_eeh_get_state()
  powerpc/powernv: Allow to freeze PE
  powerpc/powernv: Enable M64 aperatus for PHB3
  powerpc/eeh: Aux PE data for error log
  ...
2014-08-07 08:50:34 -07:00
Paolo Bonzini
cc568ead3c Patch queue for ppc - 2014-08-01
Highlights in this release include:
 
   - BookE: Rework instruction fetch, not racy anymore now
   - BookE HV: Fix ONE_REG accessors for some in-hardware registers
   - Book3S: Good number of LE host fixes, enable HV on LE
   - Book3S: Some misc bug fixes
   - Book3S HV: Add in-guest debug support
   - Book3S HV: Preload cache lines on context switch
   - Remove 440 support
 
 Alexander Graf (31):
       KVM: PPC: Book3s PR: Disable AIL mode with OPAL
       KVM: PPC: Book3s HV: Fix tlbie compile error
       KVM: PPC: Book3S PR: Handle hyp doorbell exits
       KVM: PPC: Book3S PR: Fix ABIv2 on LE
       KVM: PPC: Book3S PR: Fix sparse endian checks
       PPC: Add asm helpers for BE 32bit load/store
       KVM: PPC: Book3S HV: Make HTAB code LE host aware
       KVM: PPC: Book3S HV: Access guest VPA in BE
       KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
       KVM: PPC: Book3S HV: Access XICS in BE
       KVM: PPC: Book3S HV: Fix ABIv2 on LE
       KVM: PPC: Book3S HV: Enable for little endian hosts
       KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
       KVM: PPC: Deflect page write faults properly in kvmppc_st
       KVM: PPC: Book3S: Stop PTE lookup on write errors
       KVM: PPC: Book3S: Add hack for split real mode
       KVM: PPC: Book3S: Make magic page properly 4k mappable
       KVM: PPC: Remove 440 support
       KVM: Rename and add argument to check_extension
       KVM: Allow KVM_CHECK_EXTENSION on the vm fd
       KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
       KVM: PPC: Implement kvmppc_xlate for all targets
       KVM: PPC: Move kvmppc_ld/st to common code
       KVM: PPC: Remove kvmppc_bad_hva()
       KVM: PPC: Use kvm_read_guest in kvmppc_ld
       KVM: PPC: Handle magic page in kvmppc_ld/st
       KVM: PPC: Separate loadstore emulation from priv emulation
       KVM: PPC: Expose helper functions for data/inst faults
       KVM: PPC: Remove DCR handling
       KVM: PPC: HV: Remove generic instruction emulation
       KVM: PPC: PR: Handle FSCR feature deselects
 
 Alexey Kardashevskiy (1):
       KVM: PPC: Book3S: Fix LPCR one_reg interface
 
 Aneesh Kumar K.V (4):
       KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
       KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
       KVM: PPC: BOOK3S: PR: Emulate instruction counter
       KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
 
 Anton Blanchard (2):
       KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
       KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()
 
 Bharat Bhushan (10):
       kvm: ppc: bookehv: Added wrapper macros for shadow registers
       kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
       kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
       kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
       kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
       kvm: ppc: Add SPRN_EPR get helper function
       kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
       KVM: PPC: Booke-hv: Add one reg interface for SPRG9
       KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
       KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
 
 Michael Neuling (1):
       KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling
 
 Mihai Caraman (8):
       KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
       KVM: PPC: e500: Fix default tlb for victim hint
       KVM: PPC: e500: Emulate power management control SPR
       KVM: PPC: e500mc: Revert "add load inst fixup"
       KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
       KVM: PPC: Book3s: Remove kvmppc_read_inst() function
       KVM: PPC: Allow kvmppc_get_last_inst() to fail
       KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
 
 Paul Mackerras (4):
       KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
       KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
       KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
       KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication
 
 Stewart Smith (2):
       Split out struct kvmppc_vcore creation to separate function
       Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJT21skAAoJECszeR4D/txgeFEP/AzJopN7s//W33CfyBqURHXp
 XALCyAw+S67gtcaTZbxomcG1xuT8Lj9WEw28iz3rCtAnJwIxsY63xrI1nXMzTaI2
 p1rC0ai5Qy+nlEbd6L78spZy/Nzh8DFYGWx78iUSO1mYD8xywJwtoiBA539pwp8j
 8N+mgn61Hwhv31bKtsZlmzXymVr/jbTp5LVuxsBLJwD2lgT49g+4uBnX2cG/iXkg
 Rzbh7LxoNNXrSPI8sYmTWu/81aeXteeX70ja6DHuV5dWLNTuAXJrh5EUfeAZqBrV
 aYcLWUYmIyB87txNmt6ZGVar2p3jr2Xhb9mKx+EN4dbehblanLc1PUqlHd0q3dKc
 Nt60ByqpZn+qDAK86dShSZLEe+GT3lovvE76CqVXD4Er+OUEkc9JoxhN1cof/Gb0
 o6uwZ2isXHRdGoZx5vb4s3UTOlwZGtoL/CyY/HD/ujYDSURkCGbxLj3kkecSY8ut
 QdDAWsC15BwsHtKLr5Zwjp2w+0eGq2QJgfvO0zqWFiz9k33SCBCUpwluFeqh27Hi
 aR5Wir3j+MIw9G8XlYlDJWYfi0h/SZ4G7hh7jSu26NBNBzQsDa8ow/cLzdMhdUwH
 OYSaeqVk5wiRb9to1uq1NQWPA0uRAx3BSjjvr9MCGRqmvn+FV5nj637YWUT+53Hi
 aSvg/U2npghLPPG2cihu
 =JuLr
 -----END PGP SIGNATURE-----

Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm

Patch queue for ppc - 2014-08-01

Highlights in this release include:

  - BookE: Rework instruction fetch, not racy anymore now
  - BookE HV: Fix ONE_REG accessors for some in-hardware registers
  - Book3S: Good number of LE host fixes, enable HV on LE
  - Book3S: Some misc bug fixes
  - Book3S HV: Add in-guest debug support
  - Book3S HV: Preload cache lines on context switch
  - Remove 440 support

Alexander Graf (31):
      KVM: PPC: Book3s PR: Disable AIL mode with OPAL
      KVM: PPC: Book3s HV: Fix tlbie compile error
      KVM: PPC: Book3S PR: Handle hyp doorbell exits
      KVM: PPC: Book3S PR: Fix ABIv2 on LE
      KVM: PPC: Book3S PR: Fix sparse endian checks
      PPC: Add asm helpers for BE 32bit load/store
      KVM: PPC: Book3S HV: Make HTAB code LE host aware
      KVM: PPC: Book3S HV: Access guest VPA in BE
      KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
      KVM: PPC: Book3S HV: Access XICS in BE
      KVM: PPC: Book3S HV: Fix ABIv2 on LE
      KVM: PPC: Book3S HV: Enable for little endian hosts
      KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
      KVM: PPC: Deflect page write faults properly in kvmppc_st
      KVM: PPC: Book3S: Stop PTE lookup on write errors
      KVM: PPC: Book3S: Add hack for split real mode
      KVM: PPC: Book3S: Make magic page properly 4k mappable
      KVM: PPC: Remove 440 support
      KVM: Rename and add argument to check_extension
      KVM: Allow KVM_CHECK_EXTENSION on the vm fd
      KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
      KVM: PPC: Implement kvmppc_xlate for all targets
      KVM: PPC: Move kvmppc_ld/st to common code
      KVM: PPC: Remove kvmppc_bad_hva()
      KVM: PPC: Use kvm_read_guest in kvmppc_ld
      KVM: PPC: Handle magic page in kvmppc_ld/st
      KVM: PPC: Separate loadstore emulation from priv emulation
      KVM: PPC: Expose helper functions for data/inst faults
      KVM: PPC: Remove DCR handling
      KVM: PPC: HV: Remove generic instruction emulation
      KVM: PPC: PR: Handle FSCR feature deselects

Alexey Kardashevskiy (1):
      KVM: PPC: Book3S: Fix LPCR one_reg interface

Aneesh Kumar K.V (4):
      KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
      KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
      KVM: PPC: BOOK3S: PR: Emulate instruction counter
      KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page

Anton Blanchard (2):
      KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
      KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()

Bharat Bhushan (10):
      kvm: ppc: bookehv: Added wrapper macros for shadow registers
      kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
      kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
      kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
      kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
      kvm: ppc: Add SPRN_EPR get helper function
      kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
      KVM: PPC: Booke-hv: Add one reg interface for SPRG9
      KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
      KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr

Michael Neuling (1):
      KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling

Mihai Caraman (8):
      KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
      KVM: PPC: e500: Fix default tlb for victim hint
      KVM: PPC: e500: Emulate power management control SPR
      KVM: PPC: e500mc: Revert "add load inst fixup"
      KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
      KVM: PPC: Book3s: Remove kvmppc_read_inst() function
      KVM: PPC: Allow kvmppc_get_last_inst() to fail
      KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

Paul Mackerras (4):
      KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
      KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
      KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
      KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication

Stewart Smith (2):
      Split out struct kvmppc_vcore creation to separate function
      Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8

Conflicts:
	Documentation/virtual/kvm/api.txt
2014-08-05 09:58:11 +02:00
Madhusudanan Kandasamy
eeb03a6eaa powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
remap_4k_pfn() silently truncates upper bits of input 4K PFN
if it cannot be contained in PTE. This leads invalid memory mapping and could
result in a system crash when the memory is accessed. This patch fails
remap_4k_pfn() and returns -EINVAL if the input 4K PFN cannot be contained in
PTE.

V3 : Added parentheses to protect 'pfn' and entire macro as suggested by Brian.
V2 : Rewritten to avoid helper function as suggested by Stephen Rothwell.

Signed-off-by: Madhusudanan Kandasamy <kmadhu@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 16:34:06 +10:00
Mahesh Salgaonkar
0ef95b411e powerpc/powernv: Invoke opal call to handle hmi.
When we hit the HMI in Linux, invoke opal call to handle/recover from HMI
errors in real mode and then in virtual mode during check_irq_replay()
invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from
OPAL and act accordingly.

Now that we are ready to handle HMI interrupt directly in linux, remove
the HMI interrupt registration with firmware.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 16:33:52 +10:00
Mahesh Salgaonkar
0869b6fd20 powerpc/book3s: Add basic infrastructure to handle HMI in Linux.
Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements
basic infrastructure to handle HMI in Linux host. The design is to invoke
opal handle hmi in real mode for recovery and set irq_pending when we hit HMI.
During check_irq_replay pull opal hmi event and print hmi info on console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 16:33:48 +10:00
Gavin Shan
5ca27efbd8 powerpc/powernv: Allow to freeze PE
The patch synchronizes header file with firmware to have new OPAL
API opal_pci_eeh_freeze_set(), which is used to freeze the specified
PE in order to support "compound" PE.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:52 +10:00
Guo Chao
262af557dd powerpc/powernv: Enable M64 aperatus for PHB3
This patch enables M64 aperatus for PHB3.

We already had platform hook (ppc_md.pcibios_window_alignment) to affect
the PCI resource assignment done in PCI core so that each PE's M32 resource
was built on basis of M32 segment size. Similarly, we're using that for
M64 assignment on basis of M64 segment size.

   * We're using last M64 BAR to cover M64 aperatus, and it's shared by all
     256 PEs.
   * We don't support P7IOC yet. However, some function callbacks are added
     to (struct pnv_phb) so that we can reuse them on P7IOC in future.
   * PE, corresponding to PCI bus with large M64 BAR device attached, might
     span multiple M64 segments. We introduce "compound" PE to cover the case.
     The compound PE is a list of PEs and the master PE is used as before.
     The slave PEs are just for MMIO isolation.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:47 +10:00
Gavin Shan
bb593c0049 powerpc/eeh: Aux PE data for error log
The patch allows PE (struct eeh_pe) instance to have auxillary data,
whose size is configurable on basis of platform. For PowerNV, the
auxillary data will be used to cache PHB diag-data for that PE
(frozen PE or fenced PHB). In turn, we can retrieve the diag-data
at any later points.

It's useful for the case of VFIO PCI devices where the error log
should be cached, and then be retrieved by the guest at later point.
Also, it can avoid PHB diag-data overwritting if another frozen PE
reported and the previous diag-data isn't fetched by guest.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:43 +10:00
Gavin Shan
f18440fb7e powerpc/eeh: Make diag-data not endian dependent
It's followup of commit ddf0322a ("powerpc/powernv: Fix endianness
problems in EEH"). The patch helps to get non-endian-dependent
diag-data.

Cc: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:38 +10:00
Gavin Shan
dc561fb9e7 powerpc/eeh: Selectively enable IO for error log
According to the experiment I did, PCI config access is blocked
on P7IOC frozen PE by hardware, but PHB3 doesn't do that. That
means we always get 0xFF's while dumping PCI config space of the
frozen PE on P7IOC. We don't have the problem on PHB3. So we have
to enable I/O prioir to collecting error log. Otherwise, meaningless
0xFF's are always returned.

The patch fixes it by EEH flag (EEH_ENABLE_IO_FOR_LOG), which is
selectively set to indicate the case for: P7IOC on PowerNV platform,
pSeries platform.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:25 +10:00
Gavin Shan
05b1721d9f powerpc/eeh: Refactor EEH flag accessors
There are multiple global EEH flags. Almost each flag has its own
accessor, which doesn't make sense. The patch refactors EEH flag
accessors so that they look unified:

  eeh_add_flag():   Add EEH flag
  eeh_clear_flag(): Clear EEH flag
  eeh_has_flag():   Check if one specific flag has been set
  eeh_enabled():    Check if EEH functionality has been enabled

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:21 +10:00
Gavin Shan
212d16cdca powerpc/eeh: EEH support for VFIO PCI device
The patch exports functions to be used by new VFIO ioctl command,
which will be introduced in subsequent patch, to support EEH
functinality for VFIO PCI devices.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:28:48 +10:00
Gavin Shan
05ec424e38 powerpc/eeh: Avoid event on passed PE
We must not handle EEH error on devices which are passed to somebody
else. Instead, we expect that the frozen device owner detects an EEH
error and recovers from it.

This avoids EEH error handling on passed through devices so the device
owner gets a chance to handle them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:28:47 +10:00
Benjamin Herrenschmidt
9287b95ec9 Merge remote-tracking branch 'scott/next' into next
Scott writes:

Highlights include e6500 hardware threading support, an e6500 TLB erratum
workaround, corenet error reporting, support for a new board, and some
minor fixes.
2014-08-05 14:13:41 +10:00
Linus Torvalds
8efb90cf1e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - big rtmutex and futex cleanup and robustification from Thomas
     Gleixner
   - mutex optimizations and refinements from Jason Low
   - arch_mutex_cpu_relax() removal and related cleanups
   - smaller lockdep tweaks"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  arch, locking: Ciao arch_mutex_cpu_relax()
  locking/lockdep: Only ask for /proc/lock_stat output when available
  locking/mutexes: Optimize mutex trylock slowpath
  locking/mutexes: Try to acquire mutex only if it is unlocked
  locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro
  locking/mutexes: Correct documentation on mutex optimistic spinning
  rtmutex: Make the rtmutex tester depend on BROKEN
  futex: Simplify futex_lock_pi_atomic() and make it more robust
  futex: Split out the first waiter attachment from lookup_pi_state()
  futex: Split out the waiter check from lookup_pi_state()
  futex: Use futex_top_waiter() in lookup_pi_state()
  futex: Make unlock_pi more robust
  rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
  rtmutex: Cleanup deadlock detector debug logic
  rtmutex: Confine deadlock logic to futex
  rtmutex: Simplify remove_waiter()
  rtmutex: Document pi chain walk
  rtmutex: Clarify the boost/deboost part
  rtmutex: No need to keep task ref for lock owner check
  rtmutex: Simplify and document try_to_take_rtmutex()
  ...
2014-08-04 16:09:06 -07:00
Alexander Graf
8e6afa36e7 KVM: PPC: PR: Handle FSCR feature deselects
We handle FSCR feature bits (well, TAR only really today) lazily when the guest
starts using them. So when a guest activates the bit and later uses that feature
we enable it for real in hardware.

However, when the guest stops using that bit we don't stop setting it in
hardware. That means we can potentially lose a trap that the guest expects to
happen because it thinks a feature is not active.

This patch adds support to drop TAR when then guest turns it off in FSCR. While
at it it also restricts FSCR access to 64bit systems - 32bit ones don't have it.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-31 10:23:46 +02:00