Commit Graph

773 Commits

Author SHA1 Message Date
Linus Torvalds
45d9a2220f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "Unfortunately, this merge window it'll have a be a lot of small piles -
  my fault, actually, for not keeping #for-next in anything that would
  resemble a sane shape ;-/

  This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
  cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
  components) + several long-standing patches from various folks.

  There definitely will be a lot more (starting with Miklos'
  check_submount_and_drop() series)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  direct-io: Handle O_(D)SYNC AIO
  direct-io: Implement generic deferred AIO completions
  add formats for dentry/file pathnames
  kvm eventfd: switch to fdget
  powerpc kvm: use fdget
  switch fchmod() to fdget
  switch epoll_ctl() to fdget
  switch copy_module_from_fd() to fdget
  git simplify nilfs check for busy subtree
  ibmasmfs: don't bother passing superblock when not needed
  don't pass superblock to hypfs_{mkdir,create*}
  don't pass superblock to hypfs_diag_create_files
  don't pass superblock to hypfs_vm_create_files()
  oprofile: get rid of pointless forward declarations of struct super_block
  oprofilefs_create_...() do not need superblock argument
  oprofilefs_mkdir() doesn't need superblock argument
  don't bother with passing superblock to oprofile_create_stats_files()
  oprofile: don't bother with passing superblock to ->create_files()
  don't bother passing sb to oprofile_create_files()
  coh901318: don't open-code simple_read_from_buffer()
  ...
2013-09-05 08:50:26 -07:00
Al Viro
70abadedab powerpc kvm: use fdget
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 23:04:45 -04:00
Gleb Natapov
a9f6cf965e Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into queue
* 'kvm-ppc-next' of git://github.com/agraf/linux-2.6:
  KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()
  KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls
  KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX
  KVM: PPC: Book3S: Fix compile error in XICS emulation
  KVM: PPC: Book3S PR: return appropriate error when allocation fails
  arch: powerpc: kvm: add signed type cast for comparation
  powerpc/kvm: Copy the pvr value after memset
  KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entry
  kvm/ppc/booke: Don't call kvm_guest_enter twice
  kvm/ppc: Call trace_hardirqs_on before entry
  KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers
  KVM: PPC: Book3S HV: Correct tlbie usage
  powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation.
  powerpc/kvm: Contiguous memory allocator based RMA allocation
  powerpc/kvm: Contiguous memory allocator based hash page table allocation
  KVM: PPC: Book3S: Ignore DABR register
  mm/cma: Move dma contiguous changes into a seperate config
2013-08-30 15:33:11 +03:00
Alexander Graf
bf550fc93d Merge remote-tracking branch 'origin/next' into kvm-ppc-next
Conflicts:
	mm/Kconfig

CMA DMA split and ZSWAP introduction were conflicting, fix up manually.
2013-08-29 00:41:59 +02:00
Paul Mackerras
7e48c101e0 KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()
This reworks kvmppc_mmu_book3s_64_xlate() to make it check the large
page bit in the hashed page table entries (HPTEs) it looks at, and
to simplify and streamline the code.  The checking of the first dword
of each HPTE is now done with a single mask and compare operation,
and all the code dealing with the matching HPTE, if we find one,
is consolidated in one place in the main line of the function flow.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-29 00:05:50 +02:00
Paul Mackerras
8b23de2948 KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls
It turns out that if we exit the guest due to a hcall instruction (sc 1),
and the loading of the instruction in the guest exit path fails for any
reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
instruction after the hcall instruction rather than the hcall itself.
This in turn means that the instruction doesn't get recognized as an
hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
as a sc instruction.  That usually results in the guest kernel getting
a return code of 38 (ENOSYS) from an hcall, which often triggers a
BUG_ON() or other failure.

This fixes the problem by adding a new variant of kvmppc_get_last_inst()
called kvmppc_get_last_sc(), which fetches the instruction if necessary
from pc - 4 rather than pc.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28 16:47:49 +02:00
Paul Mackerras
9d1ffdd8f3 KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX
Currently the code assumes that once we load up guest FP/VSX or VMX
state into the CPU, it stays valid in the CPU registers until we
explicitly flush it to the thread_struct.  However, on POWER7,
copy_page() and memcpy() can use VMX.  These functions do flush the
VMX state to the thread_struct before using VMX instructions, but if
this happens while we have guest state in the VMX registers, and we
then re-enter the guest, we don't reload the VMX state from the
thread_struct, leading to guest corruption.  This has been observed
to cause guest processes to segfault.

To fix this, we check before re-entering the guest that all of the
bits corresponding to facilities owned by the guest, as expressed
in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr.
Any bits that have been cleared correspond to facilities that have
been used by kernel code and thus flushed to the thread_struct, so
for them we reload the state from the thread_struct.

We also need to check current->thread.regs->msr before calling
giveup_fpu() or giveup_altivec(), since if the relevant bit is
clear, the state has already been flushed to the thread_struct and
to flush it again would corrupt it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28 16:41:14 +02:00
Paul Mackerras
7bfa9ad55d KVM: PPC: Book3S: Fix compile error in XICS emulation
Commit 8e44ddc3f3 ("powerpc/kvm/book3s: Add support for H_IPOLL and
H_XIRR_X in XICS emulation") added a call to get_tb() but didn't
include the header that defines it, and on some configs this means
book3s_xics.c fails to compile:

arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’:
arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration]

Cc: stable@vger.kernel.org [v3.10, v3.11]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28 16:28:47 +02:00
Thadeu Lima de Souza Cascardo
7c7b406e6b KVM: PPC: Book3S PR: return appropriate error when allocation fails
err was overwritten by a previous function call, and checked to be 0. If
the following page allocation fails, 0 is going to be returned instead
of -ENOMEM.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28 16:26:33 +02:00
Chen Gang
5d226ae56f arch: powerpc: kvm: add signed type cast for comparation
'rmls' is 'unsigned long', lpcr_rmls() will return negative number when
failure occurs, so it need a type cast for comparing.

'lpid' is 'unsigned long', kvmppc_alloc_lpid() return negative number
when failure occurs, so it need a type cast for comparing.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28 16:23:35 +02:00
Yann Droneaud
2f84d5ea6f ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
KVM uses anon_inode_get() to allocate file descriptors as part
of some of its ioctls. But those ioctls are lacking a flag argument
allowing userspace to choose options for the newly opened file descriptor.

In such case it's advised to use O_CLOEXEC by default so that
userspace is allowed to choose, without race, if the file descriptor
is going to be inherited across exec().

This patch set O_CLOEXEC flag on all file descriptors created
with anon_inode_getfd() to not leak file descriptors across exec().

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26 13:19:56 +03:00
Aneesh Kumar K.V
87916442bd powerpc/kvm: Copy the pvr value after memset
Otherwise we would clear the pvr value

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-22 19:25:41 +02:00
Thadeu Lima de Souza Cascardo
e0e1361462 powerpc/kvm/book3s_pr: Return appropriate error when allocation fails
err was overwritten by a previous function call, and checked to be 0. If
the following page allocation fails, 0 is going to be returned instead
of -ENOMEM.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-09 18:06:54 +10:00
Chen Gang
2fb10672c8 powerpc/kvm: Add signed type cast for comparation
'rmls' is 'unsigned long', lpcr_rmls() will return negative number when
failure occurs, so it need a type cast for comparing.

'lpid' is 'unsigned long', kvmppc_alloc_lpid() return negative number
when failure occurs, so it need a type cast for comparing.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-09 18:06:51 +10:00
Paul Mackerras
c8ae0ace10 KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entry
Unlike the other general-purpose SPRs, SPRG3 can be read by usermode
code, and is used in recent kernels to store the CPU and NUMA node
numbers so that they can be read by VDSO functions.  Thus we need to
load the guest's SPRG3 value into the real SPRG3 register when entering
the guest, and restore the host's value when exiting the guest.  We don't
need to save the guest SPRG3 value when exiting the guest as usermode
code can't modify SPRG3.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-25 15:33:09 +02:00
Takuya Yoshikawa
e59dbe09f8 KVM: Introduce kvm_arch_memslots_updated()
This is called right after the memslots is updated, i.e. when the result
of update_memslots() gets installed in install_new_memslots().  Since
the memslots needs to be updated twice when we delete or move a memslot,
kvm_arch_commit_memory_region() does not correspond to this exactly.

In the following patch, x86 will use this new API to check if the mmio
generation has reached its maximum value, in which case mmio sptes need
to be flushed out.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Acked-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18 12:29:25 +02:00
Scott Wood
c09ec198eb kvm/ppc/booke: Don't call kvm_guest_enter twice
kvm_guest_enter() was already called by kvmppc_prepare_to_enter().
Don't call it again.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11 00:58:49 +02:00
Scott Wood
5f1c248f52 kvm/ppc: Call trace_hardirqs_on before entry
Currently this is only being done on 64-bit.  Rather than just move it
out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
consistent with lazy ee state, and so that we don't track more host
code as interrupts-enabled than necessary.

Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
that this function now has a role on 32-bit as well.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11 00:51:28 +02:00
Paul Mackerras
4baa1d871c KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers
The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S
can contain negative values, if some of the handlers end up before the
table in the vmlinux binary.  Thus we need to use a sign-extending load
to read the values in the table rather than a zero-extending load.
Without this, the host crashes when the guest does one of the hcalls
with negative offsets, due to jumping to a bogus address.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-10 13:14:16 +02:00
Paul Mackerras
5448050124 KVM: PPC: Book3S HV: Correct tlbie usage
This corrects the usage of the tlbie (TLB invalidate entry) instruction
in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
On the PPC970, the bit to select large vs. small page is in the instruction,
not in the RB register value.  This changes the code to use the correct
form on PPC970.

On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
field of the RB value incorrectly for 64k pages.  This fixes it.

Since we now have several cases to handle for the tlbie instruction, this
factors out the code to do a sequence of tlbies into a new function,
do_tlbies(), and calls that from the various places where the code was
doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
use the same global_invalidates() function for determining whether to do
local or global TLB invalidations as is used in other places, for
consistency, and also to make sure that kvm->arch.need_tlb_flush gets
updated properly.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-10 13:14:09 +02:00
Aneesh Kumar K.V
990978e993 powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation.
Both RMA and hash page table request will be a multiple of 256K. We can use
a chunk size of 256K to track the free/used 256K chunk in the bitmap. This
should help to reduce the bitmap size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08 16:21:13 +02:00
Aneesh Kumar K.V
6c45b81098 powerpc/kvm: Contiguous memory allocator based RMA allocation
Older version of power architecture use Real Mode Offset register and Real Mode Limit
Selector for mapping guest Real Mode Area. The guest RMA should be physically
contigous since we use the range when address translation is not enabled.

This patch switch RMA allocation code to use contigous memory allocator. The patch
also remove the the linear allocator which not used any more

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08 16:20:20 +02:00
Aneesh Kumar K.V
fa61a4e376 powerpc/kvm: Contiguous memory allocator based hash page table allocation
Powerpc architecture uses a hash based page table mechanism for mapping virtual
addresses to physical address. The architecture require this hash page table to
be physically contiguous. With KVM on Powerpc currently we use early reservation
mechanism for allocating guest hash page table. This implies that we need to
reserve a big memory region to ensure we can create large number of guest
simultaneously with KVM on Power. Another disadvantage is that the reserved memory
is not available to rest of the subsystems and and that implies we limit the total
available memory in the host.

This patch series switch the guest hash page table allocation to use
contiguous memory allocator.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08 16:19:58 +02:00
Alexander Graf
f35320288c KVM: PPC: Book3S: Ignore DABR register
We don't emulate breakpoints yet, so just ignore reads and writes
to / from DABR.

This fixes booting of more recent Linux guest kernels for me.

Reported-by: Nello Martuscielli <ppc.addon@gmail.com>
Tested-by: Nello Martuscielli <ppc.addon@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08 16:18:20 +02:00
Linus Torvalds
65b97fb730 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "This is the powerpc changes for the 3.11 merge window.  In addition to
  the usual bug fixes and small updates, the main highlights are:

   - Support for transparent huge pages by Aneesh Kumar for 64-bit
     server processors.  This allows the use of 16M pages as transparent
     huge pages on kernels compiled with a 64K base page size.

   - Base VFIO support for KVM on power by Alexey Kardashevskiy

   - Wiring up of our nvram to the pstore infrastructure, including
     putting compressed oopses in there by Aruna Balakrishnaiah

   - Move, rework and improve our "EEH" (basically PCI error handling
     and recovery) infrastructure.  It is no longer specific to pseries
     but is now usable by the new "powernv" platform as well (no
     hypervisor) by Gavin Shan.

   - I fixed some bugs in our math-emu instruction decoding and made it
     usable to emulate some optional FP instructions on processors with
     hard FP that lack them (such as fsqrt on Freescale embedded
     processors).

   - Support for Power8 "Event Based Branch" facility by Michael
     Ellerman.  This facility allows what is basically "userspace
     interrupts" for performance monitor events.

   - A bunch of Transactional Memory vs.  Signals bug fixes and HW
     breakpoint/watchpoint fixes by Michael Neuling.

  And more ...  I appologize in advance if I've failed to highlight
  something that somebody deemed worth it."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  pstore: Add hsize argument in write_buf call of pstore_ftrace_call
  powerpc/fsl: add MPIC timer wakeup support
  powerpc/mpic: create mpic subsystem object
  powerpc/mpic: add global timer support
  powerpc/mpic: add irq_set_wake support
  powerpc/85xx: enable coreint for all the 64bit boards
  powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
  powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
  powerpc/mpic: Add get_version API both for internal and external use
  powerpc: Handle both new style and old style reserve maps
  powerpc/hw_brk: Fix off by one error when validating DAWR region end
  powerpc/pseries: Support compression of oops text via pstore
  powerpc/pseries: Re-organise the oops compression code
  pstore: Pass header size in the pstore write callback
  powerpc/powernv: Fix iommu initialization again
  powerpc/pseries: Inform the hypervisor we are using EBB regs
  powerpc/perf: Add power8 EBB support
  powerpc/perf: Core EBB support for 64-bit book3s
  powerpc/perf: Drop MMCRA from thread_struct
  powerpc/perf: Don't enable if we have zero events
  ...
2013-07-04 10:29:23 -07:00
Linus Torvalds
fe489bf450 KVM fixes for 3.11
On the x86 side, there are some optimizations and documentation updates.
 The big ARM/KVM change for 3.11, support for AArch64, will come through
 Catalin Marinas's tree.  s390 and PPC have misc cleanups and bugfixes.
 
 There is a conflict due to "s390/pgtable: fix ipte notify bit" having
 entered 3.10 through Martin Schwidefsky's s390 tree.  This pull request
 has additional changes on top, so this tree's version is the correct one.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0oU6AAoJEBvWZb6bTYbynnsP/RSUrrHrA8Wu1tqVfAKu+1y5
 6OIihqZ9x11/YMaNofAfv86jqxFu0/j7CzMGphNdjzujqKI+Q1tGe7oiVCmKzoG+
 UvSctWsz0lpllgBtnnrm5tcfmG6rrddhLtpA7m320+xCVx8KV5P4VfyHZEU+Ho8h
 ziPmb2mAQ65gBNX6nLHEJ3ITTgad6gt4NNbrKIYpyXuWZQJypzaRqT/vpc4md+Ed
 dCebMXsL1xgyb98EcnOdrWH1wV30MfucR7IpObOhXnnMKeeltqAQPvaOlKzZh4dK
 +QfxJfdRZVS0cepcxzx1Q2X3dgjoKQsHq1nlIyz3qu1vhtfaqBlixLZk0SguZ/R9
 1S1YqucZiLRO57RD4q0Ak5oxwobu18ZoqJZ6nledNdWwDe8bz/W2wGAeVty19ky0
 qstBdM9jnwXrc0qrVgZp3+s5dsx3NAm/KKZBoq4sXiDLd/yBzdEdWIVkIrU3X9wU
 3X26wOmBxtsB7so/JR7ciTsQHelmLicnVeXohAEP9CjIJffB81xVXnXs0P0SYuiQ
 RzbSCwjPzET4JBOaHWT0Dhv0DTS/EaI97KzlN32US3Bn3WiLlS1oDCoPFoaLqd2K
 LxQMsXS8anAWxFvexfSuUpbJGPnKSidSQoQmJeMGBa9QhmZCht3IL16/Fb641ToN
 xBohzi49L9FDbpOnTYfz
 =1zpG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "On the x86 side, there are some optimizations and documentation
  updates.  The big ARM/KVM change for 3.11, support for AArch64, will
  come through Catalin Marinas's tree.  s390 and PPC have misc cleanups
  and bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
  KVM: PPC: Ignore PIR writes
  KVM: PPC: Book3S PR: Invalidate SLB entries properly
  KVM: PPC: Book3S PR: Allow guest to use 1TB segments
  KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
  KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
  KVM: PPC: Book3S PR: Fix proto-VSID calculations
  KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
  KVM: Fix RTC interrupt coalescing tracking
  kvm: Add a tracepoint write_tsc_offset
  KVM: MMU: Inform users of mmio generation wraparound
  KVM: MMU: document fast invalidate all mmio sptes
  KVM: MMU: document fast invalidate all pages
  KVM: MMU: document fast page fault
  KVM: MMU: document mmio page fault
  KVM: MMU: document write_flooding_count
  KVM: MMU: document clear_spte_count
  KVM: MMU: drop kvm_mmu_zap_mmio_sptes
  KVM: MMU: init kvm generation close to mmio wrap-around value
  KVM: MMU: add tracepoint for check_mmio_spte
  KVM: MMU: fast invalidate all mmio sptes
  ...
2013-07-03 13:21:40 -07:00
Linus Torvalds
ab3d681e9d Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The major changes:

  - Simplify RCU's grace-period and callback processing based on the new
    numbering for callbacks.

  - Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
    single-CPU low-latency systems.

  - SRCU-related changes and fixes.

  - Miscellaneous fixes, including converting a few remaining printk()
    calls to pr_*().

  - Documentation updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs
  rcu: Shrink TINY_RCU by moving exit_rcu()
  rcu: Remove TINY_PREEMPT_RCU tracing documentation
  rcu: Consolidate rcutiny_plugin.h ifdefs
  rcu: Remove rcu_preempt_note_context_switch()
  rcu: Remove the CONFIG_TINY_RCU ifdefs in rcutiny.h
  rcu: Remove check_cpu_stall_preempt()
  rcu: Simplify RCU_TINY RCU callback invocation
  rcu: Remove rcu_preempt_process_callbacks()
  rcu: Remove rcu_preempt_remove_callbacks()
  rcu: Remove rcu_preempt_check_callbacks()
  rcu: Remove show_tiny_preempt_stats()
  rcu: Remove TINY_PREEMPT_RCU
  powerpc,kvm: fix imbalance srcu_read_[un]lock()
  rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
  rcu: Apply Dave Jones's NOCB Kconfig help feedback
  rcu: Merge adjacent identical ifdefs
  rcu: Drive quiescent-state-forcing delay from HZ
  rcu: Remove "Experimental" flags
  kthread: Add kworker kthreads to OS-jitter documentation
  ...
2013-07-02 16:13:29 -07:00
Benjamin Herrenschmidt
24a72acac1 Linux 3.10
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJR0K2gAAoJEHm+PkMAQRiGWsEH+gMZSN1qRm34hZ82q1Tx7HvL
 Eb/Gsl3Qw/7G2TlTqgjBUs36IdqV9O2cui/aa3/TfXvdvrx+0GlhRkEwQPc+ygcO
 Mvoyoke4tT4+4jVFdCg1J8avREsa28/6oaHs0ZZxuVmJBBLTJH7aXaNsGn6eU1q9
 9+p798MQis6naIiPC63somlZcCIiBhsuWCPWpEfLMn8G1HWAFTM3xXIbNBqe/brS
 bmIOfhomlIZ5dcdaXGvjtP3+KJhkNDwhkPC4tVYu8JqqgSlrE+a+EGyEuuGqKk10
 U+swiqyuD31uBI9ga54u/2FzSqDiAu6YOcMXevjo/m3g9XLdYbYLvN+nvN8alCQ=
 =Ob6Z
 -----END PGP SIGNATURE-----

Merge tag 'v3.10' into next

Merge 3.10 in order to get some of the last minute powerpc
changes, resolve conflicts and add additional fixes on top
of them.
2013-07-01 17:57:25 +10:00
Alexander Graf
a3ff5fbc94 KVM: PPC: Ignore PIR writes
While technically it's legal to write to PIR and have the identifier changed,
we don't implement logic to do so because we simply expose vcpu_id to the guest.

So instead, let's ignore writes to PIR. This ensures that we don't inject faults
into the guest for something the guest is allowed to do. While at it, we cross
our fingers hoping that it also doesn't mind that we broke its PIR read values.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:22 +02:00
Paul Mackerras
681562cd56 KVM: PPC: Book3S PR: Invalidate SLB entries properly
At present, if the guest creates a valid SLB (segment lookaside buffer)
entry with the slbmte instruction, then invalidates it with the slbie
instruction, then reads the entry with the slbmfee/slbmfev instructions,
the result of the slbmfee will have the valid bit set, even though the
entry is not actually considered valid by the host.  This is confusing,
if not worse.  This fixes it by zeroing out the orige and origv fields
of the SLB entry structure when the entry is invalidated.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:22 +02:00
Paul Mackerras
0f296829b5 KVM: PPC: Book3S PR: Allow guest to use 1TB segments
With this, the guest can use 1TB segments as well as 256MB segments.
Since we now have the situation where a single emulated guest segment
could correspond to multiple shadow segments (as the shadow segments
are still 256MB segments), this adds a new kvmppc_mmu_flush_segment()
to scan for all shadow segments that need to be removed.

This restructures the guest HPT (hashed page table) lookup code to
use the correct hashing and matching functions for HPTEs within a
1TB segment.  We use the standard hpt_hash() function instead of
open-coding the hash calculation, and we use HPTE_V_COMPARE() with
an AVPN value that has the B (segment size) field included.  The
calculation of avpn is done a little earlier since it doesn't change
in the loop starting at the do_second label.

The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that
it returns a 256MB VSID even if the guest SLB entry is a 1TB entry.
This is because the users of this function are creating 256MB SLB
entries.  We set a new VSID_1T flag so that entries created from 1T
segments don't collide with entries from 256MB segments.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:22 +02:00
Paul Mackerras
6ed1485f65 KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
The loop in kvmppc_mmu_book3s_64_xlate() that looks up a translation
in the guest hashed page table (HPT) keeps going if it finds an
HPTE that matches but doesn't allow access.  This is incorrect; it
is different from what the hardware does, and there should never be
more than one matching HPTE anyway.  This fixes it to stop when any
matching HPTE is found.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:22 +02:00
Paul Mackerras
bc1bc4e392 KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
On entering a PR KVM guest, we invalidate the whole SLB before loading
up the guest entries.  We do this using an slbia instruction, which
invalidates all entries except entry 0, followed by an slbie to
invalidate entry 0.  However, the slbie turns out to be ineffective
in some circumstances (specifically when the host linear mapping uses
64k pages) because of errors in computing the parameter to the slbie.
The result is that the guest kernel hangs very early in boot because
it takes a DSI the first time it tries to access kernel data using
a linear mapping address in real mode.

Currently we construct bits 36 - 43 (big-endian numbering) of the slbie
parameter by taking bits 56 - 63 of the SLB VSID doubleword.  These bits
for the tlbie are C (class, 1 bit), B (segment size, 2 bits) and 5
reserved bits.  For the SLB VSID doubleword these are C (class, 1 bit),
reserved (1 bit), LP (large page size, 2 bits), and 4 reserved bits.
Thus we are not setting the B field correctly, and when LP = 01 as
it is for 64k pages, we are setting a reserved bit.

Rather than add more instructions to calculate the slbie parameter
correctly, this takes a simpler approach, which is to set entry 0 to
zeroes explicitly.  Normally slbmte should not be used to invalidate
an entry, since it doesn't invalidate the ERATs, but it is OK to use
it to invalidate an entry if it is immediately followed by slbia,
which does invalidate the ERATs.  (This has been confirmed with the
Power architects.)  This approach takes fewer instructions and will
work whatever the contents of entry 0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:21 +02:00
Paul Mackerras
8ed7b7e9d2 KVM: PPC: Book3S PR: Fix proto-VSID calculations
This makes sure the calculation of the proto-VSIDs used by PR KVM
is done with 64-bit arithmetic.  Since vcpu3s->context_id[] is int,
when we do vcpu3s->context_id[0] << ESID_BITS the shift will be done
with 32-bit instructions, possibly leading to significant bits
getting lost, as the context id can be up to 524283 and ESID_BITS is
18.  To fix this we cast the context id to u64 before shifting.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:21 +02:00
Tiejun Chen
5f17ce8b95 KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
Availablity of the doorbell_exception function is guarded by
CONFIG_PPC_DOORBELL. Use the same define to guard our caller
of it.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[agraf: improve patch description]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:21 +02:00
Aneesh Kumar K.V
db7cb5b924 powerpc/kvm: Handle transparent hugepage in KVM
We can find pte that are splitting while walking page tables. Return
None pte in that case.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:55 +10:00
Aneesh Kumar K.V
12bc9f6fc1 powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
document why we don't need to handle transparent hugepages at callsites.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:54 +10:00
Aneesh Kumar K.V
db3d853490 powerpc/mm: handle hugepage size correctly when invalidating hpte entries
If a hash bucket gets full, we "evict" a more/less random entry from it.
When we do that we don't invalidate the TLB (hpte_remove) because we assume
the old translation is still technically "valid". This implies that when
we are invalidating or updating pte, even if HPTE entry is not valid
we should do a tlb invalidate. With hugepages, we need to pass the correct
actual page size value for tlb invalidation.

This change update the patch 0608d69246
"powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle
transparent hugepages correctly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:52 +10:00
Ingo Molnar
b1fe9987b7 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

"The major changes for this series are:

 1.      Simplify RCU's grace-period and callback processing based on
         the new numbering for callbacks.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/330.

 2.      Documentation updates.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/348.

 3.      Miscellaneous fixes, including converting a few remaining printk()
         calls to pr_*().  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/324.

 4.      SRCU-related changes and fixes.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/425.

 5.      Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
         single-CPU low-latency systems.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/427."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:48:57 +02:00
Scott Wood
f8941fbe20 kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
kwmppc_lazy_ee_enable() should be called as late as possible,
or else we get things like WARN_ON(preemptible()) in enable_kernel_fp()
in configurations where preemptible() works.

Note that book3s_pr already waits until just before __kvmppc_vcpu_run
to call kvmppc_lazy_ee_enable().

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-19 12:15:13 +02:00
Scott Wood
7c11c0ccc7 kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
EE is hard-disabled on entry to kvmppc_handle_exit(), so call
hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
is unset.

Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
and sometimes host kernel hangs.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:11:00 +03:00
Scott Wood
f1e89028f0 kvm/ppc/booke: Hold srcu lock when calling gfn functions
KVM core expects arch code to acquire the srcu lock when calling
gfn_to_memslot and similar functions.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:59 +03:00
Scott Wood
2b6398fcf2 kvm/ppc/booke64: Disable e6500 support
The previous patch made 64-bit booke KVM build again, but Altivec
support is still not complete, and we can't prevent the guest from
turning on Altivec (which can corrupt host state until state
save/restore is implemented).  Disable e6500 on KVM until this is
fixed.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:56 +03:00
Lai Jiangshan
505d6421bd powerpc,kvm: fix imbalance srcu_read_[un]lock()
At the point of up_out label in kvmppc_hv_setup_htab_rma(),
srcu read lock is still held.

We have to release it before return.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: kvm@vger.kernel.org
Cc: kvm-ppc@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:26 -07:00
Paul Mackerras
8e44ddc3f3 powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation
This adds the remaining two hypercalls defined by PAPR for manipulating
the XICS interrupt controller, H_IPOLL and H_XIRR_X.  H_IPOLL returns
information about the priority and pending interrupts for a virtual
cpu, without changing any state.  H_XIRR_X is like H_XIRR in that it
reads and acknowledges the highest-priority pending interrupt, but it
also returns the timestamp (timebase register value) from when the
interrupt was first received by the hypervisor.  Currently we just
return the current time, since we don't do any software queueing of
virtual interrupts inside the XICS emulation code.

These hcalls are not currently used by Linux guests, but may be in
future.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Marc Zyngier
535cf7b3b1 KVM: get rid of $(addprefix ../../../virt/kvm/, ...) in Makefiles
As requested by the KVM maintainers, remove the addprefix used to
refer to the main KVM code from the arch code, and replace it with
a KVM variable that does the same thing.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Christoffer Dall <cdall@cs.columbia.edu>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-19 15:14:00 +03:00
Linus Torvalds
01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Linus Torvalds
5a148af669 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlights this time around are:

   - A pile of addition POWER8 bits and nits, such as updated
     performance counter support (Michael Ellerman), new branch history
     buffer support (Anshuman Khandual), base support for the new PCI
     host bridge when not using the hypervisor (Gavin Shan) and other
     random related bits and fixes from various contributors.

   - Some rework of our page table format by Aneesh Kumar which fixes a
     thing or two and paves the way for THP support.  THP itself will
     not make it this time around however.

   - More Freescale updates, including Altivec support on the new e6500
     cores, new PCI controller support, and a pile of new boards support
     and updates.

   - The usual batch of trivial cleanups & fixes"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  powerpc: Fix build error for book3e
  powerpc: Context switch the new EBB SPRs
  powerpc: Turn on the EBB H/FSCR bits
  powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
  powerpc: Setup BHRB instructions facility in HFSCR for POWER8
  powerpc: Fix interrupt range check on debug exception
  powerpc: Update tlbie/tlbiel as per ISA doc
  powerpc: Print page size info during boot
  powerpc: print both base and actual page size on hash failure
  powerpc: Fix hpte_decode to use the correct decoding for page sizes
  powerpc: Decode the pte-lp-encoding bits correctly.
  powerpc: Use encode avpn where we need only avpn values
  powerpc: Reduce PTE table memory wastage
  powerpc: Move the pte free routines from common header
  powerpc: Reduce the PTE_INDEX_SIZE
  powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
  powerpc: New hugepage directory format
  powerpc: Don't truncate pgd_index wrongly
  powerpc: Don't hard code the size of pte page
  powerpc: Save DAR and DSISR in pt_regs on MCE
  ...
2013-05-02 10:16:16 -07:00
Paul Mackerras
5975a2e095 KVM: PPC: Book3S: Add API for in-kernel XICS emulation
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it.  The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.

The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source.  The attribute number is the same as the interrupt
source number.

This does not support irq routing or irqfd yet.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:36 +02:00
Wei Yongjun
d133b40f2c kvm/ppc/mpic: fix missing unlock in set_base_addr()
Add the missing unlock before return from function set_base_addr()
when disables the mapping.

Introduced by commit 5df554ad5b
(kvm/ppc/mpic: in-kernel MPIC emulation)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:35 +02:00