Commit Graph

23422 Commits

Author SHA1 Message Date
Paolo Bonzini
0e3d0648bd KVM: x86: MMU: always set accessed bit in shadow PTEs
Commit 7a1638ce42 ("nEPT: Redefine EPT-specific link_shadow_page()",
2013-08-05) says:

    Since nEPT doesn't support A/D bit, we should not set those bit
    when building the shadow page table.

but this is not necessary.  Even though nEPT doesn't support A/D
bits, and hence the vmcs12 EPT pointer will never enable them,
we always use them for shadow page tables if available (see
construct_eptp in vmx.c).  So we can set the A/D bits freely
in the shadow page table.

This patch hence basically reverts commit 7a1638ce42.

Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:23 +01:00
Paolo Bonzini
aba2f06c07 KVM: x86: correctly print #AC in traces
Poor #AC was so unimportant until a few days ago that we were
not even tracing its name correctly.  But now it's all over
the place.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:23 +01:00
Paolo Bonzini
46896c73c1 KVM: svm: add support for RDTSCP
RDTSCP was never supported for AMD CPUs, which nobody noticed because
Linux does not use it.  But exactly the fact that Linux does not
use it makes the implementation very simple; we can freely trash
MSR_TSC_AUX while running the guest.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Paolo Bonzini
9dbe6cf941 KVM: x86: expose MSR_TSC_AUX to userspace
If we do not do this, it is not properly saved and restored across
migration.  Windows notices due to its self-protection mechanisms,
and is very upset about it (blue screen of death).

Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
db3975717a kvm/x86: Hyper-V kvm exit
A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
5c919412fe kvm/x86: Hyper-V synthetic interrupt controller
SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
d62caabb41 kvm/x86: per-vcpu apicv deactivation support
The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on per-vcpu
basis.  In particular, when the userspace activates HyperV synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Andrey Smetanin
6308630bd3 kvm/x86: split ioapic-handled and EOI exit bitmaps
The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Andrey Smetanin
abdb080f7a kvm/irqchip: kvm_arch_irq_routing_update renaming split
Actually kvm_arch_irq_routing_update() should be
kvm_arch_post_irq_routing_update() as it's called at the end
of irq routing update.

This renaming frees kvm_arch_irq_routing_update function name.
kvm_arch_irq_routing_update() weak function which will be used
to update mappings for arch-specific irq routing entries
(in particular, the upcoming Hyper-V synthetic interrupts).

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Haozhong Zhang
b2467e744f KVM: nVMX: remove incorrect vpid check in nested invvpid emulation
This patch removes the vpid check when emulating nested invvpid
instruction of type all-contexts invalidation. The existing code is
incorrect because:
 (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
     Translations Based on VPID", invvpid instruction does not check
     vpid in the invvpid descriptor when its type is all-contexts
     invalidation.
 (2) According to the same document, invvpid of type all-contexts
     invalidation does not require there is an active VMCS, so/and
     get_vmcs12() in the existing code may result in a NULL-pointer
     dereference. In practice, it can crash both KVM itself and L1
     hypervisors that use invvpid (e.g. Xen).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 15:52:55 +01:00
Paolo Bonzini
8bd142c016 KVM/ARM Fixes for v4.4-rc3.
Includes some timer fixes, properly unmapping PTEs, an errata fix, and two
 tweaks to the EL2 panic code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWVJ7gAAoJEEtpOizt6ddyD5MH/3M/nhtZTnT6v0RPDvHWJo7s
 5BQmITJYPHFkTO14OHWTVLXiGgLws8gPZnWHxC4jjHjpuJnL+/MM551FpCOqDDd7
 vweYgVlSqD8ANH5nKbv1PPnzjrqhTVN+yi3ZItXy2pxsfvu63FC6Z43B2axelLvw
 XYmHoMZaeWBBw2gHi3djGfju3Yj/2SOe+ozuvAXpxA5+NhSiPHHnMefGy5k3wKnJ
 sETwshPdjiMeK4ItfMhveFTDRjl4uh9uQyORfaa5gqG0uePt3EalYynw+gEjZ6RX
 Bpc3nLwboIfRIa/WwyoHm+nmLIUYjU8dAgLwUOIbdeG0igpdALdvsB0aBHCgngk=
 =+7ED
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/ARM Fixes for v4.4-rc3.

Includes some timer fixes, properly unmapping PTEs, an errata fix, and two
tweaks to the EL2 panic code.
2015-11-24 19:34:40 +01:00
Andy Lutomirski
478dc89cf3 x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots
On CONFIG_CONTEXT_TRACKING kernels that have context tracking
disabled at runtime (which includes most distro kernels), we
still have the overhead of a call to enter_from_user_mode in
interrupt and exception entries.

If jump labels are available, this uses the jump label
infrastructure to skip the call.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/73ee804fff48cd8c66b65b724f9f728a11a8c686.1447361906.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:56:44 +01:00
Andy Lutomirski
2671c3e4fe x86/asm: Add asm macros for static keys/jump labels
Unfortunately, we can only do this if HAVE_JUMP_LABEL.  In
principle, we could do some serious surgery on the core jump
label infrastructure to keep the patch infrastructure available
on x86 on all builds, but that's probably not worth it.

Implementing the macros using a conditional branch as a fallback
seems like a bad idea: we'd have to clobber flags.

This limitation can't cause silent failures -- trying to include
asm/jump_label.h at all on a non-HAVE_JUMP_LABEL kernel will
error out.  The macro's users are responsible for handling this
issue themselves.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/63aa45c4b692e8469e1876d6ccbb5da707972990.1447361906.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:56:44 +01:00
Andy Lutomirski
c28454332f x86/asm: Error out if asm/jump_label.h is included inappropriately
Rather than potentially generating incorrect code on a
non-HAVE_JUMP_LABEL kernel if someone includes asm/jump_label.h,
error out.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/99407f0ac7fa3ab03a3d31ce076d47b5c2f44795.1447361906.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:56:43 +01:00
Ingo Molnar
49b2410631 Merge branch 'x86/urgent' into x86/asm, to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:55:11 +01:00
Andy Lutomirski
f10750536f x86/entry/64: Fix irqflag tracing wrt context tracking
Paolo pointed out that enter_from_user_mode could be called
while irqflags were traced as though IRQs were on.

In principle, this could confuse lockdep.  It doesn't cause any
problems that I've seen in any configuration, but if I build
with CONFIG_DEBUG_LOCKDEP=y, enable a nohz_full CPU, and add
code like:

	if (irqs_disabled()) {
		spin_lock(&something);
		spin_unlock(&something);
	}

to the top of enter_from_user_mode, then lockdep will complain
without this fix.  It seems that lockdep's irqflags sanity
checks are too weak to detect this bug without forcing the
issue.

This patch adds one byte to normal kernels, and it's IMO a bit
ugly. I haven't spotted a better way to do this yet, though.
The issue is that we can't do TRACE_IRQS_OFF until after SWAPGS
(if needed), but we're also supposed to do it before calling C
code.

An alternative approach would be to call trace_hardirqs_off in
enter_from_user_mode.  That would be less code and would not
bloat normal kernels at all, but it would be harder to see how
the code worked.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/86237e362390dfa6fec12de4d75a238acb0ae787.1447361906.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:55:02 +01:00
Borislav Petkov
b7106fa0f2 x86/fpu: Get rid of xstate_fault()
Add macros for the alternative XSAVE*/XRSTOR* operations which
contain the fault handling and use them. Kill xstate_fault().

Also, copy_xregs_to_kernel() didn't have the extended state as
memory reference in the asm.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447932326-4371-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:52:52 +01:00
Borislav Petkov
b74a0cf1b3 x86/fpu: Add an XSTATE_OP() macro
Add an XSTATE_OP() macro which contains the XSAVE* fault handling
and replace all non-alternatives users of xstate_fault() with
it.

This fixes also the buglet in copy_xregs_to_user() and
copy_user_to_xregs() where the inline asm didn't have @xstate as
memory reference and thus potentially causing unwanted
reordering of accesses to the extended state.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447932326-4371-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:52:52 +01:00
Juergen Gross
42baa2581c x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume
Saving and restoring lapic vectors in lapic_suspend() and
lapic_resume() is not consistent: the thmr vector saving is
guarded by a different config option than the restore part. The
cmci vector isn't handled at all.

Those inconsistencies are not very critical, as the missing cmci
vector will be set via mce resume handling, the wrong config
option used for restoring the thmr vector can't be configured
differently than the one which should be used.

Nevertheless correct the thmr vector restore and add cmci vector
handling.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448276364-31334-1-git-send-email-jgross@suse.com
[ Minor code edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:18:33 +01:00
Borislav Petkov
679bcea857 x86/MSR: Chop off lower 32-bit value
sparse complains that the cast truncates the high bits. But here
we really do know what we're doing and we need the lower 32 bits
only as the @low argument. So make that explicit.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:55 +01:00
Borislav Petkov
31ac34ca56 x86/cpu: Fix MSR value truncation issue
So sparse rightfully complains that the u64 MSR value we're
writing into the STAR MSR, i.e. 0xc0000081, is being truncated:

./arch/x86/include/asm/msr.h:193:36: warning: cast truncates
bits from constant value (23001000000000 becomes 0)

because the actual value doesn't fit into the unsigned 32-bit
quantity which are the @low and @high wrmsrl() parameters.

This is not a problem, practically, because gcc is actually
being smart enough here and does the right thing:

  .loc 3 87 0
  xorl    %esi, %esi		# we needz a 32-bit zero
  movl    $2293776, %edx	# 0x00230010 == (__USER32_CS << 16) | __KERNEL_CS go into the high bits
  movl    $-1073741695, %ecx	# MSR_STAR, i.e., 0xc0000081
  movl    %esi, %eax		# low order 32 bits in the MSR which are 0
  #APP
  # 87 "./arch/x86/include/asm/msr.h" 1
          wrmsr

More specifically, MSR_STAR[31:0] is being set to 0. That field
is reserved on Intel and on AMD it is 32-bit SYSCALL Target EIP.

I'd strongly guess because Intel doesn't have SYSCALL in
compat/legacy mode and we're using SYSENTER and INT80 there. And
for compat syscalls in long mode we use CSTAR.

So let's fix the sparse warning by writing SYSRET and SYSCALL CS
and SS into the high 32-bit half of STAR and 0 in the low half
explicitly.

 [ Actually, if we had to be precise, we would have to read what's in
   STAR[31:0] and write it back unchanged on Intel and write 0 on AMD. I
   guess the current writing to 0 is still ok since Intel can apparently
   stomach it. ]

The resulting code is identical to what we have above:

  .loc 3 87 0
  xorl    %esi, %esi      # tmp104
  movl    $2293776, %eax  #, tmp103
  movl    $-1073741695, %ecx      #, tmp102
  movl    %esi, %edx      # tmp104, tmp104

  ...

        wrmsr

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:55 +01:00
Borislav Petkov
ae8b787543 x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it
checks whether the way access filter is enabled on some F15h
models, and, if so, disables it.

kvm doesn't handle that MSR access and complains about it, which
can get really noisy in dmesg when one starts kvm guests all the
time for testing. And it is useless anyway - guest kernel
shouldn't be doing such changes anyway so tell it that that
filter is disabled.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Borislav Petkov
91713faf38 kvm: Add accessors for guest CPU's family, model, stepping
Those give the family, model and stepping of the guest vcpu.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Borislav Petkov
99f925ce92 x86/cpu: Unify CPU family, model, stepping calculation
Add generic functions which calc family, model and stepping from
the CPUID_1.EAX leaf and stick them into the library we have.

Rename those which do call CPUID with the prefix "x86_cpuid" as
suggested by Paolo Bonzini.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Borislav Petkov
feab21f835 x86/mce: Make usable address checks Intel-only
The MCi_MISC bitfield definitions mce_usable_address() checks
are Intel-only. Make them so.

While at it, move mce_usable_address() up, before all its
callers and get rid of the forward declaration.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Borislav Petkov
db548a28fc x86/mce: Add the missing memory error check on AMD
We simply need to look at the extended error code when detecting
whether the error is of type memory.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Borislav Petkov
c0ec382e19 x86/RAS: Remove mce.usable_addr
It is useless and we can use the function instead. Besides,
mcelog(8) hasn't managed to make use of it yet. So kill it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Tony Luck
8b38937b7a x86/mce: Do not enter deferred errors into the generic pool twice
We used to have a special ring buffer for deferred errors that
was used to mark problem pages. We replaced that with a generic
pool. Then later converted mce_log() to also use the same pool.
As a result, we end up adding all deferred errors to the pool
twice.

Rearrange this code. Make sure to set the m.severity and
m.usable_addr fields for deferred errors. Then if flags and
mca_cfg.dont_log_ce mean we call mce_log() we are done, because
that will add this entry to the generic pool.

If we skipped mce_log(), then we still want to take action for
the deferred error, so add to the pool.

Change the name of the boolean "error_logged" to "error_seen",
we should set it whether of not we logged an error because the
return value from machine_check_poll() is used to decide whether
storms have subsided or not.

Reported-by: Gong Chen <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1448350880-5573-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Kees Cook
8609d1b5da x86/mm: Turn CONFIG_X86_PTDUMP into a module
Being able to examine page tables is handy, so make this a
module that can be loaded as needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20151120010755.GA9060@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:50:13 +01:00
Boris Ostrovsky
75ef82190d x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", usergs_sysret32 pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-4-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Boris Ostrovsky
88c15ec90f x86/paravirt: Remove the unused irq_enable_sysexit pv op
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", the irq_enable_sysexit pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Boris Ostrovsky
5fdf5d37f4 x86/xen: Avoid fast syscall path for Xen PV guests
After 32-bit syscall rewrite, and specifically after commit:

  5f310f739b ("x86/entry/32: Re-implement SYSENTER using the new C path")

... the stack frame that is passed to xen_sysexit is no longer a
"standard" one (i.e. it's not pt_regs).

Since we end up calling xen_iret from xen_sysexit we don't need
to fix up the stack and instead follow entry_SYSENTER_32's IRET
path directly to xen_iret.

We can do the same thing for compat mode even though stack does
not need to be fixed. This will allow us to drop usergs_sysret32
paravirt op (in the subsequent patch)

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-2-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Borislav Petkov
2d5be37d68 x86/microcode: Initialize the driver late when facilities are up
Running microcode_init() from setup_arch() is a bad idea because
not even kmalloc() is ready at that point and the loader does
all kinds of allocations and init/registration with various
subsystems.

Make it a late initcall when required facilities are initialized
so that the microcode driver initialization can succeed too.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151120112400.GC4028@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:39:49 +01:00
Waiman Long
d78045306c locking/pvqspinlock, x86: Optimize the PV unlock code path
The unlock function in queued spinlocks was optimized for better
performance on bare metal systems at the expense of virtualized guests.

For x86-64 systems, the unlock call needs to go through a
PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit
registers before calling the real __pv_queued_spin_unlock()
function. The thunk code may also be in a separate cacheline from
__pv_queued_spin_unlock().

This patch optimizes the PV unlock code path by:

 1) Moving the unlock slowpath code from the fastpath into a separate
    __pv_queued_spin_unlock_slowpath() function to make the fastpath
    as simple as possible..

 2) For x86-64, hand-coded an assembly function to combine the register
    saving thunk code with the fastpath code. Only registers that
    are used in the fastpath will be saved and restored. If the
    fastpath fails, the slowpath function will be called via another
    PV_CALLEE_SAVE_REGS_THUNK(). For 32-bit, it falls back to the C
    __pv_queued_spin_unlock() code as the thunk saves and restores
    only one 32-bit register.

With a microbenchmark of 5M lock-unlock loop, the table below shows
the execution times before and after the patch with different number
of threads in a VM running on a 32-core Westmere-EX box with x86-64
4.2-rc1 based kernels:

  Threads	Before patch	After patch	% Change
  -------	------------	-----------	--------
     1		   134.1 ms	  119.3 ms	  -11%
     2		   1286  ms	   953  ms	  -26%
     3		   3715  ms	  3480  ms	  -6.3%
     4		   4092  ms	  3764  ms	  -8.0%

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447114167-47185-5-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:02:02 +01:00
Andi Kleen
b7883a1c4f perf/x86: Handle multiple umask bits for BDW CYCLE_ACTIVITY.*
The earlier constraint fix for Broadwell CYCLE_ACTIVITY.*
forced umask 8 to counter 2. For this it used UEVENT,
to match the complete umask.

The event list for Broadwell has an additional
STALLS_L1D_PENDIND event that uses umask 8, but also
sets other bits in the umask.  The earlier strict umask match
didn't handle this case.

Add a new UBIT_EVENT constraint macro that only matches
the specified bits in the umask. Then use that macro
to handle CYCLE_ACTIVITY.* on Broadwell.

The documented event also uses cmask, but there's no
need to let the event scheduler know about the cmask,
as the scheduling restriction is only tied to the umask.

Reported-by: Grant Ayers <ayers@cs.stanford.edu>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1447719667-9998-1-git-send-email-andi@firstfloor.org
[ Filled in the missing email address of Grant Ayers - hopefully I got the right one. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:27 +01:00
Takao Indoh
da06a43d3f perf, x86: Stop Intel PT before kdump starts
This patch stops Intel PT logging and saves its registers in memory
before kdump is started. This feature is needed to prevent Intel PT from
overwriting its log buffer after panic, and saved registers are needed to
find the last position where Intel PT wrote data.

After the crash dump is captured by kdump, users can retrieve the log buffer
from the vmcore and use it to investigate bad kernel behavior.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-3-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:26 +01:00
Takao Indoh
24cc12b176 perf/x86/intel/pt: Add interface to stop Intel PT logging
This patch add a function for external components to stop Intel PT.
Basically this function is used when kernel panic occurs. When it is
called, the intel_pt driver disables Intel PT and saves its registers
using pt_event_stop(), which is also used by pmu.stop handler.

This function stops Intel PT on the CPU where it is working, therefore
users of it need to call it for each CPU to stop all logging.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-2-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:26 +01:00
Andi Kleen
b16a5b52eb perf/x86: Add option to disable reading branch flags/cycles
With LBRv5 reading the extra LBR flags like mispredict, TSX, cycles is
not free anymore, as it has moved to a separate MSR.

For callstack mode we don't need any of this information; so we can
avoid the unnecessary MSR read. Add flags to the perf interface where
perf record can request not collecting this information.

Add branch_sample_type flags for CYCLES and FLAGS. It's a bit unusual
for branch_sample_types to be negative (disable), not positive (enable),
but since the legacy ABI reported the flags we need some form of
explicit disabling to avoid breaking the ABI.

After we have the flags the x86 perf code can keep track if any users
need the flags. If noone needs it the information is not collected.

This cuts down the cost of LBR callstack on Skylake significantly.
Profiling a kernel build with LBR call stack the average run time of
the PMI handler drops by 43%.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:25 +01:00
Andi Kleen
75925e1ad7 perf/x86: Optimize stack walk user accesses
Change the perf user stack walking to use the new
__copy_from_user_nmi(), and split each access into word sized transfer
sizes. This allows to inline the complete access and optimize it all
into a single load.

The main advantage is that this avoids the overhead of double page
faults.  When normal copy_from_user() fails it reexecutes the copy to
compute an accurate number of non copied bytes. This leads to
executing the expensive page fault twice.

While walking stacks having a fault at some point is relatively common
(typically when some part of the program isn't compiled with frame
pointers), so this is a large overhead.

With the optimized copies we avoid this problem because they only do
all accesses once. And of course they're much faster too when the
access does not fault because they're just single instructions instead
of complex function calls.

While profiling a kernel build with -g, the patch brings down the
average time of the PMI handler from 966ns to 552ns (-43%).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1445551641-13379-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:25 +01:00
Andi Kleen
10013ebb5d x86: Add an inlined __copy_from_user_nmi() variant
Add a inlined __ variant of copy_from_user_nmi. The inlined variant allows
the user to:

 - batch the access_ok() check for multiple accesses

 - avoid having a pagefault_disable/enable() on every access if the
   caller already ensures disabled page faults due to its context.

 - get all the optimizations in copy_*_user() for small constant sized
   transfers

It is just a define to __copy_from_user_inatomic().

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445551641-13379-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:24 +01:00
Peter Zijlstra
90eec103b9 treewide: Remove old email address
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:44:58 +01:00
Andi Kleen
b28ae9560b perf/x86: Fix LBR call stack save/restore
This fixes a bug I added in the following commit:

  90405aa022 ("perf/x86/intel/lbr: Limit LBR accesses to TOS in callstack mode")

The bug could lead to lost LBR call stacks. When restoring the LBR state
we need to use the TOS of the previous context, not the current context.
To do that we need to save/restore the TOS.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:44:57 +01:00
Stephane Eranian
614e4c4ebc perf/core: Robustify the perf_cgroup_from_task() RCU checks
This patch reinforces the lockdep checks performed by
perf_cgroup_from_tsk() by passing the perf_event_context
whenever possible. It is okay to not hold the RCU read lock
when we know we hold the ctx->lock. This patch makes sure this
property holds.

In some functions, such as perf_cgroup_sched_in(), we do not
pass the context because we are sure we are holding the RCU
read lock.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: edumazet@google.com
Link: http://lkml.kernel.org/r/1447322404-10920-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:21:03 +01:00
Len Brown
2fde46b79e x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs
Fix a Linux-4.3 corner case performance regression, introduced by commit:

  f1ccd24931 ("x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior")

which allowed the cmdline  "cpu_init_udelay=" to work with all values,
including the default of 10000.

But in setting the default of 10000, it over-rode the code stat sets
the delay 0 on modern processors.

Also, tidy up use of INT/UINT.

Reported-by: Shane <shrybman@teksavvy.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dparsons@brightdsl.net
Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:08:33 +01:00
Linus Torvalds
069ec22915 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This update contains:

   - MPX updates for handling 32bit processes

   - A fix for a long standing bug in 32bit signal frame handling
     related to FPU/XSAVE state

   - Handle get_xsave_addr() correctly in KVM

   - Fix SMAP check under paravirtualization

   - Add a comment to the static function trace entry to avoid further
     confusion about the difference to dynamic tracing"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix SMAP check in PVOPS environments
  x86/ftrace: Add comment on static function tracing
  x86/fpu: Fix get_xsave_addr() behavior under virtualization
  x86/fpu: Fix 32-bit signal frame handling
  x86/mpx: Fix 32-bit address space calculation
  x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels
2015-11-22 12:00:12 -08:00
Andrew Cooper
581b7f158f x86/cpu: Fix SMAP check in PVOPS environments
There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:07:49 +01:00
Namhyung Kim
112677d683 x86/ftrace: Add comment on static function tracing
There was a confusion between update_ftrace_function() and static
function tracing trampoline regarding 3rd parameter (ftrace_ops).
Add a comment for clarification.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1447721004-2551-1-git-send-email-namhyung@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:07:49 +01:00
Juergen Gross
4609586592 x86/paravirt: Remove unused pv_apic_ops structure
The only member of that structure is startup_ipi_hook which is always
set to paravirt_nop.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: jeremy@goop.org
Cc: chrisw@sous-sol.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xen.org
Cc: konrad.wilk@oracle.com
Cc: boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/1447767872-16730-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:03:58 +01:00
Thomas Gleixner
2f7a3f8e87 x86/tsc: Remove unused tsc_pre_init() hook
No more users. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
2015-11-19 11:03:13 +01:00
Matt Gingell
62a193edaf KVM: x86: request interrupt window when IRQ chip is split
Before this patch, we incorrectly enter the guest without requesting an
interrupt window if the IRQ chip is split between user space and the
kernel.

Because lapic_in_kernel no longer implies the PIC is in the kernel, this
patch tests pic_in_kernel to determining whether an interrupt window
should be requested when entering the guest.

If the APIC is in the kernel and we request an interrupt window the
guest will return immediately. If the APIC is masked the guest will not
not make forward progress and unmask it, leading to a loop when KVM
reenters and requests again. This patch adds a check to ensure the APIC
is ready to accept an interrupt before requesting a window.

Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Matt Gingell <gingell@google.com>
[Use the other newly introduced functions. - Paolo]
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:39 +01:00
Matt Gingell
934bf65354 KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space
Set KVM_REQ_EVENT when a PIC in user space injects a local interrupt.

Currently a request is only made when neither the PIC nor the APIC is in
the kernel, which is not sufficient in the split IRQ chip case.

This addresses a problem in QEMU where interrupts are delayed until
another path invokes the event loop.

Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Matt Gingell <gingell@google.com>
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:38 +01:00
Matt Gingell
782d422bca KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection
This patch breaks out a new function kvm_vcpu_ready_for_interrupt_injection.
This routine encapsulates the logic required to determine whether a vcpu
is ready to accept an interrupt injection, which is now required on
multiple paths.

Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Matt Gingell <gingell@google.com>
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:38 +01:00
Matt Gingell
127a457acb KVM: x86: fix interrupt window handling in split IRQ chip case
This patch ensures that dm_request_for_irq_injection and
post_kvm_run_save are in sync, avoiding that an endless ping-pong
between userspace (who correctly notices that IF=0) and
the kernel (who insists that userspace handles its request
for the interrupt window).

To synchronize them, it also adds checks for kvm_arch_interrupt_allowed
and !kvm_event_needs_reinjection.  These are always needed, not
just for in-kernel LAPIC.

Signed-off-by: Matt Gingell <gingell@google.com>
[A collage of two patches from Matt. - Paolo]
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:37 +01:00
Juergen Gross
ed29210cd6 x86: Remove unused function cpu_has_ht_siblings()
It is used nowhere.

Signed-off-by: Juergen Gross <jgross@suse.com>
Link: http://lkml.kernel.org/r/1447761943-770-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-17 15:40:17 +01:00
Rafael J. Wysocki
d9f67dbc0f Merge branch 'pm-tools'
* pm-tools:
  x86: remove unused definition of MSR_NHM_PLATFORM_INFO
  tools/power turbostat: use new name for MSR_PLATFORM_INFO
2015-11-16 22:57:02 +01:00
Linus Torvalds
0ca9b67606 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
 "Mostly updates to the perf tool plus two fixes to the kernel core code:

   - Handle tracepoint filters correctly for inherited events (Peter
     Zijlstra)

   - Prevent a deadlock in perf_lock_task_context (Paul McKenney)

   - Add missing newlines to some pr_err() calls (Arnaldo Carvalho de
     Melo)

   - Print full source file paths when using 'perf annotate --print-line
     --full-paths' (Michael Petlan)

   - Fix 'perf probe -d' when just one out of uprobes and kprobes is
     enabled (Wang Nan)

   - Add compiler.h to list.h to fix 'make perf-tar-src-pkg' generated
     tarballs, i.e. out of tree building (Arnaldo Carvalho de Melo)

   - Add the llvm-src-base.c and llvm-src-kbuild.c files, generated by
     the 'perf test' LLVM entries, when running it in-tree, to
     .gitignore (Yunlong Song)

   - libbpf error reporting improvements, using a strerror interface to
     more precisely tell the user about problems with the provided
     scriptlet, be it in C or as a ready made object file (Wang Nan)

   - Do not be case sensitive when searching for matching 'perf test'
     entries (Arnaldo Carvalho de Melo)

   - Inform the user about objdump failures in 'perf annotate' (Andi
     Kleen)

   - Improve the LLVM 'perf test' entry, introduce a new ones for BPF
     and kbuild tests to check the environment used by clang to compile
     .c scriptlets (Wang Nan)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
  tools include: Add compiler.h to list.h
  perf probe: Verify parameters in two functions
  perf session: Add missing newlines to some pr_err() calls
  perf annotate: Support full source file paths for srcline fix
  perf test: Add llvm-src-base.c and llvm-src-kbuild.c to .gitignore
  perf: Fix inherited events vs. tracepoint filters
  perf: Disable IRQs across RCU RS CS that acquires scheduler lock
  perf test: Do not be case sensitive when searching for matching tests
  perf test: Add 'perf test BPF'
  perf test: Enhance the LLVM tests: add kbuild test
  perf test: Enhance the LLVM test: update basic BPF test program
  perf bpf: Improve BPF related error messages
  perf tools: Make fetch_kernel_version() publicly available
  bpf tools: Add new API bpf_object__get_kversion()
  bpf tools: Improve libbpf error reporting
  perf probe: Cleanup find_perf_probe_point_from_map to reduce redundancy
  perf annotate: Inform the user about objdump failures in --stdio
  perf stat: Make stat options global
  perf sched latency: Fix thread pid reuse issue
  ...
2015-11-15 09:36:24 -08:00
Linus Torvalds
bba072dfd7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A couple of fixes and updates related to x86:

   - Fix the W+X check regression on XEN

   - The real fix for the low identity map trainwreck

   - Probe legacy PIC early instead of unconditionally allocating legacy
     irqs

   - Add cpu verification to long mode entry

   - Adjust the cache topology to AMD Fam17H systems

   - Let Merrifield use the TSC across S3"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Call verify_cpu() after having entered long mode too
  x86/setup: Fix low identity map for >= 2GB kernel range
  x86/mm: Skip the hypervisor range when walking PGD
  x86/AMD: Fix last level cache topology for AMD Fam17h systems
  x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
  x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
2015-11-15 09:32:59 -08:00
Len Brown
5369a21e3f x86: remove unused definition of MSR_NHM_PLATFORM_INFO
MSR_NHM_PLATFORM_INFO has been replaced by...
MSR_PLATFORM_INFO

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-13 23:28:01 +01:00
Linus Torvalds
3370b69eb0 Four changes:
- x86: work around two nasty cases where a benign exception occurs while
 another is being delivered.  The endless stream of exceptions causes an
 infinite loop in the processor, which not even NMIs or SMIs can interrupt;
 in the virt case, there is no possibility to exit to the host either.
 
 - x86: support for Skylake per-guest TSC rate.  Long supported by AMD,
 the patches mostly move things from there to common arch/x86/kvm/ code.
 
 - generic: remove local_irq_save/restore from the guest entry and exit
 paths when context tracking is enabled.  The patches are a few months
 old, but we discussed them again at kernel summit.  Andy will pick up
 from here and, in 4.5, try to remove it from the user entry/exit paths.
 
 - PPC: Two bug fixes, see merge commit 370289756b for details.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWRFb0AAoJEL/70l94x66DjjMH/31jr8d119MW0uv2x+03+wRq
 6dbJ8tjQ8grvBRExKvLsUVjDmHlhCa1BQl5qjCsyYhX9UeAf4NQOmoEFpq+YTLxh
 Ctveyn+yiZWC7qxbQDmauiQ4JCOp+W9ial782iqw5+ouQMajGOffq5WrojCa2ZNF
 jI278JgdHJLrKj/uie//WBu3V7MJY5Apc3p4zatnSYFSQ3MA0sxl4r4zIrwOa5qs
 23ZeeoqbP4sHh4X5wL/30Y6XFSCHj0qoYHHyAgzLi0PCMvBdt4DrAFUPDG/Rhlv6
 o1WB/kcUfcz3DtBX85wfSOMuw0nF6patWhWv07R/3EIbYoz3dKvp9d6ORYgXqlY=
 =Um9M
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second batch of kvm updates from Paolo Bonzini:
 "Four changes:

   - x86: work around two nasty cases where a benign exception occurs
     while another is being delivered.  The endless stream of exceptions
     causes an infinite loop in the processor, which not even NMIs or
     SMIs can interrupt; in the virt case, there is no possibility to
     exit to the host either.

   - x86: support for Skylake per-guest TSC rate.  Long supported by
     AMD, the patches mostly move things from there to common
     arch/x86/kvm/ code.

   - generic: remove local_irq_save/restore from the guest entry and
     exit paths when context tracking is enabled.  The patches are a few
     months old, but we discussed them again at kernel summit.  Andy
     will pick up from here and, in 4.5, try to remove it from the user
     entry/exit paths.

   - PPC: Two bug fixes, see merge commit 370289756b for details"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: x86: rename update_db_bp_intercept to update_bp_intercept
  KVM: svm: unconditionally intercept #DB
  KVM: x86: work around infinite loop in microcode when #AC is delivered
  context_tracking: avoid irq_save/irq_restore on guest entry and exit
  context_tracking: remove duplicate enabled check
  KVM: VMX: Dump TSC multiplier in dump_vmcs()
  KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
  KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
  KVM: VMX: Enable and initialize VMX TSC scaling
  KVM: x86: Use the correct vcpu's TSC rate to compute time scale
  KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
  KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
  KVM: x86: Replace call-back compute_tsc_offset() with a common function
  KVM: x86: Replace call-back set_tsc_khz() with a common function
  KVM: x86: Add a common TSC scaling function
  KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
  KVM: x86: Collect information for setting TSC scaling ratio
  KVM: x86: declare a few variables as __read_mostly
  KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
  KVM: PPC: Book3S HV: Don't dynamically split core when already split
  ...
2015-11-12 14:34:06 -08:00
Huang Rui
41ac18ebfc perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1446630233-3166-1-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:44:25 +01:00
Huaitong Han
a05917b6ba x86/fpu: Fix get_xsave_addr() behavior under virtualization
KVM uses the get_xsave_addr() function in a different fashion from
the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
not the currently running task.

But 'xsave' is replaced with current task's (host) xsave structure, so
get_xsave_addr() will incorrectly return the bad xsave address to KVM.

Fix it so that the passed in 'xsave' address is used - as intended
originally.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:34:58 +01:00
Dave Hansen
ab6b529475 x86/fpu: Fix 32-bit signal frame handling
(This should have gone to LKML originally. Sorry for the extra
 noise, folks on the cc.)

Background:

Signal frames on x86 have two formats:

  1. For 32-bit executables (whether on a real 32-bit kernel or
     under 32-bit emulation on a 64-bit kernel) we have a
    'fpregset_t' that includes the "FSAVE" registers.

  2. For 64-bit executables (on 64-bit kernels obviously), the
     'fpregset_t' is smaller and does not contain the "FSAVE"
     state.

When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.

Problem:

save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable.  This is for real 32-bit and
ia32 emulation.

But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.

This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog().  But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed).  This leads to the FPU/XSAVE state being initialized.

For MPX, this leads to the most permissive state and means we
silently lose bounds violations.  I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state.  I'm not sure why
no one has spotted this bug.

I believe this was broken by:

	72a671ced6 ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

way back in 2012.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Cc: fenghua.yu@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:23:45 +01:00
Dave Hansen
f3119b8302 x86/mpx: Fix 32-bit address space calculation
I received a bug report that running 32-bit MPX binaries on
64-bit kernels was broken.  I traced it down to this little code
snippet.  We were switching our "number of bounds directory
entries" calculation correctly.  But, we didn't switch the other
side of the calculation: the virtual space size.

This meant that we were calculating an absurd size for
bd_entry_virt_space() on 32-bit because we used the 64-bit
virt_space.

This was _also_ broken for 32-bit kernels running on 64-bit
hardware since boot_cpu_data.x86_virt_bits=48 even when running
in 32-bit mode.

Correct that and properly handle all 3 possible cases:

 1. 32-bit binary on 64-bit kernel
 2. 64-bit binary on 64-bit kernel
 3. 32-bit binary on 32-bit kernel

This manifested in having bounds tables not properly unmapped.
It "leaked" memory but had no functional impact otherwise.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:20:37 +01:00
Dave Hansen
46561c3959 x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels
When you call get_user(foo, bar), you effectively do a

	copy_from_user(&foo, bar, sizeof(*bar));

Note that the sizeof() is implicit.

When we reach out to userspace to try to zap an entire "bounds
table" we need to go read a "bounds directory entry" in order to
locate the table's address.  The size of a "directory entry"
depends on the binary being run and is always the size of a
pointer.

But, when we have a 64-bit kernel and a 32-bit application, the
directory entry is still only 32-bits long, but we fetch it with
a 64-bit pointer which makes get_user() does a 64-bit fetch.
Reading 4 extra bytes isn't harmful, unless we are at the end of
and run off the table.  It might also cause the zero page to get
faulted in unnecessarily even if you are not at the end.

Fix it up by doing a special 32-bit get_user() via a cast when
we have 32-bit userspace.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:20:37 +01:00
Linus Torvalds
4bde961e52 Merge branch 'for-linus-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - a new hrtimer based clocksource by Anton Ivanov

 - ptrace() enhancments by Richard Weinberger

 - random cleanups and bug fixes all over the place

* 'for-linus-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Switch clocksource to hrtimers
  um: net: replace GFP_KERNEL with GFP_ATOMIC when spinlock is held
  um: Report host OOM more nicely
  um: Simplify STUB_DATA loading
  um: Remove dead symbol from i386 syscall stub
  um: Remove dead code from x86_64 syscall stub
  um: Get rid of open coded NR_SYSCALLS
  um: Store syscall number after syscall_trace_enter()
  um: Define PTRACE_OLDSETOPTIONS
2015-11-10 16:33:37 -08:00
Linus Torvalds
264015f8a8 libnvdimm for 4.4:
1/ Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.
 
 2/ Teach the coredump implementation how to filter out DAX mappings.
 
 3/ Introduce NUMA hints for allocations made by the pmem driver, and as
    a side effect all devm allocations now hint their NUMA node by
    default.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQX2sAAoJEB7SkWpmfYgCWsEQAK7w/xM9zClVY/DDlFJxFtYq
 DZJ4faPj+E3FMTiJIEDzjtRgQvOFE+wmJtntYsCqKH/QZmpnyk9jeT/CbJzEEL2k
 WsAk+qHGLcVUlSb36blwN1RFzYqC+IDYThewJqUvxDbOwL1AbiibbX7gplzZHLhW
 +rj3ScVlSNOPRDgGGpkAeLNNsttuKvsGo7nB/JZopm0tV6g14rSK09wQbVhv6S6T
 Lu7VGYqnJlkteL9YlzRiROf9hW2ZFCMGJz1YZydPTy3aX3hGTBX4w2qvmsPwBIKP
 kW/gCNisVJGk1cZCk4joSJ8i/b3x3fE0zdZ5waivJ5jDvYbUUfyk0KtJkfw207Rl
 14yWitUC6aeVuCeOqXHgsjRi+1QVN9Pg7i49xgGiUN1igQiUYRTgQPWZxDv6Zo/s
 USrLFQBaRd+hJw+dl7A47lJ3mUF96tPCoQb4LCQ7DVsg5U4J2TvqXLH9Gek/CCZ4
 QsMkZDTQlZw4+JEDlzBgg/L7xVty8DadplTADMdjaRhFU3y8zKNJ85Ileokt7KVt
 IsBT4+S5HeZLvinZY95932DwAmFp1DtsyENd1BUXL06ddyvlQrFJ6NQaXji4fuDc
 EVQmMoTAqDujZFupMAux9vkUBDFj/hmaVD5F7j3+MWP87OCritw/IZn+2LgTaKoX
 EmttaYrDr2jJwIaGyw+H
 =a2/L
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "Outside of the new ACPI-NFIT hot-add support this pull request is more
  notable for what it does not contain, than what it does.  There were a
  handful of development topics this cycle, dax get_user_pages, dax
  fsync, and raw block dax, that need more more iteration and will wait
  for 4.5.

  The patches to make devm and the pmem driver NUMA aware have been in
  -next for several weeks.  The hot-add support has not, but is
  contained to the NFIT driver and is passing unit tests.  The coredump
  support is straightforward and was looked over by Jeff.  All of it has
  received a 0day build success notification across 107 configs.

  Summary:

   - Add support for the ACPI 6.0 NFIT hot add mechanism to process
     updates of the NFIT at runtime.

   - Teach the coredump implementation how to filter out DAX mappings.

   - Introduce NUMA hints for allocations made by the pmem driver, and
     as a side effect all devm allocations now hint their NUMA node by
     default"

* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  coredump: add DAX filtering for FDPIC ELF coredumps
  coredump: add DAX filtering for ELF coredumps
  acpi: nfit: Add support for hot-add
  nfit: in acpi_nfit_init, break on a 0-length table
  pmem, memremap: convert to numa aware allocations
  devm_memremap_pages: use numa_mem_id
  devm: make allocations numa aware by default
  devm_memremap: convert to return ERR_PTR
  devm_memunmap: use devres_release()
  pmem: kill memremap_pmem()
  x86, mm: quiet arch_add_memory()
2015-11-10 12:07:22 -08:00
Paolo Bonzini
a96036b8ef KVM: x86: rename update_db_bp_intercept to update_bp_intercept
Because #DB is now intercepted unconditionally, this callback
only operates on #BP for both VMX and SVM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:25 +01:00
Paolo Bonzini
cbdb967af3 KVM: svm: unconditionally intercept #DB
This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.

Reported-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:24 +01:00
Eric Northup
54a20552e1 KVM: x86: work around infinite loop in microcode when #AC is delivered
It was found that a guest can DoS a host by triggering an infinite
stream of "alignment check" (#AC) exceptions.  This causes the
microcode to enter an infinite loop where the core never receives
another interrupt.  The host kernel panics pretty quickly due to the
effects (CVE-2015-5307).

Signed-off-by: Eric Northup <digitaleric@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:24 +01:00
Haozhong Zhang
8cfe986696 KVM: VMX: Dump TSC multiplier in dump_vmcs()
This patch enhances dump_vmcs() to dump the value of TSC multiplier
field in VMCS.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:22 +01:00
Haozhong Zhang
be7b263ea9 KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
This patch makes kvm-intel to return a scaled host TSC plus the TSC
offset when handling guest readings to MSR_IA32_TSC.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:21 +01:00
Haozhong Zhang
ff2c3a1803 KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
This patch makes kvm-intel module to load TSC scaling ratio into TSC
multiplier field of VMCS when a vcpu is loaded, so that TSC scaling
ratio can take effect if VMX TSC scaling is enabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:20 +01:00
Haozhong Zhang
64903d6195 KVM: VMX: Enable and initialize VMX TSC scaling
This patch exhances kvm-intel module to enable VMX TSC scaling and
collects information of TSC scaling ratio during initialization.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:19 +01:00
Haozhong Zhang
27cca94e03 KVM: x86: Use the correct vcpu's TSC rate to compute time scale
This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:18 +01:00
Haozhong Zhang
4ba76538dd KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
Both VMX and SVM scales the host TSC in the same way in call-back
read_l1_tsc(), so this patch moves the scaling logic from call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:18 +01:00
Haozhong Zhang
58ea676787 KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:17 +01:00
Haozhong Zhang
07c1419a32 KVM: x86: Replace call-back compute_tsc_offset() with a common function
Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:16 +01:00
Haozhong Zhang
381d585c80 KVM: x86: Replace call-back set_tsc_khz() with a common function
Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:16 +01:00
Haozhong Zhang
35181e86df KVM: x86: Add a common TSC scaling function
VMX and SVM calculate the TSC scaling ratio in a similar logic, so this
patch generalizes it to a common TSC scaling function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr.  Remove
 BUG_ON.  - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:15 +01:00
Haozhong Zhang
ad721883e9 KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
This patch moves the field of TSC scaling ratio from the architecture
struct vcpu_svm to the common struct kvm_vcpu_arch.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:14 +01:00
Haozhong Zhang
bc9b961b35 KVM: x86: Collect information for setting TSC scaling ratio
The number of bits of the fractional part of the 64-bit TSC scaling
ratio in VMX and SVM is different. This patch makes the architecture
code to collect the number of fractional bits and other related
information into variables that can be accessed in the common code.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:14 +01:00
Paolo Bonzini
893590c734 KVM: x86: declare a few variables as __read_mostly
These include module parameters and variables that are set by
kvm_x86_ops->hardware_setup.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:13 +01:00
Paolo Bonzini
450869d6db KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
They are exactly the same, except that handle_mmio_page_fault
has an unused argument and a call to WARN_ON.  Remove the unused
argument from the callers, and move the warning to (the former)
handle_mmio_page_fault_common.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:03 +01:00
Nicolas Pitre
77c5b5da02 kmap_atomic_to_page() has no users, remove it
Removal started in commit 5bbeed12bd ("sparc32: drop unused
kmap_atomic_to_page").  Let's do it across the whole tree.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Dan Williams
85ce230051 Merge branch 'for-4.4/hotplug' into libnvdimm-for-next 2015-11-09 13:29:39 -05:00
Linus Torvalds
ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds
99aaa9c64b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
 "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
  (as in such case it's possible for module struct to share a page with
  executable text, which is currently not being handled with grace) from
  Josh Poimboeuf"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
2015-11-07 12:15:17 -08:00
Borislav Petkov
79f1d83692 x86/paravirt: Kill some unused patching functions
paravirt_patch_ignore() is completely unused and paravirt_patch_nop()
doesn't do a whole lot. Remove them both.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1446542329-32037-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 18:09:35 +01:00
Borislav Petkov
04633df0c4 x86/cpu: Call verify_cpu() after having entered long mode too
When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

  # rdmsr -a 0x1a0
  400850089
  850089
  850089
  850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.

Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:45:02 +01:00
Krzysztof Mazur
68accac392 x86/setup: Fix low identity map for >= 2GB kernel range
The commit f5f3497cad extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping causing a crash. To avoid overwritting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).

Fixes: f5f3497cad "x86/setup: Extend low identity map to cover whole kernel range
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:39:40 +01:00
Boris Ostrovsky
f4e342c877 x86/mm: Skip the hypervisor range when walking PGD
The range between 0xffff800000000000 and 0xffff87ffffffffff is reserved
for hypervisor and therefore we should not try to follow PGD's indexes
corresponding to those addresses.

While this has always been a problem, with the new W+X warning
mechanism ptdump_walk_pgd_level_core() can now be called during boot,
causing a PV Xen guest to crash.

[ tglx: Replaced the macro with a readable inline ]

Fixes: e1a58320a3 "x86/mm: Warn on W^X mappings"
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel@lists.xen.org
Link: http://lkml.kernel.org/r/1446749795-27764-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:39:39 +01:00
Aravind Gopalakrishnan
3849e91f57 x86/AMD: Fix last level cache topology for AMD Fam17h systems
On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same value as
Node ID as we have been doing until now.

We should rather look at the ApicID bits of the core to provide us the
last level cache ID info.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Link: http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:37:51 +01:00
Vitaly Kuznetsov
8c058b0b9c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
Commit d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain
interfaces") brought a regression for Hyper-V Gen2 instances. These
instances don't have i8259 legacy PIC but they use legacy IRQs for serial
port, rtc, and acpi. With this commit included we end up with these IRQs
not initialized. Earlier, there was a special workaround for legacy IRQs
in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at
nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when
irq_domain_alloc_descs() returns -EEXIST.

The essence of the issue seems to be that early_irq_init() calls
arch_probe_nr_irqs() to figure out the number of legacy IRQs before
we probe for i8259 and gets 16. Later when init_8259A() is called we switch
to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already
have 16 descs allocated.

Solve the issue by separating i8259 probe from init and calling it in
arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information.

Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:37:37 +01:00
Andy Shevchenko
354dbaa7ff x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
The Intel Merrifield SoC is a successor of the Intel MID line of
SoCs. Let's set the neccessary capability for that chip. See commit
c54fdbb282 (x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3)
for the details.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: http://lkml.kernel.org/r/1444319786-36125-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:37:30 +01:00
Martin Kepplinger
78e3c79510 arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extension
Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman
d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Richard Weinberger
1b2411c283 um: Simplify STUB_DATA loading
As long STUB_DATA fits into 32bits we can use a plain mov.
If it will grow at some point in future we will switch to movabsq.
In any case the code is smaller and more easy to read
than the current one

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-11-06 22:49:11 +01:00
Richard Weinberger
246d254f1a um: Remove dead symbol from i386 syscall stub
syscall_stub is nowhere used these days.

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-11-06 22:49:11 +01:00
Richard Weinberger
ec2c6c01ff um: Remove dead code from x86_64 syscall stub
syscall_stub is dead code as um is using only
batch_syscall_stub.

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-11-06 22:49:11 +01:00
Linus Torvalds
22402cd0af Most of the changes are clean ups and small fixes. Some of them have
stable tags to them. I searched through my INBOX just as the merge window
 opened and found lots of patches to pull. I ran them through all my tests
 and they were in linux-next for a few days.
 
 Features added this release:
 ----------------------------
 
  o Module globbing. You can now filter function tracing to several
    modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
 
  o Tracer specific options are now visible even when the tracer is not
    active. It was rather annoying that you can only see and modify tracer
    options after enabling the tracer. Now they are in the options/ directory
    even when the tracer is not active. Although they are still only visible
    when the tracer is active in the trace_options file.
 
  o Trace options are now per instance (although some of the tracer specific
    options are global)
 
  o New tracefs file: set_event_pid. If any pid is added to this file, then
    all events in the instance will filter out events that are not part of
    this pid. sched_switch and sched_wakeup events handle next and the wakee
    pids.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWPLQ5AAoJEKKk/i67LK/8CTYIAI1u8DE5QCzv3J0p54jVpNVR
 J5FqEU3eXIzd6FS4JXD4nxCeMpUZAy21YnhlZpsnrbJJM5bc9bUsBCwiKKM+MuSZ
 ztmy2sgYKkO0h/KUdhNgYJrzis3/Ojquyx9iAqK5ST/Fr+nKYx81akFKjNK53iur
 RJRut45sSa8rv11LaL8sgJ6hAWQTc+YkybUdZ5xaMdJmZ6A61T7Y6VzTjbUexuvL
 hntCfTjYLtVd8dbfknAnf3B7n/VOO3IFF85wr7ciYR5oEVfPrF8tHmJBlhHExPpX
 kaXAiDDRY/UTg/5DQqnp4zmxJoR5BQ2l4pT5PwiLcnwhcphIDNYS8EYUmOYAWjU=
 =TjOE
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracking updates from Steven Rostedt:
 "Most of the changes are clean ups and small fixes.  Some of them have
  stable tags to them.  I searched through my INBOX just as the merge
  window opened and found lots of patches to pull.  I ran them through
  all my tests and they were in linux-next for a few days.

  Features added this release:
  ----------------------------

   - Module globbing.  You can now filter function tracing to several
     modules.  # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)

   - Tracer specific options are now visible even when the tracer is not
     active.  It was rather annoying that you can only see and modify
     tracer options after enabling the tracer.  Now they are in the
     options/ directory even when the tracer is not active.  Although
     they are still only visible when the tracer is active in the
     trace_options file.

   - Trace options are now per instance (although some of the tracer
     specific options are global)

   - New tracefs file: set_event_pid.  If any pid is added to this file,
     then all events in the instance will filter out events that are not
     part of this pid.  sched_switch and sched_wakeup events handle next
     and the wakee pids"

* tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
  tracefs: Fix refcount imbalance in start_creating()
  tracing: Put back comma for empty fields in boot string parsing
  tracing: Apply tracer specific options from kernel command line.
  tracing: Add some documentation about set_event_pid
  ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
  tracing: Allow dumping traces without tracking trace started cpus
  ring_buffer: Fix more races when terminating the producer in the benchmark
  ring_buffer: Do no not complete benchmark reader too early
  tracing: Remove redundant TP_ARGS redefining
  tracing: Rename max_stack_lock to stack_trace_max_lock
  tracing: Allow arch-specific stack tracer
  recordmcount: arm64: Replace the ignored mcount call into nop
  recordmcount: Fix endianness handling bug for nop_mcount
  tracepoints: Fix documentation of RCU lockdep checks
  tracing: ftrace_event_is_function() can return boolean
  tracing: is_legal_op() can return boolean
  ring-buffer: rb_event_is_commit() can return boolean
  ring-buffer: rb_per_cpu_empty() can return boolean
  ring_buffer: ring_buffer_empty{cpu}() can return boolean
  ring-buffer: rb_is_reader_page() can return boolean
  ...
2015-11-06 13:30:20 -08:00
Linus Torvalds
3c87b79188 PCI changes for the v4.4 merge window:
Resource management
     Add support for Enhanced Allocation devices (Sean O. Stalley)
     Add Enhanced Allocation register entries (Sean O. Stalley)
     Handle IORESOURCE_PCI_FIXED when sizing resources (David Daney)
     Handle IORESOURCE_PCI_FIXED when assigning resources (David Daney)
     Handle Enhanced Allocation capability for SR-IOV devices (David Daney)
     Clear IORESOURCE_UNSET when reverting to firmware-assigned address (Bjorn Helgaas)
     Make Enhanced Allocation bitmasks more obvious (Bjorn Helgaas)
     Expand Enhanced Allocation BAR output (Bjorn Helgaas)
     Add of_pci_check_probe_only to parse "linux,pci-probe-only" (Marc Zyngier)
     Fix lookup of linux,pci-probe-only property (Marc Zyngier)
     Add sparc mem64 resource parsing for root bus (Yinghai Lu)
 
   PCI device hotplug
     pciehp: Queue power work requests in dedicated function (Guenter Roeck)
 
   Driver binding
     Add builtin_pci_driver() to avoid registration boilerplate (Paul Gortmaker)
 
   Virtualization
     Set SR-IOV NumVFs to zero after enumeration (Alexander Duyck)
     Remove redundant validation of SR-IOV offset/stride registers (Alexander Duyck)
     Remove VFs in reverse order if virtfn_add() fails (Alexander Duyck)
     Reorder pcibios_sriov_disable() (Alexander Duyck)
     Wait 1 second between disabling VFs and clearing NumVFs (Alexander Duyck)
     Fix sriov_enable() error path for pcibios_enable_sriov() failures (Alexander Duyck)
     Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs (Ben Shelton)
     Don't try to restore VF BARs (Wei Yang)
 
   MSI
     Don't alloc pcibios-irq when MSI is enabled (Joerg Roedel)
     Add msi_controller setup_irqs() method for special multivector setup (Lucas Stach)
     Export all remapped MSIs to sysfs attributes (Romain Bezut)
     Disable MSI on SiS 761 (Ondrej Zary)
 
   AER
     Clear error status registers during enumeration and restore (Taku Izumi)
 
   Generic host bridge driver
     Fix lookup of linux,pci-probe-only property (Marc Zyngier)
     Allow multiple hosts with different map_bus() methods (David Daney)
     Pass starting bus number to pci_scan_root_bus() (David Daney)
     Fix address window calculation for non-zero starting bus (David Daney)
 
   Altera host bridge driver
     Add msi.h to ARM Kbuild (Ley Foon Tan)
     Add Altera PCIe host controller driver (Ley Foon Tan)
     Add Altera PCIe MSI driver (Ley Foon Tan)
 
   APM X-Gene host bridge driver
     Remove msi_controller assignment (Duc Dang)
 
   Broadcom iProc host bridge driver
     Fix header comment "Corporation" misspelling (Florian Fainelli)
     Fix code comment to match code (Ray Jui)
     Remove unused struct iproc_pcie.irqs[] (Ray Jui)
     Call pci_fixup_irqs() for ARM64 as well as ARM (Ray Jui)
     Fix PCIe reset logic (Ray Jui)
     Improve link detection logic (Ray Jui)
     Update PCIe device tree bindings (Ray Jui)
     Add outbound mapping support (Ray Jui)
 
   Freescale i.MX6 host bridge driver
     Return real error code from imx6_add_pcie_port() (Fabio Estevam)
     Add PCIE_PHY_RX_ASIC_OUT_VALID definition (Fabio Estevam)
 
   Freescale Layerscape host bridge driver
     Remove ls_pcie_establish_link() (Minghuan Lian)
     Ignore PCIe controllers in Endpoint mode (Minghuan Lian)
     Factor out SCFG related function (Minghuan Lian)
     Update ls_add_pcie_port() (Minghuan Lian)
     Remove unused fields from struct ls_pcie (Minghuan Lian)
     Add support for LS1043a and LS2080a (Minghuan Lian)
     Add ls_pcie_msi_host_init() (Minghuan Lian)
 
   HiSilicon host bridge driver
     Add HiSilicon SoC Hip05 PCIe driver (Zhou Wang)
 
   Marvell MVEBU host bridge driver
     Return zero for reserved or unimplemented config space (Russell King)
     Use exact config access size; don't read/modify/write (Russell King)
     Use of_get_available_child_count() (Russell King)
     Use for_each_available_child_of_node() to walk child nodes (Russell King)
     Report full node name when reporting a DT error (Russell King)
     Use port->name rather than "PCIe%d.%d" (Russell King)
     Move port parsing and resource claiming to  separate function (Russell King)
     Fix memory leaks and refcount leaks (Russell King)
     Split port parsing and resource claiming from  port setup (Russell King)
     Use gpio_set_value_cansleep() (Russell King)
     Use devm_kcalloc() to allocate an array (Russell King)
     Use gpio_desc to carry around gpio (Russell King)
     Improve clock/reset handling (Russell King)
     Add PCI Express root complex capability block (Russell King)
     Remove code restricting accesses to slot 0 (Russell King)
 
   NVIDIA Tegra host bridge driver
     Wrap static pgprot_t initializer with __pgprot() (Ard Biesheuvel)
 
   Renesas R-Car host bridge driver
     Build pci-rcar-gen2.c only on ARM (Geert Uytterhoeven)
     Build pcie-rcar.c only on ARM (Geert Uytterhoeven)
     Make PCI aware of the I/O resources (Phil Edworthy)
     Remove dependency on ARM-specific struct hw_pci (Phil Edworthy)
     Set root bus nr to that provided in DT (Phil Edworthy)
     Fix I/O offset for multiple host bridges (Phil Edworthy)
 
   ST Microelectronics SPEAr13xx host bridge driver
     Fix dw_pcie_cfg_read/write() usage (Gabriele Paoloni)
 
   Synopsys DesignWare host bridge driver
     Make "clocks" and "clock-names" optional DT properties (Bhupesh Sharma)
     Use exact access size in dw_pcie_cfg_read() (Gabriele Paoloni)
     Simplify dw_pcie_cfg_read/write() interfaces (Gabriele Paoloni)
     Require config accesses to be naturally aligned (Gabriele Paoloni)
     Make "num-lanes" an optional DT property (Gabriele Paoloni)
     Move calculation of bus addresses to DRA7xx (Gabriele Paoloni)
     Replace ARM pci_sys_data->align_resource with global function pointer (Gabriele Paoloni)
     Factor out MSI msg setup (Lucas Stach)
     Implement multivector MSI IRQ setup (Lucas Stach)
     Make get_msi_addr() return phys_addr_t, not u32 (Lucas Stach)
     Set up high part of MSI target address (Lucas Stach)
     Fix PORT_LOGIC_LINK_WIDTH_MASK (Zhou Wang)
     Revert "PCI: designware: Program ATU with untranslated address" (Zhou Wang)
     Use of_pci_get_host_bridge_resources() to parse DT (Zhou Wang)
     Make driver arch-agnostic (Zhou Wang)
 
   Miscellaneous
     Make x86 pci_subsys_init() static (Alexander Kuleshov)
     Turn off Request Attributes to avoid Chelsio T5 Completion erratum (Hariprasad Shenai)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWPM9aAAoJEFmIoMA60/r89f8QALFMHegqKk5M08ZcCaG7unLF
 5U5t88y6KF/kNYF6HOqLQiHh79U3ToU5HdrNaAtutr0UnvgbFC2WulLqKYgLiq6Y
 YJnwR3EfgGmdG7DKAVAXq19+nc2hgTPAEe8sciU7HKlTbqQmGj6//y3sQULGNLx3
 zur0C33DCtrDgKDP7to273lkHO8Vl0YuLzyqwt0ePCMNcXR0h1dK8QxSTjuXBxaR
 c+T1V1Hw64MTxLz3xJd1/1ipy32u+9LnqqcdRz0zRq6qi48G9ch/i4Z6DHa8kTbj
 DUZrrTYKILQ2TcjcZSBmTueX11Z1Xa4/I/45sehIi6gVWL9qQbmGpt2E5YtFED+4
 GdcmBSbWG/qsNsabXk38uiM3ww7+ltXEOhTXbcK+EgjvIhE6gSK/plYG0fU9pybs
 AKViEXVdHoT1X0N1dLK12mq7kvDCQvShHn08lz97Q9YrZ32wv1Fnij6WVSbJvfWt
 DubtPtisVM+rVy+VTpOImNR9wO54lTmG5jK53yNqH7I20K89y1kqARlN9nMXMB1a
 2nQnwe9yWlsGj9gVNCn1KmyQSPOWjg+3Z+ekfwbxpca14s1AaN3Jm0N9Z61dXFoF
 y2ygoQtZ/z9BHr3quBpxXGt+aVUg2kcNw5GYeDYiALxXdJSObyzRrZ6HDb/zicU2
 ZH9hBj0ctXvucmy6I2mt
 =uZrt
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Resource management:
   - Add support for Enhanced Allocation devices (Sean O. Stalley)
   - Add Enhanced Allocation register entries (Sean O. Stalley)
   - Handle IORESOURCE_PCI_FIXED when sizing resources (David Daney)
   - Handle IORESOURCE_PCI_FIXED when assigning resources (David Daney)
   - Handle Enhanced Allocation capability for SR-IOV devices (David Daney)
   - Clear IORESOURCE_UNSET when reverting to firmware-assigned address (Bjorn Helgaas)
   - Make Enhanced Allocation bitmasks more obvious (Bjorn Helgaas)
   - Expand Enhanced Allocation BAR output (Bjorn Helgaas)
   - Add of_pci_check_probe_only to parse "linux,pci-probe-only" (Marc Zyngier)
   - Fix lookup of linux,pci-probe-only property (Marc Zyngier)
   - Add sparc mem64 resource parsing for root bus (Yinghai Lu)

  PCI device hotplug:
   - pciehp: Queue power work requests in dedicated function (Guenter Roeck)

  Driver binding:
   - Add builtin_pci_driver() to avoid registration boilerplate (Paul Gortmaker)

  Virtualization:
   - Set SR-IOV NumVFs to zero after enumeration (Alexander Duyck)
   - Remove redundant validation of SR-IOV offset/stride registers (Alexander Duyck)
   - Remove VFs in reverse order if virtfn_add() fails (Alexander Duyck)
   - Reorder pcibios_sriov_disable() (Alexander Duyck)
   - Wait 1 second between disabling VFs and clearing NumVFs (Alexander Duyck)
   - Fix sriov_enable() error path for pcibios_enable_sriov() failures (Alexander Duyck)
   - Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs (Ben Shelton)
   - Don't try to restore VF BARs (Wei Yang)

  MSI:
   - Don't alloc pcibios-irq when MSI is enabled (Joerg Roedel)
   - Add msi_controller setup_irqs() method for special multivector setup (Lucas Stach)
   - Export all remapped MSIs to sysfs attributes (Romain Bezut)
   - Disable MSI on SiS 761 (Ondrej Zary)

  AER:
   - Clear error status registers during enumeration and restore (Taku Izumi)

  Generic host bridge driver:
   - Fix lookup of linux,pci-probe-only property (Marc Zyngier)
   - Allow multiple hosts with different map_bus() methods (David Daney)
   - Pass starting bus number to pci_scan_root_bus() (David Daney)
   - Fix address window calculation for non-zero starting bus (David Daney)

  Altera host bridge driver:
   - Add msi.h to ARM Kbuild (Ley Foon Tan)
   - Add Altera PCIe host controller driver (Ley Foon Tan)
   - Add Altera PCIe MSI driver (Ley Foon Tan)

  APM X-Gene host bridge driver:
   - Remove msi_controller assignment (Duc Dang)

  Broadcom iProc host bridge driver:
   - Fix header comment "Corporation" misspelling (Florian Fainelli)
   - Fix code comment to match code (Ray Jui)
   - Remove unused struct iproc_pcie.irqs[] (Ray Jui)
   - Call pci_fixup_irqs() for ARM64 as well as ARM (Ray Jui)
   - Fix PCIe reset logic (Ray Jui)
   - Improve link detection logic (Ray Jui)
   - Update PCIe device tree bindings (Ray Jui)
   - Add outbound mapping support (Ray Jui)

  Freescale i.MX6 host bridge driver:
   - Return real error code from imx6_add_pcie_port() (Fabio Estevam)
   - Add PCIE_PHY_RX_ASIC_OUT_VALID definition (Fabio Estevam)

  Freescale Layerscape host bridge driver:
   - Remove ls_pcie_establish_link() (Minghuan Lian)
   - Ignore PCIe controllers in Endpoint mode (Minghuan Lian)
   - Factor out SCFG related function (Minghuan Lian)
   - Update ls_add_pcie_port() (Minghuan Lian)
   - Remove unused fields from struct ls_pcie (Minghuan Lian)
   - Add support for LS1043a and LS2080a (Minghuan Lian)
   - Add ls_pcie_msi_host_init() (Minghuan Lian)

  HiSilicon host bridge driver:
   - Add HiSilicon SoC Hip05 PCIe driver (Zhou Wang)

  Marvell MVEBU host bridge driver:
   - Return zero for reserved or unimplemented config space (Russell King)
   - Use exact config access size; don't read/modify/write (Russell King)
   - Use of_get_available_child_count() (Russell King)
   - Use for_each_available_child_of_node() to walk child nodes (Russell King)
   - Report full node name when reporting a DT error (Russell King)
   - Use port->name rather than "PCIe%d.%d" (Russell King)
   - Move port parsing and resource claiming to  separate function (Russell King)
   - Fix memory leaks and refcount leaks (Russell King)
   - Split port parsing and resource claiming from  port setup (Russell King)
   - Use gpio_set_value_cansleep() (Russell King)
   - Use devm_kcalloc() to allocate an array (Russell King)
   - Use gpio_desc to carry around gpio (Russell King)
   - Improve clock/reset handling (Russell King)
   - Add PCI Express root complex capability block (Russell King)
   - Remove code restricting accesses to slot 0 (Russell King)

  NVIDIA Tegra host bridge driver:
   - Wrap static pgprot_t initializer with __pgprot() (Ard Biesheuvel)

  Renesas R-Car host bridge driver:
   - Build pci-rcar-gen2.c only on ARM (Geert Uytterhoeven)
   - Build pcie-rcar.c only on ARM (Geert Uytterhoeven)
   - Make PCI aware of the I/O resources (Phil Edworthy)
   - Remove dependency on ARM-specific struct hw_pci (Phil Edworthy)
   - Set root bus nr to that provided in DT (Phil Edworthy)
   - Fix I/O offset for multiple host bridges (Phil Edworthy)

  ST Microelectronics SPEAr13xx host bridge driver:
   - Fix dw_pcie_cfg_read/write() usage (Gabriele Paoloni)

  Synopsys DesignWare host bridge driver:
   - Make "clocks" and "clock-names" optional DT properties (Bhupesh Sharma)
   - Use exact access size in dw_pcie_cfg_read() (Gabriele Paoloni)
   - Simplify dw_pcie_cfg_read/write() interfaces (Gabriele Paoloni)
   - Require config accesses to be naturally aligned (Gabriele Paoloni)
   - Make "num-lanes" an optional DT property (Gabriele Paoloni)
   - Move calculation of bus addresses to DRA7xx (Gabriele Paoloni)
   - Replace ARM pci_sys_data->align_resource with global function pointer (Gabriele Paoloni)
   - Factor out MSI msg setup (Lucas Stach)
   - Implement multivector MSI IRQ setup (Lucas Stach)
   - Make get_msi_addr() return phys_addr_t, not u32 (Lucas Stach)
   - Set up high part of MSI target address (Lucas Stach)
   - Fix PORT_LOGIC_LINK_WIDTH_MASK (Zhou Wang)
   - Revert "PCI: designware: Program ATU with untranslated address" (Zhou Wang)
   - Use of_pci_get_host_bridge_resources() to parse DT (Zhou Wang)
   - Make driver arch-agnostic (Zhou Wang)

  Miscellaneous:
   - Make x86 pci_subsys_init() static (Alexander Kuleshov)
   - Turn off Request Attributes to avoid Chelsio T5 Completion erratum (Hariprasad Shenai)"

* tag 'pci-v4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
  PCI: altera: Add Altera PCIe MSI driver
  PCI: hisi: Add HiSilicon SoC Hip05 PCIe driver
  PCI: layerscape: Add ls_pcie_msi_host_init()
  PCI: layerscape: Add support for LS1043a and LS2080a
  PCI: layerscape: Remove unused fields from struct ls_pcie
  PCI: layerscape: Update ls_add_pcie_port()
  PCI: layerscape: Factor out SCFG related function
  PCI: layerscape: Ignore PCIe controllers in Endpoint mode
  PCI: layerscape: Remove ls_pcie_establish_link()
  PCI: designware: Make "clocks" and "clock-names" optional DT properties
  PCI: designware: Make driver arch-agnostic
  ARM/PCI: Replace pci_sys_data->align_resource with global function pointer
  PCI: designware: Use of_pci_get_host_bridge_resources() to parse DT
  Revert "PCI: designware: Program ATU with untranslated address"
  PCI: designware: Move calculation of bus addresses to DRA7xx
  PCI: designware: Make "num-lanes" an optional DT property
  PCI: designware: Require config accesses to be naturally aligned
  PCI: designware: Simplify dw_pcie_cfg_read/write() interfaces
  PCI: designware: Use exact access size in dw_pcie_cfg_read()
  PCI: spear: Fix dw_pcie_cfg_read/write() usage
  ...
2015-11-06 11:29:53 -08:00
Linus Torvalds
54727e6e95 x86: don't make DEBUG_WX default to 'y' even with DEBUG_RODATA
It turns out that we still have issues with the EFI memory map that ends
up polluting our kernel page tables with writable executable pages.

That will get sorted out, but in the meantime let's not make the scary
complaint about them be on by default.  The code is useful for
developers, but not ready for end user testing yet.

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 09:12:41 -08:00
Josh Poimboeuf
e2391a2dca livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
When loading a patch module on a kernel with
!CONFIG_DEBUG_SET_MODULE_RONX, the following crash occurs:

  [  205.988776] livepatch: enabling patch 'kpatch_meminfo_string'
  [  205.989829] BUG: unable to handle kernel paging request at ffffffffa08d2fc0
  [  205.989863] IP: [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
  [  205.989888] PGD 1a10067 PUD 1a11063 PMD 7bcde067 PTE 3740e161
  [  205.989915] Oops: 0003 [#1] SMP
  [  205.990187] CPU: 2 PID: 14570 Comm: insmod Tainted: G           O  K 4.1.12
  [  205.990214] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
  [  205.990249] task: ffff8800374aaa90 ti: ffff8800794b8000 task.ti: ffff8800794b8000
  [  205.990276] RIP: 0010:[<ffffffff8154fecb>]  [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
  [  205.990307] RSP: 0018:ffff8800794bbd58  EFLAGS: 00010246
  [  205.990327] RAX: 0000000000000000 RBX: ffffffffa08d2fc0 RCX: 0000000000000000
  [  205.990356] RDX: 01ffff8000000080 RSI: 0000000000000000 RDI: ffffffff81a54b40
  [  205.990382] RBP: ffff88007b4c4d80 R08: 0000000000000007 R09: 0000000000000000
  [  205.990408] R10: 0000000000000008 R11: ffffea0001f18840 R12: 0000000000000000
  [  205.990433] R13: 0000000000000001 R14: ffffffffa08d2fc0 R15: ffff88007bd0bc40
  [  205.990459] FS:  00007f1128fbc700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
  [  205.990488] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  205.990509] CR2: ffffffffa08d2fc0 CR3: 000000002606e000 CR4: 00000000001406e0
  [  205.990536] Stack:
  [  205.990545]  ffff8800794bbec8 0000000000000001 ffffffffa08d3010 ffffffff810ecea9
  [  205.990576]  ffffffff810e8e40 000000000005f360 ffff88007bd0bc50 ffffffffa08d3240
  [  205.990608]  ffffffffa08d52c0 ffffffffa08d3210 ffff8800794bbed8 ffff8800794bbf1c
  [  205.990639] Call Trace:
  [  205.990651]  [<ffffffff810ecea9>] ? load_module+0x1e59/0x23a0
  [  205.990672]  [<ffffffff810e8e40>] ? store_uevent+0x40/0x40
  [  205.990693]  [<ffffffff810e99b5>] ? copy_module_from_fd.isra.49+0xb5/0x140
  [  205.990718]  [<ffffffff810ed5bd>] ? SyS_finit_module+0x7d/0xa0
  [  205.990741]  [<ffffffff81556832>] ? system_call_fastpath+0x16/0x75
  [  205.990763] Code: f9 00 00 00 74 23 49 c7 c0 92 e1 60 81 48 8d 53 18 89 c1 4c 89 c6 48 c7 c7 f0 85 7d 81 31 c0 e8 71 fa ff ff e8 58 0e 00 00 31 f6 <c7> 03 00 00 00 00 48 89 da 48 c7 c7 20 c7 a5 81 e8 d0 ec b3 ff
  [  205.990916] RIP  [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
  [  205.990940]  RSP <ffff8800794bbd58>
  [  205.990953] CR2: ffffffffa08d2fc0

With !CONFIG_DEBUG_SET_MODULE_RONX, module text and rodata pages are
writable, and the debug_align() macro allows the module struct to share
a page with executable text.  When klp_write_module_reloc() calls
set_memory_ro() on the page, it effectively turns the module struct into
a read-only structure, resulting in a page fault when load_module() does
"mod->state = MODULE_STATE_LIVE".

Reported-by: Cyril B. <cbay@alwaysdata.com>
Tested-by: Cyril B. <cbay@alwaysdata.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-11-06 11:10:03 +01:00
Linus Torvalds
2e3078af2c Merge branch 'akpm' (patches from Andrew)
Merge patch-bomb from Andrew Morton:

 - inotify tweaks

 - some ocfs2 updates (many more are awaiting review)

 - various misc bits

 - kernel/watchdog.c updates

 - Some of mm.  I have a huge number of MM patches this time and quite a
   lot of it is quite difficult and much will be held over to next time.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  selftests: vm: add tests for lock on fault
  mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
  mm: introduce VM_LOCKONFAULT
  mm: mlock: add new mlock system call
  mm: mlock: refactor mlock, munlock, and munlockall code
  kasan: always taint kernel on report
  mm, slub, kasan: enable user tracking by default with KASAN=y
  kasan: use IS_ALIGNED in memory_is_poisoned_8()
  kasan: Fix a type conversion error
  lib: test_kasan: add some testcases
  kasan: update reference to kasan prototype repo
  kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
  kasan: various fixes in documentation
  kasan: update log messages
  kasan: accurately determine the type of the bad access
  kasan: update reported bug types for kernel memory accesses
  kasan: update reported bug types for not user nor kernel memory accesses
  mm/kasan: prevent deadlock in kasan reporting
  mm/kasan: don't use kasan shadow pointer in generic functions
  mm/kasan: MODULE_VADDR is not available on all archs
  ...
2015-11-05 23:10:54 -08:00
Eric B Munson
a8ca5d0ecb mm: mlock: add new mlock system call
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add a
new mlock state making it useful.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Andrey Konovalov
c63f06dd15 kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
Move KASAN_SANITIZE in arch/x86/boot/Makefile above the comment
related to SVGA_MODE, since the comment refers to 'the next line'.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Andrey Konovalov
25add7ec70 kasan: update log messages
We decided to use KASAN as the short name of the tool and
KernelAddressSanitizer as the full one.  Update log messages according to
that.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Linus Torvalds
933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Thomas Gleixner
72613184a1 x86/smp: Remove single IPI wrapper
All APIC implementation have send_IPI now. Remove the conditional in
the calling code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.807817097@linutronix.de
2015-11-05 13:07:54 +01:00
Thomas Gleixner
6153058a03 x86/apic: Use default send single IPI wrapper
Wire up the default_send_IPI_single() wrapper to the last holdouts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.711224890@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner
7e29393b20 x86/apic: Provide default send single IPI wrapper
Instead of doing the wrapping in the smp code we can provide a default
wrapper for those APICs which insist on cpumasks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.631111846@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner
4727da2eb1 x86/apic: Implement single IPI for apic_noop
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.455429817@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner
c61a0d31ba x86/apic: Wire up single IPI for apic_numachip
The function already exists.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.551445489@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner
8642ea953d x86/apic: Wire up single IPI for x2apic_uv
The function already exists.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.376775625@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner
f2bffe8a3e x86/apic: Implement single IPI for x2apic_phys
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.296438009@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner
5789a12e28 x86/apic: Wire up single IPI for bigsmp_apic
Use the default implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.213292642@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner
500bd02fb1 x86/apic: Remove pointless indirections from bigsmp_apic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.133086575@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner
68cd88ff8d x86/apic: Wire up single IPI for apic_physflat
Use the default implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.055046864@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner
449112f4f3 x86/apic: Remove pointless indirections from apic_physflat
No value in having 32 byte extra text.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.975653382@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner
53be0fac8b x86/apic: Implement default single target IPI function
apic_physflat and bigsmp_apic can share that implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.898543767@linutronix.de
2015-11-05 13:07:52 +01:00
Linus Torvalds
7b6ce46cb3 x86/apic: Implement single target IPI function for x2apic_cluster
[ tglx: Split it out from the patch which provides the new callback ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.817975597@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-05 13:07:52 +01:00
Linus Torvalds
539da78772 x86/apic: Add a single-target IPI function to the apic
We still fall back on the "send mask" versions if an apic definition
doesn't have the single-target version, but at least this allows the
(trivial) case for the common clustered x2apic case.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.737120838@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-05 13:07:51 +01:00
Kai Huang
a3eaa8649e KVM: VMX: Fix commit which broke PML
I found PML was broken since below commit:

	commit feda805fe7
	Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
	Date:   Wed Sep 9 14:05:55 2015 +0800

	KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update

	Unify the update in vmx_cpuid_update()

	Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
	[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The reason is in above commit vmx_cpuid_update calls vmx_secondary_exec_control,
in which currently SECONDARY_EXEC_ENABLE_PML bit is cleared unconditionally (as
PML is enabled in creating vcpu). Therefore if vcpu_cpuid_update is called after
vcpu is created, PML will be disabled unexpectedly while log-dirty code still
thinks PML is used.

Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control
only when PML is not supported or not enabled (!enable_pml). This is more
reasonable as PML is currently either always enabled or disabled. With this
explicit updating SECONDARY_EXEC_ENABLE_PML in vmx_enable{disable}_pml is not
needed so also rename vmx_enable{disable}_pml to vmx_create{destroy}_pml_buffer.

Fixes: feda805fe7
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
[While at it, change a wrong ASSERT to an "if".  The condition can happen
 if creating the VCPU fails with ENOMEM. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 11:34:11 +01:00
Linus Torvalds
0d51ce9ca1 Power management and ACPI updates for v4.4-rc1
- ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).
 
    The most significant change is to allow the AML debugger to be
    built into the kernel.  On top of that there is an update related
    to the NFIT table (the ACPI persistent memory interface)
    and a few fixes and cleanups.
 
  - ACPI CPPC2 (Collaborative Processor Performance Control v2)
    support along with a cpufreq frontend (Ashwin Chaugule).
 
    This can only be enabled on ARM64 at this point.
 
  - New ACPI infrastructure for the early probing of IRQ chips and
    clock sources (Marc Zyngier).
 
  - Support for a new hierarchical properties extension of the ACPI
    _DSD (Device Specific Data) device configuration object allowing
    the kernel to handle hierarchical properties (provided by the
    platform firmware this way) automatically and make them available
    to device drivers via the generic device properties interface
    (Rafael Wysocki).
 
  - Generic device properties API extension to obtain an index of
    certain string value in an array of strings, along the lines of
    of_property_match_string(), but working for all of the supported
    firmware node types, and support for the "dma-names" device
    property based on it (Mika Westerberg).
 
  - ACPI core fix to parse the MADT (Multiple APIC Description Table)
    entries in the order expected by platform firmware (and mandated
    by the specification) to avoid confusion on systems with more than
    255 logical CPUs (Lukasz Anaczkowski).
 
  - Consolidation of the ACPI-based handling of PCI host bridges
    on x86 and ia64 (Jiang Liu).
 
  - ACPI core fixes to ensure that the correct IRQ number is used to
    represent the SCI (System Control Interrupt) in the cases when
    it has been re-mapped (Chen Yu).
 
  - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).
 
  - ACPI EC driver fixes (Lv Zheng).
 
  - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
    Kosina, Rami Rosen, Rasmus Villemoes).
 
  - New mechanism in the PM core allowing drivers to check if the
    platform firmware is going to be involved in the upcoming system
    suspend or if it has been involved in the suspend the system is
    resuming from at the moment (Rafael Wysocki).
 
    This should allow drivers to optimize their suspend/resume
    handling in some cases and the changes include a couple of users
    of it (the i8042 input driver, PCI PM).
 
  - PCI PM fix to prevent runtime-suspended devices with PME enabled
    from being resumed during system suspend even if they aren't
    configured to wake up the system from sleep (Rafael Wysocki).
 
  - New mechanism to report the number of a wakeup IRQ that woke up
    the system from sleep last time (Alexandra Yates).
 
  - Removal of unused interfaces from the generic power domains
    framework and fixes related to latency measurements in that
    code (Ulf Hansson, Daniel Lezcano).
 
  - cpufreq core sysfs interface rework to make it handle CPUs that
    share performance scaling settings (represented by a common
    cpufreq policy object) more symmetrically (Viresh Kumar).
 
    This should help to simplify the CPU offline/online handling among
    other things.
 
  - cpufreq core fixes and cleanups (Viresh Kumar).
 
  - intel_pstate fixes related to the Turbo Activation Ratio (TAR)
    mechanism on client platforms which causes the turbo P-states
    range to vary depending on platform firmware settings (Srinivas
    Pandruvada).
 
  - intel_pstate sysfs interface fix (Prarit Bhargava).
 
  - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
    and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
    Bhat, Luis de Bethencourt).
 
  - cpuidle mvebu driver cleanups (Russell King).
 
  - OPP (Operating Performance Points) framework code reorganization
    to make it more maintainable (Viresh Kumar).
 
  - Intel Broxton support for the RAPL (Running Average Power Limits)
    power capping driver (Amy Wiles).
 
  - Assorted power management code fixes and cleanups (Dan Carpenter,
    Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
    Villemoes).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJWOC9oAAoJEILEb/54YlRx/c8P/joflwoFsISwJccG62YTQMuc
 bMQKM4Kw0vl5La8+pkLpe5t6+mW7l81UFtYF6Dzd8LOKlD9sszD34z1lHmCeT/oR
 wn0uZpHagRyLMUfoyiEtlU/VRU6WQNNtS3EgjwUi7xgFz9Q0pjcCZ9OQ6vKov1j5
 +6j40ODif5sgo+2vl+rztJiV0SIMkYdkgNqgfN1FE9bdLA2Zkk+PxxJbtGQORuDu
 O/K+XhQT2xWquVWi/1p+VtQxs5glBS1oKm0kogV5bElCvNTRNIVABUNcjogITQwo
 QSAKgoCKIoaIl5jtDT6u5dc0y67q/dMtqOY9fOCcOz1Z7jbWQzR8D7mpFWIsJUPK
 K2LClI3t85ynpN6Jref246A6+C9nwB8JMAiAR11oBw7WbBlkd6tbRgcT5B+iz8UE
 FuCCif7pha/Fs+Jt1YRazscIqteQ2bAhhxikuIPMfw2M6M67MNfVNeKA1bAoSM34
 dH7JsilblitvV7shrwJHwXPXCOF2jEPoK8I4/q2+TR5qUxEpRJjelQxXGSaQScMZ
 iNnjeTgv8H8q+rY5Yjzsl4pxP0Fvf7IuqkptWOJbgepg4cQc9pS87wOpY3uEeQzr
 H7ruaQJFCnLO4aXbPNClsiJARhrBk+qMlsh4vBEyCJ2T0ucb+nIUcN4BTi8t85yl
 X97BfHHUiDoUrnIsNids
 =1gaH
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "Quite a new features are included this time.

  First off, the Collaborative Processor Performance Control interface
  (version 2) defined by ACPI will now be supported on ARM64 along with
  a cpufreq frontend for CPU performance scaling.

  Second, ACPI gets a new infrastructure for the early probing of IRQ
  chips and clock sources (along the lines of the existing similar
  mechanism for DT).

  Next, the ACPI core and the generic device properties API will now
  support a recently introduced hierarchical properties extension of the
  _DSD (Device Specific Data) ACPI device configuration object.  If the
  ACPI platform firmware uses that extension to organize device
  properties in a hierarchical way, the kernel will automatically handle
  it and make those properties available to device drivers via the
  generic device properties API.

  It also will be possible to build the ACPICA's AML interpreter
  debugger into the kernel now and use that to diagnose AML-related
  problems more efficiently.  In the future, this should make it
  possible to single-step AML execution and do similar things.
  Interesting stuff, although somewhat experimental at this point.

  Finally, the PM core gets a new mechanism that can be used by device
  drivers to distinguish between suspend-to-RAM (based on platform
  firmware support) and suspend-to-idle (or other variants of system
  suspend the platform firmware is not involved in) and possibly
  optimize their device suspend/resume handling accordingly.

  In addition to that, some existing features are re-organized quite
  substantially.

  First, the ACPI-based handling of PCI host bridges on x86 and ia64 is
  unified and the common code goes into the ACPI core (so as to reduce
  code duplication and eliminate non-essential differences between the
  two architectures in that area).

  Second, the Operating Performance Points (OPP) framework is
  reorganized to make the code easier to find and follow.

  Next, the cpufreq core's sysfs interface is reorganized to get rid of
  the "primary CPU" concept for configurations in which the same
  performance scaling settings are shared between multiple CPUs.

  Finally, some interfaces that aren't necessary any more are dropped
  from the generic power domains framework.

  On top of the above we have some minor extensions, cleanups and bug
  fixes in multiple places, as usual.

  Specifics:

   - ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).

     The most significant change is to allow the AML debugger to be
     built into the kernel.  On top of that there is an update related
     to the NFIT table (the ACPI persistent memory interface) and a few
     fixes and cleanups.

   - ACPI CPPC2 (Collaborative Processor Performance Control v2) support
     along with a cpufreq frontend (Ashwin Chaugule).

     This can only be enabled on ARM64 at this point.

   - New ACPI infrastructure for the early probing of IRQ chips and
     clock sources (Marc Zyngier).

   - Support for a new hierarchical properties extension of the ACPI
     _DSD (Device Specific Data) device configuration object allowing
     the kernel to handle hierarchical properties (provided by the
     platform firmware this way) automatically and make them available
     to device drivers via the generic device properties interface
     (Rafael Wysocki).

   - Generic device properties API extension to obtain an index of
     certain string value in an array of strings, along the lines of
     of_property_match_string(), but working for all of the supported
     firmware node types, and support for the "dma-names" device
     property based on it (Mika Westerberg).

   - ACPI core fix to parse the MADT (Multiple APIC Description Table)
     entries in the order expected by platform firmware (and mandated by
     the specification) to avoid confusion on systems with more than 255
     logical CPUs (Lukasz Anaczkowski).

   - Consolidation of the ACPI-based handling of PCI host bridges on x86
     and ia64 (Jiang Liu).

   - ACPI core fixes to ensure that the correct IRQ number is used to
     represent the SCI (System Control Interrupt) in the cases when it
     has been re-mapped (Chen Yu).

   - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).

   - ACPI EC driver fixes (Lv Zheng).

   - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
     Kosina, Rami Rosen, Rasmus Villemoes).

   - New mechanism in the PM core allowing drivers to check if the
     platform firmware is going to be involved in the upcoming system
     suspend or if it has been involved in the suspend the system is
     resuming from at the moment (Rafael Wysocki).

     This should allow drivers to optimize their suspend/resume handling
     in some cases and the changes include a couple of users of it (the
     i8042 input driver, PCI PM).

   - PCI PM fix to prevent runtime-suspended devices with PME enabled
     from being resumed during system suspend even if they aren't
     configured to wake up the system from sleep (Rafael Wysocki).

   - New mechanism to report the number of a wakeup IRQ that woke up the
     system from sleep last time (Alexandra Yates).

   - Removal of unused interfaces from the generic power domains
     framework and fixes related to latency measurements in that code
     (Ulf Hansson, Daniel Lezcano).

   - cpufreq core sysfs interface rework to make it handle CPUs that
     share performance scaling settings (represented by a common cpufreq
     policy object) more symmetrically (Viresh Kumar).

     This should help to simplify the CPU offline/online handling among
     other things.

   - cpufreq core fixes and cleanups (Viresh Kumar).

   - intel_pstate fixes related to the Turbo Activation Ratio (TAR)
     mechanism on client platforms which causes the turbo P-states range
     to vary depending on platform firmware settings (Srinivas
     Pandruvada).

   - intel_pstate sysfs interface fix (Prarit Bhargava).

   - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
     and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
     Bhat, Luis de Bethencourt).

   - cpuidle mvebu driver cleanups (Russell King).

   - OPP (Operating Performance Points) framework code reorganization to
     make it more maintainable (Viresh Kumar).

   - Intel Broxton support for the RAPL (Running Average Power Limits)
     power capping driver (Amy Wiles).

   - Assorted power management code fixes and cleanups (Dan Carpenter,
     Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
     Villemoes)"

* tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (108 commits)
  cpufreq: postfix policy directory with the first CPU in related_cpus
  cpufreq: create cpu/cpufreq/policyX directories
  cpufreq: remove cpufreq_sysfs_{create|remove}_file()
  cpufreq: create cpu/cpufreq at boot time
  cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  PM / Domains: Merge measurements for PM QoS device latencies
  PM / Domains: Don't measure ->start|stop() latency in system PM callbacks
  PM / clk: Fix broken build due to non-matching code and header #ifdefs
  ACPI / Documentation: add copy_dsdt to ACPI format options
  ACPI / sysfs: correctly check failing memory allocation
  ACPI / video: Add a quirk to force native backlight on Lenovo IdeaPad S405
  ACPI / CPPC: Fix potential memory leak
  ACPI / CPPC: signedness bug in register_pcc_channel()
  ACPI / PAD: power_saving_thread() is not freezable
  ACPI / PM: Fix incorrect wakeup IRQ setting during suspend-to-idle
  ACPI: Using correct irq when waiting for events
  ACPI: Use correct IRQ when uninstalling ACPI interrupt handler
  cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
  cpuidle: mvebu: clean up multiple platform drivers
  ...
2015-11-04 18:10:13 -08:00
Linus Torvalds
41ecf1404b xen: features for 4.4-rc0
- Improve balloon driver memory hotplug placement.
 - Use unpopulated hotplugged memory for foreign pages (if
   supported/enabled).
 - Support 64 KiB guest pages on arm64.
 - CPU hotplug support on arm/arm64.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOeSkAAoJEFxbo/MsZsTRph0H/0nE8Tx0GyGtOyCYfBdInTvI
 WgjvL8VR1XrweZMVis3668MzhLSYg6b5lvJsoi+L3jlzYRyze43iHXsKfvp+8p0o
 TVUhFnlHEHF8ASEtPydAi6HgS7Dn9OQ9LaZ45R1Gk0rHnwJjIQonhTn2jB0yS9Am
 Hf4aZXP2NVZphjYcloqNsLH0G6mGLtgq8cS0uKcVO2YIrR4Dr3sfj9qfq9mflf8n
 sA/5ifoHRfOUD1vJzYs4YmIBUv270jSsprWK/Mi2oXIxUTBpKRAV1RVCAPW6GFci
 HIZjIJkjEPWLsvxWEs0dUFJQGp3jel5h8vFPkDWBYs3+9rILU2DnLWpKGNDHx3k=
 =vUfa
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:

 - Improve balloon driver memory hotplug placement.

 - Use unpopulated hotplugged memory for foreign pages (if
   supported/enabled).

 - Support 64 KiB guest pages on arm64.

 - CPU hotplug support on arm/arm64.

* tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits)
  xen: fix the check of e_pfn in xen_find_pfn_range
  x86/xen: add reschedule point when mapping foreign GFNs
  xen/arm: don't try to re-register vcpu_info on cpu_hotplug.
  xen, cpu_hotplug: call device_offline instead of cpu_down
  xen/arm: Enable cpu_hotplug.c
  xenbus: Support multiple grants ring with 64KB
  xen/grant-table: Add an helper to iterate over a specific number of grants
  xen/xenbus: Rename *RING_PAGE* to *RING_GRANT*
  xen/arm: correct comment in enlighten.c
  xen/gntdev: use types from linux/types.h in userspace headers
  xen/gntalloc: use types from linux/types.h in userspace headers
  xen/balloon: Use the correct sizeof when declaring frame_list
  xen/swiotlb: Add support for 64KB page granularity
  xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb
  arm/xen: Add support for 64KB page granularity
  xen/privcmd: Add support for Linux 64KB page granularity
  net/xen-netback: Make it running on 64KB page granularity
  net/xen-netfront: Make it running on 64KB page granularity
  block/xen-blkback: Make it running on 64KB page granularity
  block/xen-blkfront: Make it running on 64KB page granularity
  ...
2015-11-04 17:32:42 -08:00
Linus Torvalds
e627078a0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "There is only one new feature in this pull for the 4.4 merge window,
  most of it is small enhancements, cleanup and bug fixes:

   - Add the s390 backend for the software dirty bit tracking.  This
     adds two new pgtable functions pte_clear_soft_dirty and
     pmd_clear_soft_dirty which is why there is a hit to
     arch/x86/include/asm/pgtable.h in this pull request.

   - A series of cleanup patches for the AP bus, this includes the
     removal of the support for two outdated crypto cards (PCICC and
     PCICA).

   - The irq handling / signaling on buffer full in the runtime
     instrumentation code is dropped.

   - Some micro optimizations: remove unnecessary memory barriers for a
     couple of functions: [smb_]rmb, [smb_]wmb, atomics, bitops, and for
     spin_unlock.  Use the builtin bswap if available and make
     test_and_set_bit_lock more cache friendly.

   - Statistics and a tracepoint for the diagnose calls to the
     hypervisor.

   - The CPU measurement facility support to sample KVM guests is
     improved.

   - The vector instructions are now always enabled for user space
     processes if the hardware has the vector facility.  This simplifies
     the FPU handling code.  The fpu-internal.h header is split into fpu
     internals, api and types just like x86.

   - Cleanup and improvements for the common I/O layer.

   - Rework udelay to solve a problem with kprobe.  udelay has busy loop
     semantics but still uses an idle processor state for the wait"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits)
  s390: remove runtime instrumentation interrupts
  s390/cio: de-duplicate subchannel validation
  s390/css: unneeded initialization in for_each_subchannel
  s390/Kconfig: use builtin bswap
  s390/dasd: fix disconnected device with valid path mask
  s390/dasd: fix invalid PAV assignment after suspend/resume
  s390/dasd: fix double free in dasd_eckd_read_conf
  s390/kernel: fix ptrace peek/poke for floating point registers
  s390/cio: move ccw_device_stlck functions
  s390/cio: move ccw_device_call_handler
  s390/topology: reduce per_cpu() invocations
  s390/nmi: reduce size of percpu variable
  s390/nmi: fix terminology
  s390/nmi: remove casts
  s390/nmi: remove pointless error strings
  s390: don't store registers on disabled wait anymore
  s390: get rid of __set_psw_mask()
  s390/fpu: split fpu-internal.h into fpu internals, api, and type headers
  s390/dasd: fix list_del corruption after lcu changes
  s390/spinlock: remove unneeded serializations at unlock
  ...
2015-11-04 11:31:31 -08:00
Linus Torvalds
b0f85fa11a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

Changes of note:

 1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.

 2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
    David Ahern.

 3) Allow the user to ask for the statistics to be filtered out of
    ipv4/ipv6 address netlink dumps.  From Sowmini Varadhan.

 4) More work to pass the network namespace context around deep into
    various packet path APIs, starting with the netfilter hooks.  From
    Eric W Biederman.

 5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
    Richter.

 6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.

 7) Support Very High Throughput in wireless MESH code, from Bob
    Copeland.

 8) Allow setting the ageing_time in switchdev/rocker.  From Scott
    Feldman.

 9) Properly autoload L2TP type modules, from Stephen Hemminger.

10) Fix and enable offload features by default in 8139cp driver, from
    David Woodhouse.

11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
    Jiri Benc.

12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
    Opstad.

13) Fix IPSEC flowcache overflows on large systems, from Steffen
    Klassert.

14) Convert bridging to track VLANs using rhashtable entries rather than
    a bitmap.  From Nikolay Aleksandrov.

15) Make TCP listener handling completely lockless, this is a major
    accomplishment.  Incoming request sockets now live in the
    established hash table just like any other socket too.

    From Eric Dumazet.

15) Provide more bridging attributes to netlink, from Nikolay
    Aleksandrov.

16) Use hash based algorithm for ipv4 multipath routing, this was very
    long overdue.  From Peter Nørlund.

17) Several y2038 cures, mostly avoiding timespec.  From Arnd Bergmann.

18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.

19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet.  This
    influences the port binding selection logic used by SO_REUSEPORT.

20) Add ipv6 support to VRF, from David Ahern.

21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.

22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.

23) Implement RACK loss recovery in TCP, from Yuchung Cheng.

24) Support multipath routes in MPLS, from Roopa Prabhu.

25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
    Dumazet.

26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
    Sudarsana Kalluru.

27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
    Sowa.

28) Support ipv6 geneve tunnels, from John W Linville.

29) Add flood control support to switchdev layer, from Ido Schimmel.

30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
    Hannes Frederic Sowa.

31) Support persistent maps and progs in bpf, from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
  sh_eth: use DMA barriers
  switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
  net: sched: kill dead code in sch_choke.c
  irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
  net: dsa: mv88e6xxx: include DSA ports in VLANs
  net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
  net/core: fix for_each_netdev_feature
  vlan: Invoke driver vlan hooks only if device is present
  arcnet/com20020: add LEDS_CLASS dependency
  bpf, verifier: annotate verbose printer with __printf
  dp83640: Only wait for timestamps for packets with timestamping enabled.
  ptp: Change ptp_class to a proper bitmask
  dp83640: Prune rx timestamp list before reading from it
  dp83640: Delay scheduled work.
  dp83640: Include hash in timestamp/packet matching
  ipv6: fix tunnel error handling
  net/mlx5e: Fix LSO vlan insertion
  net/mlx5e: Re-eanble client vlan TX acceleration
  net/mlx5e: Return error in case mlx5e_set_features() fails
  net/mlx5e: Don't allow more than max supported channels
  ...
2015-11-04 09:41:05 -08:00
Linus Torvalds
ccc9d4a6d6 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:

   - Add support for cipher output IVs in testmgr
   - Add missing crypto_ahash_blocksize helper
   - Mark authenc and des ciphers as not allowed under FIPS.

Algorithms:

   - Add CRC support to 842 compression
   - Add keywrap algorithm
   - A number of changes to the akcipher interface:
      + Separate functions for setting public/private keys.
      + Use SG lists.

Drivers:

   - Add Intel SHA Extension optimised SHA1 and SHA256
   - Use dma_map_sg instead of custom functions in crypto drivers
   - Add support for STM32 RNG
   - Add support for ST RNG
   - Add Device Tree support to exynos RNG driver
   - Add support for mxs-dcp crypto device on MX6SL
   - Add xts(aes) support to caam
   - Add ctr(aes) and xts(aes) support to qat
   - A large set of fixes from Russell King for the marvell/cesa driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits)
  crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params()
  crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used
  hwrng: exynos - Add Device Tree support
  hwrng: exynos - Fix missing configuration after suspend to RAM
  hwrng: exynos - Add timeout for waiting on init done
  dt-bindings: rng: Describe Exynos4 PRNG bindings
  crypto: marvell/cesa - use __le32 for hardware descriptors
  crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op()
  crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio()
  crypto: marvell/cesa - use gfp_t for gfp flags
  crypto: marvell/cesa - use dma_addr_t for cur_dma
  crypto: marvell/cesa - use readl_relaxed()/writel_relaxed()
  crypto: caam - fix indentation of close braces
  crypto: caam - only export the state we really need to export
  crypto: caam - fix non-block aligned hash calculation
  crypto: caam - avoid needlessly saving and restoring caam_hash_ctx
  crypto: caam - print errno code when hash registration fails
  crypto: marvell/cesa - fix memory leak
  crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req()
  crypto: marvell/cesa - rearrange handling for sw padded hashes
  ...
2015-11-04 09:11:12 -08:00
Laszlo Ersek
879ae18804 KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
Commit b18d5431ac ("KVM: x86: fix CR0.CD virtualization") was
technically correct, but it broke OVMF guests by slowing down various
parts of the firmware.

Commit fb279950ba ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") quirked the
first function modified by b18d5431ac, vmx_get_mt_mask(), for OVMF's
sake. This restored the speed of the OVMF code that runs before
PlatformPei (including the memory intensive LZMA decompression in SEC).

This patch extends the quirk to the second function modified by
b18d5431ac, kvm_set_cr0(). It eliminates the intrusive slowdown that
hits the EFI_MP_SERVICES_PROTOCOL implementation of edk2's
UefiCpuPkg/CpuDxe -- which is built into OVMF --, when CpuDxe starts up
all APs at once for initialization, in order to count them.

We also carry over the kvm_arch_has_noncoherent_dma() sub-condition from
the other half of the original commit b18d5431ac.

Fixes: b18d5431ac
Cc: stable@vger.kernel.org
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: Janusz Mocek <januszmk6@gmail.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>#
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:39 +01:00
Paolo Bonzini
89651a3dec KVM: x86: allow RSM from 64-bit mode
The SDM says that exiting system management mode from 64-bit mode
is invalid, but that would be too good to be true.  But actually,
most of the code is already there to support exiting from compat
mode (EFER.LME=1, EFER.LMA=0).  Getting all the way from 64-bit
mode to real mode only requires clearing CS.L and CR4.PCIDE.

Cc: stable@vger.kernel.org
Fixes: 660a5d517a
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:38 +01:00
Radim Krčmář
656ec4a492 KVM: VMX: fix SMEP and SMAP without EPT
The comment in code had it mostly right, but we enable paging for
emulated real mode regardless of EPT.

Without EPT (which implies emulated real mode), secondary VCPUs won't
start unless we disable SM[AE]P when the guest doesn't use paging.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:37 +01:00
Paolo Bonzini
8a22f234a8 KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
The function is not used outside device assignment, and
kvm_arch_set_irq_inatomic has a different prototype.  Move it here and
make it static to avoid confusion.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:36 +01:00
Paolo Bonzini
7695405698 KVM: device assignment: remove pointless #ifdefs
The symbols are always defined.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:35 +01:00
Paolo Bonzini
b97e6de9c9 KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
We do not want to do too much work in atomic context, in particular
not walking all the VCPUs of the virtual machine.  So we want
to distinguish the architecture-specific injection function for irqfd
from kvm_set_msi.  Since it's still empty, reuse the newly added
kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:35 +01:00
Radim Krčmář
0669a51015 KVM: x86: zero apic_arb_prio on reset
BSP doesn't get INIT so its apic_arb_prio isn't zeroed after reboot.
BSP won't get lowest priority interrupts until other VCPUs get enough
interrupts to match their pre-reboot apic_arb_prio.

That behavior doesn't fit into KVM's round-robin-like interpretation of
lowest priority delivery ... userspace should KVM_SET_LAPIC on reset, so
just zero apic_arb_prio there.

Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:34 +01:00
Andrey Smetanin
c75efa974e drivers/hv: share Hyper-V SynIC constants with userspace
Moved Hyper-V synic contants from guest Hyper-V drivers private
header into x86 arch uapi Hyper-V header.

Added Hyper-V synic msr's flags into x86 arch uapi Hyper-V header.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:33 +01:00
Radim Krčmář
f40606b147 KVM: x86: handle SMBASE as physical address in RSM
GET_SMSTATE depends on real mode to ensure that smbase+offset is treated
as a physical address, which has already caused a bug after shuffling
the code.  Enforce physical addressing.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:32 +01:00
Radim Krčmář
7a036a6f67 KVM: x86: add read_phys to x86_emulate_ops
We want to read the physical memory when emulating RSM.

X86EMUL_IO_NEEDED is returned on all errors for consistency with other
helpers.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:31 +01:00
Saurabh Sengar
2da29bccc5 KVM: x86: removing unused variable
removing unused variables, found by coccinelle

Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:31 +01:00
Paolo Bonzini
197a4f4b06 KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
 level-triggered semantics for the arch-timers, a series of patches to
 synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
 improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
 getting rid of redundant state, and finally a stylistic change that gets rid of
 some ctags warnings.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
 6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
 CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
 pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
 P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
 u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
 =GMNk
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Conflicts:
	arch/x86/include/asm/kvm_host.h
2015-11-04 16:24:17 +01:00
Linus Torvalds
66ef3493d4 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:
 "Misc updates to the Intel MID and SGI UV platforms"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel-mid: Make intel_mid_ops static
  arch/x86/intel-mid: Use kmemdup rather than duplicating its implementation
  x86/platform/uv: Implement simple dump failover if kdump fails
  x86/platform/uv: Insert per_cpu accessor function on uv_hub_nmi
2015-11-03 21:33:18 -08:00
Linus Torvalds
639ab3eb38 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "The main changes are: continued PAT work by Toshi Kani, plus a new
  boot time warning about insecure RWX kernel mappings, by Stephen
  Smalley.

  The new CONFIG_DEBUG_WX=y warning is marked default-y if
  CONFIG_DEBUG_RODATA=y is already eanbled, as a special exception, as
  these bugs are hard to notice and this check already found several
  live bugs"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Warn on W^X mappings
  x86/mm: Fix no-change case in try_preserve_large_page()
  x86/mm: Fix __split_large_page() to handle large PAT bit
  x86/mm: Fix try_preserve_large_page() to handle large PAT bit
  x86/mm: Fix gup_huge_p?d() to handle large PAT bit
  x86/mm: Fix slow_virt_to_phys() to handle large PAT bit
  x86/mm: Fix page table dump to show PAT bit
  x86/asm: Add pud_pgprot() and pmd_pgprot()
  x86/asm: Fix pud/pmd interfaces to handle large PAT bit
  x86/asm: Add pud/pmd mask interfaces to handle large PAT bit
  x86/asm: Move PUD_PAGE macros to page_types.h
  x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO
2015-11-03 21:23:56 -08:00
Linus Torvalds
4302d506d5 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 sigcontext header cleanups from Ingo Molnar:
 "This series reorganizes and cleans up various aspects of the main
  sigcontext UAPI headers, such as unifying the data structures and
  updating/adding lots of comments to explain all the ABI details and
  quirks.  The headers can now also be built in user-space standalone"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers: Clean up too long lines
  x86/headers: Remove <asm/sigcontext.h> references on the kernel side
  x86/headers: Remove direct sigcontext32.h uses
  x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
  x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
  x86/headers: Make sigcontext pointers bit independent
  x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
  x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
  x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
  x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
  x86/headers: Unify register type definitions between 32-bit compat and i386
  x86/headers: Use ABI types consistently in sigcontext*.h
  x86/headers: Separate out legacy user-space structure definitions
  x86/headers: Clean up and better document uapi/asm/sigcontext.h
  x86/headers: Clean up uapi/asm/sigcontext32.h
  x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h
2015-11-03 21:05:40 -08:00
Linus Torvalds
ce4d72fac1 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu changes from Ingo Molnar:
 "There are two main areas of changes:

   - Rework of the extended FPU state code to robustify the kernel's
     usage of cpuid provided xstate sizes - and related changes (Dave
     Hansen)"

   - math emulation enhancements: new modern FPU instructions support,
     with testcases, plus cleanups (Denys Vlasnko)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/fpu: Fixup uninitialized feature_name warning
  x86/fpu/math-emu: Add support for FISTTP instructions
  x86/fpu/math-emu, selftests: Add test for FISTTP instructions
  x86/fpu/math-emu: Add support for FCMOVcc insns
  x86/fpu/math-emu: Add support for F[U]COMI[P] insns
  x86/fpu/math-emu: Remove define layer for undocumented opcodes
  x86/fpu/math-emu, selftests: Add tests for FCMOV and FCOMI insns
  x86/fpu/math-emu: Remove !NO_UNDOC_CODE
  x86/fpu: Check CPU-provided sizes against struct declarations
  x86/fpu: Check to ensure increasing-offset xstate offsets
  x86/fpu: Correct and check XSAVE xstate size calculations
  x86/fpu: Add C structures for AVX-512 state components
  x86/fpu: Rework YMM definition
  x86/fpu/mpx: Rework MPX 'xstate' types
  x86/fpu: Add xfeature_enabled() helper instead of test_bit()
  x86/fpu: Remove 'xfeature_nr'
  x86/fpu: Rework XSTATE_* macros to remove magic '2'
  x86/fpu: Rename XFEATURES_NR_MAX
  x86/fpu: Rename XSAVE macros
  x86/fpu: Remove partial LWP support definitions
  ...
2015-11-03 20:50:26 -08:00
Linus Torvalds
0f25f2c1b1 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kgdb fixlet from Ingo Molnar:
 "A single debugging related commit: compress the memory usage of a kgdb
  data structure"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kgdb: Replace bool_int_array[NR_CPUS] with bitmap
2015-11-03 20:12:10 -08:00
Linus Torvalds
f323c49b30 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu changes from Ingo Molnar:
 "Two changes in this cycle: a Kconfig help text enhancement, and an AMD
  CLZERO instruction capability detection and enumeration"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add CLZERO detection
  x86/Kconfig/cpus: Fix/complete CPU type help texts
2015-11-03 19:39:42 -08:00
Linus Torvalds
33d46f9765 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "An early_printk cleanup plus deinlining enhancements"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/early_printk: Set __iomem address space for IO
  x86/signal: Deinline get_sigframe, save 240 bytes
  x86: Deinline early_console_register, save 403 bytes
  x86/e820: Deinline e820_type_to_string, save 126 bytes
2015-11-03 19:34:22 -08:00
Linus Torvalds
378e4e9825 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot cleanup from Ingo Molnar:
 "A single commit: remove an obsolete kcrash boot flag"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kexec: Remove obsolete 'in_crash_kexec' flag
2015-11-03 19:28:37 -08:00
Linus Torvalds
a75a3f6fc9 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The main change in this cycle is another step in the big x86 system
  call interface rework by Andy Lutomirski, which moves most of the low
  level x86 entry code from assembly to C, for all syscall entries
  except native 64-bit system calls:

    arch/x86/entry/entry_32.S        | 182 ++++------
    arch/x86/entry/entry_64_compat.S | 547 ++++++++-----------------------
    194 insertions(+), 535 deletions(-)

  ... our hope is that the final remaining step (converting native
  64-bit system calls) will be less painful as all the previous steps,
  given that most of the legacies and quirks are concentrated around
  native 32-bit and compat environments"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/entry/32: Fix FS and GS restore in opportunistic SYSEXIT
  x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on
  um/x86: Fix build after x86 syscall changes
  x86/asm: Remove the xyz_cfi macros from dwarf2.h
  selftests/x86: Style fixes for the 'unwind_vdso' test
  x86/entry/64/compat: Document sysenter_fix_flags's reason for existence
  x86/entry: Split and inline syscall_return_slowpath()
  x86/entry: Split and inline prepare_exit_to_usermode()
  x86/entry: Use pt_regs_to_thread_info() in syscall entry tracing
  x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRY
  x86/entry: Micro-optimize compat fast syscall arg fetch
  x86/entry: Force inlining of 32-bit syscall code
  x86/entry: Make irqs_disabled checks in exit code depend on lockdep
  x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls
  x86/asm: Remove thread_info.sysenter_return
  x86/entry/32: Re-implement SYSENTER using the new C path
  x86/entry/32: Switch INT80 to the new C syscall path
  x86/entry/32: Open-code return tracking from fork and kthreads
  x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
  x86/vdso/compat: Wire up SYSENTER and SYSCSALL for compat userspace
  ...
2015-11-03 18:59:10 -08:00
Linus Torvalds
d2bea739f8 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic changes from Ingo Molnar:
 "The main changes in this cycle were:

   - Numachip updates: new hardware support, fixes and cleanups.
     (Daniel J Blueman)

   - misc smaller cleanups and fixlets"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/io_apic: Make eoi_ioapic_pin() static
  x86/irq: Drop unlikely before IS_ERR_OR_NULL
  x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC
  x86/apic: Deinline various functions
  x86/numachip: Fix timer build conflict
  x86/numachip: Introduce Numachip2 timer mechanisms
  x86/numachip: Add Numachip IPI optimisations
  x86/numachip: Add Numachip2 APIC support
  x86/numachip: Cleanup Numachip support
2015-11-03 18:33:15 -08:00
Linus Torvalds
53528695ff Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle were:

   - sched/fair load tracking fixes and cleanups (Byungchul Park)

   - Make load tracking frequency scale invariant (Dietmar Eggemann)

   - sched/deadline updates (Juri Lelli)

   - stop machine fixes, cleanups and enhancements for bugs triggered by
     CPU hotplug stress testing (Oleg Nesterov)

   - scheduler preemption code rework: remove PREEMPT_ACTIVE and related
     cleanups (Peter Zijlstra)

   - Rework the sched_info::run_delay code to fix races (Peter Zijlstra)

   - Optimize per entity utilization tracking (Peter Zijlstra)

   - ... misc other fixes, cleanups and smaller updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS
  sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
  sched: Start stopper early
  stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
  stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
  stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
  stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
  stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
  sched/x86: Fix typo in __switch_to() comments
  sched/core: Remove a parameter in the migrate_task_rq() function
  sched/core: Drop unlikely behind BUG_ON()
  sched/core: Fix task and run queue sched_info::run_delay inconsistencies
  sched/numa: Fix task_tick_fair() from disabling numa_balancing
  sched/core: Add preempt_count invariant check
  sched/core: More notrace annotations
  sched/core: Kill PREEMPT_ACTIVE
  sched/core, sched/x86: Kill thread_info::saved_preempt_count
  sched/core: Simplify preempt_count tests
  sched/core: Robustify preemption leak checks
  sched/core: Stop setting PREEMPT_ACTIVE
  ...
2015-11-03 18:03:50 -08:00
Linus Torvalds
b831ef2cad Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS changes from Ingo Molnar:
 "The main system reliability related changes were from x86, but also
  some generic RAS changes:

   - AMD MCE error injection subsystem enhancements.  (Aravind
     Gopalakrishnan)

   - Fix MCE and CPU hotplug interaction bug.  (Ashok Raj)

   - kcrash bootup robustness fix.  (Baoquan He)

   - kcrash cleanups.  (Borislav Petkov)

   - x86 microcode driver rework: simplify it by unmodularizing it and
     other cleanups.  (Borislav Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
  x86/mce: Add a Scalable MCA vendor flags bit
  MAINTAINERS: Unify the microcode driver section
  x86/microcode/intel: Move #ifdef DEBUG inside the function
  x86/microcode/amd: Remove maintainers from comments
  x86/microcode: Remove modularization leftovers
  x86/microcode: Merge the early microcode loader
  x86/microcode: Unmodularize the microcode driver
  x86/mce: Fix thermal throttling reporting after kexec
  kexec/crash: Say which char is the unrecognized
  x86/setup/crash: Check memblock_reserve() retval
  x86/setup/crash: Cleanup some more
  x86/setup/crash: Remove alignment variable
  x86/setup: Cleanup crashkernel reservation functions
  x86/amd_nb, EDAC: Rename amd_get_node_id()
  x86/setup: Do not reserve crashkernel high memory if low reservation failed
  x86/microcode/amd: Do not overwrite final patch levels
  x86/microcode/amd: Extract current patch level read to a function
  x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
  x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
  ...
2015-11-03 17:51:33 -08:00
Linus Torvalds
b02ac6b18c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Improve accuracy of perf/sched clock on x86.  (Adrian Hunter)

   - Intel DS and BTS updates.  (Alexander Shishkin)

   - Intel cstate PMU support.  (Kan Liang)

   - Add group read support to perf_event_read().  (Peter Zijlstra)

   - Branch call hardware sampling support, implemented on x86 and
     PowerPC.  (Stephane Eranian)

   - Event groups transactional interface enhancements.  (Sukadev
     Bhattiprolu)

   - Enable proper x86/intel/uncore PMU support on multi-segment PCI
     systems.  (Taku Izumi)

   - ... misc fixes and cleanups.

  The perf tooling team was very busy again with 200+ commits, the full
  diff doesn't fit into lkml size limits.  Here's an (incomplete) list
  of the tooling highlights:

  New features:

   - Change the default event used in all tools (record/top): use the
     most precise "cycles" hw counter available, i.e. when the user
     doesn't specify any event, it will try using cycles:ppp, cycles:pp,
     etc and fall back transparently until it finds a working counter.
     (Arnaldo Carvalho de Melo)

   - Integration of perf with eBPF that, given an eBPF .c source file
     (or .o file built for the 'bpf' target with clang), will get it
     automatically built, validated and loaded into the kernel via the
     sys_bpf syscall, which can then be used and seen using 'perf trace'
     and other tools.

     (Wang Nan)

  Various user interface improvements:

   - Automatic pager invocation on long help output.  (Namhyung Kim)

   - Search for more options when passing args to -h, e.g.: (Arnaldo
     Carvalho de Melo)

        $ perf report -h interface

        Usage: perf report [<options>]

         --gtk    Use the GTK2 interface
         --stdio  Use the stdio interface
         --tui    Use the TUI interface

   - Show ordered command line options when -h is used or when an
     unknown option is specified.  (Arnaldo Carvalho de Melo)

   - If options are passed after -h, show just its descriptions, not all
     options.  (Arnaldo Carvalho de Melo)

   - Implement column based horizontal scrolling in the hists browser
     (top, report), making it possible to use the TUI for things like
     'perf mem report' where there are many more columns than can fit in
     a terminal.  (Arnaldo Carvalho de Melo)

   - Enhance the error reporting of tracepoint event parsing, e.g.:

       $ oldperf record -e sched:sched_switc usleep 1
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint
       Run 'perf list' for a list of valid events

     Now we get the much nicer:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ can't access trace events

       Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
       Hint:  Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

     And after we have those mount point permissions fixed:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint

       Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
       Hint:  Perhaps this kernel misses some CONFIG_ setting to enable this feature?.

     I.e.  basically now the event parsing routing uses the strerror_open()
     routines introduced by and used in 'perf trace' work.  (Jiri Olsa)

   - Fail properly when pattern matching fails to find a tracepoint,
     i.e. '-e non:existent' was being correctly handled, with a proper
     error message about that not being a valid event, but '-e
     non:existent*' wasn't, fix it.  (Jiri Olsa)

   - Do event name substring search as last resort in 'perf list'.
     (Arnaldo Carvalho de Melo)

     E.g.:

       # perf list clock

       List of pre-defined events (to be used in -e):

        cpu-clock                                          [Software event]
        task-clock                                         [Software event]

        uncore_cbox_0/clockticks/                          [Kernel PMU event]
        uncore_cbox_1/clockticks/                          [Kernel PMU event]

        kvm:kvm_pvclock_update                             [Tracepoint event]
        kvm:kvm_update_master_clock                        [Tracepoint event]
        power:clock_disable                                [Tracepoint event]
        power:clock_enable                                 [Tracepoint event]
        power:clock_set_rate                               [Tracepoint event]
        syscalls:sys_enter_clock_adjtime                   [Tracepoint event]
        syscalls:sys_enter_clock_getres                    [Tracepoint event]
        syscalls:sys_enter_clock_gettime                   [Tracepoint event]
        syscalls:sys_enter_clock_nanosleep                 [Tracepoint event]
        syscalls:sys_enter_clock_settime                   [Tracepoint event]
        syscalls:sys_exit_clock_adjtime                    [Tracepoint event]
        syscalls:sys_exit_clock_getres                     [Tracepoint event]
        syscalls:sys_exit_clock_gettime                    [Tracepoint event]
        syscalls:sys_exit_clock_nanosleep                  [Tracepoint event]
        syscalls:sys_exit_clock_settime                    [Tracepoint event]

  Intel PT hardware tracing enhancements:

   - Accept a zero --itrace period, meaning "as often as possible".  In
     the case of Intel PT that is the same as a period of 1 and a unit
     of 'instructions' (i.e.  --itrace=i1i).  (Adrian Hunter)

   - Harmonize itrace's synthesized callchains with the existing
     --max-stack tool option.  (Adrian Hunter)

   - Allow time to be displayed in nanoseconds in 'perf script'.
     (Adrian Hunter)

   - Fix potential infinite loop when handling Intel PT timestamps.
     (Adrian Hunter)

   - Slighly improve Intel PT debug logging.  (Adrian Hunter)

   - Warn when AUX data has been lost, just like when processing
     PERF_RECORD_LOST.  (Adrian Hunter)

   - Further document export-to-postgresql.py script.  (Adrian Hunter)

   - Add option to synthesize branch stack from auxtrace data.  (Adrian
     Hunter)

  Misc notable changes:

   - Switch the default callchain output mode to 'graph,0.5,caller', to
     make it look like the default for other tools, reducing the
     learning curve for people used to 'caller' based viewing.  (Arnaldo
     Carvalho de Melo)

   - various call chain usability enhancements.  (Namhyung Kim)

   - Introduce the 'P' event modifier, meaning 'max precision level,
     please', i.e.:

        $ perf record -e cycles:P usleep 1

     Is now similar to:

        $ perf record usleep 1

     Useful, for instance, when specifying multiple events.  (Jiri Olsa)

   - Add 'socket' sort entry, to sort by the processor socket in 'perf
     top' and 'perf report'.  (Kan Liang)

   - Introduce --socket-filter to 'perf report', for filtering by
     processor socket.  (Kan Liang)

   - Add new "Zoom into Processor Socket" operation in the perf hists
     browser, used in 'perf top' and 'perf report'.  (Kan Liang)

   - Allow probing on kmodules without DWARF.  (Masami Hiramatsu)

   - Fix 'perf probe -l' for probes added to kernel module functions.
     (Masami Hiramatsu)

   - Preparatory work for the 'perf stat record' feature that will allow
     generating perf.data files with counting data in addition to the
     sampling mode we have now (Jiri Olsa)

   - Update libtraceevent KVM plugin.  (Paolo Bonzini)

   - ... plus lots of other enhancements that I failed to list properly,
     by: Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
     Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He
     Kuang, Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan
     Liang, Kirill Tkhai, Masami Hiramatsu, Matt Fleming, Namhyung Kim,
     Paolo Bonzini, Peter Zijlstra, Rabin Vincent, Scott Wood, Stephane
     Eranian, Sukadev Bhattiprolu, Taku Izumi, Vaishali Thakkar, Wang
     Nan, Yang Shi and Yunlong Song"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (260 commits)
  perf unwind: Pass symbol source to libunwind
  tools build: Fix libiberty feature detection
  perf tools: Compile scriptlets to BPF objects when passing '.c' to --event
  perf record: Add clang options for compiling BPF scripts
  perf bpf: Attach eBPF filter to perf event
  perf tools: Make sure fixdep is built before libbpf
  perf script: Enable printing of branch stack
  perf trace: Add cmd string table to decode sys_bpf first arg
  perf bpf: Collect perf_evsel in BPF object files
  perf tools: Load eBPF object into kernel
  perf tools: Create probe points for BPF programs
  perf tools: Enable passing bpf object file to --event
  perf ebpf: Add the libbpf glue
  perf tools: Make perf depend on libbpf
  perf symbols: Fix endless loop in dso__split_kallsyms_for_kcore
  perf tools: Enable pre-event inherit setting by config terms
  perf symbols: we can now read separate debug-info files based on a build ID
  perf symbols: Fix type error when reading a build-id
  perf tools: Search for more options when passing args to -h
  perf stat: Cache aggregated map entries in extra cpumap
  ...
2015-11-03 17:38:09 -08:00
Linus Torvalds
d63a978865 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar:
 "The main changes in this cycle were:

   - More gradual enhancements to atomic ops: new atomic*_read_ctrl()
     ops, synchronize atomic_{read,set}() ordering requirements between
     architectures, add atomic_long_t bitops.  (Peter Zijlstra)

   - Add _{relaxed|acquire|release}() variants for inc/dec atomics and
     use them in various locking primitives: mutex, rtmutex, mcs, rwsem.
     This enables weakly ordered architectures (such as arm64) to make
     use of more locking related optimizations.  (Davidlohr Bueso)

   - Implement atomic[64]_{inc,dec}_relaxed() on ARM.  (Will Deacon)

   - Futex kernel data cache footprint micro-optimization.  (Rasmus
     Villemoes)

   - pvqspinlock runtime overhead micro-optimization.  (Waiman Long)

   - misc smaller fixlets"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec}
  locking/rwsem: Use acquire/release semantics
  locking/mcs: Use acquire/release semantics
  locking/rtmutex: Use acquire/release semantics
  locking/mutex: Use acquire/release semantics
  locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics
  atomic: Implement atomic_read_ctrl()
  atomic, arch: Audit atomic_{read,set}()
  atomic: Add atomic_long_t bitops
  futex: Force hot variables into a single cache line
  locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL
  locking/osq: Relax atomic semantics
  locking/qrwlock: Rename ->lock to ->wait_lock
  locking/Documentation/lockstat: Fix typo - lokcing -> locking
  locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
2015-11-03 16:10:43 -08:00
Linus Torvalds
f5a8160c1e Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI changes from Ingo Molnar:
 "The main changes in this cycle were:

   - further EFI code generalization to make it more workable for ARM64
   - various extensions, such as 64-bit framebuffer address support,
     UEFI v2.5 EFI_PROPERTIES_TABLE support
   - code modularization simplifications and cleanups
   - new debugging parameters
   - various fixes and smaller additions"

* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  efi: Fix warning of int-to-pointer-cast on x86 32-bit builds
  efi: Use correct type for struct efi_memory_map::phys_map
  x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabled
  efi: Add "efi_fake_mem" boot option
  x86/efi: Rename print_efi_memmap() to efi_print_memmap()
  efi: Auto-load the efi-pstore module
  efi: Introduce EFI_NX_PE_DATA bit and set it from properties table
  efi: Add support for UEFIv2.5 Properties table
  efi: Add EFI_MEMORY_MORE_RELIABLE support to efi_md_typeattr_format()
  efifb: Add support for 64-bit frame buffer addresses
  efi/arm64: Clean up efi_get_fdt_params() interface
  arm64: Use core efi=debug instead of uefi_debug command line parameter
  efi/x86: Move efi=debug option parsing to core
  drivers/firmware: Make efi/esrt.c driver explicitly non-modular
  efi: Use the generic efi.memmap instead of 'memmap'
  acpi/apei: Use appropriate pgprot_t to map GHES memory
  arm64, acpi/apei: Implement arch_apei_get_mem_attributes()
  arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT
  acpi, x86: Implement arch_apei_get_mem_attributes()
  efi, x86: Rearrange efi_mem_attributes()
  ...
2015-11-03 15:05:52 -08:00
Linus Torvalds
7b2a4306f9 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer departement provides:

   - More y2038 work in the area of ntp and pps.

   - Optimization of posix cpu timers

   - New time related selftests

   - Some new clocksource drivers

   - The usual pile of fixes, cleanups and improvements"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  timeconst: Update path in comment
  timers/x86/hpet: Type adjustments
  clocksource/drivers/armada-370-xp: Implement ARM delay timer
  clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
  clocksource/drivers/imx: Allow timer irq affinity change
  clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
  clocksource/drivers/h8300_*: Remove unneeded memset()s
  clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
  clocksource/drivers/em_sti: Remove unneeded memset()s
  clocksource/drivers/mediatek: Use GPT as sched clock source
  clockevents/drivers/mtk: Fix spurious interrupt leading to crash
  posix_cpu_timer: Reduce unnecessary sighand lock contention
  posix_cpu_timer: Convert cputimer->running to bool
  posix_cpu_timer: Check thread timers only when there are active thread timers
  posix_cpu_timer: Optimize fastpath_timer_check()
  timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
  timers: Use __fls in apply_slack()
  clocksource: Remove return statement from void functions
  net: sfc: avoid using timespec
  ntp/pps: use y2038 safe types in pps_event_time
  ...
2015-11-03 14:13:41 -08:00
Bjorn Helgaas
1f9a30ec2a Merge branches 'pci/aer', 'pci/hotplug', 'pci/misc', 'pci/msi', 'pci/resource' and 'pci/virtualization' into next
* pci/aer:
  PCI/AER: Clear error status registers during enumeration and restore

* pci/hotplug:
  PCI: pciehp: Queue power work requests in dedicated function

* pci/misc:
  PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum
  x86/PCI: Make pci_subsys_init() static
  PCI: Add builtin_pci_driver() to avoid registration boilerplate
  PCI: Remove unnecessary "if" statement

* pci/msi:
  x86/PCI: Don't alloc pcibios-irq when MSI is enabled
  PCI/MSI: Export all remapped MSIs to sysfs attributes
  PCI: Disable MSI on SiS 761

* pci/resource:
  sparc/PCI: Add mem64 resource parsing for root bus
  PCI: Expand Enhanced Allocation BAR output
  PCI: Make Enhanced Allocation bitmasks more obvious
  PCI: Handle Enhanced Allocation capability for SR-IOV devices
  PCI: Add support for Enhanced Allocation devices
  PCI: Add Enhanced Allocation register entries
  PCI: Handle IORESOURCE_PCI_FIXED when assigning resources
  PCI: Handle IORESOURCE_PCI_FIXED when sizing resources
  PCI: Clear IORESOURCE_UNSET when reverting to firmware-assigned address

* pci/virtualization:
  PCI: Fix sriov_enable() error path for pcibios_enable_sriov() failures
  PCI: Wait 1 second between disabling VFs and clearing NumVFs
  PCI: Reorder pcibios_sriov_disable()
  PCI: Remove VFs in reverse order if virtfn_add() fails
  PCI: Remove redundant validation of SR-IOV offset/stride registers
  PCI: Set SR-IOV NumVFs to zero after enumeration
  PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs
  PCI: Don't try to restore VF BARs
2015-11-02 15:57:03 -06:00
Zhenzhong Duan
abed7d0710 xen: fix the check of e_pfn in xen_find_pfn_range
On some NUMA system, after dom0 up, we see below warning even if there are
enough pfn ranges that could be used for remapping:
"Unable to find available pfn range, not remapping identity pages"

Fix it to avoid getting a memory region of zero size in xen_find_pfn_range.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-11-02 17:47:38 +00:00
Rafael J. Wysocki
69f8947b8c Merge branches 'pm-cpufreq' and 'pm-cpuidle'
* pm-cpufreq:
  cpufreq: postfix policy directory with the first CPU in related_cpus
  cpufreq: create cpu/cpufreq/policyX directories
  cpufreq: remove cpufreq_sysfs_{create|remove}_file()
  cpufreq: create cpu/cpufreq at boot time
  cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value
  cpufreq: intel_pstate: Avoid calculation for max/min
  Documentation: kernel_parameters for Intel P state driver
  cpufreq: intel_pstate: Use ACPI perf configuration
  cpufreq: intel-pstate: Use separate max pstate for scaling
  cpufreq: intel_pstate: get P1 from TAR when available
  cpufreq: Drop redundant check for inactive policies
  cpufreq : powernv: Report Pmax throttling if capped below nominal frequency
  cpufreq: imx: update the clock switch flow to support imx6ul
  cpufreq: tegra20: remove superfluous CONFIG_PM ifdefs
  cpufreq: conservative: remove 'enable' field
  cpufreq: integrator: Fix module autoload for OF platform driver

* pm-cpuidle:
  cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
  cpuidle: mvebu: clean up multiple platform drivers
2015-11-02 00:54:10 +01:00
Wan Zongshun
2167ceabf3 x86/cpu: Add CLZERO detection
AMD Fam17h processors introduce support for the CLZERO
instruction. It zeroes out the 64 byte cache line specified in
RAX.

Add the bit here to allow /proc/cpuinfo to list the feature.

Boris: we're adding this as a separate ->x86_capability leaf
because CPUID_80000008_EBX is going to contain more feature bits
and it will fill out with time.

Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ Wrap code in patch form, fix comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:23 +01:00
Borislav Petkov
dc34bdd236 x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
Caught by building with W= which enable -Wswitch-default also.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:14 +01:00
Aravind Gopalakrishnan
c7f54d21fb x86/mce: Add a Scalable MCA vendor flags bit
Scalable MCA (SMCA) is a new feature in AMD Fam17h processors
which indicates presence of MCA extensions.

MCA extensions expands existing register space for the MCE banks
and also introduces a new MSR range to accommodate new banks.

Add the detection bit.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Reformat mce_vendor_flags definitions and save indentation levels. Improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1446207099-24948-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:13 +01:00
David S. Miller
b75ec3af27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-11-01 00:15:30 -04:00
Andy Lutomirski
2459ee8651 x86/vm86: Set thread.vm86 to NULL on fork/clone
thread.vm86 points to per-task information -- the pointer should not
be copied on clone.

Fixes: d4ce0f26c7 ("x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Stas Sergeev <stsp@list.ru>
Link: http://lkml.kernel.org/r/71c5d6985d70ec8197c8d72f003823c81b7dcf99.1446270067.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-31 09:50:25 +01:00
David Vrabel
914beb9fc2 x86/xen: add reschedule point when mapping foreign GFNs
Mapping a large range of foreign GFNs can take a long time, add a
reschedule point after each batch of 16 GFNs.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-10-28 13:46:27 +00:00
Ard Biesheuvel
44511fb9e5 efi: Use correct type for struct efi_memory_map::phys_map
We have been getting away with using a void* for the physical
address of the UEFI memory map, since, even on 32-bit platforms
with 64-bit physical addresses, no truncation takes place if the
memory map has been allocated by the firmware (which only uses
1:1 virtually addressable memory), which is usually the case.

However, commit:

  0f96a99dab ("efi: Add "efi_fake_mem" boot option")

adds code that clones and modifies the UEFI memory map, and the
clone may live above 4 GB on 32-bit platforms.

This means our use of void* for struct efi_memory_map::phys_map has
graduated from 'incorrect but working' to 'incorrect and
broken', and we need to fix it.

So redefine struct efi_memory_map::phys_map as phys_addr_t, and
get rid of a bunch of casts that are now unneeded.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: izumi.taku@jp.fujitsu.com
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: linux-efi@vger.kernel.org
Cc: matt.fleming@intel.com
Link: http://lkml.kernel.org/r/1445593697-1342-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-28 12:28:06 +01:00
Ingo Molnar
06ef431ab8 * Fix a kernel panic by not passing EFI virtual mapping addresses to
__pa() in the x86 pageattr code. Since these virtual addreses are
   not part of the direct mapping or kernel text mapping, passing them
   to __pa() will trigger a BUG_ON() when CONFIG_DEBUG_VIRTUAL is
   enabled - Sai Praneeth Prakhya
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWLLEiAAoJEC84WcCNIz1VWL4P/3i6TI8IksbMkjLrUPS80g9Q
 QgYw2U1r750JrVkYGMQ6VyE1nZtSI7sbXRAjhYGLOQvs6dnCcsmAcT8zPVNSNVs4
 hxRW2+hXe8LfMqddpGhSN7/s1Ssg7weEynUvQ58F+qRI4e5Lbk2zz4pqO6IQQVv2
 Q4jIXosy7VlDxbx7Nn0wUov6cHNTeLQhGLqUiDJEw32qUnWA6NhYGrQof98Pu+gi
 vcigb62ebicvduv6TRm5zPYANYz8AJqt7cywL7DkjT/vSKF+k/l6W5KWfOoVPd8N
 99z9uTSJa1j+LuRYkPr8+YSPW9OtXD55QPkGYAFo/OWTt7j/QBgbGj0zJKWayrdK
 3JefcQlH5wau5adaAjJwCm9qbapRS8De1yCMEz05gW8g5eZfqo32dnMg7UAWbWh5
 fT6dAWffSxbYBLqPKV18nMgWTopwXh4IGddhB6y811SImUXhOvA7jwjJHYei8dKf
 3eBN5rQ4owo0GfIiBGpKOt4ET2klgSF8xnnlYmbA7QeMRPKySNcXowsJ9wPimfZS
 MGhmpz+tv/030ROuBqVjrvkGyFdxKgPbtH2hw6Ka7t88YqlyUyUIiu7pxZM2LY9g
 l+6tLRWmExF993Tk3S9vCSfYEZwDHU2aYsObscgy1YEdydX4858uTSH91sVkoyBR
 dtv8PmonJp/z3aw7XZKU
 =xdZV
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi

Pull EFI fix from Matt Fleming:

  - Fix a kernel panic by not passing EFI virtual mapping addresses to
    __pa() in the x86 pageattr code. Since these virtual addreses are
    not part of the direct mapping or kernel text mapping, passing them
    to __pa() will trigger a BUG_ON() when CONFIG_DEBUG_VIRTUAL is
    enabled. (Sai Praneeth Prakhya)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-27 18:40:47 +01:00
Werner Pawlitschko
ababae4410 x86/ioapic: Prevent NULL pointer dereference in setup_ioapic_dest()
Commit 4857c91f0d changed the way how irq affinity is setup in
setup_ioapic_dest() from using the core helper function to
unconditionally calling the irq_set_affinity() callback of the
underlying irq chip.

That results in a NULL pointer dereference for the rare case where the
underlying irq chip is lapic_chip which has no irq_set_affinity()
callback. lapic_chip is occasionally used for the timer interrupt (irq
0).

The fix is simple: Check the availability of the callback instead of
calling it unconditionally.

Fixes: 4857c91f0d "x86/ioapic: Force affinity setting in setup_ioapic_dest()"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2015-10-27 09:18:34 +09:00
Ville Syrjälä
298a96c12b x86/dma-mapping: Fix arch_dma_alloc_attrs() oops with NULL dev
Commit 6894258eda broke drivers that pass NULL as the device pointer
to dma_alloc. The reason is that arch_dma_alloc_attrs() now calls
dma_alloc_coherent_gfp_flags() which in turn calls
dma_alloc_coherent_mask(), where the device pointer is dereferenced
unconditionally.

Fix things by moving the ISA DMA fallback device assignment before the
call to dma_alloc_coherent_gfp_flags().

Fixes: 6894258eda ("dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}")
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/1445807503-8920-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-26 14:59:36 +09:00
Rafael J. Wysocki
ba210f5de4 Merge branch 'acpi-pci'
* acpi-pci:
  ia64/PCI/ACPI: Use common interface to support PCI host bridge
  x86/PCI/ACPI: Use common interface to support PCI host bridge
  ACPI/PCI: Reset acpi_root_dev->domain to 0 when pci_ignore_seg is set
  PCI/ACPI: Add interface acpi_pci_root_create()
  ia64/PCI: Use common struct resource_entry to replace struct iospace_resource
  ia64/PCI/ACPI: Use common ACPI resource parsing interface for host bridge
  ACPI/PCI: Enhance ACPI core to support sparse IO space
2015-10-25 22:55:31 +01:00
Rafael J. Wysocki
343ccb040e Merge branches 'acpi-scan', 'acpi-tables', 'acpi-ec' and 'acpi-assorted'
* acpi-scan:
  ACPI / scan: use kstrdup_const() in acpi_add_id()
  ACPI / scan: constify struct acpi_hardware_id::id
  ACPI / scan: constify first argument of struct acpi_scan_handler::match

* acpi-tables:
  ACPI / tables: test the correct variable
  x86, ACPI: Handle apic/x2apic entries in MADT in correct order
  ACPI / tables: Add acpi_subtable_proc to ACPI table parsers

* acpi-ec:
  ACPI / EC: Fix a race issue in acpi_ec_guard_event()
  ACPI / EC: Fix query handler related issues

* acpi-assorted:
  ACPI: change acpi_sleep_proc_init() to return void
  ACPI: change init_acpi_device_notify() to return void
2015-10-25 22:54:46 +01:00
Sai Praneeth
2c66e24d75 x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabled
When CONFIG_DEBUG_VIRTUAL is enabled, all accesses to __pa(address) are
monitored to see whether address falls in direct mapping or kernel text
mapping (see Documentation/x86/x86_64/mm.txt for details), if it does
not, the kernel panics. During 1:1 mapping of EFI runtime services we access
virtual addresses which are == physical addresses, thus the 1:1 mapping
and these addresses do not fall in either of the above two regions and
hence when passed as arguments to __pa() kernel panics as reported by
Dave Hansen here https://lkml.kernel.org/r/5462999A.7090706@intel.com.

So, before calling __pa() virtual addresses should be validated which
results in skipping call to split_page_count() and that should be fine
because it is used to keep track of everything *but* 1:1 mappings.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Glenn P Williamson <glenn.p.williamson@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2015-10-25 10:22:25 +00:00
Linus Torvalds
0386729247 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: two KASAN fixes, two EFI boot fixes, two boot-delay
  optimization fixes, and a fix for a IRQ handling hang observed on
  virtual platforms"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm, kasan: Silence KASAN warnings in get_wchan()
  compiler, atomics, kasan: Provide READ_ONCE_NOCHECK()
  x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels
  x86/smpboot: Fix CPU #1 boot timeout
  x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior
  x86/ioapic: Disable interrupts when re-routing legacy IRQs
  x86/setup: Extend low identity map to cover whole kernel range
  x86/efi: Fix multiple GOP device support
2015-10-23 22:34:32 +09:00
Stefano Stabellini
a314e3eb84 xen/arm: Enable cpu_hotplug.c
Build cpu_hotplug for ARM and ARM64 guests.

Rename arch_(un)register_cpu to xen_(un)register_cpu and provide an
empty implementation on ARM and ARM64. On x86 just call
arch_(un)register_cpu as we are already doing.

Initialize cpu_hotplug on ARM.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-10-23 14:20:47 +01:00
Julien Grall
291be10fd7 xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb
With 64KB page granularity support, the frame number will be different.

It will be easier to modify the behavior in a single place rather than
in each caller.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23 14:20:43 +01:00
Julien Grall
008c320a96 xen/grant: Introduce helpers to split a page into grant
Currently, a grant is always based on the Xen page granularity (i.e
4KB). When Linux is using a different page granularity, a single page
will be split between multiple grants.

The new helpers will be in charge of splitting the Linux page into grants
and call a function given by the caller on each grant.

Also provide an helper to count the number of grants within a given
contiguous region.

Note that the x86/include/asm/xen/page.h is now including
xen/interface/grant_table.h rather than xen/grant_table.h. It's
necessary because xen/grant_table.h depends on asm/xen/page.h and will
break the compilation. Furthermore, only definition in
interface/grant_table.h is required.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23 14:20:33 +01:00
David Vrabel
8edfcf882e x86/xen: export xen_alloc_p2m_entry()
Rename alloc_p2m() to xen_alloc_p2m_entry() and export it.

This is useful for ensuring that a p2m entry is allocated (i.e., not a
shared missing or identity entry) so that subsequent set_phys_to_machine()
calls will require no further allocations.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
---
v3:
- Make xen_alloc_p2m_entry() a nop on auto-xlate guests.
2015-10-23 14:20:28 +01:00
David Vrabel
81b286e0f1 xen/balloon: make alloc_xenballoon_pages() always allocate low pages
All users of alloc_xenballoon_pages() wanted low memory pages, so
remove the option for high memory.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2015-10-23 14:20:05 +01:00
David Vrabel
f5775e0b61 x86/xen: discard RAM regions above the maximum reservation
During setup, discard RAM regions that are above the maximum
reservation (instead of marking them as E820_UNUSABLE).  This allows
hotplug memory to be placed at these addresses.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2015-10-23 14:20:03 +01:00
Christoffer Dall
3217f7c25b KVM: Add kvm_arch_vcpu_{un}blocking callbacks
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:41 +02:00
Minfei Huang
883a1e867e ftrace: Calculate the correct dyn_ftrace number to report to the userspace
Now, ftrace only calculate the dyn_ftrace number in the adding
breakpoint loop, not in adding update and finish update loop.

Calculate the correct dyn_ftrace, once ftrace reports the failure message
to the userspace.

Link: http://lkml.kernel.org/r/1442420382-13130-1-git-send-email-mnfhuang@gmail.com

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-22 15:44:24 -04:00
Joerg Roedel
8affb487d4 x86/PCI: Don't alloc pcibios-irq when MSI is enabled
The pcibios-irq and MSI both use dev->irq to store the IRQ number.  While
the MSI code checks for that and frees the pcibios-irq before overwriting
dev->irq, the pcibios_alloc_irq() function does not.

Usually this is not a problem, as the pcibios-irq is allocated before probe
time of the device and the MSI IRQ is allocted from the driver's probe
path.

But there are PCI devices handled by the core kernel and not by a standard
PCI driver, like the AMD IOMMU for example.  For the AMD IOMMU a normal PCI
device driver does not make sense, because a driver can be forcibly unbound
from its device, which is not a good idea for an IOMMU.

Nevertheless the PCI core code tries to match the PCI device implementing
the AMD IOMMU against drivers, and allocates/frees a pcibios IRQ every time
it tries out a new driver.  This overwrites the dev->irq field set by
pci_enable_msi() and sets it to 0 in the end (because the probe fails and
the pcibios-irq is freed again).

On suspend/resume this breaks the kernel, because the IRQ descriptor for
IRQ 0 is NULL.

Fix this by not allocating a pcibios-irq when MSI is already active.  This
also has the benefit, that a device claimed by the core kernel can not be
probed by a PCI driver later.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
2015-10-21 11:19:25 -05:00
Borislav Petkov
c595ac2bac x86/microcode/intel: Move #ifdef DEBUG inside the function
... and save us the stub.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:12 +02:00
Borislav Petkov
6f7fc44bf1 x86/microcode/amd: Remove maintainers from comments
We have the MAINTAINERS file for that. Also, Andreas doesn't
have the time for this work anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:12 +02:00
Borislav Petkov
6b26e1bf66 x86/microcode: Remove modularization leftovers
Remove the remaining module functionality leftovers. Make
"dis_ucode_ldr" an early_param and make it static again. Drop
module aliases, autoloading table, description, etc.

Bump version number, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:12 +02:00
Borislav Petkov
fe055896c0 x86/microcode: Merge the early microcode loader
Merge the early loader functionality into the driver proper. The
diff is huge but logically, it is simply moving code from the
_early.c files into the main driver.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:12 +02:00
Borislav Petkov
9a2bc335f1 x86/microcode: Unmodularize the microcode driver
Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
since early loader was forcing it to =y.

Regardless, there's no real reason to have something be a module which
gets built-in on the majority of installations out there. And its not
like there's noticeable change in functionality - we still can load late
microcode - just the module glue disappears.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:11 +02:00
Jan Beulich
3d45ac4b35 timers/x86/hpet: Type adjustments
Standardize on bool instead of an inconsistent mixture of u8 and plain 'int'.

Also use u32 or 'unsigned int' instead of 'unsigned long' when a 32-bit type
suffices, generating slightly better code on x86-64.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5624E3A002000078000AC49A@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:17:32 +02:00
Borislav Petkov
221836e92c x86/Kconfig/cpus: Fix/complete CPU type help texts
Move the generic help text explaining each CPU type and what to
select under the "Processor Family" prompt and not under the
M486 option. Also, amend it with the missing options.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445244077-25120-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:12:56 +02:00
Andi Kleen
81ffdcdd97 x86/mce: Fix thermal throttling reporting after kexec
The per CPU thermal vector init code checks if the thermal
vector is already installed and complains and bails out if it
is.

This happens after kexec, as kernel shut down does not clear the
thermal vector APIC register.

This causes two problems:

1. So we always do not fully initialize thermal reports after
   kexec. The CPU is still likely initialized, as the previous
   kernel should have done it. But we don't set up the software
   pointer to the thermal vector, so reporting may end up with a
   unknown thermal interrupt message.

2. Also it complains for every logical CPU, even though the
   value is actually derived from BP only.

The problem is that we end up with one message per CPU, so on
larger systems it becomes very noisy and messes up the otherwise
nicely formatted CPU bootup numbers in the kernel log.

Just remove the check. I checked the code and there's no valid
code paths where the thermal init code for a CPU could be called
multiple times.

Why the kernel does not clean up this value on shutdown:

The thermal monitoring is controlled per logical CPU thread.
Normal shutdown code is just running on one CPU. To disable it
we would need a broadcast NMI to all CPUs on shut down. That's
overkill for this. So we just ignore it after kexec.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445246268-26285-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:57 +02:00
Borislav Petkov
6f3760570e x86/setup/crash: Check memblock_reserve() retval
memblock_reserve() can fail but the crashkernel reservation code
doesn't check that and this can lead the user into believing
that the crashkernel region was actually reserved. Make sure we
check that return value and we exit early with a failure message
in the error case.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Chao <chaowang@redhat.com>
Cc: jerry_hoemann@hp.com
Link: http://lkml.kernel.org/r/1445246268-26285-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:56 +02:00
Borislav Petkov
f56d55781c x86/setup/crash: Cleanup some more
* Remove unused auto_set variable
* Cleanup local function variable declarations
* Reformat printk string and use pr_info()

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Chao <chaowang@redhat.com>
Cc: jerry_hoemann@hp.com
Link: http://lkml.kernel.org/r/1445246268-26285-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:56 +02:00
Borislav Petkov
606134f77c x86/setup/crash: Remove alignment variable
Use a macro instead. No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Chao <chaowang@redhat.com>
Cc: jerry_hoemann@hp.com
Link: http://lkml.kernel.org/r/1445246268-26285-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:56 +02:00
Borislav Petkov
97eac21bab x86/setup: Cleanup crashkernel reservation functions
* Shorten variable names
* Realign code, space out for better readability

No code changed:

  # arch/x86/kernel/setup.o:

   text    data     bss     dec     hex filename
   4543    3096   69904   77543   12ee7 setup.o.before
   4543    3096   69904   77543   12ee7 setup.o.after

md5:
   8a1b7c6738a553ca207b56bd84a8f359  setup.o.before.asm
   8a1b7c6738a553ca207b56bd84a8f359  setup.o.after.asm

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Chao <chaowang@redhat.com>
Cc: jerry_hoemann@hp.com
Link: http://lkml.kernel.org/r/1445246268-26285-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:56 +02:00
Aravind Gopalakrishnan
1a6775c1a2 x86/amd_nb, EDAC: Rename amd_get_node_id()
This function doesn't give us the "Node ID" as the function name
suggests. Rather, it receives a PCI device as argument, checks
the available F3 PCI device IDs in the system and returns the
index of the matching Bus/Device IDs.

Rename it to amd_pci_dev_to_node_id().

No functional change is introduced.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:55 +02:00
Baoquan He
eb6db83d10 x86/setup: Do not reserve crashkernel high memory if low reservation failed
People reported that when allocating crashkernel memory using
the ",high" and ",low" syntax, there were cases where the
reservation of the high portion succeeds but the reservation of
the low portion fails.

Then kexec can load the kdump kernel successfully, but booting
the kdump kernel fails as there's no low memory.

The low memory allocation for the kdump kernel can fail on large
systems for a couple of reasons. For example, the manually
specified crashkernel low memory can be too large and thus no
adequate memblock region would be found.

Therefore, we try to reserve low memory for the crash kernel
*after* the high memory portion has been allocated. If that
fails, we free crashkernel high memory too and return. The user
can then take measures accordingly.

Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Massage text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Chao <chaowang@redhat.com>
Cc: jerry_hoemann@hp.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1445246268-26285-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:55 +02:00
David S. Miller
26440c835f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20 06:08:27 -07:00
Andrey Ryabinin
f7d27c35dd x86/mm, kasan: Silence KASAN warnings in get_wchan()
get_wchan() is racy by design, it may access volatile stack
of running task, thus it may access redzone in a stack frame
and cause KASAN to warn about this.

Use READ_ONCE_NOCHECK() to silence these warnings.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Link: http://lkml.kernel.org/r/1445243838-17763-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 11:04:19 +02:00
Stephane Eranian
d892819faa perf/x86: Add support for PERF_SAMPLE_BRANCH_CALL
This patch enables the suport for the PERF_SAMPLE_BRANCH_CALL
for Intel x86 processors. When the processor support LBR filtering
this the selection is done in hardware. Otherwise, the filter is
applied by software. Note that we chose to include zero length calls
because they also represent calls.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: khandual@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1444720151-10275-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:30:53 +02:00
Adrian Hunter
b9511cd761 perf/x86: Fix time_shift in perf_event_mmap_page
Commit:

  b20112edea ("perf/x86: Improve accuracy of perf/sched clock")

allowed the time_shift value in perf_event_mmap_page to be as much
as 32.  Unfortunately the documented algorithms for using time_shift
have it shifting an integer, whereas to work correctly with the value
32, the type must be u64.

In the case of perf tools, Intel PT decodes correctly but the timestamps
that are output (for example by perf script) have lost 32-bits of
granularity so they look like they are not changing at all.

Fix by limiting the shift to 31 and adjusting the multiplier accordingly.

Also update the documentation of perf_event_mmap_page so that new code
based on it will be more future-proof.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: b20112edea ("perf/x86: Improve accuracy of perf/sched clock")
Link: http://lkml.kernel.org/r/1445001845-13688-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:30:52 +02:00
Ingo Molnar
6af597de62 Merge branch 'sched/urgent' into sched/core, to pick up fixes and resolve conflicts
Conflicts:
	kernel/sched/fair.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:18:16 +02:00
Ingo Molnar
a1a2ab2ff7 Linux 4.3-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWJCaDAAoJEHm+PkMAQRiGvJQH/jGlxawu+mnNFjIRXmk8Zucq
 cxDVTPnQ4S1DJMyRSQTqe6ccDgDM/TEISJPZ0Gwc2SQLda27zdQiz05upbqHSPSH
 /FUHMKiMRhhIBOzdfEGR4Ry2YZID+DlG5EWCgRKf/GlgY3STZSpjIBPYd3Rl3zsU
 qragNjtbCE6eUpnLwkv40Q2mzzR3/fZxJ0drgE/QiSF8P0VmvVbrcQMF58vnUQgy
 9URp48mLhmKj+IrElSnvXMG0XLrKn1tRYXGXd3w+x1Nlr//SHYsIuQHUcp6SAVQh
 p2HXLL7eoy8rs45MCiOfXG7VgNWF6HFjVMtrNUcNfgBsLCx0qFMVrKIILrGtKh0=
 =ETze
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rc6' into locking/core, to pick up fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:16:46 +02:00
Hans-Werner Hilse
37e81a016c um: Do not rely on libc to provide modify_ldt()
modify_ldt() was declared as an external symbol. Despite the man
page for this syscall telling that there is no wrapper in glibc,
since version 2.1 there actually is, so linking to the glibc
works.

Since modify_ldt() is not a POSIX interface, other libc
implementations do not always provide a wrapper function.
Even glibc headers do not provide a corresponding declaration.

So go the recommended way to call this using syscall().

Signed-off-by: Hans-Werner Hilse <hwhilse@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-10-19 22:53:37 +02:00
Takuya Yoshikawa
8c85ac1c0a KVM: x86: MMU: Initialize force_pt_level before calling mapping_level()
Commit fd13690218 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
call in mapping_level()") forgot to initialize force_pt_level to false
in FNAME(page_fault)() before calling mapping_level() like
nonpaging_map() does.  This can sometimes result in forcing page table
level mapping unnecessarily.

Fix this and move the first *force_pt_level check in mapping_level()
before kvm_vcpu_gfn_to_memslot() call to make it a bit clearer that
the variable must be initialized before mapping_level() gets called.

This change can also avoid calling kvm_vcpu_gfn_to_memslot() when
!check_hugepage_cache_consistency() check in tdp_page_fault() forces
page table level mapping.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-19 11:36:05 +02:00
Paolo Bonzini
5690891bce kvm: x86: zero EFER on INIT
Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
without clearing EFER.LME first (which it should not know about).
Yang Zhang from Intel confirmed that the manual is wrong and EFER is
cleared to zero on INIT.

Fixes: d28bc9dd25
Cc: stable@vger.kernel.org
Cc: Yang Z Zhang <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-19 11:34:45 +02:00
Chuck Ebbert
558a65bc31 sched/x86: Fix typo in __switch_to() comments
Fix obvious mistake: FS/GS should be DS/ES.

Signed-off-by: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151014143119.78858eeb@r5
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-19 10:18:53 +02:00
Andrey Ryabinin
a75ca545e8 x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels
Declaration of memcpy() is hidden under #ifndef CONFIG_KMEMCHECK.
In asm/efi.h under #ifdef CONFIG_KASAN we #undef memcpy(), due to
which the following happens:

  In file included from arch/x86/kernel/setup.c:96:0:
  ./arch/x86/include/asm/desc.h: In function ‘native_write_idt_entry’:
  ./arch/x86/include/asm/desc.h:122:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration]   memcpy(&idt[entry], gate, sizeof(*gate));
    ^
    cc1: some warnings being treated as errors
    make[2]: *** [arch/x86/kernel/setup.o] Error 1

We will get rid of that #undef in asm/efi.h eventually.
But in the meanwhile move memcpy() declaration out of #ifdefs
to fix the build.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1444994933-28328-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-19 10:07:23 +02:00
Len Brown
fcafddec4e x86/smpboot: Fix CPU #1 boot timeout
The following commit:

  a9bcaa02a5 ("x86/smpboot: Remove SIPI delays from cpu_up()")

Caused some Intel Core2 processors to time-out when bringing up CPU #1,
resulting in the missing of that CPU after bootup.

That patch reduced the SIPI delays from udelay() 300, 200 to udelay() 0,
0 on modern processors.

Several Intel(R) Core(TM)2 systems failed to bring up CPU #1 10/10 times
after that change.

Increasing either of the SIPI delays to udelay(1) results in
success. So here we increase both to udelay(10).  While this may
be 20x slower than the absolute minimum, it is still 20x to 30x
faster than the original code.

Tested-by: Donald Parsons <dparsons@brightdsl.net>
Tested-by: Shane <shrybman@teksavvy.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dparsons@brightdsl.net
Cc: shrybman@teksavvy.com
Link: http://lkml.kernel.org/r/6dd554ee8945984d85aafb2ad35793174d068af0.1444968087.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-19 09:14:41 +02:00
Len Brown
f1ccd24931 x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior
For legacy machines cpu_init_udelay defaults to 10,000.
For modern machines it is set to 0.

The user should be able to set cpu_init_udelay to
any value on the cmdline, including 10,000.

Before this patch, that was seen as "unchanged from default"
and thus on a modern machine, the user request was ignored
and the delay was set to 0.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dparsons@brightdsl.net
Cc: shrybman@teksavvy.com
Link: http://lkml.kernel.org/r/de363cdbbcfcca1d22569683f7eb9873e0177251.1444968087.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-19 09:14:41 +02:00
Andy Lutomirski
3bd29515d1 x86/entry/32: Fix FS and GS restore in opportunistic SYSEXIT
We either need to restore them before popping and thus changing
ESP, or we need to adjust the offsets.  The former is simpler.

Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5f310f739b x86/entry/32: ("Re-implement SYSENTER using the new C path")
Link: http://lkml.kernel.org/r/461e5c7d8fa3821529893a4893ac9c4bc37f9e17.1445035014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-18 12:11:16 +02:00
Andy Lutomirski
657c1eea00 x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on
When I rewrote entry_INT80_32, I thought that int80 was an
interrupt gate.  It's a trap gate.  *facepalm*

Thanks to Brian Gerst for pointing out that it's better to
change the entry code than to change the gate type.

Suggested-by: Brian Gerst <brgerst@gmail.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 150ac78d63 ("x86/entry/32: Switch INT80 to the new C syscall path")
Link: http://lkml.kernel.org/r/dc09d9b574a5c1dcca996847875c73f8341ce0ad.1445035014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-18 12:11:16 +02:00
Jiang Liu
4d6b4e69a2 x86/PCI/ACPI: Use common interface to support PCI host bridge
Use common interface to simplify ACPI PCI host bridge implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-16 22:18:52 +02:00
Jiang Liu
a3669868d9 ACPI/PCI: Reset acpi_root_dev->domain to 0 when pci_ignore_seg is set
Reset acpi_root_dev->domain to 0 when pci_ignore_seg is set to keep
consistence between ACPI PCI root device and PCI host bridge device.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-16 22:18:52 +02:00
Rafael J. Wysocki
7855e10294 Merge back earlier cpufreq material for v4.4. 2015-10-16 22:12:02 +02:00
Vitaly Kuznetsov
c0ff971ef9 x86/ioapic: Disable interrupts when re-routing legacy IRQs
A sporadic hang with consequent crash is observed when booting Hyper-V Gen1
guests:

 Call Trace:
  <IRQ>
  [<ffffffff810ab68d>] ? trace_hardirqs_off+0xd/0x10
  [<ffffffff8107b616>] queue_work_on+0x46/0x90
  [<ffffffff81365696>] ? add_interrupt_randomness+0x176/0x1d0
  ...
  <EOI>
  [<ffffffff81471ddb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
  [<ffffffff810c295e>] __irq_put_desc_unlock+0x1e/0x40
  [<ffffffff810c5c35>] irq_modify_status+0xb5/0xd0
  [<ffffffff8104adbb>] mp_register_handler+0x4b/0x70
  [<ffffffff8104c55a>] mp_irqdomain_alloc+0x1ea/0x2a0
  [<ffffffff810c7f10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
  [<ffffffff810c860c>] __irq_domain_alloc_irqs+0x13c/0x2b0
  [<ffffffff8104b070>] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0
  [<ffffffff8104bfa5>] mp_map_pin_to_irq+0x165/0x2d0
  [<ffffffff8104c157>] pin_2_irq+0x47/0x80
  [<ffffffff81744253>] setup_IO_APIC+0xfe/0x802
  ...
  [<ffffffff814631c0>] ? rest_init+0x140/0x140

The issue is easily reproducible with a simple instrumentation: if
mdelay(10) is put between mp_setup_entry() and mp_register_handler() calls
in mp_irqdomain_alloc() Hyper-V guest always fails to boot when re-routing
IRQ0. The issue seems to be caused by the fact that we don't disable
interrupts while doing IOPIC programming for legacy IRQs and IRQ0 actually
happens. 

Protect the setup sequence against concurrent interrupts.

[ tglx: Make the protection unconditional and not only for legacy
  	interrupts ]

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1444930943-19336-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-16 16:31:24 +02:00
Paolo Bonzini
f5f3497cad x86/setup: Extend low identity map to cover whole kernel range
On 32-bit systems, the initial_page_table is reused by
efi_call_phys_prolog as an identity map to call
SetVirtualAddressMap.  efi_call_phys_prolog takes care of
converting the current CPU's GDT to a physical address too.

For PAE kernels the identity mapping is achieved by aliasing the
first PDPE for the kernel memory mapping into the first PDPE
of initial_page_table.  This makes the EFI stub's trick "just work".

However, for non-PAE kernels there is no guarantee that the identity
mapping in the initial_page_table extends as far as the GDT; in this
case, accesses to the GDT will cause a page fault (which quickly becomes
a triple fault).  Fix this by copying the kernel mappings from
swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
identity mapping.

For some reason, this is only reproducible with QEMU's dynamic translation
mode, and not for example with KVM.  However, even under KVM one can clearly
see that the page table is bogus:

    $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
    $ gdb
    (gdb) target remote localhost:1234
    (gdb) hb *0x02858f6f
    Hardware assisted breakpoint 1 at 0x2858f6f
    (gdb) c
    Continuing.

    Breakpoint 1, 0x02858f6f in ?? ()
    (gdb) monitor info registers
    ...
    GDT=     0724e000 000000ff
    IDT=     fffbb000 000007ff
    CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
    ...

The page directory is sane:

    (gdb) x/4wx 0x32b7000
    0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
    (gdb) x/4wx 0x3398000
    0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
    (gdb) x/4wx 0x3399000
    0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003

but our particular page directory entry is empty:

    (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
    0x32b7070:	0x00000000

[ It appears that you can skate past this issue if you don't receive
  any interrupts while the bogus GDT pointer is loaded, or if you avoid
  reloading the segment registers in general.

  Andy Lutomirski provides some additional insight:

   "AFAICT it's entirely permissible for the GDTR and/or LDT
    descriptor to point to unmapped memory.  Any attempt to use them
    (segment loads, interrupts, IRET, etc) will try to access that memory
    as if the access came from CPL 0 and, if the access fails, will
    generate a valid page fault with CR2 pointing into the GDT or
    LDT."

  Up until commit 23a0d4e8fa ("efi: Disable interrupts around EFI
  calls, not in the epilog/prolog calls") interrupts were disabled
  around the prolog and epilog calls, and the functional GDT was
  re-installed before interrupts were re-enabled.

  Which explains why no one has hit this issue until now. ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Updated changelog. ]
2015-10-16 10:52:29 +01:00
Marcelo Tosatti
7cae2bedcb KVM: x86: move steal time initialization to vcpu entry time
As reported at https://bugs.launchpad.net/qemu/+bug/1494350,
it is possible to have vcpu->arch.st.last_steal initialized
from a thread other than vcpu thread, say the iothread, via
KVM_SET_MSRS.

Which can cause an overflow later (when subtracting from vcpu threads
sched_info.run_delay).

To avoid that, move steal time accumulation to vcpu entry time,
before copying steal time data to guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:16 +02:00
Takuya Yoshikawa
5225fdf8c8 KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level()
Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be
avoided since getting a slot by binary search may not be negligible,
especially for virtual machines with many memory slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:02 +02:00
Takuya Yoshikawa
d8aacf5df8 KVM: x86: MMU: Remove mapping_level_dirty_bitmap()
Now that it has only one caller, and its name is not so helpful for
readers, remove it.  The new memslot_valid_for_gpte() function
makes it possible to share the common code between
gfn_to_memslot_dirty_bitmap() and mapping_level().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:01 +02:00
Takuya Yoshikawa
fd13690218 KVM: x86: MMU: Move mapping_level_dirty_bitmap() call in mapping_level()
This is necessary to eliminate an extra memory slot search later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:00 +02:00
Takuya Yoshikawa
5ed5c5c8fd KVM: x86: MMU: Simplify force_pt_level calculation code in FNAME(page_fault)()
As a bonus, an extra memory slot search can be eliminated when
is_self_change_mapping is true.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:00 +02:00
Takuya Yoshikawa
cd1872f028 KVM: x86: MMU: Make force_pt_level bool
This will be passed to a function later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:33:59 +02:00
Joerg Roedel
6092d3d3e6 kvm: svm: Only propagate next_rip when guest supports it
Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das <bsd@redhat.com>
Cc: Dirk Mueller <dmueller@suse.com>
Tested-By: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:32:17 +02:00
Paolo Bonzini
951f9fd74f KVM: x86: manually unroll bad_mt_xwr loop
The loop is computing one of two constants, it can be simpler to write
everything inline.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:32:16 +02:00
Wanpeng Li
089d7b6ec5 KVM: nVMX: expose VPID capability to L1
Expose VPID capability to L1. For nested guests, we don't do anything
specific for single context invalidation. Hence, only advertise support
for global context invalidation. The major benefit of nested VPID comes
from having separate vpids when switching between L1 and L2, and also
when L2's vCPUs not sched in/out on L1.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:55 +02:00
Wanpeng Li
5c614b3583 KVM: nVMX: nested VPID emulation
VPID is used to tag address space and avoid a TLB flush. Currently L0 use
the same VPID to run L1 and all its guests. KVM flushes VPID when switching
between L1 and L2.

This patch advertises VPID to the L1 hypervisor, then address space of L1
and L2 can be separately treated and avoid TLB flush when swithing between
L1 and L2. For each nested vmentry, if vpid12 is changed, reuse shadow vpid
w/ an invvpid.

Performance:

run lmbench on L2 w/ 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.60000 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:35 +02:00
Wanpeng Li
99b83ac893 KVM: nVMX: emulate the INVVPID instruction
Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:24 +02:00
Srinivas Pandruvada
6a35fc2d6c cpufreq: intel_pstate: get P1 from TAR when available
After Ivybridge, the max non turbo ratio obtained from platform info msr
is not always guaranteed P1 on client platforms. The max non turbo
activation ratio (TAR), determines the max for the current level of TDP.
The ratio in platform info is physical max. The TAR MSR can be locked,
so updating this value is not possible on all platforms.
This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if
available,
but also do some sanity checking to make sure that this value is
correct.
The sanity check involves reading the TDP ratio for the current tdp
control value when platform has configurable TDP present and matching
TAC
with this.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:53:18 +02:00
Lukasz Anaczkowski
d81056b527 x86, ACPI: Handle apic/x2apic entries in MADT in correct order
ACPI specifies the following rules when listing APIC IDs:
(1) Boot processor is listed first
(2) For multi-threaded processors, BIOS should list the first logical
    processor of each of the individual multi-threaded processors in MADT
    before listing any of the second logical processors.
(3) APIC IDs < 0xFF should be listed in APIC subtable, APIC IDs >= 0xFF
    should be listed in X2APIC subtable

Because of above, when there's more than 0xFF logical CPUs, BIOS
interleaves APIC/X2APIC subtables.

Assuming, there's 72 cores, 72 hyper-threads each, 288 CPUs total,
listing is like this:

APIC (0,4,8, .., 252)
X2APIC (258,260,264, .. 284)
APIC (1,5,9,...,253)
X2APIC (259,261,265,...,285)
APIC (2,6,10,...,254)
X2APIC (260,262,266,..,286)
APIC (3,7,11,...,251)
X2APIC (255,261,262,266,..,287)

Now, before this patch, due to how ACPI MADT subtables were parsed (BSP
then X2APIC then APIC), kernel enumerated CPUs in reverted order (i.e.
high APIC IDs were getting low logical IDs, and low APIC IDs were
getting high logical IDs).
This is wrong for the following reasons:
() it's hard to predict how cores and threads are enumerated
() when it's hard to predict, s/w threads cannot be properly affinitized
   causing significant performance impact due to e.g. inproper cache
   sharing
() enumeration is inconsistent with how threads are enumerated on
   other Intel Xeon processors

So, order in which MADT APIC/X2APIC handlers are passed is
reverse and both handlers are passed to be called during same MADT
table to walk to achieve correct CPU enumeration.

In scenario when someone boots kernel with options 'maxcpus=72 nox2apic',
in result less cores may be booted, since some of the CPUs the kernel
will try to use will have APIC ID >= 0xFF. In such case, one
should not pass 'nox2apic'.

Disclimer: code parsing MADT APIC/X2APIC has not been touched since 2009,
when X2APIC support was initially added. I do not know why MADT parsing
code was added in the reversed order in the first place.
I guess it didn't matter at that time since nobody cared about cores
with APIC IDs >= 0xFF, right?

This patch is based on work of "Yinghai Lu <yinghai@kernel.org>"
previously published at https://lkml.org/lkml/2013/1/21/563

Here's the explanation why parsing interface needs to be changed
and why simpler approach will not work https://lkml.org/lkml/2015/9/7/285

Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de> (commit message)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:31:06 +02:00
Linus Torvalds
cfed1e3de4 Bug fixes for system management mode emulation. The first two patches
fix SMM emulation on Nehalem processors.  The others fix some cases
 that became apparent as work progressed on the firmware side.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWHms0AAoJEL/70l94x66DfQkIAIpya6c/1UAthxSTqJ1wFOf8
 ZKp3GCMjUjtm9k88kk6JGOlPiAvWz7CG9BVbptpkJGpgoDzquvr6ZKGG2BV88F17
 MnkZCid4IBW6VeKYy7R2otkKw7+Pw8DTHRQks+VI6BN/KkeaZLzh5J8+FAl4ZaWk
 YX/VulRce6SfZPYuUTRkkK8aebsopZNVG8mwWIGuBYwyH54R3KH1k/euX2joUPwm
 oopzmQLgEWW7e3RsO67T36rIRgEorJLZaiiexvj1djI+e0kEEudvhJ9nC6eB52qa
 oZ9nR0nkkmBmrBF8gldKDZBC+Y/ci1cJLAaoi7tdsp0wVCebPxubwbPOXxKwD8g=
 =ij8Q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Bug fixes for system management mode emulation.

  The first two patches fix SMM emulation on Nehalem processors.  The
  others fix some cases that became apparent as work progressed on the
  firmware side"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix RSM into 64-bit protected mode
  KVM: x86: fix previous commit for 32-bit
  KVM: x86: fix SMI to halted VCPU
  KVM: x86: clean up kvm_arch_vcpu_runnable
  KVM: x86: map/unmap private slots in __x86_set_memory_region
  KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
2015-10-14 10:01:32 -07:00
Andy Lutomirski
612bece654 um/x86: Fix build after x86 syscall changes
I didn't realize that um didn't include x86's asm/syscall.h.
Re-add a missing typedef.

Reported-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 034042cc1e ("x86/entry/syscalls: Move syscall table declarations into asm/syscalls.h")
Link: http://lkml.kernel.org/r/8d15b9a88f4fd49e3342757e0a34624ee5ce9220.1444696194.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:56:28 +02:00
Andy Lutomirski
af22aa7c76 x86/asm: Remove the xyz_cfi macros from dwarf2.h
They are currently unused, and I don't think that anyone was
ever particularly happy with them.  They had the unfortunate
property that they made it easy to CFI-annotate things without
thinking about them -- when pushing, do you want to just update
the CFA offset, or do you also want to update the saved location
of the register being pushed?

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447bfbd10bb268b4593b32534ecefa1f4df287e.1444696194.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:56:28 +02:00
Ingo Molnar
790a2ee242 * Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway - Paul Gortmaker
 
  * Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64 - Leif Lindholm
 
  * Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses - Matt Fleming
 
  * Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel
 
  * Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module - Ben Hutchings
 
  * Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support - Taku Izumi
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
 +OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
 H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
 2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
 ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
 0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
 zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
 pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
 v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
 dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
 v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
 8xb/rET4JrhCG4gFMUZ7
 =1Oee
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi

Pull v4.4 EFI updates from Matt Fleming:

  - Make the EFI System Resource Table (ESRT) driver explicitly
    non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway. (Paul Gortmaker)

  - Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64. (Leif Lindholm)

  - Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses. (Matt Fleming)

  - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel

  - Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module. (Ben Hutchings)

  - Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support. (Taku Izumi)

Note: there is a semantic conflict between the following two commits:

  8a53554e12 ("x86/efi: Fix multiple GOP device support")
  ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")

I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:51:34 +02:00
Wanpeng Li
dd5f5341a3 KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
Introduce __vmx_flush_tlb() to handle specific vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:09 +02:00
Wanpeng Li
991e7a0eed KVM: VMX: adjust interface to allocate/free_vpid
Adjust allocate/free_vid so that they can be reused for the nested vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:09 +02:00
Radim Krčmář
13db77347d KVM: x86: don't notify userspace IOAPIC on edge EOI
On real hardware, edge-triggered interrupts don't set a bit in TMR,
which means that IOAPIC isn't notified on EOI.  Do the same here.

Staying in guest/kernel mode after edge EOI is what we want for most
devices.  If some bugs could be nicely worked around with edge EOI
notifications, we should invest in a better interface.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář
db2bdcbbbd KVM: x86: fix edge EOI and IOAPIC reconfig race
KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration;  thus EOI from the pending interrupt is not
recognized.

(Reconfiguration is not a problem for level interrupts, because IOAPIC
 sends interrupt with the new configuration.)

For an edge interrupt with ACK notifiers, like i8254 timer; things can
happen in this order
 1) IOAPIC inject a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
    on original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
    not in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.

This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated.  The simple solution also doesn't work
if we are reconfiguring the vector.  (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

The are no races because ioapic injection and reconfig are locked.

Fixes: b053b2aef2 ("KVM: x86: Add EOI exit bitmap inference")
[Before b053b2aef2, this bug happened only with APICv.]
Fixes: c7c9c56ca2 ("x86, apicv: add virtual interrupt delivery support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář
c77f3fab44 kvm: x86: set KVM_REQ_EVENT when updating IRR
After moving PIR to IRR, the interrupt needs to be delivered manually.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Paolo Bonzini
bff98d3b01 Merge branch 'kvm-master' into HEAD
Merge more important SMM fixes.
2015-10-14 16:40:46 +02:00
Paolo Bonzini
b10d92a54d KVM: x86: fix RSM into 64-bit protected mode
In order to get into 64-bit protected mode, you need to enable
paging while EFER.LMA=1.  For this to work, CS.L must be 0.
Currently, we load the segments before CR0 and CR4, which means
that if RSM returns into 64-bit protected mode CS.L is already 1
and everything breaks.

Luckily, CS.L=0 is always the case when executing RSM, because it
is forbidden to execute RSM from 64-bit protected mode.  Hence it
is enough to load CR0 and CR4 first, and only then the segments.

Fixes: 660a5d517a
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:39:52 +02:00
Paolo Bonzini
25188b9986 KVM: x86: fix previous commit for 32-bit
Unfortunately I only noticed this after pushing.

Fixes: f0d648bdf0
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:39:25 +02:00
Ingo Molnar
c7d77a7980 Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:05:18 +02:00
Kővágó, Zoltán
8a53554e12 x86/efi: Fix multiple GOP device support
When multiple GOP devices exists, but none of them implements
ConOut, the code should just choose the first GOP (according to
the comments). But currently 'fb_base' will refer to the last GOP,
while other parameters to the first GOP, which will likely
result in a garbled display.

I can reliably reproduce this bug using my ASRock Z87M Extreme4
motherboard with CSM and integrated GPU disabled, and two PCIe
video cards (NVidia GT640 and GTX980), booting from efi-stub
(booting from grub works fine).  On the primary display the
ASRock logo remains and on the secondary screen it is garbled
up completely.

Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1444659236-24837-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:02:43 +02:00
Martin Schwidefsky
a7b7617493 mm: add architecture primitives for software dirty bit clearing
There are primitives to create and query the software dirty bits
in a pte or pmd. But the clearing of the software dirty bits is done
in common code with x86 specific page table functions.

Add the missing architecture primitives to clear the software dirty
bits to allow the feature to be used on non-x86 systems, e.g. the
s390 architecture.

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:05 +02:00
Andy Shevchenko
3435dd0809 x86/early_printk: Set __iomem address space for IO
There are following warnings on unpatched code:

arch/x86/kernel/early_printk.c:198:32: warning: incorrect type in initializer (different address spaces)
arch/x86/kernel/early_printk.c:198:32:    expected void [noderef] <asn:2>*vaddr
arch/x86/kernel/early_printk.c:198:32:    got unsigned int [usertype] *<noident>
arch/x86/kernel/early_printk.c:205:32: warning: incorrect type in initializer (different address spaces)
arch/x86/kernel/early_printk.c:205:32:    expected void [noderef] <asn:2>*vaddr
arch/x86/kernel/early_printk.c:205:32:    got unsigned int [usertype] *<noident>

Annotate it proper.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: http://lkml.kernel.org/r/1444646837-42615-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 21:45:56 +02:00
Paolo Bonzini
58f800d5ac Merge branch 'kvm-master' into HEAD
This merge brings in a couple important SMM fixes, which makes it
easier to test latest KVM with unrestricted_guest=0 and to test
the in-progress work on SMM support in the firmware.

Conflicts:
	arch/x86/kvm/x86.c
2015-10-13 21:32:50 +02:00
Linus Torvalds
6006d4521b Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - Fix AVX detection to prevent use of non-existent AESNI.

   - Some SPARC ciphers did not set their IV size which may lead to
     memory corruption"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ahash - ensure statesize is non-zero
  crypto: camellia_aesni_avx - Fix CPU feature checks
  crypto: sparc - initialize blkcipher.ivsize
2015-10-13 10:18:54 -07:00
Paolo Bonzini
7391773933 KVM: x86: fix SMI to halted VCPU
An SMI to a halted VCPU must wake it up, hence a VCPU with a pending
SMI must be considered runnable.

Fixes: 64d6067057
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:29:41 +02:00
Paolo Bonzini
5d9bc648b9 KVM: x86: clean up kvm_arch_vcpu_runnable
Split the huge conditional in two functions.

Fixes: 64d6067057
Cc: stable@vger.kernel.org
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:59 +02:00
Paolo Bonzini
f0d648bdf0 KVM: x86: map/unmap private slots in __x86_set_memory_region
Otherwise, two copies (one of them never populated and thus bogus)
are allocated for the regular and SMM address spaces.  This breaks
SMM with EPT but without unrestricted guest support, because the
SMM copy of the identity page map is all zeros.

By moving the allocation to the caller we also remove the last
vestiges of kernel-allocated memory regions (not accessible anymore
in userspace since commit b74a07beed, "KVM: Remove kernel-allocated
memory regions", 2010-06-21); that is a nice bonus.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:58 +02:00