Commit Graph

23168 Commits

Author SHA1 Message Date
Paolo Bonzini
9dbe6cf941 KVM: x86: expose MSR_TSC_AUX to userspace
If we do not do this, it is not properly saved and restored across
migration.  Windows notices due to its self-protection mechanisms,
and is very upset about it (blue screen of death).

Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
db3975717a kvm/x86: Hyper-V kvm exit
A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
5c919412fe kvm/x86: Hyper-V synthetic interrupt controller
SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
d62caabb41 kvm/x86: per-vcpu apicv deactivation support
The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on per-vcpu
basis.  In particular, when the userspace activates HyperV synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Andrey Smetanin
6308630bd3 kvm/x86: split ioapic-handled and EOI exit bitmaps
The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Andrey Smetanin
abdb080f7a kvm/irqchip: kvm_arch_irq_routing_update renaming split
Actually kvm_arch_irq_routing_update() should be
kvm_arch_post_irq_routing_update() as it's called at the end
of irq routing update.

This renaming frees kvm_arch_irq_routing_update function name.
kvm_arch_irq_routing_update() weak function which will be used
to update mappings for arch-specific irq routing entries
(in particular, the upcoming Hyper-V synthetic interrupts).

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Haozhong Zhang
b2467e744f KVM: nVMX: remove incorrect vpid check in nested invvpid emulation
This patch removes the vpid check when emulating nested invvpid
instruction of type all-contexts invalidation. The existing code is
incorrect because:
 (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
     Translations Based on VPID", invvpid instruction does not check
     vpid in the invvpid descriptor when its type is all-contexts
     invalidation.
 (2) According to the same document, invvpid of type all-contexts
     invalidation does not require there is an active VMCS, so/and
     get_vmcs12() in the existing code may result in a NULL-pointer
     dereference. In practice, it can crash both KVM itself and L1
     hypervisors that use invvpid (e.g. Xen).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 15:52:55 +01:00
Paolo Bonzini
8bd142c016 KVM/ARM Fixes for v4.4-rc3.
Includes some timer fixes, properly unmapping PTEs, an errata fix, and two
 tweaks to the EL2 panic code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWVJ7gAAoJEEtpOizt6ddyD5MH/3M/nhtZTnT6v0RPDvHWJo7s
 5BQmITJYPHFkTO14OHWTVLXiGgLws8gPZnWHxC4jjHjpuJnL+/MM551FpCOqDDd7
 vweYgVlSqD8ANH5nKbv1PPnzjrqhTVN+yi3ZItXy2pxsfvu63FC6Z43B2axelLvw
 XYmHoMZaeWBBw2gHi3djGfju3Yj/2SOe+ozuvAXpxA5+NhSiPHHnMefGy5k3wKnJ
 sETwshPdjiMeK4ItfMhveFTDRjl4uh9uQyORfaa5gqG0uePt3EalYynw+gEjZ6RX
 Bpc3nLwboIfRIa/WwyoHm+nmLIUYjU8dAgLwUOIbdeG0igpdALdvsB0aBHCgngk=
 =+7ED
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/ARM Fixes for v4.4-rc3.

Includes some timer fixes, properly unmapping PTEs, an errata fix, and two
tweaks to the EL2 panic code.
2015-11-24 19:34:40 +01:00
Linus Torvalds
069ec22915 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This update contains:

   - MPX updates for handling 32bit processes

   - A fix for a long standing bug in 32bit signal frame handling
     related to FPU/XSAVE state

   - Handle get_xsave_addr() correctly in KVM

   - Fix SMAP check under paravirtualization

   - Add a comment to the static function trace entry to avoid further
     confusion about the difference to dynamic tracing"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix SMAP check in PVOPS environments
  x86/ftrace: Add comment on static function tracing
  x86/fpu: Fix get_xsave_addr() behavior under virtualization
  x86/fpu: Fix 32-bit signal frame handling
  x86/mpx: Fix 32-bit address space calculation
  x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels
2015-11-22 12:00:12 -08:00
Andrew Cooper
581b7f158f x86/cpu: Fix SMAP check in PVOPS environments
There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:07:49 +01:00
Namhyung Kim
112677d683 x86/ftrace: Add comment on static function tracing
There was a confusion between update_ftrace_function() and static
function tracing trampoline regarding 3rd parameter (ftrace_ops).
Add a comment for clarification.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1447721004-2551-1-git-send-email-namhyung@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:07:49 +01:00
Matt Gingell
62a193edaf KVM: x86: request interrupt window when IRQ chip is split
Before this patch, we incorrectly enter the guest without requesting an
interrupt window if the IRQ chip is split between user space and the
kernel.

Because lapic_in_kernel no longer implies the PIC is in the kernel, this
patch tests pic_in_kernel to determining whether an interrupt window
should be requested when entering the guest.

If the APIC is in the kernel and we request an interrupt window the
guest will return immediately. If the APIC is masked the guest will not
not make forward progress and unmask it, leading to a loop when KVM
reenters and requests again. This patch adds a check to ensure the APIC
is ready to accept an interrupt before requesting a window.

Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Matt Gingell <gingell@google.com>
[Use the other newly introduced functions. - Paolo]
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:39 +01:00
Matt Gingell
934bf65354 KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space
Set KVM_REQ_EVENT when a PIC in user space injects a local interrupt.

Currently a request is only made when neither the PIC nor the APIC is in
the kernel, which is not sufficient in the split IRQ chip case.

This addresses a problem in QEMU where interrupts are delayed until
another path invokes the event loop.

Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Matt Gingell <gingell@google.com>
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:38 +01:00
Matt Gingell
782d422bca KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection
This patch breaks out a new function kvm_vcpu_ready_for_interrupt_injection.
This routine encapsulates the logic required to determine whether a vcpu
is ready to accept an interrupt injection, which is now required on
multiple paths.

Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Matt Gingell <gingell@google.com>
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:38 +01:00
Matt Gingell
127a457acb KVM: x86: fix interrupt window handling in split IRQ chip case
This patch ensures that dm_request_for_irq_injection and
post_kvm_run_save are in sync, avoiding that an endless ping-pong
between userspace (who correctly notices that IF=0) and
the kernel (who insists that userspace handles its request
for the interrupt window).

To synchronize them, it also adds checks for kvm_arch_interrupt_allowed
and !kvm_event_needs_reinjection.  These are always needed, not
just for in-kernel LAPIC.

Signed-off-by: Matt Gingell <gingell@google.com>
[A collage of two patches from Matt. - Paolo]
Fixes: 1c1a9ce973
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-18 12:25:37 +01:00
Rafael J. Wysocki
d9f67dbc0f Merge branch 'pm-tools'
* pm-tools:
  x86: remove unused definition of MSR_NHM_PLATFORM_INFO
  tools/power turbostat: use new name for MSR_PLATFORM_INFO
2015-11-16 22:57:02 +01:00
Linus Torvalds
0ca9b67606 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
 "Mostly updates to the perf tool plus two fixes to the kernel core code:

   - Handle tracepoint filters correctly for inherited events (Peter
     Zijlstra)

   - Prevent a deadlock in perf_lock_task_context (Paul McKenney)

   - Add missing newlines to some pr_err() calls (Arnaldo Carvalho de
     Melo)

   - Print full source file paths when using 'perf annotate --print-line
     --full-paths' (Michael Petlan)

   - Fix 'perf probe -d' when just one out of uprobes and kprobes is
     enabled (Wang Nan)

   - Add compiler.h to list.h to fix 'make perf-tar-src-pkg' generated
     tarballs, i.e. out of tree building (Arnaldo Carvalho de Melo)

   - Add the llvm-src-base.c and llvm-src-kbuild.c files, generated by
     the 'perf test' LLVM entries, when running it in-tree, to
     .gitignore (Yunlong Song)

   - libbpf error reporting improvements, using a strerror interface to
     more precisely tell the user about problems with the provided
     scriptlet, be it in C or as a ready made object file (Wang Nan)

   - Do not be case sensitive when searching for matching 'perf test'
     entries (Arnaldo Carvalho de Melo)

   - Inform the user about objdump failures in 'perf annotate' (Andi
     Kleen)

   - Improve the LLVM 'perf test' entry, introduce a new ones for BPF
     and kbuild tests to check the environment used by clang to compile
     .c scriptlets (Wang Nan)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
  tools include: Add compiler.h to list.h
  perf probe: Verify parameters in two functions
  perf session: Add missing newlines to some pr_err() calls
  perf annotate: Support full source file paths for srcline fix
  perf test: Add llvm-src-base.c and llvm-src-kbuild.c to .gitignore
  perf: Fix inherited events vs. tracepoint filters
  perf: Disable IRQs across RCU RS CS that acquires scheduler lock
  perf test: Do not be case sensitive when searching for matching tests
  perf test: Add 'perf test BPF'
  perf test: Enhance the LLVM tests: add kbuild test
  perf test: Enhance the LLVM test: update basic BPF test program
  perf bpf: Improve BPF related error messages
  perf tools: Make fetch_kernel_version() publicly available
  bpf tools: Add new API bpf_object__get_kversion()
  bpf tools: Improve libbpf error reporting
  perf probe: Cleanup find_perf_probe_point_from_map to reduce redundancy
  perf annotate: Inform the user about objdump failures in --stdio
  perf stat: Make stat options global
  perf sched latency: Fix thread pid reuse issue
  ...
2015-11-15 09:36:24 -08:00
Linus Torvalds
bba072dfd7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A couple of fixes and updates related to x86:

   - Fix the W+X check regression on XEN

   - The real fix for the low identity map trainwreck

   - Probe legacy PIC early instead of unconditionally allocating legacy
     irqs

   - Add cpu verification to long mode entry

   - Adjust the cache topology to AMD Fam17H systems

   - Let Merrifield use the TSC across S3"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Call verify_cpu() after having entered long mode too
  x86/setup: Fix low identity map for >= 2GB kernel range
  x86/mm: Skip the hypervisor range when walking PGD
  x86/AMD: Fix last level cache topology for AMD Fam17h systems
  x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
  x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
2015-11-15 09:32:59 -08:00
Len Brown
5369a21e3f x86: remove unused definition of MSR_NHM_PLATFORM_INFO
MSR_NHM_PLATFORM_INFO has been replaced by...
MSR_PLATFORM_INFO

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-13 23:28:01 +01:00
Linus Torvalds
3370b69eb0 Four changes:
- x86: work around two nasty cases where a benign exception occurs while
 another is being delivered.  The endless stream of exceptions causes an
 infinite loop in the processor, which not even NMIs or SMIs can interrupt;
 in the virt case, there is no possibility to exit to the host either.
 
 - x86: support for Skylake per-guest TSC rate.  Long supported by AMD,
 the patches mostly move things from there to common arch/x86/kvm/ code.
 
 - generic: remove local_irq_save/restore from the guest entry and exit
 paths when context tracking is enabled.  The patches are a few months
 old, but we discussed them again at kernel summit.  Andy will pick up
 from here and, in 4.5, try to remove it from the user entry/exit paths.
 
 - PPC: Two bug fixes, see merge commit 370289756b for details.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWRFb0AAoJEL/70l94x66DjjMH/31jr8d119MW0uv2x+03+wRq
 6dbJ8tjQ8grvBRExKvLsUVjDmHlhCa1BQl5qjCsyYhX9UeAf4NQOmoEFpq+YTLxh
 Ctveyn+yiZWC7qxbQDmauiQ4JCOp+W9ial782iqw5+ouQMajGOffq5WrojCa2ZNF
 jI278JgdHJLrKj/uie//WBu3V7MJY5Apc3p4zatnSYFSQ3MA0sxl4r4zIrwOa5qs
 23ZeeoqbP4sHh4X5wL/30Y6XFSCHj0qoYHHyAgzLi0PCMvBdt4DrAFUPDG/Rhlv6
 o1WB/kcUfcz3DtBX85wfSOMuw0nF6patWhWv07R/3EIbYoz3dKvp9d6ORYgXqlY=
 =Um9M
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second batch of kvm updates from Paolo Bonzini:
 "Four changes:

   - x86: work around two nasty cases where a benign exception occurs
     while another is being delivered.  The endless stream of exceptions
     causes an infinite loop in the processor, which not even NMIs or
     SMIs can interrupt; in the virt case, there is no possibility to
     exit to the host either.

   - x86: support for Skylake per-guest TSC rate.  Long supported by
     AMD, the patches mostly move things from there to common
     arch/x86/kvm/ code.

   - generic: remove local_irq_save/restore from the guest entry and
     exit paths when context tracking is enabled.  The patches are a few
     months old, but we discussed them again at kernel summit.  Andy
     will pick up from here and, in 4.5, try to remove it from the user
     entry/exit paths.

   - PPC: Two bug fixes, see merge commit 370289756b for details"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: x86: rename update_db_bp_intercept to update_bp_intercept
  KVM: svm: unconditionally intercept #DB
  KVM: x86: work around infinite loop in microcode when #AC is delivered
  context_tracking: avoid irq_save/irq_restore on guest entry and exit
  context_tracking: remove duplicate enabled check
  KVM: VMX: Dump TSC multiplier in dump_vmcs()
  KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
  KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
  KVM: VMX: Enable and initialize VMX TSC scaling
  KVM: x86: Use the correct vcpu's TSC rate to compute time scale
  KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
  KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
  KVM: x86: Replace call-back compute_tsc_offset() with a common function
  KVM: x86: Replace call-back set_tsc_khz() with a common function
  KVM: x86: Add a common TSC scaling function
  KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
  KVM: x86: Collect information for setting TSC scaling ratio
  KVM: x86: declare a few variables as __read_mostly
  KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
  KVM: PPC: Book3S HV: Don't dynamically split core when already split
  ...
2015-11-12 14:34:06 -08:00
Huang Rui
41ac18ebfc perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1446630233-3166-1-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:44:25 +01:00
Huaitong Han
a05917b6ba x86/fpu: Fix get_xsave_addr() behavior under virtualization
KVM uses the get_xsave_addr() function in a different fashion from
the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
not the currently running task.

But 'xsave' is replaced with current task's (host) xsave structure, so
get_xsave_addr() will incorrectly return the bad xsave address to KVM.

Fix it so that the passed in 'xsave' address is used - as intended
originally.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:34:58 +01:00
Dave Hansen
ab6b529475 x86/fpu: Fix 32-bit signal frame handling
(This should have gone to LKML originally. Sorry for the extra
 noise, folks on the cc.)

Background:

Signal frames on x86 have two formats:

  1. For 32-bit executables (whether on a real 32-bit kernel or
     under 32-bit emulation on a 64-bit kernel) we have a
    'fpregset_t' that includes the "FSAVE" registers.

  2. For 64-bit executables (on 64-bit kernels obviously), the
     'fpregset_t' is smaller and does not contain the "FSAVE"
     state.

When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.

Problem:

save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable.  This is for real 32-bit and
ia32 emulation.

But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.

This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog().  But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed).  This leads to the FPU/XSAVE state being initialized.

For MPX, this leads to the most permissive state and means we
silently lose bounds violations.  I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state.  I'm not sure why
no one has spotted this bug.

I believe this was broken by:

	72a671ced6 ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

way back in 2012.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Cc: fenghua.yu@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:23:45 +01:00
Dave Hansen
f3119b8302 x86/mpx: Fix 32-bit address space calculation
I received a bug report that running 32-bit MPX binaries on
64-bit kernels was broken.  I traced it down to this little code
snippet.  We were switching our "number of bounds directory
entries" calculation correctly.  But, we didn't switch the other
side of the calculation: the virtual space size.

This meant that we were calculating an absurd size for
bd_entry_virt_space() on 32-bit because we used the 64-bit
virt_space.

This was _also_ broken for 32-bit kernels running on 64-bit
hardware since boot_cpu_data.x86_virt_bits=48 even when running
in 32-bit mode.

Correct that and properly handle all 3 possible cases:

 1. 32-bit binary on 64-bit kernel
 2. 64-bit binary on 64-bit kernel
 3. 32-bit binary on 32-bit kernel

This manifested in having bounds tables not properly unmapped.
It "leaked" memory but had no functional impact otherwise.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:20:37 +01:00
Dave Hansen
46561c3959 x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels
When you call get_user(foo, bar), you effectively do a

	copy_from_user(&foo, bar, sizeof(*bar));

Note that the sizeof() is implicit.

When we reach out to userspace to try to zap an entire "bounds
table" we need to go read a "bounds directory entry" in order to
locate the table's address.  The size of a "directory entry"
depends on the binary being run and is always the size of a
pointer.

But, when we have a 64-bit kernel and a 32-bit application, the
directory entry is still only 32-bits long, but we fetch it with
a 64-bit pointer which makes get_user() does a 64-bit fetch.
Reading 4 extra bytes isn't harmful, unless we are at the end of
and run off the table.  It might also cause the zero page to get
faulted in unnecessarily even if you are not at the end.

Fix it up by doing a special 32-bit get_user() via a cast when
we have 32-bit userspace.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:20:37 +01:00
Linus Torvalds
4bde961e52 Merge branch 'for-linus-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - a new hrtimer based clocksource by Anton Ivanov

 - ptrace() enhancments by Richard Weinberger

 - random cleanups and bug fixes all over the place

* 'for-linus-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Switch clocksource to hrtimers
  um: net: replace GFP_KERNEL with GFP_ATOMIC when spinlock is held
  um: Report host OOM more nicely
  um: Simplify STUB_DATA loading
  um: Remove dead symbol from i386 syscall stub
  um: Remove dead code from x86_64 syscall stub
  um: Get rid of open coded NR_SYSCALLS
  um: Store syscall number after syscall_trace_enter()
  um: Define PTRACE_OLDSETOPTIONS
2015-11-10 16:33:37 -08:00
Linus Torvalds
264015f8a8 libnvdimm for 4.4:
1/ Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.
 
 2/ Teach the coredump implementation how to filter out DAX mappings.
 
 3/ Introduce NUMA hints for allocations made by the pmem driver, and as
    a side effect all devm allocations now hint their NUMA node by
    default.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQX2sAAoJEB7SkWpmfYgCWsEQAK7w/xM9zClVY/DDlFJxFtYq
 DZJ4faPj+E3FMTiJIEDzjtRgQvOFE+wmJtntYsCqKH/QZmpnyk9jeT/CbJzEEL2k
 WsAk+qHGLcVUlSb36blwN1RFzYqC+IDYThewJqUvxDbOwL1AbiibbX7gplzZHLhW
 +rj3ScVlSNOPRDgGGpkAeLNNsttuKvsGo7nB/JZopm0tV6g14rSK09wQbVhv6S6T
 Lu7VGYqnJlkteL9YlzRiROf9hW2ZFCMGJz1YZydPTy3aX3hGTBX4w2qvmsPwBIKP
 kW/gCNisVJGk1cZCk4joSJ8i/b3x3fE0zdZ5waivJ5jDvYbUUfyk0KtJkfw207Rl
 14yWitUC6aeVuCeOqXHgsjRi+1QVN9Pg7i49xgGiUN1igQiUYRTgQPWZxDv6Zo/s
 USrLFQBaRd+hJw+dl7A47lJ3mUF96tPCoQb4LCQ7DVsg5U4J2TvqXLH9Gek/CCZ4
 QsMkZDTQlZw4+JEDlzBgg/L7xVty8DadplTADMdjaRhFU3y8zKNJ85Ileokt7KVt
 IsBT4+S5HeZLvinZY95932DwAmFp1DtsyENd1BUXL06ddyvlQrFJ6NQaXji4fuDc
 EVQmMoTAqDujZFupMAux9vkUBDFj/hmaVD5F7j3+MWP87OCritw/IZn+2LgTaKoX
 EmttaYrDr2jJwIaGyw+H
 =a2/L
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "Outside of the new ACPI-NFIT hot-add support this pull request is more
  notable for what it does not contain, than what it does.  There were a
  handful of development topics this cycle, dax get_user_pages, dax
  fsync, and raw block dax, that need more more iteration and will wait
  for 4.5.

  The patches to make devm and the pmem driver NUMA aware have been in
  -next for several weeks.  The hot-add support has not, but is
  contained to the NFIT driver and is passing unit tests.  The coredump
  support is straightforward and was looked over by Jeff.  All of it has
  received a 0day build success notification across 107 configs.

  Summary:

   - Add support for the ACPI 6.0 NFIT hot add mechanism to process
     updates of the NFIT at runtime.

   - Teach the coredump implementation how to filter out DAX mappings.

   - Introduce NUMA hints for allocations made by the pmem driver, and
     as a side effect all devm allocations now hint their NUMA node by
     default"

* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  coredump: add DAX filtering for FDPIC ELF coredumps
  coredump: add DAX filtering for ELF coredumps
  acpi: nfit: Add support for hot-add
  nfit: in acpi_nfit_init, break on a 0-length table
  pmem, memremap: convert to numa aware allocations
  devm_memremap_pages: use numa_mem_id
  devm: make allocations numa aware by default
  devm_memremap: convert to return ERR_PTR
  devm_memunmap: use devres_release()
  pmem: kill memremap_pmem()
  x86, mm: quiet arch_add_memory()
2015-11-10 12:07:22 -08:00
Paolo Bonzini
a96036b8ef KVM: x86: rename update_db_bp_intercept to update_bp_intercept
Because #DB is now intercepted unconditionally, this callback
only operates on #BP for both VMX and SVM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:25 +01:00
Paolo Bonzini
cbdb967af3 KVM: svm: unconditionally intercept #DB
This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.

Reported-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:24 +01:00
Eric Northup
54a20552e1 KVM: x86: work around infinite loop in microcode when #AC is delivered
It was found that a guest can DoS a host by triggering an infinite
stream of "alignment check" (#AC) exceptions.  This causes the
microcode to enter an infinite loop where the core never receives
another interrupt.  The host kernel panics pretty quickly due to the
effects (CVE-2015-5307).

Signed-off-by: Eric Northup <digitaleric@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:24 +01:00
Haozhong Zhang
8cfe986696 KVM: VMX: Dump TSC multiplier in dump_vmcs()
This patch enhances dump_vmcs() to dump the value of TSC multiplier
field in VMCS.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:22 +01:00
Haozhong Zhang
be7b263ea9 KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
This patch makes kvm-intel to return a scaled host TSC plus the TSC
offset when handling guest readings to MSR_IA32_TSC.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:21 +01:00
Haozhong Zhang
ff2c3a1803 KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
This patch makes kvm-intel module to load TSC scaling ratio into TSC
multiplier field of VMCS when a vcpu is loaded, so that TSC scaling
ratio can take effect if VMX TSC scaling is enabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:20 +01:00
Haozhong Zhang
64903d6195 KVM: VMX: Enable and initialize VMX TSC scaling
This patch exhances kvm-intel module to enable VMX TSC scaling and
collects information of TSC scaling ratio during initialization.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:19 +01:00
Haozhong Zhang
27cca94e03 KVM: x86: Use the correct vcpu's TSC rate to compute time scale
This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:18 +01:00
Haozhong Zhang
4ba76538dd KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
Both VMX and SVM scales the host TSC in the same way in call-back
read_l1_tsc(), so this patch moves the scaling logic from call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:18 +01:00
Haozhong Zhang
58ea676787 KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:17 +01:00
Haozhong Zhang
07c1419a32 KVM: x86: Replace call-back compute_tsc_offset() with a common function
Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:16 +01:00
Haozhong Zhang
381d585c80 KVM: x86: Replace call-back set_tsc_khz() with a common function
Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:16 +01:00
Haozhong Zhang
35181e86df KVM: x86: Add a common TSC scaling function
VMX and SVM calculate the TSC scaling ratio in a similar logic, so this
patch generalizes it to a common TSC scaling function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr.  Remove
 BUG_ON.  - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:15 +01:00
Haozhong Zhang
ad721883e9 KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
This patch moves the field of TSC scaling ratio from the architecture
struct vcpu_svm to the common struct kvm_vcpu_arch.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:14 +01:00
Haozhong Zhang
bc9b961b35 KVM: x86: Collect information for setting TSC scaling ratio
The number of bits of the fractional part of the 64-bit TSC scaling
ratio in VMX and SVM is different. This patch makes the architecture
code to collect the number of fractional bits and other related
information into variables that can be accessed in the common code.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:14 +01:00
Paolo Bonzini
893590c734 KVM: x86: declare a few variables as __read_mostly
These include module parameters and variables that are set by
kvm_x86_ops->hardware_setup.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:13 +01:00
Paolo Bonzini
450869d6db KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
They are exactly the same, except that handle_mmio_page_fault
has an unused argument and a call to WARN_ON.  Remove the unused
argument from the callers, and move the warning to (the former)
handle_mmio_page_fault_common.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:03 +01:00
Nicolas Pitre
77c5b5da02 kmap_atomic_to_page() has no users, remove it
Removal started in commit 5bbeed12bd ("sparc32: drop unused
kmap_atomic_to_page").  Let's do it across the whole tree.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Dan Williams
85ce230051 Merge branch 'for-4.4/hotplug' into libnvdimm-for-next 2015-11-09 13:29:39 -05:00
Linus Torvalds
ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds
99aaa9c64b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
 "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
  (as in such case it's possible for module struct to share a page with
  executable text, which is currently not being handled with grace) from
  Josh Poimboeuf"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
2015-11-07 12:15:17 -08:00
Borislav Petkov
04633df0c4 x86/cpu: Call verify_cpu() after having entered long mode too
When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

  # rdmsr -a 0x1a0
  400850089
  850089
  850089
  850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.

Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:45:02 +01:00
Krzysztof Mazur
68accac392 x86/setup: Fix low identity map for >= 2GB kernel range
The commit f5f3497cad extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping causing a crash. To avoid overwritting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).

Fixes: f5f3497cad "x86/setup: Extend low identity map to cover whole kernel range
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:39:40 +01:00